
Data Encoding Scheme 08 -> Human-Readable

Hi

I am a total novice to GSM and all its standards.  We are receiving SMS messages in our Visual Basic application, and the majority of them arrive with a data encoding scheme of 00, so it is quite straightforward to convert the actual message into human-readable text.

Occasionally we get a message with a data encoding scheme of 08, and this causes an error when we try to convert the user's message portion into text to store in our database.

My question is: does anyone have any sample Basic-derived code that I can adapt to convert 08-encoded messages into normal text?

Here's an example: 00790065007300200061006C006C002000
This should convert to: yes all

MTIA

David
 
SergeiKo commented:
Hello, Bagload.

It is the UCS2 (16-bit) coding standard, as described in
ISO/IEC 10646: "Universal Multiple-Octet Coded Character Set (UCS)"; UCS2, 16-bit coding.


Simply put, each symbol consists of 2 bytes (16 bits).
Such a message can contain up to 70 UCS2 characters.
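
The 70-character limit follows from the 140-octet user-data field of a single SMS: at two octets per UCS2 character, 140 / 2 = 70. A minimal Python sketch of that arithmetic follows. The constant and function names are illustrative only, and concatenated messages (which give up some octets to a user-data header) are not considered.

```python
# Payload of a single, non-concatenated SMS, in octets.
SMS_PAYLOAD_OCTETS = 140


def max_chars(bits_per_char: int) -> int:
    """Number of characters of the given width that fit in one SMS."""
    return (SMS_PAYLOAD_OCTETS * 8) // bits_per_char


print(max_chars(16))  # UCS2 characters (encoding scheme 08)
print(max_chars(7))   # GSM 7-bit default alphabet characters (encoding scheme 00)
```

The same payload therefore holds 160 characters of the 7-bit default alphabet but only 70 UCS2 characters.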


UCS standardized in ISO 10646 integrates all previous internationally/nationally agreed character sets into a single code set. UCS is based on 4-octet (32-bit) coding scheme known as the "canonical form" (UCS-4), but a 2-octet (16-bit) form (UCS-2) is used for the BMP, where octets 1 and 2 are assumed to be 00 00. The code set is split into 128 "groups" of "planes" containing 256 "rows" with 256 "cells" for characters.
Each character is addressed using multiple octets, the third (in UCS-2 the first) of which identifies the row containing the character and the fourth (in UCS-2 the second) its cell number. The first 128 characters of the BMP used for 16-bit code interchange are those of ASCII. The characters forming the second half of the first row are those used in ISO 8859-1, the Latin-1 character set.
( from http://dret.net/glossary/ucs )
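
To make the row/cell addressing concrete, here is a small Python sketch that splits a BMP character into its two UCS-2 octets. The function name `row_and_cell` is made up for this illustration.

```python
def row_and_cell(ch: str) -> tuple[int, int]:
    """Split a single BMP character into its UCS-2 row and cell octets."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("character is outside the BMP, so it has no UCS-2 form")
    return code >> 8, code & 0xFF


# 'y' (ASCII), U+00E9 (Latin-1 half of row 00), U+042F (Cyrillic, row 04)
for ch in ("y", "\u00e9", "\u042f"):
    row, cell = row_and_cell(ch)
    print(f"U+{ord(ch):04X}: row {row:02X}, cell {cell:02X}")
```

ASCII and Latin-1 characters all land in row 00, which is why the example message in this thread has 00 in front of every character byte.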

You should also look at UTF-16 encoding.
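
As a rough illustration of how UTF-16 relates to UCS-2, the Python sketch below encodes two strings as big-endian UTF-16. For characters inside the BMP the bytes are the same as UCS-2; a character outside the BMP becomes a surrogate pair, which plain UCS-2 cannot represent. The helper name `utf16_hex` is an assumption made for this example.

```python
def utf16_hex(text: str) -> str:
    """Return the big-endian UTF-16 encoding of text as upper-case hex."""
    return text.encode("utf-16-be").hex().upper()


print(utf16_hex("yes"))          # BMP only: two bytes per character, same as UCS-2
print(utf16_hex("\U0001F600"))   # outside the BMP: four bytes (a surrogate pair)
```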


According to Latin-1:
'y' = 79
'e' = 65
's' = 73
' ' = 20
'a' = 61
'l' = 6C
'l' = 6C
' ' = 20
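
The codes listed above can be checked with a couple of lines of Python (used here only for illustration; the variable names are arbitrary):

```python
text = "yes all "

# Encode as Latin-1 and show each byte as two hex digits.
codes = [f"{b:02X}" for b in text.encode("latin-1")]
print(codes)
```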


So, decoding this UTF-16 encoded string:

0079 0065 0073 0020 0061 006C 006C 0020 00

For Latin-1 characters the first byte is 00 and the second is the symbol's Latin-1 byte, so
'y' = 0079
'e' = 0065
's' = 0073
' ' = 0020
'a' = 0061
'l' = 006C
'l' = 006C
' ' = 0020


The final lone 00 byte seems strange: it is only half of a 16-bit character.
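
The decoding steps in this post can be sketched in Python as below. This is an illustration, not a Visual Basic implementation, and `decode_ucs2_hex` is a hypothetical helper name. It assumes the input is the hex text of the user-data portion only, and it drops a dangling odd byte like the one in the example. It uses Python's `utf-16-be` codec, which gives the same result as UCS2 for characters inside the BMP.

```python
def decode_ucs2_hex(hex_text: str) -> str:
    """Decode a hex string of big-endian 16-bit characters into text."""
    raw = bytes.fromhex(hex_text)
    if len(raw) % 2:
        # An odd byte count means the last byte is half a character; drop it.
        raw = raw[:-1]
    return raw.decode("utf-16-be")


message = decode_ucs2_hex("00790065007300200061006C006C002000")
print(repr(message))
```

The decoded text keeps the trailing space encoded by the final 0020, so a caller may want to trim it before storing the message.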


Regards.
 
SergeiKo commented:
PS: all codes are hexadecimal.
 
aft commented:
Also, when getting the message I read it using +CMGR, but how can I know whether the incoming SMS is in Unicode or English?
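
One possible approach, assuming the message is read in PDU mode so that the TP-DCS octet (the value this thread calls the data encoding scheme, e.g. 00 or 08) is available: in the general data coding group, bits 3 and 2 of that octet select the alphabet. The Python sketch below is simplified. The function name and return strings are made up for illustration, and the other coding groups and the compression flag are not handled.

```python
def dcs_alphabet(dcs: int) -> str:
    """Classify a TP-DCS octet from the general data coding group."""
    if (dcs & 0xC0) != 0x00:
        # Top two bits are not 00: a different coding group, out of scope here.
        return "other"
    alphabet_bits = (dcs >> 2) & 0x03
    return {0: "gsm7", 1: "8bit", 2: "ucs2", 3: "reserved"}[alphabet_bits]


print(dcs_alphabet(0x00))  # default 7-bit alphabet
print(dcs_alphabet(0x08))  # UCS2, two bytes per character
```

With this check, a message whose value is 08 would be routed to a UCS2 decoder and one whose value is 00 to the 7-bit decoder.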