• Status: Solved

Data Encoding Scheme 08 -> Human-Readable


I am a total novice and a newbie to GSM and all its standards.  We are receiving SMS messages in our Visual Basic application, and the majority of them come with a data encoding scheme of 00, so it is quite straightforward to convert the actual message into human-readable text.

Occasionally we get a message with a data encoding scheme of 08, and this causes an error when we try to convert the user's message portion into text to store in our database.

My question is: does anyone have any sample BASIC-derived code that I can adapt to convert 08-encoded messages into normal text?

Here's an example: 00790065007300200061006C006C002000
This should convert to: yes all


1 Solution
Hello, Bagload.

It is the UCS2 (16-bit) coding standard, as described in
ISO/IEC 10646: "Universal Multiple-Octet Coded Character Set (UCS)"; UCS2, 16 bit coding.

Simply put, each symbol consists of 2 bytes (16 bits).
Such a message can contain up to 70 UCS2 characters.

UCS standardized in ISO 10646 integrates all previous internationally/nationally agreed character sets into a single code set. UCS is based on 4-octet (32-bit) coding scheme known as the "canonical form" (UCS-4), but a 2-octet (16-bit) form (UCS-2) is used for the BMP, where octets 1 and 2 are assumed to be 00 00. The code set is split into 128 "groups" of "planes" containing 256 "rows" with 256 "cells" for characters.
Each character is addressed using multiple octets, the third (in UCS-2 the first) of which identifies the row containing the character and the fourth (in UCS-2 the second) its cell number. The first 127 characters of the BMP used for 16-bit code interchange are those of ASCII. The characters forming the second half of the first row are those used in ISO 8859-1, the Latin-1 character set.
( from http://dret.net/glossary/ucs )

You should also look at UTF-16 encoding.

According to Latin-1:
'y' = 79
'e' = 65
's' = 73
' ' = 20
'a' = 61
'l' = 6C
'l' = 6C
' ' = 20

So, decoding this UCS-2 (UTF-16) encoded string:

0079 0065 0073 0020 0061 006C 006C 0020 00

For Latin-1 characters, the first byte is 00 and the second is the character's Latin-1 code, so
'y' = 0079
'e' = 0065
's' = 0073
' ' = 0020
'a' = 0061
'l' = 006C
'l' = 006C
' ' = 0020

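The final 00 byte looks strange: UCS-2 uses two bytes per character, so a complete message should contain an even number of bytes. The example may have been cut off.

PS: all codes are hexadecimal.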
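
The decoding described above can be sketched in a few lines. This is only an illustration in Python, not Visual Basic, and the function name `decode_ucs2_hex` is made up for the example. It assumes the message arrives as a string of hex digits in big-endian byte order (as in the example above), and it simply drops an incomplete trailing group such as the stray 00.

```python
def decode_ucs2_hex(hex_text: str) -> str:
    """Decode a hex string of big-endian UCS-2 code units into text.

    Hypothetical helper for illustration; not part of any GSM library.
    """
    # Remove any spaces or line breaks between the hex groups.
    hex_text = "".join(hex_text.split())
    # Each UCS-2 character needs 4 hex digits (2 bytes); ignore an
    # incomplete trailing group rather than failing on it.
    usable = len(hex_text) - (len(hex_text) % 4)
    raw = bytes.fromhex(hex_text[:usable])
    # UTF-16 big-endian decodes every valid UCS-2 character identically.
    return raw.decode("utf-16-be")


print(repr(decode_ucs2_hex("00790065007300200061006C006C002000")))
# prints 'yes all ' (note the trailing space from the final 0020)
```

The same approach should carry over to Visual Basic: take four hex digits at a time, convert them to a number, and turn that number into a character with the language's wide-character function.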
Also, to retrieve a message I read it using +CMGR. But how can I know whether the incoming SMS is in Unicode or English?
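
One possible way to tell is to inspect the data coding scheme value that accompanies the message, the same 00 / 08 value mentioned in the original question. The sketch below is Python and the function name `dcs_alphabet` is hypothetical. It assumes you already have the DCS octet as a number (for example from the PDU, or from the text-mode +CMGR response if your modem reports it), and it only handles the general coding group, where bits 3 and 2 select the alphabet; any other group is reported as not handled.

```python
def dcs_alphabet(dcs: int) -> str:
    """Name the alphabet indicated by a data coding scheme octet.

    Hypothetical helper; covers only the general data coding group
    (bits 7..6 = 00). Other coding groups are not interpreted here.
    """
    if (dcs & 0xC0) != 0x00:
        return "other coding group (not handled in this sketch)"
    # Bits 3..2 select the alphabet within the general coding group.
    alphabet = (dcs >> 2) & 0x03
    names = [
        "GSM 7-bit default alphabet",
        "8-bit data",
        "UCS2 (16-bit)",
        "reserved",
    ]
    return names[alphabet]


print(dcs_alphabet(0x00))  # prints GSM 7-bit default alphabet
print(dcs_alphabet(0x08))  # prints UCS2 (16-bit)
```

A message whose DCS maps to UCS2 would then be decoded two bytes per character, while one that maps to the default alphabet would go through the usual 7-bit conversion.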