Decoding SMS PDUs

Dear experts,

I am developing a library that can be used to encode and decode SMS text messages. I have attached a debug log from the DLL.

The DLL works perfectly for plain-text SMSs (with no EMS content), including concatenated/multi-part SMSs. My problem is with SMSs that carry EMS content such as a SmallPicture, PredefinedSound, or Formatted Text. Decoding of the User Data part (SM) fails when the default 7-bit alphabet has been used; see the attached log for details. However, if you remove some bytes from the beginning of the User Data, part of the message decodes correctly.

For example, in the log there is the PDU message (the very last message in the log as received from the GSM modem):


All details (MTI, Addresses, Time Stamp, User Data Header, Short Message) are correct, but I am failing to decode the User Data (i.e. Short Message):


When I try to decode this message using my DLL I am getting garbage like this:


whereas I should be getting the text:

"in two parts. 2nd part has a melody: "

If I remove the first 6 characters from the beginning of the user data, I am able to get part of the text like so:


gives me this:  

"two parts. 2nd part has a melody: "

Any idea where I could be going wrong?
bmatumburaAuthor Commented:
Thanks abel, here is the solution:

The UDL + UDH + UD PDU (i.e. the entire TPDU payload SM) is:

31090003FC02020B022505907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

This can be broken into:

31 - UDL
090003FC02020B022505 -UDH
907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520 - UD

Now, the UDH is 10 octets including its length octet (UDHL = 09), i.e. 10 * 8 = 80 bits in total. 80 is not a multiple of 7, so the header does not end on a septet boundary, which a GSM 7-bit decoder requires. Fill bits are therefore inserted after the header, in the low-order bits of the first octet of the UD, and decoding the UD on its own from bit 0 goes wrong. To work around this problem, I decided to decode the entire UDH + UD as 7-bit data and discard the first 12 characters of the result, since they are just the UDH (80 bits + 4 fill bits to make the UDH end on a septet boundary = 84 bits = 12 septets/GSM 7-bit characters, 84/7).
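
The fill-bit arithmetic can be checked with a few lines of Python. This is only a sketch: the function names fill_bits and header_septets are made up for illustration and are not part of the DLL. For a 10-octet header it gives 4 fill bits, which is what takes 80 bits up to 84.

```python
def fill_bits(udh_octets: int) -> int:
    """Padding bits needed after the header so the text starts on a septet boundary."""
    return (7 - (udh_octets * 8) % 7) % 7


def header_septets(udh_octets: int) -> int:
    """Septets taken up by the header together with its fill bits."""
    return (udh_octets * 8 + fill_bits(udh_octets)) // 7


# UDHL is 0x09, so the header occupies 9 + 1 = 10 octets including the UDHL octet.
udh_octets = 0x09 + 1
print(fill_bits(udh_octets), header_septets(udh_octets))  # prints: 4 12
```

A header whose bit length is already a multiple of 7 (for example 7 octets) needs no fill bits, which the outer modulo takes care of.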

So I decoded:

090003FC02020B022505907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

after taking out the octet: 31 representing the UDL.
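
For anyone who wants to reproduce the workaround, here is a minimal Python sketch of it (not the DLL code). It unpacks UDH + UD as one run of septets and throws away the first 12. The helper names are hypothetical, and mapping septets with chr() is a simplification that only holds because every character in this message has the same code in the GSM default alphabet and in ASCII.

```python
def unpack_septets(packed: bytes, count: int) -> list[int]:
    """Unpack `count` 7-bit values from GSM-packed octets (low-order bits first)."""
    bits = int.from_bytes(packed, "little")
    return [(bits >> (7 * i)) & 0x7F for i in range(count)]


def septets_to_text(septets: list[int]) -> str:
    # Simplification: a real decoder needs the full GSM default alphabet
    # table and its extension table instead of chr().
    return "".join(chr(s) for s in septets)


UDL = 0x31  # 49 septets, header included
payload = bytes.fromhex(
    "090003FC02020B022505"  # UDH
    "907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520"  # UD
)
HEADER_SEPTETS = 12  # 80 header bits + 4 fill bits = 84 bits

text = septets_to_text(unpack_septets(payload, UDL)[HEADER_SEPTETS:])
print(repr(text))  # 'in two parts. 2nd part has a melody: '
```
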
Judging from your data string, the first octet of the SMS-DELIVER is 44, which means TP-UDHI is set, i.e. the TP-UD starts with a User Data Header.
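
As a rough illustration of how 44 is read, the sketch below picks the relevant bits out of the first octet. The function name and the dictionary keys are made up for this example; only the bit positions (TP-MTI in bits 0-1, TP-MMS in bit 2, TP-UDHI in bit 6) come from the SMS-DELIVER layout.

```python
def parse_deliver_first_octet(octet: int) -> dict:
    """Pick a few fields out of the first octet of an SMS-DELIVER TPDU."""
    return {
        "tp_mti": octet & 0x03,            # message type, 0 = SMS-DELIVER
        "tp_mms_bit": (octet >> 2) & 0x01,  # raw More-Messages-to-Send bit
        "tp_udhi": bool(octet & 0x40),      # True: TP-UD starts with a header
    }


flags = parse_deliver_first_octet(0x44)
print(flags)
```
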

If I am not off

31 = TP-UDL = 49 septets (0x31; with the 7-bit alphabet the length counts septets, not bytes)
09 = Length of user data header

0003 FC0202 - IE A (concatenated message?)
0B02 2505 - IE B
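
A quick way to sanity-check these lengths is sketched below, reading 31 as hexadecimal (49) and treating it as a septet count, which is how TP-UDL is counted for the 7-bit alphabet. The helper name octets_for_septets is an assumption for illustration, and the hex string is the user data quoted elsewhere in this thread.

```python
def octets_for_septets(septets: int) -> int:
    """Octets needed to carry the given number of packed septets."""
    return (septets * 7 + 7) // 8


udl = 0x31        # TP-UDL: 49 septets
udhl = 0x09       # header length, not counting the UDHL octet itself
ud = bytes.fromhex(
    "907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520"
)

udh_octets = udhl + 1                      # 10 octets
total_octets = octets_for_septets(udl)     # octets in the whole TP-UD
text_octets = total_octets - udh_octets    # octets left for fill bits + text
print(total_octets, udh_octets, text_octets, len(ud))  # prints: 43 10 33 33
```
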


You said you removed 6 bytes to get your text, but in your example you removed only 3?

I will have to encode your text to be sure what the test should look like
bmatumburaAuthor Commented:
Thanks for the timely response xtravagan:

Correct, the User Data Header + SM is:


Thus the User Data Header is:


and the SM is:


Take note: I said I removed 6 characters (hex digits), which is 3 bytes.

Let me know if you do succeed in decoding the message correctly.

bmatumburaAuthor Commented:
The user data header is being correctly decoded as shown in the extract from the log in the code window below.

Thus the header:


has two information elements:

0003FC0202, interpreted as follows :-

00 - ConcatenatedShortMessage8BitRef
03 - Data Length
FC - Message Reference
02 - Total number of concatenated parts
02 - Second part of concatenated message
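
The three data octets of this element can be turned into named fields as in the sketch below. ConcatInfo and parse_concat_8bit are hypothetical names used only for this example.

```python
from typing import NamedTuple


class ConcatInfo(NamedTuple):
    reference: int    # message reference shared by all parts
    total_parts: int  # number of parts in the concatenated message
    part_number: int  # sequence number of this part, starting at 1


def parse_concat_8bit(ie_data: bytes) -> ConcatInfo:
    """Interpret the data of a concatenation IE with an 8-bit reference (IEI 00)."""
    if len(ie_data) != 3:
        raise ValueError("expected exactly 3 data octets")
    return ConcatInfo(*ie_data)


info = parse_concat_8bit(bytes.fromhex("FC0202"))
print(info)  # ConcatInfo(reference=252, total_parts=2, part_number=2)
```
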

0B022505, interpreted as follows:-

0B - PredefinedSound/Melody
02 - Data Length
25 - ???
05 - Predefined Melody number???
+ SM User Data Header +
Length of UDH: 9
Number of Information Elements: 2
Information Element 1
+ UDH Information Element +
  IE Identifier: ConcatenatedShortMessage8BitRef
  IE Data Length: 3
  IE Data: FC0202
Information Element 1
+ UDH Information Element +
  IE Identifier: PredefinedSound
  IE Data Length: 2
  IE Data: 2505
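
For comparison with the log above, here is a minimal sketch of walking the header as identifier/length/data triples. It is not the DLL's parser; parse_udh is a made-up name, and the function only splits the elements without interpreting their data.

```python
def parse_udh(udh: bytes) -> list[tuple[int, bytes]]:
    """Split a User Data Header (UDHL octet included) into (IEI, data) pairs."""
    if not udh or udh[0] != len(udh) - 1:
        raise ValueError("UDHL does not match the header length")
    elements = []
    pos = 1
    while pos < len(udh):
        if pos + 2 > len(udh):
            raise ValueError("truncated information element")
        iei, length = udh[pos], udh[pos + 1]
        data = udh[pos + 2 : pos + 2 + length]
        if len(data) != length:
            raise ValueError("truncated information element data")
        elements.append((iei, data))
        pos += 2 + length
    return elements


for iei, data in parse_udh(bytes.fromhex("090003FC02020B022505")):
    print(f"IEI {iei:02X}, length {len(data)}, data {data.hex().upper()}")
```
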


I don't think I follow now. I would have thought that the SM is the melody in some sort of format and that the first chunk was the above mentioned text?

Because I can't seem to find the text in?
With or without 907683

This to me in 7bit encoded GSM alphabet is not

"two parts. 2nd part has a melody: "


bmatumburaAuthor Commented:
You need to take into account some Fill Bits at the beginning of the SM:


The byte 90 has some fill bits in it, as the User Data Header does not end on a septet boundary. Please refer to page 71, Figure (a), of the "3GPP TS 23.040" specification for details.
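
An alternative to decoding header and text together is to skip the fill bits and decode the SM on its own. The sketch below assumes a 10-octet header (UDHL octet included) and a UDL of 0x31; decode_7bit_after_udh is a hypothetical name, and chr() stands in for a proper GSM default alphabet table, which is only safe here because this text uses characters whose codes match ASCII.

```python
def decode_7bit_after_udh(ud: bytes, udh_octets: int, udl: int) -> str:
    """Decode 7-bit text that follows a User Data Header of `udh_octets` octets."""
    fill = (7 - (udh_octets * 8) % 7) % 7          # 4 for a 10-octet header
    header_septets = (udh_octets * 8 + fill) // 7   # 12 for a 10-octet header
    count = udl - header_septets                    # septets that are real text
    bits = int.from_bytes(ud, "little") >> fill     # drop the fill bits
    # Simplification: chr() instead of the GSM default alphabet table.
    return "".join(chr((bits >> (7 * i)) & 0x7F) for i in range(count))


sm = bytes.fromhex(
    "907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520"
)
decoded = decode_7bit_after_udh(sm, udh_octets=10, udl=0x31)
print(repr(decoded))  # 'in two parts. 2nd part has a melody: '
```

The low nibble of 90 is the 4 fill bits; the high nibble already holds the first four bits of the letter "i".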

If you attempt to decode the entire Header + SM:


using the default 7Bit alphabet, you should get something like, taking into account any fill bits in the SM:

"£ç@Æ¿/¡¡é$J›@in two parts. 2nd part has a melody: "

@bmatumbura: please do not delete a question when there's an answer available. Instead, post the answer and select "Accept As Solution" for your own comment. I have found this an interesting discussion to follow (had similar problem) and would love to see the question archived with the proper solution.
Thanks for the extensive follow-up, that will help others well.