Decoding SMS PDUs

Dear experts,

I am developing a library that can be used to encode and decode SMS text messages. I have attached a debug log from the DLL.

The DLL works perfectly for plain-text SMSs (with no EMS content), including concatenated/multi-part SMSs. My challenge is when the SMS has an EMS attachment such as a SmallPicture, PredefinedSound, or Formatted Text. It fails to decode the User Data part (SM) when the default 7-bit alphabet has been used; see the attached log for details. However, if you remove some bytes from the beginning of the User Data, part of the message decodes correctly.

For example, in the log there is the PDU message (the very last message in the log as received from the GSM modem):

07916277010120F4440B916277640266F300009040210145908031090003FC02020B022505907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

All the other fields (MTI, addresses, time stamp, User Data Header) decode correctly, but I cannot decode the User Data (i.e. the Short Message):

907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520
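For readers following along, the field boundaries mentioned above can be sketched in a few lines of Python. This is only an illustration, not the DLL's code: the function name `split_deliver_pdu` and the dictionary keys are made up here, and the sketch assumes an SMS-DELIVER PDU with a leading SMSC block and does no validation.

```python
PDU = ("07916277010120F4440B916277640266F300009040210145908031"
       "090003FC02020B022505"
       "907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520")

def split_deliver_pdu(pdu_hex):
    """Split an SMS-DELIVER PDU (with SMSC prefix) into raw fields.

    Hypothetical helper for illustration; it assumes well-formed input.
    """
    raw = bytes.fromhex(pdu_hex)
    pos = 0
    smsc_len = raw[pos]                   # length of the SMSC block in octets
    pos += 1 + smsc_len
    first_octet = raw[pos]                # MTI, MMS, UDHI, ... flags
    pos += 1
    oa_digits = raw[pos]                  # originating address length in digits
    pos += 1
    pos += 1 + (oa_digits + 1) // 2       # type-of-address octet + semi-octets
    pid, dcs = raw[pos], raw[pos + 1]
    pos += 2
    pos += 7                              # service centre time stamp
    udl = raw[pos]                        # septets when the DCS is 7-bit
    pos += 1
    return {
        "mti": first_octet & 0x03,
        "udhi": bool(first_octet & 0x40),  # bit 6: User Data Header present
        "pid": pid,
        "dcs": dcs,
        "udl": udl,
        "ud": raw[pos:],                   # UDHL + UDH + packed text
    }

fields = split_deliver_pdu(PDU)
print(fields["udhi"], fields["udl"], fields["ud"].hex().upper())
```

With the PDU above, the UDHI flag is set, the UDL is 0x31 (49), and the remaining 43 octets start with the header length octet 09.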

When I try to decode this message using my DLL I am getting garbage like this:

"Hvù"?àù$å?'Éw$Fs&ù$å?'ìBÅßì
ùj.fü&NW¥"

whereas I should be getting the text:

"in two parts. 2nd part has a melody: "

If I remove the first 6 characters from the beginning of the user data, I am able to get part of the text like so:

E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

gives me this:  

"two parts. 2nd part has a melody: "

Any idea where I am going wrong?
SM-Debug.txt
 
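xtravagan commented:
Judging from your data string, the first octet of the SMS-DELIVER TPDU is 44, which means the TP-UDHI bit is set and the TP-UD starts with a User Data Header.

If I am not mistaken:

31 = TP-UDL = 49 septets (0x31 = 49; the DCS is 00, i.e. 7-bit, so the length counts septets rather than bytes)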
09 = Length of user data header

0003 FC0202 - IE A (concatenated message?)
0B02 2505 - IE B

UD
907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

You said you removed 6 bytes to get your text, but in your example you removed only 3?

I will have to encode your text myself to be sure what the result should look like.
 
bmatumbura (Author) commented:
Thanks for the timely response xtravagan:

Correct, the User Data Header + SM is:

31090003FC02020B022505907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

Thus the User Data Header is:

0003FC02020B022505

and the SM is:

907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

Please note: I said I removed 6 characters (hex digits), which is 3 bytes.

Let me know if you do succeed in decoding the message correctly.
 
bmatumbura (Author) commented:
The user data header is being correctly decoded as shown in the extract from the log in the code window below.

Thus the header:

0003FC02020B022505

has two information elements:

0003FC0202, interpreted as follows :-

00 - ConcatenatedShortMessage8BitRef
03 - Data Length
FC - Message Reference
02 - Total number of concatenated parts
02 - Second part of concatenated message
and

0B022505, interpreted as follows:-

0B - PredefinedSound/Melody
02 - Data Length
25 - ???
05 - Predefined Melody number???
+ SM User Data Header +
=========================
Length of UDH: 9
Number of Information Elements: 2
Information Element 1
+ UDH Information Element +
=============================
  IE Identifier: ConcatenatedShortMessage8BitRef
  IE Data Length: 3
  IE Data: FC0202
 
Information Element 1
+ UDH Information Element +
=============================
  IE Identifier: PredefinedSound
  IE Data Length: 2
  IE Data: 2505
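
The header walk shown in the log can be reproduced with a short Python sketch. The function name `parse_udh` is hypothetical and this is not the DLL's code. As far as I read 3GPP TS 23.040, the two data octets of a Predefined Sound element (IEI 0B) are the position in the text and the sound number; treat that interpretation as an assumption to check against the specification.

```python
def parse_udh(ud):
    """Return (udhl, [(iei, data), ...]) for user data that starts with a UDH.

    Hypothetical helper for illustration; no error handling for truncated input.
    """
    udhl = ud[0]                      # header length, not counting this octet
    header = ud[1:1 + udhl]
    elements = []
    i = 0
    while i + 2 <= len(header):
        iei = header[i]               # information element identifier
        length = header[i + 1]        # length of the element data
        data = header[i + 2:i + 2 + length]
        elements.append((iei, data))
        i += 2 + length
    return udhl, elements

udhl, elements = parse_udh(bytes.fromhex("090003FC02020B022505"))
for iei, data in elements:
    print(f"IEI {iei:02X}, length {len(data)}, data {data.hex().upper()}")

# Assumed layout of IEI 0B (Predefined Sound): position, then sound number.
position, sound_number = elements[1][1]
print(position, sound_number)   # 37 5
```

If that reading is right, the 25 is the position (37 decimal, which matches the 37 characters of the expected text) and the 05 is the melody number.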

 
xtravagan commented:
I don't think I follow now. I would have thought that the SM carries the melody in some format, and that the first chunk is the text mentioned above.

I can't seem to find the text in
E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520
with or without the leading 907683.

To me, decoded as 7-bit GSM alphabet, this is not

"two parts. 2nd part has a melody: "

?

 
bmatumbura (Author) commented:
You need to take into account some Fill Bits at the beginning of the SM:

907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

The byte 90 has some fill bits in it as the User Data Header does not end on a septet boundary. Please refer to page 71 - Figure 9.2.3.24 (a)  of the "3GPP TS 23.040" specification (http://www.3gpp.org/ftp/Specs/archive/23_series/23.040/23040-840.zip) for details.
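
The fill-bit arithmetic can be sketched as follows. The function name `header_layout` is made up for illustration, and the sketch assumes the 7-bit default alphabet with a User Data Header present.

```python
def header_layout(udhl):
    """Return (fill_bits, header_septets) for a 7-bit message with a UDH.

    Hypothetical helper: udhl is the value of the UDHL octet.
    """
    header_bits = (udhl + 1) * 8        # the UDHL octet itself plus the header
    fill = (7 - header_bits % 7) % 7    # pad up to the next septet boundary
    return fill, (header_bits + fill) // 7

fill, header_septets = header_layout(0x09)
print(fill, header_septets)             # 4 12
print(0x31 - header_septets)            # 37 septets left for the text
```

For this message the header occupies 80 bits, so 4 fill bits (the low nibble of the byte 90) bring it to 84 bits, i.e. 12 septets, and the text starts in the high nibble of that byte.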

If you attempt to decode the entire Header + SM:

31090003FC02020B022505907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

using the default 7-bit alphabet and taking any fill bits in the SM into account, you should get something like:

"£ç@Æ¿/¡¡é$J›@in two parts. 2nd part has a melody: "

 
abel commented:
@bmatumbura: please do not delete a question when there's an answer available. Instead, post the answer and select "Accept As Solution" for your own comment. I have found this an interesting discussion to follow (I had a similar problem) and would love to see the question archived with the proper solution.
 
bmatumbura (Author) commented:
Thanks abel, here is the solution:

The UDL + UDH + UD PDU (i.e. the entire TPDU payload SM) is:

31090003FC02020B022505907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

This can be broken into:

31 - UDL
090003FC02020B022505 - UDH (including the UDHL octet 09)
907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520 - UD

Now, the UDH (including its length octet) is 10 octets, i.e. 10 * 8 = 80 bits in total, which is not a multiple of 7. The header therefore does not end on a septet boundary, as a GSM 7-bit decoder requires, so fill bits are inserted at the start of the first octet of the UD, and the UD cannot be decoded on its own. To work around this, I decided to decode the entire UDH + UD as 7-bit data and discard the first 12 characters of the result, since they represent the UDH (80 bits + 4 fill bits to make the UDH end on a septet boundary = 84 bits = 12 septets, i.e. 84 / 7 GSM 7-bit characters).

So I decoded:

090003FC02020B022505907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520

after taking out the octet: 31 representing the UDL.
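
A minimal Python sketch of this workaround is shown below. It is not the DLL's code: the names `GSM7`, `unpack_septets` and `decode_with_header` are invented for illustration, the table is the basic GSM 7-bit default alphabet as I understand it from 3GPP TS 23.038, and the escape character (0x1B) to the extension table is not handled.

```python
# Basic GSM 7-bit default alphabet (no extension table), index = septet value.
GSM7 = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

def unpack_septets(data, count):
    """Unpack `count` 7-bit values from octets packed least significant bit first."""
    bits = int.from_bytes(data, "little")
    return [(bits >> (7 * k)) & 0x7F for k in range(count)]

def decode_with_header(ud_hex, udl):
    """Decode UDHL + UDH + text as one septet stream, then drop the header septets.

    ud_hex is everything after the UDL octet; udl is the UDL value in septets.
    """
    ud = bytes.fromhex(ud_hex)
    septets = unpack_septets(ud, udl)
    header_bits = (ud[0] + 1) * 8          # UDHL octet plus header octets
    skip = (header_bits + 6) // 7          # round up: this absorbs the fill bits
    return "".join(GSM7[s] for s in septets[skip:])

text = decode_with_header(
    "090003FC02020B022505"
    "907683E8F737081E96D3E72E90CC4D06C1C3723A081D9E83C2A07699FD26E77520",
    0x31,
)
print(repr(text))   # 'in two parts. 2nd part has a melody: '
```

Rounding the header length up to whole septets is what makes the fill bits disappear: the 12 discarded septets cover the 80 header bits plus the 4 fill bits.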
 
abel commented:
Thanks for the extensive follow-up, that will help others well.
Question has a verified solution.
