Back in school, we learned the alphabet before learning words and everything else. The same rule applies in the GSM
world. GSM has its own alphabet: some characters have the same codes as in ASCII, others do not. You can get the specification (3GPP TS 03.38) for free here. So let's start learning!
The Primary Character Table
Open the 03.38 specification and go to section 6.2.1. As the title says, the characters are defined using only 7 bits. To store one in a byte, set the MSB (most significant bit) to zero and put the 7-bit value of the character in the remaining bits.
Let's take a look at the character table:
To get the encoding of a character:
- Find the corresponding character in the table
- Read the b7, b6, and b5 values at the top of the cell's column
- Read the b4, b3, b2, and b1 values at the left of the cell's row
- Concatenate the two groups (b7 to b5 first, then b4 to b1); the result is the 7-bit value
- You can also read the nibble values, which are located under b7, b6, b5 and beside b4, b3, b2, b1, and concatenate them to get the byte value
Example:
- Character "2" has b7=0, b6=1, and b5=1 and b4=0, b3=0, b2=1, b1=0. Hence, the 7-bit value is 0110010b ('32').
- Character "a" has b7=1, b6=1, and b5=0 and b4=0, b3=0, b2=0, b1=1. Hence, the 7-bit value is 1100001b ('61').
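The lookup steps above can be sketched in a few lines of Python. This is only an illustration: the function name `septet_from_bits` is made up for this example and does not come from the specification.

```python
def septet_from_bits(b7, b6, b5, b4, b3, b2, b1):
    """Combine the column bits (b7..b5) and the row bits (b4..b1)."""
    high = (b7 << 2) | (b6 << 1) | b5              # column of the table, 0..7
    low = (b4 << 3) | (b3 << 2) | (b2 << 1) | b1   # row of the table, 0..15
    return (high << 4) | low                       # bit 8 (the MSB) stays 0


two = septet_from_bits(0, 1, 1, 0, 0, 1, 0)   # character "2"
a = septet_from_bits(1, 1, 0, 0, 0, 0, 1)     # character "a"
print(f"{two:07b} {two:02X}")  # 0110010 32
print(f"{a:07b} {a:02X}")      # 1100001 61
```

Because the column bits only reach up to bit 7, the returned value already fits in a byte with the MSB set to zero.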
Some special encodings in the table are:
- LF ('0A') is the Line Feed character
- CR ('0D') is the Carriage Return character
- 1) ('1B') is the escape to the extension table, which is discussed in the following section.
- SP ('20') is the space character
The Extension Table
If you are going to use a character that is defined in the following table, you need to put the escape code '1B' (0011011b) in front of it.
Example:
- The left curly bracket "{" value is '28' (0101000b). To use this character in 8-bit form, we write '1B 28'.
- The vertical bar "|" value is '40' (1000000b). To use this character in 8-bit form, we write '1B 40'.
These two examples explain why the number of characters left in an SMS decreases by two whenever we use "^", "{", "}", "\", "[", "~", "]", "|", or "€": each of them takes two 7-bit values, the escape plus the character itself. (Try to send an SMS
using these characters and notice the reduction!)
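As a rough sketch of the counting rule, the Python snippet below charges two septets for every extension-table character. The names `GSM_EXTENSION`, `encode_extension_char`, and `septet_count` are invented for this example. The values for "{" and "|" are the ones quoted above, while the remaining values are taken from the 03.38 extension table and are worth checking against your copy of the specification. The counter assumes every other character in the text belongs to the primary table.

```python
ESCAPE = 0x1B  # escape to the extension table

# Extension-table values; "\u20ac" is the euro sign.
GSM_EXTENSION = {
    "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F, "[": 0x3C,
    "~": 0x3D, "]": 0x3E, "|": 0x40, "\u20ac": 0x65,
}


def encode_extension_char(ch):
    """Return the two values (escape + code) for an extension character."""
    return [ESCAPE, GSM_EXTENSION[ch]]


def septet_count(text):
    """Count 7-bit values, assuming all other characters are in the primary table."""
    return sum(2 if ch in GSM_EXTENSION else 1 for ch in text)


print(" ".join(f"{v:02X}" for v in encode_extension_char("{")))  # 1B 28
print(" ".join(f"{v:02X}" for v in encode_extension_char("|")))  # 1B 40
print(septet_count("a{b}"))  # 6
```

The four visible characters in "a{b}" cost six 7-bit values, which is the reduction you see on the phone's character counter.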
Some special encodings in the table are:
- 3) ('0A') is the Page Break character
- 1) ('1B') is the escape to yet another extension table, which is currently not used.
If the mobile station is not capable of showing a symbol from the extension table, it may show the character that has the same value in the primary table instead. In that case, Page Break becomes Line Feed, curly brackets become parentheses, backslash becomes slash, euro becomes a small "e", and so on.
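The fallback can be pictured as a second lookup with the same code value. The sketch below covers only the pairs mentioned above. The names `EXTENSION`, `PRIMARY`, and `show_escaped` are hypothetical, and a real handset may behave differently.

```python
# Characters that share a code value in the two tables.
EXTENSION = {0x0A: "\f", 0x28: "{", 0x29: "}", 0x2F: "\\", 0x65: "\u20ac"}
PRIMARY = {0x0A: "\n", 0x28: "(", 0x29: ")", 0x2F: "/", 0x65: "e"}


def show_escaped(code, supports_extension):
    """Return what is displayed for the value that follows the '1B' escape."""
    table = EXTENSION if supports_extension else PRIMARY
    return table[code]


print(show_escaped(0x28, True))   # {
print(show_escaped(0x28, False))  # (
print(show_escaped(0x65, False))  # e
```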