

The analogue voice is filtered, digitized into a binary stream and coded for transmission.
It then travels across the mobile network(s) in digital form until it reaches the destination
mobile device, which converts it from digital back to analogue for output to the device's
loudspeaker. Converting the analogue signal to digital and then back to analogue does
introduce a certain amount of noise, but this is minimal compared with the noise picked up
when the signal is left in its original analogue state.

Before real-time analogue data can be transmitted on a digital packet switched network
it must undergo a conversion process. The original analogue signal must be sampled
(or measured), converted to a digital form (quantized), coded, optionally compressed
and encrypted.

2.3.1 Sampling
Sampling is the process whereby the analogue signal is measured at regular intervals and
its value recorded at each discrete time interval. It is very important that the signal is
sampled at a rate higher than twice the highest frequency component of the original
analogue signal, otherwise a form of interference called aliasing may be introduced. Consider
the problem highlighted in Figure 2.5. Here a 1 kHz signal is being sampled at 4000 samples
per second (4 kHz). However, there is a 5 kHz component also present, and the two produce
the same result after sampling. For this reason the signal is filtered before sampling to
remove any high-frequency components. For the PSTN, the signal is filtered such that the
highest frequency is 3.4 kHz and sampling takes place at 8 kHz.

Figure 2.5 Aliasing (horizontal axis: time in ms, from 0 to 1)

Once the signal has been sampled it can generally be compressed by encoding to reduce
the overall amount of data to be sent. This encoded data is then bundled in packets or
cells for transmission over the network. The exact amount of data that is carried in each
packet is important. Packing a lot of data per packet causes a delay while the packet is
being filled. This is referred to as packetization delay, and is described in Section 2.3.6.
On the other hand, if the packets are not filled sufficiently this can lead to inefficiency,
as most of the packet can be taken up by protocol headers.
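The aliasing example above can be checked numerically. The following is a minimal sketch, assuming unit-amplitude sine waves with zero phase; the function name sample_sine is illustrative and not taken from the text.

```python
import math

SAMPLE_RATE_HZ = 4000  # sampling rate used in the example in the text


def sample_sine(freq_hz, num_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return num_samples samples of a unit-amplitude, zero-phase sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]


samples_1k = sample_sine(1000, 8)   # the wanted 1 kHz signal
samples_5k = sample_sine(5000, 8)   # the unwanted 5 kHz component

# Sampled at 4 kHz, the two signals agree to within floating-point rounding,
# so the receiver cannot tell them apart.
max_difference = max(abs(a - b) for a, b in zip(samples_1k, samples_5k))
print(f"largest difference between the two sample sets: {max_difference:.2e}")
```

Raising the sampling rate above 10 kHz (twice the 5 kHz component), or filtering out the 5 kHz component before sampling, removes the ambiguity.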

2.3.2 Coding and CODECs
When converting information from an audio or video stream into digital data, large
amounts of information can be generated. Consider, for example, capturing a single frame
on a 24-bit true colour graphics screen with a resolution of 1024 × 768 pixels. Without
compression this will generate 1024 × 768 × 3 (3 bytes = 24 bits of colour) = 2 359 296 bytes
or 2.25 megabytes of data. Sending 24 frames per second when capturing a video image will
produce 54 megabytes of data every second, yielding a required data rate of 432 Mbps,
which is unsustainable on the wireless network.
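The arithmetic above can be reproduced as a short sketch. The constant names are illustrative. Note that the 2.25 megabyte and 432 Mbps figures treat a megabyte as 2^20 bytes; counted in decimal megabits the same stream is roughly 453 Mbps.

```python
WIDTH_PIXELS = 1024
HEIGHT_PIXELS = 768
BYTES_PER_PIXEL = 3        # 24-bit true colour
FRAMES_PER_SECOND = 24

bytes_per_frame = WIDTH_PIXELS * HEIGHT_PIXELS * BYTES_PER_PIXEL
megabytes_per_frame = bytes_per_frame / 2**20          # binary megabytes
megabytes_per_second = megabytes_per_frame * FRAMES_PER_SECOND
bits_per_second = bytes_per_frame * FRAMES_PER_SECOND * 8

print(f"bytes per frame:        {bytes_per_frame}")
print(f"megabytes per frame:    {megabytes_per_frame}")
print(f"megabytes per second:   {megabytes_per_second}")
print(f"Mbps (2^20 bits):       {bits_per_second / 2**20}")
print(f"Mbps (10^6 bits):       {bits_per_second / 1e6:.1f}")
```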
To reduce the amount of data in the transmission the information is compressed before
sending. Many techniques have been employed for both video and audio data but all
compression algorithms use one of two basic types of method:

• Lossless compression removes redundancy from the information source and on
decompression reproduces the original data exactly. This technique is used by graphics
compression standards such as GIF and PNG. One technique used for PNG
compression is the colour lookup table. Without compression the colour image on a screen
requires each colour to be represented by 3 bytes (24 bits), even though there may be
256 or fewer different colours within a particular image. To compress the image each
3-byte code is replaced with a single byte and the actual 3-byte colour data stored in
a separate table. This will produce a three-fold saving, less the small space to store
the colour table of 768 bytes, and will involve little extra processing of the original
image data.
• Lossy compression, on the other hand, relies on the fact that there is a lot of information
within the image that the eye will not notice if removed. For example, the human
eye is less sensitive to changes in colour than changes in intensity when looking at
information in a picture. Consequently when images are compressed using the JPEG
standard, the colour resolution can be reduced by half when scanning the original image.
Lossy compression tends to produce higher compression rates than lossless compression
but only really works well on real-world images, for example photographs. Lossless
compression techniques such as PNG are more suitable for simple graphics images
such as cartoons, figures or line drawings.
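The colour lookup table described in the first bullet can be sketched as follows. This is an illustration of the idea only, not the actual PNG or GIF encoding; the function names and the sample pixel data are hypothetical.

```python
COLOUR_TABLE_BYTES = 256 * 3  # full 256-entry table of 3-byte colours


def build_indexed_image(pixels):
    """Replace each (r, g, b) tuple with a one-byte index into a colour table."""
    palette = []
    index_of = {}
    indices = bytearray()
    for colour in pixels:
        if colour not in index_of:
            if len(palette) == 256:
                raise ValueError("image has more than 256 distinct colours")
            index_of[colour] = len(palette)
            palette.append(colour)
        indices.append(index_of[colour])
    return palette, bytes(indices)


def restore_pixels(palette, indices):
    """Look each index up in the table to recover the original pixels exactly."""
    return [palette[i] for i in indices]


# Hypothetical image data: 4000 pixels using only two colours.
pixels = [(255, 0, 0), (0, 0, 255), (255, 0, 0), (255, 0, 0)] * 1000
palette, indices = build_indexed_image(pixels)

raw_size = 3 * len(pixels)
indexed_size = len(indices) + COLOUR_TABLE_BYTES
print(f"raw: {raw_size} bytes, indexed: {indexed_size} bytes")
```

Because restore_pixels returns exactly the original list, the scheme is lossless; the saving approaches three-fold as the image grows relative to the 768-byte table.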

A CODEC is a term which refers to a coder/decoder and defines a given
compression/decompression algorithm or technique. For audio compression the technique used
for voice data is generally different to that used for music or other audio data. The reason
for this is that voice CODECs exploit certain special human voice characteristics to reduce
the bandwidth still further. These voice CODECs work well with a voice signal but will
not reproduce music well since the CODEC will throw away parts of the original signal
not expected to be there. Table 2.1 shows a summary of popular audio CODECs that are
currently in use. Some of these are already used in wireless cellular networks such as
GSM; others are recommended for use with UMTS and IP. Note that in the table, all the
CODECs are optimized for voice apart from MP3, which is used predominantly on the
Internet for music coding. The specific CODEC for voice used in UMTS is the adaptive
multirate (AMR) CODEC, which is described in more detail in Chapter 6.
When choosing a voice CODEC, a number of characteristics have to be taken into
consideration. Ideally the CODEC should use the least bandwidth possible, but this generally
comes at the expense of quality. The mean opinion score (MOS) defines the perceived
quality of the reproduced sound: 5 means excellent, 4 good, 3 fair, 2 poor and 1 bad.

Table 2.1 Audio CODECs
Standard    Bit rate (kbps)   Delay (ms)   MOS        Sample size (bits)
G.711       64                0.125        4.3        8
GSM-FR      13                20           3.7        260
G.723.1     6.3               37.5         3.8        236
G.723.1     5.3               37.5         3.8        200
UMTS AMR    12.2–4.75         Variable     Variable   Variable
MP3         Variable          Variable     Variable   Variable

The MOS for a given CODEC is relatively subjective, as it is calculated by asking a number
of volunteers to listen to speech and score each sample appropriately. G.711, which is the
standard pulse code modulation (PCM) coding technique used for PSTN digital circuits,
scores well. However, it uses a lot of bandwidth and is therefore not suitable for a wireless
link. Generally as the data rate reduces, so does the MOS; however, surprisingly, G.723.1
scores better than standard GSM coding. The reason for this is that G.723.1 uses more
complex techniques to squeeze additional important voice data into the limited bandwidth.
MP3 (MPEG layer 3 audio) is of interest in that it can provide a variable compression
service based on either a target data rate or target quality. With target data rate the
CODEC will try to compress the data down until the data rate is achieved. For music
with a high dynamic range (for example classical music) higher rates may be required
to achieve acceptable levels of reproduction quality. One report states that a data rate of
128 kbps with MP3 will reproduce sound which is very difficult to distinguish from the
original. But, again, all reports are subjective and will very much depend on the source
of the original signal.
It is also possible to set the MP3 CODEC to target a given quality. In this mode the
data rate goes up as the complexity of the signal goes up. The problem with this
mode of transmission is that it is difficult to budget for the correct amount of bandwidth
on the transmission path.
When packing the voice data into packets it is important to be able to deliver the data
to the voice decompressor fast enough so that delay is kept to a minimum. For example,
if a transmitter packs one second's worth of speech into each packet this will introduce
a packing/unpacking delay between the sender and receiver because one second's worth
of data will have to be captured before each packet can be sent. For the higher-rate
CODECs such as G.711, packing 10 milliseconds of data per packet would produce a
data length of 80 bytes and a packing latency of only 0.01 seconds. For the CODECs
which support lower data rates the minimum sample size is longer. With G.723.1, for
example, the minimum sample size is 37.5 ms, which will introduce a longer fixed delay
into the link. Also, if each packet contains one voice sample this will result in a packet
length of only 30 bytes, which can result in inefficiencies due to the header overhead.
For example, the header overhead for an IP packet is 20 bytes plus the higher-layer protocols
(TCP, UDP, RTP, etc.), and for this reason header compression is generally used.
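The payload sizes and header overhead discussed above can be worked through in a short sketch. The 20-byte IP header is from the text; the choice of UDP and RTP as the higher-layer protocols, and their 8-byte and 12-byte header sizes (no options or extensions), are assumptions made for this illustration. The function names are hypothetical.

```python
IP_HEADER_BYTES = 20    # from the text (IPv4 without options)
UDP_HEADER_BYTES = 8    # assumption: voice carried over UDP
RTP_HEADER_BYTES = 12   # assumption: fixed RTP header, no extensions
HEADER_BYTES = IP_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def payload_bytes(bit_rate_kbps, duration_ms):
    """Voice payload size: kbit/s multiplied by ms gives bits; divide by 8 for bytes."""
    return bit_rate_kbps * duration_ms / 8


def header_fraction(payload, header=HEADER_BYTES):
    """Fraction of the whole packet that is taken up by headers."""
    return header / (header + payload)


g711_payload = payload_bytes(64, 10)       # 10 ms of G.711
g7231_payload = payload_bytes(6.3, 37.5)   # one G.723.1 sample at 6.3 kbps

for name, payload in (("G.711", g711_payload), ("G.723.1", g7231_payload)):
    print(f"{name}: payload {payload:.1f} bytes, "
          f"headers {header_fraction(payload):.0%} of the packet")
```

Under these assumptions the headers are larger than the G.723.1 payload itself, which is why header compression matters most for the low-rate CODECs.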
When looking at the video CODECs in Table 2.2 it can be seen that most of them
support a range of bit rates, which allows the encoder to pick a rate that suits the channel
that it has available for the transmission. Standards such as MPEG-1 and MPEG-2 were
designed for the storage, retrieval and distribution of video content.

Table 2.2 Video CODECs
Name      Bandwidth         Resolution
H.261     N × 64 kbps       352 × 288 (CIF)
                            176 × 144 (QCIF)
H.263     10 kbps–2 Mbps
H26L      Variable          Variable
MPEG-1    1.5 Mbps          352 × 240
MPEG-2    3–100 Mbps        352 × 240 to 1920 × 1080
MPEG-4    Variable          Variable

MPEG-1 is used in the video CD standard at a fixed resolution of 352 × 240. MPEG-2, on
the other hand, provides a wide range of resolutions from standard TV to high definition
TV (HDTV) and for this reason is the predominant coding standard for DVD and digital TV
transmission.
The other CODECs are more suited to, and optimized for, distribution over a network where
bandwidth is at a premium, such as the cellular network. H.261 and H.263 were both
designed to support video telephony and are specified as CODECs within the H.323
multimedia conferencing standard. Of particular interest to UMTS service providers
will be MPEG-4 and H26L. H26L is a low bit rate CODEC especially designed for
wireless transmission. It has a variable bit rate and variable resolution. MPEG-4 was also
designed to cope with narrow bandwidths and has a particularly complex set of tools
to help code and improve the transmission of audio channels. This high-quality audio
capability is of interest to content providers looking to deliver music and movies on
demand over the radio network. Within MPEG-4 there are a number of different coding
profiles defined, and the appropriate profile is chosen depending on the data rate available
and the reliability of the channel. H26L has now been specified as one of the coding
profiles within MPEG-4. It should be noted that while support for transport of video is a
requirement of a 3G network, it is considered an application and, unlike voice, the coding
scheme used is not included within the specification.

2.3.3 Pulse code modulation
Historically, the most popular method for performing this digitizing function on the
analogue signal is known as pulse code modulation (PCM). The technique samples the
analogue signal at regular intervals where the rate of sampling is twice the highest
frequency present in the signal. This sampling rate is defined as the rate required to
completely represent the analogue signal.
For the telephone network the assumption is made that the signals are below 4 kHz
(actually 300 Hz to 3.4 kHz). Therefore the sample rate needed is 8000 samples per
second. Each sample must be converted to digital. To do this, each analogue level is
assigned a binary code. If 256 levels are required, then eight bits are used to split the
amplitude up; an amplitude of zero is represented by binary 0000 0000, and a maximum
amplitude by binary 1111 1111. Figure 2.6 shows a simple example of PCM, with 16
levels, i.e. 4 bits. For an 8-bit representation, with 8000 samples per second, a line of
64 kbps is required for the digital transmission of the voice signal. This is the standard
coding scheme used for the fixed-line PSTN and ISDN telephone networks.
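The mapping from amplitude to binary code, and the resulting 64 kbps line rate, can be sketched as below. This is a simplified illustration that assumes the amplitude has been normalized to the range 0.0 to 1.0 and that the levels are evenly spaced; G.711 itself uses logarithmic (A-law or µ-law) companding rather than uniform steps. The function names are hypothetical.

```python
def quantize(amplitude, bits):
    """Map a normalized amplitude (0.0 to 1.0) to the nearest of 2**bits levels."""
    levels = 2 ** bits
    clipped = min(max(amplitude, 0.0), 1.0)
    return int(clipped * (levels - 1) + 0.5)  # round to the nearest level


def code_word(code, bits):
    """Format a level number as a fixed-width binary string."""
    return format(code, f"0{bits}b")


def pcm_bit_rate_bps(sample_rate_hz, bits):
    """Line rate needed to carry the samples, in bits per second."""
    return sample_rate_hz * bits


print(code_word(quantize(0.0, 8), 8))   # zero amplitude
print(code_word(quantize(1.0, 8), 8))   # maximum amplitude
print(code_word(quantize(0.5, 4), 4))   # a 16-level (4-bit) example
print(pcm_bit_rate_bps(8000, 8))        # 8000 samples per second, 8 bits each
```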

2.3.4 Compression
Compression involves the removal of redundancy from a data stream. This can be achieved
through a number of techniques:

• Run length encoding replaces multiple occurrences of a symbol with one occurrence
and a repetition count.
• Dictionary coding replaces multiple symbols with single tokens which can be looked up in
a dictionary.
• Huffman coding sends shorter codes for symbols which occur more often and longer codes
for less frequent symbols.

Figure 2.6 Pulse code modulation (an analogue waveform with the 4-bit code for each sample)

Since the voice coding process removes most of the redundancy from the voice data
itself, compression is largely used on the packet headers. Many schemes of header
compression have been proposed and they are widely used on voice packets since these tend
to be short and thus the header overhead (the percentage of data taken by the header)
tends to be significant. Refer to Chapter 5 for more details.
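Of the techniques listed in this section, run length encoding is the simplest to illustrate. The following is a minimal sketch operating on a string of symbols; the function names and the (symbol, count) representation are choices made for this example rather than any standard format.

```python
def rle_encode(data):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    runs = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1] = (symbol, runs[-1][1] + 1)
        else:
            runs.append((symbol, 1))
    return runs


def rle_decode(runs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in runs)


original = "AAAABBBCCD"
encoded = rle_encode(original)
print(encoded)
print(rle_decode(encoded) == original)
```

The scheme only saves space when the data contains long runs; data with no repeated symbols grows, since every symbol then carries a count of one.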

2.3.5 Comfort noise generation and activity detection
To avoid transmitting unnecessarily, most systems use activity detection so that when
a speaker is not talking, active background noise is not transmitted down the channel.
In this case the CODEC usually encodes a special frame which informs the receiver to
generate low-level noise (comfort noise), which reassures the listener they have not been
cut off.

2.3.6 Packetization delay
Before the data is transmitted on the packet switched network it must be placed in
packets or cells. Some further explanation and examples of this packetization delay
are presented in Chapter 7 in the context of the UMTS ATM transmission network. As
discussed, the longer a packet is, the longer the delay suffered when forwarding the
packet between switches or routers. In essence, the forwarding delay of a packet is just
L/B, where L is the length of the packet in bits and B the data rate of the link in bits
per second. If the packet length is doubled, the forwarding delay is doubled. For very
fast local area network (LAN) links this delay does not present a major problem. For
example, with a gigabit Ethernet link, a 1500 byte packet will take 12 µs to forward:

Delay = (1500 × 8 bits) / 1 Gbps = 12 µs

This delay does not include any component resulting from buffering or processing
times, and therefore with a heavily loaded router or switch, actual packet delays may be
somewhat larger.
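The L/B relationship can be expressed as a one-line function. This sketch covers only the forwarding delay itself, with no buffering or processing time; the function name is illustrative, and the 64 kbps link is included purely as a contrast with the gigabit case.

```python
def forwarding_delay_seconds(packet_bytes, link_bps):
    """Time to clock a packet onto a link: L / B, with L converted to bits."""
    return packet_bytes * 8 / link_bps


gigabit_delay = forwarding_delay_seconds(1500, 1_000_000_000)
slow_link_delay = forwarding_delay_seconds(1500, 64_000)

print(f"1500 bytes on 1 Gbps:  {gigabit_delay * 1e6:.1f} microseconds")
print(f"1500 bytes on 64 kbps: {slow_link_delay * 1e3:.1f} milliseconds")
```

The same packet that is negligible on a LAN becomes a significant delay on a slow link, which is why packet length matters for real-time traffic.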

