Most of us have experienced that digital audio sounds better than analog audio (e.g. audio cassettes). But why is this? It is because digital audio reproduces the original sound more closely than analog audio, with fewer errors. We usually call this error noise. This raises the next question: why is the noise lower in digital audio, giving the better quality?
In analog audio, some of the original sound is already lost during recording because of the limitations of the equipment. But the main problem area is the set of devices used in playback. During playback, between the moment the signal is read from the magnetic tape (in the case of an audio cassette) and the moment it reaches the loudspeaker, it has to pass through several electronic circuits such as amplifiers and equalizers. These analog devices are nowhere near perfect. For example, an amplifier does not just amplify the signal; it also adds some noise, which arises from the basic limitations of electronic circuits. The amplifier's response may also not be completely uniform, meaning the amplification is not the same for all frequency bands. These errors distort the output sound and lower its quality. However good the amplifiers and electronic components we use, this cannot be avoided completely, so analog circuits have a fundamental limitation.
Now in digital audio the signal is sampled, quantized and represented as numeric values. (For an explanation of this, read this.) This representation introduces a very small error called quantization noise. The error becomes smaller as we use more bits to represent the numerical values of the signal. (I will add example waveforms at different bit precisions here soon.) Once the signal is in this form, the later processing blocks introduce almost no additional error, and even that can be kept to a minimum if we use proper processing equipment. One noticeable difference between analog and digital audio, if you have observed it, is that during silent periods of the original audio, analog audio gives out a hissing sound while digital audio is completely silent.
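To see how the number of bits affects quantization noise, here is a minimal sketch in Python. It is not taken from any audio codec: the test tone (440 Hz at 0.9 of full scale), the uniform rounding quantizer and the function names are all assumptions chosen for illustration. It measures the signal-to-quantization-noise ratio in decibels, which should grow by roughly 6 dB for every extra bit.

```python
import math

def quantize(x, bits):
    """Round x (expected in [-1, 1]) to the nearest of 2**bits uniform levels."""
    step = 2.0 / (2 ** bits)
    q = round(x / step) * step
    # Clamp to the representable range of a signed fixed-point value.
    return max(-1.0, min(1.0 - step, q))

def sqnr_db(bits, n=10000, freq=440.0, rate=44100.0):
    """Signal-to-quantization-noise ratio (dB) for an assumed sine test tone."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = 0.9 * math.sin(2 * math.pi * freq * i / rate)
        error = x - quantize(x, bits)
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(bits, "bits:", round(sqnr_db(bits), 1), "dB")
```

With 16 bits, as used on a CD, the noise ends up far below the signal, which is why the error is described above as very small.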
Before we move on to what MPEG audio is, let us look briefly at CD audio, which provides the motivation for audio compression. CD audio uses 16 bits of precision for each sample of a signal sampled at 44.1 kHz. Now let us calculate the amount of storage required for a one-minute stereo audio clip.
60 secs * 44100 samples/sec * 2 bytes / sample * 2 (stereo)
= 10584000 bytes
= 10 MB approximately.
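The same arithmetic can be written as a small Python sketch. The function name and its default parameters are just illustrative choices that mirror the CD figures used here (44100 samples per second, 2 bytes per sample, 2 channels).

```python
def pcm_bytes(seconds, sample_rate=44100, bytes_per_sample=2, channels=2):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    return seconds * sample_rate * bytes_per_sample * channels

if __name__ == "__main__":
    one_minute = pcm_bytes(60)
    print(one_minute)              # 10584000
    print(one_minute / 1_000_000)  # 10.584 (decimal megabytes)
```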
So we need about 10 megabytes just to store a one-minute stereo audio clip. A complete song therefore takes tens of megabytes, and a few of them will leave little space on your hard disk. Even if you acquire hard disks with lots of space, problems arise when you want to send the song over the net to your friend: on a slow connection it can take hours to send a five-minute clip.
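To put a rough number on the transfer time, here is a small sketch. The link speed of 56000 bits per second (a dial-up modem) is an assumption made for this example only, and real connections rarely reach their nominal speed, so treat the result as a lower bound.

```python
def transfer_hours(audio_seconds, link_bits_per_second=56000):
    """Hours needed to send uncompressed CD-quality stereo audio.

    link_bits_per_second is an assumed value (56k modem), not a measured one.
    """
    size_bytes = audio_seconds * 44100 * 2 * 2  # rate * bytes/sample * channels
    return size_bytes * 8 / link_bits_per_second / 3600

if __name__ == "__main__":
    print(transfer_hours(5 * 60))  # roughly 2.1 hours for a five-minute clip
```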
The solution for this is audio compression. When we hear the word compression, most of us immediately think of WinZip. But WinZip can't help much with audio. At most we will get 50% compression using WinZip or any other similar compression utility. You can readily try this out by running WinZip on a WAV file on your PC.
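If you do not have a WAV file at hand, the experiment can be sketched in Python with the standard zlib module, which uses the same family of general-purpose lossless compression as zip utilities. The "audio" here is synthetic (a 440 Hz tone plus a little random noise, both chosen arbitrarily for this example), so the exact ratio will differ from a real recording.

```python
import math
import random
import struct
import zlib

def synth_pcm(seconds=1.0, rate=44100, seed=0):
    """Build 16-bit mono PCM bytes: an assumed 440 Hz tone plus mild noise."""
    rng = random.Random(seed)
    samples = []
    for i in range(int(seconds * rate)):
        tone = 12000 * math.sin(2 * math.pi * 440 * i / rate)
        noise = rng.gauss(0, 500)
        value = int(max(-32768, min(32767, tone + noise)))
        samples.append(value)
    return struct.pack("<%dh" % len(samples), *samples)

pcm = synth_pcm()
packed = zlib.compress(pcm, 9)   # lossless, general-purpose compression
ratio = len(packed) / len(pcm)   # fraction of the original size that remains

if __name__ == "__main__":
    print("original bytes:", len(pcm))
    print("compressed bytes:", len(packed))
    print("remaining fraction:", round(ratio, 2))
```

A general-purpose compressor looks for repeated byte patterns, and sampled sound rarely repeats exactly, which is why the saving stays small.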
MPEG stands for Moving Picture Experts Group. This group was formed to come up with standardized algorithms for compressing audio and video for storage and for transmission over the net. It mainly comprises research groups from leading organizations in the area of audio and video, e.g. Dolby, the Fraunhofer Institute (not a company), Sony, IBM, etc. The MPEG group has finalized the MPEG-1 and MPEG-2 standards. It is currently working on MPEG-4 and MPEG-7.
MPEG-1 is a simple form of compression (compared to recent developments); MPEG-2 is more complex and addresses more issues. MPEG-1 defines three layers of audio coding algorithms, of increasing complexity from Layer 1 to Layer 3. The higher layers are downward compatible, meaning a Layer-3 decoder should also decode an audio file compressed with the Layer-1 or Layer-2 algorithms. The compression achieved is highest in Layer 3 (around 8 times) and lowest in Layer 1. The MPEG Layer-3 audio compression scheme is the one we normally call mp3.
How is compression achieved in mp3?
MPEG Layer-3 audio, or mp3, attacks the problem of audio compression from the receiving end. Here the receiving end means the human ear. The human ear has some limitations: it cannot hear all the sounds reproduced from CD audio. Firstly, as we all studied in school, humans can hear only from about 20 Hz to 20 kHz, so components outside this range can be removed. Further, the sensitivity of the ear becomes very low at high frequencies, that is, above about 10 kHz. We will not hear a sound if it is low in amplitude and high in frequency. One more property is that if two sounds are presented to the ear simultaneously, one of very high amplitude and the other of very low amplitude, we won't be able to hear the low-amplitude one. This is called masking. (Masking is more complex; this is only a simple explanation.) So if we remove all these unwanted (to be precise, unheard) signals from the original audio, we can save a lot of space. This is the main technique behind the mp3 audio compression scheme. In general this approach is called perceptual audio coding.
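The idea of dropping unheard components can be sketched as a toy Python function. This is only an illustration of the two rules described above, not the psychoacoustic model used by mp3: the function name, the 30 dB margin and the half-octave neighbourhood are made-up values, and real masking depends on frequency, level and time in a far more detailed way.

```python
import math

def audible_components(components, mask_margin_db=30.0, mask_octaves=0.5):
    """Toy filter over (frequency_hz, level_db) pairs.

    Rule 1: drop anything outside the 20 Hz to 20 kHz hearing range.
    Rule 2: drop a component if a much louder one lies close to it in
            frequency (both thresholds are assumptions for illustration).
    """
    in_range = [(f, lvl) for f, lvl in components if 20.0 <= f <= 20000.0]
    kept = []
    for freq, level in in_range:
        masked = any(
            other_level - level >= mask_margin_db
            and abs(math.log2(other_freq / freq)) <= mask_octaves
            for other_freq, other_level in in_range
        )
        if not masked:
            kept.append((freq, level))
    return kept

if __name__ == "__main__":
    tones = [(15, 60), (1000, 80), (1100, 40), (5000, 45), (25000, 50)]
    print(audible_components(tones))  # [(1000, 80), (5000, 45)]
```

In the example, the 15 Hz and 25 kHz tones fall outside the hearing range, and the quiet 1100 Hz tone is hidden by the loud 1000 Hz tone next to it, so only two of the five components need to be stored.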
The Layer-3 compression scheme tries to model the listening properties of the human ear and to remove the unwanted signals from the original source audio. This model is called the psychoacoustic model, and it has been developed through a lot of subjective listening experiments. The model also indicates how important each part of the signal is, and depending on that we can increase or decrease the number of bits given to that part. This tool in the mp3 format removes a large unwanted portion from the signal.
The next important tool in the mp3 format is Huffman coding. Its function is to reduce the average number of bits per sample, based on how frequently each value occurs. In simple terms, if we represent the values that occur more frequently with fewer bits and the values that occur rarely with more bits, on average we save a lot of bits. There are also optional tools which a WAV-to-mp3 converter may not use, such as intensity stereo, which increases compression in the case of stereo signals. For stereo, the spatial information need not be retained for high-frequency signals, so it is enough to send one channel (mono) at high frequencies. (Prediction, which increases compression by predicting future samples from current samples, is a tool of the newer AAC scheme rather than of mp3.)
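The Huffman idea can be sketched in a few lines of Python. This is a generic textbook construction, not the fixed code tables that mp3 itself uses, and the sample values in the example are invented. The function returns the code length in bits assigned to each value.

```python
import heapq
from collections import Counter

def huffman_code_lengths(values):
    """Return {value: code length in bits} for a Huffman code over values."""
    counts = Counter(values)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {value: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        c2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

if __name__ == "__main__":
    samples = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [-1]
    lengths = huffman_code_lengths(samples)
    total_bits = sum(lengths[v] for v in samples)
    print(lengths)     # the frequent value 0 gets 1 bit, the rare ones get 4
    print(total_bits)  # 30 bits, against 48 bits at a fixed 3 bits per value
```

Here the sixteen values need 30 bits instead of the 48 bits a fixed 3-bit code would take, because the most common value is given the shortest code.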
All the above tools combine to make up the complex algorithm that produces compressed audio which sounds almost identical to the original signal.
The audio coding scheme currently evolving is Advanced Audio Coding (AAC), which gives even more compression than mp3 without much loss of quality and, naturally, is more complex than the mp3 scheme.
Further links on mp3 format
Fraunhofer IIS Audio Home of MPEG Layer-3 and AAC.
MP3Tech (Has an overview of mp3, software, and info about AAC, Dolby-AC3 and Twin-VQ).
MPEG.ORG MPEG Audio resources and software.
Resources specific to mp3 at MPEG.ORG.
MPEG/Audio Tutorial which appeared in the IEEE Multimedia journal.
MPEG Audio web page at Univ. of Hannover. (News, Press releases and FAQs).
MPEG Audio FAQ Version 9
General Links to audio video research
Disclaimer: I am not responsible for anything done based on this, and I do not guarantee that these materials are correct.