Friday, 28 January 2011

A Guide to the MP3 File Conversion Process

The whole point of converting a file to MP3[1] is to reduce an audio file to a size that is easily downloadable. This is achieved by shedding as much of the information contained in the file as possible without losing too much quality. Though MP3 is described as a 'lossy' format due to the sound quality lost through conversion, steps are taken to ensure that only the least important details are lost.

An important step in conversion to MP3 is the removal of all superfluous sounds:

  • Any sounds outside the range of human hearing (20 - 20,000 Hertz) are removed.

  • Any quieter sounds which would be impossible or very difficult to hear because louder sounds at nearby pitches are playing at the same time are removed.

  • If converting a stereo file, the second channel of sound for lower frequencies is removed, as humans cannot distinguish between mono and stereo at lower frequencies. The result is known as 'joint stereo', but it is also possible to skip this step and produce fully mono or fully stereo MP3 files.

All this is done using a psycho-acoustic model, which determines which sounds can actually be heard at any point during the audio track. In essence it simulates a pair of human ears, then tells the conversion program which sounds to keep and which to discard. Some sounds are also inaudible because they are masked by other sounds coming immediately before or after them, and so there is no real point in keeping a record of them.
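To make the idea concrete, here is a toy sketch of that decision in Python. It is not how any real converter works: the masking rule, the two constants `NEIGHBOUR_RATIO` and `MASK_MARGIN_DB`, and the list of tones are all invented for illustration. It simply throws away tones outside the range of hearing, then throws away any tone that has a much louder neighbour playing at the same time.

```python
AUDIBLE_MIN_HZ = 20
AUDIBLE_MAX_HZ = 20_000

# Hypothetical masking rule, for illustration only: a tone counts as masked
# when another tone within NEIGHBOUR_RATIO of its frequency is at least
# MASK_MARGIN_DB louder.
NEIGHBOUR_RATIO = 1.2
MASK_MARGIN_DB = 30


def audible_components(components):
    """Keep only the (frequency_hz, level_db) pairs this toy model can 'hear'."""
    in_range = [
        (f, db) for f, db in components
        if AUDIBLE_MIN_HZ <= f <= AUDIBLE_MAX_HZ
    ]
    kept = []
    for f, db in in_range:
        masked = any(
            other_db - db >= MASK_MARGIN_DB
            and max(f, other_f) / min(f, other_f) <= NEIGHBOUR_RATIO
            for other_f, other_db in in_range
        )
        if not masked:
            kept.append((f, db))
    return kept


# Made-up tones: two are out of range, one is drowned out by its neighbour.
tones = [(15, 60), (440, 70), (450, 30), (5_000, 40), (25_000, 50)]
print(audible_components(tones))
```

A real psycho-acoustic model is far more involved, but the shape of the job is the same: work out what can't be heard, then don't bother storing it.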

A common trick used by these programs is 'noise-shaping', whereby higher frequency sounds are given fewer bits so that other sounds can be given more bits. The higher frequencies therefore end up with more errors and thus more 'noise', but the human brain can only pick out loud high frequency sounds, and so any extra noise produced at those frequencies gets ignored. While the removal of inaudible sounds helps reduce the file size, 'noise-shaping' is an invaluable tool when compressing files.

The exact psycho-acoustics software used varies greatly between different conversion programs, leading to different levels of compression and quality depending on the software used. The only other steps involved in MP3 compression are sampling of the file at a certain sampling frequency and bit depth, and application of the Huffman Algorithm.

Sampling Frequency and Bit Depth

First, let's look at the equation for file size of an MP3 file:

(size) = (bit depth) x (sampling frequency) x (length)

  • (size) is the file size in bits.

  • (bit depth) is the number of bits used to encode each sample.

  • (sampling frequency) is the number of samples per second.

  • (length) is the length of the audio track in seconds.
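The equation is easy to try out in a few lines of Python. This is only a sketch: the function name is made up, and the example figures (16 bits, 44,100 samples per second, 20 seconds) are chosen purely for illustration. Note that the equation as written counts a single channel of sound, so a stereo recording would need the result doubling.

```python
def audio_size_bits(bit_depth, sampling_frequency_hz, length_seconds):
    """(size) = (bit depth) x (sampling frequency) x (length), in bits."""
    return bit_depth * sampling_frequency_hz * length_seconds


# Illustrative figures only: 16 bits per sample, 44,100 samples per second,
# 20 seconds of sound, one channel.
size_bits = audio_size_bits(16, 44_100, 20)
print(size_bits, "bits")
print(size_bits / 8, "bytes")
```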

Assuming that we've already shortened the length of the file as much as is reasonable, the things that really matter are the sampling frequency and bit depth. The best way to explain bit rate is by looking at how a sound wave is turned into a digital audio file which can be stored by a computer, although this does not properly reflect the process through which an uncompressed digital file is converted into an MP3 file. If an MP3 file were to be created straight from an analogue sound, then sampling would have to occur before the psycho-acoustic model was applied. However, an uncompressed digital file can have the psycho-acoustic model applied to it before it is compressed, which is why the discussion of bit depth and sampling frequency comes after that of psycho-acoustics in this entry.

Imagine, if you will, a sound wave, as seen on a screen. The x-axis is time, and the y-axis is amplitude, but in simple terms we just want to record enough of the wave graph's co-ordinates in a digital file that we can recreate the wave from the file. This is known as sampling the wave, as the computer will sample the wave's amplitude at regular intervals. If the sampling frequency is 44,100Hz, then the computer will sample the wave and record its co-ordinates every 1/44,100 of a second. We could easily reduce the sampling frequency and only sample every 1/32,000 of a second, but this would lead to a decrease in quality. Nyquist's theorem states that the sampling frequency must be at least twice that of the highest frequency sound on the audio track being converted, which is why a range of hearing that stops at 20,000Hz calls for a sampling frequency of at least 40,000Hz.
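Sampling can be sketched in Python as follows. The wave here is an assumed pure 440Hz tone with an amplitude of 50, picked only because it is easy to generate, and both function names are made up for the example.

```python
import math


def sample_wave(frequency_hz, sampling_frequency_hz, n_samples, amplitude=50):
    """Read a sine wave's amplitude once every 1/sampling_frequency seconds."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sampling_frequency_hz)
        for n in range(n_samples)
    ]


def minimum_sampling_frequency(highest_frequency_hz):
    """Nyquist's limit: twice the highest frequency we want to capture."""
    return 2 * highest_frequency_hz


# Ten samples of an assumed 440Hz tone, taken 1/44,100 of a second apart.
samples = sample_wave(440, 44_100, 10)
print([round(s, 1) for s in samples])

# The top of the range of human hearing sets the lowest usable sampling rate.
print(minimum_sampling_frequency(20_000))  # 40000
```

A sampling frequency of 44,100Hz sits comfortably above that limit, whereas 32,000Hz can only capture sounds up to 16,000Hz.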

However, the file size will still be too large. Suppose we sample a wave ten times and get amplitudes of 47, 23, -2, -19, -38, -17, 11, 29, 43, 31. Assuming the wave goes no higher than about 50 and no lower than -50, we could use a 100-point scale to record all the numbers exactly as they are.

However, each of these 100 points would need its own unique code, and so each measurement of amplitude would be recorded using 7 bits[2]. This would be terribly wasteful after all the bother we're going to in order to remove all the superfluous sounds and compress the file. After all, humans aren't too picky about the exact amplitude of the sounds, and won't even notice a certain drop in quality. So instead, we could use a 50-point scale, which would require only 6 bits[3] to record each measurement of amplitude. The result would be that each measurement is rounded up to the next even number: 48, 24, -2, -18, -38, -16, 12, 30, 44, 32.

As you can see, the numbers are roughly the same, although we have lost the fine detail. Naturally, in reality the number of different values is much larger and will be a power of two so that all the available values can be put to use. For instance, the Red Book Audio files used on audio CDs use a bit depth of 16 bits, allowing a 65,536-point scale to be used.
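The arithmetic above can be checked with a short Python sketch. The two helper names are made up for the example; the sample values are the ten amplitudes used above.

```python
def bits_needed(points):
    """Smallest number of bits that gives every point on the scale its own code."""
    return (points - 1).bit_length()


def round_up_to_even(value):
    """Round an odd whole number up to the next even one; leave even ones alone."""
    return value + (value % 2)


samples = [47, 23, -2, -19, -38, -17, 11, 29, 43, 31]
print([round_up_to_even(s) for s in samples])

for points in (100, 50, 65_536):
    print(points, "points need", bits_needed(points), "bits")
```

The 100-point and 50-point scales waste some of their codes (7 bits could cover 128 values and 6 bits could cover 64), which is why real scales use a power of two.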

An audio file with a very low bit rate, such as 32 kilobits per second (kbps), will sound pretty awful - a similar effect can be gained by shoving cotton wool in your ears[4], though this is mainly due to the drop in sampling frequency involved. A good bit rate to aim for is 128 kbps, though 96 kbps isn't too bad.

But what exactly is bit rate? Basically, it is the number of bits used to store each second of the audio track. For a single channel of uncompressed sound, it is therefore the product of the bit depth and sampling frequency. Decreasing either leads to a lower bit rate, but also to a drop in quality.

Constant Bit Rate

Constant bit rate (CBR) is the oldest form of bit rate used in MP3 files and is generally supported by anything that can play MP3 files. Say you are using a 50-point scale to cover 100 different possible values - in other words you are rounding each value to the nearest multiple of two. As if to upset your plans, a section of music suddenly appears that requires the recording of 150 different values. The CBR solution to this is to keep the 50-point scale and round the values for that section to the nearest multiple of three. In other words, to avoid increasing our bit rate, we sacrifice the quality of that section. Meanwhile, if there is a section which requires only 25 different values to be recorded (eg a quiet section of music), then we carry on using the 50-point scale, even though it's just slightly overkill. The benefit of this approach is that we can use the equation above to determine the size of the file, as the bit depth remains constant throughout.
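Here is that rounding rule as a small Python sketch. It follows the simplified picture painted above rather than the workings of any real encoder, and the function names are made up.

```python
import math

SCALE_POINTS = 50  # the fixed scale from the example above


def cbr_step(distinct_values, scale_points=SCALE_POINTS):
    """Size of the rounding step needed to squeeze the values onto the scale."""
    return max(1, math.ceil(distinct_values / scale_points))


def quantise(value, step):
    """Round a value to the nearest multiple of the step."""
    return step * round(value / step)


for distinct_values in (100, 150, 25):
    print(distinct_values, "values -> multiples of", cbr_step(distinct_values))

# In the busy section every amplitude lands on a multiple of three.
print(quantise(47, 3), quantise(100, 3))
```

The bigger the step, the more detail is lost, which is exactly the sacrifice CBR makes in the busy section.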

Variable Bit Rate

Variable bit rate (VBR) is a newer invention which isn't yet supported by all MP3 players. The premise is the exact opposite of that of CBR - the bit depth, and thus the bit rate, is changed according to the complexity of each section of music, leading to the MP3 file having the same level of quality throughout. While 20 seconds of music at 256 kbps CBR will take up about 670KB of space, a VBR file with the same quality of sound could take up just over 400KB. As mentioned, VBR is not supported by all players, with some just producing a sound similar to that of Alvin and the Chipmunks.
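The saving can be illustrated with a toy comparison in Python. Everything here is an assumption made for the sake of the example: the three sections, their lengths and the number of different values each one needs are invented, and a real encoder is far more involved.

```python
def bits_needed(distinct_values):
    """Bits required to give each distinct value its own code."""
    return max(1, (distinct_values - 1).bit_length())


# Made-up track: (number of samples, distinct values needed) for each section.
sections = [(1_000, 100), (1_000, 25), (1_000, 150)]


def cbr_total_bits(sections, bit_depth):
    """Constant bit rate: every sample gets the same number of bits."""
    return sum(n for n, _ in sections) * bit_depth


def vbr_total_bits(sections):
    """Variable bit rate: each section gets only the bits it needs."""
    return sum(n * bits_needed(values) for n, values in sections)


# For CBR to match VBR's quality, it must use the depth of the busiest section.
busiest_depth = max(bits_needed(values) for _, values in sections)
print("CBR:", cbr_total_bits(sections, busiest_depth), "bits")
print("VBR:", vbr_total_bits(sections), "bits")
```

The quiet section is where the variable approach claws back its space.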

Available Frequencies and Bit Rates

As the sampling frequency and bit rate must be recorded in a small header before each section of music, there are only a certain number of frequencies and bit rates available. However, when converting a file to VBR MP3, you must choose the level of sound quality as opposed to the bit rate.

Common Sampling Frequencies: 32,000, 44,100 and 48,000 Hz

Common Bit Rates: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbps.
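A conversion program therefore has to pick from fixed lists rather than accept any number it is given. A minimal Python sketch of that check, using the values listed above and two made-up helper names:

```python
SAMPLING_FREQUENCIES_HZ = (32_000, 44_100, 48_000)
BIT_RATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)


def is_valid_setting(sampling_frequency_hz, bit_rate_kbps):
    """True if both values appear in the lists of common settings."""
    return (
        sampling_frequency_hz in SAMPLING_FREQUENCIES_HZ
        and bit_rate_kbps in BIT_RATES_KBPS
    )


def nearest_bit_rate(target_kbps):
    """Snap a requested bit rate to the closest one on the list."""
    return min(BIT_RATES_KBPS, key=lambda rate: abs(rate - target_kbps))


print(is_valid_setting(44_100, 128))
print(is_valid_setting(44_100, 100))
print(nearest_bit_rate(100))
```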

The Huffman Algorithm

After the file has had its bit rate adjusted, it is compressed to make it even smaller. The Huffman algorithm is a compression algorithm, meaning it makes files smaller by reducing the number of bits used to store the information, but it is a lossless algorithm, meaning that no information is lost when the file is compressed. The algorithm is used in both MPEG and JPEG files, amongst other things. Note: understanding how the algorithm works is not particularly important compared with understanding bit rates, so don't worry if you don't get this next bit.

Say we have a series of bits in a file, such as 1100 1011 0100 0100 1011 0100 0101 1011 1100 0100. The Huffman algorithm allocates each chunk its own short sequence, with the most common chunk receiving the shortest sequence:

0100 --> 0
1011 --> 10
1100 --> 110
0101 --> 111

This leads to the following sequence being produced:

110 10 0 0 10 0 111 10 110 0

As if by magic, we end up with a series of 19 bits which contain the information from the 40 bits we started with. Note that only certain sequences of bits have been used (0, 10, 110, 111) as these avoid any confusion as to where each sequence starts and finishes (1101000100111101100 can only be broken up into 110 10 0 0 10 0 111 10 110 0, provided you start at the beginning). In a full-sized MP3 file, all the audio data would be compressed like this, with a key telling the computer what each sequence represents.
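For the curious, here is a minimal sketch of the Huffman algorithm in Python, run on the ten chunks from the example. It is a textbook version rather than the code found in any particular MP3 converter, and the function names are made up. The sequences it hands out may differ from the table above in which of the two rarest chunks gets 110 and which gets 111, but the lengths, and so the 19-bit total, come out the same.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code: the commoner the symbol, the shorter its code."""
    counts = Counter(symbols)
    # Heap entries are (count, tie_breaker, {symbol: code_so_far}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two rarest groups, prefixing one with 0 and the other with 1.
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Read bits from the start, emitting a symbol whenever a code is completed."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


chunks = "1100 1011 0100 0100 1011 0100 0101 1011 1100 0100".split()
code = huffman_code(chunks)
packed = encode(chunks, code)
print(code)
print(packed, len(packed), "bits")

# The table from the example above decodes its own 19 bits back to the chunks.
ARTICLE_CODE = {"0100": "0", "1011": "10", "1100": "110", "0101": "111"}
print(decode("1101000100111101100", ARTICLE_CODE) == chunks)
```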

The Results

20 seconds of stereo sound recorded in a WAV[5] audio file will take up about 3,750KB of space. When converted to MP3 using different bit rates, the file size can be reduced as follows:

  • Bit rate: 32kbps CBR

    • Size: 88KB
    • Compression: 43:1
    • Quality: Poor
  • Bit rate: 96kbps CBR

    • Size: 250KB
    • Compression: 15:1
    • Quality: Acceptable
  • Bit rate: 128kbps CBR

    • Size: 335KB
    • Compression: 11:1
    • Quality: Good
  • Bit rate: 256kbps VBR

    • Size: 424KB
    • Compression: 9:1
    • Quality: Very Good
  • Bit rate: 256kbps CBR

    • Size: 670KB
    • Compression: 6:1
    • Quality: Very Good
  • Bit rate: 320kbps CBR

    • Size: 835KB
    • Compression: 4:1
    • Quality: Excellent
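The figures above can be sanity-checked with a little Python. One assumption is needed: the WAV settings aren't stated, but 48,000 samples per second, 16 bits per sample and two channels (with 1KB taken as 1,024 bytes) happen to reproduce the 3,750KB figure, so those are used here. The MP3 sizes are simply copied from the list.

```python
# Assumed WAV settings (not stated in the text): 48,000Hz, 16-bit, stereo.
SAMPLING_FREQUENCY_HZ = 48_000
BIT_DEPTH = 16
CHANNELS = 2
LENGTH_SECONDS = 20

wav_size_kb = (
    SAMPLING_FREQUENCY_HZ * BIT_DEPTH * CHANNELS * LENGTH_SECONDS / 8 / 1024
)
print("WAV:", wav_size_kb, "KB")

# MP3 sizes in KB, as listed above.
mp3_sizes_kb = {
    "32kbps CBR": 88,
    "96kbps CBR": 250,
    "128kbps CBR": 335,
    "256kbps VBR": 424,
    "256kbps CBR": 670,
    "320kbps CBR": 835,
}

for label, size_kb in mp3_sizes_kb.items():
    print(f"{label}: {wav_size_kb / size_kb:.1f}:1")
```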

Converting Between File Types

Naturally, MP3 files are not usually created straight from an analogue signal; instead, they are created from other file types by converters designed specifically for the job. Programs that convert between WAV and MP3 files are commonly available. Conversion to MP3 format is commonly referred to as encoding, while conversion back from MP3 to an uncompressed format is known as decoding. Media players not designed for playing MP3 files may have to download an audio codec[6], which will then decode the MP3 file into a form suitable for the player.


[1] Short for MPEG Audio Layer 3.
[2] 7 bits allow up to 128 different values to be recorded.
[3] 6 bits allow up to 64 different values to be recorded.
[4] This also makes everything sound pretty awful, so make sure you can get it out again.
[5] WAV is short for Waveform audio format, and is the most basic file type used for storing raw audio data on Windows computers. This is different from the Red Book Audio file type used on audio CDs.
[6] Short for COder-DECoder.
