
Dithering and Sample rate conversion before MP3 encoding? Complete Study

This post describes an experiment on how dithering and sample rate conversion, both done during the mastering process, affect MP3 quality. The tools used are the following:

a.) Reaper digital audio workstation with LAME encoder functionality for converting to MP3 and doing test dithering.
b.) Voxengo R8brain free – for doing sample rate conversion and test dithering.
c.) Adobe Audition 1.5 – for doing spectral analysis of the converted result.
d.) The 24-bit/96KHz WAV sweep test tone signal provided here.

Objective and Methodology of the Study

This study investigates the results of the following combinations (using free/open source tools):

Test Flow#1: The quality of the MP3 as result of direct conversion

24-bit/96KHz sweep tone signal == > Reaper LAME MP3 encoder == > Assess spectral quality results of MP3

Test Flow#2: Sample rate conversion and dithering are done first, before MP3 conversion

24-bit/96KHz sweep tone signal == > Sample rate conversion using Voxengo R8brain to 44.1KHz == > Dithering and noise shaping using Reaper built-in dither functionality == > 16-bit/44.1KHz wav input to Reaper LAME MP3 encoder == > Assess results

Test Flow#3: Sample rate conversion and dithering to be done entirely by R8brain

24-bit/96KHz sweep test tone signal == > Sample rate and dithering by Voxengo R8brain == > 16-bit/44.1KHz WAV input to Reaper LAME MP3 encoder== > Analyze results

Test Flow#4: SRC (sample rate conversion) is done first, but no external dithering is applied. The 24-bit WAV is then fed directly to the LAME encoder.

24-bit/96KHz sweep test tone signal == > SRC by R8brain == > 24-bit/44.1KHz WAV input to Reaper LAME MP3 encoder== > Analyze results

Spectral result of the original source audio

Using Adobe Audition 1.5 Spectral graph analysis, the original/unaltered 24-bit/96KHz WAV sweep tone plot is shown below:

Original test tone spectral result

The x-axis is time in seconds, while the y-axis is frequency. The curved blue line trends upward (because it is a sweep tone) and shows how the frequency content changes over time. Reading the plot, at about 4 seconds the signal frequency is approximately 10,000Hz. The black background indicates the absence of signal content at those frequencies.

Since the sample rate is 96 KHz, the file can hold signals up to 48 KHz, which is the maximum shown in the plot. This follows from the Nyquist theorem, discussed in the post on 44.1 KHz vs. 48 KHz audio recording sample rate:

Maximum frequency content = Sample rate/2
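As a quick check of the formula above, here is a minimal Python sketch that applies it to the sample rates used in this study. The function name `nyquist_limit_hz` is only an illustrative choice.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent (sample rate / 2)."""
    return sample_rate_hz / 2.0


# Sample rates that appear in this study: the source file and the two MP3 targets.
for rate in (96_000, 48_000, 44_100):
    print(f"{rate} Hz sample rate -> content up to {nyquist_limit_hz(rate):.0f} Hz")
```

So a 44.1 KHz file tops out at 22,050 Hz, which is why any sweep content above that has to be filtered out during sample rate conversion.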

Beyond 20 KHz, the human ear can no longer detect these ultrasonic frequencies, although bats can.

Test Flow#1: Direct Encoding

In this test, a high resolution 24-bit/96KHz test tone is fed directly to the Reaper LAME encoder (File – Batch File/Item converter). In Reaper, these are the options: [sample rate=source, channels=source, re-sample if needed=best, Output format=MP3, mode=maximum bit rate/quality]. If you are new to Reaper, you can add MP3 functionality by following the guide on Reaper DAW Tutorial; go to the “Using MP3 with Reaper” section.

For the above test; this is the spectral result of the MP3:

Test flow #1 spectral response

As you can see, very small artifacts are now present and the background is no longer pure black. This indicates slight aliasing distortion (the light, hazy blue lines) introduced by the MP3 encoder's sample rate conversion. It is worth noting that the LAME encoder converts the 24-bit/96 KHz source to a 16-bit/48 KHz MP3. If the source is instead converted directly to a standard 16-bit/44.1KHz MP3 using the same process, this is the result:

More aliasing distortion

Now the aliasing distortion is much worse: it is clearly visible in the plot and more audible. See those lines crossing the ideal signal, which are not present in the original test tone? These are distortions/artifacts that can make your MP3 sound bad.
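To illustrate where such crossing lines can come from, here is a minimal Python sketch of frequency folding. It assumes ideal sampling with no anti-alias filter at all, which is a worst case; a real converter filters most of this content out, so the artifacts appear attenuated rather than at full level. The function name `alias_frequency_hz` is hypothetical.

```python
def alias_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone shows up after sampling with no anti-alias filter."""
    folded = tone_hz % sample_rate_hz             # wrap into [0, sample_rate)
    return min(folded, sample_rate_hz - folded)   # reflect into [0, sample_rate / 2]


# Points along a sweep that reaches 48 KHz, resampled to 44.1 KHz.
for tone in (10_000, 22_050, 30_000, 48_000):
    print(f"{tone} Hz tone -> appears at {alias_frequency_hz(tone, 44_100)} Hz")
```

Once the sweep passes 22,050 Hz, the folded frequency moves back down while the sweep keeps rising. That would produce lines running across the spectrum in a different direction from the original tone.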

Test Flow#2: R8brain SRC and Reaper Dither Output

In this test, the original test tone is down-sampled first from 96 KHz to 44.1 KHz using R8brain. The output bit depth is set constant/unchanged at 24-bit and quality is set to “very high”.

The 24-bit/44.1KHz R8brain output is then dithered using Reaper (File – Batch file/item converter). Sample rate and channels are set to source. Output format is set to WAV, with Dither and Noise shaping checked. The WAV output bit depth is 16-bit PCM.

Finally, the 16-bit/44.1KHz WAV test tone is converted to a 16-bit/44.1KHz MP3 at the maximum bit rate/quality setting, which is 320kbps. This is the result:

Test 2 result

The Reaper dithering introduces substantial noise around 15000Hz. This is called dithering noise. For songs with critical high-frequency content (such as vocals and cymbals), the dithering noise can interfere in this frequency range. However, there are no serious aliasing distortion issues in the MP3.
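Noise concentrated at high frequencies is the typical signature of noise shaping. The sketch below shows a generic first-order error-feedback shaper in Python. It is not Reaper's actual algorithm, which is not described here. Dither is left out for brevity, and the function name and the `step` parameter are illustrative assumptions.

```python
def requantize_with_noise_shaping(samples_24, step=256):
    """Reduce signed 24-bit integer samples to 16-bit using first-order error feedback."""
    output = []
    previous_error = 0  # quantization error of the previous sample, in 24-bit units
    for sample in samples_24:
        target = sample - previous_error                 # feed the previous error back in
        quantized = round(target / step)                 # nearest 16-bit value
        quantized = max(-32768, min(32767, quantized))   # clamp to the 16-bit range
        previous_error = quantized * step - target
        output.append(quantized)
    return output


# A constant level of one quarter of a 16-bit step (64 / 256).
quiet_input = [64] * 1000
plain = [round(s / 256) for s in quiet_input]
shaped = requantize_with_noise_shaping(quiet_input)
print("plain rounding average:", sum(plain) / len(plain))
print("noise-shaped average:  ", sum(shaped) / len(shaped))
```

Plain rounding loses the quiet signal completely (average 0.0), while the shaped output keeps the correct average of 0.25 by alternating between 0 and 1. The error has not gone away. It has been moved into fast sample-to-sample changes, that is, into the high frequencies.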

Test Flow#3: SRC and Dithering by R8brain only

In this experiment, Voxengo R8brain is used for both dithering (by setting the output bit depth to 16-bit) and sample rate conversion. The 16-bit/44.1KHz output is then converted to MP3. This is the result:

Test Flow #3 result

As expected, the noise is spread throughout the spectrum, even into the middle frequencies where human ears are most sensitive. This is because R8brain free does not have any noise-shaping functions for dithering.
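For comparison, plain dither without noise shaping can be sketched as follows in Python. This uses generic triangular (TPDF) dither as an assumption; the dither that R8brain actually applies is not documented here. Because each sample gets independent random noise, the noise is spread evenly across the spectrum instead of being pushed toward the high end.

```python
import random


def reduce_24_to_16(sample_24: int, rng: random.Random, dither: bool = True) -> int:
    """Requantize one signed 24-bit sample to signed 16-bit, optionally with TPDF dither."""
    step = 256  # one 16-bit step equals 2**8 steps on the 24-bit scale
    noise = 0.0
    if dither:
        # Triangular dither: sum of two uniform values, spanning +/- 1 step of the 16-bit target.
        noise = (rng.random() - 0.5 + rng.random() - 0.5) * step
    quantized = round((sample_24 + noise) / step)
    return max(-32768, min(32767, quantized))  # clamp to the 16-bit range


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
trials = 20_000
quiet_sample = 64       # one quarter of a 16-bit step
undithered = reduce_24_to_16(quiet_sample, rng, dither=False)
dithered_average = sum(reduce_24_to_16(quiet_sample, rng) for _ in range(trials)) / trials
print("without dither:", undithered)
print("average with dither:", dithered_average)
```

Without dither the quiet sample always rounds to 0. With dither the individual outputs are noisy, but their average lands close to the true value of 0.25, which is the trade dithering makes: a little broadband noise in exchange for removing truncation distortion.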

Test Flow#4: SRC by R8brain and no external dithering

In this last test, sample rate conversion is done first by R8brain, then the 24-bit/44.1KHz output is fed directly to the Reaper LAME MP3 encoder. The output is a 16-bit/44.1KHz MP3. This is the result:

Test Flow #4 result

Notably, there is no significant dithering noise (unlike in the previous tests). Aliasing distortion is also minimal because the sample rate conversion was done with R8brain before converting to MP3.

Conclusions and Recommendations

Based on the above study, you can conclude that:

a.) The LAME MP3 encoder's internal sample rate conversion does not perform very well. Converting from a 96 KHz sample rate to 44.1 KHz produced serious aliasing distortion in the result.

b.) The LAME MP3 encoder does a great job of avoiding dithering noise in the MP3 result, as shown in Test #4, where a 24-bit/44.1KHz test tone is fed to the LAME encoder, yielding a standard 16-bit/44.1KHz MP3.

c.) Dithering with external tools such as Reaper or R8brain adds more dithering noise than the LAME MP3 encoder does by itself. Although there are other great dithering algorithms, the study shows that you do not need to dither your high resolution masters to produce a high quality MP3.

d.) Converting a 24-bit/96KHz WAV test tone directly to MP3 results in a cleaner 16-bit/48KHz 320kbps MP3 (see Test #1). However, MP3s at a 48 KHz sample rate are not as commonly distributed as those at 44.1 KHz. If the 24-bit/96KHz source is converted directly to a 44.1 KHz MP3, the result has significant aliasing distortion.

Therefore, the following are recommended for best MP3 conversion quality:

1.) Perform sample rate conversion on your high resolution masters first, before the MP3 encoding. Use a quality SRC like R8brain (even the free version will do). Do not enable the dithering feature.

2.) You do not need to dither the high resolution masters (e.g. your 24-bit masters) before MP3 encoding. The LAME MP3 encoder does a great job of avoiding the dither noise brought about by the reduction in bit depth (from 24 bits to 16 bits), as shown in the test above.

3.) If your target audio will be used in video and film projects (where a 48KHz sample rate is common), it would be appropriate to use the Test #1 procedure, where high resolution audio such as 24-bit/96KHz yields a cleaner MP3 at 16-bit/48KHz 320kbps. This can be done by directly encoding the high resolution master with the LAME MP3 encoder, without passing it through external SRC or dithering.

4.) The standard/recommended workflow for most projects would be (your tools may vary):

High resolution audio master == > Sample rate conversion only (convert to 44.1 KHz standard MP3 sample rate, bit depth unaltered) == > LAME MP3 encoder (set to maximum quality) == > High quality 16-bit/44.1KHz 320kbps MP3

5.) An in-depth “listening” test is recommended for critical audio projects. After MP3 conversion, you can do an A-B test (dithered vs. not dithered) and decide for yourself.
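For readers who prefer the command line, the recommended workflow can be sketched in Python as below. This is an assumption-laden stand-in, not the setup used in the tests: it swaps in the SoX `rate` effect for R8brain and the standalone `lame` program for Reaper's LAME integration, and it assumes both are installed and that your LAME build accepts 24-bit WAV input. The file names and function names are hypothetical.

```python
import subprocess
from typing import List


def build_src_command(master_wav: str, resampled_wav: str, rate_hz: int = 44100) -> List[str]:
    """SoX command: resample with the very-high-quality rate effect, keeping 24-bit output."""
    return ["sox", master_wav, "-b", "24", resampled_wav, "rate", "-v", str(rate_hz)]


def build_mp3_command(resampled_wav: str, mp3_path: str) -> List[str]:
    """LAME command: constant 320 kbps with the highest-quality (slowest) algorithm setting."""
    return ["lame", "-b", "320", "-q", "0", resampled_wav, mp3_path]


def convert(master_wav: str, mp3_path: str, resampled_wav: str = "resampled_24bit.wav") -> None:
    """Run SRC first, then MP3 encoding; no external dithering step in between."""
    subprocess.run(build_src_command(master_wav, resampled_wav), check=True)
    subprocess.run(build_mp3_command(resampled_wav, mp3_path), check=True)


# Print the commands instead of running them, so they can be checked first.
print(" ".join(build_src_command("master_24bit_96k.wav", "resampled_24bit.wav")))
print(" ".join(build_mp3_command("resampled_24bit.wav", "final_320kbps.mp3")))
```

Calling `convert("master_24bit_96k.wav", "final_320kbps.mp3")` would run both steps. Results from these tools may differ from the R8brain and Reaper results shown above, so it is worth checking the output spectrum the same way.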

Content last updated on June 28, 2012
