Wednesday 12 August 2015

Sigma-Delta Modulators - Part II.

Yesterday, we saw how an SDM can be used to faithfully reconstruct an incoming signal, even if the output is constrained to an apparently hopelessly reduced bit depth.  We do this by ensuring that the Signal Transfer Function (STF) and Noise Transfer Function (NTF) have appropriate characteristics.  This, of course, is a lot harder to achieve than you might have concluded from the expansive tone of yesterday’s post, which ended with the open question of how to design an appropriate loop filter.

Addressing those issues remains at the bleeding edge of today’s digital audio technology.  The best approach to understanding the design of an SDM remains the “Linear Model” I alluded to yesterday, where we treat the quantization error introduced at the quantizer stage as an additive noise source, independent of the signal.  This model ought to be exactly as accurate as its limiting assumption, which is that the quantization error is well represented by such a noise source.  Unfortunately, the results don’t appear to bear that out.  According to this model, relatively simple SDMs should exhibit stunningly good performance, whereas in reality they do not.  In fact, they fall very substantially short of the mark.  Clearly, the noise source is not as good a substitute for the quantization error as we thought.  Furthermore, the reasons why are not clear, and we don’t have a better candidate available.
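To see why the linear model is so seductive, here is a minimal Python sketch of what it predicts.  It assumes the idealized Nth-order NTF (1 - z^-1)^N, which is a textbook simplification rather than the NTF of any real DSD modulator, and the DSD64 sample rate of 2.8224 MHz.  It evaluates how strongly the model says the quantization noise is suppressed at 20 kHz for orders 1 to 5.

```python
import cmath
import math

FS = 2_822_400.0   # DSD64 sample rate in Hz (64 x 44.1 kHz)
F_EDGE = 20_000.0  # top of the audio band in Hz

def ntf_gain(order: int, freq_hz: float, fs: float = FS) -> float:
    """|NTF| of the idealized N-th order NTF (1 - z^-1)^N at freq_hz (linear model only)."""
    w = 2.0 * math.pi * freq_hz / fs     # normalized angular frequency, rad/sample
    z_inv = cmath.exp(-1j * w)           # z^-1 evaluated on the unit circle
    return abs((1.0 - z_inv) ** order)

gains = {n: ntf_gain(n, F_EDGE) for n in range(1, 6)}
for n, g in gains.items():
    # Negative dB means the model predicts the noise is attenuated at 20 kHz.
    print(f"order {n}: |NTF| at 20 kHz = {20 * math.log10(g):.1f} dB")
```

On paper, each extra order buys roughly another 27 dB of suppression at 20 kHz, which is exactly the kind of stunning prediction that real modulators fail to deliver.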

In the absence of a good guiding model, SDM designers stick to an empirical methodology based on the well-known “suck-it-and-see” approach.  The most successful approach is based on increasing the “order” of the modulator.  The simple SDM I described yesterday has a single Sigma stage, and is called a “first order” SDM.  If we simply add a second Sigma stage we get a “second order” SDM.  We can add as many Sigma stages as we like, and however many we add, that’s the “order” of the SDM.  The higher the “order” of the SDM, the better its performance ought to be.  I make that sound so much easier than it actually is, particularly when it comes to the task of fine-tuning the SDM’s noise-shaping (or the NTF if you like) performance.
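Here is a minimal sketch of the first-order and second-order structures just described.  It assumes a 1-bit quantizer whose output is ±1 and an input scaled to that range, and the function names are hypothetical.  Going up one order simply means adding another Sigma stage in front of the quantizer.  In the linear model the second-order loop has NTF (1 - z^-1)^2.

```python
def sdm_first_order(xs):
    """1-bit, first-order SDM: one integrator (Sigma stage) around a 1-bit quantizer."""
    integ, y, out = 0.0, 0.0, []
    for x in xs:
        integ += x - y                    # Sigma stage accumulates input minus fed-back output (Delta)
        y = 1.0 if integ >= 0 else -1.0   # 1-bit quantizer
        out.append(y)
    return out

def sdm_second_order(xs):
    """1-bit, second-order SDM: two cascaded Sigma stages; NTF = (1 - z^-1)^2 in the linear model."""
    i1 = i2 = y = 0.0
    out = []
    for x in xs:
        i1 += x - y                       # first Sigma stage
        i2 += i1 - y                      # second Sigma stage
        y = 1.0 if i2 >= 0 else -1.0      # 1-bit quantizer
        out.append(y)
    return out

dc = [0.3] * 20_000
y1 = sdm_first_order(dc)
y2 = sdm_second_order(dc)
# Both bitstreams contain only +/-1, yet their averages recover the 0.3 input.
print(sum(y1) / len(y1), sum(y2) / len(y2))
```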

In practice, real-world SDM designs run into problems.  Lots of them.  The first of these is overload.  If the signal fed into the quantizer exceeds the range it can represent, the SDM can go unstable.  This is much the same as with any PCM representation - if the signal level is too high, then the PCM format, due to its fixed bit depth, will not have an available level with which to represent the signal, and something has to give (typically, a simple PCM encoder will allow the signal to hard clip).  In an SDM, because each Sigma stage is in fact a very simple IIR filter (an integrator), the result of such an overload will reverberate within the output of the SDM for a very considerable time.
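Here is a minimal sketch of an overload in the simplest possible case, a 1-bit first-order SDM.  It assumes full scale is ±1 and uses a hypothetical burst at 1.5 times full scale.  Because the output can never exceed +1, the integrator keeps growing for as long as the burst lasts.  Once the input returns to silence, the modulator stays stuck at +1 until that excess has been unwound.

```python
def sdm_first_order(xs):
    """1-bit, first-order SDM: a single Sigma stage around a 1-bit quantizer."""
    integ, y, out = 0.0, 0.0, []
    for x in xs:
        integ += x - y
        y = 1.0 if integ >= 0 else -1.0
        out.append(y)
    return out

# Hypothetical test signal: 100 samples at 1.5 (50% beyond the +/-1 output range), then silence.
burst = [1.5] * 100 + [0.0] * 100
bits = sdm_first_order(burst)

# Count how long the output stays pinned at +1 after the input has returned to zero.
stuck = 0
for b in bits[100:]:
    if b != 1.0:
        break
    stuck += 1
print("samples stuck at +1 after the overload ended:", stuck)
```

For silence, a fresh modulator would simply alternate +1, -1.  A first-order loop does eventually get back to that.  In higher-order loops, however, an overload of this kind can instead leave the modulator in a persistently unstable state.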

The second problem is that high-order digital filters can themselves be rather unstable.  This is not so much because of any inherent instability in their design, but generally because of the finite-precision (rounding and truncation) errors made by the CPU when executing the filter.  Proper filter design tools can identify and optimize for these errors, but can never make them go away entirely.  Unstable filters can cause all sorts of problems in SDMs, from the addition of noise and distortion to total malfunction.
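One well-understood finite-precision effect is coefficient quantization.  This minimal sketch assumes a hypothetical second-order resonator whose poles sit just inside the unit circle.  It then rounds the two feedback coefficients to 8 fractional bits, as a fixed-point implementation might.  The rounding pushes the poles out onto the unit circle, so a stable filter becomes one that would ring indefinitely.

```python
import cmath
import math

def pole_radius(a1: float, a2: float) -> float:
    """Largest pole magnitude of 1 / (1 + a1 z^-1 + a2 z^-2), i.e. of the roots of z^2 + a1 z + a2."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return max(abs((-a1 + disc) / 2.0), abs((-a1 - disc) / 2.0))

def quantize(c: float, frac_bits: int) -> float:
    """Round a coefficient to a fixed-point grid with frac_bits fractional bits."""
    scale = 2 ** frac_bits
    return round(c * scale) / scale

# Hypothetical resonator: poles at radius r = 0.9995 and angle theta = 0.1 rad (stable, since r < 1).
r, theta = 0.9995, 0.1
a1, a2 = -2.0 * r * math.cos(theta), r * r

ideal = pole_radius(a1, a2)
fixed = pole_radius(quantize(a1, 8), quantize(a2, 8))
print(f"ideal pole radius {ideal:.6f}, after 8-bit coefficient rounding {fixed:.6f}")
```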

The third problem is that SDMs are found to have any number of unexpected error or fault loops in which they can find themselves trapped, which are not yet adequately explained or predicted by any theoretical treatment.  These include phenomena known as “limit cycles”, “birdies”, “idle tones” and others.  They can be astonishingly difficult to detect, or even to describe, let alone to design around.
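Idle tones are easiest to see in a first-order loop.  In this minimal sketch (the helper names are hypothetical), a constant input of 0.5 produces a bitstream that repeats exactly every 4 samples.  That is a pure tone at one quarter of the sample rate rather than noise.  At DSD rates that particular tone sits far above the audio band.  Other constant inputs, though, give much longer repetition periods and therefore much lower-frequency tones.

```python
def sdm_first_order(xs):
    """1-bit, first-order SDM: a single Sigma stage around a 1-bit quantizer."""
    integ, y, out = 0.0, 0.0, []
    for x in xs:
        integ += x - y
        y = 1.0 if integ >= 0 else -1.0
        out.append(y)
    return out

def smallest_period(seq, max_period=64):
    """Smallest p such that seq repeats with period p (None if no such p <= max_period)."""
    for p in range(1, max_period + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return None

bits = sdm_first_order([0.5] * 1_000)   # constant (DC) input at half of full scale
period = smallest_period(bits)
print("output repeats every", period, "samples")
```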

Real-world high performance SDMs for DSD applications are typically between 5th and 10th order.  Below 5th order the performance is inadequate, and above 10th order they are rarely sufficiently stable.  The professional audio product Weiss Saracon, for example, contains a choice of loop filters in its SDM, having orders 6, 8, and 10.  Each loop filter produces a DSD output file with subtly different sonic characteristics, differences which many well-tuned audiophile ears can reliably detect.  And, as with religion, the fact that there are several of them from which to choose doesn’t guarantee that one of them is correct!

Interestingly enough, one of those limitations can readily be made to go away.  The problem of overloads can be entirely eliminated by using a multi-bit quantizer.  This approach is used in almost all commercial ADCs, which use an analog SDM in the input stage configured to provide a 3-bit to 5-bit intermediate result.  This intermediate result is then converted in the digital domain to the desired output format, whether PCM or DSD.  Likewise, almost all commercial DACs employ a digital SDM in the input stage, configured to provide a 3-bit to 5-bit intermediate result which is then converted to analog using a 3-bit to 5-bit R-2R ladder.  SDMs are therefore deeply involved at both ends of the PCM audio chain, though they mostly don’t use the 1-bit bit depth of DSD (or, for that matter, its 2.8 MHz sample rate).  When you listen to PCM, you cannot escape the fact that you are listening to SDMs.
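Why does a multi-bit quantizer help?  For the second-order loop used in this sketch (NTF = (1 - z^-1)^2), the quantizer input works out to x[n] - 2e[n-1] + e[n-2], where e is the quantization error.  Suppose the error has so far stayed within half a quantizer step Δ.  Then the quantizer input is at most |x| + 1.5Δ, which stays inside the no-overload range of ±(1 + Δ/2) whenever |x| ≤ 1 - Δ.  With 5 bits (32 levels, Δ = 2/31), that guarantee covers any input up to about 93.5% of full scale.  With 1 bit, Δ = 2, and no such guarantee exists.  The minimal sketch below assumes a uniform quantizer spanning ±1 and a hypothetical test tone at 90% of full scale.  It reports the worst quantization error for 1, 3 and 5 bits.

```python
import math

def quantize(v: float, levels: int) -> float:
    """Uniform quantizer with `levels` equally spaced outputs spanning [-1, +1]."""
    step = 2.0 / (levels - 1)
    k = round((v + 1.0) / step)
    k = min(max(k, 0), levels - 1)        # clipping here is where overload would occur
    return k * step - 1.0

def sdm_second_order(xs, levels):
    """Second-order SDM (NTF = (1 - z^-1)^2) with a multi-level quantizer.
    Returns the output samples and the largest quantization error seen."""
    i1 = i2 = y = 0.0
    out, worst = [], 0.0
    for x in xs:
        i1 += x - y
        i2 += i1 - y
        y = quantize(i2, levels)
        worst = max(worst, abs(y - i2))   # quantization error at this sample
        out.append(y)
    return out, worst

# Hypothetical test tone at 90% of full scale.
tone = [0.9 * math.sin(2 * math.pi * n / 500) for n in range(20_000)]
for bits in (1, 3, 5):
    levels = 2 ** bits
    _, worst = sdm_second_order(tone, levels)
    step = 2.0 / (levels - 1)
    print(f"{bits}-bit quantizer: worst error {worst:.4f}, half step {step / 2:.4f}")
```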

The key takeaway from the study of SDMs is that while their performance can indeed be extremely good, the current state-of-the-art does not permit us to quantify that performance on an a priori basis to a high degree of accuracy.  Instead, SDMs must be evaluated phenomenologically.  In other words we must carefully measure their characteristics - linearity, distortion, noise, dynamic range, phase response, etc.  In this regard, SDMs are very much like analog electronic devices such as amplifiers.  We can bring a lot of design intelligence to bear, but at the end of the day those designs cannot tell us all we need to know about their performance, and the skill of the designer (not to mention the keen ear of the person making the final voicing decisions) becomes the critical differentiating factor.

At this point I promised to conclude by touching on some of the differences between DSD and PCM formats.  Much has been written about this, and it can tend to confuse and obfuscate.  Frankly, I'm not so sure this will help much.  On one hand, with a PCM data stream, the specific purpose of every single bit in the context of the encoded signal is clear and unambiguous.  Each bit is a known part of a digital word, and each word stipulates the exact magnitude of the encoded signal at a known instant in time.  The format supports random access, by which I mean that if we want to know the exact magnitude of the encoded signal at some stipulated moment in time, we can go right in there and grab it.  Of course, when we say “exact” we understand that to be limited by the bit depth of the PCM word.
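The random access described above boils down to a simple index calculation.  Here is a minimal sketch, assuming mono, 16-bit signed, little-endian PCM at 44.1 kHz; the function name and the data are hypothetical.

```python
import struct

SAMPLE_RATE = 44_100   # Hz, CD-style PCM (illustrative assumption)
FULL_SCALE = 32_768    # 16-bit signed words span -32768 .. 32767

def sample_at(pcm: bytes, n: int) -> tuple[float, float]:
    """Random access into mono 16-bit little-endian PCM: return (time in s, value in [-1, 1))."""
    (word,) = struct.unpack_from("<h", pcm, 2 * n)   # jump straight to the n-th 2-byte word
    return n / SAMPLE_RATE, word / FULL_SCALE

# Hypothetical data: four 16-bit words.
pcm = struct.pack("<4h", 0, 16384, -16384, 32767)
t, v = sample_at(pcm, 2)
print(f"sample 2 is at t = {t * 1e6:.2f} us with value {v}")
```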

The situation with SDM bitstreams is slightly different, and I will illustrate this with the extreme example of a DSD 1-bit bitstream.  On one level, we can see the DSD bitstream as being exactly identical to what I have just described.  Each bit is a known part of a digital word, except that in this case the single bit comprises the entire word!  This word then represents the exact magnitude of the encoded signal at a known instant in time - but this time to a resolution of only 1-bit.  That is because the DSD bitstream has encoded not only the signal, but also the heavy dose of shaped noise that we have been describing in noxious detail.  That noise gets in the way of our ability to interpret an individual word in the light of the original encoded signal.  By examining one word in isolation we cannot determine how much of it is signal and how much is noise.

If we want to extract the original signal from the DSD bitstream, we must pass the entire bitstream through a filter which will eliminate the noise.  And because we have already stipulated that the SDM is capable of encoding the original signal with a very high degree of fidelity, it stands to reason that we will require a bit depth much greater than 1-bit to store the result of doing so.  In effect, by passing the DSD bitstream through a low-pass filter, we end up converting it to PCM.  This is how DSD-to-PCM conversion is done.  You simply pass it through a low-pass filter.  The quality of the resultant PCM representation can be very close to a perfect copy of the original signal component in the DSD file.  It will be limited only by the accuracy of the low-pass filter used.
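Here is a minimal end-to-end sketch of that idea.  A 1-bit second-order SDM stands in for a real DSD modulator, and two cascaded moving averages (a sinc² response) stand in for the low-pass filter.  Both are far cruder than anything used in a real DSD-to-PCM converter, and the tone, the lengths and the decimation factor of 64 are hypothetical choices.  The filtered output is multi-bit, and keeping every 64th filtered sample gives PCM at 1/64 of the DSD rate.

```python
import math

def sdm_second_order(xs):
    """1-bit, second-order SDM, used here as a stand-in for a DSD encoder."""
    i1 = i2 = y = 0.0
    out = []
    for x in xs:
        i1 += x - y
        i2 += i1 - y
        y = 1.0 if i2 >= 0 else -1.0
        out.append(y)
    return out

def moving_average(xs, m):
    """Length-m running mean: a crude FIR low-pass filter with a delay of (m - 1) / 2 samples."""
    out, acc = [], 0.0
    for n, x in enumerate(xs):
        acc += x
        if n >= m:
            acc -= xs[n - m]
        out.append(acc / m)
    return out

M = 64          # filter length, also used as the decimation factor (hypothetical)
N = 64 * 512    # number of 1-bit samples
PERIOD = 2048   # samples per cycle of the test tone
signal = [0.5 * math.sin(2 * math.pi * n / PERIOD) for n in range(N)]

bits = sdm_second_order(signal)
# Two cascaded moving averages form a sinc^2 low-pass; their total delay is M - 1 samples.
smooth = moving_average(moving_average(bits, M), M)
pcm = smooth[2 * M::M]   # every M-th filtered sample: multi-bit PCM at 1/M of the rate

# Compare with the delayed original, skipping the filter's start-up transient.
errs = [smooth[n] - signal[n - (M - 1)] for n in range(2 * M, N)]
rms = math.sqrt(sum(e * e for e in errs) / len(errs))
print(f"RMS error after low-pass filtering: {rms:.5f}")
```

Even this crude filter recovers the tone closely.  What error remains comes from the filter's own imperfections: a slight loss of level at the tone frequency and incomplete removal of the shaped noise.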

When we started developing our product DSD Master, we realized very quickly that the choice of filter was the most critically important factor in getting the best possible DSD-to-PCM conversions.  A better choice of filter gave rise to a better-sounding conversion.  FYI, we continue to work on improved filters for our DSD Master product, and for our next release we will be introducing a new class of filter that we believe will make virtually perfect PCM conversions!

Unlike SDMs, digital filters are very well understood.  There is virtually no significant aspect of a digital filter’s performance which has not been successfully analyzed to the Nth degree.  The filter’s amplitude and phase responses are fundamentally known.  We can stipulate with certainty the extent to which computer rounding errors are going to impact the filter’s real-world performance, and take measures to get around that if necessary.  In other words, if we know what is in the filter’s input signal, then we know exactly, and I mean EXACTLY, what is going to be in the filter’s output signal.  SDMs, as we have seen above, are not like that.
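To illustrate how completely a digital filter can be characterized, this minimal sketch evaluates the frequency response H(e^{jω}) = Σ h[k] e^{-jωk} of a hypothetical 8-tap moving average directly from its coefficients.  The closed form is e^{-jω(M-1)/2} · sin(ωM/2) / (M sin(ω/2)).  It predicts unity gain at DC, a null at ω = 2π/M, and an exactly linear phase of -ω(M-1)/2, and the computed numbers agree.

```python
import cmath
import math

def freq_response(h, w):
    """H(e^{jw}) = sum_k h[k] e^{-jwk} for FIR taps h at normalized angular frequency w (rad/sample)."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))

M = 8
h = [1.0 / M] * M   # moving-average taps: symmetric, hence linear phase

for w in (0.0, 0.01, 0.1):
    H = freq_response(h, w)
    print(f"w = {w:.2f}: |H| = {abs(H):.6f}, phase = {cmath.phase(H):+.6f} rad")
print(f"|H| at the predicted null w = 2*pi/M: {abs(freq_response(h, 2 * math.pi / M)):.2e}")
```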

What does that mean for the DSD-vs-PCM argument?

I really don’t know the answer to that!  On one hand, I am convinced that before too long I will be able to make conversions from DSD to PCM which are virtually perfect, at least to the extent that any PCM representation can be perfect.  On the other hand, I am equally convinced that conversions from PCM to DSD are less perfect, and that SDM technology still has some major advances to be made.  Here at BitPerfect we are working on Look-Ahead SDMs (which, being pedantic, are not strictly speaking SDMs at all) which have the potential to take some small steps forward.  The problem is, they require phenomenal computing power, so our LA-SDM remains very much a lab curiosity.  My feeling is that when each is performed in accordance with the current state-of-the-art, PCM-to-DSD conversion lags DSD-to-PCM conversion in the ultimate quality stakes.
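To give a feel for why look-ahead is so expensive, here is a toy look-ahead 1-bit modulator.  It is hypothetical and is not BitPerfect’s LA-SDM.  It scores each candidate bit sequence with a deliberately simple cost, the squared running sum of input minus output, whereas a practical design would presumably use a proper noise-shaping filter in the cost.  At every output sample it tries all 2^depth possible sequences over the next depth input samples and keeps only the first bit of the best one.  Every extra sample of look-ahead therefore doubles the work.

```python
from itertools import product

def lookahead_sdm(xs, depth):
    """Toy look-ahead 1-bit modulator (hypothetical, for illustration only).

    At each sample, try all 2**depth candidate bit sequences over the next `depth`
    input samples, score each by the squared running error sum(input - output),
    and emit only the first bit of the best sequence.
    Returns the bitstream and the number of candidate sequences evaluated."""
    state, out, evaluated = 0.0, [], 0
    for n in range(len(xs)):
        # Future input window, padded with the last sample near the end of the signal.
        window = [xs[min(n + k, len(xs) - 1)] for k in range(depth)]
        best_cost, best_first = None, 1.0
        for seq in product((1.0, -1.0), repeat=depth):
            evaluated += 1
            s, cost = state, 0.0
            for x, y in zip(window, seq):
                s += x - y        # running (integrated) error along this candidate path
                cost += s * s
            if best_cost is None or cost < best_cost:
                best_cost, best_first = cost, seq[0]
        state += xs[n] - best_first   # commit only the first bit of the winning path
        out.append(best_first)
    return out, evaluated

bits, evaluated = lookahead_sdm([0.3] * 2_000, depth=6)
print(f"mean {sum(bits) / len(bits):.4f}, candidate paths evaluated: {evaluated}")
```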

So why - and I’ve said this before - do I still have a lingering preference for DSD over PCM?  I have come to the following conclusion.

DSD is primarily listened to by audio enthusiasts.  The market for DSD comprises people who not only like music, but also want to hear it well recorded.  It is still a small market, and it is served almost exclusively by specialist providers who are happy to put in the time, expense, and inconvenience required to generate quality product for that market.  People like Cookie Marenco at Blue Coast Records, Jared Sacks at Channel Classics, Morten Lindberg at 2L, Todd Garfinkel at MA Recordings, Gus Skinas at Super Audio Centre and many others focus on delivering truly exceptional recordings of uncompromised quality to consumers.  DSD, for those people, drives three things, aside from the fact that some of them have their own firmly-established preference for DSD.

First, because of the issues described at length above, tools do not exist to do even the simplest of studio work in the DSD domain.  Even panning and fading require conversion to an intermediate PCM format.  Forget added reverb, pitch correction, and any number of studio tricks of the Pro Tools ilk.  Recording to DSD forces recordists to strip everything down to its basics, and capture the music in the simplest and most natural manner possible.  That alone usually results in significant increases in the sort of qualities that appeal to audiophiles.

Second, when remastering old recordings for re-release on SACD, or even for digital download as DSD files, mastering engineers will typically pay a lot more attention to details than would normally be the case for a CD release.  Gone will be the demands for compression (or loudness).  The mastering engineer will get the opportunity to dust off that old preamp he always wanted to use, or those old tube amplifiers that he only brings out when the twenty-something suits from the label are not prowling around.  Try Dire Straits’ classic “Brothers In Arms”, which sounds a million times better when specially remastered for SACD (I love the Japanese SHM-SACD remastering) than it ever did on any CD, even though the master tape was famously recorded in 16-bit PCM and mixed down to DAT.  Go figure.

Third, unless you are using one of the few remaining ancient Sonoma DSD recording desks, if you are recording to DSD you will be using some of the latest and highest-spec studio equipment.  That’s where the DSD options are all positioned.  You will be using top-of-the-line mics, mic preamps, ADCs, cables, etc.  As with most things in life, you tend to get what you pay for, and if you are using the best equipment your chances of laying down the best recording can only improve.

So I like DSD, I continue to look out for it, and it continues to sound dramatically better than the vast majority of PCM audio that comes my way.  Is that due to some fundamental advantages of the DSD format, or is it that PCM offers a million new and exciting ways to shoot a recording in the foot?  I’ll leave that for others to decide.