Monday, 20 July 2015

How Do DACs Work?

All digital audio, whether PCM or DSD, stores the analog audio signal as a stream of numbers, each one representing an instantaneous snapshot of its continuously evolving value.  With either format, the digital bit pattern is its best representation of the analog signal value at each instant in time.  With PCM the bit pattern typically comprises either 16-bit (or 24-bit) numbers, each representing the value of the analog signal to a precision of one part in 65,536 (or one part in 16,777,216).  With DSD the precision is 1 bit, which means that it encodes the instantaneous analog voltage as either maximum positive or maximum negative with nothing in between (and you may well wonder how that manages to represent anything, which is a different discussion entirely, but nevertheless it does).  In either case, though, the primary task of the DAC is to generate those output voltages in response to the incoming bitstream.  Let’s take a look at how that is done.
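To make those precision figures concrete, here is a minimal Python sketch that quantizes a single sample value to an N-bit PCM code, and to the 1-bit DSD case as a simple sign decision.  It assumes samples are normalized to the range -1 to +1 and that PCM codes are two’s-complement integers; the function names are hypothetical.

```python
def quantize_pcm(x: float, bits: int) -> int:
    """Round a value in [-1.0, 1.0] to the nearest N-bit two's-complement code."""
    full_scale = 2 ** (bits - 1)              # 32768 for 16-bit, 8388608 for 24-bit
    code = round(x * full_scale)
    # Clamp to the representable range [-full_scale, full_scale - 1].
    return max(-full_scale, min(full_scale - 1, code))

def quantize_1bit(x: float) -> int:
    """1-bit (DSD-style) decision: maximum positive or maximum negative, nothing in between."""
    return 1 if x >= 0 else -1

for bits in (16, 24):
    print(bits, "bit:", 2 ** bits, "levels; code for 0.3 =", quantize_pcm(0.3, bits))
print("1-bit code for 0.3 =", quantize_1bit(0.3))
```

A single 1-bit sample is obviously a terrible description of the value 0.3; it only becomes meaningful when millions of such decisions per second are averaged, which is where the Sigma-Delta Modulator comes in.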

For the purposes of this post I am going to focus exclusively on the core mechanisms involved in transforming a bit stream into an analog signal.  Aside from these core mechanisms there are further mission-critical issues such as clock timing and noise, but these are not the subject of this post.  At some point I will write another post on clocks, timing, and jitter.

The most conceptually simple way of converting digital to analog is to use something called an R-2R ladder.  This is a simple sequence of resistors of alternating values ‘R’ and ‘2R’, wired together in a ‘ladder’-like configuration.  There’s nothing more to it than that.  Each ‘2R’ resistor has exactly twice the resistance value of each ‘R’ resistor, and all the ‘R’s and all the ‘2R’s are absolutely identical.  Beyond that, the actual value of the resistances is not crucial.  Each R-2R pair, if turned “on” by its corresponding PCM bit, contributes the exact voltage to the output which is encoded by that bit.  It is very simple to understand, and in principle is trivial to construct, but in practice it suffers from a very serious drawback.  You see, the resistors have to be accurate to a phenomenal degree.  For 16-bit PCM that means an accuracy of one part in 65 thousand, and for 24-bit PCM one part in 16 million.  If you want to make your own R-2R ladder-DAC you need to be able to go out and buy those resistors.

As best as I can tell, the most accurate resistors available out there on a regular commercial basis are accurate to ±0.005%, which is equivalent to one part in 20,000.  Heaven knows what they cost.  And that’s not the end of the story.  The resistance value is very sensitive to temperature, which means you have to mount them in a carefully temperature-controlled environment.  And even if you do that, the act of passing the smallest current through a resistor will heat it sufficiently to change its resistance value.  [Note:  In fact this tends to be what limits the accuracy of available resistors - the act of measuring the resistance actually perturbs the resistance by more than the accuracy to which you’re trying to measure it!  Imagine what that means when you try to deploy the resistor in an actual circuit…]  The resistor’s inherent inductance (even straight wires have inductance) also affects the DAC ladder when such phenomenal levels of precision enter the equation.  And we’re still not done yet: unfortunately the resistance values drift with time, so your precision-assembled, thermally cushioned and inductance-balanced R-2R network may leave the factory operating to spec, but may well be out of spec by the time it has broken in at the customer’s system.  These are the problems that a putative R-2R ladder DAC designer must be willing and able to face up to.  Which is why there are so few of them on the market.
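To see why one part in 20,000 is not good enough for 16-bit PCM, here is a minimal Python sketch of a voltage-mode R-2R ladder, solved by folding Thevenin equivalents from the LSB end toward the MSB.  It assumes ideal switches, an unloaded output, and resistor values normalized to R = 1; the function names are illustrative only.  It then raises just the MSB’s ‘2R’ leg by 0.005%, the commercial limit quoted above, and looks at the step between codes 0x7FFF and 0x8000, which should be exactly 1 LSB.

```python
def ladder_output(bits, vref=1.0, r=1.0, leg=None, series=None, term=None):
    """Open-circuit output voltage of a voltage-mode R-2R ladder.

    bits[0] is the LSB. Each bit switches its '2R' leg to vref (1) or ground (0).
    leg, series and term optionally override individual resistor values so
    that mismatch can be modelled.
    """
    n = len(bits)
    leg = [2 * r] * n if leg is None else leg
    series = [r] * (n - 1) if series is None else series
    term = 2 * r if term is None else term
    # Start at the LSB end: the '2R' terminating resistor to ground.
    v_th, r_th = 0.0, term
    for k in range(n):
        v_leg = vref if bits[k] else 0.0
        # Combine the Thevenin source so far with this bit's leg (two sources in parallel).
        v_th = (v_th * leg[k] + v_leg * r_th) / (r_th + leg[k])
        r_th = r_th * leg[k] / (r_th + leg[k])
        if k < n - 1:
            r_th += series[k]        # the 'R' rung leading to the next node
    return v_th

def code_to_bits(code, n):
    """Split an integer code into a list of bits, LSB first."""
    return [(code >> k) & 1 for k in range(n)]

N = 16
LSB = 1.0 / 2 ** N

# Ideal ladder: the output is code / 2**N of vref.
print(ladder_output(code_to_bits(12345, N)) / LSB)   # approximately 12345

# Same ladder, but the MSB's 2R leg is 0.005% (one part in 20,000) too high.
leg = [2.0] * N
leg[N - 1] *= 1.00005
step = (ladder_output(code_to_bits(0x8000, N), leg=leg)
        - ladder_output(code_to_bits(0x7FFF, N), leg=leg)) / LSB
print(round(step, 2))   # -0.64: the midscale step goes backwards instead of +1 LSB
```

With that single mismatched resistor the output at code 0x8000 ends up below the output at 0x7FFF, so the converter is not even monotonic at its midscale ‘major carry’ transition, before temperature, self-heating, inductance or drift are considered.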

Manufacturers of some R-2R ladder-DACs use the term ‘NOS’ (Non-Over-Sampling) to describe their architecture.  I don’t much like that terminology because it is a rather vague piece of jargon and can in principle be used to mean other things, but the blame lies at the feet of many modern DAC chipset manufacturers (and the DAC product manufacturers who use them) who describe their architectures as "Over-Sampling", hence the use of the term NOS as a distinction.

Before moving on, we’ll take an equally close look at how DSD gets converted to analog.  In principle, the incoming bit stream can be fed into its own 1-bit R-2R ladder, which, being 1-bit, is no longer a ladder and comprises only the first resistor R, whose precision no longer really matters.  And that’s all there is to it.  Easy, in comparison to PCM.  Something which has not gone unnoticed … and which we’ll come back to again later.

Aside from what I have just described, for both PCM and DSD three major things are left for the designer to deal with.  First is to make sure the output reference voltages are stable and with as little noise as possible.  Second is to ensure that the switching of the analog voltages in response to the incoming digital bit stream is done in a consistent manner and with sufficient timing accuracy.  Third is to remove any unwanted noise that might be present in the analog signal that has just been created.  These are the implementation areas in which a designer generally has the most freedom and opportunity to bring his own skills to bear.

The third of these is the most interesting in the sense that it differs dramatically between 1-bit (DSD) and multi-bit (PCM) converters.  Although in both cases the noise that needs to be removed lives at inaudible ultrasonic frequencies, with PCM there is not much of it at all, whereas with DSD there is so much of it that the noise power massively overwhelms the signal power.  With PCM, there are even some DACs which dispense with analog filtering entirely, working on the basis that the noise is both inaudible, and at too low a level to be able to upset the downstream electronics.  With DSD, though, removing this noise is a necessary and significant requirement.

Regarding the analog filters, most designers are agreed that although different audio stream formats can be optimized such that each format has its own ideal analog filter, if a DAC is designed to support multiple stream formats it is impractical to provide multiple analog filters and switch them in and out of circuit according to the format currently being played.  Therefore most DACs will have a single analog output filter which is used for every incoming stream format.

The developers of the original SACD players noted that the type of analog filter that was required to perform this task was more or less the same as the anti-aliasing filters used in the output of the CD format, which they were trying to improve upon.  They recognized that those filters degraded the sound.  So instead, in the earliest players, they decided to upconvert the DSD from what we today call DSD64 to what we would now call DSD128.  With DSD128 the ultrasonic filter was found to be less of a problem and was considered not to affect the sound in the same way.  Bear in mind, though, that in doing the upconversion from DSD64 to DSD128 you still have to filter out the DSD64’s ultrasonic noise.  However, this can be done in the digital domain, and (long story short) digital filters almost always sound better than their analog counterparts.

As it happens, similar techniques had already been in use with PCM DACs for over a decade.  Because R-2R ladder DACs were so hard to work with, it was much easier to convert the incoming PCM to a DSD-like format and perform the physical D-to-A conversion step in a 1-bit format.  Although the conversion of PCM to DSD via an SDM is technically very complex and elaborate, it can be done entirely in the digital realm which means that it can also be done remarkably inexpensively.

When I say "DSD-like" what do I mean?  DSD, strictly speaking, is a trademark (currently owned by Sonic Studio, LLC) for a format developed by Sony and Philips.  It stands for Direct Stream Digital and refers specifically to a 1-bit format at a sample rate of 2.8224MHz.  But the term is now being widely used to refer to a broad class of formats which encode the audio signal using the output of a Sigma-Delta Modulator (SDM).  An SDM can be configured to operate at any sample rate you like and with any bit depth you like.  For example, the output of an SDM could even be a conventional PCM bitstream and such an SDM can actually pass a PCM bitstream through unchanged.  A key limitation of an SDM is that it can be unstable when configured with a 1-bit output stream.  However, this instability can be practically eliminated by using a multi-bit output.  For this reason, most modern PCM DACs will upconvert (or ‘Over-Sample’) the incoming PCM before passing it through an SDM with an output bit depth of between 3 and 5 bits.  This means that the physical D-to-A conversion is done with a 3- to 5-stage resistor ladder, which can be easily implemented.
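The core idea of an SDM can be sketched in a few lines of Python.  This is a first-order modulator only (real converters use higher-order loops), and the function name and the [-1, 1] input range are assumptions for illustration.  The quantizer has 2**out_bits evenly spaced levels; with one bit the output can only be -1 or +1, yet its long-term average tracks the input because the integrator keeps feeding the accumulated error back in.

```python
def sdm_first_order(samples, out_bits=1):
    """First-order sigma-delta modulator (simplified illustration).

    Inputs are assumed to lie inside [-1, 1]. The quantizer has 2**out_bits
    evenly spaced levels spanning [-1, 1]; out_bits=1 gives a DSD-like stream.
    """
    levels = 2 ** out_bits
    step = 2.0 / (levels - 1)
    integrator = 0.0
    prev_out = 0.0
    out = []
    for x in samples:
        integrator += x - prev_out               # accumulate input minus last output
        idx = round((integrator + 1.0) / step)   # nearest quantizer level ...
        idx = max(0, min(levels - 1, idx))       # ... clamped to the available range
        y = -1.0 + idx * step
        out.append(y)
        prev_out = y
    return out

dc = [0.3] * 10_000
one_bit = sdm_first_order(dc, out_bits=1)
three_bit = sdm_first_order(dc, out_bits=3)
print(sorted(set(one_bit)))                  # only -1.0 and 1.0
print(sum(one_bit) / len(one_bit))           # close to 0.3
print(sum(three_bit) / len(three_bit))       # also close to 0.3, with much smaller per-sample error
```

Note that a first-order loop like this one stays stable for any input inside the quantizer range; the 1-bit instability mentioned above shows up in the higher-order modulators that practical converters need for adequate noise shaping, which this sketch does not attempt to reproduce.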

These SDM-based DACs are so effective that today there are hardly any R-2R ladder DACs in production, and those that are, such as the Light Harmonic Da Vinci, can be eye-wateringly expensive.  The intermediate conversion of an incoming signal to a DSD-like format means that, in principle, any digital format (including DSD) can be readily supported, as evidenced by the plethora of DSD-compatible DACs on the market today.  Because these internal conversions are performed entirely in the digital domain, manufacturers typically produce complete chip sets capable of performing all of the conversion functionality on-chip, driving the costs down considerably when compared to an R-2R ladder approach.  The majority of DACs on the market today utilize chip sets from one of five major suppliers: ESS, Wolfson, Burr-Brown (TI), AKM, and Philips, although there are others as well.

Interestingly, all of this is behind the recent emergence of DSD as a niche in-demand consumer format.  In a previous post I showed that almost all ADCs in use today use an SDM-based structure to create a ‘DSD-like’ intermediate format which is then digitally converted to PCM.  Today I showed the corollary in DAC architectures where incoming PCM is digitally converted to a ‘DSD-like’ intermediate format which is then converted to analog.  The idea behind DSD is that you get to ‘cut out the middlemen’ - in this case the digital conversions to and from the ‘DSD-like’ intermediate formats.  Back when SACD was invented the only way to handle and distribute music data which required 3-5GB of storage space was using optical disks.  Today, not only do we have hard disks that can hold the contents of hundreds upon hundreds of SACDs, but we have an internet infrastructure in place that allows people to download such files as a matter of convenience.  So if we liked the sound of SACD, but wanted to implement it in the more modern world of computer-based audio, the technological wherewithal now exists to support a file-based paradigm similar to what we have become used to with PCM.  This is what underpins the current interest in DSD.
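The multi-gigabyte figures follow from simple arithmetic.  DSD64 runs at 64 × 44.1kHz = 2.8224MHz, one bit per sample per channel.  The sketch below assumes an 80-minute programme and computes raw sizes, before any lossless compression a real disc format might apply; the function name is hypothetical.

```python
DSD64_RATE = 64 * 44_100          # 2,822,400 one-bit samples per second, per channel

def dsd_bytes(channels, minutes, rate=DSD64_RATE, bits_per_sample=1):
    """Raw (uncompressed) size in bytes of a DSD stream."""
    return rate * bits_per_sample * channels * minutes * 60 // 8

print(dsd_bytes(2, 80) / 1e9)     # about 3.39 GB for 80 minutes of stereo DSD64
print(dsd_bytes(6, 80) / 1e9)     # about 10.2 GB for the same length in 5.1 surround
```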

To be sure, the weak link of the above argument is that DSD is not the same as ‘DSD-like’, and in practice you still have to convert digitally between ‘DSD-like’ and DSD in both the ADC and the DAC.  But a weak link is not the same thing as a fatal flaw, and DSD as a consumer format remains highly regarded in many discerning quarters.

Thursday, 25 June 2015

On DSD vs PCM … again

Mark Waldrep (aka ‘Dr. AIX’) has put a couple of DSD posts on his RealHD-Audio web site this month.  Mark writes quite knowledgeably on audiophile matters, but is prone to a ‘you-can’t-argue-with-the-facts’ attitude predicated on an overly simplistic subset of what actually comprises ‘the facts’.   In particular, Mark insists that 24-bit, 96kHz PCM is better than DSD, and one of the posts I am referring to discusses his abject bewilderment that 530 people (and counting) on the ‘Computer Audiophile’ blog would go to the trouble of participating in a thread which actively debates this assertion.  He writes as though it were a self-evident ‘night-follows-day’ kind of an issue, almost a point of theology.

Let’s look at some of those facts.  First of all, properly-dithered 24-bit PCM has a theoretical noise floor within a dB or so of -144dB, whereas DSD64 rarely approaches within even 20dB of that.  No argument from me there.  Also, he points out that DSD64’s noise shaping process produces a massive amount of ultrasonic noise, which starts to appear just above the audio band and continues at a very, very high level all the way out to over 1MHz, which, he argues, all but subsumes the audio signal unless it is filtered out.  We’ll grant him some hyperbolic license, and agree that, technically, what he says is correct.
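The 144dB figure is simply the ratio between full scale and one 24-bit quantization step.  The textbook 6.02N + 1.76dB formula, which compares a full-scale sine against uniform quantization noise, gives a slightly higher number.  Both formulas below ignore dither, which adds a few dB of noise on top, hence ‘within a dB or so’; the function names are illustrative.

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (20*log10(2**bits))."""
    return 20 * math.log10(2 ** bits)

def sine_sqnr_db(bits):
    """Full-scale sine vs. uniform quantization noise: 10*log10(1.5 * 2**(2*bits)) = 6.02N + 1.76 dB."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(bits, round(dynamic_range_db(bits), 2), round(sine_sqnr_db(bits), 2))
# 16-bit: about 96.33 and 98.09 dB; 24-bit: about 144.49 and 146.26 dB
```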

Another ‘fact’ is, though, that much to Waldrep’s chagrin, there is a substantial body of opinion out there that would prefer to listen to DSD over 24/96.  Why should this be, given that the above technical arguments (and others that you could also add into the mix with which I might also tend to agree) evidently set forth ‘the facts’?  Yes, why indeed… and the answer is simple to state, but complex in scope.  The main reason is that the pro-PCM arguments conveniently ignore the most critical aspect that differentiates the sound quality, which is the business of getting the audio signal into the PCM format in the first place.  Let’s take a look at that.

If we are to encode an audio signal in PCM format, the most obvious way to approach the problem is using a sample-and-hold circuit.  This circuit looks at the incoming waveform, grabs hold of it at one specific instant, and ‘holds’ that value for the remainder of the sampling period.  By ‘holding’ the signal, what we are doing is zeroing in on the value that we actually want to measure long enough to actually measure it.

Next we have to assign a digital value to this sampled voltage, and there are a couple of distinct ways to do this.  One technique involves comparing the sampled signal level to the instantaneous value of a sawtooth waveform generated by a precision clock.  As soon as the comparator detects that the instantaneous value of the sawtooth waveform has exceeded the value of the sampled waveform, by looking at the number of clock cycles that have passed we can calculate a digital value for the sampled waveform.  Another technique is a ‘flash ADC’ where a number of simultaneous comparisons are made to precise DC values, each being a unique digital level.  Obviously, for a 16-bit ADC this would mean 65,535 comparator circuits!  That’s doable, but rather expensive.  Think of it as the ADC equivalent of an R-2R ladder DAC.  Yet another method is a hybrid of the two, where a sequence of comparators successively home in on the final result through a series of successive approximations whose logic I won’t attempt to unravel here.  Each of these methods is limited by the accuracy of both the timer clock and the reference voltage levels.
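For the curious, the successive-approximation idea can nonetheless be sketched in a few lines: it is a binary search, deciding one bit per comparison from the MSB down, so a 16-bit result takes 16 comparisons rather than up to 65,536 clock counts (sawtooth) or 65,535 comparators (flash).  This is an idealized Python sketch with hypothetical function names; it assumes a perfect comparator and a perfect internal DAC, which in a real chip must itself meet the reference-voltage precision discussed above.

```python
def sar_adc(v_in, v_ref, bits):
    """Idealized successive-approximation ADC.

    Decides one bit per comparison, MSB first, by comparing the held input
    against an internal DAC level. Assumes 0 <= v_in < v_ref.
    """
    code = 0
    for k in reversed(range(bits)):
        trial = code | (1 << k)                  # tentatively set bit k
        if v_in >= trial * v_ref / 2 ** bits:    # comparator decision
            code = trial                         # input is at least this high: keep the bit
    return code

def flash_comparators(bits):
    """A flash ADC needs one comparator per threshold between adjacent codes."""
    return 2 ** bits - 1

print(sar_adc(0.3, 1.0, 16))      # 19660, found with just 16 comparisons
print(flash_comparators(16))      # 65535 comparators for a 16-bit flash converter
```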

Ultimately, in mixed-signal electronics (circuits with both analog and digital functions), it ends up being far easier to achieve a clock of arbitrary precision than a reference voltage of arbitrary precision.  Way more so, in fact.  For this reason, sample-and-hold ADC architectures have fallen from favour in the world of high-end audio.  Instead, a technique called Sigma-Delta Modulation is used.  You will recognize this term - it is the architecture that is used to create the 1-bit bitstream used in DSD.  The SDM-ADC has for all practical purposes totally eliminated the sample-and-hold architectures in audio applications.

In an SDM-ADC, the trade-off between clock precision and reference voltage precision is resolved entirely in favour of the clock, which can be made as accurate as we want.  In effect, we increase the sample rate to something many, many times higher than what is actually required, and accept a significantly reduced measurement accuracy.  The inaccuracy of the instantaneous measurements is taken care of by a combination of averaging due to massive over-sampling and local feedback within the SDM.  That will have to do in terms of an explanation, because an SDM is a conceptually complex beast, particularly in its analog form.  In any case, the output of the SDM is a digital bitstream which can be 1-bit, but in reality is often 3-5 bits deep.  The PCM output data is obtained on-chip by a digital conversion process similar to that which happens within DSD Master.
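The last step, turning the high-rate bitstream into lower-rate PCM, is at heart ‘low-pass filter, then keep every Nth sample’.  The Python sketch below uses a plain block average (a boxcar filter) purely to show the principle; that choice is an assumption for clarity, since a boxcar is a very poor low-pass filter and real decimators use long, steep multi-stage digital filters.  Averaging 16 one-bit samples this way yields only 17 possible output levels; the much higher resolution of real converters comes from noise shaping combined with those far better filters.

```python
def boxcar_decimate(bitstream, factor):
    """Average each block of `factor` input samples into one output sample.

    A boxcar average is the crudest possible low-pass filter; it is used here
    only to show the idea of trading sample rate for resolution.
    """
    return [sum(bitstream[i:i + factor]) / factor
            for i in range(0, len(bitstream) - factor + 1, factor)]

# A 1-bit pattern that spends three quarters of its time at +1 averages to +0.5.
pattern = [1, 1, 1, -1] * 16
print(boxcar_decimate(pattern, 16))          # [0.5, 0.5, 0.5, 0.5]
print(64 * 44_100 // 16)                     # DSD64 rate decimated by 16: 176400 Hz
```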

As you know, if you are going to encode an analog signal in a PCM format, the price you have to pay is to strictly band-limit the signal to less than one half of the sample rate prior to encoding it.  This involves putting the signal through a ‘brick wall’ filter which removes all of the signal above a certain frequency while leaving everything below that frequency untouched.  In a sample-and-hold ADC this is performed using an all-analog filter located within the input stage of the ADC.  In the SDM-ADC it is performed in the digital domain during the conversion from the 1-bit (or 3-5 bit) bitstream to the PCM output.

Brick wall filters are nasty things.  Let’s look at a loudspeaker crossover filter as an example of a simple low-pass analog filter that generally can’t be avoided in our audio chain.  The simplest filter is a single-stage filter with a cut-off slope of 6dB per octave (6dB/8ve).  Steeper filters are considered to be progressively more intrusive due to phase disturbances which they introduce, although in practical designs steeper filters are often necessary to get around still greater issues elsewhere.  Now compare that to a brick-wall ‘anti-aliasing’ filter.  For 16/44.1 audio, this needs to pass all frequencies up to 20kHz, yet attenuate all frequencies above 22.05kHz by at least 96dB.  That means a slope of at least 300dB/8ve is required.
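The arithmetic behind that slope is easy to check, and ‘at least 300dB/8ve’ turns out to be a conservative floor.  The gap from 20kHz to 22.05kHz is barely a seventh of an octave (log2(22.05/20) ≈ 0.14), so finding 96dB of attenuation inside it needs an average slope of about 680dB/8ve.  If the requirement is relaxed so that only components which would alias back below 20kHz must be suppressed, the stopband edge moves to 44.1 - 20 = 24.1kHz and the average slope drops to roughly 357dB/8ve.  The function name below is illustrative.

```python
import math

def average_slope_db_per_octave(f_pass, f_stop, attenuation_db):
    """Average roll-off needed to reach attenuation_db between f_pass and f_stop."""
    octaves = math.log2(f_stop / f_pass)
    return attenuation_db / octaves

print(round(average_slope_db_per_octave(20_000, 22_050, 96)))   # 682 dB/octave
print(round(average_slope_db_per_octave(20_000, 24_100, 96)))   # 357 dB/octave
print(6)                                                        # first-order filter, for comparison
```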

If we confine ourselves purely to digital anti-aliasing filters used in an SDM-ADC, a slope of 300dB/8ve inevitably requires an ‘elliptic’ filter.  Whole books have been devoted to elliptic filters, so I shall confine myself to saying that these filters have rather ugly phase responses.  In principle they also have a degree of pass-band ripple, but I am willing to stipulate to an argument that such ripple is practically inaudible.  The phase argument is another matter, though.  Although conventional wisdom has it that phase distortion is inaudible, there is an increasing body of anecdotal evidence that suggests the opposite is the case.  One of the core pillars of Meridian’s recent MQA initiative is based on the assumed superiority of “minimum phase” filter architectures, for example.

By increasing the sample rate of PCM we can actually reduce the aggression required of our anti-aliasing filters.  I have written a previous post on this subject, but the bottom line is that only at sample rates above the 8Fs family (352.8/384kHz) can anti-aliasing filters be implemented with sufficiently low phase distortion.  And Dr. AIX pooh-poohs even 24/352.8 (aka ‘DXD’) as a credible format for high-end audio.  Here at BitPerfect we are persuaded by the notion that the sound of digital audio is actually the sound of the anti-aliasing filters that are necessary for its existence, and that the characteristic that predominantly governs this is their phase response.

PCM requires an anti-aliasing filter, whereas DSD does not (actually, strictly speaking it does, but it is such a gentle filter that you could not with any kind of a straight face describe it as a ‘brick-wall’ filter).  DSD has no inherent phase distortion resulting from a required filter.  Instead, it has ultrasonic noise, and this is where Dr. AIX’s argument encounters difficulties.  The simple solution is to filter it out.  However, if we read his post, he seems to think that no such filtering is used … I quote: "It’s supposed to be out of the audio band but there is no ‘audio band’ for your playback equipment".  Seriously?  All it calls for is a filter similar to PCM’s ‘anti-aliasing’ filter, except not nearly as rigorous in its requirements.

Let me tell you how DSD Master approaches this in our DSD-to-PCM conversions.  We know that, for 24/176.4 PCM conversions for example, we need only concern ourselves in a strict sense with that portion of the ultrasonic noise above 88.2kHz.  It needs to be filtered out by at least 144dB or we will get aliasing.  However, the steepness of the filter and its phase response are governed by the filter’s cut-off frequency.  For the filters we use, the phase response remains pretty much linear up to about 80% of this frequency.  Therefore we have some design freedom to push this frequency out as far as we want, and we choose to place it at a high enough frequency that the phase response remains quasi-linear across the entire audio band.  Of course, the further we push it out, the more of the ultrasonic noise is allowed to remain in the encoded PCM data.

As an aside, you might well ask: If the ultrasonic noise is inaudible, then why do we have to filter it out in the first place?  And that would indeed be a good question.  According to auditory measurements, it is simple to determine that humans can’t hear anything above 20kHz - or even less as we age.  However, more elaborate investigations indicate that we do respond subconsciously to ultrasonic stimuli that we cannot otherwise demonstrate that we hear.  So it remains an interesting open question whether the presence of heavy ultrasonic content would actually have an impact on our perception of the sound.  On the other hand, a lot of audio equipment is not designed to handle a heavy ultrasonic signal content.  We know of one high-end TEAC DAC that could not lock onto a signal that contained even a modest -60dB of ultrasonic content (that problem, once identified, was quickly fixed with a firmware update).  Such are probably as good reasons as any to want to filter it out.

So what do we do with the DSD content above 20kHz?  In developing DSD Master we take the view that the content of this frequency range contains both the high-frequency content of the original signal (if any), plus the added high frequency noise created by the SDM’s noise-shaping process.  We try to maintain any high frequency content within the signal flat up to 30kHz, and then begin our roll-off above that.  Consequently, our DSD conversions at high sample rates (88.2kHz and above) do contain a significant ultrasonic peak in the 35-40kHz range.  However, that peak is limited to about -80dB, which is way too low to either be audible(!) or to cause instability in anyone’s electronics.  Meanwhile, the phase response is quasi-linear up to the point at which the ultrasonic noise rises above the signal level.

In designing DSD Master, we make those design compromises on the basis that the purpose of these conversions is to be used for final listening purposes.  But if a similar functionality is being designed for the internal conversion stage of a PCM SDM-ADC then we know that a residual ultrasonic noise peak in the output data is not going to be acceptable.  In our view, this means that design choices will be made which do not necessarily coincide with the best possible sound quality.

As a final point, all the above observations are specific to ‘regular’ DSD (aka ‘DSD64’).  The problem with ultrasonic noise pretty much goes away with DSD128 and above, something I have also written about in detail in a previous post.

So, from the foregoing, purely from a logical point of view, it seems somewhat contradictory for Dr. AIX to suggest that 24/96 PCM is inherently better than DSD, since DSD comes directly out of an SDM in its native form, whereas PCM is derived through digital manipulation of an SDM output with, among other things, a ‘brick-wall’ filter with a less-than-optimal configuration.  I’ll also point out that his argument suggests that DSD (i.e. the output of an SDM) will not deliver the full bit depth that he offers up as a key distinguishing feature of 24/96.  Of course, those arguments apply only to ‘purist’ recordings which seek to capture the microphone output as naturally as possible.  In that way the discussion is not coloured by any post-processing of the signal, which in any case is not possible in the native DSD domain.

Monday, 22 June 2015

Day One - Intellectual Property

I have mentioned before that I subscribe to B&W’s Society of Sound, and have done so for the last five years.  It costs me $60 for a 12-month subscription for which I get to download 24 high-resolution albums, two per month.  I think it’s a great deal.  Each month, I get one album from London Symphony Orchestra’s LSO Live label, and one from Peter Gabriel’s RealWorld label.  For me, the classical downloads are the major pull, but occasionally the RealWorld offering turns out to be the bigger gem.  Such was the case this month, when the offering was Day One’s new album Intellectual Property.

Day One is a band I had never heard of.  Over the course of 15 years, this would appear to be only the English duo's third album, but they have managed to make their mark with contributions to a number of TV and movie soundtracks.  Check out the link below.

How would I describe the latest album “Intellectual Property”?  On one hand there are a number of apparent influences which include David Bowie, Peter Gabriel, Lou Reed, Ian Dury, and the Red Hot Chili Peppers.  On the other hand I detect stylistic tips of the hat to Motown, Country, and New Age, all of which underlie an overall vibe of something you might call “stoner hip-hop”.  Maybe it is all those 70’s and 80’s influences that appeal to me.  What’s that you said?… the stoner element?

Anyway, call it what you will, it is a superb album.  The songwriting is sharp and observant without trying to be too deep.  Each and every track has a clear hook, and the recording is clean and full, with a sensitive hand on the production levers, although the sound overall does fall short of the highest audiophile standards.  At any rate, it is quite simply a first-rate album.  And, at the moment, I think the only way to get hold of it is via B&W’s Society of Sound.  So, if nothing else, it is a good excuse for you to check it out.  There is even a Free Trial option, so what’s holding you up?  :)

http://www.bowers-wilkins.com/Society_of_Sound/Society_of_Sound/Music/Day-One-Intellectual-Property.html

Dynamic Compression

Most computer users will be familiar with data compression.  This was a godsend at the dawn of the internet age when internet connections were achieved via dial-up modems with bandwidths restricted to clockwork speeds.  The first ever document I received over the internet was a 1MB WordPerfect file, and the transmission took about three hours using a modem that set me back $280.  Even so, this was still a great thing, given that the file wouldn’t fit on a 5.25” floppy disk which could otherwise have been posted to me.  I didn’t have PKZip at that time, but eventually a colleague introduced me to it.  For many years thereafter nobody would ever consider sending an e-mail attachment without first “zipping” it.  Zipping a word processor file could reduce the file size by enormous factors of 5X or more.  Data compression was, and still is, a great thing.  In audio, formats like FLAC and Apple Lossless use data compression to reduce the size of an audio file without compromising its audio content.  By contrast, formats like MP3 and AAC go a step further and irretrievably delete some of the audio content to make the file smaller yet.  But dynamic compression is a different beast entirely. 

When an audio signal continues to increase in volume, at some point you will run into a limitation.  For example, beyond a certain (catastrophically loud) volume, air itself loses the ability to faithfully transmit a sound.  If you drive your loudspeaker with too many Watts, the drive units will self-destruct.  If you feed your amplifier’s inputs with too large a signal, its outputs will clip.  If you try to record too loud a signal onto an analog tape, the tape will distort.  And if you try to encode too large a signal in a digital format … well, you can’t get there from here, and you just have to encode something else instead - typically digital hard clipping.

Therefore, whether in today’s digital age or in the analog age of yore, anybody who is tasked with capturing and recording an analog signal has to be concerned with level matching.  If you turn the signal level up too high, you will encounter one of the previously mentioned problems (hopefully one of the last two).  If you turn it down too low, the sound will eventually descend into the noise and be lost.  However, analog tape had a built-in antidote.  It turns out that if you overload an analog tape, the overload is managed ‘gracefully’, which means that you could record at a level higher than the linear maximum and it wouldn’t sound too bad.

In fact, not only does it not sound too bad, but if you play back the resultant recording over a low-fidelity system like a radio or a boom box, it can actually sound better than a recording that properly preserves the full dynamic range.  This is because the dynamic range within a high quality recording is greater than the ability of the low-fi system to reproduce it, and the result can be a sound that appears to be quiet and lifeless.  By allowing the analog tape to saturate, the dynamic range of the recorded signal is effectively reduced (or ‘compressed’), and better matched to that of a low-fi system.  In fact, in all but the very finest systems, a little bit of dynamic compression is found by most people to be slightly preferable to none at all.  Which is a problem for those of us fortunate enough to enjoy the finest systems, whose revealing nature tends to deliver the opposite result.

With analog tape, managing dynamic compression through tape saturation is a finely balanced skill.  It is not something that you can easily bend to your design.  It’s sometimes considered to be more of an art than a science.  On the other hand, in the digital domain, dynamic compression can be tailored umpteen different ways according to your whim, and you can dial in just the right amount if you believe your recording needs it.  Most digital dynamic compression algorithms are seriously simple, being nothing more than a non-linear transfer function based on Quadratic, Cubic, Sinusoidal, Exponential, Hyperbolic tangent, or Reciprocal functions (to name but a few).  Ideally, the transfer function would remain linear up to a point, above which the non-linearity would progressively kick in, and the better regarded algorithms (such as the Cubic) do behave like that.  But most serious listeners agree that digital dynamic compression never sounds as good as ‘natural’ dynamic compression from magnetic tape.  Maybe this is one of the reasons analog still has its strong adherents.
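Here is what such transfer functions look like in Python.  Digital hard clipping and the hyperbolic tangent are standard curves; knee_cubic is a hypothetical construction of the ‘linear up to a point, then progressively non-linear’ shape described above, with unity gain below a threshold and a cubic segment above it that bends smoothly (with matching slope) into a ceiling.  The threshold and ceiling values are arbitrary assumptions, not taken from any particular product.

```python
import math

def hard_clip(x, limit=1.0):
    """Digital hard clipping: anything beyond +/-limit is simply flattened."""
    return max(-limit, min(limit, x))

def tanh_curve(x):
    """Hyperbolic-tangent soft clipping: gently non-linear everywhere, never exceeds +/-1."""
    return math.tanh(x)

def knee_cubic(x, threshold=0.5, ceiling=1.0):
    """Hypothetical 'linear, then cubic' curve: unity gain up to threshold,
    then a cubic segment whose slope falls from 1 to 0 as it reaches the ceiling."""
    mag = abs(x)
    if mag <= threshold:
        return x
    width = 1.5 * (ceiling - threshold)    # input span over which the cubic knee acts
    u = min(mag - threshold, width)
    y = threshold + u - u ** 3 / (3 * width ** 2)
    return math.copysign(y, x)

for x in (0.1, 0.5, 0.8, 1.5, 3.0):
    print(x, hard_clip(x), round(tanh_curve(x), 3), round(knee_cubic(x), 3))
```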

The thing about digital dynamic compression is that, once it kicks in, its effect on the sound is rather drastic.  Harmonic distortion components at levels as high as -20dB are common.  Moreover, the technique can create substantial harmonic distortion components above the Nyquist frequency, which get mirrored down into the audio band where they appear as inharmonic frequencies which are subjectively a lot more discomforting than harmonic frequencies.  It also creates huge intermodulation distortion artifacts, also highly undesirable.
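Both effects can be checked numerically.  The sketch below drives a sine through a tanh curve (one of the transfer functions named above), measures each harmonic relative to the fundamental with a single-bin DFT, and then computes where a component above the Nyquist frequency lands after sampling.  The drive level, block length and function names are illustrative assumptions.  Because tanh is an odd-symmetric curve, it generates only odd harmonics; the even ones come out at the level of floating-point rounding.

```python
import math

def harmonic_levels_db(shaper, drive=2.0, n=4096, cycles=8, max_harmonic=5):
    """Level of each harmonic relative to the fundamental, in dB, via single-bin DFTs."""
    y = [shaper(drive * math.sin(2 * math.pi * cycles * i / n)) for i in range(n)]

    def bin_mag(k):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(y))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(y))
        return math.hypot(re, im)

    fundamental = bin_mag(cycles)
    return {h: 20 * math.log10(max(bin_mag(h * cycles), 1e-300) / fundamental)
            for h in range(2, max_harmonic + 1)}

def alias_frequency(f, fs):
    """Frequency at which a component at f appears after sampling at fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f

print(harmonic_levels_db(math.tanh))      # strong odd harmonics, negligible even ones
print(alias_frequency(25_000, 44_100))    # 19100
```

For example, the 5th harmonic of a 5kHz tone sits at 25kHz; at a 44.1kHz sample rate it folds down to 19.1kHz, a frequency with no harmonic relationship to 5kHz.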

There are papers out there which do a very thorough job of analyzing what various dynamic compression systems, both real and theoretical, could do if they were implemented, and the conclusions they come to are pretty consistent.  Digital dynamic compression fundamentally sucks, and there’s not much you can do about it.  But having said that, if you have some understanding of how compression works, are willing to limit the amount of applied compression judiciously, and have sufficient computing power available, you can bring to bear a whole grab-bag of tricks to try to minimize its side effects.  Such techniques include side-chain processing (where several analyses of the signal happen in parallel as inputs to the core compression tool), look-ahead (analysis of the future input signal, obviously not for real-time applications), advanced filtering (seeks to reduce unwanted distortions by filtering them out), and active attack/release control (governs the extent to which the sudden onset of compression is audible).  Sophisticated pro-audio tools can bring all these techniques - and more - to the party.
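To show what attack/release control means in practice, here is a minimal Python sketch of a textbook feed-forward compressor: an envelope follower with separate attack and release time constants feeds a static threshold/ratio gain curve, and the resulting gain is applied sample by sample.  It has no look-ahead or side-chain filtering, and the parameter names and default values are illustrative assumptions, not taken from any particular pro-audio tool.

```python
import math

def compress(samples, fs, threshold_db=-20.0, ratio=4.0, attack_ms=5.0, release_ms=100.0):
    """Feed-forward compressor: envelope follower -> static gain curve -> gain applied."""
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        coeff = att if level > env else rel          # react fast to rises, slowly to falls
        env = coeff * env + (1.0 - coeff) * level
        env_db = 20 * math.log10(max(env, 1e-12))
        over = env_db - threshold_db
        # Above threshold, only 1/ratio of the overshoot is allowed through.
        gain_db = -over * (1.0 - 1.0 / ratio) if over > 0 else 0.0
        out.append(x * 10 ** (gain_db / 20))
    return out

fs = 48_000
loud = compress([1.0] * fs, fs)      # one second of a constant full-scale level
quiet = compress([0.01] * fs, fs)    # one second at -40dB, below the threshold
print(round(loud[-1], 4), quiet[-1])
```

With a constant full-scale input, a -20dB threshold and a 4:1 ratio, the output settles at -20 + 20/4 = -15dB, an amplitude of about 0.178, while the -40dB signal passes through untouched.  How quickly the gain gets there is set by attack_ms and release_ms.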

Dynamic compression as a serious issue of sound quality came to a head (or descended to its depths, depending on your viewpoint) during the early 2000s with the so-called “loudness wars”.  The music industry was coming to terms with the notion that a lot of popular music was being listened to in MP3 format on portable players of limited fidelity.  While with their left hands they were trying their best to prevent the proliferation of music in the MP3 format, with their right hands they were recognizing that if music was going to be listened to on portable systems with restricted dynamic range it might sound better if the recordings themselves had a similarly restricted dynamic range.  It is a well known psychoacoustic effect that, when comparing two similar recordings, people overwhelmingly tend to perceive the louder one to be better, and dynamic compression is a way to increase the perceived loudness of a recording.  The labels therefore started falling over themselves to release recordings with more and more “loudness”, or put another way, with more and more dynamic compression.

Take U2’s “How to Dismantle an Atomic Bomb”, released in 2004.  This album is a downright disgrace.  It sounds absolutely appalling.  I bought it when it came out and haven’t listened seriously to it since.  And if there is any doubt as to why that might be, just take a look at the attached screenshot image.  These are waveform envelopes obtained using Adobe Audition.  The top track is “Vertigo” from this album.  The bottom track is “With or Without You” from their 1987 release The Joshua Tree.  Both are ripped from the standard commercial CD releases.  The difference is laughable.  You can clearly see how the one on the top has been driven deeply into dynamic compression.

To attempt to quantify this effect, the “Loudness War” website endorses a free tool called the Tischmeyer Technology (TT) Loudness Meter.  This measures Vertigo as DR5, which it classifies as “Bad” (DR0 - DR7), and With or Without You as DR12, which is in the “Transition” range (DR8 - DR13), but getting close to “Good” (which starts at DR14).  All else being equal, the higher the number the better the sound, but the numerical result is quite dependent on the program material.  Next time you play an album, see if it is listed on dr.loudness-war.info and check its rating.  If it isn’t listed, it is a simple job to download the free TT Loudness Meter tool, measure the album yourself, and upload the data.
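For the curious, here is a rough Python sketch of the kind of peak-versus-loudness comparison a dynamic range figure is built on.  It is emphatically not the TT meter’s algorithm and will not reproduce its DR numbers; the 3-second blocks and the “loudest 20% of blocks” rule are assumptions for illustration.  It splits the track into blocks, takes the RMS of the loudest blocks, and reports how many dB the highest peak sits above that level.  Heavy compression pushes the loud passages up towards the peaks, so the figure shrinks.

```python
import numpy as np


def peak_to_loudness_db(x, fs, block_s=3.0, top_fraction=0.2):
    """Peak level minus the RMS of the loudest blocks, in dB (illustrative only)."""
    n = int(block_s * fs)
    blocks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
    rms = np.array([np.sqrt(np.mean(b ** 2)) for b in blocks])
    k = max(1, int(round(len(blocks) * top_fraction)))       # number of "loud" blocks
    loud_rms = np.sqrt(np.mean(np.sort(rms)[-k:] ** 2))      # combined RMS of those blocks
    peak = max(np.max(np.abs(b)) for b in blocks)
    return 20.0 * np.log10(peak / loud_rms)


fs = 44_100
t = np.arange(30 * fs) / fs
tone = 0.9 * np.sin(2 * np.pi * 1000 * t)          # 30 s test tone
squashed = np.tanh(10 * tone)                       # heavily saturated version

print(round(peak_to_loudness_db(tone, fs), 2))      # a pure sine: about 3 dB
print(round(peak_to_loudness_db(squashed, fs), 2))  # much closer to 0 dB
```

A pure sine already sits only about 3dB below its peak, so real music with healthy dynamics scores far higher than either test signal; the point is simply that the more a track is squashed, the closer this number gets to zero.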

And it isn’t just the music business that faces this issue.  Incredibly, I also encounter it in the ultra-low-fi world of the TV sound track.  Just when you thought plain old dynamic compression was bad enough, the more aggressive “loudness shaping” algorithms also heavily modulate the volume of the sound track, winding it up during “quiet” passages when there is no dialog, or even between breaths during the dialog itself.  This has the effect of raising the background noise to the same loudness level as the dialog itself - and you can plainly hear it winding up and down - making watching the TV show a most unpleasant experience.  For me, for example, it ruined the last season of “House”.  I can’t begin to imagine how bad a TV set would have to be for such measures to be remotely beneficial.
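Why does this raise the background noise?  A crude model of such “loudness shaping” is an automatic gain control that drives every short block of audio towards the same target level.  The block length, target level and 40dB gain limit below are hypothetical values, and real broadcast processors smooth their gain changes (which is exactly why you can hear them winding up and down), but the consequence is the same: a pause containing only faint noise gets turned up until it is as loud as the dialog.

```python
import numpy as np


def block_agc(x, fs, target_rms=0.1, block_s=0.05, max_gain_db=40.0):
    """Scale each short block towards `target_rms`, with at most `max_gain_db` of boost (crude sketch)."""
    n = int(block_s * fs)
    max_gain = 10.0 ** (max_gain_db / 20.0)
    y = np.array(x, dtype=float)                   # work on a copy
    for start in range(0, len(y), n):
        block = y[start:start + n]                 # a view into y, so *= edits y in place
        level = np.sqrt(np.mean(block ** 2))
        if level > 0:
            block *= min(target_rms / level, max_gain)
    return y


def rms(v):
    return float(np.sqrt(np.mean(np.square(v))))


fs = 8_000
t = np.arange(fs // 2) / fs
dialog = 0.5 * np.sin(2 * np.pi * 200 * t)                            # stand-in for speech
pause = 0.005 * np.random.default_rng(0).standard_normal(fs // 2)    # faint background hiss
out = block_agc(np.concatenate([dialog, pause]), fs)

print("before: dialog %.4f, pause %.4f" % (rms(dialog), rms(pause)))
print("after:  dialog %.4f, pause %.4f" % (rms(out[:fs // 2]), rms(out[fs // 2:])))
```

Before processing, the hiss sits more than 30dB below the “dialog”; afterwards both come out at the same level.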

As a final observation, for the purists who like to work in DSD, there are a couple of important considerations to bear in mind.  The first is that, in native DSD mode, you simply cannot do any sort of signal processing whatsoever - not even something as trivial as volume control (fade-in/fade-out for example), let alone dynamic compression.  You have to convert to PCM to do that and then convert back to DSD, which most DSD purists find unacceptable.  The second concerns the Sigma-Delta Modulators which convert analog (or PCM digital) to DSD format, and whose behaviour warrants a discussion all of its own.

As you increase the signal level fed into these modulators, their behaviour becomes far from predictable: overloading the modulator can make it go unstable in ways that are hard to anticipate.  For that reason, the SACD standard requires the analog signal level encoded in DSD to be 6dB below the theoretical maximum that the format can support.  But interesting things happen if you over-drive the modulator.  Most contain special circuits or algorithms which detect the onset of instability and apply corrective measures.  This means that the modulators can normally accept inputs that exceed the supposed -6dB limit, with a penalty limited to a slight increase in distortion.  Keep pushing it further, though, and the modulator self-resets, resulting in an audible click.
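To see why overload is such a concern, consider the simplest possible 1-bit modulator.  The Python below is a first-order sigma-delta loop, far simpler than the high-order modulators actually used to produce DSD, so treat it only as an illustration of the principle.  Each output sample is just +1 or -1, yet the running average of those bits tracks the input, because an integrator accumulates the error between what was wanted and what was sent.  Feed it an input beyond full scale and that error can never be paid back, so the integrator runs away without limit.  Higher-order modulators typically become unstable well before full scale, and in far less predictable ways, which is what the safety margin and the corrective circuits are guarding against.

```python
import numpy as np


def sigma_delta_1st_order(x):
    """First-order, 1-bit sigma-delta modulator. Returns (bits, integrator_states)."""
    integrator = 0.0
    previous_bit = 0.0
    bits = np.empty(len(x))
    states = np.empty(len(x))
    for i, sample in enumerate(x):
        integrator += sample - previous_bit          # accumulate the error so far
        bit = 1.0 if integrator >= 0.0 else -1.0     # 1-bit quantiser: max positive or max negative
        bits[i], states[i] = bit, integrator
        previous_bit = bit
    return bits, states


n = 10_000
bits, states = sigma_delta_1st_order(np.full(n, 0.5))        # input at half of full scale
print("mean of bits:", round(float(bits.mean()), 3),
      "| worst integrator value:", float(np.max(np.abs(states))))

_, overload_states = sigma_delta_1st_order(np.full(n, 1.5))  # input 50% beyond full scale
print("integrator after overload:", float(overload_states[-1]))
```

At half scale the bits average out to 0.5 and the integrator stays within a narrow band; at 1.5 times full scale the integrator climbs steadily with every sample and never comes back.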

In a sense, if you are a recording engineer, DSD is a bit like analog tape on steroids.  If your signal exceeds the -6dB limit then to a large degree you are going to be able to get away with it, unlike the situation with PCM digital, where the signal will either clip or the dynamic compressor will cut in.  With DSD you get the ‘graceful’ overload of analog tape, but without the associated dynamic compression.  The result is probably the best of all worlds.  Interestingly, our DSD Master tool gives us an accurate view into whether or not the recording/mastering engineer has “pushed” the recording beyond the -6dB guideline, and you would be seriously surprised at the extent to which such behaviour appears to be the norm.

Friday, 19 June 2015

Dark Mode Icons

Is anybody out there experienced in designing menu bar icons for OS X Yosemite's 'Dark Mode'?  We are having a spot of trouble and need some sage advice.  E-mail me.

Monday, 15 June 2015

Yes, but is it Art?

Most aspects of modern life at the personal level tend to operate on the basis of meritocracy.  The better you are at something, the more likely you are to be recognized and rewarded for it.  Of course, there are exceptions and points of disagreement.  And the field in which these tend to be most strongly debated is the Fine Arts.

Throughout history, recognition in the fine arts has traditionally come by dint of serious talent.  Think of Michelangelo, Rembrandt, or Turner.  Sometimes the extent of that talent is not immediately recognized - the perfect example would be Van Gogh - but that generally represents a re-assessment of the nature of the talent rather than a debate as to whether it existed in the first place.  These days, however, it is all too common for there to be serious debate as to whether an artist actually has any talent whatsoever - the names of Jackson Pollock and (admittedly, a controversial inclusion here) Thierry Guetta may come to mind.

In many ways, Jackson Pollock is an easy target.  When he started out, his early work did undoubtedly exhibit elements of form and composition.  But as he matured, his methods increasingly shed anything that could be attributed to a considered application of technical skill.  He would throw liquid paint at the canvas in an uncontrolled manner, producing finished works that courted both controversy and adulation.  Check him out on YouTube.  A noted alcoholic, ‘Jack the Dripper’ died at 44, drunk at the wheel, in a single-car accident which also took the life of one of his passengers.

Pollock’s No 5, 1948 assumed the mantle of the world’s most expensive painting when it sold privately in 2006 for the incredible sum of $140,000,000.  The painting was originally bought from a gallery by a collector, but it was damaged during delivery to the purchaser’s home.  After much to-and-fro, Pollock agreed to ‘repair’ the damaged painting.  He did this by repainting the whole thing, reportedly saying of the customer “He’ll never know”.  The collector, it turned out, did know, but was satisfied nonetheless, even though the ‘repaired’ image was not even the same as the original (no record of which is believed to exist).  Check out images of No 5, 1948 on the Internet.  What do you think?  Worth $140 Million? - and if not, then how much?  Since Pollock is dead, I stand ready to step in.

I’ve always enjoyed photography, although I don’t have much talent for it.  Back in the day, before the Internet, I used to subscribe to a particular photography magazine.  One recurring theme was readers’ letters which expressed their lack of appreciation for some of the photographs that had appeared in the magazine.  The general gist of the complaints was always “That was not a particularly good photograph - I could have taken it myself”.  To which the reply repeatedly offered was something smug along the lines of “Well, why didn’t you then?”  To me, this seemed to be intentionally skirting what was an obvious issue.  What the reader was really trying to convey was “If I had taken that photograph I would not have considered it worthy of publication”.  To my mind that was a serious point that could - and should - have been productively explored.

These days there are many web sites which run a ‘photograph of the month’ competition, and the quality of the winning entries is always stunning.  I am sure that there would be unanimous agreement among viewers that those pictures were, if nothing else, at least worthy of submission to the competition.  Rare would be the viewer who would not have been pretty pleased with themselves had they been the one to take the picture.  Ordinary people are in general very appreciative of a good photograph.  They recognize the skill involved in taking the picture, which they appreciate may be beyond their own abilities.

Although both are images, there are some significant differences between a painting and a photograph.  Chief among them is that a photograph, inherent in its very nature, can be exactly replicated, whereas each painting is a unique entity.  Nonetheless a photograph makes for a useful analogy in making the point I want to make about art.  In the world of Fine Art these days, new art rarely makes a positive impression on the market in isolation.  The artist generally needs to ‘sell’ the piece by presenting some abstract rationale behind the existence of the work.  Imagine how that would work in photography.

Like this, I imagine:  I invent some elaborate strategy for taking pseudo-random photographs.  I then ‘carefully’ pick one and present it to a selected audience of influential photograph collectors.  I come up with some rationale for how the photograph is the outcome of a particular artistic endeavour and consequently encapsulates those artistic principles in this or that manner.  As a finishing touch, I have the word put out that I am a troubled and cantankerous soul, thin-skinned and hard to work with.  I don’t know about you, but that sets off my BS meter way into the red zone.  Yet this is how the Fine Art market has operated for most of the last century.

Although both art and music fall into the general sphere of ‘The Arts’, for the purpose of this discussion I shall choose to use the term ‘Art’ to describe only visual art - painting, sculpture, photography, etc., so that ‘Art’ and ‘Music’ have distinct and separate meanings.  I shall then break both spheres down into two separate categories, which I shall term ‘background’ and ‘foreground’.

Foreground Art and Music are those specific categories which are intended to be appreciated for their own intrinsic merit as standalone entities.  Foreground Art is something which transcends the mere decorative.  It is something we want exclusively for what it is, rather than for how well it blends in.  Foreground Art is something we make room for.  Background Art is something we only desire because it blends in so well.  Background Art is the background to our lives as we go about them … it is part of the decor.  Foreground Art is what we pause our lives to focus on and appreciate.  Foreground Art is the meat to background Art’s potatoes.  Likewise, foreground Music is the reason we have HiFi systems.  We set aside time to listen specifically to foreground Music, and when we do we usually immerse ourselves in it.  Me, I like to turn the lights off.

Background Music is a term which is already established in our lexicon.  It is the soundtrack to our lives.  It has to be harmless, comfortable, and above all not distracting … even while it can often be annoyingly loud.  It helps if its rhythms reflect and enhance the rhythms of whatever we are occupying ourselves with.  Background Music has become a necessary beat to the dance of life.  Modern movies, for example, often have a virtually continuous soundtrack.  By contrast, foreground Music is for when we want to stop whatever else we are doing and just listen.  Sometimes we may choose it specifically for the mood it embodies, but mostly we want to appreciate it entirely on its own merits.

In both Art and Music there are highbrow adherents who like to think of themselves as the arbiters and standard bearers of taste and formal appreciation.  Oftentimes they succeed in those endeavours, in that they are willing and able to devote significant proportions of their lives and personal resources to the task.  But it doesn’t mean that the rest of us - you and I - don’t have taste and a true sense of appreciation.  It just means that when the latest Jackson Pollocks of our age come up for sale we have neither the desire nor the wherewithal to be anywhere near the front of the line.  We are, to use a term Richard Nixon popularized for an entirely different meaning, the ‘Silent Majority’.

We - the Silent Majority - tend to be very clear that we expect both our Art and our Music to require a measure of clear and present skill in their conception, execution and delivery.  We’re even willing to take two out of three.  When we see a 30-foot square canvas painted entirely in red in an art gallery, and read that the gallery acquired it for a million dollars, we shout BS!!!  We no longer write to the magazine editors saying “I have taken better pictures than that and don’t think especially highly of them”, but we still think it.  Likewise, we expect our musicians to perform with an evident degree of serious skill and/or write moving, observant, and incisive songs.

Don’t get confused by the latest pop, rap, and lounge superstars.  That’s not foreground music.  That’s background music.  Way back when Milli Vanilli were outed as a fake pop band, whose music was actually recorded by session musicians (‘The Monkees’, anyone?), I don’t recall there being any real consumer outrage.  It was all the embarrassed industry insiders, whose carefully crafted aura of expertise and fine judgement was very publicly pricked, who made all the noise.  If it happened today, would anybody really care whether or not Ariana Grande actually sings on her own songs?

Me?  I’m hoping to score tickets to see Rodrigo y Gabriela at the Montreal Jazz Festival.  Wish me luck.

Thursday, 11 June 2015

Happiness Is A Warm Bun

At least it can be, when that bun is freshly baked and straight from the oven.  And, happily for me, my wife bakes a pretty mean bun!  But can you quantify just how good that bun is?  And does it necessarily follow that a random sampling of people will agree on what makes a good bun, or that a particular warm bun will make all of those people equally happy?  Finally, when a person says the warm bun makes him happy, do we actually have anything beyond his word for it?  Is it possible to quantify exactly how happy he is?  And would The Beatles agree?

Such are the problems with the subjective/objective debate.  Some things which seem blindingly obvious at the macro level are a lot harder to pin down at the micro level.  If everybody is agreed that cinnamon in the bun makes for a good bun, does a pinch more or less cinnamon make for a slightly better or poorer bun?  If you are as serious about warm buns as some of us are about high-end audio, questions such as these can lead you down a rabbit hole.

Ultimately, my stereo system makes me happy.  If I could improve it, that could probably make me happier.  But if I sold my car, my house, and my wife to raise the money to buy the MBL über-system of my dreams, the net outcome would most assuredly not be an overall increase in my happiness, so there is always a balance to be found.

On the design and manufacturing side, though, there are many objective tools that can be brought to bear which, if carefully selected and implemented, can be shown to correlate well with a wholly subjective assessment of the outcome.  But this gives rise to an often acrimonious debate which afflicts our industry (or at least its community of users and commentators): what happens when the objective assessment is at odds with the subjective assessment?  Because the fact is that, deep down at the micro level, this is uncomfortably often the case.

Here at BitPerfect, most of what we do is governed by a subjective assessment of our efforts.  Sure, some of the work we do requires intelligently-designed signal processing, and this work is solidly underpinned by both theory and measurement.  But for the most part we release the products we develop only when we think they sound right - when they make us happiest.

For the most part, BitPerfect itself does not rely on any signal processing.  We focus just on getting the original audio data from your Mac to your DAC as cleanly as possible.  The mere fact that software can have an audible impact on how that sounds without in any way altering the data presents us with an objective minefield.  The truth is we don’t have anything beyond a vague arm-waving rationale to explain how software can have that sort of impact, and we don’t have the measurement tools (or for that matter the technology, let alone the budget) at our disposal to give substance to it.  Indeed, the field of audio in general lacks tools that enable us to objectively quantify many of the subjective attributes we value, particularly when it comes to differentiating performance at the bleeding edge of the art.  My favourite example here is stereo imaging.  Some systems image incredibly well, others less so.  I am not aware of any parameter that directly measures this attribute, although there are many parameters that we know can correlate well with it.  Holographic Imaging is a 100% subjective quality.

Happiness is a similar thing.  There are many attributes that correlate very well with it, and we can measure or quantify most of them.  But none of them **are** happiness.  At the end of the day, only we as individuals can truly know whether we are happy or not.  Only we can know whether some external thing makes us happy or not.  But the Internet is home to some very special people.  They, evidently, know far better than I do what makes me happy, and therefore, by extension, presume to be the arbiters of whether I am in fact happy or not.  And they can do all this without ever having met me!  I’m sure you must have come across some of them.  What they know to an absolute certainty is that I cannot claim to be truly happy unless a double-blind test proves that to be the case.  Absent such proof, my protestations to the contrary are evidence of nothing but the ‘placebo effect’.  They could use a warm bun or two.  In its warm glow, they could perhaps devote their pent-up energies to devising a foolproof double-blind happiness test!