Thursday, 25 June 2015

On DSD vs PCM … again

Mark Waldrep (aka ‘Dr. AIX’) has put a couple of DSD posts on his RealHD-Audio web site this month.  Mark writes quite knowledgeably on audiophile matters, but is prone to a ‘you-can’t-argue-with-the-facts’ attitude predicated on an overly simplistic subset of what actually comprises ‘the facts’.   In particular, Mark insists that 24-bit, 96kHz PCM is better than DSD, and one of the posts I am referring to discusses his abject bewilderment that 530 people (and counting) on the ‘Computer Audiophile’ blog would go to the trouble of participating in a thread which actively debates this assertion.  He writes as though it were a self-evident ‘night-follows-day’ kind of an issue, almost a point of theology.

Let’s look at some of those facts.  First of all, properly-dithered 24-bit PCM has a theoretical noise floor within a dB or so of -144dB relative to full scale, whereas DSD64 rarely comes within even 20dB of that.  No argument from me there.  Also, he points out that DSD64’s noise shaping process produces a massive amount of ultrasonic noise, which starts to appear just above the audio band and continues at a very high level all the way out to over 1MHz, and which, he argues, all but subsumes the audio signal unless it is filtered out.  We’ll grant him some hyperbolic license, and agree that, technically, what he says is correct.
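For those who like to see where the 144dB number comes from, here is a quick Python sketch of the standard textbook formula for an ideal N-bit quantizer.  The familiar ‘6dB per bit’ rule gives 24 x 6.02 ≈ 144.5dB; the extra 1.76dB term below assumes a full-scale sine wave, and real converters fall a little short of these theoretical figures:

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR of an ideal, dithered PCM quantizer:
    approximately 6.02 dB per bit, plus 1.76 dB for a full-scale sine."""
    return 20 * math.log10(2 ** bits) + 1.76

print(round(ideal_dynamic_range_db(16), 1))  # 16-bit: ~98.1 dB
print(round(ideal_dynamic_range_db(24), 1))  # 24-bit: ~146.3 dB
```

Treat these as upper bounds; dither, analog noise, and converter imperfections all eat into them in practice.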

Another ‘fact’ is, though, that much to Waldrep’s chagrin, there is a substantial body of opinion out there that would prefer to listen to DSD over 24/96.  Why should this be, given that the above technical arguments (and others that you could also add into the mix with which I might also tend to agree) evidently set forth ‘the facts’?  Yes, why indeed… and the answer is simple to state, but complex in scope.  The main reason is that the pro-PCM arguments conveniently ignore the most critical aspect that differentiates the sound quality, which is the business of getting the audio signal into the PCM format in the first place.  Let’s take a look at that.

If we are to encode an audio signal in PCM format, the most obvious way to approach the problem is using a sample-and-hold circuit.  This circuit looks at the incoming waveform, grabs hold of it at one specific instant, and ‘holds’ that value for the remainder of the sampling period.  By ‘holding’ the signal, what we are doing is freezing the value that we want to measure for long enough to actually measure it.

Next we have to assign a digital value to this sampled voltage, and there are a couple of distinct ways to do this.  One technique involves comparing the sampled signal level to the instantaneous value of a sawtooth waveform generated by a precision clock.  As soon as the comparator detects that the instantaneous value of the sawtooth waveform has exceeded the value of the sampled waveform, we can calculate a digital value for the sampled waveform by counting the number of clock cycles that have passed.  Another technique is a ‘flash ADC’, where a number of simultaneous comparisons are made against precise DC values, each representing a unique digital level.  Obviously, for a 16-bit ADC this would mean 65,535 comparator circuits!  That’s doable, but rather expensive.  Think of it as the ADC equivalent of an R-2R ladder DAC.  Yet another method is a hybrid of the two, a ‘successive approximation’ converter, where a single comparator homes in on the final result one bit at a time.  Each of these methods is limited by the accuracy of both the timer clock and the reference voltage levels.
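To make the successive-approximation idea concrete, here is a little Python sketch of the binary search such a converter performs - an illustration of the principle only, not a model of any particular chip:

```python
def sar_adc(v_in: float, v_ref: float, bits: int) -> int:
    """Successive-approximation ADC sketch: a binary search using one
    comparator, testing one bit per clock cycle, MSB first."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)                 # tentatively set this bit
        if trial * v_ref / (1 << bits) <= v_in:   # compare against the trial DAC level
            code = trial                          # keep the bit if we are still below v_in
    return code

# 0.5 V into a 1 V, 8-bit converter lands exactly at mid-scale:
print(sar_adc(0.5, 1.0, 8))  # 128
```

Note that an N-bit result costs only N comparisons, rather than the 2^N - 1 simultaneous comparators of a flash design - which is exactly the cost trade-off described above.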

Ultimately, in mixed-signal electronics (circuits with both analog and digital functions), it ends up being far easier to achieve a clock of arbitrary precision than a reference voltage of arbitrary precision.  Way more so, in fact.  For this reason, sample-and-hold ADC architectures have fallen from favour in the world of high-end audio.  Instead, a technique called Sigma-Delta Modulation is used.  You will recognize this term - it is the architecture that is used to create the 1-bit bitstream used in DSD.  The SDM-ADC has for all practical purposes totally eliminated the sample-and-hold architectures in audio applications.

In an SDM-ADC, the trade-off between clock precision and reference voltage precision is resolved entirely in favour of the clock, which can be made as accurate as we want.  In effect, we increase the sample rate to something many, many times higher than what is actually required, and accept a significantly reduced measurement accuracy.  The inaccuracy of the instantaneous measurements is taken care of by a combination of averaging due to massive over-sampling and local feedback within the SDM.  That will have to do in terms of an explanation, because an SDM is a conceptually complex beast, particularly in its analog form.  In any case, the output of the SDM is a digital bitstream which can be 1-bit, but in reality is often 3-5 bits deep.  The PCM output data is obtained on-chip by a digital conversion process similar to that which happens within DSD Master.
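A first-order, 1-bit SDM can be sketched in a few lines of Python.  Real audio SDMs are much higher-order designs, so treat this purely as an illustration of the principle: each instantaneous ‘measurement’ is wildly crude (just +1 or -1), yet the feedback loop ensures that the bitstream averages out to the correct value:

```python
def sigma_delta_1bit(samples):
    """First-order 1-bit sigma-delta modulator sketch: integrate the
    error between the input and the fed-back output, then quantize
    the integrator to +/-1."""
    integrator = 0.0
    out = []
    for x in samples:
        integrator += x - (out[-1] if out else 0.0)  # accumulate error vs. last output
        out.append(1.0 if integrator >= 0 else -1.0)
    return out

# A DC input of 0.25 is represented by the *average* of the bitstream:
bits = sigma_delta_1bit([0.25] * 1000)
print(sum(bits) / len(bits))  # close to 0.25
```

The quantization error doesn’t vanish, of course - it gets pushed up to high frequencies, which is precisely the noise-shaped ultrasonic hash discussed above.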

As you know, if you are going to encode an analog signal in a PCM format, the price you have to pay is to strictly band-limit the signal to less than one half of the sample rate prior to encoding it.  This involves putting the signal through a ‘brick wall’ filter which removes all of the signal above a certain frequency while leaving everything below that frequency untouched.  In a sample-and-hold ADC this is performed using an all-analog filter located within the input stage of the ADC.  In the SDM-ADC it is performed in the digital domain during the conversion from the 1-bit (or 3-5 bit) bitstream to the PCM output.
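To see why that band-limiting is non-negotiable, consider where an unfiltered tone ends up after sampling.  This little Python sketch folds any input frequency back into the 0 to fs/2 band, just as sampling without an anti-aliasing filter would:

```python
def alias_frequency(f, fs):
    """Frequency at which a tone of frequency f reappears after being
    sampled at rate fs, folded back into the 0..fs/2 band."""
    f = f % fs                       # sampling cannot distinguish f from f +/- fs
    return fs - f if f > fs / 2 else f

# A 25 kHz tone sampled at 44.1 kHz masquerades as a 19.1 kHz tone:
print(alias_frequency(25_000, 44_100))  # 19100
```

Once the alias lands inside the audio band it is mathematically indistinguishable from a genuine 19.1kHz signal - no amount of downstream processing can remove it, which is why the filtering must happen before (or within) the encoding.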

Brick wall filters are nasty things.  Let’s look at a loudspeaker crossover filter as an example of a simple low-pass analog filter that generally can’t be avoided in our audio chain.  The simplest filter is a single-stage filter with a cut-off slope of 6dB per octave (6dB/8ve).  Steeper filters are considered to be progressively more intrusive due to the phase disturbances they introduce, although in practical designs steeper filters are often necessary to get around still greater issues elsewhere.  Now compare that to a brick-wall ‘anti-aliasing’ filter.  For 16/44.1 audio, this needs to pass all frequencies up to 20kHz, yet attenuate all frequencies above 22.05kHz by at least 96dB.  With a transition band barely a seventh of an octave wide, that means an average slope of at least 300dB/8ve is required - in fact closer to 700dB/8ve.
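You can check the arithmetic yourself.  This Python sketch computes the average slope a filter must achieve across its transition band, for 16/44.1 and, for contrast, for a hypothetical 24-bit conversion at 8Fs where the transition band is over three octaves wide:

```python
import math

def required_slope_db_per_octave(f_pass, f_stop, atten_db):
    """Minimum average filter slope needed to achieve atten_db of
    attenuation between the passband edge and the stopband frequency."""
    octaves = math.log2(f_stop / f_pass)   # width of the transition band in octaves
    return atten_db / octaves

# 16/44.1: flat to 20 kHz, 96 dB down by 22.05 kHz
print(round(required_slope_db_per_octave(20_000, 22_050, 96)))    # ~682 dB/octave
# 24-bit at 8Fs (352.8 kHz): flat to 20 kHz, 144 dB down by 176.4 kHz
print(round(required_slope_db_per_octave(20_000, 176_400, 144)))  # ~46 dB/octave
```

The contrast is stark: the 8Fs filter can be more than an order of magnitude gentler, which is the nub of the argument for very high PCM sample rates made further below.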

If we confine ourselves purely to digital anti-aliasing filters used in an SDM-ADC, a slope of 300dB/8ve inevitably requires an ‘elliptic’ filter.  Whole books have been devoted to elliptic filters, so I shall confine myself to saying that these filters have rather ugly phase responses.  In principle they also have a degree of pass-band ripple, but I am willing to stipulate that such ripple is practically inaudible.  The phase argument is another matter, though.  Although conventional wisdom has it that phase distortion is inaudible, there is an increasing body of anecdotal evidence that suggests the opposite.  One of the core pillars of Meridian’s recent MQA initiative is based on the assumed superiority of “minimum phase” filter architectures, for example.

By increasing the sample rate of PCM we can actually reduce the aggression required of our anti-aliasing filters.  I have written a previous post on this subject, but the bottom line is that only at sample rates of the 8Fs family (352.8/384kHz) and above can anti-aliasing filters be implemented with sufficiently low phase distortion.  And Dr. AIX pooh-poohs even 24/352.8 (aka ‘DXD’) as a credible format for high-end audio.  Here at BitPerfect we are persuaded by the notion that the sound of digital audio is actually the sound of the anti-aliasing filters that are necessary for its existence, and that the characteristic that predominantly governs this is their phase response.

PCM requires an anti-aliasing filter, whereas DSD does not (actually, strictly speaking it does, but it is such a gentle filter that you could not with any kind of a straight face describe it as a ‘brick-wall’ filter).  DSD has no inherent phase distortion resulting from a required filter.  Instead, it has ultrasonic noise, and this is where Dr. AIX’s argument encounters difficulties.  The simple solution is to filter it out.  However, if we read his post, he seems to think that no such filtering is used … I quote: "It’s supposed to be out of the audio band but there is no ‘audio band’ for your playback equipment".  Seriously?  All it calls for is a filter similar to PCM’s ‘anti-aliasing’ filter, except not nearly as rigorous in its requirements.

Let me tell you how DSD Master approaches this in our DSD-to-PCM conversions.  We know that, for 24/176.4 PCM conversions for example, we need only concern ourselves in a strict sense with that portion of the ultrasonic noise above 88.2kHz.  It needs to be filtered out by at least 144dB or we will get aliasing.  However, the steepness of the filter and its phase response are governed by the filter’s cut-off frequency.  For the filters we use, the phase response remains pretty much linear up to about 80% of this frequency.  Therefore we have some design freedom to push this frequency out as far as we want, and we choose to place it at a high enough frequency that the phase response remains quasi-linear across the entire audio band.  Of course, the further we push it out, the more of the ultrasonic noise is allowed to remain in the encoded PCM data.

As an aside, you might well ask: if the ultrasonic noise is inaudible, then why do we have to filter it out in the first place?  And that would indeed be a good question.  Straightforward auditory measurements show that humans can’t hear anything above 20kHz - a limit that falls further as we age.  However, more elaborate investigations indicate that we do respond subconsciously to ultrasonic stimuli that we cannot otherwise demonstrate that we hear.  So it remains an interesting open question whether the presence of heavy ultrasonic content would actually have an impact on our perception of the sound.  On the other hand, a lot of audio equipment is not designed to handle heavy ultrasonic signal content.  We know of one high-end TEAC DAC that could not lock onto a signal that contained even a modest -60dB of ultrasonic content (that problem, once identified, was quickly fixed with a firmware update).  Such are probably as good reasons as any to want to filter it out.

So what do we do with the DSD content above 20kHz?  In developing DSD Master we take the view that this frequency range contains both the high-frequency content of the original signal (if any) and the high-frequency noise added by the SDM’s noise-shaping process.  We aim to keep the signal’s own high-frequency content flat up to 30kHz, and begin our roll-off above that.  Consequently, our DSD conversions at high sample rates (88.2kHz and above) do contain a significant ultrasonic peak in the 35-40kHz range.  However, that peak is limited to about -80dB, which is way too low either to be audible(!) or to cause instability in anyone’s electronics.  Meanwhile, the phase response is quasi-linear up to the point at which the ultrasonic noise rises above the signal level.

In designing DSD Master, we make those design compromises on the basis that the purpose of these conversions is to be used for final listening.  But if a similar functionality is being designed for the internal conversion stage of a PCM SDM-ADC, then we know that a residual ultrasonic noise peak in the output data is not going to be acceptable.  In our view, this means that design choices will be made which do not necessarily coincide with the best possible sound quality.

As a final point, all the above observations are specific to ‘regular’ DSD (aka ‘DSD64’).  The problem with ultrasonic noise pretty much goes away with DSD128 and above, something I have also written about in detail in a previous post.

So, from the foregoing, purely from a logical point of view, it seems somewhat contradictory for Dr. AIX to suggest that 24/96 PCM is inherently better than DSD, since DSD comes directly out of an SDM in its native form, whereas PCM is derived through digital manipulation of an SDM output with, among other things, a ‘brick-wall’ filter with a less-than-optimal configuration.  I’ll also point out that his argument suggests that DSD (i.e. the output of an SDM) will not deliver the full bit depth that he offers up as a key distinguishing feature of 24/96.  Of course, those arguments apply only to ‘purist’ recordings which seek to capture the microphone output as naturally as possible.  In that way the discussion is not coloured by any post-processing of the signal, which in any case is not possible in the native DSD domain.

Monday, 22 June 2015

Day One - Intellectual Property

I have mentioned before that I subscribe to B&W’s Society of Sound, and have done so for the last five years.  It costs me $60 for a 12-month subscription for which I get to download 24 high-resolution albums, two per month.  I think it’s a great deal.  Each month, I get one album from London Symphony Orchestra’s LSO Live label, and one from Peter Gabriel’s RealWorld label.  For me, the classical downloads are the major pull, but occasionally the RealWorld offering turns out to be the bigger gem.  Such was the case this month, when the offering was Day One’s new album Intellectual Property.

Day One is a band I had never heard of.  Over the course of 15 years, this would appear to be only the English duo’s third album, but they have managed to make their mark with contributions to a number of TV and movie soundtracks.  Check out the link below.

How would I describe the latest album “Intellectual Property”?  On one hand there are a number of apparent influences, which include David Bowie, Peter Gabriel, Lou Reed, Ian Dury, and the Red Hot Chili Peppers.  On the other hand I detect stylistic tips of the hat to Motown, Country, and New Age, all of which underlie an overall vibe of something you might call “stoner hip-hop”.  Maybe it is all those 70’s and 80’s influences that appeal to me.  What’s that you said?…. the stoner element?

Anyway, call it what you will, it is a superb album.  The songwriting is sharp and observant without trying to be too deep.  Each and every track has a clear hook, and the recording is clean and full, with a sensitive hand on the production levers, although the sound overall does fall short of the highest audiophile standards.  At any rate, it is quite simply a first-rate album.  And, at the moment, I think the only way to get hold of it is via B&W’s Society of Sound.  So, if nothing else, it is a good excuse for you to check it out.  There is even a Free Trial option, so what’s holding you up?  :)

http://www.bowers-wilkins.com/Society_of_Sound/Society_of_Sound/Music/Day-One-Intellectual-Property.html

Dynamic Compression

Most computer users will be familiar with data compression.  This was a godsend at the dawn of the internet age when internet connections were achieved via dial-up modems with bandwidths restricted to clockwork speeds.  The first ever document I received over the internet was a 1MB WordPerfect file, and the transmission took about three hours using a modem that set me back $280.  Even so, this was still a great thing, given that the file wouldn’t fit on a 5.25” floppy disk which could otherwise have been posted to me.  I didn’t have PKZip at that time, but eventually a colleague introduced me to it.  For many years thereafter nobody would ever consider sending an e-mail attachment without first “zipping” it.  Zipping a word processor file could reduce the file size by enormous factors of 5X or more.  Data compression was, and still is, a great thing.  In audio, formats like FLAC and Apple Lossless use data compression to reduce the size of an audio file without compromising its audio content.  By contrast, formats like MP3 and AAC go a step further and irretrievably delete some of the audio content to make the file smaller yet.  But dynamic compression is a different beast entirely. 
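If you want to see lossless data compression earn its keep, here is a Python sketch using the standard-library zlib module, which implements the same DEFLATE algorithm that PKZip made famous.  The repetitive sample text is my own invention, but it illustrates why markup-heavy word processor files shrank so dramatically:

```python
import zlib

# Word-processor files of the era were full of repetitive markup and
# padding - exactly the kind of redundancy lossless compressors exploit.
document = b"Chapter 1: Introduction. " * 400   # 10,000 bytes of repetitive text
packed = zlib.compress(document, level=9)

print(len(document), len(packed))
print(f"compression factor: {len(document) / len(packed):.0f}x")

# Crucially, the process is perfectly reversible - not one bit is lost:
assert zlib.decompress(packed) == document
```

Real word-processor files are less repetitive than this toy example, so the 5x figure mentioned above is more representative of everyday results than the extreme factor this demo achieves.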

When an audio signal continues to increase in volume, at some point you will run into a limitation.  For example, beyond a certain (catastrophically loud) volume, air itself loses the ability to faithfully transmit a sound.  If you drive your loudspeaker with too many Watts, the drive units will self destruct.  If you feed your amplifier’s inputs with too large of a signal, its outputs will clip.  If you try to record too loud of a signal onto an analog tape, the tape will distort.  And if you try to encode too large of a signal in a digital format … well, you can’t get there from here, and you just have to encode something else instead - typically digital hard clipping.

Therefore, whether in today’s digital age or in the analog age of yore, anybody who is tasked with capturing and recording an analog signal has to be concerned with level matching.  If you turn the signal level up too high, you will encounter one of the previously mentioned problems (hopefully one of the last two).  If you turn it down too low, the sound will eventually descend into the noise and be lost.  However, analog tape had a built-in antidote.  It turns out that if you overload an analog tape, the overload is managed ‘gracefully’, which means that you could record at a level higher than the linear maximum and it wouldn’t sound too bad.

In fact, not only does it not sound too bad, but if you play back the resultant recording over a low-fidelity system like a radio or a boom box, it can actually sound better than a recording that properly preserves the full dynamic range.  This is because the dynamic range within a high quality recording is greater than the ability of the low-fi system to reproduce it, and the result can be a sound that appears to be quiet and lifeless.  By allowing the analog tape to saturate, the dynamic range of the recorded signal is effectively reduced (or ‘compressed’), and better matched to that of a low-fi system.  In fact, in all but the very finest systems, a little bit of dynamic compression is found by most people to be slightly preferable to none at all.  Which is a problem for those of us fortunate enough to enjoy the finest systems, whose revealing nature tends to deliver the opposite result.

With analog tape, managing dynamic compression through tape saturation is a finely balanced skill.  It is not something that you can easily bend to your design.  It’s sometimes considered to be more of an art than a science.  On the other hand, in the digital domain, dynamic compression can be tailored umpteen different ways according to your whim, and you can dial in just the right amount if you believe your recording needs it.  Most digital dynamic compression algorithms are seriously simple, being nothing more than a non-linear transfer function based on Quadratic, Cubic, Sinusoidal, Exponential, Hyperbolic tangent, or Reciprocal functions (to name but a few).  Ideally, the transfer function would remain linear up to a point, above which the non-linearity would progressively kick in, and the better regarded algorithms (such as the Cubic) do behave like that.  But most serious listeners agree that digital dynamic compression never sounds as good as ‘natural’ dynamic compression from magnetic tape.  Maybe this is one of the reasons analog still has its strong adherents.
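As an illustration - and only that, since every commercial compressor has its own secret sauce - here is the classic cubic waveshaper in Python.  It is essentially linear for small signals, bends progressively as the level rises, and limits smoothly at ±2/3:

```python
def cubic_soft_clip(x: float) -> float:
    """Classic cubic waveshaper: y = x - x^3/3 for |x| <= 1, limiting
    smoothly at +/-2/3.  Nearly linear for small signals; the curvature
    (and hence the distortion) grows as the level rises."""
    if x >= 1.0:
        return 2.0 / 3.0
    if x <= -1.0:
        return -2.0 / 3.0
    return x - x ** 3 / 3.0

for level in (0.1, 0.5, 0.9, 1.5):
    print(level, round(cubic_soft_clip(level), 4))
```

Because the curve is an odd function, it generates predominantly odd harmonics - a crude echo of tape saturation, though without tape’s level-dependent subtleties.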

The thing about digital dynamic compression is that, once it kicks in, its effect on the sound is rather drastic.  Harmonic distortion components at levels as high as -20dB are common.  Moreover, the technique can create substantial harmonic distortion components above the Nyquist frequency, which get mirrored down into the audio band where they appear as inharmonic frequencies which are subjectively a lot more discomforting than harmonic frequencies.  It also creates huge intermodulation distortion artifacts, also highly undesirable.

There are papers out there which do a very thorough job of analyzing what various dynamic compression systems, both real and theoretical, could do if they were implemented, and the conclusions they come to are pretty consistent.  Digital dynamic compression fundamentally sucks, and there’s not much you can do about it.  But having said that, if you have some understanding of how compression works, are willing to limit the amount of applied compression judiciously, and have sufficient computing power available, you can bring to bear a whole grab-bag of tricks to try to minimize these artifacts.  Such techniques include side-chain processing (where several analyses of the signal happen in parallel as inputs to the core compression tool), look-ahead (analysis of the future input signal, obviously not for real-time applications), advanced filtering (which seeks to reduce unwanted distortions by filtering them out), and active attack/release control (which governs the extent to which the sudden onset of compression is audible).  Sophisticated pro-audio tools can bring all these techniques - and more - to the party.

Dynamic compression as a serious issue of sound quality came to a head (or descended to its depths, depending on your viewpoint) during the early 2000’s with the so-called “loudness wars”.  The music industry was coming to terms with the notion that a lot of popular music was being listened to in MP3 format on portable players of limited fidelity.  While with their left hands they were trying their best to prevent the proliferation of music in the MP3 format, with their right hands they were recognizing that if music was going to be listened to on portable systems with restricted dynamic range it might sound better if the recordings themselves had a similarly restricted dynamic range.  It is a well known psychoacoustic effect that, when comparing two similar recordings, people overwhelmingly tend to perceive the louder one to be better, and dynamic compression is a way to increase the perceived loudness of a recording.  The labels therefore started falling over themselves to release recordings with more and more “loudness”, or put another way, with more and more dynamic compression.

Take U2’s “How to Dismantle an Atomic Bomb”, released in 2004.  This album is a downright disgrace.  It sounds absolutely appalling.  I bought it when it came out and haven’t listened seriously to it since.  And if there is any doubt as to why that might be, just take a look at the attached screenshot image.  These are waveform envelopes obtained using Adobe Audition.  The top track is “Vertigo” from this album.  The bottom track is “With or Without You” from their 1987 release The Joshua Tree.  Both are ripped from the standard commercial CD releases.  The difference is laughable.  You can clearly see how the one on the top has been driven deeply into dynamic compression.



To attempt to quantify this effect, the “Loudness War” website endorses a free tool called the TT Dynamic Range Meter from Tischmeyer Technology.  This measures Vertigo as DR5, which it classifies as “Bad” (DR0 - DR7), and With or Without You as DR12, which is in the “Transition” range (DR8 - DR13), but getting close to Good (which starts at DR14).  All else being equal, the higher the number the better the sound, but the numerical result is quite dependent on the program material.  Next time you play an album, see if it is listed on dr.loudness-war.info and check its rating.  If it isn’t listed, it is a simple job to download the free tool, measure the album yourself, and upload the data.
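The TT meter’s actual algorithm is more elaborate than anything I will reproduce here, but a crude illustrative cousin of a DR measurement is the crest factor - the ratio of peak level to RMS level, expressed in dB.  This Python sketch shows how hard limiting squashes it:

```python
import math

def crest_factor_db(samples) -> float:
    """Peak-to-RMS ratio in dB - a crude, illustrative stand-in for a
    'DR'-style measurement (the real TT meter is more sophisticated)."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

# A pure sine has a crest factor of ~3 dB; limited material scores lower.
sine = [math.sin(2 * math.pi * k / 100) for k in range(100)]
squashed = [max(-0.5, min(0.5, s)) for s in sine]
print(round(crest_factor_db(sine), 1))      # ~3.0
print(round(crest_factor_db(squashed), 1))  # noticeably lower
```

The principle carries over directly: heavily compressed masters have their peaks pushed down toward the average level, and the peak-to-average ratio - however you measure it - collapses.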

And it isn’t just the music business that faces this issue.  Incredibly, I also encounter it in the ultra-low-fi world of the TV sound track.  Just when you thought plain old dynamic compression was bad enough, the more aggressive “loudness shaping” algorithms also heavily modulate the volume of the sound track, winding it up during “quiet” passages when there is no dialog, or even between breaths during the dialog itself.  This has the effect of raising the background noise to the same loudness level as the dialog itself - and you can plainly hear it winding up and down - making watching the TV show a most unpleasant experience.  For me, for example, it ruined the last season of “House”.  I can’t begin to imagine how bad a TV set would have to be for such measures to be remotely beneficial.

As a final observation, for the purists who like to work in DSD, there are a couple of important considerations to bear in mind.  The first is that, in native DSD mode, you simply cannot do any sort of signal processing whatsoever - not even something as trivial as volume control (fade-in/fade-out for example), let alone dynamic compression.  You have to convert to PCM to do that and then convert back to DSD, which most DSD purists find unacceptable.  The other interesting thing is in the Sigma-Delta Modulators which convert analog (or PCM digital) to DSD format, which warrants a discussion all of its own.

As you increase the signal level in these modulators the result is far from deterministic.  Overloading the modulator can make it go unstable in an unpredictable manner.  For that reason, the SACD standard requires the analog signal level encoded in DSD to be 6dB below the theoretical maximum that the format can support.  But interesting things happen if you over-drive the modulator.  Most contain special circuits or algorithms which detect the onset of instability and apply corrective measures.  This means that the modulators can normally accept inputs that exceed the supposed -6dB limit, with a penalty limited to a slight increase in distortion.  Keep pushing it further, though, and the modulator self-resets, resulting in an audible click.
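You can see the flavour of this behaviour even in a toy first-order modulator, sketched below in Python (real DSD modulators are high-order designs whose instability is far more dramatic, and this toy has none of their corrective circuitry).  Within range, the bitstream average tracks the input; push the input past full scale and the modulator simply cannot keep up - its internal state runs away:

```python
def modulate(dc_level: float, n: int = 2000):
    """Toy first-order 1-bit modulator: returns the bitstream average
    and the final integrator state after n samples of a DC input."""
    integ, out = 0.0, []
    for _ in range(n):
        integ += dc_level - (out[-1] if out else 0.0)  # error feedback
        out.append(1.0 if integ >= 0 else -1.0)
    return sum(out) / n, integ

for level in (0.5, 0.9, 1.2):
    avg, residual = modulate(level)
    print(f"input {level}: bitstream average {avg:.3f}, integrator residual {residual:.1f}")
```

For inputs inside the legal range the integrator stays bounded; at 1.2 the output is pinned at all-ones and the integrator grows without limit - the toy equivalent of the runaway state that a real modulator’s reset circuit resolves with an audible click.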


In a sense, if you are a recording engineer, DSD is a bit like analog tape on steroids.  If your signal exceeds the -6dB limit then to a large degree you are going to be able to get away with it, unlike the situation with PCM digital, where the signal will either clip or the dynamic compressor will cut in.  With DSD you get the ‘graceful’ overload of analog tape, but without the associated dynamic compression.  The result is probably the best of all worlds.  Interestingly, our DSD Master tool gives us an accurate view into whether or not the recording/mastering engineer has “pushed” the recording beyond the -6dB guideline, and you would be seriously surprised at the extent to which such behaviour appears to be the norm.

Friday, 19 June 2015

Dark Mode Icons

Is anybody out there experienced in designing menu bar icons for OS X Yosemite's 'Dark Mode'?  We are having a spot of trouble and need some sage advice.  E-mail me.

Monday, 15 June 2015

Yes, but is it Art?

Most aspects of modern life at the personal level tend to operate on the basis of meritocracy.  The better you are at something, the more likely you are to be recognized and rewarded for it.  Of course, there are exceptions and points of disagreement.  And the field in which these tend to be most strongly debated is the Fine Arts.

Throughout history, recognition in the fine arts has traditionally come by dint of serious talent.  Think of Michelangelo, Rembrandt, or Turner.  Sometimes the extent of that talent is not immediately recognized - the perfect example would be Van Gogh - but that generally represents a re-assessment of the nature of the talent rather than a debate as to whether it existed in the first place.  These days, however, it is all too common for there to be serious debate as to whether an artist actually has any talent whatsoever - the names of Jackson Pollock and (admittedly, a controversial inclusion here) Thierry Guetta may come to mind.

In many ways, Jackson Pollock is an easy target.  When he started out, his early work did undoubtedly exhibit elements of form and composition.  But as he matured, his methods increasingly shed anything that could be attributed to a considered application of technical skill.  He would throw liquid paint at the canvas in an uncontrolled manner, producing finished works that courted both controversy and adulation.  Check him out on YouTube.  A noted alcoholic, ‘Jack the Dripper’ died at 44, drunk at the wheel, in a single-car accident which also took the life of one of his passengers.

Pollock’s No 5, 1948 assumed the mantle of the world’s most expensive painting when it sold privately in 2006 for the incredible sum of $140,000,000.  The painting was originally bought from a gallery by a collector, but it was damaged during delivery to the purchaser’s home.  After much to-and-fro, Pollock agreed to ‘repair’ the damaged painting.  He did this by repainting the whole thing, reportedly saying of the customer “He’ll never know”.  The collector, it turned out, did know, but was satisfied nonetheless, even though the ‘repaired’ image was not even the same as the original (no record of which is believed to exist).  Check out images of No 5, 1948 on the Internet.  What do you think?  Worth $140 Million? - and if not, then how much?  Since Pollock is dead, I stand ready to step in.

I’ve always enjoyed photography, although I don’t have much talent for it.  Back in the day, before the Internet, I used to subscribe to a particular photography magazine.  One recurring theme was readers’ letters which expressed their lack of appreciation for some of the photographs that had appeared in the magazine.  The general gist of the complaints was always “That was not a particularly good photograph - I could have taken it myself”.  To which the reply repeatedly offered was something smug along the lines of “Well, why didn’t you then?”  To me, this seemed to be intentionally skirting what was an obvious issue.  What the reader was really trying to convey was “If I had taken that photograph I would not have considered it worthy of publication”.  To my mind that was a serious point that could - and should - have been productively explored.

These days there are many web sites which run a ‘photograph of the month’ competition, and the quality of the winning entries is always stunning.  I am sure that there would be unanimous agreement among viewers that those pictures were, if nothing else, at least worthy of submission to the competition.  Rare would be the viewer who would not have been pretty pleased with themselves had they been the one to take the picture.  Ordinary people are in general very appreciative of a good photograph.  They recognize the skill involved in taking the picture, which they appreciate may be beyond their own abilities.

Although both are images, there are some significant differences between a painting and a photograph.  Chief among them is that a photograph, inherent in its very nature, can be exactly replicated, whereas each painting is a unique entity.  Nonetheless a photograph makes for a useful analogy in making the point I want to make about art.  In the world of Fine Art these days, new art rarely makes a positive impression on the market in isolation.  The artist generally needs to ‘sell’ the piece by presenting some abstract rationale behind the existence of the work.  Imagine how that would work in photography.

Like this, I imagine:  I invent some elaborate strategy for taking pseudo-random photographs.  I then ‘carefully’ pick one and present it to a selected audience of influential photograph collectors.  I come up with some rationale for how the photograph is the outcome of a particular artistic endeavour and consequently encapsulates those artistic principles in this or that manner.  As a finishing touch, I have the word put out that I am a troubled and cantankerous soul, thin-skinned and hard to work with.  I don’t know about you, but that sets off my BS meter way into the red zone.  Yet this is how the Fine Art market has operated for most of the last century.

Although both art and music fall into the general sphere of ‘The Arts’, for the purpose of this discussion I shall choose to use the term ‘Art’ to describe only visual art - painting, sculpture, photography, etc. - so that ‘Art’ and ‘Music’ are to have distinct and separate meanings.  I shall then break both spheres down into two separate categories, which I shall term ‘background’ and ‘foreground’.

Foreground Art and Music are those specific categories which are intended to be appreciated for their own intrinsic merit as standalone entities.  Foreground Art is something which transcends the mere decorative.  It is something we want exclusively for what it is, rather than for how well it blends in.  Foreground Art is something we make room for.  Background Art is something we only desire because it blends in so well.  Background Art is the background to our lives as we go about them … it is part of the decor.  Foreground Art comprises the things for which we pause our lives to focus our appreciation.  Foreground Art is the meat to background Art’s potatoes.  Likewise, foreground Music is the reason we have HiFi systems.  We set aside time to listen specifically to foreground Music, and when we do we usually immerse ourselves into it.  Me, I like to turn the lights off.

Background Music is a term which is already established in our lexicon.  It is the soundtrack to our lives.  It has to be harmless, comfortable, and above all not distracting … even while it can often be annoyingly loud.  It helps if its rhythms reflect and enhance the rhythms of whatever we are occupying ourselves with.  Background Music has become a necessary beat to the dance of life.  Modern movies, for example, often have a virtually continuous soundtrack.  By contrast, foreground Music is for when we want to stop whatever else we are doing and just listen.  Sometimes we may choose it specifically for the mood it embodies, but mostly we want to appreciate it entirely on its own merits.

In both Art and Music there are highbrow adherents who like to think of themselves as the arbiters and standard bearers of taste and formal appreciation.  Oftentimes they succeed in those endeavours, in that they are willing and able to devote significant proportions of their lives and personal resources to the task.  But it doesn’t mean that the rest of us - you and I - don’t have taste and a true sense of appreciation.  It just means that when the latest Jackson Pollocks of our age come up for sale we have neither the desire nor the wherewithal to be anywhere near the front of the line.  We are, to use a term Richard Nixon popularized for an entirely different meaning, the ‘Silent Majority’.

We - the Silent Majority - tend to be very clear that we expect both our Art and our Music to require a measure of clear and present skill in its conception, execution and delivery.  We’re even willing to take two out of three.  When we see a 30-foot square canvas painted entirely in red in an art gallery, and read that the gallery acquired it for a million dollars, we shout BS!!!  We no longer write to the magazine editors saying “I have taken better pictures than that and don’t think especially highly of them”, but we still think it.  Likewise, we expect our musicians to perform with an evident degree of serious skill and/or write moving, observant, and incisive songs.

Don’t get confused by the latest pop, rap, and lounge superstars.  That’s not foreground music.  That’s background music.  Way back when Milli Vanilli were being outed as a fake pop band, whose music was actually recorded by session musicians (‘The Monkees’, anyone?), I don’t recall there being any real consumer outrage.  It was all the embarrassed industry insiders, whose carefully crafted aura of expertise and fine judgement was very publicly pricked, who made all the noise.  If it happened today, would anybody actually care whether or not Ariana Grande actually sings on her own songs?

Me?  I’m hoping to score tickets to see Rodrigo y Gabriela at the Montreal Jazz Festival.  Wish me luck.

Thursday, 11 June 2015

Happiness Is A Warm Bun

At least it can be, when that bun is freshly baked and straight from the oven.  And, happily for me, my wife bakes a pretty mean bun!  But can you quantify just how good that bun is?  And does it necessarily follow that a random sampling of people will agree on what makes a good bun, or that a particular warm bun will make all of those people equally happy?  Finally, when a person says the warm bun makes him happy, do we actually have anything beyond his word for it?  Is it possible to quantify exactly how happy he is?  And would The Beatles agree?

Such are the problems with the subjective/objective debate.  Some things which seem blindingly obvious at the macro level, are a lot harder to pin down at the micro level.  If everybody is agreed that cinnamon in the bun makes for a good bun, does a pinch more or less cinnamon make for a slightly better or poorer bun?  If you are as serious about warm buns as some of us are about high-end audio, questions such as these can lead you down a rabbit hole.

Ultimately, my stereo system makes me happy.  If I could improve it, that could probably make me happier.  But if I sold my car, my house, and my wife to raise the money to buy the MBL über-system of my dreams, the net outcome would most assuredly not be an overall increase in my happiness, so there is always a balance to be found.

On the design and manufacturing side, though, there are many objective tools that can be brought to bear which, if carefully selected and implemented, can be shown to correlate well with a wholly subjective assessment of the outcome.  But this gives rise to an often acrimonious debate which afflicts our industry - or at least its community of users and commentators - namely, what happens when the objective assessment is at odds with the subjective assessment.  Because the fact is that, deep down at the micro level, this is uncomfortably often the case.

Here at BitPerfect, most of what we do is governed by a subjective assessment of our efforts.  Sure, some of the work we do requires intelligently-designed signal processing, and this work is solidly underpinned by both theory and measurement.  But for the most part we release the products we develop only when we think they sound right - when they make us happiest.

For the most part, BitPerfect itself does not rely on any signal processing.  We focus just on getting the original audio data from your Mac to your DAC as cleanly as possible.  The mere fact that software can have an audible impact on how that sounds without in any way altering the data presents us with an objective minefield.  The truth is we don’t have anything beyond a vague arm-waving rationale to explain how software can have that sort of impact, and we don’t have the measurement tools (or for that matter the technology, let alone the budget) at our disposal to give substance to it.  Indeed, the field of audio in general lacks tools that enable us to objectively quantify many of the subjective attributes we value, particularly when it comes to differentiating performance at the bleeding edge of the art.  My favourite example here is stereo imaging.  Some systems image incredibly well, others less so.  I am not aware of any parameter that directly measures this attribute, although there are many parameters that we know can correlate well with it.  Holographic Imaging is a 100% subjective quality.

Happiness is a similar thing.  There are many attributes that correlate very well with it, and we can measure or quantify most of them.  But none of them are happiness.  At the end of the day, only we as individuals can truly know whether we are happy or not.  Only we can know whether some external thing makes us happy or not.  But the Internet is home to some very special people.  They, evidently, know far better than I do what makes me happy, and therefore, by extension, presume to be the arbiters of whether I am in fact happy or not.  And they can do all this without ever having met me!  I’m sure you must have come across some of them.  What they know to an absolute certainty is that I cannot claim to be truly happy unless a double-blind test proves that to be the case.  Absent such proof, my protestations to the contrary are evidence of nothing but the ‘placebo effect’.  They could use a warm bun or two.  In its warm glow, they could perhaps devote their pent-up energies to devising a foolproof double-blind happiness test!

Monday, 25 May 2015

The 747 Bus

A lot is made of the differences between IIR and FIR filters for high-end audio applications.  FIR stands for Finite Impulse Response, and IIR stands for Infinite Impulse Response.  It is perhaps not surprising, therefore, that in discussing the characteristics of various filters, the one thing you tend to read more often than any other is that IIR filters have this or that type of impulse response, whereas FIR filters have such and such an impulse response, as though the impulse response itself were the purpose of the filter.  Nothing could be further from the truth.

Although an impulse response has a waveform-like aspect, and is derived directly from the frequency and phase responses of the filter, there is an unfortunate affectation which is common in the world of high-end audio to characterize the audible characteristics of a filter in terms of the features of its impulse response.  It is a bit like saying, with the smug certainty of one stating the self-evidently obvious, that the number of a bus tells you where the bus is going.  Where I live, there is a bus route that goes to the airport, which has the (to me, at any rate) faintly amusing route number of ‘747’ (the famous Boeing 747 is the eponymous Jumbo Jet).  When I see the 747 bus I know it is going to the airport, but it would be wrong to deduce that the bus has an upper deck (which it doesn’t, as it happens), or that it holds more passengers than the average bus (which it also doesn’t).  Neither would it be wise to assume in other cities around the world, that the way to get to the airport is to take the 747 bus.

All digital filters, whether IIR or FIR, work by manipulating the available data.  Broadly speaking, the available data comprises two sets of numbers.  One set is the actual music data that gets fed into the filter.  The other is the set of numbers comprising the previous outputs of the filter.  In practical terms, the primary distinction between FIR and IIR filters is that FIR filters are confined to using only the actual music data as their inputs, whereas an IIR filter can use both.
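As a concrete sketch of that distinction, here are the two difference equations written out in plain Python.  To be clear, this is illustrative code of my own, not anything BitPerfect ships, and the coefficient values used in the test below are made up purely for demonstration:

```python
def fir_filter(x, b):
    """FIR: each output uses only present and past *input* samples."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):        # b holds the tap coefficients
            if n - k >= 0:
                acc += bk * x[n - k]
        y.append(acc)
    return y

def iir_filter(x, b, a):
    """IIR: outputs also feed back previous *output* samples.
    Implements a[0]*y[n] = sum(b[k]*x[n-k]) - sum(a[k]*y[n-k], k >= 1)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):        # feed-forward part, same as FIR
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a):        # feedback part - uses earlier outputs
            if k >= 1 and n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc / a[0])
    return y
```

Feed both a single unit impulse and the difference is obvious: the FIR output falls silent once the impulse has passed through all its taps, while the IIR output keeps echoing its own previous outputs indefinitely.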

The impulse response of an FIR filter is nothing more than a graphical representation of the so-called ‘taps’ of the filter.  Each ‘tap’ represents one of the actual input music data values, so the more ‘taps’ the filter has, the more of the input music data goes into the calculation of each output value.  The more complex the performance requirements of an FIR filter, the more ‘taps’ are needed to define it, and the more detail will be found in its impulse response.  With an IIR filter, however, its calculation uses previous output values as well as previous input values, and of course each previous output value will have been calculated from the input and output values before that.  As a result, if you were to go through all the math, you would find that an IIR filter can be re-written in a form that uses ALL of the actual music input values, and NONE of the previous output values to calculate each new output value.  For this reason, it is mathematically identical to an FIR filter with an infinite number of taps.  Hence the term “Infinite” Impulse Response.  But the IIR filter can achieve this with a surprisingly compact filter structure, one that uses relatively few of the previous input and output values, and obviously requires far fewer calculations in order to arrive at the exact same result.
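A quick way to see the “Infinite” in Infinite Impulse Response is to feed a unit impulse into the simplest possible IIR filter, y[n] = x[n] + p·y[n−1], and watch the output.  A sketch in Python follows; the pole value of 0.5 is an arbitrary choice for illustration:

```python
def iir_impulse_response(pole, n):
    """First n output samples of y[k] = x[k] + pole*y[k-1] for a unit impulse."""
    y = []
    prev = 0.0
    for k in range(n):
        x = 1.0 if k == 0 else 0.0   # the unit impulse: 1, 0, 0, 0, ...
        prev = x + pole * prev       # one stored previous output is all it needs
        y.append(prev)
    return y

h = iir_impulse_response(0.5, 8)
# h[k] equals pole**k: these are the taps of the equivalent FIR filter,
# and they decay forever without ever reaching zero - an infinite tap list.
```

With just one stored value, this filter reproduces exactly what an FIR filter would need infinitely many taps to express.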

The biggest practical and worthwhile difference between FIR and IIR filters actually lies in the tools that are used to design them.  To learn more you need to read my previous posts on “Pole Dancing” where I discuss the basics of filter design.  This involves nothing more than placing a bunch of “Poles” and “Zeros” on a graph called a Z-space.  Once you have the poles and the zeros positioned, the digital filter itself is pretty much defined.  The problem is that the relationship between the performance of the filter and the location and number of poles and zeros is not a two-way street.  If I have my poles and zeros, I can easily calculate the frequency, phase, and impulse responses of my filter.  But if I start by defining the responses that I want from my filter, it is not possible to make the opposite calculation, and derive the requisite poles and zeros.  In other words, when it comes to digital filter design, the operative phrase is usually “you can’t get there from here”.
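The easy direction of that one-way street really is only a few lines of code: given the poles and zeros, you just evaluate H(z) around the unit circle.  This is an illustrative Python sketch; the pole and zero placements below are invented for the example, not taken from any real filter design:

```python
import cmath

def magnitude_response(zeros, poles, omega):
    """|H(z)| at z = exp(j*omega), where H(z) = prod(z - q) / prod(z - p)."""
    z = cmath.exp(1j * omega)
    num = 1.0 + 0j
    for q in zeros:
        num *= z - q       # each zero pulls the response down near its location
    den = 1.0 + 0j
    for p in poles:
        den *= z - p       # each pole pushes the response up near its location
    return abs(num / den)

# A zero at z = -1 pins the response to zero at the Nyquist frequency
# (omega = pi), while a pole near z = +1 lifts the low frequencies:
# together they make a crude low-pass filter.
lowpass_zeros = [-1.0 + 0j]
lowpass_poles = [0.9 + 0j]
```

The reverse direction - choosing poles and zeros to hit a target response - has no such closed form, which is exactly why the design tools matter so much.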

The way we get around this bottleneck is the same way we get around every other tricky mathematical problem.  We reduce the complexity of the problem by simplifying it, and only considering a strict subset of the total spectrum of possibilities.  I’m not going to get into the technical details, but the outcome of that approach is that we end up with certain design methods that can be used to design certain classes of filters.  In other words, design tools for FIR filters produce slightly different results than design tools for IIR filters.  The extent to which FIR and IIR filters differ in any audible sense is more down to whether the design tools available in IIR or FIR space do a better job of realizing the desired output response.

Aside from the challenges faced in designing the filter, there are significant distinctions between IIR and FIR filters when it comes to actually implementing the filter.  In general IIR filters are very compact - rarely do they have more than 10-20 coefficients (the term ‘taps’ is not actually used when referring to IIR filters) - and so they tend to be inherently efficient when run on conventional computer architectures.  IIR filters are usually designed to run in sequential blocks, with the output of one block forming the input to the next.  On the other hand, FIR filters lend themselves really well to being realized in massively parallel architectures.  For each FIR filter tap we need to multiply two numbers together, and when that’s done add all the answers together.  None of these multiplications rely on the outcomes of any of the other multiplications, so they can all be done in parallel.
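That independence is easy to see when a single FIR output sample is written out explicitly.  In this sketch (mine, not a real implementation), every entry in the products list could be its own hardware multiplier:

```python
def fir_output_sample(taps, window):
    """One FIR output sample: window holds the most recent
    len(taps) input samples, newest first."""
    products = [b * x for b, x in zip(taps, window)]  # all multiplies independent
    return sum(products)                              # one final reduction
```

On a CPU those multiplies still happen one after another; in an FPGA they can all fire in the same clock cycle, which is what makes filters with hundreds of taps so practical there.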

In a computer you usually don’t have many parallel processing paths available, but an FPGA can be easily programmed to do hundreds of operations in parallel.  An FIR filter with hundreds upon hundreds of taps can therefore be realized very efficiently indeed in an FPGA.  Additionally, FIR filters are stable when implemented using integer arithmetic, something that will rapidly trip up an IIR filter.  The ability to use integer arithmetic is something else that can be quite significant in an FPGA and less so in a computer.  ‘Efficiency’ is critically important in the majority of audio applications, which have to operate in ‘real time’ and will fail if they run more slowly than the music which they are processing!

For all of those reasons, and also because the design tools available for IIR filters are generally a better match to the performance characteristics we are usually looking for, here at BitPerfect our preference is inevitably for an IIR filter.  And the filter we choose is one whose performance characteristics are the closest to what we need, regardless of what the impulse response might look like.  The filter we will offer is always the one which sounds best to us, rather than the one whose impulse response meets some perceived aesthetic ideal.