BitPerfect: January 2014

Monday 27 January 2014

… Who is The Fairest of Them All?

I left you with the conundrum of why a reflection in the mirror is reflected horizontally and not vertically. This is particularly puzzling when you consider that if you lie down in front of the mirror, the direction which was left-right when standing up becomes up-down when lying down, and for some bizarre reason is not inverted when lying down. And the up-down, which is not inverted when standing up, becomes the left-right when lying down and is inverted again! What the..?….

Suppose you and your identical twin stand face to face. If I ask both of you to point to your left, each will point to a different side. If I asked you both to point forwards, you would point at each other, again in opposite directions. On the other hand, if I had asked both of you to point West, or North, you would both point in the same direction. By stipulating Left/Right and Forward/Backward, I am stipulating a frame of reference which is different for both you and your twin. Does this help? No, not really. But I will come back to it.

Time to consider the optics of the problem. What we see in the mirror is what we call a “Virtual Image”. This means that the light rays that reach our eyes appear to have come from a particular place - behind the mirror - whereas in fact they did not. They reflected off the mirror. Our eyes (in fact anything in the world that detects light) can only detect the fact that light has impinged upon them. They cannot tell the direction from which the light came, and they certainly cannot discern the path it took along the way, like we can with a tennis ball for example. We can only infer these things. So when we look into a mirror, we see not light rays bouncing off its surface, but an entirely false “Virtual” image of ourselves standing behind it. This is very convenient when it comes to combing our hair.

In fact what you see in the mirror is an entire “Virtual World”. A reflection of You and everything else around you. It is this Virtual World which has the apparent property of being inverted horizontally and not vertically. In this case, to understand the image of the Virtual World, we need to start with a good old-fashioned photographic slide, the kind you can put in a projector, or hold up to the light and squint at. Those of you old enough to have loaded a stack of slides into a projector will know that (apart from getting them the right way up) there is a right and a wrong orientation. If you get it wrong, the slide will come out as a mirror image. The “Virtual World” in the mirror is just like that old picture slide. The real image is oriented a certain way, but the virtual image is oriented differently. In order to see the virtual image, we simply flip the picture slide over and look at it from the other side.

If we flip the picture slide vertically, we can see that what we have is a top-to-bottom mirror-image of the original. Left is on the left, and right is on the right. Now, by the simple expedient of rotating the image 180 degrees clockwise to bring the bottom of the image to the top, we find that the resultant image is now a left-to-right mirror-image of the original. You really need to try this for yourself sometime. And that is the hidden truth in all this. We have drawn back the curtain and revealed Professor Oz in all his glory. The same image is simultaneously a top-to-bottom OR a left-to-right mirror image of the original, depending only on how you look at it. We call this an image with "inverted parity".
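If you don't have a slide to hand, the same experiment can be run in a few lines of Python. This is only an illustrative sketch: the little grid of letters stands in for the slide, and the helper names (flip_vertical, flip_horizontal, rotate_180) are made up for the example.

```python
# A tiny asymmetric "slide": each letter marks a distinct position.
slide = [
    ["a", "b", "c"],
    ["d", "e", "f"],
]

def flip_vertical(img):
    """Top-to-bottom mirror image: reverse the order of the rows."""
    return [row[:] for row in reversed(img)]

def flip_horizontal(img):
    """Left-to-right mirror image: reverse each row."""
    return [list(reversed(row)) for row in img]

def rotate_180(img):
    """Half-turn in the plane of the image: reverse rows and columns."""
    return [list(reversed(row)) for row in reversed(img)]

# Flipping top-to-bottom and then turning the slide through 180 degrees
# gives exactly the left-to-right mirror image.
print(rotate_180(flip_vertical(slide)) == flip_horizontal(slide))  # True
```

The two mirror images differ only by a rotation, which is another way of saying they are the same inverted-parity image looked at two different ways.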

Looking at a Virtual World in the mirror is the exact same thing. The Virtual You in the mirror lives in a Virtual World, but, like the picture slide, we are in effect seeing it from “behind”, or in "inverted parity", and therefore flipped side-to-side. Or top-to-bottom, if we just lie down.

Apart from being able to comb your hair, this has consequences that you may not have thought of. How many of you know someone - usually a wife - who complains about how she never looks good in pictures? I know mine does. As a result, their appearances in the family album are inevitably few and far between. Most people have an asymmetric face. The left half differs from the right half. Occasionally the differences are subtle, but usually they are quite marked. As an individual, for the most part, the image you recognize as being that of your face is the one you see looking back at you from the mirror. It is, of course, a “mirror image” of your real face. Since your facial features are asymmetric, it is a different image from how everyone else in the world sees your face. Everyone else only sees your face as it really is. You only see the mirror image. So when you see a photograph of yourself, suddenly you see it as everyone else sees it, but this is not how you are used to seeing yourself. It is a mirror-image of what you have become accustomed to believing you look like. And, quite often, it looks plain wrong. As I write this, I suddenly recall a vivid memory of being a small child, and wondering why my mother always pulled an odd face whenever she looked in the mirror!

Here is a trick to try on the “non-photogenic” wife. Print out a picture of her which has been flipped horizontally, but don’t tell her what you’re doing, and see what she thinks!

I need to put this post to bed with a further observation. The “mirror-image” nature of the image in the mirror is the consequence of light being reflected off the surface of the mirror. Mathematically, the reflection off the mirror’s surface flips the “parity” of the virtual image. This parity has two states, “normal” and “inverted”. By reflecting off the mirror’s surface, the parity of the virtual image flips to the “inverted” state. Suppose we then reflect the light off a second surface. This should flip the parity of the image back to “normal” again. How might that work? Here’s how we can test this. We take two mirrors and join them together at exactly 90 degrees. Hopefully, where the two mirrors meet will be as smooth a join as possible. We now stand in front of the pair of mirrors, and gaze directly into the “Vee” of the join. Light reflecting off the mirror will actually have two reflections, once off each surface of the compound mirror. What will we see? Guesses, anyone?…

Once again, we see ourselves in the mirror. Except this time the image is not a “mirror-image”. If I stick my arm out and point left, the fellow in the mirror sticks his opposite arm out and points right. Indeed, he points to HIS left. If I hold up a newspaper, the fellow in the mirror also holds up a newspaper and all the newsprint on it is normally aligned and perfectly readable. This is just like where we started off, with me and my identical twin standing face-to-face. The only annoying thing is that there is a line that insists on interrupting the image and it insists on going right between the eyes of the fellow in the mirror. It is, of course, the artifact of the less-than-perfect joint between the two mirrors. What you see in this mirror is your face exactly as it looks to other people (OK, with an annoying line down the middle). Or how it looks in photographs. This is an example of a non-inverting mirror. You can often get this effect in the elevators of high-end olde-worlde European hotels.
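For anyone who prefers arithmetic to mirrors, here is a minimal sketch of the parity bookkeeping. It assumes two idealised mirrors lying in the planes x = 0 and z = 0, with y pointing up; the coordinate choice and the function names are mine, purely for illustration.

```python
def reflect_x(p):
    """Reflect a point in a mirror lying in the plane x = 0."""
    x, y, z = p
    return (-x, y, z)

def reflect_z(p):
    """Reflect a point in a mirror lying in the plane z = 0."""
    x, y, z = p
    return (x, y, -z)

def handedness(ex, ey, ez):
    """Scalar triple product ex . (ey x ez): +1 is normal parity, -1 is inverted."""
    cx = ey[1] * ez[2] - ey[2] * ez[1]
    cy = ey[2] * ez[0] - ey[0] * ez[2]
    cz = ey[0] * ez[1] - ey[1] * ez[0]
    return ex[0] * cx + ex[1] * cy + ex[2] * cz

axes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

one_bounce = [reflect_x(a) for a in axes]
two_bounces = [reflect_z(reflect_x(a)) for a in axes]

print(handedness(*one_bounce))   # -1: a single mirror inverts parity
print(handedness(*two_bounces))  # 1: the second reflection restores it
```

Notice that the double reflection sends (x, y, z) to (-x, y, -z). Up stays up, and the result is simply a half-turn about the vertical axis, which is exactly what your identical twin does when he turns round to face you.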

So why are non-inverting mirrors not at all popular? Well, you might want to try combing your hair in one….

Saturday 25 January 2014

Mirror, mirror, on the wall...

Here's a quick challenge for you. When you look at something in a mirror, why is the image always reversed left-to-right, and never top-to-bottom? Answer on Monday...

Friday 24 January 2014

iTunes 11.1.4

We have been using the latest update to iTunes (11.1.4) and there don't appear to be any obvious compatibility issues.

Tuesday 21 January 2014

DSD is coming … yes it is!

We have been talking for a long time about bringing DSD playback capability to BitPerfect, and yet nothing has been forthcoming. What’s going on here? How hard can it be? Time for a little bit of an update.

As long ago as January 2013 at CES in Las Vegas, a special version of BitPerfect was playing DSD in one of the high-end rooms in the Venetian, all controlled through a nifty little iPad App. In fact, Tim spent most of the show sitting in the shadows of the demo room, laptop on lap, coding refinements in close to real time. Anybody who saw him probably assumed he was being consumed by his twitter feed.

Playing DSD is really not at all hard. Once you have the file specification it is just a matter of reading the file, formatting the data the correct way, and sending it off to the DAC. Job done. So how come a year's gone by, and still no DSD support in the product?

One reason lies in the fact that iTunes itself does not support DSD, and so there is no way to import DSD files into iTunes. Below, I will discuss what we have been doing on that front.

Another reason relates to the single biggest drawback of DSD as we have it today. Within the Mac ecosystem, the only way to play DSD is using a format called DoP, which stands for DSD over PCM. I really wish we had been involved in the genesis of this “Standard” because it is fundamentally flawed. The way DoP works is it takes a DSD data stream and dresses it up to look like a 24-bit 176.4kHz PCM data stream, fooling the DAC in the process. Any DAC/Computer interface that supports 24/176.4 PCM will be able to accept this data stream. But although DoP looks like 24/176.4 PCM, if you tried to play it it would sound like white noise.

DoP works by packing the DSD data, 16 bits at a time, into the 16 least significant bits of a 24-bit PCM data field. The remaining 8 bits comprise a special bit pattern. When a DoP-compliant DAC receives 24/176.4 data and sees that each data field has the same special bit pattern in its most significant 8 bits, it knows it is receiving DSD data. It is then the DAC’s job to route that data through its DSD processor, and not its PCM processor. If the data should inadvertently end up in its PCM processor, then the result is white noise at a level of approximately -30dB.
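To make the packing concrete, here is a minimal sketch in Python. Two things in it are assumptions rather than anything stated in this post: the marker bytes (0x05 and 0xFA, alternating from one sample to the next) are taken from the published DoP 1.1 open standard, and the function names are made up for the example.

```python
# Marker bytes are an assumption taken from the DoP 1.1 open standard,
# which alternates 0x05 and 0xFA on successive samples.
DOP_MARKERS = (0x05, 0xFA)

def pack_dop(dsd_bytes):
    """Pack one channel of DSD data (oldest bit first) into 24-bit DoP words."""
    if len(dsd_bytes) % 2:
        raise ValueError("need an even number of DSD bytes (16 bits per word)")
    words = []
    for i in range(0, len(dsd_bytes), 2):
        marker = DOP_MARKERS[(i // 2) % 2]
        payload = (dsd_bytes[i] << 8) | dsd_bytes[i + 1]   # 16 DSD bits
        words.append((marker << 16) | payload)             # marker in top 8 bits
    return words

def looks_like_dop(words):
    """Roughly what a DoP-aware DAC checks: is the marker in every top byte?"""
    return all((w >> 16) == DOP_MARKERS[n % 2] for n, w in enumerate(words))

words = pack_dop(bytes([0x69, 0x96, 0xA5, 0x5A]))
print([hex(w) for w in words])   # ['0x56996', '0xfaa55a']
print(looks_like_dop(words))     # True

# 16 DSD bits per 176.4 kHz sample matches the DSD64 bit rate (64 x 44.1 kHz).
print(176_400 * 16 == 64 * 44_100)   # True
```

The last line is the reason 24/176.4 was chosen as the carrier: sixteen bits per sample at 176.4kHz is exactly the DSD64 bit rate.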

Why, then, is this flawed? Because ultimately it is going to confuse the bejeezus out of everyday consumers. It will confuse them because if they want to use DSD they will have to familiarize themselves with at least the basics of what DoP is about. Consumers are able to deal with the existing bewildering array of PCM formats because they don’t have to understand anything about it if they don’t want to. If you connect your DAC to your computer using USB, the DAC and computer talk to each other and establish the various formats that the other supports. That way, if you try to play a 24/192 track, and the DAC turns out not to support it, the computer knows to convert it to a format the DAC can play, and, generally speaking, does so automatically and without further ado. If the consumer does not know what he is doing, then, in a worst case scenario he is still listening to music, just not necessarily in the format he maybe thinks he is. If he didn't know the difference in the first place, that is unlikely to be all that big of a problem to him. At BitPerfect, we term this fail-safe. If the consumer uses BitPerfect and asks to play a file in a format the DAC does not support, we instead deliver the desired sound in a different format which the DAC does support, and we do that without any further intervention from the customer. And if the customer cares to know, we make sure we tell him what it is we’re doing.

BitPerfect can deliver fail-safe PCM performance because it is able to make a list of every last possible format which the output device can support. If you look into BitPerfect’s “Device Info Report” you will see all those formats listed, albeit somewhat cryptically. It can do this because the Mac can communicate with the DAC and receive all that information from it. But with DSD, if the DAC happens to support DoP, then it actually has no way to report that particular capability to the Mac! Therefore BitPerfect, in turn, has no way of knowing whether the DAC it is playing through can play DSD files! We have therefore designed the forthcoming version of BitPerfect to have a “Master DSD Enable” checkbox. Only if you check that box will you have access to the DSD-playback capabilities of BitPerfect. If you don’t have a DSD-compatible DAC there is absolutely no reason whatsoever to check the check box.
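The decision logic described above might be sketched as follows. To be clear, this is not BitPerfect's code. The function name, the representation of formats as (bits, rate) pairs, the "nearest supported rate" fallback policy, and the restriction to DSD64 carried in 24/176.4 are all assumptions made for the sake of a short example.

```python
def choose_output(track, device_pcm_formats, dsd_enabled_by_user):
    """Return the format to send to the DAC for a given track.

    track: ("pcm", bits, rate_hz) or ("dsd", 1, rate_hz)
    device_pcm_formats: set of (bits, rate_hz) pairs the DAC reports
    dsd_enabled_by_user: the DAC cannot report DoP support, so the user must
    """
    kind, bits, rate = track
    if kind == "dsd":
        if dsd_enabled_by_user and (24, 176_400) in device_pcm_formats:
            return ("dop", 24, 176_400)
        # Fail-safe for DSD: play a PCM conversion of the track instead.
        bits, rate = 24, 176_400
    if (bits, rate) in device_pcm_formats:
        return ("pcm", bits, rate)
    # Fall back to the supported format closest to what was asked for.
    best = min(device_pcm_formats,
               key=lambda f: (abs(f[1] - rate), abs(f[0] - bits)))
    return ("pcm",) + best

dac = {(16, 44_100), (24, 44_100), (24, 96_000)}
print(choose_output(("pcm", 24, 192_000), dac, False))   # ('pcm', 24, 96000)
```

The point to notice is that the PCM branch is driven entirely by what the DAC reports, whereas the DSD branch hangs on a flag that only the user can set.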

This is why we think DoP is flawed. It is a fundamental problem, and not just a BitPerfect problem. When any Music Player sends a DoP data stream to a DAC, it has no way of knowing whether the result will be music or white noise. The Music Player relies entirely on the User taking care to tell it whether or not each connected audio output device supports DSD. It also relies on the User knowing what he is doing when he makes these settings in the Music Player. Computer users are generally predisposed to want to toggle any settings they may come across, without the first clue what any of them mean, and then to have the temerity to blame someone else because it didn’t do what they wanted it to do. Reading the manual is rarely accepted as a valid pre-requisite. DoP is only a great idea if you assume the user has read - and understood - the manual.

There are other inconveniences involved in adding DSD to your music collection. Typically, a User who has a DSD-compatible DAC only has one such item. They tend to be quite expensive, after all. But most computer-based audio setups tend to have an element of whole-house networking associated with them, and they can play music through different DACs in different rooms, sometimes with multiple computers sharing a single database. If you buy an album in DSD format, you can then find yourself in the situation where it will only be playable through one DAC in the whole house. The solution is to make PCM copies of your DSD albums. There are a couple of Apps I know of that will do this. Korg Audiogate is free, but forces you to tweet everything you convert(!!!), and its PCM conversions are of less than stellar quality. Weiss Saracon makes better quality conversions, but is priced to make you whistle hard enough to blow the enamel off your teeth. Fear not, though, BitPerfect has a solution in the works.

Even with high quality PCM conversions, you will still have an awkward situation. Your music library will have both DSD and PCM copies of your music. Everyone in the house therefore will need to know (i) which DACs in the house play DSD, and (ii) which tracks in the library are DSD and which are PCM. And it won’t help all that much to simply identify your albums as “Toad The Wet Sprocket - DSD” and “Toad The Wet Sprocket - PCM”. That voice will still come screaming down the stairs at you - “Dad!!... Which of these albums is it I can’t play in my bedroom?????”. Here at BitPerfect we think the most obvious solution is to have a single audio file that plays DSD through a DSD DAC, and PCM through a PCM DAC.

Last summer, we thought we had come up with such a file format. It was apparently very elegant, and seemed to work just fine. It stored the DSD pre-converted into DoP format and encoded it in two channels of a multi-channel file, both labelled as “unused”. The PCM version of the file was stored in channels 1 and 2 (normal usage for L and R). We discovered, though, that some multi-channel receivers would blithely ignore the “unused” flags for the two DoP channels and would proceed to try to play them anyway. This, and other limitations, eventually caused us to abandon the format.

So it was that in the late fall we came up with a better format that we term “Hybrid-DSD”. This format is an ordinary Apple Lossless file format, but the DSD data is hidden cleverly inside it, in a similar manner to the metadata. Being Apple Lossless, it can be imported into iTunes and will play just fine. Any software that supports the Apple Lossless format will be able to read it and play its PCM content, blissfully unaware of the existence of the DSD content. BitPerfect, on the other hand, will recognize the DSD content, and, if the DAC supports it, will play the DSD content, otherwise it will play the PCM content. It is the most user-friendly approach we can think of.

BitPerfect’s implementation of DSD is based on the use of Hybrid-DSD files, since no other DSD file format can be imported into iTunes. But, as discussed above, it is also based on the need for Users to exercise a degree of intelligence, not to mention diligence, in setting up BitPerfect so that only those DACs which are known to support DSD are designated as supporting it. Also, they need to understand such arcane details as whether the DAC supports the different variants of DSD - DSD64 and DSD128. Thankfully, in the current climate, all those who have gone to the trouble and expense of buying a DSD-compatible DAC, and downloading DSD source material, should have little or no trouble with any of this.

The problem BitPerfect will face is with a certain group of Users - you know who you are, but you’re probably not even reading this - who have no clue what DSD is, but will set about enabling it anyway with the full expectation of suddenly being able to play DSD files. And no, these Users have most assuredly NOT read the user manual. These people are the bane of software developers everywhere.

So, right now, we are entering the final stretch of subjecting BitPerfect to every possible line of misuse that we can think of. While playing DSD, for example, BitPerfect absolutely cannot be allowed to perform volume control, sample rate conversion, mixing, dithering, or any other conceivable sort of signal processing. Anything of that nature will turn the DoP data stream from music to white noise. So, we must check on every possible change of state of BitPerfect, and ask whether or not the right checks and balances are in place to prevent something unacceptable from happening. This is more complicated by an order of magnitude than any other feature that we have ever added to BitPerfect. Already this has had significant ramifications. One aspect of our audio engine turned out to be all but incompatible with some of these possible scenarios. We have therefore been forced to accelerate the launch of our next-generation audio engine, which was originally scheduled to be released following the DSD-support release. DSD support will therefore now be launched in conjunction with the new audio engine. At the moment, it is actually the fine details of the new audio engine, rather than the DSD support, which is undergoing last minute debugging. We will be launching this as BitPerfect Version 2.0.
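To see why even the gentlest signal processing is fatal, consider what a humble volume control does to a single DoP word. This is a sketch under stated assumptions: the marker byte 0x05 comes from the published DoP 1.1 standard, not from this post, and the "volume control" is the most naive one imaginable.

```python
DOP_MARKER = 0x05          # assumed marker byte (DoP 1.1 uses 0x05 / 0xFA)
dop_word = (DOP_MARKER << 16) | 0x6996   # 24-bit word carrying 16 DSD bits

def as_signed24(word):
    """Interpret a 24-bit word the way a PCM processor would."""
    return word - (1 << 24) if word & 0x800000 else word

def attenuate(word, gain):
    """Naive digital volume control applied to a 24-bit sample."""
    return int(as_signed24(word) * gain) & 0xFFFFFF

quieter = attenuate(dop_word, 0.5)       # a mere 6 dB volume cut
print(hex(quieter >> 16))                # 0x2: the marker byte is gone
print(hex(quieter & 0xFFFF))             # 0xb4cb: the DSD bits are scrambled too
```

A DoP-aware DAC receiving that word would no longer recognize it as DSD, and the payload would be garbage even if it did.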

We thank you for your patience, and we hope you will find it all worthwhile when we finally get V2.0 into the App Store. When it does, I hope you will all take the opportunity to rush out to your local high-end audio store and audition some DSD-capable playback hardware. I am sure there will be very few of you who are not blown away by what DSD can do. These may still be the days of the early adopters, but it is good to at least get a feel for what you may be missing out on.

UPDATE: BitPerfect 2.0 and DSD Master have now been released. Read about it here.

Friday 17 January 2014

Memory Playback

One of the techniques BitPerfect uses to maximize playback sound quality is to play from RAM memory. Most music players will play directly from the music file. The advantage of that method is that you only read the music data from the file as you need it, and stream it to the audio output device. If you change your mind and decide to play something else, or move playback to a different part of the track, that is easy to accomplish - at least from a programming simplicity perspective - using that method.

BitPerfect instead takes advantage of the phenomenal processing power of a modern computer and pre-loads the music into a RAM buffer. When you want to play the music, you simply pass the “address” of the RAM buffer to the audio output device, and it automatically handles the rest. In order for this to be practical, you must be able to not only read the file, decode it, and load the music data into the buffer, but also perform any manipulation that might be necessary, such as sample rate conversion, all in double-quick time.

So how practical is this? Surprisingly so, it turns out. Here is a peg in the ground taken from the totally basic 2013 Mac Mini I use in our reference system. It has a 2.5GHz Core i5 processor, and 4GB of system RAM. I allocate 512MB of RAM for the audio buffers. For this test I have BitPerfect set to upsample from 44.1kHz to 176.4kHz and (for purposes of full disclosure) I am using the as-yet unreleased version 1.1 of BitPerfect which has a new and improved 64-bit audio engine. Using this system, a piece of music of 1 minute 35 seconds duration, is read from an Apple Lossless file, decoded, upsampled to 176.4kHz, and loaded into its RAM buffer in only 1.9 seconds.
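A quick back-of-envelope check puts those numbers in perspective. The duration, load time and sample rate are the ones quoted above; the channel count and the 4 bytes per sample are assumptions made purely for illustration, since the in-memory sample format is not stated here.

```python
track_seconds = 1 * 60 + 35      # 1 minute 35 seconds
load_seconds = 1.9               # measured load time quoted above
sample_rate = 176_400            # after upsampling from 44.1 kHz

# Assumptions for illustration only: stereo, 4 bytes per sample in RAM.
channels = 2
bytes_per_sample = 4

speedup = track_seconds / load_seconds
buffer_mb = track_seconds * sample_rate * channels * bytes_per_sample / 1e6

print(f"load runs at {speedup:.0f}x real time")      # load runs at 50x real time
print(f"buffer needed: about {buffer_mb:.0f} MB")    # buffer needed: about 134 MB
```

In other words, under those assumptions the whole track fits comfortably inside the 512MB allocation, and it is loaded some fifty times faster than it plays.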

There are many advantages to doing it this way, and one of those is the ability to minimize CPU time during playback. In the early days of BitPerfect we discovered that reducing the CPU time tended to have an improving effect on sound quality. More recently we have established that it is only certain kinds of CPU activity that have this effect, but nonetheless, minimizing “undesirable” CPU load is one of BitPerfect’s key goals in improving playback quality.

BitPerfect is able to start playing the track as soon as you start loading it into the buffer, so playback commences pretty much instantly, and the loading of the track continues during the first 1.9 seconds of playback. Beyond that, BitPerfect requires CPU cycles only for the direct management of playback. In principle, then, there exists the possibility that the sound quality of playback is degraded slightly by the additional CPU load during those first 1.9 seconds, but frankly, we know of nobody who has been able to detect this audibly. Certainly, we can’t. However, for the remaining 1 minute 33 seconds, playback will be at BitPerfect’s highest caliber.

What happens if the track is too big to fit into the buffer? No problem. The RAM that you allocate in BitPerfect’s Preferences menu is actually equally divided into two buffers. If the whole track does not fit into the first buffer, the remainder is put into the second buffer. As soon as the contents of the first buffer finish playing, playback is instantaneously switched to the second buffer. The switching occupies only a tiny fraction of the time interval between consecutive samples, so the audio output device does not even know it is happening. And if the track still does not fit into two buffers, then as soon as the first buffer finishes playing and playback switches to the second buffer, the remaining unplayed content is loaded into the first buffer. BitPerfect can do this ad infinitum, in effect maintaining the two buffers as a “wash and wear” pair, with one in use and the other containing whatever is up next.
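The “wash and wear” scheme is what programmers usually call double buffering, and the idea can be sketched in a dozen lines. This is a toy model, not BitPerfect's audio engine: the function name is hypothetical, "playing" is just yielding samples, and the refill happens in sequence whereas a real player would do it concurrently while the other buffer plays.

```python
def play_with_two_buffers(track, buffer_size):
    """Yield samples in order while refilling whichever buffer is idle.

    track: a sequence of samples; buffer_size: capacity of EACH buffer.
    """
    chunks = (track[i:i + buffer_size] for i in range(0, len(track), buffer_size))
    buffers = [next(chunks, None), next(chunks, None)]   # pre-load both halves
    active = 0
    while buffers[active] is not None:
        yield from buffers[active]             # "play" the active buffer
        buffers[active] = next(chunks, None)   # refill it with what is up next
        active = 1 - active                    # switch to the other buffer

track = list(range(10))
print(list(play_with_two_buffers(track, 3)) == track)   # True
```

However long the track, only two buffers' worth of memory is ever in use, and the samples come out in exactly the original order.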

This has important ramifications for those who like to claim that lossless music encoded in different formats, such as WAV, AIFF, FLAC, and Apple Lossless, all sound different. Of those four formats, the third and fourth are losslessly compressed. Like unzipping a ZIP file, they need to be decoded after opening before they can be played. But after decoding, their contents are absolutely identical (“bit perfect”, if you like) to the contents of the uncompressed WAV or AIFF file. I suppose there is room for playback of a losslessly compressed file to sound different to a WAV/AIFF file if you make the assumption that the additional processing involved in decoding the file results in some kind of audible degradation. But either way, if you are playing those files using BitPerfect, then once the 1.9 second buffer load is over, there is nothing left that would be in any way different depending on whether the music data had originally been extracted from a WAV, AIFF, FLAC, or Apple Lossless file. Absolutely nothing whatsoever.
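The ZIP analogy is easy to demonstrate. In the sketch below, Python's zlib module stands in for a lossless audio codec purely as an analogy (it is not FLAC or Apple Lossless), and the "music" is a handful of made-up 16-bit sample values.

```python
import struct
import zlib

# Fake 16-bit PCM samples standing in for the contents of a WAV/AIFF file.
samples = [0, 1200, -1200, 32767, -32768, 42] * 1000
uncompressed = struct.pack(f"<{len(samples)}h", *samples)

# zlib stands in for a lossless audio codec, purely as an analogy.
compressed = zlib.compress(uncompressed)
decoded = zlib.decompress(compressed)

print(len(compressed) < len(uncompressed))   # True: the file is smaller
print(decoded == uncompressed)               # True: the samples are bit-identical
```

Once the decoded bytes are sitting in a RAM buffer, nothing about them records which container they came out of.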

The other take-away from this is that it is not really necessary to allocate a particularly large amount of system RAM to the audio buffer. We used to think that by pre-loading the whole track into memory it would sound better than if it was loaded chunk by chunk into a pair of wash-and-wear buffers. The additional CPU cycles involved in loading and switching ought to be degrading the sound. Over time, though, it has not turned out that way. We find that setting a larger audio buffer size seems to imbue the system with no audible improvements that we can reliably observe. I have for some time now left the audio buffer size on my reference system set to either 256MB or 512MB, and it seems to sound just fine to me. Tim does the same - and he listens on Stax SR009s.

All this is good news, because OS X Mavericks seems to appreciate having lots of system RAM to play with.

Thursday 16 January 2014

Building Teams

I spent the latter half of my career as an entrepreneur, building two venture capital backed technology corporations. These are proper, bricks-and-mortar, hardware-based companies, engaged in the development of real-world, cutting edge products, requiring R&D and significant up-front investment in materials, equipment and people. Not software-based businesses like BitPerfect. Both those companies are still operating today, which is not so bad, I guess.

Building companies such as these is largely about building teams. Sure, the technological smarts that underlie what you do are the fundamental elements, but the success of the enterprise rests firmly on the shoulders of the team you put in place. They are the ones who do all the real work. Building an effective team, with limited available time, and having that team execute a complex and challenging mission, can be a source of great personal pride, not only for those who build it, but for the team members who accomplish those goals.

One gentleman, an investor in, and director of, one of my companies, had a lot to say about building effective teams. He liked to address the employees and tell them how his greatest pleasure in life was working with people and building teams. He was a great motivational speaker in that sense. I and my co-founder arranged to spend some quality time with him so as to benefit from his knowledge and insights, so we had a nice, long dinner together one evening. Over dinner he expanded on his thoughts about effective teams and team building. In every team, he said, 10% of the team are over-achievers and another 10% are under-achievers. Team building, in his view, is about identifying and continuously replacing the under-achievers. Sounds sensible, in a Johnny Appleseed kind of way.

But what happens after you remove the under-achieving 10%, assuming that you have the wherewithal to be able to identify and attract suitable higher-achievers to replace them with? You will have, on balance, an over-achieving team, no? Not in his view. It turns out that you still have an under-achieving 10%. In his view, he defines the bottom 10% as inherently under-achievers, who need to be replaced. By continuously following this strategy, your team gets continuously better, no matter that the same performance which categorizes an employee as an over-achiever one year, may see her categorized as an under-achiever soon afterward, and shown the door. This, apparently, was what he enjoyed when it came to working with people and building teams. Not surprisingly, he was a big college football fan.

This “giving 150%” type-A personality is something you hear being touted a lot these days as the prototype and paragon of a successful business person and team leader. Particularly, if I might put it this way, in American corporations. It is a philosophy that allows arrogant, in-your-face, all-action, breakfast-meeting types to move up rapidly in an organization that doesn’t have an intelligent hand on the wheel. The core theme of this philosophy is that the problems of the team all boil down to the inadequacies of its weakest members, and that as team leader you will best do your job by continuously and ruthlessly rooting them out. It is a very useful philosophy, because nothing’s ever your fault. Let's schedule a breakfast meeting and we can discuss it further.

The fact of the matter is that sometimes people you hire don’t turn out to be who you thought they were. A good leader will seek to ferret those people out within their first six months, and will be justifiably ruthless in letting them go if they look like they are not going to work out. Just bear in mind that unless the person has lied on their resume, the hiring was your bad, not theirs. Hey, not all of us are perfect, and we are all bound to make some bad hires from time to time. Hire enough bad people, though, and you need to start looking at your own recruitment methods and skills. But once a new hire has passed his probationary period, your expectation will be for him to make the mandated contribution to your project.

At this point it is a mistake to imagine that your job then becomes one of making sure your team members continue to get their work done. That is the difference between a manager and a leader. A leader is not satisfied merely by achieving objectives A, B, and C, on time and on budget. A leader will at the same time seek to continuously develop his employees into better and more useful resources. Instead of continuously measuring up your staff for the purpose of weeding out the bottom 10%, find out what is going wrong and fix it. Teach them how to be top-10% contributors. If they grow their skills sufficiently you will have the pleasant task of promoting them, or, if there isn’t an opening, the satisfaction of seeing them advance their careers in an excellent position elsewhere. Don’t be afraid of losing employees that have outgrown your ability to satisfy ambitions that you have nurtured. Pay it forward.

A wise man once told me that the most valuable thing you can do as a manager is to groom a steady stream of subordinates fully capable of replacing you. Effective managers don’t grow on trees. Far from making themselves vulnerable to being replaced by one of their own proteges, managers who are able to make managers are a particularly valuable commodity.

Wednesday 15 January 2014

“Bit Perfect”

Here at BitPerfect we unabashedly took our name from the term of art “bit perfect” which, in the early days of computer audio, was considered to be the most important attribute a computer-based audio system needed to have in order to sound good. Today, we realize that being “bit perfect”, whilst a laudable objective, is neither a requirement for great sound, nor a guarantee of achieving it. Time to talk about what “bit perfect” means.

Your computer-based music collection comprises a bunch of music files. Each of these contains a digital representation of the music - basically a whole bunch of numbers. The job of the audio playback software is to read those numbers out of the file and send them to an audio output device whose function is to turn them into music. The theory of “bit perfect” playback is that if the numbers that reach the audio output device are the exact same numbers as those in the music file, then there is not much more that the computer and its software can do to improve upon the resultant sound quality.

Let's stick with that notion for a while.

Why did people have any concerns that the computer might do anything different in the first place? The answer is that no computer was ever designed to be first and foremost an audiophile-grade music source. Computers generate all sorts of sounds, including warning beeps and notification chimes, as well as proper audio content. These audio events may be generated in different audio formats (after all, why generate a “beep” in CD-quality 16/44.1?). The role of the audio subsystem in a computer is to manage all of these audio sources in as seamless a manner as possible. Additionally, it often provides “added value” functionality, such as equalizers, stereo image enhancers, and loudness control. Audio signals, from whatever source, are fed into the audio subsystem and are routed through all of this signal processing functionality. Sometimes, in order to meet objectives deemed preferable by the system’s designers, there may be additional functionality such as sample rate conversion. The upshot of all this is that, in many cases, the bits that reach the audio output device are no longer the same as those that were in the original music file. The modifications made along the way as often as not degrade the sound. So a very good first step would be to establish a “bit perfect” playback chain as a baseline and move on from there.

But are all departures from “bit perfect”-ness destructive to sound quality? It turns out that no, they are not. Let’s look at what happens when you manipulate a signal.

Digital volume control is an obvious example. In a previous post I set out some thoughts on the subject. Basically, every 6dB of attenuation results in the loss of one bit of audio resolution, so it would make sense that digital volume control results in a de facto loss of sound quality. But if volume control is performed in the analog domain, the analog signals encoded by the ‘lost’ bit (the Least Significant Bit, or LSB) are themselves attenuated, pushed further down into the amplifier’s background noise. If the sounds encoded by the LSB lie below the background noise level of the amplifier, they should not be audible. With 16-bit audio, the noise floor of the best amplifiers can lie below the SNR floor of the encoded signal, so it is arguable that the resolution loss introduced by digital volume control can be audible - certainly measurable - but with 24-bit audio the SNR of the encoded signal is always swamped by amplifier noise, and so should never be audible. However, this argument assumes that analog-domain volume control has zero audible impact, and most (but not all) audio designers accept that this is not the case.
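To put a number on the “6dB per bit” rule of thumb, here is a minimal Python sketch. The function names are made up for illustration, and the code assumes plain integer samples with simple rounding, which is not necessarily how any particular player does it.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # halving the amplitude is about 6.02 dB


def bits_lost(attenuation_db: float) -> float:
    """Bits of resolution given up by a digital attenuation of attenuation_db."""
    return attenuation_db / DB_PER_BIT


def attenuate(sample: int, attenuation_db: float) -> int:
    """Scale an integer sample down and round it back to an integer."""
    gain = 10 ** (-attenuation_db / 20)
    return round(sample * gain)


for db in (6, 12, 24):
    print(db, "dB costs about", round(bits_lost(db), 2), "bits")
```

After 24dB of attenuation a 16-bit sample has only about 12 bits’ worth of distinct values left to work with, which is the resolution loss described above.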

Beyond bit reduction, digital volume control involves recalculating the signal level at every sample point. The answer spit out by the algorithm may not be an exact 16-bit (or 24-bit) value, and so a quantization step is inevitably introduced, and a further quantization error encoded into the audio data stream. As pointed out in another of my previous posts, quantization error can be rendered virtually inaudible - and certainly less objectionable - by the judicious use of dither. Most audio authorities agree that quantization noise and dither can be audible on 16-bit audio but that dither is way less objectionable than undithered quantization error. Both are generally held to be inaudible with 24-bit data. Therefore digital volume control with 16-bit data is normally best performed with dither.
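The effect of dither on the quantization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the dither is a simple triangular (TPDF) noise of plus or minus one LSB, and real products may shape their dither quite differently.

```python
import math
import random


def scale_plain(samples, gain):
    """Apply gain and round to the nearest integer, with no dither."""
    return [math.floor(s * gain + 0.5) for s in samples]


def scale_dithered(samples, gain, rng):
    """Apply gain, add triangular dither of +/-1 LSB, then round."""
    out = []
    for s in samples:
        dither = rng.random() - rng.random()  # triangular distribution
        out.append(math.floor(s * gain + dither + 0.5))
    return out


rng = random.Random(1)
quiet = [3] * 20000  # a very quiet, constant signal
plain = scale_plain(quiet, 0.1)        # ideal value is 0.3
dithered = scale_dithered(quiet, 0.1, rng)
print("undithered average:", sum(plain) / len(plain))
print("dithered average:  ", sum(dithered) / len(dithered))
```

Without dither the quiet signal is rounded away to nothing. With dither each individual sample is noisier, but the average stays close to the ideal value of 0.3, so the signal survives as information buried in a benign noise.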

So the volume control method with the least deleterious effect on sound will be the one that sounds best. Certainly, in every experiment performed by my own ears, digital volume control has proven sonically superior to analog with 24-bit source material. Clearly a digitally-attenuated audio stream is inherently not “bit perfect”, so here is one example where “bit perfect”-ness may not be an a priori requirement for optimal sound quality.

Musical data consists, typically, of a bunch of 16-bit (or 24-bit) numbers. This means that they take on whole number values only, between zero and 65,535 (or 16,777,215). Signal processing - of any description - involves mathematical processing of those numbers. Let’s consider something simple, such as adding two 16-bit numbers together. Suppose both of those 16-bit numbers are the maximum possible value of 65,535. The answer will be 131,070. But we cannot store that as a 16-bit integer whose maximum value is 65,535! This is a fundamental problem. The sum of two 16-bit numbers produces an answer which is a 17-bit number. In general, the sum of two N-bit numbers produces an answer which is a (N+1)-bit number. The situation is more alarming if we consider multiplication. The product of two 16-bit numbers is a 32-bit number - more generally the product of two N-bit numbers is a (2N)-bit number. So if you want to do arithmetic using integer data, you need to take special measures to account for these difficulties.
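The bit-growth arithmetic is easy to check for yourself. Python integers never overflow, so the sketch below (variable and function names are illustrative only) counts how many bits each result needs and shows what would be left if the sum were squeezed back into 16 bits.

```python
def fits_in_bits(value: int, bits: int) -> bool:
    """True if a non-negative value can be stored in the given number of bits."""
    return 0 <= value < 2 ** bits


max16 = 2 ** 16 - 1            # 65,535, the largest unsigned 16-bit value
total = max16 + max16          # 131,070
product = max16 * max16        # 4,294,836,225
wrapped = total & 0xFFFF       # what survives if only 16 bits are kept

print(total, "needs", total.bit_length(), "bits")      # 131070 needs 17 bits
print(product, "needs", product.bit_length(), "bits")  # 4294836225 needs 32 bits
print("sum truncated to 16 bits:", wrapped)            # 65534
```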

Generally, signal processing becomes easier if you transform your integers into floating point numbers. I apologize if the next bit gets too heavy, but I put it into a paragraph of its own that you can skip if you prefer.

Floating point numbers come in two parts, a magnitude and a scaling factor. In a 32-bit floating point number, the magnitude occupies 24 of the bits and the scaling factor 8 bits. The magnitude ranges between roughly -1 and +1, and the scaling factor is represented by an 8-bit number which ranges between -128 and +127 (the actual scaling factor is 2 raised to the power of the 8-bit number). 32-bit floating point numbers therefore have approximately 7 significant figures of precision, and can represent values as large as 10 raised to the power 37, and as small as 10 raised to the power -37. The value in using 32-bit floating point format is that whatever the value being represented, it is always represented with full 24-bit precision (equivalent to 7 significant figures in decimal notation) across nearly 70 orders of magnitude of dynamic range. The down-side is that if you instead devoted all 32 bits to an integer representation you would have the equivalent of 10 significant figures of precision, but with no dynamic range at all. By using 64-bit floating-point numbers the benefits get even greater - the precision is equivalent to 15 significant figures (53 bits), and the dynamic range is for all practical signal processing purposes unlimited.
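For those who did not skip that paragraph, the 24-bit precision of a 32-bit float can be seen directly. The sketch below uses Python’s standard struct module to round a value to 32-bit float and back; the helper name to_float32 is made up for this example.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Whole numbers are exact up to 2**24, and then the gaps begin.
print(to_float32(16777216.0))   # 16777216.0
print(to_float32(16777217.0))   # 16777216.0, the extra 1 is lost

# The relative precision stays the same however small the value is.
tiny = 1.2345678e-30
print(abs(to_float32(tiny) / tiny - 1))
```

The last line prints a relative error of well under one part in ten million, which is the “7 significant figures” in action at a very small magnitude.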

Why is any of this important? Well, generally speaking, depending on what type of manipulations you want to do on the audio data, it might not be. Volume control, for example, can be accomplished just as effectively (if, admittedly, less conveniently) on integer data as on float data. With more advanced manipulations, however, such as Sample Rate Conversion, the benefits of floating point begin to emerge. One calculation that often arises in signal processing is to calculate the difference between two numbers, and multiply the result by a third number. Where this type of calculation trips up the unwary is when the first two numbers are nearly identical, and the third number is very large. This can create what we term Data Precision Errors. I will illustrate this problem using an example that employs three 16-bit numbers:

First of all, I will take two 16-bit numbers, 14,303 and 14,301, and take the difference between them. The answer is 2. I will then multiply that by a third 16-bit number, 7,000. The answer is 14,000. Seems straightforward, no? Well, the difference answer I got was 2, and this answer has a precision of just one significant figure. In other words, the answer could have been 1, 2, or 3, but it could never have been 1.09929. Consequently, when I multiplied my difference by 7,000 the result could have been 7,000, 14,000, or 21,000. It could never have been 7,695.03 for example. Now, if my starting numbers (14,303 and 14,301) were raw data, then there is no further argument. But suppose instead that those numbers were the result of a prior calculation whose outcomes were actually 14,302.58112 and 14,301.48183. What happened was that the five significant figures that should have been after the decimal point got lost because the 16-bit format could not represent them and the results were rounded up or down to the nearest 16-bit integer. The difference between the two, instead of being 2, should have been 1.09929 and the result, when multiplied by 7,000 should have been 7,695.03 instead of 14,000. That error is actually very big indeed. That is the difference between using 16-bit integer format and 64-bit float format (the differences are admittedly exaggerated by my unfair comparison of 16-bit Int to 64-bit Float, but it serves to illustrate the point). In a complicated process like Sample Rate Conversion, these types of processes are performed millions of times per second on the raw audio data, and something as simple as the choice of numerical format can make a big difference in how the outcome is judged qualitatively.
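The same example can be run as a few lines of Python, using the numbers from the paragraph above. The variable names are mine, and Python’s built-in float (a 64-bit float) stands in for the floating point path.

```python
a = 14302.58112   # outcome of a prior calculation
b = 14301.48183   # outcome of another prior calculation
scale = 7000

# Integer path: each intermediate result is rounded to a whole number first.
int_result = (round(a) - round(b)) * scale

# Float path: the digits after the decimal point are carried through.
float_result = (a - b) * scale

print("integer path:", int_result)              # 14000
print("float path:  ", round(float_result, 2))  # 7695.03
```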

The point of all that is that, depending on your DAC, you may get better audio performance by upsampling to a higher sample rate. Self-evidently, upsampled data is no longer “bit perfect”, and the minutiae of how the upsampling is done can and will impact the sound quality of the result.

It is no longer either sufficient or necessary for a computer-based audio playback system to be “bit perfect”. Of course, if you configure it to be truly “bit perfect” then it is unlikely to sound good if it fails to deliver “bit perfect” performance. But as long as the playback software does its job competently, then it should not be a fundamental cause for concern if the result is not strictly “bit perfect”. All you need, really, is for it to be “BitPerfect” :)

Monday 13 January 2014

Lossy and Lossless Compression

There are two uses of the word Compression in the world of digital audio. The first is dynamic compression. This is where we want to increase the volume of a track, but in doing so we make the loudest bits so loud that their signal level is larger than the maximum value the format can encode. Here we would use “dynamic compression” to selectively reduce the gain on those loudest passages so that they fit inside the available headroom. This note is not about dynamic compression. Instead it is all about file compression.

File compression is a process that takes a computer file which takes up a certain number of Megabytes of storage space, and manipulates it so it takes up a lesser number of Megabytes. Ideally, but not necessarily always, this compression is lossless, by which we mean that identical raw data can be extracted from both the original file and the compressed file. There are two reasons for wanting to do this: to reduce the amount of storage space required to store the file, and to reduce the bandwidth required to transmit a file from one place to another within a constrained amount of time.

Most of the time, we find that everyday computer files can be readily compressed. Why is this? In the software world, the format of a file is typically chosen so as to allow the computer to write data to the file, and read data from the file, in an efficient manner. Scant regard is often paid to the resultant efficiency of data storage. An example might be a simple text file. A simple ASCII character set uses only 7 bits to encode it. However, computer files are typically written in chunks of 8-bits, called Bytes. So every time we want to write a character we use up 8 bits of storage when in practice we only needed 7 bits. A simple file compression technique can use this observation to recover the unused storage space and reduce the file size by one eighth. With more complex file structures, a general-purpose strategy is not so obvious. Native music file formats are similarly inefficient.
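The 7-bit trick can be sketched as follows. This is a toy illustration, not any real file format: the function names are invented, and the decoder has to be told how many characters to expect because the last byte may contain padding.

```python
def pack_7bit(text: str) -> bytes:
    """Pack ASCII characters into a byte string using 7 bits per character."""
    bits, nbits, out = 0, 0, bytearray()
    for byte in text.encode("ascii"):       # raises if text is not pure ASCII
        bits = (bits << 7) | byte
        nbits += 7
        while nbits >= 8:                    # emit every complete byte
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1             # keep only the leftover bits
    if nbits:                                # pad the final partial byte
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_7bit(packed: bytes, count: int) -> str:
    """Recover count characters from a byte string made by pack_7bit."""
    bits, nbits, chars = 0, 0, []
    for byte in packed:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 7 and len(chars) < count:
            nbits -= 7
            chars.append(chr((bits >> nbits) & 0x7F))
        bits &= (1 << nbits) - 1
    return "".join(chars)


message = "ABCDEFGH"
packed = pack_7bit(message)
print(len(message), "characters stored in", len(packed), "bytes")
```

Eight characters fit into seven bytes, which is the one-eighth saving mentioned above, and unpacking returns exactly the original text.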

Anybody who has used a zipping program to make a ZIP file to transmit a file over the Internet will be familiar with lossless compression. ZIP is a general-purpose lossless file compression format. Some files, for example Bitmap (BMP) image files will compress very nicely into much smaller ZIP files. On the other hand, files such as JPG images are very seldom reduced at all in file size by zipping. This is because the file format used for BMP files is particularly inefficient, whereas by contrast the file format for JPG files is highly efficient. In principle, any computer file can be reduced in size by a well-chosen lossless compression utility, unless the file format was specified to be efficiently compressed in the first place.
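You can see both behaviours with Python’s standard zlib module, which implements the same family of compression used in ZIP files. The data here is synthetic and only stands in for real files: a highly repetitive block plays the role of an inefficient format like BMP, and random bytes play the role of data that has already been compressed, like a JPG.

```python
import os
import zlib

redundant = b"\x00\x10\x20\x30" * 25_000   # 100,000 very repetitive bytes
random_like = os.urandom(100_000)          # 100,000 bytes with no pattern

small = zlib.compress(redundant)
stubborn = zlib.compress(random_like)

print("repetitive data:", len(redundant), "->", len(small), "bytes")
print("random data:    ", len(random_like), "->", len(stubborn), "bytes")

# Lossless means the original comes back bit for bit.
assert zlib.decompress(small) == redundant
```

The repetitive block shrinks to a tiny fraction of its size, while the random block does not shrink at all (it typically grows by a few bytes).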

In general, the more we know about a file, and about the data that the file contains, the more freedom we can have in selecting an optimum strategy to compress it. With music files there are a number of attributes that can be exploited to effect lossless compression. Here are two of the easier to describe attributes: (i) Because music files encode a waveform, and because the waveform is not totally random (in which case it would be noise, not music), we can use the waveform’s immediate past to predict what its immediate future might look like, and encode instead the differences between the predictions and the actual values. This is used very effectively in many well-known lossless encoders. (ii) Stereo music content is dominated by centred images which contain identical information in the right and left channels. If instead of encoding L and R, we encode L+R and L-R we find we end up with waveforms that are more readily susceptible to other compression methodologies.
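Both ideas can be sketched in Python. This is a deliberately crude illustration with made-up function names: the “prediction” is simply that each sample equals the one before it, whereas real lossless encoders use considerably more elaborate predictors.

```python
def first_order_residual(samples):
    """Predict each sample as equal to the previous one; keep only the error."""
    previous, residual = 0, []
    for s in samples:
        residual.append(s - previous)
        previous = s
    return residual


def restore_from_residual(residual):
    """Undo first_order_residual exactly."""
    previous, samples = 0, []
    for r in residual:
        previous += r
        samples.append(previous)
    return samples


def to_sum_difference(left, right):
    """Encode L and R as L+R and L-R."""
    pairs = list(zip(left, right))
    return [l + r for l, r in pairs], [l - r for l, r in pairs]


def from_sum_difference(total, diff):
    """Recover L and R; (L+R)+(L-R) = 2L is always even, so // is exact."""
    pairs = list(zip(total, diff))
    return [(t + d) // 2 for t, d in pairs], [(t - d) // 2 for t, d in pairs]


wave = [1000, 1010, 1025, 1030, 1028]
print(first_order_residual(wave))          # [1000, 10, 15, 5, -2]
print(to_sum_difference(wave, wave)[1])    # [0, 0, 0, 0, 0]
```

After the first sample the residuals are small numbers that need far fewer bits than the original values, and for a centred image the L-R channel is nothing but zeros. Both transformations can be reversed exactly, which is what makes them lossless.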

Despite the effectiveness of these methods, there are still realistic limits on how much a native music file can be compressed without losing data. For most music this averages out at around 50%. To reduce file sizes by more than that, it is necessary to adopt lossy compression features. Lossy is exactly what it says it is. In order to further reduce the file size, we take something that we think you probably can’t hear and we throw it away. Lossy compression makes great use of the findings of the field of psychoacoustics in order to help us decide what, exactly, you ‘probably’ can’t hear. Lossy compression technology is fabulously creative, extremely clever, and very interesting, but for all that it still makes your music sound worse.

MP3 is the granddaddy of lossy audio compression technologies. I do not propose to go into detail about how MP3 does its thing, but at its core it makes use of a key finding of psychoacoustics, that of ‘masking’. Masking states that certain sounds are more effectively masked by some sounds than by others. For example, a louder sound masks a quieter one (well, duh!). Also, a sound at one frequency effectively masks other sounds at adjacent frequencies. So if we can identify and extract one element of a waveform, and determine that it is ‘masked’ by another one, then we could, for example, encode the ‘masked’ element using a much lower bit depth.

MP3 sets about breaking the music into as many as 576 frequency subbands, the contents of which are then scaled up or down according to the aforementioned psychoacoustic principles, and end up being encoded using a technique called “Huffman Coding”, by which the most commonly-occurring values are encoded using fewer bits than the less-common values (quite simple, yet really rather clever). Using this approach we can, in effect, controllably reduce the resolution of the encoded music, reducing it more for those elements in the music which are ‘masked’, and less for those doing the masking. The Huffman Codes are typically stored in one or more look-up tables, and by choosing an appropriate table we can end up with a larger or smaller effective bit rate.
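Huffman Coding itself is simple enough to sketch in Python. Note the assumptions: this toy version builds its code table from the data it is given, which is the textbook approach and not how MP3 does it (as described above, MP3 picks from look-up tables), and the function name and sample values are invented for the example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to a string of 0s and 1s."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate single-symbol case
        return {next(iter(counts)): "0"}
    # Each heap entry: (count, tie-breaker, partial code table).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)       # the two rarest groups...
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))  # ...become one group
        tie += 1
    return heap[0][2]


values = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [-1]
code = huffman_code(values)
encoded_bits = sum(len(code[v]) for v in values)
print(code)
print(encoded_bits, "bits, versus", 3 * len(values), "bits at a fixed 3 bits each")
```

The most common value gets a one-bit code and the rarest values get four-bit codes, so the sixteen values take 30 bits instead of the 48 that a fixed-length code would need.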

In effect, lossy compression techniques employ much more in the way of signal processing than lossless compression in order to identify and extract which components can be effectively thrown away while minimizing (note, never eliminating) the audible deterioration in the perceived sound quality. For this reason, more recent encoders such as AAC (the format widely associated with Apple), which are more elaborate and require more processing power than MP3, tend to sound better at equivalent bit rates.

Saturday 11 January 2014

Provenance

Wherever anything is offered for sale or barter, you can be sure there will be some lowlife somewhere ready to step in and offer you something that is not quite what it seems in the hope of making off with a quick buck. These actions can range anywhere from criminal intent, through genuine confusion or ignorance, to marketing hype. Oftentimes the buyer is as complicit as the seller - if you buy a $25 Rolex from a street vendor in NYC you would have to be a very special chump if you really think you are buying the genuine article.

Sometimes it is fundamentally difficult to determine whether something is what it purports to be, or is a fake. Other times there can be no doubt that an item is genuine, but its quality or condition remains uncertain - some fine wines can sell for tens of thousands of dollars a bottle, but only if there is a guarantee that it has been stored (“cellared”) under optimum conditions. In such circumstances buyers and sellers rely on “provenance” to guide them. The term Provenance, in its literal sense, is a documented chain of custody, but is more widely used to convey the sum total of factors which a person provides to assist in determining whether a certain thing is what he or she claims it to be. Provenance is not the proof itself, but merely an evidentiary basis purporting to provide reliable information. There is nothing in the term “provenance” which precludes the provenance itself from being fake.

Anything which is not what it is claimed to be is - broadly speaking - a fake. Of course, from a legal standpoint, if you choose your wording carefully, you can imply something without actually claiming it to be so. This is what we term “Sales Hype”. For example, the Danish brewing giant Carlsberg brews what it has for decades claimed is “Probably the Best Lager in the World”. Without the word “probably”, they would have found themselves in a heap of legal trouble in many parts of the world. Which is not to suggest that Carlsberg is anything other than a fine Lager. A particular bugbear of mine concerns shampoo. My local supermarket has a whole aisle full of shampoo. There must be literally dozens upon dozens of different shampoos offered for sale. Each one has its own particular set of claims - most of which must surely be all but meaningless. Go to the Pantene web site for an example. Their shampoo line comprises no less than 27 different shampoos - count 'em - each one with its own set of product claims. Are these all fakes? Well, no, not from a legal standpoint. But c’mon man, 27 different shampoos? Me, I just buy the cheapest bottle in the aisle and hope it doesn’t smell too objectionable.

Has this got anything to do with audio? I’m getting there.

Audiophiles have long pursued the goal of perfect audio quality. Back in the good old analog days the biggest challenge was in transferring the audio content as accurately as possible from the studio’s mixing desk to your Hi-Fi system, and the LP was the medium of choice. Analog meant that there was a lot of technology to go through before the sound of the master tape could make it to your living room. That technology generally wasn’t up to the task. Sure, there were still “best practices” you could pursue. Direct cut recordings were one option, although admittedly not without implementation drawbacks (Sheffield Lab were the best known exponents). There were favoured pressings to hunt down, even lists of preferred stamper numbers, and so on. All of us of a certain age have at one time or another gone in search of a particular German or Japanese pressing. But in general, what the recording engineer heard on his console, and what you heard on your Hi-Fi were poles apart. Today, digital audio has largely eliminated that as a technological hurdle. It don’t mean that the industry has stopped screwing it up, but for sure it DOES mean that there is no longer any real excuse for them to do so. When the recording engineer sits back and says “That’s a wrap!” the technology to transmit the source material he is listening to, absolutely unaltered, to your Hi-Fi system, is today quite trivial.

Today, digital audio has reached its first level of maturity. Of course, technology continues to advance, but a certain established standard has emerged and the world, conveniently, has more or less agreed upon it. That standard is 16-bit, 44.1kHz Linear PCM. The vast, vast majority of music played in the world today is encoded in 16-bit 44.1kHz LPCM format. There are those who insist that “properly dithered” 16-bit 44.1kHz Linear PCM is fully adequate to represent any audible sounds that humans care to listen to. But many audiophiles disagree, and continue to support higher resolution formats including Hi-Rez PCM and DSD. The jury is still out on which of Hi-Rez PCM or DSD sounds fundamentally better, and frankly it is unlikely a final verdict will be reached any time soon, but a significant body of opinion is coming out in favour of DSD. Certainly, the best sounds my own aging ears have ever heard were produced by DSD, and the margin was not insignificant.

Back in the analog days, the performing artist recorded a Master Tape. That master tape was then used to make a number of Production Copies, depending on how widely the recording was to be distributed. The Production Copies were used to cut Master Stampers. The Master Stampers were used to stamp Production Stampers, and, finally, the Production Stampers were used to stamp LPs that you purchased at the record store. The production stampers typically wore out, and had a useful life of about 1,000 copies after which they had to be thrown away and replaced. Of course, there was nothing to stop the cheapskate stamping plant manager from pushing his stamper life beyond 1,000 copies to save a couple of bucks, so that the poor old customer might end up with a very poor pressing. At each step of this chain there was a not inconsiderable loss of quality. It is a wonder that LPs were playable at all.

Today these steps have been largely eliminated, and in many ways we are all the better for it. But digital technology does provide for some pretty egregious - some might say cynical - manipulation undertaken for the express purpose of deceiving the customer. Fakery, in other words. Think for a moment of a digital photograph. It might be, say, a 12 megapixel image with dimensions of 3,000 x 4,000 pixels. With a colour depth of 24 bits (3 Bytes) per pixel, that image would occupy a 36MB file. That’s a big file, but we can use some clever mathematics to reduce the file size. The JPG file format is well known. Using the JPG format allows you to reduce the file size needed to store an image down to microscopic proportions. Although you can reduce the file size quite usefully without actually throwing away any image information, in order to seriously reduce the file size it is necessary to seriously reduce the quality of the image. JPG allows you to select a desired image quality (for example “low”, “medium” or “high”). If you choose to make a “low” quality JPG with a file size 1/100th the size of the original image file, you will be able to tell at a glance that the image has lost most of its quality. The reduction in quality is the result of throwing away massive quantities of image data, none of which can ever be recovered again. The resultant JPG image may be of extremely poor quality, but its format will still be that of a 12 megapixel image with dimensions of 3,000 x 4,000. This is the important lesson. The format in and of itself is no guarantee of the quality. In practice it merely puts an upper limit on the *potential* quality.

Audio formats are the same. If we take a high quality 16-bit 44.1kHz LPCM recording, and convert it to an MP3 file, this is entirely analogous to the JPG situation. By specifying the “quality” of the MP3 file (typically expressed in kbps) we can specify how much of the original music data is irretrievably thrown away in order to produce a smaller file size. A 32kbps MP3 file will sound absolutely dreadful, but is nevertheless a 16-bit 44.1kHz LPCM encoded file. A 128kbps file will sound better, likewise a 190kbps file, and a 320kbps file better still, but all of these are still 16/44.1 files.

There is nothing to stop me from taking my hypothetical 32kbps file and converting it, for example, to an uncompressed native file format such as WAV or AIFF. This uncompressed file will be the exact same size as a WAV or AIFF of the original high quality 16-bit 44.1kHz LPCM recording. However, whereas the original recording will sound magnificent, the converted 32kbps MP3 will sound no different from the MP3 itself. It will sound just as dreadful. Worse than that, I can then take my 32kbps MP3 and convert it to a Hi-Rez 24/192 file or even a DSD file. It will now encode the same awful sounds faithfully in the chosen Hi-Rez format. There is nothing to stop anyone from doing that. And other than observing how absolutely awful it sounds, it can prove to be surprisingly difficult - even arguably impossible - to analyze the Hi-Rez file and determine unambiguously what its origins were. In other words, perhaps quite surprisingly, its Provenance does not yield readily to examination and analysis.
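The file-size side of this is simple arithmetic, sketched here in Python. The function names are illustrative, a four-minute stereo track is assumed, and file headers and tags are ignored.

```python
def pcm_bytes(seconds, sample_rate_hz=44_100, bits=16, channels=2):
    """Size of uncompressed PCM audio; depends only on format and duration."""
    return seconds * sample_rate_hz * (bits // 8) * channels


def mp3_bytes(seconds, kbps):
    """Approximate size of a constant-bit-rate MP3."""
    return seconds * kbps * 1000 // 8


track = 240  # four minutes, in seconds
print("32kbps MP3:      ", mp3_bytes(track, 32), "bytes")
print("16/44.1 WAV/AIFF:", pcm_bytes(track), "bytes")
print("24/192 PCM:      ", pcm_bytes(track, 192_000, 24), "bytes")
```

Nothing in pcm_bytes asks where the audio came from. A WAV made from the 32kbps MP3 is the same 42,336,000 bytes as a WAV of the original recording, and the 24/192 version is bigger still, whatever it sounds like.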

In the early days of SACD, some of the record labels thought they could get away with some (typically, you might say) cynical shenanigans. Instead of using the original master tapes to remaster an album in DSD, they simply took the original 16/44.1 CD and used that. They then sold it (at twice the price) in “Dual-Disc” format with the CD-derived SACD recording on one layer and the actual CD on the other. If they thought the audiophile community would not notice, boy were they mistaken! I’m not going to name names, but if you know where to look you can easily research this for yourselves.

You think this sort of sharp practice is a thing of the past? Think again. With the rise of DSD as a downloadable file format, the music industry’s Darth Vaders are already at work. If a cynical marketing type somewhere thinks he can get away with repricing his 16/44.1 catalog simply by converting it to DSD and selling it as a big-ticket item, there is really nothing to get in his way, beyond customers calling him out on it after the fact. DSD’s promoters are well aware of this, and are as worried about it as the rest of the audiophile community. There is a movement afoot, therefore, to push for the adoption of some sort of Audio Provenance standard, which will inform consumers which recordings are pure DSD and which are remodulated PCM.

Look for this movement to fall flat on its face for two reasons. First, at the end of the day, no two people will ever agree on a meaningful set of Provenance standards. Second of all, even if they could, it would not be enforceable, either practically or legally. The Provenance debate is a big distraction. Laudable, noble even, but a distraction nonetheless. Even as the capabilities of the formats available for delivering high quality audio content to consumers continue to improve in leaps and bounds, those formats are still going to deliver the usual mixture of audio content quality, ranging from the sublime to the ridiculous. Simply because they can.