Thursday 1 January 2015

A Convoluted Discussion

In mathematics, the word ‘convolution’ describes a very important class of manipulations.  If you want to know more about it, a pretty good treatment is shown on its Wikipedia page.  And even if you don’t, I am going to briefly summarize it here before going on to make my point :)

A convolution is an operation performed on two functions, or on two sets of data.  Typically (but not always) one is the actual data that we are trying to manipulate, and the other is a weighting function, or set of weights.  Convolution is massively important in the field of signal processing, and therefore is something that anybody who wants (or needs) to talk knowledgeably about digital audio needs to bone up on.  The most prominent convolution-related processes that you may have heard of are Fourier Transforms (which are used to extract from a waveform its audio spectrum) and digital filtering.  It is the latter of those that I want to focus on here.

In very simple terms, a filter (whether digital or analog) operates as a convolution between a waveform and an impulse response.  You will have heard of impulse responses, and indeed you may have read about them in some of my previous posts.  In digital audio, an impulse response is a graphical representation of the ‘weights’ or ‘coefficients’ which define a digital filter.  Complicated mathematical relationships describe the way in which the impulse response relates to the key characteristics of the filter, and I have covered those in my earlier posts on ‘Pole Dancing’.
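To make the idea of "a convolution between a waveform and an impulse response" concrete, here is a minimal sketch in plain Python.  The three filter weights are hypothetical values chosen purely for illustration, and `convolve` is just a name I have picked for the helper.  It shows that feeding a single impulse through the filter simply reproduces the weights, which is exactly why the weights are called the impulse response.

```python
def convolve(signal, weights):
    """Direct convolution: every input sample adds a scaled copy of the weights to the output."""
    out = [0.0] * (len(signal) + len(weights) - 1)
    for n, x in enumerate(signal):
        for k, w in enumerate(weights):
            out[n + k] += x * w
    return out

# Hypothetical 3-tap smoothing filter (illustrative values only).
weights = [0.25, 0.5, 0.25]

# A digital impulse: one peak in a valley of flat zeros.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
impulse_out = convolve(impulse, weights)

# An abrupt step, like the leading edge of a square wave.
step = [1.0, 1.0, 1.0, 1.0]
step_out = convolve(step, weights)

print(impulse_out)  # the weights reappear, starting at the position of the impulse
print(step_out)     # the abrupt edges of the step come out smoothed
```

Note that every output sample is a weighted sum of several input samples.  No single feature of the impulse response is simply stamped onto the music.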

Impulse responses are therefore very useful.  They are nice to look at, and easy to categorize and classify.  Unfortunately, it has become commonplace to project the aesthetic properties of the impulse response onto the sonic properties arising from the filter which uses it.  In simple language, we see a feature on the impulse response, and we imagine that such a feature is impressed onto the audio waveform itself after it comes out of the filter.  It is an easy mistake to make, since the convolution process itself is exactly that - a mathematical impression of the impulse response onto the audio waveform.  But the mathematical result of the convolution is really not as simple as that.

The one feature I see misrepresented most often is pre-ringing.  In digital audio, an impulse is just one peak occurring in a valley of flat zeros.  It is useful as a tool to characterize a filter because it contains components of every frequency that the music bit stream is capable of representing.  Therefore if the filter does anything at all, the impulse is going to be disturbed as a result of passing through it.  For example, if you read my posts on square waves, you will know that removing high frequency components from a square wave results in a waveform which is no longer square, and contains ripples.  Those ripples decay away from the leading edge of the square wave.  This is pleasing in a certain way, because the ripples appear to be caused by, and arise in response to, the abrupt leading edge of the square wave.  In our nice ordered world we like to see effect preceded by cause, and are disturbed by suggestions of the opposite.
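The claim that an impulse contains components of every frequency can be checked numerically.  The following is a rough sketch using a naive discrete Fourier transform written in plain Python.  The 16-sample length and the position of the peak are arbitrary assumptions, and `dft` is a hypothetical helper rather than a library routine.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform, O(n^2), fine for a tiny illustration."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# One peak in a valley of flat zeros (length and position chosen arbitrarily).
impulse = [0.0] * 16
impulse[5] = 1.0

# Magnitude of each frequency component of the impulse.
magnitudes = [abs(c) for c in dft(impulse)]
print([round(m, 6) for m in magnitudes])
```

Every frequency bin comes out with the same magnitude, so any filter that alters any part of the spectrum must change the shape of the impulse.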

And so it is that with impulse responses we tend to be more comfortable seeing ripples decaying away after the impulse, and less comfortable when they precede the impulse, gathering in strength as they approach it.  Our flawed interpretation is that the impulse is the cause and the ripples the effect, and if these don’t occur in the correct sequence then the result is bound to be unnatural.  It is therefore common practice to dismiss filters whose impulse response contains what is termed “pre-ringing” because the result of such filters is bound to be somewhat “unnatural”.  After all, in nature, effects don’t precede their cause, do they?

I would like you to take a short break, and head over to your kitchen sink for a moment.  Turn on the tap (or faucet, if you prefer) and set the water flow to a very gentle stream.  What we are looking for is a smooth flow with no turbulence at all.  We call this ‘laminar’ flow.  What usually happens, if the tap outlet is sufficiently far above the bottom of the sink, is that the laminar flow is maintained for some distance and then breaks up into a turbulent flow.  The chances are good that you will see this happening, but it is no problem if you don’t - so long as you can find a setting that gives you a stable laminar stream.  Now, take your finger, and gently insert it into the water stream.  Look closely.  What you will see are ripples forming in the water stream **above** your finger.  If you don’t, gradually move your finger up towards the tap and they should appear (YTMV/YFMV).  What you will be looking at is an apparently perfect example of an effect (the ripples) occurring before, or upstream of, the cause (your finger).

What I have demonstrated here is not your comfortable world breaking down before your eyes.  What is instead breaking down is the comfort zone of an over-simplistic interpretation of what you saw.  Because the idea of the finger being the cause and the ripples being the effect is not an adequate description of what actually happened.

In the same way, the notion that pre-ringing in the impulse response of a filter results in sonic effects that precede their cause in the resultant audio waveform is not an adequate description of what is happening.  However, the misconception gains credence for an important, if inconvenient, reason, which is that filters which exhibit pronounced pre-ringing do in fact tend to sound less preferable than those which don’t.  This sort of thing happens often in science - most notably in medical science - and when it does it opens a door to misinformation.  In this case, the potential for misinformation lies in the reason given for why one filter sounds better than another - that the one with pre-ringing in its impulse response results in sounds that precede the things which caused them.  By all means state your preference for filters with a certain type of impulse response, but please don’t justify your preference with flawed reasoning.  It is OK to admit that you are unclear as to why.

I want to finish this with an audio example to make my point.  The well-known Nyquist-Shannon theory states that a regularly sampled waveform can be perfectly recreated if (i) it is perfectly sampled; and (ii) it contains no frequency components at or above one half of the sample rate.  The theory doesn’t just set forth its premise, it provides a solid proof.  In essence, it does this by convolving the sampled waveform with a Sinc() function, in a process pretty much identical to the way a digital filter convolves the waveform with an Impulse Response.  Nyquist-Shannon proves that this convolution results in a mathematically perfect reconstruction of the original waveform if - and only if - the two stipulations I mentioned are strictly adhered to.  This is interesting in the context of this post because the Sinc() function which acts as the Impulse Response exhibits an infinitely long pre-ring at a significant amplitude.  Neither Nyquist nor Shannon, nor the entire industry which their theory spawned, harbours any concerns about causality in reconstructed waveforms!
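For the curious, the Sinc() reconstruction can be sketched in a few lines of plain Python.  This is only an approximation under stated assumptions: the true Sinc() function is infinitely long, whereas this sketch sums over a finite run of 2001 samples, and the test tone (0.05 cycles per sample) and the helper names `sinc` and `reconstruct` are my own choices for illustration.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Estimate the waveform at time t (in sample periods) by convolving with sinc.

    Samples both before AND after time t contribute to the result.
    """
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# Assumed test tone: a sine at 0.05 cycles per sample, well below half the sample rate.
freq = 0.05
samples = [math.sin(2 * math.pi * freq * n) for n in range(2001)]

# Reconstruct the waveform halfway between two samples, in the middle of the run.
t = 1000.5
estimate = reconstruct(samples, t)
exact = math.sin(2 * math.pi * freq * t)
error = abs(estimate - exact)

print(estimate, exact, error)
```

The small residual error comes entirely from truncating the Sinc() function to a finite length.  The value between the samples is recovered by a weighted sum that leans just as heavily on "future" samples as on "past" ones, and nobody regards the result as acausal.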