8-bit sound at the Oscars
There've been widespread complaints about the sound at the Oscars on Sunday, as demonstrated here by a clip that also happens to be snarking over whether J-Lo was popping out of her dress or not. The noise has been variously described as 'an echo', 'like Autotune', and 'someone's playing Atari over the entire damn show'.
What you are hearing there is a delayed, frequency-restricted echo of what the presenter is saying, and it comes from someone failing remarkably hard at working the soundboard.
I didn't watch the telecast, but I'm seeing people say that there was more than one microphone on the stage, and I know for a fact that they would have had a bunch out to catch applause from the audience. When you have more than one source feeding audio into an amplification system, and those sources are different distances away from the origin of whatever you're trying to catch, the sound hits them at slightly different times. Sound travels at about 345 m/s in dry room-temperature air, and a delay of as little as 16 ms -- a path difference of only about 5.5 m -- is enough for the human ear to perceive an 'echo' effect. If you've ever gotten an echo on a VoIP or speakerphone conference call, you know how discombobulating it can be -- and in fact, telephony was one of the earliest applications of echo suppression, back when the network was analog-only, and is still where it's most widely implemented.
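To put numbers on that, here's a quick back-of-the-envelope check in Python. The speed of sound and the 16 ms threshold are the approximate figures above, not measurements from the broadcast:

```python
# Both constants are the approximations from the text, not measured values.
SPEED_OF_SOUND = 345.0   # m/s in dry room-temperature air
ECHO_THRESHOLD = 0.016   # seconds; roughly where the ear hears a discrete echo

def delay_for_path_difference(meters: float) -> float:
    """Extra arrival time, in seconds, at a mic `meters` farther from the source."""
    return meters / SPEED_OF_SOUND

# Path difference at which the delay becomes an audible echo: ~5.5 m
print(SPEED_OF_SOUND * ECHO_THRESHOLD)

# A mic 10 m farther away than the main one: ~29 ms, well past the threshold
print(delay_for_path_difference(10.0) * 1000)
```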
What happened at the Oscars was that someone attempted to put an echo suppression filter on the output, and did it badly. The idea is to take the primary signal (i.e., the one from the mic closest to the speaker, which arrives first), delay it to match the moment the same sound arrives at the second mic, and then subtract it from the combined signal to remove the annoying echo. The problem was that somewhere in the audio chain, either the echo was frequency-shifted or one of the signals was post-processed, so the delayed reference the echo suppression filter was subtracting no longer exactly matched the frequencies of the actual echo it was trying to remove.
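Here's a minimal sketch of that delay-and-subtract idea, plus what happens when the reference stops matching the echo. Every number in it (sample rate, delay, gain, the sine-wave stand-in for a voice) is an assumption for illustration; real echo cancellers adapt their filters continuously rather than using a fixed delay and gain:

```python
import numpy as np

RATE = 48000               # samples per second (assumed)
DELAY = int(0.016 * RATE)  # a 16 ms echo path, per the numbers above
GAIN = 0.5                 # how much quieter the echo is (assumed)

def add_echo(dry: np.ndarray) -> np.ndarray:
    """Simulate a second, farther mic: the same signal, late and quieter."""
    wet = dry.copy()
    wet[DELAY:] += GAIN * dry[:-DELAY]
    return wet

def suppress_echo(mixed: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Delay the clean reference signal and subtract it from the mix."""
    out = mixed.copy()
    out[DELAY:] -= GAIN * reference[:-DELAY]
    return out

t = np.arange(RATE) / RATE
voice = np.sin(2 * np.pi * 220.0 * t)    # stand-in for the presenter

# When the reference matches the echo exactly, subtraction cancels it:
clean = suppress_echo(add_echo(voice), voice)
print(np.max(np.abs(clean - voice)))     # ~0: echo fully removed

# But if the echo comes back frequency-shifted somewhere in the chain,
# the subtraction no longer lines up, and a residue survives -- the warble:
shifted = np.sin(2 * np.pi * 233.1 * t)  # echo returned a semitone off
mixed_bad = voice.copy()
mixed_bad[DELAY:] += GAIN * shifted[:-DELAY]
residue = suppress_echo(mixed_bad, voice)
print(np.max(np.abs(residue - voice)))   # clearly nonzero: audible leftover
```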
What remains on the broadcast audio track sounds like just the high frequencies of speech or the white-ish noise of applause. It's not immediately recognizable as speech if you aren't already aware that that's probably what it is, but it does match up pretty well with what was produced by the sound chips on early video game consoles, hence the "Atari" effect. When just a few frequencies remain, the sound also lacks the natural overtones of a human voice and takes on the tinny "autotune" sound -- more or less what happens when you "pixelize" sound. (Autotune is meant to shift a vocal input slightly up or down, so you hear a true note rather than something slightly off. The electro-robot sound comes from turning it way the hell up and then telling the singer to wobble up and down, either in a glissando or something randomly melismatic. Anything with a smooth transition up or down will do. The autotune then aggressively rounds up or down to the nearest note until the output looks like a staircase instead of a smooth curve.) The result is an annoying insectoid warble after anyone says anything on screen.
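To see the staircase concretely, here's a toy sketch that snaps a smooth pitch glide to the nearest equal-tempered semitone -- the hard-rounding behavior described above, not any particular plugin's actual algorithm:

```python
import numpy as np

A4 = 440.0  # Hz reference pitch

def snap_to_semitone(freq_hz: np.ndarray) -> np.ndarray:
    """Round each frequency to the nearest note on the equal-tempered scale."""
    semitones = 12 * np.log2(freq_hz / A4)       # distance from A4 in semitones
    return A4 * 2 ** (np.round(semitones) / 12)  # round, then convert back to Hz

# A smooth one-octave glissando from A3 (220 Hz) up to A4 (440 Hz)...
t = np.linspace(0.0, 1.0, 9)
glide = 220.0 * 2 ** t

print(np.round(glide, 1))                    # smooth curve: 220.0 ... 440.0
print(np.round(snap_to_semitone(glide), 1))  # staircase: 220.0, 246.9, 261.6, ...
```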
Similar effects are sometimes employed deliberately. A ring modulation unit, which multiplies a signal by a sine wave of fixed frequency rather than subtracting out a time-delayed copy, is used for the characteristically chilling Dalek voices on Doctor Who. In general, anything that selectively modifies only part of the frequency spectrum we associate with the human voice will make the resulting output sound very artificial, distractingly so if we're expecting to hear normal speech.
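A ring modulator itself is only a few lines. This sketch multiplies the input by a fixed sine carrier; the 30 Hz value is the figure commonly cited for the Dalek effect, so treat it as an assumption rather than a spec:

```python
import numpy as np

RATE = 48000       # samples per second (assumed)
CARRIER_HZ = 30.0  # commonly cited Dalek carrier; an assumption here

def ring_modulate(signal: np.ndarray, carrier_hz: float, rate: int) -> np.ndarray:
    """Multiply the signal by a fixed sine carrier, replacing each input
    frequency f with sum and difference tones at f + carrier and f - carrier."""
    t = np.arange(len(signal)) / rate
    return signal * np.sin(2 * np.pi * carrier_hz * t)

# One second of a 220 Hz "voice": after ring modulation the 220 Hz tone is
# gone, replaced by sidebands at 190 Hz and 250 Hz.
t = np.arange(RATE) / RATE
voice = np.sin(2 * np.pi * 220.0 * t)
dalek = ring_modulate(voice, CARRIER_HZ, RATE)

# With exactly one second of audio, FFT bin k corresponds to k Hz:
spectrum = np.abs(np.fft.rfft(dalek))
print(np.sort(np.argsort(spectrum)[-2:]))  # [190 250]
```

The original frequency vanishing entirely, rather than just being shifted, is exactly the "only part of the spectrum survives" effect that makes the result sound so inhuman.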