In this article we will see how to remove the vocals from a track. “Why?” you might ask. “There are already apps that will do it for you. There are even free apps that will do the job. Hell, some apps have simple built-in effects that can get it done.”
OK, you wanted to know, so here are the answers. First of all, my boss asked me to. To be totally honest, he didn’t even ask: he just announced it. And if a man with scar tissue right below his left eye, sitting in a swivel easy chair next to the fireplace and petting a huge, evil-looking Maine Coon cat, tells you to do something, you do it. He knows where I live.
There are more practical reasons, of course. First, it’s fun and easy. Second, it might come in handy: I am not talking about karaoke nights here, but how about vocal practice? And third, and most importantly, tasks like this teach you a lot about sound, mixing, and mastering.
If you are even remotely interested in sound editing, music production, or sound engineering, this tutorial will teach you some useful things for your future mixes, even if you never use the method itself.
How to remove vocals from a track
First of all, this tutorial covers the most common case, where the vocals are panned to the center of the track and the rest of the instruments keep a stereo image that is more or less wide. This wasn’t always so: in the past, most recordings were in mono, meaning that, on headphones, everything is heard in the middle.
By the mid-1960s, technology had progressed enough to offer commercial stereo recordings. In those first years, roughly from ’64 to ’69, producers experimented with the now largely abandoned extreme, or radical, panning. The Doors’ recordings are a well-known example of this: if you listen to, say, “Light My Fire”, you will notice that half of the instruments come from one speaker and the other half from the other. If you disconnect one of the speakers, you lose half the song. If you play it in a car, the driver hears a totally different mix from the passenger in the other seat. These early stereo mixes require a different approach to removing the vocals, and, depending on the mix, success is not guaranteed.
In the ’70s, logic prevailed and producers settled on the more moderate panning we still use today, which reflects how much importance listeners give to each instrument: the basic, indispensable elements, namely vocals, bass, and drums, are panned to the center, while secondary elements, such as guitars, keyboards, strings, and backing vocals, are panned left or right.
This is exactly why we are able to remove the vocals from a track.
In practice, we take advantage of the fact that the vocals are present, identically, in both the left and right channels. Inverting one of the channels flips its polarity: positive samples become negative and vice versa. When we then mix the two channels together, every element present equally in both channels is positive in one and negative in the other, so it cancels itself out. In other words, we get rid of the vocals.
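To make the cancellation concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each channel is a plain list of sample values (real audio would first have to be read from a file), and the function names are hypothetical. A “vocal” panned dead center goes into both channels, while a “guitar” exists only in the left channel. After inverting the right channel and mixing, only the guitar survives.

```python
def invert(samples):
    """Flip polarity: every positive sample becomes negative and vice versa."""
    return [-s for s in samples]


def remove_center(left, right):
    """Mix the left channel with a polarity-inverted copy of the right channel.

    Anything identical in both channels (center-panned) cancels out;
    anything that differs between the channels survives. The result is mono.
    """
    inverted_right = invert(right)
    return [l + r for l, r in zip(left, inverted_right)]


# Toy example: a "vocal" panned dead center, plus a "guitar" panned hard left.
vocal = [0.5, -0.25, 0.75, -0.5]
guitar = [0.125, 0.25, -0.125, 0.0]
left = [v + g for v, g in zip(vocal, guitar)]
right = vocal[:]  # the guitar is absent from the right channel

print(remove_center(left, right))  # → [0.125, 0.25, -0.125, 0.0]
```

The output is exactly the guitar part: the vocal, being identical in both channels, has vanished. Note that the result is a single (mono) channel, since the two sides have been mixed into one.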
There is a rather important “detail” here, though: as said above, vocals are not the only thing panned to the center and present in both channels; bass and drums are there as well.
So, by using the method above, we also get rid of the bass and drums, and we end up not with a track missing its vocals, but with a total disaster.
This is where we have to get a bit creative. A singer’s voice usually doesn’t go much below 180-200 Hz, unless we are talking about Johnny Cash, but who on earth would ever want to remove the vocals from a Johnny Cash track?
Luckily, the area below 200 Hz is also where most of the bass and drums live.
Moreover, the human voice rarely goes above 10,000 Hz, even counting sibilance (the “sss” sound). And above that limit, a good deal of music sits in the center, usually hi-hats and similar percussion, unless we are talking about a lavishly produced track where someone took the trouble to pan them left or right.
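The frequency limits above suggest keeping a copy of the center content outside the vocal range. Here is a rough Python sketch of that band split, under several assumptions of our own: the “center channel” is approximated as the average of left and right (the mid signal), the splits use simple first-order (6 dB/octave) RC filters rather than the steeper filters a real editor would offer, and the cutoffs default to 200 Hz and 9,000 Hz, leaving a little margin below the 10,000 Hz voice limit. All function names are hypothetical.

```python
import math


def mid_channel(left, right):
    """Approximate the center channel as the average of the two channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def low_pass(samples, cutoff_hz, sample_rate):
    """First-order RC low-pass filter: keeps what lies below cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order RC high-pass filter: keeps what lies above cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out


def center_bands_to_keep(left, right, sample_rate, low_hz=200.0, high_hz=9000.0):
    """Center content below low_hz (bass, kick) plus above high_hz (hi-hats)."""
    mid = mid_channel(left, right)
    lows = low_pass(mid, low_hz, sample_rate)
    highs = high_pass(mid, high_hz, sample_rate)
    return [lo + hi for lo, hi in zip(lows, highs)]
```

For example, a 50 Hz tone panned to the center passes through almost untouched, while a 1 kHz tone, squarely in the vocal range, comes out considerably quieter. With first-order filters the attenuation is gentle, so some vocal content will still leak through; steeper filters would make the split cleaner.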
So, to save our reputation as engineers, we should plan ahead and make a copy of the center-channel frequencies below 200 Hz and above 9,000 Hz, so as to mix them back into the track when we ar