In this article we will see how to remove vocals from a track. “Why?” you might ask. “There are already some apps that will do it for you. There are even some free apps that will do the job. Hell, there are even simple effects in apps that can get it done.”
OK, you wanted to know, so here are the answers: First of all, my boss asked me to do so. To be totally honest, he didn’t even ask: He just announced it. And if a man with scar tissue right below his left eye, sitting in a revolving easy chair next to the fireplace and petting a huge, evil-looking Maine Coon cat, tells you to do something, you do it. He knows where I live.
There are more practical reasons, of course: First, it’s fun and easy. Second, it might come in handy. I am not talking about karaoke nights here, but how about vocal practice? And third, and most importantly, tasks like this help you learn lots of stuff about sound, mixing and mastering.
If you are interested even minimally in sound editing, music production and sound engineering, then during this tutorial you will learn some useful things that will help you in your future mixes, even if you never use the method itself.
How to remove vocals from a track
First of all, this tutorial examines the most common case, where the vocals are panned to the middle of the track and the rest of the instruments maintain a more or less wide stereo image. This wasn’t always the case: In the past, most recordings were in mono, meaning that, when you wear headphones, everything is heard in the middle.
By the mid-1960s, stereo had become widespread in commercial pop and rock releases. In those first years, roughly from ’64 to ’69, producers would experiment, using the now largely abandoned extreme, or radical, panning. The Doors recordings are a well-known example: If you listen to, say, Light My Fire, you will notice that half of the instruments come from one speaker and the other half from the other speaker. If you disconnect one of the speakers, you lose half the song. If you play it in a car, the driver hears a totally different mix from the passenger in the other seat. These early stereo mixes would require a different approach for removing the vocals, and, depending on the mix, success is not guaranteed.
In the ’70s, logic prevailed and producers settled for the less extreme panning that we still use today, which takes into account the importance that listeners give to the various instruments and pans them accordingly: The very basic, indispensable stuff, namely vocals, bass and drums, is panned to the middle, while the secondary elements, such as guitars, keyboards, strings and backing vocals, are panned left or right.
This is exactly the reason why we are able to remove vocals from a track:
Practically, we take advantage of the fact that the vocals are present, at the same level, in both the left and the right channel. By inverting one of the channels, we change its polarity: The positive samples become negative and vice versa. When we then mix the two channels together into one mono track, everything that is identical in both channels is subtracted, because it is positive in one channel and negative in the other, so the two copies cancel each other out. Meaning, we get rid of the vocals.
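To see the cancellation in numbers rather than in an editor, here is a minimal Python sketch. It is only an illustration under assumptions of my own: the “vocal” and the “guitar” are plain sine waves, the 8,000 Hz sample rate is arbitrary, and the names tone and remove_center are made up for this example.

```python
import math

SAMPLE_RATE = 8000  # Hz; arbitrary value, chosen only for this illustration


def tone(freq_hz, n_samples, amplitude=1.0):
    """A sine wave standing in for an instrument or a voice."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def remove_center(left, right):
    """Invert the right channel, then mix it with the left one (mono result)."""
    return [l + (-r) for l, r in zip(left, right)]


n = 800
vocal = tone(440, n)        # panned to the middle: identical in both channels
guitar = tone(660, n, 0.5)  # panned hard left: present in the left channel only

left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

mono = remove_center(left, right)

# What is left should be the guitar alone, up to floating-point rounding.
residual = max(abs(m - g) for m, g in zip(mono, guitar))
print("largest difference from the bare guitar:", residual)
```

Anything that differs between the two channels survives the subtraction, which is why the hard-panned “guitar” comes through untouched while the centered “vocal” disappears.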
There is a rather important “detail” here though: As said above, the vocals are not the only thing panned to the middle and present in both channels. The bass and the drums are there as well.
So, by using the method above, we also get rid of the bass and the drums, and we end up not with a track missing the vocals, but with a total disaster.
Here is where we have to be a bit creative: Usually, a singer’s voice doesn’t go much below 180-200 Hz, unless we are talking about Johnny Cash, but who on earth would ever want to remove vocals from a Johnny Cash track?
The area below 200 Hz is, luckily, the area where most of the bass and drums live too.
Another thing is that the upper limit of the human voice rarely goes above 10,000 Hz, even counting the sibilance (the “sss” sound). And above that limit, a good deal of music is still to be found in the center: usually hi-hats and similar percussion, unless we are talking about a lavishly produced track where someone put in the effort to pan them left or right.
So, in order to save our reputation as engineers, we should think ahead and keep a copy of the center material below 200 Hz and above roughly 9,000 to 10,000 Hz, so that we can mix it back into the track once we are done removing the vocals.
This way, the rhythmic spine of the song, the bass and drum sounds, is back in the mix, along with all the material that sits on the left and right: guitars, keyboards and the rest.
Adding all this together, we practically end up with the whole track, minus the singer.
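The whole idea can be sketched in a few lines of Python. This is a simplified, mono illustration and not what Audacity does internally: the filters use the widely published “Audio EQ Cookbook” biquad formulas, each filter is applied twice for a steeper slope, the default cutoffs of 180 Hz and 10,000 Hz are the starting points discussed in this article, and all function names are made up for the example.

```python
import math

SAMPLE_RATE = 44100  # Hz; assumed CD-quality audio


def biquad(samples, cutoff_hz, kind):
    """One second-order low-pass ("low") or high-pass ("high") filter pass."""
    w0 = 2 * math.pi * cutoff_hz / SAMPLE_RATE
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2)  # Q = 1/sqrt(2), a Butterworth response
    if kind == "low":
        b0, b1, b2 = (1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2
    else:
        b0, b1, b2 = (1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2
    a0, a1, a2 = 1 + alpha, -2 * cos_w0, 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x1, x2, y1, y2 = x, x1, y, y1
    return out


def steep(samples, cutoff_hz, kind):
    """Two passes in a row for a steeper slope (about 24 dB per octave)."""
    return biquad(biquad(samples, cutoff_hz, kind), cutoff_hz, kind)


def remove_vocals(left, right, low_hz=180, high_hz=10000):
    """Mono result: the sides, plus the lows and the highs of the center."""
    sides = [l - r for l, r in zip(left, right)]          # invert and mix
    center = [(l + r) / 2 for l, r in zip(left, right)]   # what both share
    lows = steep(center, low_hz, "low")                   # bass and kick drum
    highs = steep(center, high_hz, "high")                # hi-hats, cymbals
    return [s + lo + hi for s, lo, hi in zip(sides, lows, highs)]


def tone(freq_hz, n):
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


n = SAMPLE_RATE // 5        # 0.2 seconds of audio
bass = tone(80, n)          # center-panned bass note
vocal = tone(1000, n)       # center-panned stand-in for the voice
tail = slice(n // 2, None)  # skip the filters' start-up transient

bass_kept = rms(remove_vocals(bass, bass)[tail]) / rms(bass[tail])
vocal_kept = rms(remove_vocals(vocal, vocal)[tail]) / rms(vocal[tail])
print(f"bass kept: {bass_kept:.2f}, vocal kept: {vocal_kept:.4f}")
```

The bass note should come out almost at full level while the mid-range “voice” is reduced to a small fraction. Moving low_hz and high_hz is the code equivalent of hunting for the sweet spot by ear.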
Let’s keep in mind that any vocals placed left or right in the mix will not be removed. This kind of mix has been used for the choruses of most commercial tracks during the last decade, but who would want to deal with these tracks anyway?
So, let’s make this simple and clear and put it down in numbered steps:
1. We import the song we want into our favorite sound editor (I use Audacity: It rarely crashes, it’s free and, most importantly, when you install it you are not prompted to install a ton of malware as well. Reliable, free and honest = Good enough for me).
2. We make two more copies of the track (don’t ask why, we’ll see about this later), so we have a total of three copies of the same song. We mute the second and third copies so they don’t get in the way.
3. We select the first copy and click on “Split Stereo Track”. This does what it says: It splits the combined track into two separate ones, one for the left channel and one for the right. We then select one of them (it doesn’t matter which) and go to Effect > Invert.
4. We set both of these split tracks to Mono.
5. Theoretically we are done. If we listen only to these two tracks, the vocals should be gone (or we should hear just a very faint ghost of the stereo reverb, if one was used in the original recording). We then select both tracks and go to Tracks > Mix and Render, which combines them into a new one.
Here is where most vocal removal tutorials end. But, there is more to be done, since the lack of bass and drums is annoyingly noticeable. So, we mute our voiceless (and bassless and drumless) track, and forget about it for a moment.
6. We select the second copy of our original stereo track, un-mute it, and go to Effect > Low-Pass Filter. This does what the name implies: It gives a pass to the low frequencies and eliminates the high ones.
As we said above, a voice rarely goes below 180-200 Hz, but most of the bass and drums are there. So, we dial in a frequency of 180 Hz and press enter.
Let’s listen to it: Are any vocals audible, or can we only hear booming bass and kick drum sounds? If the vocals are still audible (a low-end mumbling sound), we undo our action and try again, setting the Low-Pass Filter to an even lower frequency, let’s say 150 Hz. We experiment a bit until we find the sweet spot.
Once we have it, we mute the second track and go to the next step.
7. We un-mute the third copy and follow the same procedure, but this time using the High-Pass Filter (Effect > High-Pass Filter), which does the exact opposite: It keeps only the high frequencies of the track and removes all the lower ones. We start by choosing a frequency around 10,000 Hz. This should do it. If not, we go a bit higher, to 10,500-11,000 Hz. We should only hear hi-hats, cymbals and, if the recording was crappy to begin with, tape hiss or static noise.
8. When we are done with this too, we go to the final step, slowly preparing ourselves for our magic trick: We un-mute all three tracks, press play and, voilà! We have a track without any (or most) vocals, but with the bass and drums back in the mix.
Last step: If we are patient enough, we can then play a bit with the Equalizer (Effect > Equalizer), Limiter (Effect > Limiter) and Amplify (Effect > Amplify) and try to make the sound as pleasant as possible, or as close to the original mix as possible (these two things are not always the same). With a bit of fiddling, we can have a mix that is really close to the original, minus the vocals, of course.
When we are satisfied with the overall sound, we just go to File > Export, and export the track in our favorite format.
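For the curious, steps 3 to 5 and the export can also be done outside Audacity. The sketch below uses only Python’s standard library and assumes a 16-bit stereo WAV file. The function names are made up for this example, and the automatic scaling is my own addition, playing the role of a crude Amplify: subtracting two 16-bit channels can produce values twice as large as 16 bits allow.

```python
import struct
import wave


def karaoke_frames(stereo_pcm: bytes) -> bytes:
    """Interleaved 16-bit stereo PCM in, 16-bit mono PCM (left minus right) out."""
    count = len(stereo_pcm) // 2
    samples = struct.unpack("<%dh" % count, stereo_pcm)  # WAV is little-endian
    left, right = samples[0::2], samples[1::2]

    # Invert the right channel and mix it with the left one.
    diff = [l - r for l, r in zip(left, right)]

    # The difference can exceed the 16-bit range, so scale down only if needed.
    peak = max((abs(d) for d in diff), default=0)
    scale = 32767 / peak if peak > 32767 else 1.0
    mono = [int(round(d * scale)) for d in diff]
    return struct.pack("<%dh" % len(mono), *mono)


def remove_center_from_wav(in_path: str, out_path: str) -> None:
    """Read a 16-bit stereo WAV and write the center-cancelled mono version."""
    with wave.open(in_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = src.getframerate()
        pcm = src.readframes(src.getnframes())
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(karaoke_frames(pcm))
```

Calling remove_center_from_wav("song.wav", "song_no_center.wav"), with file names of your own, gives the same bassless and drumless result as step 5. The low and high copies from steps 6 and 7 would still have to be mixed back in.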
We now have one voiceless track in our music library and, on the way, we have gained a fair amount of insight into the upper and lower limits of vocal, bass and drum frequencies, into the frequencies that make a song sound beefy, full, empty or lousy, and into the very nature of sound itself. This is all crucial knowledge that can make or break a mix. With more practice on random songs (even on the ones that do not work for vocal removal), we can end up with some solid experience that will come in handy when recording, mixing and mastering.
And, this was the point all along – did you really think the world needs voiceless tracks?
Here is my video tutorial showing how to remove vocals from a song using Audacity.