open source – Random Acts of Chooch

Updated 03-01-2013 to correct some details pointed out by Georg from the Auphonic team.

I recently came across what may prove to be the single most useful podcasting tool I’ve seen in years. It is a free online service called Auphonic that automates audio normalization, noise reduction, encoding, distribution, and a whole lot more.

Many podcasters regularly use The Levelator by The Conversations Network to do some of this. Levelator is great and can save you hours of manual processing. I’ve used it a lot when I have recordings of multiple speakers spread across a room, or an uneven Skype conversation where I don’t have the raw audio from each side. Levelator does have a few shortcomings though: you have absolutely no control over any of the processing; it mangles music; it is rarely updated; and it only works on Windows or Mac.*

Auphonic not only addresses these issues but goes well beyond. Working backwards: Auphonic is a web service, so your operating system is irrelevant; development is fast and furious and the system includes a machine learning component; it identifies music and processes it separately from voices; and you have control over what processing is done as well as the target “loudness” of the completed file.

Further, Auphonic will process audio and video files from and to many different formats; offers integration with Dropbox, FTP/SFTP, Libsyn, and other services; will handle metadata (as well as chapter marks); and provides an API for those inclined to automate their workflow.

What’s It All About?

The Auphonic team’s goal is to provide end-to-end services for podcast production, from recording to feed: a system to capture a recording, edit and polish it, create a blog post with show notes, and publish it for listeners. The first part of that goal is the web service to improve your audio files.

The service is built on open source tools, and they are planning to release the algorithms as plugins for Audacity (~~hopefully in the form of VSTs for use in other DAWs as well~~ – no plans for VST at the moment. They’re working with the LV2 plugin format, which Audacity supports). They have also released an iOS app to record and process files, with an Android version coming any day now.

As stated, the service is free and ~~they have no plans to charge for it above voluntary donations~~ they will try to establish a freemium model based on the amount of data people are processing. Heavy users would pay a little for it, and small podcasters could still use it for free.

Much of the work is being funded by the ~~Graz University of Music and Performing Arts and the~~ Austrian government.

Feature Breakdown

When a file is processed, it is first analyzed to classify speech, music, and background segments so that each component can be optimally processed to give the best sounding output file. Current features include:

Intelligent Leveler – Each person speaking has their level automatically raised or lowered to give a consistent presentation.
Loudness Normalization – Voices and music are adjusted for momentary, short term, and overall loudness through limiting and compression. You can specify the overall loudness level based on established European broadcasting loudness standards, or on the US ATSC A/85 recommendation to be compliant with the CALM Act. (~~boy, I wish the US would adopt these!~~ I had no idea the US had any loudness standards. Commercials sure do still seem to jump out at you!)
Filtering – a high pass filter that removes unnecessary low frequencies.
Noise Reduction – removes consistent background noises from computer fans, air conditioning, or line noise (buzz or hum).
Encoding – the processed file can be encoded to a variety of formats including lossy (mp3, AAC, Opus, Ogg) and lossless (WAV, FLAC, ALAC). The service will create multiple output formats at the same time, so click the button once and fill all of your feeds if you offer multiple formats to listeners. Also, if the input is a video file, it can be output to the same format leaving the video untouched.
Metadata Management – fill in desired metadata fields once (artist, album, title, artwork, etc) and all of the output files will include the properly formatted tags, including chapter marks for enhanced podcasts. Even your MP3 and OGG files can have chapters!
Content Deployment – the service can read files from and write files to a host of external services, which automates getting files in and out. These currently include: FTP, SFTP, Dropbox, Amazon S3, YouTube, Archive.org, SoundCloud, and Libsyn.
Presets – Create presets, or templates, to easily process all of your files the same way. A preset can be as simple as a predefined bitrate for your mp3 files, or it can go all the way to which external services receive the files, pre-filled metadata, and what processing to do.
API – a complete programming interface that allows you to write scripts or full applications that will import, process, and export your files in any way you like.
Machine Learning – the system includes machine learning components to constantly improve all of the algorithms. Similar to email spam filters or search engines – the more people use the service, the better it gets.
Batch Processing – By specifying a preset, you can batch groups of files together so that they are all processed the same way in one go.
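
To make the "how loud it should be" idea from the Loudness Normalization item concrete, here is a minimal sketch in Python. It is not Auphonic's algorithm: broadcast loudness standards use a more elaborate, perceptually weighted measurement, while this sketch uses plain RMS level as a stand-in and applies one constant gain. The function names and the -23 dB default target are assumptions made for illustration.

```python
import math

def rms_dbfs(samples):
    """Return the RMS level of float samples (range -1.0..1.0) in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square)

def normalize_to_target(samples, target_db=-23.0):
    """Scale all samples by one constant gain so their RMS level lands on target_db.

    target_db is a hypothetical default, chosen only for this example.
    """
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # silence stays silence
    gain = 10.0 ** ((target_db - current) / 20.0)
    return [s * gain for s in samples]

# One second of a 440 Hz tone at half amplitude, sampled at 8 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
leveled = normalize_to_target(tone, target_db=-23.0)
print(round(rms_dbfs(tone), 2), round(rms_dbfs(leveled), 2))
```

A single constant gain like this only fixes the overall level. The leveler described above goes further by adjusting each speaker separately, which this sketch does not attempt.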

Control Freak

You have control of many aspects of the processing and resultant files. This includes:

Target bitrate for audio formats
Stereo to mono conversion
Chapter splits to multiple output files
Which processing to perform
- Adaptive leveling
- Filtering
- Global loudness normalization (on/off as well as how loud it should be)
- Noise reduction (including the amount to reduce by)
Email notifications can be selected on processing completion, errors, warnings, or all of the above
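
One way to picture these options is as a single settings object, which is roughly what a preset bundles together. The sketch below is purely illustrative: the class, its field names, and the default values are hypothetical and are not taken from Auphonic's interface or API.

```python
from dataclasses import dataclass

@dataclass
class Preset:
    """Hypothetical container for the options listed above; names are illustrative only."""
    mp3_bitrate_kbps: int = 96
    mono: bool = False                      # stereo to mono conversion
    split_on_chapters: bool = False         # one output file per chapter
    adaptive_leveling: bool = True
    filtering: bool = True
    loudness_normalization: bool = True
    loudness_target: float = -16.0          # assumed example value
    noise_reduction: bool = False
    noise_reduction_amount_db: float = 0.0
    notify_on: tuple = ("errors",)          # e.g. completion, errors, warnings

    def enabled_steps(self):
        """List the processing steps this preset would switch on."""
        steps = []
        if self.adaptive_leveling:
            steps.append("adaptive leveling")
        if self.filtering:
            steps.append("filtering")
        if self.loudness_normalization:
            steps.append(f"loudness normalization to {self.loudness_target}")
        if self.noise_reduction:
            steps.append(f"noise reduction by {self.noise_reduction_amount_db} dB")
        return steps

# Two presets for two kinds of shows.
interview = Preset(mono=True, noise_reduction=True, noise_reduction_amount_db=12.0)
music_show = Preset(mp3_bitrate_kbps=192, adaptive_leveling=False)
print(interview.enabled_steps())
print(music_show.enabled_steps())
```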

How Does It Sound?

I’ve gone back through my archives and pulled the audio from some “challenging” recordings to put the service through its paces. These included live recordings from conventions with several speakers at varying distances from microphones; listener feedback recorded over phones; and a recording with significant electrical ground noise that seemed to permeate every band on the EQ.

Some of those took me hours to fix BEFORE getting to editing. The last was deemed unusable after another audio engineer and I took swings at it. The Auphonic exports were on par with all of the manual work I did, and the results were returned to me within minutes! The last file still had some audible hum here and there, but was totally usable in a podcast as long as you gave a little warning/caveat at the top of the show.

I was going to include some samples here, but seeing as the service is free and so fast – you need to just grab some raw audio and see for yourself. I’m confident that you won’t be disappointed and will likely make Auphonic the last stop for all of your future recordings.

Conclusion

The breadth of options and flexibility are already astounding and I can’t wait to see what features they add in the future. One in particular that was mentioned in a FLOSS Weekly interview is removing natural room reverb from a recording (presumably using downward expansion).
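
For anyone unfamiliar with the term, downward expansion turns already-quiet passages (such as a reverb tail) down further while leaving louder material alone. Here is a minimal sketch of the gain curve in Python. Whether Auphonic would use this approach is only my guess, and the function name, threshold, and ratio are hypothetical values picked for the example.

```python
def expander_gain_db(level_db, threshold_db=-40.0, ratio=2.0):
    """Return the gain (in dB) a downward expander applies at a given input level.

    At or above the threshold the signal passes unchanged. Below it, every
    1 dB the input falls under the threshold becomes `ratio` dB at the output,
    so the extra attenuation is (level - threshold) * (ratio - 1).
    """
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)

# Speech at -30 dB is untouched; a reverb tail at -50 dB is pushed down by 10 dB.
for level in (-30.0, -40.0, -50.0):
    print(level, expander_gain_db(level))
```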

Since the service is completely free, I see no reason beginner and expert podcasters alike won’t find this to be a huge time saver and a go-to tool for all of their productions.

* Yes, there is technically a Linux version of The Levelator available, but the required libraries have far outpaced it, so it won’t run on modern systems. There are plenty of guides on how to use it on Linux with Wine or some such, but again, due to not getting updated, I haven’t been able to get it to output a file on Linux for a few years.
