Many of my audio projects end up creating large amounts of uncompressed WAV audio, since it’s lossless and easy to work with. But when I’m done with a project (at least for the time being), I don’t want to keep gigabytes of uncompressed audio just eating up space. I also don’t want to delete the files or compress them into something lossy, since I might need them in the future; I learned my lesson about deleting intermediate files when I revisited older projects.
So what to do? In the past, I just threw all the WAV files in a WinRAR archive (with default settings) and called it a day. However, after a recent cleanup/organization of files, I was left questioning how effective the RAR approach is. So, I tried compressing about two hours of WAV audio using 7-Zip and WinRAR and tweaking as many different settings as I could so as to find the optimum settings for each. I also tried using FLAC as a non-archive alternative. Here’s what I learned.
7-Zip
7-Zip has enjoyed steady popularity over the last ten years as a free alternative to WinRAR with comparable results. In recent years I’ve switched to offering downloads on my site in this format instead of RAR, as 7-Zip is now widely adopted and easy to acquire and use. But is it any good at raw audio compression?
Looking over the literature, I saw several mentions that increasing the dictionary size in 7-Zip leads to better compression, especially with larger files. However, when I tried larger dictionaries, the archives actually came out slightly larger. My guess is that the overhead of maintaining a larger dictionary outweighs any compression benefit for audio data.
I also tried increasing the compression level in 7-Zip, but this too gave a slightly worse compression ratio (<1% increase) while taking twice as long (going from Normal to Ultra). This is probably because Ultra mode automatically increases the dictionary size; when the dictionary is set back to the default 16MB with Ultra, there is a very small improvement that’s definitely not worth the additional processing time.
The optimal settings for 7-Zip with WAV audio seem to just be the defaults. At its best, 7-Zip achieved a 70.0% compression ratio (compressed size relative to the original, so lower is better), but at about half the speed of WinRAR.
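For reference, here is a minimal sketch of how a ratio like this can be computed. It assumes "compression ratio" means compressed size as a percentage of the original size, which matches how the numbers are used in this post (lower is better). The function name and the byte counts are made up for illustration and are not the measured test data.

```python
def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Return the compressed size as a percentage of the original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * compressed_bytes / original_bytes


# Hypothetical sizes, for illustration only.
print(f"{compression_ratio(1_000_000_000, 700_000_000):.1f}%")  # prints 70.0%
```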
WinRAR
According to Google Trends, WinRAR has been the dominant compression utility for the last 20 years. Its relaxed approach to trial versions has made for great memes and given users no reason to buy a license, or another product much reason to compete. Many others and I swore by it for a long time, but is it still the king of compression?
For WinRAR, dictionary size didn’t seem to have much of an effect (either positive or negative) with my test files. It may be that WinRAR intelligently reduces the dictionary size when doing so gives a better compression ratio. Either way, I found that increasing the dictionary size from the default definitely doesn’t help the ratio, and decreasing it is, as expected, even worse.
Increasing the compression level in WinRAR gave a very modest improvement (<1% decrease) in ratio for up to about twice the compression time (from Normal to Best). Overall, WinRAR seems to be almost twice as fast as 7-Zip at WAV audio compression while still achieving a 70.0% compression ratio at its optimal settings. Things are not looking good for slow-poke 7-Zip.
WinRAR also allows creation of RAR4 archives, an older version which has a mode specifically for compressing WAV audio. It improved the compression by a small but consistent amount even though the dictionary limit for RAR4 is just 4MB. And it also seems just as fast as RAR5 (or later). Its compression ratio was 68.7% with the test data. Using the “Best” compression level here had a negligible effect.
FLAC
FLAC has been around for quite a while, but I had never given it much regard as anything other than an intermediate format, typically something to transcode into MP3 and then delete. In case you’re unfamiliar, it is a free, lossless audio codec, meaning the decoded audio is supposed to be exactly the same as the uncompressed input. I was hesitant to use FLAC for long-term audio storage as I’ve previously found it rather cumbersome to work with, especially for multiple files.
With my test data, FLAC had the best compression ratio of anything so far at 57% (using the default compression level). It was also by far the fastest, about 3 times faster than WinRAR. At the max compression level, there was only a tiny improvement in compression (<1%) while taking about 12 times longer.
Converting the files back to WAV from FLAC yields an almost exact copy of the original WAV file. The caveats are that the encoding software tag will likely have changed and that some software-dependent metadata (like that written by Adobe Audition) may be stripped away. However, upon comparing the files in a hex editor, the audio data did appear to be exactly the same. The test files were compressed with FFmpeg using minimal options so that format specifications (like bit depth and sample rate) remain unchanged by the conversion.
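If you’d rather not eyeball a hex editor, the same check can be scripted. The sketch below is not what I used for the tests; it is just one way to do it with Python’s standard `wave` module. It compares the format fields and the raw sample data while ignoring any other chunks in the file. The function name is my own, and note that the `wave` module only reads PCM WAV files, so it will raise an error on things like 32-bit float audio.

```python
import wave


def same_audio(path_a: str, path_b: str, chunk_frames: int = 65536) -> bool:
    """Return True if two PCM WAV files have identical format and sample data."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        # Channels, sample width, sample rate, and frame count must all match.
        if a.getparams()[:4] != b.getparams()[:4]:
            return False
        # Compare the audio in chunks so large files are not loaded all at once.
        while True:
            frames_a = a.readframes(chunk_frames)
            frames_b = b.readframes(chunk_frames)
            if frames_a != frames_b:
                return False
            if not frames_a:
                return True
```

Calling `same_audio("original.wav", "roundtrip.wav")` (placeholder file names) returns `True` only when the audio itself survived the round trip, regardless of what happened to the metadata.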
Conclusion
Overall, I think the best option is to use FLAC for long-term storage of WAV files. It’s faster, has better compression, and can often be opened natively in whatever player or editor one might use. That said, if you need to be absolutely certain the files are unchanged by compression for whatever reason, RAR4 is still a good option. It’s relatively fast, completely lossless, and stores files in nice, compact archives that are easy to extract from and that preserve timestamps.
The only remaining hurdle I had before I could use FLAC for my long-term storage needs was how to bulk-compress directories, and in a reversible way. However, 30 minutes with an AI got me a decent solution in the form of a Python script that can do just that. It’s nothing special, but I thought I’d include it in case it saves someone else the trouble. You’ll find it on GitHub.
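To give a rough idea of the approach, here is a stripped-down sketch. It is not the script from GitHub; the function names are invented for this example, and it assumes `ffmpeg` is available on your PATH. It walks a directory tree, pairs each WAV file with a `.flac` file next to it, and either prints or runs a bare-bones FFmpeg command for each one.

```python
import subprocess
from pathlib import Path


def plan_flac_jobs(root: Path) -> list[tuple[Path, Path]]:
    """Pair every WAV file under root with a sibling .flac output path."""
    return [
        (src, src.with_suffix(".flac"))
        for src in sorted(root.rglob("*"))
        if src.is_file() and src.suffix.lower() == ".wav"
    ]


def encode_all(root: Path, dry_run: bool = True) -> None:
    """Encode each WAV under root to FLAC, skipping outputs that already exist."""
    for src, dst in plan_flac_jobs(root):
        if dst.exists():
            continue
        # Minimal options: FFmpeg picks the FLAC encoder from the .flac extension.
        cmd = ["ffmpeg", "-i", str(src), str(dst)]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Running `encode_all(Path("my_project"))` (a placeholder directory) only prints the commands; pass `dry_run=False` to actually encode. The sketch never deletes the original WAV files, so that step is left to you once you’ve verified the results. Going the other way takes a bit more care: as far as I know, FFmpeg writes 16-bit PCM by default when the output is a WAV file, so anything with a different bit depth needs the codec set explicitly (for example `-c:a pcm_s24le` for 24-bit audio).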