tkddans
I was wondering about that too when encountering the volume change in stereo vs. Atmos comparisons. It did seem like stereo had everything crammed into a small range of loudness (less dynamic range), whereas Atmos gave more room between instruments and vocals. That is, Atmos felt like I could hear a greater range, even if it was only a trick of staging separation, which let me appreciate the qualities of instruments better at quiet vs. loud volumes.

Don't take machine learning mastering out of the equation just yet. It would be very similar to applying a "style" to an image (look up deep style transfer), granted we're talking about a 1D signal vs. a 2D image. You could have ML learn the "style" in which music is mastered: the input to such a network would be the raw tracks, and the output would be a single mixed track. The training data would be the raw tracks that were used to generate a final master, paired with that final master itself as the target output. We don't have access to this data, but I'm sure the record companies do (assuming they keep it), and if so, they'd have a wealth of data to train these models with.
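To make the "raw tracks in, finished master out" framing concrete, here's a toy sketch of the supervised setup. A real system would be a deep network over audio; this stands in a trivially simplified linear "mix model" (one learned gain per stem) fit by gradient descent, and all the data is synthetic:

```python
# Toy sketch: treat mastering as a supervised mapping from raw stems to the
# finished master. Everything here is synthetic stand-in data.
import random

random.seed(0)

N_STEMS, N_SAMPLES = 3, 200
# Synthetic "raw stems": lists of audio samples.
stems = [[random.uniform(-1, 1) for _ in range(N_SAMPLES)] for _ in range(N_STEMS)]
# Pretend the engineer's master is a fixed (unknown to us) blend of the stems.
true_gains = [0.8, 0.5, 0.3]
master = [sum(g * s[i] for g, s in zip(true_gains, stems)) for i in range(N_SAMPLES)]

# Learn the gains from (stems, master) pairs -- the kind of "training data"
# only the record labels would actually hold.
gains = [0.0] * N_STEMS
lr = 0.01
for _ in range(2000):
    for i in range(N_SAMPLES):
        pred = sum(g * s[i] for g, s in zip(gains, stems))
        err = pred - master[i]
        for k in range(N_STEMS):
            gains[k] -= lr * err * stems[k][i]

print([round(g, 2) for g in gains])  # recovers roughly [0.8, 0.5, 0.3]
```

A deep model would replace the per-stem gains with millions of parameters, but the training signal is the same: minimize the difference between the predicted mix and the engineer's actual master.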
In terms of "put the cellos over there" and "pianos over there," the ML could learn that certain "types" of sounds get placed and mixed a certain way. Given enough examples, it would definitely learn this. You have to remember that neural nets start out as a blank slate (unless you're doing transfer learning), and they need to learn what features exist, the patterns within those features, and how they connect. We can train an autoencoder to generate realistic-looking faces. There isn't a rule like "put the eyes next to each other" or "a nose below and between the eyes" explicitly built into the network; it learns that as it sees more examples. These networks learn a general pattern and apply it: the autoencoder learns that eyes have eyeballs, and that eyeballs have dark pupils surrounded by irises.
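The "blank slate discovers structure" point can be shown in miniature: a linear autoencoder with a one-number bottleneck, trained on 2D points that all happen to lie on a line. Nothing about the line is built in; starting from random weights, it discovers the correlation purely from examples. (Assumed toy setup, not a real face autoencoder.)

```python
# Minimal linear autoencoder: 2-D input -> 1-D code -> 2-D reconstruction.
import random

random.seed(1)

# Training data: points of the form (t, 2t) -- a hidden 1-D structure.
data = [(t, 2 * t) for t in [random.uniform(-1, 1) for _ in range(100)]]

# Encoder: z = e1*x + e2*y ; decoder: (d1*z, d2*z). Four scalar weights.
e1, e2, d1, d2 = (random.uniform(-0.5, 0.5) for _ in range(4))
lr = 0.05
for _ in range(500):
    for x, y in data:
        z = e1 * x + e2 * y
        rx, ry = d1 * z, d2 * z
        gx, gy = rx - x, ry - y          # reconstruction error
        # Backprop through the tiny network.
        gd1, gd2 = gx * z, gy * z
        gz = gx * d1 + gy * d2
        e1 -= lr * gz * x; e2 -= lr * gz * y
        d1 -= lr * gd1;    d2 -= lr * gd2

# After training, points on the y = 2x line reconstruct almost perfectly,
# even though "y is twice x" was never stated anywhere in the code.
x, y = 0.5, 1.0
z = e1 * x + e2 * y
print(d1 * z, d2 * z)  # close to (0.5, 1.0)
```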
One could, for example, start with a model trained on recordings from just one genre, or from one specific producer; finding patterns in a smaller set of related recordings should be easier than trying to make one model to rule them all. But that would be the end goal: if you can get one model to work, you could then apply transfer learning to have the model replicate other genres/producers.
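A hypothetical illustration of why the transfer-learning step is cheaper than starting over: fitting a model for a "new producer" from weights learned on a similar one takes far fewer steps than fitting from scratch. One scalar weight stands in for a whole network, and all numbers are made up:

```python
# Warm start vs. blank slate: count gradient-descent steps to reach a target.
def steps_to_fit(start, target, lr=0.1, tol=1e-3):
    """Gradient-descend a single weight toward `target`; count the steps."""
    w, steps = start, 0
    while abs(w - target) > tol:
        w -= lr * (w - target)   # gradient of 0.5 * (w - target)**2
        steps += 1
    return steps

scratch = steps_to_fit(start=0.0, target=0.8)    # blank-slate training
transfer = steps_to_fit(start=0.7, target=0.8)   # warm start from a related model
print(scratch, transfer)
assert transfer < scratch  # the warm start always finishes sooner here
```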
The problem? Well, there's inherent risk involved. It's likely that a very complex pattern exists (the way humans do things does tend to follow certain patterns, even with a little variation from one work to the next). The question is: what's the cost of learning this model? Time? These models could take years to train, assuming there's hardware out there capable of doing it. And it's not just the training time, but the time it takes to tune the model, which in itself is still more of an art than a science (though there are approaches to automating it, like a genetic algorithm or another form of optimization such as Bayesian optimization). Hardware costs are another huge thing: you can rent, but that gets very expensive in the long run; you can buy, but that's not cheap either. Then there's manpower and expertise: who's going to do the work? They need to get paid too, and it likely takes a team.
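The tuning loop I'm alluding to looks something like this in miniature. Each candidate hyperparameter setting means training (and paying for) a whole model; here the training run is faked by a cheap score function, but in reality every call is hours or days of compute. The "best" values are invented for the sake of the sketch:

```python
# Random-search tuning: every evaluation stands in for a full training run.
import random

random.seed(2)

def train_and_score(lr, depth):
    # Stand-in for a full training run; pretend lr=0.1, depth=6 is ideal.
    return -((lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2)

best, best_score, runs = None, float("-inf"), 0
for _ in range(50):                      # 50 paid-for "training runs"
    cand = (random.uniform(0.001, 1.0), random.randint(1, 12))
    score = train_and_score(*cand)
    runs += 1
    if score > best_score:
        best, best_score = cand, score

print(runs, best)  # 50 full trainings just to tune two knobs
```

A genetic algorithm or Bayesian optimization spends those evaluations more cleverly, but the fundamental cost is the same: each probe of the search space is an entire training run.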
I feel the two biggest factors right now are on the hardware and time fronts. Getting a machine that can train a NN with this sort of data is going to be very costly, if not impossible. Without good hardware, your training time per model will take forever, and when optimizing and tuning a model you end up training many models, with that number growing combinatorially. I'll also admit there's a lot of luck involved. Note that as of this writing there isn't much hardware out there that can train a NN on raw genomic DNA; even a bacterial genome is too large (either the memory or the time required is too big). Basically, I don't think the hardware has caught up to be able to do music yet.
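On the combinatorial growth: even a modest grid over a few hyperparameters multiplies into hundreds of full training runs (the parameter names and values below are just illustrative):

```python
# Why tuning cost blows up: grid size is the product of every knob's options.
from itertools import product

learning_rates = [1e-4, 1e-3, 1e-2]
batch_sizes    = [16, 32, 64, 128]
layer_counts   = [4, 8, 16]
dropouts       = [0.0, 0.2, 0.5]

grid = list(product(learning_rates, batch_sizes, layer_counts, dropouts))
print(len(grid))  # 3 * 4 * 3 * 3 = 108 separate trainings
```

Add one more knob with a handful of values and the count multiplies again, which is why slow per-model training times compound so badly.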
To the first one, Warner actually has a playlist of this (so does Apple). The Atmos recordings don't sound too different from the originals; they're the least altered ones, IMO. The artists wouldn't be doing the custom mixes; the songs are owned by the record companies, and they definitely have the budget to do this, especially if there's backing from Amazon, Apple, Spotify, etc.
Edit: I honestly couldn't care less about 3D effects and such. They're cool and all... but I'd rather the record labels go back to mastering things quiet again instead of loud. The Dolby Atmos tracks, for the most part, accomplish this particular goal, IMO; it's really the only reason I like them. I hear more dynamics in the Atmos tracks vs. the original masters (on AAC). So yeah, forget the Atmos stuff... just do the mastering in a way that doesn't completely destroy the dynamics of the music. Though I guess some people prefer their music loud.
The loudness alterations in particular got me thinking, as you seem to be saying, that the “loudness war” we’ve had in music for many years may be softening up. Maybe Atmos will help lead artists to focus more on impressing listeners with staging and space, rather than just blasting the dB higher on each side of the ears.
Or maybe we will just have a new wave of loudness wars, but within Atmos. Crap