No, I think most people don't really "get" how AI works, and that's OK. It's nowhere near human intelligence when it comes to "taste" or "creativity" in areas like art. Generally, audio-oriented AI is going to fit into classification and understanding tasks.
For Atmos, you still need a human listening to the placement during the remaster process and making decisions about how to structure the remaster.
While you might be tempted to think that "judgment and discernment" are already part of the AI world because of the looming breakthroughs in autonomy, remember that driving has very clearly defined rules of the road, plus a significant economic incentive to build a system that can operate within the ambiguity those rules leave. There's no comparably well-defined "put the cellos over there and the pianos go here" mixing philosophy an AI can be trained on.
While there could theoretically be a rough AI system capable of some really weak version of a remix, its results wouldn't justify the development investment with our current technology.
Don't take machine learning out of the equation for mastering tracks just yet. It would be very similar to applying a "style" to an image (look up deep style transfer); granted, we're talking about a 1D signal vs. a 2D image. You can have ML learn the "style" in which music is mastered. The input to such a network would be the raw input tracks, and the output would be a single mixed track. The training data would be pairs: the raw tracks that were used to generate a final master as the input, with that final master itself as the target. We don't have access to this data, but I'm sure the record companies do (assuming they keep it), and if they did keep it, they're sitting on a wealth of material to train these models with.
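To make that concrete, here's a minimal supervised-learning sketch of the idea, assuming PyTorch. The MixNet architecture, the stem count, and the random tensors standing in for real stems and masters are all hypothetical illustrations, not anyone's actual system:

```python
import torch
import torch.nn as nn

class MixNet(nn.Module):
    """Toy 1D conv net: N raw stems in, one mixed/mastered track out."""
    def __init__(self, n_stems: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_stems, hidden, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=15, padding=7),  # mono "master"
        )

    def forward(self, stems):      # stems: (batch, n_stems, samples)
        return self.net(stems)     # output: (batch, 1, samples)

model = MixNet(n_stems=8)
stems = torch.randn(4, 8, 44100)   # 4 examples, 8 stems, 1 s of audio each
master = torch.randn(4, 1, 44100)  # the human-made masters (ground truth)
loss = nn.functional.l1_loss(model(stems), master)
loss.backward()                    # one standard supervised training step
```

The appeal is that the loss is just "how close is the network's output to the human master," so no mixing rules ever have to be written down explicitly.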
In terms of "put the cells over there" and "pianos over there," the ML could learn that certain "types" of sounds would be placed and mixed a certain way. Given enough examples, it would definitely learn this. You have to remember that neural nets start out as a blank slate (unless you're doing transfer learning, then it doesn't), and it needs to learn what different features exist and patterns within the features and how they connect. We can train an auto encoder to generate realistic looking faces. There isn't a clearly defined "put the eyes next to each other" and "a nose below and between the eyes" inherently in the network, but it learns it as it sees more examples of this. These networks learn a general pattern and apply said pattern. The auto encoder will learn the that eyes have eyeballs and eye balls have dark circles and irises surrounding them.
One could try, for example, building a model trained on recordings from just one genre, or from one specific producer; I feel finding patterns in a smaller set of related things would be easier than trying to make one model to rule them all, though that would be the end goal. If you can get one model to work, you could then apply transfer learning to get the model to replicate other genres/producers.
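That transfer-learning step could look roughly like this; again a hedged sketch, with a stand-in network and a made-up checkpoint name:

```python
import torch
import torch.nn as nn

model = nn.Sequential(                        # stand-in mixing network
    nn.Conv1d(8, 64, kernel_size=15, padding=7), nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=15, padding=7), nn.ReLU(),
    nn.Conv1d(64, 1, kernel_size=15, padding=7),
)
# Hypothetical: load the weights of the general "one model to rule them all".
# model.load_state_dict(torch.load("general_mixer.pt"))

for param in model[:-1].parameters():         # freeze everything except
    param.requires_grad = False               # the final layer

# Fine-tune only the head on the new genre's/producer's recordings.
optimizer = torch.optim.Adam(model[-1].parameters(), lr=1e-4)
```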
The problem? There's inherent risk involved. A very complex pattern likely exists (the way humans do things does tend to follow patterns, even with a little variation from one work to the next), but what's the cost of learning this model? Time? These models can take years to train, assuming there's hardware out there capable of doing it. And it's not just training time; tuning the model takes time too, and tuning is itself still more of an art than a science (though there are approaches to automating it, like a genetic algorithm or some other form of optimization such as Bayesian optimization; a sketch follows below). Hardware costs are another huge factor. You can rent, but that gets very expensive in the long run. You can buy, but that's not cheap either. Then there's manpower and expertise: someone has to do the work, they need to get paid too, and it likely takes a team.
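To show why the tuning alone multiplies the cost, here's an illustrative naive random search over a tiny hyperparameter space. A genetic algorithm or Bayesian optimization (e.g. via scikit-optimize's gp_minimize) would search the same space more efficiently, but every trial is still a full training run:

```python
import random

def train_and_evaluate(lr: float, hidden: int) -> float:
    # Stand-in for an entire training job (possibly days or weeks each);
    # it just returns a fake validation loss for illustration.
    return random.random()

search_space = {
    "lr": [1e-5, 1e-4, 1e-3],
    "hidden": [32, 64, 128, 256],
}
trials = [
    {k: random.choice(v) for k, v in search_space.items()}
    for _ in range(20)   # 20 trials = 20 complete trainings
]
best = min(trials, key=lambda cfg: train_and_evaluate(**cfg))
print(best)
```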
I feel like the two biggest factors right now are hardware and time. Getting a machine that can train a NN on this sort of data is going to be very costly, if not impossible. Without good hardware, your training time per model takes forever, and when optimizing and tuning a model you train many models, with that number growing combinatorially. I'll also admit there's a lot of luck involved. Note that as of this writing there isn't much hardware out there that can train a NN on raw genomic DNA; even a bacterial genome is too large to do (either the space is too big or the time is too big). Basically, I don't think the hardware has caught up enough to do music yet.
Edit: building a model that mixes an individual track/instrument/etc. would require a lot less hardware than one that mixes multiple tracks at once. You'd send each track in, it would "mix" it individually, and you'd just add all the tracks together. Getting training data for this might be more difficult, though.
Edit 2: on second thought, making the model blind to the other tracks would likely be more of a hindrance than a help, since knowing what other tracks exist and what they sound like is exactly the kind of information the model could learn from and exploit.
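Putting both edits together, a hypothetical per-stem model could process one stem at a time while seeing a rough sum of the other stems as context: cheaper than a full multitrack model, but not completely blind. Everything here (StemMixer, the shapes, the naive summed context) is illustrative:

```python
import torch
import torch.nn as nn

class StemMixer(nn.Module):
    """Mix one stem at a time, conditioned on a crude mix of the others."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, hidden, kernel_size=15, padding=7),  # stem + context
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=15, padding=7),
        )

    def forward(self, stem, context):
        return self.net(torch.cat([stem, context], dim=1))

mixer = StemMixer()
stems = torch.randn(8, 1, 44100)        # the 8 stems of one song
mix = torch.zeros(1, 1, 44100)
for i in range(stems.shape[0]):
    stem = stems[i:i + 1]
    context = stems.sum(dim=0, keepdim=True) - stem   # everyone else, roughly
    mix = mix + mixer(stem, context)    # final mix = sum of processed stems
```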
Q - How many classic recordings do you want to hear altered for Spatial or Atmos?
A - Me? Not any. You?
Q - How many artists do you think have the budget to make custom mixes for these new formats?
A - Not many.
To the first one, Warner actually has a playlist of these (so does Apple). The Atmos recordings don't sound too different from the originals; they're the least altered ones, IMO. And artists wouldn't be doing the custom mixes anyway: the songs are owned by the record companies, and the labels definitely have the budget for this, especially with backing from Amazon, Apple, Spotify, etc.
Edit: I honestly couldn't care less about the 3D effects and stuff. They're cool and all... but I'd rather the record labels start mastering things quiet again instead of loud. The Dolby Atmos tracks for the most part accomplish this particular goal, IMO; it's really the only reason I like them. I hear more dynamics in the Atmos tracks vs. the original masters (on AAC). So yeah, forget the Atmos stuff... just do the mastering in a way that doesn't completely destroy the dynamics of the music. Though I guess some people prefer their music loud.