The problem is that, when you start talking about surround sound, that term becomes even more vague.
(Even past the fact that what's "audibly transparent" to one person may not be to another... and that what's "audibly transparent" on certain content may not be on other content.)
Here's an easy example.....
Some of the early compression CODECs saved a lot of space by making the assumption that we aren't especially perceptive about the details of high frequency content.
It was particularly assumed that, while the amount of high frequency spectral content may be quite audible, we aren't very sensitive to the details.
As a result, at least one or two of the early surround CODECs simply didn't bother to store the information in the surround channels in the upper frequency bands.
During the encoding process, when they were analyzing what was present in each frequency band, if they detected what seemed to be "decorrelated noise" in the surround channels, they would simply discard it.
All they stored was a very generalized piece of data about "how much sound was present in that band".
Then, when decoding that content, they would simply "fill in" the upper bands in the surround channels with "about the right amount of decorrelated noise".
(I seem to recall that one particular CODEC would simply save that band from one channel - then duplicate it to all of the channels when decoding the content.)
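To make the "fill in" idea concrete, here's a toy sketch in Python. It is NOT any real CODEC's algorithm; the function names, the use of an RMS level as the "generalized piece of data", and the Gaussian noise are all my own assumptions for illustration. The encoder keeps one number for the band, and the decoder rebuilds the band from fresh random noise scaled to that number.

```python
import math
import random

def band_energy(samples):
    """RMS level of one band's samples (the only thing we keep)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def encode_band(samples):
    """Hypothetical encoder: discard the samples, store just the level."""
    return band_energy(samples)

def decode_band(level, length, seed=0):
    """Hypothetical decoder: fresh random noise scaled to the stored level."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    scale = level / band_energy(noise)
    return [n * scale for n in noise]

# A stand-in for "decorrelated noise" in one upper band of a surround channel.
source_rng = random.Random(42)
original = [source_rng.gauss(0.0, 0.3) for _ in range(1024)]

level = encode_band(original)               # 1024 samples become 1 number
rebuilt = decode_band(level, len(original))

print("levels match:", abs(band_energy(original) - band_energy(rebuilt)) < 1e-9)
print("samples match:", original == rebuilt)
```

The rebuilt band has "about the right amount" of noise, but sample for sample it has nothing to do with the original.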
This is NOT unprecedented in VOICE reproduction.
Many advanced telephone CODECs don't actually store the voice at all.
They break it down into small bits of sound - which can then be stored as "coefficients", and then "rebuilt" later from that information (in general terms this is called "tokenizing").
So, for example, they might find that a certain spoken sound in my voice is equivalent to "a 50 msec burst of noise in band 2 at level 5 mixed with a 50 msec burst in band 5 at level 7 followed by a 100 msec burst in band 3 at level 2".
They would then store this information - and identify it (for example as "sound #24").
Then, at the receiving end, when told to do so, the decoder would "play a copy of sound #24".
(The process is somewhat similar to MIDI.)
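Here's a minimal sketch of that "store it once, then send the number" idea, in Python. The codebook contents, the index numbers, and the distance measure are all made up for illustration (entry 24 just reuses the bursts from my example above); real telephone CODECs are far more elaborate.

```python
# Hypothetical codebook. Each entry is a list of (band, level, duration_ms) bursts.
CODEBOOK = {
    24: [(2, 5, 50), (5, 7, 50), (3, 2, 100)],
    31: [(1, 4, 50), (4, 2, 50), (6, 6, 100)],
}

def distance(a, b):
    """Sum of squared differences between two burst lists of equal length."""
    if len(a) != len(b):
        return float("inf")
    return sum((x - y) ** 2 for ba, bb in zip(a, b) for x, y in zip(ba, bb))

def encode(bursts):
    """Sender: pick the index of the closest stored sound."""
    return min(CODEBOOK, key=lambda idx: distance(CODEBOOK[idx], bursts))

def decode(index):
    """Receiver: play back the stored sound for that index."""
    return CODEBOOK[index]

# What the analyzer "heard": close to sound #24, but not identical.
heard = [(2, 5, 50), (5, 6, 50), (3, 2, 100)]

index = encode(heard)
print(index)                    # 24
print(decode(index) == heard)   # False
```

Only the index crosses the wire, so what comes out the other end is the codebook's version of the sound, not the one that was actually spoken.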
(Imagine a video transport system where a person at one end watches the action - then describes it over a phone to a remarkably fast artist at the other end.
The artist at the other end then DRAWS what the sender sees based on their description. You would end up with the equivalent of "a cartoon that looks very much like the original".)
You may have experienced this if you've ever had a cell phone conversation where the voice was quite intelligible but any background noise came through as odd electronic chirping noises....
Some of these CODECs, especially the early ones, actually handled voice quite well, but were confused by unusual sounds they were unable to "understand" and "deconstruct" - like background noise.
This was quite noticeable on many early "Internet phone systems".
Similar "decisions" are made all the time when applying compression to VIDEO content... with similar questions about whether they are "visible" or not.
I'm going to regale you with an example I saw on a DVD, which demonstrates the question very well.
In a certain very old disaster movie about a tornado.... one scene takes place in front of a background of very dark rapidly swirling clouds.
In the original VHS tape versions of this movie the clouds could very clearly be seen to swirl throughout the entire scene... along with a significant amount of tape background noise.
However, in the DVD version of the same movie, in that same scene, the clouds DO NOT SWIRL (they change once or twice but essentially remain stationary).
This example is striking because, if you'd never seen the movie before, you would have said that "the DVD looked quite good"... and never missed the movements in the clouds.
However, if you were familiar with the tape version, or the original movie, it was obvious that the DVD version did not reproduce it accurately at all.
(And, apparently, even though there is a lot of random tape noise, we humans can easily discern the difference between swirling clouds and tape noise.)
The reason this happened is obvious (if you're familiar with video encoding for DVDs).
Because noise, like tape noise, is in fact random, it doesn't compress efficiently, so accurately recording tape noise requires a lot of bandwidth.
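You can see the "noise doesn't compress" effect for yourself with a few lines of Python. Note the assumptions: this uses zlib, a general-purpose LOSSLESS compressor, as a stand-in, and the two byte strings are synthetic, so it illustrates the principle rather than anything about the actual DVD CODEC.

```python
import random
import zlib

n = 100_000
rng = random.Random(0)

# Stand-in for "tape noise": every byte is random.
noise = bytes(rng.randrange(256) for _ in range(n))

# Stand-in for smooth picture content: long runs of slowly changing values.
gradient = bytes((i // 400) % 256 for i in range(n))

noise_ratio = len(zlib.compress(noise, 9)) / n
gradient_ratio = len(zlib.compress(gradient, 9)) / n

print(f"noise compresses to    {noise_ratio:.1%} of its original size")
print(f"gradient compresses to {gradient_ratio:.1%} of its original size")
```

The random data barely shrinks at all, while the smooth data shrinks to a tiny fraction of its size.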
In the CODEC used for DVDs, bandwidth is allocated intelligently.
And, in general, noise is something that most people prefer not to see, so you normally want to remove it anyway.
So, as part of the process, noise is filtered out before compression is applied, so as to preserve more bandwidth for useful information by avoiding "wasting bandwidth on noise".
(The choice of what to filter out can be controlled manually - but can also be done automatically in many encoders.)
In this particular scene, because the swirling clouds are very dark, and contain little information, the algorithms have "decided" that the swirling is "noise" and filtered it out.
Another way of looking at it would be to say that the encoder has substituted static clouds for the original "unimportant" swirling clouds in order to save space for more "important" information.
(It is performing "priority based perceptual encoding".)
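Here's a toy version of how a noise filter can "freeze" dark swirling clouds, sketched in Python. The filter (hold each pixel unless it changes by more than a threshold), the threshold, and the pixel values are all assumptions I made up for illustration; the pre-filters in real DVD encoders are more sophisticated than this.

```python
def temporal_denoise(frames, threshold):
    """Hold each pixel at its last output value unless it changes by more than threshold."""
    out = [list(frames[0])]
    for frame in frames[1:]:
        prev = out[-1]
        out.append([p if abs(c - p) <= threshold else c
                    for c, p in zip(frame, prev)])
    return out

# Three pixels per frame: two dark "cloud" pixels that swirl by a few levels,
# and one bright pixel that makes a large, obvious change.
frames = [
    [10, 12, 200],
    [12, 10, 200],
    [11, 13, 120],
    [13, 11, 120],
]

filtered = temporal_denoise(frames, threshold=4)
for frame in filtered:
    print(frame)
# [10, 12, 200]
# [10, 12, 200]
# [10, 12, 120]
# [10, 12, 120]
```

The big change in the bright pixel survives, but the small changes in the dark pixels are treated as noise and the "clouds" stop moving.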
It is in fact possible that, in this case, rather than the encoder, a human operator CHOSE to set the filtering at a level that would wipe out the swirling in the clouds.
However, the result is the same.....
Even though many viewers may PREFER the smoother filtered version....
We cannot reasonably claim that "the encoding is 'visibly transparent' to the original"....
(And it's quite obvious that "the original artistic intent" called for "ominously swirling dark clouds".)
So, if you were an aficionado of bad old disaster movies, would you prefer to see the encoded version or an ACCURATE reproduction of the original?
(Unfortunately, in this case, unless you were to acquire a theatrical master copy, you would be forced to choose between the tape noise from the VHS version, and the "smoothing errors" on the otherwise excellent DVD transfer.)
TO BRING THE CONTEXT BACK TO THIS DISCUSSION....
Unless you have the lossless copy of a file, encode it yourself, and compare the two, can you TRUST the encoding process to never make similar "editorial decisions"?
(And, even if you confirm that ten files you encode and carefully compare are "audibly transparent", are you willing to believe and trust that EACH AND EVERY FILE encoded by someone else will be audibly transparent?)
Personally, not being a major aficionado of old movies, I'm willing to concede that "most DVDs look as good as or better than the VHS version", and that's plenty good for me... so I'd rather have the DVD.
However, I'm simply not willing to make a similar concession for music.
I've got no real use for "beyond audible transparency".
What I meant about data rate was overall data rate. Usually, the data rate is sufficient to achieve transparency. But transparency at 2 channels takes a different data rate than transparency at 7.1. More pieces cut in the pie mean a bigger pie is needed.