Kind of puts a snag in the whole "bits are bits" theory. If you listen to some quarters, there should be NO perceived differences between the various transports in the digital world (USB, Coax, Optical, Ethernet, WiFi, etc.). But it seems quite clear to me that, just like in the analog world, implementation details matter quite a bit.
So when you say that junctions are a bad thing in digital transports, what happens to the transported bits? Do bits get lost... err... in translation? Is the loss quantifiable, say as a percentage of total data, and do lost bits get automatically replaced by input receivers with 0s? And can the perceived differences be explained by the bits absent in real-time streaming?
I'm wondering because these things seem easy to test with access to the right equipment, comparing the input data with the packets received at each point in the chain, at each point in time... I mean, if the "bits are bits" hypothesis doesn't hold in real-time streaming, there has to be a mechanism that produces perceived differences from something that nominally should, unmistakably, be just 0s and 1s.
Let's start super-simply and take the specific interface out of the equation:
Even if you get all the right bits, properly clocked, to your destination, which isn't hard to do with an error detecting/correcting (or retransmitting) scheme*, the noise inherent to, and generated by, all electrical circuits can affect the conversion of that data into the analog signal necessary to output an audio waveform.
Since all electrical circuits have different noise characteristics, they can have different effects on that analog signal. Those can be audible, especially when they occur during DA conversion and/or before amplification.
The bits remained bits, the clock remained "perfect", and yet still there are different possible effects on the output.
That's not a concern (or when it is, it is easily dealt with) for all-digital systems such as computers transferring files, since we don't have to deal with the analog conversion.
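To see how straightforward that is, here's a rough Python sketch of a checksum-and-retransmit scheme. It's purely illustrative (not any real audio protocol), and the link model and error rate are made up:

```python
import random
import zlib

def send_over_noisy_link(frame: bytes, bit_error_rate: float = 1e-5) -> bytes:
    """Simulate a link that occasionally flips bits in transit."""
    corrupted = bytearray(frame)
    for i in range(len(corrupted) * 8):
        if random.random() < bit_error_rate:
            corrupted[i // 8] ^= 1 << (i % 8)
    return bytes(corrupted)

def reliable_transfer(payload: bytes, max_retries: int = 10) -> bytes:
    """Sender appends a CRC-32; receiver recomputes it and asks for a resend on mismatch."""
    frame = payload + zlib.crc32(payload).to_bytes(4, "big")
    for _ in range(max_retries):
        received = send_over_noisy_link(frame)
        body, crc = received[:-4], int.from_bytes(received[-4:], "big")
        if zlib.crc32(body) == crc:
            return body  # bit-perfect delivery achieved
    raise IOError("link too noisy: gave up after retries")

if __name__ == "__main__":
    data = bytes(random.randrange(256) for _ in range(1 << 16))
    assert reliable_transfer(data) == data
    print("every bit arrived intact")
```

That's conceptually what protocols like TCP already do for you, which is why "computers transfer files perfectly all the time" tells you nothing about the analog side.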
...
Now, when it comes to actually transmitting the data:
With fiber optic systems, every junction between two different transmission media causes additional attenuation. So going from glass to air and then back to glass, as you do with those little adapters, reduces the number of photons that make it from the emitter to the receiver.
The receiver is generally a photodiode. Those change their electrical resistance or generate a potential difference when illuminated, depending on the precise type of diode and how it is applied. This effect is dependent on photons reaching, and being absorbed at, the depletion region of the diode. If insufficient photons reach that point then the change in conductivity or the current generated may be below the detection threshold for the receiving circuit.
That results in a "1" registering as a "0".
It can also make edge detection on the bi-phase clock unreliable, which causes jitter issues.
(Make that circuit too sensitive and it can register a "1" spuriously.)
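If it helps, here's a toy model of that threshold effect. Every number in it is invented (real parts and receivers are more sophisticated), but it shows how cumulative junction loss can turn a transmitted "1" into a received "0":

```python
# All figures below are made up for illustration, not taken from any real TOSLINK part.
EMITTER_POWER_UW = 100.0       # optical power leaving the transmitter, in microwatts
LOSS_PER_JUNCTION_DB = 1.5     # assumed loss at each extra glass/air/glass boundary
DETECTION_THRESHOLD_UW = 40.0  # minimum power at which this receiver registers a "1"

def received_power(junctions: int) -> float:
    """Power reaching the photodiode after passing through N extra junctions."""
    total_loss_db = junctions * LOSS_PER_JUNCTION_DB
    return EMITTER_POWER_UW * 10 ** (-total_loss_db / 10)

def read_bit(sent_bit: int, junctions: int) -> int:
    """A transmitted '1' only registers if enough photons arrive; a '0' stays a '0'."""
    if sent_bit == 1 and received_power(junctions) >= DETECTION_THRESHOLD_UW:
        return 1
    return 0

for n in range(5):
    print(f"{n} extra junctions: sent 1 -> read {read_bit(1, n)} "
          f"({received_power(n):.1f} uW at the diode)")
```

With these made-up numbers it's the third adapter that pushes the received power below the threshold, and the "1" silently becomes a "0".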
There are various ways to deal with this, and nothing stops you from using an error detecting/correcting protocol, but in audio the most common existing standards don't*!
What effect this has, when it occurs, depends entirely on the decisions made by the system designer. In general, however, since we don't know what we were SUPPOSED to receive, the result is either clock error or outputting the "wrong" voltage. Whether you can hear that or not depends on many factors (frequency and magnitude of error, for example).
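As a toy illustration of why the magnitude matters (assuming plain 16-bit PCM samples and a single uncorrected bit flip):

```python
# With no error correction, a flipped bit lands wherever it lands: flip the LSB
# and the error is buried far below audibility; flip a high-order bit and you get
# a near full-scale click. The sample value here is just an arbitrary example.
sample = 1000                 # a quietish 16-bit sample value

for bit in (0, 7, 14):        # least significant, middle, near most significant
    corrupted = sample ^ (1 << bit)
    error = corrupted - sample
    print(f"bit {bit:2d} flipped: {sample} -> {corrupted}  "
          f"(error {error:+d} out of a ±32767 full-scale range)")
```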
...
Now, suppose you eliminate these interfaces entirely and play from a source directly housed in your conversion device. That might be a memory card, a RAM buffer (those are pretty much ALWAYS the final source) or a spinning disc ...
Guess what?
While you've eliminated the VISIBLE digital interface, there is still one present, and the eventual analog output is still potentially affected by any noise it generates. Which is one major reason why DACs got externalized from the transports/players in the first place!
...
Yes, it's entirely possible to build a device that captures the incoming bit stream and writes it to a file so you can compare it to the source. That'll tell you whether or not you got the data (and the timing) you were intended to. However, it will tell you absolutely NOTHING about what happens post-reception, as that digital data is converted to analog.
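Conceptually, the comparison is as simple as this (file names are placeholders, and in practice you'd also need the capture to be sample-aligned with the source):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hash a file in chunks so large captures don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if sha256_of("source.pcm") == sha256_of("captured.pcm"):
    print("bit-perfect delivery: every bit arrived intact")
else:
    print("the streams differ somewhere")
# Either way, this says nothing about noise injected during or after D/A conversion.
```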
--
*S/PDIF (and AES3, on which it is based), whether electrically or optically transmitted, and USB Audio do not have error detection/correction/retransmission capabilities. So you have no way to guarantee accurate data delivery.
HDMI Audio does include ECC (error correction code) data in the stream, so the audio data CAN be reliably reconstructed by the receiver even after transmission errors (provided the number of errors is less than the number of correctable bits provided for in the ECC). However, this does NOT prevent the same downstream noise effects from affecting the converted analog signals (or, indeed, the analog conversion process).
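For a feel of how that works, here's a classic Hamming(7,4) toy example. It is NOT HDMI's actual ECC (HDMI uses a different, stronger code), just the principle: redundant bits let the receiver locate and correct a limited number of flipped bits.

```python
def encode(d):
    """Encode four data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(c):
    """Return the corrected data bits, fixing at most one flipped bit."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1          # correct it
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = encode(data)
codeword[5] ^= 1                      # flip one bit "in transit"
assert decode(codeword) == data       # the receiver still recovers the original data
print("single-bit error corrected:", decode(codeword))
```

Flip two bits in the same codeword, though, and this particular code can no longer fix it, which is what the "provided the number of errors is less than..." caveat above is about.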
(Please excuse any typos or incomplete thoughts ... picking away at this on my tablet while doing several things at once .. )