[1] Cleverly, both data and clock may be combined in such a way that both can be derived from the demodulated signal. But the process is never quite perfect, and one of those imperfections is called jitter, a simple indeterminacy about the incoming clock and its related data.
[2] AES uses higher signal levels than SP/DIF, so signal-to-noise is likely to be better. [2a] And better glass in a Toslink jumper is likely to be audibly better for the same reason. [2b] It's just a case of having more signal to work with when you go to demodulate.
[2c] I'm a retired communications engineer, and I've worked with high data-rate fiber runs of kilometers plus having minuscule bit-error-rates, and I can assure you that all these kinds of things are well understood in the industry, and there's no magic at all in getting any signal from point A to point B, if cost isn't the primary concern, which is almost never the case.
[3] But there are many, many things that can go wrong, and that's especially true in cost-constrained consumer goods.
[4] That said, I don't let USB anywhere near my music, and use coax and glass where appropriate.

1. Correct, the process is never perfect, clocks are not perfect, clock recovery/processing circuitry is not perfect and there will always be some amount of jitter. The question is (or rather was, about 45 years ago!), how low does the jitter have to be in order for it to be inaudible.
2. Again, digital audio is binary and therefore has just two states, zero or one (on or off), there is no 3rd state for noise! The signal to noise ratio therefore has absolutely no effect whatsoever, providing there is not so little signal and/or so much noise that a one cannot be differentiated from a zero ...
2a. So, better glass in a Toslink jumper/cable will NOT be audibly better, for the above reason!
2b. No it is not! It's a case of having enough signal to be able to differentiate a zero from a one, more signal than that makes absolutely no difference whatsoever, you get EXACTLY the same zeroes and ones!
2c. In your home audio system are your digital interconnects "runs of kilometers"? With long cable runs we both lose signal and gain interference/noise, a double whammy which can/will cause issues and requires solutions. AES is specified with a higher SNR for this exact reason, for commercial use in say recording studios where we may have several/many tens of meters of digital cable runs. With a typical consumer digital cable run of just 2 or 3 meters or so, the choice of SP/DIF or AES has absolutely no impact on data integrity. If you are/were a communications engineer how is it possible that you don't know this? Maybe you've only dealt with kilometers of cable runs and never considered or learned what happens over far shorter distances but I find that hard to believe of a qualified, experienced engineer.
3. There are indeed many, many things that can go wrong, however all of them have been addressed! In fact, they were addressed so long ago and technology has advanced so much since then, that it now costs peanuts to address them. Your statement is therefore false and in some cases the EXACT opposite of what we actually see: It can be "especially true" in some expensive audiophile products, which in an attempt to differentiate themselves in a crowded market sometimes employ bespoke, esoteric designs which fail to address the problems that cheap DACs have overcome. When you were a communications engineer, did you state that a system/project was completed based purely on your assumption that it would work or did you objectively test and measure its performance first, in order to avoid making false statements and appearing ignorant or incompetent?
4. I've objectively measured and tested digital audio data transferred over USB, AES and SP/DIF and in one case, well over a decade ago, had a very modestly priced USB audio interface run continuously for several days with not a single bit error. What have your objective measurements of USB DACs demonstrated?
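To put some rough numbers on points 2 and 2b, here is a minimal Python sketch. It assumes a simple two-level signal with the decision threshold at half amplitude and Gaussian noise, which is a simplification of the biphase coding SP/DIF and AES actually use. The function names and the sample noise values are made up purely for illustration.

```python
import math


def recover_bits(samples, amplitude):
    """Decide each sample as 1 or 0 using a threshold at half the signal amplitude."""
    threshold = amplitude / 2.0
    return [1 if s > threshold else 0 for s in samples]


def bit_error_rate(amplitude, noise_rms):
    """Theoretical bit error rate for a 0/amplitude signal in Gaussian noise.

    Uses Q(x) = 0.5 * erfc(x / sqrt(2)) with x = (amplitude / 2) / noise_rms.
    """
    x = (amplitude / 2.0) / noise_rms
    return 0.5 * math.erfc(x / math.sqrt(2.0))


# Hypothetical data: the same bits and the same noise at two signal levels.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.0, 0.05]

weak = [b * 0.5 + n for b, n in zip(bits, noise)]    # "just enough" signal
strong = [b * 5.0 + n for b, n in zip(bits, noise)]  # ten times more signal

# Both recover exactly the same zeroes and ones.
print(recover_bits(weak, 0.5) == bits, recover_bits(strong, 5.0) == bits)

# Error rate versus amplitude-to-noise ratio: it collapses very quickly.
for ratio in (2, 6, 10, 20):
    print(ratio, bit_error_rate(ratio, 1.0))
```

Once the amplitude is a modest multiple of the noise, the predicted error rate is so small that no errors would be expected over days of playback, and adding more signal beyond that changes nothing in the recovered data.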
[1] Hans Beekhuyzen has a great video on the topic:
[2] A university (in Japan iirc) conducted a test where subjects listened to music, once with frequencies up to 22kHz and then music with no frequency cap. Whilst listening, an MRI of the subjects' brains was made, and in every case when a sample with ultrasonic frequencies was played, the brains showed significantly more activity compared to the Red Book sample.
[2a] The moral of the story? Just because our conscious mind is not able to make out certain sonic elements doesn't mean that said elements don't affect us and add to the listening experience.
[3] And the most significant difference compared to OS DACs or solid state is that tubes and NOS add even-order harmonics. [3a] Also way below hearing threshold. [3b] Yet people seem to perceive it.
1. On what basis is it "great"? Is it great because you liked and/or believed it? Is it great because quite a few other audiophiles liked/believed and quote it? In terms of audiophile marketing, I could possibly agree that it's "great" but in terms of the actual facts/science it's pretty much the exact opposite! Unfortunately therefore, you posted that video in the wrong forum, this is NOT an audiophile marketing forum, this is the sound science forum.
2. Yes, I've read that paper. While the MRI demonstrated a difference in certain brain activity, the subjects themselves reported experiencing no difference.
2a. How can it "add to the listening experience" if there is no difference in the experience? How then is that the "moral of the story"?
3. If you're referring to NOS DACs, then they are typically also filterless and therefore do not add even harmonics. Instead, they fail to comply with digital audio theory because they do not remove the alias images, which are harmonically unrelated.
3a. No they're not, they're typically significantly above the threshold of audibility and an order of magnitude or more above the jitter and other digital audio artefacts found in even quite cheap DACs.
3b. People can easily perceive differences even when there are none whatsoever (see the "McGurk Effect" for example), which as demonstrated is purely an effect of how we "perceive". Actually hearing a difference is an entirely different kettle of fish though, and when put to a reliable test many of those perceived differences typically vanish. This isn't applicable in this case though: the effects of tubes and NOS DACs are above the threshold of hearing and can be differentiated in reliable (double blind) testing.
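As a rough illustration of point 3, the sketch below lists where the images of a single tone land at the output of a filterless NOS DAC. It assumes an ideal zero-order hold with no analog filtering at all, so the only attenuation is the hold's own sinc-shaped roll-off. The function name and the 10kHz example tone are hypothetical, chosen just to show the arithmetic.

```python
import math


def zoh_images(tone_hz, sample_rate_hz, max_hz):
    """List (frequency_hz, level_db) for images of a tone from an ideal zero-order hold.

    Images sit at k * fs - f and k * fs + f. Levels are relative to the tone
    itself and follow the hold's |sin(x) / x| response, x = pi * f / fs.
    """
    if not 0 < tone_hz < sample_rate_hz / 2:
        raise ValueError("tone must lie between 0 and half the sample rate")

    def zoh_gain(f):
        x = math.pi * f / sample_rate_hz
        return abs(math.sin(x) / x)

    reference = zoh_gain(tone_hz)
    images = []
    k = 1
    while k * sample_rate_hz - tone_hz <= max_hz:
        for f in (k * sample_rate_hz - tone_hz, k * sample_rate_hz + tone_hz):
            if f <= max_hz:
                images.append((f, 20 * math.log10(zoh_gain(f) / reference)))
        k += 1
    return images


# Hypothetical example: a 10kHz tone at 44.1kHz, looking up to 100kHz.
for freq, level in zoh_images(10_000, 44_100, 100_000):
    print(f"{freq} Hz  {level:.1f} dB relative to the tone  ({freq / 10_000:.2f} x fundamental)")
```

None of the image frequencies is a whole-number multiple of the tone, which is what "harmonically unrelated" means here, and under these assumptions the first image is only around 10dB below the tone.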
[1] Nobody I know can hear jitter, but we all hear its side effects,
[1a] which vary all over the map depending on everything from source material, to several sorts of analog signal chain distortion, all the way out to transducer type and resolution.
[2] Since isochronous transfer is source clocked, the troubles usually start there, with something as simple as a noisy power rail.
[3] Enter reclockers, jitter buffers, et al, and maybe it sounds good enough in the end. Maybe not. In any case, the result is usually a sort of degradation we find unmusical.
[4] SP/DIF seems less fragile, if only because it's simpler. AES/EBU is better still, if available.
1. No, we do not! This is the sound science forum (and you were an engineer were you not?) so where's your evidence?
1a. Yes, the effects of jitter do vary all over the map; random noise, sharp spikes and other artefacts for example. However, at what level? There are some cheap consumer DACs where the highest jitter artefacts are at -130dBFS, at least an order of magnitude below audibility and even below the theoretical Nyquist/Johnson noise level of a component/system. No one can hear that or any side effect of that, let alone "we all" being able to hear it!
2. Yes, again, the problems did start there, decades ago but were solved many years ago and today the effects are reduced to way below audibility with just a couple of bucks worth of components. And if a DAC designer cannot deal with "something as simple" and common as a noisy power rail, that is the very definition of incompetence! Again, even a cheap $60 consumer DAC can isolate itself from even a very noisy USB output from a computer/laptop to levels well below audibility, why then is it apparently such a "trouble" for more expensive audiophile DACs?
3. "In any case, the result is usually a sort of degradation" that should be well below audibility and therefore we CANNOT find it "unmusical", musical or anything else! Notwithstanding the possibility of an incompetently designed/faulty audiophile DAC.
4. AES/EBU is better if you have very long cable runs but if your digital interconnects are 3m or less (or even 5m in most cases) then it is not better, data recovery is the same!
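For a sense of scale on the -130dBFS figure in point 1a, here is a small Python sketch of the usual conversions. It assumes purely sinusoidal jitter and the small-angle approximation, where each sideband sits at roughly pi * f * J relative to the tone (J being the peak jitter in seconds). The 10kHz tone and 10ps jitter are hypothetical values picked only to show the arithmetic, not measurements of any particular DAC.

```python
import math


def dbfs_to_ratio(level_dbfs):
    """Convert a level in dBFS to a linear amplitude ratio relative to full scale."""
    return 10 ** (level_dbfs / 20.0)


def quantization_snr_db(bits):
    """Theoretical SNR of an ideal quantizer for a full-scale sine: 6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76


def jitter_sideband_db(tone_hz, jitter_peak_s):
    """Approximate level of each sideband from sinusoidal jitter, relative to the tone.

    Small-angle approximation: sideband amplitude ~ pi * f * J.
    """
    return 20 * math.log10(math.pi * tone_hz * jitter_peak_s)


print(dbfs_to_ratio(-130))               # about 3.2e-7 of full scale
print(quantization_snr_db(16))           # about 98 dB for 16 bit
print(jitter_sideband_db(10_000, 1e-11)) # 10ps peak on a 10kHz tone: about -130 dB
```

Under these assumptions an artefact at -130dBFS is roughly 30dB below the theoretical noise floor of 16 bit audio itself.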
-------------------------------------------
The above is ALL a prime example of what I mentioned in my last post: It's all just regurgitations of audiophile marketing fallacies and falsehoods, regurgitations that we've seen posted here countless times, which have been demonstrated/proven false years ago. If they do start regulating "fake news" on the internet, most of the audiophile world would go up in a puff of smoke and about 95% of the posts on headfi will have to be removed!
G