One would think that newer USB protocols wouldn't have issues with the data stream, but that doesn't appear to be the case.
My own experience is that the OS makes a difference on the same hardware.
Mike spoke earlier in this thread about differences between hardware running the same OS (Mac OS).
The USB audio spec that devices still rely on is USB Audio Class 2.0, which is about 15 years old now; it wasn't updated for USB 3.0 or 3.1.
In principle, as a pure streaming protocol it works well enough: the clock is where it should be, there is more than enough bandwidth, and so on. The problem lies more in how the protocol is implemented on the computer (e.g. there are some timing issues with the latest version of OS X and on some Windows boards) and, more importantly, in the electrical characteristics of the interface and its signalling at the physical level.
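To put "more than enough bandwidth" into numbers, here is a rough back-of-the-envelope check for a stereo 24-bit/192 kHz stream over a USB 2.0 high-speed link. It assumes packed 3-byte samples and ignores protocol overhead, so treat it as an order-of-magnitude sketch rather than a spec-exact calculation.

```python
# Rough bandwidth check for stereo PCM over USB 2.0 high-speed.
# Assumption: packed 3-byte (24-bit) samples; many UAC2 devices use 4-byte
# subslots instead, which adds a third to the payload but doesn't change the picture.

HS_SIGNALING_BPS = 480_000_000   # USB 2.0 high-speed raw signaling rate
MICROFRAMES_PER_SECOND = 8_000   # one microframe every 125 microseconds
MAX_ISO_PAYLOAD = 1_024          # max bytes in one high-speed isochronous transaction


def pcm_bitrate(sample_rate: int, bits: int, channels: int) -> int:
    """Bits per second for uncompressed PCM."""
    return sample_rate * bits * channels


def bytes_per_microframe(sample_rate: int, bits: int, channels: int) -> float:
    """Average payload the device needs in each 125 microsecond microframe."""
    return sample_rate * (bits // 8) * channels / MICROFRAMES_PER_SECOND


rate = pcm_bitrate(192_000, 24, 2)
payload = bytes_per_microframe(192_000, 24, 2)
print(f"{rate / 1e6:.3f} Mbit/s, {rate / HS_SIGNALING_BPS:.1%} of the bus")
print(f"{payload:.0f} bytes per microframe (limit {MAX_ISO_PAYLOAD})")
# 9.216 Mbit/s, 1.9% of the bus
# 144 bytes per microframe (limit 1024)
```

Even at the highest common PCM rate the stream uses a few percent of the bus, which is why the trouble is more likely to be timing and electrical than raw capacity.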
Power and data run over the same cable. The power is often very noisy, since the vast majority of source devices aren't audio-focused: they work purely in the digital domain, so they don't have to worry about noise on the power lines upsetting analog outputs. Additionally, all of the data-centric USB protocols (storage, peripheral interconnect, etc.) are error-detected and corrected, with damaged packets resent, whereas audio isn't: its isochronous packets carry a CRC, but a damaged one is simply lost.
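A toy model of that difference. The link, the bit flip and all function names are hypothetical; zlib's CRC-32 stands in for USB's own 16-bit data CRC, and the bulk side retries indefinitely where real host controllers give up after a few attempts. Filling a lost audio packet with silence is also just one assumed policy, since drivers differ in how they cover a gap.

```python
import zlib


def send(payload: bytes, corrupt: bool) -> tuple[bytes, int]:
    """Hypothetical noisy link: returns (bytes received, CRC the sender computed).
    When corrupt is True, one bit of the first byte is flipped in transit."""
    crc = zlib.crc32(payload)
    data = bytearray(payload)
    if corrupt:
        data[0] ^= 0x01
    return bytes(data), crc


def bulk_receive(payload: bytes, corrupt_attempts: set[int]) -> tuple[bytes, int]:
    """Bulk-style delivery: a CRC mismatch triggers a resend until a clean
    copy arrives. Returns the data and how many attempts it took."""
    attempt = 0
    while True:
        data, crc = send(payload, attempt in corrupt_attempts)
        if zlib.crc32(data) == crc:
            return data, attempt + 1
        attempt += 1


def iso_receive(payload: bytes, corrupt: bool) -> bytes:
    """Isochronous-style delivery: the error is detected, but there is no
    resend because the next packet is already due. The damaged packet is
    replaced with silence here (an assumed concealment policy)."""
    data, crc = send(payload, corrupt)
    if zlib.crc32(data) == crc:
        return data
    return bytes(len(payload))


frame = bytes([1, 2, 3, 4])
print(bulk_receive(frame, {0, 1}))       # (b'\x01\x02\x03\x04', 3)
print(iso_receive(frame, corrupt=True))  # b'\x00\x00\x00\x00'
```

The bulk path always ends up with the right bytes, just late; the audio path stays on time but delivers something other than what was sent.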
A better solution for USB 3.0 would simply have been to send the entire file down the wire up front, with ECC data, and let the receiver buffer and play it. That wouldn't fix every issue (the DAC would still have digital interfaces inside it, but at least those can be optimized for audio), but it would eliminate or ameliorate the noise, isolation, consistency and power-level problems that come from the interface.
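Roughly what that could look like on the receiving end. As a simple stand-in for proper ECC (e.g. Reed-Solomon), this uses a CRC on each chunk plus a single XOR parity chunk, which can rebuild any one damaged chunk; the chunk size and function names are hypothetical.

```python
import zlib
from functools import reduce

CHUNK = 4  # hypothetical chunk size, tiny for illustration


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, chunk: int = CHUNK):
    """Sender: zero-pad to whole chunks, tag each chunk with a CRC-32,
    and compute one XOR parity chunk over all of them."""
    padded = data + bytes(-len(data) % chunk)
    chunks = [padded[i:i + chunk] for i in range(0, len(padded), chunk)]
    parity = reduce(xor_bytes, chunks, bytes(chunk))
    return [(c, zlib.crc32(c)) for c in chunks], parity, len(data)


def decode(tagged, parity, length):
    """Receiver: find chunks whose CRC no longer matches, rebuild at most
    one of them from the parity chunk, and return the buffered file."""
    bad = [i for i, (c, crc) in enumerate(tagged) if zlib.crc32(c) != crc]
    if len(bad) > 1:
        raise ValueError("more than one damaged chunk; single parity cannot recover")
    chunks = [c for c, _ in tagged]
    if bad:
        # parity XOR every good chunk leaves exactly the missing chunk
        others = [c for i, c in enumerate(chunks) if i != bad[0]]
        chunks[bad[0]] = reduce(xor_bytes, others, parity)
    return b"".join(chunks)[:length]


song = b"PCM samples would go here"
tagged, parity, n = encode(song)
_, crc = tagged[2]
tagged[2] = (b"XXXX", crc)          # simulate line noise hitting one chunk
buffer = decode(tagged, parity, n)  # whole file now sits in the receiver's memory
assert buffer == song
```

Once the file is verified and sitting in the receiver's buffer, playback no longer depends on what happens on the cable.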