It's a timing error (or frequency modulation, if you want to think of it that way... in the frequency domain, you get sidebands and a spread of the original frequency). At a 44.1 kHz sampling rate, you're supposed to get a new sample more or less (it's not an abrupt occurrence, because of the nature of the filtering and reconstruction) every 22.6757 [size=small]µ[/size]s. Call that amount the time unit. A 441 Hz tone should be exactly 100 time units per period. With jitter, a given period might come out to 100.001 time units, or 99.999; sometimes 100.002 or 99.998, or less frequently 100.0008 or 99.9992. I'd guess this should sound like some kind of flutter / warble / vibrato effect.
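To make that concrete, here's a minimal Python sketch (the 1 ns RMS jitter figure and variable names are my own assumptions for illustration): sampling a 441 Hz sine at a nominal 44.1 kHz, but with each sample instant perturbed slightly. Sampling at perturbed instants is exactly a phase/frequency modulation of the tone, and the resulting error is tiny, proportional to the tone's slew rate times the timing offset.

```python
import numpy as np

fs = 44100.0          # nominal sample rate (Hz)
f0 = 441.0            # test tone: exactly 100 samples per period
n = np.arange(44100)  # one second of samples

# Ideal sample instants are n/fs; jitter perturbs each instant by a
# small random amount (1 ns RMS here -- an assumed figure, not measured).
rng = np.random.default_rng(0)
jitter = rng.normal(0.0, 1e-9, size=n.size)
t = n / fs + jitter

clean = np.sin(2 * np.pi * f0 * n / fs)    # perfectly clocked samples
jittered = np.sin(2 * np.pi * f0 * t)      # same tone, jittery clock

# The error is bounded by the maximum slew rate (2*pi*f0 for a
# unit-amplitude sine) times the largest timing offset.
err = jittered - clean
print(np.max(np.abs(err)), 2 * np.pi * f0 * np.max(np.abs(jitter)))
```

An FFT of `err` (or of `jittered`) would show the sidebands around 441 Hz mentioned above; at 1 ns of jitter they sit far below any audible level.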
That's a better articulation.
By "play clock" you mean the timing of the DAC's output? If the recording was done with a jittery ADC, then the recording itself is compromised (audibly so, if the jitter is somehow at a high enough level), because the information was not sampled at the correct times. That has nothing to do with any USB transmission later on. I'm not sure what you mean by two tracks streaming at different frequencies: in a single audio signal chain there's only one stream, and its average rate is going to be 44.1 kHz (or 48 kHz, 96 kHz, etc.).
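As a rough sense of "high enough level": the worst-case amplitude error from sampling at the wrong instant is the signal's maximum slew rate times the timing offset, which for a sine of frequency f and amplitude A is 2·π·f·A·Δt. A quick sketch (the function name is mine, just for illustration):

```python
import math

def jitter_error_dbfs(f_hz, jitter_s, amplitude=1.0):
    """Worst-case sampling error, in dB relative to full scale, caused
    by a timing offset: max slew rate (2*pi*f*A) times the offset."""
    err = 2 * math.pi * f_hz * amplitude * jitter_s
    return 20 * math.log10(err / amplitude)

# A full-scale 20 kHz sine sampled with a 1 ns timing error:
print(round(jitter_error_dbfs(20_000, 1e-9), 1))  # → -78.0
```

So even at the top of the audio band, nanosecond-level ADC jitter puts the error some 78 dB down; at 441 Hz it's far lower still, which is why jitter has to be quite large before it's audible.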