I realize that this is a losing battle, as the listener in question has made up his mind regarding what he's hearing with the X7 vs the Mojo, but--
The only reason DACs need to upsample is to make room for an analogue reconstruction filter with a less steep rolloff than the brickwall filtering a non-oversampled digital sample stream would require. The filter would cut off at the Nyquist frequency of the oversampled sample stream, which is 16x (say) the original Nyquist frequency, leaving plenty of room for the signal to be passed without attenuation at the original Nyquist frequency.
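To put rough numbers on that "room", here is a small sketch. It assumes a 20kHz audio band and uses the standard result that the first spectral image of a band-limited signal starts at (sample rate - bandwidth); the function name and the 20kHz figure are my own illustration, not anything taken from a datasheet.

```python
import math

BASE_RATE_HZ = 44_100
AUDIO_BAND_HZ = 20_000  # assumed upper edge of the audio band


def transition_band_octaves(oversampling_factor):
    """Octaves between the top of the audio band and the first spectral image.

    The first image of a signal band-limited to AUDIO_BAND_HZ begins at
    (sample rate - AUDIO_BAND_HZ), so the analogue filter has to fall from
    passband to stopband somewhere inside that gap.
    """
    sample_rate_hz = BASE_RATE_HZ * oversampling_factor
    first_image_hz = sample_rate_hz - AUDIO_BAND_HZ
    return math.log2(first_image_hz / AUDIO_BAND_HZ)


octaves_1x = transition_band_octaves(1)    # non-oversampled: images start at 24.1kHz
octaves_16x = transition_band_octaves(16)  # 16x: images start at 685.6kHz

print(f"1x:  {octaves_1x:.2f} octaves for the filter to roll off")
print(f"16x: {octaves_16x:.2f} octaves for the filter to roll off")
```

Under these assumptions the non-oversampled case leaves the filter about a quarter of an octave to do all its work, while 16x leaves about five octaves, which is why a gentle analogue filter becomes practical.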
16x upsampling is plenty for this--if that were all the X7 was doing. I'm not at liberty to fully disclose the workings of the ES9018S DAC the X7 uses, but what is publicly known is that it upsamples PCM and DSD alike into a high frequency multibit stream for subsequent ASRC jitter reduction, volume control and D/A conversion. Seeing as it does this upsampling for up to DSD512, in the case of 44.1kHz audio the upsampling factor would also be at least 512x.
But back to the central argument. Does a high oversampling factor increase the "timing performance" of the reproduced signal? Quite frankly, no, it wouldn't, even if we pretended for the moment that "timing performance" made any sense as a metric!
Your contention, apparently, is that the "timing performance" of the DAC is equivalent to the sampling period of the upsampled sample stream. At 16FS, sampling rate = 44.1kHz*16 = 705.6kHz, sampling period = 1/(705.6x1000)s ≈ 1.4μs as claimed.
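For anyone who wants to check the arithmetic, a quick sketch follows. The helper name is made up for illustration; the 16x and 512x factors are the ones discussed above.

```python
BASE_RATE_HZ = 44_100


def sampling_period_us(oversampling_factor, base_rate_hz=BASE_RATE_HZ):
    """Sampling period in microseconds of the upsampled stream."""
    return 1e6 / (base_rate_hz * oversampling_factor)


period_16x = sampling_period_us(16)    # 44.1kHz * 16 = 705.6kHz
period_512x = sampling_period_us(512)  # 44.1kHz * 512 = 22.5792MHz

print(f"16x:  {period_16x:.3f} us")
print(f"512x: {period_512x:.4f} us")
```

If the sampling period really were the "timing performance", a 512x stream would beat the 1.4μs figure by a factor of 32 on paper, which says more about the metric than about either DAC.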
The thing is, the D/A converter at the receiving end of this sample stream outputs ANALOG signals--and it does this by lowpassing the digital pulse train. An analogue lowpass filter does not dumbly "join the dots" of the digital sample stream--its output has particular mathematical characteristics dictated by the fact that it preserves all frequencies in its passband and rejects all frequencies in its stopband. But a picture is worth a thousand words:
On the top, the D/A conversion of your upsampled signal as you would seem to have us believe is going on, wherein adding more points would smooth out the curve;
At the bottom, the D/A conversion of the upsampled signal as actually occurs--the output is an analog waveform that bears little resemblance to what an imagined "join the dots" curve would look like.
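The same comparison can be made numerically. The sketch below is an idealised illustration, not a model of any actual DAC: it samples an 18kHz tone at plain 44.1kHz with no upsampling at all, then estimates the waveform halfway between two samples, once by joining the dots and once with a truncated sinc sum standing in for an ideal lowpass filter. The tone frequency, the truncation width and the function names are all arbitrary choices of mine.

```python
import math

SAMPLE_RATE_HZ = 44_100
TONE_HZ = 18_000     # arbitrary test tone below the Nyquist frequency
HALF_WIDTH = 2_000   # samples either side used to approximate the ideal filter


def sample(n):
    """Value of the test tone at sample index n."""
    return math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ)


def sinc(x):
    """Normalised sinc, the impulse response of an ideal lowpass filter."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def join_the_dots(pos):
    """Straight-line interpolation between the two neighbouring samples."""
    left = math.floor(pos)
    frac = pos - left
    return (1 - frac) * sample(left) + frac * sample(left + 1)


def lowpass_reconstruct(pos):
    """Truncated sinc interpolation around the position (in sample units)."""
    centre = round(pos)
    return sum(
        sample(n) * sinc(pos - n)
        for n in range(centre - HALF_WIDTH, centre + HALF_WIDTH + 1)
    )


pos = 0.5  # halfway between sample 0 and sample 1
true_value = math.sin(2 * math.pi * TONE_HZ * pos / SAMPLE_RATE_HZ)
linear_error = abs(join_the_dots(pos) - true_value)
sinc_error = abs(lowpass_reconstruct(pos) - true_value)

print(f"join-the-dots error: {linear_error:.3f}")
print(f"lowpass error:       {sinc_error:.5f}")
```

The lowpass estimate lands on the original waveform between the samples, to within the truncation error of the finite sum, while the straight line is badly off. The information between the dots comes from the filter, not from how densely the dots are packed.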
http://www.head-fi.org/t/769647/objectivists-board-room/1635#post_12203380
Elsewhere I detailed technical arguments regarding Rob's "timing" contentions; I have yet to receive a response.
Don't mind me, I'm just taking an early retirement from my position as FiiO rep