A Fourier transform is an approximation only if you don't let the summation go to infinity.
What I'd worry about more is whether the mathematical conditions the formula assumes are actually satisfied in implemented systems. For example, every sample in a song should contribute to the final analog signal at all times, although its contribution shrinks toward zero as you move farther from the sample's own time. But when the analog sound is actually reconstructed, that is only approximated.
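To put rough numbers on how a single sample's contribution falls off, here is a small Python sketch. It assumes the ideal sin(x)/x (sinc) weighting from the sampling theorem, not any particular DAC's filter, and the distances are arbitrary values picked for illustration.

```python
import numpy as np

# Distance, in sample periods, between one sample and the instant being
# reconstructed. Half-sample offsets fall midway between samples, where the
# sinc weight sits exactly on its 1/(pi*d) envelope.
distances = np.array([0.5, 10.5, 100.5, 1000.5])  # illustrative values only
weights = np.abs(np.sinc(distances))  # np.sinc(x) is sin(pi*x)/(pi*x)

for d, w in zip(distances, weights):
    print(f"{d:7.1f} periods away: weight {w:.6f}")
```

Under that assumption the weight shrinks only in proportion to 1/distance, so samples that are far away still contribute a little.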
I was just thinking... in PCM, you must use not only all past samples but also all future samples to achieve true reproduction. In DSD, you only need all the past samples and don't need to know the future ones. By the nature of its design, the DSD reconstructed analog output is just the sum of signal deltas (differences) from the beginning of the recording up to the given moment. With PCM, in theory you need to put a sin(x)/x curve around every sample and, for any given point in time, add them all up. In practice you use some kind of filtering, but you never use all the samples from the future, just a certain number of the nearest ones. This is an approximation, and DSD does not have it. Could this cause the difference that engineers are hearing? If so, they "only" need to increase the number of samples used in the reconstruction, i.e. increase the number of taps on the digital filters, to improve the approximation. God knows memory and processing power are cheap these days. Maybe they've overlooked that in the design, or screwed up, who knows.
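For what it's worth, here is a rough Python sketch of that truncation effect. It is not how any real DAC filter is designed: it simply cuts the sin(x)/x sum off after a fixed number of neighbouring samples on each side (a plain rectangular truncation). The function name `reconstruct`, the test tone, and the window sizes are all made up for illustration.

```python
import numpy as np

def reconstruct(samples, t, half_width=None):
    """Evaluate the sinc-interpolated signal at time t (in sample periods).

    half_width=None uses every sample in the array; otherwise only samples
    within half_width periods on each side of t are used.
    """
    n = np.arange(len(samples))
    if half_width is not None:
        keep = np.abs(n - t) <= half_width
        n, samples = n[keep], samples[keep]
    # One sinc curve per sample, all evaluated at time t and summed.
    return float(np.sum(samples * np.sinc(t - n)))

# Hypothetical test signal: a sine at 0.2 cycles per sample (Nyquist is 0.5).
N = 4001
freq = 0.2
samples = np.sin(2 * np.pi * freq * np.arange(N))

t = 2000.5  # halfway between two samples, in the middle of the recording
exact = np.sin(2 * np.pi * freq * t)

errors = {hw: abs(reconstruct(samples, t, hw) - exact) for hw in (4, 32, 256, 2000)}
for hw, err in errors.items():
    print(f"{hw:5d} samples per side: error {err:.6f}")
```

For this particular tone the error gets smaller as more neighbours are included, which is the "more taps" argument in miniature. Even the widest window leaves a small residual, because the recording itself is finite while the formula wants an infinite sum.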