It is correct that an R2R DAC basically takes the full value of the sample as "input" and puts out a single voltage value "all at once", while a Delta-Sigma DAC essentially "slices it up in time, processes each piece in sequence, then sums the results". However, the conclusion "and that's why it sounds better" simply doesn't follow. While the way a Delta-Sigma DAC works is certainly more complex, and intuitively seems "messier and less precise", all that really counts is the result - and both deliver very accurate output signals. (The fact that the process used by an R2R DAC is simpler and easier to understand in no way suggests that it produces a "better" output.)
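If it helps to picture the difference, here's a rough Python sketch of the two ideas - purely illustrative, with made-up parameters, and nothing like how an actual chip is implemented internally:

```python
# Purely illustrative sketch - not how any real DAC chip is built.

def r2r_convert(code, nbits=16, vref=1.0):
    """R2R idea: the whole sample word drives the ladder at once;
    each bit contributes a binary-weighted share of Vref."""
    return vref * sum(((code >> b) & 1) << b for b in range(nbits)) / (1 << nbits)

def delta_sigma_convert(sample, oversample=64):
    """First-order delta-sigma idea: turn one sample into a fast stream of
    +1/-1 pulses whose average equals the sample; the output filter
    (here just a plain mean) recovers the value."""
    integrator, feedback, bits = 0.0, 0.0, []
    for _ in range(oversample):
        integrator += sample - feedback
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)
    return sum(bits) / len(bits)

print(r2r_convert(0x8000))        # mid-scale code -> 0.5 * Vref
print(delta_sigma_convert(0.3))   # pulse-stream average lands close to 0.3
```

The point being: the delta-sigma version certainly looks "messier", but once the output is filtered, both approaches land on essentially the same value.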
Neither Delta-Sigma DACs nor R2R DACs are "prone to timing errors". What's happening is that, because of the high level of oversampling used in Delta-Sigma DACs, they are more sensitive to timing errors that are present in the signal you send to them. The same factor is present to a degree in any oversampling DAC, because the higher the clock rate, the larger the fraction of each clock period a fixed amount of jitter represents. It affects Delta-Sigma DACs more than other DACs because they oversample at a higher rate. If you send a bad signal to both an R2R DAC and a Delta-Sigma DAC, odds are that the Delta-Sigma DAC will produce more distortion as a result. Note that this situation doesn't exist if you send a GOOD-quality signal to both. It just means that you have to be more careful about what you send to a Delta-Sigma DAC if you want good results.
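To put some numbers on the "fraction of the clock period" point (the figures here are just example values, not the specs of any particular chip):

```python
# Example numbers only: the same fixed jitter is a much bigger slice of
# the clock period once you oversample at delta-sigma rates.
jitter_s = 100e-12  # assume 100 ps of jitter on the incoming signal

for label, rate_hz in [("44.1 kHz, no oversampling", 44_100),
                       ("64x oversampling (~2.8 MHz)", 44_100 * 64),
                       ("256x oversampling (~11.3 MHz)", 44_100 * 256)]:
    period_s = 1.0 / rate_hz
    print(f"{label}: period {period_s * 1e9:,.1f} ns, "
          f"jitter = {100 * jitter_s / period_s:.4f}% of the period")
```

Same 100 ps of jitter; it's simply a bigger percentage error at the faster clock.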
To put a bit of perspective on this.... Assuming a perfect input signal, with absolutely no jitter, and all else equal, a $5 Delta-Sigma chip will deliver performance equivalent to or better than what you get from a $50 R2R chip. However, since the Delta-Sigma chip is more sensitive to jitter, you're going to have to spend an extra $10 on the input circuitry to ensure that it gets a clean enough signal to avoid having its performance degraded by jitter. Either way, assuming you deliver a clean signal to both, their outputs will be equivalent.
(However, this can be a "deal breaker" if you aren't able to design your other circuit elements well enough to deliver the clean signal that the Delta-Sigma DAC requires to perform well. This suggests that, if you're designing a DIY project, or are a small company without the design know-how and expensive test equipment required to design and test for low levels of jitter, the less strict signal requirements of the R2R chip might be a distinct advantage to you.)
Your final comment about "timing and music" also calls for additional comment.... (you are laboring under a common misconception there).
When we refer to jitter as a "timing error", we are talking about nanoseconds or picoseconds - that's BILLIONTHS and TRILLIONTHS of a second. To put this in perspective, at the 44.1k sample rate used on a CD, the samples are about 22,700,000 picoseconds (roughly 23 microseconds) apart. There is no way a human (or any other living creature) is going to HEAR an error of even tens of thousands of picoseconds directly. (A "decent" input stage, by today's standards, should limit the jitter to several hundred picoseconds at worst.) In order to be audible as a beat "out of place", you would need an error of several milliseconds, or a speed error of several hundredths of a percent.
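If you want to check the arithmetic yourself (the "decent input stage" figure below is just an assumed example, as is the couple-of-milliseconds threshold):

```python
# Rough timescale comparison - the jitter figure is an assumed example.
sample_period_ps = 1e12 / 44_100   # spacing between CD samples, in picoseconds
decent_jitter_ps = 500             # assumed worst case for a decent input stage
audible_shift_ps = 2e-3 * 1e12     # a couple of milliseconds, expressed in ps

print(f"CD sample spacing:   {sample_period_ps:,.0f} ps")
print(f"Residual jitter:     {decent_jitter_ps:,} ps")
print(f"Audible 'late' beat: {audible_shift_ps:,.0f} ps "
      f"({audible_shift_ps / decent_jitter_ps:,.0f}x larger than the jitter)")
```

The jitter we're worried about is literally millions of times smaller than any timing error you could hear as "timing".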
Producing a clean and correct output relies on converting samples that have the correct values at the correct times. If you have jitter, then the timing is slightly incorrect, so you're converting the right values at the WRONG times, which produces a result quite similar to what would happen if the timing was perfect but the sample values were wrong - you get distortion. As it turns out, the distortion you get depends on the frequency characteristics of the jitter and on the content itself, but not in a "harmonic" manner (it's related to the input signal, but it doesn't consist of "simple harmonics", which means it doesn't sound exactly like "ordinary THD").
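If you want to see that for yourself, here's a small simulation (all the numbers - tone frequency, jitter amount, jitter frequency - are made up purely for illustration) of sampling a clean tone at slightly wrong times:

```python
# Illustrative simulation: sample a clean tone at jittered instants and
# look at where the resulting distortion shows up in the spectrum.
import numpy as np

fs, f_tone, n = 48_000, 10_000, 1 << 16
t = np.arange(n) / fs

# Assume 10 ns of sinusoidal jitter wobbling at 100 Hz on the sample clock
jitter = 10e-9 * np.sin(2 * np.pi * 100 * t)
jittered = np.sin(2 * np.pi * f_tone * (t + jitter))

spectrum = np.abs(np.fft.rfft(jittered * np.hanning(n)))
freqs = np.fft.rfftfreq(n, 1 / fs)

tone_db = 20 * np.log10(spectrum[np.argmin(np.abs(freqs - f_tone))])
side_db = 20 * np.log10(spectrum[np.argmin(np.abs(freqs - (f_tone + 100)))])
print(f"sideband at {f_tone + 100} Hz is about {tone_db - side_db:.0f} dB below the tone")
# The junk lands at f_tone +/- the jitter frequency (and its multiples),
# NOT at 20 kHz, 30 kHz... - signal-related, but not harmonic.
```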
When you see those graphs, with a sharp peak surrounded by a bunch of smaller peaks and assorted junk, what you're being shown is the overall spectrum of "what's coming out". The theoretically perfect output would be a single sharp, narrow vertical line; those other peaks are signal that shouldn't be there but is (distortion). Since harmonics tend to be masked by the music signal itself, and a lot of music already contains harmonic content anyway, it's reasonable to expect that this non-harmonically-related distortion will be more audible and more annoying when it is present. This is why, with a DAC, we hope to find not only an overall noise floor that is on average inaudible, but also that no individual "spike" extends high enough above that average noise floor to be audible by itself. So you look for a low noise floor ("the grass") and for no peaks that extend very far above it.
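For what it's worth, if you have one of those spectra as raw numbers, a crude way to separate "the grass" from "the spikes" might look like this (my own ad-hoc criterion, not any standard measurement):

```python
# Ad-hoc sketch: estimate "the grass" and flag bins that poke well above it.
import numpy as np

def grass_and_spurs(spectrum_db, margin_db=20):
    """Return the estimated noise floor (median level, in dB) and the
    indices of bins sitting more than margin_db above it."""
    floor_db = np.median(spectrum_db)
    spur_bins = np.flatnonzero(spectrum_db > floor_db + margin_db)
    return floor_db, spur_bins
```

(Exclude the test tone itself before doing this, of course - the only spikes you care about are the ones that shouldn't be there.)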
Ignoring the pictures, most people who claim to notice low but significant amounts of jitter usually describe it as "blurring the sound stage" or "making things sound blurry".... I would personally describe the effects as "making a well recorded wire brush cymbal sound more like a leaky steam valve" - the frequencies are all present, but you lose the "sense" of individual wires hitting metal and it sounds more like a generic burst of noise at the proper frequencies. I also tend to notice a difference on sibilants - to me they seem more exaggerated but less natural when a high level of jitter is present.
(Note that I'm talking about "jitter being present at the DAC" - which is all that counts. If the DAC has some sort of jitter reduction mechanism, which many do, then all that matters is how much jitter remains when the signal arrives at the actual DAC chip to be converted. As it turns out, it requires VERY careful circuit design to remove or reduce jitter to a very low level, and to avoid introducing new jitter to the signal on its way to the DAC itself. Simply using a good clock is not enough to ensure low jitter on the audio signal - although using a bad clock can be enough to ensure a bad jitter spec.)