that's a way to look at it.
disclaimer: i don't remember too much about how opamps work, so i may be a bit off on the following, but i'm pretty sure it's correct.
audio signals are ac signals (think sine wave) that are centered on the signal ground (for stereo audio, the channels are left/right/ground). the opamp's maximum output voltages are set by the +/- supply voltages powering it (real opamps usually fall a little short of the rails, but it's close enough for this). if it didn't have the negative supply and that pin was connected to ground instead, it wouldn't be able to output any negative voltages. when the audio signal tries to go above the max or below the min the opamp can put out, the output just sits at that limit. this is what clipping is: the peaks of the signal that try to go past the limits get cut off flat.
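here's a rough sketch of the clipping idea in python. it assumes an idealized opamp at unity gain that can swing all the way to its rails, and the names (`opamp_output`, the 6v test signal) are just made up for illustration:

```python
import math

def opamp_output(v_in, v_neg, v_pos):
    """Idealized opamp: follows the input but can't go past its supply rails."""
    return max(v_neg, min(v_pos, v_in))

# one cycle of a 6v-peak sine wave centered on 0v, sampled at 8 points
# (6v is an arbitrary value picked so that it overshoots a +/-4.5v supply)
signal = [6.0 * math.sin(2 * math.pi * n / 8) for n in range(8)]

# split supply (+/-4.5v): both peaks get flattened at the rails
split = [opamp_output(v, -4.5, 4.5) for v in signal]

# negative supply pin tied to ground (0v to 9v): the negative half is lost
single = [opamp_output(v, 0.0, 9.0) for v in signal]

for v, a, b in zip(signal, split, single):
    print(f"in={v:6.2f}  split={a:6.2f}  single={b:6.2f}")
```

in the `split` column the tops and bottoms are chopped at +/-4.5, and in the `single` column everything below 0 is just gone.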
let's say the negative side of the battery is 0v. by tying everything to the virtual ground (including the ground channel from the audio), we are in effect level shifting the audio signal up by 4.5v (with our 9v battery example). this makes it so that the ac signal is now centered on 4.5v and won't clip (assuming the signal's amplitude never exceeds 4.5v). if we had instead tied the ground to the negative side of the battery, then since the opamp can only output from 0 to 9 volts and the sine wave is centered on 0v, it would only be able to output the positive half of the waveform. it's all a game of reference points: with the signal centered at 4.5v, we can output both the positive and negative parts of it, and then by passing 4.5v as the ground for the output, your headphones will use that as the reference point that the audio signal is centered around. if you had passed on the 0v ground from the battery instead, your headphones would not be happy, since they'd see a constant 4.5v dc offset on top of the audio.
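the reference point game can be sketched with a few lines of python. same caveats as any toy model: it assumes an ideal unity-gain opamp that swings rail to rail on a 9v battery, and the names (`V_VGND`, `clip`, the 3v test signal) are made up for the example:

```python
import math

V_BATT = 9.0          # battery negative is our 0v, battery positive is 9v
V_VGND = V_BATT / 2   # virtual ground sits halfway, at 4.5v

def clip(v, lo=0.0, hi=V_BATT):
    """Idealized opamp output limit: can't go below 0v or above 9v."""
    return max(lo, min(hi, v))

# 3v-peak audio signal, centered on its own ground (arbitrary test value)
audio = [3.0 * math.sin(2 * math.pi * n / 8) for n in range(8)]

# audio ground tied to virtual ground: measured from battery negative,
# the signal now rides on 4.5v and fits inside 0..9v
shifted = [clip(v + V_VGND) for v in audio]

# headphones referenced to virtual ground see the original signal again
heard = [v - V_VGND for v in shifted]

# audio ground tied to battery negative instead: negative half gets clipped
unshifted = [clip(v) for v in audio]

# headphones referenced to battery negative (0v) while the signal rides on
# 4.5v: the average voltage they see is a dc offset, not audio
dc_offset = sum(shifted) / len(shifted)
print(heard, unshifted, dc_offset)
```

`heard` matches `audio` (give or take floating point rounding), `unshifted` has lost everything below 0v, and `dc_offset` comes out at about 4.5v, which is the part the headphones would not be happy about.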