Higher sampling frequencies allow for better representation of the continuous audio waves by the discontinuous sequence of samples ("dots" on a timeline).
No, they don't, at least not if we are talking about band-limited signals, and we should be, because the sampling theorem requires that signals are band-limited relative to the Nyquist frequency. By band-limiting the signal, we know 100% what the signal does between the sample points: there is mathematically only one way it can connect the dots. To behave differently, the signal would need frequencies above Nyquist, but band-limited signals do not have those!
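The "only one way to connect the dots" point can be sketched numerically with Whittaker–Shannon (sinc) interpolation. This is my own illustration, not from the discussion above; the sampling rate, tone frequency, and variable names are arbitrary choices:

```python
import numpy as np

# Sketch: a band-limited signal sampled above Nyquist is fully determined
# between the sample points.  We sample a 5 kHz tone at 48 kHz, then
# reconstruct the value halfway between two samples via sinc interpolation.
fs = 48000.0            # sampling rate, Hz (illustrative choice)
f0 = 5000.0             # test tone, well below Nyquist (24 kHz)
n = np.arange(256)      # sample indices
samples = np.sin(2 * np.pi * f0 * n / fs)

def sinc_reconstruct(samples, t):
    """Evaluate the band-limited signal at continuous time t (in sample units)."""
    k = np.arange(len(samples))
    return np.sum(samples * np.sinc(t - k))   # np.sinc is sin(pi x)/(pi x)

t = 100.5                                     # halfway between two samples
reconstructed = sinc_reconstruct(samples, t)
true_value = np.sin(2 * np.pi * f0 * t / fs)
print(abs(reconstructed - true_value))        # tiny; only truncating the sum
                                              # to 256 terms keeps it nonzero
```

The residual error comes from using a finite block of samples; with the infinite sum of the theorem, the reconstruction between the dots is exact.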
Also, how audio waves look to our eyes is different from how our ears hear them. For our eyes the visual shape of the wave is easy to see, because that's what eyes are for: seeing shapes. Our ears, however, work differently and are much more into analysing the frequency content of the waveform. Since different-looking waveforms can have the exact same frequency content, ears are not much interested in the exact shape of the waveform. In fact, room acoustics, with all the reflections and reverberation, render the original waveform pretty much unrecognisable to the eye, but that doesn't matter much, because for the ear the frequency content is more or less intact (depending on how good the acoustics are). Even speakers/headphones alone change the waveform drastically.
But don't forget the aspect of timing. Humans perceive timing differences in the arrival of wavefronts between the left and right ear down to a few microseconds (around 5 to 8 microseconds, or even less). That is one aspect of perceiving the direction of a sound source; you know, in evolution, the famous lion out there that is going to eat you … So, in order to represent such small timing differences in sampled representations of sound, it takes sampling frequencies of more than 100 and up to 200 kHz or more. (More precisely, a period of 1 microsecond corresponds to 1 MHz, 5 microseconds to 200 kHz, and 10 microseconds to 100 kHz.)
I think that is an easy explanation of which sampling rates are needed for a good representation of sounds by discontinuous samples.
Again, the core idea of the sampling theorem is to represent (band-limited) continuous signals fully by taking samples often enough.
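One way to see what "fully" means here is that even timing shifts far smaller than one sample period survive sampling. This is my own sketch, with arbitrary parameters: a 5 microsecond inter-channel delay (about 0.22 samples at 44.1 kHz) is recovered exactly from the samples of two sine tones.

```python
import numpy as np

# Sketch: a time shift much smaller than the sample period (1/44100 s
# is about 22.7 us) is still encoded in band-limited samples.
fs = 44100.0
f0 = 1000.0
delay = 5e-6                       # 5 microseconds between "left" and "right"
n = np.arange(4096)
left = np.sin(2 * np.pi * f0 * n / fs)
right = np.sin(2 * np.pi * f0 * (n / fs - delay))

# Recover the delay from the phase difference at the tone's FFT bin.
window = np.hanning(len(n))
spec_l = np.fft.rfft(left * window)
spec_r = np.fft.rfft(right * window)
bin_f0 = int(round(f0 * len(n) / fs))
phase_diff = np.angle(spec_l[bin_f0]) - np.angle(spec_r[bin_f0])
estimated_delay = phase_diff / (2 * np.pi * f0)
print(estimated_delay * 1e6)       # close to 5 (microseconds)
```

So the timing resolution of sampled audio is not quantised to the sample period; for a band-limited signal it is limited by the noise floor, not by the sampling rate.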
Higher sampling frequency allows sampling of higher frequencies. That's all.
Higher bit depth allows a lower noise floor. That's all.
For consumer audio, 44.1 kHz sampling frequency and 16-bit quantisation allow enough bandwidth (up to 20 kHz) and dynamic range (technically about 96 dB, and perceptually up to 120 dB depending on what kind of dithering is used). In music production, 24-bit is useful/beneficial for practical reasons, and higher sampling rates may also have a place in production, for example when recording ultrasonic sounds and using them at lower sample rates as sound effects.
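The figures above follow from simple arithmetic; a quick back-of-envelope check (my own, using the standard 20·log10(2^bits) dynamic-range formula):

```python
import math

# Nyquist bandwidth of the consumer sampling rate
fs = 44100              # Hz
nyquist = fs / 2
print(nyquist)          # 22050.0 Hz, comfortably above the ~20 kHz hearing limit

# Theoretical dynamic range of linear quantisation: 20*log10(2^bits)
bits = 16
dynamic_range_db = 20 * math.log10(2 ** bits)
print(round(dynamic_range_db, 1))             # 96.3 dB for 16-bit

bits = 24
print(round(20 * math.log10(2 ** bits), 1))   # 144.5 dB for 24-bit production
```

The extra headroom of 24-bit is what makes it practical in production: you can record with conservative levels and still stay far above the quantisation noise floor.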