Put simply: the audio being reproduced is the sum of the waves coming from every instrument. Your brain does all the work of picking out the different frequencies within this sum of sine waves, and that is what lets you tell the instruments apart. Some people can even recognize an instrument by looking at a visualization of the recorded waveform.
Look at the image below and notice how adding different frequencies together doesn't result in separate oscillations; instead, it all adds up to a single new pattern:
If you think about it, your eardrums are simply drums. Nothing more, nothing less. They can only vibrate the way a drum vibrates, and that is backwards and forwards. The same can be said of any single audio driver or microphone. Those also just move backwards and forwards, and how far they move at any instant follows the sum of all the frequencies they receive.
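To make the summing concrete, here is a minimal Python sketch. The sample rate, the two frequencies (220 Hz and 660 Hz), the amplitudes, and the helper names `sine` and `mix` are all made up for illustration; they stand in for "two instruments" and are not taken from any real recording. It shows that mixing two tones leaves you with just one number per moment in time, which is all an eardrum, driver, or microphone can ever express.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for illustration)


def sine(freq_hz, amplitude, n_samples, sample_rate=SAMPLE_RATE):
    """Return n_samples of a sine wave as a list of displacement values."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def mix(*waves):
    """Add several waves sample by sample into one combined wave."""
    return [sum(samples) for samples in zip(*waves)]


# Two hypothetical "instruments": a low tone and a quieter, higher tone.
low = sine(220.0, 1.0, 80)
high = sine(660.0, 0.5, 80)

# The mix is still a single list: one displacement value per sample.
combined = mix(low, high)

for n in range(0, 80, 10):
    print(f"sample {n:2d}: {low[n]:+.3f} + {high[n]:+.3f} = {combined[n]:+.3f}")
```

Each printed row is one instant: the two tones contribute their own displacement, but only their sum would reach the eardrum. Separating that sum back into its frequencies is the part your brain does.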