The reason there are such large differences between listeners is indeed only partially down to biological hardware, which can lead to different sensitivities.
A more important factor is that when the brain processes sensory information, it does not simply 'mirror' internally what we hear or see; it needs to rebuild the image from the ground up. At the lowest level, a single cell only responds to a dot (black/white contrast) in a tiny location of the visual field. One level higher, several dots form a line. Then a square, then a house, etc. Since this is such a demanding process, the brain only keeps the relevant information at each level, and 'colors in' the missing information with a calculated projection to give us the feeling we are not missing anything, even though we are only consciously processing a fraction of the possible information. For example, when you are walking down a busy street with your girlfriend, you might notice gift shops for an upcoming birthday present for her, or a restaurant if you happen to be hungry, or perhaps other ladies (not me of course, I don't even know they exist). She in turn might be looking at the architecture, or a cute little doggy, etc.
The same is true for audio. While processing audio, the brain automatically selects at a lower level what you find relevant, based on previous experience and preference. This happens subconsciously, in other words, before we are aware of it; we only hear the end result of that process. So for some people that means attention is diverted to bass, or vocals, or treble, etc. It is simply impossible for our brain to process everything at once, even though we are given that illusion. This is why it is often very difficult to understand why other people hear so differently: "I know what I hear, nobody can tell me otherwise." The brain is primarily focused on a narrow selection of the possible information, and this selection has been made subconsciously. Someone else's brain in turn makes its own pre-selection. As a result, everyone creates their own mental image, regardless of the source, music, etc.
The brain tends to be rigid; it wants to hold on to what it knows, as this provides certainty. Of course, we can counter the implicit 'bottom-up' processing with conscious 'top-down' processing, i.e. manually diverting our attention to other aspects, like you do with analysis. But changing that implicit process requires new experiences or active training. Either way we are 1) restricted to processing a limited amount of audio at one time, and 2) never fully aware of what we are and aren't missing.
How was this relevant again? Oh yeah, smoking grass. That definitely helps to divert and expand attention, even at the subconscious stage (again, not me of course, I have no idea what it does).