If you play a sound (from a flat-response source) next to the ear and record it with a (flat-response) microphone at the eardrum, the frequency distribution of the recording will be different from that of the original sound. The difference is the head-related transfer function (HRTF): how the head and the shape of the outer ear modify a sound on its way to the eardrum.
The raw response (measured at the eardrum) shows the original frequency distribution plus the effect of the HRTF. The compensated response attempts to portray how you actually perceive that sound, i.e. how your brain interprets it, by subtracting the HRTF (in dB, frequency by frequency) from the raw response. Since the shape of your ear modifies every sound that comes in, your idea of a neutral sound includes that effect. That means a sound with its energy evenly distributed (flat) will only be perceived as flat if it arrives at your eardrum as non-flat (flat + your HRTF). A sound that is flat at the eardrum, on the other hand, would not sound flat to you unless you had no outer ear at all and just an exposed eardrum.
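To make the arithmetic concrete, here is a rough sketch in Python. The frequencies and HRTF values are numbers I made up purely for illustration (not measured data), the function names are my own, and it assumes everything is expressed in dB so that "adding the effect of the HRTF" and "subtracting the HRTF" are plain addition and subtraction.

```python
# Illustrative values only: these are assumed numbers, not a measured HRTF.
FREQS_HZ = [100, 1000, 3000, 10000]
HRTF_DB = [0.0, 2.0, 12.0, 4.0]  # assumed ear gain in dB at each frequency


def raw_response(source_db, hrtf_db):
    """Level at the eardrum: source level plus the ear's own gain (dB)."""
    return [s + h for s, h in zip(source_db, hrtf_db)]


def compensated_response(raw_db, hrtf_db):
    """Estimate of the perceived level: eardrum level minus the HRTF (dB)."""
    return [r - h for r, h in zip(raw_db, hrtf_db)]


# Case 1: a flat source played next to the ear.
flat_source = [0.0, 0.0, 0.0, 0.0]
raw = raw_response(flat_source, HRTF_DB)            # non-flat at the eardrum
perceived = compensated_response(raw, HRTF_DB)      # flat again after compensation

# Case 2: a signal that is flat *at the eardrum*.
flat_at_eardrum = [0.0, 0.0, 0.0, 0.0]
perceived_flat_at_eardrum = compensated_response(flat_at_eardrum, HRTF_DB)

for f, r, p, q in zip(FREQS_HZ, raw, perceived, perceived_flat_at_eardrum):
    print(f"{f:>6} Hz  raw={r:+.1f} dB  compensated={p:+.1f} dB  "
          f"flat-at-eardrum compensated={q:+.1f} dB")
```

In case 1 the raw curve takes the shape of the HRTF while the compensated curve comes out flat. In case 2 the compensated curve is the HRTF turned upside down, which is why a signal that is flat at the eardrum would not sound flat.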
You've got it right that the raw graph shows what the ear 'hears', but of course you don't actually hear with your ear (it only picks up the sound) - the brain does the hearing. The compensated graph is the one that tries to display what the brain perceives, i.e. what the listener hears.
Then again, if I've got it wrong, corrections are welcome.