Would it be possible to create a measurement rig that compares a headphone's recorded output to the original recording?
- Using one of those head-shaped microphones (a dummy head), record in a studio and use that as reference A.
- Play the recording back on headphones placed on the same dummy head and record that as reference B.
- Then tune the headphones according to the differences between B and A until they match.
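The tuning step in that last bullet amounts to a frequency-response comparison between the two recordings. Here's a minimal sketch, assuming A and B are time-aligned mono signals at the same sample rate; the function name and parameters are hypothetical, not any existing tool's API:

```python
# Sketch of the proposed rig's tuning step: estimate the per-frequency
# gain (in dB) you'd need to apply to playback B so its spectrum
# matches reference A. Assumes a and b are time-aligned, same length,
# same sample rate. All names here are illustrative.
import numpy as np
from scipy.signal import welch

def eq_correction_db(ref_a, rec_b, fs=48000, nperseg=4096):
    """Return (frequencies, correction in dB) to match B's spectrum to A's."""
    f, psd_a = welch(ref_a, fs=fs, nperseg=nperseg)
    _, psd_b = welch(rec_b, fs=fs, nperseg=nperseg)
    eps = 1e-12  # guard against log of zero in silent bands
    return f, 10 * np.log10((psd_a + eps) / (psd_b + eps))
```

In practice you'd smooth the result (e.g. per octave fraction) before building EQ filters from it, since raw Welch estimates of real recordings are noisy.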
I'm not understanding the purpose here. Are you talking about compensating the FR, or are you looking to have the headphones sound like people or speakers in the room?
For FR it's easy but fairly meaningless, as all you'd be tuning for is the dummy head, not any given listener. Also, the frequency response you measure for a sound source in a room depends on the source's direction, so you'd have to settle for something that won't be right for instruments or speakers placed elsewhere.
There are target responses called free field and diffuse field, representing the two most extreme reverberation conditions (one with no reverb at all, one with so much that sound effectively arrives from all directions). They're well known and came out of work much like what you're describing. IMO free field sounds like crap, and diffuse field sounds too bright, but the latter is relatively close (while still audibly failing) to some approximate impression of flat.
If you're looking to reproduce room sound in headphones, then you need a whole lot of measurements. The full thing would require heavy processing: measurements at various levels for each independent sound source (and multiply all that by X look angles if you plan to have head tracking on the headphones). Even then you'd obviously end up with something a little different, as you probably couldn't fully compensate the headphone's distortions. And of course, if you plan on something a given listener can use, it would all need to be recorded at their own ears (which would also require some extra EQ to deal with the ear canal).
A simplified version involves convolution. It exists; I'm using it right now on my Realiser A16, and there is free software (Impulcifer): https://www.head-fi.org/threads/recording-impulse-responses-for-speaker-virtualization.890719/. With convolution you capture an impulse response, so you only have linear information about the sound; nonlinear distortions are left out. But it's already very convincing IMO. The process is as you wrote (plus extra takes if you use head tracking). Obviously your headphone still has to be fairly flat and extended in frequency, and it's also going to be better if it has fairly low distortion. Otherwise you might end up feeding it a signal it simply can't reproduce.
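The rendering side of that convolution approach is conceptually simple. A minimal sketch, assuming you already captured left/right-ear impulse responses (BRIRs) for one speaker position — the names below are illustrative, not the Impulcifer or Realiser API:

```python
# Speaker virtualization by convolution: render a mono source through
# the binaural impulse responses measured at each ear for one speaker
# position, producing a stereo headphone signal.
import numpy as np
from scipy.signal import fftconvolve

def virtualize(mono, brir_left, brir_right):
    """Convolve a mono signal with per-ear BRIRs; returns (2, N) stereo."""
    left = fftconvolve(mono, brir_left)
    right = fftconvolve(mono, brir_right)
    peak = max(np.abs(left).max(), np.abs(right).max()) or 1.0
    return np.stack([left, right]) / peak  # normalize to avoid clipping
```

A real multi-speaker setup just repeats this per virtual speaker (each with its own BRIR pair) and sums the results, swapping BRIR sets when head tracking reports a new look angle.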
Same as before, if you do it with a dummy head, it will work for that dummy head, not for you or me.
Another key aspect: you record for a given sound source at some position relative to you, and if you're planning to have a different sound source position later on, the previous measurements will not help you. If you record 5 sound sources in the room, that's what you get: those 5 positions in the room. For more, you have to consider 3D sound capture (some might call that a sound field) with special mics at particular positions and then some fancy processing. Atmos, the Dolby tech, uses that kind of processing, but the underlying tech is a few decades old and well documented. At some point, depending on what you've measured, you can even use software to move a captured sound source anywhere you want in the virtual room.
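To give a feel for why synthesizing unmeasured positions is possible at all, here is a deliberately crude toy: crossfading between the impulse responses of two measured positions. Real sound-field processing (Ambisonics-style decoding) is far more sophisticated than this; the function and its name are purely illustrative:

```python
# Toy illustration only: approximate a source between two measured
# positions by linearly interpolating their impulse responses.
# Real sound-field rendering decodes a spherical-harmonic capture
# instead; this just shows the idea of in-between positions.
import numpy as np

def blend_ir(ir_pos1, ir_pos2, t):
    """t=0 -> position 1, t=1 -> position 2, in between interpolates."""
    return (1.0 - t) * ir_pos1 + t * ir_pos2
```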
Or was your question strictly about headphones and what happens to the signal in terms of distortion? I don't know if I went way too far or only scratched the surface of your question, so I'll stop here for now.