It's just that headphones will remain inherently limited as long as they don't precisely tailor the frequency response (FR) they output so that it hits, at a particular listener's eardrum, a specific target derived from that one user's own head-related transfer function (HRTF). This goes beyond the usual individual variations that have been well documented in Harman's research (and probably others').
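To put that in concrete terms, here is a rough sketch of what "tailoring the FR to one listener" boils down to: per frequency band, the correction is the gap between that listener's target and what actually arrives at their eardrum. The function name and all the dB figures are made up for illustration, not measurements of any real headphone or person.

```python
def correction_db(target_db, measured_db):
    """Per-band EQ gain (dB) that would move the measured eardrum
    response onto the listener's own target.

    Both arguments map frequency in Hz to level in dB.
    """
    if target_db.keys() != measured_db.keys():
        raise ValueError("target and measurement must cover the same bands")
    return {f: target_db[f] - measured_db[f] for f in sorted(target_db)}


# Hypothetical numbers: same headphone, two listeners, two different fixes.
target_a = {1000: 0.0, 3000: 10.0, 8000: 2.0}
measured_a = {1000: 0.0, 3000: 7.5, 8000: -9.0}
target_b = {1000: 0.0, 3000: 12.0, 8000: 4.0}
measured_b = {1000: 0.0, 3000: 13.0, 8000: 1.0}

print(correction_db(target_a, measured_a))
print(correction_db(target_b, measured_b))
```

The point is that both the target and the measurement depend on the listener, so a single factory tuning cannot zero out the correction for everyone.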
If you put persons A and B in the exact same room, listening to the exact same pair of speakers from the same position, and measure the frequency response at their eardrums, the results will differ because each person's anatomy shapes the FR curve they receive. Headphones put the drivers much closer, in a very different acoustic space, which messes with that and produces yet another set of FR curves, and the way those deviate from the speaker case is itself different for each listener, if you see what I mean. So they'll always sound off to some degree (besides problems related to the lack of crossfeed). This is also the reason why surround sound simulation with headphones is never truly convincing for most users with only generic HRTF profiles (by convincing I mean being able to pass tests where the listener is tasked with pointing their finger in 3D space at the origin of a sound and locating it accurately within x degrees).
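For what it's worth, the kind of pointing test I have in mind could be scored roughly like this: convert the true and the pointed directions to unit vectors, take the angle between them, and count how many trials land within the tolerance. This is only a sketch under my own assumptions; the function names, the azimuth/elevation convention and the 15 degree tolerance are placeholders, not a standard protocol.

```python
import math


def direction(azimuth_deg, elevation_deg):
    """Unit vector for a direction given as azimuth/elevation in degrees."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angular_error_deg(true_dir, pointed_dir):
    """Angle in degrees between the real source and where the listener pointed."""
    dot = sum(a * b for a, b in zip(direction(*true_dir), direction(*pointed_dir)))
    # Clamp to guard against rounding pushing the value just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def hit_rate(trials, tolerance_deg):
    """Fraction of trials where the listener pointed within the tolerance."""
    hits = sum(1 for true_dir, pointed in trials
               if angular_error_deg(true_dir, pointed) <= tolerance_deg)
    return hits / len(trials)


# Hypothetical trials: ((true az, el), (pointed az, el)), all in degrees.
trials = [((0, 0), (10, 0)), ((90, 0), (120, 0))]
print(hit_rate(trials, tolerance_deg=15))
```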
Currently the best we can do without directly measuring a user's HRTF is to gather anatomical data on that user and match it, using various algorithms (such as neural networks), against a database of anatomical data paired with measured HRTFs, to come up with an individualised HRTF profile for that user. But that requires gathering anatomical data on the user's head / torso / ears in the first place. Apple is uniquely placed here thanks to the technology they developed for Face ID, which can work very well for mapping an ear in 3D. The problem is that optical sensors need some distance from the object to work well in that case, which would result in very large and unwieldy earcups, and I bet that such sensors are quite expensive. I don't know if the sort of long-range capacitance sensors Apple mentioned in their patents (i.e. not just "touch": they work a few mm (cm?) away from the surface) are capable of that to a good enough degree, but apparently they are good enough to form a low-res image of a user's ear to tell the left ear from the right and detect the orientation of the headphones.
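As a toy illustration of the matching idea, here is the simplest version I can think of: a plain nearest-neighbour lookup rather than a neural network, picking the database subject whose anatomy is closest to the user's and reusing that subject's measured HRTF. The database entries, feature names and numbers are all invented for the example; a real system would use far more features and a real measured dataset.

```python
import math
from statistics import pstdev


def nearest_hrtf(user_anatomy, database):
    """Return the hrtf_id of the database subject with the closest anatomy.

    Each feature is divided by its spread across the database so that
    no single measurement dominates just because of its units.
    """
    names = sorted(user_anatomy)
    scale = {}
    for name in names:
        values = [entry["anatomy"][name] for entry in database]
        scale[name] = pstdev(values) or 1.0  # avoid dividing by zero

    def distance(entry):
        return math.sqrt(sum(
            ((user_anatomy[n] - entry["anatomy"][n]) / scale[n]) ** 2
            for n in names))

    return min(database, key=distance)["hrtf_id"]


# Hypothetical database: anatomy in cm, paired with a measured HRTF id.
database = [
    {"hrtf_id": "subject_A", "anatomy": {"head_width_cm": 14.0, "pinna_height_cm": 6.0}},
    {"hrtf_id": "subject_B", "anatomy": {"head_width_cm": 16.0, "pinna_height_cm": 7.0}},
    {"hrtf_id": "subject_C", "anatomy": {"head_width_cm": 15.0, "pinna_height_cm": 5.5}},
]
user = {"head_width_cm": 15.8, "pinna_height_cm": 6.8}
print(nearest_hrtf(user, database))
```

Either way the bottleneck is the same as described above: you still need the anatomical measurements before any of this can run.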
This is all why passive headphones from boutique companies, as good as they may sometimes be, kind of bore me to death. That being said, as you said, the coincidence is that most of the headphones I've really enjoyed lately are passive and in the $100-$300 range.