This article (https://www.eurekalert.org/news-releases/479769) explains how the outer ear plays the main role in determining the elevation of a sound source. The principle is that when a sound comes from above, it bounces/resonates (at least some frequencies do) off the lower part of the ear before getting into the ear canal. The same goes for sound from below, which bounces off the upper part of the outer ear. Because those parts have different shapes, they "favor" specific frequencies.
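To get a feel for why a bounce off part of the ear ends up "favoring" or killing specific frequencies, here's a toy model in Python: the direct sound plus one delayed, slightly quieter reflection. This is my own simplification, not the model from the article; a real pinna produces several overlapping peaks and notches. The 343 m/s speed of sound, the 0.8 reflection gain and the 2 to 3 cm extra path lengths are assumed, illustrative values, not measurements, and the function names are made up for this sketch.

```python
import math

# Assumed speed of sound in air at room temperature (m/s).
SPEED_OF_SOUND = 343.0


def first_notch_hz(extra_path_m: float) -> float:
    """Lowest frequency at which one reflection arrives half a cycle late.

    Toy assumption: the reflection is not phase-inverted, so the first
    cancellation happens when the extra path equals half a wavelength.
    """
    return SPEED_OF_SOUND / (2.0 * extra_path_m)


def two_path_gain_db(f_hz: float, extra_path_m: float, reflection_gain: float = 0.8) -> float:
    """Level (dB) of the direct sound plus one delayed, attenuated reflection."""
    delay_s = extra_path_m / SPEED_OF_SOUND
    phase = 2.0 * math.pi * f_hz * delay_s
    # Sum of two phasors: 1 (direct) + g * e^{-j*phase} (reflection).
    re = 1.0 + reflection_gain * math.cos(phase)
    im = -reflection_gain * math.sin(phase)
    return 20.0 * math.log10(math.hypot(re, im))


if __name__ == "__main__":
    # Illustrative path differences only, not measured pinna dimensions.
    for extra_cm in (2.0, 2.5, 3.0):
        d = extra_cm / 100.0
        notch = first_notch_hz(d)
        print(f"extra path {extra_cm:.1f} cm -> first notch near {notch:,.0f} Hz "
              f"({two_path_gain_db(notch, d):+.1f} dB)")
```

In this simplified picture, changing where the sound comes from changes the extra path length, so the notch slides up or down in frequency. That moving notch is the kind of spectral cue the brain can read as elevation.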
All that to say: by default, in an ideal scenario, the elevation of a sound source is mainly about frequency response. So grab an EQ, fool around, and all is going to be better... Except we're dealing with the human brain, and it's never simple, because there is more to sound than frequency response, and even non-audio variables affect how we perceive it.
The paper referenced in the link suggests that people adapt to whatever they're getting: not necessarily well, entirely, or even correctly, but we adapt somehow. That's great news (things could get better with nothing more than time) and horrible news, as finding what you need then depends on how little time you've spent doing it "wrong" (IEMs included).
If you'd like to go mad trying to understand it in detail, here's a paper I find particularly hard to digest but also quite interesting:
https://www.nature.com/articles/s41598-018-37537-z
The basic idea was that, since the FR changes with sounds coming from different elevations, it would be cool to find out what does what to our perception. So they start with notions about HRTF and elevation that are fairly typical, stuff we can see in several research papers:
The figure they start from is a frequency response plot where each horizontal layer represents how the sound is altered by the ear, head, torso, whatever (HRTF) when sent from a specific elevation. Red means louder and blue, quieter. It's tempting to correlate several things here, like maybe a quieter 6-7 kHz region in the headphone might help the sound feel like it comes from a lower elevation. And that might work. But the paper suggests that, as I said above, things are more complicated. It still doesn't hurt to try an EQ and check whether, by chance and through a little similarity with other humans, EQing certain areas typical of lower elevations helps you, specifically, to lower the perceived position of sound.
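If you'd rather try the "quieter 6-7 kHz" idea in software than with a graphic EQ, here's a minimal Python sketch of a single peaking (bell) filter, using the standard biquad formulas from the RBJ Audio EQ Cookbook. The 6.5 kHz center, the -4 dB cut and Q = 2 are arbitrary starting values I picked for illustration, not numbers from either paper, and the function names are my own.

```python
import cmath
import math


def peaking_eq(f0_hz, gain_db, q, fs_hz):
    """Biquad coefficients for a peaking EQ (RBJ Audio EQ Cookbook formulas).

    Returns (b, a), normalised so that a[0] == 1.
    """
    a_lin = 10 ** (gain_db / 40)          # square root of the linear peak gain
    w0 = 2 * math.pi * f0_hz / fs_hz      # center frequency in rad/sample
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * a_lin, -2 * cos_w0, 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * cos_w0, 1 - alpha / a_lin]
    a0 = a[0]
    return [x / a0 for x in b], [x / a0 for x in a]


def response_db(b, a, f_hz, fs_hz):
    """Magnitude response of the biquad at f_hz, in dB."""
    z_inv = cmath.exp(-1j * 2 * math.pi * f_hz / fs_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(num / den))


def apply_biquad(b, a, samples):
    """Filter a sequence of samples with the biquad (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


if __name__ == "__main__":
    fs = 48_000
    # Hypothetical starting point: a gentle 4 dB dip centered at 6.5 kHz.
    b, a = peaking_eq(6_500, -4.0, 2.0, fs)
    for f in (1_000, 4_000, 6_500, 9_000, 15_000):
        print(f"{f:>6} Hz: {response_db(b, a, f, fs):+.2f} dB")
```

Most parametric EQ apps expose the same three knobs (frequency, gain, Q), so in practice you'd sweep those by ear rather than trust any particular numbers, mine included.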
The trouble, besides everything they suggest in the paper, is that I don't know your own HRTF, and you also have to factor in the FR of your headphone while it's on your head (which is also probably a little or very different from the same model measured on a dummy head).
Then there's head movement, vision, age... In my case, head movement is a major variable. My interpretation/educated guess is that as I move, my brain expects the sound to stay anchored to the room, instead of turning with my head as it does when I have headphones on. My brain probably goes, "wait a minute! If the sound sources turn with my head and I don't see them with my eyes, then the sound must come from something on or inside my head." After that, it's only a matter of logic. I don't see the singer or speakers, so if the sound has any perceived distance at all, I probably assume it must come from a source above me, since below me there's already my body and I see nothing in front of me.
That could be another reason for your experience. But I strongly suspect that FR is also a factor for you. Maybe the main one.
Long story short (already), I do get some small improvements with EQ, but sadly, the frequency response I like and the one that helps me get a singer more in front of me aren't the same. Also, as soon as I move my head a little or open my eyes, it all collapses again, with the singer in my head or, at best, on my forehead.
It could be interesting for you to try working a few things out with your eyes closed and no head movement (from 30 s to a few minutes, so you start relying only on sound, which usually improves the spatial experience). Or, if you have actual speakers, try having them in front of you while listening on headphones and see whether that helps you "place" some sounds onto them (it helps me a lot, and yet another paper suggests it influences a majority of people).
You're out of luck: this is an extremely complicated subject with many possible answers. So I'd understand if you decided to ignore all this and go buy plenty of headphones suggested for their good "soundstage" (whatever the poster means by that). I don't think that's the answer, but we can never discount luck. Another headphone might have an FR that's closer to what your brain expects. And it's true that, all things being equal, higher-fidelity transducers can have a positive impact on sound localization. What they won't do is magically remove a different issue you might have, which is why I suggest looking for that first.
Because of that annoying/amazing brain plasticity, you might be able to train yourself in some ways, too. In my case, watching video clips and movies on the computer with my headphones has also done a little to convince me that many sounds come from the monitor. It's far from perfect or even correct, and I still break everything whenever I move my head. It also fails the moment I use the headphone with a DAP. Damn you, brain! Why u so smart for the wrong stuff?
For elevation in the center (phantom center), and again this is only my anecdote and I don't claim other humans share the same experience, playing with an EQ for days, maybe even weeks, was the biggest help. Then came closing my eyes and not moving my head, but that's not very practical. And then having things in my field of view that my brain associates with a sound source (speakers, TV, radio). Over time, I tried just about every crossfeed and 3D-whatever DSP, with honestly very little impact on your specific problem of sound above you. Things with head tracking helped quite a bit for me, which is why I guess my brain is less picky about elevation once more of the other cues are believable (which, if true, is not good news for headphone listening in general, as most spatial cues are BS on headphones).
To leave you in total despair: it's known that for a small percentage of the population, frontal sounds will just never feel like they're in front of them at a distance, even with a perfectly replicated HRTF. Be it head movement, not seeing the sound source, or simply knowing that a headphone is the sound source, those people's brains apparently just reject distance in front of them as a possible interpretation.
Oh, and age and hearing loss are also known to affect perception of vertical cues more than horizontal ones.
edit for sprellling erhorz.