I've listened to Dolby Atmos tracks in Apple Music with my speaker setup -- most of the mixing can be wonky, as they place instruments behind you. The Dolby app on the computer has different Dolby Atmos and Dolby Atmos for Headphones samples. When listening to a movie in Atmos through my headphones (with sound set as Dolby), it's still flat compared to tracks I've heard that are specifically Atmos for Headphones. As I understand it, there are some subtle differences in positional metadata for physical speakers vs. translation to binaural decoding.
Mixing in Dolby Atmos for Headphones
They're different processes.
Atmos for speakers decides what each speaker should receive. The job is to express one virtual speaker with one or more real speakers in the same area; it's a sort of advanced real-time panning job based on a specific speaker setup (which should be placed in the room following Dolby's not-always-super-clear standard).
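To give a rough idea of that panning job, here's a toy constant-power pan of a virtual source between two adjacent speakers. This is my own simplified sketch, not Dolby's actual renderer, which handles full 3D layouts and more speakers at once:

```python
import math

def pan_between_speakers(source_angle, spk_a_angle, spk_b_angle):
    """Toy constant-power pan of a virtual source between two adjacent
    speakers, all positions given as angles in degrees.
    Returns the gain for each speaker."""
    # Where the source sits between the two speakers, as a 0..1 fraction
    frac = (source_angle - spk_a_angle) / (spk_b_angle - spk_a_angle)
    frac = min(max(frac, 0.0), 1.0)
    # Constant-power law: cos/sin gains so that gain_a^2 + gain_b^2 == 1,
    # keeping perceived loudness roughly steady as the source moves
    gain_a = math.cos(frac * math.pi / 2)
    gain_b = math.sin(frac * math.pi / 2)
    return gain_a, gain_b

# A source halfway between speakers at 0 and 30 degrees gets
# equal gains of about 0.707 on each speaker
ga, gb = pan_between_speakers(15, 0, 30)
```

The real renderer does this continuously per object, against whatever layout you told it you have.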
Atmos for headphones doesn't do that. Each virtual sound source (virtual speaker) gets its own pair of impulse responses to convolve the signal with. Those impulses are picked from the closest matching direction in some HRTF reference (ideally your own). Then it's just a matter of summing all the left-channel signals together, and the same for the right channels. At least that's the ideal. I guess it's possible that the only available impulses are those for the directions of a standard 5.1/7.1/9.1 etc. setup. If they approach the simulation like that (which would be a pity), then the speaker-Atmos rendering would be applied in full first, and the HRTF work would just be added on top of a simple 7.1, for example, no matter how many virtual sound sources the master had. I'm not sure which implementation is in use, but the all-HRTF-convolution thing doesn't exist for speaker playback.
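The "convolve each source with its own HRIR pair, then sum per ear" idea can be sketched like this. The bank of impulse responses and the direction labels are placeholders I made up; a real renderer would interpolate or pick the closest measured direction from an actual HRTF set:

```python
import numpy as np

def binaural_mix(sources, hrir_bank):
    """Render virtual sound sources to a stereo (binaural) pair.

    sources: list of (mono_signal, direction_label) tuples
    hrir_bank: dict mapping direction_label -> (left_hrir, right_hrir)

    Sketch only: directions are just labels here, and a real renderer
    would pick/interpolate the closest-matching HRIR for each source.
    """
    # Output length: longest signal-plus-impulse convolution result
    n = max(len(sig) + max(len(hrir_bank[d][0]), len(hrir_bank[d][1])) - 1
            for sig, d in sources)
    left = np.zeros(n)
    right = np.zeros(n)
    for sig, direction in sources:
        hl, hr = hrir_bank[direction]
        # Each source is convolved with the impulse pair for its direction...
        l = np.convolve(sig, hl)
        r = np.convolve(sig, hr)
        # ...then everything is simply summed per ear
        left[:len(l)] += l
        right[:len(r)] += r
    return left, right

# Dummy example: a single click from "the left" -- the left ear gets it
# directly, the right ear gets it delayed one sample and attenuated
hrir_bank = {"left30": (np.array([1.0]), np.array([0.0, 0.5]))}
click = np.array([1.0, 0.0])
left, right = binaural_mix([(click, "left30")], hrir_bank)
```

The per-source cost is why this scales with object count, unlike the speaker case where the mix collapses into one feed per physical speaker.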
IMO, what makes most demos more impressive is how they tend to put sound sources all over and move them a lot (something that gets annoying real fast in a movie or an audio album). That way it's easier to discriminate directions. And specifically for center-channel perception, if a virtual speaker just "passes" by the center in front of us, it won't matter to our brain that that particular location doesn't work well on its own because it lacks interaural variations as cues. The brain will make the trajectory work just fine in most cases.
Having a convincing fixed center image at some distance on headphones is another story entirely, and some people have been found to never get that experience on headphones, even with custom cues that agree with their own HRTF. It's a small percentage of the population, but they clearly exist.
In my case, it's head tracking that anchors the center image at a distance. Otherwise, be it normal headphone playback or any sort of fancy processing, I end up with the center channel inside my head or sometimes (based on FR) on top of my forehead. If I lie down in the dark, after a while I can construct something with some distance (with just about any type of audio). But if my eyes are open or if I move my head ever so slightly, I really need head tracking.