This is my understanding as well, and here's an experiment to try. Rest your hands lightly over your ears, then listen to the sound in the room while turning your head. The impression of listening in a real 3D space is still there. Now take your hands away. Although you can now hear that environment with much higher fidelity, I don't think the 3D environment is any more “realistic”.
Once the A16 does its per-ear dynamic filtering (assuming it's set up correctly), the choice of headphone doesn't really alter the effect that Smyth are trying to achieve. You don't get “more 3D” with Focal Utopias; instead, you get more fidelity. But as we know, the fidelity improvement between a good set of headphones costing $100 and a set costing $4000 is not massive ... it may make all the difference to an individual listener, but a good pair of cheap phones (if we were to quantify it) might well be presenting over 90% of the same information. (Perhaps even more ... I'm guesstimating here!)
There is scope for some improvement using the A16 with better phones, but no one (at least on this forum) is going to hear the A16 and lose the effect because their headphones aren't up to it. If anything is a limiting factor, it will be bass extension, but, as we know, bass frequencies carry the least positional information.