Headphones don't produce a soundstage, whether large, small or miniature. As I've said a dozen or more times, they don't present primary distance cues, which are essential for soundstage. It isn't a soundstage if it goes through the center of your head.
If signal processing becomes more common, inexpensive and sophisticated, maybe. But we aren't there yet.
As an electrical engineer I think about this differently: as a chain of transfer functions. If something (such as a 7 dB notch filter around 400 Hz, or spatial cues) is introduced earlier in the chain, removing blocks later doesn't remove those things. Recordings don't contain "accurate" spatial information, because of the way they are produced; instead they contain a "montage" of different kinds of spatial cues, mixed to work well with speakers in a room. This can work because spatial hearing can be fooled in certain ways. That's a huge requirement for stereo sound to make sense at all: producing a soundstage with only two sound sources. This has its limitations, and that's why we have multichannel systems to mitigate them, but a lot of people are completely used to living with the limitations of stereo sound, often without even realizing such limitations exist.
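The chain-of-transfer-functions point can be sketched in a few lines of Python. For LTI blocks, cascading filters is just convolution, and convolution is associative, so it doesn't matter where in the chain a block sits; the filter coefficients below are made-up placeholders, not real notch or room responses:

```python
import numpy as np

# Two FIR "blocks" in the chain: stand-ins for, say, a notch filter
# and a room response (coefficients are arbitrary, for illustration).
notch = np.array([1.0, -0.5, 0.25])
room = np.array([1.0, 0.3, 0.1, 0.05])

x = np.random.default_rng(0).normal(size=64)  # arbitrary input signal

# Applying the blocks one after the other...
y_chain = np.convolve(np.convolve(x, notch), room)

# ...equals applying one combined transfer function up front:
combined = np.convolve(notch, room)
y_combined = np.convolve(x, combined)

print(np.allclose(y_chain, y_combined))  # True
```

This is why a cue baked into the recording itself behaves the same as one added later in playback: the chain only "sees" the combined transfer function.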
Speakers in a room create spatial cues of distance. Our ears don't measure these distances with a measuring stick; instead they decode the spatial cues. The creation of these cues is an acoustic phenomenon in 3D space, but they get encoded into the 1D information of air-pressure changes at our eardrums. Theoretically this can be simulated with sophisticated enough signal processing. The key point here is that physical distance is not an absolute requirement, IF we can simulate the resulting spatial cues accurately enough by other means.
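"Simulating the cues" concretely means convolving the source with ear-specific impulse responses. Here is a minimal sketch of the idea; the two short arrays are made-up stand-ins, where a real system would use measured HRIRs:

```python
import numpy as np

# Hypothetical impulse responses standing in for measured HRIRs:
# the far ear gets a later, quieter copy than the near ear.
hrir_near = np.array([0.0, 0.9, 0.3, 0.1])
hrir_far = np.array([0.0, 0.0, 0.5, 0.2])

mono = np.random.default_rng(1).normal(size=128)  # dry mono source

# "Placing" the source in space = filtering it with each ear's response.
near_ear = np.convolve(mono, hrir_near)
far_ear = np.convolve(mono, hrir_far)
```

The level, delay and spectral differences between the two outputs are exactly the kind of cues our spatial hearing decodes, which is why no physical distance is strictly required.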
Headphones on our head also create spatial cues of distance, but in this case the distance is extremely small: an inch, perhaps. The overall ILD, for example, is so huge that it's a strong spatial cue for sound sources right at our ears. Now, if we "break" the headphones into two halves and start moving the parts further from our ears, the sound of course gets quieter, but the spatial cues change too, and the sound doesn't sound so near anymore. We can imagine moving the drivers to where speakers would be, 10 feet away or so. If the sound weren't almost inaudibly quiet, the spatial cues would now be similar to those of speakers. Now, if we think about transfer functions, this "moving the sound sources from your head to where speakers would be" can theoretically be done earlier in the chain of transfer functions. It could be in the recording itself! In fact, binaural recordings are more or less like this. Since binaural recordings make little sense with speakers (the spatial cues get "doubled"), most recordings are not like this. Most recordings assume the spatial cues of distance get added in playback.
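The claim that an inch-away source carries a huge ILD while a speaker-distance source carries almost none can be checked with simple distance arithmetic. This sketch only counts the 1/r spreading loss (head shadowing, which dominates real ILD at high frequencies, is ignored), and the distances are rough assumptions:

```python
import math

def distance_ild_db(near_m, far_m):
    # Level difference from the 1/r spreading law alone.
    return 20 * math.log10(far_m / near_m)

head_path = 0.18  # metres, rough ear-to-ear distance (assumption)

# Driver about an inch (~2.5 cm) from the near ear:
print(round(distance_ild_db(0.025, 0.025 + head_path), 1))  # ~18.3 dB

# The same source 3 m (~10 feet) away, like a speaker:
print(round(distance_ild_db(3.0, 3.0 + head_path), 1))  # ~0.5 dB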
Recordings are mostly produced to work well with speakers. So the "montage" of different kinds of spatial cues in the recording gets convolved with the spatial cues of speakers in a room at a distance of 10 feet or so. With headphones, the same "montage" gets convolved with the spatial cues of sound sources one inch away. Clearly this is a problem. In my opinion it highlights the problems of the "montage"-like spatiality of the recording, while speakers in a room hide/soften them. What if we mitigate this problem a little? If we lower the ILD, for example, we weaken the spatial cues of very near sound and shift them closer to the spatial cues (ILD-wise) of more distant sounds, and we assume our spatial hearing gets fooled, more or less, by this "butchering" of the cues. Since in the transfer-function chain I can do this ILD reduction before the sound reaches my headphones, I can use a crossfeeder to do it. It turned out this works for me! The result is a miniature soundstage: not a headstage or a speaker soundstage, but something in between.
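A crossfeeder of the kind described above can be sketched very simply: feed a low-passed, attenuated copy of each channel into the other, which reduces ILD mostly at low frequencies the way real crossfeed filters do. The gain and filter values below are arbitrary placeholders, not tuned constants from any actual crossfeeder:

```python
import numpy as np

def one_pole_lowpass(x, alpha):
    # y[n] = (1 - alpha) * x[n] + alpha * y[n-1]
    y = np.zeros_like(x)
    acc = 0.0
    for n, sample in enumerate(x):
        acc = (1 - alpha) * sample + alpha * acc
        y[n] = acc
    return y

def simple_crossfeed(left, right, gain=0.3, alpha=0.9):
    # Mix a quiet, darkened copy of the opposite channel into each side,
    # weakening the huge ILD of hard-panned material.
    left_out = left + gain * one_pole_lowpass(right, alpha)
    right_out = right + gain * one_pole_lowpass(left, alpha)
    return left_out, right_out

# A hard-panned signal: everything in the left channel only.
left = np.ones(100)
right = np.zeros(100)
left_out, right_out = simple_crossfeed(left, right)
# After crossfeed the right channel is no longer silent, so the
# effective ILD is smaller than the original (infinite) one.
```

In the transfer-function picture this is just one more block inserted before the headphones, which is why it composes cleanly with whatever spatial montage the recording already carries.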