Is it transient response, flat FR, low distortion, or do we even know what causes one component to produce a more believable image than another?
Flat FR and low distortion contribute, but they aren't the primary factors in imaging and soundstage (not until playback anyway), since those come mainly from the quality of the recording. So if some band hired a hack of an engineer, or just doesn't care, no amount of money will get you a system that can correct that.
Also, I second dvw's point that you need to hear these firsthand. Are there any car audio clubs in your area? (They might have a forum.) In any case, try to get hold of an EMMA or IASCA Competition Disc, or any other car audio-specific disc (Focal made one years ago). They test all parameters of an audio system in a car as per competition rules,* including imaging and soundstage.
In very simplistic terms, soundstage refers more to the space the sound emanates from, while imaging refers to how accurate the placement of each instrument and vocal is. On such discs you'll have tracks like a guy discussing both terms while walking around a mic in a fixed position, which demonstrates how it is done when recording the music.
Better if you can hear all this using properly recorded discs in a properly set-up car or home audio system. IMO, some of the confusion here on Head-Fi probably has a lot more to do with younger people going straight to expensive headphone systems before they have even listened to how a proper speaker set-up (in a car and at home) does it. One of my personal pet peeves is the absolutist application of "Garbage In, Garbage Out" to signal processing, where people here would run a cable from the line out of an audiophile DAP (usually more expensive than a proper receiver/processor for a car) instead of using the right equipment that can deal with the acoustic problems in a car, like the unequal path length from your ears to each tweeter and midwoofer on either side plus the sub behind, and the sound waves of each tweeter bouncing off the windshield (and they think both can be absolutely fixed just by EQ and L-R balance bias). Not to mention the practical ergonomic issues of manipulating a device that is more likely sitting on the front seat than on the dash.
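To put rough numbers on the path length problem, here is a minimal sketch of the arithmetic behind per-speaker delay (what car processors usually call time alignment). The distances, the speaker names, and the function name `alignment_delays_ms` are all made up for illustration, and the speed of sound is the usual room-temperature approximation; an actual processor may set this up differently.

```python
# Approximate speed of sound in air at about 20 °C (assumption).
SPEED_OF_SOUND_M_S = 343.0


def alignment_delays_ms(distances_m):
    """Return the delay (in ms) to add to each speaker so that sound from
    every speaker arrives at the listener together with the farthest one."""
    farthest = max(distances_m.values())
    return {
        name: (farthest - distance) / SPEED_OF_SOUND_M_S * 1000.0
        for name, distance in distances_m.items()
    }


# Hypothetical distances from the driver's head to each speaker, in metres.
example_distances = {
    "left_tweeter": 0.90,
    "right_tweeter": 1.35,
    "sub": 2.10,
}

delays = alignment_delays_ms(example_distances)
for name, delay in delays.items():
    print(f"{name}: delay by {delay:.2f} ms")
```

With these made-up distances the near-side tweeter needs about 3.5 ms of delay and the far-side one about 2.2 ms, while the sub gets none. Neither EQ nor a balance control can supply that kind of correction, since they change level, not arrival time.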
* The point ultimately is that natural tonality, imaging and soundstage need to be replicated in a car; the fun in it is finding ways to get around the limitations of car audio. Classes are divided by the MSRP of the equipment (and other stuff like sound-deadening materials), and also by how much you do to the car. You can't just rip out an entire car interior and install a lounge in there, because by definition, car audio means you can listen to that system while you're stuck in traffic or driving to the event, for example. My tweeter pods would cost me points, but in daily use they don't get in my way; angling them further inward gets fewer reflections, but then the vocals are screwed up when driving and only center again if I push my head against the headrest while stopped.