What I see, and what the claim says, is that the mic captures the whole frequency spectrum at the level it actually has in the room.
Meaning it doesn't under-capture certain parts, so you don't have to boost (+ve EQ) them later to make the recording neutral.
When it comes to playback, you can find speakers with this kind of flatness. This one is taken from Neumann monitors:
Theoretically, the mic above paired with the speaker here corresponds to a perfectly neutral setup.
If you want this on your own setup, you'll need to know the frequency response of your speakers, then boost (+ve EQ) the portions that sit low and cut (-ve EQ) the portions that sit high.
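As a rough sketch of that correction step: the per-band EQ gain is just the target level minus the measured level. The measurement values, the function name `correction_gains`, and the boost/cut limits below are all made up for illustration; they are not from any real speaker or EQ tool.

```python
def correction_gains(measured_db, target_db=0.0, max_boost_db=6.0, max_cut_db=12.0):
    """Return the EQ gain (dB) per band that moves each measured level toward the target.

    measured_db maps a band's center frequency (Hz) to its measured level (dB,
    relative to the reference). The boost/cut limits are an assumed safety clamp.
    """
    gains = {}
    for freq_hz, level_db in measured_db.items():
        gain = target_db - level_db                         # low band -> boost, high band -> cut
        gain = max(-max_cut_db, min(max_boost_db, gain))    # keep the gain within the limits
        gains[freq_hz] = gain
    return gains


# Hypothetical speaker measurement: weak bass, slightly hot mids, rolled-off treble.
measured = {60: -4.0, 250: 1.5, 1000: 0.0, 4000: 2.0, 12000: -8.0}
eq = correction_gains(measured)

for freq_hz, gain_db in eq.items():
    print(f"{freq_hz:>6} Hz: {gain_db:+.1f} dB")
```

Here the 12 kHz band would need +8 dB to be flat, but the sketch caps it at +6 dB, so that band stays a little low instead of being pushed hard.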
However, a neutral setup doesn't necessarily sound good.
What you hear is not the speakers alone, it's also the room acoustics, and certain settings work better in regular rooms than in acoustically treated studio environments.
Another factor is whether the speakers are near-field or not. The Neumanns above may sound neutral in a near-field setup, but may not work as well when used as bookshelf units.
Also, the EQ only describes the signal you're passing into the speakers. Whether that translates into the same change in sound depends on the speaker characteristics.
For example, you may boost the bass by 10dB in the EQ, but that doesn't mean the speaker's output will rise by 10dB. Some may overdo it, some may underperform, and some may distort.
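To put a number on why a big boost is a lot to ask of a speaker, here is the standard decibel arithmetic as a small sketch (the function names are my own): dB maps to an amplitude ratio via 10^(dB/20) and to a power ratio via 10^(dB/10).

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in dB to a signal amplitude (voltage) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Convert a gain in dB to a power ratio."""
    return 10 ** (db / 10)


boost_db = 10
amplitude_ratio = db_to_amplitude_ratio(boost_db)
power_ratio = db_to_power_ratio(boost_db)

print(f"+{boost_db} dB = {amplitude_ratio:.2f}x amplitude, {power_ratio:.0f}x power")
```

So a +10dB bass boost asks for roughly 3.16x the signal amplitude and 10x the power in that band, which not every driver or amp can deliver cleanly.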