Sorry, I missed your post.
It's a fact that proper volume matching (within 0.1dB) can be challenging for us random consumers.
With DAPs, there are specific problems. For example, most DAPs have something like half a dB or even 1dB increments for the volume setting, so matching within 0.1dB isn't convenient. While finding some value that matches on both DAPs is almost always possible, it often lands outside of your preferred listening level. I've been very annoyed by that myself, and I think at least some listening tests were so far off from how I usually listen to music that the test lost its purpose for me.
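
To give an idea of the step problem, here is a rough Python sketch. It assumes you have already measured the output level of each DAP at a few volume steps (the numbers below are made up for illustration, and `matching_steps` is just a hypothetical helper name), and it lists the step pairs that land within 0.1dB of each other.

```python
def matching_steps(levels_a, levels_b, tol_db=0.1):
    """Return (step_a, step_b) pairs whose measured levels differ by at most tol_db.

    levels_a / levels_b map a volume step number to a measured level in dB
    (relative to any common reference). All values here are hypothetical.
    """
    pairs = []
    for step_a, db_a in sorted(levels_a.items()):
        for step_b, db_b in sorted(levels_b.items()):
            if abs(db_a - db_b) <= tol_db:
                pairs.append((step_a, step_b))
    return pairs


# Made-up measurements: DAP A moves in 0.5dB steps, DAP B in 0.85dB steps.
dap_a = {50: -20.0, 51: -19.5, 52: -19.0, 53: -18.5}
dap_b = {30: -20.3, 31: -19.45, 32: -18.62, 33: -17.75}

print(matching_steps(dap_a, dap_b))
```

With these made-up numbers only one pair of steps qualifies, which is the whole annoyance: the one setting that matches may not be the level you actually want to listen at.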

On occasion, you'll get small deviations between measuring the voltage unloaded and measuring it with an IEM plugged in. I don't think the variation justifies worrying about it in general, but ideally you'd want the DAP loaded (with the IEM in the circuit).
The real challenge with DAPs is that you have different amp sections with different output impedances. Depending on the IEM, that will lead to an audible change in frequency response. And if the FR is audibly different, which frequency should be matched in amplitude? Who knows?
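
To put rough numbers on that, here is a small Python sketch of the voltage divider formed by the DAP's output impedance and the IEM. It is a simplification: it treats both impedances as plain resistances (magnitudes only, no phase), and the function names and example values are my own assumptions, not measurements of any real gear.

```python
import math


def level_at_load_db(z_load_ohm, z_out_ohm):
    """Level reaching the IEM relative to the unloaded output, in dB.

    Simple resistive voltage divider: V_load / V_source = Z_load / (Z_load + Z_out).
    """
    return 20 * math.log10(z_load_ohm / (z_load_ohm + z_out_ohm))


def fr_deviation_db(z_min_ohm, z_max_ohm, z_out_ohm):
    """dB spread between the frequencies where the IEM impedance is highest and lowest."""
    return level_at_load_db(z_max_ohm, z_out_ohm) - level_at_load_db(z_min_ohm, z_out_ohm)


# Hypothetical IEM whose impedance swings between 10 and 40 ohm over the audible range.
for z_out in (0.5, 5.0):
    print(f"Z_out = {z_out} ohm -> FR spread of about {fr_deviation_db(10, 40, z_out):.2f} dB")
```

Under those assumptions the spread is roughly 0.3dB with the 0.5 ohm output and roughly 2.5dB with the 5 ohm output, so the two DAPs can't be matched within 0.1dB at all frequencies at once. With a flat impedance IEM (same value for `z_min_ohm` and `z_max_ohm`) the spread is zero whatever the output impedance.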
Ideally I'd say you should use a relatively high impedance headphone or a very flat impedance IEM (same impedance at all frequencies within the audible range) to do your test. That way you can rely on almost anything: a multimeter and a test tone, some RTA on a computer if you have a cable with 2 male jacks, or even any app that detects sound levels on your phone (might not be accurate down to 0.1dB!). And of course the IEM/headphone and the "mic" cannot move at all throughout the measurements. Not even from the driver shaking from the test signal!
But obviously if you wish to know about sound differences with a specific IEM, then that's what you should use. It's just that you'll have to make some not so objective choices when it comes to volume matching if the signature changes between DAPs because of impedance. That's also tricky to check by yourself with a multimeter, because most multimeters aren't all that reliable at high frequencies (most of the cheap ones are made for household electrical stuff at 50 or 60Hz).
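
For the multimeter route, the conversion from two voltage readings to a dB difference is just 20*log10 of the ratio. A minimal Python sketch, assuming both readings are RMS voltages taken with the same test tone and the same load (the function names and the example readings are made up):

```python
import math


def level_difference_db(v_a, v_b):
    """Level of DAP A relative to DAP B in dB, from two RMS voltage readings."""
    return 20 * math.log10(v_a / v_b)


def is_matched(v_a, v_b, tol_db=0.1):
    """True if the two readings are within tol_db of each other."""
    return abs(level_difference_db(v_a, v_b)) <= tol_db


# Made-up readings in volts.
print(f"{level_difference_db(0.200, 0.202):+.3f} dB, matched: {is_matched(0.200, 0.202)}")
print(f"{level_difference_db(0.200, 0.210):+.3f} dB, matched: {is_matched(0.200, 0.210)}")
```

Keep in mind that 0.1dB is only about a 1.2% difference in voltage, so the meter needs to resolve and repeat at least that well at the frequency of the test tone.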
All in all it can be simple or it can be really hard to test DAPs by ear. If you use a switch, there is the horror of trying to time align the signal between DAPs. Try to use a relatively long track so that if you get lucky once, you have some time to do the listening.
Personally I started trying to match stuff by ear with a 2kHz tone, which could get me within less than 0.5dB but not within 0.1dB without serious luck. Bad method! Barely better than nothing.
Then I used a switch, LOD cables (short male-male interconnect cables) and some short cables I made myself with crocodile clips. That allowed me to have wiring in the "open" to put my voltmeter or ADC on while the IEM was in the circuit. I think that's the best solution, but then someone very worried about the "night and day" impact of cables would scream after looking at all the extra stuff I had in the path leading to the IEM. ^_^
There is a clear gap between us constantly asking people who make claims about sound differences whether the listening was volume matched (or blind), and how hard it can be to properly volume match gear. There is no denying that. We have no hope or real desire of filling that gap by making everybody an expert researcher (that would be such a buzzkill, even I would change hobbies). What we mostly wish is for people to simply make fewer of the claims they cannot back up with evidence (empty claims). The key idea being that "I know what I heard" settles nothing and proves nothing. "Dude trust me" also doesn't help anybody.
Us asking if the listening experience was properly controlled is to point out that the previous post probably didn't deserve to be presented as a known fact.
But if you're genuinely curious, I strongly encourage you to try all sorts of controlled testing. There is a lot to learn about experimenting, about sound, and really about ourselves. If only most tests weren't such a PITA...