My guess is that the "echo" comes from the default oversampling filter used by the Ztella, as well as from the MQA filters used on MQA albums.
Different filters have different amounts of pre-ringing and post-ringing, which can produce an "echo" or "reverb" effect.
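To make the pre-ringing vs. post-ringing idea concrete, here is a rough sketch in plain Python. It is not the Ztella, MQA, or HQPlayer filter; the two filters, the tap count, and the cutoff are my own assumptions, picked only for illustration. It compares a symmetric (linear-phase) windowed-sinc filter with a simple causal second-order IIR low-pass, used here as a stand-in for a filter without pre-ringing, and counts how often each impulse response changes sign before and after its main peak.

```python
import math

def linear_phase_lowpass(num_taps=41, cutoff=0.2):
    """Hann-windowed sinc low-pass. Symmetric taps, so it is linear phase."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        x = 2 * cutoff * k
        sinc = 1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(2 * cutoff * sinc * window)
    return taps

def causal_biquad_lowpass(num_samples=41, cutoff=0.2, q=1 / math.sqrt(2)):
    """Impulse response of a second-order IIR low-pass (illustrative stand-in)."""
    w0 = 2 * math.pi * cutoff
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0 = (1 - math.cos(w0)) / 2 / a0
    b = [b0, 2 * b0, b0]
    a1 = -2 * math.cos(w0) / a0
    a2 = (1 - alpha) / a0
    out = []
    for n in range(num_samples):
        # Input is a unit impulse, so the feed-forward part is just b[n].
        y = b[n] if n < 3 else 0.0
        if n >= 1:
            y -= a1 * out[n - 1]
        if n >= 2:
            y -= a2 * out[n - 2]
        out.append(y)
    return out

def ringing(h):
    """Return (sign changes before the main peak, sign changes after it)."""
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    eps = 1e-12 * abs(h[peak])  # ignore samples that are zero up to rounding

    def sign_changes(samples):
        signs = [s > 0 for s in samples if abs(s) > eps]
        return sum(1 for p, q_ in zip(signs, signs[1:]) if p != q_)

    return sign_changes(h[:peak]), sign_changes(h[peak + 1:])

pre_fir, post_fir = ringing(linear_phase_lowpass())
pre_iir, post_iir = ringing(causal_biquad_lowpass())
print("linear-phase FIR: pre =", pre_fir, "post =", post_fir)
print("causal IIR:       pre =", pre_iir, "post =", post_iir)
```

The symmetric filter oscillates on both sides of its peak, while the causal one only oscillates after it. Real DAC filters are far longer and more refined, so take this only as a picture of the concept.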
Every DAC we know of (except NOS DACs, which stands for "non-oversampling DAC") internally oversamples anything below its max rate up to its max rate. If we listen to a 192k song on a Ztella, it internally oversamples to 384k before the signal hits the analog circuit. And the processing power of such a small device is very low compared to the processing power of a modern Intel PC, for example.
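Just to show what "oversampling" means in practice, here is a minimal 2x oversampling sketch in plain Python. This is not how the Ztella or any specific DAC chip does it; the function name `oversample_2x`, the 31-tap length, and the Hann-windowed sinc are all my assumptions for illustration. The idea is to insert a zero between every pair of samples and then low-pass the result so the gaps get filled in smoothly.

```python
import math

def oversample_2x(samples, num_taps=31):
    """Double the sample rate: zero-stuff, then low-pass. num_taps must be odd."""
    # Step 1: insert a zero after every input sample.
    stuffed = []
    for s in samples:
        stuffed.extend([s, 0.0])

    # Step 2: Hann-windowed sinc with its cutoff at 1/4 of the new rate
    # (the old Nyquist frequency). The taps are scaled by 2 to make up
    # for the inserted zeros, so the centre tap is 1.0.
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        sinc = 1.0 if k == 0 else math.sin(math.pi * k / 2) / (math.pi * k / 2)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)

    # Step 3: convolve, centred so that out[i] lines up with stuffed[i].
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for n, t in enumerate(taps):
            j = i - (n - mid)
            if 0 <= j < len(stuffed):
                acc += t * stuffed[j]
        out.append(acc)
    return out

# A 1 kHz tone sampled at 192 kHz, oversampled to 384 kHz.
tone = [math.sin(2 * math.pi * 1000 * n / 192000) for n in range(64)]
up = oversample_2x(tone)
print(len(tone), "samples in,", len(up), "samples out")
```

The length and shape of that filter is exactly where the differences between filters (and the cost in processing power) come from: a longer or more elaborate filter means more multiplications per output sample.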
A fair comparison between DACs should use good software and good hardware for the oversampling, like a PC running HQPlayer, which is my setup. When you send data to the DAC at its max rate, you skip the internal oversampling and you choose whichever filter you like. HQPlayer, for example, has something like 30 different filters.
And each person prefers a different sound. There are filters with lots of pre-ringing, some pre-ringing, and no pre-ringing at all. If I'm not wrong, the MQA filters have no pre-ringing at all, which I find a little artificial; I personally like some amount of pre-ringing.
OK, but we are talking about a mobile device, so it shouldn't require a large computer to make it sound good. But that's the way it is. Good oversampling comes at a cost: processing power. The default oversampling filters on a portable DAC (and even in UAPP, which runs on a limited smartphone processor) will never be as good as those on large, dedicated equipment. But they can be quite good, and indeed they are. People are loving the current mini DACs.
Each vendor may choose a filter that some like and some dislike. The ESS chip already has some filters embedded, and the DAC controller can choose which one to use. And by the way, that's how a DAC is certified as an "MQA renderer". The data stream has a flag that indicates which filter to use. On the first MQA renderer DACs (like the Dragonfly Red, which I also have), the USB controller chip was responsible for this task. Now it's embedded in the ESS chip.
And just for information: on my PC, when I listen to MQA content, I prefer the software first unfold (which in fact doubles the data rate) plus my own filter choice in HQPlayer instead of the MQA filter. On mobile, however, the full MQA chain (including the MQA filter) is quite pleasant to listen to, given that there's no better option for oversampling.