There are many things at play when we talk about upgrades. Due to the law of diminishing returns, the difference between IEMs shrinks as we move up the tiers. Not in a bass vs bass kind of comparison, but taken as a whole, many IEMs near the top can be considered equals, and selecting one among them is a matter of personal preference.
For instance, Xears XE200 is rated 7.9 by Joker, M3 @ 7.7 and GR07 @ 9.1 (likewise with clieOS' ratings). XE200 and M3 belong to the same tier, with XE200 being 'slightly better'. But I feel that, in general, an IEM rated around 8.5 is already significantly better than something in the 7.x tier, and anything with 9.x belongs to the top tier of IEMs. I would say that once you get used to IEMs with 8.x and 9.x ratings, it becomes somewhat difficult to go back to the lower-tier ones. You may still appreciate a lower-tier IEM when listening to it on its own, but you would always want to go back to the better ones. I have only used Joker's SQ scores as a point of reference (plus they more or less align with my own thoughts), but I'd say our own subjective SQ scores matter the most. We may not be assigning numerical weights to what we hear, but I am sure almost everyone here has something of that sort.
Coming to the question of "substantial", that again depends on one's own value system. For me, FX700 is 'much better' than FXT90, even though the difference may not be as huge as between FX700 and CC51. It all comes down to how you (subjectively) value that last mile of performance. To break it down further, individual preferences really weigh in. To some, sound stage may be important; engaging mids could carry higher weight in others' books; timbre could be very important for yet others. How you rank an IEM within the same tier, and what you consider "equal", "better", "much better" or "substantial", comes down to how much weight you assign to each of the attributes.
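To make the weighting idea concrete, here is a rough sketch of how two listeners could rank the same pair of IEMs differently. Everything in it is made up for illustration: "iem_a", "iem_b", the attribute scores and the listener weights are hypothetical, not ratings from Joker, clieOS or myself, and the simple weighted average is just one assumed way of combining attributes.

```python
def weighted_score(scores, weights):
    """Weighted average of attribute scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(scores[attr] * w for attr, w in weights.items()) / total_weight


# Hypothetical attribute scores on a 10-point scale (illustrative only).
iem_a = {"soundstage": 9.0, "mids": 7.0, "timbre": 8.0}
iem_b = {"soundstage": 7.0, "mids": 9.0, "timbre": 8.0}

# Hypothetical listeners: one values sound stage most, the other values mids.
soundstage_fan = {"soundstage": 3, "mids": 1, "timbre": 1}
mids_fan = {"soundstage": 1, "mids": 3, "timbre": 1}

for name, weights in [("soundstage_fan", soundstage_fan), ("mids_fan", mids_fan)]:
    a = weighted_score(iem_a, weights)
    b = weighted_score(iem_b, weights)
    preferred = "iem_a" if a > b else "iem_b"
    print(f"{name}: iem_a={a:.1f}, iem_b={b:.1f}, prefers {preferred}")
```

With these made-up numbers the two listeners end up preferring different IEMs even though both hear exactly the same attribute scores, which is the sense in which "better" within a tier depends on your own weights.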
So to answer your question, to my ears, FXT90 is a 'significant upgrade' over XE200, CC51 and M3.
Judging by clieOS' rating of SE215 at 4.5 and Tandem at 4.5, I would hazard a guess that FXT90 would not score significantly more; there's only so much room left below 5.0.