I'm aware that many speaker manufacturers give this recommendation – because it's the only way to achieve a passably extended high-frequency response.
When it comes to reflections, it's not just the near-field reflections that count, but all reflections from every surface of the listening room. Near-field reflections have the most detrimental effect, since they interfere with the direct sound in a way that makes them indistinguishable from it. Later reflections form the ambience, an unavoidable – and partly welcome – contribution of the listening room. After all, you wouldn't want to enjoy your music in an anechoic chamber, where it would sound unnatural and practically unlistenable. Listening to music at an open-air concert is a different matter, even one without reflections from a stage. I know what I'm talking about, since my own open-air gigs as a bass player were the highlights of my career as an active musician. Now, what's the difference between an anechoic chamber and the free field? It's the residual reflections and resonances at low frequencies, which are extremely hard or actually impossible to eliminate completely. So why is the sonic experience in an anechoic chamber so unenjoyable, given that those low-frequency reflections are still orders of magnitude below those in a typical listening room? The cause is the extreme imbalance in the frequency distribution of the reflected sound: what little is reflected is almost exclusively low-frequency energy.
Exactly the same kind of imbalance results from the uneven dispersion pattern of typical speakers. Towards the upper end of each driver's frequency range the dispersion becomes narrower, which is especially noticeable with the tweeter – the main reason for the toe-in recommendation. If you're familiar with electroacoustics, you may know that the theoretical ideal is an even dispersion of sound energy across the whole frequency spectrum, in all directions. Conventional speakers are far from this goal. Toe-in is needed precisely because of this weakness, the drop-off of the highest frequencies off axis. But toeing in widens the gap between reality and ideal: those highest frequencies are now projected towards the listener almost like a spotlight, while the off-axis sound in this range is strongly attenuated. The result is a speaker characteristic with «wet» low and mid frequencies and a progressively drier high-frequency reproduction (roughly speaking; in fact, multiway speakers additionally suffer from [vertical] interference between the drivers).
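For anyone who wants to see the narrowing of the tweeter's dispersion in numbers: a common textbook approximation treats a driver as a rigid circular piston in an infinite baffle. Its pressure directivity is 2·J1(x)/x with x = ka·sinθ, where k = 2πf/c and a is the radius. The Python sketch below is only an illustration under that model. It assumes a speed of sound of 343 m/s and a hypothetical 25 mm tweeter treated as a flat piston; a real dome, waveguide or finite baffle will deviate from it. The function names are my own, not from any library.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 °C)


def bessel_j1(x: float, steps: int = 200) -> float:
    """Bessel function of the first kind, order 1, from Bessel's integral
    J1(x) = (1/pi) * integral over [0, pi] of cos(t - x*sin(t)) dt,
    evaluated with the trapezoidal rule (accurate here because the
    integrand is smooth and symmetric about pi)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.cos(t - x * math.sin(t))
    return total * h / math.pi


def piston_directivity(freq_hz: float, radius_m: float, angle_deg: float) -> float:
    """Pressure directivity 2*J1(x)/x of a rigid circular piston in an
    infinite baffle, with x = k*a*sin(theta) and k = 2*pi*f/c."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    x = k * radius_m * math.sin(math.radians(angle_deg))
    if abs(x) < 1e-9:
        return 1.0  # on axis the limit of 2*J1(x)/x is exactly 1
    return 2.0 * bessel_j1(x) / x


def off_axis_db(freq_hz: float, radius_m: float, angle_deg: float) -> float:
    """Off-axis level relative to on-axis, in dB (floored to avoid log(0) at nulls)."""
    d = abs(piston_directivity(freq_hz, radius_m, angle_deg))
    return 20.0 * math.log10(max(d, 1e-12))


if __name__ == "__main__":
    radius = 0.0125  # hypothetical 25 mm tweeter, treated as a flat piston
    for f in (2000, 8000, 16000):
        levels = [off_axis_db(f, radius, a) for a in (0, 30, 60)]
        print(f"{f:>6} Hz: " + "  ".join(f"{v:6.1f} dB" for v in levels))
```

Under these assumptions the level at 60° off axis stays within about 1 dB of the on-axis level at 2 kHz, but falls by roughly 15 dB at 16 kHz. That is the spotlight effect described above, and it is why the off-axis (reflected) energy gets progressively drier towards the top.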
I'm well aware that listeners have more or less adapted to the resulting sound as the «normal» speaker sound, and I agree with that (just as the majority of music listeners have adapted to the CD sound). But it's not optimal, and I'm trying to offer my experience as a former speaker builder who knows what's actually possible. Not toeing in your speakers is simply a way to avoid the worst case, ideally combined with active equalization.
Logically, near-field speakers suffer less from this problem, but in my experience it's still beneficial to avoid strongly uneven dispersion characteristics – and particularly large, reflective speaker baffles: the source of extremely harmful near-field reflections. Of course, that doesn't apply only to near-field speakers.