@ProtegeManiac: I also pinned the FLAC as poorer quality, but since I'm going to reference some tell-tale signs which can be used to more easily differentiate some of the samples, I'll go ahead and put it in a spoiler box:
Warning: Spoiler!
After listening to all the samples a few times, I could tell some differences but, as noted in my original response, they were very hard to pick out. I decided to try another method which I thought would make the differences clearer: using a 10-band EQ. I arbitrarily decided to raise the 16 kHz band by 12 dB and lower all of the others by 12 dB. With that setting, one of the differences became much clearer: I very easily heard some background noise beginning at 38-39 seconds. This noise lasted from then until the end of the samples and, since it was the most obvious difference to me, I used it to pick out which sample was which. Unfortunately, this is where I went wrong.
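For anyone wondering how much emphasis that EQ setting actually gives, here is a rough sketch of the arithmetic. It assumes an idealized EQ where each band's gain applies cleanly and independently; real 10-band EQs have overlapping bands, so treat the result as approximate. The function name is just something I made up for illustration.

```python
def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10 ** (gain_db / 20)

boost_db = 12.0   # gain applied to the 16 kHz band
cut_db = -12.0    # gain applied to each of the other nine bands

# Emphasis of the boosted band relative to the cut bands (idealized EQ assumed).
relative_db = boost_db - cut_db
relative_ratio = db_to_amplitude_ratio(relative_db)

print(f"{relative_db:.0f} dB relative emphasis, about {relative_ratio:.1f}x in amplitude")
```

In other words, content around 16 kHz ends up roughly 24 dB above everything else, which is why high-frequency noise and artifacts that are normally masked become easy to hear.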
As all of the sounds in the sample are digital, there is no point of reference for what is "right" and "wrong". The only thing I could do was pick what I thought sounded best. In the 64 kbps HE-AAC and 128 kbps MP3 samples (2 and 4, respectively), I heard a very pleasant oscillation in the aforementioned background noise. The oscillation was clearer in 4 (128 kbps MP3), so I picked that as the best. The oscillation was less consistent and harder to hear in 2 (64 kbps HE-AAC), so I picked it as second. The other two samples had almost no oscillation, only a constant noise, so I assumed that they were the lowest-bitrate samples. Since those two were very similar (i.e. they both sounded like noise), I was only barely able to pick sample 1 (256 kbps MP3) as my third choice; there was just a hint of that ever-so-wonderful oscillation I was after! That left sample 3 (what turned out to be the FLAC) as last on my list; I thought surely the loss of that oscillation was caused by compression and that this must be the worst sample, the encoder leaving nothing but boring noise in the background!
Unfortunately for my list, the oscillation itself was the compression artifact and the noise was what the composer intended.
Based on my experience, then, I think this test could be improved by adding a fifth sample, also FLAC, against which to compare the other samples. The control sample would be an undisguised FLAC and the other four could be the same types as were already used. This way, our differing perceptions can be neutralized better; it won't matter what I think sounds better or more pleasing, only what I think sounds closest to the control sample. If this were non-electronic music, I think having a control would still be quite helpful (especially for those who don't have much experience going to live performances and hearing many instruments played in person) but maybe not imperative.
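To make the idea concrete, here is a minimal sketch of how such a test could be laid out: one labeled control plus the same four encodings in a hidden, shuffled order. The function name, the labels, and the seed are all hypothetical; this is not how the original test was put together.

```python
import random

def build_blind_test(encodings, seed=None):
    """Return the labeled control and a hidden answer key mapping sample numbers to encodings."""
    rng = random.Random(seed)      # a seed makes the shuffle reproducible for the organizer
    shuffled = list(encodings)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    answer_key = {number: name for number, name in enumerate(shuffled, start=1)}
    return "FLAC (labeled control)", answer_key

# The four disguised sample types from this test; the control is a fifth, undisguised file.
encodings = ["FLAC", "256 kbps MP3", "128 kbps MP3", "64 kbps HE-AAC"]
reference, answer_key = build_blind_test(encodings, seed=0)

print("Control:", reference)
print("Samples to rank by closeness to the control:", sorted(answer_key))
```

Listeners would only see the control and the sample numbers, and would rank samples 1-4 by how close each one sounds to the control; the organizer keeps `answer_key` private until the results are in.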
One part of the test that surprised me, though, was my mixing up the lower-quality MP3 and the HE-AAC samples. As that sound which I liked (which turned out to be the compression artifact) was actually less pronounced on the 64 kbps HE-AAC than on the 128 kbps MP3, I wonder if that will translate to fewer artifacts in other music. HE-AAC uses, after all, a newer compression scheme and was designed to be better than MP3 compression. Comparing them just once in a very specific type of music isn't enough for me to make a broader judgment on them, but I am intrigued and would like to look further into the differences between the two formats.
So, to conclude, this was an extremely interesting test and I am disappointed more people didn't participate. However, aQiss, to use this test of a very carefully selected piece of simpler electronic music, with no clear control and no other standard of reference, and say we don't "really need lossless audio formats" is totally unfair. If you wanted to design a test that was very difficult, then yes, you did a top-notch job. But seeing that some others and I were able to distinguish pretty accurately between most of the samples (even though, having no standard of comparison, we disagreed as to which ones sounded subjectively better and thus about which bitrate was which), this hardly makes the point that the samples were indistinguishable.