This whole thread really is getting silly - so much so that it's starting to be impossible for it to accomplish anything.
This thread is NOT the latest journal of some science association. It is merely a thread for audiophiles to discuss a specific issue based on science rather than pure opinion, or pseudo-science, or "subjectivism". And, to be honest, I see part of the purpose of this thread as encouraging people to perform their own tests, following reasonable scientific protocols and eliminating as much obvious error as possible. So, yes, claims based solely on opinions should be discouraged, and it's perfectly reasonable to criticize specific procedures and methodologies. But, even though your favorite standards organization may have decided that an ABX test must have at least 16 runs to satisfy THEIR idea of being a "good" ABX test, the fact remains that any double-blind test is still better than a sighted test, and even four or five runs is still more information than none.
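To put rough numbers on the gap between five runs and sixteen, here is a minimal sketch. It assumes each run is an independent two-way forced choice where a pure guesser is right 50% of the time; the function name chance_of_guessing is made up for this illustration and is not taken from any standard.

```python
from math import comb

def chance_of_guessing(correct: int, runs: int) -> float:
    """Probability of getting at least `correct` answers right in `runs`
    trials by pure guessing (assumed 50% chance per trial)."""
    # Count every outcome with `correct` or more hits, out of 2**runs outcomes.
    return sum(comb(runs, k) for k in range(correct, runs + 1)) / 2 ** runs

# A few scores a listener might report (illustrative values only).
for correct, runs in [(4, 5), (5, 5), (12, 16), (14, 16)]:
    print(f"{correct}/{runs}: {chance_of_guessing(correct, runs):.4f}")
```

Under those assumptions a guesser scores 4/5 or better about 19% of the time and 5/5 about 3% of the time, while 12/16 or better comes up about 4% of the time and 14/16 or better about 0.2% of the time. So five runs can say something, just not very loudly.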
Yes, of course, the more times you perform a test, the more you can eliminate the effects of randomness, and so the more statistically significant your results will be. However, statistical significance itself is a continuum. Tossing a fair coin and getting three heads in a row is less likely than not (one chance in eight), but it isn't MUCH of a surprise; ten heads in a row (one chance in 1,024) is a lot further from what you would expect by random chance; and, when you work out the probabilities, you can quantify exactly how much less likely ten heads in a row is than three heads in a row... but there isn't some natural line where "anything above x is significant". In fact, if you actually research the subject, you will find that the amount of deviation that's required to consider something significant itself varies depending on what is being measured. (If you flip a coin 1000 times, a result that's more than 5% from the norm may be very unusual; however, if you count the traffic through a certain intersection between 4 PM and 6 PM, you may find that a 20% variation from day to day is quite normal. And, if a hundred people each flip that coin five times, and 75 of them come up with four or more heads, when only about 19 of them would be expected to by chance, then, in the aggregate, that is indeed a significant result.)
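The coin comparison can be worked out exactly. This is a small sketch assuming a fair coin and independent tosses; streak_probability is just a name chosen for the example.

```python
from fractions import Fraction

def streak_probability(flips: int) -> Fraction:
    """Chance that a fair coin lands heads on every one of `flips` tosses."""
    return Fraction(1, 2) ** flips

three = streak_probability(3)    # three heads in a row
ten = streak_probability(10)     # ten heads in a row
ratio = three / ten              # how many times less likely the longer streak is

print(f"three in a row: {three}")
print(f"ten in a row:   {ten}")
print(f"ten in a row is {ratio} times less likely")
```

Both streaks are "unlikely", but one is 128 times less likely than the other, which is the sense in which significance is a matter of degree rather than a yes-or-no property.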
My point is that the fact that your favorite standards organization has decided that an ABX test requires at least sixteen runs to satisfy the criteria they've chosen for that particular type of test does NOT mean that an ABX test with only five runs "isn't valid" or "isn't worth anything"... It simply means that the results of a single test with five runs are much less conclusive, or less meaningful, than the results with 16 runs. This is especially relevant since we're talking about a group discussion here, and so we may get pooled data from multiple tests. So, for example, if twenty people each do an ABX test with five runs, and fifteen of them get a correct answer 4 times out of 5 or better, then, collectively, that data may actually be quite significant... because, while a single person getting 4 out of 5 is not especially significant (pure guessing manages that about 19% of the time), the odds of fifteen people out of twenty each doing so by chance are far lower - and so that result may be much more significant.
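Here is a rough sketch of that twenty-listener example. It rests on some strong assumptions that a real thread would not automatically satisfy: every listener is purely guessing at 50% per run, the listeners are independent of each other, and all twenty results get reported rather than just the good ones. The function name tail_probability is hypothetical.

```python
from math import comb

def tail_probability(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Chance that one guessing listener scores 4/5 or better.
single = tail_probability(4, 5, 0.5)

# Chance that at least 15 of 20 guessing listeners each manage that.
pooled = tail_probability(15, 20, single)

print(f"one listener, 4/5 or better:    {single:.4f}")
print(f"15 of 20 listeners all doing so: {pooled:.2e}")
```

Under these assumptions the pooled probability comes out well below one in a million, compared with roughly one in five for any single listener. If the results were selectively reported, or the listeners shared some common source of error, the real figure would be less impressive.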
Therefore, to put it bluntly, I think the goal of rational discussion is much better served if we DON'T discourage everyone from performing a test just because their methodology, while reasonable, doesn't live up to some arbitrary standard. I'd much prefer to see a lot of people performing "pretty good" tests for themselves than see them discouraged from doing so, as long as we all understand the limitations and significance of all the results. (And, to put it even more bluntly, the discussion itself will help people with no experience learn how to tell for themselves the difference between a really good test, a pretty good test, and one that really is total junk - or pseudo-science.)
To me, the only appropriate criticism of doing an ABX test with only five trials would be:
"Five trials is a relatively low number, which means that, even though you've got the right idea, and your results are suggestive, you'd need a larger number of runs to produce a compelling result, and to more completely rule out the possibility that your results were due to random chance."