An audio player like foobar2000 reads a portion of the file, decodes it, applies DSP effects (if any) and puts the result in a buffer.
On a fast computer, even a large buffer is filled within a few milliseconds or less. However, if your computer is slow or currently busy (e.g. copying files or decoding an HD movie), filling the buffer can take longer.
The samples in the buffer are what the soundcard/interface actually plays. If the buffer is very small and your computer doesn't refill it fast enough, you'll run into audio glitches such as stuttering, drop-outs and clicks. Buffers that are too short will also cause certain visualizations to stop working and limit the DirectSound fade in/out durations.
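To make the relationship between buffer size and glitches concrete, here is a toy simulation. It is a minimal sketch under simplifying assumptions, not how foobar2000 is actually implemented: time advances in 1 ms ticks, the soundcard drains 1 ms of audio per tick, and a hypothetical decoder refills the buffer in fixed-size chunks, each costing `decode_ms` of CPU time. The `stall` window stands in for the computer being busy with something else. The function and all parameter names are made up for illustration.

```python
def count_glitches(buffer_ms, chunk_ms, decode_ms, stall, duration_ms=2000):
    """Toy model: return how many 1 ms playback ticks found the buffer empty.

    buffer_ms:  capacity of the playback buffer (hypothetical parameter)
    chunk_ms:   audio produced per refill (hypothetical parameter)
    decode_ms:  CPU time needed to produce one chunk (hypothetical parameter)
    stall:      (start, end) window in which the decoder gets no CPU time
    """
    stall_start, stall_end = stall
    level = buffer_ms   # start playback with a full buffer
    work_left = 0       # CPU time still needed for the chunk in progress (0 = idle)
    glitches = 0
    for t in range(duration_ms):
        stalled = stall_start <= t < stall_end
        if not stalled:
            # Producer: make progress on the current chunk; deliver it when done.
            if work_left > 0:
                work_left -= 1
                if work_left == 0:
                    level += chunk_ms
            # Start a new chunk only if it will fit into the buffer.
            if work_left == 0 and level + chunk_ms <= buffer_ms:
                work_left = decode_ms
        # Consumer: the soundcard plays 1 ms of audio per tick, or glitches.
        if level >= 1:
            level -= 1
        else:
            glitches += 1
    return glitches


# A 300 ms "busy" period, once with a large and once with a small buffer.
print(count_glitches(1000, 500, 5, stall=(100, 400)))
print(count_glitches(50, 25, 5, stall=(100, 400)))
```

In this model the 1000 ms buffer rides out the 300 ms stall without a single glitch, while the 50 ms buffer runs dry a few dozen milliseconds into the stall and keeps glitching until the decoder gets CPU time again.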
Simple example:
Assume you have a full buffer of 1000ms. Once 25ms have been played, 975ms of audio are still left in the buffer, so the player has 975ms to refill it before there will be a glitch. That should be enough time for even slower or very busy computers to refill the buffer.
With a much smaller buffer, let's say 50ms, only 25ms of audio are left after the same 25ms of playback, so your computer has to refill the buffer in under 25ms or a glitch will occur. In addition, the smaller the buffer, the higher the overhead: in the example above, the player could refill the 1000ms buffer with a single big chunk (e.g. 500ms) once it is half-empty, but with the small buffer it needs to refill maybe 20 times (20 × 25ms = 500ms), or even more often with very small chunks.
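The arithmetic of this example can be written down in a few lines. This is only a sketch: the helper names are hypothetical, and `per_refill_cost_ms` is an assumed fixed cost per refill (e.g. waking up the decoder thread), used to illustrate why many small refills add overhead; it is not a measured foobar2000 value.

```python
def headroom_ms(buffer_ms, played_ms):
    """Audio still queued after `played_ms` has been played from a full buffer."""
    return buffer_ms - played_ms


def refills_needed(audio_ms, chunk_ms):
    """Number of refills of `chunk_ms` needed to deliver `audio_ms` of audio."""
    return -(-audio_ms // chunk_ms)  # integer ceiling division


def refill_overhead_ms(audio_ms, chunk_ms, per_refill_cost_ms):
    """Total fixed overhead, assuming each refill costs the same (hypothetical) amount."""
    return refills_needed(audio_ms, chunk_ms) * per_refill_cost_ms


# Time left to refill after 25 ms of playback: large vs. small buffer.
print(headroom_ms(1000, 25), headroom_ms(50, 25))
# Refills needed to deliver 500 ms of audio: one big chunk vs. 25 ms chunks.
print(refills_needed(500, 500), refills_needed(500, 25))
```

With a fixed cost per refill, the total overhead grows in proportion to the number of refills, which is why the small buffer pays it about 20 times as often for the same 500ms of audio.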