Active noise cancellation works by using a microphone to listen to external noise, and adding the inverse of that signal to the audio.
Ideally, you want to know the noise as received at the eardrum, not as measured outside the earpiece. For lower-frequency sounds the difference doesn't matter much: they pass through relatively unchanged (apart from attenuation) from the outside, through the headphone housing and ear canal, to the eardrum. Simply applying the inverse of the noise signal works quite well at low frequencies, and this technique has been used for decades.
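The low-frequency case can be sketched in a few lines. This is a toy illustration, not a real-time implementation: it assumes the noise at the microphone equals the noise at the eardrum, and the frequencies and amplitudes are made up.

```python
import numpy as np

fs = 48_000                      # sample rate, Hz (assumed)
t = np.arange(fs) / fs           # one second of samples
noise = 0.5 * np.sin(2 * np.pi * 120 * t)   # a 120 Hz hum picked up by the mic
audio = 0.3 * np.sin(2 * np.pi * 440 * t)   # the signal we want to hear

anti_noise = -noise              # the inverse of the measured noise

# If the mic signal really matches the noise at the eardrum,
# noise and anti-noise cancel exactly and only the audio remains.
at_eardrum = audio + (noise + anti_noise)
residual = at_eardrum - audio    # ~0 everywhere
```

In a real headset the anti-noise must also compensate for the speaker's and microphone's own frequency responses and for processing latency, which is why even low-frequency cancellation is an engineering challenge rather than a one-line negation.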
It gets more complicated at higher frequencies, such as human voice. Higher frequencies are more affected by the surroundings: reflection, absorption, diffraction, propagation through different materials at different speeds, and so on. These effects vary non-linearly with frequency, and they occur at lower frequencies too, just to a smaller degree. The end result is that the external noise recorded at the microphone is not exactly what is heard at the eardrum. The challenge, then, is to model a cancellation signal that exactly matches the noise reaching the eardrum, and to do it in real time; any deviation gets inserted into the audio as distortion and added noise. The technology isn't quite there yet for cancelling human voices, so to avoid degrading sound quality too much, the noise cancellation effect is fully applied only at lower frequencies.
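Band-limiting the effect can be sketched by low-pass filtering the anti-noise, so only low frequencies are cancelled and the higher frequencies (where the mic no longer matches the eardrum) are left untouched rather than turned into distortion. The cutoff, tone frequencies, and FFT-based ideal filter here are all illustrative assumptions; a real headset would use a causal real-time filter.

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 100 * t)     # 100 Hz rumble: try to cancel this
high = np.sin(2 * np.pi * 2000 * t)   # 2 kHz tone (voice range): leave alone
noise = low + high

# Ideal low-pass at an assumed 1 kHz cutoff, applied in the frequency
# domain: zero out every bin above the cutoff, then invert the result.
cutoff_hz = 1_000
spectrum = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(len(noise), d=1 / fs)
spectrum[freqs > cutoff_hz] = 0
anti_noise = -np.fft.irfft(spectrum, n=len(noise))

# The anti-noise removes the rumble but does not touch the 2 kHz tone,
# so the residual at the eardrum is (approximately) just the high tone.
at_eardrum = noise + anti_noise
```

This is exactly the trade-off described above: the headset gives up on the frequencies it can't model accurately instead of risking a mismatched anti-noise signal that would sound worse than the original noise.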