[1] Perhaps I'm naive, or grossly misinformed, but I've always thought, for nearly as long as I've been alive, that hi-fidelity reproduction - of music, movies, toilets flushing, anything - requires a flat response.
[2] As for TV sound in general, I find it to be the most dynamically compressed; not necessarily loudness-processed, but more compressed than even the most recent pop or rock CD release.
[2a] If anyone here is a racing fan, and specifically of NASCAR, they have a 30-40 sec. segment called 'Crank It Up!' during the broadcast of the race. During this segment, all network and driver-related graphics exit the screen, and viewers are encouraged to turn up the sound on their TV sets or home theater systems.
[2b] Now I have attended the races, and what I'm 'cranking up' sounds NOTHING like it does at the actual track! Even through my big speakers, it sounds. like. mush.
[3] PBS (U.S.) is about as dynamic as it gets on broadcast television. I can crank up things like 'This Old House', and from the kitchen it sounds like an actual renovation is taking place in the living room! Same for their Saturday night feature classic movie. For the most part, the original screen aspect ratio and sonic impact of those films are passed along to viewers. I wish the major networks could do the same, especially with sports.
1. Yes, that is somewhat naive. Each has somewhat different requirements for hi-fi reproduction. For example, movies have what's called the "X-curve", which is applied to the monitoring (B-chain) during mixing and reproduction (although this isn't applicable to home reproduction). TV is essentially flat. With music, the usual trend for several decades has been a house curve with a raised bass, but as there are no mandated specifications/requirements (unlike TV and film), this house curve can essentially be whatever the individual studio wants. In general, a slightly raised bass in the consumer reproduction chain would therefore give a higher-fidelity reproduction, and many/most consumer transducers have this built in (for example, it's one of the typical differences between a "speaker" and a "monitor"). A flat response will certainly give a more hi-fi reproduction than whatever the natural response of speakers + room acoustics happens to be in a consumer listening environment, but ideally, most of the time, for the highest-fidelity music reproduction, you should achieve a flat response and then raise the bass a little. The idea that a flat response is always the hi-fi ideal is therefore somewhat of an audiophile myth.
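To make the "flat, then raise the bass a little" idea concrete, here is a minimal Python sketch of a house-curve target. The function name `house_curve_db` and all the numbers (a 3 dB boost, a 200 Hz corner, a 50 Hz floor) are hypothetical values picked purely for illustration; they are not from any standard, and as noted above each studio can choose its own curve.

```python
import math


def house_curve_db(freq_hz, bass_boost_db=3.0, corner_hz=200.0, floor_hz=50.0):
    """Target level in dB relative to a flat response.

    Hypothetical shape: 0 dB at and above corner_hz, the full boost at and
    below floor_hz, and a straight ramp on a log-frequency axis in between.
    """
    if freq_hz >= corner_hz:
        return 0.0
    if freq_hz <= floor_hz:
        return bass_boost_db
    # Fraction of the way from the corner down to the floor, in octave terms.
    fraction = math.log(corner_hz / freq_hz) / math.log(corner_hz / floor_hz)
    return bass_boost_db * fraction


# Print the target offset at a few spot frequencies.
for f in (30.0, 75.0, 100.0, 150.0, 1000.0):
    print(f"{f:7.1f} Hz -> {house_curve_db(f):+.2f} dB relative to flat")
```

In practice you would first EQ the system towards flat and then apply something like this offset as the new target, tuning the amount of boost by ear.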
2. While TV can be quite heavily compressed, it is nowhere near the levels of compression applied to much pop/rock music. Even going back as far as the late 1960s, pop and rock genres routinely drove compressors to (and beyond) distortion. In fact, with quite a few sub-genres, heavily over-driven compression is a required sonic characteristic of the genre. This is never the case with TV. Also, since 2012, it absolutely IS "necessarily loudness-processed"! This isn't just the different, arbitrary loudness specifications/requirements of individual TV channels/networks; in the USA (and some other countries) it's an actual legal requirement. In the case of the USA, it was enshrined in law by the CALM Act (passed in 2010, with enforcement beginning in December 2012).
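The difference between dynamic range compression and loudness processing can be sketched in a few lines of Python. Both function names are hypothetical, the threshold and ratio are arbitrary illustration values, and the -24 LKFS default is my assumption of the usual US broadcast target (nobody in this thread has quoted a number). A real loudness meter integrates over the whole programme with frequency weighting and gating, which this sketch does not attempt.

```python
def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static gain of a simple hard-knee downward compressor, in dB.

    Below the threshold nothing happens; above it, every `ratio` dB of
    input produces only 1 dB of output, so the loud parts are squashed.
    """
    if level_db <= threshold_db:
        return 0.0
    overshoot = level_db - threshold_db
    return overshoot / ratio - overshoot


def normalization_gain_db(measured_lkfs, target_lkfs=-24.0):
    """One fixed gain offset that moves a whole programme to the target.

    This changes how loud the programme is overall, but leaves the
    difference between its quiet and loud parts untouched.
    """
    return target_lkfs - measured_lkfs


# A peak 12 dB over the threshold is pulled down; a quiet passage is not.
print(compressor_gain_db(-8.0), compressor_gain_db(-30.0))
# A programme measured at -18 LKFS needs a fixed trim to reach the target.
print(normalization_gain_db(-18.0))
```

The point of the sketch: compression depends on the signal level moment to moment, whereas loudness normalization is a single offset, so a programme can be fully loudness-compliant with either a wide or a narrow dynamic range.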
2a. The workflow with live events is necessarily entirely different to that of films and other TV content which is not live, such as docos, dramas, etc. With docos, for example, the dialogue of the interviews is recorded at the same time as the filming. Once all the footage has been acquired, it is edited and then passed to the audio post production team, who clean up the production dialogue; source, edit and sync all the sound effects; add the music and narration; and then mix all of it together to create a 5.1 mix. This mixing process (called "re-recording" in the TV/film world) involves balancing all the elements, applying noise reduction, EQ, compression, reverb, etc., and writing "automation" to ensure balance not only within each scene but also between scenes. Finally, this mix (along with the other required deliverables) is recorded ("printed"). We do of course entirely control the dynamic range of each of the elements and of the completed mix, and constantly tweak all of this during mixing according to taste, but obviously within the loudness specifications. While the time, cost, exact details and number of personnel involved in audio post vary greatly, this is broadly the workflow of all TV/film, with the obvious exception of live events and most ENG (news), where there is no audio post process! However, live events still have to comply with loudness specifications. I have no direct experience of doing the sound for NASCAR or other motor racing events, but what effectively seems to happen is that an independent mic and/or set of mics is associated with each camera position (appropriately balanced/processed) and then, as the director changes camera/angle, the sound switches to the mics associated with that camera.
The exact setup and workflow for such events have evolved over decades, so considerable experience is required. Particularly at events such as motor racing, where there can be very high SPLs and wide dynamic range, considerable compression typically has to be applied in order to play it safe and remain within loudness specifications, bearing in mind that you obviously can't go back and tweak it a few weeks later in audio post.
2b. When you attended races, were you "at the actual track" or were you in the audience stands? It's obviously going to sound significantly different if your ears are in a significantly different location to the mics, and that's in addition to the compression, NR, crossfading and other processing required in order to maintain a similar loudness between different camera/mic setups, allow the commentary to be intelligible and comply with loudness specifications. Considering all the practical difficulties of creating 5.1 mixes with multiple 5.1 mic setups, such high SPLs, and dialogue/commentary that has to sit above it all while remaining compliant with loudness specs, I'm amazed at how good the sound usually is.
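As a rough illustration of the crossfading between camera/mic setups mentioned above, here is a minimal equal-power crossfade in Python. This is only a sketch of one common technique, not a description of what any broadcaster actually does; `crossfade_gains` and `crossfade` are hypothetical names, and real feeds would be multichannel and processed in blocks.

```python
import math


def crossfade_gains(progress):
    """Return (outgoing_gain, incoming_gain) for a fade position in 0..1.

    Equal-power law: the squared gains always sum to 1, which keeps the
    perceived level roughly steady when the two feeds are uncorrelated.
    """
    p = min(max(progress, 0.0), 1.0)  # clamp out-of-range positions
    angle = p * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def crossfade(outgoing, incoming):
    """Mix two equal-length sample lists, fading from outgoing to incoming."""
    n = len(outgoing)
    if n != len(incoming):
        raise ValueError("both feeds must have the same number of samples")
    mixed = []
    for i in range(n):
        progress = i / (n - 1) if n > 1 else 1.0
        g_out, g_in = crossfade_gains(progress)
        mixed.append(g_out * outgoing[i] + g_in * incoming[i])
    return mixed


# Fade from a constant "camera 1" feed to a silent "camera 2" feed.
print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

A hard cut between two mic positions at very different levels would be obvious and jarring, which is one reason a fade like this, plus level matching, tends to be needed on top of the compression.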
3. Your "wish" is significantly different to that of most other consumers. The vast majority of consumers want to watch a sports event live, within a few seconds of it being filmed; they don't want to wait weeks (or in the case of films, many months) after the event is filmed to allow for audio post and the creation of a high-fidelity, wide-dynamic-range mix! Also, rather ironically, you've actually got this completely backwards! Particularly in film, pretty much none of the sound is "real": none of the sound FX were recorded during filming; they are created and recorded in a completely different environment, with different equipment and different people using it, and even much/most of the dialogue is recorded weeks/months after the filming. With a live sports event though, it's the opposite: ALL the sound you hear is the actual sound that's occurring at that event and at that instant in time.
Recently, I've seen small sound services mixing on small speakers for TV. I've never worked that way myself. I always mix to full range first, and then check on small speakers to see if anything causes problems. It may be that the documentaries he's talking about are just handing the mix to a DIY guy who isn't working to professional standards. It pays to do things right. I had a project once where the budget had been blown through and they tried to cheap out on post. It bit them in the ass.
At the end of the day, loudness and other specs have to be met; penalties are severe for not doing so and, as you say, it bites them in the ass. Docos are generally near the bottom of the genre budget spectrum though, so sound quality is generally nearer the minimum required to pass QC. A lot of docos are mixed on relatively small speakers but with bass management (with a sub), so they still have a somewhat "full-range" response. This can't really be avoided as 5.1 has been a standard requirement for quite a few years. Most commonly, docos are mixed on good systems by very experienced personnel because, although it costs a lot more per hour, the number of hours required is a lot less and the end result more reliable. The issue described by @old tech is known about but isn't easily solvable; it's a fairly uncommon, unpredictable consequence of 5.1, unavoidable professional workflows, loudness specs and consumer playback equipment.