Since you're asking me directly, I'll try to explain (and point out some similarities between camera ADCs and audio ADCs, even if the terms are different). Mind you, this was in relation to camera systems, not these new audio recorders that do 32-bit processing from ~20-bit ADCs.

Cameras record in a RAW format: a direct sensor dump, a JPEG preview, and metadata of the exposure settings (and it's optimized for maximum quality per unit of file size; there are now also some "compressed" RAW formats that save space with reduced resolution). 16-bit is the maximum because that's the max for the ADC. Each brand of camera has a different sensor with different values for noise and saturation: one might record a channel whose "black" point (the noise floor) sits closer to 0 than another's, and they all have different saturation points, a "white" point somewhere on the way up to 65,535.

When I first started with digital photography, the best sensors could reach 12 stops of DR (so 12-bit RAW was the most optimal). Now, many cameras have sensors capable of 16 stops of light and can reach *about* 16 bits under optimal conditions and settings. Dynamic range in sound is measured in dB, but in photography we think in stops of light: a stop is a doubling, so it's logarithmic just like exposure settings, and one stop corresponds to one bit of depth. So the best theoretical setting reaches 16 bits of dynamic range (or 65,536 shades of tone per color channel).

That's the theoretical; just like recording audio, it's quite different in a real situation. To get that full dynamic range, you have to be in an environment bright enough to shoot at ISO 100 (easily daylight). Even then, you might be in an environment where areas of the scene blow out past that range (a great example is taking a photo indoors with a window in frame: the window blows out because you're exposing for the indoors). We also have to consider exposure times: sometimes reaching a fuller exposure at a lower ISO means you can't handhold and have to be on a tripod. But in most situations, you're increasing ISO to be able to expose at faster shutter speeds in lower light (digital sensors have better sensitivity than film and are much better in low light). And increasing ISO costs you dynamic range: roughly one stop lost per doubling.
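To make the black/white point thing concrete, here's a toy sketch of how raw ADC counts get mapped to a usable tone range. The level numbers here are made up for illustration; real cameras carry their own per-sensor calibration values in the RAW metadata:

```python
# Toy sketch: per-sensor black and white points map raw ADC counts
# to a normalized tone range. BLACK_LEVEL and WHITE_LEVEL are
# hypothetical values, not any real camera's calibration.

BLACK_LEVEL = 512     # noise floor: "black" isn't at 0
WHITE_LEVEL = 60_000  # saturation: below the 16-bit max of 65,535

def normalize(raw_count: int) -> float:
    """Map a raw ADC count to 0.0 (black) .. 1.0 (clipped white)."""
    value = (raw_count - BLACK_LEVEL) / (WHITE_LEVEL - BLACK_LEVEL)
    return min(max(value, 0.0), 1.0)

print(normalize(512))     # 0.0 -> noise floor
print(normalize(60_000))  # 1.0 -> saturated / blown out
```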
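And since the stops/bits/dB mapping and the ISO trade-off are just arithmetic, here's the rule-of-thumb math in a few lines (the figures are illustrative, not any camera's spec sheet):

```python
import math

# Rule-of-thumb conversions: 1 stop of light is a doubling (log base 2),
# 1 bit of depth covers 1 stop, and 1 bit is about 6.02 dB in audio terms.

def stops_to_db(stops: float) -> float:
    return stops * 20 * math.log10(2)   # ~6.02 dB per stop/bit

def tones_per_channel(bits: int) -> int:
    return 2 ** bits                    # 16 bits -> 65,536 levels

def dr_at_iso(base_dr_stops: float, iso: int, base_iso: int = 100) -> float:
    # Each doubling of ISO costs roughly one stop of dynamic range.
    return base_dr_stops - math.log2(iso / base_iso)

print(tones_per_channel(16))   # 65536
print(round(stops_to_db(16)))  # ~96, the same math as 16-bit audio
print(dr_at_iso(16, 800))      # 16 stops at ISO 100 -> ~13 at ISO 800
```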
With digital imaging, we can extend the limits of a single exposure in editing. With my experience in 3D animation, we merge multiple exposures into 32-bit (2^32, or about 4.29 billion shades of tone, enough to simulate real-world light levels for modeling light onto rendered models). Photographers will also merge multiple exposures if they're photographing a scene whose dynamic range is greater than what their camera settings can cover in one exposure (and can then adjust the curves on a photo that doesn't have any blown-out clipping). There's also some fancy processing for video cameras, but cinematographers are experienced enough to expose within the 16-bit RAW range that many cinema cameras are capable of now (film wasn't as good, so they're used to controlling light with lens filters and aperture from those earlier workflows).
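If you're curious what that merge actually looks like, here's a minimal sketch, assuming you already have linear (de-gamma'd) frames scaled 0..1 and their shutter times; the weighting that throws away clipped and near-black pixels is the whole trick:

```python
import numpy as np

# Minimal sketch of merging bracketed exposures into a 32-bit float
# image. `frames` and `times` are assumed inputs: linear arrays in 0..1
# and their shutter speeds in seconds. Clipped pixels get zero weight,
# so the merged result keeps detail in both shadows and highlights.

def merge_exposures(frames: list[np.ndarray], times: list[float]) -> np.ndarray:
    total = np.zeros_like(frames[0], dtype=np.float32)
    weights = np.zeros_like(frames[0], dtype=np.float32)
    for frame, t in zip(frames, times):
        # Keep only well-exposed pixels: not near black, not blown out.
        w = ((frame > 0.01) & (frame < 0.99)).astype(np.float32)
        total += w * (frame.astype(np.float32) / t)  # scale by exposure time
        weights += w
    return total / np.maximum(weights, 1e-6)  # 32-bit float, no clipping

# e.g. merge_exposures([dark, mid, bright], [1/1000, 1/60, 1/4])
```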