Ideally, the backlight's spectral power distribution, multiplied by the mask's spectral transmittance and the sensor's spectral sensitivity per channel, would give a trichromatic response that neutralises the orange mask (produces the same code value in each channel) without any white balancing afterwards. The overall exposure would then be adjusted so that the D-min transmittance sits at a code value of (2^bit_depth - 1) / 2 per channel. That would allow for the cleanest capture from D-min to D-max.
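To make that condition concrete, here is a rough sketch. The three-band spectra are made-up illustration values, not measurements of any real film, backlight or sensor, and the names `channel_response` and `dmin_target` are hypothetical helpers:

```python
def channel_response(backlight, mask, sensitivity):
    """Sum of backlight SPD * mask transmittance * channel sensitivity
    over the sampled wavelength bands (a crude stand-in for the integral)."""
    return sum(b * m * s for b, m, s in zip(backlight, mask, sensitivity))


def dmin_target(bit_depth):
    """Code value where D-min should sit, per the formula above."""
    return (2 ** bit_depth - 1) / 2


# Hypothetical relative values in three bands: [blue, green, red].
mask = [0.25, 0.5, 1.0]        # orange mask passes the least blue
backlight = [4.0, 2.0, 1.0]    # backlight chosen as the inverse of the mask
sensitivity = {                # idealised, non-overlapping channels
    "R": [0.0, 0.0, 1.0],
    "G": [0.0, 1.0, 0.0],
    "B": [1.0, 0.0, 0.0],
}

responses = {ch: channel_response(backlight, mask, s)
             for ch, s in sensitivity.items()}

# All three channels respond equally, so the mask is neutralised
# before any white balancing. A single exposure factor then places
# D-min at the target code value in every channel.
exposure_scale = dmin_target(12) / responses["R"]
print(responses, dmin_target(12), exposure_scale)
```

Real channel sensitivities overlap, so an exact match like this is an idealisation; the point is only that the backlight has to compensate for whatever the mask removes.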
Yes, although the gains are pretty small, even with 12-bit sensors. This is a link to a post where I compare a "straight" and a "neutralized" scan.
Thank you for clarifying this. I wonder if all digital cameras use this approach. I wish the manufacturers would tell us more about what our cameras are actually doing with our images.
If you think about it, it makes no sense to do it otherwise. The "width" and "depth" of the pixel wells are predetermined, so you can only capture a certain number of photons. Even if you could adjust the voltage gain of the individual R, G and B pixels (a different "ISO" for R, G and B), it would make absolutely no difference in 99.99+% of the pictures taken with digital cameras, especially with modern sensors, which are basically ISO-invariant over a range that far surpasses the needs of a scanner.
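A simplified way to see why per-channel gain would not help, assuming photon shot noise is the only noise source (read noise and quantisation are ignored here, and the photon count is a made-up number):

```python
import math


def shot_noise_snr(photons):
    """Signal-to-noise ratio when noise is pure photon shot noise."""
    return photons / math.sqrt(photons)


def snr_after_gain(photons, gain):
    """Gain applied after the pixel well scales signal and noise alike."""
    signal = gain * photons
    noise = gain * math.sqrt(photons)
    return signal / noise


photons = 10_000  # hypothetical count collected in the blue pixels
print(shot_noise_snr(photons))       # SNR with no extra gain
print(snr_after_gain(photons, 4.0))  # SNR with 4x gain on that channel
```

Both calls give the same ratio: the gain cancels out, and only collecting more photons (more blue in the backlight, or a longer exposure) improves the channel.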