I am struggling to understand why multiple film stocks are needed. Shouldn't one be sufficient to calibrate the scanner, provided the XYZ / Lab values of the target patches are known? Does it even need to be a photographic process?
I see where you're getting stuck, and yes, at face value, you'd expect it to work the way you say. Calibrate the scanner with whatever transparency material you've got that has some color patches with known densities on it, and you should be good. At least, in an ideal world. But of course, the real world works differently. In reality, a magenta patch on two different pieces of slide film may appear more or less the same color to the human eye and still have subtly different spectral transmission densities. Moreover, the scanner's sensor sites (typically CCD) have their own spectral sensitivity curves. These two things interact, and the result is (usually small) deviations in what the scanner records. This is what you calibrate for.
For instance, here are the spectral dye density curves for Ektachrome 100 (magenta curve) and Velvia 50 (cyan curve) overlaid:
Notice how they follow a similar pattern (indeed, it looks like Kodak and Fuji really are using pretty much the same dyes), but there are subtle differences. Note for instance the shift in the magenta peak and the slightly different slope on the left side of the cyan curve.
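To get a feel for how much a small shift in a dye peak matters to a sensor, here is a rough Python sketch. Everything in it is made up for illustration: the Gaussian dye curves, the peak positions, the sensor sensitivities and the function names are assumptions, not data for Ektachrome, Velvia or any real scanner.

```python
import numpy as np

wl = np.arange(400, 701, 5.0)  # wavelength grid in nm

def gaussian(peak, width, height=1.0):
    """Gaussian-shaped curve on the wavelength grid (illustrative only)."""
    return height * np.exp(-0.5 * ((wl - peak) / width) ** 2)

def transmittance(magenta_peak):
    """Transmittance of a hypothetical magenta-ish patch.

    The spectral density is the sum of three made-up dye densities,
    and transmittance is 10 ** (-density).
    """
    density = (gaussian(440, 35, 0.3)              # yellow dye (absorbs blue)
               + gaussian(magenta_peak, 35, 1.0)   # magenta dye (absorbs green)
               + gaussian(650, 45, 0.3))           # cyan dye (absorbs red)
    return 10.0 ** (-density)

# Hypothetical R, G, B channel sensitivities of a scanner sensor.
sensor = np.stack([gaussian(610, 30), gaussian(540, 30), gaussian(460, 30)])

def scanner_rgb(T):
    """Sensitivity-weighted average transmittance per channel."""
    return sensor @ T / sensor.sum(axis=1)

rgb_a = scanner_rgb(transmittance(545))  # "stock A": magenta peak at 545 nm
rgb_b = scanner_rgb(transmittance(555))  # "stock B": peak shifted by 10 nm

print("stock A:", rgb_a)
print("stock B:", rgb_b)
print("difference:", rgb_b - rgb_a)
```

With these invented numbers, moving the magenta peak by 10 nm changes all three channel readings a little, because each sensor channel weights the wavelengths differently. How large the effect is on a real scanner depends on the actual dye and sensor curves.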
If you were to put the sensitivity curve of the scanner's sensor on top of such a plot, you would also see that certain wavelengths are emphasized while others are suppressed, and that there's a truckload of channel crosstalk going on. Part of this is filtered out by the scanner's software/firmware on the basis of calibration procedures that are done during product development (which would boil down to pretty much what you were saying). But part of it will carry through into the final scans, because there's no way to perfectly control for every combination of transparency film and sensor behavior using a single calibration setting. Hence, to squeeze the last bit of accuracy from the system, you can profile the scanner against an IT8 target that's made on the film you're going to scan lots of.
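The following Python sketch shows why a profile made on one stock is not quite optimal for another. It is a toy model under heavy assumptions: all spectral curves are invented Gaussians, the "observer" is a stand-in for real reference values (not the CIE color matching functions), and the "profile" is just an affine least-squares fit, whereas real ICC scanner profiles are typically more elaborate. The names `measure`, `fit_profile` and `rms_error` are hypothetical helpers.

```python
import numpy as np

wl = np.arange(400, 701, 5.0)  # wavelength grid in nm

def bands(peaks, width):
    """One Gaussian curve per peak, as rows of a (3, n_wavelengths) array."""
    peaks = np.asarray(peaks, dtype=float)
    return np.exp(-0.5 * ((wl[None, :] - peaks[:, None]) / width) ** 2)

# All curves are made-up stand-ins, not measured data.
observer = bands([600, 550, 450], 40)  # stand-in for the reference values
sensor = bands([620, 535, 465], 25)    # stand-in for the scanner channels
dyes_a = bands([650, 545, 440], 40)    # cyan, magenta, yellow dyes, stock A
dyes_b = bands([655, 555, 445], 40)    # stock B: slightly shifted peaks

rng = np.random.default_rng(0)
amounts = rng.uniform(0.0, 2.0, size=(200, 3))  # dye amounts per patch

def measure(dyes, curves):
    """Weighted average transmittance of every patch for each curve."""
    T = 10.0 ** (-(amounts @ dyes))  # shape (patches, wavelengths)
    return T @ curves.T / curves.sum(axis=1)

def design(scan):
    """Scanner values with a constant column appended for the offset."""
    return np.hstack([scan, np.ones((len(scan), 1))])

def fit_profile(scan, ref):
    """Least-squares affine map from scanner values to reference values."""
    M, *_ = np.linalg.lstsq(design(scan), ref, rcond=None)
    return M

def rms_error(M, scan, ref):
    """Root-mean-square error of the mapped scan against the reference."""
    return float(np.sqrt(np.mean((design(scan) @ M - ref) ** 2)))

scan_a, ref_a = measure(dyes_a, sensor), measure(dyes_a, observer)
scan_b, ref_b = measure(dyes_b, sensor), measure(dyes_b, observer)

profile_a = fit_profile(scan_a, ref_a)  # "IT8 target on stock A"
profile_b = fit_profile(scan_b, ref_b)  # "IT8 target on stock B"

err_ab = rms_error(profile_a, scan_b, ref_b)  # stock B scanned, profile A
err_bb = rms_error(profile_b, scan_b, ref_b)  # stock B scanned, profile B
print("stock B with profile A:", err_ab)
print("stock B with profile B:", err_bb)
```

In this toy model the profile fitted on stock B can never do worse on stock B's patches than the one fitted on stock A, since the least-squares fit minimizes exactly that error. The size of the gap here says nothing about real film; it only illustrates the mechanism.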
If you don't use this IT8 profiling, you fall back on the factory calibration for transparency material that's built into the product, which in practice will be good enough for most purposes.