Okay, so there's a lot to unpack here, so I'll start by throwing a few comments out there. I'll probably miss something buried in the details of one of the above posts, but I'm going to try to offer some input.
Most (now-discontinued) high-end commercial densitometers have an accuracy specification of +/- 0.02D or worse, depending on the density of the target. I have no idea what Stouffer uses for measuring their transmission targets, as they only mention their reflection instrument on the website.
A typical bench-top densitometer can only be as good as its calibration references, and there are only so many ways to actually get such references these days. The only sources I'm personally aware of for this are Stouffer, Acurad, and NIST. From my own conversations, I've been able to gather that Stouffer claims their calibrations are NIST traceable, and that Acurad does something proprietary with a collection of internal reference densitometers specific to whichever device they're making a strip for. In my own experiments, Stouffer and Acurad references are clearly not calibrated to the same upstream reference.
If you want a trustworthy "gold standard", I think your only choice is to buy a NIST Photographic Film Step Tablet Transmission Density Standard (38120C). Be warned, it costs about $5k. They also don't specify on the website what documents it comes with, or what standard it itself is measured against. It is also my understanding that the NIST reference standard is actually manufactured by Stouffer, but measured by NIST.
The densitometer standard (ISO 5-3:2009, formerly known as ANSI PH 2.18) actually specifies spectral conditions for density measurements. Some densitometers follow them, some don't, and some are vague about it. If you measure materials with different spectral properties, or with different light sources, variations are absolutely possible. The effect of this should be minimal with normal B&W film, but could be quite pronounced with color film (even on a gray target). It may also lead to a big difference between under-the-enlarger measurements and measurements from densitometers that provide their own light source.
The developer of the enLARGE iOS app goes to great lengths on his website trying to explain why the relationship between enlarger height and print density cannot be explained by the inverse square law alone. I have no idea if his explanations make any sense, but back when he was bragging about the app on Facebook I did a lot of my own experiments and confirmed that the relationship is more complex than a simple calculation.
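For anyone wondering what the "simple calculation" looks like, here is a minimal sketch of the naive inverse-square prediction. To be clear, this is only the textbook baseline that real measurements deviate from, not the enLARGE app's model and not something I'd rely on. The function names are made up for illustration, and it assumes height is measured from the light source to the paper.

```python
import math

def inverse_square_time(base_time_s, old_height, new_height):
    """Naive exposure time after a height change, using the inverse square law alone."""
    return base_time_s * (new_height / old_height) ** 2

def inverse_square_stops(old_height, new_height):
    """Naive exposure change in stops (positive means more exposure is needed)."""
    return 2.0 * math.log2(new_height / old_height)

# Doubling the height predicts four times the exposure time, i.e. two stops.
print(inverse_square_time(10.0, 30.0, 60.0))  # 40.0
print(inverse_square_stops(30.0, 60.0))       # 2.0
```

In practice, expect measured values to drift away from these numbers, which is the whole point of the discussion above.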
Finally, if you want to compare the performance of two different densitometers, you absolutely have to calibrate them against the same reference strip. Otherwise, you're not comparing the densitometers, you're just comparing the reference materials you have paired with them.
I should also point out that when you measure a reference strip, you need to measure it in the correct orientation. This orientation will depend on the specific densitometer you are using, but is typically whatever has the emulsion side facing the sensor. On the X-Rite 8xx series, this is typically emulsion-down. On most others, this is typically emulsion-up.
Okay, now below a break for some visual separation...
When measuring a transmission step wedge, it is important to always measure from the same spot. One thing Stouffer does not do is tell you which spot on their wedge the calibration values they provide were measured from. If you measure different points along the same patch, especially for the darker patches, you absolutely will see a difference. When I use mine, I always aim for dead center, and only dead center.
Now, if you want to know where the Printalyzer Densitometer reference strips come from, it's all ultimately traceable back to Stouffer. But for the specific process, it works something like this:
First, I calibrate a reference unit to the best of my ability. This begins with something I call "slope calibration", which is a process that uses a calibrated Stouffer T2120C (21-step wedge) to correct for any linearity issues with the sensor. The way the data is processed, minor step-to-step variations shouldn't cause a problem here. The result is basically coefficients for a correction polynomial.
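To give a rough idea of what "coefficients for a correction polynomial" can mean, here is a minimal sketch of a least-squares polynomial fit that maps raw readings onto the reference wedge values. This is a generic illustration, not the actual calibration code. The polynomial degree, the function names, and all of the numbers are assumptions, and the data is synthetic rather than taken from a real wedge.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y for a Vandermonde matrix A.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Solve with Gaussian elimination and partial pivoting.
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = m[i][n] - sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = acc / m[i][i]
    return coeffs

def apply_correction(coeffs, raw_density):
    """Evaluate the correction polynomial at a raw density reading."""
    return sum(c * raw_density ** i for i, c in enumerate(coeffs))

# Synthetic 21-point data set: "raw" stands in for what the sensor reports,
# "reference" for the calibrated wedge values. Both are invented.
raw = [0.05 + 0.15 * i for i in range(21)]
reference = [0.01 + 1.02 * x + 0.005 * x ** 2 for x in raw]

coeffs = fit_polynomial(raw, reference, degree=2)
print([round(c, 4) for c in coeffs])
print(round(apply_correction(coeffs, 1.0), 4))
```

Because a low-order fit runs through all of the steps at once, a small error on any single step gets averaged out instead of being baked into the correction.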
After this, I perform normal calibration on the unit. That involves measuring an open aperture, and then measuring dead-center on patch 4 of a calibrated Stouffer T5100C (5-step wedge).
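The open-aperture-plus-one-patch step can be thought of as a two-point calibration. Here is a minimal sketch of that idea, assuming density is computed as the log ratio of the open-aperture reading to the sample reading, and that a single gain factor is used to make the calibration patch read its known value. That model, the function names, the sensor counts, and the 3.0 patch density are all placeholders for illustration, not actual device or Stouffer values.

```python
import math

def raw_density(open_reading, sample_reading):
    """Transmission density relative to the open-aperture reading."""
    return math.log10(open_reading / sample_reading)

def make_calibrated_meter(open_reading, patch_reading, patch_known_density):
    """Return a function that maps sensor readings to calibrated density."""
    # Gain that forces the calibration patch to read its known density.
    gain = patch_known_density / raw_density(open_reading, patch_reading)

    def measure(sample_reading):
        return gain * raw_density(open_reading, sample_reading)

    return measure

# Made-up sensor counts and a placeholder patch density.
measure = make_calibrated_meter(
    open_reading=50000.0, patch_reading=52.0, patch_known_density=3.0
)
print(round(measure(50000.0), 3))  # open aperture reads 0.0
print(round(measure(52.0), 3))     # calibration patch reads 3.0
```

The open-aperture reading pins down zero and the patch pins down the high end, which is why measuring the patch from the same spot every time matters so much.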
Second, I take a fresh unlabeled Stouffer T5100 transmission wedge, place it in a holder that covers up where the label would go, and use the reference unit to measure the center of the remaining exposed area of each step. I then log all the data, print a custom label, and apply it.
Phew... I hope that was enough to help add something to the conversation. I'll continue to monitor this thread and may chime in again as necessary.