There's almost no way you could make this a statistically meaningful study that couldn't be fundamentally challenged. I don't think a captive audience makes it more valid -- it just slightly alters the research question.
Think about what it would take to get even basic statistical validity. First, you would need a control. That means the same performer has to play mediocre music some of the time and virtuosic music the rest of the time. And to make sure the "control" and "experimental" audiences are comparable, he couldn't simply alternate by hour or by day, because different groups commute at different times. He would need either long stretches (weeks) of performing one way and then the other, or complete randomization over a long period.
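To make the randomization option concrete, here is a rough sketch of how such a schedule might be generated. It is only an illustration: it randomizes within each commute slot (a blocked variant, so every slot sees both kinds of playing equally often) rather than shuffling everything together, and the function name, slot names, start date, and four-week length are all made-up assumptions.

```python
import random
from collections import Counter
from datetime import date, timedelta

CONDITIONS = ("mediocre", "virtuosic")

def blocked_schedule(start, n_days, slots, seed=0):
    """Randomly assign a condition to every (day, slot) session.

    Hypothetical helper: within each slot, exactly half of the days get
    each condition, so rush-hour and off-peak crowds both hear both.
    """
    if n_days % 2:
        raise ValueError("n_days must be even so each slot is balanced")
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = {}
    for slot in slots:
        # Balanced list of labels for this slot, shuffled across days.
        labels = list(CONDITIONS) * (n_days // 2)
        rng.shuffle(labels)
        for offset, condition in enumerate(labels):
            schedule[(start + timedelta(days=offset), slot)] = condition
    return schedule

# Assumed example: four weeks, two sessions a day, arbitrary start date.
schedule = blocked_schedule(date(2024, 1, 1), n_days=28,
                            slots=("morning", "evening"))

# Count sessions per (slot, condition) to confirm the balance.
counts = Counter((slot, cond) for (_, slot), cond in schedule.items())
```

Even with a schedule like this, the same commuters may pass by on many different days, so the sessions are not truly independent samples.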
Then what is your outcome measure? A prospective study needs a primary outcome measure that is predetermined (and therefore tied to a specific study design). In this case, donations would be the most easily quantifiable one.
But is that a valid outcome measure? And are you really studying the general population when you look at donations? The only commonality shared by people on the subway is their chosen mode of transportation. But maybe people who like his kind of music are wealthier, or more generous, or more likely to take pity, or more likely to have change in their pocket. Maybe it's the opposite. But at any rate, by looking at donations, all you're doing is testing a subset of the general population that likes his kind of music. And because you can't presume that the remainder of the population would ever donate to that kind of musician, you wouldn't be able to draw conclusions about the general population.