Hmmmmm........ That is almost like asking how long a person's legs need to be.
Outside of doing some test Polaroids for each shot, I'd say the best way to go would be to use the longest exposure you can. The idea being that you'd be going well past the limit where people show up. Kind of like stopping all the way down to ensure an object is in focus.
That said, all the images I have seen that manage this trick are VERY long exposures. Like 10-40 minutes.
It also seems to me (and take this with a grain of salt, as I have never done this) that you need the right type of crowd for this to work.
The principle at work here is pretty much the same as dodging in the darkroom. The dodging wand moves around enough that you don't see it on the print.
This won't work though if the wand stays put or is used for too long. At some point it will leave a noticeable shadow.
This would seem to apply to crowds.
If one person stands still long enough, they will show up. Likewise, if there are too many people, you won't be able to get the proper exposure for the area they are in. And by this I mean a New York City sidewalk at Christmas type of crowd.
This is of course all relative to your exposure time. If you had an exposure time of say 10 hours, the image would be able to tolerate more "standers" and a larger crowd.
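To put rough numbers on that, one crude way to think about it is that someone who stays put contributes about their share of the total exposure time to the spot they cover. The sketch below is only that rule of thumb. It ignores film reciprocity and how bright the person is against the background, and the function name and the 2-minute "stander" are made up for illustration.

```python
def ghost_fraction(dwell_s, exposure_s):
    """Rough share of the total exposure contributed by someone standing still.

    dwell_s    -- seconds the person stays in one spot (assumed value)
    exposure_s -- total exposure time in seconds
    """
    if exposure_s <= 0:
        raise ValueError("exposure time must be positive")
    # A person cannot contribute more than the whole exposure.
    return min(dwell_s / exposure_s, 1.0)


two_minutes = 2 * 60
for label, exposure_s in [("40 minutes", 40 * 60), ("10 hours", 10 * 3600)]:
    share = ghost_fraction(two_minutes, exposure_s)
    print(f"{label}: a 2-minute stander is about {share:.1%} of the exposure")
```

By this reckoning the same stander is about 5% of a 40-minute exposure but only about 0.3% of a 10-hour one, which is why the longer time tolerates more of them.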
What you need to do is figure out the limits here and sort of predict the behavior of people. Not easy to do. Which is why I would suggest using the longer exposures when possible. I seem to recall seeing an image of the interior of York Minster taken with a LF camera. It is a gorgeous B/W image of I think the nave. Anyway, the photographer used something like a 40 minute exposure time. Very long. And people did walk through the picture. But the timing was long enough to "erase" them. Now I have been to York Minster several times. People tend to "meander" there. Sort of a slow stroll through the place. Similar to what they do when in a museum. So I'm guessing that 40 minutes will get rid of most passersby. Unless they are VERY determined to stay put.
Of course 40 minutes is a VERY long time and you would need some serious ND filtering for most situations. I am not sure if any ND filters were used in the image I described, although windows were visible in it, so I'd imagine that some filtering must have been done. Even if the image was taken in darkest winter in Yorkshire (which is VERY dark). I would expect that in most situations you'd need to do some sort of filtering.
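For a feel of how much ND that means: each stop of filtering doubles the exposure time, so the number of stops is the base-2 log of target time over metered time. Here is a minimal sketch of that arithmetic. The 1-second metered reading is just an assumed example, and it ignores reciprocity failure, which on film would change the real numbers at times this long.

```python
import math


def nd_stops_needed(metered_s, target_s):
    """Stops of ND needed to stretch a metered exposure to the target time."""
    if metered_s <= 0 or target_s <= 0:
        raise ValueError("exposure times must be positive")
    # Each stop doubles the time, hence log base 2 of the ratio.
    return math.log2(target_s / metered_s)


def filter_factor(metered_s, target_s):
    """Equivalent filter factor (how many times longer the exposure gets)."""
    if metered_s <= 0 or target_s <= 0:
        raise ValueError("exposure times must be positive")
    return target_s / metered_s


metered = 1.0        # assumed meter reading in seconds, fully stopped down
target = 40 * 60     # the 40-minute exposure mentioned above
print(f"stops of ND: {nd_stops_needed(metered, target):.1f}")
print(f"filter factor: {filter_factor(metered, target):.0f}x")
```

With those assumed numbers you would be looking at a bit over 11 stops, which is a lot of glass to stack.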
One thing you might want to try is pacing out the area to be shot ahead of time with the crowd. In other words figure out what will be in the image and walk with the crowd to approximate the average amount of time the average person is "in" the frame. You could do this by following someone in the viewfinder, but that might be difficult. Walk the scene, then double your time. Figure that is the longest average time someone will be in the frame. Then work from there. This might be helpful if you need to figure out the best time and don't have an ND 25X on hand.
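The pacing idea can be turned into a back-of-the-envelope exposure time. This sketch doubles the paced time as described above, then picks an exposure long enough that even that doubled time is only a small share of the whole. The 5% share is purely my guess at where a person stops registering, not a tested figure, and the function and parameter names are hypothetical.

```python
def exposure_from_pacing(paced_s, max_share=0.05, safety=2.0):
    """Estimate an exposure time from how long it takes to walk the frame.

    paced_s   -- seconds it took you to walk through the scene with the crowd
    max_share -- largest share of the exposure one person may occupy
                 (assumed threshold, adjust after test shots)
    safety    -- multiplier on the paced time; 2.0 is the "double your time" rule
    """
    if paced_s <= 0 or safety <= 0:
        raise ValueError("paced time and safety factor must be positive")
    if not 0 < max_share <= 1:
        raise ValueError("max_share must be between 0 and 1")
    longest_dwell_s = paced_s * safety
    return longest_dwell_s / max_share


paced = 30  # assumed: half a minute to cross the frame
exposure_s = exposure_from_pacing(paced)
print(f"suggested exposure: about {exposure_s / 60:.0f} minutes")
```

With a 30-second walk this comes out at about 20 minutes, which at least lands in the same range as the long exposures mentioned earlier. Treat it as a starting point for test shots, nothing more.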