Thursday, May 5, 2005

Unapologetic geekiness: Picking a random line from a file

My first definitively geeky post!

I just learned a cool method for selecting a random line from a file, without knowing ahead of time the number of lines in the file. Well, the method is sort of cool. The fact that it actually works is really cool. Here's the Perl code, courtesy of the Perl Cookbook:
srand;
rand($.) < 1 && ($line = $_) while <>;
# $line is the random line

OK, so why is this cool? Well, as you pass each line, it's selected with probability 1/N, where N is not the number of lines in the file, but the number of lines you've read so far. So the first line is definitely selected, the next line is selected half the time, the third line is selected a third of the time, and so on. After you run out of lines, you print out the last line you selected. And it's equally likely to have been any of the lines in the file. It's not horribly difficult to work out the math for this (and if you're interested, the proof is also in the Cookbook), but it's just counterintuitive enough that it made me say "Wow."

No comments:

Post a Comment