Monday 30 September 2013

5 minute bash tutorial - how to win the lottery!

Apologies for the slightly misleading title. To expand...

In my office, a number of us are involved in analysis of large datasets, often sequence or text files. Each of us has our own little habits and idiosyncrasies; given the same task, we each find different ways to go about it.

A running joke with one of my friends is that I'll persevere with getting a long, bafflingly piped bash command to work long after she would have resorted to scripting something in a higher-level language.

(In all fairness we both do both, but I do find something deeply satisfying about chaining everything together in one line, no matter how human-unreadable it may be.)

Last Friday, this topic came up during a conversation about the upcoming Euromillions lottery. As a bit of a joke, I whipped up a quick line of bash to generate compatible random numbers (NB - I actually wrote it wrong in the tweet!).

Today I realised that if I explained the steps in the process, it might help someone new to bash understand how best to get some more functionality out of the command line.

So, here it is (with the error removed):
 for i in {1..50}; do echo $i >> win1; done; shuf win1 | head -5; for i in {1..11}; do echo $i >> win2; done; shuf win2 | head -2; rm win*  

This should produce output that fits the Euromillions format: five numbers between one and fifty, followed by two 'star' numbers between one and eleven, e.g.:
22
4
2
15
6
7
2

Let's break this down.

 for i in {1..50};  

This sets up a for loop, which will iterate through every integer in the range 1 to 50. For loops in bash take the format:

 for VARIABLE in THING  
 do   
 SOMETHING  
 SOMETHING_ELSE  
 done  

Here, the VARIABLE is the thing that changes each iteration, and THING is the range or list of items. SOMETHING and SOMETHING_ELSE are then commands you want to run within the loop, which have to be sandwiched between 'do' and 'done'. More bash loop examples/explanation here.
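
As a minimal sketch of that layout (the names 'n' and 'total' are made up for illustration, and this assumes bash rather than plain sh, since the {1..3} range is a bash feature), here is a loop with two commands between 'do' and 'done':

```shell
# VARIABLE is n, THING is the range {1..3}
total=0
for n in {1..3}
do
    echo "Adding $n"          # SOMETHING
    total=$((total + n))      # SOMETHING_ELSE
done
echo "Total: $total"
```

Both commands run once per turn of the loop, so 'total' ends up holding the sum of the three numbers.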

However, all those line breaks take up far too much time (and make it far too comprehensible) - I tend to replace them with semi-colons, which let you string multiple commands together into one-liners.
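
To show the two layouts really are interchangeable, here's a small sketch that runs the same loop both ways and compares what each one prints ('long_form' and 'short_form' are just names I've picked for this example):

```shell
# The loop spread over several lines...
long_form=$(
for i in {1..3}
do
    echo $i
done
)

# ...and the same loop on one line. Note there is no semi-colon after 'do'.
short_form=$(for i in {1..3}; do echo $i; done)

[ "$long_form" = "$short_form" ] && echo "identical output"
```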

So, our one-liner first loops from 1 to 50, then what?

 do echo $i >> win1; done;  

Here we have the contents of that first loop, sandwiched between 'do' and 'done' as per requirements, so that bash knows what to do at each turn of that loop.

Echo is the unix equivalent of 'print'; it just outputs whatever you give it, either to the screen or somewhere else.

In this case, we echo $i, where i denotes the variable we set up in the loop; we just need the dollar sign to let bash know we're asking for the contents of a variable. So, in the first cycle of the loop $i = 1, then in the next $i = 2, all the way up to $i = 50, which is when the loop stops.

Now if we just had 'do echo $i' in our loop, we'd just get the numbers 1 to 50 printed out in the terminal. However, we've used '>>' to redirect the numbers away from the terminal, instead putting them in a new file called 'win1'.

By using '>>' instead of '>', each cycle of the loop appends the new number to the end of the file win1 - if we used just one '>', each iteration of the loop would overwrite the whole file, leaving win1 containing just the number 50 at the end.
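
If you want to see the difference for yourself, here's a quick sketch using a shorter range. The file names 'appended' and 'overwritten' are invented for the example, and everything happens in a throwaway directory made with mktemp so it can't clobber anything you care about:

```shell
# Work in a scratch directory so nothing in the current one is touched
scratch=$(mktemp -d)

# '>>' appends on every turn of the loop
for i in {1..3}; do echo $i >> "$scratch/appended"; done

# '>' starts the file afresh on every turn of the loop
for i in {1..3}; do echo $i > "$scratch/overwritten"; done

cat "$scratch/appended"      # all three numbers, one per line
cat "$scratch/overwritten"   # only the last number written
```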

So now we have the file win1 containing the numbers 1-50, each on its own line.

 shuf win1 | head -5;  

This is the bit of the code that picks the numbers.

Shuf simply shuffles all of the lines in a file into a random(ish) order. It's an incredibly useful command, and I kicked myself for having taken so long to find out about it. In this instance, it mixes up all the lines in win1, so they will no longer be in numerical order.

Here's where we first encounter pipes - the tall vertical bar ('|') between the shuf command and the next one is said to 'pipe' (or carry) the output of the first command into the second.

We take the randomly shuffled win1 file, and pipe that into head (one of the commands you'll probably end up using most!) to take off the top five numbers, giving you your first five lottery numbers.
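
Here's that step on its own as a rough sketch, again in a scratch directory. The 'picks' variable and the extra sort at the end are my additions for illustration, not part of the original one-liner; they just show that you can keep piping output onwards:

```shell
scratch=$(mktemp -d)
for i in {1..50}; do echo $i >> "$scratch/win1"; done

# shuf alone would print all 50 lines in a random order;
# piping into head keeps only the first five of them
picks=$(shuf "$scratch/win1" | head -5)
echo "$picks"

# Pipes chain further: sort -n puts the same five picks in numerical order
echo "$picks" | sort -n
```

Because each number appears in the file only once, the five picks can never contain a repeat.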

The next bit just repeats the same methodology for the lucky star balls (two numbers between one and eleven), outputting them after the previous five:

 for i in {1..11}; do echo $i >> win2; done; shuf win2 | head -2;

Finally, a bit of tidying up:

 rm win*  

Rm removes files - here I've used the '*' wildcard character to make it remove all files which begin 'win', thus removing the win1 and win2 files created earlier. This is also vital if you're going to run the script more than once, as otherwise you might end up drawing the same number twice (as re-running the loop would add yet more numbers to the files).
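
If the leftover-file problem bothers you, there is a different route that skips the temporary files entirely. This is not how the one-liner above works, and it assumes the GNU coreutils version of shuf, which can generate a range of integers itself with '-i' and limit the number of lines it prints with '-n' (the variable names 'main' and 'stars' are mine):

```shell
# -i LO-HI shuffles the integers in that range; -n COUNT prints only that many
main=$(shuf -i 1-50 -n 5)
stars=$(shuf -i 1-11 -n 2)

echo "$main"
echo "$stars"
```

With no win1 or win2 files to build up between runs, there is nothing to tidy away afterwards.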

Note that rm can be dangerous - by default, it won't double check if you're sure you want to delete something, and there's basically no going back, so be careful! For instance, by not explicitly stating the files I wanted to remove, this line would also remove any other files which fit the pattern - e.g. if I had a file called 'winning_lottery_numbers.txt' in the same directory, it would have been deleted too!

This can be overcome by being explicit in the files you're removing, like so:

 rm win1 win2   

However, if you have more than a few files, you're probably going to want to use wildcards - just use them sensibly.
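
One way of being sensible, sketched below in a scratch directory with dummy files, is to hand the pattern to echo first so you can see exactly what it matches before giving it to rm. The tighter 'win[0-9]' pattern (meaning 'win' followed by exactly one digit) is my suggestion here, not something from the original one-liner:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch win1 win2 winning_lottery_numbers.txt

echo win*        # preview: this pattern matches all three files
echo win[0-9]    # tighter pattern: matches only win1 and win2

rm win[0-9]
ls               # winning_lottery_numbers.txt is still there
```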

There you have it - a silly bit of code, but hopefully a useful one to a bash novice!

In case you're wondering, I did a small bit of rigorous, carefully controlled science with my script - I bought one ticket using numbers generated with my code, and one using the lottery's own 'lucky dip'. My code won nothing, but I got £3 from the lucky dip! Pretty conclusive, we can all agree.
