CodeRunner Tutorial
This tutorial is designed to take you from knowing nothing about CodeRunner to being able to use all of the standard CodeRunner commands. It takes about 20 minutes. Once you have completed this tutorial, you will have seen a large part of what CodeRunner can do, admittedly in a very simple test case. Before starting, you need to install CodeRunner: see [Installing CodeRunner](Installing CodeRunner "wikilink").
For quick reference, you can look up commands and options in the CodeRunner manual:
$ coderunner man | less
However, the manual is a very terse summary. This tutorial should be much easier to follow.
All of these examples work as written, because they form part of the CodeRunner test suite. To start with, go to an empty folder.
First up, we need to compile the example program that we are going to use for this tutorial. This is a ridiculously trivial program written in C++. All it does is calculate the volume of a cube (strictly speaking a cuboid), given the three sides. It will also do a couple of other random things, like sleep for a given period.
$ coderunner gencc # Generates the cubecalc source code file
$ g++ cubecalc.cc -o cubecalc # Replace g++ by whatever your C++ compiler is.
So we now have our trivial program. Next we need a root folder. The root folder is a very important concept in CodeRunner. All runs inside the root folder should come from one simulation code, and each run has a unique id. In general, you can't copy runs from one root folder into another root folder. We're going to call this folder 'simulations', but it can be called anything.
$ mkdir simulations
$ cd simulations
Now we need to set up the root folder. We do this by issuing the status command, and at the same time telling CodeRunner what code we are using in this folder, where the executable for this code is, and (in this case) which modlet to use. The code should correspond to a code module to be found in the folder code_modules in the CodeRunner source directory. If you look inside code_modules, you will see a folder called 'cubecalc', containing a file called 'cubecalc.rb', which is the module that allows CodeRunner to run and analyse our example code. Modlets are a feature of CodeRunner which allow a given code module to be further customised in different circumstances. They are a more advanced topic, which we will address later. So for now, we use the modlet called 'empty', which does nothing.
$ coderunner st -C cubecalc -m empty -X ../cubecalc
Now have a look at what is inside the folder.
$ ls -a
Have a look at the file .code_runner_script_defaults.rb
$ more .code_runner_script_defaults.rb
We can see that all the properties we've just specified have been stored. There will be no need to specify the C, m and X flags again unless we want to change any of these properties.
We are now ready to submit a run. First of all test submission, using the T flag.
$ coderunner sub -T
Look inside the root folder.
$ ls
There is now a file called cubecalc_defaults.rb. This is a file which stores some default input parameters for our code. In the case of cubecalc, we have four input parameters: height, width, depth and calculate_sides, a flag which will be explained later.
Also look at the folder v:
$ ls v
$ ls v/id_1
CodeRunner has created a new folder for our first simulation, labelled by the ID. In this folder there is an input file for the simulation, called edges.txt.
$ more v/id_1/edges.txt
In it, we have the three values for height, width and depth which appeared in the defaults file.
Now let's actually submit a simulation.
$ coderunner submit -p '{width: 3.0, height: 8.0}' -n 1 -W 30
Here, we have overridden two of the default values, width and height, using the p flag. We also specified 1 processor using the -n flag. In this case, we had no choice about the number of processors because cubecalc is not parallelised. We also specified a maximum wall clock time of 30 minutes, which is completely irrelevant because cubecalc takes a fraction of a second to run.
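The argument of the p flag is written like a Ruby hash, so one way to picture the override is as a hash merge. The following is only a sketch of that idea, not CodeRunner's actual implementation: the method name `apply_overrides` is hypothetical, and the default values shown are placeholders rather than the real contents of cubecalc_defaults.rb.

```ruby
# Hypothetical defaults: placeholder values, not the real cubecalc_defaults.rb.
PLACEHOLDER_DEFAULTS = { width: 1.0, height: 1.0, depth: 1.0, calculate_sides: 0 }.freeze

# Hypothetical helper: values given with -p win, everything else keeps its default.
def apply_overrides(defaults, overrides)
  defaults.merge(overrides)
end

p apply_overrides(PLACEHOLDER_DEFAULTS, { width: 3.0, height: 8.0 })
```

Any parameter not mentioned in the override hash (here depth and calculate_sides) simply keeps its default value.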
Let us see what happened:
$ coderunner st
1:-1 v_width_3.0_height_8.0_id_1 Complete 24.000000 []
CodeRunner will print out one status line for every job that has been run. The exact form of the status line is determined by the code module (look in cubecalc.rb if you are interested).
What this tells us is:
- The id of the job is 1.
- CodeRunner was unable to determine the pid or job number of the job because it ran for such a short time, so it gave it the value -1.
- The run name of the job is v_width_3.0_height_8.0_id_1. This is a name that CodeRunner automatically constructs to make it easy to identify different simulations.
- The status of the simulation is 'Complete'.
- The result of the simulation, in this case the volume of the cube, is 24, i.e. 1.0 x 3.0 x 8.0.
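To see how a run name of this shape could be assembled from the overridden parameters and the id, here is a minimal sketch. It is an illustration only: the helper `build_run_name` and the assumption that the prefix is always 'v' are hypothetical, and the real naming logic lives inside CodeRunner.

```ruby
# Hypothetical helper: joins a prefix, each overridden parameter and the id.
def build_run_name(overrides, id, prefix = "v")
  parameter_parts = overrides.map { |name, value| "#{name}_#{value}" }
  ([prefix] + parameter_parts + ["id_#{id}"]).join("_")
end

puts build_run_name({ width: 3.0, height: 8.0 }, 1)
# prints "v_width_3.0_height_8.0_id_1"
```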
Let's submit a few more runs.
$ coderunner sub -p '{width: 3.0, height: 6.0}'
$ coderunner sub -p '{width: 3.0, height: 9.0}'
$ coderunner sub -p '{width: 12.0, height: 9.0}'
$ coderunner sub -p '{width: 5.0, height: 6.0}'
Look at the results:
$ coderunner st
Imagine we had not just five simulations, but 50 or 100. If CodeRunner had to look at them all every time you ran it, it would get very slow. To stop this happening, CodeRunner keeps a cache of results. You can tell it to use that cache instead of looking at the simulation data again using the U flag.
$ coderunner st -U
If you use the u flag instead of the U flag, CodeRunner will still use the cache but it will recheck any simulations that are not complete. In this case, all of our simulations are complete, so we just use the U flag.
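The difference between the two flags can be pictured as a choice of which runs get re-examined on disk. The sketch below is a hypothetical illustration of that decision only; the method name, the status symbols and the data structure are assumptions, not CodeRunner internals.

```ruby
# Hypothetical illustration: which run ids would be re-read from disk?
def runs_to_recheck(statuses, flag = nil)
  case flag
  when :U then []                 # trust the cache for every run
  when :u then statuses.reject { |_id, status| status == :Complete }.keys
  else statuses.keys              # no flag: look at every run again
  end
end

statuses = { 1 => :Complete, 2 => :Complete, 3 => :Incomplete }
p runs_to_recheck(statuses, :U)
p runs_to_recheck(statuses, :u)
p runs_to_recheck(statuses)
```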
You can order the way the run status information is printed out:
$ coderunner st -O 'width;height'
This will order the runs by width and then by height.
$ coderunner st -O '-volume'
This will order the runs by volume in descending order.
If you only want the information from certain runs, you can use the j and f flags.
$ coderunner st -j 4,3
This looks only at the runs with ids 3 and 4.
$ coderunner st -f 'width==5.0 and height==6.0'
This looks only at the runs with width equal to 5 and height equal to 6.
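For readers who think in Ruby, the ordering and filtering above correspond roughly to `sort_by` and `select` over the list of runs. This is a sketch using a hypothetical `TutorialRun` struct filled with the five runs submitted so far (depth 1.0, as implied by the first result); it does not show how CodeRunner stores runs internally.

```ruby
# Hypothetical stand-in for a run object.
TutorialRun = Struct.new(:id, :width, :height, :depth) do
  def volume
    width * height * depth
  end
end

TUTORIAL_RUNS = [
  TutorialRun.new(1, 3.0, 8.0, 1.0),
  TutorialRun.new(2, 3.0, 6.0, 1.0),
  TutorialRun.new(3, 3.0, 9.0, 1.0),
  TutorialRun.new(4, 12.0, 9.0, 1.0),
  TutorialRun.new(5, 5.0, 6.0, 1.0)
].freeze

# -O 'width;height': order by width, then by height.
by_width_then_height = TUTORIAL_RUNS.sort_by { |run| [run.width, run.height] }
# -O '-volume': order by volume, largest first.
by_volume_descending = TUTORIAL_RUNS.sort_by { |run| -run.volume }
# -j 4,3: keep only the runs with these ids.
only_ids_3_and_4 = TUTORIAL_RUNS.select { |run| [4, 3].include?(run.id) }
# -f 'width==5.0 and height==6.0': keep only the runs matching the condition.
width_5_height_6 = TUTORIAL_RUNS.select { |run| run.width == 5.0 && run.height == 6.0 }

p by_width_then_height.map(&:id)
p by_volume_descending.map(&:id)
p only_ids_3_and_4.map(&:id)
p width_5_height_6.map(&:id)
```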
It is possible to customise the local defaults file, and this will be reflected in the input parameters of all successive runs that are submitted. Open up the defaults file in an editor:
$ emacs cubecalc_defaults.rb
And change the value of calculate_sides from 0 to 1 and save the file. Now submit two more runs:
$ coderunner sub -p '{width: 34, depth: 3.4}'
$ coderunner sub -p '{width: 19, depth: 3.4}'
Now look at the result:
$ coderunner st
Now that we have changed that flag, our useful little program cubecalc has not only calculated the volume of the cube, but it has also calculated the area of the three distinct sides of the cube.
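As a reminder of the arithmetic involved, a cuboid has three distinct faces, one for each pair of edges. The sketch below shows the calculation in Ruby; the method names are hypothetical, and the order in which cubecalc itself reports the three areas is not specified in this tutorial.

```ruby
# Volume of a cuboid from its three edge lengths.
def cuboid_volume(width, height, depth)
  width * height * depth
end

# Areas of the three distinct faces (the ordering here is an arbitrary choice).
def cuboid_side_areas(width, height, depth)
  [width * height, height * depth, width * depth]
end

p cuboid_volume(3.0, 8.0, 1.0)
p cuboid_side_areas(3.0, 8.0, 1.0)
```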
Suppose we submitted a run which we no longer want.
$ coderunner sub -p '{width: 1.3e6, depth: 3.4}'
Having that huge volume is skewing the rest of our results. So let's delete the run.
Either:
$ coderunner del -f 'volume > 1e6'
Or:
$ coderunner del -j 8
will do the trick.
When using CodeRunner to plot graphs, always bear in mind the fundamental distinction between a general graphkit, which combines results from several runs, and is specified with the G flag, and a run graphkit, which is a graph defined in the code module, and represents data from a single run, and is specified with the g flag.
$ coderunner plot -O volume -G "width*height*depth:volume" -U
means plot a graph of volume versus width*height*depth (which we sincerely hope is a straight line), ordering the results by volume. The graph will be displayed in an X11 window using gnuplot.
$ coderunner wg graph1.ps -U -G 'width:height:depth:volume;;volume < 100;height'
means plot a 4D graph where the axes are width, height and depth, and volume is represented as a colour along the line, but write the graph to a PostScript file called graph1.ps instead of displaying it. Note that this command is the same as
$ coderunner wg graph1.ps -U -G 'width:height:depth:volume' -f 'volume < 100' -O height
In the first command we included the filter and the sort in the definition of the graphkit. This is useful when you want to plot two graphs with different filters and sorts.
$ coderunner plot -U -G "width*height*depth:volume;;;volume" -G "2.0*width*height*depth:volume;;volume<50;height"
Here we have plotted two graphs on the same plot with different ordering and filtering conditions.
Let us examine the structure of the string used to define the graph, which is known as graphkit shorthand. It is divided into four sections separated by semicolons:
'`<definition>` ; `<options>` ; `<filter>` ; `<sort>`'
For a general graphkit, the definition is a number of axes separated by colons.
`<definition>` = `<axis1>` : `<axis2>` : `<axis3>` : `<axis4>`
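The structure of the shorthand can be made concrete with a small parser. This is a sketch under stated assumptions, not CodeRunner's own parser: the method name `parse_graphkit_shorthand` is hypothetical, it assumes that semicolons never appear inside a section, and it treats only single colons as axis separators so that a Ruby double colon (as in a constant lookup) stays inside an axis expression.

```ruby
# Hypothetical parser for '<definition>;<options>;<filter>;<sort>'.
def parse_graphkit_shorthand(shorthand)
  # The -1 limit keeps empty trailing sections instead of dropping them.
  definition, options, filter, sort = shorthand.split(";", -1).map(&:strip)
  {
    # Split on a colon that is neither preceded nor followed by another colon.
    axes: definition.to_s.split(/(?<!:):(?!:)/).map(&:strip),
    options: options.to_s,
    filter: filter.to_s,
    sort: sort.to_s
  }
end

p parse_graphkit_shorthand("width:height:depth:volume;;volume < 100;height")
p parse_graphkit_shorthand("width*height*depth:volume;;;volume")
```

Sections that are left empty, such as the options in both examples, come back as empty strings.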
The definition of the axis can be any mathematical expression which makes sense:
2.0 * Math.sqrt(height) + width / Math::PI
is a valid thing to plot, although probably not very useful! Try to plot this quantity yourself using the plot command.
You don't have to plot just physical results. Suppose you want to plot the volume versus id:
$ coderunner plot -G 'id:volume;;;volume'
is perfectly valid.
For those who know a bit about Ruby, the expression defining an axis is evaluated in the context of each run object, using instance_eval.
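The following sketch shows what evaluating an axis expression with instance_eval looks like in plain Ruby. The `CubeRun` struct and the `axis_value` helper are hypothetical stand-ins for a real run object; only the use of instance_eval comes from the text above.

```ruby
# Hypothetical stand-in for a run object with the tutorial's parameters.
CubeRun = Struct.new(:id, :width, :height, :depth) do
  def volume
    width * height * depth
  end
end

# Evaluate the expression string with the run as self, so that bare names
# such as width or volume resolve to the run's own methods.
def axis_value(run, expression)
  run.instance_eval(expression)
end

run = CubeRun.new(4, 12.0, 9.0, 1.0)
puts axis_value(run, "width*height*depth")
puts axis_value(run, "2.0 * Math.sqrt(height) + width / Math::PI")
```

Because the string is evaluated as ordinary Ruby, anything the run object responds to (including id) can appear in an axis expression.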
The string for run graphkits is the same except that `<definition>` is replaced by `<name>`, where the name refers to a named graph defined in the code module, not in CodeRunner.
$ coderunner plot -g 'sides;;[6,7].include? id'
This achieves the not very useful feat of plotting the areas of the different sides of the cube for those two runs. The graph 'sides' is defined in cubecalc.rb (have a look).
If the standard CodeRunner commands available are not enough to do everything that you need, you can either write a script, which is beyond the scope of this tutorial, or you can pass CodeRunner little fragments of code to evaluate dynamically. You can do this in three different ways.
The first way, the rc command, is probably the most useful. It causes a piece of code to be evaluated by every filtered run.
$ coderunner rc 'puts %[Hello I am run #{run_name}, my id is #{id} and half the volume I calculated was #{volume/2.0}\n]; puts' -U -f "id > 2"
In the case of our simple code, its use is not immediately apparent, but where complicated analysis has to be done, it can be very useful to run selected bits of analysis.
The second way, the cc command, evaluates the code you pass it in the context of the run class, the special class that is defined in the code module to analyse the results.
$ coderunner cc 'p rcp.variables, rcp.results, rcp.run_info' -U
rcp stands for run class property.
This command gains power when the code module defines its own custom commands.
The third way, the ev command, evaluates the code you pass it in the context of the runner.
$ coderunner ev ' p "submit options are", SUBMIT_OPTIONS, "my root folder is", @root_folder'
If you are doing your simulations on a remote system, CodeRunner will need to be installed on that system as well. You then have two choices about how to use CodeRunner. You can log into that system using ssh and then run CodeRunner exactly the same way as you would on a local system.
However, there is another much more exciting way of doing this. You can run CodeRunner on your local machine which will then use ssh to connect to the remote machine and tunnel data back to you. The way to do this is very simple, using the Y flag, which tells CodeRunner to run in a folder different from the current one.
You can in fact use the Y flag on your local system as well:
$ cd ..
$ coderunner st -Y simulations
$ cd simulations
To use it for remote systems, all you need to do is put a username and host in front of the folder, exactly as you would for a program like scp. In this case, we will use localhost, i.e. your own computer as an example. This won't work unless you have enabled ssh access to your computer.
$ coderunner st -Y username@localhost:path/to/simulations
where username is your username and path/to/simulations is the path of the folder we have been working in. (You can replace it by $PWD in this instance).
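The scp-like form of the argument can be told apart from a plain local folder by looking for a user and host in front of a colon. The sketch below illustrates that distinction only; the helper `parse_folder_spec` and its regular expression are hypothetical and say nothing about how CodeRunner itself parses the Y flag.

```ruby
# Hypothetical helper: split 'user@host:path' into its parts, or treat the
# whole string as a local path when no user and host are present.
def parse_folder_spec(spec)
  match = spec.match(/\A(?<user>[^@:\/]+)@(?<host>[^:\/]+):(?<path>.*)\z/)
  if match
    { remote: true, user: match[:user], host: match[:host], path: match[:path] }
  else
    { remote: false, path: spec }
  end
end

p parse_folder_spec("username@localhost:path/to/simulations")
p parse_folder_spec("simulations")
```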
When is it useful? If you log into the remote system and run CodeRunner there, you may find, for example when plotting graphs, that it runs slowly, because it takes time to send the X11 data back down the ssh tunnel. Running CodeRunner locally with the Y flag will speed this up. It is most useful when writing scripts which access data from many systems, but this is beyond the scope of this tutorial.
For the rest of this tutorial we will need a simulated batch launcher, so that jobs can be submitted as if they were on a supercomputer. Run these two commands:
$ export CODE_RUNNER_LAUNCHER=tutorial # For csh use setenv
$ coderunner start_launcher 2 10 &
When you want to shut down the launcher at the end of the tutorial, bring it into the foreground:
$ fg
and then press Ctrl-C to terminate it. You also need to delete the environment variable:
$ unset CODE_RUNNER_LAUNCHER
Up until this point, it has not mattered that our little example program cubecalc completes the instant it has started. However, on a real system, simulations will of course first wait in the queue and then run for a long time. The methods covered in the next sections of the tutorial require this behaviour, so we are going to get cubecalc to simulate it by looping for a given number of seconds before exiting. This behaviour is already written into the C++ source code for cubecalc, but it is not written into cubecalc.rb, the CodeRunner module. So we are now going to customise cubecalc.rb by using a modlet. cubecalc is of course a toy problem, and in the real world the developers of CodeRunner have not yet found a situation so complicated that modlets are required, but we will demonstrate how to use them anyway!
Let's have a look at the current input variables that the CodeRunner module knows about:
$ coderunner cc 'p rcp.variables' -U
Now let's tell it to use a modlet called sleep.
$ coderunner st -m sleep
Now run the previous command again:
$ coderunner cc 'p rcp.variables' -U
CodeRunner now knows about a new variable, called sleep_time, because we are using a modlet which has changed the properties of the run class. (Have a look at code_modules/cubecalc/my_modlets/sleep.rb)
A defaults file can be specified using the -D flag. At the beginning we did not specify one, so CodeRunner copied the one called 'cubecalc_defaults.rb' from the code module folder, which is the default defaults file. The trouble with this defaults file is that it does not provide a default value for sleep_time, the new variable in the last section. But if you look in code_modules/cubecalc/defaults_files/, there is a different defaults file called 'sleep_defaults.rb'. So let's tell CodeRunner to use that one.
$ coderunner sub -D sleep -T
$ ls
Now there is a copy of sleep_defaults.rb in the local folder as well as cubecalc_defaults.rb.
Now let's actually submit a run.
$ coderunner sub
$ coderunner st
The first thing to notice is that CodeRunner correctly detected the pid of the job (or the job number on an HPC system) because the job did not exit instantaneously. Now suppose we have a job that is running and we want to continually check its status. This is done using the -l flag. First we submit a job that will sleep for 15 seconds before exiting:
$ coderunner sub -p '{sleep_time: 15}'
And then we check on the status:
$ coderunner sl -u
After 15 seconds the status of the job changes from incomplete to complete. To stop the live update, press Ctrl+C.
$ coderunner sub -p '{sleep_time: 1e7}'
This job would last for nearly four months (1e7 seconds is roughly 116 days)! Type
$ coderunner st -u
to find out its id (which will be 11), then cancel it:
$ coderunner can 11
As you have no doubt realised, it takes a non-negligible amount of time to load up CodeRunner. The best way to use CodeRunner is actually in interactive mode. Start interactive mode using the command:
$ coderunner im -u
The -u flag here sets the option for the rest of the interactive session (you don't need to specify it for each command). Try a command:
>> st
It works just as before. All the commands you have used in this tutorial can be used in interactive mode. There is an easy way to convert between commandline commands and interactive mode commands. For example:
$ coderunner wg graph1.ps -U -G 'width:height:depth:volume' -f 'volume < 100' -O height
becomes:
>> wg 'graph1.ps', G: 'width:height:depth:volume', f: 'volume < 100', O: 'height'
You can see that commas have been inserted between all the arguments and options, that the preceding - has been replaced by a trailing colon, and that all the arguments and options (not the command) have been encased in quotes.
Try running all the commands again in interactive mode. Make sure the batch launcher is still running. Switches like the u flag require Boolean arguments:
>> st u: false
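The conversion rules just described are mechanical enough to write down as a small function. This is a sketch for illustration only: `to_interactive` is a hypothetical helper, not part of CodeRunner, and it assumes that the command-line arguments and options have already been separated out.

```ruby
# Hypothetical helper: build an interactive-mode command string from a
# command name, its positional arguments and a hash of options.
def to_interactive(command, arguments = [], options = {})
  quoted_arguments = arguments.map { |argument| "'#{argument}'" }
  formatted_options = options.map do |flag, value|
    # Boolean switches are written bare; everything else is quoted.
    boolean = value == true || value == false
    boolean ? "#{flag}: #{value}" : "#{flag}: '#{value}'"
  end
  "#{command} #{(quoted_arguments + formatted_options).join(', ')}".strip
end

puts to_interactive("wg", ["graph1.ps"],
                    { G: "width:height:depth:volume", f: "volume < 100", O: "height" })
puts to_interactive("st", [], { u: false })
puts to_interactive("st")
```

The first call reproduces the wg example above, and the second reproduces the Boolean switch example.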
Suppose you want to submit a large number of runs. You can automate this using a shell script, or you could automate it using a Ruby script, but CodeRunner provides two easy ways of doing this which have certain advantages, not least that they will run in parallel.