How To Time Programs

It is quite simple to time the execution of a whole program: just run the program under the time command. That is, to find out how long your code named experiment1 takes on data set trial1, run
time -o outputfile experiment1 < trial1
(See the man page for the time command for details of the output format.) The report includes wall-clock time, system time, and user time; user time is the measure you want. System time is the time the operating system spends on behalf of your program -- paging in the program and data, handling its system calls and interrupts, and so on -- while wall-clock time is the total elapsed time, which includes user time, system time, and any time spent waiting while other programs run.
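
The exact layout of the report depends on your shell and on which version of time you are running; as a rough illustration (the numbers here are purely made up), a POSIX-style report looks like this:

real 12.42
user 11.87
sys 0.31

Here user is the figure to record; real is the wall-clock time and sys the system time.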

There is one problem with this approach, however: it cannot measure selected parts of a program, only the entire execution from start to finish. If you want to measure steady-state behavior after the data structure has been built up, for instance, you would have to resort to fairly desperate measures, such as timing a run that only builds the data structure, timing the entire test, and subtracting, or perhaps running tests so long in steady state that the buildup becomes negligible. But if you want to measure the cost of rebalancing an AVL tree, as distinct from the cost of searching for a place to insert or delete, then you really cannot do it at all with the time command.

The solution is to include in your code calls to the system routine getrusage. This is a library routine (part of any Unix environment) that fills in a struct containing, among other things, the user and system CPU time your process has consumed so far, each given as a count of seconds and microseconds (the struct also reports a lot of other resource usage -- memory, paging, and so on; for details check the man page). To measure the user time for a section of the program, call getrusage just before entering the section and record the user time, call it again on leaving the section, and take the difference between the two readings.

In this way you can time pretty much any section of code, but be careful about one thing: although the clock reports microseconds, its precision is only on the order of one millisecond on most machines (it depends on the interrupt frequency of the system clock), so timing fairly short sections will give erroneous results. It pays to time a large number of repetitions of the same action, so as to extend the running time into at least hundreds of milliseconds. (Remember that a modern 750MHz Pentium processor could potentially execute three quarters of a million instructions in one millisecond! In almost all cases, cache behavior will decrease this rate to just a few hundred thousand instructions per millisecond, but that's still a lot of searches in a tree!)
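
As a concrete illustration, here is a minimal sketch of such a measurement in C. The helper user_seconds and the placeholder do_one_search are made-up names, and the repetition count is arbitrary; adjust both to your own code and machine.

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/* user CPU time consumed so far by this process, in seconds */
static double user_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1000000.0;
}

int main(void)
{
    long i, repetitions = 1000000;  /* enough to push the total well past a millisecond */
    double before, after;

    /* ... build the data structure here; this part is not timed ... */

    before = user_seconds();
    for (i = 0; i < repetitions; i++) {
        /* do_one_search();  <-- the operation being measured */
    }
    after = user_seconds();

    printf("total user time: %.3f s, per operation: %.9f s\n",
           after - before, (after - before) / repetitions);
    return 0;
}

Note that the loop itself adds a small overhead to each repetition; if that matters, time an empty loop of the same length and subtract.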