Many-Bugs progress:

High level take-away:

Using Eric's version of GENPROG at the assembly level seems to be as effective as the original version of GENPROG. I tried one or two bugs from each of the programs targeted in the many-bugs paper for which GENPROG had found a fix, and in all cases Eric's version was also able to find a fix. However, for each of the bugs that GENPROG was not able to fix, Eric's version also failed to find a (reliable) fix.

Lessons learned:

Setup Details

For each run, I selected a bug to repair. The bug tarball contains a file called bugged-program.txt, which identifies the source file that contains the bug. To generate the assembly file, I used the touch command to update the timestamp of that source file and then ran make. From the output of make I found the compilation command used by the Makefile, and I ran the same command manually with the addition of the --save-temps flag to get an assembly version of the file. The VM fitness script used the test.sh script in the bug directory to determine the fitness score to return to the host OS. For example, if there were 31 positive test cases and 5 negative test cases, the evaluation phase of the script would look something like:

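cd "$TESTDIR"

# FIT counts the test cases that pass (assumed to start at 0 here)
FIT=0

# positive test cases p1..p31
for i in {1..31}; do
  "$TESTDIR"/test.sh p$i >/dev/null 2>&1 && FIT=$(($FIT+1))
done
# negative test cases n1..n5
for i in {1..5}; do
  "$TESTDIR"/test.sh n$i >/dev/null 2>&1 && FIT=$(($FIT+1))
done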

The developer fix can be found in the fixed directory of the bug tarball. I verified that the fitness script would return the target fitness by copying the developer fix to the appropriate location and generating an assembly representation as described above. I verified that the buggy version passed all of the positive test cases and failed all of the negative test cases and that the developer fix version passed both the positive and the negative tests.
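The assembly-generation step described above could be sketched roughly as follows. This is only an illustration: the helper name find_compile_cmd is made up here, and it assumes that make echoes its commands and that the compile line contains both the source file name and a -c flag.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original setup): read `make` output
# on stdin and print the first compiler invocation that mentions the given
# source file and contains a -c flag.
find_compile_cmd() {
  local src
  src=$(basename "$1")
  grep -F -- "$src" \
    | grep -E -- '(^|[[:space:]])-c([[:space:]]|$)' \
    | head -n 1
}

# Example use (SRC is a placeholder for the file named in bugged-program.txt):
#   touch "$SRC"                                # force make to recompile it
#   CMD=$(make 2>&1 | find_compile_cmd "$SRC")  # recover the compile command
#   eval "$CMD --save-temps"                    # re-run it, keeping the .s file
```

If the Makefile hides its commands, the compile line would have to be recovered some other way (for example by running make in a verbose mode, if the project offers one).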

Once this script was complete, I made sure that the host-side fitness script communicated with the VM correctly and returned the correct fitness scores for both the buggy and developer-fix versions of the assembly. The host-side script looks something like:

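#!/bin/bash
# Arguments: ASM [THREAD]
# Copies the candidate assembly file to the VM, runs the VM fitness
# script on it, and exits with the fitness score.

ASM=$1
THREAD=$2

# Default to thread 0; the "main" thread also maps to 0
[[ -z $THREAD ]] && THREAD=0
if [[ $THREAD == "main" ]]; then
  THREAD=0
fi

# The ssh port of the VM is 8000 + THREAD
PORT=$((8000 + $THREAD))

echo "GOING TO MACHINE ON PORT # $PORT"

# Kill any fitness run still going on the VM
ssh -p $PORT root@localhost "killall fit.sh"

# Copy the candidate to the same path on the VM
scp -P $PORT "$ASM" root@localhost:"$ASM"

# ssh exits with the status of the remote command (255 on an ssh error),
# so the fitness score comes back as the exit status
ssh -p $PORT root@localhost "/root/fit.sh $ASM"

FIT=$?

# Clean up the candidate on the VM
ssh -p $PORT root@localhost "rm $ASM"

exit $FIT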
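The sanity check on the buggy and developer-fix versions can be sketched as a small helper that runs a fitness command and compares its exit status with the expected score. The helper name check_fitness and the file names in the usage comment are placeholders, not part of the original setup. Note that because the score travels as an exit status, this scheme only works when the total number of test cases is at most 255.

```bash
#!/bin/bash
# Hypothetical helper: run a fitness command and compare its exit status
# (the fitness score) with the expected value.
check_fitness() {
  local expected=$1
  shift
  "$@" >/dev/null 2>&1
  local got=$?
  if [ "$got" -eq "$expected" ]; then
    echo "ok: fitness $got"
  else
    echo "MISMATCH: expected $expected, got $got"
    return 1
  fi
}

# Example use with 31 positive and 5 negative tests (names are placeholders):
#   check_fitness 31 ./host-fit.sh buggy.s   # buggy version passes positives only
#   check_fitness 36 ./host-fit.sh fixed.s   # developer fix passes everything
```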

Some bugs (e.g. gzip-bug-3eb6091d69a-884ef6d16c6) behaved inconsistently: a candidate repair would sometimes reach the target fitness but then fail manual verification. In these cases I changed the fitness script so that it would only return the target fitness if the candidate repair passed all of the test cases twice in a row.
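A minimal sketch of the twice-in-a-row check, not the actual script. It assumes a command that prints the number of test cases passed (standing in for the evaluation loop), and it assumes that when the second run disagrees the lower of the two scores is returned; the notes above only say that the target fitness is withheld in that case. The name fitness_twice is made up for this example.

```bash
#!/bin/bash
# Hypothetical wrapper: report the target fitness only if the test suite
# reaches it on two consecutive runs.
#   $1     target fitness (total number of test cases)
#   $2...  command that prints the number of test cases passed
fitness_twice() {
  local target=$1
  shift
  local first second
  first=$("$@")
  # Below target on the first run: no need to run the suite again.
  if [ "$first" -lt "$target" ]; then
    echo "$first"
    return
  fi
  second=$("$@")
  # Assumption: on disagreement, report the lower of the two scores.
  if [ "$second" -lt "$first" ]; then
    echo "$second"
  else
    echo "$first"
  fi
}
```

The second run only happens for candidates that already look like repairs, so the extra cost is paid rarely.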

The failure to find repairs for the bugs that GENPROG had not been able to repair led to a closer examination of those bugs.

For most of these bugs, the reason that no repairs were found is obvious. The developer fix of the bug often consisted of the development of entirely new functionality. As a result, I shifted my focus to the subset of bugs which seemed (based on Zak & Clair's analysis) to be most amenable to repair. Specifically, I focused on the bugs which were identified as not fixable by GENPROG because of the limitation (which Eric's version does not have) that GENPROG could not mutate previous mutations.

Unfortunately, most of these bugs had other issues that made repair unlikely.

The following table lists the subset of bugs under consideration. The column labeled 'Other Problem' gives the reason a repair seemed unlikely, except for the bugs marked 'Promising', which had no such problem.

BUG                              Other Problem
libtiff-bug-0860361d-1ba75257    Promising
libtiff-bug-a2f7abf-ce76d31      Missing fn/var
php-bug-307146-307147            Missing fn/var
php-bug-307563-307571            Missing fn/var
php-bug-308020-308035            Missing fn/var
php-bug-308046-308051            Large Diff
php-bug-308734-308761            Promising
php-bug-309111-309159            Promising
php-bug-309453-309456            Missing fn/var
python-bug-69223-69224           Promising
python-bug-69368-69372           Missing fn/var
python-bug-69831-69833           Missing fn/var
python-bug-70019-70023           Missing fn/var
python-bug-70098-70101           Missing fn/var

Each of the bugs marked 'Promising' was run (or re-run if it had been run early in this adventure), but no repairs were found.

For each of the bugs marked 'Promising', the developer fix involved a function call or the insertion of a conditional statement. Clearly a repair at the assembly level would require multiple edits in just the right order to make this happen. Since the fitness function is based on the number of test cases that the candidate repair passes, it does not encourage the kind of intermediate edits needed to build up a function call or a conditional statement. Perhaps a repair could be generated if the fitness function were modified in some way that encouraged these kinds of edits.

Shortly before I left NM, I started re-running some of these at the CIL level, since there a function call or a conditional statement could be a single-line fix. I tested my CIL implementation of the repair algorithm on the GCD bug, and it fixed it in a matter of seconds. For reasons that I have not figured out yet, however, the repair runs crash after several hundred fitness evaluations.