Next: Sensitivity to parameters Up: Data Structures for Previous: The design space

Experimental comparison of sequence data structures

In order to compare the performance of these data structures I implemented each of them and a simulator program that would simulate typical editing behavior. The simulator has a number of parameters that I will discuss in presenting the results. I ran these tests on a several machines (Micro VAX III/VAX, SUN3/M68020, SparcStation 2/SPARC 2, DEC 5000/MIPS 3000), under two compilers (cc and gcc) and with maximum optimization. The results for the different architectures and compilers were all similar. Most of the results in this section were obtained using gcc and gprof on a SUN 3/60.

The measurements were made with the following parameters (except where one of these parameters is being experimentally varied):

Sequence length of 8000 characters
Block size of 1024 characters
Fixed buffer methods keep buffers at least half full (from 512 to 1024 characters)
The location of 98% of the edits is normally distributed around the location of the previous edit with a standard deviation of 25
The location of 2% of the edits is uniformly distributed over the entire sequence
After each edit, 25 characters on each side of the edit location are accessed
Every 250 edits the entire file is scanned sequentially with ItemAts

The sequence data structures measured (and their abbreviated names) are:

Null -- the null method that does nothing. This is for comparison since it measures the overhead of the procedure calls.
Arr -- The array method.
List -- The list method.
Gap -- The gap method.
FsbA - The fixed size buffer method with the array method used to maintain the sequence inside each buffer.
FsbAOpt - The fixed size buffer method with the array method used to maintain the sequence inside each buffer and with ItemAt optimized.
FsbG - The fixed size buffer method with the gap method used to maintain the sequence inside each buffer.
Piece - The piece table method.
PieceOpt - The piece table method with ItemAt optimized

I will present graphs for Insert and ItemAt operations. The Delete operation takes about the same time as the Insert operation for all these sequence data structures.

Figure 11 shows how the speed of the ItemAt operation is affected by the size of the sequence.

Figure 11: ItemAt times as the length of the sequence varies

It basically has no effect except for the interesting result that ItemAt operation for the PieceOpt method gets faster for larger arrays. The reason for this is that for longer sequences the caching used in the optimization becomes more effective. Each ItemAt is faster although (since the sequences are longer) ItemAts is called many more times.

The Arr method is the fastest and is nearly as fast as the Null method. The Gap method is only a little slower. The FsbA method is much slower and is about the same as the List method, the FsbG method and the Piece method. The optimized FsbA method is nearly as fast as the Gap method and the PieceOpt method gets close. Even so, the PieceOpt ItemAt is half the speed of the Arr ItemAt. Since ItemAt is such a frequent operation it is necessary to optimize (with caching) all the methods except for the Arr and the Gap method.

Figure 12 shows how the speed of the Insert operation is affected by the size of the sequence.

Figure 12: Insert times as the size of the sequence varies

It has no effect except for shorter sequences. The List method is the fastest and the FsbG, Gap and Piece methods are all about half its speed.

The Arr and FsbA methods are not shown on this graph because they are so much slower that they would distort the graph (as the next two graphs show). Figure 13 includes the FsbA method which is an order of magnitude slower than the other methods (for the Insert operation).

Figure 13: Insert times as the size of the sequence varies

Figure 14 includes the Arr method which is two orders of magnitude slower than the other methods.

Figure 14: Insert times as the size of the sequence varies

Note that it gets slower linearly with the size of the sequence, as one would expect.

Next: Sensitivity to parameters Up: Data Structures for Previous: The design space

Charlie Crowley
Thu Jun 27 15:36:10 MDT 1996