This paper has two purposes:
A review of the literature shows that there are only a few different data structures for text sequences that have been used in text editors. A careful examination of the design space showed that there really are not that many fundamental types of sequence data structures.
Sequence data structures are divided into two categories. Basic sequence data structures (array, gap and linked list) and recursive sequence data structures (line spans, fixed size buffers and piece tables).
The array method is the obvious one: keep the text in an array of characters. The gap method is similar but it keeps a gap in the middle of the array at the text editor insertion point. The linked list method keeps the characters on a linked list.
The recursive methods keeps the text in a number of separate ``spans'' and keeps track of a sequence of pointers to these spans. The line span method uses a span for each line. The fixed size buffer method keeps a sequence of fixed size buffers each of which contains one span. The piece table method uses spans of any size either in the original file or in a file for added characters.
In examining these data structures a general model of sequence data structures was formulated and used this model and the examples were used to discover several new variations for sequence data structures that might improve performance in some situations.
A series of experiments was performed on these data structures to determine their relative performance and they were compared on a variety of criteria including time. The main conclusion is that the piece table structure is the best data structure for text sequences although some of the other methods might be useful in certain cases.
The piece table method has a number of advantages and is an especially good method for text with additional structure. Thus it would be the best choice for a word processor or a editor with hypertext facilities.