Improving AbiWord's Piece Table [repost]
One of the most critical parts of any word processor is the backend used to store its text. It should be fast at lookups, fast at inserting and erasing text at a random location, undo friendly, etc.
The AbiWord backend has all these virtues and some more. It's (IMHO) the most impressive piece of code of the whole AbiWord project, and one that has been exceptionally stable over the years. In short, Jeff's code rocks™.
However, improvement is still possible. I will show a modified piece table that replaces the current O(n) complexity of the insertion and lookup operations with O(log(n)) operations.
Nota Bene: In this discussion, “n” is the number of pieces, not the number of characters.
Current Piece Table
If you already know how the piece table works, you can skip this section.
TODO: Write this section
In the meantime, you can read several good descriptions in the article Data Structures for Text Sequences (by Charles Crowley) and in this Piece Table Description.
The piece table that AbiWord uses is like the one explained in these articles, with two additions. It has a little cache (the last served piece and the next one in the piece table are cached). And on the first lookup after a change to the piece table, a vector is created to mirror the doubly linked list of pieces (obviously, an O(n) operation).
This vector increases the speed at which pieces are served (as long as it remains valid) and looked up (the lookup operation becomes O(log(n)) once the vector is up to date).
Unfortunately, the vector comes with a price. It slows down the first lookup after an insert/erase operation, it takes more memory, and it complicates the code that uses the pf_Fragments class, as that code has to signal when the frags become dirty (the AbiWord “fragments” are called “pieces” in this document).
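The trade-off described above can be sketched in a few lines. This is a simplified illustration, not AbiWord's pf_Fragments code: the names PieceList, insert_piece and find_piece are hypothetical, and the vector stores start offsets rather than pointers to pieces. The point is only that every change marks the mirror dirty, the first lookup afterwards pays O(n) to rebuild it, and later lookups are a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical, simplified stand-in for a piece (a "fragment" in AbiWord).
struct Piece {
    std::size_t length;  // number of characters covered by this piece
};

class PieceList {
public:
    // Inserts a piece before list position `index` and invalidates the mirror.
    void insert_piece(std::size_t index, std::size_t length) {
        auto it = pieces_.begin();
        std::advance(it, static_cast<std::ptrdiff_t>(index));  // O(n) walk
        pieces_.insert(it, Piece{length});
        dirty_ = true;
    }

    // Returns the index of the piece covering document position `pos`.
    // Precondition: the list is not empty.
    std::size_t find_piece(std::size_t pos) {
        if (dirty_) rebuild();  // O(n), paid once after each batch of changes
        assert(!starts_.empty());
        // First piece starting after `pos`; the one before it covers `pos`.
        auto it = std::upper_bound(starts_.begin(), starts_.end(), pos);
        return static_cast<std::size_t>(it - starts_.begin()) - 1;
    }

private:
    // Rebuilds the vector of start offsets from the linked list.
    void rebuild() {
        starts_.clear();
        std::size_t offset = 0;
        for (const Piece& p : pieces_) {
            starts_.push_back(offset);
            offset += p.length;
        }
        dirty_ = false;
    }

    std::list<Piece> pieces_;
    std::vector<std::size_t> starts_;  // mirror: start offset of each piece
    bool dirty_ = true;
};
```

With pieces of length 5, 4 and 3, position 8 falls in the second piece (index 1) and position 9 in the third.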
Red-Black trees
You should be able to find plenty of explanations of how red-black trees work on the net.
The complexity guarantees of red-black trees are O(log(n)) for insertion, erase, lookup, next and previous in the worst case. The next and previous operations have an average complexity of O(1) (on average, you just follow two pointers to reach the next or previous node).
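To make the "next" operation concrete, here is a minimal sketch of an in-order successor on a binary tree with parent pointers. The Node layout and the helper names (leftmost, next, link_left, link_right) are assumptions for illustration, not the reference implementation. A full walk over n nodes crosses each edge at most twice, which is where the O(1) average comes from, while a single call can still climb the whole height of the tree.

```cpp
#include <cassert>

// Hypothetical node layout: only what is needed to walk the tree.
struct Node {
    int key = 0;  // placeholder payload
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Returns the leftmost node of the subtree rooted at n (n must not be null).
Node* leftmost(Node* n) {
    assert(n != nullptr);
    while (n->left) n = n->left;
    return n;
}

// Returns the in-order successor of n, or nullptr if n is the last node.
Node* next(Node* n) {
    if (n->right) return leftmost(n->right);
    // Climb while we are a right child; the first ancestor reached
    // from its left side is the successor.
    while (n->parent && n->parent->right == n) n = n->parent;
    return n->parent;
}

// Small helpers to wire up a tree by hand.
void link_left(Node& parent, Node& child) {
    parent.left = &child;
    child.parent = &parent;
}

void link_right(Node& parent, Node& child) {
    parent.right = &child;
    child.parent = &parent;
}
```

The "previous" operation is the mirror image, with left and right swapped.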
TODO: Give some pointers to red-black trees descriptions.
Suggested modifications
The modification that I suggest is to replace the doubly linked list with a self-balancing tree. We need to establish a key and a comparison operation to make the change possible.
As we want to make lookups (i.e., to pass from a document's position to a piece) in O(log(n)), it seems natural to choose as key something related to the document's position range that is covered by each piece.
If we choose as key the beginning position and the size of the piece, we'll have trees like the next one:
It's obvious that lookup is now done in O(log(n)), but if we do an insertion in the middle of the document, we will have to update the “beginning position” of every piece that comes after the insertion point (about half the tree). As we need to walk from each node to the next one, and we need to visit O(n) nodes in the worst case, the insertion operation will be O(n).
Nota Bene: It may seem that it should be O(n log(n)), because the worst case of a “go to the next node” operation is O(log(n)), but we will prove later that the average cost of this operation is just O(1) (TODO: write down the proof!).
To solve this problem, we will “distribute” the offset information among several nodes, so it will be harder to recover the offset of a piece (O(log(n)) instead of O(1)), but it will be faster to “fix” the offsets of all the nodes in the tree (O(log(n)) instead of O(n)). We will put in each node only the size of its left subtree, the size of its right subtree, and its own size.
With this change, the lookup operation remains O(log(n)), and the insertion operation also becomes O(log(n)), as we no longer have to update the whole tree, but just the ancestors of the modified piece (and any leaf has O(log(n)) ancestors).
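As a sketch of the node layout just described (field names follow the pseudocode in this article; the checker is a hypothetical test helper, not part of the reference implementation), each node caches the total size of its two subtrees, so the size of any subtree is available in O(1). An O(n) consistency check like the one below is handy in a regression test to catch a forgotten update.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    std::size_t size = 0;        // length of this piece
    std::size_t size_left = 0;   // total length of the left subtree
    std::size_t size_right = 0;  // total length of the right subtree
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Total length covered by the subtree rooted at n, in O(1).
std::size_t subtree_size(const Node* n) {
    return n ? n->size_left + n->size + n->size_right : 0;
}

// Recomputes every subtree size from scratch and compares it with the
// cached fields. O(n): meant for tests, not for normal operation.
bool sizes_consistent(const Node* n, std::size_t* total = nullptr) {
    if (!n) {
        if (total) *total = 0;
        return true;
    }
    std::size_t l = 0, r = 0;
    if (!sizes_consistent(n->left, &l) || !sizes_consistent(n->right, &r))
        return false;
    if (total) *total = l + n->size + r;
    return n->size_left == l && n->size_right == r;
}

// Debug helper: aborts (in builds with assertions enabled) on a stale cache.
void assert_sizes(const Node* root) {
    assert(sizes_consistent(root));
}
```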
With this strategy, the new tree will look like this one:
Now, if we insert/erase a node, let's say that this node becomes a left son, we should just “fix” the size_left of its parent, and then repeat the fixation process with our parent. This fixation should be done before the eventual rebalance of the tree starts. And, of course, the sizes should also be updated after each rotation in the rebalancing of the tree.
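while (node != root)
{
    /* total length of the subtree rooted at node */
    total = node->size_left + node->size + node->size_right;
    if (node->parent->left == node)
        node->parent->size_left = total;
    else
        node->parent->size_right = total;
    node = node->parent;  /* climb one level, otherwise the loop never ends */
}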
As the rebalance of the tree is an O(log(n)) operation for a red-black tree (the variant of self-balancing tree that I've used here), and the “fix size” operation is also an O(log(n)) operation, the whole cost of the insertion/erase of a node is O(log(n)).
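The size update that has to accompany each rotation can be sketched as follows. This is a standard left rotation with two extra assignments, written under the node layout assumed in this article; rotate_left and its signature are illustrative, not taken from the reference implementation. Only the two rotated nodes need new sizes: the subtree as a whole still covers the same text, so no ancestor changes, and the rotation stays O(1).

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    std::size_t size = 0;        // length of this piece
    std::size_t size_left = 0;   // total length of the left subtree
    std::size_t size_right = 0;  // total length of the right subtree
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Rotates left around x (x must have a right child y).
// y takes x's place, x becomes y's left child, and y's old left
// subtree becomes x's right subtree. Returns y.
Node* rotate_left(Node* x, Node*& root) {
    Node* y = x->right;
    assert(y != nullptr);

    // Usual pointer surgery.
    x->right = y->left;
    if (y->left) y->left->parent = x;
    y->parent = x->parent;
    if (!x->parent)
        root = y;
    else if (x->parent->left == x)
        x->parent->left = y;
    else
        x->parent->right = y;
    y->left = x;
    x->parent = y;

    // Size fix-up: x adopted y's old left subtree, whose total is still
    // cached in y->size_left; then y's left subtree is all of x.
    x->size_right = y->size_left;
    y->size_left = x->size_left + x->size + x->size_right;
    return y;
}
```

The right rotation is symmetric: swap left/right and size_left/size_right throughout.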
To calculate the offset of a node, we start with the size_left field of this node, and add the size_left + size of every ancestor for which this node is in the right subtree. For example, to calculate the offset of the node that has a size of 12, we start with its size_left (0), and we jump to its parent (size 1). As we are the left son, we don't take into account the contribution of the parent. We then jump to the grandparent (size 8), and this time we're in the right subtree of the grandparent, so we add the size_left and the size of the grandparent to the previous offset (0 + 9 + 8 = 17). We jump to the parent of the grandparent (the root of the tree), and as we're in its left subtree, we don't take into account the contribution of the root. We're done: the offset of the node is 17.
offset = node->size_left;
while (node != root)
{
    /* only ancestors that have us in their right subtree contribute */
    if (node->parent->right == node)
        offset += node->parent->size_left + node->parent->size;
    node = node->parent;
}
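The worked example above can be checked with a small self-contained sketch. The sizes 12, 1, 8 and the grandparent's size_left of 9 come from the text; everything else is an assumption made to get a complete tree: the root is given a size of 5, the grandparent's left subtree is a single piece of size 9, and the function names offset_of and example_offset are hypothetical. The size_right field is omitted because the offset computation never reads it.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    std::size_t size = 0;       // length of this piece
    std::size_t size_left = 0;  // total length of the left subtree
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Document offset of the first character of `node`, in O(height).
std::size_t offset_of(const Node* node) {
    assert(node != nullptr);
    std::size_t offset = node->size_left;
    while (node->parent) {
        // Everything to the left of an ancestor, plus the ancestor itself,
        // precedes us only when we sit in its right subtree.
        if (node->parent->right == node)
            offset += node->parent->size_left + node->parent->size;
        node = node->parent;
    }
    return offset;
}

// Builds the tree from the worked example and returns the offset of the
// node of size 12. Root size and the shape of the size-9 subtree are
// assumptions; they do not affect the result.
std::size_t example_offset() {
    Node root, grand, nine, parent, node;
    root.size = 5;    // assumed
    grand.size = 8;
    nine.size = 9;    // assumed: a single piece of size 9
    parent.size = 1;
    node.size = 12;

    root.left = &grand;    grand.parent = &root;
    grand.left = &nine;    nine.parent = &grand;
    grand.right = &parent; parent.parent = &grand;
    parent.left = &node;   node.parent = &parent;

    parent.size_left = 12;
    grand.size_left = 9;
    root.size_left = 9 + 8 + 12 + 1;

    return offset_of(&node);
}
```

Calling example_offset() gives 17, matching the hand calculation.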
The lookup operation is trivially in O(log(n)), due to the invariants of the red-black tree. (The cost of a lookup is a linear function of the height of the tree, and the height of a red-black tree with n nodes is never more than 2 log2(n + 1).)
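A lookup on this structure might look like the sketch below, again assuming the node layout used in this article (the function name find and its out-parameter are illustrative). At each node the position is compared with size_left: a smaller position lives in the left subtree, a position inside [size_left, size_left + size) lives in this piece, and anything larger lives in the right subtree once the part already skipped is subtracted.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    std::size_t size = 0;       // length of this piece
    std::size_t size_left = 0;  // total length of the left subtree
    Node* left = nullptr;
    Node* right = nullptr;
};

// Finds the piece that covers document position `pos`.
// On success, stores the position relative to the start of that piece in
// *offset_in_piece. Returns nullptr if `pos` is past the end of the text.
Node* find(Node* root, std::size_t pos, std::size_t* offset_in_piece) {
    assert(offset_in_piece != nullptr);
    Node* n = root;
    while (n) {
        if (pos < n->size_left) {
            n = n->left;                      // target is in the left subtree
        } else if (pos < n->size_left + n->size) {
            *offset_in_piece = pos - n->size_left;
            return n;                         // target is inside this piece
        } else {
            pos -= n->size_left + n->size;    // skip left subtree and piece
            n = n->right;
        }
    }
    return nullptr;
}
```

The loop does a constant amount of work per level, so its cost is bounded by the height of the tree.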
Never assume, measure!
I've performed two performance tests. In the first one, I throw 1,000,000 characters at the piece table, each one of them at a random position. The piece table ends up with roughly 1,000,000 pieces. That's the equivalent of a dense document of 30,000 pages (with a good deal of format changes).
The mean time for the insertion operation goes from ~2 μs (that's 2 × 10^-6 seconds) when the piece table is empty to ~10 μs when it has 1 million pieces (on a 750 MHz computer with 256 MB of memory). The experimental data are the blue squares, and the theoretical curve is the black line. I guess that the two dots that are visibly off the theoretical curve are just due to a process switch between the start and the end of a measurement. (One of the other ten processes that were running on my computer must have got several cycles while I was measuring.)
So far, so good. The delete operation, however, hides more surprises than its peer, the insert operation. To interpret the next figure, we should divide it into two parts. The first one is the lower branch, which starts at 0 pieces and between 2 and 3 μs, and ends at 250,000 pieces and between 7 and 8 μs. The second branch (the upper one) goes from 250,000 pieces back to 0 pieces.
When the delete operation is performed in a piece table with a big piece, it will split the piece in two. When it is performed in a piece table with plenty of pieces that contain only one character, then it will delete a piece.
The delete operation starts by making more and more pieces, until it reaches a stability point at which the number of destructions equals the number of creations (in our figure, when the piece table has 250,000 pieces). After this point the number of destructions becomes the dominant factor, and we end up back at 0 pieces.
Now, why does the delete operation show this hysteresis? My guess is that the tree is extremely dispersed in the computer's memory in the second branch. The tree had 250,000 nodes, which were packed into several MB. When we start deleting them randomly, the mean distance (in the computer's memory) between two neighbouring nodes increases, and this distance induces more and more page misses (which become the dominant factor). But I'm just guessing.
We're not yet lost, as we can reduce the number of page misses. To reduce them I will focus on:
- Reduce the memory size. The “color” of the node can be optimized to the point of not adding a single bit to the size of the node structure. The size_right field can also be suppressed entirely without any bad consequence (all the operations keep the same speed, maybe even a bit faster) [DONE]. The node structure can be allocated from a memory pool. That way the bookkeeping memory that the allocator attaches to each separately allocated structure can be optimized away.
- Increase the spatial locality of the nodes. Using the memory pool (again), we can put all the nodes together, and thus keep in the same page (or in a reduced set of pages) all the information needed to walk through the tree.
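The memory pool idea from the two points above could look roughly like this. It is a minimal sketch under several assumptions: it uses C++14 features that the compilers mentioned later in this article do not support, the names NodePool, allocate and release are hypothetical, and it is not the allocator used in the reference implementation. Nodes are carved out of large contiguous blocks, so neighbours in allocation order share pages, and released nodes are recycled through a free list threaded through their own right pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    std::size_t size = 0;       // length of this piece
    std::size_t size_left = 0;  // total length of the left subtree
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
};

class NodePool {
public:
    explicit NodePool(std::size_t nodes_per_block = 1024)
        : block_size_(nodes_per_block) {}

    // Returns a zero-initialized node owned by the pool.
    Node* allocate() {
        if (free_list_) {  // reuse a released node first
            Node* n = free_list_;
            free_list_ = n->right;
            *n = Node{};   // wipe stale fields, including the free-list link
            return n;
        }
        if (blocks_.empty() || used_in_last_ == block_size_) {
            // One big allocation instead of block_size_ small ones.
            blocks_.push_back(std::make_unique<Node[]>(block_size_));
            used_in_last_ = 0;
        }
        return &blocks_.back()[used_in_last_++];
    }

    // Hands a node back to the pool; its memory is reused, not freed.
    void release(Node* n) {
        assert(n != nullptr);
        n->right = free_list_;  // thread the free list through the node
        free_list_ = n;
    }

private:
    std::size_t block_size_;
    std::size_t used_in_last_ = 0;  // nodes handed out from the newest block
    Node* free_list_ = nullptr;
    std::vector<std::unique_ptr<Node[]>> blocks_;
};
```

All memory is returned when the pool is destroyed, so the tree that uses it must not outlive it.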
Conclusion
I've shown that it's possible to have a backend in which all the operations have a worst case of O(log(n)), and in which the usual cases (forward and backward movement) can still be resolved in an average time of O(1).
Is it a priority for the AbiWord project to switch to this kind of piece table? IMHO, no. It's not even near a priority. As I said in the introduction, the current piece table is already a high quality implementation, and it has received several useful performance improvements over time.
The current bottleneck in AbiWord is in the layout part (TODO: give some figures. Assertions without facts suck.). Nevertheless, the same kind of structure that I propose here can also be used to solve the O(n) operations that AbiWord has in the layout code.
Anyway, let's say that I wanted to solve this problem, not because I considered it very important, but because it was the second time that I tried to solve it, and I knew that it was possible :-)
That said, when the performance problems of the layout code are fixed, the piece table will eventually show up in profilers, and I hope that these modifications will help at that time.
I've done a reference implementation of a piece table like the one that I describe here. You can download the code PieceTable2.zip. In the zip you will find two different backends for the piece table, a red-black tree and a doubly linked list. It also contains an almost complete regression test. Update: This version contains a piece table without the size_right member.
The code has been tested with MSVC 6 and gcc 2.95.3.
TODO: Remove exceptions (AbiWord doesn't like C++ exceptions), fix the (2) functions that have sub-basic exception guarantees, complete the regression test (mostly done), fully comment the code, and profile it (done).