abiword Related Pages

Time: 2022-05-03 00:15:23

The 'af' directory contains all source code for the cross-platform application framework. This directory contains the following subdirectories:

Directory ev

Source code for the event mechanism for the cross-platform application framework. This code contains the machinery to do key bindings, mouse bindings, menu bars, and tool bars. This code is currently used by the word processor, but we expect to also use it for the spreadsheet application.

Directory gr

Source code for graphics (drawing code, font code, etc.)

Directory util

Source code for general-purpose utility functions.

Directory xap

Source code for application-neutral portion of the cross-platform framework defined in ev.

Squiggles are used to underline misspelled words. Instead of simply erasing all squiggles and rechecking all words in a block when the block is changed, the squiggles are handled much like the words they underline.

The word currently being edited is the pending word. When the cursor no longer touches the pending word (because it was moved away or the user typed a word separator), the word is spell-checked. If it is misspelled, it will be squiggled.

When text is added to the block, fl_Squiggles::textInserted is called with information about where in the block the text was added, and how much. It will then remove any squiggle located at that offset, move all following squiggles (so they end up aligned with the words they should underline), and spell-check the words in the added text (via fl_BlockLayout::_recalcPendingWord).

When text is deleted from the block, fl_Squiggles::textDeleted is called with information of where in the block text was deleted, and how much. It removes squiggles intersecting with that area, moves all following squiggles and makes pending the word at the deletion point since two words may have been joined, or a word lost part of its letters.

When a block is split in two, fl_Squiggles::split is called with information of where the block was split, and a pointer to the new block. The squiggles from the old block are split between it and the new block. The word at the end of the old block (which may have been broken), is spell-checked, and the first word of the new block is made the pending word.

When two blocks are merged into one, fl_Squiggles::join is called with information of the offset where squiggles from the second block should be joined onto the first block. The word at the merge point is made the pending word.

There's one known buglet: typing "gref' " correctly squiggles the word when typing ' since it's a word separator. However, deleting the s in "gref's" leaves gref unsquiggled because the word gref was not pending when the ' changed from a word character to a word delimiter. (hard to explain - just try it)
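The offset bookkeeping described above for text insertion and deletion can be sketched as follows. This is a minimal illustration, not the actual fl_Squiggles code: the Squiggle struct and the two free functions are hypothetical stand-ins, squiggles are assumed to be half-open ranges within one block, and the pending-word recheck is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: a squiggle covers [offset, offset + length) in a block.
struct Squiggle {
    std::size_t offset;
    std::size_t length;
};

// `count` characters were inserted at block offset `at`: drop any squiggle
// touching the insertion point, then shift all later squiggles right.
void textInserted(std::vector<Squiggle>& squiggles, std::size_t at, std::size_t count) {
    assert(count > 0);
    squiggles.erase(std::remove_if(squiggles.begin(), squiggles.end(),
                        [at](const Squiggle& s) {
                            return s.offset <= at && at <= s.offset + s.length;
                        }),
                    squiggles.end());
    for (Squiggle& s : squiggles) {
        if (s.offset > at) s.offset += count;
    }
}

// `count` characters were deleted starting at block offset `at`: drop any
// squiggle intersecting the deleted range, then shift later squiggles left.
void textDeleted(std::vector<Squiggle>& squiggles, std::size_t at, std::size_t count) {
    assert(count > 0);
    const std::size_t end = at + count;
    squiggles.erase(std::remove_if(squiggles.begin(), squiggles.end(),
                        [at, end](const Squiggle& s) {
                            return s.offset < end && at < s.offset + s.length;
                        }),
                    squiggles.end());
    for (Squiggle& s : squiggles) {
        if (s.offset >= end) s.offset -= count;
    }
}
```

In the real code the word at the edit point would additionally be made pending or rechecked; here only the removal and shifting of squiggles is shown.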

FL_DocLayout is a formatted representation of a specific PD_Document, formatted for a specific GR_Graphics context.

A FL_DocLayout encapsulates two related hierarchies of objects.

The logical (or content) hierarchy corresponds to the logical structure of the document.

Where each fl_BlockLayout corresponds to a logical element in the PD_Document (i.e., usually a paragraph of text).

The physical (or layout) hierarchy, by contrast, encapsulates the subdivision of physical space into objects of successively finer granularity.

Where each fp_Run contains some fragment of content from the original document, usually text.

Other subjects

Todo:
Add more class names / links to sources.

1. PieceTable

1.1. Introduction

pt_PieceTable is the data structure used to represent the document. It presents an interface to access the document content as a sequence of (Unicode) characters. It includes an interface to access document structure and formatting information. It provides efficient editing operations, complete undo, and crash recovery.

1.2. Class Overview

The PieceTable consists of the following classes:

  1. InitialBuffer -- This is a read-only character array consisting of the entire character content of the document and initially read from the disk. (All XML tags and other non-content items are omitted from this buffer.)

  2. ChangeBuffer -- This is an append-only character array consisting of all character content inserted into the document during the editing session.

  3. InitialAttrPropTable -- This is a read-only table of Attribute/Property structures extracted from the original document.

  4. ChangeAttrPropTable -- This is an append-only table of Attribute/Property structures that are created during the editing session.

  5. Piece -- This class represents a piece of the sequence of the document; that is, a contiguous sub-sequence having the same properties, such as a span of text or an object (such as an in-line image). It contains links to the previous and next Pieces in the document. Pieces are created in response to editing and formatting commands.

    1. TextPiece -- This subclass represents a span of contiguous text in one of the buffers. All text within the span has the same (CSS) properties. A TextPiece is not necessarily the longest contiguous span; it is possible to have adjacent (both in order and in buffer position) TextPieces with the same properties. A TextPiece contains a buffer offset and length for the location and size of the text, and a flag to indicate which buffer. A TextPiece contains (or contains a link to) the text formatting information. Note that the buffer offset only gives the location of the content of the span in one of the buffers; it does not specify the absolute position of the span in the document.

    2. ObjectPiece -- This subclass represents an in-line object or image. It has no references to the buffers, but does provide a place-holder in the sequence.

    3. StructurePiece -- This subclass represents a section or paragraph. It has no references to the buffers, but does provide (CSS) style information and a place-holder in the sequence. There are no links between StructurePieces or between other Pieces and their (containing) StructurePieces.
  6. PieceList -- This is a doubly-linked list of Pieces. They are linked in document order. A forward traversal of this list will reveal the entire content of the document; in doing so, it may wildly jump around both of the buffers, but that is not an issue.

  7. PX_ChangeRecord -- Each editing and formatting change is represented as a ChangeRecord. A ChangeRecord represents an atomic change that was made to one or more pieces. This includes offset/length changes to a TextPiece and changes to the PieceList.

  8. ChangeVector -- This is a vector of ChangeRecords. This is used like a stack. ChangeRecords are appended to the vector (pushed onto the stack) as they are created in response to editing and formatting commands. The undo operation takes the last ChangeRecord in the vector and un-does its effect. A redo operation re-applies the ChangeRecord. The ChangeVector holds the complete information to undo all editing back to the initial document. The index of the current position in the ChangeVector is maintained. ChangeRecords are not removed from the vector until the redo is invalidated. When a ChangeRecord is removed from the vector, it is deleted.
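To make the relationship between the two buffers and the PieceList concrete, here is a minimal sketch of a forward traversal that reconstructs the document text. The names Span, PieceTableSketch, and text() are hypothetical, only text pieces are modelled, and std::u32string stands in for a UT_UCSChar buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical text piece: which buffer, and where in that buffer.
struct Span {
    bool inInitialBuffer;
    std::size_t offset;
    std::size_t length;
};

struct PieceTableSketch {
    std::u32string initialBuffer;  // read-only content loaded from disk
    std::u32string changeBuffer;   // append-only content typed this session
    std::list<Span> pieces;        // pieces in document order

    // Walk the pieces in order; each one points into one of the two buffers.
    std::u32string text() const {
        std::u32string out;
        for (const Span& p : pieces) {
            const std::u32string& buf = p.inInitialBuffer ? initialBuffer : changeBuffer;
            assert(p.offset + p.length <= buf.size());
            out.append(buf, p.offset, p.length);
        }
        return out;
    }
};
```

For example, with an initial buffer of "Hello world", a change buffer of "big ", and the pieces (initial, 0, 6), (change, 0, 4), (initial, 6, 5), the traversal yields "Hello big world" even though neither buffer contains that string.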

1.3. Operations

  1. Insert(position,bAfter,c) -- To insert one or more characters c into the document (either before or after) the absolute document position position, we do the following:

    1. Append the character(s) to the ChangeBuffer.
    2. Find the TextPiece that spans the document position.
      • If the document position is in the middle of a TextPiece (p1), we split it into two TextPieces (p1a and p1c) and create a third TextPiece (p1b). p1a and p1c contain the left and right portions referenced in p1; p1b spans the newly-inserted character(s). The PieceList is updated so that the sequence p1a,p1b,p1c replaces p1 in the list.
      • If the document position is at the end of a TextPiece and the buffer position in either buffer is contiguous with the buffer and position referenced in the TextPiece and the formatting is the same, we may avoid the three part split and simply update the offset/length in the TextPiece. This case is very likely when the user is composing text or is undoing a delete.
      • If the document position is between Pieces, a new TextPiece is created and inserted into the PieceList.
    3. Create a ChangeRecord and append it to the ChangeVector. For an insert, we construct a ChangeRecord of type InsertSpan.
      • cr.span.m_documentOffset contains the document position of the insertion.
      • cr.span.m_span marks the buffer position of the text that was inserted.
      • cr.span.m_bAfter remembers whether the insertion was before or after the document position.
  2. Delete(position,bAfter,length) -- To delete one or more characters from the document (either before or after) the absolute document position position, we do the following:

    1. Find the TextPiece that spans the document position.
      • If the length of characters is contained within the TextPiece (p1), we split it into two TextPieces (p1a and p1b). The offsets and lengths are set in the new TextPieces such that the deleted sequence is not in either piece. (The deleted text is not actually deleted from the buffer; there are just no references to it from the PieceList.)
      • If the document position is at the beginning or end of a TextPiece, we can just adjust the offset/length, rather than doing the split.
      • If the deletion extends over multiple Pieces, we iterate over each piece in the range and perform a delete on the sub-sequence. This will result in a multi-step ChangeRecord.
      • TODO what about non-TextPieces??
    2. Create a ChangeRecord and append it to the ChangeVector. For a delete, we construct a ChangeRecord of type DeleteSpan.
      • cr.span.m_documentOffset contains the document position of the deletion.
      • cr.span.m_span marks the buffer position of the text that was deleted.
      • cr.span.m_bAfter remembers whether the deletion was before or after the document position.
  3. InsertFormatting()
  4. ChangeFormatting()

  5. Undo -- This can be implemented using the information in the ChangeVector. If the CurrentPosition in the ChangeVector is greater than zero, we have undo information. The information in the ChangeRecord prior to the CurrentPosition is used to undo the editing operation. After an undo the CurrentPosition is decremented.

    • If the ChangeRecord is of type InsertSpan: we perform a delete operation using cr.span.m_documentOffset, cr.span.m_span.m_length, and cr.span.m_bAfter.

    • If the ChangeRecord is of type DeleteSpan: we perform an insert operation using cr.span.m_documentOffset, cr.span.m_span, and cr.span.m_bAfter.

    • If the ChangeRecord is of type ChangeFormatting:
    • If the ChangeRecord is of type InsertFormatting:
  6. Redo -- This can be implemented using the information in the ChangeVector. If the CurrentPosition in the ChangeVector is less than the length of the ChangeVector, the redo has not been invalidated and may be applied. The information in the ChangeRecord at the CurrentPosition provides complete information to describe the editing operation to be redone. After a redo the CurrentPosition is advanced.

  7. Autosave -- This can be implemented by periodically writing the ChangeBuffer, ChangeVector, and the ChangeAttrPropTable to temporary files. After a crash, the original document and the temporary files could be used to replay the editing operations and reconstruct the modified document.
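The Insert, Undo, and Redo steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not AbiWord's implementation: all names are hypothetical, only InsertSpan records are handled, the bAfter flag, formatting, Delete, and the "extend a contiguous piece" optimization are omitted, and std::string stands in for the character buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <utility>
#include <vector>

// All names are hypothetical; this is not the real pt_PieceTable.
struct Span { bool inInitial; std::size_t offset; std::size_t length; };
struct ChangeRecord { std::size_t docOffset; Span span; };  // InsertSpan only

class PieceTableSketch {
public:
    explicit PieceTableSketch(std::string initial) : initial_(std::move(initial)) {
        if (!initial_.empty()) pieces_.push_back({true, 0, initial_.size()});
    }

    // Insert s before document position pos.
    void insert(std::size_t pos, const std::string& s) {
        assert(!s.empty());
        Span span{false, change_.size(), s.size()};
        change_ += s;           // step 1: append to the ChangeBuffer
        insertSpan(pos, span);  // step 2: split if needed, update the PieceList
        // step 3: record the change; a new edit invalidates pending redos
        changes_.erase(changes_.begin() + static_cast<std::ptrdiff_t>(current_),
                       changes_.end());
        changes_.push_back({pos, span});
        current_ = changes_.size();
    }

    bool undo() {
        if (current_ == 0) return false;
        const ChangeRecord& cr = changes_[--current_];
        // The inserted piece starts exactly at docOffset, so nothing is split.
        auto it = pieceStartingAt(cr.docOffset);
        assert(it != pieces_.end());
        pieces_.erase(it);
        return true;
    }

    bool redo() {
        if (current_ == changes_.size()) return false;
        const ChangeRecord& cr = changes_[current_++];
        insertSpan(cr.docOffset, cr.span);  // buffer text is still there
        return true;
    }

    std::string text() const {
        std::string out;
        for (const Span& p : pieces_)
            out.append(p.inInitial ? initial_ : change_, p.offset, p.length);
        return out;
    }

    std::size_t pieceCount() const { return pieces_.size(); }

private:
    using PieceList = std::list<Span>;

    // Returns the piece that starts at pos, splitting a piece when pos falls
    // in its middle. Returns end() when pos is the end of the document.
    PieceList::iterator pieceStartingAt(std::size_t pos) {
        std::size_t start = 0;
        for (auto it = pieces_.begin(); it != pieces_.end(); ++it) {
            if (pos == start) return it;
            if (pos < start + it->length) {
                const std::size_t left = pos - start;
                Span right{it->inInitial, it->offset + left, it->length - left};
                it->length = left;
                return pieces_.insert(std::next(it), right);
            }
            start += it->length;
        }
        assert(pos == start);  // pos must not lie past the end
        return pieces_.end();
    }

    void insertSpan(std::size_t pos, const Span& span) {
        pieces_.insert(pieceStartingAt(pos), span);
    }

    std::string initial_;
    std::string change_;
    PieceList pieces_;
    std::vector<ChangeRecord> changes_;
    std::size_t current_ = 0;  // index of the next record to redo
};
```

Inserting "big " at position 6 of "Hello world" splits the single initial piece into three. Undoing removes only the middle piece, so the text is restored while the PieceList keeps two pieces instead of one; the sketch deliberately does not coalesce them.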

1.4. Observations

  1. The content of the original file is never modified. Pieces in the PieceList describe the current document; the original content is referenced in a random access fashion. For systems with small memory or for very large documents, it may be worth demand loading blocks of the original content rather than loading it completely into the InitialBuffer.

  2. Document content data (in the two buffers) are never moved once written. insert and delete operations change the Pieces in the PieceList, but do not move or change the contents of the two buffers.

  3. The result of an undo operation must produce the identical document structure and content. Since consecutive Pieces in the PieceList may have the same formatting properties and may refer to contiguous buffer locations (there is no requirement to coalesce them), an undo operation may produce a different PieceList than we originally had prior to doing the operation that was undone.
    • TODO Check this. Whether the PieceList should be identical or equivalent.

1.5. Problems or Issues

  1. TextPieces represent spans of text that are convenient for the structure of the document and a result of the sequence of editing operations. They are not optimized for layout or display.

    • We can provide access methods to return a const char * into the buffers along with a length, which the caller could use in text drawing or measuring calls, but not c-style, zero-terminated strings.
  2. Mapping an absolute document position to a Piece involves a linear search of the PieceList to compute the absolute document position and find the correct Piece. The number of Pieces in a document is a function of the number of editing operations that have been performed in the session and of the complexity of the structure and formatting of the original document. A linear search might be painfully slow.

    • TODO We have a patch to use an rbtree instead of the doubly-linked list to give us O(log(n)) searching, but the memory consumption is still terrible and there is much improvement to be made. It is recommended that we optimize away the color and better yet, just switch to a fractal prefetching b+-tree. Other ideas include a topologically integrated mesh data structure, but the implementation and wrapping (for sanity's sake) of this would be quite a bit more work than the fpbtree.
    • TODO Consider caching the last few lookup results so that we can avoid doing a search if possible. This should have a high hit-rate when the user is composing text.
  3. We provide a complete, but first-order undo with redo. That is, we do not put the undo-operation in the undo (like emacs).

  4. TODO The before and after stuff on insert and delete is a bit of a hand-wave.

  5. TODO Need to add multi-step-undo so that delete operations which span multiple pieces can be represented as a single operation to the user.
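The lookup cache suggested in issue 2 could look roughly like the sketch below. It is only an illustration under simplifying assumptions: PieceLocator and its members are hypothetical names, pieces are reduced to their lengths, only the single most recent result is cached, and the caller is assumed to call invalidate() after every edit.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: maps a document position to the index of its piece.
class PieceLocator {
public:
    explicit PieceLocator(std::vector<std::size_t> lengths)
        : lengths_(std::move(lengths)) {}

    // Returns the index of the piece containing document position pos.
    std::size_t find(std::size_t pos) {
        // Fast path: pos falls inside the piece found by the last lookup.
        if (cacheValid_ && pos >= cachedStart_ &&
            pos < cachedStart_ + lengths_[cachedIndex_]) {
            ++hits_;
            return cachedIndex_;
        }
        // Slow path: linear scan, accumulating the start of each piece.
        std::size_t start = 0;
        for (std::size_t i = 0; i < lengths_.size(); ++i) {
            if (pos < start + lengths_[i]) {
                cacheValid_ = true;
                cachedIndex_ = i;
                cachedStart_ = start;
                return i;
            }
            start += lengths_[i];
        }
        assert(false && "position past end of document");
        return lengths_.size();
    }

    void invalidate() { cacheValid_ = false; }  // call after any edit
    std::size_t hits() const { return hits_; }

private:
    std::vector<std::size_t> lengths_;
    bool cacheValid_ = false;
    std::size_t cachedIndex_ = 0;
    std::size_t cachedStart_ = 0;
    std::size_t hits_ = 0;
};
```

While the user is composing text, consecutive lookups tend to land in the same piece, so the fast path avoids the scan. The cache does not change the worst-case cost, which is why the tree-based alternatives above remain relevant.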

1.6. Code

class PT_PieceTable
{
    const UT_UCSChar * m_InitialBuffer;
    const UT_UCSChar * m_ChangeBuffer;
    pt_PieceList * m_pieceList;
    pt_AttrPropTable m_InitialAttrPropTable;
    pt_AttrPropTable m_ChangeAttrPropTable;
    ...
};

class pt_Piece
{
    enum PieceType { TextPiece,
                     ObjectPiece,
                     StructurePiece };
    PieceType m_pieceType;
    <linked-list or tree pointers>
    ...
};

class pt_Span
{
    UT_Bool m_bInInitialBuffer;
    UT_uint32 m_offset;
    UT_uint32 m_length;
};

class pt_TextPiece : public pt_Piece
{
    pt_Span m_span;
    pt_AttrPropReference m_apr;
    ...
};

class pt_ObjectPiece : public pt_Piece
{
    ...
};

class pt_StructurePiece : public pt_Piece
{
    pt_AttrPropReference m_apr;
    ...
};

class pt_PieceList
{
    <container for linked-list or tree structure>
    ...
};

class pt_AttrPropReference
{
    UT_Bool m_bInInitialTable;
    UT_uint32 m_index;
    ...
};

class pt_AttrProp
{
    UT_HashTable * m_pAttributes;
    UT_HashTable * m_pProperties;
    ...
};

class pt_AttrPropTable
{
    UT_vector<pt_AttrProp *> m_Table;
    ...
};

class pt_ChangeRecord
{
    UT_Bool m_bMultiStepStart;
    UT_Bool m_bMultiStepEnd;
    enum ChangeType { InsertSpan,
                      DeleteSpan,
                      ChangeFormatting,
                      InsertFormatting,
                      ...
                    };
    struct {
        UT_uint32 m_documentOffset;
        UT_Bool m_bAfter;
        pt_Span m_span;
    } span;
    struct {
        UT_uint32 m_documentOffset1;
        UT_uint32 m_documentOffset2;
        pt_AttrPropReference m_apr;
    } fmt;
    ...
};

class pt_ChangeVector
{
    UT_vector m_vecChangeRecords;
    UT_uint32 m_undoPosition;
    ...
};

The 'text' directory contains the text-editing engine used by AbiWord and other AbiSuite apps. There is one subdirectory per module.

Directory fmt/xp (Formatter):

Contains formatting and layout code, including views.

Directory ptbl/xp (PieceTable):

Contains the editable document, implemented using piece tables.

Todo:
Finish it.

1. Generalities

This part contains all the importer and exporter code used by AbiWord. IE_Imp_* classes are the document importers. IE_Exp_* classes are exporters. IE_ImpGraphic_* classes are graphics importers.

Importers and exporters are also used for clipboard operations.

2. Importers

  • IE_Imp -- This is the base class for all WP importers.

  • IE_Imp_AbiWord_1 -- Imports version 1 (i.e., the current version) of AbiWord documents.

  • IE_Imp_Applix -- This is the importer for Applix Words documents.

  • IE_Imp_DocBook -- Importer for DocBook SGML documents.

  • IE_Imp_GraphicAsDocument -- Imports a graphic as an otherwise empty document containing that graphic. Uses the available IE_ImpGraphic_* importers.

  • IE_Imp_GZipAbiWord -- Imports gzip compressed AbiWord documents (.zabw)

  • IE_Imp_MsWord_97 -- Imports MS Word 97 documents using libwv.

  • IE_Imp_RTF -- This is the RTF importer.

  • IE_Imp_Text -- Plain text importer. Also handles non-ASCII text.

  • IE_Imp_WordPerfect -- Imports WordPerfect documents.

  • IE_Imp_XHTML -- Import valid XHTML documents.

  • IE_Imp_XML -- Generic XML importer. Used as a base class for all other XML work.

3. Graphic Importers

  • IE_ImpGraphic -- This is the base class for all graphics importers.

  • IE_ImpGraphic_JPEG -- This is the JPEG importer using jpeglib. Converts a JPEG image to a PNG image.

  • IE_ImpGraphic_PNG -- This is the PNG importer. Simply reads the PNG file.

  • IE_ImpGraphic_BMP -- This is the BMP importer. Converts a BMP file to PNG.

  • IE_ImpGraphic_WMF -- WMF Importer.

  • IE_ImpGraphic_SVG -- SVG Importer. Currently worthless.

4. Exporters

WP

The 'wp' directory contains all source code specific to AbiWord. There is one subdirectory per module.

Directory ap (AP)

Contains source code for application-specific portion of the cross-platform framework defined in src/af/xap and src/af/ev. This contains application key bindings, mouse bindings, menu layouts, and toolbar layouts. It contains the menu string tables. It contains the table of application functions to which events may be bound. It contains the code to manage the document window (rulers, scroll bars, and the actual document window itself).

Directory impexp (ImpExp)

Contains importers and exporters for various file formats.

Directory main (main)
Contains platform-specific source code for main().

Subdirectories below may have additional hierarchy to further break things down by module. However, eventually, source code should find itself in a directory which indicates the portability of the code within it. For example, cross-platform code should always be placed in a subdirectory called 'xp'. Win32-specific code should be in a subdirectory called 'win'.

Member AP_CocoaApp::getPrefsValueDirectory (bool bAppSpecific, const gchar *szKey, const gchar **pszValue) const
support meaningful return values.
Member AP_CocoaApp::getStringSet (void) const
This function should be inlined.
Member AP_CocoaApp::shutdown (void)
The return value should be fixed to check the return values of the functions it calls, and potentially handle errors. At a minimum, it should return false on errors.
Member AP_UnixApp::getPrefsValueDirectory (bool bAppSpecific, const gchar *szKey, const gchar **pszValue) const
support meaningful return values.
Member AP_UnixApp::getStringSet (void) const
This function should be inlined.
Member AP_UnixApp::pasteFromClipboard (PD_DocumentRange *pDocRange, bool bUseClipboard, bool bHonorFormatting=true)
Currently I have this set so that a ^v or Menu[Edit/Paste] will use the CLIPBOARD property and a MiddleMouseClick will use the PRIMARY property. This seems to be the "X11 way" (sigh). Consider having a preferences switch to allow ^v and Menu[Edit/Paste] to use the most recent property. This might be a nice way of unifying things, or it might not; it is probably an area for investigation or some usability testing.
Member AP_UnixApp::shutdown (void)
The return value should be fixed to check the return values of the functions it calls, and potentially handle errors. At a minimum, it should return false on errors.
Member AP_UnixToolbar_StyleCombo::getPangoAttrs (PD_Style *pStyle, PangoFontDescription *desc)
ROB parse more attributes like font-color, background-color
Member fp_CellContainer::setWidth (UT_sint32 iWidth)
Should force re-line-break operations on all blocks in the container
Member fp_VerticalContainer::insertContainerAfter (fp_Container *pNewContainer, fp_Container *pAfterContainer)
This function has been hacked to handle the case where pAfterContainer is NULL. That case should not happen. Bad callers should be identified and fixed, and this function should be cleaned up.
Member fp_VerticalContainer::setWidth (UT_sint32)
Should force re-line-break operations on all blocks in the container
Member FV_View::getEditableBounds (bool bEnd, PT_DocPosition &docPos, bool bOverride=false) const
speed this up by finding clever way to cache the size of the header/footer region so we can just subtract it off.
Member FV_View::isImageSelected (void) const
eventually make it faster by not fetching the image data ID.
Member GR_CairoGraphics::polygon (UT_RGBColor &c, UT_Point *pts, UT_uint32 nPoints)
Rob find out how to have this function used, and test.
Member IE_Imp_Applix::_applixNewPara (const char *buf, size_t len)
TODO handle the style and paragraph attributes.
Member IE_Imp_Applix::_applixPageBreak (const char *buf, size_t len)
TODO handle even/odd page, currently ignored by Abiword
Member IE_Imp_RTF::LoadPictData (PictFormat format, const char *image_name, struct RTFProps_ImageProps &imgProps, bool isBinary=false, long binaryLen=0)
TODO: We assume the data comes in hex. Check this assumption as we might have to handle binary data as well
Page ImpExp
Finish it.
Page Main Page
Add more class names / links to sources.
Page PieceTable
Add more class names / links to sources.
Member RTF_msword97_level::ParseLevelText (const std::string &szLevelText, const std::string &szLevelNumbers, UT_uint32 iLevel)
look up the parent label and be more precise about what is added by this label.
Class RTFHdrFtr
add right and left headers and footer. Not yet supported by AbiWord
Member UT_convert (const char *str, UT_sint32 len, const char *from_codeset, const char *to_codeset, UT_uint32 *bytes_read, UT_uint32 *bytes_written)
Check for out-of-memory allocations etc.
Member AP_CocoaApp::initialize (void)
This function is 136 lines - way too long. Needs to be refactored, to use a buzzword.
Member AP_UnixApp::initialize (bool has_display)
This function is 136 lines - way too long. Needs to be refactored, to use a buzzword.
Member fp_Run::_drawTextLine (UT_sint32, UT_sint32, UT_uint32, UT_uint32, UT_UCSChar *)
Currently, this does not detect whether it is on the screen or not, so it redraws way too often.
