Notes on Graphics Shaping Functions

Tomas Frydrych <tomasfrydrych@yahoo.co.uk>

Dec 4, 2004

Complex Scripts

Some languages require extra processing of text before it can be output on screen. This document addresses two particular issues, bidirectional reordering and shaping, and describes the AbiWord API for interfacing with external libraries that perform complex script processing.

Bidirectional reordering

Some languages are written from left to right (LTR), others from right to left (RTL). When the two are mixed together (bidirectional text), things get messy. In particular, certain characters (such as spaces and most punctuation) are not inherently LTR or RTL; their directionality depends on their context. The Unicode standard describes an algorithm (the Unicode Bidirectional Algorithm, UAX #9) for determining the order in which a given sequence of characters is displayed on screen.
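The final step of that algorithm (rule L2 of UAX #9) can be sketched as follows. The sketch assumes the embedding level of every character has already been resolved; in AbiWord, fribidi does that work. The function name visualOrder is hypothetical. It returns the logical indices of the characters in left-to-right display order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Given the resolved embedding level of each character (logical order),
// return the logical indices in visual (left-to-right display) order.
// Only rule L2 of UAX #9 is implemented; level resolution is assumed done.
std::vector<std::size_t> visualOrder(const std::vector<int>& levels)
{
    std::vector<std::size_t> order(levels.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        order[i] = i;
    if (levels.empty())
        return order;

    int highest = *std::max_element(levels.begin(), levels.end());
    int lowestOdd = highest + 1;
    for (int l : levels) {
        assert(l >= 0);
        if (l % 2 == 1 && l < lowestOdd)
            lowestOdd = l;
    }

    // From the highest level down to the lowest odd level, reverse every
    // maximal run of characters whose level is at least the current level.
    for (int level = highest; level >= lowestOdd; --level) {
        std::size_t i = 0;
        while (i < order.size()) {
            if (levels[order[i]] < level) {
                ++i;
                continue;
            }
            std::size_t j = i;
            while (j < order.size() && levels[order[j]] >= level)
                ++j;
            std::reverse(order.begin() + static_cast<std::ptrdiff_t>(i),
                         order.begin() + static_cast<std::ptrdiff_t>(j));
            i = j;
        }
    }
    return order;
}
```

Consider an RTL paragraph (level 1) containing a two-digit number (level 2) at logical positions 2 and 3. The levels {1, 1, 2, 2, 1} yield the visual order {4, 2, 3, 1, 0}: the run as a whole is reversed, but the number still reads left to right.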

Shaping

In some languages, such as Arabic, a character has a number of different shapes depending on the characters it connects to. Before text can be output on screen, the correct shape has to be selected for each character in the text; this process is called shaping.
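The sketch below is a much simplified illustration of contextual form selection. It makes several assumptions:

- There are only three joining classes (dual-joining, right-joining and non-joining), a subset of the Unicode Joining_Type property.
- Transparent marks and ligatures are ignored.
- The font lookups a real shaper performs are ignored. In OpenType fonts these are done through features such as init, medi, fina and isol.

All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified joining classes: a subset of the Unicode Joining_Type property.
enum class Joining { Dual, Right, None };   // D, R, U
enum class Form { Isolated, Initial, Medial, Final };

// Select the contextual form of each character (logical order) from its own
// joining class and those of its neighbours.
std::vector<Form> selectForms(const std::vector<Joining>& jt)
{
    std::vector<Form> forms(jt.size(), Form::Isolated);
    for (std::size_t i = 0; i < jt.size(); ++i) {
        // Joins the preceding character if that one can join forwards
        // (only dual-joining characters can) and this one joins backwards.
        bool joinsPrev = i > 0 && jt[i - 1] == Joining::Dual
                         && jt[i] != Joining::None;
        // Joins the following character if this one can join forwards
        // and the next one can join backwards.
        bool joinsNext = i + 1 < jt.size() && jt[i] == Joining::Dual
                         && jt[i + 1] != Joining::None;
        if (joinsPrev && joinsNext)
            forms[i] = Form::Medial;
        else if (joinsPrev)
            forms[i] = Form::Final;
        else if (joinsNext)
            forms[i] = Form::Initial;
    }
    return forms;
}
```

Take the word سلام (seen, lam, alef, meem), whose classes are dual, dual, right and dual. The sketch selects initial, medial, final and isolated forms. Alef cannot join the following meem, so the meem stands alone.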

The AbiWord layout engine implements bidirectional reordering of text according to the Unicode bidi algorithm (using the fribidi library). In addition, the base GR_Graphics class implements very basic shaping for complex scripts. However, the capabilities of this built-in shaper are extremely limited. It is only intended for platforms that do not provide a more sophisticated shaping engine; otherwise, the platform graphics class should make use of whatever shaping services the platform provides. To take advantage of those services, the platform graphics class needs to implement the virtual functions and abstract classes described in this document.

Shaping and bidirectional reordering are closely tied together. Processing complex text consists of three stages: itemisation, reordering and shaping.

Itemisation

Itemisation is the process in which the shaper divides a sequence of characters into subsequences (items) that have consistent properties for the purposes of reordering and shaping.
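A minimal itemiser might look like the sketch below. It assumes a hypothetical classify() callback that maps each character to the property of interest (script or direction, say). A hypothetical Item struct stands in for GR_Item. Real itemisers also have to decide how neutral characters such as spaces attach to neighbouring items, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an item: a run of characters sharing one
// property value.
struct Item {
    std::size_t offset;
    std::size_t length;
    int property;
};

// Divide text into maximal runs over which classify() returns the same value.
std::vector<Item> itemize(const std::u32string& text,
                          const std::function<int(char32_t)>& classify)
{
    std::vector<Item> items;
    for (std::size_t i = 0; i < text.size(); ++i) {
        int p = classify(text[i]);
        if (!items.empty() && items.back().property == p)
            ++items.back().length;      // extend the current item
        else
            items.push_back({i, 1, p}); // property changed: start a new item
    }
    return items;
}
```

Suppose classify() returns 1 for the Arabic block (U+0600 to U+06FF) and 0 otherwise. Then the text "ab", two Arabic letters, "c" yields three items: at offsets 0, 2 and 4, with lengths 2, 2 and 1.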

Reordering

Reordering is the process by which the items created in the itemisation stage are put into a sequence for direct output on the screen. It consists of two separate steps: reordering of the individual items, and reordering of the text within each item. The former is always done by the AbiWord layout engine, the latter by the shaper.

Shaping

Shaping is a process by which the Unicode text is converted into a sequence of glyphs to be output to the screen; this is done by the shaper.

The internal workings of a shaper vary from implementation to implementation. AbiWord's graphics class provides an abstract API that keeps the details of how shaping is achieved completely opaque to the AbiWord layout classes.

Shaping API Classes

GR_Item

This is an abstract class that describes an item of text and is passed to the shaper. Each platform needs to implement a derived class that holds the item information required by its specific shaper.

GR_Itemization

A wrapper class for a set of items, passed to the itemization function (see below).

GR_RenderInfo

An abstract class that encapsulates shaping information for an item of text. The contents of this class are produced by the shaper and are later passed to the rendering routines. Each platform has to implement a derived class holding whatever information it needs to pass between the shaper and the rendering (drawing) routines.

This class contains several virtual methods that allow rendered items to be split, appended and cut.

bool GR_RenderInfo::append(GR_RenderInfo &ri, bool bReverse)

The purpose of this method is to append the contents of ri to *this. bReverse is set if the items are in RTL order.

If the graphics class is not able to append items, this function should return false; this will force the text in question to be scheduled for reshaping.
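The sketch below shows one possible append() for a hypothetical render info that stores exactly one glyph per character, in visual (display) order. Under that assumption, the logically later text of an RTL item is displayed to its left. So bReverse decides whether the glyphs of ri are placed after or in front of the existing ones.

The bufferClean flag is also hypothetical. It stands for whatever condition tells an implementation that merging glyph buffers could differ from reshaping the joined text, for example when Arabic letters join across the boundary.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified render info: one glyph per character, glyphs
// stored in visual order (reversed relative to the text for RTL items).
struct SimpleRenderInfo {
    std::vector<unsigned> glyphs;
    bool bufferClean = true; // false if merging could change the shaping

    // Append the contents of ri (which logically follows *this) to *this.
    bool append(const SimpleRenderInfo& ri, bool bReverse)
    {
        if (!bufferClean || !ri.bufferClean)
            return false; // caller will schedule the text for reshaping
        if (bReverse)     // RTL: logically later text is displayed to the left
            glyphs.insert(glyphs.begin(), ri.glyphs.begin(), ri.glyphs.end());
        else
            glyphs.insert(glyphs.end(), ri.glyphs.begin(), ri.glyphs.end());
        return true;
    }
};
```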

bool cut(UT_uint32 offset, UT_uint32 len, bool bReverse)

This method should remove a segment of length len starting at offset offset from the item; bReverse is set if the item is in RTL order. If the function is not able to remove segments from the item, it should return false; this forces the text in question to be scheduled for reshaping.
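Assume a hypothetical render info that holds exactly one glyph per character, stored in visual order. Then cut() only has to map the logical range onto the stored glyphs before erasing them. More realistic cases, such as ligatures or reordered clusters, are exactly where returning false and letting the text be reshaped is the safer choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified render info: one glyph per character, glyphs
// stored in visual order (reversed relative to the text for RTL items).
struct SimpleRenderInfo {
    std::vector<unsigned> glyphs;

    // Remove len characters starting at the logical offset.
    bool cut(std::size_t offset, std::size_t len, bool bReverse)
    {
        const std::size_t n = glyphs.size();
        if (offset > n || len > n - offset)
            return false; // cannot cut: let the caller reshape instead
        // In RTL visual storage, logical [offset, offset + len) occupies
        // visual [n - offset - len, n - offset).
        const std::size_t start = bReverse ? n - offset - len : offset;
        auto first = glyphs.begin() + static_cast<std::ptrdiff_t>(start);
        glyphs.erase(first, first + static_cast<std::ptrdiff_t>(len));
        return true;
    }
};
```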

bool split (GR_RenderInfo *&pri, bool bReverse)

This function should split the item represented by *this into two parts and return the second part in pri; bReverse indicates that the item is in RTL order; the offset of the split is set in this->m_iOffset.

If the function cannot split the item, it should indicate this by returning false. Even then, if pri == NULL, the function has to allocate a new instance of its GR_*RenderInfo class and set pri to it. In all cases, the function has to make a copy of this->m_pItem in pri->m_pItem.
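Here is a sketch of split() for a hypothetical render info (one glyph per character, stored in visual order). A hypothetical Item struct stands in for GR_Item. The sketch follows the rules stated for split(): pri is allocated if it is NULL, and it receives its own copy of the item even when the split fails. Ownership of a newly allocated pri passes to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for GR_Item.
struct Item {
    int script = 0;
};

// Hypothetical simplified render info: one glyph per character, glyphs
// stored in visual order (reversed relative to the text for RTL items).
struct SimpleRenderInfo {
    std::vector<unsigned> glyphs;
    std::size_t m_iOffset = 0;     // logical offset at which to split
    std::unique_ptr<Item> m_pItem;
    bool bufferClean = true;       // false if the glyphs cannot be split safely

    bool split(SimpleRenderInfo*& pri, bool bReverse)
    {
        // Whether or not the split succeeds, pri must exist and own a copy
        // of the item.
        if (pri == nullptr)
            pri = new SimpleRenderInfo;
        if (m_pItem)
            pri->m_pItem = std::make_unique<Item>(*m_pItem);
        else
            pri->m_pItem.reset();

        const std::size_t n = glyphs.size();
        if (!bufferClean || m_iOffset > n)
            return false; // the text will be scheduled for reshaping

        // The second part holds logical characters [m_iOffset, n); in RTL
        // visual storage these are the first n - m_iOffset glyphs.
        const std::size_t visualSplit = bReverse ? n - m_iOffset : m_iOffset;
        auto mid = glyphs.begin() + static_cast<std::ptrdiff_t>(visualSplit);
        if (bReverse) {
            pri->glyphs.assign(glyphs.begin(), mid);
            glyphs.erase(glyphs.begin(), mid);
        } else {
            pri->glyphs.assign(mid, glyphs.end());
            glyphs.erase(mid, glyphs.end());
        }
        return true;
    }
};
```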

GR_ShapingInfo

This class encapsulates information that is passed down to the shaper. It is not abstract and should not be overridden by platform versions. Among other things, it holds a pointer to GR_Item.

GR_Graphics complex scripts-related virtual methods

bool itemize(UT_TextIterator & text, GR_Itemization & I);

This method has to produce a set of items describing the Unicode text represented by the text iterator.

bool shape(GR_ShapingInfo & si, GR_RenderInfo *& ri);

This method produces a platform-specific instance of GR_*RenderInfo for the text represented by GR_ShapingInfo si.

void renderChars(GR_RenderInfo & ri);

This is the function responsible for drawing the text represented by GR_RenderInfo ri on screen. Note that renderChars() can be asked to output only a segment of the text represented by ri (see prepareToRenderChars()).

void prepareToRenderChars(GR_RenderInfo & ri);

This function is called immediately before renderChars() and is passed the same GR_RenderInfo ri. Its purpose is to do any pre-processing that renderChars() might require. For example, renderChars() might be called inside a loop and asked to draw the text represented by ri in several segments, as when we draw a selection and need to change the colour of the text and the background. It might be possible and desirable to take some processing outside that loop and do it only once for the whole text represented by ri. Such processing should take place in prepareToRenderChars(), which will be called outside the loop.
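The division of labour between prepareToRenderChars() and renderChars() can be sketched with a hypothetical SegmentRenderer class:

- prepare() converts per-character advances into absolute x positions once.
- renderSegment(), which stands in for drawing, can then be called for several segments without repeating that work. The segments might be the selected and unselected parts of a run, for instance.

Returning positions instead of drawing is purely to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class SegmentRenderer {
public:
    explicit SegmentRenderer(std::vector<int> advances)
        : m_advances(std::move(advances)) {}

    // Called once per item, before any segment is drawn: turn advances into
    // absolute x positions so segments need not re-walk the item.
    void prepare()
    {
        m_x.resize(m_advances.size());
        int x = 0;
        for (std::size_t i = 0; i < m_advances.size(); ++i) {
            m_x[i] = x;
            x += m_advances[i];
        }
        m_prepared = true;
    }

    // Called repeatedly, e.g. once per differently coloured segment.
    // Instead of drawing, returns the x position of each glyph in the segment.
    std::vector<int> renderSegment(std::size_t offset, std::size_t len) const
    {
        assert(m_prepared && offset + len <= m_x.size());
        auto first = m_x.begin() + static_cast<std::ptrdiff_t>(offset);
        return std::vector<int>(first, first + static_cast<std::ptrdiff_t>(len));
    }

private:
    std::vector<int> m_advances;
    std::vector<int> m_x;
    bool m_prepared = false;
};
```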

void appendRenderedCharsToBuff(GR_RenderInfo & ri, UT_GrowBuf & buf) const;

This function is included for legacy reasons, and a working implementation does not have to be provided. However, a platform that does not provide a working implementation should provide a dummy one containing a not-implemented assert.

void measureRenderedCharWidths(GR_RenderInfo & ri);

This function measures the (shaped) text represented by GR_RenderInfo ri. What exactly this involves, and how the text metrics are stored, is up to the platform implementation. The actual metrics of the text are obtained by subsequent calls to getTextWidth(), described below.

bool canBreak(GR_RenderInfo & ri, UT_sint32 &iNext, bool bAfter);

Determines whether text can legally be broken (for line-breaking purposes) at a given position. ri.m_pText contains an iterator that is positioned at the start of the item represented by ri. Its upper limit is set so that it cannot advance beyond the end of the item. The offset of the character at which a break is queried is in ri.m_iOffset, and ri.m_iLength contains the length of the item (in characters). bAfter indicates whether the caller is asking about a break before or after the given character.

If a break is not possible at the requested position, the function may tell the caller where the next legal break point in the item is by setting iNext (as an offset from the start of the item). If the function does not know where the next break point is, it should set iNext to -1.
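The iNext protocol can be sketched under several assumptions:

- A hypothetical free function replaces the GR_Graphics method, and a plain std::u32string replaces the text iterator.
- A toy rule permits a break only after a space. Real implementations follow the Unicode Line Breaking Algorithm (UAX #14) or the platform's equivalent.
- iNext is taken to be the offset of the next character after which a break is legal, in line with AbiWord's right-edge convention.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for canBreak(): text is the item's text, offset the queried
// character, bAfter selects a break after (true) or before (false) it.
bool canBreak(const std::u32string& text, int offset, int& iNext, bool bAfter)
{
    const int n = static_cast<int>(text.size());
    assert(offset >= 0 && offset < n);
    // A break before character i is the same as a break after character i - 1.
    const int after = bAfter ? offset : offset - 1;
    if (after >= 0 && text[after] == U' ')
        return true;

    // Not possible here: report the next character after which we may break.
    iNext = -1;
    for (int i = after + 1; i < n; ++i) {
        if (text[i] == U' ') {
            iNext = i;
            break;
        }
    }
    return false;
}
```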

bool needsSpecialCaretPositioning(GR_RenderInfo & ri);

This function indicates whether the item represented by ri requires special positioning of the caret (in some languages, such as Thai, not all caret positions are valid).

UT_uint32 adjustCaretPosition(GR_RenderInfo & ri, bool bForward);

This function should adjust the caret position if the given script restricts where the caret can be placed. The caller has to:

- set the initial position of the caret within the run in ri.m_iOffset;
- set the overall length of the item in ri.m_iLength;
- provide a text iterator over the text of the run in ri.m_pText.

bForward indicates whether the caret is moving forward or backward. The return value is the adjusted offset.
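Assume the cluster structure of the item has already been computed into a hypothetical clusterStart vector, which is true where a character begins a cluster. Adjusting the caret then amounts to moving to the nearest valid position in the direction of travel. The end of the item is treated as always valid. The function below is a hypothetical simplification of adjustCaretPosition().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// clusterStart[i] is true if a caret may be placed before character i.
std::uint32_t adjustCaretPosition(const std::vector<bool>& clusterStart,
                                  std::uint32_t offset, bool bForward)
{
    const std::uint32_t len = static_cast<std::uint32_t>(clusterStart.size());
    assert(offset <= len);
    auto valid = [&](std::uint32_t i) { return i == len || clusterStart[i]; };
    if (bForward) {
        while (!valid(offset))
            ++offset;          // stops at len at the latest
    } else {
        while (offset > 0 && !valid(offset))
            --offset;
    }
    return offset;
}
```

Take clusters covering offsets [0,2), [2,3) and [3,5). A caret requested at offset 1 moves to 2 when travelling forward and to 0 when travelling backward.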

void adjustDeletePosition(GR_RenderInfo & ri);

This function adjusts the position of a delete if the given script restricts deletion to character clusters. The caller has to:

- set the initial position within the item where the deletion is to happen in ri.m_iOffset;
- set the overall length to be deleted in ri.m_iLength;
- provide a text iterator over the text of the run in ri.m_pText.

On return, ri.m_iOffset should contain the adjusted (item-relative) position and ri.m_iLength the adjusted length of the delete. For example, the language rules might require that a delete of 3 characters at offset 2 in the item results in an actual delete of 4 characters starting at offset 1.
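The sketch below widens the delete range to whole clusters. It uses a hypothetical clusterStart vector (true where a character begins a cluster) in place of the text iterator. Take clusters [0,1), [1,3), [3,5) and [5,6). The example in the text (3 characters at offset 2) then becomes 4 characters at offset 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Widen [offset, offset + length) so that it covers whole clusters.
void adjustDeletePosition(const std::vector<bool>& clusterStart,
                          std::uint32_t& offset, std::uint32_t& length)
{
    const std::uint32_t n = static_cast<std::uint32_t>(clusterStart.size());
    assert(offset + length <= n);
    std::uint32_t start = offset;
    std::uint32_t end = offset + length;
    while (start > 0 && !clusterStart[start])
        --start;            // extend back to the start of the cluster
    while (end < n && !clusterStart[end])
        ++end;              // extend forward to the next cluster boundary
    offset = start;
    length = end - start;
}
```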

bool nativeBreakInfoForRightEdge();

AbiWord's line breaking was designed to look for breaks at the right edge of a character, i.e., the character at which the break occurs is included in the left part of the split run. The Uniscribe library, however, holds breaking information for the left edge. For performance reasons, it is sometimes useful to know which system we are dealing with.
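Converting between the two conventions is simple within an item. However, the right-edge value for the last character depends on the first character of the following item, which is one reason why knowing the native convention matters. A sketch, with hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// breakBefore[i]: a line may break before character i (left-edge info).
// Result[i]: a line may break after character i (right-edge info).
// breakAfterLast has to come from the first character of the next item.
std::vector<bool> leftEdgeToRightEdge(const std::vector<bool>& breakBefore,
                                      bool breakAfterLast)
{
    std::vector<bool> breakAfter(breakBefore.size(), false);
    for (std::size_t i = 0; i + 1 < breakBefore.size(); ++i)
        breakAfter[i] = breakBefore[i + 1];
    if (!breakAfter.empty())
        breakAfter.back() = breakAfterLast;
    return breakAfter;
}
```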

UT_sint32 resetJustification(GR_RenderInfo & ri, bool bPermanent);

If the character advances in this run have been adjusted due to justification, a call to this function should reset them to the regular character advances. bPermanent indicates whether the reset is temporary, for the benefit of the line-breaking routines, or permanent. Whether the function handles the two cases differently depends on what the reset involves.

UT_sint32 countJustificationPoints(const GR_RenderInfo & ri) const;

Determine the number of points among which any justification amount can be divided. This function has to take ri.m_bLastOnLine into account; if it is set, any justification points after the last meaningful character in the item are to be disregarded.

void justify(GR_RenderInfo & ri);

Adjust character advances using ri.m_iJustificationPoints and ri.m_iJustificationAmount.
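The sketch below shows one simple scheme for the three justification functions, using a hypothetical JustifiedRun struct:

- Spaces are the justification points.
- Trailing spaces are disregarded when the run is last on the line.
- The amount is divided evenly, with any remainder going to the first points.

This is an illustrative assumption. How a platform chooses and weights justification points (for instance, kashida insertion in Arabic) is up to its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical run: one advance per character.
struct JustifiedRun {
    std::u32string text;
    std::vector<int> baseAdvances;  // regular (unjustified) advances
    std::vector<int> advances;      // advances currently in use
    bool lastOnLine = false;

    // Characters in [0, end) may carry justification points; trailing
    // spaces are excluded when the run is last on the line.
    std::size_t pointsEnd() const
    {
        std::size_t end = text.size();
        if (lastOnLine) {
            while (end > 0 && text[end - 1] == U' ')
                --end;
        }
        return end;
    }

    int countJustificationPoints() const
    {
        const std::size_t end = pointsEnd();
        int points = 0;
        for (std::size_t i = 0; i < end; ++i)
            if (text[i] == U' ')
                ++points;
        return points;
    }

    // Spread amount over the points, starting from the regular advances.
    void justify(int points, int amount)
    {
        assert(baseAdvances.size() == text.size() && points >= 0 && amount >= 0);
        advances = baseAdvances;
        if (points == 0)
            return;
        const std::size_t end = pointsEnd();
        const int share = amount / points;
        int extra = amount % points;    // the first 'extra' points get 1 more
        for (std::size_t i = 0; i < end; ++i) {
            if (text[i] != U' ')
                continue;
            advances[i] += share + (extra > 0 ? 1 : 0);
            if (extra > 0)
                --extra;
        }
    }

    // Back to the regular advances; this sketch does not distinguish
    // temporary from permanent resets.
    void resetJustification() { advances = baseAdvances; }
};
```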

UT_uint32 XYToPosition(const GR_RenderInfo & ri, UT_sint32 x, UT_sint32 y) const;

Translate point <x,y> (in layout units relative to the screen coords of the item) to a position in the item (return value is the offset into the item, relative to the start of the item).

void positionToXY(const GR_RenderInfo & ri, UT_sint32& x, UT_sint32& y, UT_sint32& x2, UT_sint32& y2, UT_sint32& height, bool& bDirection) const;

Translate character offset in ri.m_iOffset (relative to the start of the item) to coords (in layout units, relative to screen position of the item).
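For a single line of text, both conversions reduce to summing character advances. The hypothetical helpers below make several simplifications:

- They ignore y and the extra outputs of positionToXY().
- They assume one advance per character, in logical order.
- They measure x from the left edge of the item. For an RTL item, the first logical character is drawn at the right edge.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Offset -> x coordinate of the caret placed before that character.
int positionToX(const std::vector<int>& adv, std::uint32_t offset, bool rtl)
{
    assert(offset <= adv.size());
    int before = 0, total = 0;
    for (std::uint32_t i = 0; i < adv.size(); ++i) {
        if (i < offset)
            before += adv[i];
        total += adv[i];
    }
    return rtl ? total - before : before;
}

// x coordinate -> offset of the nearest caret position.
std::uint32_t xToPosition(const std::vector<int>& adv, int x, bool rtl)
{
    int total = 0;
    for (int a : adv)
        total += a;
    if (rtl)
        x = total - x;  // measure from the logical start (the right edge)
    int edge = 0;
    for (std::uint32_t i = 0; i < adv.size(); ++i) {
        // A point in the first half of a character maps to before it.
        if (x < edge + adv[i] / 2)
            return i;
        edge += adv[i];
    }
    return static_cast<std::uint32_t>(adv.size());
}
```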

UT_sint32 getTextWidth(GR_RenderInfo & ri);

Measure the width of a segment of text from the item represented by ri, starting at offset ri.m_iOffset and of length ri.m_iLength.
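If measureRenderedCharWidths() caches one advance per character, getTextWidth() becomes a sum over that cache. The MeasuredItem class and its glyphAdvance callback below are hypothetical, and one glyph per character is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

class MeasuredItem {
public:
    // Stand-in for measureRenderedCharWidths(): cache one advance per glyph.
    void measure(const std::vector<unsigned>& glyphs,
                 const std::function<int(unsigned)>& glyphAdvance)
    {
        m_advances.clear();
        for (unsigned g : glyphs)
            m_advances.push_back(glyphAdvance(g));
    }

    // Stand-in for getTextWidth(): width of [offset, offset + length).
    int textWidth(std::size_t offset, std::size_t length) const
    {
        assert(offset + length <= m_advances.size());
        auto first = m_advances.begin() + static_cast<std::ptrdiff_t>(offset);
        return std::accumulate(first, first + static_cast<std::ptrdiff_t>(length), 0);
    }

private:
    std::vector<int> m_advances;
};
```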

const UT_VersionInfo & getVersion() const;

This function needs only to be implemented by graphics classes that are plugins.

void drawChars();

When the graphics class provides a renderChars() method, the drawChars() method is only used to draw static text, and hence its performance is less critical. The drawChars() method has to carry out the entire itemize/reorder/shape process on the Unicode text passed to it.