AbiSource File Import

Version 1.0

Jeff Hostetler

jeff@abisource.com

AbiSource, Inc.

1. Introduction

AbiWord can import documents in a variety of file formats. This document gives an overview of the import mechanism and explains how to add support for a new format.

2. class IE_Imp

The code for our current set of importers can be found in abi/src/wp/impexp/xp.

Class IE_Imp is the base class from which all file importers are derived.

Class IE_Imp also defines methods used by the file-open code and the file-open dialog. These include methods to enumerate the set of derived classes and to instantiate a specific derived class by name.
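The following is a minimal, self-contained C++ sketch of the pattern described above: an abstract importer base class plus a registry that can enumerate the available importers and instantiate one by name. It is not AbiWord's code; Importer, PlainTextImporter, ImporterRegistry and their methods are hypothetical stand-ins for the roles played by IE_Imp and its helper methods.

```cpp
// Hypothetical sketch: names and signatures are illustrative, not AbiWord's API.
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the abstract importer base class (the role IE_Imp plays).
class Importer {
public:
    virtual ~Importer() = default;
    // Parse the given file contents; returns true on success.
    virtual bool importBuffer(const std::string& contents) = 0;
    virtual std::string name() const = 0;
};

// A trivial derived importer used only to exercise the registry.
class PlainTextImporter : public Importer {
public:
    bool importBuffer(const std::string& contents) override {
        text_ = contents;  // a real importer would build a document here
        return true;
    }
    std::string name() const override { return "Text"; }

private:
    std::string text_;
};

// Maps an importer name to a factory that creates a fresh instance.
class ImporterRegistry {
public:
    using Factory = std::function<std::unique_ptr<Importer>()>;

    void add(const std::string& name, Factory factory) {
        assert(factory && "factory must be callable");
        factories_[name] = std::move(factory);
    }

    // Enumerate registered importer names (e.g. to fill a file-open dialog).
    std::vector<std::string> names() const {
        std::vector<std::string> out;
        for (const auto& entry : factories_) out.push_back(entry.first);
        return out;
    }

    // Instantiate a specific importer by name; returns nullptr if unknown.
    std::unique_ptr<Importer> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;  // kept sorted by name
};
```

Using factories rather than stored instances means every file open gets its own importer object, so no parsing state leaks between documents.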

3. Adding a New File Type

Currently, we have also defined a class called IE_ImpSniffer. Among other things, a sniffer creates new instances of the corresponding IE_Imp subclass (i.e., the actual importer). Its duties are:

[1] Importer registration

[2] Automatic recognition of a file’s contents

[3] Automatic recognition of a file’s suffix

[4] Creating a new importer object
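To make the four duties concrete, here is a hedged C++ sketch of a sniffer interface. All names (Sniffer, HtmlSniffer, registerSniffer, importerFor) are hypothetical and chosen for illustration; the real IE_ImpSniffer methods and the registration code may differ. The lookup prefers a content match and falls back to the suffix, which is one reasonable policy, not necessarily AbiWord's.

```cpp
// Hypothetical sketch of a sniffer; not AbiWord's actual IE_ImpSniffer API.
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Importer {
public:
    virtual ~Importer() = default;
    virtual std::string formatName() const = 0;
};

class HtmlImporter : public Importer {
public:
    std::string formatName() const override { return "HTML"; }
};

// One method per duty [2]-[4]; duty [1] is handled by registerSniffer().
class Sniffer {
public:
    virtual ~Sniffer() = default;
    virtual bool recognizeContents(const std::string& head) const = 0;  // [2]
    virtual bool recognizeSuffix(const std::string& suffix) const = 0;  // [3]
    virtual std::unique_ptr<Importer> constructImporter() const = 0;    // [4]
};

class HtmlSniffer : public Sniffer {
public:
    bool recognizeContents(const std::string& head) const override {
        // Crude check: look for an <html tag in the first bytes of the file.
        return head.find("<html") != std::string::npos ||
               head.find("<HTML") != std::string::npos;
    }
    bool recognizeSuffix(const std::string& suffix) const override {
        return suffix == ".html" || suffix == ".htm";
    }
    std::unique_ptr<Importer> constructImporter() const override {
        return std::unique_ptr<Importer>(new HtmlImporter());
    }
};

// Duty [1]: registration. Sniffers are kept in a list and consulted in order.
std::vector<std::unique_ptr<Sniffer>>& snifferList() {
    static std::vector<std::unique_ptr<Sniffer>> list;
    return list;
}

void registerSniffer(std::unique_ptr<Sniffer> s) {
    assert(s != nullptr);
    snifferList().push_back(std::move(s));
}

// Prefer a content match; fall back to the file-name suffix.
std::unique_ptr<Importer> importerFor(const std::string& head,
                                      const std::string& suffix) {
    for (const auto& s : snifferList())
        if (s->recognizeContents(head)) return s->constructImporter();
    for (const auto& s : snifferList())
        if (s->recognizeSuffix(suffix)) return s->constructImporter();
    return nullptr;  // no registered sniffer claimed the file
}
```

Checking contents before the suffix lets a misnamed file still open with the right importer, while the suffix check covers formats that are hard to recognize from their first bytes.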

Here are the steps to add support for a new file type:

[1] Create a new sniffer class and register it in the ie_impexp_Register.cpp file.

[2] Clone one of the existing importers and start hacking.

[3] As a first step, I recommend extracting only the text of the document, without worrying about formatting. The most trivial document contains one section, one block (paragraph), and some content (one or more spans), so you should be able to get something on the screen with nothing more than this:

m_pDocument->appendStrux(PTX_Section, NULL);
m_pDocument->appendStrux(PTX_Block, NULL);
m_pDocument->appendSpan(data, length);
...

You can make as many calls to appendSpan() within a block as you need.

[4] Once that works, you can start adding section, block, and span formatting. Section and block attributes go in the second argument of appendStrux(); span attributes are set with the appendFmt() method. Formatting is expressed as a series of CSS2-like name-value pairs. For details, see AbiWord_DocumentFormat.abw and the source of the other importers.
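Steps [3] and [4] can be exercised outside AbiWord with a mock document. In this sketch, MockDocument, importText and the simplified signatures of appendStrux(), appendFmt() and appendSpan() are assumptions: the real methods use AbiWord's own character and attribute types. It also assumes that attributes are passed as a NULL-terminated array of alternating name/value strings, with the CSS2-like properties carried in a "props" value; confirm this against AbiWord_DocumentFormat.abw and the existing importers.

```cpp
// Hypothetical mock: signatures are simplified assumptions, not AbiWord's API.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for the structure types named in the text.
enum PTStruxType { PTX_Section, PTX_Block };

// Records every call so an importer's output can be inspected.
// Assumed attribute layout: {name, value, name, value, ..., nullptr}.
class MockDocument {
public:
    bool appendStrux(PTStruxType type, const char** attributes) {
        log_.push_back(std::string(type == PTX_Section ? "section" : "block") +
                       describe(attributes));
        return true;
    }
    bool appendFmt(const char** attributes) {
        log_.push_back("fmt" + describe(attributes));
        return true;
    }
    bool appendSpan(const char* data, unsigned length) {
        log_.push_back("span:" + std::string(data, length));
        return true;
    }
    const std::vector<std::string>& log() const { return log_; }

private:
    static std::string describe(const char** attributes) {
        std::string out;
        for (int i = 0; attributes && attributes[i]; i += 2) {
            assert(attributes[i + 1] != nullptr && "attributes come in pairs");
            out += " " + std::string(attributes[i]) + "=" + attributes[i + 1];
        }
        return out;
    }
    std::vector<std::string> log_;
};

// Hypothetical minimal importer: one section, one block per input line,
// with the first line set in bold via an assumed "props" attribute.
void importText(MockDocument& doc, const std::string& text) {
    doc.appendStrux(PTX_Section, nullptr);
    const char* bold[] = {"props", "font-weight:bold", nullptr};
    const char* plain[] = {"props", "font-weight:normal", nullptr};
    std::size_t start = 0;
    bool first = true;
    while (start <= text.size()) {
        std::size_t end = text.find('\n', start);
        if (end == std::string::npos) end = text.size();
        doc.appendStrux(PTX_Block, nullptr);
        doc.appendFmt(first ? bold : plain);  // formatting for the spans that follow
        if (end > start)  // skip empty spans for blank lines
            doc.appendSpan(text.data() + start, static_cast<unsigned>(end - start));
        first = false;
        start = end + 1;
    }
}
```

A mock like this lets you check the sequence of structure, format and span calls before the importer is wired into the file-open path; dropping the appendFmt() call reduces it to the plain-text starting point from step [3].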