$Id$

AbiSource File Import

Version 1.0

Copyright (C) 1999 AbiSource, Inc., All Rights Reserved.

Jeff Hostetler

jeff@abisource.com

AbiSource, Inc.

$Date$

1. Introduction

AbiWord can import documents in several different file formats. This document provides an overview of the import mechanism and instructions for adding support for a new format.

2. class IE_Imp

The code for our current set of importers can be found in abi/src/wp/impexp/xp.

Class IE_Imp is the base class from which all file importers are derived.

Class IE_Imp also defines some static tables and methods which the actual file-open and file-open-dialog code will use. This includes code to enumerate through the set of derived classes and to instantiate a specific derived class by name.
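The table-plus-static-methods arrangement described above can be sketched roughly as follows. This is not the actual IE_Imp code: SketchImp, SketchImpEntry, s_sketchImpTable, sketchEnumerateName() and sketchConstructByName() are all hypothetical names invented for illustration, and the real table entries carry more than a name and a factory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for IE_Imp; the real base class has a different interface.
class SketchImp {
public:
    virtual ~SketchImp() {}
    virtual std::string formatName() const = 0;
};

class SketchImp_Text : public SketchImp {
public:
    std::string formatName() const override { return "Text"; }
    static SketchImp* create() { return new SketchImp_Text; }
};

class SketchImp_RTF : public SketchImp {
public:
    std::string formatName() const override { return "RTF"; }
    static SketchImp* create() { return new SketchImp_RTF; }
};

// One row per derived class: a name plus a factory function.
struct SketchImpEntry {
    const char* name;
    SketchImp* (*create)();
};

// Static table; its order is the enumeration order.
const SketchImpEntry s_sketchImpTable[] = {
    { "Text", SketchImp_Text::create },
    { "RTF",  SketchImp_RTF::create  },
};
const std::size_t s_sketchImpCount =
    sizeof(s_sketchImpTable) / sizeof(s_sketchImpTable[0]);

// Enumerate: name of the n-th registered importer, or nullptr past the end.
const char* sketchEnumerateName(std::size_t n) {
    return (n < s_sketchImpCount) ? s_sketchImpTable[n].name : nullptr;
}

// Instantiate a derived class by name; empty pointer if the name is unknown.
std::unique_ptr<SketchImp> sketchConstructByName(const char* name) {
    assert(name != nullptr);
    for (std::size_t i = 0; i < s_sketchImpCount; ++i) {
        if (std::strcmp(s_sketchImpTable[i].name, name) == 0)
            return std::unique_ptr<SketchImp>(s_sketchImpTable[i].create());
    }
    return nullptr;
}
```

Keeping the table static means the file-open code never needs to know the derived classes by type; it only walks the table.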

3. Adding a New File Type

Here are the steps to add support for a new file type:

[1] In abi/src/wp/impexp/xp/ie_Imp.cpp: add a DeclareImporter() entry in the s_impTable[].

This table defines the enumeration order for things like the suffix list in the FileOpen and FileSaveAs dialogs, so watch where you add the entry.

[2] Currently the code in IE_Imp::constructImporter() is quite simplistic: it determines the type of document solely from the file's suffix. We hope to improve this later. Until then, you will need to choose a unique suffix for your importer.

[3] Clone one of the existing importers and start hacking.

[4] Each importer must define three static methods so that IE_Imp can enumerate the importers and delegate work to them: RecognizeSuffix(), StaticConstructor(), and GetDlgLabels(). The first is used in the suffix test described in [2]. The second creates an instance of the importer. The third is used by the FileOpen and FileSaveAs dialogs to populate the FileType combo box (on the platforms that have one).
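As a rough illustration of step [4], here is what the three static methods might look like for an invented ".foo" format. The method names come from this document, but the parameter lists and return types shown are assumptions, as are SketchImp, SketchImp_Foo, SketchImpEntry and sketchConstructForSuffix(). Copy the real signatures from one of the existing importers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for IE_Imp.
class SketchImp {
public:
    virtual ~SketchImp() {}
};

// Hypothetical importer for an invented ".foo" format.
class SketchImp_Foo : public SketchImp {
public:
    // Suffix test: true when szSuffix is ".foo", ignoring case.
    static bool RecognizeSuffix(const char* szSuffix) {
        assert(szSuffix != nullptr);
        const std::string want = ".foo";
        const std::string got = szSuffix;
        if (got.size() != want.size()) return false;
        for (std::size_t i = 0; i < want.size(); ++i) {
            if (std::tolower(static_cast<unsigned char>(got[i])) != want[i])
                return false;
        }
        return true;
    }

    // Factory: hands back a new instance through an out-parameter.
    static bool StaticConstructor(SketchImp** ppImp) {
        assert(ppImp != nullptr);
        *ppImp = new SketchImp_Foo;
        return true;
    }

    // Dialog labels: a description and a suffix pattern for the FileType combo box.
    static bool GetDlgLabels(const char** pszDesc, const char** pszSuffixList) {
        assert(pszDesc != nullptr && pszSuffixList != nullptr);
        *pszDesc = "Foo Document (.foo)";
        *pszSuffixList = "*.foo";
        return true;
    }
};

// What a table row might hold: one pointer per static method (assumed layout).
struct SketchImpEntry {
    bool (*recognizeSuffix)(const char*);
    bool (*staticConstructor)(SketchImp**);
    bool (*getDlgLabels)(const char**, const char**);
};

const SketchImpEntry s_sketchTable[] = {
    { SketchImp_Foo::RecognizeSuffix,
      SketchImp_Foo::StaticConstructor,
      SketchImp_Foo::GetDlgLabels },
};

// Suffix-based dispatch: the first row whose RecognizeSuffix() accepts the
// suffix constructs the importer. The caller owns the returned object.
SketchImp* sketchConstructForSuffix(const char* szSuffix) {
    for (const SketchImpEntry& e : s_sketchTable) {
        if (e.recognizeSuffix(szSuffix)) {
            SketchImp* p = nullptr;
            e.staticConstructor(&p);
            return p;
        }
    }
    return nullptr;
}
```

Because the first matching row wins, two importers that claim the same suffix would shadow one another, which is why the suffix has to be unique.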

[5] As a first step in creating your importer, I recommend that you start by just extracting the text of the document without worrying about formatting. The most trivial document contains one section, one block (paragraph), and some content (one or more spans). So you should be able to get something on the screen with nothing more than this:

    m_pDocument->appendStrux(PTX_Section,NULL);
    m_pDocument->appendStrux(PTX_Block,NULL);
    m_pDocument->appendSpan(data,length);
    m_pDocument->appendSpan(data,length);
    ...

You can make as many calls to appendSpan() within a block as you need.
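To make the call sequence concrete, here is a small self-contained sketch of a text-only import that emits one section and then one block per input line. SketchDocument is a hypothetical mock that only records the calls it receives, and the enum is redeclared locally so the example compiles on its own. The real document class takes different argument types, so treat the signatures here as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Local stand-ins so the sketch compiles on its own (not the real definitions).
enum SketchStruxType { PTX_Section, PTX_Block };

// Hypothetical mock of the document: it just logs each call in order.
class SketchDocument {
public:
    bool appendStrux(SketchStruxType type, const char** attributes) {
        (void)attributes;  // formatting is ignored in this sketch
        m_log.push_back(type == PTX_Section ? "<section>" : "<block>");
        return true;
    }
    bool appendSpan(const char* data, std::size_t length) {
        assert(data != nullptr);
        m_log.push_back(std::string(data, length));
        return true;
    }
    const std::vector<std::string>& log() const { return m_log; }
private:
    std::vector<std::string> m_log;
};

// Text-only import: one section, then one block per input line.
bool sketchImportPlainText(const std::string& text, SketchDocument* pDocument) {
    assert(pDocument != nullptr);
    if (!pDocument->appendStrux(PTX_Section, nullptr)) return false;

    std::istringstream in(text);
    std::string line;
    bool sawBlock = false;
    while (std::getline(in, line)) {
        if (!pDocument->appendStrux(PTX_Block, nullptr)) return false;
        sawBlock = true;
        // An empty line becomes an empty block with no span.
        if (!line.empty() && !pDocument->appendSpan(line.data(), line.size()))
            return false;
    }
    // Even an empty file gets one block, so the document is never a bare section.
    if (!sawBlock && !pDocument->appendStrux(PTX_Block, nullptr)) return false;
    return true;
}
```

Calling sketchImportPlainText("Hello\nWorld", &doc) leaves five entries in the log: a section, a block, "Hello", a block, and "World".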

[6] Once you get that working, you can start adding section, block and span formatting attributes using the second argument of appendStrux() and the appendFmt() method. The attributes consist of a series of CSS2-like name-value pairs. See AbiWord_DocumentFormat.abw and the source for the other importers for details.
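One possible way to assemble such name-value pairs before handing them to the document is sketched below. Both the "name:value; name:value" string form and the null-terminated name/value array are assumptions made for illustration, as are the "props" attribute name and the helpers sketchBuildProps() and sketchToAttributeArray(). Check the existing importers for the layout the document class actually expects.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A single name-value pair, e.g. ("text-align", "center").
typedef std::pair<std::string, std::string> SketchProp;

// Join pairs into a CSS-like "name:value; name:value" string.
std::string sketchBuildProps(const std::vector<SketchProp>& props) {
    std::string out;
    for (const SketchProp& p : props) {
        assert(!p.first.empty());
        if (!out.empty()) out += "; ";
        out += p.first + ":" + p.second;
    }
    return out;
}

// Flatten attributes into a null-terminated name, value, name, value, ...
// array (assumed layout). The pointers borrow from `attrs`, which must
// outlive the returned vector.
std::vector<const char*> sketchToAttributeArray(const std::vector<SketchProp>& attrs) {
    std::vector<const char*> out;
    for (const SketchProp& a : attrs) {
        out.push_back(a.first.c_str());
        out.push_back(a.second.c_str());
    }
    out.push_back(nullptr);
    return out;
}
```

Building the string from pairs rather than by hand keeps the separators consistent as more properties are added.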