figure elyxer.png eLyXer Programming Guide

Alex Fernández (elyxer@gmail.com)

1 The Basics

This document should help you get started extending eLyXer. The package (including this guide and all accompanying materials) is licensed under the GPL version 3 or, at your option, any later version. See the LICENSE file for details. Also visit the main page to find out about the latest developments.
In this section we will outline how eLyXer performs the basic tasks.

1.1 Getting eLyXer

If you are interested in eLyXer from a developer’s perspective the first thing to do is fetch the code. It is included in the standard distribution, so just navigate to the src/ folder and take a look at the .py Python code files.
For more serious use, or to get the latest copy of the code (after the last published version), you need to install git, the version control tool created by Linus Torvalds (of Linux fame). The code is hosted in Savannah [1], a GNU project for hosting non-GNU projects. So first you have to fetch the code:
$ git clone git://git.sv.gnu.org/elyxer.git
You should see some output similar to this:
Initialized empty Git repository in /home/user/install/elyxer/.git/
remote: Counting objects: 528, done.
remote: Compressing objects: 100% (157/157), done.
remote: Total 528 (delta 371), reused 528 (delta 371)
Receiving objects: 100% (528/528), 150.00 KiB | 140 KiB/s, done.
Resolving deltas: 100% (371/371), done. 
Now enter the directory that git has created.
$ cd elyxer
Your first task is to create the main executable file:
$ ./make
The build system for eLyXer will compile it for you, and even run some basic tests. Now you can try it out:
$ cd docs/
$ ../elyxer.py devguide.lyx devguide2.html
You have just created your first eLyXer page! The result is in devguide2.html; to view it in Firefox:
$ firefox-bin devguide2.html
Note for Windows developers: on Windows eLyXer needs to be invoked through the Python executable, with the slashes changed to backslashes:
> Python ..\elyxer.py devguide.lyx devguide2.html
In the rest of this section we will delve a little bit into how eLyXer works.

1.2 Containers

The basic code artifact (or ‘class’ in Python talk) is the Container. Its responsibility is to take a bit of LyX code and generate working HTML code.
The following figure shows how a Container works. Each type of Container should have a parser, an output and a list of contents. The parser object receives LyX input and produces a list of contents that is stored in the Container. The output object then converts those contents to a portion of valid HTML code.
figure container.png
Figure 1.1 
Container structure.
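Two important class attributes of a Container are:
start: the string that identifies the beginning of the LyX command.
ending: the string that marks where the command ends.
A class called ContainerFactory has the responsibility of creating the appropriate containers, as the strings in their start attributes are found.
The basic method of a Container is:
process(): called once parsing is done, to prepare the parsed contents for output.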
Now we will see each subordinate class in detail.
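To make the flow of the figure concrete, here is a minimal sketch of how a parser, an output and a container might cooperate. It is not the actual eLyXer code: the classes WordParser, SimpleTagOutput and SketchContainer, and the method read(), are hypothetical stand-ins invented for illustration.

```python
class WordParser(object):
    "Hypothetical parser working on a list of lines."

    def parseheader(self, lines):
        # The first line becomes the header, split into words.
        return lines[0].split()

    def parse(self, lines):
        # All remaining lines become the contents.
        return lines[1:]


class SimpleTagOutput(object):
    "Hypothetical output: wraps the contents in the container's tag."

    def gethtml(self, container):
        opening = '<' + container.tag + '>'
        closing = '</' + container.tag + '>'
        return [opening] + container.contents + [closing]


class SketchContainer(object):
    "Simplified stand-in for a Container: a parser, an output and contents."

    def __init__(self, parser, output, tag):
        self.parser = parser
        self.output = output
        self.tag = tag

    def read(self, lines):
        "Let the parser fill in header and contents, then process them."
        self.header = self.parser.parseheader(lines)
        self.contents = self.parser.parse(lines)
        self.process()

    def process(self):
        "Hook for subclasses; nothing to do in the generic case."
        pass

    def gethtml(self):
        "Let the output convert the contents to HTML lines."
        return self.output.gethtml(self)


container = SketchContainer(WordParser(), SimpleTagOutput(), 'i')
container.read(['\\emph on', 'some text'])
html = ''.join(container.gethtml())
print(html)
```

The container itself knows nothing about LyX or HTML; all format knowledge lives in the two helper objects, which is what makes them easy to swap.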

1.3 Parsers

A Parser has two main methods: parseheader() and parse().
parseheader(): parses the first line and returns the contents as a list of words. This method is common to all Parsers. For example, for the command '\emph on' the Parser will return a list ['\emph', 'on']. This list will end up in the Container as an attribute header.
parse(): parses all the remaining lines of the command. They will end up in the Container as an attribute contents. This method depends on the particular Parser employed.
Most Parsers reside in the file parser.py. Among them are the following usual classes:
LoneCommand: parses a single line containing a LyX command.
BoundedParser: reads until it finds the ending. For each line found inside, the BoundedParser will call the ContainerFactory to recursively parse its contents. Then the parser returns everything found inside as a list.
Parsers are confined to the parser package, so that any future format changes can be done easily.
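The two methods can be sketched as follows. This is a simplified illustration and not the real BoundedParser: the class SketchBoundedParser is hypothetical, it works on a plain list of lines, and it keeps the inner lines as strings instead of calling the ContainerFactory recursively.

```python
class SketchBoundedParser(object):
    "Hypothetical, simplified bounded parser (not the real BoundedParser)."

    def __init__(self, ending):
        self.ending = ending

    def parseheader(self, lines):
        "Consume the first line and return it as a list of words."
        return lines.pop(0).split()

    def parse(self, lines):
        "Consume lines up to and including the ending; return those before it."
        contents = []
        while lines:
            line = lines.pop(0)
            if line.strip() == self.ending:
                break
            contents.append(line)
        return contents


lines = [
    '\\begin_inset CommandInset bibitem',
    'LatexCommand bibitem',
    'key "mykey"',
    '\\end_inset',
    'text after the inset',
]
parser = SketchBoundedParser('\\end_inset')
header = parser.parseheader(lines)
contents = parser.parse(lines)
```

After both calls the header holds the three words of the first line, the contents hold the two inner lines, and only the line after the inset is left for the next parser.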

1.4 Outputs

Common outputs reside in output.py. They have just one method:
gethtml(): processes the contents of a Container and returns a list with file lines. Newlines \n must be added manually at the desired points; eLyXer will just merge all lines and write them to file.
Outputs do not however inherit from a common class; all you need is an object with a method gethtml(self,container) that processes the Container’s contents list attribute.
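As an illustration, an output can be any object with the right method. The sketch below assumes, for simplicity, that the contents are plain strings; ParagraphOutput and FakeContainer are hypothetical names and not part of eLyXer.

```python
class ParagraphOutput(object):
    "Hypothetical output; note that it inherits from no eLyXer class."

    def gethtml(self, container):
        # Newlines are added by hand wherever a line break is wanted.
        html = ['<p>\n']
        for line in container.contents:
            html.append(line + '\n')
        html.append('</p>\n')
        return html


class FakeContainer(object):
    "Stand-in with just the contents attribute that the output needs."
    contents = ['first line', 'second line']


lines = ParagraphOutput().gethtml(FakeContainer())
# Merging is plain concatenation, so the newlines above are all we get.
page = ''.join(lines)
```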

1.5 Preprocessors

The file preprocessor.py contains some preprocessing to be done on Containers before they are output. Preprocessing is currently deactivated, since it needs some context and not just the current Container.

1.6 Tutorial: Adding Your Own Container

If you want to add your own Container to the processing you do not need to modify all these files. You just need to create your own source file that includes the Container, the Parser and the output (or reuse existing ones). Once it is added to the types in the ContainerFactory eLyXer will happily start matching it against LyX commands as they are parsed.
There are good examples of parsing commands in just one file in image.py and formula.py. But let us build our own container BibitemInset here. We want to parse the LyX command in listing 1.1. In the resulting HTML we will generate an anchor: a single tag <a name="mykey"> with fixed text "[ref]".
\begin_inset CommandInset bibitem
LatexCommand bibitem
key "mykey"
\end_inset
Listing 1.1 
The LyX command to parse.
We will call the Container BibitemInset, and it will process precisely the inset that we have here. We will place the class in bibitem.py. So this file starts as shown in listing 1.2.
class BibitemInset(Container):
  "An inset containing a bibitem command"
  
  start = '\\begin_inset CommandInset bibitem'
  ending = '\\end_inset'
Listing 1.2 
Class definition for BibitemInset.
We can use the parser for a bounded command with start and ending, BoundedParser. For the output we will generate a single HTML tag <a>, so the TagOutput() is appropriate. Finally we will set the breaklines attribute to False, so that the output shows the tag in the same line as the contents: <a …>[ref]</a>. Listing 1.3 shows the constructor.
  def __init__(self):
    self.parser = BoundedParser()
    self.output = TagOutput()
    self.tag = 'a'
    self.breaklines = False
Listing 1.3 
Constructor for BibitemInset.
The BoundedParser will automatically parse the header and the contents. In the process() method we will discard the first line with the LatexCommand, and place the key from the second line as link destination. The class StringContainer holds string constants; in our case we will have to isolate the key by splitting the string around the double quote ", and then create the anchor with the same name. The contents will be set to the fixed string [ref]. The result is shown in listing 1.4.
  def process(self):
    # skip first line
    del self.contents[0]
    # parse second line: fixed string
    string = self.contents[0]
    # split around the "
    key = string.contents[0].split('"')[1]
    # make tag and contents
    self.tag = 'a name="' + key + '"'
    string.contents[0] = '[ref] '
Listing 1.4 
Processing for BibitemInset.
And then we have to add the new class to the types parsed by the ContainerFactory; this has to be done outside the class definition. The complete file is shown in listing 1.5.
from parser import *
from output import *
from container import *
  
class BibitemInset(Container):
  "An inset containing a bibitem command"
  
  start = '\\begin_inset CommandInset bibitem'
  ending = '\\end_inset'
  
  def __init__(self):
    self.parser = BoundedParser()
    self.output = TagOutput()
    self.tag = 'a'
    self.breaklines = False
  
  def process(self):
    # skip first line
    del self.contents[0]
    # parse second line: fixed string
    string = self.contents[0]
    # split around the "
    key = string.contents[0].split('"')[1]
    # make tag and contents
    self.tag = 'a name="' + key + '"'
    string.contents[0] = '[ref] '
  
ContainerFactory.types.append(BibitemInset)
Listing 1.5 
Full listing for BibitemInset.
The end result of processing the command in listing 1.1 is a valid anchor:
<a name="mykey">[ref] </a>
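If you want to try the processing step without a full eLyXer checkout, the following self-contained sketch reproduces it with stand-in classes. The StringContainer and TagOutput below are simplified guesses written only for this example (in particular, the assumption that the closing tag uses just the first word of the tag); the real classes do more.

```python
class StringContainer(object):
    "Stand-in: holds one string constant in its contents list."

    def __init__(self, text):
        self.contents = [text]


class TagOutput(object):
    "Stand-in: wraps the contents in the container's tag."

    def gethtml(self, container):
        # Assumption: the closing tag uses only the tag name, no attributes.
        name = container.tag.split()[0]
        html = ['<' + container.tag + '>']
        for element in container.contents:
            html += element.contents
        html.append('</' + name + '>')
        return html


class BibitemInset(object):
    "The tutorial container, minus the real parser."

    def __init__(self):
        self.output = TagOutput()
        self.tag = 'a'
        # What a bounded parser would plausibly leave in the contents.
        self.contents = [
            StringContainer('LatexCommand bibitem'),
            StringContainer('key "mykey"'),
        ]

    def process(self):
        # skip first line
        del self.contents[0]
        # parse second line: fixed string
        string = self.contents[0]
        # split around the "
        key = string.contents[0].split('"')[1]
        # make tag and contents
        self.tag = 'a name="' + key + '"'
        string.contents[0] = '[ref] '


inset = BibitemInset()
inset.process()
anchor = ''.join(inset.output.gethtml(inset))
print(anchor)
```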
The final touch is to make sure that the class is run, importing it in the main file elyxer.py, as shown in listing 1.6.
from structure import *
from bibitem import *
from container import *
Listing 1.6 
Importing the BibitemInset from the main file.
Now this Container is not too refined: the anchor text is fixed, and we would need additional processing on the bibitem entry to show consecutive numbers. But in fewer than 20 lines we have parsed a new LyX command and have output valid, working XHTML code. The actual code is a bit different but follows the same principles; it can be found in src/link.py, in the classes BiblioCite and BiblioEntry, which process bibliography entries and citations (with all our missing bits) in about 50 lines.

2 Advanced Features

This section tackles other, more complex features. Not all of them are included in the current version.

2.1 Parse Tree

On initialization of the ContainerFactory, a ParseTree is created to quickly pass each incoming LyX command to the appropriate containers, which are created on demand. For example, when the ContainerFactory finds a command:
\emph on
it will create and initialize an EmphaticText object. The ParseTree works with words: it creates a tree where each keyword has its own node. At that node there may be a leaf, which is a Container class, and/or additional branches that point to other nodes. If the tree finds a Container leaf at the last node then it has found the right point; otherwise it must backtrack to the last node with a Container leaf.
Figure 2.1 shows a piece of the actual parse tree. You can see how if the string to parse is “\begin_inset LatexCommand”, at the node for the second keyword “LatexCommand” there is no Container leaf, just two more branches “label” and “ref”. In this case the ParseTree would backtrack to “\begin_inset”, and choose the generic Inset.
figure parse tree.png
Figure 2.1 
Portion of the parse tree.
Parsing is much faster this way, but there are disadvantages; for one, parsing can only be done using whole words and not prefixes. SGML tags (such as <lyxtabular>) pose particular problems: sometimes they may appear with attributes (as in <lyxtabular version="3">), and in this case the starting word is <lyxtabular without the trailing ’>’ character. So the parse tree removes any trailing ’>’, and the start string would be just <lyxtabular; this way both starting words <lyxtabular> and <lyxtabular are recognized.
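The idea can be sketched with nested dictionaries, one level per keyword. This is only an illustration of the technique, not the actual ParseTree code: the class SketchParseTree, its method names and the container classes Label, Reference and Table are hypothetical. Backtracking is implemented here by remembering the last leaf seen on the way down.

```python
class SketchParseTree(object):
    "Hypothetical sketch of a word-based parse tree."

    LEAF = object()  # key under which a node stores its Container class

    def __init__(self, types):
        self.root = {}
        for cls in types:
            node = self.root
            for word in cls.start.split():
                node = node.setdefault(self.clean(word), {})
            node[SketchParseTree.LEAF] = cls

    def clean(self, word):
        "Remove any trailing '>' so that both forms of a tag match."
        return word.rstrip('>')

    def find(self, line):
        "Return the Container class at the deepest matching node with a leaf."
        node = self.root
        found = None
        for word in line.split():
            word = self.clean(word)
            if word not in node:
                break
            node = node[word]
            if SketchParseTree.LEAF in node:
                # Remember the leaf, in case deeper nodes have none.
                found = node[SketchParseTree.LEAF]
        return found


class Inset(object):
    start = '\\begin_inset'

class Label(object):
    start = '\\begin_inset LatexCommand label'

class Reference(object):
    start = '\\begin_inset LatexCommand ref'

class Table(object):
    start = '<lyxtabular'


tree = SketchParseTree([Inset, Label, Reference, Table])
print(tree.find('\\begin_inset LatexCommand').__name__)
print(tree.find('<lyxtabular version="3">').__name__)
```

Each lookup costs one dictionary access per word, no matter how many container types are registered; that is where the speed comes from.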

2.2 Postprocessors

Some post-processing of the resulting HTML page can make the results look much better. The main stage in the postprocessing pipeline inserts a title “Bibliography” before the first bibliographical entry. But more can be added to alter the result. As eLyXer parses a LyX document it automatically numbers all chapters and sections. This is also done in the postprocessor.
There is also a LastStage: a stage that processes the last container based on the current one. It is used to join list items into one coherent list tag. The principle is the same as with other postprocessors.
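A postprocessing pipeline of this kind might look like the sketch below. Everything in it is an assumption made for illustration: elements are represented as (kind, html) pairs, and the names BibliographyStage and runstages do not come from the eLyXer sources.

```python
class BibliographyStage(object):
    "Hypothetical stage: adds a title before the first bibliography entry."

    def __init__(self):
        self.seen = False

    def postprocess(self, element):
        "Return the list of elements that replaces the given one."
        kind, html = element
        if kind == 'biblio' and not self.seen:
            self.seen = True
            return [('title', '<h1>Bibliography</h1>'), element]
        return [element]


def runstages(elements, stages):
    "Run every element through each stage in turn."
    for stage in stages:
        result = []
        for element in elements:
            result += stage.postprocess(element)
        elements = result
    return elements


page = [
    ('text', '<p>Some text.</p>'),
    ('biblio', '<p>[1] First entry.</p>'),
    ('biblio', '<p>[2] Second entry.</p>'),
]
processed = runstages(page, [BibliographyStage()])
kinds = [kind for kind, html in processed]
```

Since each stage only sees one element at a time and keeps its own state, new stages can be added to the list without touching the existing ones.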

2.3 Mathematical Formulae

Formulae in LyX are rendered beautifully into TeX and PDF documents. For HTML the conversion is not so simple. There are basically three options:
eLyXer employs the third technique, with varied results. Basic fractions and square roots should render fine, although some issues may remain pending at the moment. Complex fractions with several levels do not come out right. (But see subsection 3.3.)

2.4 Distribution

You will notice that in the src/ folder there are several Python files, while in the main directory there is just a big one. The reason is that before distribution the source code is conflated and placed in the main directory, so that users can run it without worrying about libraries, directories and the like. (They need of course to have Python 2.5 installed.) The tool for the job is a little Python script called conflate.py that does the dirty work of parsing dependencies and inserting them into the main file. There is also a make Bash script that takes care of permissions and generates the documentation. Just type
$ ./make
at the prompt. It is a primitive way perhaps to generate the “binary” (ok, not really a binary but a distributable Python file), but it works great.
At the moment there is no way to do this packaging on non-Unix operating systems with a single script, e.g. a Windows .bat script. However the steps themselves are trivial.
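To give an idea of what conflating involves, here is a rough sketch of the core step. It is not the code in conflate.py: the function below is hypothetical, works on sources held in a dictionary instead of files, and only understands lines of the form "from module import *".

```python
def conflate(name, sources, included=None):
    "Return the lines of module `name` with its local imports inlined."
    if included is None:
        included = set()
    included.add(name)
    result = []
    for line in sources[name].splitlines():
        words = line.split()
        if len(words) == 4 and words[0] == 'from' and words[2:] == ['import', '*']:
            module = words[1]
            if module in sources:
                # Local module: inline it once, drop the import line.
                if module not in included:
                    result += conflate(module, sources, included)
                continue
        # Anything else (including imports of other libraries) is kept.
        result.append(line)
    return result


# Toy sources standing in for the files in src/.
sources = {
    'elyxer': 'from container import *\nfrom bibitem import *\nprint("main")',
    'container': 'class Container(object): pass',
    'bibitem': 'from container import *\nclass BibitemInset(Container): pass',
}
single = conflate('elyxer', sources)
```

Each module is inlined at the point of its first import, so definitions always appear before the code that uses them.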

2.5 License and Contributions

eLyXer is published under the GPL, version 3 or later [3]. This basically means that you can modify the code and distribute the result as desired, as long as you publish your modifications under the same license. But consult a lawyer if you want an authoritative opinion.
All contributions will be published under this same license, so if you send them this way you implicitly give your consent. An explicit license grant would be even better and may be required for larger contributions.

3 Future Extensions

The author has plans for the following extensions.

3.1 Templates

Some header and footer content is automatically added to the resulting document. The use of templates might make the job far more flexible.

3.2 Page Segmentors

A page segmentor should build a set of pages and cross-reference them, while generally avoiding the complexities of the internal structure. The tool is called idxer. It uses templates to construct the header and footer.
The complete package should implement something like the flow in figure 3.1. This is the high-level design that has to be filled in with the missing tool.
figure pipeline.png
Figure 3.1 
Complete eLyXer pipeline.

3.3 MathML

As suggested by Günther Milde and Abdelrazak Younes [4,5], MathML is by now well supported in Firefox. An option to emit MathML (instead of more-or-less clumsy HTML and CSS code) could be very useful.

3.4 Roadmap

Basic tool support is ready by Q1 2009 (end of March), including support for most common LyX documents. In Q2 2009 (i.e. by the end of June) the pipeline should be complete and running. LyX integration (as an external tool) is planned for Q3 2009. All this within the usual constraints: day job, family, etc.

4 Discarded Bits

Not everything that can be done with eLyXer is planned; some extensions have been discarded. However, this basically means that the author is too ignorant to know how to do them right; help (and patches!) towards a sane implementation would be welcome if they fit with the design.

4.1 Spellchecking

LyX can use a spellchecker to verify the words used. However it is not interactive so you may forget to run it before generating a version. It is possible to integrate eLyXer with a spellchecker and verify the spelling before generating the HTML, but it is not clear that it can be done cleanly.

4.2 URL Checking

Another fun possibility is to make eLyXer check all the external URLs embedded in the document. However the Python facilities for URL checking are not very mature, at least with Python 2.5: some of them do not return errors, others throw complex exceptions that have to be parsed… It is far easier to just create the HTML page and use wget (or a similar tool) to recursively check all links in the page.

4.3 Use of lyx2lyx Framework

Abdelrazak Younes suggests using the lyx2lyx framework, which after all already knows about LyX formats [6]. It is an interesting suggestion, but one that for now does not fit well with the design of eLyXer: a standalone tool to convert between two formats, or as Kernighan and Plauger put it, a standalone filter [7]. Long-term maintenance might turn out a bit heavier with the standalone approach though, especially if LyX changes to a different file format in the future.

5 FAQ

Q: I don’t like how your tool outputs my document, what can I do?
A: First make sure that you are using the proper CSS file, i.e. copy the existing docs/lyx.css file to your web page directory. Next try to customize the CSS file to your liking; it is a flexible approach that requires no code changes. Then try changing the code (and submitting the patch back).
Q: Why does your Python code suck so much? You don’t make proper use of most features!
A: Because I’m mostly a novice with little Python culture. If you want to help it suck less, please send mail and enlighten me.
Q: How is the code maintained?
A: It is kept in a git repository. Patches in git format are welcome (but keep in mind that my knowledge of git is even shallower than my Python skills).
Q: I found a bug, what should I do?
A: Just report it to the Savannah interface: https://savannah.nongnu.org/bugs/?func=additem&group=elyxer.

Bibliography

[1] Free Software Foundation, Inc.: eLyXer summary. https://savannah.nongnu.org/projects/elyxer/

[2] S White: “Math in HTML with CSS”, accessed March 2009. http://www.zipcon.net/~swhite/docs/math/math.html

[3] R S Stallman et al: “GNU GENERAL PUBLIC LICENSE” version 3, 20070629. http://www.gnu.org/copyleft/gpl.html

[4] G Milde: “Re: eLyXer: LyX to HTML converter”, message to list lyx-devel, 20090309. http://www.mail-archive.com/lyx-devel@lists.lyx.org/msg148627.html

[5,6] A Younes: “Re: eLyXer: LyX to HTML converter”, message to list lyx-devel, 20090309. http://www.mail-archive.com/lyx-devel@lists.lyx.org/msg148634.html

[7] B W Kernighan, P J Plauger: “Software Tools”, ed. Addison-Wesley Professional 1976, p. 35.


Copyright (C) 2009 Alex Fernández (elyxer@gmail.com)