Building Scientific Web Books by Martin Otter

Chapter 4 - Other Alternatives

There are other alternatives to build a scientific web book. The following (incomplete) discussion summarizes information from the Modelica trac ticket 1730.

4.1 Epub3

The web standard for electronic (HTML) books is Epub, which is used by a large community. The newest version, 3.0.x (Epub3), additionally supports SVG, MathML and JavaScript. Epub is interesting because a lot of development work is being done on it. Since there is a huge commercial interest, tools are available (for Epub2) or are becoming available (for Epub3). A reasonable open source Epub2/3 editor is Calibre. It provides most of the commands via a toolbar and immediately shows the rendering result in a second window. There are also commercial editors with a true WYSIWYG editing facility (e.g. BlueGriffon).

Epub3 currently has several drawbacks:

4.2 Office Applications

The best WYSIWYG (What You See Is What You Get) editors are the ones for office applications, such as Microsoft Word, OpenOffice, LibreOffice, Google Docs etc. One could therefore keep the "mother" version of the book in one of them and construct a pure HTML version from it whenever a new version of the book is needed.

The drawbacks are:

4.3 Markup Languages (Markdown, Pandoc, reStructuredText, Sphinx)

There are many textual markup languages that use "natural" markups and can be converted into different formats, especially into HTML. Examples are:

4.4 LaTeX

LaTeX is the de facto standard in the scientific community for producing high-quality printed books. It is therefore natural to want a processing system that is able to transform a subset of LaTeX markups (or more precisely, the calls of TeX macros) into HTML.

A short sketch of LaTeX to HTML tools is given by Martin Sjoelund. His conclusion is that Hevea seems to be the best LaTeX to HTML converter. There is a distribution of Hevea for Windows (the latest Windows version is from March 2013). I tried this distribution on Windows 7 but have not yet managed to get it running: running "hevea test\pavtest.tex" in the Hevea installation directory gives the error message "File error: Cannot open file: hevea.hva".

4.5 DocOnce

There is a markup language called DocOnce. The author, Hans Petter Langtangen, explains here why Sphinx, pandoc-markdown, LaTeX and other approaches are not sufficient for scientific books. He wants to have one document source and generate from it (a) a book in high (LaTeX) quality and (b) a web site in high (HTML) quality, including modern (responsive) web design. He argues that this is not possible with any solution he knows (including Sphinx, pandoc-markdown and LaTeX), and he therefore invented a new markup language and implemented the needed converters in Python. He used DocOnce to write a 900 page Springer book (A Primer on Scientific Programming with Python). This is practical proof that the language has everything that is needed to write (a) a scientific book and (b) a web page (there are many examples on the DocOnce web page).

The drawbacks are:

Of all the inspected approaches, this seems to be the only one that is able to produce both a printed book and a web-site book in high quality from the same source.

4.6 Summary

If a book is to be generated in several formats (say HTML and a high-quality book in PDF), the markup languages sketched above currently seem to be the only choice, and one has to accept their drawbacks. If HTML is to be the only book format, the situation is different:

All the reasonable approaches discussed in this chapter generally have the drawback that no office-like graphical user environments are available to easily define the text. For some communities this might be acceptable, for others not. The approach proposed in this book is better in this respect, because there are open source and commercial GUI programs to define HTML text in an office-like way.

All these approaches also seem not to support the features that are important for a scientific web book: numbering of sections, figures, tables and equations, and cross referencing of these elements. Again, this is supported by the approach advocated in this book.
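To make the numbering requirement concrete, the following minimal sketch derives hierarchical section numbers from a sequence of heading levels. The function name number_headings and the convention that a skipped level is filled with 0 are assumptions made for this illustration only; this is not how any of the tools discussed in this chapter is implemented.

```python
def number_headings(levels):
    """Return hierarchical numbers such as '1.2.1' for a sequence of heading levels.

    `levels` holds one integer per heading, in document order
    (1 = chapter, 2 = section, 3 = subsection, ...).
    """
    counters = []   # counters[i] is the current count at depth i + 1
    numbers = []
    for level in levels:
        if level < 1:
            raise ValueError("heading levels start at 1")
        # Entering a heading resets the counters of all deeper levels.
        del counters[level:]
        # Assumption for this sketch: a skipped level is filled with 0.
        counters.extend([0] * (level - len(counters)))
        counters[-1] += 1
        numbers.append(".".join(str(c) for c in counters))
    return numbers


if __name__ == "__main__":
    # Prints 1, 1.1, 1.2, 1.2.1, 2, 2.1 (one number per line).
    for number in number_headings([1, 2, 2, 3, 1, 2]):
        print(number)
```

Figure, table and equation numbers need only a single running counter each, so the hierarchical case shown here is the most involved one.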

It might be possible to combine the approaches. For example, if a tool generates HTML files with id attributes, then the makeWebBook program might be used to add all the missing section, caption, and figure numbers. Equations most likely cannot be handled, because makeWebBook assumes a special marking for them.
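The kind of post-processing meant here can be sketched as follows. The sketch is not the makeWebBook implementation and does not use its markup conventions; it assumes, purely for illustration, that the generating tool writes figure captions as <figcaption id="..."> elements and leaves cross references as empty links of the form <a href="#..."></a>. The function name number_figures is hypothetical as well.

```python
import re

# Assumed (hypothetical) markup produced by the generating tool:
#   <figcaption id="fig-x">Caption text</figcaption>
#   <a href="#fig-x"></a>   (an empty link acting as a cross reference)
CAPTION = re.compile(r'<figcaption id="([^"]+)">')
EMPTY_LINK = re.compile(r'<a href="#([^"]+)"></a>')


def number_figures(html):
    """Prefix figure captions with 'Figure N: ' and fill in empty cross references."""
    labels = {}  # maps a caption id to its label, e.g. 'fig-x' -> 'Figure 1'

    def label_caption(match):
        ident = match.group(1)
        labels[ident] = f"Figure {len(labels) + 1}"
        return f"{match.group(0)}{labels[ident]}: "

    # First pass: number all captions, so that forward references can be resolved.
    numbered = CAPTION.sub(label_caption, html)

    def fill_link(match):
        ident = match.group(1)
        label = labels.get(ident)
        # Links that do not point to a known caption are left untouched.
        return f'<a href="#{ident}">{label}</a>' if label else match.group(0)

    # Second pass: insert the label as the text of each empty link.
    return EMPTY_LINK.sub(fill_link, numbered)


if __name__ == "__main__":
    page = (
        '<p>See <a href="#fig-b"></a>.</p>'
        '<figcaption id="fig-a">Setup</figcaption>'
        '<figcaption id="fig-b">Result</figcaption>'
    )
    print(number_figures(page))
```

Two passes are used so that a reference may appear before the figure it points to. A real tool would more likely work on a parsed document tree than on regular expressions, which are used here only to keep the sketch short.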