Chapter 3 - Missing Features

As of October 2015, the "makeWebBook" program is a first step, and several useful features are still missing. This chapter summarizes extensions that should be implemented in the future.

3.1 Other Captions

Other caption types might be useful in web books, such as captions for examples (e.g. "Example 3-1"), definitions (e.g. "Definition 4-1") or WebGL animations (e.g. "Animation 5-1"; see the WebGL examples). It is not yet clear how this should be specified. Currently, <caption> is treated as a table caption (e.g. "Table 3-1: table caption text"), and "makeWebBook" treats "Table 3-1" as the caption number. It might be better to generalize this: the user supplies the appropriate keyword, and "makeWebBook" only updates the number. The tool then collects all captions with the same keyword, such as "Table" or "Example", and numbers them separately for each keyword.

3.2 Search Button

In a PDF file, a given string can easily be searched for, and on a web site it is common to search for a string within that site. A user therefore expects a search button in the toolbar. If the book is hosted on a web server, a Google search button could easily be used. The open issue is how to implement a reasonable search when the book is stored locally on a computer.

3.3 Index

Scientific books usually have an index that collects the most important keywords used in the book, together with references to the sections where each keyword is mainly used. It would be nice to add this capability here as well. This should be quite easy: in the text, a keyword is marked in the following way:

<span id="keyword-id" title="OptionalText">Keyword</span>

The "makeWebBook" program collects all these elements and then generates a separate file (similar to the table-of-contents file) that lists each Keyword, or the title text if a title attribute is used. When the reader clicks on an index entry, the browser jumps to the span element with the corresponding keyword-id.

A completely different solution might be to use a fixed vertical navigation bar on the left side. Then the link issue above would disappear, and more navigation possibilities could be added (e.g. a list of all chapters). This would require only a small extension of the "makeWebBook" program.

3.4 Parsing Issue

The "makeWebBook" program collects information with the goquery package, which does not distinguish between upper- and lower-case element and attribute names. When a new file is generated, the file is read again, but this time a simple string search is used to find the start of an element, such as "<h1". Currently, only lower-case names are searched for. Therefore, the result will be wrong if upper-case letters are used for element or attribute names in the underlying HTML file. This should be fixed.

3.5 Converting to PDF Format

Sometimes it would be useful to have the web book in PDF format. This is already possible, but it requires manual effort. Automating it would not require much work, as sketched below:

The first step is to store one HTML file in PDF format. There are HTML-to-PDF converters on the web, but they are unlikely to be useful for scientific web books. For example, the commercial Adobe Acrobat Pro XI program has a converter that is, in principle, nice for any web page: clicking on "Convert Web Page to Adobe PDF ..." in the toolbar saves the currently displayed web page in PDF format. The program supports HTML5, CSS3 and JavaScript, but it supports neither SVG nor MathML. Therefore, SVG images are missing from the PDF file, and jqmath equations do not look nice.

The only meaningful approach seems to be to print the web page to a file via a printer driver. Only then are all elements included, including SVG images and the MathML equations generated internally by jqmath, because the web browser renders the page and the rendered page is printed. Any PostScript printer driver is sufficient; such drivers usually offer the option to print to a file. The result is a file in PostScript (*.ps) format. With the open-source Ghostscript program, PS files can be converted to PDF, and with the open-source GSview program this conversion can be performed from a graphical user interface. If the commercial Adobe Acrobat Pro program is available, the generation is very simple: in the (Firefox) browser, click on "File / Print ..." and select the printer "Adobe PDF". This printer driver directly generates a PDF file of the current web page.
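The Ghostscript conversion step could be automated from a Go program such as "makeWebBook" roughly as follows. This is a sketch under assumptions: it assumes the Ghostscript executable is on the PATH under the name gs (on Windows it is usually called gswin64c or gswin32c), and the function names are hypothetical. The options -dBATCH, -dNOPAUSE, -sDEVICE=pdfwrite and -sOutputFile are standard Ghostscript options; Ghostscript also ships a ps2pdf script that wraps the same conversion.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// ghostscriptArgs returns the Ghostscript arguments for converting a
// PostScript file to PDF with the pdfwrite output device.
func ghostscriptArgs(psFile, pdfFile string) []string {
	return []string{
		"-dBATCH",           // exit after processing instead of waiting for input
		"-dNOPAUSE",         // do not pause after each page
		"-sDEVICE=pdfwrite", // produce PDF output
		"-sOutputFile=" + pdfFile,
		psFile,
	}
}

// psToPDF runs Ghostscript; "gs" is assumed to be on the PATH.
func psToPDF(psFile, pdfFile string) error {
	cmd := exec.Command("gs", ghostscriptArgs(psFile, pdfFile)...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: ps2pdf-sketch book.ps book.pdf")
		return
	}
	if err := psToPDF(os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, "conversion failed:", err)
		os.Exit(1)
	}
}
```

Keeping the argument list in a separate function makes it easy to adapt the call, e.g. to a different executable name on Windows.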

Printing an unaltered page of a web book with a printer driver would not give good results, because the rendered page is designed for viewing in a web browser. For example, the navigation bar on the left side would be present. For this reason, the following block is present at the end of the "resources/styles/stylesheet.css" file:

@media print {
  ...
}

The CSS3 definitions in this block override the other CSS3 definitions when the web page is printed. In particular, they specify that

another font family is used,
the font sizes are reduced,
the navigation bar and the page heading are removed,
the maximum width of images (<img>) is restricted to the page width (so images are automatically scaled down if they would not fit on a page),
page breaks within <figure>, <table> and <pre> elements are avoided, and
page breaks after <h1>, <h2>, ..., <h6> elements are also avoided.

When all files of a web book are merged into one file (see below), the following definition should be added to the <head> section of this file:

<style type="text/css">
   h1       { page-break-before: always; }
   h1.cover { page-break-before: avoid;  }
</style>

and the first <h1> element on the cover page should be written as

<h1 id="cover-page" class="cover"> ... </h1>

As a result, a page break is introduced before every chapter, but not before the heading of the cover page.

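Additionally, the CSS class page-break is defined in the "resources/styles/stylesheet.css" file. It introduces a page break at the location where it is used when the page is printed. Therefore, whenever the PDF file does not look good, it might help to insert

<div class="page-break"></div>

at the desired location. The element must be closed explicitly: <div> is not a void element in HTML5, so the form <div class="page-break" /> leaves the div open, and it would then enclose the content that follows it.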

With these definitions, the PDF file generated from a web book file usually looks good.

It remains to generate one HTML5 file from all web book files. Currently, this must be done by manually copying the contents of all <body>...</body> elements into one file and printing this file. It would not be much work to extend the "makeWebBook" program with an option to do this automatically.

The following issues remain unsolved:

With a PostScript or Adobe PDF printer driver, hyperlinks are not retained. Therefore, the hyperlinks in the PDF file are still shown in blue, but they have no effect (it is not possible to click on them).
The table of contents should show the page numbers of the sections and subsections, but it does not.

There are professional (expensive) solutions to generate high-quality books from HTML5 and CSS in PDF format. Especially:

From the descriptions, it is not clear whether all elements needed for a scientific web book are supported; in particular, MathML and SVG seem not to be supported.

There are W3C working drafts that define the layout of printed books with HTML5 and CSS, in particular "CSS Paged Media Module Level 3" and "CSS Generated Content for Paged Media Module". A good overview with several examples is given in the web article "Designing For Print With CSS". These features are not supported in Firefox version 41 (the proposals are, in any case, still working drafts and not yet released). However, Prince supports them and is therefore able to generate high-quality, nice-looking printed books in PDF format.

Note that the CSS2 properties counter-reset and counter-increment for numbering HTML elements are supported by Firefox and most other browsers. However, they are an incomplete solution and not sufficient for a scientific web book. Even if they were sufficient, there is the major drawback that the actual numbers are only visible in the rendered page and not in the HTML file, which is inconvenient when inspecting or editing the file with a text editor.