WebBook Level 1

Unofficial Proposal Draft

This version:
http://glazman.org/e0/webbook.html
Repository:
https://github.com/therealglazou/webbook
Warning:
WORK IN PROGRESS
Author:
Daniel Glazman, Disruptive Innovations

Abstract

WebBook is a new format for electronic books, based on Web Standards only, and meant to make such books readable inside a browser.

1. Introduction

This section is informative.

The world of electronic books (ebooks) is very fragmented, technically speaking. The current electronic book formats, inheriting from multiple sources, are not readable inside a browser without a dedicated programmatic layer. Even if the ZIP package containing the electronic book is unzipped, finding and rendering the individual documents composing the book is not a task anyone can perform. Furthermore, most of these existing formats rely heavily on dedicated XML or even proprietary binary formats that raise many technical issues.

This document proposes Level 1 of a new electronic book format, called WebBook, designed to free the electronic book market from its current industrial silo and make it a real first-class client of the Web.

1.1. Requirements

This section is informative.

The following requirements are the basis for the technical choices in this specification:

  1. one URL is enough to retrieve a remote WebBook instance; there is no need to download every resource composing that instance

  2. the contents of a WebBook instance can be placed inside a Web site’s directory and are directly readable by a Web browser using the URL for that directory

  3. the contents of a WebBook instance can be placed inside a local directory and are directly readable by a Web browser opening its index.html or index.xhtml topmost file

  4. each individual resource in a WebBook instance, on a Web site or on a local disk, is directly readable by a Web browser

  5. any html document can be used as a content document inside a WebBook instance, without restriction

  6. any stylesheet, replaced resource (images, audio, video, etc.) or additional resource usable by an html document (JavaScript, manifests, etc.) can be used inside the navigation document or the content documents of a WebBook instance, without restriction

  7. the navigation document and the content documents inside a WebBook instance can be created and edited by any html editor

  8. the metadata and table of contents contained in the navigation document of a WebBook instance can be created and edited by any html editor

  9. the WebBook specification is backwards-compatible

  10. the WebBook specification is forwards-compatible, at the potential cost of graceful degradation of some content

  11. WebBook instances can be recognized without having to detect their MIME type

  12. it’s possible to deliver electronic books in a form that is compatible with both this specification and [EPUB301]

2. WebBook instances

A WebBook instance is a [ZIP] container. All files in a WebBook instance MUST be stored as is (no compression) or using the Deflate algorithm. All file and directory names must be encoded in UTF-8 [UNICODE]. All File Names within the same directory must be unique following case normalization as described in section 3.13 of [UNICODE]. All File Names within the same directory should be unique following NFC or NFD normalization [UAX15].
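
The container rules above can be checked mechanically. The following is a minimal sketch, not a conformance checker: the function name `container_problems` is hypothetical, `str.casefold()` is used as an approximation of the case normalization referenced in [UNICODE], and the UTF-8 encoding requirement is not verified because Python's `zipfile` module already decodes entry names.

```python
import unicodedata
import zipfile

# Compression methods the text above allows: stored (no compression) or Deflate.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def container_problems(zf: zipfile.ZipFile) -> list[str]:
    """Return human-readable problems found in an open ZIP container."""
    problems = []
    entries = set()  # (parent directory, name) pairs, including implied directories
    for info in zf.infolist():
        if info.compress_type not in ALLOWED_METHODS:
            problems.append(
                f"{info.filename}: compression method {info.compress_type} is not allowed"
            )
        parts = [p for p in info.filename.split("/") if p]
        for i, part in enumerate(parts):
            entries.add(("/".join(parts[:i]), part))

    # Assumption: casefold() stands in for Unicode case normalization (a "must"),
    # and NFC stands in for the "NFC or NFD" uniqueness check (a "should").
    checks = (
        ("case", str.casefold),
        ("NFC", lambda s: unicodedata.normalize("NFC", s)),
    )
    for label, normalize in checks:
        seen = {}
        for parent, name in sorted(entries):
            key = (parent, normalize(name))
            if key in seen:
                problems.append(
                    f"{parent or '.'}: {seen[key]!r} and {name!r} "
                    f"collide after {label} normalization"
                )
            else:
                seen[key] = name
    return problems
```

A caller would typically open the package with `zipfile.ZipFile(path)` and treat an empty list as "no problem detected by this sketch".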

The file name of a WebBook instance SHOULD use the wbook file extension, unless compatibility with [EPUB301] is added; in that case, the epub file extension is recommended. User Agents should always treat a zipped package with file extension wbook or epub or using the EPUB Media Type [EPUB-OCF301] as a potential WebBook.

A WebBook instance MUST contain a navigation document.

A WebBook instance and each directory inside that WebBook instance can contain any number of files and directories. All files inside a WebBook instance SHOULD have a file extension to ensure all User Agents (including Web browsers) can render the file if necessary even if no HTTP header is present or if the User Agent cannot determine the type of the file from its contents.

Inside a WebBook Level 1 instance, all internal references (links, references to replaced elements, etc.) SHOULD be strictly relative. With respect to section 4.2 of [RFC3986], only path-noscheme and path-empty are allowed for IRIs' relative-part. References to resources external to the WebBook instance are not restricted.
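
The restriction to path-noscheme and path-empty can be tested roughly as follows. This is a minimal sketch built on Python's `urllib.parse.urlsplit`; the function name is hypothetical and the code does not validate every character against the [RFC3986] grammar.

```python
from urllib.parse import urlsplit


def is_strictly_relative(reference: str) -> bool:
    """Accept only references whose relative-part is path-noscheme or path-empty."""
    parts = urlsplit(reference)
    if parts.scheme or parts.netloc:
        return False  # absolute reference, or network-path reference ("//host/...")
    path = parts.path
    if path == "":
        return True  # path-empty, e.g. "#fragment" or "?query"
    if path.startswith("/"):
        return False  # path-absolute
    # path-noscheme: the first segment must not contain a colon
    return ":" not in path.split("/", 1)[0]
```

With this sketch, `html/chapter_001.html`, `../index.html` and `#` are accepted, while `/html/cover.html` and `http://example.org/cover.html` are rejected.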

An example of Moby Dick packaged as a WebBook can be found there.

The navigation document MUST be a [html] document named index.html if its serialization is HTML and index.xhtml if its serialization is XML. The navigation document MUST be placed inside the topmost directory inside the containing WebBook instance.

If both index.html and index.xhtml are present in the WebBook instance, the User Agent must use index.html as the navigation document.

The navigation document is a regular html document, intended to be rendered by a Web browser, that contains the following information:

2.2. The title

The title of a WebBook instance represents the title of the electronic book and is the title a User Agent SHOULD present to users. The title of a WebBook instance is contained in the html title element of the WebBook instance’s navigation document.

<html lang="en">
  <head>
    <title>Moby-Dick</title>
  </head>
  ....
</html>

If the navigation document has no title element, the title of the WebBook instance is the empty string.
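
A User Agent could obtain the title along the following lines. This is a minimal sketch using Python's tolerant `html.parser` rather than a full HTML parser; the names are hypothetical, the first title element found anywhere in the document is used, and collapsing white space is an assumption borrowed from how browsers expose `document.title`.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text of the first title element encountered."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.chunks.append(data)


def webbook_title(navigation_html: str) -> str:
    """Return the WebBook title, or the empty string if there is no title element."""
    parser = _TitleParser()
    parser.feed(navigation_html)
    parser.close()
    # Assumption: white space is stripped and collapsed.
    return " ".join("".join(parser.chunks).split())
```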

User agents should use the navigation document’s title when referring to the containing WebBook in their user interface. When the contents of a title element are used in this way, the directionality of that title element should be used to set the directionality of the WebBook’s title in the user interface.

If the root element of the navigation document of a given WebBook specifies a primary language for the navigation document, that language is also the primary language of that WebBook.

The principal writing mode of a WebBook is the same as the principal writing mode of its navigation document, as specified in CSS Writing Modes Level 3 §principal-flow, and the page progression direction of a WebBook is the same as the page progression direction of its navigation document, as specified in CSS Writing Modes Level 3 §page-direction and CSS Paged Media Module Level 3 §progression.

Note: The above implies that authors and authoring tools need to set the dir attribute to "rtl" on the html or body element of the navigation document for WebBooks written in right-to-left languages and scripts to work properly.

A WebBook MAY contain a unique identifier (no other WebBook may have the same identifier; multiple instances of the same WebBook can have the same identifier), such as a UUID, DOI or ISBN.

That identifier should be contained in an html element of the navigation document and expressed using [RDFA-PRIMER], for example through the property and vocab attributes.

<span property="http://purl.org/dc/elements/1.1/identifier">
  urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809
</span>
<span vocab="http://purl.org/dc/elements/1.1/" property="identifier">
  urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809
</span>

It’s also possible to use the content attribute on an html meta element, but that approach makes the property hard to edit in a WYSIWYG editor and is therefore strongly discouraged in the context of a WebBook.

The navigation document can contain any number of extra metadata, expressed through [RDFA-PRIMER], for example through the property and vocab attributes.

It’s also possible to use the content attribute on an html meta element, but that approach makes the property hard to edit in a WYSIWYG editor and is therefore strongly discouraged in the context of a WebBook.

<html lang="en">
  <head>
    <title>Moby-Dick</title>
  </head>
  <body>
    ...
      <span property="http://purl.org/dc/elements/1.1/identifier">
        urn:isbn:9780316000000
      </span>
      ...
      <span property="http://purl.org/dc/terms/modified">
        2012-01-13T01:13:00Z
      </span>
      ...
      <span property="http://purl.org/dc/terms/creator">
        Herman Melville
      </span>
      ....
      <span property="http://purl.org/dc/terms/contributor">
        Dave Cramer
      </span>
    ...
  </body>
</html>
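
Metadata expressed this way can be read back with very little code. The sketch below is not a conforming RDFa processor: the names are hypothetical, a vocab attribute is honored only when it sits on the same element as the property attribute, and prefixes or inherited vocabularies are ignored.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _PropertyParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = []    # (property IRI, value) pairs in document order
        self._open = None  # [tag, nesting depth, property, text chunks]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self._open is not None:
            if tag == self._open[0]:
                self._open[1] += 1  # same-named descendant: track nesting
            return
        prop = a.get("property")
        if not prop:
            return
        if ":" not in prop:  # simplification: vocab on the same element only
            prop = (a.get("vocab") or "") + prop
        if a.get("content") is not None:
            self.found.append((prop, a["content"]))  # discouraged meta-style form
        elif tag not in VOID:
            self._open = [tag, 1, prop, []]

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self._open[1] -= 1
            if self._open[1] == 0:
                text = " ".join("".join(self._open[3]).split())
                self.found.append((self._open[2], text))
                self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open[3].append(data)


def extract_properties(navigation_html: str) -> list[tuple[str, str]]:
    parser = _PropertyParser()
    parser.feed(navigation_html)
    parser.close()
    return parser.found
```

Applied to the two identifier examples given earlier, both forms yield the same property IRI, `http://purl.org/dc/elements/1.1/identifier`, with the UUID URN as value.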

A WebBook instance MAY contain no more than one collection of navigation data serving two purposes:

  1. it specifies the Reading Order of the resources composing the WebBook

  2. it specifies the Table of Contents of the WebBook

If present, navigation data are a nav html element carrying the doc-toc role [DPUB-ARIA-1.0] and contained in the body element of the navigation document.

The collection of all hyperlinks (a elements) inside navigation data, in document tree traversal order, specifies the Reading Order of all the resources composing the WebBook.

The collection of all hyperlinks (a elements) inside navigation data having no inclusive ancestor [DOM] holding a html hidden attribute, in document tree traversal order, specifies the Table of Contents of the WebBook.

If the navigation document does not have navigation data or if it does have navigation data that contains no hyperlink (and therefore would define an empty Reading order and Table of Contents) then the Reading Order and Table of Contents are defined to contain the navigation document.
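
The three rules above (Reading Order, Table of Contents, and the fallback) can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the names are hypothetical, Python's `html.parser` is used instead of a real HTML parser, so elements with omitted end tags are only closed when an enclosing end tag is seen, and the `nav_doc` parameter stands for the name of the navigation document.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _NavDataParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []          # open elements as (tag, hidden, is_navigation_data)
        self.reading_order = []  # every hyperlink inside navigation data
        self.toc = []            # hyperlinks with no hidden inclusive ancestor

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        hidden = "hidden" in a
        is_nav = tag == "nav" and "doc-toc" in (a.get("role") or "").split()
        inside_nav = any(entry[2] for entry in self.stack)
        if tag == "a" and inside_nav and a.get("href") is not None:
            self.reading_order.append(a["href"])
            if not hidden and not any(entry[1] for entry in self.stack):
                self.toc.append(a["href"])
        if tag not in VOID:
            self.stack.append((tag, hidden, is_nav))

    def handle_endtag(self, tag):
        # Pop up to the nearest matching open element; this also closes
        # elements whose end tag was omitted (p, li, ...).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def reading_order_and_toc(navigation_html: str, nav_doc: str = "index.html"):
    """Return (reading_order, table_of_contents) as lists of href values."""
    parser = _NavDataParser()
    parser.feed(navigation_html)
    parser.close()
    if not parser.reading_order:
        # No navigation data, or navigation data without any hyperlink.
        return [nav_doc], [nav_doc]
    return parser.reading_order, parser.toc
```

Links carrying the hidden attribute, or placed inside a hidden ancestor, stay in the Reading Order but are left out of the Table of Contents.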

Except in documents with no navigation data or an empty one, the navigation document is not included in the reading order by default, and WebBook-capable User Agents would skip it when displaying the book from the start.

It is possible to include the navigation document in the Reading Order by having a link (possibly hidden) to it from the navigation data.

This contrived but fully functional and valid example shows a minimalist WebBook whose navigation document also serves as the first document in the reading order.

index.html:

<!doctype html>
<html lang=en>
<meta charset=utf-8> <meta name=viewport content="width=device-width">
<title>A Good Joke</title>
<nav role=doc-toc>
<h1><a href=#>A Good Joke</a></h1>
  <p>Why did the chicken cross the road?
  <p><a href="punchline.html">Punchline</a>
</nav>

punchline.html:

<!doctype html>
<html lang=en>
<meta charset=utf-8> <meta name=viewport content="width=device-width">
<title>A Good Joke’s Punchline</title>
<p>To get to the other side.

User Agents can use the Reading Order to render all its resources, for example, in one single paginated flow. User Agents can use the Table of Contents, for example, to provide the user with an ordered collection of resources they can navigate to directly on demand.

<html lang="en">
  <head>
    <title>Moby-Dick</title>
  </head>
  <body>
    <nav role="doc-toc">
      <ul>
        <li><a href="html/cover.html" hidden>Cover</a></li>
        <li><a href="html/titlepage.html">Moby-Dick</a></li>
        <li><a href="html/epigraph.html" hidden>EXTRACTS (Supplied by a Sub-Sub-Librarian)</a></li>
        <li><a href="html/chapter_001.html">Chapter 1. Loomings.</a></li>
        ...
      </ul>
    </nav>
  </body>
</html>

2.8. Content Documents

A Content Document is a file object, present in the WebBook instance and referenced from the Navigation Data, or the Navigation Document itself.

This specification imposes no restriction on the type of Content Documents. Any file type or format accepted by modern Web rendering engines (e.g. images, videos, html, styled XML, SVG, etc.) can be a WebBook Content Document.

2.8.1. Content Document Navigation Hints

In the case of an HTML (any version or serialization) Content Document, it is recommended to add to the head element of the document information about other Content Documents immediately reachable for the user from the current one.

This is achieved through a link element having a rel attribute specifying the relationship between the current document and the target of the link, and a href attribute holding a relative URL to the target Content Document in the WebBook instance.

User Agents may use these links, if present, to offer navigation to target Content Documents from the rendering of a given html Content Document instead of relying on the Navigation Data.

The following table lists the possible relationships:

rel value   Relationship
contents    The target is the Navigation Document
next        The target is the next Content Document in Reading Order
prev        The target is the previous Content Document in Reading Order
index       The target is a document providing an index for (at least) the current document
glossary    The target is a document providing a glossary of terms that pertain to (at least) the current document
start       The target is the first document in the Reading Order
end         The target is the last document in the Reading Order
bookmark    The target is a document providing a list of bookmarks that pertain to (at least) the current document

Some of these values were originally introduced in [HTML32] or [HTML401] but were removed from the [html] specification. Although not present in the specified list of values for that attribute in [html], the html validator does not refuse them.

Examples of Navigation Hints in a Content Document:

<link rel="contents" href="../index.html">
<link rel="prev" href="chapter015.xhtml">
<link rel="next" href="chapter017.xhtml">

Please note that the rel value for the previous Content Document in Reading Order is prev, not previous.
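
An authoring tool could derive such hints from the Reading Order. The sketch below covers only the contents, prev and next relationships; the function name is hypothetical, and Reading Order entries are assumed to be paths relative to the topmost directory of the WebBook instance, without fragments.

```python
import posixpath
from html import escape


def navigation_hints(reading_order: list[str], current: str,
                     nav_doc: str = "index.html") -> list[str]:
    """Build link elements for the Content Document named `current`."""
    base = posixpath.dirname(current) or "."

    def relative(target: str) -> str:
        # Express the target relative to the directory of the current document.
        return posixpath.relpath(target, base)

    position = reading_order.index(current)
    hints = [("contents", nav_doc)]
    if position > 0:
        hints.append(("prev", reading_order[position - 1]))
    if position < len(reading_order) - 1:
        hints.append(("next", reading_order[position + 1]))
    return [
        f'<link rel="{rel}" href="{escape(relative(target), quote=True)}">'
        for rel, target in hints
    ]
```

For a Content Document `html/chapter016.xhtml` placed between `html/chapter015.xhtml` and `html/chapter017.xhtml`, this produces the three link elements shown in the example above.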

3. Compatibility with EPUB 3.0.1

This section is informative.

It is straightforward to turn a valid [EPUB301] package into a WebBook, retaining full compatibility with [EPUB301], using the following instructions:

  1. modify the EPUB package so its Navigation Document [EPUB-CONTENTDOCS301] is now:

    1. named index.xhtml

    2. placed inside the topmost directory of the package

  2. update all links and references to extra resources inside the Navigation Document to reflect the new name and location of the file inside the package (if needed)

  3. add the role attribute with value doc-toc to the toc nav element of the Navigation Document

  4. update the reference to the Navigation Document in the manifest of the Package Document [EPUB-PUBLICATIONS301] of the package to reflect the new name and location of the file inside the package (if needed)

  5. update references to the Navigation Document inside other Content Documents [EPUB-CONTENTDOCS301] of the package to reflect the new name and location of the file inside the package (if needed)
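
Step 3 of the list above lends itself to automation. The following is a minimal sketch, assuming the EPUB Navigation Document is well-formed XHTML without named entities such as `&nbsp;`; the function name is hypothetical, and the serialization produced by `xml.etree.ElementTree` does not preserve the XML declaration or the doctype.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
OPS_NS = "http://www.idpf.org/2007/ops"  # namespace of the epub:type attribute


def add_doc_toc_role(navigation_xhtml: str) -> str:
    """Add role="doc-toc" to the nav element marked epub:type="toc"."""
    # Keep the usual prefixes when serializing instead of ns0, ns1, ...
    ET.register_namespace("", XHTML_NS)
    ET.register_namespace("epub", OPS_NS)
    root = ET.fromstring(navigation_xhtml)
    for nav in root.iter(f"{{{XHTML_NS}}}nav"):
        if "toc" in nav.get(f"{{{OPS_NS}}}type", "").split():
            roles = nav.get("role", "").split()
            if "doc-toc" not in roles:
                nav.set("role", " ".join(roles + ["doc-toc"]))
    return ET.tostring(root, encoding="unicode")
```

The other steps (renaming the file, moving it, and updating the manifest and references) depend on the layout of each package and are not covered by this sketch.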

An example of Moby Dick packaged in conformance to both this specification and EPUB 3.0.1 can be found there.

4. Frequently Asked Questions

This section is informative.

Why is there no container.xml file to be even more compatible with EPUB?

In EPUB, the container.xml file is supposed to hold a list of existing renditions in the EPUB package, but this never really worked: first, there are almost no EPUB packages containing multiple renditions in the wild; second, and the former is probably a side effect of the latter, the EPUB specs say that EPUB Reading Systems must use the first OPF rendition available and say nothing about the other potential ones; third, the file must be, for historical reasons, contained in a META-INF folder that no longer makes sense. Furthermore, it adds one step too many to reach the metadata of the document, contained in the OPF file. Even the EPUB version is not available there, only in the OPF. All in all, the container.xml of EPUB is almost useless. That’s why WebBook has no container.xml file.

What about the EPUB Multiple-Rendition Publications 1.0 specification?

That specification introduced a mechanism for User Agents to select a rendition based on the characteristics of the reading device. As far as we can tell, it is not implemented. It could have helped compatibility between EPUB2 and EPUB3 User Agents through the creation of EPUB packages containing both an EPUB2 rendition and an EPUB3 rendition; that would have required the ability to select a rendition based on its EPUB version. Unfortunately, that ability is not in the specification.

No constraint on Content Documents?

No. That’s a design choice. With the notable exception of Amazon KF7/KF8, almost all EPUB Reading Systems are based on the WebKit or Blink rendering engines, which accept all flavors of HTML, styled XML, SVG and more. There is no reason to select a given flavor of html or even a serialization of html.

(to be extended)

5. Known implementations

This section is informative.

6. Acknowledgements

The author would like to thank the following individuals for their invaluable contributions to this document throughout the numerous discussions he had with them: Dave Cramer, Florian Rivoal.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[DPUB-ARIA-1.0]
Matt Garrish; et al. Digital Publishing WAI-ARIA Module 1.0. REC. URL: https://www.w3.org/TR/dpub-aria-1.0/
[EPUB-CONTENTDOCS301]
EPUB Content Documents 3.0.1. Recommended Specification. URL: http://www.idpf.org/epub/301/spec/epub-contentdocs.html
[EPUB-OCF301]
EPUB Open Container Format (OCF) 3.0.1. Recommended Specification. URL: http://www.idpf.org/epub/301/spec/epub-ocf.html
[EPUB-PUBLICATIONS301]
EPUB Publications 3.0.1. Recommended Specification. URL: http://www.idpf.org/epub/301/spec/epub-publications.html
[EPUB301]
EPUB 3.0.1. Recommended Specification. URL: http://www.idpf.org/epub/301/spec/epub-overview.html
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
[UAX15]
Mark Davis; Ken Whistler. Unicode Normalization Forms. 26 May 2017. Unicode Standard Annex #15. URL: https://www.unicode.org/reports/tr15/tr15-45.html
[UNICODE]
The Unicode Standard. URL: https://www.unicode.org/versions/latest/

Informative References

[HTML32]
Dave Raggett. HTML 3.2 Reference Specification. 14 January 1997. REC. URL: https://www.w3.org/TR/REC-html32
[HTML401]
Dave Raggett; Arnaud Le Hors; Ian Jacobs. HTML 4.01 Specification. 24 December 1999. REC. URL: https://www.w3.org/TR/html401/
[RDFA-PRIMER]
Ivan Herman; et al. RDFa 1.1 Primer - Third Edition. 17 March 2015. NOTE. URL: https://www.w3.org/TR/rdfa-primer/
[ZIP]
.ZIP File Format Specification 6.3.3. URL: https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.3.TXT