1. Introduction
This section is informative.
The world of electronic books (ebooks) is very fragmented, technically speaking. The current electronic book formats, inheriting from multiple sources, are not readable inside a browser without a dedicated programmatic layer. Even if the ZIP package containing the electronic book is unzipped, finding and rendering the individual documents composing the book is not a task just anyone can perform. Furthermore, most of these existing formats rely heavily on dedicated XML or even proprietary binary formats that raise many technical issues.
This document proposes Level 1 of a new electronic book format, called WebBook, designed to free the electronic book market from its current industrial silo and make it a real first-class client of the Web.
1.1. Requirements
This section is informative.
The following requirements are the basis for the technical choices in this specification:
-
one URL is enough to retrieve a remote WebBook instance, there is no need to download every resource composing that instance
-
the contents of a WebBook instance can be placed inside a Web site’s directory and are directly readable by a Web browser using the URL for that directory
-
the contents of a WebBook instance can be placed inside a local directory and are directly readable by a Web browser opening its index.html or index.xhtml topmost file
-
each individual resource in a WebBook instance, on a Web site or on a local disk, is directly readable by a Web browser
-
any html document can be used as content document inside a WebBook instance, without restriction
-
any stylesheet, replaced resource (images, audio, video, etc.) or additional resource usable by a html document (JavaScript, manifests, etc.) can be used inside the navigation document or the content documents of a WebBook instance, without restriction
-
the navigation document and the content documents inside a WebBook instance can be created and edited by any html editor
-
the metadata and table of contents contained in the navigation document of a WebBook instance can be created and edited by any html editor
-
the WebBook specification is backwards-compatible
-
the WebBook specification is forwards-compatible, at the potential cost of graceful degradation of some content
-
WebBook instances can be recognized without having to detect their MIME type
-
it’s possible to deliver electronic books in a form that is compatible with both this specification and [EPUB301]
2. WebBook instances
A WebBook instance is a [ZIP] container. All files in a WebBook instance MUST be stored as is (no compression) or using the Deflate algorithm. All file and directory names must be encoded in UTF-8 [UNICODE]. All File Names within the same directory must be unique following case normalization as described in section 3.13 of [UNICODE]. All File Names within the same directory should be unique following NFC or NFD normalization [UAX15].
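As a rough illustration of the container rules above, the following Python sketch checks the compression method of each entry and looks for file names that collide within the same directory. The function name check_container is hypothetical, and combining NFC normalization with str.casefold() is only an approximation of the case and normalization rules referenced from [UNICODE] and [UAX15]; it also treats the "should" on normalization as strictly as the case rule.

```python
import posixpath
import unicodedata
import zipfile


def check_container(zf):
    """Return a list of human-readable problems found in an open ZipFile."""
    problems = []
    seen = {}  # (directory, folded base name) -> first name seen with that key
    for info in zf.infolist():
        # Only "stored" (no compression) and Deflate are allowed.
        if info.compress_type not in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED):
            problems.append(
                f"{info.filename}: unsupported compression method {info.compress_type}"
            )
        name = info.filename.rstrip("/")  # directory entries end with "/"
        directory, base = posixpath.split(name)
        # Approximation: NFC-normalize, then apply full case folding.
        key = (directory, unicodedata.normalize("NFC", base).casefold())
        if key in seen and seen[key] != name:
            problems.append(f"{name}: collides with {seen[key]}")
        seen.setdefault(key, name)
    return problems
```

An empty list means that none of the checked rules were violated; it does not establish that the package is a valid WebBook instance.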
The file name of a WebBook instance SHOULD use the wbook file extension, unless compatibility with [EPUB301] is added; in that case, the epub file extension is recommended. User Agents should always treat a zipped package with file extension wbook or epub or using the EPUB Media Type [EPUB-OCF301] as a potential WebBook.
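The detection rule above could be sketched in Python as follows. The function name is_potential_webbook is hypothetical, and comparing extensions case-insensitively is an assumption that this specification does not state. The media type string is the one registered for EPUB.

```python
import posixpath

EPUB_MEDIA_TYPE = "application/epub+zip"


def is_potential_webbook(file_name, media_type=None):
    """Return True if a zipped package should be treated as a potential WebBook."""
    # Assumption: extensions are compared case-insensitively.
    extension = posixpath.splitext(file_name)[1].lower()
    return extension in (".wbook", ".epub") or media_type == EPUB_MEDIA_TYPE
```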
A WebBook instance MUST contain a navigation document.
A WebBook instance and each directory inside that WebBook instance can contain any number of files and directories. All files inside a WebBook instance SHOULD have a file extension to ensure all User Agents (including Web browsers) can render the file if necessary even if no HTTP header is present or if the User Agent cannot determine the type of the file from its contents.
Inside a WebBook Level 1 instance, all internal references (links, references to replaced elements, etc.) SHOULD be strictly relative. With respect to section 4.2 of [RFC3986], only path-noscheme and path-empty are allowed for IRIs' relative-part. References to resources external to the WebBook instance are not restricted.
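A minimal Python sketch of the "strictly relative" test might look like the following. The function name is_strictly_relative is hypothetical, and the check relies on urllib.parse.urlsplit rather than a full [RFC3986] grammar, so it does not validate the characters of each segment.

```python
from urllib.parse import urlsplit


def is_strictly_relative(reference):
    """Return True if reference uses only path-noscheme or path-empty."""
    # A leading "/" means path-absolute or a "//" authority: both disallowed.
    if reference.startswith("/"):
        return False
    parts = urlsplit(reference)
    # A scheme or an authority makes the reference non-relative.
    if parts.scheme or parts.netloc:
        return False
    # path-noscheme forbids ":" in the first path segment.
    first_segment = parts.path.split("/", 1)[0]
    return ":" not in first_segment
```

With this sketch, html/chapter_001.html, ../index.html and # are accepted, while /html/cover.html and //example.org/cover.html are rejected.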
An example of Moby Dick packaged as a WebBook can be found there.
2.1. The navigation document
The navigation document MUST be a [html] document named index.html if its serialization is HTML and index.xhtml if its serialization is XML. The navigation document MUST be placed inside the topmost directory inside the containing WebBook instance.
If both index.html and index.xhtml are present in the WebBook instance, the User Agent must use index.html as the navigation document.
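The selection rule above can be sketched in a few lines of Python. The function name find_navigation_document is hypothetical; names is assumed to be the list of entry names in the package (for example the result of zipfile.ZipFile.namelist()), where entries in the topmost directory contain no "/".

```python
def find_navigation_document(names):
    """Return the navigation document's name, or None if there is none."""
    names = set(names)
    # index.html takes precedence over index.xhtml when both are present.
    for candidate in ("index.html", "index.xhtml"):
        if candidate in names:
            return candidate
    return None
```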
The navigation document is a regular html document, intended to be rendered by a Web browser, that contains the following information:
-
metadata:
  -
  a title (optional)
  -
  a main language (optional)
  -
  a progression direction (optional)
  -
  a unique identifier (optional)
  -
  other metadata (optional)
-
navigation data (optional)
-
other html elements (optional)
2.2. The title
The title of a WebBook instance represents the title of the electronic book and is the title a User Agent SHOULD present to users. The title of a WebBook instance is contained in the html title element of the WebBook instance’s navigation document.
If the navigation document has no title element, the title of the WebBook instance is the empty string.
User agents should use the navigation document’s title when referring to the containing WebBook in their user interface. When the contents of a title element are used in this way, the directionality of that title element should be used to set the directionality of the WebBook’s title in the user interface.
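As an illustration, a User Agent could obtain the title along these lines in Python. This is only a sketch: the name webbook_title is hypothetical, html.parser is not a conforming HTML parser, whitespace collapsing is an assumption borrowed from how browsers expose document titles, and directionality is not handled at all.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text of the first title element encountered."""

    def __init__(self):
        super().__init__()
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def webbook_title(navigation_html):
    """Return the WebBook title, or the empty string if there is no title element."""
    parser = _TitleParser()
    parser.feed(navigation_html)
    parser.close()
    # Assumption: strip and collapse whitespace before presenting the title.
    return " ".join((parser.title or "").split())
```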
2.3. The main language
If the root element of the navigation document of a given WebBook specifies a primary language for the navigation document, that language is also the primary language of that WebBook.
2.4. The progression direction
The principal writing mode of a WebBook is the same as the principal writing mode of its navigation document, as specified in CSS Writing Modes Level 3 §principal-flow, and the page progression direction of a WebBook is the same as the page progression direction of its navigation document, as specified in CSS Writing Modes Level 3 §page-direction and CSS Paged Media Module Level 3 §progression.
Note: The above implies that authors and authoring tools need to set the dir attribute to "rtl" on the html or body element of the navigation document for WebBooks written in right-to-left languages and scripts to work properly.
2.5. The identifier
A WebBook MAY contain a unique identifier (no other WebBook may have the same identifier; multiple instances of the same WebBook can have the same identifier), such as a UUID, DOI or ISBN.
That identifier should be contained in a html element itself contained in the navigation document and expressed using [RDFA-PRIMER], for example through the property and vocab attributes.
<span property="http://purl.org/dc/elements/1.1/identifier"> urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809 </span>
<span vocab="http://purl.org/dc/elements/1.1/" property="identifier"> urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809 </span>
It’s also possible to use the content attribute on a html meta element but that approach makes the property not easily editable in a WYSIWYG editor and is therefore strongly discouraged in the context of a WebBook.
2.6. Other metadata
The navigation document can contain any number of additional metadata items, expressed through [RDFA-PRIMER], for example through the property and vocab attributes.
It’s also possible to use the content attribute on a html meta element but that approach makes the property not easily editable in a WYSIWYG editor and is therefore strongly discouraged in the context of a WebBook.
<html lang="en">
  <head>
    <title>Moby-Dick</title>
  </head>
  <body>
    ...
    <span property="http://purl.org/dc/elements/1.1/identifier"> urn:isbn:9780316000000 </span>
    ...
    <span property="http://purl.org/dc/terms/modified"> 2012-01-13T01:13:00Z </span>
    ...
    <span property="http://purl.org/dc/terms/creator"> Herman Melville </span>
    ....
    <span property="http://purl.org/dc/terms/contributor"> Dave Cramer </span>
    ...
  </body>
</html>
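To give an idea of how such metadata might be read back, here is a deliberately simplified Python sketch. It is not an RDFa processor: the names MetadataCollector and collect_metadata are hypothetical, a vocab attribute is only honored when it sits on the same element as property, any property value containing ":" is treated as a full IRI, and the meta element with a content attribute is ignored.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class MetadataCollector(HTMLParser):
    """Collects (property IRI, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.metadata = []    # pairs in document order
        self._current = None  # [tag, depth, iri, text parts] while capturing

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            # Track nesting of same-named elements to find the matching end tag.
            if tag == self._current[0]:
                self._current[1] += 1
            return
        prop = attrs.get("property")
        if prop and tag not in VOID_ELEMENTS:
            vocab = attrs.get("vocab")
            # Simplification: prefix with vocab only for values without ":".
            iri = prop if ":" in prop or not vocab else vocab + prop
            self._current = [tag, 1, iri, []]

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            self._current[1] -= 1
            if self._current[1] == 0:
                _, _, iri, parts = self._current
                self.metadata.append((iri, " ".join("".join(parts).split())))
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[3].append(data)


def collect_metadata(navigation_html):
    collector = MetadataCollector()
    collector.feed(navigation_html)
    collector.close()
    return collector.metadata
```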
2.7. Navigation data
A WebBook instance MAY contain no more than one collection of navigation data serving two purposes:
-
it specifies the Reading Order of the resources composing the WebBook
-
it specifies the Table of Contents of the WebBook
If present, navigation data are a nav html element carrying the doc-toc role [DPUB-ARIA-1.0] and contained in the body element of the navigation document.
The collection of all hyperlinks (a elements) inside navigation data, in document tree traversal order, specifies the Reading Order of all the resources composing the WebBook.
The collection of all hyperlinks (a elements) inside navigation data having no inclusive ancestor [DOM] holding a html hidden attribute, in document tree traversal order, specifies the Table of Contents of the WebBook.
If the navigation document does not have navigation data or if it does have navigation data that contains no hyperlink (and therefore would define an empty Reading order and Table of Contents) then the Reading Order and Table of Contents are defined to contain the navigation document.
It is possible to include the navigation document in the Reading Order by having a link (possibly hidden) to it from the navigation data.
index.html:
<!doctype html>
<html lang=en>
<meta charset=utf-8> <meta name=viewport content="width=device-width">
<title>A Good Joke</title>
<nav role=doc-toc>
<h1><a href=#>A Good Joke</a></h1>
<p>Why did the chicken cross the road?
<p><a href="punchline.html">Punchline</a>
</nav>
punchline.html:
<!doctype html>
<html lang=en>
<meta charset=utf-8> <meta name=viewport content="width=device-width">
<title>A Good Joke’s Punchline</title>
<p>To get to the other side.
User Agents can use the Reading Order to render all its resources, for example, in one single paginated flow. User Agents can use the Table of Contents, for example, to provide the user with an ordered collection of resources they can navigate to directly on demand.
<html lang="en">
  <head>
    <title>Moby-Dick</title>
  </head>
  <body>
    <nav role="doc-toc">
      <ul>
        <li><a href="html/cover.html" hidden>Cover</a></li>
        <li><a href="html/titlepage.html">Moby-Dick</a></li>
        <li><a href="html/epigraph.html" hidden>EXTRACTS (Supplied by a Sub-Sub-Librarian)</a>.</li>
        <li><a href="html/chapter_001.html">Chapter 1. Loomings.</a></li>
        ...
      </ul>
    </nav>
  </body>
</html>
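The derivation of the Reading Order and the Table of Contents from navigation data can be sketched in Python as follows. The names NavigationData and navigation_lists are hypothetical. The sketch uses html.parser with a simple stack of open elements, which does not implement HTML's implied end tags: a hidden attribute on an unclosed element (for example a p without its end tag) would wrongly apply to its following siblings. A real User Agent would work on the DOM produced by a conforming parser. Only the first nav element carrying the doc-toc role is considered, and a elements without an href attribute are skipped.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class NavigationData(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []          # open elements: (tag, has_hidden, is_toc_nav)
        self.reading_order = []  # every hyperlink inside the navigation data
        self.toc = []            # hyperlinks with no hidden inclusive ancestor
        self._nav_seen = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        hidden = "hidden" in attrs
        in_nav = any(entry[2] for entry in self.stack)
        is_nav = (tag == "nav" and not self._nav_seen
                  and "doc-toc" in (attrs.get("role") or "").split())
        if is_nav:
            self._nav_seen = True
        if tag == "a" and in_nav and attrs.get("href") is not None:
            href = attrs["href"]
            self.reading_order.append(href)
            if not hidden and not any(entry[1] for entry in self.stack):
                self.toc.append(href)
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, hidden, is_nav))

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def navigation_lists(navigation_html, navigation_name="index.html"):
    """Return (reading_order, table_of_contents) as lists of href values."""
    parser = NavigationData()
    parser.feed(navigation_html)
    parser.close()
    if not parser.reading_order:
        # No navigation data, or no hyperlink in it: fall back to the
        # navigation document itself for both lists.
        return [navigation_name], [navigation_name]
    return parser.reading_order, parser.toc
```

Applied to the four links of the Moby-Dick example above, this yields all four entries as the Reading Order, and only html/titlepage.html and html/chapter_001.html as the Table of Contents.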
2.8. Content Documents
A Content Document is a file object, present in the WebBook instance and referenced from the Navigation Data, or the Navigation Document itself.
This specification imposes no restriction on the type of Content Documents. Any file type or format accepted by modern Web rendering engines (e.g. images, videos, html, styled XML, SVG, etc.) can be a WebBook Content Document.
2.8.1. Content Document Navigation Hints
In the case of a HTML (any version or serialization) Content Document, it is recommended to add to the head element of the document information about other Content Documents immediately reachable for the user from the current one.
This is achieved through a link element having a rel attribute specifying the relationship between the current document and the target of the link, and a href attribute holding a relative URL to the target Content Document in the WebBook instance.
User Agents may use these links, if present, to offer navigation to target Content Documents from the rendering of a given html Content Document instead of relying on the Navigation Data.
The following table lists the possible relationships:
rel value | Relationship |
contents | The target is the Navigation Document |
next | The target is the next Content Document in Reading Order |
prev | The target is the previous Content Document in Reading Order |
index | The target is a document providing an index for (at least) the current document |
glossary | The target is a document providing a glossary of terms that pertain to (at least) the current document |
start | The target is the first document in the Reading Order |
end | The target is the last document in the Reading Order |
bookmark | The target is a document providing a list of bookmarks that pertain to (at least) the current document |
<link rel="contents" href="../index.html">
<link rel="prev" href="chapter015.xhtml">
<link rel="next" href="chapter017.xhtml">
The rel value for the previous Content Document in Reading Order is prev and not previous.
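An authoring tool could generate the contents, prev and next hints from the Reading Order, for instance as in the Python sketch below. The function name navigation_hints is hypothetical, and the sketch assumes that entries of the Reading Order are paths relative to the topmost directory of the WebBook instance, without fragments or query strings.

```python
import html
import posixpath


def navigation_hints(reading_order, position, navigation_document="index.html"):
    """Return link elements for the Content Document at reading_order[position]."""
    current = reading_order[position]
    base = posixpath.dirname(current) or "."

    def link(rel, target):
        # Make the target relative to the directory of the current document.
        href = posixpath.relpath(target, base)
        return f'<link rel="{rel}" href="{html.escape(href, quote=True)}">'

    links = [link("contents", navigation_document)]
    if position > 0:
        links.append(link("prev", reading_order[position - 1]))
    if position + 1 < len(reading_order):
        links.append(link("next", reading_order[position + 1]))
    return links
```

For a document html/chapter016.xhtml sitting between html/chapter015.xhtml and html/chapter017.xhtml, this produces the three link elements shown in the example above.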
3. Compatibility with EPUB 3.0.1
This section is informative.
It is straightforward to turn a valid [EPUB301] package into a WebBook, retaining full compatibility with [EPUB301], using the following instructions:
-
modify the EPUB package so its Navigation Document [EPUB-CONTENTDOCS301] is now:
  -
  named index.xhtml
  -
  placed inside the topmost directory of the package
-
update all links and references to extra resources inside the Navigation Document to reflect the new name and location of the file inside the package (if needed)
-
add the role attribute with value doc-toc to the toc nav element of the Navigation Document
-
update the reference to the Navigation Document in the manifest of the Package Document [EPUB-PUBLICATIONS301] of the package to reflect the new name and location of the file inside the package (if needed)
-
update references to the Navigation Document inside other Content Documents [EPUB-CONTENTDOCS301] of the package to reflect the new name and location of the file inside the package (if needed)
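The step that adds the doc-toc role could be automated roughly as follows in Python. This is a sketch under several assumptions: the function name add_doc_toc_role is hypothetical, the toc nav element is identified by its epub:type attribute, and the Navigation Document is well-formed XML without named entities that xml.etree.ElementTree cannot resolve. ElementTree also drops the XML declaration and doctype on output, so a real converter would need to restore them. Renaming and moving the file, and updating references to it, are not covered.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"


def add_doc_toc_role(nav_xhtml):
    """Return the Navigation Document with role=doc-toc on its toc nav element."""
    # Keep readable prefixes when serializing.
    ET.register_namespace("", XHTML_NS)
    ET.register_namespace("epub", EPUB_NS)
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(f"{{{XHTML_NS}}}nav"):
        types = (nav.get(f"{{{EPUB_NS}}}type") or "").split()
        if "toc" in types:
            roles = (nav.get("role") or "").split()
            if "doc-toc" not in roles:
                nav.set("role", " ".join(roles + ["doc-toc"]))
    return ET.tostring(root, encoding="unicode")
```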
An example of Moby Dick packaged in conformance to both this specification and EPUB 3.0.1 can be found there.
4. Frequently Asked Questions
This section is informative.
- Why is there no container.xml file to be even more compatible with EPUB?
-
In EPUB, the container.xml file is supposed to hold a list of existing renditions in the EPUB package, but this never really worked: first, there are almost no EPUB packages containing multiple renditions in the wild; second, and the former is probably a side-effect of the latter, the EPUB specs say that EPUB Reading Systems must use the first OPF rendition available and nothing is said about the other potential ones; third, the file must be, for historical reasons, contained in a META-INF folder that does not make sense any more. Furthermore, that’s one step too many to reach the metadata of the document, contained in the OPF file. Even the EPUB version is not available there, only in the OPF. All in all, the container.xml of EPUB is almost useless. That’s why WebBook has no container.xml file.
- What about the EPUB Multiple-Rendition Publications 1.0 specification?
-
That specification introduced a mechanism for User Agents to select a rendition based on the characteristics of the reading device. As far as we can tell, it is not implemented. It could have helped compatibility between EPUB2 and EPUB3 User Agents through the creation of EPUB packages containing both an EPUB2 rendition and an EPUB3 rendition; that would have required the ability to select a rendition based on its EPUB version. Unfortunately, that is not in the specification.
- No constraint on Content Documents?
-
No. That’s a design choice. With the notable exception of Amazon KF7/KF8, almost all EPUB Reading Systems are based on the WebKit or the Blink rendering engines that accept all flavors of HTML, styled XML, SVG and more. There is no reason to select a given flavor of html or even a serialization of html.
(to be extended)
5. Known implementations
This section is informative.
-
epub3towebbook, a Node.js script to convert an EPUB3 package into an EPUB3-compatible WebBook
-
the next public version of BlueGriffon will create EPUB3-compatible WebBooks
6. Acknowledgements
The author would like to thank the following individuals for their invaluable contributions to this document throughout the numerous discussions he had with them: Dave Cramer, Florian Rivoal.