Long looooong ago, I wrote a deep review of the XHTML 2.0 spec that was one of the elements that led to the resuming of the HTML activity at the W3C and the final dismissal of XHTML 2.0.
Long ago, I started a similar effort on EPUB that led to Dave Cramer's EPUB Zero. It's time (fr-FR) to draw some conclusions.
This document is maintained on GitHub and accepting contributions. The document can be read at http://glazman.org/e0/e0.html.
Daniel Glazman
"mimetype file in first uncompressed position" constraint since I think the vast majority of reading systems don't care (and can't care) because most of the people creating EPUB at least partially by hand don't know/care. The last three constraints (ZIP container fields) on the ZIP package described in section 4.2 of the spec are usually not implemented by Reading Systems.
The container element of the META-INF/container.xml file has a version attribute that is always "1.0", whatever the EPUB version. That forces editors, filters and reading systems to dive into the default rendition to know the EPUB version, and that's clearly one useless, expensive step too many.
multipart/alternative.
When Borenstein and Freed added it to the draft of RFC 1341 some 25 years ago, Mail User Agent developers (yours truly included) envisioned and experimented with far more than alternatives between text/html and text/plain only. I am under the impression multiple renditions start from the same good will but fail to meet the goals for various reasons:
application/oebps-package+xml mimetype while the EPUB 3 family of specs defines it as the first rendition in the container
text/html and output for you the multipart/alternative between that text/html and its text/plain serialization, each Publication rendition must be edited separately.
The META-INF/metadata.xml file
has always been quoted as "This version of the OCF specification does not define metadata for use in the metadata.xml file. Container-level metadata may be defined in future versions of this specification and in IDPF-defined EPUB extension specifications." by all EPUB specifications. The META-INF/metadata.xml file should be dropped.
The full-path attribute on rootfile elements is the only path in a publication that is relative to the publication's directory. All other URIs (for instance in href attributes) are relative to the current document instance. I think full-path should be deprecated in favor of href here, and finally superseded by href in the next major version of EPUB.
mimetype attribute on rootfile elements since the prose of EPUB 3.1 says the target of a rootfile must be a Package Document, i.e. an OPF file... If EPUB 2 OCF could directly target for instance a PDF file, it's not the case any more for OCF 3.
/files/xhtml/index.html with a leading slash, cf. path-absolute construct in RFC 3986) are harmful to EPUB+Web convergence
META-INF/container.xml file becomes useless and it can be dropped.
META-INF/manifest.xml makes me wonder
why this still exists: "does not mandate a format", "MUST NOT be used". Don't even mention it! Just say that extra unspecified files inside the META-INF directory must be ignored (cf. OCF section 3.5.1), and possibly reserve the metadata.xml file name, period. Oh, and a ZIP is also a manifest of files...
encryption.xml and signatures.xml files
The rights.xml file still has no specified format. Strange. Cf. item 8 above.
container.xml file for instance should have models and attribute lists for each element, similarly to the Packages spec.
links and link elements in a container.xml file... (Cf. issue #374). The way these links are processed is unspecified anyway. Why are these elements normatively specified since extra elements are allowed - and explicitly ignored by spec - in the container?
The spine element represents the default reading order of
the package. Basically, it's a list. We have lists in html, don't we? Why do we need a painful and complex proprietary xml format here?
The linear attribute, which discriminates between primary and supplementary content, is extremely badly chosen. I always forget what really is linear because of that.
linear attribute, making it pointless from an author's point of view.
collection element used and I don't really understand why it contains link elements and not itemref elements
refines in 3.0 was a bit of a hell (despite warnings to the EPUB WG...), and it's gone from 3.1, replaced by new attributes. So no forwards compatibility, no backwards compatibility. Yet another parser and serializer for EPUB-compliant user agents.
The guide element is now an html landmarks list, proving it's feasible to move OPF features to html
refines, there is absolutely nothing any more in 3.1 preventing the Package's metadata from being expressed in html; in 3.0, the refines attribute was a blocker, implying an extension of the model of the meta html element or another ugly IDREF mechanism in html.
The prefix attribute on the package element is a good thing and should be preserved
The rendition-flow property is weird, its values being paginated,
scrolled-continuous, scrolled-doc and auto. Where is paginated-doc, the simplest paginated mode to implement?
nav elements having a special epub:type/role (see issue #941), it's easy to make it contain an equivalent to the spine or more.
The mimetype file is useless
links/link elements
container.xml file any more
metadata.xml and manifest.xml files removed
encryption.xml, signatures.xml and rights.xml inside a META-INF directory (or directly in the package's root after all) to please the industry.
application/oebps-package+xml mimetype is not necessary
body element of the Navigation Document
nav element inside the body of the Navigation Document
META-INF/container.xml and the OPF file... Let's have the Navigation Document mandatorily named index.xhtml so a directory browsing of the uncompressed publication through http will render the Navigation Document.
EPUB is a monster, made to address very diverse markets and ecosystems, too many markets and ecosystems. It's weak, complex, a bit messy, disconnected from the reality of the Web it's supposed to be built upon and some claim (link in fr-FR) it's too close to real books to be disruptive and innovative.
I therefore suggest severing backwards compatibility ties and restarting almost from scratch, based entirely and purely on W3C Standards. Here's the proposed result:
An E0 Publication is a ZIP container. Files in the ZIP must be stored as is (no compression) or use the Deflate algorithm. File and directory names must be encoded in UTF-8 and follow some restrictions (see EPUB 3.1 filename restrictions).
The file name of an E0 Publication MUST use the e0 file
extension.
Do we really need a mandatory file extension? edasfr thinks we don't so we can deal with zipped web sites.
An E0 Publication MUST contain a Navigation Document. It MAY contain the files encryption.xml, signatures.xml and rights.xml (see OCF 3.1 for more information). All these files must be placed directly inside the root of the E0 Publication.
An E0 Publication can also contain Content Documents and their resources.
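The container rules above can be sketched as a small check. This is only an illustration under stated assumptions: the function name `container_problems` and the wording of the messages are hypothetical, the extension comparison is case-sensitive by assumption, and the EPUB 3.1 filename restrictions are not checked at all.

```python
import io
import zipfile

# Compression methods allowed by the text: stored as is, or Deflate.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def container_problems(data: bytes, file_name: str) -> list[str]:
    """Return human-readable problems found in a candidate E0 container.

    Hypothetical helper: an empty list only means these few checks passed.
    """
    problems = []
    # Assumption: the extension is compared case-sensitively.
    if not file_name.endswith(".e0"):
        problems.append(f"{file_name}: file extension is not e0")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.compress_type not in ALLOWED_METHODS:
                problems.append(f"{info.filename}: unsupported compression method")
            # Bit 11 (0x800) of the general purpose flags declares a UTF-8 name.
            # A non-ASCII name without that bit is not declared as UTF-8.
            if not (info.flag_bits & 0x800) and not info.filename.isascii():
                problems.append(f"{info.filename}: name not declared as UTF-8")
    return problems
```

A reading system or editor could run such a check before trying to locate the Navigation Document.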
Inside an E0 Publication, all internal references (links, anchors,
references to replaced elements, etc) MUST be strictly relative. With
respect to section 4.2 of RFC 3986, only path-noscheme and path-empty
are allowed for IRIs' relative-part. External references are
not restricted.
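As a rough illustration of the rule above, the sketch below classifies a reference using Python's `urllib.parse.urlsplit`. The function name and the three labels are hypothetical, and treating network-path references (starting with `//`) as external is an assumption, not something the text states.

```python
from urllib.parse import urlsplit


def classify_reference(ref: str) -> str:
    """Classify a reference as "external", "relative" or "forbidden".

    Hypothetical helper. "relative" covers the path-noscheme and path-empty
    cases; "forbidden" covers path-absolute references (leading slash).
    """
    parts = urlsplit(ref)
    # Assumption: anything with a scheme or an authority is external,
    # and external references are not restricted.
    if parts.scheme or parts.netloc:
        return "external"
    # path-absolute, e.g. /files/xhtml/index.html, is not allowed internally.
    if parts.path.startswith("/"):
        return "forbidden"
    # path-noscheme (chapter1.html, ../img/a.png) or path-empty (#fragment).
    return "relative"
```

With this sketch, `chapter1.html#s2` and `#toc` are accepted, while `/files/xhtml/index.html` is rejected.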
An E0 Navigation Document is an html document. Its file name MUST be index.xhtml
if the document is an XML document and index.html if it is
not an XML document. An E0 Publication cannot contain both index.html
and index.xhtml files.
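A minimal sketch of how a reading system might locate the Navigation Document from the list of entry names of the ZIP (for instance as returned by `zipfile.ZipFile.namelist()`). The function name is hypothetical, and only root-level entries are considered, which is an assumption based on the Navigation Document living in the root of the publication.

```python
def navigation_document_name(names: list[str]) -> str:
    """Return the name of the Navigation Document among ZIP entry names.

    Hypothetical helper: raises ValueError when there is no candidate
    or when both index.html and index.xhtml are present in the root.
    """
    # Entries in sub-directories contain a slash and never match exactly.
    candidates = [n for n in ("index.xhtml", "index.html") if n in names]
    if not candidates:
        raise ValueError("no Navigation Document (index.xhtml or index.html)")
    if len(candidates) > 1:
        raise ValueError("both index.html and index.xhtml are present")
    return candidates[0]
```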
index.html: part of the document, or just for metadata
An E0 Navigation Document contains at least one header
element (for metadata) and at least two nav html elements
(for spine and table of contents) in its body element.
dauwhe wants to preserve a manifest of files...
E0 metadata are designed to be editable in any Wysiwyg html editor, and potentially rendered as regular html content by any Web browser.
E0 metadata are expressed inside a mandatory header html
element inside the Navigation Document. That element must carry the "metadata"
ID and the vocab attribute with value "http://www.idpf.org/2007/opf".
All metadata inside that header element are then expressed
using html+RDFa Lite 1.1. E0 metadata reuse EPUB 3.1 metadata and
the corresponding uniqueness rules, expressed in a different way.
edasfr wants a way to "externalize" all of this and link rel=toc it
Do we really need @vocab?
Explain that we use RDFa Lite 1.1 only for @vocab, @prefix and @property attributes. Explain that JSON-LD is not Wysiwygly editable nor trivially rendered by web browsers.
Refinements of metadata are expressed through nesting of elements.
Example:
<header id="metadata"
        vocab="http://www.idpf.org/2007/opf">
  <h1>Reading Order</h1>
  <ul>
    <li>Author:
      <span property="dc:creator">glazou
        (<span property="file-as">Glazman, Daniel</span>)</span></li>
    <li>Title:
      <span property="dc:title">E0 Publications</span></li>
  </ul>
</header>
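To illustrate how nesting can carry refinements, here is a sketch that reads the example above with Python's standard `html.parser`. It is not an RDFa processor: the class and function names are hypothetical, it assumes well-formed markup with explicit end tags, and it takes the value of a property to be the whitespace-normalized text content of its element, nested refinements included.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect depth tracking.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MetadataCollector(HTMLParser):
    """Collect elements carrying a property attribute inside header#metadata."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # 0 while outside header#metadata
        self.open_items = []  # stack of (depth, item) for open property elements
        self.items = []       # top-level metadata items, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        if self.depth == 0:
            if tag == "header" and attrs.get("id") == "metadata":
                self.depth = 1
            return
        self.depth += 1
        prop = attrs.get("property")
        if prop:
            item = {"property": prop, "text": "", "refinements": []}
            # A property nested inside another one refines its parent.
            parent = self.open_items[-1][1]["refinements"] if self.open_items else self.items
            parent.append(item)
            self.open_items.append((self.depth, item))

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID:
            return
        if self.open_items and self.open_items[-1][0] == self.depth:
            self.open_items.pop()
        self.depth -= 1

    def handle_data(self, data):
        # Text belongs to every open property element (text content).
        for _, item in self.open_items:
            item["text"] += data


def _normalize(items):
    for item in items:
        item["text"] = " ".join(item["text"].split())
        _normalize(item["refinements"])
    return items


def collect_metadata(html: str) -> list:
    collector = MetadataCollector()
    collector.feed(html)
    collector.close()
    return _normalize(collector.items)


SAMPLE = """<header id="metadata" vocab="http://www.idpf.org/2007/opf">
<h1>Reading Order</h1>
<ul>
<li>Author:
<span property="dc:creator">glazou
(<span property="file-as">Glazman, Daniel</span>)</span></li>
<li>Title:
<span property="dc:title">E0 Publications</span></li>
</ul>
</header>"""

metadata = collect_metadata(SAMPLE)
```

In this sketch the `file-as` entry ends up in the `refinements` list of the `dc:creator` entry, which mirrors the nesting of the markup.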
The mandatory title element of the Navigation Document,
contained in its head element, should have the same text
content as the first "dc:title" metadata inside that header element.
edasfr does not like the paragraph above
rdeltour wants to keep Media Overlays
Make that header optional. The only mandatory thing is the title and it's in the title element of the document.
edasfr thinks special nav elements should not be identified by an ID. I agree this is a restriction of the ID value namespace but I don't like his solution, using @role only.
The spine of an E0 Publication is expressed in its Navigation Document as
a new nav element holding the "spine" ID. The spine nav
element is mandatory.
Dave Cramer suggested making the element optional and using the ToC instead if the spine is not present. I like it.
edasfr thinks we should drop the spine since it can be recreated from link rel=prev/next elements. This is true but expensive. And you still need the first document to render...
Explicit spine vs rel=next / rel=prev
See EPUB 3.1 Navigation Document.
The Table of Contents of an E0 Publication is expressed in its Navigation
Document as a nav element carrying the "toc" ID. The Table of Contents nav
element is mandatory.
edasfr thinks the ToC should be optional... Ahem. He also wants the spine to be dropped. So how do we define the first document to read?
What to do if there are several tocs?
See EPUB 3.1 Navigation Document.
The Landmarks of an E0 Publication are expressed in its Navigation Document
as a nav element carrying the "landmarks" ID. The Landmarks nav
element is optional.
See EPUB 3.1 Navigation Document.
nav elements
The Navigation Document may include one or more nav
elements. These additional nav elements should have a role
attribute to provide a machine-readable semantic, and must have a
human-readable heading as their first child.
IDs "metadata", "spine", "landmarks" and "toc" are reserved in the
Navigation Document and must not be used by these extra nav
elements.
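The nav rules of a Navigation Document can be sketched as a small lint pass. The names below are hypothetical, the mandatory set reflects the current text (spine and toc mandatory, landmarks optional), and the "heading as first child" requirement is deliberately not checked to keep the sketch short.

```python
from html.parser import HTMLParser

SPECIAL_NAV_IDS = {"spine", "toc", "landmarks"}
MANDATORY_NAV_IDS = {"spine", "toc"}


class NavCollector(HTMLParser):
    """Record the attributes of every nav element found in the document."""

    def __init__(self):
        super().__init__()
        self.navs = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.navs.append(dict(attrs))


def nav_problems(html: str) -> list[str]:
    """Return a list of problems about nav elements (hypothetical helper)."""
    collector = NavCollector()
    collector.feed(html)
    collector.close()
    problems = []
    ids = [nav.get("id") for nav in collector.navs]
    for required in sorted(MANDATORY_NAV_IDS):
        if required not in ids:
            problems.append(f'missing mandatory nav with id "{required}"')
    for special in sorted(SPECIAL_NAV_IDS):
        if ids.count(special) > 1:
            problems.append(f'id "{special}" is used by more than one nav')
    for nav in collector.navs:
        if nav.get("id") in SPECIAL_NAV_IDS:
            continue
        # Everything else is an extra nav element.
        if nav.get("id") == "metadata":
            problems.append('extra nav uses the reserved id "metadata"')
        if not nav.get("role"):
            problems.append("extra nav has no role attribute")
    return problems
```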
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type">
    <title>Moby-Dick</title>
  </head>
  <body>
    <header id="metadata"
            vocab="http://www.idpf.org/2007/opf">
      <ul>
        <li>Author:
          <span property="dc:creator">Herman Melville
            (<span property="file-as">Melville, Herman</span>)</span></li>
        <li>Title:
          <span property="dc:title">Moby-Dick</span></li>
        <li>Identifier:
          <span property="dc:identifier">glazou.e0.samples.moby-dick</span></li>
        <li>Language:
          <span property="dc:language">en-US</span></li>
        <li>Last modification:
          <span property="dcterms:modified">2017-01-17T11:16:41Z</span></li>
        <li>Publisher:
          <span property="dc:publisher">Harper &amp; Brothers, Publishers</span></li>
        <li>Contributor:
          <span property="dc:contributor">Daniel Glazman</span></li>
      </ul>
    </header>
    <nav id="spine">
      <h1>Default reading order</h1>
      <ul>
        <li><a href="cover.html">Cover</a></li>
        <li><a href="titlepage.html">Title</a></li>
        <li><a href="toc-short.html">Brief Table of Contents</a></li>
        ...
      </ul>
    </nav>
    <nav id="toc" role="doc-toc">
      <h1>Table of Contents</h1>
      <ol>
        <li><a href="titlepage.html">Moby-Dick</a></li>
        <li><a href="preface_001.html">Original Transcriber’s Notes:</a></li>
        <li><a href="introduction_001.html">ETYMOLOGY.</a></li>
        ...
      </ol>
    </nav>
  </body>
</html>
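Reading the default reading order out of such a Navigation Document only requires walking the links of the nav element with the "spine" ID. The sketch below does that with Python's standard `html.parser`; the class and function names are hypothetical, and the embedded document is a shortened copy of the sample above.

```python
from html.parser import HTMLParser


class NavLinkCollector(HTMLParser):
    """Collect href values of a elements inside the nav with a given id."""

    def __init__(self, nav_id):
        super().__init__()
        self.nav_id = nav_id
        self.depth = 0   # > 0 while inside the requested nav element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav":
            if self.depth:
                self.depth += 1          # nested nav inside the requested one
            elif attrs.get("id") == self.nav_id:
                self.depth = 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "nav" and self.depth:
            self.depth -= 1


def nav_links(html: str, nav_id: str) -> list[str]:
    collector = NavLinkCollector(nav_id)
    collector.feed(html)
    collector.close()
    return collector.hrefs


NAVIGATION_DOCUMENT = """<!DOCTYPE html>
<html lang="en"><head><title>Moby-Dick</title></head><body>
<nav id="spine"><h1>Default reading order</h1><ul>
<li><a href="cover.html">Cover</a></li>
<li><a href="titlepage.html">Title</a></li>
<li><a href="toc-short.html">Brief Table of Contents</a></li>
</ul></nav>
<nav id="toc" role="doc-toc"><h1>Table of Contents</h1><ol>
<li><a href="titlepage.html">Moby-Dick</a></li>
<li><a href="preface_001.html">Original Transcriber's Notes:</a></li>
</ol></nav>
</body></html>"""

spine = nav_links(NAVIGATION_DOCUMENT, "spine")
toc = nav_links(NAVIGATION_DOCUMENT, "toc")
```

The same function serves the spine, the Table of Contents and the Landmarks, since they only differ by their ID.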
An E0 Publication may contain any number of directories and nested directories.
E0 Content Documents are referenced from the Navigation Document. E0 Content Documents are html documents.
E0 Content Documents should contain <link rel="prev"...> and <link rel="next"...> elements in their head element conformant to the reading order of the spine present in the Navigation Document. Content Documents not present in that spine don't need such elements.
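An authoring tool could derive these link elements from the spine. The sketch below is one possible way to do it; the function name is hypothetical, and it assumes all spine documents live in the same directory as the Navigation Document, so the spine hrefs can be reused unchanged (otherwise each href would have to be made relative to the Content Document that carries the link).

```python
from html import escape


def prev_next_links(spine: list[str]) -> dict[str, list[str]]:
    """Map each spine href to the link elements its head should contain.

    Hypothetical helper; hrefs are reused as is (same-directory assumption).
    """
    links = {}
    for index, href in enumerate(spine):
        tags = []
        if index > 0:
            previous = escape(spine[index - 1], quote=True)
            tags.append(f'<link rel="prev" href="{previous}">')
        if index < len(spine) - 1:
            following = escape(spine[index + 1], quote=True)
            tags.append(f'<link rel="next" href="{following}">')
        links[href] = tags
    return links


links = prev_next_links(["cover.html", "titlepage.html", "toc-short.html"])
```

The first document of the spine only gets a rel="next" link and the last one only a rel="prev" link.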
The epub:type attribute is superseded by the role attribute and must not be used.
E0 Publications can contain any number of extra resources (CSS stylesheets, images, videos, etc.) referenced from either the Navigation Document or Content Documents.