Metadata for ebooks

Posted in Just for Writers

I was talking to a colleague about missing metadata in an ebook file and discovered that she didn't fully understand what I was talking about.

I'm sure she's not alone.

So here's a little overview about where it comes from and how it's used. As they say, the devil's in the details.

What metadata is visible to retailers?

Here’s an entry for one of my books at a random retailer I’ve never heard of. Since it seems to be in Romania, it probably got there via distributor PublishDrive. (Many retailers are worse, even those whose local language is English.)

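Produs publicat in 2012 la Perkunas Press [Product published in 2012 by Perkunas Press]
Data aparitiei: Octombrie 2012 [Publication date: October 2012]
Colectia The Hounds of Annwn [Collection: The Hounds of Annwn]
ISBN EPUB: 9780963538413
Formate: ePub (Adobe DRM) [Formats]
Drepturi utilizare: 6 [Usage rights: 6]
Compatibil cu: PC/Mac, iPad/iPhone, Android, Nook, Sony, Trekstor (afla mai multe) [Compatible with: the listed devices (learn more)]
Clasament bestseller: [Bestseller ranking:]
#11005 in Carti digitale [in Digital books]
#4 in eBooks > Carte straina > EN – FICTION > Fantasy – Contemporary [Carte straina: Foreign books]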

PublishDrive asks for all sorts of metadata, just like Amazon KDP does, and this site seems to understand that the book is part of a series (The Hounds of Annwn), though its link to a series page points somewhere else in error.

Colectia The Hounds of Annwn

Understandably, it doesn’t note which entry in the series it is (should be #1) since it doesn’t have a field for it on the screen.

I am fascinated to discover that the book is #4 in its category (especially since I have no known sales here), though less so when I realize (clicking on “Fantasy – Contemporary”) that my books are the ONLY books in that category, no doubt because it’s an English-language category.

#4 in eBooks > Carte straina > EN – FICTION > Fantasy – Contemporary

In this case, note how few English fiction (EN – FICTION) books there are, and how Fantasy is displayed.

Constraints and remedies

Retailers can only display data they've planned for. If they haven’t planned for non-English categories, then they can’t use them (and won’t translate them on the fly or try to match them up to the local language). If they don’t have a field for series order (or sometimes for series name), then they won’t display it.

Even the best and most detail-oriented of modern distributors can’t shove fields (like series order) into retailer sites that don’t use them.

Nor can they provide the local-language version of search categories, or make the retailer link correctly to a series page.

As publishers, we can do little at this time about the latter two, but we can help with the first of these issues by making sure we have redundancy built in (in this case, in the book description, which includes the series and series order as well as a one-line tag; the series line is the first line of the excerpt shown here).


Book 1 of The Hounds of Annwn.


George Talbot Traherne is just doing his job on a fine autumn morning, keeping the hounds together for the huntsman of the Rowanton Hunt in Virginia along the Blue Ridge Mountain. Doesn’t pay to get distracted by a white stag in unfamiliar territory, though. Next thing you know, you might find yourself… somewhere

Citeste mai mult… [click for more]
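That redundancy is easy to automate. Here is a minimal Python sketch, assuming you keep your descriptions as plain text; the function name with_series_tag is hypothetical, and the "Book N of Series." wording simply mirrors the line in my own description.

```python
def with_series_tag(description: str, series: str, number: int) -> str:
    """Prepend 'Book N of <series>.' unless the description already starts with it."""
    tag = f"Book {number} of {series}."
    if description.lstrip().startswith(tag):
        # Already tagged: return unchanged so the tag is never doubled.
        return description
    return f"{tag}\n\n{description.strip()}"


blurb = "George Talbot Traherne is just doing his job on a fine autumn morning."
print(with_series_tag(blurb, "The Hounds of Annwn", 1))
```

Running it twice on the same text leaves the result unchanged, so it is safe to apply to every description before uploading to a distributor.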

What metadata is visible to various tools and apps?

Ebooks contain their own internal metadata that is unrelated to all the metadata questions asked by distributors and retailer/distributors (like Amazon).

There are tools that use that information directly. Most ebook apps for individual libraries or collections, for example, assume that ebooks will come from multiple retailers and rely upon the ebook's internal information. One such popular tool is Calibre.


All the data presented on this screen (except the series name and number) came from inside the ebook file, independent of the retailer. Even the book cover — this is part of the ebook file, not the book cover image you may have presented to the distributor or retailer.
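For the curious, here is a rough Python sketch of how a tool could pull that internal data out of an EPUB file. It is not how Calibre itself does it, just an illustration using the standard library: an EPUB is a zip archive, the file META-INF/container.xml points to the package (OPF) file, and the metadata fields live there. The function name read_epub_metadata is my own invention.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_epub_metadata(source):
    """Return {field: [values]} for every Dublin Core element in an EPUB.

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as epub:
        # META-INF/container.xml names the package (OPF) file.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf = ET.fromstring(epub.read(rootfile.get("full-path")))

    metadata = {}
    for element in opf.iter():
        # Keep only elements in the Dublin Core namespace (dc:title, dc:creator, ...).
        if element.tag.startswith(f"{{{DC_NS}}}"):
            field = element.tag.split("}", 1)[1]
            metadata.setdefault(field, []).append((element.text or "").strip())
    return metadata
```

Calling read_epub_metadata("mybook.epub") (a placeholder file name) returns a dictionary keyed by field name, with a list of values for each field, since fields like subject can appear more than once.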

Here’s what it looks like for my colleague's most recent book. (An excellent read, by the way, as are all her books.) The ebook was formatted for her by Vellum, I believe.


There is no book description, and much of the metadata is missing or never defined.
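If you have the metadata fields in hand, spotting the gaps can be automated. This is a small sketch, assuming the fields arrive as a dictionary of field name to list of values; the list of "expected" fields is my own choice, not a requirement of any standard or tool.

```python
# Fields I would want to see filled in; an assumption, adjust to taste.
EXPECTED_FIELDS = (
    "identifier", "title", "creator", "publisher",
    "date", "subject", "description",
)


def missing_fields(metadata, expected=EXPECTED_FIELDS):
    """List the expected fields that are absent or contain only blank values."""
    missing = []
    for field in expected:
        values = metadata.get(field, [])
        if isinstance(values, str):
            values = [values]
        if not any(value and value.strip() for value in values):
            missing.append(field)
    return missing


print(missing_fields({"title": ["A Book"], "creator": ["Someone"], "description": [""]}))
```

A blank description counts as missing here, which matches what a reader would see in a library app.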

What's the relationship between externally and internally defined metadata?

As must be obvious, there are no good rules for metadata which operate in the book world, only rough expediency.

In a well-designed system, there would be a defined set of metadata, a place for ebooks to define that internally, and a way for all distributors and retailers to pull from that data and present it on their websites.

Instead, it’s too difficult to obtain consistency across all the various players in the industry, so no one tries (yet). Distributors attempt to impose consistency in the metadata they solicit from publishers at the external-to-ebook-file level, and push that data to their retailers, which may or may not have a place for it, use it properly, or account for multi-lingual issues, such as category names.

Not all distributors solicit exactly the same metadata, and things like categories are language-specific and based on a variety of schemas (BISAC, Amazon's on-the-fly categories, etc.) which are not mutually compatible, since they were designed for different purposes (BISAC for library-level categories, Amazon for niche marketing). As new concepts arise that rate their own metadata fields (not just Series, but Series Order), various parts of the industry infrastructure lurch inconsistently into operation to accommodate them.

Where does the metadata come from?

That depends…


External metadata originates at the distributor or retailer/distributor level, in a single language. There is no formal consistency. Book descriptions may or may not survive their journey to a retailer website with intact formatting or in their full length, and provided fields may be ignored.


Metadata within the ebook file is better defined. An EPUB ebook file is a zip archive of HTML files plus an XML "package" file (the OPF), and the package file carries the book's metadata. There are subject-specific standards, known as schemas, for describing many kinds of content, and ebook metadata uses the vocabulary defined by the Dublin Core Metadata Initiative (DCMI): very technical, but intended to be a foundation so that all software can recognize subject-domain-specific metadata.

In other words, when a browser or some other tool sees a field like this, it knows the subject-domain is books and it knows what sort of metadata field is appropriate to “books”. This adds intelligence to HTML.

Not all tools which can use this metadata in the ebook files actually use it.

Here's the metadata section of the Content.OPF file in my ebook example. (The funky characters in the book description are just a way of representing HTML bold, italics, paragraph marks, etc., without interpreting them.)

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<!--Required Metadata-->
<dc:identifier id="BookId" opf:scheme="ISBN">urn:isbn:0963538411</dc:identifier>
<!--Use the Same for the toc.ncx file -->
<dc:title>To Carry the Horn</dc:title>
<dc:creator opf:role="aut">Karen Myers</dc:creator>
<dc:publisher>Perkunas Press</dc:publisher>
<dc:date opf:event="">2012-10-08</dc:date>
<dc:date opf:event="modification">2016-07-27</dc:date>
<dc:subject>Myths &amp; Legends</dc:subject>
<dc:subject>Folk Tales</dc:subject>
<dc:description>&lt;div&gt;&lt;p class="description"&gt;Book 1 of The Hounds of Annwn. &lt;/p&gt;&lt;p class="description"&gt;NEW JOB, NEW FAMILY, AND IN TWO WEEKS THE END OF A WORLD HE’S JUST DISCOVERED, IF HE CAN’T RISE TO THE CHALLENGE. &lt;/p&gt;&lt;p class="description"&gt;George Talbot Traherne is just doing his job on a fine autumn morning, keeping the hounds together for the huntsman of the Rowanton Hunt in Virginia along the Blue Ridge Mountain. Doesn’t pay to get distracted by a white stag in unfamiliar territory, though. Next thing you know, you might find yourself… somewhere else. &lt;/p&gt;&lt;p class="description"&gt;The land is the same but not the people. Their huntsman has just been murdered, and George is tapped for the job. It’s an emergency — the Wild Hunt is only two weeks away, and if it doesn’t happen on schedule, the antlered god Cernunnos will take the realm from its ruler Gwyn ap Nudd and find someone who can mete out justice with the Hounds of Hell in his place. &lt;/p&gt;&lt;p class="description"&gt;George throws himself into the task, finding strength in the mission and resources he never knew he had. The more he comes to feel at home, settling into his new responsibilities, the more he wants to stay and make a life for himself. He’s finally met someone worth spending his life with, even if she’s just a bit older, a mere fifteen hundred years or so. &lt;/p&gt;&lt;p class="description"&gt;Can he keep the Wild Hunt on track despite the attempts to thwart it? Will he be accepted by those he wants to defend who view his timely presence and his human blood with suspicion? Above all, what does Cernunnos want of him and how far will he go? Can he survive the attention of a god? &lt;/p&gt;&lt;p class="description"&gt;Readers who are familiar with the sources of Arthurian literature such as the Mabinogion will recognize many of the characters, flourishing still in the world we cannot quite reach.&lt;/p&gt;&lt;/div&gt;</dc:description>
<dc:rights>All Rights Reserved</dc:rights>
<meta content="My_Cover_ID" name="cover" />
<meta content="0.7.2" name="Sigil version" />
</metadata>
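To see what those "funky characters" in the book description turn back into, here is a small Python sketch. It uses a heavily shortened stand-in for the description above (the SAMPLE text is abbreviated by me, and the helper names are hypothetical). The XML parser converts the escaped markup back into ordinary HTML, and a simple HTML parser then pulls out the paragraphs of text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

DC_NS = "http://purl.org/dc/elements/1.1/"

# Abbreviated stand-in for the metadata section shown above.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>To Carry the Horn</dc:title>
  <dc:description>&lt;div&gt;&lt;p class="description"&gt;Book 1 of The Hounds of Annwn.&lt;/p&gt;&lt;p class="description"&gt;George throws himself into the task.&lt;/p&gt;&lt;/div&gt;</dc:description>
</metadata>"""


class _TextOnly(HTMLParser):
    """Collect the text between tags, one entry per non-blank run."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []

    def handle_data(self, data):
        if data.strip():
            self.paragraphs.append(data.strip())


def description_text(metadata_xml):
    """Return the description's paragraphs as plain text."""
    root = ET.fromstring(metadata_xml)
    # Parsing the XML has already turned &lt;p&gt; back into <p>.
    html_markup = root.findtext(f"{{{DC_NS}}}description", default="")
    parser = _TextOnly()
    parser.feed(html_markup)
    parser.close()
    return parser.paragraphs


print(description_text(SAMPLE))
```

A tool that skips the second step would show the raw tags to the reader, which is one way descriptions end up looking mangled.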


There is no way to fix the External data source issues, except to provide whatever redundancy is possible via universal fields such as the Book Description.

There are some tools today that use Internal metadata from the ebook file, and as web metadata standards slowly but surely spread, there is reason to expect that more and more tools will eventually do the same, since it’s the only way for them to present retailer-independent metadata. The hope is that distributors and retailer/distributors will also treat that information from the ebook as a metadata source (someday), though that is more than a little optimistic.

Though the example metadata section above looks intimidating, it only needs to be set up once, and then the details modified for each subsequent book that is formatted. Of course, this assumes that you (or your formatter) can see the actual ebook HTML files (in a product like Sigil), rather than a simplified, hand-holding representation of them through an intermediate piece of software, like Scrivener or Vellum.
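The "set up once, modify per book" idea can be sketched in Python like this. It is only an illustration under my own assumptions: the function name build_metadata and the dictionary keys are hypothetical, and the sketch leaves out the opf: attributes (such as opf:role and opf:scheme) and the cover entry that a complete metadata section would carry.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize these elements as dc:title, etc.

SINGLE_FIELDS = ("identifier", "title", "creator", "publisher",
                 "date", "description", "rights")


def build_metadata(book):
    """Build a <metadata> block from a per-book dictionary of details."""
    metadata = ET.Element("metadata")
    for field in SINGLE_FIELDS:
        if field in book:
            ET.SubElement(metadata, f"{{{DC_NS}}}{field}").text = book[field]
    for subject in book.get("subjects", []):
        ET.SubElement(metadata, f"{{{DC_NS}}}subject").text = subject
    ET.indent(metadata)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(metadata, encoding="unicode")


book = {
    "identifier": "urn:isbn:0963538411",
    "title": "To Carry the Horn",
    "creator": "Karen Myers",
    "publisher": "Perkunas Press",
    "date": "2012-10-08",
    "rights": "All Rights Reserved",
    "subjects": ["Myths & Legends", "Folk Tales"],
}
print(build_metadata(book))
```

Special characters are escaped automatically, so an ampersand in a subject or HTML tags in a description come out in the escaped form that the ebook file needs.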

What's in that black box? And where's my metadata?

Like most “black box” solutions, software that makes things easier for you in one way often makes things harder or impossible in other ways.

But cheer up — the traditional publishers frequently omit both book cover (they supply a placeholder, not an actual cover) and book description from their internal ebook metadata, so everyone's struggling with this.



  1. Thanks for checking into this – but my head started to hurt almost immediately.

    I’m assuming I won’t have to deal with this for now – one book, only on Amazon – and will stick it on the To Do list.

    It’s always something.

    November 19, 2016
  2. Yes, I did feel like I ought to hand out aspirin at the same time. 🙂

    Depends on where you are how much of this will start to make sense to you.

    November 20, 2016
