This topic came up in conversation elsewhere, inspiring me to write an annotated post on how I use Schema.org information to partially control how my book's metadata is presented on the Internet.
Intelligence for search engines
In an ideal universe, search engines would understand the context of the data that they retrieve. They would just know that a recipe is a recipe, that a book is a book, that a business location is a business location, and so forth. To the degree that they have gotten this far, it's because of metadata (data about the data that they retrieve), which gives them some intelligence about presenting the information that they find.
Doing this requires a combination of descriptive metadata from the data owner and collation and presentation work from the search engine. As in most such things, Google seems to be leading the way.
Google's Information Cards
When you search for a restaurant using Google, you get not only ranked links scrolling down the screen but also a nicely formatted “information card” on the top right of the screen that intelligently collects the information you would find most useful.
Pictures explain this best.
For events, extra information can sometimes be collated and presented.
For retail shopping applications, all sorts of information about pricing and availability can sometimes be presented, as well as basic company history.
Schema.org
So, how does Google do this? (Other search engines and even Facebook are doing similar things.) How does Google know that a restaurant is a restaurant, or that an event is an event?
This is managed by a structured data standard instantiated at Schema.org.
Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.
Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model. Over 10 million sites use Schema.org to markup their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.
Founded by Google, Microsoft, Yahoo and Yandex, Schema.org vocabularies are developed by an open community process, using the public-schemaorg@w3.org mailing list and through GitHub.
A shared vocabulary makes it easier for webmasters and developers to decide on a schema and get the maximum benefit for their efforts. It is in this spirit that the founders, together with the larger community have come together – to provide a shared collection of schemas.
As a standard, Schema.org tries to cover every domain of interest. And I do mean everything. Commercial use of the standardized data is, naturally, driven by commercial priorities. Knowledge domains like Businesses, Locations, and Events have a very high priority, which is why it's easy to find Google Information Cards for them.
Books and other creative media are in a lower tier for priority.
How does the data markup happen?
The metadata that describes the information of interest (that identifies it as a “restaurant” in a “location” with “business hours”) is encoded in the data that Google searches, using a format specified in the Schema.org standard.
For common knowledge domains, there are WordPress plugins that do the work for you. For books or other knowledge domains, you have to do it yourself, at least so far. Eventually there will likely be plugins for creative media knowledge domains.
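To give a feel for what "doing it yourself" produces, here is a minimal sketch in Python that builds a small JSON-LD description of a book using Schema.org's Book type. The title, author, and series name come from this article; the ISBN is a hypothetical placeholder, and the choice of properties is only an illustration, not the full script I use.

```python
import json

# Illustrative values only. Title, author, and series are taken from the
# article; the ISBN is a hypothetical placeholder, not a real identifier.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "To Carry the Horn",
    "author": {"@type": "Person", "name": "Karen Myers"},
    "isPartOf": {"@type": "BookSeries", "name": "The Hounds of Annwn"},
    "bookFormat": "https://schema.org/EBook",
    "isbn": "0000000000000",
    "inLanguage": "en",
}

# Serialize to the JSON-LD text that would be embedded in the page.
jsonld_text = json.dumps(book, indent=2)
print(jsonld_text)
```

Building the data as a dictionary and letting the library serialize it avoids hand-typed punctuation errors, which are the most common way a script like this goes wrong.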
What does Google do with the marked-up data?
Google takes the subject data from your site, where you are the verified “authority”, combines it with other information about you and about the subject, and then presents the combined “best guess for completeness” data in an Information Card, the format of which varies by knowledge domain. The card it presents for a business is different from the one it presents for an event or a book.
Google has only done this work for a few of the most commercially popular knowledge domains in Schema.org. The world of Schema.org structured data knowledge domains is divided for this purpose into:
- Knowledge domains where tools exist to supply Schema.org data and Google has intelligent Information Cards
- Knowledge domains where tools are scarce, but Google has intelligent Information Cards
- Knowledge domains where tools are scarce, and Google does not have intelligent Information Cards
“Books” fall into the middle of this list.
Annotated example
Here's a WordPress example of how you can mark up a book page on an author website with metadata about the book, and what Google makes of that, in combination with other data about you and the book.
This is embedded at the end of the WordPress page for the book, where it is invisible to the reader, but available to Google for indexing.
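The embedded block is a script element of type application/ld+json, which browsers do not render. As a rough sketch (the helper name to_script_tag is hypothetical, and the sample data is trimmed down for illustration), the element could be generated like this:

```python
import json


def to_script_tag(data):
    """Wrap a dictionary as a JSON-LD script element (hypothetical helper)."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so a value containing "</script>" cannot close the element
    # early. "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


snippet = to_script_tag(
    {"@context": "https://schema.org", "@type": "Book", "name": "To Carry the Horn"}
)
print(snippet)
```

The printed text is what gets pasted into the page's HTML source, not into the visual editor.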
Google takes this information and looks for more items that can be added to the Information Card, such as other books by the author. It also includes its own Google Books and Google Play links to the book (of course).
Looking at the buttons on the right…
- The book preview is apparently pulled from Google Books (but actually the link and the pages are from the Google Play ebook edition).
- Goodreads is searched for a rating.
- The description is apparently pulled from Google Books (but actually the link is to the Google Play edition).
- It knows what the next book in the series is. That information is part of the “series” info in the ebook in Google Play, which is the likeliest source.
- It pulls the BISAC and other categories from the distributor for the ebook edition.
- It supplies purchase links to some US retail sources (but boycotts Amazon).
- It offers a library widget (which fails to locate the book at any of the 3 libraries near my offered zip code where it is present).
- It identifies what may be related books from search requests. This includes a “Karen Myers” who is not me.
Conclusion
For all the technical difficulty of providing the data and all the imperfect rendering by Google for its Information Card, this is still the best method I know to own and control the information about your book on the Internet. It can still be overridden by search engine algorithms, but it's a good deal better than nothing.
1/22/2020 – Update
Tools
The tool to use to proofread your Structured Data inclusion on a WordPress page is:
https://search.google.com/structured-data/testing-tool
This will give you some clue about any JSON errors (as well as reveal other things that some of your plugins do, like Yoast).
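Before pasting a page into the testing tool, a quick local check can catch plain JSON syntax errors. The sketch below uses only the Python standard library; the names JsonLdExtractor and check_page are my own for illustration, and it checks JSON syntax only, not whether the Schema.org vocabulary is used correctly.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_page(html):
    """Return one (is_valid, message) pair per JSON-LD block found."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    results = []
    for block in parser.blocks:
        try:
            json.loads(block)
            results.append((True, "ok"))
        except json.JSONDecodeError as err:
            results.append((False, f"line {err.lineno}, column {err.colno}: {err.msg}"))
    return results


sample = '<script type="application/ld+json">{"@type": "Book"}</script>'
print(check_page(sample))
```

Feed it the page source as saved from the browser's "view source" window, since that is what a crawler sees.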
Multiple keywords for JSON
When I wrote this article, I could list all the ISBNs for, say, an ebook (EPUB, MOBI), or multiple identifiers to validate my Author claim. You can see them in the embedded example above.
When I went to make repairs after WordPress's Gutenberg release (see comments below), I also discovered that JSON, Google's Structured Data Testing Tool (above), or both now disallow that: only one value per keyword.
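One way to spot this in an existing script is to look for a keyword that appears more than once in the same object. This is a sketch under the assumption that the repeated values were written as repeated keywords; Python's json module, like many parsers, silently keeps only the last value in that case, so the check has to hook into parsing. The function name find_duplicate_keys is hypothetical, as are the sample values.

```python
import json


def find_duplicate_keys(text):
    """Return keywords that occur more than once within a single JSON object."""
    duplicates = []

    def record_pairs(pairs):
        # Called once per JSON object with its (key, value) pairs in order.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=record_pairs)
    return duplicates


# Hypothetical sample with a repeated "isbn" keyword.
sample = '{"@type": "Book", "isbn": "AAA", "isbn": "BBB"}'
print(find_duplicate_keys(sample))
```

JSON-LD itself writes multiple values for one keyword as an array; whether a particular tool accepts an array for a given property is something to confirm in the tool rather than assume.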
This is great!
How do you easily implement the actual script in each footer or page of your WordPress website? Do you use a specific plugin, or put it directly into the page’s code?
Thank you very much 🙂
Olivier
The script is added to the bottom of the book page, where it does not display via the browser.
The example in the article is for this page. The script is not visible in the “Visual” editor of WordPress, but is visible in the “Text” editor. Here’s a picture of the bottom of the visible page and the start of the script.
Unfortunately, as I mention in my article, I don’t yet know of a plugin you can use for the “Books” domain, so it has to be entered directly as this script. (An appropriate plugin would present a form for the information and create the script itself.) On the other hand, this allows you to be as fully detailed as you can be, which a plugin with a form would be unlikely to allow.
You can find out more about the various identifiers used in the script in this 4-part series of articles.
This was a great article. I just wanted to let you know that the source code of the page you linked shows a lot of “<br />” (line breaks, in HTML notation) in the middle of the JSON notation, which I’m pretty sure will break things for you. Just right-click on that page and look for it.
Hope you can fix it so Google will index that data with no issues.
Cheers!
The spacing for readers of this article is done for clarity. The actual coding at the bottom of any of my book pages (which you can only see in the source code) doesn’t have those cosmetic breaks.
I looked at the example (from To Carry the Horn) and it seems fine. I think that Google interprets it properly… could you point to a particular problem as an example? Thanks!
Yes, check the source code of https://karenmyersauthor.com/books-and-series/the-hounds-of-annwn/to-carry-the-horn/, lines 899 through 942. That’s invalid JSON notation. Obviously you’re not hardcoding those lines, so some plugin must be messing things up. That doesn’t mean Google can’t interpret it anyway; I’m not sure about that. Hope you can see it. If I could, I’d upload a screenshot.
Just trying to help.
I am VERY GRATEFUL that you are so persistently helpful.
Alas, as I mention in the article, there is no plugin (yet) for the Book schema, so (yes) this IS all hard-coded.
You’ve made me dig deeper, and I think I now know what the problem is and how it happened. (And it’s all WordPress’s fault…)
The article and the book page code it referred to both predate the big WordPress Gutenberg release. When I embedded my JSON script, I did not have HTML “breaks” at the end of each line.
Here’s the Schema.org example for the Book schema. No HTML “breaks”, of course.
https://hollowlands.com/wp-content/uploads/2020/01/Schema-org-example-JSON.png
But when I now look at my book page source in a browser (for me it’s lines 877-920) I do observe all the HTML “breaks” at the line ends, as you say.
https://hollowlands.com/wp-content/uploads/2020/01/Book-Page-source-code-HTML-for-JSON.png
And here’s how that happened… Look what the automated Gutenberg update did to that part of my original page:
https://hollowlands.com/wp-content/uploads/2020/01/WP-Gutenberg-HTML-for-JSON.png
So, since I have 25 titles(pages) on 3 websites that seem to have this problem, now I have a lot of edit work to do. 🙁
I’m infuriated, but thankful that you took the trouble to point out the resulting manifestations of the error.
—–
Fixed now.
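For anyone facing the same cleanup across many pages, here is a rough sketch of the repair in Python: remove the HTML line-break tags from the script text, then confirm that what remains parses as JSON. The function name strip_html_breaks is hypothetical, and it handles only break tags, so any other markup an editor may have injected would need its own handling.

```python
import json
import re

# Matches <br>, <br/>, and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def strip_html_breaks(jsonld_text):
    """Remove HTML break tags and confirm the result is valid JSON."""
    cleaned = BR_TAG.sub("", jsonld_text)
    json.loads(cleaned)  # raises json.JSONDecodeError if still invalid
    return cleaned


# A small sample of the kind of damage described above.
damaged = '{<br />\n"@type": "Book",<br />\n"name": "To Carry the Horn"<br />\n}'
print(strip_html_breaks(damaged))
```

Note that this also removes a break tag that appears inside a text value, such as a description, so review the cleaned output before pasting it back.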
Many thanks for your answer!
Great schema code. Actually, you want it in the header of the page, between the <head> and </head> tags. This is so that the bot crawls it first and begins to assimilate what is under it by what it already understands from the header schema (bots crawl top to bottom of code). It also ensures it is rendered in the bot crawl. We use SEO Ultimate Pro Plugin with CODE INSERTER + Module to easily insert it into a specific page’s header.
I’m not an author, but an SEO agency helping an author. Seeing your JSON-LD specific to books was helpful.
Thanks very much for the tip, and I’ll add it to my list of fixes to all the pages when my next title comes out (late 2021).