That's the title of an excellent if brief essay by Laura Dawson of Numerical Gurus. Her site is an excellent resource for the explanations and history of some of the acronyms that haunt the world of books.
Since I seem to be on a kick lately with what metadata exists and how it sloshes around through the book ecosystem, I thought we could all benefit.
How many of those girls are properly dressed (um, properly formatted data)? And how can you keep them clean, out there in the big ol' world? Where there are boys, and parties, and fast cars, and lots of dark alleys to wander into.
We've all seen it. We spend time perfecting the metadata in our feeds, send it out to our trading partners, and then have to take complaints from agents, authors, and editors. “Why is it like that on Amazon?”
The truth is, data ingestion happens on whatever schedule a given organization has decided to adhere to. Proprietary data gets added. Not all the data you send gets used. Data points get mapped. So what appears on any trading partner's system may well differ somewhat from what you’ve sent out. Many different players in the metadata arena can affect what a book record looks like. When you send your information to Bowker, they add proprietary categories, massage author and series names, add their own descriptions, append reviews from sources they license – and send out THAT information to retailers and libraries. The same thing happens at Ingram, at Baker & Taylor – so what appears on a book product page is a mishmash of data from a wide variety of sources, not just you.
At an online retailer, different data sources get ranked differently. This happens over time, as a result of relationships and familiarity with data quality, and these rankings can change. The data can also get ranked on a field-by-field basis. So a publisher might be the best source of data for title, author, categories, and cover image. But the distributor might be ranked higher for price and availability. And an aggregator might be ranked higher for things like series name – especially if they specify to the retailer that it’s something they’re focusing on standardizing and cleaning up. It’s important to remember that in the eyes of the retailer, not all data feeds are equal. You’d think the publisher would be the best source of data about its own books, but I can assure you, having worked with publisher data my entire 30-year career, that isn’t always the case.
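To make field-by-field ranking concrete, here is a minimal Python sketch of how a retailer might assemble one product record from several feeds. Everything in it is an assumption for illustration: the source names, the field names, the rankings in FIELD_PRIORITY, and the sample values are all made up, and no real retailer's logic is being described.

```python
from typing import Dict, List

# Hypothetical rankings: for each field, the sources a retailer trusts,
# best first. Real rankings are private to each retailer and change over time.
FIELD_PRIORITY: Dict[str, List[str]] = {
    "title": ["publisher", "aggregator", "distributor"],
    "price": ["distributor", "publisher", "aggregator"],
    "series": ["aggregator", "publisher", "distributor"],
}


def merge_record(feeds: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Build one record by taking each field from its highest-ranked source."""
    merged: Dict[str, str] = {}
    for field, ranking in FIELD_PRIORITY.items():
        for source in ranking:
            value = feeds.get(source, {}).get(field)
            if value:  # skip sources that did not supply this field
                merged[field] = value
                break
    return merged


# Made-up sample feeds for a single book.
feeds = {
    "publisher": {"title": "The Example Book", "price": "19.99", "series": "example series"},
    "distributor": {"price": "17.99"},
    "aggregator": {"series": "Example Series"},
}

merged = merge_record(feeds)
print(merged)
```

In this sketch the publisher's title is used, but its price and series name are overridden by the distributor and the aggregator, which is exactly how a product page ends up differing from the feed you sent.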
A “delta file” is what we call these updates – additions, changes, and deletes only, rather than a full file. Most publishers will send an initial full file, and then supplement with delta files for a time, and begin the cycle again just to make sure that their trading partners are in sync.
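As a rough illustration of what goes into a delta file, the Python sketch below compares two snapshots of a catalog and keeps only the additions, changes, and deletes. The record layout, the placeholder ISBNs, and the function name compute_delta are assumptions for this example; real feeds (ONIX, for instance) carry far more detail and have their own ways of flagging updates.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def compute_delta(previous: Dict[str, Record], current: Dict[str, Record]) -> Dict[str, Any]:
    """Compare two full snapshots keyed by ISBN and return only the differences."""
    added = {isbn: rec for isbn, rec in current.items() if isbn not in previous}
    changed = {
        isbn: rec
        for isbn, rec in current.items()
        if isbn in previous and previous[isbn] != rec
    }
    deleted = sorted(isbn for isbn in previous if isbn not in current)
    return {"add": added, "change": changed, "delete": deleted}


# Placeholder ISBNs and values, invented for the example.
last_full_file = {
    "9780000000001": {"title": "First Book", "price": "10.00"},
    "9780000000002": {"title": "Second Book", "price": "12.00"},
}
todays_catalog = {
    "9780000000001": {"title": "First Book", "price": "11.00"},  # price changed
    "9780000000003": {"title": "Third Book", "price": "15.00"},  # new title
}

delta = compute_delta(last_full_file, todays_catalog)
print(delta)
```

The delta holds three small entries instead of the whole catalog. The catch is that a delta only makes sense to a partner who processed every file before it, which is why the periodic full file matters.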
But on the retailer/aggregator end, there’s no guarantee that your updates will get processed in a timely way (without a phone call). Companies ingest on their own schedule, and if they have a very heavy processing week, they might skip your delta file and wait for the next one, which means there might be gaps in data updates.
So if you think you can fix your metadata and keep it clean, once and for all, you're living in a dreamworld.