I had the chance to visit with leaders of two of the most influential players in the online genealogy market today, and I was struck by the completely different attitudes they each take toward APIs. FamilySearch has at least four distinct APIs that I know about, including ones for:
- Family tree data
- “Authorities” (standardized dates, places, and names)
- “Record Search” bibliographic metadata
- “Research Wiki” page content
Footnote, by comparison, doesn’t have any (that they’ve made public, at least).
API => Startup; No API => Old-media dinosaur
At first glance, this seems backward and counter-intuitive. APIs tend to be the preferred mode of growth and communication used by successful startups like Facebook, Twitter, and Foursquare. By granting access to their data in a format that can readily be consumed by other services, these companies create platforms on which others can build — entrepreneurial ecosystems that nourish other startups (think Facebook or Twitter application developers) — and generate income by applying ad-based monetization approaches or revenue-sharing arrangements.
So-called old-media “dinosaurs” like the New York Times and News Corporation, on the other hand, have tended to throw up paywalls and to resist calls to make their content available via APIs. For them, the free content movement’s mantra, “information wants to be free,” has been anathema, to be fought with all the weapons at their disposal.
Before today, I would have tended to tag FamilySearch with the “old media dinosaur” label while filing Footnote under the “startups that get new media” category. So it should be Footnote touting its APIs to the developer community, while FamilySearch stays closed and protective of its data. But instead it’s the reverse. What’s going on here?
False Dichotomies and “New” Old Media
What’s going on here is that both print and online media are undergoing a period of radical disruption, in which old assumptions are overturned or abandoned and previously valid dichotomies are rendered false, or useless, or both.
So it shouldn’t be surprising that genealogy “content providers” are grappling with the same issues and evolving their business models in response.
Business Model Differences Shape Policies Towards Content
One obvious explanation for FamilySearch’s API-centric strategy lies in its non-profit status. As a Church-sponsored entity whose mission is to facilitate and accelerate genealogy (and temple) work throughout the world, it would be self-defeating if FamilySearch treated its content as scarce and proprietary. Footnote, on the other hand, relies on a subscription model that can only succeed if the majority of its most desirable content is kept behind a paywall. [As a small, nimble startup, Footnote is also constrained in how much development it can do with its scarce resources, since robust APIs are neither easy nor cheap to develop and maintain.]
Consider the Possibilities
But what if Footnote (or Ancestry for that matter) tried to become more of a research platform and less of a “walled garden” of content? In a prescient 2008 essay, VC Fred Wilson makes this prediction about the promise of “Content” APIs:
Content is data, but it’s a bit different. Content is unstructured data with the benefits of a lot of context, semantics, relationships. Once the vast databases of content that exist inside the big media companies start becoming available via APIs, we can start to do some amazing things.
What kind of “amazing things” could for-profit “big media” genealogy companies do if they opened the spigots on their content using APIs? And if they did so, could they still make enough money to continue to fund the record digitization efforts that have so greatly benefited genealogists? I believe they can.
A Modest Proposal
I haven’t fully baked this idea yet, but I’m going to toss it out there anyway. I propose that genealogy content providers develop a two-tier model. The first tier would include popular, entry-level content such as the crucial censuses, family tree data and “Google Books”-type content such as published family histories, county histories, and the like. This data would be offered for free, but with an “as is” consumer-beware caveat regarding the accuracy and reliability of the facts and details included.
The second tier would include vital records, church records, land records and other more “primary” source material, including (naturally, since this is the Genlighten blog) offline documents. These records would be accompanied by some sort of “provenance”, perhaps tied to the reputation of the researcher who had uncovered them or the repository that held them. That reputation would be dynamically determined by a combination of authoritative genealogy luminaries and the crowdsourced ratings of clients and users. Those interested in such records would be asked to pay for:
- Indexed online access
- Record provenance, detailed source citation information and a community-determined “reliability score”
- On-demand retrieval, digitization, transcription and/or translation of records not yet available online, particularly “long tail” records
- The help of skilled and experienced researchers in interpreting the records and acting on their implications
Both sets of records would be made available via APIs, but the second-tier data would have a monetization mechanism attached, allowing content providers, researchers and digitizers to be compensated for the value they added.
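To make the two-tier idea a bit more concrete, here is a minimal Python sketch of how an API might decide what to return for each tier. Everything in it is hypothetical: the `Tier`, `Record`, `reliability_score` and `api_response` names, the 0-5 rating scale, and the 60/40 weighting between expert and crowdsourced ratings are my own illustrative assumptions, not anything FamilySearch, Footnote or anyone else has built.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import mean
from typing import List, Optional


class Tier(Enum):
    FREE = 1     # censuses, family tree data, published histories (offered "as is")
    PREMIUM = 2  # vital, church and land records, with provenance attached


@dataclass
class Record:
    title: str
    tier: Tier
    # Hypothetical 0-5 ratings; the scale is an assumption made for this sketch.
    expert_ratings: List[float] = field(default_factory=list)
    user_ratings: List[float] = field(default_factory=list)


def reliability_score(record: Record, expert_weight: float = 0.6) -> Optional[float]:
    """Blend expert and crowdsourced ratings into one score.

    The 0.6 expert weight is an arbitrary placeholder, not a recommendation.
    Returns None when nobody has rated the record yet.
    """
    if not record.expert_ratings and not record.user_ratings:
        return None
    if not record.expert_ratings:
        return mean(record.user_ratings)
    if not record.user_ratings:
        return mean(record.expert_ratings)
    return (expert_weight * mean(record.expert_ratings)
            + (1 - expert_weight) * mean(record.user_ratings))


def api_response(record: Record, has_paid: bool) -> dict:
    """Return what a (hypothetical) API call would expose for this record."""
    if record.tier is Tier.FREE:
        # First tier: free, but with a consumer-beware caveat.
        return {"title": record.title, "access": "granted",
                "caveat": "as is; accuracy not verified"}
    if not has_paid:
        # Second tier: the monetization mechanism kicks in here.
        return {"title": record.title, "access": "payment_required"}
    return {"title": record.title, "access": "granted",
            "reliability_score": reliability_score(record)}


if __name__ == "__main__":
    census = Record("1900 census page", Tier.FREE)
    deed = Record("County deed book entry", Tier.PREMIUM,
                  expert_ratings=[5.0, 4.0], user_ratings=[3.0])
    print(api_response(census, has_paid=False))
    print(api_response(deed, has_paid=False))
    print(api_response(deed, has_paid=True))
```

The point of the sketch is only that both tiers can sit behind the same API: the free tier always answers, while the second tier answers with provenance and a reliability score once payment has been handled. How payment, revenue sharing and reputation would really work is exactly the part I haven’t baked yet.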
A Starting Point
I hope to develop these ideas further, and I’d appreciate your help in doing so. I know there are plenty of smart people in the genealogy community who are already pondering these issues (Thomas MacEntee, for one) and I’d love to hear from as many of you as possible.
Thanks to Gordon Clarke and his FSDN team members, and to Justin Schroepfer at Footnote, for meeting with me today and stimulating my thought processes.