Dan Cohen, the newly named Executive Director of the Digital Public Library of America, spoke about the DPLA at NYU’s Humanities Initiative on Thursday, April 4, 2013. Cohen was closely involved in the development of the scholarly tool Zotero and of Omeka, an exhibition tool for library and museum collections. He will leave George Mason University for the DPLA and Cambridge on April 18, when the DPLA officially launches at the Boston Public Library.
Robert Darnton of Harvard University’s Libraries had written about the DPLA in the New York Review of Books earlier this week, but it was great to hear Cohen talk about the underlying concepts and infrastructure issues; he provided much more detail than Darnton did. Although people will try to tell you that the web is America’s digital public library, the web, in fact, cannot handle local history and archival materials, which are hard to find, scattered, poorly described, or behind “gates.” The DPLA will knit together widely separated library and museum resources for research, the classroom, and general interest. Cohen pointed to the ideal of the American public library, which is open to all, never tracks your activity, and doesn’t judge how you use the information it provides, whether for research, leisure, or business. He noted that America’s public libraries can be community centers, but they also help people start businesses, and we shouldn’t underestimate their impact (including their financial impact) on communities.
Cohen said that the DPLA had three major components:
Portal to discovery. The portal lets you search across all US libraries, with content enriched by metadata contributed either by major content providers like Harvard Libraries, the National Archives, ARTSTOR, or the New York Public Library, or by “hubs” like state or regional digital libraries. The content would not be limited to books or manuscripts; there would also be artwork and objects usually found in museums. (At this point, Cohen announced that the NYPL had partnered with the DPLA that very day.)
Platform to build upon. Using open data and APIs, programmers would be able to access part or all of the DPLA’s data easily. The Chattanooga Public Library had already sponsored an AppFest.
Public Option. The DPLA would advocate for the ‘public option,’ making its content free and unrestricted. He contrasted this with the restrictions imposed by Kindle, Apple, and OverDrive, though he had nothing against those products. In fact, you could even download the DPLA’s entire data set if your hard drive were big enough! Their data will carry the CC0 (“CC zero”) license, the most open license available, but they will be willing to work with organizations that require more restrictions. (Cohen was a bit unclear about exactly how this would be done, but it sounded like the DPLA had given it a lot of thought.)
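As a rough illustration of the “platform to build upon,” here is a minimal sketch of how a programmer might assemble a search request for the DPLA’s API. The endpoint `https://api.dp.la/v2/items` and the `q`, `page_size`, and `api_key` parameters are my assumptions about the v2 API, not something Cohen showed, so check the DPLA’s own API documentation. The key is a placeholder, and the sketch only builds the URL without making a network call.

```python
from urllib.parse import urlencode

# Assumed endpoint for the DPLA's v2 item search (verify against the API docs).
DPLA_ITEMS_ENDPOINT = "https://api.dp.la/v2/items"


def build_search_url(query: str, api_key: str, page_size: int = 10) -> str:
    """Return a search URL for the (assumed) DPLA items endpoint.

    The parameter names q, page_size, and api_key are assumptions based on
    the DPLA's v2 API; api_key must be a key you have requested yourself.
    """
    if not query:
        raise ValueError("query must be non-empty")
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    # urlencode percent-escapes special characters and turns spaces into '+'.
    return f"{DPLA_ITEMS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_search_url("Chattanooga railroad", api_key="YOUR_API_KEY")
    print(url)
    # → https://api.dp.la/v2/items?q=Chattanooga+railroad&page_size=10&api_key=YOUR_API_KEY
```

Because the data is openly licensed, nothing beyond a key should stand between a developer and the records; the rest of an app is ordinary JSON handling.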
The DPLA in an ImageThink graphic from the October 2012 plenary session.
Cohen then filled in details on these points. The DPLA would rely on some individual “content providers” like Harvard Libraries, the Smithsonian Institution, the National Archives, the New York Public Library, the Boston Public Library, and ARTSTOR (which is donating 10,000 public domain images from its large database). Just as important as these content providers were “service hubs,” typically state or regional digital libraries that support small local libraries and historical societies holding unique materials. These state and regional digital libraries (there are 42 in total) would pull in the scanned items, regularize them, add metadata (more on this below), and sometimes host the files. The DPLA is partnering with many of them to obtain grant money (generally in the $500K to $700K range) for digitization and cataloging.
What were some of the metadata features that the DPLA and its partners were looking for?
Geocoding (latitude and longitude, local and regional place names). You would even be able to draw a circle on a map and ask for pictures similar to the one you had just pulled up.
Timeline. (Not demonstrated, but I suspect it will be a range slider that can be adjusted at either or both ends.)
Faceted search, meaning that you can narrow results by many different criteria that may not be available on the content provider’s own site.
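To make these three features a little more concrete, here is a minimal sketch of how a portal might combine a map circle, a timeline range, and facets over a handful of records. The `Record` fields, the sample data, and the use of the haversine formula for the circle are my own illustrative assumptions, not the DPLA’s schema or implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Record:
    """A hypothetical, simplified metadata record (not the DPLA's schema)."""
    title: str
    lat: float
    lon: float
    year: int
    provider: str
    item_type: str


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search(records, center=None, radius_km=None, years=None, facets=None):
    """Filter records by a map circle, a timeline range, and facet values.

    center is (lat, lon); years is an inclusive (start, end) pair, like the
    two ends of a range slider; facets maps a field name to a required value.
    """
    if center is not None and radius_km is None:
        raise ValueError("radius_km is required when center is given")
    hits = []
    for r in records:
        # Geocoding: drop records outside the circle drawn on the map.
        if center is not None and haversine_km(center[0], center[1], r.lat, r.lon) > radius_km:
            continue
        # Timeline: drop records outside the slider's date range.
        if years is not None and not (years[0] <= r.year <= years[1]):
            continue
        # Facets: every requested field must match exactly.
        if facets and any(getattr(r, field) != value for field, value in facets.items()):
            continue
        hits.append(r)
    return hits


def facet_counts(records, field):
    """Count the values of one field, as a facet sidebar might display them."""
    return Counter(getattr(r, field) for r in records)


if __name__ == "__main__":
    records = [
        Record("Harbor view", 42.36, -71.06, 1890, "Boston Public Library", "image"),
        Record("Harvard Yard", 42.37, -71.12, 1905, "Harvard Libraries", "image"),
        Record("Broadway parade", 40.71, -74.01, 1899, "New York Public Library", "image"),
        Record("Town ledger", 42.36, -71.06, 1850, "Boston Public Library", "text"),
    ]
    # A 50 km circle around downtown Boston, limited to 1880-1910.
    hits = search(records, center=(42.36, -71.06), radius_km=50, years=(1880, 1910))
    print([r.title for r in hits])  # → ['Harbor view', 'Harvard Yard']
    print(facet_counts(hits, "provider"))
```

The point of the design is that the filters only work across collections if every provider’s metadata carries the same fields (coordinates, dates, types), which is exactly the regularizing work the hubs do.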
The DPLA will also open with seven online exhibitions drawn from its state digital library partners.
The DPLA has partnered with Europeana, the European Digital Library, and plans a joint exhibition on immigration to America, in which materials from European and American institutions can be set seamlessly side by side.
The DPLA has adapted Europeana’s data model, the Europeana Data Model (EDM), extending it with linked data (URIs) so that similar materials can be collated across different libraries and collections. He demonstrated this with the Library of Congress’s authority files. (See the DPLA info wiki for more information, including links to their adaptation of the EDM.)
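A toy sketch of the collation idea: if each provider’s record points at the same authority URI for a subject, a portal can group records by that URI even when the providers word the heading differently. The record layout and the example.org URIs are hypothetical placeholders; this is not the EDM itself, and real records would reference an authority service such as the Library of Congress’s id.loc.gov.

```python
from collections import defaultdict


def collate_by_uri(records):
    """Group (provider, title, local label) tuples under each shared subject URI.

    Each record is a dict with the hypothetical keys "provider", "title", and
    "subjects", a list of (label, uri) pairs where label is the provider's own
    wording. Grouping on the URI rather than the label means two providers who
    phrase a heading differently still end up in the same group.
    """
    groups = defaultdict(list)
    for rec in records:
        for label, uri in rec["subjects"]:
            groups[uri].append((rec["provider"], rec["title"], label))
    return dict(groups)


if __name__ == "__main__":
    # Placeholder URIs; real ones would come from an authority file.
    IMMIGRATION = "https://example.org/authority/immigration"
    HARBORS = "https://example.org/authority/harbors"
    records = [
        {"provider": "Library A", "title": "Arrival at Ellis Island",
         "subjects": [("Immigrants--United States", IMMIGRATION)]},
        {"provider": "Museum B", "title": "Steamship ticket",
         "subjects": [("U.S. immigration", IMMIGRATION), ("Ports", HARBORS)]},
        {"provider": "Library A", "title": "Harbor map",
         "subjects": [("Harbors", HARBORS)]},
    ]
    for uri, items in collate_by_uri(records).items():
        print(uri, [title for _, title, _ in items])
```

Matching on plain-text labels would have missed that “Immigrants--United States” and “U.S. immigration” describe the same subject; the shared URI is what lets the two collections line up.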
They’re looking into interoperability with TROVE, the Australian digital public library.
Several applications built on the DPLA’s API were set to launch on April 18:
History Pin. Takes historical photos, adds latitude and longitude, and overlays them on contemporary maps.
Stacklife. An open-source project from Harvard Libraries that visually reproduces the spines of books that would be shelved near one another by subject, recreating the serendipity of browsing library shelves.
Bookworm. A new search interface that searches across collections of library catalog records.
He anticipated the ability to lay historical maps over Google Maps, and even hoped that people would be able to use Caterina Fake’s Findery to link historical photographs to great bars and restaurants nearby.
The Public Option and restrictions. The DPLA could become an enormous back end for small local public libraries. However, the DPLA was concerned about restrictions on information. Despite the positive court ruling on ‘first sale,’ Cohen was concerned about the difficulties posed by current e-books, where contract law (a license) was superseding copyright law. He pointed to commercial newspaper databases like ProQuest, whose licenses prohibit you from downloading an entire newspaper. He had no problem with commercial ventures like ProQuest, but he felt that we need to protect the public sphere. The question was how. Through advocacy? Through buying stuff?
For advocacy on restrictions and licensing, he pointed to a number of projects:
librarylicense.org (from Harvard’s Berkman Center), which will launch very soon, will promote a particular license for publishers and authors. Since most books make most of their money in the first five years, the license would make a book free for libraries after that period, even while the publisher continued to sell it.
Knowledge Unlatched is a new model in which libraries pay up front for first copies, allowing publishers to recoup their editorial and publishing costs, after which the book is liberated.
A slightly different approach is Unglue.it, a recently launched site that Cohen described as Kickstarter for book rights: crowdfunding to buy out the copyright holder and free the book for the public. Cohen hoped that the DPLA might direct traffic to sites like this.
Digitization to the People. Cohen wants the DPLA to help small collections. Instead of requiring materials to travel to state and regional libraries, he was intrigued by “Scannebagos” (originally from North Carolina): Winnebago campers outfitted for scanning and adding metadata, so the materials can stay on site. Similarly, he was excited by History Harvest, a project at the University of Nebraska that had undergraduates scanning and researching local history. By partnering with the DPLA, the project hoped to go national.
The DPLA is starting to make things happen in interesting ways.
In the Q&A period, Cohen addressed some serious questions.
Q. What about when files “go away” or are moved? The DPLA system relies on linking to content, not owning it. A. Yes, this is a problem, but some state and regional “hubs” were already tracking physical locations and keeping copies.
Q. The Scannebago is exciting: one source (Clearight?) says that half of all artifacts are unknown or uncataloged. How will the DPLA help small historical societies digitize? A. For now the DPLA would work with state digital libraries, but it may offer some form of hosting for small historical societies that lack their own infrastructure.
Q. What about recorded sound and moving images? A. Tougher, partly because there are separate copyrights for the composer/publisher and for the performer. The DPLA was definitely interested and aware of the problem. Cohen had just attended an AHA conference where historians of sound were heavily restricted in what they could share, which was really holding back the field. How could you really do the history of America without music?
Q. Will the DPLA become a recipient of rights from authors and others? A. Not yet determined. The DPLA was planning to convene a meeting with authors in the fall. For now, they were “arming authors” with contract language that would “liberate” a book (especially an academic book) before regular copyright ran out. Cohen’s own book on digital humanities had a special proviso limiting its copyright to the term set by the Copyright Act of 1790 (14 years plus one 14-year renewal).
Q. Business model and funding? A. Funding in the planning phase came from Sloan, the NEH, the IMLS, and private donations, including from Knight and Mellon. In the current, ongoing phase, the DPLA was looking for private, public, and individual contributions and was considering many different options, including working with publishers. However, it was a very small organization and he didn’t anticipate a large staff. Europeana has only about 50 people.
Q. What about graduate students who have created their own digital collections? A. Some private collectors have come forward with their materials, and he wondered whether a partnership with Zotero Cloud or Omeka might provide a short-term solution.