October 24, 2008

Google teams with library to digitize book collections

The U of C Library has teamed up with Google to digitize select collections for the Google Book Search Library Project, an online card catalog of several of the world’s major libraries and collections. The deal allows Google to scan books and create digital copies, which are then put into the University’s digital repository.

Although Google is spearheading the Search Project, the digital copies are maintained by the individual participating libraries.

“We don’t know if Google will always be there or if one day they will change their business model and decide that it is no longer in their best interest to offer these materials for free,” said Sem Sutter, assistant director for the University’s library collections. “The libraries are taking responsibility for keeping our own digital copies to ensure that they have permanence.”

The Committee on Institutional Cooperation (CIC), a consortium of the 11 universities in the Big Ten Conference and the University of Chicago, began discussing the possibility of a joint project with Google in June 2006. The partnership was officially announced the following year. This year is the first of a five-year plan to scan approximately 10 million volumes from the 11 university libraries. As of Thursday, 2,191,319 books had been scanned into the system from CIC schools, most of them from the University of Michigan and the University of Wisconsin–Madison, which independently brokered deals with Google in 2004.

Within the CIC, libraries will be chosen for scanning two or three at a time. No books from the U of C have been scanned yet, and there is no indication as to when the scanning will begin here.

The first two libraries scheduled for scanning after the CIC partnered with Google were those at Indiana University and Purdue University. However, Purdue has chosen to delay the scanning process, and Google has not yet announced which institution will take Purdue’s place.

Each library has furnished Google with a complete set of library catalog information, which Google will run against its own records to determine which books it has not already scanned elsewhere. When a school is selected, Google will send it a “pick list” of the books it is interested in, and the library will pull the books from the shelves to load into a truck.

“We are not obligated to send anything simply because it is on the list,” Sutter said. “In fact, we will not send anything that is fragile, and to avoid causing inconvenience, we will try not to send all the books of one type at a time.”

Sutter also noted that the library may consult faculty about the courses they will be offering in the coming quarter, so that books those courses might require are not sent out.

There is no direct cost to the library other than the labor of pulling and packing the books, Sutter said. Google takes care of the shipping and scanning. The scanning process is kept secret by Google.

“Some of us have seen a film on how it is done, and have been sworn to secrecy to protect industry secrets, but I can assure you that it is highly sophisticated,” Sutter said.

Once the books are scanned, they will be uploaded to Google Book Search, and a digital copy will also be given to the universities to be stored in their collaborative digital repository.

A separate digital repository run by the libraries, the HathiTrust, was launched October 13.

The HathiTrust is a collaborative effort among the CIC universities, the University of Virginia, and the University of California system. The repository is located on computers at the University of Michigan and Indiana University. According to the HathiTrust website, a universal searchable interface is currently in the works, but there is no timeline for when it will be launched.

Only 16 percent of the books scanned are currently in the public domain, but those that are copyright-protected and are not available in full text can still be searched. The pages containing the researcher’s search term will be available on screen. The database will provide researchers with an important resource to find volumes in the closed stacks of the University’s new Mansueto Library, slated to open next to the Regenstein Library.

“It allows you the ability to discover that a book is relevant to whatever topic you are working on that you might not otherwise discover just browsing the stacks,” Sutter said. “It serves as a private index.”

In the Mansueto Library, individuals will be able to search for a book on the computer and can request a robotic arm to retrieve the book from the closed stacks. The online repository serves as a way to browse the stacks and save time, Sutter said.