Wednesday, December 28, 2005

Googling books

You will find this hard to believe if you are spending this week in the sun dipping into and out of your favourite book, but the very idea of books is supposedly under attack. That's what the book industry says in two lawsuits filed in the US District Court for the Southern District of New York, one from the Authors Guild and one from the Association of American Publishers.

Amid talk of "embezzlement" and "rape", they allege a "massive copyright infringement" of the type they say will do the authors of books "irreparable harm".

In the dock is the search engine company Google, and it is indeed orchestrating a revolution in the way we get access to the printed word - the biggest revolution since the introduction of the photocopier.

Right now in the Oxford University Library, the New York Public Library and the libraries of three US universities, staff are busy removing books from the shelves row by row and loading them onto trolleys for delivery to special centres where their entire contents are scanned and loaded into a computer.

When Google is finished in six or so years it expects to have on its files the words of some 32 million books - just about every book ever written in the English language.

Google describes the end result as a gigantic card index, but it will be much more than that. No card index has ever allowed you to find books by searching the words within them. The clunky terminals in libraries now do little more than allow you to search the first words of the titles.

You can sample the early results by performing an ordinary search on Google and then clicking where it says "Try your search again on Google Book Search".

You will be presented with a list of books that contain the words you chose plus the sentences either side of the quote. If the publisher permits it, you will be able to read an entire page.

Google isn't alone in its plans. Amazon already allows searching within some books, and Microsoft is scanning books from the British Library. But there's a big difference: Microsoft and Amazon are only scanning books that are out of copyright or for which the copyright owner has given explicit permission.

By contrast, Google is planning to scan everything, whether or not it has been given permission. It will only exclude books if it has been explicitly instructed to.

It's this rudeness inherent in the Google plan that's sent authors and their representatives to court. As one wrote in a letter to USA Today: "I don't need to notify Google that they may not steal from me any more than I need to notify burglars that they do not have permission to rob my house or rapists that my body is off limits."

But Google's presumption that it can go ahead and scan every book ever written unless it is specifically told not to is necessary for the scheme to work. If it had had to ask for permission to scan web pages, it never would have built its search engine.

On one estimate, perhaps as many as 70 per cent of all the books ever written are "orphan works". They are still in copyright, but the copyright owners and their descendants can't be traced, so it isn't possible to get permission. As a result, by not seeking permission Google expects to scan tens of millions of books. Microsoft, which will seek permission, might scan perhaps only half a million.

It is not at all certain that Google will get away with it. US judges are famously protective of the rights of copyright owners. Every book that I've ever bought says inside the front cover that it can't be reproduced or stored in a retrieval system.

If Google loses, those of us who love looking up books will lose. But I would also suggest that books themselves will lose. If books can't be searched when other sources of information can, over time books will become less important. They'll stay unsearched on library shelves and in the back of lounge room bookcases.

Industries such as the book industry have been notoriously bad at predicting threats to their health. In 1982 Jack Valenti, then president of the Motion Picture Association of America, warned a congressional committee: "The VCR is to the American film producer and the American public as the Boston strangler is to the woman home alone." These days film producers make more money than they ever did before the VCR, most of it from the sales of videos and DVDs.

Will the ability to search books really result in fewer books being sold?

Authors and publishers had a more believable-sounding case when they objected to the installation of photocopiers in libraries throughout the 1960s and 1970s.

A few years later an economist from the University of Chicago, Stan Liebowitz, examined the resultant change in the market for academic journals (the kind of publication most likely to be photocopied). He reported his findings in the Journal of Political Economy. He found an explosion in the number, page size and price of academic journals, and also in the number of subscriptions. Journals had become more sought after by libraries as a result of photocopying. The most sought after were those that were copied the most.

Will books become more sought after if people can find them? I have a feeling that they will, if they can be as easily found as pages on the web.

Book publishers may well face "irreparable harm", but if they win their fight against Google, the harm will be self-inflicted.