Business Brains

British Library puts 300 years of newspaper articles online

Posted in Technology

British Newspaper Archive will eventually include 650 million articles dating back to 1700, shedding light on events and perspectives formerly lost in the sands of time.

Newspapers have been around almost as long as there have been printing presses. Now it's possible to view 300 years of newspapers, encompassing 75 million articles, over the Web. The British Library reports that it has completed the first phase of a 10-year effort to scan millions of pages of historical newspapers and make them available online.

As the library aptly puts it on its website: compare this with hours of painstaking manual searching through hard copies or microfilm, requiring a visit to the library.

The British Newspaper Archive contains four million pages of digitized newspaper content, with articles from local and regional papers across the United Kingdom going back to 1700. The library reports that its staff has been creating up to 8,000 digital images per day from original bound newspaper pages over the past year, including some of the rarest and most fragile newspapers in the collection. Some pages were more than two feet wide and resulted in single-page image files as large as 400MB each.
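To see how one page can reach that size, here is a rough back-of-the-envelope sketch. It uses the 400 dpi, 24-bit color settings the article gives for the scanners and assumes an uncompressed image. The 24 x 36 inch page dimensions are a hypothetical example, not a figure from the library, and the function name is ours.

```python
def uncompressed_bytes(width_in, height_in, dpi=400, bits_per_pixel=24):
    """Raw size in bytes of a scanned page before any compression."""
    width_px = round(width_in * dpi)    # pixels across
    height_px = round(height_in * dpi)  # pixels down
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical page: two feet wide, three feet tall (the height is an assumption).
size = uncompressed_bytes(24, 36)
print(f"{size / 1e6:.0f} MB")  # prints "415 MB"
```

Under those assumptions, a page two feet wide lands in the same range as the 400MB files the library describes.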

The project is a work in progress, and the final archive is expected to encompass 650 million articles on 40 million web pages. Access is free and open to anyone, but there is a fee for downloading the images.

Scanning is being conducted using five Zeutschel A0 scanners that create very high-quality digital images at 400 dpi in 24-bit color. The scanned page images are then converted to JPEG 2000 format for archiving. The image files are also run through an optical character recognition (OCR) process, which creates the electronic text. This process involves segmenting each page into classified zones to make searching easier. Finally, the OCR output text is indexed in a large database that is viewable on the website.
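The library has not published how its index works, so the following is only a minimal sketch of the general idea: OCR text from each classified zone is fed into an inverted index that maps words to the zones containing them. The zone IDs, zone types, sample text and function names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(zones):
    """Map each word to the set of zone IDs whose OCR text contains it."""
    index = defaultdict(set)
    for zone_id, (_zone_type, text) in zones.items():
        for word in tokenize(text):
            index[word].add(zone_id)
    return index


def search(index, query):
    """Return the zone IDs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output: zone ID -> (zone type, recognized text).
zones = {
    "p1-z1": ("headline", "Church meeting held in Leeds"),
    "p1-z2": ("article", "The meeting was well attended"),
    "p2-z1": ("advert", "Fine teas sold in Leeds"),
}

index = build_index(zones)
print(sorted(search(index, "Leeds meeting")))  # prints ['p1-z1']
```

Keeping the zone type alongside the text would let a search be narrowed to, say, headlines or adverts, which is one plausible reason for classifying zones in the first place.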

The project will help unearth many events and perspectives formerly lost in the sands of time. As Ed King, head of the British Library’s newspaper collections, is quoted as telling The Telegraph: “People will find this archive extraordinary on both a personal and historical level. For the first time people can search for their ancestors through the pages of our newspapers wherever they are in the world at any time. But what’s really striking is how these pages take us straight back to scenes of murders, social deprivation and church meetings from hundreds of years ago, which we no longer think about as we haven’t been able to easily access articles about them.”

(Photo Credit: British Library.)

Joe McKendrick

Contributing Editor

Joe McKendrick is an independent analyst who tracks the impact of information technology on management and markets. He is a co-author of the SOA Manifesto and has written for Forbes, ZDNet and Database Trends & Applications. He holds a degree from Temple University. He is based in Pennsylvania.