October 2005

Unsung heroes

This is essay no. 500, and next month will mark the 9th anniversary of Yawning Bread on the web. Inevitably, one is seized by questions of legacy and mortality.

It's a big question, which users of computers and the internet, in our headlong rush to the future, seldom have time to stop and contemplate. But as the amount of information grows, as new generations of hardware and software cascade upon us, it's a question that is fast becoming a critical one.

When Yawning Bread first began, I was backing up stuff using 3-and-a-half inch floppies. Recordable CD-ROMs had not yet appeared.

Today, many models of personal computers do not come with floppy drives as standard features anymore. If I had not switched to using CD-ROMs, but stuck instead to floppies, if I had not re-backed-up my files onto CD-ROMs, my floppy back-ups would be as good as useless.

But what makes anyone think CD-ROMs will prevail any better?

After all, the data on CD-ROMs are just inert patterns of microscopic pits on a surface. To read them requires today's kind of CD-ROM drives, using the lasers (with specific wavelengths) that can distinguish the bits of data represented by those pits. Considering how fast data storage technology is evolving, I'd give it no more than 3 years before today's CD-ROMs and their drives reach obsolescence.

If so, the pits will just be pits, not data.

  

Yet, even if, 20 years from now, there still exist not-yet-clapped-out CD-ROM drives able to read the bits of data, there is still the question of interpreting these bits. To be seen as text, to be presented on a screen, to be recognisable as pictures, they require the mediation of software.

Software changes just as fast, and is anyone keeping old versions of software?

Do I have to reformat Yawning Bread to cater to succeeding generations of software? How much work will that involve as the number of essays grows?

No wonder few people wish to contemplate such questions. It seems like trying to stop the sun from rising.

Fortunately, there are people thinking about it. The 15 September 2005 issue of the Economist magazine carried an article about where that thinking has reached. As you can see, it's rather technical, and I regret I am not in any position to explain the idea to you. Please read it for yourself; it is reproduced below.

All I can say is, I certainly hope they find a solution before it's too late.

* * * * *

Of course, it's not a new problem to humankind.

2,000 years ago, when all but a handful of humans were illiterate, knowledge was shared and passed down orally. Besides being very prone to errors that accumulated with each retelling, it was highly dependent on spoken language, and extremely limited by physical distance. As soon as one generation failed to pass it on, or as soon as a language, or even just a tribe, died out, all that knowledge was lost.

Writing improved things somewhat. Relay errors were reduced, though until printing came onto the scene (first in China), manual copying too was very prone to errors.

Whether by copying or printing, humans generally used organic materials such as bark, leather, paper or silk, and these could decay even faster than present-day digital bits.

Carving on stone was slow and laborious. Etching on soft clay and then baking the tablets was fuel-intensive. These substrates could not have been commonly used, except for the most important occasions of state or for dedications to gods. Hence, what was recorded on hard substrates was probably an unrepresentative fraction of all there was.

Even when words were carved onto stone, they were exposed to decay too. Buildings and monuments have been and will be knocked over by earthquakes, cracked and grown over by jungle roots, eroded by sandstorms and buried by alluvia.

But most critically of all, the language and the writing code, i.e. the software, can disappear too, and all we're left with are marks scratched onto stone.

The wonder then is that modern archaeologists have generally been able to decipher what stone writing we have unearthed. It testifies to the way language and writing have evolved gradually, such that if we go backwards one step at a time, we can make out some very old script.

Even more wonderful is that Egyptian hieroglyphics, Mayan pictograms and Linear B have been deciphered though they had no modern equivalent. Figuring them out ranks among the supreme achievements of human intelligence.

Against these are ranged supreme achievements of human destructiveness. The first Qin emperor of China was well known for his book-burning frenzy. Hitler did the same. Mao let loose his Red Guards to smash books and art that propagated effete bourgeois culture. The Khmer Rouge were more efficient -- they decided to eliminate everyone who knew anything.

A thousand years ago, popes in the Vatican ordered all genitals of Greek and Roman statues to be smashed, defacing forever exquisite art. In 2001, the fanatical Taliban regime blew up the Bamiyan Buddhas in Afghanistan. In 2003, the Iraqis looted their own National Museum when the US invaded the country yet failed to offer any security, caring only about flexing its macho firepower. In 2005, the Singapore government wants all copies of Martyn See's video documentary of Chee Soon Juan destroyed -- not that See's video is comparable to the Iraqi museum, but hey, knowledge is knowledge.

Between the inexorable forces of nature and the inexcusable idiocy of humans, we should be thankful for the body of knowledge that we do have. Credit must go to librarians and curators who strive against natural elements, blind dogma and the insatiable demands of economic progress.

I don't know how long Yawning Bread will survive me, if it survives me at all. But if it does, it won't be thanks to me really, but to someone out there who is crazy enough to think it is worth the trouble.

So, to mark the 500th essay, I ask you, dear reader, to reflect for a moment with deep appreciation, on the tireless, uncelebrated work of librarians and curators the world over, who have husbanded the information and knowledge we all take for granted, without which civilisation would have been impossible.

© Yawning Bread 


 

15 Sep 2005
Economist Magazine

A new way to stop digital decay

Computing: Could a "virtual computer", built from software, help to save today's digital documents for historians of the future?

When future historians turn their attention to the early 21st century, electronic documents will be vital to their understanding of our times. Old web pages may not turn yellow and brittle like paper, but the digital documents of today's culture face a more serious threat: the disappearance of computers able to read them. Even a relatively simple electronic item, such as a picture, requires software to present it as a visible image, but 100 years from now, today's computers will have long since become obsolete. More complex items, like CD-ROMs or videos, will be unreadable even sooner.

In 1986, for example, 900 years after the Domesday book, the BBC launched a project to compile data about Britain, including maps, video and text. The results were recorded on laserdiscs that could only be read by a special system based around a BBC Micro home computer. But since the discs were unreadable on any other system, this pioneering example of multimedia was nearly lost for ever. It took two and a half years of patient work with one of the few surviving machines to move the data on to a modern PC (it can be seen online at www.domesday1986.com).

National libraries are just starting to grapple with this problem as part of their new mandate to preserve digital culture. "It is a major problem, but it is remarkable how little known it is," says Hilde van Wijngaarden, head of digital preservation at the National Library of the Netherlands. "People just accept that things no longer work after ten years."

Keeping working examples of all computer hardware is impractical, so the most popular preservation strategy is to copy files from one generation of hardware to the next. The problem is that today's word processors and web browsers, for example, do not always display files in the same way that older software did. An accumulation of subtle errors can eventually make the original item unreadable. An alternative approach, called emulation, uses software to simulate the old hardware on a modern computer, to allow old software to run. But today's emulators will need another emulator to run on the next generation of hardware, which will need another emulator for the next generation, and so on. This can also introduce errors.

So the National Library of the Netherlands is exploring a third option, using a simulated computer that exists only in software. It is called the Universal Virtual Computer (UVC) and is being developed by IBM, a computer giant. The researchers are writing programs to run on this virtual computer that decode different document formats. Future libraries will have to write software that emulates the virtual computer on each new generation of computer systems. But once that is done, they will be able to view all their stored documents using the decoders written for the virtual computer, which only have to be written once. "The decoder can be tested for correctness today, while the format is still readable," says Raymond van Diessen of IBM.

His team has written decoders for two common image formats, JPEG and GIF. They plan to move on to Adobe's PDF format. IBM is also talking to drug firms, which are required to store data from clinical trials for long periods. Ultimately, the aim is to be able to preserve anything from simple web pages to complex data sets. Ominously, some scientific data from the 1970s has already crumbled into unreadable digital bits.

 
