Wednesday 17 February 2010

Making sure my data is readable in a hundred years

Having spent some twenty years researching my family history, I obviously want to make sure that the fruits of my work are accessible to the generations that follow, so how do I ensure that it is all readable in a hundred years?

When I started my research, in the days before PCs, Macs etc, a colleague invested in a Philips Videowriter - basically a huge CRT-based box with a built-in thermal printer. It could perform just one task - word processing - using a proprietary format and a 3.25" (yes, 3.25", not 3.5") drive. Within a couple of years, it died and could not be repaired. The disks were unreadable and, worst of all, the hard copies had all faded, as thermal prints are wont to do. All the data - years of research - was lost. I am determined that this won't happen to me.

The first thing to do is define what sort of data I am talking about. I think it can be divided into two categories:

Hard copy media - printed copies of research, certificates, old photos, etc

Electronic media - scans and source files, photos, databases, research notes etc

Hard Copy Media
Preserving old documents is a science in itself, so apart from scanning, covered below under electronic media, I won't attempt to discuss that here.

However, I have produced a book containing biographies, research notes, images, photos and family trees - can I assume that it will survive? The problem is that modern toner-based laser prints on recycled, generic photocopy paper are intended to be quick and cheap, not durable. No one really knows how long the toner will remain stable. Ten to twenty years shouldn't be a problem, but beyond that?

Electronic Media
This subject has two specific aspects: format and storage.

Format
Twenty years ago, the standard word processor was WordPerfect and images were stored as 8-bit GIFs. Now, it's more likely to be Word and JPEGs. Along with Adobe's PDF format, these are so widespread that it is probably no exaggeration to say that there are billions of JPEGs, PDFs and Word documents in existence, so even when the standards are superseded, it is likely that those files will remain readable, even if only by libraries or specialists. Family tree databases are also likely to remain readable. Although each family tree program uses its own format and structure, which can change from version to version, there is an industry standard specifically for family tree data, managed by the LDS(1). The format, GEDCOM, is plain ASCII text that can be read in any text editor, yet it maintains names, facts and relationships.
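To show what "ASCII-readable" means in practice, here is a minimal sketch in Python. The sample record is made up for illustration, and the function names parse_line and individuals are my own, not part of any genealogy program. It assumes the simple line layout GEDCOM uses (a level number, an optional @id@, a tag and a value) and ignores the finer points of the standard, such as continuation lines and character sets.

```python
# A tiny, invented GEDCOM fragment: two people and the family linking them.
SAMPLE = """\
0 HEAD
1 GEDC
2 VERS 5.5
1 CHAR ASCII
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1850
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 TRLR
"""


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record header such as "0 @I1@ INDI": the id comes before the tag.
        xref = parts[1]
        rest = parts[2] if len(parts) > 2 else ""
        tag, _, value = rest.partition(" ")
    else:
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


def individuals(text):
    """Return a dict mapping each individual's id to their NAME value."""
    names = {}
    current = None  # id of the INDI record being read, if any
    for line in text.splitlines():
        if not line.strip():
            continue
        level, xref, tag, value = parse_line(line)
        if level == 0:
            current = xref if tag == "INDI" else None
        elif current and level == 1 and tag == "NAME":
            names[current] = value
    return names


if __name__ == "__main__":
    for xref, name in individuals(SAMPLE).items():
        print(xref, name)
```

The point is less the code than the fact that so little of it is needed: even if every family tree program disappeared, someone with a text editor and some patience could still recover the names and relationships.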

The media on which the data is stored, however, are a different matter.

Media
A few years ago, 3.5" and 5.25" discs would have been the norm, but now few people could read either. Since there are probably billions of CDs and DVDs in existence, it is likely that readers for them will exist in the future (even if only in libraries etc), but dye-based recordable CDs and DVDs were never intended to last beyond ten years and it is thus unlikely that they will be readable in a hundred years - I have already had some fail after 12 years.

PATA and SATA hard drives are already being replaced and USB2-based drives will die off for USB3 which will, in turn, go the way of SCSI, PCMCIA and FireWire. I also doubt that flash memory such as Compact Flash, SD etc will survive as a mainstream format for more than 20 years. Online storage, whether on cloud-based virtual drives or by hosting research on resources such as Ancestry, is great... for as long as you pay the subscription and for as long as the hosting company exists. Even if Ancestry survives a hundred years or, more likely, some other online repository is created in its place, how will anyone know our data is there?

Conclusion
It is clear that there is no perfect solution. For data formats, sticking with widely used standards makes sense, and I would encourage genealogists to back up their databases regularly in the GEDCOM format. However, the only solution that is truly future-proof is to keep porting the data to new formats and media as they emerge.
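Porting only helps if the copy is faithful, so it is worth checking each file after it has been moved to a new drive or disc. The following is a rough sketch of one way to do that in Python, not a recommendation of any particular tool: it copies a folder and compares a SHA-256 checksum of each original with its copy. The function names and the folder paths in the usage note are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir, target_dir):
    """Copy every file under source_dir into target_dir.

    Returns a list of relative paths whose copy does not match the
    original; an empty list means every file was copied intact.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = target_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also keeps the file's timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(str(src.relative_to(source_dir)))
    return mismatches
```

Calling something like copy_and_verify("family_history", "/media/new_drive/family_history") (both paths invented here) should return an empty list if all went well. It assumes the target folder is not inside the source folder.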


(1) LDS - The Church of Jesus Christ of Latter-day Saints. It has devoted vast resources to genealogy, making it a key mover in genealogy technology - perhaps less so since the introduction of paid-for services such as Ancestry.
