Wednesday 17 February 2010

Making sure my data is readable in a hundred years

Having spent some twenty years researching my family history, I obviously want to make sure that the fruits of my work are accessible to the generations that follow, so how do I ensure that it is all readable in a hundred years?

When I started my research, in the days before PCs, Macs etc, a colleague invested in a Philips Videowriter - basically a huge CRT-based box with built-in thermal printer. It could perform just one task - word processing - using a proprietary format and a 3.25" (yes, 3.25, not 3.5") drive. Within a couple of years, it died and could not be repaired. The disks were unreadable and, worst of all, all the hard copies had faded as thermal prints are wont to do. All the data - years of research - was lost. I am determined that this won't happen to me.

The first thing to do is define what sort of data I am talking about. I think it can be divided into two categories:

Hard copy media - printed copies of research, certificates, old photos, etc

Electronic media - scans and source files, photos, databases, research notes etc

Hard Copy Media
Preserving old documents is a science in itself, so apart from scanning, covered below under electronic media, I won't attempt to discuss that here.

However, I have produced a book containing biographies, research notes, images, photos and family trees - can I assume that it will survive? The problem is that modern toner-based laser prints on recycled, generic photocopy paper are intended to be quick and cheap, not durable. No one really knows how long the toner will remain stable. Ten to twenty years shouldn't be a problem, but beyond that?

Electronic Media
This subject has two specific aspects: format and storage.

Format
Twenty years ago, the standard word processor was WordPerfect and images were stored as 8-bit GIFs. Now, it's more likely to be Word and JPEGs, along with Adobe's PDF format. It is probably no exaggeration to say that there are literally billions of JPEGs, PDFs and Word documents in existence, so even when the standards are superseded, it is likely that those files will be readable, even if only by libraries or specialists. Similarly, the family tree databases are also likely to be readable. Although each family tree program uses a different format and structure, which can change from version to version, there is an industry standard specifically for family tree databases managed by the LDS(1). The format, GEDCOM, is plain ASCII text that can be read in any text editor, yet it maintains names, facts and relationships.
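To show just how readable GEDCOM is, here is a minimal Python sketch that pulls the names of individuals out of a tiny GEDCOM-style extract. The sample records (John Smith, Mary Jones) and the function names are made up for illustration, and the parser only assumes the basic "level, optional @id@, tag, value" line layout; it is not a full implementation of the standard.

```python
# A made-up GEDCOM-style extract: two individuals and the family joining them.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1850
1 FAMS @F1@
0 @I2@ INDI
1 NAME Mary /Jones/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 TRLR
"""


def parse_gedcom_line(line):
    """Split one line into (level, xref, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record header such as "0 @I1@ INDI": the id comes before the tag.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


def individual_names(text):
    """Return {individual id: name}, keeping the first NAME of each INDI record."""
    names = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        level, xref, tag, value = parse_gedcom_line(raw)
        if level == 0:
            # A level-0 line starts a new record; track it only if it is a person.
            current = xref if tag == "INDI" else None
        elif level == 1 and tag == "NAME" and current is not None:
            # Surnames are wrapped in slashes; strip them for display.
            names.setdefault(current, value.replace("/", "").strip())
    return names


if __name__ == "__main__":
    for xref, name in individual_names(SAMPLE).items():
        print(xref, name)
```

The point is less the code than the fact that a few dozen lines in any language will do: a future reader with nothing but a text editor could still work out who was related to whom.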

The media on which it is stored, however, is a different matter.

Media
A few years ago, 3.5" and 5.25" discs would have been the norm, but now few people could read either. Since there are probably billions of CDs and DVDs in existence, it is likely that readers for them will exist in the future (even if only in libraries etc), but dye-based recordable CDs and DVDs were never intended to last beyond ten years, and it is thus unlikely that they will be readable in a hundred years - I have already had some fail after 12 years.

PATA and SATA hard drives are already being replaced, and USB2-based drives will die off in favour of USB3 which will, in turn, go the way of SCSI, PCMCIA and Firewire. I also doubt that flash memory such as Compact Flash, SD etc will survive as a mainstream format for more than 20 years. Online storage, whether cloud-based virtual drives or research hosted on resources such as Ancestry, is great...for as long as you pay the subscription or for as long as the hosting company exists. Even if Ancestry survives a hundred years or, more likely, some other online repository is created in its place, how will anyone know our data is there?

Conclusion
It is clear that there is no perfect solution. For data formats, sticking with widely used standards makes sense, and I would encourage genealogists to regularly back up their databases in the GEDCOM format. However, the only solution that is truly future-proof is to continually port the data into new formats and onto new media as they emerge.
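Each time the data is ported to new media, it is worth checking that nothing was lost or corrupted on the way. The Python sketch below is one possible approach, not a definitive tool: it builds a list of SHA-256 checksums for every file in a folder and compares the original against the copy. The function names are my own invention, and the folder paths in the usage line are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differences(original, copy):
    """Return (files missing from the copy, files whose contents differ)."""
    missing = sorted(set(original) - set(copy))
    altered = sorted(
        name for name in original
        if name in copy and original[name] != copy[name]
    )
    return missing, altered
```

Usage would look something like `differences(build_manifest("old_drive/research"), build_manifest("new_drive/research"))`, with an empty result for both lists meaning the copy matches the original file for file.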


(1) LDS - The Church of Jesus Christ of Latter-day Saints, which has devoted vast resources to genealogy, making it a key mover in genealogy technology - perhaps less so since the introduction of paid-for services such as Ancestry.

Monday 15 February 2010

No more Censuses... Censi?

There was an interesting article in some of yesterday’s papers which suggested that next year’s national census will be the last. There seemed to be three reasons given:

Firstly, Britain’s population is now highly dynamic, with economic migration perhaps likely to become more prevalent. Secondly, there seemed to be concern that, whether due to paranoia, deception or the great British sense of humour, there were too many false or joke responses in recent censuses. Remember back to the 1991 census at the height of the Poll Tax fiasco when we feared our census returns would be cross-checked against our tax declarations? Then, in 2001, we were asked for the first time to declare our religious beliefs – 390,000 of us (yep, I do mean “us”) declared ourselves as “Jedi”.

The third reason given was that there are now many better ways of judging the size and make-up of the population. Simply asking Tesco for their Clubcard data would be a start (although I doubt the State could afford to buy it from them), while a bit of data mining on Facebook or the viewing figures for X Factor would help.

So, all this got me thinking about the implications of discontinuing the censuses for future genealogists. Censuses have been taken in the UK since 1801, but only those from 1841 onwards were kept. 1941 didn’t happen because of the war, and 1911 is the latest one to be released for public research. Online resources such as Ancestry and Find My Past have fully indexed, searchable databases of all the accessible censuses, and they are the most fabulous window into our ancestors’ lives - telling us where they lived, their family structures as children were born, grew up and left home, their occupations and where they were born.

In a hundred years then, what will our descendants be able to find out about us after the 2011 census? Think about it: our medical records will soon be available on the NHS network, and will no doubt be published in a hundred years, as will criminal records and pretty much all public records on which we appear.

What about our online personas? Facebook already retains profiles for people who have died. While I doubt Facebook will be around in 20, much less 100 years, what will happen to the data? Even if it were taken offline, the data won’t (probably can’t) be destroyed. Will genealogists force its custodians to release the data under Freedom of Information? What about the other companies or agencies that hold databases on us – Tesco, Google, O2 et al, MI5? How about our blogs?

So, yeah it looks like future genealogists will have plenty of information on us with or without the national census. Scary, isn’t it?

Wednesday 10 February 2010

What have Toyota done to upset the BBC?

So what exactly have Toyota done to upset the mighty BBC? I can’t help wondering whether one of the Trustees or Governors has had a falling out with their local service centre, because the Beeb are pulling no punches, are they?

OK, OK, there have been some issues with sticking accelerator pedals, which have caused “around 20” incidents in the US over the last few years. Alright, Toyota could have reacted sooner and issued a simple recall, but car manufacturers issue recalls all the time, often for more serious defects. So why have the BBC led on the story for at least three days, using words like “crisis” and even (I still cannot believe this) sending an outside broadcast crew this morning to a Toyota dealership to film – live!!! – one of the fixes being implemented?

And now, of course, we have the smug sniggering about the Prius, the car that the BBC, in the shape of Top Gear, put down at every opportunity. No, I don’t have one, but I can appreciate the technology behind it and the fact that, like it or not, if we want to carry on driving individual cars, then hybrids and electric cars are the future.

I await, with bated breath, whatever new angle the BBC will find for tomorrow’s lead. An expenses scandal involving a Toyota exec?

Monday 1 February 2010

On the iPad

Now I don’t like to be part of a crowd. I don’t follow any trends, run with any packs or, I’d like to think, fall for the hype.

And yet, I couldn’t ignore the iPad completely, could I?

So, the first surprise is that it is a super-sized iPod Touch or iPhone, depending on which you choose.

That’s it really.

It runs the same apps, it looks the same but bigger, and it runs the same operating thingy.

And, you know what? That’s why it will be a staggering success. Because 70 million (70 million!) of us already have the baby version, love it, know how to use it, and have wondered why our “proper” computers couldn’t be as good.

Think about it: web browsing, video watching, music, games – yep, that’s about 95% of the use my laptop gets, tethered to the charger. And then there’s the document reader. Oh, how wonderful to have a screen shaped like a sheet of paper on which I can read books, active news sheets, reports and hold it like I would a book!

Ok, like many people, I joined in the live event and was a little disappointed… until I thought about it and realised that Apple have got it just right.

Do I want one? Yes, oh yes.