My Experience Digitising

Peter Skuse

It was as a boy in junior school that I found some Assemblies inspiring, among them a reading by the Headmaster of the time, in the spring term. He read, from an edition then available (1952), Pepys’s description of the Great Fire of London. At age 8, being aware of chimneys on fire, and always with a fire in the grate at home on most days from late autumn to early spring, I was struck by the descriptive passages he chose to read, and asked later where the book came from. It was an Everyman edition, and I went to the library and got a copy, then asked for one for my birthday, and it sat unread apart from those Fire entries.

Then in 1955 I got edition 1 of the Guinness Book of Records; in a later update there was an entry on the world’s most exclusive club, so called because it limited entry to just 70 people: the Samuel Pepys Club. I was still a child . . .

Many years later I threw away my well-thumbed Everyman edition, and asked the library to get the Wheatley edition of the whole Diary [as in my ignorance I thought it was]. It took me a fortnight to read each volume, but in visiting the library I became aware of a new edition, leaving nothing out, by Latham & Matthews. Checking when the complete list of volumes would be available, I got the lot when it was complete. Reading that avidly, I asked for the computer version and the publisher said it had been traditionally typeset and there was no digital version.

Some negotiation then took place between me and the publisher, who kindly let me have the complete set of volumes in paperback form soon after they were published. To scan these books properly, they needed to be laid page by page, backed on black to prevent print-through, on to a scanner. Then each page needed to be scanned in high resolution, stored on file and backed up. I was saving the work three times, every fifteen minutes or so, to try to minimise repeating scans should there be a glitch/crash/hang-up/downtime/program fault/hardware failure/or falling over. It took a long time . . .

The best available non-professional (i.e. cheaper) type-reader software was bought, and each scanned page image was processed by the reader, converting a picture of the page into a page of typeface. Each page had then to be read through, with errors corrected wherever I could spot them. The printing style adopted, and font used, led to virtually indistinguishable characters such as l (ell), 1 (one), I (capital i); there were others such as rn (m? or r . . n?), mn (MN or NM or RNN or NRN?) and so on. Spellchecking was entirely inappropriate at this stage as there were so many names that it was stopping on about every fifth or sixth word. Then there were diphthongs or digraphs, such as fi and fl, these being interpreted as capital H, small h, capital A . . . many Js came out as i, and vice-versa. That was just one page. The footnotes gave so many problems! In consultation with the copyright agents, I left in all the superscript annotations on each page but deleted all footnotes, as an encouragement to folk who might eventually use the digitised edition to buy the printed volume. But because the pagination of the paperback edition was not scanned, any reader must refer to the date of the digitised diary entry to find the appropriate footnote in the printed version. Each major SAVE of the processed version was given a revision number, and stored separately as well as being twice backed-up on to different storage media.
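
The kind of look-alike characters described above can be hunted for mechanically, at least as a reading aid. What follows is only a minimal sketch in Python, not the method used for this edition: the function name flag_suspects and both patterns are assumptions made up for illustration. It will flag plenty of innocent words (any word containing "rn", for a start), so it narrows the proofreading down rather than replacing it.

```python
import re

# Hypothetical patterns for two of the confusions mentioned in the essay.
# They are illustrative guesses, not a tested OCR-correction rule set.
SUSPECT_PATTERNS = {
    # A word that mixes a digit with l (ell) or I (capital i), e.g. "1l0".
    "digit mixed with l or I": re.compile(r"\b(?=\w*\d)(?=\w*[lI])\w+\b"),
    # A word containing "rn", which the reader software may have made from "m".
    "rn that may be m": re.compile(r"\b\w*rn\w*\b"),
}


def flag_suspects(page_text):
    """Return (line_number, word, reason) for each word worth a second look."""
    findings = []
    for line_no, line in enumerate(page_text.splitlines(), start=1):
        for reason, pattern in SUSPECT_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_no, match.group(), reason))
    return findings
```

Each finding gives the line to look at, the doubtful word, and the reason it was flagged; the decision about what the word should be still rests with the person reading the page.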

It was to the Companion that I turned for ascertaining the preferred spelling of names cited in the diary. Pepys changes his spellings from entry to entry, from year to year; Latham & Matthews had made the decision for me, so another task was to track the incidence of each name and alter all erratic spellings to the preferred one. While this detracts from the diary’s own eccentricities, it makes for easier searching of a name by using a word-processor’s search/find facility. It was no small task seeing to these changes!
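
Tracking each name and altering the erratic spellings, as described above, might be sketched along the following lines. This is an assumption-laden illustration rather than the procedure actually followed: the function normalise_names is hypothetical, and the two entries in the table are placeholders that have not been checked against the Companion.

```python
import re

# Hypothetical table: erratic spelling -> preferred spelling.
# The entries are placeholders for illustration only.
PREFERRED = {
    "Pen": "Penn",
    "Minnes": "Mennes",
}


def normalise_names(text, preferred=PREFERRED):
    """Replace each listed variant with its preferred form.

    Returns the altered text and a count of how often each variant occurred,
    so that the incidence of every erratic spelling can be tracked.
    """
    counts = {}

    def swap(match):
        word = match.group()
        counts[word] = counts.get(word, 0) + 1
        return preferred[word]

    # Longest variants first, so a short variant never pre-empts a longer one.
    variants = sorted(preferred, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b")
    return pattern.sub(swap, text), counts
```

Matching on whole words only is a deliberate choice: it stops a short variant being altered inside a longer word, though it cannot tell a surname from an ordinary word spelt the same way, so the counts are worth checking by eye.
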
Long before I reached that stage, I had to decide how to deal with the superscript letters a, b, c etc. that were used to indicate crossings-out, later additions, amendments, insertions, etc. Again, these being the hardest elements to spot when misread by the reader-software, I am still finding occasional erratics in the text, though now on version 307.

Without realising the work involved, I decided to make the reference to prices consistent throughout the digitised version. I wanted to retain the l [italic lower-case L] used by Pepys for £, and to put it where Pepys put it, after the numeric figure. £100 has always seemed to me an oddball way of abbreviating ‘a hundred pounds’ but this desire left much more work than I had in advance considered. It meant italicising all d for pence, and all s for shillings, and attempting consistency as to the use of zeroes. But perseverance eventually overcomes most obstacles. One or two will still have escaped the net, and revisions will likely become numbered in the 400s before I tire of the task of seeking them out.
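
For the simplest case, a figure that the reader software has left in the modern form with £ in front, moving the sign to an italic l after the number might be sketched as below. This is a hedged illustration only: the function pounds_after_figure and the <i>…</i> markers standing in for italics are assumptions, since plain text cannot carry italics and the real work was done in a word processor. Shillings and pence are left untouched here.

```python
import re

# "£", an optional space, then a figure with optional thousands groups.
# Requiring three digits after each comma keeps a sentence comma out of the match.
POUND_PREFIX = re.compile(r"£\s?(\d+(?:,\d{3})*)")


def pounds_after_figure(text, open_mark="<i>", close_mark="</i>"):
    """Rewrite '£100' as '100l', wrapping the l in hypothetical italic markers."""
    def move_sign(match):
        return f"{match.group(1)}{open_mark}l{close_mark}"

    return POUND_PREFIX.sub(move_sign, text)
```

A pass of this sort only deals with the regular cases; the odd ones, such as sums written out in words or split across a line, would still need finding by eye.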