If you’ve ever transcribed the text of an old genealogy record by hand, you know how time-consuming the task can be. Even if you just plan to pull out the most important information from a record to include in your family tree, the process can be daunting. Many of us simply never find the time for it – especially if we have many records that need full transcriptions.
Luckily, modern technology has made it easier than ever to hand that task off to a computer. We’ll show you how.
Why should I transcribe my genealogy records?
Transcription means copying the information from an image-based (or PDF) document, a file that a computer cannot otherwise read or search, and turning it into text that is easily readable, searchable and editable.
Most family tree programs allow you to add a transcription of an image-based (or PDF) source when attaching it to your tree, and doing so can make your research easier in the future. Once the information from a record image is transcribed, the text can be added to your tree in a variety of ways and is then searchable – by you on your own computer, through your family tree program, or by others in a collaborative online environment (as with Ancestry).
Here’s a look at the transcription box on Ancestry when citing an external record (a record you found elsewhere and added to your Ancestry tree) OR a record you found on Ancestry.
Taking the time to extract the text and place it with your citation information makes your records more valuable to everyone. (If you’re not sure how to attach external sources to your Ancestry tree, or edit sources you’ve found on Ancestry, you might want to take our Ancestry Crash Course where we cover this in detail).
You may also like to have a full transcription of a record saved to your own personal files so that you can search for records and information more easily in your note keeping program or on your computer. Or perhaps you would like to publish your finds online, want to turn your research into a book one day, or share your research with family via email. In all of these cases a transcription of the document image could be a huge help.
Transcribing documents also allows you to take the information from a PDF or record image and store it and share it in places where only text is allowed. One of the biggest benefits of this is for translation. Foreign documents can be transcribed by OCR, cleaned up and then copied into a translation tool like Google Translate. You will not get a perfect translation this way but it is a good start toward better understanding your ancestors’ lives.
You might find yourself wanting to transcribe:
- newspaper clippings
- sections of old family books
- vital records
- pension files
- draft cards
- sections of census records
- wills or probate records
- and the list goes on and on
Here’s how to create transcriptions easily, and for free.
While there is never a replacement for careful hand transcriptions, the simpler solution for transcribing your genealogy records is to use modern OCR. OCR stands for optical character recognition and there are a variety of options available online, as apps, in printers and scanners and as downloadable programs.
The first step in this process is to have your records available in a digital format – such as a PDF or image (JPG etc). You can scan paper documents to make them digital before applying OCR – or you can apply OCR while you are scanning. See the bottom of this article for a bit more information on this. The rest of this how-to assumes that you are dealing with digital records.
We tested a variety of OCR solutions – including FineReader and Google Docs – and found that, for our purposes, a free online option called Online OCR actually produced the most accurate results. It is also very quick and easy to use.
However, every document is different and each available program has strengths and weaknesses. These programs may produce dramatically different results under varying circumstances, so we have included a list of other options at the bottom of this article for you to try.
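If you would like a rough way to compare tools on your own documents, one approach is to type and hand-check a short passage from the record, then score each tool's output against it. The sketch below uses Python's built-in difflib module for this. The tool names and sample strings are made up for illustration (loosely modeled on the draft card example in this article) and are not real output from any particular program.

```python
from difflib import SequenceMatcher


def similarity(reference: str, ocr_text: str) -> float:
    """Return a 0-to-1 score of how closely OCR text matches a hand-checked reference."""
    # Collapse runs of whitespace so spacing quirks do not dominate the score.
    ref = " ".join(reference.split())
    ocr = " ".join(ocr_text.split())
    return SequenceMatcher(None, ref, ocr).ratio()


# A line you have typed and checked by hand against the record image (illustrative).
reference = "John Edward Quinn, 7 Richmond St., Dover, Strafford, New Hampshire"

# Hypothetical outputs from two different OCR tools for that same line.
outputs = {
    "tool_a": "John Zdward Quinn 7 Richmond St4 Dover, Strafford, New Hampshire",
    "tool_b": "J0hn Edvvard Qu1nn 7 R1chm0nd 5t Dover Straff0rd New Hampsh1re",
}

scores = {name: similarity(reference, text) for name, text in outputs.items()}
best = max(scores, key=scores.get)

for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
print("closest match:", best)
```

A higher score only means the text is closer to your reference passage. It says nothing about the parts of the record you did not check, so treat it as a guide for choosing a tool rather than a measure of a finished transcription.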
Before we go through the process of using Online OCR you need to understand the limitations of this technology. OCR does a pretty decent job of reading and transcribing printed (typed) text on records new and old – but no commonly available OCR has yet been able to handle handwriting in any reliable way. There are some programs which claim to be able to do this, but they can only read handwriting that the program has been trained to understand. That means that you may be able to teach the program how to recognize your own handwriting, but you are unlikely to be able to simply read and transcribe the handwritten text on old documents. That is the case for now, anyway. Who knows where this technology will be in a few years?
Even printed text will not come out perfectly, as you will see below. Generally – the clearer the text, the better the transcription will be.
To get started transcribing your own records using this free site simply head over to Online OCR.
You will see in the screen capture below that you have only a few options to select – the file, the language of the document and the output format.
For this example walk-through we have chosen a WWII draft registration card (part of the so-called “Old Man’s Draft”) that was found on FamilySearch. It contains text in two forms: the original printed document text and typed answers. After selecting the file we then selected English as the language and Microsoft Word as the format. This produces the transcribed text in a simple box for copying and as a downloadable file.
Here is the original document.
Here is what we see after converting the file.
And here is the text, exactly as produced by the OCR reader. You can see that it is far from perfect. Any areas with handwritten text, such as the serial number, have been omitted and several sections of printed text were transcribed incorrectly as a jumble of characters. The spacing is also off, with answers showing several spaces below the questions in some cases.
Despite this, it did a pretty good job overall. We ran this same draft record through a variety of other OCR programs and this option produced the most accurate result. The text can be cleaned up fairly easily – certainly in much less time than it would take to start from scratch.
REGISTRATION CARD—(Men born on or after April 28, 1877 and on or before February 16, 1897)
SERIAL NUMBER I. NAVE (Print)
John Zdward Quinn (Lac.)
2. l’I.ArE Or llE,;1,ENCE (Print) 7 Richmond St4 Dover, Strafford, New Hampshire (Numb, out ”tre41t) (Town. townahip. village. or city) County) (State)
[THE PLACE OF RESIDENCE GIVEN ON THE LINE ABOVE WILL DETERMINE LOCAL BOARD JURISDICTION; LINE 2 OF REGISTRATION CERTIFICATE WILL BE IDENTICAL] 3. MAILING AZ)1)REUS
Same. lMniling …Lire*. if other than place indicated on line 2. If stone insert …rd stuns)
4. TELEPIloNE 13-14—VE
— AGE IN YEARS
C. ,CE OF ‘BIRTH
DATE OF BIRTEI July 15, 18,83Orr.) (Exchange) (Number) (Mu( 7. NAME AND ADDRESS (IF PERSON WHO WILL ALWAYS KNoW YOUR ADDRESS
Dover,. crown or county)
New_ (State or country)
MissMary_ Quinn 7 Richmond St. Dover, N. H. S. EMPLOYER’S NAME AND ADDRESS Unepployed 9.P1.-AFE-01* EMPLOYEE T OR Bt
Ilhervoloyed (Number ir) I ..rIt. F. It. number)
( own) ‘ounty)
I AFFIRM THAT I I [AVE VERIFIED ABOVE ANSWERS AND THAT TIIE ARE TRIM. D. S. S. Form. 1 1(1-21n30-2 (Revised 4-1-42) (over) (Registrant’s eignirture)
Whether or not your transcription has major errors like those above, you will need to read it over very carefully and check it against the original document before adding the information to any citations, personal notes or online trees.
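If you are comfortable with a little scripting, a short program can point out words that look like OCR misreads so you know where to look first. The sketch below is only a rough aid: the patterns are our own guesses at what a misread looks like (digits run together with letters, punctuation in the middle of a word, stray symbols), not any kind of standard, and the sample lines are taken from the OCR output shown above.

```python
import re

# Heuristic patterns that often signal an OCR misread (assumptions, not a standard).
SUSPICIOUS = re.compile(
    r"[A-Za-z]\d|\d[A-Za-z]"          # letters and digits run together, e.g. "St4"
    r"|[A-Za-z][.,;:*_][A-Za-z]"      # punctuation in the middle of a word
    r"|[A-Za-z0-9]\)[A-Za-z0-9]"      # closing parenthesis in the middle of a word
    r"|[*_…]"                         # symbols rarely found on printed forms
)


def flag_suspicious(text: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs that deserve a close look."""
    flagged = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            if SUSPICIOUS.search(token):
                flagged.append((line_no, token))
    return flagged


sample = """John Zdward Quinn (Lac.)
7 Richmond St4 Dover, Strafford, New Hampshire
3. MAILING AZ)1)REUS
DATE OF BIRTEI July 15, 18,83Orr.)"""

for line_no, token in flag_suspicious(sample):
    print(f"line {line_no}: check {token!r}")
```

Notice that a misread such as "Zdward" passes straight through, because it looks like an ordinary word, and that abbreviations written without spaces (such as "N.H.") may be flagged even when they are correct. A script like this can speed up proofreading, but it does not replace reading the transcription against the original record.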
And just as importantly you should always, always link or attach the original record source to your transcription whenever possible. Be very clear where the record came from, when it was recorded, the type of document, how it was transcribed and how someone can access the original. Oftentimes you can find full citation data for a record collection on the search page for that collection (true with most major databases, such as FamilySearch, MyHeritage or Ancestry). Use that information along with a note stating that it was transcribed by you (on what date) and save it with your transcription.
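For those who keep transcriptions as plain text files, the source details described above can be placed in a header at the top of each file. The following is a minimal sketch of one possible layout. The function name, field names and all of the sample values are our own placeholders, so swap in the real citation from the collection's search page and adjust the fields to suit your own note keeping.

```python
from datetime import date


def build_transcription_note(
    text: str,
    citation: str,
    record_type: str,
    record_date: str,
    access: str,
    transcribed_by: str,
    transcribed_on: date,
    method: str,
) -> str:
    """Combine the source details and the transcription into one plain-text note."""
    header = [
        f"Source: {citation}",
        f"Record type: {record_type}",
        f"Record date: {record_date}",
        f"Access: {access}",
        f"Transcribed by: {transcribed_by} on {transcribed_on.isoformat()}",
        f"Method: {method}",
    ]
    return "\n".join(header) + "\n\n--- Transcription ---\n" + text.strip() + "\n"


# Every value below is a placeholder for illustration only.
note = build_transcription_note(
    text="(paste your checked transcription here)",
    citation="(paste the collection's citation here)",
    record_type="WWII draft registration card",
    record_date="1942",
    access="(link to, or location of, the original image)",
    transcribed_by="Your Name",
    transcribed_on=date(2024, 1, 15),
    method="OCR, then checked by hand against the original image",
)
print(note)
```

Keeping the header and the text in one file means the source information travels with the transcription whenever you copy, email or publish it.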
Other options for using OCR in transcribing.
If you’ll be transcribing offline documents you should know that many newer printers have an OCR option built in, so you can scan and transcribe all in one step. There are also a variety of handheld scanners and apps that can apply OCR while scanning and accomplish the same thing.
Here are a few digital options to consider that require no hardware. Because each document and need is different we encourage you to try several options for your own needs as each may produce different results.
Google Docs – Google Docs can transcribe documents for you and has a nice setup. Visit your Google Drive first, upload your file and then right-click on the red image icon under the file and select “Open With Google Docs.” The transcription will take several minutes. If you use Google Docs already this can be a good solution, and it produces a nice output with the original image and text on one page. However, it was less accurate than the solution shown in this article in our own tests.
Abbyy FineReader Online – You’ll get the first 10 pages free as part of a trial and then you will be required to subscribe to this online program if you want to keep transcribing. This program produces a nice formatted output, but overall the transcribed portion returned less accurate results than with other solutions and some sections were omitted completely. The cost is also very high.
Adobe Pro DC – This is the program for editing PDFs directly from Adobe and includes OCR. It is a paid program and downloadable. Some of you may already own this program if you edit PDFs for other reasons.
Apps – take a look at the app store for your phone or tablet (search for OCR or scanner) as there are now quite a few options that may be very useful – especially if you want to transcribe on the go or if your documents are currently offline.
By Melanie Mayo, Family History Daily Editor