
Why you may need to redo every online search you've ever done

 This one image is making me rethink every online search I've ever performed.


In her recent webinar, Read 'Em or Weep: Promise and Pitfalls in Newspaper OCR, Mary Roddy made a convincing case that name variants (nicknames or common misspellings) are not enough. We also need to study the letter combinations in a name and run alternate searches based on the limitations of the optical character recognition (OCR) that was used to create the index.

Now, in English.

OCR is the technology companies use to automatically index digitized documents, like newspapers. The technology is very good, but indexes created with it have some limitations, and there are other related limitations that are not the fault of the technology at all. Regardless, the end result is that you might not find your ancestor in the index, even though they may be in the record.

In Mary's example, a search for the surname Roddy in an online Ohio newspaper collection returned 7,148 entries. Had she stopped there, she would have missed 155 additional entries for her potential ancestors. And that doesn't even count searches for surname variations like Rody, Roddie, Rodey, etc. When we understand some of the limitations of OCR technology, and some of the history of typesetting, we can adjust our search strategies and come up with the right combination of alternative letters and names to search for.

In the example below, the surname of Roddy is shown in the digital image of the newspaper. But searching for the surname of "Roddy" in the index did NOT locate this entry.


Using the techniques Mary explained in this webinar, she instead searched for "rodclv" and successfully located the record.
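If you are comfortable with a little scripting, the idea of swapping in letters the OCR might have "seen" can be sketched in a few lines of Python. This is only a rough illustration, not Mary's method or her chart: the `OCR_CONFUSIONS` table and the `ocr_variants` function are made-up names, and the table holds just the two substitutions suggested by the "rodclv" example ("d" read as "cl", "y" read as "v"). You would extend it with whatever misreadings you actually see in your own newspapers.

```python
from itertools import product

# Hypothetical confusion table: each printed letter maps to the strings an
# OCR engine might produce for it. Only the two substitutions implied by the
# "Roddy" -> "rodclv" example are listed; this is an assumption, not a full chart.
OCR_CONFUSIONS = {
    "d": ["d", "cl"],
    "y": ["y", "v"],
}

def ocr_variants(name, confusions=OCR_CONFUSIONS):
    """Return every spelling produced by applying the confusion table
    to each letter of the name, in lowercase and sorted."""
    # For each letter, list the possible OCR readings (the letter itself
    # if the table has no entry for it).
    options = [confusions.get(ch, [ch]) for ch in name.lower()]
    # Combine one reading per letter in every possible way.
    return sorted({"".join(combo) for combo in product(*options)})

variants = ocr_variants("Roddy")
for v in variants:
    print(v)
```

The list includes the original "roddy" alongside candidates such as "rodclv", so each one can be tried in turn as a search term. The number of candidates grows quickly as the table gets bigger, so it is worth keeping only substitutions you have reason to expect.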


Are you now starting to think about your own ancestral surnames, like I am? Which of them have the potential to fail the OCR test and thus cause your search to come up empty?

After you come up with a list of alternate spellings for the surname, Mary suggests adding these to a spreadsheet.
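For those who would rather generate the spreadsheet than type it, here is a minimal Python sketch that writes a list of spellings as CSV, which any spreadsheet program can open. The column names, the `write_variants` function, and the "searched" tracking column are my own assumptions for illustration; the spellings are the ones mentioned in this post.

```python
import csv
import io

def write_variants(surname, variants, out):
    """Write one CSV row per alternate spelling to the file-like object `out`."""
    writer = csv.writer(out)
    # Hypothetical layout: the "searched" column tracks which spellings
    # you have already tried in a given collection.
    writer.writerow(["surname", "variant", "searched"])
    for variant in variants:
        writer.writerow([surname, variant, "no"])

# Write to an in-memory buffer here; pass an open file instead
# (opened with newline="") to save a real .csv file.
buffer = io.StringIO()
write_variants("Roddy", ["Rody", "Roddie", "Rodey", "rodclv"], buffer)
print(buffer.getvalue())
```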


If you use Legacy Family Tree, another way to keep track of these surname variations is to add a new unlinked person (Add > Add Unlinked) and give them the surname of "RODDY SURNAME". Then click on their AKA button and add every variation you can think of. Then, anytime you are searching for this surname, open up this person and you have easy access to their list.


With the tips Mary gives in this webinar, including her chart of "How letters might appear", this may be one of the most important classes you view this year. As it is one of our BONUS webinars, you'll need either a monthly or annual webinar membership to view it, or you can watch the brief preview. If you are a subscriber, click here to view the class.




Oh my! Oh-Oh! You're right - I need to review this webinar and probably do some new searching!! I have terrible luck with OCR... Thanks for the tips!

You are right: I research the surname YULE and now have over 130 versions of it worldwide. I automatically go to 'J' or 'G', then Yuille, Yuil, etc. If I put in just Yule I get all the Christmas sales and articles, so I gave up on newspapers a while ago. Spelling changed from area to area and depended on who was doing it. My husband's obit was PRYOR but halfway through became Prior, and I did the typing, so someone at the newspaper did not check properly.
