Tuesday's Tip - Importing a Gedcom (Advanced)

Tuesday's Tips provide brief how-to's to help you learn to use the Legacy Family Tree software with new tricks and techniques.

Importing a Gedcom (Advanced)

A GEDCOM file is formatted so that family history information can be shared between different types of genealogy software programs. You can export a GEDCOM file from your online trees and also from your desktop software programs. You can use these files to move your family history data between programs or to share data with your family members or collaborators.

Normally, importing a gedcom is straightforward. You go to File > Import > GEDCOM file and you follow the prompts. However, there are actually quite a few options on these screens that make a difference in how your gedcom imports. In addition, there are some gedcoms that are just a mess. Legacy has a lot of built-in "fixes" for the known issues, but there are so many that it just can't catch them all. You need to know how to deal with these yourself. The worst offenders are web-based family trees. There are also some desktop genealogy programs that do not follow the set rules in the gedcom protocol. The gedcom protocol itself is quite old, so there can be some problems when importing gedcoms created by advanced programs.

Go to File > Import > GEDCOM file. The first couple of screens are self-explanatory. You will navigate to the gedcom file using a Windows dialog box, select the file that you want to import, tell Legacy you want to import it into a new file (recommended), and name the file. Legacy will then do a preliminary analysis.

Legacy analyzes the GEDCOM file to make sure it is valid and recognizable. This analysis pass also shows you how many individuals and families are contained in the file. If Legacy finds information that it does not know what to do with, a message is displayed. You can then tell Legacy where to put the information. You can map it to an event or to the notes. The submitter's name, address, and comments are also displayed along with the name of the program that created the file. If there is no compiler information in the family file you are importing into, a button will appear to the left of the incoming compiler information: Import Compiler Information. You can click this button to import the incoming information into the compiler information of the family file. (If there is already compiler information in the family file, this option button is not shown.)
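
If you are curious what an analysis pass like this involves, here is a minimal Python sketch. It is not Legacy's code. It assumes the usual GEDCOM line layout (a level number, an optional @ID@, a tag, and an optional value), and the sample data and the program name EXAMPLE_APP are made up for illustration.

```python
import re
from collections import Counter

# One GEDCOM line: LEVEL [@XREF@] TAG [VALUE]
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


def analyze(lines):
    """Count individuals and families and find the creating program."""
    counts = Counter()
    program = None
    in_head = False
    for raw in lines:
        if not raw.strip():
            continue
        level, xref, tag, value = parse_line(raw)
        if level == 0:
            counts[tag] += 1
            in_head = tag == "HEAD"
        elif in_head and level == 1 and tag == "SOUR":
            program = value  # the header's SOUR line names the exporting program
    return {
        "individuals": counts["INDI"],
        "families": counts["FAM"],
        "program": program,
    }


# Made-up sample file for illustration only.
SAMPLE = """0 HEAD
1 SOUR EXAMPLE_APP
1 GEDC
2 VERS 5.5.1
0 @I1@ INDI
1 NAME Mary /Smith/
0 @I2@ INDI
1 NAME John /Smith/
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I1@
0 TRLR
""".splitlines()

summary = analyze(SAMPLE)
print(summary)
```

Running a quick count like this on a gedcom before you import it is a handy sanity check: if the numbers look wrong here, they will look wrong after the import too.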

Now that the preliminaries are complete, you are looking at the GEDCOM Import screen.

[Screenshot: GEDCOM Import]

Record Numbering
Most GEDCOM files are encoded with the record identification numbers (RINs) that were used in the exporting program that created them. Often users come to identify particular individuals within their files as much with this number as with their names. If you are importing into a new, empty family file, these numbers can be kept. As an alternative, you can have the incoming records renumbered. If renumbering, you can select the beginning number. As an example let's say you have 2,582 individuals in your current family file and are about to import a new batch. You might want to start numbering the new individuals at 3000, later making it easy to see which people were imported. Of course, if you select a starting number that is already being used in the current file, Legacy will have to jump up to a number higher than the current batch.
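
The renumbering idea can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Legacy's actual logic: the function name and the "jump to one above the highest existing number" behavior are assumptions.

```python
def renumber(incoming_rins, existing_rins, requested_start):
    """Map each incoming RIN to a new RIN that cannot collide with existing ones.

    Assumption: if the requested start is not above every existing RIN,
    numbering starts just past the highest existing RIN instead.
    """
    highest = max(existing_rins, default=0)
    start = requested_start if requested_start > highest else highest + 1
    return {old: start + offset for offset, old in enumerate(incoming_rins)}


existing = range(1, 2583)          # 2,582 individuals already in the file
incoming = [1, 2, 3]               # RINs used by the exporting program

print(renumber(incoming, existing, 3000))  # starts at 3000 as requested
print(renumber(incoming, existing, 100))   # 100 is taken, so numbering jumps higher
```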

Check for Valid Temple Names during this import
(not shown in the above screenshot because I have LDS Options turned off)

If you are using the LDS options in Legacy, this option checks for valid temple names and abbreviations during the import.

Check for Valid Date Formats during this import
Legacy uses consistent, logical formatting rules when it comes to dates. Other programs allow free-form dates that can include unrelated text, making the dates unusable for sorting and date arithmetic. During the import process, Legacy checks each date for a proper format and presents any unrecognized dates for you to correct or accept. If you would like to accept all dates, regardless of their format, uncheck this option. (Using the search engine in Legacy, you can produce a list showing the names and record numbers of all individuals who have unrecognized dates. This list lets you quickly jump to each individual and make corrections later.)
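
To give a feel for what a date format check does, here is a simplified Python sketch. The pattern below is an assumption made for illustration: it only accepts an optional qualifier (ABT, CAL, EST, BEF, AFT), an optional day, an optional three-letter English month, and a year. Legacy's real rules are more complete, and this sketch does not check that the day number is valid for the month.

```python
import re

MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"

# Simplified pattern: [qualifier] [[day] month] year
SIMPLE_DATE = re.compile(
    rf"^(?:(?:ABT|CAL|EST|BEF|AFT)\s+)?(?:(?:(\d{{1,2}})\s+)?({MONTHS})\s+)?(\d{{3,4}})$"
)


def is_recognized(date_text):
    """Return True if the text matches the simplified date pattern."""
    return SIMPLE_DATE.match(date_text.strip().upper()) is not None


def unrecognized_dates(records):
    """Return the (rin, name, date) rows whose date needs a second look."""
    return [(rin, name, d) for rin, name, d in records if d and not is_recognized(d)]


# Made-up records for illustration.
records = [
    (1, "Mary Smith", "12 MAR 1850"),
    (2, "John Smith", "around Christmas 1850"),
    (3, "Ann Smith", ""),
]
print(unrecognized_dates(records))
```

The list this produces is the same kind of list described above: names and record numbers of the people whose dates you would want to revisit.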

The Dates in the GEDCOM file are in English
The dates in almost all GEDCOM files are in English, even if the GEDCOM files were produced by programs from non-English speaking countries. This is the default standard. If, however, you find that the dates are not in English, uncheck this option. This would be important, for example, if you had a GEDCOM file with abbreviated Finnish dates. The abbreviation for November in Finnish is Mar. If Legacy thought that the dates were being imported in English, all the November dates would be recognized as March. Unchecking this option tells Legacy to analyze the dates in the currently selected language instead of English.
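
The Finnish example is easy to demonstrate. In this Python sketch the abbreviations are simply the first three letters of each month name, which is an assumption made for illustration and not necessarily how Legacy stores them.

```python
ENGLISH = ["January", "February", "March", "April", "May", "June",
           "July", "August", "September", "October", "November", "December"]
FINNISH = ["tammikuu", "helmikuu", "maaliskuu", "huhtikuu", "toukokuu", "kesäkuu",
           "heinäkuu", "elokuu", "syyskuu", "lokakuu", "marraskuu", "joulukuu"]


def month_table(names):
    """Build {three-letter abbreviation: month number} from full month names."""
    return {name[:3].upper(): number for number, name in enumerate(names, start=1)}


def month_number(abbrev, dates_are_english=True):
    """Look up a month abbreviation in the English or the Finnish table."""
    table = month_table(ENGLISH if dates_are_english else FINNISH)
    return table.get(abbrev.strip().upper())


print(month_number("Mar", dates_are_english=True))   # read as English: March
print(month_number("Mar", dates_are_english=False))  # read as Finnish: marraskuu
```

The same three letters land in two different months depending on which table is used, which is exactly why the checkbox matters.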

Put Unrecognized Items into Notes Field
This option puts any unrecognized information into the Notes of the individual being read at the time. For example, a line such as "OCCUP Bricklayer" would be put into the Notes because OCCUP is not a standard GEDCOM tag. (You can also re-map unrecognized tags to standard tags before you start the import.)

Re-wordwrap the Contents of All Notes Fields
If the notes you are importing have hard carriage returns at the end of each line, such as notes from PAF 2.31 (or PAF 3.0 notes imported from PAF 2.31), you can have them reformatted into continuous lines by choosing this option. Paragraph breaks formed by two consecutive carriage returns are left alone.
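
Here is a small Python sketch of what re-wordwrapping means in practice. It is an illustration only: the function name is made up, and it assumes that a single line break should become a space while a blank line marks a paragraph break.

```python
import re


def rewrap(note):
    """Join hard-wrapped lines into continuous text, keeping paragraph breaks."""
    note = note.replace("\r\n", "\n")
    paragraphs = re.split(r"\n{2,}", note)  # a blank line separates paragraphs
    joined = [
        " ".join(line.strip() for line in paragraph.split("\n") if line.strip())
        for paragraph in paragraphs
    ]
    return "\n\n".join(joined)


wrapped = "Aunt Mary spent\nmost of her time\nknitting.\n\nShe also cooked."
print(rewrap(wrapped))
```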

Format Names and Places
Formats all incoming names and places to the format currently set in the Customize section. These formatting options include putting initial capital letters on given names, putting initial caps or upper casing on surnames, and formatting location names so there is a space after each comma.
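
The kinds of formatting described here can be sketched in Python as follows. The function names are hypothetical, and the capitalization is deliberately naive (it would turn "McDonald" into "Mcdonald"), so treat it as an illustration of the idea and not as Legacy's formatting rules.

```python
def format_given(given):
    """Put an initial capital on each given name."""
    return " ".join(word.capitalize() for word in given.split())


def format_surname(surname, upper=False):
    """Use initial caps or full upper case for the surname."""
    surname = surname.strip()
    return surname.upper() if upper else surname.capitalize()


def format_place(place):
    """Make sure there is exactly one space after each comma."""
    return ", ".join(part.strip() for part in place.split(","))


print(format_given("mary ann"), format_surname("smith", upper=True))
print(format_place("Boston,Suffolk,  Massachusetts"))
```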

Show Combine Options When Event Definitions or Locations are Different
With this option selected, if an incoming event definition or location definition is different from the current family file, the Combine Event Definition or Combine Master Locations screens are shown so that you can merge them together.

AutoSource
The AutoSource feature of Legacy lets you automatically assign a master source to each incoming individual when you are doing an import. This is often very useful as documentation of where you received the information and is much easier to do and use than making an entry in the Note field.

When you are about to import a Legacy, GEDCOM, or PAF file, you can select a master source to cite for each person by clicking AutoSource on the Import window. You can also add a new source.

Click the Customize button to open a secondary dialog box. This particular gedcom does not have any unrecognized "tags."

[Screenshot: Items to Import]

Items to be Imported
During the Analysis pass, Legacy gathers all the recognizable GEDCOM tags and places them in the Import these Items box.

Items Not to be Imported
If you find a tag you don't want to have imported, highlight the tag and click Remove, or just drag the tag from the Import these Items box to the Items not to be imported box. You can move all but the first five, basic fields. If you want to only import the five basic fields, Name, Sex, Birth, Death and Marriage, click Basic 5. All the other tags will be moved to the Items not to be imported box. (You can move any tag item back by highlighting it and clicking Include, or by dragging it back to the right window.)

Unrecognized Items
Any tags that are not recognized by Legacy during the Analysis pass are placed in the Unrecognized Items box. These are usually odd, non-standard pieces of information that another program supports. If you can recognize the tag, you can map it to a standard field tag in Legacy. Or, you can always have the information placed in the Notes field so you don't lose it.

Defining an Unrecognized Item
The Unrecognized Items list contains nonstandard GEDCOM tags that were found in the file you want to import. Often, these tags are slight variations invented by another program that are easily recognizable and can be mapped to a standard tag supported by Legacy. To start the definition process, highlight the tag you want to remap and click Map to a Recognized Tag and then choose the GEDCOM tag you want to map it to.

Creating Events from Unrecognized Tags
Some GEDCOM tags are obviously names for events such as GRAD for Graduation. To convert these tags to events and have them placed in the event list for the individual involved, highlight the tag and click Create an Event for this Tag. Legacy then prompts you for an event name (up to 30 characters). During the import, all occurrences of this tag will be changed to the defined event name.

Unrecognized tags that have been mapped to existing tags, or mapped to an event name and moved to the Import these items list, can now be removed from the Import these items list by dragging them back to the Unrecognized items list or the Items not to be imported list.
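
The three choices for an incoming tag (import it as a field, turn it into an event, or drop it into the notes) can be pictured with this Python sketch. Everything here is hypothetical: the tiny set of "standard" tags, the function name, and the sample mappings are for illustration only and do not reflect Legacy's internal tag list.

```python
# A tiny illustrative subset, not the full list of tags Legacy recognizes.
STANDARD_TAGS = {"NAME", "SEX", "BIRT", "DEAT", "MARR", "OCCU", "CHR", "BAPM"}


def route_tag(tag, value, tag_map, event_names):
    """Decide where one incoming tag goes: a field, an event, or the notes."""
    tag = tag_map.get(tag, tag)               # apply any user mapping first
    if tag in STANDARD_TAGS:
        return ("field", tag, value)
    if tag in event_names:
        # Event names are limited to 30 characters.
        return ("event", event_names[tag][:30], value)
    return ("notes", tag, value)              # nothing is lost


tag_map = {"OCCUP": "OCCU"}                   # map a variant to a standard tag
event_names = {"GRAD": "Graduation"}          # create an event for this tag

print(route_tag("OCCUP", "Bricklayer", tag_map, event_names))
print(route_tag("GRAD", "1901", tag_map, event_names))
print(route_tag("XYZ", "something odd", tag_map, event_names))
```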

Baptism versus Christening
Some genealogy programs export christening information into a GEDCOM file using the BAPM tag instead of CHR. You can have Legacy put this information into the Christening fields during the import rather than having a Baptism event created in the Event List by selecting this option.

Note Options: How Notes Are Formatted in a GEDCOM File
In a GEDCOM file, multiple-line notes are supposed to be broken in the middle of a word at the end of each line. For example, this is how a small note might look in the file:

Aunt Mary spent most of her ti
me knitting. When she wasn't kni
tting something, she was cooking.

In the past, however, most programs would break the lines between words instead of in the middle of words. For example:

Aunt Mary spent most of her time
knitting. When she wasn't knitting
something, she was cooking.

A problem arises if the old style is imported with the new rules. This results in some words being put together without any space between them. For example, the note might look like this:

Aunt Mary spent most of her timeknitting. When she wasn't knittingsomething, she was cooking.

Or, if the new style is imported with the old rules you end up with spaces in the middle of words:

Aunt Mary spent most of her ti me knitting. When she wasn't kni tting something, she was cooking.

Legacy keeps an internal list of how all genealogy programs export note blocks into GEDCOM files. This allows Legacy to decide how to put the line back together again when the notes are imported. Sometimes a GEDCOM file comes along that came from a program that Legacy never heard of. In this case, Legacy might guess incorrectly as to how the note lines are formatted. If, after importing a GEDCOM file, you find that the notes either have spaces in the middle of some of the words, or that some words don't have a space between them, you can tell Legacy to change the method it is using. You can choose between:

  • Let Legacy decide how lines are broken
  • Lines are broken in the middle of words
  • Lines are broken between words
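
The difference between the last two choices comes down to whether a space is added when the pieces of a note are put back together. This Python sketch reproduces the Aunt Mary examples above; the function name is made up, and it illustrates the idea only, not Legacy's import code.

```python
def join_note(pieces, broken_mid_word):
    """Rebuild a note from its line fragments.

    broken_mid_word=True  -> glue the pieces together as they are
    broken_mid_word=False -> put a space back between the pieces
    """
    separator = "" if broken_mid_word else " "
    return separator.join(pieces)


new_style = [
    "Aunt Mary spent most of her ti",
    "me knitting. When she wasn't kni",
    "tting something, she was cooking.",
]
old_style = [
    "Aunt Mary spent most of her time",
    "knitting. When she wasn't knitting",
    "something, she was cooking.",
]

print(join_note(new_style, broken_mid_word=True))    # correct
print(join_note(old_style, broken_mid_word=False))   # correct
print(join_note(old_style, broken_mid_word=True))    # words run together
print(join_note(new_style, broken_mid_word=False))   # spaces inside words
```

If your imported notes look like either of the last two lines, switching to the other setting and re-importing is the fix.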

Optional Text Preceding Notes
When Legacy comes across something in a GEDCOM file that it doesn't recognize, it generates an error message in the Error.log file and then puts the unrecognized items into the General Notes field for the individual or marriage. You can have optional text added to the beginning of these entries in the notes to make them easier to find after the import is completed. For example, you might add "ZZZZZ" to the beginning. Later, you can search for "ZZZZZ" in General Notes to find the affected individuals and marriages, then review each entry and decide whether to keep it or move the information to a different place.
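
The marker idea works like this small Python sketch. The function names and the sample people are hypothetical. It only shows why an unusual prefix such as "ZZZZZ" makes the entries easy to find again.

```python
def add_unrecognized(notes, tag, value, marker="ZZZZZ"):
    """Append an unrecognized item to the notes, prefixed with the marker."""
    entry = f"{marker} {tag} {value}"
    return f"{notes}\n{entry}" if notes else entry


def find_marked(notes_by_rin, marker="ZZZZZ"):
    """Return the record numbers whose notes contain the marker."""
    return [rin for rin, notes in notes_by_rin.items() if marker in notes]


# Made-up data for illustration.
notes_by_rin = {
    1: add_unrecognized("Lived in Boston.", "OCCUP", "Bricklayer"),
    2: "No unrecognized items here.",
    3: add_unrecognized("", "XYZ", "something odd"),
}
print(find_marked(notes_by_rin))
```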

Import Notes into Research Notes
If you are transferring a family file from a previous genealogy program where you have kept research notes in the Notes field, you can have Legacy put these notes into Research Notes instead of General Notes by selecting this option.

Saving Your Settings
If you would like to save a particular import tag list, click Save List after you have selected the tags you want to import. Legacy prompts for a file name and then saves the list to disk.

Loading Your Saved Settings
You can load a previously saved import tag list by clicking Load List and then selecting the desired list to be loaded.

Once you have addressed all of the options you will click OK and then Start the Import. If you are importing from Family Tree Maker, you will likely see this error screen after you start the import:

[Screenshot: Family Tree Maker]

Knowing what to select here depends greatly on how well you know the incoming data. You can automatically send all of the PLAC-tagged information to the Event Description field or the Event Notes field, or you can leave it in the Place field. You can also handle all of the PLAC comments the same way if they are all the same Event type (all Residence events, for example), or you can deal with them one at a time. You can analyze each one and decide individually where the information should go, but with a very large gedcom you can expect to spend a lot of time sitting at your computer. Family Tree Maker used the PLAC tag for every location that has comments attached to it. If you choose one of the last two options, this is what you will see:

[Screenshot: Needs Your Attention]

In the above example you can see that, in this case, moving the information to the Description field makes sense. If you see an actual place name along with the comment, that's when you would use the Split Apart button.

 

I hope this information will help you do cleaner gedcom imports so that you have less cleanup to do afterwards.

 

Find tech tips every day in the Facebook Legacy User Group. The group is free and is available to anyone with a Facebook account.

For video tech tips, check out the Legacy Quick Tips page. These short videos will make it easy for you to learn all sorts of fun and interesting ways to look at your genealogy research.

Michele Simmons Lewis, CG® is part of the Legacy Family Tree team at MyHeritage. She handles the enhancement suggestions that come in from our users as well as writing for Legacy News. You can usually find her hanging out on the Legacy User Group Facebook page answering questions and posting tips.

Comments

Incredibly helpful...has answered so many questions for me. Thanks!

Very helpful. Thank you! Where can someone find the actual Gedcom file spec/reference? Am curious why there isn’t a higher level of compliance across the industry? Is there anything being done to get there? Thanks.

Dick,
Here is a link to the standards. http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm Unfortunately, the gedcom standard has not kept up with the changes in the capabilities of the genealogy software programs.

The Gedcom Standard does have a 1999 Draft. But at that stage the opportunities for individual expression made accommodating all the proprietary provisions not viable for a free product developed when Indexing of the Church records in the 1980s was in full swing by the LDS folk. Remember the IGI of 1988? That was assembled by GEDCOM from the field.
THE GEDCOM STANDARD
DRAFT Release 5.5.1

Prepared by the
Family History Department
The Church of Jesus Christ of Latter-day Saints. : 2 October 1999
-----------
For me, I feel that modern technologies have exacerbated the disparity among systems and programs that might use this communication method to exchange database information. Many software companies appear to imagine theirs is the sole style, and communication with others is not accounted for.
In short, do not blame the GEDCOM standard, but the programs and systems that ignore existing standards.
LEGACY does well to accommodate importing different standards, but even then it does not do the job properly. I use BROTHERS KEEPER & LEGACY both ways, yet the standard is not addressed by either when it comes to additional or different event tags.
