The Misunderstood Short Segment Match

When I started this blog late in 2018, I had planned about a half dozen posts to get things going at a good pace. However, I backed off and put the blog on hold when I decided that I was not ready to endure some of the controversy which I might create. Recently, I decided to go forth again because I see little progress in the field of autosomal DNA matching despite all of the money being spent on consumer genetic testing. One controversy which I have been involved in over the years is short segment matching, so I am devoting this blog entry to that topic (which was my original plan).

Experts in genetic genealogy tend to disdain lowering the default parameters in autosomal one-to-one comparisons. The resulting short segment matches are viewed as something to disregard. Not only are they thought to be useless, but they are also labeled with the pejorative “false positive”. Fortunately, I had no expert to guide me when I was trying to understand my Family Finder (autosomal DNA) test results, so I was able to get a different perspective on the problem.

My interest was in finding more distant rather than recent ancestors. I had some luck with Y-DNA in smashing the brick wall for my patrilineal third great grandfather, but I still had one for his wife. I decided to follow up my initial Y-DNA testing with Family Finder, FTDNA’s autosomal DNA test. When I received my Family Finder results with thousands of matches, I was a bit overwhelmed, a feeling which I am sure many of you have had as well. My instinct was that to find distant matches, the matching segments would have to be smaller. I will save the story of breaching my third great grandmother’s brick wall for later if anyone is interested, but I definitely could not have done it without short segment matching.

Still, I was faced with the notion, shared by most of the field, that parameters as small as the ones I had used for short segment matching produce matches which tend to be false positives. With my rudimentary knowledge of probability and statistics, I thought that even the small segments were big enough to have statistical significance, so I was not buying the pronouncements of a high priest as to whether a segment was a false positive or not. For instance, the chance of two unrelated kits getting at least a half match by chance at a given SNP is about seven out of eight. To get at least a half match at a hundred consecutive SNPs just by chance, raise that 7/8 to the hundredth power and you get less than two chances in a million. So much for the chances of a false positive. If you are having trouble with this, go back and reread “Genetic Genealogy’s Chain Letter Fallacy”.
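The arithmetic in the paragraph above can be checked with a few lines of Python. This is a minimal sketch that assumes, as the argument does, that neighboring SNPs behave independently, and that each SNP has two alleles at a frequency of 0.5 in Hardy-Weinberg proportions. Under those assumptions, two unrelated genotypes fail to half-match only when one person is homozygous for one allele and the other person is homozygous for the other. The function names are my own, for illustration.

```python
def random_half_match_prob(p: float) -> float:
    """Chance that two unrelated genotypes share at least one allele at a
    two-allele SNP, assuming Hardy-Weinberg proportions with allele
    frequency p. They fail only when one person is homozygous for one
    allele and the other homozygous for the other: 2 * p**2 * q**2."""
    q = 1.0 - p
    return 1.0 - 2.0 * p**2 * q**2


def chance_of_random_run(p_snp: float, n_snps: int) -> float:
    """Chance that n_snps consecutive SNPs all half-match, treating each
    SNP as independent (the assumption behind the arithmetic above)."""
    return p_snp ** n_snps


p = random_half_match_prob(0.5)       # 0.875, i.e. seven out of eight
run = chance_of_random_run(p, 100)    # roughly 1.6 in a million
print(f"per-SNP half-match chance: {p}")
print(f"chance of 100 in a row: {run:.2e}")
```

Allele frequencies other than 0.5 make a random half match more likely than 7/8, because the 2p²q² term is largest when p is 0.5.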

Looking over what had been written about the issue, I found that the methods used by experts and professionals to “prove” that small segments tended to be false positives centered on two situations: the child having a larger matching segment with someone than one of the parents has in the same area, or a matching segment with someone popping up for the child while neither parent has a similar match with that person. I had seen these situations, and my immediate intuition was that both parents contributed DNA to reconstruct a segment of DNA from a common ancestor (i.e., the parents were related). Of course, the expert or professional will say that the child’s parents were not related. I think that whoever makes that statement is very presumptuous. If you go back eight generations, you have 256 possible lines, half on the maternal side and half on the paternal. Can anyone definitely say that none of the 128 lines on the maternal side share a common ancestor with any of the 128 lines on the paternal side? “Genetic Genealogy’s Chain Letter Fallacy” says no.
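To put a number on that question, here is a small Python sketch that counts ancestral lines. The helper names are hypothetical. It counts slots in a pedigree, not distinct people, and treats each pairing of one maternal line with one paternal line as a place where a shared ancestor would make the parents related.

```python
def lines_per_side(generations: int) -> int:
    """Ancestral lines on one side (maternal or paternal) at a given depth:
    half of the 2**generations slots in the pedigree."""
    return 2 ** generations // 2


def cross_side_pairs(generations: int) -> int:
    """Pairs made of one maternal line and one paternal line. A common
    ancestor behind any single pair would make the parents related."""
    side = lines_per_side(generations)
    return side * side


for g in (4, 6, 8, 10):
    print(f"{g} generations: {2 ** g} lines, {lines_per_side(g)} per side, "
          f"{cross_side_pairs(g)} maternal/paternal pairs")
```

At eight generations, that is 128 × 128 = 16,384 pairs of lines that would all have to be ruled out.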

I would like to present examples of how the parents’ DNA can come together for their child to have a larger matching segment than either parent has with the same person. I am using data for a child, the child’s mother and father, and a cousin (a fourth cousin to the child and a third cousin once removed to the mother). The cousin has no close relationship to the child’s father, but I believe most people who can trace back to American Colonial times are likely distantly related in several ways.

Figure 1 shows the case of a segment that grows with the child. Some experts have used such examples to say that the larger matching segment has to be false. I think this clearly shows how DNA that the father shares with the cousin, though in segments not long enough to be called a match, fills in to create a longer matching segment between the child and the cousin than the cousin has with the mother.

Figure 2 illustrates the case of a segment that pops up.

Here the DNA that the mother shares with the cousin and the DNA that the father shares with the cousin combine to create a longer match between the child and the cousin.
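A toy Python sketch of the mechanism in both figures follows. The data are hypothetical and are not taken from the figures: each flag records only whether the allele a parent passed to the child is also carried by the cousin at that SNP. Because the child carries one allele from each parent, the child half-matches the cousin wherever either inherited allele does, so short runs on each side can merge into one long run.

```python
def longest_run(flags):
    """Length of the longest stretch of consecutive truthy values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


# Hypothetical toy data: 1 = the allele this parent passed to the child
# is also carried by the cousin at that SNP, 0 = it is not.
from_mother = [int(c) for c in "111111110011111111000000"]
from_father = [int(c) for c in "000000001100000000111111"]

# The child half-matches the cousin wherever either inherited allele matches.
child = [m or f for m, f in zip(from_mother, from_father)]

print("longest run via mother's allele:", longest_run(from_mother))
print("longest run via father's allele:", longest_run(from_father))
print("longest run for the child:", longest_run(child))
```

Neither parent's contribution alone produces a run longer than 8 SNPs here, while the child's combined run covers all 24.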

Clearly, small segments have not been proven to be false positives. The question now is: how do we identify the ones which are useful in proving a match?

Genetic Genealogy’s Chain Letter Fallacy

Most of us know about chain letters, specifically the financial pyramid ones: you send X dollars to the person at the top of the address list as instructed, then you make and send Y copies of the letter with the top addressee dropped, the others moved up, and yourself added at the bottom. After waiting a short period of time, you should receive a fortune of Z dollars.

Yes, they do work if you have an infinite population and you are able to select people at each round who have not played before (and who will all play). So, in what ways is genetic genealogy like chain letters? Just as people are often added to more than one recipient’s list for future letters, we can come back to the same person as a common ancestor via more than one of our lines. Chain letters implicitly assume an infinite population, and genealogy seems to make the same assumption. Some will acknowledge the problem of multiple lines from an individual going back to the same ancestor, but then proceed full speed with a methodology which ignores what they said.

But the problem is even worse than with chain letters. The current world population is over 7.5 billion, so, ignoring individuals’ ages, that is our pool of potential chain letter players. With genealogy, each generation takes us further back in time, and the pool of potential players shrinks the further back we go. The current population in excess of 7.5 billion grew from about 6 billion in the year 2000. But go 200 years further back to 1800, and the population was only just passing a billion people. So just as with chain letters, where many people are asked to play more than once, negating their potential gains, we find common ancestors who played more than once in many people’s family trees.
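Both kinds of exponential growth can be compared with a fixed population in a short Python sketch. The 10 copies per chain letter is a hypothetical figure, ancestor slots are assumed to double each generation, and the population figures are the round numbers used above. Since the world population before 1800 was smaller still, comparing ancestor slots against a billion people is generous to the idea that all of those ancestors are distinct.

```python
def rounds_to_exceed(branching: int, population: int) -> int:
    """Smallest number of rounds (or generations) k for which
    branching**k exceeds the given population."""
    k, count = 0, 1
    while count <= population:
        count *= branching
        k += 1
    return k


# Chain letter: each player mails a hypothetical 10 copies per round.
letter_rounds = rounds_to_exceed(10, 7_500_000_000)
print("chain letter rounds to need more than 7.5 billion players:", letter_rounds)

# Genealogy: ancestor slots double each generation (2**n).
slot_generations = rounds_to_exceed(2, 1_000_000_000)
print("generations until ancestor slots exceed 1 billion:", slot_generations)
```

Beyond about 30 generations, a pedigree has more ancestor slots than there were people alive, so the same ancestors must fill many slots.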

What does this mean for our matching methodology in genetic genealogy? I assume that people who have found their way to this website have probably taken an autosomal DNA test such as FTDNA’s Family Finder or AncestryDNA and have at least a basic knowledge of how DNA works. They may be frustrated with the faults of the current matching methodology, at least in finding ancestors more than a few generations back. If you are instead a novice in need of education, plenty of information and links can be found at the International Society of Genetic Genealogy (ISOGG) Wiki site:

ISOGG Wiki

Although I have referred you to this body of knowledge, I will be contradicting some of the theory it contains in my upcoming blog posts.

I trust that everyone at this point knows about the twenty-two pairs of autosomal chromosomes found in the nucleus of our cells, with one chromosome in each pair being a copy of one originally supplied by a parent. We will save the 23rd chromosome pair, the X-Y sex-determining pair, for later. Readers should also understand the recombination process by which each parent passes down a single chromosome from each pair to a child. This chromosome has parts from the two which came from the parent’s own parents. In this process, the child gets half of its DNA from each parent. A chromosome passed on by a parent should on average be half from each of the parent’s parents, but because of the random nature of recombination the proportion may vary, as one grandparent may contribute more and the other less. In general terms, for each additional generation, the amount of an ancestor’s DNA passed on should fall by about half on average. Thus, a grandparent contributes about a fourth on average, a great grandparent an eighth, and so on until none is left. However, as we see from the Chain Letter Fallacy, an ancestor’s contribution may last further forward when there is more than one line from that ancestor to the descendant.
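The halving rule, and what happens when an ancestor appears in more than one line, can be written out exactly with Python's `fractions` module. This is a sketch of averages only; actual amounts vary because recombination is random. The function name and the six-generation, three-line example are hypothetical.

```python
from fractions import Fraction


def expected_share(generations_back: int, number_of_lines: int = 1) -> Fraction:
    """Average fraction of a descendant's autosomal DNA expected from one
    ancestor: one half per generation along each line, with the averages
    for separate lines simply adding up."""
    return number_of_lines * Fraction(1, 2) ** generations_back


print(expected_share(2))      # grandparent: 1/4
print(expected_share(3))      # great grandparent: 1/8
print(expected_share(6, 3))   # six generations back via three lines: 3/64
```

An ancestor reached through three separate lines six generations back is expected, on average, to contribute as much as three distinct ancestors at that depth.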

To determine whether two people are related, we compare their DNA results (kits) to look for sequences of DNA which are the same and presumably come from a common ancestor. The different DNA vendors have their own matching criteria, and sites such as GEDmatch.com offer matching utilities independent of the vendors, where users can upload their kits to search beyond a vendor’s database. To be considered a likely match, a segment must exceed a set minimum length (usually expressed in centimorgans, cM) and must also span more than a set number of SNPs (single nucleotide polymorphisms, the points that we measure). Of course, such a procedure is just a statistical test to estimate whether you are related. In reality, you might be related even though the procedure does not pick it up. What we consider a match is typically a half match, because the match is with only one chromosome of the pair. The choice of these two parameters determines how likely a “real match” is versus the possibility of a false-positive event. In future blog posts, I will examine whether the chances of false-positive matches are overstated.
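The two-threshold test described above can be sketched in Python. This is a simplified, hypothetical version for illustration, not any vendor's or GEDmatch's actual algorithm: genotypes are assumed to be two-letter strings such as `'AG'`, positions are assumed to be given in centimorgans, and a single mismatching SNP ends a run, whereas real matching tools typically make some allowance for no-calls and genotyping errors. The defaults mirror the 500 SNP and 7.0 cM figures mentioned in this blog.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int    # index of the first SNP in the run
    end: int      # index of the last SNP in the run (inclusive)
    snps: int     # number of SNPs in the run
    cm: float     # genetic length of the run in centimorgans


def half_match(g1: str, g2: str) -> bool:
    """True when two genotypes such as 'AG' and 'GG' share an allele."""
    return bool(set(g1) & set(g2))


def find_segments(kit1, kit2, positions_cm, min_snps=500, min_cm=7.0):
    """Return runs of consecutive half-matching SNPs that pass BOTH the
    SNP-count threshold and the centimorgan-length threshold."""
    segments = []
    run_start = None
    n = len(kit1)
    # Looping one step past the end acts as a final mismatch that closes
    # any run still open.
    for i in range(n + 1):
        matched = i < n and half_match(kit1[i], kit2[i])
        if matched and run_start is None:
            run_start = i
        elif not matched and run_start is not None:
            end = i - 1
            snps = end - run_start + 1
            cm = positions_cm[end] - positions_cm[run_start]
            if snps >= min_snps and cm >= min_cm:
                segments.append(Segment(run_start, end, snps, cm))
            run_start = None
    return segments


# Tiny hypothetical example with the thresholds scaled down to fit.
kit_a = ["AG", "AA", "GG", "AG", "CT", "CC", "TT", "AG", "AA", "GG"]
kit_b = ["GG", "AG", "AG", "AA", "TT", "CC", "CC", "AA", "AA", "AG"]
cm_positions = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8, 3.2, 3.6]
segments = find_segments(kit_a, kit_b, cm_positions, min_snps=4, min_cm=1.0)
print(segments)
```

In the toy data, the first six SNPs form a qualifying segment, while the final three-SNP run is dropped for falling under the SNP threshold.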

Current genetic genealogy theory says that DNA is passed from one parent or the other to the child in long strands. The point where a change occurs is called a recombination point or crossover. These recombination events are believed to be rather infrequent. One source referenced on the Recombination page of the ISOGG Wiki says that males average 27 crossovers per child and females average 41, spread over 22 chromosomes of varying length. Sometimes a whole chromosome is thought to be passed down intact, usually one of the shorter, high-numbered chromosomes.
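A back-of-the-envelope Python sketch of those averages follows. Spreading the crossovers evenly over the 22 chromosomes and treating them as independent events (a Poisson model) are my assumptions, not part of the quoted source. Real chromosomes differ in length, so the shorter ones, with fewer crossovers on average, should pass intact more often than this average figure suggests.

```python
import math

AUTOSOMES = 22
AVERAGE_CROSSOVERS = {"male": 27, "female": 41}  # figures quoted above


def mean_per_chromosome(total_crossovers: float) -> float:
    """Average crossovers per chromosome if spread evenly (an assumption)."""
    return total_crossovers / AUTOSOMES


def chance_passed_intact(mean_crossovers: float) -> float:
    """P(no crossover) under a Poisson model: exp(-mean). This ignores
    differences in chromosome length."""
    return math.exp(-mean_crossovers)


for sex, total in AVERAGE_CROSSOVERS.items():
    mean = mean_per_chromosome(total)
    print(f"{sex}: {mean:.2f} crossovers per chromosome, "
          f"{chance_passed_intact(mean):.0%} chance an average one passes intact")
```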

Interestingly, we think of these long strands of DNA as belonging to either the paternal or the maternal side of the chromosome pair, but our testing technology cannot determine which side an individual allele on the strand is on. I believe what we have here is a case of people thinking that they see what they don’t see. For various reasons, I have developed a contrarian view about DNA being passed in long segments. As you can see from my chain letter analogy, I believe that a person typically has several lines back to a more distant common ancestor, and the DNA that manages to survive coming down the different lines reconstructs the segment of the ancestor’s DNA corresponding to the match. Another point to consider is that when comparing a specific SNP between two people, you have about a seven out of eight chance of having at least a half match at random. These random matches alone can be the “glue” that holds many small matches together to give us the appearance of long strands being passed. I have more arguments to support this belief, but I will save them for later blog posts.
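The random half-match rate, and the kind of runs it produces on its own, can be explored with a small simulation. This Python sketch assumes independent SNPs with an allele frequency of 0.5 for two unrelated people; the seed and the 100,000-SNP length are arbitrary choices. It reports the fraction of SNPs that half-match and the longest run of purely random half matches.

```python
import random


def random_genotype(rng: random.Random, p: float = 0.5) -> tuple:
    """Draw two alleles, each 'A' with probability p, otherwise 'B'."""
    return tuple("A" if rng.random() < p else "B" for _ in range(2))


def compare_unrelated(n_snps: int, seed: int = 1):
    """Return (fraction of half-matching SNPs, longest half-match run)
    for two simulated unrelated kits."""
    rng = random.Random(seed)
    matches = 0
    best = current = 0
    for _ in range(n_snps):
        g1 = random_genotype(rng)
        g2 = random_genotype(rng)
        if set(g1) & set(g2):      # at least one allele in common
            matches += 1
            current += 1
            best = max(best, current)
        else:
            current = 0
    return matches / n_snps, best


fraction, longest = compare_unrelated(100_000)
print(f"half-match fraction: {fraction:.3f}, longest random run: {longest} SNPs")
```

The fraction comes out close to 7/8, and the longest random run stays well short of the hundreds of SNPs used as matching thresholds, while short random runs are common enough to bridge small gaps.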

In my view, switching between the parents’ contributions is so common that the idea of infrequent crossovers is a faulty part of the theory. Perhaps I can illustrate this with an example. In Figure 1 we have two GEDmatch Chromosome Browser images for one-to-one comparisons on chromosome #20. The top comparison is between a child and her matrilineal grandmother. The graphic shows mostly yellow for half matches, with small bands of green for full matches. The blue line confirms that the matching segment is believed to be the whole chromosome.

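Figure 1: GEDmatch one-to-one comparisons on chromosome #20, low resolution (top: child vs. matrilineal grandmother; bottom: father vs. mother-in-law)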
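The color coding described above can be expressed as a tiny classifier. In this Python sketch the genotype strings are hypothetical, and a full match is taken to mean both alleles agree regardless of order, a half match that at least one allele is shared, and no match that none is.

```python
def classify(g1: str, g2: str) -> str:
    """Label one SNP comparison by browser color: green (full match),
    yellow (half match) or red (no match)."""
    if sorted(g1) == sorted(g2):
        return "green"   # both alleles agree, e.g. 'AG' vs 'GA'
    if set(g1) & set(g2):
        return "yellow"  # at least one allele in common, e.g. 'AA' vs 'AG'
    return "red"         # nothing in common, e.g. 'GG' vs 'AA'


kit_a = ["AG", "AA", "GG", "CT", "TT"]
kit_b = ["GA", "AG", "AA", "CC", "TT"]
colors = [classify(a, b) for a, b in zip(kit_a, kit_b)]
print(colors)
```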

The lower comparison is between the father of the child and the father’s mother-in-law. Of course, the mother-in-law is the same person as the matrilineal grandmother in the first comparison. Using default parameters on GEDmatch, neither a one-to-one comparison between the father and the mother nor the father/mother-in-law comparison estimates a relationship, so the father and mother are not likely to be related within the last couple of hundred years. Returning to the lower display in Figure 1, we still have a lot of yellow, with generally fewer and narrower bands of green and many red areas mixed in to signify no match. I used parameters of 300 SNPs and a 3.0 cM minimum segment size for the comparison, rather than the default parameters of 500 SNPs and 7.0 cM, so that I could get a small matching segment to show. Some will say that the matching segment could be a false positive since I used such a small segment size, but in an upcoming blog post I will show that the chances of getting a false positive when comparing two Family Finder kits using those parameters are negligible. For now, just trust me about that.
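Here is a rough Python version of the false-positive arithmetic for the 300 SNP threshold, alongside 100 SNPs and the default 500. It uses the 7/8 per-SNP chance of a random half match, treats SNPs as independent, and assumes a kit of about 700,000 SNPs (an illustrative figure). It estimates how many runs of at least the threshold length would begin by chance somewhere in a kit; it ignores the cM threshold, which can only filter out more.

```python
P_SNP = 7 / 8       # per-SNP chance of a random half match
KIT_SNPS = 700_000  # assumed kit size, for illustration only


def expected_random_runs(min_snps: int, p: float = P_SNP, n: int = KIT_SNPS) -> float:
    """Approximate expected number of runs of at least min_snps random half
    matches in a kit of n independent SNPs: a run begins after a mismatch
    (probability 1 - p) and then needs min_snps matches in a row (p**min_snps)."""
    return n * (1 - p) * p ** min_snps


for threshold in (100, 300, 500):
    print(f"{threshold} SNPs: about {expected_random_runs(threshold):.1e} random runs per kit")
```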

At first glance, the match between the child and grandmother appears to be just the grandmother’s entire recombined chromosome #20 contribution being passed down as expected to her daughter, the child’s mother, with that chromosome surviving intact when the parents passed their DNA to the child. The small matching segment between the father and his mother-in-law is in an area that is generally yellow in both comparisons, indicating a half match. The conventional conclusion is that the match between the father and his mother-in-law would be on the other chromosome of the pair.

I think the Chromosome Browser is misleading us here by washing out many of the full match indications in the low-resolution image. In Figure 2, we focus on the region where the father and the mother-in-law showed a match using parameters of 300 SNPs and a 3.0 cM minimum segment length. Full resolution was selected, though I do not think it really is full resolution in the sense of one SNP per pixel. Nonetheless, the image has a lot more green indicating full matches, and some of the green areas in the comparison between the father and his mother-in-law line up with those in the child/grandmother comparison. From this perspective, the father does seem to share some ancestry with the mother-in-law and to be adding some DNA to the match between the child and her maternal grandmother. After seeing it this way, you cannot conclude that the whole chromosome has been passed down from the grandmother, through the child’s mother, and then to the child.

Figure 2: GEDmatch one-to-one comparisons on chromosome #20, high resolution, focused on the region of the father/mother-in-law match
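How a low-resolution image could wash out green can be illustrated with a hypothetical rendering rule: each pixel covers a block of SNPs and takes the most common color in that block. This Python sketch is not GEDmatch's actual drawing method, which is not described here, and the stretch of SNPs is made up.

```python
from collections import Counter


def render(classes, snps_per_pixel):
    """Collapse per-SNP colors into pixels, coloring each pixel by the
    most common color among the SNPs it covers (a hypothetical rule)."""
    pixels = []
    for i in range(0, len(classes), snps_per_pixel):
        block = classes[i:i + snps_per_pixel]
        pixels.append(Counter(block).most_common(1)[0][0])
    return pixels


# Hypothetical stretch: mostly yellow, with two green SNPs in every five.
snps = (["yellow"] * 3 + ["green"] * 2) * 20

full = render(snps, 1)    # one SNP per pixel
low = render(snps, 25)    # 25 SNPs per pixel
print("green at full resolution:", full.count("green"), "of", len(full))
print("green at low resolution:", low.count("green"), "of", len(low))
```

Under this rule, 40 percent of the SNPs are full matches, yet no green survives at 25 SNPs per pixel.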

This is not a rare example. You can easily find examples like this if you just look at the data. In the next few blog posts, I will present more evidence that the idea of long strands of DNA being passed from one parent or the other during recombination is incorrect and that crossovers are far from infrequent.

Upcoming attraction: The Misunderstood Short Segment Match