Considering Y-DNA testing for genetic genealogy?

In my last post, I introduced the topic of DNA testing for genetic genealogy. In this post, I’m going to delve further into Y-DNA testing. First, I should point out that since the Y-chromosome is what expresses the “maleness” of a man, some of its DNA encodes these masculine attributes. However, it is not these parts of the DNA that are of interest in genetic testing. Instead, the focus is on what’s called noncoding or “junk” DNA. In the human genome, as much as 98% of DNA does not code for protein. This junk DNA can be thought of as genetic punctuation marks or spaces between the real words that describe us in our chromosomes. As a result, this junk DNA is much freer to mutate from generation to generation, since such changes generally have no impact on the individual. It is as if I wrote a sentence here with a comma, in the wrong, place and two periods.. you still understand the message. However, if I wwre ro msspll mi wrsdss dbly you might not understand what I mnnnt.

So, the noncoding DNA on the Y-chromosome is the focus of genetic testing. This junk DNA is checked for two distinct kinds of structures: short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs). A short tandem repeat is a short pattern of DNA, for example the sequence AGAC or ATTA, that is repeated back to back at some location on the chromosome. This pattern might be repeated 12 times in one individual and 15 times in another. Specific locations on the Y-chromosome at which STRs exist are labelled DYS, which stands for DNA Y-chromosome Segment. Such DYS markers are given identifying numbers, such as DYS393 or DYS392. So, a Y-DNA STR test comes back as a list of DYS markers together with the repeat count found at each one.
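To make the idea of a repeat count concrete, here is a minimal Python sketch. The function name, the flanking bases, and the two made-up sequences are my own illustrations; a real lab works from sequencing or fragment-length data and follows its own counting conventions, so treat this only as a picture of what "12 repeats versus 15 repeats" means.

```python
import re


def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the length (in copies) of the longest back-to-back run of motif."""
    # Match one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    longest = 0
    for match in pattern.finditer(sequence):
        copies = len(match.group(0)) // len(motif)
        longest = max(longest, copies)
    return longest


# Hypothetical sequences: the same motif flanked by arbitrary bases,
# repeated 12 times in one man and 15 times in another.
person_a = "TTG" + "AGAC" * 12 + "CCA"
person_b = "TTG" + "AGAC" * 15 + "CCA"

print(count_tandem_repeats(person_a, "AGAC"))  # 12
print(count_tandem_repeats(person_b, "AGAC"))  # 15
```

The number printed for each man is the kind of value that would appear next to a DYS marker in a results table.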

Just to be very upfront, here is a table of my own results:

Typical Y-DNA STR results


Now, not every lab reports the sequence count in exactly the same way, for any number of reasons. So, when comparing results between individuals, it’s often important to know which labs did the work and how to translate between them. The results I’ve included here are from Family Tree DNA, whom I highly recommend as a testing lab; but we’ll come back to that later.

Note that the table is broken down into “panels”, the first giving 12 markers. This is the simplest (and least useful) test that is offered by most testing labs. It represents the twelve most stable DYS markers on the Y-chromosome, as so far identified. Finding that you and another male are equivalent in this panel is about as useful as knowing that your ancestors came from western Europe somewhere. Don’t expect to prove that Joe was your grandfather with this test. The next test up from 12 DYS locations is 25 (panel 2), then 37 (panel 3), then 67 (panel 4), and so on. It is also possible to order tests for single DYS locations, which may be of interest in specific cases.

We’ll return to STRs in a moment; but now let’s pick up on the concept of single nucleotide polymorphisms (SNPs). A SNP refers to the substitution of a single nucleotide (Adenine [A], Guanine [G], Cytosine [C], or Thymine [T]) by another. Frequently, the substitution is of T by C. Such a substitution might occur every 200 to 300 bases along a DNA string, which is to say that this is the rate of occurrence in the general population, not that every individual has all of these nucleotide substitutions. In order to be of any use to genealogy, a given SNP should occur in at least 1% or so of the population. Anything less than this and the SNP is considered a “family SNP”. Having said that, such “family SNPs” might be of great value in deciding one’s close relationship to another individual.

This brings us to the concepts of haplotype and haplogroup. A haplotype is a combination of, in this context, DYS values that are transmitted in common; in other words, men are in a common haplotype if they share specific combinations of DYS values together. Researchers, looking for genetic markers that are statistically tied to common origins, have selected particular groups of DYS locations as best choices for making valuable predictions or classifications. So, common groups of STRs define haplotypes. Haplogroups, on the other hand, are sets of similar haplotypes that share a common ancestor based on possession of the same set of SNPs; in other words, common sets of SNPs on the Y-chromosome define Y-DNA haplogroups.

Y-DNA haplogroups can be represented as a sort of tree. The “root” of the tree is a “Y-DNA Adam”, a common ancestor of all living men. This Y-DNA Adam should not be confused with the Biblical Adam. He is not necessarily the first human male; there may well have been earlier males with other Y-DNA structures. It is just that none of those other Y-DNA lines has survived in living men. Branching from Y-DNA Adam on the tree is the A haplogroup, which is defined by a SNP called M91. Only the A haplogroup has this mutation: the “ancestral” or normal condition is to have 9 repeats of T at a certain position in the Y-DNA. The “derived” condition is to have 8 or 10 repeats of T; and this defines the haplogroup. Also below Y-DNA Adam are the B, D, E, C, and F haplogroups, each defined by unique SNPs. Here is a screenshot of the structure of the tree from Family Tree DNA:

Human Y-DNA Haplotree


Haplogroups A through E are currently assumed to have arisen in Africa. Haplogroup F is associated with the exodus from Africa, and all other human Y-DNA haplogroups outside of Africa include the defining SNP for F; viz., M89. You can grasp this idea from this map.

Below Haplogroup F


Below each of these major haplogroups, the tree can be increasingly further subdivided. The following screen shot from my Family Tree DNA account shows my own particular location on the tree below the haplogroup R, specifically, R1b1a2a1a1b*, which is also called R-P312 after the SNP on the tree that defines this branch.

R-P312 on the Y-DNA haplotree


At the top of the screen shot, you can see the collection of SNPs that I’ve tested for; the bulk of those tests came back “ancestral”, which is to say that my Y-DNA does not exhibit the mutation that defines these SNPs. However, I am positive, or “derived”, for the P312 and the M269 SNPs, which define the haplogroup of which I am a member. This haplogroup is strongly associated with the Atlantic coast of Europe, notably around France, Spain, Portugal and the British Isles. It is also associated with the Atlantic Modal Haplotype (AMH), which is defined by:

  • DYS388 12
  • DYS390 24
  • DYS391 11
  • DYS392 13
  • DYS393 13
  • DYS19 14

This haplotype is closely associated with the R1b1a2a1a branch on the haplotree, which is defined by the L11 SNP. This is to say that the historical individual who first acquired the L11 SNP also had the pattern of DYS values shown above. If you check my DYS values above, you’ll see I differ by 1 at DYS391, where I show 10. However, I also have the P312 SNP, below L11 on the haplotree. Somewhere along the way, my male ancestors acquired both the P312 SNP and one less repeat at DYS391.
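The comparison I just described can be sketched in a few lines of Python. The AMH values are the six listed above; my own values are entered as the AMH with DYS391 set to 10, which is what "I differ by 1 at DYS391" implies. The function names are hypothetical, and summing the absolute differences is only one simple way of expressing how far apart two haplotypes are, not necessarily how any particular lab scores a match.

```python
# Atlantic Modal Haplotype, as listed above.
AMH = {
    "DYS388": 12,
    "DYS390": 24,
    "DYS391": 11,
    "DYS392": 13,
    "DYS393": 13,
    "DYS19": 14,
}

# My values on these six markers, inferred from the statement that
# I differ from the AMH only at DYS391, where I show 10.
mine = dict(AMH, DYS391=10)


def marker_differences(a: dict, b: dict) -> dict:
    """Return {marker: a - b} for every shared marker where the counts differ."""
    shared = a.keys() & b.keys()
    return {m: a[m] - b[m] for m in sorted(shared) if a[m] != b[m]}


def genetic_distance(a: dict, b: dict) -> int:
    """Sum of absolute repeat-count differences over the shared markers."""
    return sum(abs(step) for step in marker_differences(a, b).values())


print(marker_differences(mine, AMH))  # {'DYS391': -1}
print(genetic_distance(mine, AMH))    # 1
```

A distance of 1 over six markers says only that I sit very close to a haplotype shared by a great many men, which is why more markers and SNP results are needed before talking about family.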

So, if I were looking for a very close in-family match, I would search for other men who share not only the same set of DYS values, but who also have the P312 SNP. In other words, men who are genetically similar to me would have to be at or below the R-P312 haplogroup and in my haplotype. Estimates vary, but it’s likely that the P312 mutation occurred in some individual around 5,000 years ago living west of the Rhine River. [There can be huge debates about statements like the one I just made. I’m not trying to pick a fight about whether P312 arose 2,000 or 10,000 years ago. My point here is just that it was a long time ago in terms of my family history.] By now, this great*-grandfather of mine has many male heirs, myself included. As evidence thereof, here is a map of the locations of ancestors of current men testing derived for the P312 SNP, again from Family Tree DNA:

P312 map


This means, though, that in order to be closely related to me, another man must have the P312 SNP as well as a similar set of STR values. Furthermore, although they are not included on the haplotree for the broad population, testing has also shown that I carry two “family SNPs”; viz., L220 & L221. So, anyone closely related to me personally should also be “derived” for these SNPs.
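Put as a rule of thumb, the screening logic looks something like the Python sketch below. Everything in it beyond the SNP names P312, L220 and L221 is an assumption for illustration: the function names, the three test subjects, the six-marker reference haplotype (the AMH markers with my DYS391 value of 10), and especially the cut-off of 2 steps, which is an arbitrary number and not a threshold used by any lab.

```python
def str_distance(a: dict, b: dict) -> int:
    """Sum of absolute repeat-count differences over the markers both men tested."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


def is_candidate_close_match(snps, strs, reference_strs,
                             required_snps=("P312", "L220", "L221"),
                             max_distance=2):
    """True if every required SNP is derived and the STR values are close.

    max_distance is a hypothetical cut-off chosen for this example only.
    """
    has_snps = all(snps.get(name) == "derived" for name in required_snps)
    return has_snps and str_distance(strs, reference_strs) <= max_distance


# Reference haplotype: the six AMH markers with DYS391 at 10.
reference = {"DYS388": 12, "DYS390": 24, "DYS391": 10,
             "DYS392": 13, "DYS393": 13, "DYS19": 14}

# Three hypothetical men.
man_a = ({"P312": "derived", "L220": "derived", "L221": "derived"},
         dict(reference))
man_b = ({"P312": "derived", "L220": "ancestral", "L221": "ancestral"},
         dict(reference))
man_c = ({"P312": "derived", "L220": "derived", "L221": "derived"},
         dict(reference, DYS390=26, DYS392=15))

for name, (snps, strs) in {"A": man_a, "B": man_b, "C": man_c}.items():
    print(name, is_candidate_close_match(snps, strs, reference))
```

Man A passes both checks. Man B has identical STR values but lacks the family SNPs, and man C has the SNPs but sits four steps away on the STRs, so neither would be a candidate under these assumed rules.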

So, here is the most common mistake made by beginners going down this DNA testing path: they assume that the lowest-cost STR panel of 12 markers is going to tell them something about their family. This is not true. On the other hand, what is called a “deep clade” test, which will identify your haplogroup, is far more useful as a starting point. That knowledge, together with at least a 25-marker STR test, is likely what you need to ante up for in order to get serious traction in this game. A 12-marker STR panel test will tell you that your ancestors were likely from Europe or Asia or Africa or whatever. It won’t tell you that Bob’s your uncle. A 12-marker STR panel might get you to a haplotype that is associated with, say, a second or third tier haplogroup, say, R1b. That would tell you that your forefathers came through southern Europe about 20,000 years ago, per this map:

Haplogroup migrations


That might be interesting knowledge, but not very useful for your genealogical researches as such. Some DNA testing organizations have interesting ways of marketing this kind of information. Here is a beautiful chart, suitable for framing, that I received from EthnoAncestry based on a panel of 27 STRs indicating that I am in the “Paleolithic European” group, as opposed to, say, Germanic or Pictish or any of those other kinds of dudes.

A Paleolithic European's 27-marker chart


I suppose in some ways this chart is interesting, in spite of the fact that it doesn’t narrow down my candidates for living relatives very much. By stating that my living DNA is very similar to that of folks living 10,000 to 20,000 years ago, it says that people back then were not very different from me, or you, really. Perhaps if there were some way to resurrect one of my male ancestors from back then, who knows, there might be some family resemblance, n’est-ce pas?

Two authors have fascinating analyses of the origins of the British using similar techniques. Since my paternal line is Scottish, this material appeals to me. One of the two authors is Bryan Sykes, of Oxford University in England; and the book is Saxons, Vikings, and Celts, subtitled The Genetic Roots of Britain and Ireland. Sykes’s book was originally titled The Blood of the Isles; but the title was changed for North American distribution. The other is The Origins of the British by Stephen Oppenheimer, subtitled The new prehistory of Britain and Ireland from Ice-Age hunter gatherers to the Vikings as revealed by DNA analysis. Both are available through Amazon, Barnes & Noble, and other online book-sellers.

Neither author reveals very much in the way of their detailed analytical methods within the book, which is to say that they don’t include the set of markers that they used to test people; but their work has been deconstructed subsequently (Oppenheimer here, and Sykes here). Oppenheimer employed DYS393, 390, 19, 391, 388 & 392. In Oppenheimer’s scheme, I turn out to be in his R1b-9 clade, aka Rox. According to Oppenheimer’s analysis, this Rox haplotype is densest in north-eastern Scotland, which is exactly consistent with my paternal line’s origin. What is fascinating about Oppenheimer’s analysis, to me personally, is that he suggests that my Rox ancestors arrived there as early as 11,000 years ago, as the last Ice Age was melting away. Sykes employed DYS393, 390, 19, 391, 426, 388, 439, 389i, 392, & 389ii. In this model, I am OGAP2, which is supposed to be very uniformly distributed across the British Isles. However, Colson noted in a later paper that adding DYS449 to considerations of OGAP2 yields some unique classification power. Larger values of DYS449, for example 30 or higher, seem more associated with Scotland. Note that I show DYS449 at 31, consistent with Colson’s observations.

Both Oppenheimer and Sykes argued that these populations arrived in the British Isles mainly from Iberia. More recent researches argue that these populations originated from the Balkans. An example is here, in which the authors argue that these R1b groups originated in the Neolithic rather than the Paleolithic. These more recent works do not argue against the current distributions of various haplotypes or haplogroups “on the ground,” but rather, where they came from originally. So, they would not alter any matches that you might discover with other living individuals.

Now, I fully understand that my personal patrilineal background from north-eastern Scotland is not likely to echo strongly with many other people; it is a rather narrow focus. On the other hand, we will each of us have a narrow focus on our own personal origins, wherever they may lie. I have explained how my researches turned out in order to demonstrate that you will find similar resources along your own research path. Some of these resources will be identical, say, beginning with a good test lab delivering quality test results. Some will be different, say, books and papers on your Native American or African or Asian ancestry.

If the previous comments give you the idea that the analysis of genetic information in genealogy is a topic with some degree of scientific turmoil, that would be the correct idea. New information arrives almost daily. Individuals obtain broader test results and place them in databases accessible to the public and the scientific community. Archaeological sites are discovered and human remains analyzed for their DNA content. Previous patterns are re-analyzed and new conclusions are drawn. This is the state of the art. There are forums, like DNA-Forums, where active participants engage in discussions on these results, sometimes misinformed or under informed, but often including some of the best researchers in the world. While this makes the area somewhat confusing, it also means that any sample that you contribute to a test lab can be upgraded for new tests as they become available and matched against an increasingly large database of other individuals. In all of genealogy, this arena is perhaps the most dynamic as a result of the focus of such scientific efforts and the participation of more and more people all over the world.

Now, for a plug for Family Tree DNA as a testing lab. I have used several other labs, and some of these are resold under other brands; for example, through Ancestry.com. My personal recommendation, for which we here get no considerations at all, is to go with Family Tree DNA (FTDNA). Other labs have made mistakes with my results, which I have had double checked through FTDNA. FTDNA has a significant and growing database of customers. They support a large number of surname and geographical projects (see the screenshot below):

Index of DNA projects


If your own family surname isn’t already present, then you can begin a project for it. While I haven’t got to these topics yet, FTDNA supports testing of Y-DNA, mtDNA, and autosomal DNA all from your original sample. You can order test upgrades as they become available. Tests are reasonably priced compared to the competition. I just like these folks’ services. One of the related services, in the area of Y-DNA, is the publicly accessible database at Y-Search. This database has over 100,000 entries, of which about 90% are from FTDNA. There are other services and databases; for example, Sorenson Molecular Genealogy Foundation (SMGF), which has about 38,000 men in their Y-DNA database at the time I’m writing this.

For the record, here is a comparison chart of testing companies for you to check over. I stand by my recommendation for FTDNA.

Good luck in your efforts!
