SC: A Picture is Worth a Thousand Words

Purpose: Explain the "pixels" of the Cluster Diagram.
Level: Beginner
Format: Step-by-step tutorial
TBD: add example screenshots

Why Use Shared Clustering?

A picture is worth a thousand words.

Unknown (maybe they are a brick wall in your family tree)

Our DNA matches and our shared matches with them are the thousand words, potentially thousands and thousands.

The Shared Clustering diagram is the picture.

In the picture, we will be able to see potential relationships much more easily than we ever could by reading the AncestryDNA match pages, page after page after page.

The picture organizes our matches into clusters: groups of individuals who may be related to one another and who potentially share common ancestors in one line of our family tree.

With the Shared Clustering diagram and features of the Shared Clustering application, we will be able to:

1) focus our DNA research on specific ancestors or brick walls in our tree,

2) leverage AncestryDNA’s ThruLines,

3) quickly identify clusters of matches that may be able to assist our research, quickly access their tree (or know that they don’t have one), and contact them, and

4) record our findings and update the AncestryDNA notes for each match (bulk updates, not one match at a time!).

But, before we look at the diagram, let’s look at one pixel in the picture (one cell in the spreadsheet). It will require a little patience to not look at the clusters just yet, but we will be able to understand the clusters better if we understand one cell first.

Find an AncestryDNA Match in the Diagram

Let’s look at one of our AncestryDNA matches, and find that match in the Shared Clustering diagram.

To complete this tutorial, you need to have already
installed the Shared Clustering application on your
computer, downloaded data for one of your AncestryDNA tests
(generates a .txt file), and clustered the data (generates
an .xlsx file). See Quickstart for clustering.

Make and use a COPY of the .xlsx file for this tutorial as
we will mess it up during the process.
  1. In your browser,
    1. Go to Ancestry.com > DNA > DNA Matches.
    2. Under View Another Test, select the test for which you downloaded data and clustered using Shared Clustering.
    3. Select the first match (the one at the top of the list) by clicking their name.
    4. View this match’s shared matches by clicking Shared Matches.
  2. In your spreadsheet application,
    1. Open the clusters file (the .xlsx file) for the test you are viewing on Ancestry.com.
    2. Ignore the clusters for a few minutes (yes, it is difficult to do). Let’s get familiar with the spreadsheet.

By default, the file has the leftmost columns (Cluster Number through Note) frozen so they always appear on the left, and the first row frozen so it always appears at the top.

The diagram appears on the right, in the portion of the spreadsheet that is not frozen. The diagram may be very large, so you need to be able to scroll left and right, and up and down, through it.

The names of the matches for the AncestryDNA Test are in column B to the left of the diagram. You scroll up-and-down through the matches.

The names of the shared matches for the matches are in row 1 above the diagram. You scroll left-and-right through the shared matches.

This detail (rows=matches, columns=shared matches) is specific to the Shared Clustering application analyzing AncestryDNA matches. The application colors each cell and displays a value (i.e., a number) in it in a manner that depends on this detail.
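For readers who think in code, the layout described above can be pictured as a small grid. This is only a sketch: the names and cell values are invented, the helper functions are hypothetical, and 0 merely stands in for a white or gray cell. It is not how the Shared Clustering application stores its data.

```python
# Hypothetical slice of the diagram; all names and values are invented.
matches = ["Alice", "Bob", "Carol"]   # rows (the names in column B)
shared_matches = ["Dan", "Erin"]      # columns (the names in row 1)

# cells[r][c] is the value where row r meets column c.
# Assumption for this sketch: 1 or 2 means "shared",
# and 0 stands in for a white or gray cell.
cells = [
    [1, 0],   # Alice
    [2, 0],   # Bob
    [0, 1],   # Carol
]

def cell_value(match, shared_match):
    """Return the value in the cell for this match (row) and shared match (column)."""
    r = matches.index(match)
    c = shared_matches.index(shared_match)
    return cells[r][c]

def is_shared(match, shared_match):
    """True when the cell holds 1 or 2."""
    return cell_value(match, shared_match) in (1, 2)

print(is_shared("Bob", "Dan"))     # True
print(is_shared("Alice", "Erin"))  # False
```

Reading the diagram is the same lookup done by eye: pick a row, pick a column, and check the cell where they meet.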

How do we know whether the matches are in the rows or
columns?

AncestryDNA currently (28 Mar 2020) does not display shared
matches below 20 cM. If we perform a Shared Clustering
download below 20 cM, we will see rows with shared cMs less
than 20 cM. But we will not see them listed in the columns.

An alternative clustering application may have them flipped
the other way (i.e. rows=shared matches, columns=matches),
or may not even point out which is which.
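The 20 cM detail above can be sketched in a few lines. The names and cM values here are invented, and the threshold constant simply restates the figure given in the note (as of 28 Mar 2020); treat this as an illustration of the idea, not as the application's logic.

```python
# Threshold taken from the note above; AncestryDNA may change it.
SHARED_MATCH_THRESHOLD_CM = 20

# Hypothetical download that includes matches below 20 cM.
downloaded = {"Alice": 1720.0, "Bob": 410.0, "Erin": 12.5, "Frank": 8.1}

# Every downloaded match gets a row.
row_names = list(downloaded)

# Only matches at or above the threshold can appear as shared matches,
# so only they get a column.
column_names = [
    name for name, cm in downloaded.items()
    if cm >= SHARED_MATCH_THRESHOLD_CM
]

# Names that appear in the rows but never in the columns.
rows_only = [name for name in row_names if name not in column_names]

print(column_names)  # ['Alice', 'Bob']
print(rows_only)     # ['Erin', 'Frank']
```

If a low-cM name shows up on only one axis of a diagram, that axis is the one holding the matches.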

Ok, let’s look at that AncestryDNA match in the diagram now.

  3. In your spreadsheet application,
    1. Display the matches the same way Ancestry.com does, in descending order of shared centiMorgans (cMs).

      Using the spreadsheet Sort function, sort column E, Shared Centimorgans, in descending order. The name of the first match displayed on Ancestry.com (step 1.3 above) should now be listed as the name of the first match (row 2, column B) in the spreadsheet.

      If the name is not the same, start over at step 1.1 and check your work.
    2. Now find the same name in the shared matches (row 1 of the columns). Use the spreadsheet Find command, or scroll left-and-right through the columns.
    3. For the shared match you found in step 3.2, work your way down the column of cells, stopping at each cell that is beige or a shade of red (the ones with a “1” or “2” inside). For each of these cells, look at the name of the match (the name in column B for that row).

      The cells shaded beige or a shade of red, and having a “1” or “2” in the cell, mean the AncestryDNA Test and the match (the row) do share the shared match (the column).

      The cells shaded white or gray mean the AncestryDNA Test and the match (the row) do not share the shared match (the column).

      The names of the matches should be the same as, and in the same order as, the list of shared matches for this test you displayed on Ancestry.com in steps 1.1–1.4.
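The sorting and column-reading steps above can be sketched as follows. Everything here is hypothetical: the names, the cM values, the `shared_with` helper, and the use of 0 for a white or gray cell. The top match's own row is given no value for its own column, because this tutorial does not cover how the real file marks that cell.

```python
# Hypothetical rows: a name, shared cM, and the cell value in the "Alice" column.
rows = [
    {"name": "Carol", "shared_cm": 95.0,   "cells": {"Alice": 0}},
    {"name": "Alice", "shared_cm": 1720.0, "cells": {}},
    {"name": "Dan",   "shared_cm": 48.0,   "cells": {"Alice": 1}},
    {"name": "Bob",   "shared_cm": 410.0,  "cells": {"Alice": 2}},
]

# Sort by shared cM, largest first, the way Ancestry.com lists matches.
by_cm = sorted(rows, key=lambda row: row["shared_cm"], reverse=True)
top_match = by_cm[0]["name"]

def shared_with(column_name, ordered_rows):
    """Walk down one column and collect the rows whose cell holds 1 or 2."""
    return [
        row["name"]
        for row in ordered_rows
        if row["cells"].get(column_name, 0) in (1, 2)
    ]

print(top_match)                      # Alice
print(shared_with(top_match, by_cm))  # ['Bob', 'Dan']
```

Because the rows are already in descending cM order, the names come out in the same order in which Ancestry.com would list them.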

The Shared Clustering Magic (Math)

Ok, we know you are dying to do it. Let’s step back and look at the entire diagram.

  1. Zoom out, perhaps to 25% or 50%, so you can see more or all of the diagram.

    In Excel, the zoom setting appears under the View tab.

Wait a minute, where are the clusters?!

Remember step 3.1, where we reordered the matches? That messed up the magic of the Shared Clustering algorithm. Your data is now displayed the way Ancestry.com presents it, in order of decreasing shared cMs.

Shared Clustering uses fancy math to put the matches (the rows) in an order that forms clusters of possibly related matches—matches that potentially share common ancestors in a line of our family tree.
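The sketch below does not reproduce the Shared Clustering math. It only illustrates, with invented names, cluster numbers, and cM values, why sorting by shared cMs makes the clusters disappear: members of the same cluster are no longer next to each other. The `cluster_runs` helper is hypothetical.

```python
# Hypothetical rows in clustered order: (name, cluster number, shared cM).
clustered_order = [
    ("Alice", 1, 1720.0),
    ("Bob",   1, 95.0),
    ("Carol", 1, 30.0),
    ("Dan",   2, 410.0),
    ("Erin",  2, 60.0),
    ("Frank", 2, 25.0),
]

def cluster_runs(ordered_rows):
    """Count how many unbroken runs of the same cluster number appear."""
    runs = 0
    previous = None
    for _name, cluster, _cm in ordered_rows:
        if cluster != previous:
            runs += 1
            previous = cluster
    return runs

# The same rows, sorted by shared cM in descending order.
cm_order = sorted(clustered_order, key=lambda row: row[2], reverse=True)

print(cluster_runs(clustered_order))  # 2
print(cluster_runs(cm_order))         # 6
```

In clustered order the two clusters form two unbroken blocks. After the sort, the same six rows alternate between clusters, so no block is visible in the diagram.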

The Shared Clustering algorithm places each match in only one cluster. But by looking left and right of the cluster along that match's row for cells shaded beige or a shade of red, with a “1” or “2” in the cell, we may find shared matches for this match in other clusters.
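Looking along a row for shared matches in other clusters can be sketched like this. The cluster numbers, names, cell values, and the `shared_outside_cluster` helper are all invented for illustration, and 0 again stands in for a white or gray cell.

```python
# Hypothetical cluster number for each person.
cluster_of = {"Alice": 1, "Bob": 1, "Carol": 2, "Dan": 2, "Erin": 3}

# Hypothetical cell values along Bob's row, keyed by column name.
row_for_bob = {"Alice": 1, "Carol": 0, "Dan": 1, "Erin": 0}

def shared_outside_cluster(match, row_cells):
    """List the shared matches (cells holding 1 or 2) that sit in a different cluster."""
    own_cluster = cluster_of[match]
    return [
        name
        for name, value in row_cells.items()
        if value in (1, 2) and cluster_of[name] != own_cluster
    ]

print(shared_outside_cluster("Bob", row_for_bob))  # ['Dan']
```

Here Bob sits in cluster 1 with Alice, but his row also has a shared cell in Dan's column, and Dan belongs to cluster 2.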

Summary

The Shared Clustering diagram is a picture of your matches,
listed in the rows, and your shared matches with those
matches, listed in the columns.

The matches are sorted into clusters of matches that
potentially share a common ancestor, or different common
ancestors along that line of the tree.

We also look to the left and right of the cluster (across
the columns of the match's row) to find shared matches in
other clusters that are also shared with this match,
potentially through other shared DNA segments.

More Information

The details of the cell colors and values are explained here and here.