Purpose: Describe the meaning of the cell colors and values.

Level: Beginner

Format: Description with examples and equations

NOTE: Shared Clustering Version 1.1.0.97 (released 30 Mar 2020) fixed a bug in the computation of the values in the cells. If you are using an earlier version, the description below does not apply, but the differences are small.

## Paint by numbers

In A Picture is Worth a Thousand Words, we described the value of *seeing* our DNA matches and shared matches in a picture. This article explains how Shared Clustering *paints by numbers*.

Clustering applications all face the challenge of painting a picture that is easy to interpret. There are many ways to do this, and certainly the beauty (the ease of use) is in the eye of the beholder (you).

## Who is who?

In this discussion, there are four roles that AncestryDNA kits assume, but we only need to discuss three of them after we define the four.

- **Test**: all the data in the diagram is for one AncestryDNA kit, the Test kit
- **Match**: any of the DNA matches with Test; listed on AncestryDNA’s *Test’s DNA Matches* page
- **Shared Match**: any of the shared matches between Test and Match; listed on AncestryDNA’s *Shared Matches* tab for Match
- **Shared Match Pair**: any two Shared Matches of Match. [this is a new term and we welcome input on its suitability or an alternative]

The room is getting crowded, so we will now avoid referring to Test to keep this simple.

And, remember: Matches are rows, Shared Matches are columns.

## The chicken or the egg?

Obviously, Shared Clustering groups Matches together in *clusters*.

Shared Clustering also *colors* the cells and displays a *value* in each cell. Hopefully, you will become comfortable with the colors, and forget the entire math discussion below. Don’t be geeky like us.

Each cell in the diagram gives us information about how the **Match** (the row) is associated with the **Shared Match** (the column).

Shared Clustering assigns a value between 0.0 and 2.0 to each cell, and each cell is colored on a 3-color scale based on that value.

The value in the cell indicates how frequently a Shared Match Pair appears in shared matches lists. **The more frequently a Shared Match Pair appears together in shared matches lists, the more likely the Shared Match Pair also share DNA between them, and the more likely they are on the same line of descent.**

Each member of a Shared Match Pair obviously shares DNA with Test (oops!), otherwise we would not have invited them to our clustering party.

So which came first, the color or the value? In this case, the value. Shared Clustering computes the value. Your spreadsheet assigns the 3-color scale based on the value, assuming it supports this feature.

## Color and Value definitions

The 3-color scale is Grey-Beige-Dark Red. If the cell value is 0.0, the cell is left empty and the cell is White. As the value increases from 0.0 to 2.0, the color shifts from Grey to Beige to Dark Red.

| Color | Value Range | Shared Match? | Shared Match Pair? |
| --- | --- | --- | --- |
| White | 0.0, so empty | No | No |
| Shades of Grey | (0.0-1.0) | No | Yes, increasing frequency |
| Beige | [1.0] | Yes | No |
| Shades of Red | (1.0-2.0) | Yes | Yes, increasing frequency |
| Dark Red | [2.0] | Same person | Same person |

Value Range symbols:

Parentheses, as in (0.0-1.0), mean exclusive of the numbers shown; so (0.0-1.0) means between zero and one, but not including zero or one.

Square brackets, as in [1.0], mean inclusive of the number shown.
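The table and range symbols above can be written as a small lookup. This is only a sketch of how to read a cell value; the function name `describe_cell` is made up for illustration and is not part of Shared Clustering.

```python
def describe_cell(value):
    """Map a cell value to the color band described in the table.

    Hypothetical helper for illustration only; the band names come
    from the table, not from Shared Clustering's code.
    """
    if not 0.0 <= value <= 2.0:
        raise ValueError("cell values lie between 0.0 and 2.0")
    if value == 0.0:
        return "White"           # empty cell: not a shared match
    if value < 1.0:
        return "Shades of Grey"  # (0.0-1.0): exclusive at both ends
    if value == 1.0:
        return "Beige"           # [1.0]: exactly one
    if value < 2.0:
        return "Shades of Red"   # (1.0-2.0): exclusive at both ends
    return "Dark Red"            # [2.0]: same person


print(describe_cell(1.875))  # Shades of Red
```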

## How are the values computed? – Part I

Let’s use the real cluster diagram below. There are 16 Matches (rows) and 16 Shared Matches (columns) taken from a much larger diagram. We will focus on Amy and Bill, and refer to everyone else here by a number (1 through 14).

Let’s start with a couple of soft pitches.

Reading across Match Amy’s row, Shared Match 4 (column 4) is not a shared match for Amy. The cell therefore has an empty value (0.0) and White color.

Reading across Match Amy’s row, Shared Match Amy (column Amy) is the same person, so she is a shared match with herself. The cell therefore has value 2.0 and Dark Red color. Looking down the diagonal cells of the diagram from top-left to bottom-right, we see all 2.0/Dark Red.

All the other cells (not empty (0.0)/White and not 2.0/Dark Red) are the fun part.

## How are the values computed? – Part II

OK, here comes the math. Feel free to turn back now. We lost a lot of sleep figuring this out and attempting to explain it simply and clearly.

In the diagram above, focus on the 2×2 block for Amy and Bill in the lower-right corner. We repeat this 2×2 block below. The values are:

| | Amy | Bill |
| --- | --- | --- |
| Amy | 2 | 1.875 |
| Bill | 2.000 | 2 |

Amy is Amy, and Bill is Bill, so that explains the two 2’s.

Let’s figure out where the *1.875* came from.

The **Value** 1.875 represents how frequently Amy and Bill appear together (they are a Shared Match Pair) in shared matches lists containing Amy.

The equation is:

**Value** = (1 + (**n** / **d**))

**d** is the number of times Amy is a *Shared Match*.

**n** is the number of times Amy and Bill are a *Shared Match Pair* **when** Amy is a *Shared Match*.

First count the number of times Amy is a *Shared Match* (**d**). This is the number of times a cell in **Amy’s column** is in the range [1.0-2.0]. Check the table above explaining the values one more time if you need to. When you count, include row Amy.

The correct answer is **d = 8**.

Now count the number of times Amy and Bill are a *Shared Match Pair* **when** Amy is a *Shared Match* (**n**).

Let’s first make sure we know what we’re talking about. Look at the diagram above again.

Look at the Shared Matches for Match 2 (row 2). Look at the *green ellipse*. Amy is a *Shared Match* of Match 2. Bill is a *Shared Match* of Match 2. So Amy and Bill are a *Shared Match Pair* for Match 2.

Look at the Shared Matches for Match 12 (row 12). Look at the *yellow ellipse*. Amy is a *Shared Match* of Match 12. Bill is *not*. So Amy and Bill are *not* a *Shared Match Pair* for Match 12.

OK, go ahead and count **n**. For each of the 8 times (the 8 rows) that Amy is a Shared Match, how many times is Bill also a Shared Match? The correct answer is **n = 7**.

So **Value** = (1 + (**n** / **d**)) = (1 + (7 / 8)) = 1.875. Yes! We did it together.
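For the geeky among us, the counting above can be sketched in a few lines of Python. This is a minimal illustration of the equation for a pair who are shared matches of each other, not Shared Clustering's actual code. The function name `pair_value` and the data in `toy_lists` are made up: the rows labeled `other-1` to `other-4` are placeholders chosen only so that the counts come out to d = 8 and n = 7, and all other Shared Matches are left out of each list.

```python
def pair_value(shared_lists, member, partner):
    """Return 1 + n/d for a Shared Match Pair (hypothetical helper).

    shared_lists maps each Match (row) to the set of its
    Shared Matches (columns).
    """
    # d: the rows in which `member` is a Shared Match
    rows_with_member = [
        row for row, shared in shared_lists.items() if member in shared
    ]
    d = len(rows_with_member)
    if d == 0:
        raise ValueError(f"{member} is never a Shared Match")
    # n: of those rows, the ones in which `partner` is also a Shared Match
    n = sum(1 for row in rows_with_member if partner in shared_lists[row])
    return 1 + n / d


# Toy data (assumed, not read from the real diagram).
toy_lists = {
    "Amy": {"Amy", "Bill"},
    "Bill": {"Amy", "Bill"},
    "2": {"Amy", "Bill"},   # the green ellipse: both are Shared Matches
    "12": {"Amy"},          # the yellow ellipse: Amy only
    "other-1": {"Amy", "Bill"},
    "other-2": {"Amy", "Bill"},
    "other-3": {"Amy", "Bill"},
    "other-4": {"Amy", "Bill"},
}

print(pair_value(toy_lists, "Amy", "Bill"))  # 1.875
```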

## Final exam

Now it is your turn. Repeat the above exercise for the other value in the 2×2 block, the **Value** 2.000, representing how frequently Amy and Bill appear together (they are a Shared Match Pair) in shared matches lists containing Bill.

…

OK, you didn’t peek, did you?

**d** = 7

**n** = 7

**Value** = (1 + (**n** / **d**)) = (1 + (7 / 7)) = 2.000