Max Grossmann: My review of Sequencing.com

Posted: 2026-05-27 · Last updated: 2026-05-27

This is an unbiased review of Sequencing.com, a company that offers Whole Genome Sequencing (WGS). Neither is this post sponsored nor was I provided with the product for free. Sequencing.com did not know in advance that I would write this review. I am just an individual nerd, biohacker, and scientist. See here for more information. I will be updating this post as necessary.

This is a review of their Comprehensive Health Screen offering. At the time of my purchase in 2026, that was their cheapest bundle (US$399).

Why I had my genome sequenced
Common objections to WGS partially debunked
Preparing and learning about genomics
My desiderata and expectations for WGS
Why I chose Sequencing.com
Timeline
Post-sequencing steps and experience
Quality indicators
ClinVar annotation: what the “Pathogenic” variants actually are
A note on geographic ancestry analysis
Conclusion
Changes to this document

Why I had my genome sequenced

WGS provides several types of information relevant for health management:

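Pharmacogenomics: The vast majority of people carry at least one pharmacogenomic variant that affects drug metabolism. Genes like CYP2D6, CYP2C19, and CYP2C9 influence how you process roughly 20% of common medications including antidepressants, opioids, statins, PPIs, and warfarin. Poor metabolizers may experience toxicity at normal doses, while ultra-rapid metabolizers may not respond to standard dosing. For prodrugs that these enzymes must first activate, such as codeine (via CYP2D6), the pattern reverses: ultra-rapid metabolizers can reach toxic levels of the active drug, and poor metabolizers get little effect.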
Disease risk variants: High-penetrance variants in genes like BRCA1/BRCA2 (cancer risk), HFE (hereditary hemochromatosis), and LDLR (familial hypercholesterolemia) can inform screening schedules and prevention strategies (see below).
Carrier status: Identifies whether you carry recessive disease variants for conditions like cystic fibrosis, thalassemia, or spinal muscular atrophy. Relevant for family planning.
Reanalysis over time: While one’s genome is static, variant databases and clinical guidelines are updated continuously. New variants are classified each year, and new gene-drug associations are discovered. With the raw data, you can reanalyze against updated databases every few years without needing to sequence again.

Also, I am a biohacker and nerd. In all sincerity, that is probably the key reason I did this.

Common objections to WGS partially debunked

Information avoidance

A common concern: “What if I learn something I don’t want to know?” This is called information avoidance. People deliberately avoid information that is psychologically uncomfortable.

In reality, there are only a few genuine genetic death sentences where knowing (currently) provides no benefit. These are rare, high-penetrance, deterministic conditions with no known cure:

Huntington's disease: CAG repeat expansion in the HTT gene. If you inherit a fully penetrant expansion (40 or more repeats), you will develop the disease (typically onset in the 30s-40s). Life expectancy is 15-20 years after symptom onset. A gene therapy called AMT-130 showed a 75% slowing of disease progression in 2025 trials, but in March 2026 the FDA said the Phase I/II data were not sufficient to support a marketing application; no cure exists yet.
Early-onset familial Alzheimer's disease: Variants in PSEN1, PSEN2, or APP genes. PSEN1 and APP variants have approximately 100% penetrance. Accounts for less than 1% of Alzheimer's cases. Onset typically before age 65. Drugs like lecanemab and donanemab exist but provide only marginal clinical benefit and do not prevent or cure the disease.
Genetic prion diseases: All caused by mutations in the PRNP gene. All are invariably fatal with no known cure.
- Fatal familial insomnia: Progressive insomnia, dysautonomia, and neurodegeneration. Survival is 7 months to 6 years (average 18 months). Nearly 100% penetrance.
- Genetic Creutzfeldt-Jakob disease: Accounts for 5-15% of CJD cases. Rapid dementia and death, typically within one year.
- Gerstmann-Sträussler-Scheinker syndrome: Progressive ataxia and dementia. Survival is 2-10 years.
Some forms of familial ALS: About 5-10% of ALS is genetic. SOD1 mutations (20% of familial cases) now have tofersen (Qalsody), which received accelerated FDA approval in 2023 because it lowers neurofilament light, a blood marker of nerve damage; its Phase 3 trial did not meet its primary clinical endpoint. C9orf72 mutations (40% of familial cases) have no approved treatment yet. (However, most ALS is sporadic, not genetic.)
Some extraordinarily rare mitochondrial diseases, such as those caused by variants in POLG.

Crucially, if none of your family has ever had any of these conditions, your risk is very low. These are hereditary diseases that run in families. De novo mutations (spontaneous new mutations) do occur but are extraordinarily uncommon. For Huntington's, about 24% of new diagnoses lack family history, but these typically arise from expansion of "intermediate" CAG repeats (27-35) that were already present in a parent. True de novo mutations reinventing the disease from whole cloth are exceptionally rare. For early-onset Alzheimer's, de novo PSEN1 mutations account for roughly 8% of sporadic cases with onset before age 51 (which is itself rare). For prion diseases, de novo mutations are extremely rare (sporadic fatal insomnia has only about 30 recorded cases worldwide). The worry about discovering an untreatable genetic death sentence is largely unfounded for people without relevant family history.

It is also crucial to accept that every life on this planet will end. Life is only valuable because it is finite, and indeed, every life comes with a death sentence. The earlier you understand that, the better!

Most genetic risk is actionable

For nearly everything else, knowing allows you to take action. Variants in BRCA1/2 (cancer), APOE (Alzheimer’s risk factor, not determinant), HFE (hemochromatosis), LDLR (heart disease), and pharmacogenes enable you to:

Adjust medication (avoiding drugs you can't metabolize, using alternatives)
Intervene preventatively (prophylactic surgery, lipid management, iron monitoring)
Get more screening (mammograms, colonoscopies, cardiac imaging)
Modify your lifestyle (diet, exercise, avoiding specific environmental triggers)

My own results illustrate this well: over a dozen variants were flagged as "Pathogenic," and after manual verification, none were clinically actionable. The carrier variants were mild. The pharmacogenomic findings were genuinely useful. (More on that below.) Nothing in my genome dictated my health in a way I could not already influence through ordinary decisions about medication, screening, and lifestyle. For the vast majority of people, genetic determinism is simply wrong. Outside of a small number of high-penetrance Mendelian conditions (listed above), genes are probabilistic risk factors that interact with environment, behavior, and chance in ways we do not yet fully understand.

That last point deserves emphasis: we do not know enough about most of the genome to make strong predictions from it. The clinical utility of WGS today is concentrated in a few well-studied areas — pharmacogenomics, carrier screening for known Mendelian diseases, and a handful of high-penetrance cancer and cardiac genes. For the rest, variant databases are incomplete, penetrance estimates are uncertain, and gene-gene interactions are poorly characterized. WGS data contains a partial, evolving, and often ambiguous snapshot of risk. Knowing about a variant does not change the fact that it was already there. But it may let you do something about it — or, just as often, confirm that there is nothing to worry about.

Privacy and genetic discrimination

“What if someone gets my genetic data?” This has several dimensions, most of which are less serious than commonly assumed.

Law enforcement: Investigative genetic genealogy (the GEDmatch technique used to identify the Golden State Killer) and forensic DNA databases are tools for solving violent crime, where biological material is left at the scene. In the unlikely event that you (or any likely reader of this blog) are ever investigated for anything, it will almost certainly be for a white-collar matter that involves no DNA evidence whatsoever. If a relative is identified through your data as a perpetrator of violent crime, the responsibility lies with the person who committed the crime, not with the person who wanted to understand their own health. Keep your data off public genealogy databases if this concerns you, but do not let it stop you from getting sequenced.

Health insurance discrimination: In the United States, GINA (2008) prohibits health insurers and employers from using genetic information. Most developed countries have equivalent protections. Australia goes further: private health insurance uses community rating, meaning insurers cannot price based on individual health status at all. GINA does not cover life insurance, disability insurance, or long-term care insurance in the US; in Australia, the use of genetic test results in life insurance for policies below certain thresholds is currently limited, with a broader statutory ban scheduled to take effect on 2026-10-08.

But the deeper point is economic. Banning insurers from using genetic data while allowing individuals to test freely creates a textbook adverse selection problem. Individuals who discover high-risk variants have an incentive to buy more generous coverage; those who discover they are low-risk may reduce coverage or self-insure. The insured population shifts toward higher risk, and premiums rise for everyone still in the pool. This effect worsens as WGS adoption grows. If both parties had the same genetic information, this adverse selection would disappear: individual premiums would vary more (reflecting actual risk), but the average premium would decrease. Laws like GINA are distributive justice decisions: they shield high-risk individuals from bearing the full cost of genetic bad luck, but they achieve this by raising average premiums for everyone else. For most people, symmetric genetic information would make health insurance cheaper, not more expensive. This is really important to understand: insurance lets you trade away variance in exchange for paying a little more than your expected loss. If consumers accept more variance, premiums will drop on average.
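The unravelling mechanism is easier to see with numbers. The sketch below is a toy model, not an actuarial one, and every number in it is invented: two risk groups with fixed expected annual losses, a flat 10% insurer loading, and informed buyers who leave the pool whenever the premium exceeds their own expected loss by more than 30%.

```python
"""Toy model of adverse selection vs. risk-rated pricing (all numbers hypothetical)."""


def pooled_premium(expected_losses, load=0.10):
    """Premium when the insurer cannot tell buyers apart: mean loss plus a flat load."""
    return (1 + load) * sum(expected_losses) / len(expected_losses)


def unravel(expected_losses, max_markup=0.30, load=0.10):
    """Repeat until no informed buyer wants to leave; return (premium, remaining pool).

    A buyer stays only if the pooled premium is at most (1 + max_markup)
    times their own expected loss.
    """
    pool = list(expected_losses)
    while pool:
        premium = pooled_premium(pool, load)
        stayers = [x for x in pool if premium <= (1 + max_markup) * x]
        if len(stayers) == len(pool):
            return premium, pool
        pool = stayers
    return None, []


def risk_rated_premiums(expected_losses, load=0.10):
    """Symmetric information: each buyer pays their own expected loss plus the load."""
    return [(1 + load) * x for x in expected_losses]


# Hypothetical population: 80 low-risk (expected loss 1,000), 20 high-risk (5,000).
population = [1_000] * 80 + [5_000] * 20

premium, insured = unravel(population)
print(f"Asymmetric information: premium {premium:,.0f}, {len(insured)} of 100 insured")

rated = risk_rated_premiums(population)
print(f"Symmetric information: average premium {sum(rated) / len(rated):,.0f}, all 100 insured")
```

With these made-up inputs, the pool unravels until only the 20 high-risk buyers remain, at a premium of about 5,500, while risk-rated pricing keeps all 100 insured at an average of about 1,980. Real markets have many risk levels, partial information, and regulation, so this only illustrates the direction of the effect.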

Data custody: The genuine privacy risk is not sequencing itself but where the data lives afterward. 23andMe's bankruptcy in 2025, with roughly 15 million genotypes in its database, illustrated what happens when a genomics company fails: customer data becomes a business asset in insolvency proceedings. The mitigation is straightforward: download your raw data, verify it, store it on your own hardware, and do not depend on the company for long-term custody. This is what I did, and it is what I recommend. All reputable WGS providers also allow you to delete your data. In many jurisdictions (for example, under the EU's GDPR or California's CCPA) they are legally required to do so, and given the storage costs, they will also oblige out of self-interest.
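"Verify it" can be done with two cheap checks: that every gzip file decompresses end to end (gzip checks a CRC32 trailer on each member, so truncation or corruption surfaces as an error), and that a checksum recorded at download time matches later copies on your own drives. A minimal streaming sketch; whether the provider publishes checksums to compare against is an assumption, so this only computes them:

```python
import gzip
import hashlib
import zlib
from pathlib import Path

CHUNK = 1 << 20  # read 1 MiB at a time so multi-GiB FASTQs never sit in memory


def file_digest(path, algorithm="sha256"):
    """Hash the compressed file exactly as stored on disk."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def gzip_is_intact(path):
    """Decompress everything and discard it; truncation or CRC mismatches raise errors."""
    try:
        with gzip.open(path, "rb") as fh:  # also handles multi-member (BGZF) files
            while fh.read(CHUNK):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False


def verify_directory(directory, pattern="*.gz"):
    """Return {filename: (intact, sha256)} for every matching file."""
    return {p.name: (gzip_is_intact(p), file_digest(p))
            for p in sorted(Path(directory).glob(pattern))}
```

Storing the resulting digests next to the files makes it trivial to re-check each backup copy years later.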

“I’ll wait for cheaper or better technology”

Sequencing costs have dropped from roughly US$3 billion (Human Genome Project, 2003) to under US$400 for consumer 30x WGS. They will continue to fall, and long-read sequencing will eventually become the consumer standard. But your genome does not change. Sequencing now and reanalyzing later against improved variant databases gives you both immediate pharmacogenomic utility and long-term optionality. Clinical situations where you need pharmacogenomic data — surgery, a new prescription, an unexpected adverse drug reaction — arrive without warning. Having the data before you need it is the entire point.

“A genotyping array is good enough”

Consumer genotyping services (23andMe, AncestryDNA, etc.) use SNP arrays that test roughly 600,000 to 2 million pre-selected positions out of 3.1 billion base pairs in the human genome. They are inexpensive and useful for ancestry and common-variant associations, but they have structural limitations that WGS does not share:

Fixed panel: An array only tests positions chosen at chip design time. Novel variants, rare mutations, and anything not on the panel are invisible.
Complex gene regions: Pharmacogenes like CYP2D6 involve deletions, duplications, and hybrid alleles that arrays characterize poorly. WGS captures the full structure.
No reanalysis for new discoveries: When a new pathogenic variant is discovered next year, you can check your WGS data for it. Array data contains only what was on the chip — if the position was not on the panel, you have no data and never will.
No structural variants: Arrays cannot detect inversions, large insertions or deletions, or copy number variants outside pre-designed probes.

Genotyping arrays are a snapshot of known variants at the time the chip was designed. WGS data is far more future-proof.

Preparing and learning about genomics

Before ordering, I spent time understanding genomics concepts and file formats:

Reading: Archibald’s Genomics: A Very Short Introduction (Oxford University Press, 2018) is a very neat overview. I highly recommend it.
File format basics: Familiarized myself with FASTQ (raw reads), BAM (aligned reads), and VCF/gVCF (variant calls). Understanding these formats helps verify data quality when it arrives.
Analysis tools: Reviewed the GATK Best Practices pipeline, SAMtools, and Ensembl VEP. All are FLOSS and well-documented.
Interpretation services: Explored tools like Promethease (US$12, generates SNPedia-based health reports from VCF files) and free alternatives like Impute.me (polygenic risk scores) and ClinVar (clinical variant database). I do not recommend Promethease, as it has been essentially decommissioned and is broken most of the time.
Quality metrics: Learned what to check. Average depth should meet or exceed the ordered coverage (30x), at least 95% of the genome should be covered at 10x or higher, and more than 85% of bases in the FASTQ files should have a quality score of Q30 or above.
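These rules of thumb are easy to encode as a checklist. A sketch in which the metric names and input dict are hypothetical and the thresholds are the ones listed above (not a vendor or clinical specification). It also shows what a Phred score means: Q30 corresponds to a 1-in-1,000 chance that a base call is wrong.

```python
def phred_to_error(q):
    """A Phred quality Q encodes an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Rule-of-thumb thresholds from the checklist above (key names are hypothetical).
THRESHOLDS = {
    "mean_depth": 30.0,     # at least the ordered coverage
    "pct_bases_10x": 95.0,  # % of genome covered at >= 10x
    "pct_q30": 85.0,        # % of sequenced bases with Phred quality >= 30
}


def qc_report(metrics, thresholds=THRESHOLDS):
    """Return {metric: (value, passed)}; a missing metric counts as a failure."""
    report = {}
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        report[name] = (value, value is not None and value >= minimum)
    return report


illustrative = {"mean_depth": 33.0, "pct_bases_10x": 96.5, "pct_q30": 92.0}  # made-up values
for name, (value, ok) in qc_report(illustrative).items():
    print(f"{name:>14}: {value}  {'PASS' if ok else 'FAIL'}")
```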

You do not need to become a bioinformatician, but understanding the data formats and basic QC metrics helps ensure you get what you paid for.

My desiderata and expectations for WGS

Before ordering, I set out a few requirements for data quality and format:

Sequencing depth and technology

30x coverage minimum: Each base pair should be read approximately 30 times. This is the industry standard, sufficient for detecting most variants with high confidence. (60x or 100x would be better for rare variants and mosaicism, but would cost 2-3x more.)
Illumina platform: NovaSeq 6000 or X Plus provides the highest base accuracy (Q40 on X Plus). Short-read sequencing is the established standard for SNP detection.
PCR-free library preparation: Gold standard. Provides more uniform coverage across GC-rich regions, eliminates amplification bias, and improves indel detection. Sensitivity exceeds 99.77% for SNPs at 40x coverage.
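Coverage is simple arithmetic: mean depth ≈ (number of reads × read length) / genome size, the Lander-Waterman estimate. A sketch assuming 150 bp reads and a genome size of about 3.1 Gbp (typical values, not figures confirmed for this product). The result is the raw depth before duplicates, unmapped reads, and low-quality bases are removed, so the usable depth comes out somewhat lower.

```python
GENOME_SIZE_BP = 3.1e9  # approximate haploid human genome size (assumption)
READ_LENGTH_BP = 150    # typical Illumina short-read length (assumption)


def raw_mean_depth(n_reads, read_length=READ_LENGTH_BP, genome_size=GENOME_SIZE_BP):
    """Total sequenced bases spread evenly over the genome."""
    return n_reads * read_length / genome_size


def reads_needed(target_depth, read_length=READ_LENGTH_BP, genome_size=GENOME_SIZE_BP):
    """Reads (counting each mate of a pair separately) needed for a target mean depth."""
    return target_depth * genome_size / read_length


print(f"{reads_needed(30) / 1e6:.0f}M reads for 30x")      # 620M reads for 30x
print(f"{raw_mean_depth(816e6):.1f}x raw from 816M reads")  # 39.5x raw from 816M reads
```

The 816M figure is the read count reported under Quality indicators; the gap between its roughly 39.5x raw estimate and the 34x median depth is the expected effect of filtering and uneven coverage.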

Sequencing.com status: They provide 30x clinical-grade WGS. Their sequencing platform is not stated explicitly, but it is almost certainly Illumina.

Data formats and access

This is the most important point: I wanted raw data.

FASTQ files: Raw sequencing reads with quality scores. Essential for complete reanalysis.
BAM/CRAM files: Reads aligned to reference genome. Should be aligned to GRCh38 (hg38), not the older GRCh37.
gVCF, not just VCF: Standard VCF files only contain positions where variants were detected. This creates ambiguity: was a missing position a reference call or insufficient coverage? gVCF (Genomic VCF) includes every position in the genome with confidence scores for reference calls and explicitly marked no-call regions.
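To make the VCF/gVCF difference concrete: in a gVCF, runs of reference calls are collapsed into "reference blocks" whose INFO field carries an END= coordinate, and positions the caller could not assess carry a no-call genotype (./.) or a very low genotype quality (GQ). The sketch below classifies one position using a few simplified, invented gVCF lines. It assumes a single-sample file, does a linear scan where real tools (bcftools, or pysam with a tabix index) would seek directly, and the GQ cut-off of 20 is an illustrative choice.

```python
def parse_info(info):
    """Turn 'END=1049;DP=31' into {'END': '1049', 'DP': '31'}; bare flags map to True."""
    out = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
        elif item and item != ".":
            out[item] = True
    return out


def classify_position(gvcf_lines, chrom, pos, min_gq=20):
    """Classify a 1-based position as variant, reference, low-confidence, no-call, or absent."""
    for line in gvcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        start = int(fields[1])
        end = int(parse_info(fields[7]).get("END", start + len(fields[3]) - 1))
        if not start <= pos <= end:
            continue
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt = sample.get("GT", "./.").replace("|", "/")
        if "." in gt.split("/"):
            return "no-call"
        alts = [a for a in fields[4].split(",") if a != "<NON_REF>"]
        if alts and gt != "0/0":
            return "variant"
        gq = sample.get("GQ", ".")
        if gq.isdigit() and int(gq) < min_gq:
            return "low-confidence reference"  # e.g. a block with zero coverage
        return "reference"
    return "absent"  # a plain VCF would leave this case ambiguous


example = [
    "##fileformat=VCFv4.2",
    "chr1\t1000\t.\tA\t<NON_REF>\t.\t.\tEND=1049\tGT:DP:GQ\t0/0:31:99",
    "chr1\t1050\t.\tG\tT,<NON_REF>\t512.6\t.\tDP=29\tGT:AD:DP\t0/1:14,15,0:29",
    "chr1\t1051\t.\tC\t<NON_REF>\t.\t.\tEND=1080\tGT:DP:GQ\t0/0:0:0",
    "chr1\t1081\t.\tT\t<NON_REF>\t.\t.\tEND=1090\tGT\t./.",
]
```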

Sequencing.com status: They provide FASTQ and VCF files (their "Genome VCF" is standard VCF, not gVCF). BAM files and the mitochondrial heteroplasmy VCF are available upon request (email support after processing completes). gVCF is a paid add-on, or you can generate it yourself from BAM (see below). Reads are aligned to GRCh38, with the rCRS mitochondrial reference. Lifetime data storage is included (though I, needless to say, do not need that).

Open source compatibility

To my surprise, all major genome analysis tools are FLOSS (GATK, SAMtools, bcftools, Ensembl VEP, etc.) or at least source-available (Expansion Hunter)! Standard file formats (FASTQ, BAM, VCF) ensure compatibility with the entire bioinformatics ecosystem.

Sequencing.com status: Uses standard formats. No proprietary lock-in.

Why I chose Sequencing.com

At the time I ordered, the decisive factor was the combination of price, raw-data access, and clinical-grade logistics. I wanted 30x short-read WGS with FASTQ files, ordinary VCFs, an aligned BAM on request, and enough documentation that I could reproduce or challenge the vendor's interpretation myself. Sequencing.com was pretty much perfect on each front.

Note: Their consumer-facing health reports are more limited than the raw data. I would not buy WGS primarily for glossy app insights. I would buy it to get the sequence files, then analyze them myself with transparent tools and manual follow-up where the result matters.

Sequencing.com automatically (and not too transparently) enrols you in a monthly subscription, but it is easy to cancel.

Timeline

Note: All times and dates are in the AEDT timezone (Australia/Sydney).

2026-01-11: I ordered my kit.
2026-01-12: The kit was shipped.
2026-01-27: I received the kit in Australia. The kit is neatly packaged, and about 15x20x5 cm in size.
2026-02-02: After much hassle, I was able to send my sample back to the United States. I used HS Tariff 0511994070 (ruling) and described the item as a non-hazardous Exempt Human Specimen on both the customs form and the outer packaging itself, to fully comply with applicable regulations. The main issues were researching all of these rules (you’re welcome!) and finding a proper envelope. The envelope shipped by Sequencing is clearly far too small for any kind of international shipment. Note: other companies may well still use an ethanol-based stabilization buffer, which may not be shipped by ordinary mail. But that appears not to be the case for Sequencing, so an ordinary international shipment is fine.
2026-02-14: The sample arrived at Sequencing.
2026-02-23: I was informed that DNA extraction was now underway.
2026-03-05: Sequencing was completed. Read on below.

Post-sequencing steps and experience

On 2026-03-05, I received a flurry of emails informing me that several steps had been completed, and finally, that “Congratulations! Your Genome Has Been Sequenced.” Wow, Sequencing.com was so much faster than expected! I immediately logged into my account and perused some of the insights. Interesting!

One of my key reasons to do WGS was so that I could obtain raw data. By default, Sequencing provides the following six files for download:

*-30x-WGS-Sequencing_com-*.snp-indel.genome.vcf.gz (~184 MiB)
*-30x-WGS-Sequencing_com-*.cnv.vcf.gz (~26 KiB)
*-30x-WGS-Sequencing_com-*.sv.vcf.gz (~900 KiB)
ULTIMATE-COMPATIBILITY-*-30x-WGS-Sequencing_com-*.txt (~15 MiB)
*-30x-WGS-Sequencing_com-*.1.fq.gz (~23 GiB)
*-30x-WGS-Sequencing_com-*.2.fq.gz (~24 GiB)

Each file needs to be separately “unarchived” in order to be downloaded. This is presumably because only a small fraction of customers ever download the raw data, so keeping hundreds of terabytes in an instant-access storage tier would be wasteful. Unarchiving took only about 20 minutes for all files I had to download.

For reasons explained below, I separately requested the BAM file and the mitochondrial heteroplasmy VCF. Neither is provided by default. The *.bam file was provided on 2026-03-07 (~30 GiB).

The mitochondrial heteroplasmy VCF was provided some time later, after I inquired again. The *.mito.vcf.gz file is very small (~18 KiB).

All files are fine to use as-is. You do not need to uncompress them! All coordinate-based files (BAM and VCFs) are aligned to GRCh38, except for ULTIMATE-COMPATIBILITY* and *.mito.vcf.gz (aligned to GRCh37); the FASTQ files are unaligned raw reads.
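Because the files mix builds, it is worth checking a VCF's build before comparing coordinates across files. One quick heuristic (a sketch of my own, not a Sequencing.com feature) reads the ##contig header lines straight from the gzipped file and compares the declared length of chromosome 1: 248,956,422 bp in GRCh38 versus 249,250,621 bp in GRCh37. Files without ##contig lines, or with attributes in an unusual order, simply come back as unknown.

```python
import gzip
import re

# Declared length of chromosome 1 in each assembly.
CHR1_LENGTH_TO_BUILD = {248_956_422: "GRCh38", 249_250_621: "GRCh37"}
# Matches ID=1 or ID=chr1 (not chr10, chr1_KI270706v1_random, ...).
CONTIG_RE = re.compile(r"##contig=<ID=(?:chr)?1,.*?length=(\d+)")


def guess_build(header_lines):
    """Infer the reference build from VCF ##contig meta-information lines."""
    for line in header_lines:
        if not line.startswith("##"):
            break  # reached #CHROM or the data: stop reading
        match = CONTIG_RE.match(line)
        if match:
            return CHR1_LENGTH_TO_BUILD.get(int(match.group(1)), "unknown")
    return "unknown"


def guess_build_from_file(path):
    """Stream only the header of a (b)gzipped VCF; nothing is written to disk."""
    with gzip.open(path, "rt") as fh:
        return guess_build(fh)
```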

Generating gVCF from BAM

gVCF (Genomic VCF) is a file with detailed genetic information: it includes every position in the genome with confidence scores for reference calls and explicitly marked no-call regions. We will see how to use it below.

You can generate a gVCF file yourself using GATK (Genome Analysis Toolkit). This requires downloading a GRCh38 reference genome and takes about 10 hours on a typical desktop.

Get shell script

If a “USER ERROR: Contig […] not present in the sequence dictionary” error appears at the very end, after output.g.vcf.gz* have been written, you can run this validation command to confirm that your output files are nonetheless complete. It should generate a lot of output and take about 5 minutes. If you see “OK” at the end, all is well. (The error happens because Sequencing.com’s BAMs are aligned against a reference that also includes alt/random/unplaced scaffolds. These are not important.)
Get shell script
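For orientation, here is a rough Python sketch of the core GATK step, HaplotypeCaller in GVCF mode (-ERC GVCF). It is not the downloadable script. File names are placeholders, it assumes gatk is on your PATH and that the reference FASTA has its .fai index and .dict dictionary alongside, and the optional -L restriction to the primary chromosomes is my own suggestion: it might sidestep the extra scaffolds behind the contig error, but I have not verified that.

```python
import subprocess

# Primary autosomes plus sex chromosomes, GRCh38 "chr" naming (assumption).
# chrM is left out: mitochondrial calls are better handled by the separate heteroplasmy VCF.
PRIMARY_CONTIGS = [f"chr{c}" for c in list(range(1, 23)) + ["X", "Y"]]


def haplotypecaller_gvcf_cmd(reference, bam, output, contigs=PRIMARY_CONTIGS):
    """Build the argument list for GATK4 HaplotypeCaller in GVCF mode."""
    cmd = ["gatk", "HaplotypeCaller",
           "-R", reference,  # GRCh38 FASTA
           "-I", bam,        # indexed BAM from the provider
           "-O", output,     # e.g. output.g.vcf.gz
           "-ERC", "GVCF"]   # emit reference blocks, not just variant sites
    for contig in contigs or []:
        cmd += ["-L", contig]  # restrict calling to the listed contigs
    return cmd


def run(cmd):
    """Run the command, raising CalledProcessError if GATK exits non-zero."""
    subprocess.run(cmd, check=True)


# Hypothetical file names; expect many hours of runtime on a desktop:
# run(haplotypecaller_gvcf_cmd("GRCh38.fa", "sample.bam", "output.g.vcf.gz"))
```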

Quality indicators

Metric	Value	Assessment
Total reads	816M	Solid for 30x WGS
Mapped	99.43%	Excellent
Properly paired	98.25%	Excellent
Duplicates	22.3M (2.7%)	Low — good library complexity
Singletons	0.01%	Negligible
Median depth	34x	On target for “30x” product
VCF PASS rate	94.7%	Normal
Median variant QUAL	222.4	High

The first five rows are easy to reproduce with samtools flagstat on the BAM; the VCF rows come from bcftools. Overall, Sequencing.com did a solid job. The quality is legitimately good.
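If bcftools is not at hand, the two VCF rows can be approximated in plain Python. This sketch assumes that "PASS rate" means the fraction of records whose FILTER column is exactly PASS, and that "median variant QUAL" is the median of the QUAL column over all records that have one. The exact definitions behind the table may differ.

```python
import gzip
import statistics


def vcf_summary(lines):
    """Return (n_records, pass_fraction, median_qual) for VCF lines."""
    n = n_pass = 0
    quals = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.split("\t", 7)  # only the first columns are needed
        n += 1
        if fields[6] == "PASS":
            n_pass += 1
        if fields[5] != ".":
            quals.append(float(fields[5]))
    pass_fraction = n_pass / n if n else None
    median_qual = statistics.median(quals) if quals else None
    return n, pass_fraction, median_qual


def vcf_summary_from_file(path):
    """Stream a gzipped VCF without decompressing it to disk."""
    with gzip.open(path, "rt") as fh:
        return vcf_summary(fh)
```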

ClinVar annotation: what the “Pathogenic” variants actually are

After generating a gVCF, I annotated all called variants against ClinVar, NCBI's public database of clinically relevant genetic variants, using the GRCh38 ClinVar VCF. The annotation pipeline (available here) matched each variant against ClinVar records and produced a report grouped by clinical significance. Of the millions of variants in a typical WGS dataset, ~47,000 overlapped with ClinVar entries.
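The core matching step is simple. This is a hypothetical sketch, not the linked pipeline: index ClinVar records by (chromosome, position, REF, ALT), look up each called variant, and group the hits by the CLNSIG INFO field (ClinVar's clinical significance; CLNREVSTAT holds the review status behind the star ratings). One practical gotcha is that the ClinVar VCF names chromosomes 1, 2, X, MT without the chr prefix, so names must be normalized before matching. Exact allele matching also assumes both files are normalized in the same way (left-aligned, multiallelic sites split), for example with bcftools norm.

```python
from collections import defaultdict


def norm_chrom(chrom):
    """Map 'chr1'/'1' and 'chrM'/'MT' onto ClinVar-style names."""
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    return "MT" if chrom == "M" else chrom


def info_dict(info):
    return dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)


def index_clinvar(lines):
    """Key ClinVar records by (chrom, pos, ref, alt); one entry per ALT allele."""
    index = {}
    for line in lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        fields = info_dict(info)
        for alt in alts.split(","):
            index[(norm_chrom(chrom), int(pos), ref, alt)] = (
                fields.get("CLNSIG", "not_provided"),
                fields.get("CLNREVSTAT", ""),
            )
    return index


def annotate(sample_lines, clinvar_index):
    """Group a sample's called variants by ClinVar clinical significance."""
    groups = defaultdict(list)
    for line in sample_lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):
            hit = clinvar_index.get((norm_chrom(chrom), int(pos), ref, alt))
            if hit:
                groups[hit[0]].append((chrom, int(pos), ref, alt, hit[1]))
    return dict(groups)
```

Keeping CLNREVSTAT next to each hit makes the later triage easy: a "Pathogenic" label with no_assertion_criteria_provided deserves far less weight than one reviewed by an expert panel.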

The headline numbers looked alarming at first glance: over a dozen variants classified as Pathogenic or Pathogenic/Likely_pathogenic. The listed conditions sounded severe. Having never experienced any of them, I was skeptical.

I used my own knowledge and Claude Code to verify every single one of these hits against the raw sequencing data (BAM and the gVCF I generated myself). The results were sobering:

Category	What it means
Sequencing artifacts (false calls)	The variant does not actually exist in your genome
Real but mislabeled in ClinVar	Population-level risk associations, not Mendelian mutations
Real, carrier only (recessive)	One copy of a recessive variant causes no disease
Benign (e.g. blood group antigen)	Not a disease variant at all

In my case, every “Pathogenic” hit fell into one of these categories. None indicated active disease or required any medical intervention. This is not unusual but, in fact, expected for short-read WGS combined with automated ClinVar annotation. Here’s why.

How false positives happen

The sequencing artifacts fell into two categories:

Paralog cross-mapping. Many human genes have near-identical copies (paralogs) elsewhere in the genome. When a 150-base read comes from one copy, the aligner sometimes places it at the other copy instead. This generates phantom variant calls at positions where the two copies differ. The telltale sign: the variant-supporting reads have degraded mapping quality (MAPQ well below 60), while the reference-supporting reads map uniquely. Genomics: A Very Short Introduction, recommended above, explains the core issue: WGS uses a process called “shotgun sequencing” that repeatedly reads short sequences of DNA and then uses facts and logic (statistics!) to place these sequences at the right position. That process is simply not 100% accurate for paralogs.

A typical example: a gene with a ~90%-identical paralog elsewhere on the same chromosome generates “Pathogenic” variant calls at sites where the two copies differ. The tell is in the mapping quality: at the artifact site, most variant-supporting reads have low or ambiguous MAPQ scores, while reference-supporting reads map uniquely. At the worst sites, zero variant-supporting reads map uniquely — every single one is ambiguously placed. Compare to a clean region nearby, where all reads have perfect mapping quality. The contrast is stark and easy to verify in any BAM viewer.
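The mapping-quality comparison can be written as a small check. The sketch takes the MAPQ values of the reads supporting each allele at one site (in practice you would collect these from the BAM, e.g. with pysam's pileup interface; here they are plain lists) and flags the site when the alt-supporting reads are far less often confidently placed than the ref-supporting reads. The MAPQ cut-off of 20 and the 0.5 ratio are illustrative assumptions, not established thresholds. For reference, BWA-MEM caps MAPQ at 60 and reports 0 when a read fits several places equally well.

```python
def fraction_unique(mapqs, min_mapq=20):
    """Share of reads whose MAPQ suggests an unambiguous placement."""
    return sum(q >= min_mapq for q in mapqs) / len(mapqs) if mapqs else 0.0


def looks_like_paralog_artifact(alt_mapqs, ref_mapqs, min_mapq=20, ratio=0.5):
    """Flag a site whose alt reads are much less uniquely mapped than its ref reads.

    Heuristic only. A homozygous-alt site with no ref reads is never flagged.
    """
    alt_unique = fraction_unique(alt_mapqs, min_mapq)
    ref_unique = fraction_unique(ref_mapqs, min_mapq)
    return alt_unique < ratio * ref_unique


# Made-up MAPQ values resembling the pattern described above.
artifact_site = looks_like_paralog_artifact(
    alt_mapqs=[0, 0, 1, 3, 0, 7, 0, 2],  # every alt read ambiguously placed
    ref_mapqs=[60] * 20,
)
clean_site = looks_like_paralog_artifact(
    alt_mapqs=[60] * 14 + [58],
    ref_mapqs=[60] * 15,
)
print(artifact_site, clean_site)  # True False
```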

Homopolymer slippage. Illumina sequencing-by-synthesis has a known weakness: runs of 6+ identical bases (e.g., AAAAAAAA or GGGGGG) cause the polymerase to occasionally slip, inserting or deleting a base. This generates spurious indel calls at ~10% allele fraction. The giveaway: GATK's own internal estimate (MLEAC) may conclude that the true allele count is zero, meaning the variant caller itself does not believe its own call, and the quality score may be far below 1, whereas a real variant would typically score in excess of 100.
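These telltale signs can be combined into a simple filter. A hypothetical sketch: it measures the longest single-base run in the reference sequence around a call and flags indels next to a run of 6 or more bases when the allele fraction is low, the QUAL is tiny, or the MLEAC estimate is zero. The inputs mirror VCF/GATK fields (QUAL, MLEAC, allele depths from AD), but the cut-offs are illustrative choices.

```python
import re


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base, e.g. 'CAAAAT' -> 4."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper())), default=0)


def suspicious_homopolymer_indel(ref, alt, context, qual, mleac, alt_reads, total_reads,
                                 min_run=6, max_af=0.2, min_qual=1.0):
    """Flag an indel beside a long homopolymer when the evidence for it is weak."""
    is_indel = len(ref) != len(alt)
    if not is_indel or longest_homopolymer(context) < min_run:
        return False
    allele_fraction = alt_reads / total_reads if total_reads else 0.0
    return allele_fraction <= max_af or qual < min_qual or mleac == 0
```

A genuine heterozygous indel in the same homopolymer would usually show an allele fraction near 0.5 and a high QUAL, so it passes this check.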

ClinVar mislabeling

Several variants were real (the sequencing was fine) but labeled “Pathogenic” despite being common population-level susceptibility associations. These were all in non-coding regions (intronic or UTR), all had 0 or 1 ClinVar review stars, and were associated with common-disease susceptibility rather than Mendelian disorders. They were originally identified in genome-wide association studies (GWAS) and submitted to ClinVar without clinical validation. “Pathogenic” here is an artifact of loose historical submission standards, not a clinical diagnosis.

Carrier status: real but not disease

Some variants were genuine, well-supported heterozygous calls in genes associated with autosomal recessive conditions. For recessive diseases, you need two broken copies (one from each parent) to be affected. Carrying one copy makes you a carrier, which may be relevant for family planning, but does not cause disease. In my case, the carrier variants were for mild and almost whimsical conditions, but that is not guaranteed — carrier status for severe recessive diseases like cystic fibrosis or sickle cell disease is common and worth knowing about.
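The inheritance arithmetic behind carrier status is a Punnett square. For an autosomal recessive condition, if both parents carry one disease copy, each child independently has a 1 in 4 chance of being affected, 1 in 2 of being a carrier, and 1 in 4 of inheriting neither copy; if only one parent is a carrier, no child is affected. A short enumeration, where N is a working allele and d a disease allele:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_risk(parent1, parent2):
    """Enumerate the four equally likely allele combinations of two parents."""
    counts = Counter()
    for a, b in product(parent1, parent2):
        n_disease = (a == "d") + (b == "d")  # 0, 1 or 2 disease alleles
        counts[("non-carrier", "carrier", "affected")[n_disease]] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


print(offspring_risk("Nd", "Nd"))  # carrier x carrier
print(offspring_risk("Nd", "NN"))  # carrier x non-carrier
```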

But pharmacogenomics delivered

Separately from the “Pathogenic” hits, the annotation also identified pharmacogenomic variants that are genuinely clinically actionable. These were classified as drug_response in ClinVar (not “Pathogenic”), had 3-star expert-panel review, and were confirmed real by the same verification process, which showed perfect mapping quality, clean allele balance, and no artifacts.

This is the kind of finding that justifies WGS. Pharmacogenomic variants affect how you metabolize specific drugs, and knowing about them before you need the drug can prevent serious adverse reactions. Unlike the “Pathogenic” hits that required manual debunking, these had immediate, unambiguous clinical utility.
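For CYP2D6, a common way to turn a genotype into an actionable phenotype is the activity-score system used in CPIC guidelines: each star allele gets a function value, the two values are summed, and the sum maps to a metabolizer category. The allele values and cut-offs below are my reading of the CPIC consensus and cover only a handful of alleles, so treat them as assumptions and check the current CPIC tables before relying on any of it.

```python
# Function values per star allele (small, simplified subset; assumption, verify against CPIC).
CYP2D6_FUNCTION = {
    "*1": 1.0, "*2": 1.0,                        # normal function
    "*9": 0.5, "*17": 0.5, "*41": 0.5,           # decreased function
    "*10": 0.25,                                 # decreased function, lower value
    "*3": 0.0, "*4": 0.0, "*5": 0.0, "*6": 0.0,  # no function (*5 is a gene deletion)
}


def allele_score(allele):
    """Score one allele; a suffix like 'x2' means duplicated copies (e.g. '*1x2')."""
    base, _, copies = allele.partition("x")
    return CYP2D6_FUNCTION[base] * (int(copies) if copies else 1)


def cyp2d6_phenotype(diplotype):
    """'*1/*4' -> (activity_score, phenotype)."""
    a, b = diplotype.split("/")
    score = allele_score(a) + allele_score(b)
    if score == 0:
        phenotype = "poor metabolizer"
    elif score < 1.25:
        phenotype = "intermediate metabolizer"
    elif score <= 2.25:
        phenotype = "normal metabolizer"
    else:
        phenotype = "ultrarapid metabolizer"
    return score, phenotype
```

Calling the star alleles from WGS is the hard part, because CYP2D6 sits next to the similar CYP2D7 pseudogene and has frequent deletions and duplications; dedicated callers such as Cyrius or Aldy exist for that step. The sketch starts after it.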

False negatives: what WGS cannot see

False positives are easy to catch because you have a called variant to interrogate. False negatives (real variants that the pipeline missed entirely) are silent. Several mechanisms guarantee they exist in any short-read WGS dataset:

Trinucleotide repeat expansions: Huntington's disease, Fragile X syndrome, myotonic dystrophy, and Friedreich's ataxia are all caused by repeat expansions, the largest of which span thousands of bases. A 150 bp read cannot span such expansions. Specialized short-read tools can still infer some repeat sizes from spanning and flanking reads. I ran Expansion Hunter v5.0.0 against 31 disease-associated STR loci in my BAM; all calls fell within normal ranges. Expansion Hunter reports two allele sizes per autosomal locus, which you compare against published pathogenic thresholds (e.g., HTT becomes concerning at 36 or more CAG repeats, FMR1 at 55 or more CGG repeats). That is reassuring for those catalogued loci, but it is not equivalent to a long-read genome or a clinical repeat-expansion assay.
Paralog masking: The same mechanism that creates false positives can hide real variants. If a true variant in a paralogous region causes reads to align ambiguously, the variant caller may not accumulate enough confident evidence to make a call.
Large structural variants: Inversions, complex rearrangements, and insertions longer than the combined length of a read pair (~300 bp) are poorly detected by short reads.
Coverage gaps: Even at 30x mean coverage, random fluctuations and GC bias mean some regions fall below 5x. On chromosome 1 alone, about 8% of positions had fewer than 5 reads in my data. A heterozygous variant in such a region has a significant chance of being missed.
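It helps to compare this with what pure sampling noise would predict. Under the idealized Poisson model of coverage, the chance that a position gets fewer than 5 reads at a mean depth of 34x is vanishingly small, which suggests that the ~8% figure on chromosome 1 comes mostly from systematic effects (GC bias, repeats, gaps in the reference) rather than random fluctuation. The second function estimates how often a heterozygous site shows at least 3 alt reads at a given depth; the 3-read minimum is an assumed, illustrative requirement, not how GATK actually decides.

```python
from math import comb, exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def het_detection_prob(depth, min_alt_reads=3):
    """P(at least min_alt_reads alt reads) when each read shows the alt allele with p = 0.5."""
    return sum(comb(depth, i) for i in range(min_alt_reads, depth + 1)) / 2**depth


print(f"P(depth < 5 | mean 34x) = {poisson_cdf(4, 34):.1e}")
for d in (4, 8, 15, 30):
    print(f"depth {d:>2}: P(het seen with >= 3 alt reads) = {het_detection_prob(d):.3f}")
```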

A clean WGS report does not mean “no pathogenic variants exist.” It means “no pathogenic variants were detected in the regions and variant classes that short-read sequencing can reliably access.” Long-read sequencing (PacBio HiFi, Oxford Nanopore) closes most of these gaps, at higher cost and with platform-specific tradeoffs in throughput, error profile, and tooling.

Takeaway

WGS is powerful, but automated annotation without manual verification is unreliable for rare disease variants. If a pipeline tells you that you carry a pathogenic variant, the correct response is not panic but verification. Check the mapping quality. Check the allele balance. Check the sequence context. Check whether ClinVar's “Pathogenic” label actually reflects reviewed clinical evidence or a drive-by GWAS submission from an earlier era.
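"Check the allele balance" has a simple statistical form. At a true heterozygous site, each read is roughly a coin flip between the two alleles, so an exact two-sided binomial test against p = 0.5 shows whether the observed split is plausible. A standard-library sketch; the read counts would come from the AD field of the VCF, and the example counts are made up:

```python
from math import comb


def binom_two_sided_p(alt, depth, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    no more likely than the observed one (small tolerance for float ties)."""
    probs = [comb(depth, k) * p**k * (1 - p) ** (depth - k) for k in range(depth + 1)]
    observed = probs[alt]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


for alt, depth in [(15, 30), (11, 32), (3, 30)]:
    print(f"{alt}/{depth} alt reads: allele fraction {alt / depth:.2f}, "
          f"p = {binom_two_sided_p(alt, depth):.2g}")
```

A very small p-value says the split is unlikely for a clean germline heterozygote. It does not say why: besides artifacts, mosaicism or a local copy-number change can also skew the balance.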

The genuine value of WGS lies in pharmacogenomics (where it works well), carrier screening (where it requires understanding of inheritance patterns), and having the raw data available for reanalysis as databases improve. Anyone selling WGS as a simple health report card is misrepresenting what the technology can and cannot do. Genetic determinism is simply wrong.

A note on geographic ancestry analysis

WGS data can also be used for geographic ancestry inference, typically via principal component analysis (PCA) or model-based clustering (ADMIXTURE). I ran an informal PCA projecting my genome onto the 1000 Genomes reference panel (2,504 individuals across 5 super-populations). The result was entirely unsurprising: I landed squarely in the European cluster, confirming what I already knew.
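Mechanically, projecting a genome onto a reference panel means computing principal-component loadings from the reference individuals, then standardizing the new genome's genotypes the same way and taking dot products with those loadings. A minimal sketch of the projection step only, assuming the loadings and reference allele frequencies were already computed (for example with PLINK or EIGENSOFT). The toy numbers are invented, and the (g − 2p)/√(2p(1−p)) scaling is the EIGENSOFT-style convention, one of several in use. It also ignores the known tendency of projected samples to shrink toward the origin.

```python
from math import sqrt


def standardize(genotype, p):
    """Scale an alt-allele count (0, 1, 2) by the reference-panel frequency p.

    Monomorphic SNPs (p = 0 or 1) must be filtered out beforehand.
    """
    return (genotype - 2 * p) / sqrt(2 * p * (1 - p))


def project(genotypes, freqs, loadings):
    """Project one genome onto precomputed PCs.

    genotypes: alt-allele counts per SNP; freqs: reference alt-allele frequencies;
    loadings: one list of per-SNP weights for each principal component.
    """
    z = [standardize(g, p) for g, p in zip(genotypes, freqs)]
    return [sum(w * x for w, x in zip(pc, z)) for pc in loadings]


# Toy example: 4 SNPs, 2 PCs (all numbers invented).
freqs = [0.5, 0.2, 0.8, 0.5]
loadings = [[0.5, 0.5, -0.5, 0.5],   # PC1
            [0.7, -0.1, 0.1, -0.7]]  # PC2
print(project([2, 0, 2, 1], freqs, loadings))
```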

For someone with known European ancestry, continental-level PCA is trivially confirmatory. The finer-grained breakdowns that consumer services advertise — "42% Northern European, 28% Mediterranean" and so on — require much stronger modeling assumptions. The choice of reference populations, the number of ancestral components (K in ADMIXTURE), and the algorithm used all materially affect the output. These percentages are not biological facts but model-dependent estimates that shift when you change the reference panel or the number of components. The underlying science is real, but the precision implied by consumer reports is not.

Ancestry analysis can be genuinely valuable for individuals with unknown parentage, recent admixture, or complex family histories. For a European who already knows they are European, it is the least interesting thing WGS can do.

Conclusion

I would say that WGS was mildly useful. As someone with reassuringly good genes and no particularly remarkable family history, I found the pharmacogenomic variants to be the most important insight I was able to get. Moreover, the entire process enabled me to learn a lot about human genetics. It was interesting.

I am happy with Sequencing.com and do currently (May 2026) recommend them. I appreciate their support for open standards and their provision of raw data. Moreover, their offerings are hearteningly non-gimmicky.

Changes to this document

2026-05-27 (current version): Blog post was publicly released.