Using Genetic Risk Scores to compare disease risk within families

Summary

Genome-Wide Association Studies (GWAS) are large population genetic studies that Orchid uses to construct Genetic Risk Scores (GRS) for embryo predisposition screening. GWAS capture two types of effects:

Direct genetic effects — the effects where the genome directly influences health outcomes
Indirect effects — environmental effects associated with, but not a consequence of, particular inherited variants

When evaluating the risk reduction achievable through embryo predisposition screening for a specific disease, it is crucial to measure only the direct genetic effects when comparing the relative risk between two embryos. This is because the indirect effects associated with nurture and environment will remain constant regardless of which embryo is selected for implantation.

In this study, we analyze the GRS for six diseases on our embryo screening panel and evaluate to what extent their predictive power can be attributed to direct genetic effects. We find that the disease reduction modeled by these predictors can almost entirely be explained by direct genetic effects, indicating that they accurately represent the reduction in disease achievable via embryo screening.

Introduction

The effect sizes of genetic variants identified in Genome-Wide Association Studies capture both direct genetic effects and indirect effects. These indirect effects might originate from factors such as assortative mating (choosing partners with similar traits), genetic nurture (effects from the shared environment or family), or residual population stratification (differences in allele frequencies due to ancestry rather than their effects on the phenotype)⁵. When Genetic Risk Scores (GRS) are constructed from these summary statistics, the resulting predictors may have lower predictive accuracy within families than between families⁴.

Understanding and quantifying this attenuation is important for Orchid’s embryo report because the amount of risk reduction depends on the within-family prediction accuracy. Prior research has highlighted that the degree of within-family attenuation of a GRS depends on the trait in question¹,²,⁴,⁶. For traits like educational attainment or cognitive ability, the effect of the GRS is diminished when comparing within families. However, for externalizing disorders⁶ and traits like Body Mass Index (BMI) or height, the attenuation is small⁴.

It is important to note that Orchid's embryo report screens only for health conditions, and not traits; however, it remains important to confirm that the GRS for diseases in Orchid's panel do not exhibit a within-family reduction in effectiveness. In this analysis, we show that diseases offered in Orchid’s panel have direct effects comparable to population-level estimates.

Methods

Several previous analyses²,⁴ have estimated the direct effect of a GRS on quantitative traits by regressing the phenotype difference between a pair of siblings in family i on their difference in GRS:

Y_{i1} - Y_{i2} = δ (X_{i1} - X_{i2}) + ε_i

Here, Y_{ij} and X_{ij} are the phenotype and polygenic score of the jth sibling in family i, respectively, and δ is the expected difference in phenotype Y between siblings per standard deviation of polygenic score.
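As a rough illustration of the sibling-difference regression above, the following Python sketch simulates sibling pairs and estimates δ by least squares through the origin (the difference equation has no intercept). The function names, the simulation parameters, and the shared family_effect term are assumptions made for this example and are not taken from the original analysis.

```python
import random


def sibling_difference_delta(pairs):
    """Least-squares slope through the origin for (Y_i1 - Y_i2) on (X_i1 - X_i2).

    pairs: list of ((x1, y1), (x2, y2)) tuples, one tuple per family.
    """
    sxy = 0.0
    sxx = 0.0
    for (x1, y1), (x2, y2) in pairs:
        dx = x1 - x2
        dy = y1 - y2
        sxy += dx * dy
        sxx += dx * dx
    if sxx == 0.0:
        raise ValueError("siblings have identical scores in every family")
    return sxy / sxx


def simulate_pairs(n_families, delta, seed=0):
    """Hypothetical data: a family effect that is correlated with the parental score."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_families):
        mid_parent = rng.gauss(0.0, 0.7)
        # Shared by both siblings, so it cancels in the within-family difference.
        family_effect = 0.3 * mid_parent + rng.gauss(0.0, 1.0)
        sibs = []
        for _ in range(2):
            x = mid_parent + rng.gauss(0.0, 0.7)
            y = delta * x + family_effect + rng.gauss(0.0, 1.0)
            sibs.append((x, y))
        pairs.append((sibs[0], sibs[1]))
    return pairs


pairs = simulate_pairs(20000, delta=0.25, seed=1)
delta_hat = sibling_difference_delta(pairs)
print(f"estimated delta: {delta_hat:.3f} (simulated true value: 0.25)")
```

Because family_effect is identical for both siblings, it drops out of the difference, which is why the estimate in this toy setting is not biased by the shared environment.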

The sibling-difference regression gives an unbiased estimate for δ, but the confidence intervals are large, especially when Y is a disease trait. A more precise method involves using parental genotypes and estimating δ by adding parental GRS as a covariate⁶. However, for most large genetic datasets that can be used as validation, such as the UK Biobank, parental genotypes are available for only a fraction of samples. To overcome this, we impute parent genotypes using the snipar workflow³:

Imputation of parent genotypes for sibling pairs without parents in the UK Biobank: missing parental genotypes are imputed from sibling pairs using inferred identity-by-descent segments and Mendelian inheritance rules.
Imputation of parent-offspring pairs: for samples in the UK Biobank with one or both parents available, the missing parents’ genotypes are imputed as the conditional expectation given the observed offspring and parent genotypes based on Mendelian inheritance.
Analysis by linear mixed models: using a linear mixed-effects model (LMM), we estimate δ from a regression that includes the parents' imputed GRS as a covariate, with a random intercept r_i for family i:

Y_{ij} = δ X_{ij} + α (X_{p(i)} + X_{m(i)}) + r_i + ε_{ij}

Here, X_{p(i)} and X_{m(i)} are the imputed paternal and maternal GRS for family i, and α is the coefficient on the combined parental score.
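To show why adding the parental GRS isolates δ, here is a minimal Python sketch under simplifying assumptions: parental scores are simulated directly rather than imputed, there is one offspring per family, and ordinary least squares is used in place of the mixed model (so the random intercept is omitted). All names and parameter values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of y ~ intercept + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two_predictors(x1, x2, y):
    """Slopes (b1, b2) of y ~ intercept + b1*x1 + b2*x2, from centered 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


rng = random.Random(7)
delta_true, alpha_true = 0.20, 0.10  # hypothetical direct and indirect effects
offspring, parental_sum, phenotype = [], [], []
for _ in range(20000):
    xp, xm = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Offspring score: mid-parent value plus random Mendelian segregation.
    x = 0.5 * (xp + xm) + rng.gauss(0.0, 0.5 ** 0.5)
    y = delta_true * x + alpha_true * (xp + xm) + rng.gauss(0.0, 1.0)
    offspring.append(x)
    parental_sum.append(xp + xm)
    phenotype.append(y)

naive = ols_slope(offspring, phenotype)  # absorbs part of the parental term
delta_hat, alpha_hat = ols_two_predictors(offspring, parental_sum, phenotype)
print(f"naive slope: {naive:.3f}, adjusted delta: {delta_hat:.3f}, alpha: {alpha_hat:.3f}")
```

In this simulation the unadjusted slope is inflated by the indirect effect, while the coefficient on the offspring score in the adjusted regression stays close to the simulated δ.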

Data

GRS for seven conditions (six diseases from Orchid's panel, plus Major Depressive Disorder*) were generated using PRScs software⁷ with summary statistics from external meta-analyses. We restricted our analyses to samples with self-reported White British ancestry. To extract the sibling and parent-offspring pairs, we used the UK Biobank relatedness resource and followed its suggested guidelines to identify full siblings and parent-offspring pairs⁸. For the phenotypes, we used a mix of self-reported conditions and ICD-10 codes (see Supplementary Table A).

Nuclear Family Type	Number of Families (Total Samples)
Sibling Pair	18,826 (37,652)
Sibship With 3+ Siblings	1,089 (3,357)
Offspring With Parent(s) in UK Biobank	4,964 (5,185)
Total	24,879 (46,194)


* Two mental health disorders on Orchid's panel, Schizophrenia and Bipolar disorder, are sparsely represented in UK Biobank data and are not analyzed below. Disorders governed primarily by oligogenic effects (Type 1 diabetes, Celiac disease, and Alzheimer's disease), which are not captured by an LMM, were not included.

Results

We ran two regressions using each generated GRS. The first was a linear regression estimating the population effect size β, using samples pruned so that only one individual per family ID remained. The second estimated the direct effect δ̂ using a linear mixed model (fitted with lme4) with a random intercept for family ID.

A large value of β indicates a large phenotype difference (difference in observed disease) per standard deviation of GRS between individuals from different families (when the dataset contains only one individual per family). Similarly, a large value of δ̂ indicates a large phenotype difference per standard deviation of GRS between individuals within a family.
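The pruning step behind the population estimate β can be sketched as follows. This toy Python example keeps the first record seen for each family ID and then fits a simple regression slope on the remaining unrelated samples. The record layout, the "keep the first record" rule, and the continuous toy phenotype are assumptions for illustration only; the source does not state how the retained individual was chosen.

```python
def prune_one_per_family(records):
    """Keep the first record seen for each family ID.

    records: list of (sample_id, family_id, grs, phenotype) tuples.
    """
    kept = {}
    for record in records:
        family_id = record[1]
        if family_id not in kept:
            kept[family_id] = record
    return list(kept.values())


def ols_slope(xs, ys):
    """Slope of ys ~ intercept + b*xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


records = [
    ("s1", "F1", 0.0, 0.0),
    ("s2", "F1", 5.0, 9.0),  # sibling of s1, dropped by pruning
    ("s3", "F2", 1.0, 2.0),
    ("s4", "F3", 2.0, 4.0),
]
unrelated = prune_one_per_family(records)
beta = ols_slope([r[2] for r in unrelated], [r[3] for r in unrelated])
print([r[0] for r in unrelated], beta)
```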

Thus the ratio of the direct effect to the population effect size (δ̂/β) measures the attenuation when applying a GRS to sibling pairs. This metric is key to understanding whether the GRS is effective in the context of embryo prioritization; a ratio of 0 implies that effects disappear when comparing within a family, while a ratio of 1 implies no measurable loss of effectiveness. The results for the six diseases on Orchid's panel, plus Major Depressive Disorder, are shown below:

Disease	Cases / Controls	δ̂	β	Ratio
Coronary Artery Disease	2,003 / 38,656	0.13 (0.11, 0.16)	0.14 (0.12, 0.16)	0.97
Prostate Cancer	1,182 / 15,841	0.22 (0.18, 0.26)	0.23 (0.20, 0.25)	0.98
Type 2 Diabetes	2,719 / 37,940	0.15 (0.13, 0.17)	0.14 (0.13, 0.16)	1.04
Breast Cancer	1,881 / 21,755	0.42 (0.37, 0.47)	0.42 (0.34, 0.50)	0.99
Major Depressive Disorder*	624 / 6,485	0.26 (0.11, 0.41)	0.28 (0.18, 0.38)	0.92
Class III Obesity (BMI > 40)	728 / 39,931	0.16 (0.11, 0.21)	0.18 (0.14, 0.22)	0.93
Atrial Fibrillation	2,004 / 38,655	0.26 (0.11, 0.41)	0.28 (0.18, 0.38)	0.93


*The major depressive disorder phenotype was restricted to the subset of the UK Biobank participants who took the Mental Health Survey.
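As a small worked example of how the Ratio column can be read, the sketch below converts a ratio δ̂/β into a percentage reduction in within-family effect size. The helper names are hypothetical. Note that dividing the rounded δ̂ and β values printed in the table may not reproduce the Ratio column exactly, presumably because the ratios were computed before rounding.

```python
def attenuation_ratio(delta_hat, beta):
    """Ratio of the within-family (direct) effect to the population effect."""
    if beta == 0:
        raise ValueError("population effect size must be non-zero")
    return delta_hat / beta


def percent_reduction(ratio):
    """Percentage loss of effect size within families; 0 means no attenuation."""
    return (1.0 - ratio) * 100.0


# A ratio of 0.92, as reported for Major Depressive Disorder, is an 8% reduction.
mdd_reduction = percent_reduction(0.92)
print(round(mdd_reduction, 1))
```

A ratio above 1, such as the 1.04 reported for Type 2 Diabetes, gives a negative reduction, meaning the within-family estimate was slightly larger than the population estimate.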

Conclusions

Our goal was to investigate whether the genetic effects captured in population-level GWAS can be effectively used to differentiate between siblings within a family for the diseases evaluated on Orchid’s GRS panels. This is an important question in the context of embryo screening, where reduced GRS effect sizes within a family would lead to lower efficacy when prioritizing embryos by disease risk.

We evaluated all the diseases on Orchid’s panel for which large numbers of sibling pairs existed in the UK Biobank (which excluded Schizophrenia and Bipolar disorder). We did not evaluate the diseases on Orchid’s panel primarily governed by oligogenic effects, which are not captured by a linear mixed model (Type 1 diabetes, Alzheimer’s disease, and Celiac disease). We additionally evaluated Major Depressive Disorder as reported from the UK Biobank Mental Health Survey.

We found that, unlike external findings for educational attainment, the direct-effect estimate δ̂ closely matched the between-family effect size β for every condition evaluated, with the largest reduction measured at 8% for Major Depressive Disorder. These results suggest that the evaluated GRS on Orchid’s panel, developed on population-level datasets, retain their effectiveness when comparing polygenic risk between sibling embryos during embryo screening.

Citations


1. Tubbs, J. D., & Sham, P. C. (2023). Preliminary Evidence for Genetic Nurture in Depression and Neuroticism Through Polygenic Scores. JAMA Psychiatry, 80(8), 832-841. DOI: 10.1001/jamapsychiatry.2023.1544. PMID: 37285136, PMCID: PMC10248817.

2. Lello, L., Raben, T. G., & Hsu, S. D. H. (2020). Sibling validation of polygenic risk scores and complex trait prediction. Sci Rep, 10, 13190. DOI: 10.1038/s41598-020-69927-7.

3. Young, A. I., Nehzati, S. M., Benonisdottir, S., et al. (2022). Mendelian imputation of parental genotypes improves estimates of direct genetic effects. Nat Genet, 54, 897-905. DOI: 10.1038/s41588-022-01085-0.

4. Selzam, S., Ritchie, S. J., Pingault, J. B., Reynolds, C. A., O'Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. Am J Hum Genet, 105(2), 351-363. DOI: 10.1016/j.ajhg.2019.06.006. Epub 2019 Jul 11. PMID: 31303263, PMCID: PMC6698881.

5. Young, A. I., Benonisdottir, S., Przeworski, M., & Kong, A. (2019). Deconstructing the sources of genotype-phenotype associations in humans. Science, 365(6460), 1396-1400. DOI: 10.1126/science.aax3710. PMID: 31604265, PMCID: PMC6894903.

6. Tanksley, P. T., et al. (2023). Do polygenic indices capture "direct" effects on child externalizing behavior? Within-family analyses in two longitudinal birth cohorts. medRxiv. DOI: 10.1101/2023.05.31.23290802.

7. Ge, T., Chen, C. Y., Ni, Y., et al. (2019). Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun, 10, 1776. DOI: 10.1038/s41467-019-09718-5.

8. Bycroft, C., Freeman, C., Petkova, D., et al. (2018). The UK Biobank resource with deep phenotyping and genomic data. Nature, 562, 203-209. DOI: 10.1038/s41586-018-0579-z.

Supplement


Supplementary Table A: How each disease case is defined when evaluating genetic risk scores in the UK Biobank

Phenotype	ICD-10 Codes	Self-Report Codes	Cases in UK Biobank (White British)
Prostate cancer	C61, D075	1044	13,806
Type 2 diabetes	E11.1-9	1223	30,507
Coronary artery disease	I210-4, I219, I220, I221, I228, I232, I233, I235, I236, I238, I249, I252	1075	22,451
Breast cancer	C50.0-9, D05.0, D05.9	1002	18,588
Atrial fibrillation	I48.0-4, I48.9	1471, 1483	22,472
Schizophrenia	F20.0-9, F21, F23.0-3, F23.8	1289	1,376
Class III Obesity*	-	-	-
Depression**	-	-	-
Bipolar disorder	F31	1291	1,855

* Class III Obesity was defined as having a BMI (UK Biobank Field 21001) of 40 kg/m² or above.
** Depression cases were Mental Health Survey participants with researcher-derived “probable recurrent depression (severe)”; controls excluded participants with any depression or bipolar disorder.