If \(Y_1\) and \(Y_2\) are two continuous traits, we use the \(\texttt{function_two_continuous.R}\) to approximate conditional phenotype analysis. In the first step, we load the functions.
source('~/Google Drive/conditional analysis/Approxi_Conditiondal_Analysis_R_functions/R_functions/function_two_continuous.R')
Then we read the GWAS summary statistics for two continuous traits. In this example, we approximate the genetic effect in the model of waist circumference (continuous) adjusting for body mass index (continuous).
load('~/Google Drive/conditional analysis/data.twocon.Rda')
dy1 <- data.twocon[[1]]
dy2 <- data.twocon[[2]]
head(dy1)
## Chr SNPID Phys_location phen n0 n1 n2 h2q beta
## 1 7 SNP_A-1780270 78437519 wccm 2049 3685 1796 3.921557e-05 -0.11530186
## 2 15 SNP_A-1780271 31183071 wccm 4232 2691 489 1.660195e-04 -0.32137611
## 3 1 SNP_A-1780272 188074307 wccm 6002 1410 93 7.468645e-06 0.04282617
## 4 20 SNP_A-1780274 33371323 wccm 2837 3447 1143 5.000550e-05 -0.07293575
## 5 12 SNP_A-1780277 73950313 wccm 2940 3541 1023 1.624566e-04 0.19833982
## 6 1 SNP_A-1780278 216957281 wccm 5866 1163 52 6.141465e-04 -1.01504168
## se chisq df model pval hwe97 callrate97 maf97
## 1 0.2378031 0.23509183 1 additive 0.62777375 0.285706 0.997649 0.484918
## 2 0.2770709 1.34538088 1 additive 0.24608726 0.832850 0.983190 0.247310
## 3 0.3868438 0.01225596 1 additive 0.91184900 0.011297 0.993887 0.106091
## 4 0.2467132 0.08739694 1 additive 0.76751286 0.571079 0.982367 0.385725
## 5 0.2465059 0.64738878 1 additive 0.42104781 0.743294 0.994358 0.371380
## 6 0.4255382 5.68971601 1 additive 0.01706462 0.724823 0.938756 0.088154
## ni_final maleHET97 affyremoved p_mishap_faminfo p_diffmiss_gen12
## 1 13 NA 0 0.8570 1.000000
## 2 9 NA 0 0.1630 0.016580
## 3 20 NA 0 0.0000 1.000000
## 4 20 NA 0 0.0518 0.315500
## 5 0 NA 0 0.3310 0.031910
## 6 118 NA 0 0.2320 0.000127
## p_diffmiss_gen13 p_diffmiss_gen23 p_diffmaf_sex inrefgene
## 1 0.337900 0.02210 0.30190 MAGI2
## 2 0.055570 0.38610 0.01015
## 3 0.634000 0.37610 0.86080
## 4 0.000000 0.00000 0.18580 UQCC
## 5 0.001345 0.05896 0.10690
## 6 0.001171 0.00000 0.55670
## refgenes_60kb_from_SNP closestrefgene distancefromclosestrefgene
## 1 MAGI2 MAGI2 483307
## 2 FMN1 FMN1 35694
## 3 FAM5C 259112
## 4 FAM83C;EIF6;MMP24;UQCC UQCC 17527
## 5 CAPS2 CAPS2 6962
## 6 TGFB2 275688
head(dy2)
## Chr SNPID Phys_location phen n0 n1 n2 h2q beta
## 1 7 SNP_A-1780270 78437519 bmi 2049 3685 1796 1.072117e-04 -0.07051900
## 2 15 SNP_A-1780271 31183071 bmi 4232 2691 489 5.432837e-05 -0.11418372
## 3 1 SNP_A-1780272 188074307 bmi 6002 1410 93 2.285974e-04 -0.17315060
## 4 20 SNP_A-1780274 33371323 bmi 2837 3447 1143 6.177918e-05 -0.03718359
## 5 12 SNP_A-1780277 73950313 bmi 2940 3541 1023 1.960031e-04 0.07441231
## 6 1 SNP_A-1780278 216957281 bmi 5866 1163 52 9.736089e-04 -0.44874738
## se chisq df model pval hwe97 callrate97 maf97
## 1 0.09139220 0.5953798 1 additive 0.440346302 0.285706 0.997649 0.484918
## 2 0.10628907 1.1540674 1 additive 0.282699315 0.832850 0.983190 0.247310
## 3 0.14861291 1.3574847 1 additive 0.243973887 0.011297 0.993887 0.106091
## 4 0.09494515 0.1533759 1 additive 0.695329916 0.571079 0.982367 0.385725
## 5 0.09464301 0.6181764 1 additive 0.431725751 0.743294 0.994358 0.371380
## 6 0.16311932 7.5682078 1 additive 0.005940683 0.724823 0.938756 0.088154
## ni_final maleHET97 affyremoved p_mishap_faminfo p_diffmiss_gen12
## 1 13 NA 0 0.8570 1.000000
## 2 9 NA 0 0.1630 0.016580
## 3 20 NA 0 0.0000 1.000000
## 4 20 NA 0 0.0518 0.315500
## 5 0 NA 0 0.3310 0.031910
## 6 118 NA 0 0.2320 0.000127
## p_diffmiss_gen13 p_diffmiss_gen23 p_diffmaf_sex inrefgene
## 1 0.337900 0.02210 0.30190 MAGI2
## 2 0.055570 0.38610 0.01015
## 3 0.634000 0.37610 0.86080
## 4 0.000000 0.00000 0.18580 UQCC
## 5 0.001345 0.05896 0.10690
## 6 0.001171 0.00000 0.55670
## refgenes_60kb_from_SNP closestrefgene distancefromclosestrefgene
## 1 MAGI2 MAGI2 483307
## 2 FMN1 FMN1 35694
## 3 FAM5C 259112
## 4 FAM83C;EIF6;MMP24;UQCC UQCC 17527
## 5 CAPS2 CAPS2 6962
## 6 TGFB2 275688
In the next step, we merge two GWAS summary files.
com=merge(dy1,dy2,by="SNPID")
If we use the full individual level data to estimate the \(\hat{\gamma}_m\), which is the coefficient in the model M3 relating \(Y_1\) to \(Y_2\), we read the phenotype file and apply our function.
m3 <- data.twocon[[3]]
head(m3)
## id wccm bmi sex.x dnaage PC1 PC2 PC3
## 1 1 94.26873 26.98309 1 45.19850 0.009163389 0.024464882 0.018903551
## 2 2 94.00709 24.41432 1 57.01628 -0.036078401 -0.001724910 -0.009747600
## 3 3 88.92366 17.83249 0 56.41468 0.040193234 0.009927302 0.017974667
## 4 4 85.34086 24.59096 0 53.54305 -0.001487380 -0.007601491 0.007681352
## 5 5 73.49879 25.65450 0 38.22636 -0.032282700 0.022234002 -0.009921895
## 6 6 97.90892 26.18636 0 58.68427 -0.013011506 -0.013867713 -0.013348689
## PC4 PC5
## 1 -0.033331258 -0.003200357
## 2 -0.002205311 -0.011662428
## 3 0.001656729 0.004957955
## 4 -0.001610905 0.006022545
## 5 -0.008514919 -0.008379371
## 6 -0.001278763 -0.010783724
r1 <- two_con_f(data = m3,y2 = 'wccm',y1 = 'bmi',cor = c('dnaage','sex.x','PC1','PC2','PC3','PC4','PC5'), d = com, N = dim(m3)[1], beta_m_hat='beta.x',sd_beta_m_hat='se.x',eta_hat='beta.y',sd_eta='se.y',maf='maf97.x', gamma_hat=0, r = dim(com)[1], method = 'full')
head(r1[,c(1,(ncol(r1)-2):ncol(r1))])
## SNPID beta_c z pval_c
## 1 SNP_A-1780270 -0.11038117 -0.4635954 0.64314062
## 2 SNP_A-1780271 -0.31391992 -1.1309072 0.25863956
## 3 SNP_A-1780272 0.05538157 0.1428804 0.88644255
## 4 SNP_A-1780274 -0.07029338 -0.2846966 0.77599521
## 5 SNP_A-1780277 0.19324775 0.7829475 0.43403087
## 6 SNP_A-1780278 -0.99779905 -2.3251304 0.02046676
The column ‘beta_c’ is the approximated coefficient of SNP in the adjusted model, ‘z’ is the approximated z value. The approximated standard error of ‘beta_c’ can be calculated as ‘beta_c’ divided by ‘z’. ‘pval_c’ is the approximated p-value.
If we use a subset of individual level data to estimate the \(\hat{\gamma}_m\), we need to define the participants’ id in the subset and apply our function.
subid <- m3[sample(1:dim(m3)[1],100),'id']
r2 <- two_con_f(data = m3,y2 = 'wccm',y1 = 'bmi',cor = c('dnaage','sex.x','PC1','PC2','PC3','PC4','PC5'),id = 'id',subid = subid, d = com, N = 500, beta_m_hat='beta.x',sd_beta_m_hat='se.x',eta_hat='beta.y',sd_eta='se.y',maf='maf97.x', gamma_hat=0, r = dim(com)[1], method = 'subset')
head(r2[,c(1,(ncol(r2)-2):ncol(r2))])
## SNPID beta_c z pval_c
## 1 SNP_A-1780270 -0.13876909 -0.58743768 0.557176576
## 2 SNP_A-1780271 -0.35993678 -1.30729436 0.191717224
## 3 SNP_A-1780272 -0.01442775 -0.03751197 0.970091860
## 4 SNP_A-1780274 -0.08524865 -0.34797514 0.728006187
## 5 SNP_A-1780277 0.22320432 0.91153739 0.362454200
## 6 SNP_A-1780278 -1.18096672 -2.77972235 0.005646767
If we can identify \(\hat{\gamma}_m\) in the literature, we can input the \(\hat{\gamma}_m\) in the function.
r3 <- two_con_f(data = m3,y2 = 'wccm',y1 = 'bmi',cor = c('dnaage','sex.x','PC1','PC2','PC3','PC4','PC5'),id = 'id',subid = subid, d = com, N = 500, beta_m_hat='beta.x',sd_beta_m_hat='se.x',eta_hat='beta.y',sd_eta='se.y',maf='maf97.x', gamma_hat=2, r = dim(com)[1], method = 'literature')
head(r3[,c(1,(ncol(r3)-2):ncol(r3))])
## SNPID beta_c z pval_c
## 1 SNP_A-1780270 0.025766845 0.169168616 0.8657328
## 2 SNP_A-1780271 -0.093223782 -0.522765690 0.6013703
## 3 SNP_A-1780272 0.390185950 1.579147938 0.1149383
## 4 SNP_A-1780274 0.001431873 0.009080314 0.9927587
## 5 SNP_A-1780277 0.049576544 0.313257700 0.7542163
## 6 SNP_A-1780278 -0.119329731 -0.431818147 0.6660609
If we want to estimate the genetic effect in conditional analysis with continuous trait of interest and binary confounder, we use the \(\texttt{function_y1_binary_y2_continuous.R}\) to approximate conditional phenotype analysis. In the first step, we load the functions.
source('~/Google Drive/conditional analysis/Approxi_Conditiondal_Analysis_R_functions/R_functions/function_y1_binary_y2_continuous.R')
Then we read the GWAS summary statistics for \(Y_1\) and \(Y_2\). In this example, we approximate the genetic effect in the model of BMI (continuous) adjusting for smoking (binary).
load('~/Google Drive/conditional analysis/data.y1bi.Rda')
###GWAS for BMI
dy1 <- data.y1bi[[1]]
####GWAS for Smoking
dy2 <- data.y1bi[[2]]
head(dy1)
## Chr SNPID Phys_location phen n0 n1 n2 h2q
## 1 7 SNP_A-1780270 78437519 resbmi 1706 3090 1510 9.810770e-05
## 2 15 SNP_A-1780271 31183071 resbmi 3535 2287 405 2.364683e-05
## 3 1 SNP_A-1780272 188074307 resbmi 5048 1168 66 6.555359e-05
## 4 20 SNP_A-1780274 33371323 resbmi 2367 2890 970 2.350082e-04
## 5 12 SNP_A-1780277 73950313 resbmi 2481 2978 829 1.081686e-04
## 6 1 SNP_A-1780278 216957281 resbmi 5017 940 38 8.989778e-04
## beta se chisq df model pval hwe97 callrate97
## 1 -0.009359197 0.01890409 0.2451126 1 additive 0.6205377 0.285706 0.997649
## 2 -0.012249043 0.02199955 0.3100106 1 additive 0.5776737 0.832850 0.983190
## 3 -0.017580821 0.03126678 0.3161636 1 additive 0.5739224 0.011297 0.993887
## 4 -0.017682415 0.01950118 0.8221694 1 additive 0.3645466 0.571079 0.982367
## 5 0.012924823 0.01963792 0.4331698 1 additive 0.5104376 0.743294 0.994358
## 6 -0.067371246 0.03445329 3.8237343 1 additive 0.0505315 0.724823 0.938756
## maf97 ni_final maleHET97 affyremoved p_mishap_faminfo p_diffmiss_gen12
## 1 0.484918 13 NA 0 0.8570 1.000000
## 2 0.247310 9 NA 0 0.1630 0.016580
## 3 0.106091 20 NA 0 0.0000 1.000000
## 4 0.385725 20 NA 0 0.0518 0.315500
## 5 0.371380 0 NA 0 0.3310 0.031910
## 6 0.088154 118 NA 0 0.2320 0.000127
## p_diffmiss_gen13 p_diffmiss_gen23 p_diffmaf_sex inrefgene
## 1 0.337900 0.02210 0.30190 MAGI2
## 2 0.055570 0.38610 0.01015
## 3 0.634000 0.37610 0.86080
## 4 0.000000 0.00000 0.18580 UQCC
## 5 0.001345 0.05896 0.10690
## 6 0.001171 0.00000 0.55670
## refgenes_60kb_from_SNP closestrefgene distancefromclosestrefgene
## 1 MAGI2 MAGI2 483307
## 2 FMN1 FMN1 35694
## 3 FAM5C 259112
## 4 FAM83C;EIF6;MMP24;UQCC UQCC 17527
## 5 CAPS2 CAPS2 6962
## 6 TGFB2 275688
head(dy2)
## Chr SNPID Phys_location phen n0 n1 n2 nd0 nd1 nd2
## 1 7 SNP_A-1780270 78437519 ever_smk 1706 3090 1510 872 1634 813
## 2 15 SNP_A-1780271 31183071 ever_smk 3535 2287 405 1876 1182 215
## 3 1 SNP_A-1780272 188074307 ever_smk 5048 1168 66 2630 635 35
## 4 20 SNP_A-1780274 33371323 ever_smk 2367 2890 970 1249 1509 512
## 5 12 SNP_A-1780277 73950313 ever_smk 2481 2978 829 1273 1575 466
## 6 1 SNP_A-1780278 216957281 ever_smk 5017 940 38 2618 497 24
## miss.0 miss.1 miss.diff.p beta se chisq df
## 1 0.002338009 0.003303303 0.63770914 0.043576557 0.04156060 1.099365684 1
## 2 0.013360053 0.017117117 0.25966063 -0.026318965 0.04563392 0.332630499 1
## 3 0.004008016 0.009009009 0.01899738 0.087147650 0.07101088 1.506127014 1
## 4 0.012358049 0.018018018 0.08103405 0.016680572 0.04046256 0.169947645 1
## 5 0.006680027 0.004804805 0.40301614 0.079083581 0.04046040 3.820430047 1
## 6 0.046092184 0.057357357 0.04711066 0.002632004 0.07499745 0.001231629 1
## model remark pval hwe97 callrate97 maf97 ni_final maleHET97
## 1 a <NA> 0.29440535 0.285706 0.997649 0.484918 13 NA
## 2 a <NA> 0.56411424 0.832850 0.983190 0.247310 9 NA
## 3 a <NA> 0.21973102 0.011297 0.993887 0.106091 20 NA
## 4 a <NA> 0.68015834 0.571079 0.982367 0.385725 20 NA
## 5 a <NA> 0.05063124 0.743294 0.994358 0.371380 0 NA
## 6 a <NA> 0.97200433 0.724823 0.938756 0.088154 118 NA
## affyremoved p_mishap_faminfo p_diffmiss_gen12 p_diffmiss_gen13
## 1 0 0.8570 1.000000 0.337900
## 2 0 0.1630 0.016580 0.055570
## 3 0 0.0000 1.000000 0.634000
## 4 0 0.0518 0.315500 0.000000
## 5 0 0.3310 0.031910 0.001345
## 6 0 0.2320 0.000127 0.001171
## p_diffmiss_gen23 p_diffmaf_sex inrefgene refgenes_60kb_from_SNP
## 1 0.02210 0.30190 MAGI2 MAGI2
## 2 0.38610 0.01015 FMN1
## 3 0.37610 0.86080
## 4 0.00000 0.18580 UQCC FAM83C;EIF6;MMP24;UQCC
## 5 0.05896 0.10690 CAPS2
## 6 0.00000 0.55670
## closestrefgene distancefromclosestrefgene
## 1 MAGI2 483307
## 2 FMN1 35694
## 3 FAM5C 259112
## 4 UQCC 17527
## 5 CAPS2 6962
## 6 TGFB2 275688
In the next step, we merge two GWAS summary files.
data<-merge(dy1,dy2,by="SNPID")
###calculate the sample size
data$n_m4<-data$n0.x+data$n1.x+data$n2.x
If we use the full individual level data, we read the phenotype file and apply our function.
pheno <- data.y1bi[[3]]
head(pheno)
## id resbmi ever_smk sex.x dnaage PC1 PC2
## 1 1 -0.38320219 0 1 34.17423 0.028634445 -0.0013222994
## 2 2 0.64536193 1 1 54.90689 0.013705735 -0.0008034025
## 3 3 0.57348457 1 1 65.84717 0.051691126 -0.0269350179
## 4 4 1.66220102 0 0 62.05004 -0.005399918 -0.0090277075
## 5 5 1.37534140 0 0 61.71630 0.005033765 -0.0260456895
## 6 6 0.01273226 1 0 36.64420 0.027360057 -0.0048245266
## PC3 PC4 PC5
## 1 0.007022215 0.015245778 -0.018004480
## 2 -0.007935423 -0.003631598 0.005322603
## 3 -0.014745844 -0.018648797 0.013042895
## 4 0.010562221 -0.008870594 0.000790161
## 5 0.004006013 -0.010453307 0.005747289
## 6 -0.012228854 -0.001714101 -0.019467468
r1<-y1_bi_f(d=data,n_m4='n_m4',b_m1='beta.x',se_m1='se.x',eta_hat = 'beta.y',maf = 'maf97.x', N = 500, Ncase = 245,gamma = 0, data = pheno, id = 'id', subid=NULL ,y2='resbmi',y1 = 'ever_smk', cor = NULL,method='full')
head(r1[,c(1,(ncol(r1)-2):ncol(r1))])
## SNPID se_c z pval_c
## 1 SNP_A-1780270 0.01890818 -0.3497003 0.72657532
## 2 SNP_A-1780271 0.02200255 -0.6321892 0.52728648
## 3 SNP_A-1780272 0.03127566 -0.3864650 0.69916538
## 4 SNP_A-1780274 0.01950433 -0.8526712 0.39387449
## 5 SNP_A-1780277 0.01964731 0.9124115 0.36158711
## 6 SNP_A-1780278 0.03446712 -1.9498365 0.05124219
If we use a subset of individual level data to approximate the conditional analysis, we need to define the participants’ id in the subset and apply our function.
subid <- pheno[sample(1:dim(pheno)[1],100),'id']
r2<-y1_bi_f(d=data,n_m4='n_m4',b_m1='beta.x',se_m1='se.x',eta_hat = 'beta.y',maf = 'maf97.x', N = 500, Ncase = 245,gamma = 0, data = pheno, id = 'id', subid=subid ,y2='resbmi',y1 = 'ever_smk', cor = NULL,method='subset')
head(r2[,c(1,(ncol(r2)-2):ncol(r2))])
## SNPID se_c z pval_c
## 1 SNP_A-1780270 0.01890819 -0.3805767 0.70353022
## 2 SNP_A-1780271 0.02200256 -0.6161680 0.53780621
## 3 SNP_A-1780272 0.03127567 -0.4238106 0.67171844
## 4 SNP_A-1780274 0.01950434 -0.8641270 0.38755141
## 5 SNP_A-1780277 0.01964732 0.8584533 0.39067495
## 6 SNP_A-1780278 0.03446713 -1.9508585 0.05112043
If we can identify \(\hat{\gamma}\) in the literature, we can input the \(\hat{\gamma}\) in the option \(`\texttt{gamma}'\) in the function.
If we want to estimate the genetic effect in conditional analysis with binary trait of interest and continuous confounder, we use the \(\texttt{function_y1_continuous_y2_binary.R}\) to approximate conditional phenotype analysis. In the first step, we load the functions.
source('~/Google Drive/conditional analysis/Approxi_Conditiondal_Analysis_R_functions/R_functions/function_y1_continuous_y2_binary.R')
Then we read the GWAS summary statistics for \(Y_1\) and \(Y_2\). In this example, we approximate the genetic effect in the model of atrial fibrillation (binary) adjusting for height (continuous).
load('~/Google Drive/conditional analysis/data.y2bi.Rda')
###Gwas for AF
m1 <- data.y2bi[[1]]
###GWAS for height
m2 <- data.y2bi[[2]]
head(m1)
## Chr SNPID Phys_location phen N iMAF variMAF Nd iMAFd
## 1 1 1:752566:G:A 752566 prevAF 8466 0.8310948 0.1314984 319 0.8590298
## 2 1 1:752721:A:G 752721 prevAF 8466 0.8206172 0.1293733 319 0.8486019
## 3 1 1:753405:C:A 753405 prevAF 8466 0.8611797 0.1099850 319 0.8775627
## 4 1 1:754503:G:A 754503 prevAF 8466 0.8214673 0.1262385 319 0.8486881
## 5 1 1:754964:C:T 754964 prevAF 8466 0.8222797 0.1251631 319 0.8490846
## 6 1 1:755775:A:G 755775 prevAF 8466 0.8337898 0.1224075 319 0.8588840
## variMAFd beta se chisq df model remark pval effect
## 1 0.11401448 0.3121765 0.1307839 5.697600 1 a NA 0.01698812 A
## 2 0.11524523 0.3188471 0.1374107 5.384236 1 a NA 0.02031948 G
## 3 0.09866396 0.2423083 0.1457886 2.762421 1 a NA 0.09650213 A
## 4 0.11316228 0.3183859 0.1397901 5.187462 1 a NA 0.02275042 A
## 5 0.11250209 0.3168127 0.1406186 5.075976 1 a NA 0.02425961 T
## 6 0.10818897 0.3053463 0.1370439 4.964387 1 a NA 0.02587448 G
## noneffect obs_eaf ratio input_snp iwgadiff iwgafold inrefgene
## 1 G 0.83099 0.99802 Y NA NA
## 2 A 0.82046 0.95044 N NA NA
## 3 C 0.86116 0.91977 N NA NA FAM87B
## 4 G 0.82129 0.92916 N NA NA FAM87B
## 5 C 0.82212 0.92330 N NA NA FAM87B
## 6 A 0.83375 0.93181 N NA NA
## refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## distancefromclosestrefgene
## 1 184
## 2 29
## 3 0
## 4 0
## 5 0
## 6 561
head(m2)
## Chr SNPID Phys_location phen N iMAF variMAF h2q
## 1 1 1:752566:G:A 752566 hgtcm 8168 0.8309865 0.1315220 0.0002693050
## 2 1 1:752721:A:G 752721 hgtcm 8168 0.8203603 0.1293845 0.0002655501
## 3 1 1:753405:C:A 753405 hgtcm 8168 0.8610646 0.1100020 0.0002268764
## 4 1 1:754503:G:A 754503 hgtcm 8168 0.8212104 0.1262378 0.0002590373
## 5 1 1:754964:C:T 754964 hgtcm 8168 0.8220255 0.1251589 0.0002598921
## 6 1 1:755775:A:G 755775 hgtcm 8168 0.8336904 0.1224042 0.0002636075
## beta se chisq df model pval effect noneffect obs_eaf
## 1 0.2689257 0.1366800 3.871279 1 additive 0.04911907 A G 0.83099
## 2 0.2618523 0.1373180 3.636286 1 additive 0.05653292 G A 0.82046
## 3 0.3125504 0.1519281 4.232181 1 additive 0.03966443 A C 0.86116
## 4 0.2642753 0.1391384 3.607604 1 additive 0.05751593 A G 0.82129
## 5 0.2672485 0.1398220 3.653252 1 additive 0.05595983 T C 0.82212
## 6 0.2840876 0.1422330 3.989363 1 additive 0.04578836 G A 0.83375
## ratio input_snp iwgadiff iwgafold inrefgene
## 1 0.99802 Y NA NA
## 2 0.95044 N NA NA
## 3 0.91977 N NA NA FAM87B
## 4 0.92916 N NA NA FAM87B
## 5 0.92330 N NA NA FAM87B
## 6 0.93181 N NA NA
## refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## distancefromclosestrefgene
## 1 184
## 2 29
## 3 0
## 4 0
## 5 0
## 6 561
In the next step, we merge two GWAS summary files.
com<-merge(m1,m2,by="SNPID")
If we use the full individual level data, we read the phenotype file and apply our function.
m3 <- data.y2bi[[3]]
head(m3)
## id hgtcm prevAF sex.x dnaage
## 1 1 148.3785 0 1 31.17778
## 2 2 163.2644 0 0 58.27673
## 3 3 169.3237 0 0 74.85126
## 4 4 160.4307 0 1 54.64849
## 5 5 153.8345 1 1 31.08670
## 6 6 166.5532 0 0 60.35989
r1 <- y2_bi_f(d=com,N2=500,beta_m_hat= 'beta.x',sd_beta_m_hat = 'se.x',eta_hat = 'beta.y',sd_eta= 'se.y', maf= 'iMAF.x',N = 500,Ncase = 33,gamma_hat = 0,sd_gamma_hat = 0,mean1 =0,mean0 = 0,data =m3,y2 = 'prevAF',y1 = 'hgtcm',cor =c('dnaage','sex.x'), method='full')
head(r1[,c(1,(ncol(r1)-2):ncol(r1))])
## SNPID beta_c z pval_c
## 1 1:1000156:C:T 0.03870076 0.3975008 0.6911688
## 2 1:1001177:G:C 0.03749607 0.3882467 0.6979997
## 3 1:1002932:C:G 0.04620832 0.4649649 0.6421602
## 4 1:1003629:C:T 0.03592086 0.3765090 0.7066992
## 5 1:1046717:G:C 0.11656667 0.8546671 0.3931472
## 6 1:1046861:G:A 0.11656163 0.8549058 0.3930152
If we use a subset of individual level data to approximate the conditional analysis, we need to define the participants’ id in the subset and apply our function.
subid <- m3[sample(1:dim(m3)[1],100),'id']
t2 <- y2_bi_f(d=com,N2=500,beta_m_hat= 'beta.x',sd_beta_m_hat = 'se.x',eta_hat = 'beta.y',sd_eta= 'se.y', maf= 'iMAF.x',N = 500,Ncase = 33,gamma_hat = 0,sd_gamma_hat = 0,mean1 =0,mean0 = 0,data =m3,y2 = 'prevAF',y1 = 'hgtcm',cor =c('dnaage','sex.x'), method='subset',id='id',subid=subid)
head(r2[,c(1,(ncol(r2)-2):ncol(r2))])
## SNPID se_c z pval_c
## 1 SNP_A-1780270 0.01890819 -0.3805767 0.70353022
## 2 SNP_A-1780271 0.02200256 -0.6161680 0.53780621
## 3 SNP_A-1780272 0.03127567 -0.4238106 0.67171844
## 4 SNP_A-1780274 0.01950434 -0.8641270 0.38755141
## 5 SNP_A-1780277 0.01964732 0.8584533 0.39067495
## 6 SNP_A-1780278 0.03446713 -1.9508585 0.05112043
If we can identify \(\hat{\gamma}_m\), \(sd(\hat{\gamma}_m)\) (i.e., the beta coefficient and its standard error from logistic regressions M3), Mean(Height|AF=1), and Mean(Height|AF=0) in the literature, we can specify the values for ‘\(\texttt{gamma_hat}\)’, ‘\(\texttt{sd_gamma_hat}\)’, ‘\(\texttt{mean1}\)’, and ‘\(\texttt{mean0}\)’, respectively in the function.
If we want to estimate the genetic effect in conditional analysis with binary trait of interest and binary confounder, we use the \(\texttt{function_y1_y2_binary.R}\) to approximate conditional phenotype analysis. In the first step, we load the functions.
source('~/Google Drive/conditional analysis/Approxi_Conditiondal_Analysis_R_functions/R_functions/function_y1_y2_binary.R')
Then we read the GWAS summary statistics for trait of interest and confonders. In this example, we approximate the genetic effect in the model of atrial fibrillation (binary) adjusting for both myocardial infarction (binary) and heart failure (binary).
load('~/Google Drive/conditional analysis/data.twobi.Rda')
###Gwas for AF
m1 <- data.twobi[[1]]
###GWAS for myocardial infarction
m2 <- data.twobi[[2]]
###GWAS for heart failure
m22 <- data.twobi[[3]]
###rename column
names(m1)<-c("Chr","SNPID","Phys_location","phen","N","iMAF","variMAF","Nd","iMAFd","variMAFd","beta","se","chisq","df","model","remark","pval","effect","noneffect","obs_eaf","ratio","input_snp","iwgadiff","iwgafold","inrefgene","refgenes_60kb_from_SNP","closestrefgene","distancefromclosestrefgene")
names(m2)<-c("Chr","SNPID","Phys_location","phen","N","iMAF","variMAF","Nd","iMAFd","variMAFd","beta","se","chisq","df","model","remark","pval","effect","noneffect","obs_eaf","ratio","input_snp","iwgadiff","iwgafold","inrefgene","refgenes_60kb_from_SNP","closestrefgene","distancefromclosestrefgene")
names(m22)<-c("Chr","SNPID","Phys_location","phen","N","iMAF","variMAF","Nd","iMAFd","variMAFd","beta","se","chisq","df","model","remark","pval","effect","noneffect","obs_eaf","ratio","input_snp","iwgadiff","iwgafold","inrefgene","refgenes_60kb_from_SNP","closestrefgene","distancefromclosestrefgene")
head(m1)
## Chr SNPID Phys_location phen N iMAF variMAF Nd iMAFd
## 1 1 1:752566:G:A 752566 prevAF 8466 0.8310948 0.1314984 319 0.8590298
## 2 1 1:752721:A:G 752721 prevAF 8466 0.8206172 0.1293733 319 0.8486019
## 3 1 1:753405:C:A 753405 prevAF 8466 0.8611797 0.1099850 319 0.8775627
## 4 1 1:754503:G:A 754503 prevAF 8466 0.8214673 0.1262385 319 0.8486881
## 5 1 1:754964:C:T 754964 prevAF 8466 0.8222797 0.1251631 319 0.8490846
## 6 1 1:755775:A:G 755775 prevAF 8466 0.8337898 0.1224075 319 0.8588840
## variMAFd beta se chisq df model remark pval effect
## 1 0.11401448 0.3121765 0.1307839 5.697600 1 a NA 0.01698812 A
## 2 0.11524523 0.3188471 0.1374107 5.384236 1 a NA 0.02031948 G
## 3 0.09866396 0.2423083 0.1457886 2.762421 1 a NA 0.09650213 A
## 4 0.11316228 0.3183859 0.1397901 5.187462 1 a NA 0.02275042 A
## 5 0.11250209 0.3168127 0.1406186 5.075976 1 a NA 0.02425961 T
## 6 0.10818897 0.3053463 0.1370439 4.964387 1 a NA 0.02587448 G
## noneffect obs_eaf ratio input_snp iwgadiff iwgafold inrefgene
## 1 G 0.83099 0.99802 Y NA NA
## 2 A 0.82046 0.95044 N NA NA
## 3 C 0.86116 0.91977 N NA NA FAM87B
## 4 G 0.82129 0.92916 N NA NA FAM87B
## 5 C 0.82212 0.92330 N NA NA FAM87B
## 6 A 0.83375 0.93181 N NA NA
## refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## distancefromclosestrefgene
## 1 184
## 2 29
## 3 0
## 4 0
## 5 0
## 6 561
head(m2)
## Chr SNPID Phys_location phen N iMAF variMAF Nd iMAFd
## 1 1 1:752566:G:A 752566 prevMI 8168 0.8309865 0.1315220 255 0.8590098
## 2 1 1:752721:A:G 752721 prevMI 8168 0.8203603 0.1293845 255 0.8480725
## 3 1 1:753405:C:A 753405 prevMI 8168 0.8610646 0.1100020 255 0.8870510
## 4 1 1:754503:G:A 754503 prevMI 8168 0.8212104 0.1262378 255 0.8481608
## 5 1 1:754964:C:T 754964 prevMI 8168 0.8220255 0.1251589 255 0.8489118
## 6 1 1:755775:A:G 755775 prevMI 8168 0.8336904 0.1224042 255 0.8603627
## variMAFd beta se chisq df model remark pval effect
## 1 0.11717860 0.3087055 0.1233899 6.259355 1 a NA 0.01235392 A
## 2 0.11620949 0.3153932 0.1245776 6.409504 1 a NA 0.01135111 G
## 3 0.09451533 0.3553602 0.1511556 5.526995 1 a NA 0.01872524 A
## 4 0.11382574 0.3154721 0.1264543 6.223783 1 a NA 0.01260455 A
## 5 0.11272816 0.3175989 0.1271465 6.239490 1 a NA 0.01249324 T
## 6 0.10903204 0.3188029 0.1293797 6.071728 1 a NA 0.01373625 G
## noneffect obs_eaf ratio input_snp iwgadiff iwgafold inrefgene
## 1 G 0.83099 0.99802 Y NA NA
## 2 A 0.82046 0.95044 N NA NA
## 3 C 0.86116 0.91977 N NA NA FAM87B
## 4 G 0.82129 0.92916 N NA NA FAM87B
## 5 C 0.82212 0.92330 N NA NA FAM87B
## 6 A 0.83375 0.93181 N NA NA
## refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## distancefromclosestrefgene
## 1 184
## 2 29
## 3 0
## 4 0
## 5 0
## 6 561
head(m22)
## Chr SNPID Phys_location phen N iMAF variMAF Nd iMAFd
## 1 1 1:752566:G:A 752566 prevCHF 8168 0.8309865 0.1315220 106 0.8162500
## 2 1 1:752721:A:G 752721 prevCHF 8168 0.8203603 0.1293845 106 0.8102736
## 3 1 1:753405:C:A 753405 prevCHF 8168 0.8610646 0.1100020 106 0.8466604
## 4 1 1:754503:G:A 754503 prevCHF 8168 0.8212104 0.1262378 106 0.8117170
## 5 1 1:754964:C:T 754964 prevCHF 8168 0.8220255 0.1251589 106 0.8124764
## 6 1 1:755775:A:G 755775 prevCHF 8168 0.8336904 0.1224042 106 0.8195330
## variMAFd beta se chisq df model remark pval effect
## 1 0.1456776 -0.07227210 0.1851205 0.15241667 1 a NA 0.6962365 A
## 2 0.1465008 -0.03522731 0.1921447 0.03361258 1 a NA 0.8545333 G
## 3 0.1252627 -0.07523197 0.1951751 0.14857859 1 a NA 0.6998974 A
## 4 0.1427549 -0.03191776 0.1940868 0.02704411 1 a NA 0.8693762 A
## 5 0.1417972 -0.03258901 0.1948029 0.02798667 1 a NA 0.8671402 T
## 6 0.1362847 -0.07217108 0.1905409 0.14346626 1 a NA 0.7048591 G
## noneffect obs_eaf ratio input_snp iwgadiff iwgafold inrefgene
## 1 G 0.83099 0.99802 Y NA NA
## 2 A 0.82046 0.95044 N NA NA
## 3 C 0.86116 0.91977 N NA NA FAM87B
## 4 G 0.82129 0.92916 N NA NA FAM87B
## 5 C 0.82212 0.92330 N NA NA FAM87B
## 6 A 0.83375 0.93181 N NA NA
## refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C FAM87B
## distancefromclosestrefgene
## 1 184
## 2 29
## 3 0
## 4 0
## 5 0
## 6 561
In the next step, we merge these three GWAS summary files.
com<-merge(m1,m2,by="SNPID")
com<-merge(com,m22,by="SNPID")
If we use the full individual level data, we read the phenotype file and apply our function.
m3 <- data.twobi[[4]]
head(m3)
## id prevMI prevCHF prevAF sex.x dnaage
## 1 1 0 0 0 0 67.71914
## 2 2 1 0 0 1 34.82034
## 3 3 0 0 0 0 70.13402
## 4 4 0 0 0 1 33.57259
## 5 5 0 0 0 1 58.83273
## 6 6 0 0 0 0 67.03615
gamma <- get_gamma_full_y2_bi(data =m3,y2 = 'prevAF',y1 = 'prevMI',cor =c('dnaage','sex.x'))
theta <- get_gamma_full_y2_bi(data =m3,y2 = 'prevAF',y1 = 'prevCHF',cor =c('dnaage','sex.x'))
r1 <-y1y2_bi(d=com,data=m3,y1='prevMI',y2='prevAF',y3='prevCHF',N2 = 500, N3 = 500, Ncase2 = 50, Ncase3 = 48,beta_m_hat = 'beta.x',sd_beta_m_hat = 'se.x',eta_hat = 'beta.y',sd_eta = 'se.y',mu_hat='beta',sd_mu_hat = 'se',maf = 'iMAF.x',N = 500,Ncase = 55, gamma_hat = gamma[1],sd_gamma_hat =gamma[2],theta_hat = theta[1],sd_theta_hat = theta[2])
head(r1[,c(1,(ncol(r1)-2):ncol(r1))])
## SNPID beta_c z pval_c
## 1 1:1000156:C:T 0.04584926 0.4707475 0.6380276
## 2 1:1001177:G:C 0.04470200 0.4626822 0.6437947
## 3 1:1002932:C:G 0.05390312 0.5421288 0.5879728
## 4 1:1003629:C:T 0.04294063 0.4498898 0.6529860
## 5 1:1046717:G:C 0.11940880 0.8755593 0.3816926
## 6 1:1046861:G:A 0.11939338 0.8757290 0.3816004
If we use a subset of individual level data to approximate the conditional analysis, we need to define the participants’ id in the subset and apply our function.
sub<-m3[sample(1:dim(m3)[1],100),'id']
gamma <- get_gamma_subset_y2_bi(data = m3, id ='id',subid= sub,y2 = 'prevAF', y1 = 'prevMI',cor =c('dnaage','sex.x'))
theta <- get_gamma_subset_y2_bi(data = m3, id ='id',subid= sub,y2 = 'prevAF', y1 = 'prevCHF',cor =c('dnaage','sex.x'))
r2 <-y1y2_bi(d=com,data=m3,y1='prevMI',y2='prevAF',y3='prevCHF',N2 = 500,N3 = 500, Ncase2 = 50, Ncase3 = 48,beta_m_hat = 'beta.x',sd_beta_m_hat = 'se.x',eta_hat = 'beta.y',sd_eta = 'se.y',mu_hat='beta',sd_mu_hat = 'se',maf = 'iMAF.x',N = 500,Ncase = 55, gamma_hat = gamma[1],sd_gamma_hat =gamma[2],theta_hat = theta[1],sd_theta_hat = theta[2])
head(r2[,c(1,(ncol(r2)-2):ncol(r2))])
## SNPID beta_c z pval_c
## 1 1:1000156:C:T 0.04452690 0.4529221 0.6508025
## 2 1:1001177:G:C 0.04342288 0.4451009 0.6564407
## 3 1:1002932:C:G 0.05262198 0.5237027 0.6007188
## 4 1:1003629:C:T 0.04174655 0.4331938 0.6650619
## 5 1:1046717:G:C 0.11897779 0.8718219 0.3837265
## 6 1:1046861:G:A 0.11897656 0.8720967 0.3835767
If we can identify \(\hat{\gamma}_m\), \(sd(\hat{\gamma}_m)\) (i.e., the beta coefficient and its standard error from logistic regressions: logit(E(AF)) = MI), and \(\hat{\theta}\), \(sd(\hat{\theta})\) (i.e., the beta coefficient and its standard error from logistic regressions: logit(E(AF)) = HF) in the literature, we can specify the values for ‘\(\texttt{gamma_hat}\)’, ‘\(\texttt{sd_gamma_hat}\)’, ‘\(\texttt{theta_hat}\)’, and ‘\(\texttt{sd_theta_hat}\)’, respectively in the function.