Two continuous

If \(Y_1\) and \(Y_2\) are two continuous traits, we use the \(\texttt{function_two_continuous.R}\) to approximate conditional phenotype analysis. In the first step, we load the functions.

source('~/Google Drive/conditional analysis/Approxi_Conditiondal_Analysis_R_functions/R_functions/function_two_continuous.R')

Then we read the GWAS summary statistics for two continuous traits. In this example, we approximate the genetic effect in the model of waist circumference (continuous) adjusting for body mass index (continuous).

load('~/Google Drive/conditional analysis/data.twocon.Rda')
dy1 <- data.twocon[[1]]
dy2 <- data.twocon[[2]]
head(dy1)
##   Chr         SNPID Phys_location phen   n0   n1   n2          h2q        beta
## 1   7 SNP_A-1780270      78437519 wccm 2049 3685 1796 3.921557e-05 -0.11530186
## 2  15 SNP_A-1780271      31183071 wccm 4232 2691  489 1.660195e-04 -0.32137611
## 3   1 SNP_A-1780272     188074307 wccm 6002 1410   93 7.468645e-06  0.04282617
## 4  20 SNP_A-1780274      33371323 wccm 2837 3447 1143 5.000550e-05 -0.07293575
## 5  12 SNP_A-1780277      73950313 wccm 2940 3541 1023 1.624566e-04  0.19833982
## 6   1 SNP_A-1780278     216957281 wccm 5866 1163   52 6.141465e-04 -1.01504168
##          se      chisq df    model       pval    hwe97 callrate97    maf97
## 1 0.2378031 0.23509183  1 additive 0.62777375 0.285706   0.997649 0.484918
## 2 0.2770709 1.34538088  1 additive 0.24608726 0.832850   0.983190 0.247310
## 3 0.3868438 0.01225596  1 additive 0.91184900 0.011297   0.993887 0.106091
## 4 0.2467132 0.08739694  1 additive 0.76751286 0.571079   0.982367 0.385725
## 5 0.2465059 0.64738878  1 additive 0.42104781 0.743294   0.994358 0.371380
## 6 0.4255382 5.68971601  1 additive 0.01706462 0.724823   0.938756 0.088154
##   ni_final maleHET97 affyremoved p_mishap_faminfo p_diffmiss_gen12
## 1       13        NA           0           0.8570         1.000000
## 2        9        NA           0           0.1630         0.016580
## 3       20        NA           0           0.0000         1.000000
## 4       20        NA           0           0.0518         0.315500
## 5        0        NA           0           0.3310         0.031910
## 6      118        NA           0           0.2320         0.000127
##   p_diffmiss_gen13 p_diffmiss_gen23 p_diffmaf_sex inrefgene
## 1         0.337900          0.02210       0.30190     MAGI2
## 2         0.055570          0.38610       0.01015          
## 3         0.634000          0.37610       0.86080          
## 4         0.000000          0.00000       0.18580      UQCC
## 5         0.001345          0.05896       0.10690          
## 6         0.001171          0.00000       0.55670          
##   refgenes_60kb_from_SNP closestrefgene distancefromclosestrefgene
## 1                  MAGI2          MAGI2                     483307
## 2                   FMN1           FMN1                      35694
## 3                                 FAM5C                     259112
## 4 FAM83C;EIF6;MMP24;UQCC           UQCC                      17527
## 5                  CAPS2          CAPS2                       6962
## 6                                 TGFB2                     275688
head(dy2)
##   Chr         SNPID Phys_location phen   n0   n1   n2          h2q        beta
## 1   7 SNP_A-1780270      78437519  bmi 2049 3685 1796 1.072117e-04 -0.07051900
## 2  15 SNP_A-1780271      31183071  bmi 4232 2691  489 5.432837e-05 -0.11418372
## 3   1 SNP_A-1780272     188074307  bmi 6002 1410   93 2.285974e-04 -0.17315060
## 4  20 SNP_A-1780274      33371323  bmi 2837 3447 1143 6.177918e-05 -0.03718359
## 5  12 SNP_A-1780277      73950313  bmi 2940 3541 1023 1.960031e-04  0.07441231
## 6   1 SNP_A-1780278     216957281  bmi 5866 1163   52 9.736089e-04 -0.44874738
##           se     chisq df    model        pval    hwe97 callrate97    maf97
## 1 0.09139220 0.5953798  1 additive 0.440346302 0.285706   0.997649 0.484918
## 2 0.10628907 1.1540674  1 additive 0.282699315 0.832850   0.983190 0.247310
## 3 0.14861291 1.3574847  1 additive 0.243973887 0.011297   0.993887 0.106091
## 4 0.09494515 0.1533759  1 additive 0.695329916 0.571079   0.982367 0.385725
## 5 0.09464301 0.6181764  1 additive 0.431725751 0.743294   0.994358 0.371380
## 6 0.16311932 7.5682078  1 additive 0.005940683 0.724823   0.938756 0.088154
##   ni_final maleHET97 affyremoved p_mishap_faminfo p_diffmiss_gen12
## 1       13        NA           0           0.8570         1.000000
## 2        9        NA           0           0.1630         0.016580
## 3       20        NA           0           0.0000         1.000000
## 4       20        NA           0           0.0518         0.315500
## 5        0        NA           0           0.3310         0.031910
## 6      118        NA           0           0.2320         0.000127
##   p_diffmiss_gen13 p_diffmiss_gen23 p_diffmaf_sex inrefgene
## 1         0.337900          0.02210       0.30190     MAGI2
## 2         0.055570          0.38610       0.01015          
## 3         0.634000          0.37610       0.86080          
## 4         0.000000          0.00000       0.18580      UQCC
## 5         0.001345          0.05896       0.10690          
## 6         0.001171          0.00000       0.55670          
##   refgenes_60kb_from_SNP closestrefgene distancefromclosestrefgene
## 1                  MAGI2          MAGI2                     483307
## 2                   FMN1           FMN1                      35694
## 3                                 FAM5C                     259112
## 4 FAM83C;EIF6;MMP24;UQCC           UQCC                      17527
## 5                  CAPS2          CAPS2                       6962
## 6                                 TGFB2                     275688

In the next step, we merge two GWAS summary files.

com=merge(dy1,dy2,by="SNPID")

If we use the full individual level data to estimate the \(\hat{\gamma}_m\), which is the coefficient in the model M3 relating \(Y_1\) to \(Y_2\), we read the phenotype file and apply our function.

m3 <- data.twocon[[3]]
head(m3)
##   id     wccm      bmi sex.x   dnaage          PC1          PC2          PC3
## 1  1 94.26873 26.98309     1 45.19850  0.009163389  0.024464882  0.018903551
## 2  2 94.00709 24.41432     1 57.01628 -0.036078401 -0.001724910 -0.009747600
## 3  3 88.92366 17.83249     0 56.41468  0.040193234  0.009927302  0.017974667
## 4  4 85.34086 24.59096     0 53.54305 -0.001487380 -0.007601491  0.007681352
## 5  5 73.49879 25.65450     0 38.22636 -0.032282700  0.022234002 -0.009921895
## 6  6 97.90892 26.18636     0 58.68427 -0.013011506 -0.013867713 -0.013348689
##            PC4          PC5
## 1 -0.033331258 -0.003200357
## 2 -0.002205311 -0.011662428
## 3  0.001656729  0.004957955
## 4 -0.001610905  0.006022545
## 5 -0.008514919 -0.008379371
## 6 -0.001278763 -0.010783724
r1 <- two_con_f(data = m3,y2 = 'wccm',y1 = 'bmi',cor = c('dnaage','sex.x','PC1','PC2','PC3','PC4','PC5'), d = com, N = dim(m3)[1], beta_m_hat='beta.x',sd_beta_m_hat='se.x',eta_hat='beta.y',sd_eta='se.y',maf='maf97.x', gamma_hat=0, r = dim(com)[1], method = 'full')
head(r1[,c(1,(ncol(r1)-2):ncol(r1))])
##           SNPID      beta_c          z     pval_c
## 1 SNP_A-1780270 -0.11038117 -0.4635954 0.64314062
## 2 SNP_A-1780271 -0.31391992 -1.1309072 0.25863956
## 3 SNP_A-1780272  0.05538157  0.1428804 0.88644255
## 4 SNP_A-1780274 -0.07029338 -0.2846966 0.77599521
## 5 SNP_A-1780277  0.19324775  0.7829475 0.43403087
## 6 SNP_A-1780278 -0.99779905 -2.3251304 0.02046676

The column ‘beta_c’ is the approximated coefficient of SNP in the adjusted model, ‘z’ is the approximated z value. The approximated standard error of ‘beta_c’ can be calculated as ‘beta_c’ divided by ‘z’. ‘pval_c’ is the approximated p-value.

If we use a subset of individual level data to estimate the \(\hat{\gamma}_m\), we need to define the participants’ id in the subset and apply our function.

subid <- m3[sample(1:dim(m3)[1],100),'id']
r2 <- two_con_f(data = m3,y2 = 'wccm',y1 = 'bmi',cor = c('dnaage','sex.x','PC1','PC2','PC3','PC4','PC5'),id = 'id',subid = subid, d = com, N = 500, beta_m_hat='beta.x',sd_beta_m_hat='se.x',eta_hat='beta.y',sd_eta='se.y',maf='maf97.x', gamma_hat=0, r = dim(com)[1], method = 'subset')
head(r2[,c(1,(ncol(r2)-2):ncol(r2))])
##           SNPID      beta_c           z      pval_c
## 1 SNP_A-1780270 -0.13876909 -0.58743768 0.557176576
## 2 SNP_A-1780271 -0.35993678 -1.30729436 0.191717224
## 3 SNP_A-1780272 -0.01442775 -0.03751197 0.970091860
## 4 SNP_A-1780274 -0.08524865 -0.34797514 0.728006187
## 5 SNP_A-1780277  0.22320432  0.91153739 0.362454200
## 6 SNP_A-1780278 -1.18096672 -2.77972235 0.005646767

If we can identify \(\hat{\gamma}_m\) in the literature, we can input the \(\hat{\gamma}_m\) in the function.

r3 <- two_con_f(data = m3,y2 = 'wccm',y1 = 'bmi',cor = c('dnaage','sex.x','PC1','PC2','PC3','PC4','PC5'),id = 'id',subid = subid, d = com, N = 500, beta_m_hat='beta.x',sd_beta_m_hat='se.x',eta_hat='beta.y',sd_eta='se.y',maf='maf97.x', gamma_hat=2, r = dim(com)[1], method = 'literature')

head(r3[,c(1,(ncol(r3)-2):ncol(r3))])
##           SNPID       beta_c            z    pval_c
## 1 SNP_A-1780270  0.025766845  0.169168616 0.8657328
## 2 SNP_A-1780271 -0.093223782 -0.522765690 0.6013703
## 3 SNP_A-1780272  0.390185950  1.579147938 0.1149383
## 4 SNP_A-1780274  0.001431873  0.009080314 0.9927587
## 5 SNP_A-1780277  0.049576544  0.313257700 0.7542163
## 6 SNP_A-1780278 -0.119329731 -0.431818147 0.6660609

Continuous trait adjusting for binary trait

If we want to estimate the genetic effect in conditional analysis with continuous trait of interest and binary confounder, we use the \(\texttt{function_y1_binary_y2_continuous.R}\) to approximate conditional phenotype analysis. In the first step, we load the functions.

source('~/Google Drive/conditional analysis/Approxi_Conditiondal_Analysis_R_functions/R_functions/function_y1_binary_y2_continuous.R')

Then we read the GWAS summary statistics for \(Y_1\) and \(Y_2\). In this example, we approximate the genetic effect in the model of BMI (continuous) adjusting for smoking (binary).

load('~/Google Drive/conditional analysis/data.y1bi.Rda')
###GWAS for BMI
dy1 <- data.y1bi[[1]]
####GWAS for Smoking
dy2 <- data.y1bi[[2]]
head(dy1)
##   Chr         SNPID Phys_location   phen   n0   n1   n2          h2q
## 1   7 SNP_A-1780270      78437519 resbmi 1706 3090 1510 9.810770e-05
## 2  15 SNP_A-1780271      31183071 resbmi 3535 2287  405 2.364683e-05
## 3   1 SNP_A-1780272     188074307 resbmi 5048 1168   66 6.555359e-05
## 4  20 SNP_A-1780274      33371323 resbmi 2367 2890  970 2.350082e-04
## 5  12 SNP_A-1780277      73950313 resbmi 2481 2978  829 1.081686e-04
## 6   1 SNP_A-1780278     216957281 resbmi 5017  940   38 8.989778e-04
##           beta         se     chisq df    model      pval    hwe97 callrate97
## 1 -0.009359197 0.01890409 0.2451126  1 additive 0.6205377 0.285706   0.997649
## 2 -0.012249043 0.02199955 0.3100106  1 additive 0.5776737 0.832850   0.983190
## 3 -0.017580821 0.03126678 0.3161636  1 additive 0.5739224 0.011297   0.993887
## 4 -0.017682415 0.01950118 0.8221694  1 additive 0.3645466 0.571079   0.982367
## 5  0.012924823 0.01963792 0.4331698  1 additive 0.5104376 0.743294   0.994358
## 6 -0.067371246 0.03445329 3.8237343  1 additive 0.0505315 0.724823   0.938756
##      maf97 ni_final maleHET97 affyremoved p_mishap_faminfo p_diffmiss_gen12
## 1 0.484918       13        NA           0           0.8570         1.000000
## 2 0.247310        9        NA           0           0.1630         0.016580
## 3 0.106091       20        NA           0           0.0000         1.000000
## 4 0.385725       20        NA           0           0.0518         0.315500
## 5 0.371380        0        NA           0           0.3310         0.031910
## 6 0.088154      118        NA           0           0.2320         0.000127
##   p_diffmiss_gen13 p_diffmiss_gen23 p_diffmaf_sex inrefgene
## 1         0.337900          0.02210       0.30190     MAGI2
## 2         0.055570          0.38610       0.01015          
## 3         0.634000          0.37610       0.86080          
## 4         0.000000          0.00000       0.18580      UQCC
## 5         0.001345          0.05896       0.10690          
## 6         0.001171          0.00000       0.55670          
##   refgenes_60kb_from_SNP closestrefgene distancefromclosestrefgene
## 1                  MAGI2          MAGI2                     483307
## 2                   FMN1           FMN1                      35694
## 3                                 FAM5C                     259112
## 4 FAM83C;EIF6;MMP24;UQCC           UQCC                      17527
## 5                  CAPS2          CAPS2                       6962
## 6                                 TGFB2                     275688
head(dy2)
##   Chr         SNPID Phys_location     phen   n0   n1   n2  nd0  nd1 nd2
## 1   7 SNP_A-1780270      78437519 ever_smk 1706 3090 1510  872 1634 813
## 2  15 SNP_A-1780271      31183071 ever_smk 3535 2287  405 1876 1182 215
## 3   1 SNP_A-1780272     188074307 ever_smk 5048 1168   66 2630  635  35
## 4  20 SNP_A-1780274      33371323 ever_smk 2367 2890  970 1249 1509 512
## 5  12 SNP_A-1780277      73950313 ever_smk 2481 2978  829 1273 1575 466
## 6   1 SNP_A-1780278     216957281 ever_smk 5017  940   38 2618  497  24
##        miss.0      miss.1 miss.diff.p         beta         se       chisq df
## 1 0.002338009 0.003303303  0.63770914  0.043576557 0.04156060 1.099365684  1
## 2 0.013360053 0.017117117  0.25966063 -0.026318965 0.04563392 0.332630499  1
## 3 0.004008016 0.009009009  0.01899738  0.087147650 0.07101088 1.506127014  1
## 4 0.012358049 0.018018018  0.08103405  0.016680572 0.04046256 0.169947645  1
## 5 0.006680027 0.004804805  0.40301614  0.079083581 0.04046040 3.820430047  1
## 6 0.046092184 0.057357357  0.04711066  0.002632004 0.07499745 0.001231629  1
##   model remark       pval    hwe97 callrate97    maf97 ni_final maleHET97
## 1     a   <NA> 0.29440535 0.285706   0.997649 0.484918       13        NA
## 2     a   <NA> 0.56411424 0.832850   0.983190 0.247310        9        NA
## 3     a   <NA> 0.21973102 0.011297   0.993887 0.106091       20        NA
## 4     a   <NA> 0.68015834 0.571079   0.982367 0.385725       20        NA
## 5     a   <NA> 0.05063124 0.743294   0.994358 0.371380        0        NA
## 6     a   <NA> 0.97200433 0.724823   0.938756 0.088154      118        NA
##   affyremoved p_mishap_faminfo p_diffmiss_gen12 p_diffmiss_gen13
## 1           0           0.8570         1.000000         0.337900
## 2           0           0.1630         0.016580         0.055570
## 3           0           0.0000         1.000000         0.634000
## 4           0           0.0518         0.315500         0.000000
## 5           0           0.3310         0.031910         0.001345
## 6           0           0.2320         0.000127         0.001171
##   p_diffmiss_gen23 p_diffmaf_sex inrefgene refgenes_60kb_from_SNP
## 1          0.02210       0.30190     MAGI2                  MAGI2
## 2          0.38610       0.01015                             FMN1
## 3          0.37610       0.86080                                 
## 4          0.00000       0.18580      UQCC FAM83C;EIF6;MMP24;UQCC
## 5          0.05896       0.10690                            CAPS2
## 6          0.00000       0.55670                                 
##   closestrefgene distancefromclosestrefgene
## 1          MAGI2                     483307
## 2           FMN1                      35694
## 3          FAM5C                     259112
## 4           UQCC                      17527
## 5          CAPS2                       6962
## 6          TGFB2                     275688

In the next step, we merge two GWAS summary files.

data<-merge(dy1,dy2,by="SNPID")
###calculate the sample size
data$n_m4<-data$n0.x+data$n1.x+data$n2.x

If we use the full individual level data, we read the phenotype file and apply our function.

pheno <- data.y1bi[[3]]
head(pheno)
##   id      resbmi ever_smk sex.x   dnaage          PC1           PC2
## 1  1 -0.38320219        0     1 34.17423  0.028634445 -0.0013222994
## 2  2  0.64536193        1     1 54.90689  0.013705735 -0.0008034025
## 3  3  0.57348457        1     1 65.84717  0.051691126 -0.0269350179
## 4  4  1.66220102        0     0 62.05004 -0.005399918 -0.0090277075
## 5  5  1.37534140        0     0 61.71630  0.005033765 -0.0260456895
## 6  6  0.01273226        1     0 36.64420  0.027360057 -0.0048245266
##            PC3          PC4          PC5
## 1  0.007022215  0.015245778 -0.018004480
## 2 -0.007935423 -0.003631598  0.005322603
## 3 -0.014745844 -0.018648797  0.013042895
## 4  0.010562221 -0.008870594  0.000790161
## 5  0.004006013 -0.010453307  0.005747289
## 6 -0.012228854 -0.001714101 -0.019467468
r1<-y1_bi_f(d=data,n_m4='n_m4',b_m1='beta.x',se_m1='se.x',eta_hat = 'beta.y',maf = 'maf97.x', N = 500, Ncase = 245,gamma = 0, data = pheno, id = 'id', subid=NULL ,y2='resbmi',y1 = 'ever_smk', cor = NULL,method='full')
head(r1[,c(1,(ncol(r1)-2):ncol(r1))])
##           SNPID       se_c          z     pval_c
## 1 SNP_A-1780270 0.01890818 -0.3497003 0.72657532
## 2 SNP_A-1780271 0.02200255 -0.6321892 0.52728648
## 3 SNP_A-1780272 0.03127566 -0.3864650 0.69916538
## 4 SNP_A-1780274 0.01950433 -0.8526712 0.39387449
## 5 SNP_A-1780277 0.01964731  0.9124115 0.36158711
## 6 SNP_A-1780278 0.03446712 -1.9498365 0.05124219

If we use a subset of individual level data to approximate the conditional analysis, we need to define the participants’ id in the subset and apply our function.

subid <- pheno[sample(1:dim(pheno)[1],100),'id']
r2<-y1_bi_f(d=data,n_m4='n_m4',b_m1='beta.x',se_m1='se.x',eta_hat = 'beta.y',maf = 'maf97.x', N = 500, Ncase = 245,gamma = 0, data = pheno, id = 'id', subid=subid ,y2='resbmi',y1 = 'ever_smk', cor = NULL,method='subset')
 
head(r2[,c(1,(ncol(r2)-2):ncol(r2))])
##           SNPID       se_c          z     pval_c
## 1 SNP_A-1780270 0.01890819 -0.3805767 0.70353022
## 2 SNP_A-1780271 0.02200256 -0.6161680 0.53780621
## 3 SNP_A-1780272 0.03127567 -0.4238106 0.67171844
## 4 SNP_A-1780274 0.01950434 -0.8641270 0.38755141
## 5 SNP_A-1780277 0.01964732  0.8584533 0.39067495
## 6 SNP_A-1780278 0.03446713 -1.9508585 0.05112043

If we can identify \(\hat{\gamma}\) in the literature, we can input the \(\hat{\gamma}\) in the option \(`\texttt{gamma}'\) in the function.

Binary trait adjusting for continuous trait

If we want to estimate the genetic effect in conditional analysis with binary trait of interest and continuous confounder, we use the \(\texttt{function_y1_continuous_y2_binary.R}\) to approximate conditional phenotype analysis. In the first step, we load the functions.

source('~/Google Drive/conditional analysis/Approxi_Conditiondal_Analysis_R_functions/R_functions/function_y1_continuous_y2_binary.R')

Then we read the GWAS summary statistics for \(Y_1\) and \(Y_2\). In this example, we approximate the genetic effect in the model of atrial fibrillation (binary) adjusting for height (continuous).

load('~/Google Drive/conditional analysis/data.y2bi.Rda')
###Gwas for AF
m1 <- data.y2bi[[1]]
###GWAS for height
m2 <- data.y2bi[[2]]
head(m1)
##   Chr        SNPID Phys_location   phen    N      iMAF   variMAF  Nd     iMAFd
## 1   1 1:752566:G:A        752566 prevAF 8466 0.8310948 0.1314984 319 0.8590298
## 2   1 1:752721:A:G        752721 prevAF 8466 0.8206172 0.1293733 319 0.8486019
## 3   1 1:753405:C:A        753405 prevAF 8466 0.8611797 0.1099850 319 0.8775627
## 4   1 1:754503:G:A        754503 prevAF 8466 0.8214673 0.1262385 319 0.8486881
## 5   1 1:754964:C:T        754964 prevAF 8466 0.8222797 0.1251631 319 0.8490846
## 6   1 1:755775:A:G        755775 prevAF 8466 0.8337898 0.1224075 319 0.8588840
##     variMAFd      beta        se    chisq df model remark       pval effect
## 1 0.11401448 0.3121765 0.1307839 5.697600  1     a     NA 0.01698812      A
## 2 0.11524523 0.3188471 0.1374107 5.384236  1     a     NA 0.02031948      G
## 3 0.09866396 0.2423083 0.1457886 2.762421  1     a     NA 0.09650213      A
## 4 0.11316228 0.3183859 0.1397901 5.187462  1     a     NA 0.02275042      A
## 5 0.11250209 0.3168127 0.1406186 5.075976  1     a     NA 0.02425961      T
## 6 0.10818897 0.3053463 0.1370439 4.964387  1     a     NA 0.02587448      G
##   noneffect obs_eaf   ratio input_snp iwgadiff iwgafold inrefgene
## 1         G 0.83099 0.99802         Y       NA       NA          
## 2         A 0.82046 0.95044         N       NA       NA          
## 3         C 0.86116 0.91977         N       NA       NA    FAM87B
## 4         G 0.82129 0.92916         N       NA       NA    FAM87B
## 5         C 0.82212 0.92330         N       NA       NA    FAM87B
## 6         A 0.83375 0.93181         N       NA       NA          
##                           refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
##   distancefromclosestrefgene
## 1                        184
## 2                         29
## 3                          0
## 4                          0
## 5                          0
## 6                        561
head(m2)
##   Chr        SNPID Phys_location  phen    N      iMAF   variMAF          h2q
## 1   1 1:752566:G:A        752566 hgtcm 8168 0.8309865 0.1315220 0.0002693050
## 2   1 1:752721:A:G        752721 hgtcm 8168 0.8203603 0.1293845 0.0002655501
## 3   1 1:753405:C:A        753405 hgtcm 8168 0.8610646 0.1100020 0.0002268764
## 4   1 1:754503:G:A        754503 hgtcm 8168 0.8212104 0.1262378 0.0002590373
## 5   1 1:754964:C:T        754964 hgtcm 8168 0.8220255 0.1251589 0.0002598921
## 6   1 1:755775:A:G        755775 hgtcm 8168 0.8336904 0.1224042 0.0002636075
##        beta        se    chisq df    model       pval effect noneffect obs_eaf
## 1 0.2689257 0.1366800 3.871279  1 additive 0.04911907      A         G 0.83099
## 2 0.2618523 0.1373180 3.636286  1 additive 0.05653292      G         A 0.82046
## 3 0.3125504 0.1519281 4.232181  1 additive 0.03966443      A         C 0.86116
## 4 0.2642753 0.1391384 3.607604  1 additive 0.05751593      A         G 0.82129
## 5 0.2672485 0.1398220 3.653252  1 additive 0.05595983      T         C 0.82212
## 6 0.2840876 0.1422330 3.989363  1 additive 0.04578836      G         A 0.83375
##     ratio input_snp iwgadiff iwgafold inrefgene
## 1 0.99802         Y       NA       NA          
## 2 0.95044         N       NA       NA          
## 3 0.91977         N       NA       NA    FAM87B
## 4 0.92916         N       NA       NA    FAM87B
## 5 0.92330         N       NA       NA    FAM87B
## 6 0.93181         N       NA       NA          
##                           refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
##   distancefromclosestrefgene
## 1                        184
## 2                         29
## 3                          0
## 4                          0
## 5                          0
## 6                        561

In the next step, we merge two GWAS summary files.

com<-merge(m1,m2,by="SNPID")

If we use the full individual level data, we read the phenotype file and apply our function.

m3 <- data.y2bi[[3]]
head(m3)
##   id    hgtcm prevAF sex.x   dnaage
## 1  1 148.3785      0     1 31.17778
## 2  2 163.2644      0     0 58.27673
## 3  3 169.3237      0     0 74.85126
## 4  4 160.4307      0     1 54.64849
## 5  5 153.8345      1     1 31.08670
## 6  6 166.5532      0     0 60.35989
r1 <- y2_bi_f(d=com,N2=500,beta_m_hat= 'beta.x',sd_beta_m_hat = 'se.x',eta_hat = 'beta.y',sd_eta= 'se.y', maf= 'iMAF.x',N = 500,Ncase = 33,gamma_hat = 0,sd_gamma_hat = 0,mean1 =0,mean0 = 0,data =m3,y2 = 'prevAF',y1 = 'hgtcm',cor =c('dnaage','sex.x'), method='full')
head(r1[,c(1,(ncol(r1)-2):ncol(r1))])
##           SNPID     beta_c         z    pval_c
## 1 1:1000156:C:T 0.03870076 0.3975008 0.6911688
## 2 1:1001177:G:C 0.03749607 0.3882467 0.6979997
## 3 1:1002932:C:G 0.04620832 0.4649649 0.6421602
## 4 1:1003629:C:T 0.03592086 0.3765090 0.7066992
## 5 1:1046717:G:C 0.11656667 0.8546671 0.3931472
## 6 1:1046861:G:A 0.11656163 0.8549058 0.3930152

If we use a subset of individual level data to approximate the conditional analysis, we need to define the participants’ id in the subset and apply our function.

subid <- m3[sample(1:dim(m3)[1],100),'id']
t2 <- y2_bi_f(d=com,N2=500,beta_m_hat= 'beta.x',sd_beta_m_hat = 'se.x',eta_hat = 'beta.y',sd_eta= 'se.y', maf= 'iMAF.x',N = 500,Ncase = 33,gamma_hat = 0,sd_gamma_hat = 0,mean1 =0,mean0 = 0,data =m3,y2 = 'prevAF',y1 = 'hgtcm',cor =c('dnaage','sex.x'), method='subset',id='id',subid=subid)
head(r2[,c(1,(ncol(r2)-2):ncol(r2))])
##           SNPID       se_c          z     pval_c
## 1 SNP_A-1780270 0.01890819 -0.3805767 0.70353022
## 2 SNP_A-1780271 0.02200256 -0.6161680 0.53780621
## 3 SNP_A-1780272 0.03127567 -0.4238106 0.67171844
## 4 SNP_A-1780274 0.01950434 -0.8641270 0.38755141
## 5 SNP_A-1780277 0.01964732  0.8584533 0.39067495
## 6 SNP_A-1780278 0.03446713 -1.9508585 0.05112043

If we can identify \(\hat{\gamma}_m\), \(sd(\hat{\gamma}_m)\) (i.e., the beta coefficient and its standard error from logistic regressions M3), Mean(Height|AF=1), and Mean(Height|AF=0) in the literature, we can specify the values for ‘\(\texttt{gamma_hat}\)’, ‘\(\texttt{sd_gamma_hat}\)’, ‘\(\texttt{mean1}\)’, and ‘\(\texttt{mean0}\)’, respectively in the function.

Two binary

If we want to estimate the genetic effect in conditional analysis with binary trait of interest and binary confounder, we use the \(\texttt{function_y1_y2_binary.R}\) to approximate conditional phenotype analysis. In the first step, we load the functions.

source('~/Google Drive/conditional analysis/Approxi_Conditiondal_Analysis_R_functions/R_functions/function_y1_y2_binary.R')

Then we read the GWAS summary statistics for trait of interest and confonders. In this example, we approximate the genetic effect in the model of atrial fibrillation (binary) adjusting for both myocardial infarction (binary) and heart failure (binary).

load('~/Google Drive/conditional analysis/data.twobi.Rda')
###Gwas for AF
m1 <- data.twobi[[1]]
###GWAS for myocardial infarction
m2 <- data.twobi[[2]]
###GWAS for heart failure
m22 <- data.twobi[[3]]
###rename column
names(m1)<-c("Chr","SNPID","Phys_location","phen","N","iMAF","variMAF","Nd","iMAFd","variMAFd","beta","se","chisq","df","model","remark","pval","effect","noneffect","obs_eaf","ratio","input_snp","iwgadiff","iwgafold","inrefgene","refgenes_60kb_from_SNP","closestrefgene","distancefromclosestrefgene")
names(m2)<-c("Chr","SNPID","Phys_location","phen","N","iMAF","variMAF","Nd","iMAFd","variMAFd","beta","se","chisq","df","model","remark","pval","effect","noneffect","obs_eaf","ratio","input_snp","iwgadiff","iwgafold","inrefgene","refgenes_60kb_from_SNP","closestrefgene","distancefromclosestrefgene")
names(m22)<-c("Chr","SNPID","Phys_location","phen","N","iMAF","variMAF","Nd","iMAFd","variMAFd","beta","se","chisq","df","model","remark","pval","effect","noneffect","obs_eaf","ratio","input_snp","iwgadiff","iwgafold","inrefgene","refgenes_60kb_from_SNP","closestrefgene","distancefromclosestrefgene")
head(m1)
##   Chr        SNPID Phys_location   phen    N      iMAF   variMAF  Nd     iMAFd
## 1   1 1:752566:G:A        752566 prevAF 8466 0.8310948 0.1314984 319 0.8590298
## 2   1 1:752721:A:G        752721 prevAF 8466 0.8206172 0.1293733 319 0.8486019
## 3   1 1:753405:C:A        753405 prevAF 8466 0.8611797 0.1099850 319 0.8775627
## 4   1 1:754503:G:A        754503 prevAF 8466 0.8214673 0.1262385 319 0.8486881
## 5   1 1:754964:C:T        754964 prevAF 8466 0.8222797 0.1251631 319 0.8490846
## 6   1 1:755775:A:G        755775 prevAF 8466 0.8337898 0.1224075 319 0.8588840
##     variMAFd      beta        se    chisq df model remark       pval effect
## 1 0.11401448 0.3121765 0.1307839 5.697600  1     a     NA 0.01698812      A
## 2 0.11524523 0.3188471 0.1374107 5.384236  1     a     NA 0.02031948      G
## 3 0.09866396 0.2423083 0.1457886 2.762421  1     a     NA 0.09650213      A
## 4 0.11316228 0.3183859 0.1397901 5.187462  1     a     NA 0.02275042      A
## 5 0.11250209 0.3168127 0.1406186 5.075976  1     a     NA 0.02425961      T
## 6 0.10818897 0.3053463 0.1370439 4.964387  1     a     NA 0.02587448      G
##   noneffect obs_eaf   ratio input_snp iwgadiff iwgafold inrefgene
## 1         G 0.83099 0.99802         Y       NA       NA          
## 2         A 0.82046 0.95044         N       NA       NA          
## 3         C 0.86116 0.91977         N       NA       NA    FAM87B
## 4         G 0.82129 0.92916         N       NA       NA    FAM87B
## 5         C 0.82212 0.92330         N       NA       NA    FAM87B
## 6         A 0.83375 0.93181         N       NA       NA          
##                           refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
##   distancefromclosestrefgene
## 1                        184
## 2                         29
## 3                          0
## 4                          0
## 5                          0
## 6                        561
head(m2)
##   Chr        SNPID Phys_location   phen    N      iMAF   variMAF  Nd     iMAFd
## 1   1 1:752566:G:A        752566 prevMI 8168 0.8309865 0.1315220 255 0.8590098
## 2   1 1:752721:A:G        752721 prevMI 8168 0.8203603 0.1293845 255 0.8480725
## 3   1 1:753405:C:A        753405 prevMI 8168 0.8610646 0.1100020 255 0.8870510
## 4   1 1:754503:G:A        754503 prevMI 8168 0.8212104 0.1262378 255 0.8481608
## 5   1 1:754964:C:T        754964 prevMI 8168 0.8220255 0.1251589 255 0.8489118
## 6   1 1:755775:A:G        755775 prevMI 8168 0.8336904 0.1224042 255 0.8603627
##     variMAFd      beta        se    chisq df model remark       pval effect
## 1 0.11717860 0.3087055 0.1233899 6.259355  1     a     NA 0.01235392      A
## 2 0.11620949 0.3153932 0.1245776 6.409504  1     a     NA 0.01135111      G
## 3 0.09451533 0.3553602 0.1511556 5.526995  1     a     NA 0.01872524      A
## 4 0.11382574 0.3154721 0.1264543 6.223783  1     a     NA 0.01260455      A
## 5 0.11272816 0.3175989 0.1271465 6.239490  1     a     NA 0.01249324      T
## 6 0.10903204 0.3188029 0.1293797 6.071728  1     a     NA 0.01373625      G
##   noneffect obs_eaf   ratio input_snp iwgadiff iwgafold inrefgene
## 1         G 0.83099 0.99802         Y       NA       NA          
## 2         A 0.82046 0.95044         N       NA       NA          
## 3         C 0.86116 0.91977         N       NA       NA    FAM87B
## 4         G 0.82129 0.92916         N       NA       NA    FAM87B
## 5         C 0.82212 0.92330         N       NA       NA    FAM87B
## 6         A 0.83375 0.93181         N       NA       NA          
##                           refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
##   distancefromclosestrefgene
## 1                        184
## 2                         29
## 3                          0
## 4                          0
## 5                          0
## 6                        561
head(m22)
##   Chr        SNPID Phys_location    phen    N      iMAF   variMAF  Nd     iMAFd
## 1   1 1:752566:G:A        752566 prevCHF 8168 0.8309865 0.1315220 106 0.8162500
## 2   1 1:752721:A:G        752721 prevCHF 8168 0.8203603 0.1293845 106 0.8102736
## 3   1 1:753405:C:A        753405 prevCHF 8168 0.8610646 0.1100020 106 0.8466604
## 4   1 1:754503:G:A        754503 prevCHF 8168 0.8212104 0.1262378 106 0.8117170
## 5   1 1:754964:C:T        754964 prevCHF 8168 0.8220255 0.1251589 106 0.8124764
## 6   1 1:755775:A:G        755775 prevCHF 8168 0.8336904 0.1224042 106 0.8195330
##    variMAFd        beta        se      chisq df model remark      pval effect
## 1 0.1456776 -0.07227210 0.1851205 0.15241667  1     a     NA 0.6962365      A
## 2 0.1465008 -0.03522731 0.1921447 0.03361258  1     a     NA 0.8545333      G
## 3 0.1252627 -0.07523197 0.1951751 0.14857859  1     a     NA 0.6998974      A
## 4 0.1427549 -0.03191776 0.1940868 0.02704411  1     a     NA 0.8693762      A
## 5 0.1417972 -0.03258901 0.1948029 0.02798667  1     a     NA 0.8671402      T
## 6 0.1362847 -0.07217108 0.1905409 0.14346626  1     a     NA 0.7048591      G
##   noneffect obs_eaf   ratio input_snp iwgadiff iwgafold inrefgene
## 1         G 0.83099 0.99802         Y       NA       NA          
## 2         A 0.82046 0.95044         N       NA       NA          
## 3         C 0.86116 0.91977         N       NA       NA    FAM87B
## 4         G 0.82129 0.92916         N       NA       NA    FAM87B
## 5         C 0.82212 0.92330         N       NA       NA    FAM87B
## 6         A 0.83375 0.93181         N       NA       NA          
##                           refgenes_60kb_from_SNP closestrefgene
## 1 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 2 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
## 3 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 4 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 5 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C               
## 6 LOC100288069;FAM87B;LINC00115;LINC01128;FAM41C         FAM87B
##   distancefromclosestrefgene
## 1                        184
## 2                         29
## 3                          0
## 4                          0
## 5                          0
## 6                        561

In the next step, we merge these three GWAS summary files.

com<-merge(m1,m2,by="SNPID")
com<-merge(com,m22,by="SNPID")

If we use the full individual level data, we read the phenotype file and apply our function.

m3 <- data.twobi[[4]]
head(m3)
##   id prevMI prevCHF prevAF sex.x   dnaage
## 1  1      0       0      0     0 67.71914
## 2  2      1       0      0     1 34.82034
## 3  3      0       0      0     0 70.13402
## 4  4      0       0      0     1 33.57259
## 5  5      0       0      0     1 58.83273
## 6  6      0       0      0     0 67.03615
gamma <- get_gamma_full_y2_bi(data =m3,y2 = 'prevAF',y1 = 'prevMI',cor =c('dnaage','sex.x'))
theta <- get_gamma_full_y2_bi(data =m3,y2 = 'prevAF',y1 = 'prevCHF',cor =c('dnaage','sex.x'))
r1 <-y1y2_bi(d=com,data=m3,y1='prevMI',y2='prevAF',y3='prevCHF',N2 = 500, N3 = 500, Ncase2 = 50, Ncase3 = 48,beta_m_hat = 'beta.x',sd_beta_m_hat = 'se.x',eta_hat = 'beta.y',sd_eta = 'se.y',mu_hat='beta',sd_mu_hat = 'se',maf = 'iMAF.x',N = 500,Ncase = 55, gamma_hat = gamma[1],sd_gamma_hat =gamma[2],theta_hat = theta[1],sd_theta_hat = theta[2])
head(r1[,c(1,(ncol(r1)-2):ncol(r1))])
##           SNPID     beta_c         z    pval_c
## 1 1:1000156:C:T 0.04584926 0.4707475 0.6380276
## 2 1:1001177:G:C 0.04470200 0.4626822 0.6437947
## 3 1:1002932:C:G 0.05390312 0.5421288 0.5879728
## 4 1:1003629:C:T 0.04294063 0.4498898 0.6529860
## 5 1:1046717:G:C 0.11940880 0.8755593 0.3816926
## 6 1:1046861:G:A 0.11939338 0.8757290 0.3816004

If we use a subset of individual level data to approximate the conditional analysis, we need to define the participants’ id in the subset and apply our function.

sub<-m3[sample(1:dim(m3)[1],100),'id']
gamma <- get_gamma_subset_y2_bi(data = m3, id ='id',subid= sub,y2 = 'prevAF', y1 = 'prevMI',cor =c('dnaage','sex.x'))
theta <- get_gamma_subset_y2_bi(data = m3, id ='id',subid= sub,y2 = 'prevAF', y1 = 'prevCHF',cor =c('dnaage','sex.x'))
r2 <-y1y2_bi(d=com,data=m3,y1='prevMI',y2='prevAF',y3='prevCHF',N2 = 500,N3 = 500, Ncase2 = 50, Ncase3 = 48,beta_m_hat = 'beta.x',sd_beta_m_hat = 'se.x',eta_hat = 'beta.y',sd_eta = 'se.y',mu_hat='beta',sd_mu_hat = 'se',maf = 'iMAF.x',N = 500,Ncase = 55, gamma_hat = gamma[1],sd_gamma_hat =gamma[2],theta_hat = theta[1],sd_theta_hat = theta[2])
head(r2[,c(1,(ncol(r2)-2):ncol(r2))])
##           SNPID     beta_c         z    pval_c
## 1 1:1000156:C:T 0.04452690 0.4529221 0.6508025
## 2 1:1001177:G:C 0.04342288 0.4451009 0.6564407
## 3 1:1002932:C:G 0.05262198 0.5237027 0.6007188
## 4 1:1003629:C:T 0.04174655 0.4331938 0.6650619
## 5 1:1046717:G:C 0.11897779 0.8718219 0.3837265
## 6 1:1046861:G:A 0.11897656 0.8720967 0.3835767

If we can identify \(\hat{\gamma}_m\), \(sd(\hat{\gamma}_m)\) (i.e., the beta coefficient and its standard error from logistic regressions: logit(E(AF)) = MI), and \(\hat{\theta}\), \(sd(\hat{\theta})\) (i.e., the beta coefficient and its standard error from logistic regressions: logit(E(AF)) = HF) in the literature, we can specify the values for ‘\(\texttt{gamma_hat}\)’, ‘\(\texttt{sd_gamma_hat}\)’, ‘\(\texttt{theta_hat}\)’, and ‘\(\texttt{sd_theta_hat}\)’, respectively in the function.