Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
115 changes: 86 additions & 29 deletions code/pecotmr_integration/twas_ctwas.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -29,15 +29,20 @@
"4. Perofrm TWAS test for multiple sets of weights. \n",
"5. For each gene, filter TWAS results by keeping the best model selected by CV. Drop the genes that don't show good evidence of TWAS prediction weights.\n",
"\n",
"### Step 2: Variant Selection for Imputable Genes via the Best Prediction Methods\n",
"### Step 2: Model Selection for Imputable Genes via the Best Prediction Methods\n",
"1. Determine if the gene is imputable at each context based on the twas_cv performance by adjusted $r^2$ (>=0.01) and p-values (<0.05).\n",
"2. The imputable gene-context pair will go through variant selection step. Maximum 10 variants with top pip selected from either `top_loci` table or SuSiE CS set. \n",
"2. The imputable gene-context pair will go through variant selection step. \n",
"3. Harmonize weights against LD reference and udpate SuSiE weight. \n",
"4. Extract weights by best model for the context then by the variant names were selected from the previous step\n",
"5. Output harmonized essential input data for causal TWAS(cTWAS) fine-mapping if `save_ctwas_data` was set to TRUE\n",
"\n",
"### Step 3: cTWAS analysis\n",
"\n",
"**FIXME: add more documentation here**\n",
"1. R package installation is required to run cTWAS section.\n",
" - remotes::install_github(\"xinhe-lab/ctwas\",ref = \"multigroup\")\n",
"3. Assemble on selected best performing TWAS prediction model, TWAS gene Z scores, GWAS SNP z scores, and load LD reference region information for by region cTWAS input. \n",
" - A variant selection is performed on variants based on `top_loci` table, SuSiE CS set, or twas weights effect size if any of `twas_weight_cutoff` (default=0), `cs_min_cor` (default=0), `min_pip_cutoff` (default=0), or `max_num_variants` (default=Inf) is specified other than their default value. \n",
"4. Estimate global group prior using from all regions\n",
"5. Screen for regions with evidence of causal indication and perform cTWAS/M-cTWAS fine-mapping regions passed screening. \n",
"\n",
"### Step 4: MR for candidate genes\n",
"\n",
Expand Down Expand Up @@ -148,33 +153,18 @@
"where\n",
"- `TSS`: Transcription start site\n",
"- `start` and `end`: start and end position of the gene from the [extended TADB window](https://github.com/cumc/xqtl-analysis/blob/main/resource/TADB_enhanced_cis.coding.bed) for cis-finemapping\n",
"- `is_imputable`: status for wether this gene-context pair has cross-validated performance with r-square > 0.01 and pvalue < 0.05 in at least one method for expression level prediction.\n",
"- `is_imputable`: status for wether this gene-context pair has cross-validated performance with r-square > 0.01 and pvalue < 0.05 in at least one method for expression level prediction.\n",
"- `rsq_cv`: cross validation test of r-square for a method.\n",
"- `pval_cv`: cross-validation test of predictive performance p-value for a method.\n",
"- `is_selected_method`: status of being best performing method (for each gene–context pair), we pick the best model with highest cross validation r-square with cross validation pvalue < 0.05\n",
"- `is_selected_method`: status of whether being best performing method (for each gene–context pair), we pick the best model with highest cross validation r-square with cross validation pvalue < 0.05\n",
"- `block`: The LD region where the gene’s transcription start site (TSS) is located, based on [predefined LD blocks](https://github.com/cumc/xqtl-data/blob/main/descriptor/reference_data/ld_reference.md). \n",
"\n",
"If `twas_z` is `NA` it means the context is not imputable for the method of choice\n",
"\n",
"II. a list of `refined_twas_weights_data` per block, in RDS format, of this structure:\n",
"\n",
"```\n",
"$ region_id\n",
" $ gene \n",
" $ context\n",
" $ selected_model\n",
" $ is_imputable \n",
" $ selected_top_variants\n",
" $ selected_model_weights\n",
"\n",
"```\n",
"The twas result table will only contain imputatable genes. It should come with a meta-data file like this:\n",
"\n",
"```\n",
"chrom start end block_id refined_twas_db\n",
"1 1000 5000 block1 block1.rds\n",
"2 2000 6000 block2 block2.rds\n",
"3 3000 7000 block3 block3.rds\n",
"chr\tmolecular_id\tTSS\t start\t end\t context\t gwas_study\t method is_imputable is_selected_method rsq_cv pval_cv twas_z twas_pval type block\t\n",
"1 ENSG00000060718 103108871 101000000 104000000 AC_DeJager_eQTL Bellenguez_EADB_2022 bayes_r TRUE TRUE\t 0.25 3.39e-39\t0.39 0.69 eQTL\t chr1_101384274_104443097\n",
"10 ENSG00000048740 10798396 9320000\t 12336675 Ast_DeJager_eQTL Bellenguez_EADI_2022 enet TRUE FALSE\t 0.16 2.72e-17 -0.21\t0.83 eQTL\t chr10_10500888_12817813\n",
"```"
]
},
Expand All @@ -196,7 +186,7 @@
"outputs": [],
"source": [
"sos run pipeline/twas_ctwas.ipynb twas \\\n",
" --cwd output/twas --name test \\\n",
" --cwd ./output --name test \\\n",
" --gwas_meta_data data/twas/gwas_meta_test.tsv \\\n",
" --ld_meta_data reference_data/ADSP_R4_EUR/ld_meta_file.tsv \\\n",
" --ld_reference_sample_size 17000 \\\n",
Expand Down Expand Up @@ -224,16 +214,83 @@
},
"outputs": [],
"source": [
"# Step1: region data assembly\n",
"sos run pipeline/twas_ctwas.ipynb ctwas \\\n",
" --cwd output/twas --name test \\\n",
" --cwd ./output --name test --name_suffix TEST2 \\\n",
" --thin 1 --prior_var_structure shared_all \\\n",
" --gwas_meta_data data/twas/gwas_meta_test.tsv \\\n",
" --ld_meta_data data/ld_meta_file_with_bim.tsv \\\n",
" --ld_reference_sample_size 17000 \\\n",
" --xqtl_meta_data data/twas/mwe_twas_pipeline_test_small.tsv \\\n",
" --twas_weight_cutoff 0 \\\n",
" --chrom 11 \\\n",
" --regions data/twas/EUR_LD_blocks.bed \\\n",
" --region-name chr10_80126158_82231647 chr11_84267999_86714492"
" --chrom 11 --gwas_study Bellenguez_2022 "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "SoS"
},
"outputs": [],
"source": [
"# Step2: estimate global parameters for either specific context group (single group: `--no-multi_group`) or shared group (multi-group: `--multi_group`)\n",
"sos run pipeline/twas_ctwas.ipynb ctwas \\\n",
" --run_param_est --skip_assembly --thin 1 \\\n",
" --prior_var_structure shared_all \\\n",
" --cwd ./output --name test --name_suffix TEST2 \\\n",
" --gwas_meta_data data/twas/gwas_meta_test.tsv \\\n",
" --ld_meta_data data/ld_meta_file_with_bim.tsv \\\n",
" --regions data/twas/EUR_LD_blocks.bed \\\n",
" --xqtl_meta_data data/twas/mwe_twas_pipeline_test_small.tsv \\\n",
" --gwas_study Bellenguez_2022"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "SoS"
},
"outputs": [],
"source": [
"# Step3: cTWAS causal fine-mapping for a specific region \n",
"sos run pipeline/twas_ctwas.ipynb ctwas \\\n",
" --run_finemapping --skip_assembly \\\n",
" --prior_var_structure shared_all \\\n",
" --cwd ./output \\\n",
" --name test --name_suffix TEST \\\n",
" --gwas_meta_data data/twas/gwas_meta_test.tsv \\\n",
" --ld_meta_data data/ld_meta_file_with_bim.tsv \\\n",
" --regions data/twas/EUR_LD_blocks.bed \\\n",
" --xqtl_meta_data data/twas/mwe_twas_pipeline_test_small.tsv \\\n",
" --gwas_study Bellenguez_2022 \\\n",
" --region-name chr15_63051119_66680537"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"#### cTWAS output\n",
"```\n",
"id\t molecular_id\ttype\tcontext\t group\t region_id\t z\t susie_pip\tmu\tcs\n",
"ENSG00000148429|eQTL_Inh_DeJager_eQTL\tENSG00000148429\teQTL\tInh_DeJager_eQTL\tInh_DeJager_eQTL|eQTL\t10_10500888_12817813\t-6.20856901762341\t0.0712177905257367\t-5.91566120772632\tL1\n",
"ENSG00000065665|eQTL_Ast_DeJager_eQTL\tENSG00000065665\teQTL\tAst_DeJager_eQTL\tAst_DeJager_eQTL|eQTL\t10_10500888_12817813\t-0.0382795446825138\t1.09919484625465e-09\t-0.0364735926885864\n",
"10:10504379:T:C \t10:10504379:T:C\t SNP\tSNP\t SNP\t 10_10500888_12817813\t1.03614457831325\t4.16455758767142e-12\t0.987261359280169\n",
"ENSG00000134463|pQTL_DLPFC_Bennett_pQTL\tENSG00000134463\tpQTL\tDLPFC_Bennett_pQTL\tDLPFC_Bennett_pQTL|pQTL\t10_10500888_12817813\t0.40824562102498\t1.76596504086746e-09\t0.388985413010046\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"### iiv. Quantile TWAS"
]
},
{
Expand Down
Loading