This document covers common issues and their solutions for both the training pipeline (Step 1) and validation/scoring workflow (Step 2).
Symptoms:

```
############################################################
ERROR: Running this pipeline on a head node is not allowed.
Please submit this as a job to the cluster or run it from
an interactive node.
############################################################
```

Cause: You are trying to run the pipeline on the Skyline/OpenOmics head node, which is not allowed.

Fix: Grab an interactive node first:

```bash
srun -N 1 -n 1 --time=8:00:00 --mem=64gb -c 4 --pty bash
export PATH="/data/openomics/prod/elasticQTL/v0.1.0/bin:${PATH}"
pipeline --config config/study.env
```

Symptoms:

```
bash: pipeline: command not found
```

Cause: The bin/ directory is not in your PATH.

Fix for the Skyline/OpenOmics install:

```bash
export PATH="/data/openomics/prod/elasticQTL/v0.1.0/bin:${PATH}"
```

Fix for a local clone:

```bash
cd /path/to/elasticQTL
export PATH="$(pwd)/bin:${PATH}"
```

To make this permanent, add the export line to your ~/.bashrc or ~/.bash_profile.
Symptoms:

```
bash: ./pipeline: Permission denied
```

Cause: The entrypoint scripts are not executable.

Fix:

```bash
chmod +x bin/pipeline bin/validate
```

Symptoms:

```
ERROR: Config file not found: config/study.env
```

Cause: The config file path is incorrect or the file doesn't exist.

Fix:

- Check that the file exists:

  ```bash
  ls -l config/study.env
  ```

- Use an absolute path:

  ```bash
  pipeline --config /full/path/to/config/study.env
  ```

- Copy from the template:

  ```bash
  cp config/params.template.env config/study.env
  ```
Symptoms:
- Elastic net step says "0 variants found"
- Annotation/merge steps fail
- Error: "No matching variants between files"
Cause: Inconsistent variant ID formats (e.g., .bim uses rsIDs but association results use chr:pos format).
Fix: Standardize variant IDs in PLINK2 before running the pipeline:

```bash
plink2 --bfile OLD_PREFIX \
  --set-all-var-ids @:#:\$r:\$a \
  --new-id-max-allele-len 50 \
  --make-bed \
  --out NEW_PREFIX
```

Then run the pipeline using --bfile NEW_PREFIX.

Alternative format options:

- `@:#` for chr:pos (no alleles)
- `@:#:\$r:\$a` for chr:pos:ref:alt (recommended)
- `@:#[hg38]\$r,\$a` for build-specific IDs
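If you're unsure whether the two sides already agree, a quick spot check can be sketched as follows (the file paths are illustrative, and the ID column position in GLM output can vary by PLINK2 version and flags):

```shell
# Print a few variant IDs from each side; the formats must match exactly.
# $1 = PLINK .bim file, $2 = PLINK2 *.glm.linear file (hypothetical paths).
check_id_formats() {
  echo "== .bim IDs (column 2) =="
  cut -f2 "$1" | head -n 3
  echo "== association result IDs =="
  # The ID column is often the 3rd in *.glm.linear output; adjust if needed.
  awk 'NR > 1 { print $3 }' "$2" | head -n 3
}
# e.g. check_id_formats OLD_PREFIX.bim 01_glm_qtl/qtl_assoc.pheno1.glm.linear
```

If one side shows rsIDs and the other chr:pos:ref:alt, re-export with --set-all-var-ids as above.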
Symptoms:
- Error: "Too few samples remaining after missingness filters"
- Warning: "XX% of samples dropped due to missing genotypes"
- Very low sample size in modeling step
Cause: Default strict filters (--geno 0 --mind 0) combined with --missing-policy error require perfect data.
Fix (Option 1): Relax genotype QC filters:

```bash
pipeline --config config/study.env \
  --geno 0.02 \
  --mind 0.05
```

Fix (Option 2): Allow sample dropping:

```bash
pipeline --config config/study.env \
  --missing-policy drop_samples
```

Fix (Option 3): Use mean imputation (if missingness is sparse):

```bash
pipeline --config config/study.env \
  --missing-policy mean_impute
```

Fix (Option 4): Let the pipeline decide automatically:

```bash
pipeline --config config/study.env \
  --missing-policy auto
```

See PIPELINE.md#missing-genotype-handling for detailed policy explanations.
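Before picking a policy, it helps to measure how much missingness you actually have. One way is to summarize the per-sample report produced by `plink2 --bfile PREFIX --missing` (a sketch; it assumes F_MISS is the 5th column of the .smiss file, which may vary by PLINK2 version):

```shell
# Summarize per-sample missingness from a PLINK2 .smiss file
# (assumed header: #FID IID MISSING_CT OBS_CT F_MISS).
summarize_smiss() {
  awk 'NR > 1 { n++; sum += $5; if ($5 > max) max = $5 }
       END { if (n) printf "samples=%d mean_F_MISS=%.4f max_F_MISS=%.4f\n", n, sum / n, max }' "$1"
}
# e.g. summarize_smiss qc_check.smiss
```

If the maximum F_MISS is well below your intended --mind threshold, drop_samples will remove nobody; a heavy tail argues for mean_impute or relaxed filters instead.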
Symptoms:
- Multiple TEST rows per variant (ADD, DOMDEV, etc.)
- Unexpected number of variants in association results
Cause: PLINK2 outputs multiple test types by default.
Fix: The pipeline automatically filters to TEST == "ADD" when parsing GLM output. No action needed unless you want to change the test type.
Manual verification:
# Check what TEST types are present
cut -f12 01_glm_qtl/qtl_assoc.*.glm.linear | sort -uSymptoms:
- Coefficient signs seem reversed
- Positive weights for protective alleles (or vice versa)
Cause: The counted allele in the .raw export may differ from the A1 allele in PLINK2 association results.
Fix: The pipeline writes allele_map_from_raw.tsv in 06_en_nested/ to document which allele the dosage is counting. Use this to verify alignment.
Check alignment:
# View allele mapping
head -n 20 06_en_nested/allele_map_from_raw.tsvThe .raw suffix allele (e.g., rs123_A) indicates which allele the dosage counts.
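If you want to pull those counted alleles straight out of the .raw header, something like this works (a sketch; it assumes plain `ID_ALLELE` header tokens, without the optional `(/ALT)` suffix that some exports append):

```shell
# List variant ID and counted allele from a .raw header
# (the first 6 columns are FID IID PAT MAT SEX PHENOTYPE).
raw_counted_alleles() {
  head -n 1 "$1" | tr -s ' \t' '\n' | tail -n +7 \
    | awk '{ n = split($0, p, "_"); allele = p[n];
             sub("_" allele "$", ""); print $0, allele }'
}
# e.g. raw_counted_alleles 05_genotypes/ld_variants_forEN.raw
```

The output (one `ID ALLELE` pair per line) can then be compared against the A1 column of the association results.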
Symptoms:

```
Error: cannot allocate vector of size X Gb
```

Cause: Not enough memory allocated for the job, especially with large variant sets.

Fix: Request more memory when grabbing an interactive node:

```bash
srun -N 1 -n 1 --time=8:00:00 --mem=128gb -c 8 --pty bash
```

Or reduce the variant set size:

```bash
pipeline --config config/study.env \
  --clump-r2 0.1 \
  --p-thresholds 0.1,0.2
```

Symptoms:
- Pipeline exits without error message
- Log shows step N completed but step N+1 never starts
Cause: Output file from previous step already exists and --force was not used.
Fix:
```bash
# Re-run with --force to overwrite existing outputs
pipeline --config config/study.env --force
```

Or manually remove the problematic output directory:

```bash
rm -rf /path/to/outdir/XX_stepname/
pipeline --config config/study.env
```

Symptoms:
- Step 4 log shows "0 variants after clumping"
- Downstream steps fail with empty variant lists
Cause: Clumping parameters are too stringent, or input variants are not in LD.
Fix (Option 1): Relax clumping parameters:

```bash
pipeline --config config/study.env \
  --clump-r2 0.5 \
  --clump-kb 1000
```

Fix (Option 2): Check that you have a reasonable number of input variants:

```bash
wc -l 01_glm_qtl/qtl_assoc.*.glm.linear
```

Fix (Option 3): Verify your candidate variant list is appropriate for QTL analysis.
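To gauge whether relaxing parameters can help at all, count how many variants clear each p-value threshold going into clumping (a sketch; it assumes P is the last column of the GLM output, which can vary by PLINK2 version and flags):

```shell
# Count variants with P below a threshold in a *.glm.linear file.
count_below_p() {
  awk -v p="$2" 'NR > 1 && $NF != "NA" && $NF + 0 < p { n++ } END { print n + 0 }' "$1"
}
# e.g. count_below_p 01_glm_qtl/qtl_assoc.pheno1.glm.linear 0.05
```

If very few variants pass even a lenient threshold, the problem is upstream of clumping.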
Symptoms:

```
Error in library(glmnet) : there is no package called 'glmnet'
```

Cause: The required R package is not installed.

Fix: On Skyline/OpenOmics, packages should be pre-installed. If running elsewhere:

```r
# In R console
install.packages(c("data.table", "glmnet"))
```

Symptoms:
- Only N-1 cohorts are scored when manifest has N cohorts
- No error message, last cohort silently missing
Cause: Cohort manifest file is missing a final newline character.
Fix:

```bash
# Check if file ends with newline (should show '0a')
tail -c 1 cohorts.tsv | od -An -tx1
# Add newline if needed
echo "" >> cohorts.tsv
```

Prevention: Always ensure manifest files end with a newline.
Symptoms:

```
PLINK ERROR: --bfile prefix '/path/to/file^M' not found
```

Cause: Windows-style line endings (CRLF) in manifest or config files.

Fix:

```bash
# Convert to Unix line endings
dos2unix cohorts.tsv config/validation.env
# Or use sed
sed -i 's/\r$//' cohorts.tsv
sed -i 's/\r$//' config/validation.env
```

Prevention: Edit files on Linux/Mac, or configure your editor to use LF line endings.
Symptoms:

- `N_Variants_Used` differs across cohort score files
- Some cohorts have many fewer variants than others

Cause: Different cohorts have different genotyping platforms or coverage. This is expected behavior.

Expected: Each cohort score file reports its own `N_Variants_Used`. This can differ when:

- A variant is absent from a cohort (not in that cohort's .bim)
- A variant is dropped during harmonization (e.g., an ambiguous SNP with allele disagreement)
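To compare coverage across cohorts at a glance, you can pull that field out of each score file. This is only a sketch: it assumes a tab- or space-delimited file whose header contains a `N_Variants_Used` column and reads the first data row; adjust to your actual score file layout.

```shell
# Print N_Variants_Used for each cohort score file passed as an argument.
n_variants_used() {
  for f in "$@"; do
    awk -v fname="$f" '
      NR == 1 { for (i = 1; i <= NF; i++) if ($i == "N_Variants_Used") c = i }
      NR == 2 && c { print fname, $c }' "$f"
  done
}
# e.g. n_variants_used 04_scores/*/*_scores.tsv
```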
Fix (if you need identical variant sets): Use strict intersection mode:

```bash
validate --config config/validation.env \
  --intersection-mode all
```

Inspect per-cohort QC:

```bash
# Review what happened to each variant
head -n 50 04_scores/wgs/wgs_qc_report.tsv
```

Symptoms:
- High `N_Dropped_Ambiguous` in score files
- Warning: "XX ambiguous SNPs dropped"

Cause: The model includes many A/T or C/G SNPs, and validation cohorts code these alleles differently.
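To see how exposed your model is before choosing a policy, count the strand-ambiguous variants in the relevant .bim (assumes the standard .bim layout with alleles in columns 5 and 6):

```shell
# Count A/T and C/G (strand-ambiguous) variants in a PLINK .bim file.
count_ambiguous() {
  awk '($5 == "A" && $6 == "T") || ($5 == "T" && $6 == "A") ||
       ($5 == "C" && $6 == "G") || ($5 == "G" && $6 == "C") { n++ }
       END { print n + 0 }' "$1"
}
# e.g. count_ambiguous /path/to/validation_cohort.bim
```

A handful of ambiguous SNPs is usually tolerable to drop; a large fraction argues for the long-term realignment fix below.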
Fix (Option 1 - Conservative): Accept the reduction (recommended):

```bash
validate --config config/validation.env \
  --ambiguous-policy drop
```

Fix (Option 2 - Permissive): Keep ambiguous SNPs (use with caution):

```bash
validate --config config/validation.env \
  --ambiguous-policy keep
```

Fix (Option 3 - Strict): Error on ambiguous mismatches:

```bash
validate --config config/validation.env \
  --ambiguous-policy error
```

Long-term fix: Realign validation cohorts to a reference genome and re-export with consistent strand coding.
Symptoms:

- Score files reference the `Model_Weight` column
- No `refit_model.rds` in the `03_refit/` directory

Cause: The refit step was skipped or failed.

Fix (if intentional): This is expected when using --skip-refit.

Fix (if unintentional): Check the logs to see why the refit failed:

```bash
cat logs/step_4_*.log
```

Then re-run without --skip-refit:

```bash
validate --config config/validation.env --force
```

Symptoms:
```
ERROR: Training .raw file not found: /path/to/file.raw
```

Cause: The path to the training genotype matrix is incorrect, or the training pipeline didn't complete successfully.

Fix: Verify the training pipeline completed Step 6:

```bash
ls -lh /path/to/training_outdir/05_genotypes/ld_variants_forEN.raw
```

Update TRAIN_RAW in your validation config to the correct path.
Symptoms:

```
ERROR: Cohort bfile not found: /path/to/cohort_prefix
PLINK ERROR: Failed to open /path/to/cohort_prefix.bed
```

Cause: A bfile path in the cohort manifest is incorrect, or the files don't exist.

Fix: Verify each bfile path in the manifest:

```bash
# Check each cohort
while read cohort bfile; do
  echo "Checking $cohort: $bfile"
  ls -lh ${bfile}.bed ${bfile}.bim ${bfile}.fam
done < cohorts.tsv
```

Update the paths in the manifest to point to existing files.
Symptoms:

- `N_Flipped` is unexpectedly high or low
- Uncertainty about whether harmonization is correct

Cause: The validation cohort uses a different allele coding or strand than the training data.

Fix: Review the QC report for each cohort:

```bash
head -n 50 04_scores/wgs/wgs_qc_report.tsv
```

Look at these columns:

- `Strand_Flip_Detected` — was a strand flip detected?
- `Dosage_Flipped` — was the dosage flipped for this variant?
- `Training_Counted_Allele` vs `Cohort_Counted_Allele` — which alleles are being counted?

See VALIDATION.md#understanding-allele-harmonization for a detailed explanation.
Symptoms:

```
ERROR: Intersection set is empty (0 variants)
```

Cause: No variants from the trained model are present in all cohorts (when using --intersection-mode all).

Fix (Option 1): Use a less strict intersection mode:

```bash
validate --config config/validation.env \
  --intersection-mode any
```

Fix (Option 2): Use a single cohort as the reference:

```bash
validate --config config/validation.env \
  --intersection-mode cohort:wgs
```

Fix (Option 3): Check variant ID consistency:

```bash
# Compare variant IDs between training and validation
head -n 20 00_model/model_variants_chrpos.tsv
head -n 20 /path/to/validation_cohort.bim
```

You may need to standardize IDs (see Issue 5).
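A quick way to quantify the overlap per cohort, rather than eyeballing the heads, is a set intersection (a sketch; it assumes variant IDs sit in column 1 of the model list and column 2 of each .bim, and uses bash process substitution):

```shell
# Count model variants present in a cohort's .bim.
overlap_count() {
  comm -12 <(cut -f1 "$1" | sort -u) <(cut -f2 "$2" | sort -u) | wc -l
}
# e.g. overlap_count 00_model/model_variants_chrpos.tsv /path/to/validation_cohort.bim
```

An overlap of 0 for every cohort almost always means the ID formats differ rather than the variants being truly absent.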
Symptoms:
- Loading PLINK2 module unloads PLINK 1.9 (or vice versa)
- "Command not found" for one PLINK version during pipeline run
Cause: HPC module system has conflicting PLINK modules.
Fix: Enable module management in the pipeline:
In your config file:
```bash
USE_MODULES=1
MODULE_PLINK1=plink/1.9
MODULE_PLINK2=plink/2.0
MODULE_R=R/4.2
MODULE_INIT=AUTO
```

The pipeline will automatically `module load` the correct version before each step.
Symptoms:

```
ERROR: Cannot find module initialization script
```

Cause: MODULE_INIT=AUTO detection failed.

Fix: Manually specify the module init script:

```bash
MODULE_INIT=/usr/share/modules/init/bash
```

Common paths:

- /usr/share/modules/init/bash
- /etc/profile.d/modules.sh
- /usr/share/Modules/init/bash
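A small loop can locate the init script for you (a sketch; pass whichever candidate paths are plausible on your system):

```shell
# Print the first existing module init script from a list of candidates.
find_module_init() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  echo "no module init script found among: $*" >&2
  return 1
}
# e.g. find_module_init /usr/share/modules/init/bash /etc/profile.d/modules.sh /usr/share/Modules/init/bash
```

Set MODULE_INIT to whatever path it prints.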
Symptoms:
- "Singularity not found"
- Container pull failures
Cause: Pipeline may be configured for containerized execution (not typical for Skyline/OpenOmics install).
Fix: Ensure PLINK and R are available in your PATH or via modules. The Skyline/OpenOmics install should have these pre-configured.
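A quick preflight check for that requirement can be sketched as follows (list whichever executables your install actually expects; the names below are typical, not confirmed by this pipeline's docs):

```shell
# Report any required tool that is not on PATH.
check_tools() {
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || echo "MISSING: $tool"
  done
}
# e.g. check_tools plink plink2 Rscript
```

No output means everything was found.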
Always test your command with --dry-run first:

```bash
pipeline --config config/study.env --dry-run
validate --config config/validation.env --dry-run
```

This shows what commands will be executed without actually running them.
Each step writes a log file to logs/:

```bash
# View most recent log
ls -lt logs/ | head
# Check for errors and warnings
grep -i error logs/*.log
grep -i warning logs/*.log
```

Test data preparation without running expensive modeling:

```bash
# Training: run through genotype export only
pipeline --config config/study.env --to-step 6
# Validation: run matching only
validate --config config/validation.env --to-step 2
```

The training pipeline writes all parameters to manifest/params_used.txt:

```bash
cat manifest/params_used.txt
```

This is helpful for reproducing runs or debugging parameter issues.
Common format issues:

```bash
# Check for CRLF line endings
file cohorts.tsv
# Should say "ASCII text", NOT "ASCII text, with CRLF line terminators"

# Check phenotype file has correct columns
head -n 2 phenotypes.tsv

# Check PLINK files are readable
plink2 --bfile PREFIX --freq
```

If you encounter an issue not covered here:

- Check the logs in the logs/ directory
- Review the parameters in manifest/params_used.txt
- Try dry-run mode to see what commands will execute
- Simplify — test with a smaller dataset or fewer steps
When reporting a bug, please include:

- The config file used (redact sensitive paths if needed)
- Relevant log files from logs/
- manifest/params_used.txt (for the training pipeline)
- A description of what you expected vs. what happened
See also:

- PIPELINE.md — training workflow documentation
- VALIDATION.md — validation/scoring workflow documentation
- INPUT_FORMATS.md — input file format specifications