PEPATAC 0.14.0 by jpsmith5 · Pull Request #328 · databio/pepatac

jpsmith5 · 2026-05-28T13:08:29Z

Release 0.14.0. See docs/changelog.md for full Added/Changed/Fixed/Removed entries.

Bumps [pytest](https://github.com/pytest-dev/pytest) from 3.1.3 to 9.0.3.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@3.1.3...9.0.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Bump pytest from 3.1.3 to 9.0.3

- Drop version+build pins on ucsc-bedgraphtobigwig, ucsc-bedtobigbed, ucsc-bigwigmerge, ucsc-stringify in requirements-conda.yml. The pinned 377 builds required openssl 1.1.1 and conflicted with the env's openssl 3.x. Letting the solver pick a compatible build resolves the install. (#321)
- Add r-argparser to requirements-conda.yml; tools/PEPATAC_summarizer.R needs it but it was not in the conda env. (#228)
- Add r-r.utils to requirements-conda.yml and R.utils to PEPATACr Imports; required by data.table::fread for .bed.gz peak coverage files. (#229)
- Pin GenomicDistributions (>= 1.4.6) and GenomicDistributionsData (>= 1.0.0) in PEPATACr Imports to avoid the chromSizes_hg38 / TSS_hg38 namespace mismatch when one side is upgraded without the other. (#230)
- Fix curl-into-variable bug in checkinstall at three sites: the REQS / PIPELINE fallback branches stored file contents in a variable that was used as a path downstream. Switched to mktemp + curl -o for the URL fallback, assigned the path directly for the local branch. (#226)
- Drop dead run-container.md link from docs/install.md and refresh intro now that containers were removed in 0.12.0.

Closes #228, #229, #230, #321, #226.
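
The checkinstall fix (#226) comes down to keeping a path, not file contents, in the variable that later code treats as a path. The script itself is shell (mktemp + curl -o); the following is only a minimal Python sketch of the same pattern, and the function name `resolve_requirements` and its arguments are hypothetical:

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def resolve_requirements(local_path, fallback_url):
    """Return a path to the requirements file, never its contents."""
    local = Path(local_path)
    if local.is_file():
        # Local branch: assign the path directly.
        return local
    # URL fallback: download into a temp file (the mktemp + curl -o idea)
    # and hand back the temp file's path.
    fd, tmp_name = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as out, urllib.request.urlopen(fallback_url) as resp:
        shutil.copyfileobj(resp, out)
    return Path(tmp_name)
```

Either branch returns something that downstream code can open as a file, which is the property the original bug violated.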

…-dedup

- _align() referenced an undefined in the single-end + bwa branch paired chain (minus filter_pair) for both --keep and no-keep paths: pm.run([cmd1, cmd2, cmd3, cmd4], <target>). (#299)
- Fix missing pm.fail_pipeline in the unmap_fq1 branch of the filter_paired_fq.pl handle check; previously a stuck filter on R1 set an error string that was never raised. Reworked the error message into a shared template that points at the underlying psutil introspection issue and recommends both --keep and --noFIFO as workarounds. (#234)
- Add --skip-dedup flag for protocols where duplicates are biologically meaningful (CUT&Tag, CUT&RUN). When set: copy mapping_genome_bam to _sort_dedup.bam so downstream peak calling finds the expected path; report Duplicate_reads=0 and pass through Dedup_aligned_reads/Dedup_alignment_rate/Dedup_total_efficiency from the pre-dedup metrics. Plumbed through sample_pipeline_interface.yaml so it can be set per-sample. (#249)
- Drop redundant Time/Success keys from pepatac_output_schema.yaml (both samples: and project: blocks). These are pipestat's auto-tracked status fields and the duplicate declaration triggered "SchemaError: Overlap between project- and sample-level keys" on newer pipestat. (#322, #305)
- Fix _LOGGER NameError in tools/bamQC.py and bamSitesToWig.py: the variable was only defined inside , so pararead workers re-importing the module under multiprocessing 'spawn' (macOS default) hit NameError when class methods logged. Added a module-level fallback logger above each class definition. (#266)
- Fix peakCounts() ref-peaks ignored when *_peaks_coverage.bed.gz coexists with *_ref_peaks_coverage.bed: the shared variable preferred .bed.gz from the regular peaks file and then looked for a non-existent _ref_peaks_coverage.bed.gz, falling through to the "not derived from a singular reference peak set" warning. Detect ref vs regular extensions independently. (#218, #219)
- Guard refgenie[sample.genome] lookups in sample_pipeline_interface.yaml with , so projects with non-refgenie genomes (e.g. galGal6, bosTau9) no longer crash the Jinja template with an attmap AttributeError; instead they fall through to the per-sample paths or error cleanly from pepatac.py. (#231)
- Fix plotAnno() empty-input fallback path bug: was constructing file.path(<output_file>, "<sample>_partition_dist.pdf") (treating the output pdf as a directory) and quit()ing the R session. Replaced with return(ggplot()), matching the function's other empty-data branches; the caller writes a clean blank placeholder at the expected target. (#232)

Closes #299, #234, #249, #322, #305, #266, #218, #219, #231, #232.

…ps3dp/tools/refgenie_config.yaml workaround

- faq.md: expand the TSSE entry to name the refgene_anno asset / UCSC RefGene as the source of TSS coords and note that the cutoff-of-6 threshold is hg38-tuned and empirical. Point at ENCODE ATAC-seq data standards for per-assembly reference numbers. (#235)
- assets.md: add a Using a custom adapter file subsection documenting the adapters resource override in pipelines/pepatac.yaml. (#252)
- assets.md: document the /home/jps3dp/tools/refgenie_config.yaml-required-even-with-manual-paths quirk and the empty-refgenie-config workaround. The proper fix is in the in-progress refgenie 1.0 migration (PR #327). (#251)
- count_table.md: make the per-sample PEPATAC_completed.flag handling explicit in the consensus-peak-set count table workflow. Two paths: delete the flag files (one-liner with find -delete) or pass --ignore-flags to looper run. (#215)
- assets.md: troubleshooting subsection for TypeError: 'NoneType' object is not iterable, root-caused to incomplete refgenie assets (commonly missing prealignment FASTA), with diagnostic and fix commands. The error itself is upstream refgenconf behavior; replaced by the refgenie 1.0 migration (PR #327). (#216)
- glossary.md: document column formats for _peaks_coverage.bed (8 columns) and _ref_peaks_coverage.bed (15 columns; narrowPeak coordinates + bedtools coverage stats + normalized count). (#233)
- assets.md: Running a non-refgenie genome through looper subsection: sample_modifiers/imply pattern with chrom_sizes, genome_index, etc. set per-sample. (#231 docs portion)

Closes #235, #252, #251, #215, #216, #233.

Two distinct breakages in the integration-test runner that both surface immediately when running ./tests/scripts/test-integration.sh against a fresh bulker install:

1. Default crate name didn't match what bulker actually caches. The scripts defaulted PEPATAC_TEST_BULKER_CRATE to local/bulker_manifest, but bulker caches tests/bulker_manifest.yaml as bulker/pepatac:1.1.1 -- it auto-namespaces the manifest's name: pepatac field under bulker/. bulker crate list | grep -q local/bulker_manifest therefore always failed, even immediately after a successful bulker crate install tests/bulker_manifest.yaml, leaving the runner wedged in a "crate not cached" loop. services.sh's usage banner and tests/README.md's env-var table both advertised yet a *third* name (databio/pepatac), used nowhere in the code -- doc drift. Realigned all three to bulker/pepatac to match what bulker caches.

2. PATH extraction via bulker activate --echo was format-fragile and broke on bulker 0.0.15 (the Rust rewrite). test-integration.sh and services.sh both invoked bulker activate --echo | grep ^export PATH= | sed ... | cut ... to fish the crate's shim directory out of bulker's activation output. On bulker 0.0.15, --echo errored (argument --echo cannot be used multiple times -- likely a clap quirk in this version), and even when it worked the parse was brittle to quoting/formatting changes between bulker releases. Switched to bulker exec <crate> -- <cmd>, which lets bulker manage PATH for the duration of one command. The pytest invocation in test-integration.sh and the per-tool which check in services.sh now both run inside bulker exec, with no PATH scraping at all.

Files:
- tests/scripts/test-integration.sh: default BULKER_CRATE to bulker/pepatac; drop the activate-and-extract-PATH block; run pytest via bulker exec.
- tests/scripts/services.sh: default BULKER_CRATE to bulker/pepatac; fix usage banner; rewrite check_tools to probe each tool via bulker exec ... -- which <tool>.
- tests/README.md: update env-var table to bulker/pepatac.
- tests/integration/{conftest,test_looper_run,test_end_to_end}.py: refresh docstring prereq lines.

Regression from f20c354 (clean up integration tests).
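
The per-tool probe described above (bulker exec ... -- which <tool>) can be sketched in Python as follows. The actual runner scripts are shell; the helper names `bulker_exec_argv` and `tool_in_crate` are hypothetical, and only the `bulker exec <crate> -- <cmd>` form is taken from this commit message:

```python
import subprocess


def bulker_exec_argv(crate, *cmd):
    """Build the argv for `bulker exec <crate> -- <cmd>`."""
    return ["bulker", "exec", crate, "--", *cmd]


def tool_in_crate(crate, tool):
    """True if `which <tool>` succeeds inside the crate.

    Returns False if the probe fails or bulker itself cannot be started.
    """
    try:
        result = subprocess.run(
            bulker_exec_argv(crate, "which", tool),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False
    return result.returncode == 0
```

The point of the design is that nothing parses bulker's output; only the exit status of the wrapped command is inspected.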

Adds testthat coverage for the two PEPATACr bugs fixed in 8bda2e4 so they can't be silently reintroduced.

- Hoist peakCounts()'s local detect_ext closure to a package-internal helper .detectPeakCoverageExt(suffix, results_subdir, sample_names, genomes). Same behavior, no functional change; just makes the ext-detection logic unit-testable without reconstructing the full peakCounts() pipeline (which would need valid peak data, chrom sizes, and full sample_table setup).
- tests/testthat/test-peakCounts-ext.R: seven scenarios exercising the helper directly, including the exact mixed-state bug from #218/#219 (*_peaks_coverage.bed.gz from the initial sample run alongside *_ref_peaks_coverage.bed from the --frip-ref-peaks re-run), the inverse mixed state, multi-genome sample tables, and the no-match / warning fall-through path.
- tests/testthat/test-plotAnno.R: three scenarios for the empty-input fallback from #232: missing input file, empty (size-0) input file, and the full caller pattern (pdf() / print(plotAnno(...)) / dev.off()) producing a non-empty placeholder pdf at the expected target path. Asserts the old bug's spurious file.path(output_pdf, ...) touch target is NOT created, so the fix can't silently regress.

Two distinct breakages in the integration-test runner that both surface immediately when running ./tests/scripts/test-integration.sh against a fresh bulker install:

1. Default crate name didn't match what bulker actually caches. PEPATAC_TEST_BULKER_CRATE defaulted to 'local/bulker_manifest', but bulker auto-namespaces the manifest's name: field and caches as 'bulker/pepatac:1.1.1'. The bulker crate list | grep check therefore always failed even right after a successful bulker crate install.

2. PATH extraction via bulker activate --echo was format-fragile and broke on bulker 0.0.15. Switched to bulker exec, which lets bulker manage PATH for the duration of one command, with no output parsing at all.

Realigned both scripts to default to 'bulker/pepatac'; rewrote test-integration.sh's pytest invocation and services.sh's check_tools to use bulker exec. Refreshed the README env-var table, services.sh usage banner, and the docstring prereq lines in conftest.py / test_looper_run.py / test_end_to_end.py for consistency.

Regression from f20c354 (clean up integration tests).

Both test-integration.sh's "install if not cached" gate and services.sh's check_crate_cached probed for crate availability via:

    bulker crate list 2>/dev/null | grep -q

But bulker crate list prints crate name and tag in *separate* whitespace-delimited columns, so a literal bulker/pepatac:1.1.1 grep never matches even when the crate is freshly cached. test-integration.sh silently fell through to "install from local manifest" every run (harmless, just slow). services.sh hard-errored with "Bulker crate ... is not cached" immediately after a successful install printed "Cached: bulker/pepatac:1.1.1", which was the user-visible wedge.

Replaced both grep checks with , which directly tests the operation we actually care about. The hint text in services.sh's error path now points at (the manifest file path that bulker can actually load) rather than echoing back the cache key, which bulker would 404 on against hub.bulker.io.

Surfaced after 28edeec (Fix integration-test bulker crate default to include tag) tightened the default to the full identifier -- that change exposed the previously-masked format mismatch in the list-grep check.
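
To illustrate why the literal match could never succeed, here is a small Python sketch. The listing text is a hypothetical stand-in for bulker crate list output (only the "name and tag in separate whitespace-delimited columns" property comes from the commit message), and `crate_listed` is a hypothetical helper. The commit itself replaced the grep with a different check rather than parsing columns:

```python
def crate_listed(listing, crate):
    """Match 'namespace/name:tag' against rows of '<name> <tag>' columns."""
    name, _, tag = crate.partition(":")
    for line in listing.splitlines():
        fields = line.split()
        if not fields or fields[0] != name:
            continue
        # No tag requested: the name alone is enough. Otherwise compare
        # the tag against the second column.
        if not tag or (len(fields) >= 2 and fields[1] == tag):
            return True
    return False


# Hypothetical listing; the real column layout may differ.
listing = "bulker/pepatac    1.1.1\n"

# A literal substring test (what grep -q did) fails because the listing
# never contains "name:tag" joined by a colon.
literal_hit = "bulker/pepatac:1.1.1" in listing
column_hit = crate_listed(listing, "bulker/pepatac:1.1.1")
```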

…lved one

After bulker-side fixes landed (4c56b39, 28edeec, 9123d8c), the integration runner finally reached pytest -- only to die immediately with "No module named pytest". The python3 inside bulker exec resolved to the host's miniforge base install rather than the caller's active conda env (pepatac-env in this case), which is the one that actually has pytest + pypiper + the rest of pepatac's requirements installed.

Cause: python3 is declared as a host_command in tests/bulker_manifest.yaml, so bulker forwards it unmodified to the host. But bulker's PATH ordering inside bulker exec walks the system PATH in whatever order and picks the first python3 it finds -- which on a typical HPC node is miniforge base, not the user's currently activated conda env.

Fix: capture /home/jps3dp/anaconda3/bin/python3 BEFORE entering bulker exec, then pass that absolute path through to the exec'd command. Python locates its own site-packages from sys.prefix based on executable path, so the conda-env python keeps its installed packages regardless of how it's invoked.

Also adds an early-exit ERROR if no python3 is on PATH at all (otherwise the failure mode is a confusing "No module named pytest" inside bulker exec several seconds later), and prints the resolved python path in the runner banner so it's clear which interpreter the tests are running under.

Commit 598aa3b captured command -v python3 before entering bulker exec, on the assumption that the caller's active env would be first on PATH. That assumption holds on developer machines where conda activate is the last PATH-mutating step. On HPC nodes where a module load miniforge fires AFTER conda activate pepatac-env, the module's PATH prepend buries the conda env's bin/ behind miniforge base -- command -v python3 returns miniforge (no pytest installed) even though the prompt says (pepatac-env) and CONDA_PREFIX points at the env.

Rather than picking a single env var or PATH lookup as authoritative, walk a small candidate list and pick the FIRST python that actually imports pytest:

    PYTHON_CANDIDATES=()
    [ -n /home/jps3dp/anaconda3 ] && PYTHON_CANDIDATES+=(/home/jps3dp/anaconda3/bin/python3)
    [ -n ] && PYTHON_CANDIDATES+=(/bin/python3)
    PYTHON_CANDIDATES+=(/home/jps3dp/anaconda3/bin/python3)
    PYTHON_CANDIDATES+=(/home/jps3dp/anaconda3/bin/python)
    for candidate in ; do
      [ -x ] || continue
      if -c import pytest >/dev/null 2>&1; then
        ACTIVE_PYTHON=
        break
      fi
    done

If nothing in the list imports pytest, the script errors out with the exact list of candidates that were tried -- a clearer failure than the previous "No module named pytest" buried inside bulker exec output.
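
The candidate walk can also be expressed in Python. This is a minimal sketch of the idea only (the runner is a shell script); the function names `can_import`, `pick_python`, and `default_candidates` are hypothetical, and the candidate sources shown (CONDA_PREFIX, then PATH lookups) are an assumption based on the prose above rather than a copy of the script's list:

```python
import os
import shutil
import subprocess
import sys


def can_import(python, module):
    """True if `python` is an executable file that can import `module`."""
    if not python or not os.path.isfile(python) or not os.access(python, os.X_OK):
        return False
    try:
        result = subprocess.run(
            [python, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False
    return result.returncode == 0


def pick_python(candidates, module="pytest"):
    """Return the first candidate interpreter that imports `module`, else None."""
    for candidate in candidates:
        if can_import(candidate, module):
            return candidate
    return None


def default_candidates():
    """Assumed candidate order: active conda env first, then PATH lookups."""
    candidates = []
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        candidates.append(os.path.join(conda_prefix, "bin", "python3"))
    for name in ("python3", "python"):
        found = shutil.which(name)
        if found:
            candidates.append(found)
    return candidates


if __name__ == "__main__":
    tried = default_candidates()
    chosen = pick_python(tried)
    if chosen is None:
        # Report exactly which interpreters were tried, as the commit describes.
        sys.exit("ERROR: no candidate python imports pytest; tried: " + ", ".join(tried))
    print(chosen)
```

Testing each interpreter by actually importing the module avoids trusting any single environment variable or PATH position.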

- tools/pepatac_summarizer/: Python package with CLI, consensus peak calling via gtars, peak counts, and summary plots
- pipelines/pepatac_collator.py: --summarizer python|R dispatch, defaults to python
- Remove obsolete PEPATACr R tests
- tests/test_summarizer.py: unit tests
- tests/test_summarizer_integration.py: integration tests

- Add plot_tss_distance using TssIndex.from_regionset.calc_tss_distances; wire into pepatac.py anno block (replaces R placeholder)
- Fix plot_frif to sum read counts from bedtools coverage outputs
- Reorder plot_partition_distribution to horizontal stacked bar with inline percent labels; add natural chrom sort + canonical chrom filter
- Add fragment-distribution median; add chrom/tssdist/part/frif CLI subcommands
- Align PartitionList.from_gtf defaults to R's GenomicDistributions: core_prom=100, prox_prom=2000 (was 2000/10000)
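
For readers unfamiliar with the "natural chrom sort + canonical chrom filter" item, a minimal Python sketch follows. The commit does not say how pepatac_summarizer defines "canonical" or orders sex/mitochondrial chromosomes, so the regex, the X/Y/M ordering, and the function names here are assumptions for illustration only:

```python
import re

# Assumption: "canonical" means chr<number>, chrX, chrY, or chrM.
_CANONICAL = re.compile(r"^chr(\d+|[XYM])$")
# Assumption: non-numeric chromosomes sort X, Y, M after the autosomes.
_NAMED_RANK = {"X": 0, "Y": 1, "M": 2}


def is_canonical(chrom):
    """True for names like chr1 or chrX; False for scaffolds and alts."""
    return bool(_CANONICAL.match(chrom))


def natural_key(chrom):
    """Sort key that orders chr2 before chr10, with named chroms last."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    return (1, _NAMED_RANK.get(name, len(_NAMED_RANK)), name)


chroms = ["chr10", "chr2", "chrM", "chrX", "chr1",
          "chrUn_KI270302v1", "chr1_KI270706v1_random", "chrY"]
ordered = sorted(filter(is_canonical, chroms), key=natural_key)
print(ordered)
```

A plain lexicographic sort would place chr10 before chr2, which is why a numeric-aware key is used for plot axes.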

nsheff and others added 30 commits April 11, 2026 22:12

clean up integration tests

f20c354

Merge pull request #326 from databio/dependabot/pip/pytest-9.0.3

027b757

Bump pytest from 3.1.3 to 9.0.3

Fix duplicate pytest_addoption registration on user-targeted runs

b0b8d7a

Force --compute local in run_looper_pipeline fixture

1afdca2

Use to select compute package, not

ae7df6c

Schema: allow null thumbnail_path for FastQC report objects

cd43eb3

Set PYTHONNOUSERSITE=1 to stop host user-site from leaking into bulker exec

889a506

Propagate PYTHONNOUSERSITE through apptainer container boundary

f366b2a

Use raw strings in re.sub patterns to silence Python 3.12 SyntaxWarning

30cd8ea

Make R library load failures visible; stop host R libs leaking into bulker

e713bbc

Block host ~/.Renviron / ~/.Rprofile from being sourced by container R

a14a20e

Guard --lite cleanup against None paths from skipped QC steps

5cc7350

update Time/Success reporting based on updated pipestat functionality

4285312

update minimum refgenconf

a41d8ef

Test infra: use databio/pepatac:1.1.3 from bulker hub; clarify runtime

75798c3

drafts at changing qc backend

b49a5de

track and update changes

5d1cd67

Pin gtars>=0.6.1 in requirements.txt; document HPC test gotchas

55b76c4

jpsmith5 added 11 commits May 19, 2026 06:56

Bump rust-gtars conda pin from 0.2.4 to 0.8.0

ca81db9

bump version

1208065

Add SE+bwa, --skip-dedup, and --qc-backend gtars integration tests

ee2c774

Make importable when pepatac.py runs as a script

2cac7f8

Propagate /home/jps3dp/anaconda3/lib through LD_LIBRARY_PATH for compiled extensions

5e2575c

Keep LD_LIBRARY_PATH on the host only; don't propagate into container

504bbb7

test_end_to_end: drop TSS PDF assertion in gtars test (refgene_tss optional)

1fa5571

expand bulker install and run directions for clarity

16c18bb

test regression coverage for looper without /home/jps3dp/tools/refgenie_config.yaml (#251)

0e1be1a

dont inherit host envvar

13da61b

bump looper to 2.1.1, yacman to 1.0.0, align conda recipe

21be24d

jpsmith5 requested a review from nsheff May 28, 2026 13:08

jpsmith5 and others added 10 commits June 1, 2026 06:28

Anchor pytest rootdir at project root via pytest.ini

40618e4

bump bulker crate version

5fe87fd

PEPATACr: migrate deprecated ggplot2 size→linewidth; declare reshape2.

cc3d891

put tools/ on PYTHONPATH so the default Python pepatac_summarizer resolves during runp.

639b8ac

use bedtools for counts until gtars releases from_bam in future API

58e0720

silence the pandas fillna/replace downcasting FutureWarning

76a9fd3

update R install paths (e.g. BiocManager GenomicDistributions); Rust-bulker syntax; checkinstall fixes

3578d8c

pin versions

a08a933

move mitochondrial chromosome list to be user editable while keeping original hard-coded for redundancy and backwards compatibility

f9e6f0e

gtars QC: skip TSS dist plot when no TSS asset

7d352ba

PEPATAC 0.14.0 #328
jpsmith5 wants to merge 51 commits into master from dev


2 participants
