
Falcon debug #46

Open
XianghuWang-287 wants to merge 4 commits into master from falcon_debug


@XianghuWang-287

Collaborator

No description provided.

XianghuWang-287 and others added 4 commits May 21, 2026 12:11
Falcon's CSV doesn't carry per-spectrum precursor intensity, so
convert_falcon_to_mscluster_format.py was hardcoding #PrecIntensity to 0
for every row. That zeroed out sum(precursor intensity) in
clustersummary.tsv and turned every cell of
featuretable_reformatted_precursorintensity.csv into 0.0 when the
clustering tool was falcon. mscluster's binary writes #PrecIntensity
itself from the input mzML, so the precursor-intensity feature table
was populated correctly for mscluster runs.
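
To make the propagation concrete, here is a minimal sketch of a per-cluster "sum(precursor intensity)" aggregation. The column names `#ClusterIdx` and `#PrecIntensity` are assumed to follow mscluster's clusterinfo layout, and `sum_precursor_intensity` is a hypothetical helper, not the pipeline's actual summary code. With every row hardcoded to 0, every cluster total is 0.0 regardless of the underlying data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def sum_precursor_intensity(rows: Iterable[Mapping[str, object]]) -> Dict[int, float]:
    """Sum #PrecIntensity per #ClusterIdx over clusterinfo-style row dicts (hypothetical)."""
    totals: Dict[int, float] = defaultdict(float)
    for row in rows:
        totals[int(row["#ClusterIdx"])] += float(row["#PrecIntensity"])
    return dict(totals)


# Falcon path before the fix: #PrecIntensity hardcoded to 0 for every row.
falcon_rows = [
    {"#ClusterIdx": 1, "#PrecIntensity": 0},
    {"#ClusterIdx": 1, "#PrecIntensity": 0},
    {"#ClusterIdx": 2, "#PrecIntensity": 0},
]
# Same clusters with real (illustrative) intensities, as mscluster would write them.
mscluster_rows = [
    {"#ClusterIdx": 1, "#PrecIntensity": 1500.0},
    {"#ClusterIdx": 1, "#PrecIntensity": 820.25},
    {"#ClusterIdx": 2, "#PrecIntensity": 300.0},
]

print(sum_precursor_intensity(falcon_rows))     # {1: 0.0, 2: 0.0}
print(sum_precursor_intensity(mscluster_rows))  # {1: 2320.25, 2: 300.0}
```
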

Look up the value from the original input mzML/mzXML the same way
mscluster does, using ming_spectrum_library, and fall back to 0 only
when the file or scan isn't available.
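
The fallback rule can be sketched as follows. This is an illustration under stated assumptions, not the PR's helper: `fill_prec_intensity` and the index shape (`{input file basename: {scan: intensity}}`) are hypothetical, and the `#Filename` / `#Scan` / `#PrecIntensity` column names are assumed from mscluster's clusterinfo format. A row gets 0.0 only when its file or its scan is absent from the index.

```python
import os
from typing import Dict, List, Mapping

# Assumed (hypothetical) index shape: {input file basename: {scan number: precursor intensity}}.
IntensityIndex = Mapping[str, Mapping[int, float]]


def fill_prec_intensity(rows: List[Dict[str, object]], index: IntensityIndex) -> List[Dict[str, object]]:
    """Set #PrecIntensity from the index; default to 0.0 only if the file or scan is missing."""
    for row in rows:
        per_file = index.get(os.path.basename(str(row["#Filename"])))
        if per_file is None:
            # Input file not available: fall back to 0.0 for this row.
            row["#PrecIntensity"] = 0.0
            continue
        # Scan not present in that file: also fall back to 0.0.
        row["#PrecIntensity"] = float(per_file.get(int(row["#Scan"]), 0.0))
    return rows


index = {"sample_a.mzML": {10: 1500.0, 11: 820.25}}
rows = [
    {"#Filename": "spectra/sample_a.mzML", "#Scan": 10},
    {"#Filename": "spectra/sample_a.mzML", "#Scan": 12},
    {"#Filename": "spectra/missing.mzML", "#Scan": 3},
]
print([r["#PrecIntensity"] for r in fill_prec_intensity(rows, index)])  # [1500.0, 0.0, 0.0]
```

Keying the index by basename is a design choice in this sketch, so that a relative path such as `spectra/sample_a.mzML` in the cluster table still matches the input file.
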

ming_spectrum_library.load_mzml_file unconditionally calls
float(activation["collision energy"]) and raises KeyError on any MS2
spectrum that lacks the "collision energy" CV term. In the lookup helper
this aborted parsing of the entire mzML (verified against
benchmark_featurefinding/MSV000080555/C7_RC7_01_8270.mzML), so the helper
silently fell back to 0 intensity for every spectrum in that file,
reintroducing the bug this change is supposed to fix.
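
The failure mode can be reproduced in miniature. This is not ming_spectrum_library's code: `parse_via_full_loader`, `parse_intensity_only`, and `file_intensities` are hypothetical stand-ins, and spectra are plain dicts rather than parsed mzML. A KeyError on a field the lookup never needed, caught at file level, zeroes every intensity in the file; a parser that reads only the intensity is unaffected.

```python
from typing import Callable, Dict, List

Spectrum = Dict[str, object]


def parse_via_full_loader(spec: Spectrum) -> float:
    # Mimics a loader that converts every field, needed or not.
    _collision_energy = float(spec["activation"]["collision energy"])  # KeyError if absent
    return float(spec["intensity"])


def parse_intensity_only(spec: Spectrum) -> float:
    # Touches only the field the lookup actually needs.
    return float(spec["intensity"])


def file_intensities(spectra: List[Spectrum], parse: Callable[[Spectrum], float]) -> List[float]:
    """Parse a whole file; on any KeyError, fall back to 0.0 for every spectrum in it."""
    try:
        return [parse(s) for s in spectra]
    except KeyError:
        return [0.0] * len(spectra)


spectra = [
    {"activation": {"collision energy": "35.0"}, "intensity": 1200.0},
    {"activation": {}, "intensity": 980.0},  # no "collision energy" term
]
print(file_intensities(spectra, parse_via_full_loader))  # [0.0, 0.0]
print(file_intensities(spectra, parse_intensity_only))   # [1200.0, 980.0]
```
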

Read the mzML/mzXML directly with pyteomics in the helper and pull
only the field we need (the precursor "peak intensity" CV term for mzML,
@precursorIntensity for mzXML). Verified on three benchmark files
(C6_GC6_01_2916, C7_RC7_01_8270, WE_Bio_T1; 5357 MS2 spectra total): the
lookup matches an independent pyteomics read exactly, including nonzero
values such as 439017.71875.
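
A minimal sketch of a pyteomics-based helper. It assumes the dict layout pyteomics produces for parsed spectra; verify this against your pyteomics version. For mzML that layout is `precursorList` / `precursor` / `selectedIonList` / `selectedIon`, with the CV term name `"peak intensity"`. For mzXML it is a `precursorMz` list whose entries carry `precursorIntensity`. All function names are hypothetical, and the PR's actual helper may differ. Mapping mzML spectra to scan numbers through a Thermo-style `scan=N` native ID is also an assumption; spectra without that token are skipped.

```python
from typing import Dict, Optional


def mzml_precursor_intensity(spectrum: dict) -> Optional[float]:
    """Return the "peak intensity" of the first selected ion, or None if absent."""
    try:
        ion = spectrum["precursorList"]["precursor"][0]["selectedIonList"]["selectedIon"][0]
    except (KeyError, IndexError, TypeError):
        return None
    value = ion.get("peak intensity")
    return None if value is None else float(value)


def mzxml_precursor_intensity(spectrum: dict) -> Optional[float]:
    """Return @precursorIntensity of the first <precursorMz> entry, or None."""
    precursors = spectrum.get("precursorMz") or []
    if not precursors:
        return None
    value = precursors[0].get("precursorIntensity")
    return None if value is None else float(value)


def scan_from_native_id(native_id: str) -> Optional[int]:
    """Extract N from a Thermo-style native ID such as '... scan=N', else None."""
    for token in native_id.split():
        if token.startswith("scan="):
            return int(token[len("scan="):])
    return None


def build_intensity_index(path: str) -> Dict[int, float]:
    """Map scan number -> precursor intensity for the MS2 spectra of one file.

    Missing intensities become 0.0. Requires pyteomics (imported lazily).
    """
    from pyteomics import mzml, mzxml

    index: Dict[int, float] = {}
    if path.lower().endswith(".mzxml"):
        with mzxml.read(path) as reader:
            for spec in reader:
                if int(spec.get("msLevel", 0)) != 2:
                    continue
                value = mzxml_precursor_intensity(spec)
                index[int(spec["num"])] = 0.0 if value is None else value
    else:
        with mzml.read(path) as reader:
            for spec in reader:
                if int(spec.get("ms level", 0)) != 2:
                    continue
                scan = scan_from_native_id(spec.get("id", ""))
                if scan is None:
                    continue  # assumption: skip spectra without a scan= token
                value = mzml_precursor_intensity(spec)
                index[scan] = 0.0 if value is None else value
    return index
```

Because only the precursor fields are read, a spectrum without a "collision energy" term no longer affects the lookup.
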