Skip to content

add fractional site occupancy for pdb files#109

Open
chselz wants to merge 3 commits into
grimme-lab:mainfrom
chselz:pdb_frac_occ
Open

add fractional site occupancy for pdb files#109
chselz wants to merge 3 commits into
grimme-lab:mainfrom
chselz:pdb_frac_occ

Conversation

@chselz

@chselz chselz commented Apr 15, 2026

Copy link
Copy Markdown

in the documentation it is said that fractional site occupancies are not supported. Further it was a feature requested in
grimme-lab/xtb#194 and as we plan to use mctc-lib upstream fully i think its a good idea to implement it here directly

For me it compiled sucessfully with gfortran 13.3.0 and with fpm 10.1 as a build system

Signed-off-by: Christian Selzer <s6chselz@uni-bonn.de>
@codecov

codecov Bot commented Apr 15, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 11.96%. Comparing base (8cd0cb4) to head (9f2f66a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #109   +/-   ##
=======================================
  Coverage   11.96%   11.96%           
=======================================
  Files           5        5           
  Lines         209      209           
  Branches       95       95           
=======================================
  Hits           25       25           
  Misses        174      174           
  Partials       10       10           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@awvwgk awvwgk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test case where occupancies tie? Like two times 0.5 or similar.

Signed-off-by: Christian Selzer <s6chselz@uni-bonn.de>
@chselz

chselz commented Apr 15, 2026

Copy link
Copy Markdown
Author

i now added two test with tied occupations. when there is a tie the first encountered one should be retained.

@awvwgk awvwgk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when writing a PDB with partial occupancy? Can we have a test case for this.

Also I would like to see some negative tests, e.g. what happens when we have partial occupancy and no alternative location, should that be an error?

Please check the coverage report to see whether all if bodies are actually tested.

Signed-off-by: Christian Selzer <s6chselz@uni-bonn.de>
@awvwgk

awvwgk commented Apr 19, 2026

Copy link
Copy Markdown
Member

Sorry for stretching this longer and introducing more and more feature creep in this patch.

There is a lot of edge cases with occupancies and I don't see those handled or tested properly yet.

An example would be the following system:

ATOM      1  N   SER A   1       0.000   0.000   0.000  1.00 10.00           N
ATOM      2  CA ASER A   1       1.000   0.000   0.000  0.50 10.00           C
ATOM      3  CA BSER A   1       2.000   0.000   0.000  0.50 10.00           C
ATOM      4  C   SER A   1       3.000   0.000   0.000  1.00 10.00           C
ATOM      5  O  BSER A   1       4.000   0.000   0.000  0.50 10.00           O
ATOM      6  O  ASER A   1       5.000   0.000   0.000  0.50 10.00           O

I believe the current implementation would select atom 2 and atom 5 over atom 3 and atom 6, due to the tie in occupancies. But given the location identifier the only correct choice for an implementation is to select 2 and 6 or 3 and 5, but not to mix them.

Another question would be of what to do with a case like this where the location identifier is missing

ATOM      1  N   SER A   1       0.000   0.000   0.000  1.00 10.00           N
ATOM      2  CA  SER A   1       1.000   0.000   0.000  0.40 10.00           C
ATOM      3  CA  SER A   1       2.000   0.000   0.000  0.60 10.00           C
ATOM      4  C   SER A   1       3.000   0.000   0.000  1.00 10.00           C
ATOM      5  O   SER A   1       4.000   0.000   0.000  1.00 10.00           O

Or only a single atom provides a location identifier

ATOM      1  N   SER A   1       0.000   0.000   0.000  1.00 10.00           N
ATOM      2  CA  SER A   1       1.000   0.000   0.000  0.40 10.00           C
ATOM      3  CA BSER A   1       2.000   0.000   0.000  0.60 10.00           C
ATOM      4  C   SER A   1       3.000   0.000   0.000  1.00 10.00           C
ATOM      5  O   SER A   1       4.000   0.000   0.000  1.00 10.00           O

If those cases are incorrect in the PDB format, they must be rejected by the PDB parser here.

@awvwgk

awvwgk commented Apr 19, 2026

Copy link
Copy Markdown
Member

Also, given that all test cases for this feature are constructed and not real world examples, I am very hesitant to accept this feature without being tested on actual PDB files showing how this feature will be used in reality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants