Skip to content

Gemini Changes#249

Open
moonraker595 wants to merge 56 commits into
mainfrom
Adding-strings-to-model
Open

Gemini Changes#249
moonraker595 wants to merge 56 commits into
mainfrom
Adding-strings-to-model

Conversation

@moonraker595

@moonraker595 moonraker595 commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

This PR:

  1. Adds string as a new data type:
    Introduces string as a supported data type.

  2. Allows string for shotnum field:
    Updates the shotnum field to accept both int and string values.
    Maintains backward compatibility with existing numeric shot numbers.

  3. Add an optional type field to shotnum:
    Introduces an optional type attribute to the model for shotnum datatype.
    If type is not provided, the system infers the type based on the value (str or int).

  4. Support sub-second timestamp ingestion:
    Extends ingestion logic to handle timestamps with sub-second precision.

  5. Enhancements to range converter:
    Updates range conversion logic to correctly handle mixed shotnum types (int and string).

  6. Clean up:
    Adds a new lock file for dependency updates (which is the bulk of the reported line changes)

@moonraker595 moonraker595 changed the title Adding strings to model Adding strings and timestamps Apr 14, 2026
@codecov

codecov Bot commented Apr 14, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 70.28571% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.67%. Comparing base (6932aab) to head (cc43254).

Files with missing lines Patch % Lines
operationsgateway_api/src/auth/authentication.py 9.67% 28 Missing ⚠️
...nsgateway_api/src/records/ingestion/hdf_handler.py 81.25% 4 Missing and 5 partials ⚠️
operationsgateway_api/src/records/record.py 45.45% 3 Missing and 3 partials ⚠️
...perationsgateway_api/src/records/export_handler.py 66.66% 1 Missing and 2 partials ⚠️
operationsgateway_api/src/routes/auth.py 66.66% 2 Missing and 1 partial ⚠️
...gateway_api/src/records/ingestion/record_checks.py 0.00% 1 Missing and 1 partial ⚠️
operationsgateway_api/src/routes/export.py 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #249      +/-   ##
==========================================
- Coverage   96.58%   95.67%   -0.91%     
==========================================
  Files          88       88              
  Lines        4946     5069     +123     
  Branches      878      910      +32     
==========================================
+ Hits         4777     4850      +73     
- Misses        104      145      +41     
- Partials       65       74       +9     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@patrick-austin patrick-austin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor points - the only one I feel somewhat strongly about is using SESSION_DATETIME_FORMAT in the tests as well as the equivalent source code being tested for consistency/clarity.

In terms of tests, I think what we want is (let me know if I'm missing anything):

  • Reduce overall time taken on CI (by reducing the amount of data ingest needed)
  • Explicitly test ingest of all the Gemini features:
    • Sub-second timestampts/record_ids
    • String channels
    • Non-int shotnumbers
  • Implicitly test "facility agnostic" existing functionality by running entire suite on ingested "Gemini-style" data

I think we can achieve this with a combination of pytest.ini config, fixtures, and pytest.mark.skipif in case there's any tests that only make sense in an EPAC or Gemini specific context. I'll probably start this in earnest on Friday, and might need to change approach once I get started - we'll see.

The main thing I don't have a handle on is real(istic) Gemini data - do you have any existing HDF5 files we could use as the basis for this? Or a script to generate dummy data? I can manually create something which just has the right data to get coverage, but would prefer it to be as realistic as practicable.

Comment thread test/sessions/test_user_session.py Outdated
Comment thread test/sessions/test_user_session.py Outdated
Comment thread test/sessions/test_user_session.py Outdated
Comment thread test/sessions/test_user_session_list.py Outdated
Comment thread test/sessions/test_user_session_list.py Outdated
Comment thread operationsgateway_api/src/records/ingestion/hdf_handler.py Outdated
Comment thread operationsgateway_api/src/records/ingestion/hdf_handler.py Outdated
Comment thread operationsgateway_api/src/records/ingestion/channel_checks.py Outdated
@moonraker595

Copy link
Copy Markdown
Contributor Author

Thanks for this. I'll resubmit for review, but on the understanding that it won't be merged until the tests are in better shape.

I agree with the test plan. pytest.mark looks particularly powerful when we want to run the same test against Clara and Gemini but not EPAC, like @pytest.mark.facilities("clara", "gemini")

What I'm not sure about is the teardown. Should each facility have a separate ingest process before each set of tests run? Obviously, EPAC will because the data is in a different format, but do we need to do that for Clara & Gemini? (I appreciate there are no tests yet for Clara...just thinking ahead), maybe it could be configurable?

Test data wise, there's plenty of. I have an option to save hdf files to disk from the conversion script (the thing that copies the data to OG), or there's the HDF generator script which allows you to put anything you want in an HDF file, handy if you want invalid ones. Just let me know what you need.

@patrick-austin patrick-austin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

I'll resubmit for review, but on the understanding that it won't be merged until the tests are in better shape.

That sounds good to me, obviously if there's a functional need to merge this sooner (e.g. Louise needs it on main, or to avoid conflicts with subsequent development) then that's fine too but it would be convenient to keep this open while I work on the tests, if for nothing else then to make it easier to track the lines added that will need coverage.

What I'm not sure about is the teardown.

I have two ideas that might work. One would be to move the ingest to a fixture which runs before the tests session, then tears down once complete. The other would be to have multiple buckets for each facility, and ingest to those via whatever method. In both cases, I think we'd indicate the facility via the config settings and run pytest for each facility. i.e. pytest --config-file=epac-pytest.ini ... where epac-pytest.ini can then define some/all of the config settings via Pydantic settings sources. I think the latter would be more convenient, as you would not need to re-ingest locally, just replace the current single bucket with two new ones, ingest a smaller amount of data to each, and then run the two test suites as needed.

Things might work out differently of course, but I'm hopeful it could work that way. It would also then be a case of a third bucket for CLARA or other facility specific formatting.

Just let me know what you need.

Thanks, I'll let you know what I start on this in earnest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants