Gemini Changes#249
Conversation
… ids) to short or long
…ding-strings-to-model
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #249 +/- ##
==========================================
- Coverage 96.58% 95.67% -0.91%
==========================================
Files 88 88
Lines 4946 5069 +123
Branches 878 910 +32
==========================================
+ Hits 4777 4850 +73
- Misses 104 145 +41
- Partials 65 74 +9 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
patrick-austin
left a comment
There was a problem hiding this comment.
A few minor points - the only one I feel somewhat strongly about is using SESSION_DATETIME_FORMAT in the tests as well as the equivalent source code being tested for consistency/clarity.
In terms of tests, I think what we want is (let me know if I'm missing anything):
- Reduce overall time taken on CI (by reducing the amount of data ingest needed)
- Explicitly test ingest of all the Gemini features:
- Sub-second timestampts/record_ids
- String channels
- Non-int shotnumbers
- Implicitly test "facility agnostic" existing functionality by running entire suite on ingested "Gemini-style" data
I think we can achieve this with a combination of pytest.ini config, fixtures, and pytest.mark.skipif in case there's any tests that only make sense in an EPAC or Gemini specific context. I'll probably start this in earnest on Friday, and might need to change approach once I get started - we'll see.
The main thing I don't have a handle on is real(istic) Gemini data - do you have any existing HDF5 files we could use as the basis for this? Or a script to generate dummy data? I can manually create something which just has the right data to get coverage, but would prefer it to be as realistic as practicable.
|
Thanks for this. I'll resubmit for review, but on the understanding that it won't be merged until the tests are in better shape. I agree with the test plan. What I'm not sure about is the teardown. Should each facility have a separate ingest process before each set of tests run? Obviously, EPAC will because the data is in a different format, but do we need to do that for Clara & Gemini? (I appreciate there are no tests yet for Clara...just thinking ahead), maybe it could be configurable? Test data wise, there's plenty of. I have an option to save hdf files to disk from the conversion script (the thing that copies the data to OG), or there's the HDF generator script which allows you to put anything you want in an HDF file, handy if you want invalid ones. Just let me know what you need. |
patrick-austin
left a comment
There was a problem hiding this comment.
Changes look good.
I'll resubmit for review, but on the understanding that it won't be merged until the tests are in better shape.
That sounds good to me, obviously if there's a functional need to merge this sooner (e.g. Louise needs it on main, or to avoid conflicts with subsequent development) then that's fine too but it would be convenient to keep this open while I work on the tests, if for nothing else then to make it easier to track the lines added that will need coverage.
What I'm not sure about is the teardown.
I have two ideas that might work. One would be to move the ingest to a fixture which runs before the tests session, then tears down once complete. The other would be to have multiple buckets for each facility, and ingest to those via whatever method. In both cases, I think we'd indicate the facility via the config settings and run pytest for each facility. i.e. pytest --config-file=epac-pytest.ini ... where epac-pytest.ini can then define some/all of the config settings via Pydantic settings sources. I think the latter would be more convenient, as you would not need to re-ingest locally, just replace the current single bucket with two new ones, ingest a smaller amount of data to each, and then run the two test suites as needed.
Things might work out differently of course, but I'm hopeful it could work that way. It would also then be a case of a third bucket for CLARA or other facility specific formatting.
Just let me know what you need.
Thanks, I'll let you know what I start on this in earnest.
…verhaul Overhaul tests to account for Gemini data format and features
This PR:
Adds string as a new data type:
Introduces string as a supported data type.
Allows string for shotnum field:
Updates the shotnum field to accept both int and string values.
Maintains backward compatibility with existing numeric shot numbers.
Add an optional type field to shotnum:
Introduces an optional type attribute to the model for shotnum datatype.
If type is not provided, the system infers the type based on the value (str or int).
Support sub-second timestamp ingestion:
Extends ingestion logic to handle timestamps with sub-second precision.
Enhancements to range converter:
Updates range conversion logic to correctly handle mixed shotnum types (int and string).
Clean up:
Adds a new lock file for dependency updates (which is the bulk of the reported line changes)