feat(websoc-scraper): Dynamic Resolution for Enrollment History #300
aadi-shanker wants to merge 42 commits into
…date createdAt to timestamp
…ollment management
…ng in websoc scraper
ParzivalPerhaps
left a comment
Aadi owes me $10,000
lgtm
…ent history query
… clarify data availability
Done
.enum(websocSectionTypes, { error: (_issue) => "Invalid sectionType provided" })
.optional(),
from: z.iso.datetime({ error: "Invalid from date provided" }).optional().openapi({
  description: "Start of the time range (ISO 8601 timestamp). Optional.",
No need to mark this as optional in text
units: z.string(),
instructors: z.string().array(),
meetings: z.object({ bldg: z.string().array(), days: z.string(), time: z.string() }).array(),
I don't think there's a reason to serve these fields, or anything beyond what's needed to identify the section uniquely. Thoughts?
.where(inArray(websocSectionEnrollment.sectionId, transformedSectionRows.keys().toArray()))
.orderBy(websocSectionEnrollment.createdAt);
.orderBy(
  sql`DATE(${websocSectionEnrollment.createdAt})`,
There's no need to convert this to DATE before ordering.
.orderBy(
  sql`DATE(${websocSectionEnrollment.createdAt})`,
  websocSectionEnrollment.sectionId,
  desc(websocSectionEnrollment.createdAt),
I'm not comfortable with two ORDER BYs like this without showing it's not a major performance issue. And shouldn't the snapshots be in increasing order anyway?
for (const [sectionId, section] of transformedSectionRows) {
  granularMapping.set(sectionId, {
    year: section.year,
    quarter: section.quarter,
    sectionCode: section.sectionCode,
    department: section.department,
    courseNumber: section.courseNumber,
    sectionType: section.sectionType,
    sectionNum: section.sectionNum,
    units: section.units,
    instructors: Array.from(section.instructors),
    meetings: section.meetings.map(({ bldg, ...rest }) => ({
      bldg: Array.from(bldg),
      ...rest,
    })),
    finalExam: section.finalExam,
    snapshots: [],
  });
See above about excess fields. That would probably also make this unnecessary.
Why can't we ARRAY_AGG this?
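For reference, a rough sketch of the grouping an ARRAY_AGG would move into the database, written as in-memory TypeScript so it runs on its own. The row shape, the `totalEnrolled` field, the `groupSnapshots` name, and the SQL in the comment are all hypothetical and have not been checked against the actual schema.

```typescript
// Hypothetical row shape: one row per (section, snapshot) pair.
interface EnrollmentRow {
  sectionId: string;
  createdAt: Date;
  totalEnrolled: number;
}

interface Snapshot {
  createdAt: Date;
  totalEnrolled: number;
}

// Roughly what this PostgreSQL query would return (table and column names assumed):
//   SELECT section_id, ARRAY_AGG(total_enrolled ORDER BY created_at)
//   FROM websoc_section_enrollment
//   GROUP BY section_id;
function groupSnapshots(rows: EnrollmentRow[]): Map<string, Snapshot[]> {
  const grouped = new Map<string, Snapshot[]>();
  for (const row of rows) {
    const list = grouped.get(row.sectionId) ?? [];
    list.push({ createdAt: row.createdAt, totalEnrolled: row.totalEnrolled });
    grouped.set(row.sectionId, list);
  }
  // Keep each section's snapshots in increasing time order.
  grouped.forEach((list) => {
    list.sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
  });
  return grouped;
}
```

With the aggregation done in SQL, the per-section mapping loop would mostly reduce to reading one row per section.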
Description
Implements variable-frequency enrollment snapshots based on academic calendar periods. The scraper now captures enrollment data at different frequencies depending on enrollment activity:
ENROLLMENT period (Weeks 8-10): Every 3 hours, 7am-7pm only (~5 snapshots/day)
ADD_DROP period (Weeks 1-2): Every 6 hours, 24/7, with hourly snapshots on Week 2 Friday 12pm-5pm (~4-8 snapshots/day)
REGULAR period (Weeks 3-7): Once per week (every 168 hours)
Between quarters: Once per week, to capture late enrollment
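The schedule above can be sketched as a lookup from period to minimum snapshot interval. This is an illustration only, not the code in this PR: the `Period` and `SnapshotContext` types and the `snapshotIntervalHours` function are hypothetical names, and treating both time windows as inclusive of their end hour is an assumption.

```typescript
// Hypothetical names throughout; only the periods and intervals come from the list above.
type Period = "ENROLLMENT" | "ADD_DROP" | "REGULAR" | "BETWEEN_QUARTERS";

interface SnapshotContext {
  period: Period;
  week: number; // week of the quarter (1-10); ignored between quarters
  dayOfWeek: number; // 0 = Sunday ... 6 = Saturday, as in Date.prototype.getDay()
  hour: number; // local hour of day, 0-23
}

const HOURS_PER_WEEK = 168;

/** Minimum hours between snapshots, or null when no snapshot should be taken right now. */
function snapshotIntervalHours(ctx: SnapshotContext): number | null {
  switch (ctx.period) {
    case "ENROLLMENT":
      // Every 3 hours, but only from 7am through 7pm (assumed inclusive: 7, 10, 13, 16, 19).
      return ctx.hour >= 7 && ctx.hour <= 19 ? 3 : null;
    case "ADD_DROP": {
      // Hourly on Week 2 Friday from 12pm through 5pm (assumed inclusive), otherwise every 6 hours.
      const isWeek2FridayAfternoon =
        ctx.week === 2 && ctx.dayOfWeek === 5 && ctx.hour >= 12 && ctx.hour <= 17;
      return isWeek2FridayAfternoon ? 1 : 6;
    }
    case "REGULAR":
    case "BETWEEN_QUARTERS":
      return HOURS_PER_WEEK;
  }
}
```

Returning `null` outside the 7am-7pm window keeps the "skip entirely" case separate from "not enough time has passed yet".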
Frequency tracking is database-driven: each run checks the hours elapsed since the last snapshot and inserts a new one only when the current period's threshold is met. This lets the scraper catch up on its next run after an outage (self-correcting behavior) while keeping storage usage down.
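A minimal sketch of the elapsed-time check, assuming the timestamp of the most recent snapshot has already been read from the database. `shouldInsertSnapshot` and its parameters are hypothetical names for illustration, not the functions in this PR.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

/**
 * Decide whether a new snapshot should be inserted.
 * lastSnapshotAt: timestamp of the most recent stored snapshot, or null if none exists.
 * thresholdHours: minimum spacing for the current period (e.g. 3, 6, or 168).
 */
function shouldInsertSnapshot(
  lastSnapshotAt: Date | null,
  now: Date,
  thresholdHours: number,
): boolean {
  // Nothing stored yet, so take the first snapshot.
  if (lastSnapshotAt === null) return true;
  const hoursElapsed = (now.getTime() - lastSnapshotAt.getTime()) / MS_PER_HOUR;
  return hoursElapsed >= thresholdHours;
}
```

Because the decision depends only on elapsed time rather than on a fixed clock schedule, a run that follows an outage sees a large gap and inserts right away.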
Related Issue
#134
Motivation and Context
Intraday enrollment trends are currently missing. We want to increase snapshot frequency and visualize hourly changes during critical periods.
How Has This Been Tested?
I ran a test file with multiple period-detection unit tests using fake dates. I also ran the scraper with fake dates to double-check that the processing logic is correct.