
feat(websoc-scraper): Dynamic Resolution for Enrollment History#300

Open
aadi-shanker wants to merge 42 commits into main from enrollmentdatafix

Conversation

@aadi-shanker
Contributor

@aadi-shanker aadi-shanker commented Feb 5, 2026

Description

Implements variable-frequency enrollment snapshots based on academic calendar periods. The scraper now captures enrollment data at different frequencies depending on enrollment activity:

  • ENROLLMENT period (Weeks 8-10): every 3 hours, 7am-7pm only (~5 snapshots/day)
  • ADD_DROP period (Weeks 1-2): every 6 hours around the clock, with hourly snapshots on Week 2 Friday from 12pm to 5pm (~4-8 snapshots/day)
  • REGULAR period (Weeks 3-7): once per week (every 168 hours)
  • Between quarters: once per week, to capture late enrollment
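The schedule above can be sketched as a pair of pure functions. This is only an illustration under stated assumptions: the names `SnapshotPeriod`, `periodForWeek`, and `requiredIntervalHours` are hypothetical (the PR's real identifiers are not shown here), the hour and weekday are assumed to be in campus-local time, and the window boundaries (7pm and 5pm) are assumed to be inclusive.

```typescript
// Hypothetical names; not taken from the PR's code.
type SnapshotPeriod = "ENROLLMENT" | "ADD_DROP" | "REGULAR" | "BETWEEN_QUARTERS";

/** Maps an instructional week to a period; null means no quarter is in session. */
function periodForWeek(week: number | null): SnapshotPeriod {
  if (week === null) return "BETWEEN_QUARTERS";
  if (week >= 1 && week <= 2) return "ADD_DROP";
  if (week >= 3 && week <= 7) return "REGULAR";
  if (week >= 8 && week <= 10) return "ENROLLMENT";
  // Assumption: anything outside Weeks 1-10 is treated like a break.
  return "BETWEEN_QUARTERS";
}

/**
 * Minimum hours between snapshots, or null when no snapshot should be taken.
 * dayOfWeek: 0 = Sunday ... 6 = Saturday; hour: 0-23 (campus-local, assumed).
 */
function requiredIntervalHours(
  period: SnapshotPeriod,
  week: number | null,
  dayOfWeek: number,
  hour: number,
): number | null {
  if (period === "ENROLLMENT") {
    // Every 3 hours, but only inside the 7am-7pm window.
    return hour >= 7 && hour <= 19 ? 3 : null;
  }
  if (period === "ADD_DROP") {
    // Hourly on Week 2 Friday between 12pm and 5pm, otherwise every 6 hours.
    const isWeek2FridayAfternoon = week === 2 && dayOfWeek === 5 && hour >= 12 && hour <= 17;
    return isWeek2FridayAfternoon ? 1 : 6;
  }
  // REGULAR and BETWEEN_QUARTERS: once per week.
  return 168;
}
```

Keeping the date-to-week resolution outside these functions means they can be exercised with plain numbers, which fits testing against fake dates.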

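Frequency tracking is database-driven: the scraper checks how many hours have elapsed since the last stored snapshot and inserts a new one only when the current period's threshold is met. Because the check depends on elapsed time rather than a fixed schedule, the scraper self-corrects after an outage (the next run sees that a snapshot is overdue and takes it) while keeping storage usage down.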
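A minimal sketch of the elapsed-time check, assuming the last snapshot's timestamp has already been read from the database. The names `hoursSince` and `shouldInsertSnapshot` are hypothetical and do not come from the PR.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

/** Hours elapsed between two instants (may be fractional). */
function hoursSince(last: Date, now: Date): number {
  return (now.getTime() - last.getTime()) / MS_PER_HOUR;
}

/**
 * Decides whether a new snapshot should be inserted.
 * lastSnapshotAt is null when no snapshot exists yet (assumed to mean "insert").
 */
function shouldInsertSnapshot(
  lastSnapshotAt: Date | null,
  now: Date,
  thresholdHours: number,
): boolean {
  if (lastSnapshotAt === null) return true;
  return hoursSince(lastSnapshotAt, now) >= thresholdHours;
}

// Usage: with a 3-hour threshold, a run 2 hours after the last snapshot is
// skipped, while a run 9 hours later (e.g. after an outage) inserts immediately.
const last = new Date("2026-02-01T00:00:00Z");
const tooSoon = shouldInsertSnapshot(last, new Date("2026-02-01T02:00:00Z"), 3); // false
const overdue = shouldInsertSnapshot(last, new Date("2026-02-01T09:00:00Z"), 3); // true
```

Comparing against elapsed time, rather than matching wall-clock slots, is what lets a late run catch up without any extra bookkeeping.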

Related Issue

#134

Motivation and Context

Enrollment history currently misses intraday trends. We want to increase snapshot resolution and visualize hourly changes during critical periods.

How Has This Been Tested?

I ran a test file containing multiple period-detection unit tests driven by fake dates. I also ran the scraper itself with fake dates to double-check that the processing logic is correct.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code involves a change to the database schema.
  • My code requires a change to the documentation.

@aadi-shanker aadi-shanker marked this pull request as ready for review February 13, 2026 01:53
laggycomputer

This comment was marked as resolved.

Contributor

@ParzivalPerhaps ParzivalPerhaps left a comment

Aadi owes me $10,000

lgtm

Member

@laggycomputer laggycomputer left a comment

re-migrate

@aadi-shanker
Contributor Author

re-migrate

Done

.enum(websocSectionTypes, { error: (_issue) => "Invalid sectionType provided" })
.optional(),
from: z.iso.datetime({ error: "Invalid from date provided" }).optional().openapi({
description: "Start of the time range (ISO 8601 timestamp). Optional.",
Member

No need to mark this as optional in text

Comment on lines +127 to +129
units: z.string(),
instructors: z.string().array(),
meetings: z.object({ bldg: z.string().array(), days: z.string(), time: z.string() }).array(),
Member

I don't think there's a reason to serve these fields, or anything beyond what identifies the section uniquely. Thoughts?

.where(inArray(websocSectionEnrollment.sectionId, transformedSectionRows.keys().toArray()))
.orderBy(websocSectionEnrollment.createdAt);
.orderBy(
sql`DATE(${websocSectionEnrollment.createdAt})`,
Member

There's no need to convert this to DATE before ordering.

.orderBy(
sql`DATE(${websocSectionEnrollment.createdAt})`,
websocSectionEnrollment.sectionId,
desc(websocSectionEnrollment.createdAt),
Member

I'm not comfortable with two ORDER BYs like this without showing it's not a major performance issue. And shouldn't the snapshots be in increasing order anyway?
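One possible way to address this, sketched under assumptions: keep a single ascending sort on `createdAt` in the query and bucket the rows per section in application code. Because rows are appended in arrival order, each section's snapshots stay in increasing chronological order. The `SnapshotRow` shape, its `totalEnrolled` field, and `groupSnapshotsBySection` are hypothetical and not taken from the PR.

```typescript
// Hypothetical row shape; the real enrollment columns are not shown in this thread.
interface SnapshotRow {
  sectionId: number;
  createdAt: Date;
  totalEnrolled: number;
}

/**
 * Buckets rows by section. The input is assumed to be sorted by createdAt
 * ascending already (one sort key in the query), so each bucket ends up in
 * increasing chronological order without any further sorting.
 */
function groupSnapshotsBySection(rows: readonly SnapshotRow[]): Map<number, SnapshotRow[]> {
  const bySection = new Map<number, SnapshotRow[]>();
  for (const row of rows) {
    const bucket = bySection.get(row.sectionId);
    if (bucket) {
      bucket.push(row);
    } else {
      bySection.set(row.sectionId, [row]);
    }
  }
  return bySection;
}
```

Whether this is cheaper than sorting on several keys in the database would still need to be measured against the real table.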

Comment on lines +210 to +227
for (const [sectionId, section] of transformedSectionRows) {
granularMapping.set(sectionId, {
year: section.year,
quarter: section.quarter,
sectionCode: section.sectionCode,
department: section.department,
courseNumber: section.courseNumber,
sectionType: section.sectionType,
sectionNum: section.sectionNum,
units: section.units,
instructors: Array.from(section.instructors),
meetings: section.meetings.map(({ bldg, ...rest }) => ({
bldg: Array.from(bldg),
...rest,
})),
finalExam: section.finalExam,
snapshots: [],
});
Member

See above about excess fields. That would probably also make this unnecessary.

Why can't we ARRAY_AGG this?

Development

Successfully merging this pull request may close these issues.

RFC: Higher Resolution for Enrollment History

4 participants