feat: sort course name autocomplete results by totalStudents #620
SinhSinhAn wants to merge 7 commits into
Question: should we include totalStudents as a weighted parameter in the algo as well? Searching for "Machine Learning" doesn't show CS 4375 because it misses the cutoff ("Introduction to Machine Learning" has a bigger edit distance than, say, "Statistical Machine Learning").
I think the issue was to use totalStudents as a weighted parameter. Since the results of the course name search are already sorted by the algorithm's determined similarity, we don't need to override that and sort by totalStudents, but should instead factor it into the existing sorting.
Add totalStudents to each entry in course_name_table.json by summing total_students across all sections and academic sessions. Use this as a secondary sort in courseNameAutocomplete so more popular courses (e.g. CS 4375) surface above obscure ones when search terms are equal. Closes #517
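The aggregation described in this commit could look roughly like the sketch below. The `SectionData`, `SessionData`, and `CourseData` shapes and the `computeTotalStudents` helper are hypothetical names used only for illustration; the actual structure of the aggregated data in the repo may differ, and the enrollment numbers are made up.

```typescript
// Hypothetical shapes: the real aggregated data may be structured differently.
interface SectionData {
  total_students: number;
}

interface SessionData {
  sections: SectionData[];
}

interface CourseData {
  title: string;
  sessions: SessionData[];
}

// Sum total_students over every section in every academic session.
function computeTotalStudents(course: CourseData): number {
  let total = 0;
  for (const session of course.sessions) {
    for (const section of session.sections) {
      total += section.total_students;
    }
  }
  return total;
}

// Invented enrollment numbers, for illustration only.
const exampleCourse: CourseData = {
  title: "Introduction to Machine Learning",
  sessions: [
    { sections: [{ total_students: 120 }, { total_students: 95 }] },
    { sections: [{ total_students: 110 }] },
  ],
};

const exampleTotal = computeTotalStudents(exampleCourse);
console.log(exampleTotal); // 325
```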
9ff8775 to 3cc7264
Use log10(totalStudents) as a weighted popularity bonus in the distance calculation so popular courses survive the std dev cutoff. This helps courses like CS 4375 rank above CS 6375 for "machine learning" even though "Introduction to Machine Learning" has a larger edit distance than "Statistical Machine Learning".
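A minimal sketch of the idea in this commit, under stated assumptions: the `Candidate` shape, the `weight` parameter, the exact cutoff rule (best score plus one standard deviation), and all edit distances and enrollment figures below are invented for illustration and are not taken from the PR's code.

```typescript
// Hypothetical candidate shape; not the PR's actual types.
interface Candidate {
  title: string;
  editDistance: number;
  totalStudents: number;
}

// Subtract a log-scaled popularity bonus from the edit distance.
// Math.max(..., 1) keeps the bonus at 0 for courses with no enrollment data.
function adjustedDistance(c: Candidate, weight: number): number {
  return c.editDistance - weight * Math.log10(Math.max(c.totalStudents, 1));
}

// Assumed cutoff rule: keep candidates whose score is within one
// population standard deviation of the best (lowest) score.
function rankWithCutoff(candidates: Candidate[], weight: number): Candidate[] {
  const scored = candidates.map((c) => ({ c, score: adjustedDistance(c, weight) }));
  if (scored.length === 0) return [];
  const mean = scored.reduce((sum, s) => sum + s.score, 0) / scored.length;
  const variance =
    scored.reduce((sum, s) => sum + (s.score - mean) ** 2, 0) / scored.length;
  const best = Math.min(...scored.map((s) => s.score));
  const cutoff = best + Math.sqrt(variance);
  return scored
    .filter((s) => s.score <= cutoff)
    .sort((a, b) => a.score - b.score)
    .map((s) => s.c);
}

// Invented distances and enrollments, for illustration only.
const candidates: Candidate[] = [
  { title: "Statistical Machine Learning", editDistance: 12, totalStudents: 100 },
  { title: "Introduction to Machine Learning", editDistance: 16, totalStudents: 10000 },
  { title: "Machine Design", editDistance: 20, totalStudents: 10 },
];

const withoutBonus = rankWithCutoff(candidates, 0);
const withBonus = rankWithCutoff(candidates, 3);
console.log(withoutBonus.map((c) => c.title));
console.log(withBonus.map((c) => c.title));
```

With a weight of 0 the popular course misses the cutoff in this toy data; with the bonus applied it both survives the cutoff and ranks first. Using a logarithm keeps very large enrollments from swamping the text-similarity signal.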
Once this is merged, would whoever merges it add a quick issue to Notebook to replicate this functionality there? Just link to this PR and say the issue is waiting on UTDNebula/utd-notebook#173 if that PR is not already merged.
Summary
When a user searches by course name (e.g. "machine learning" instead of "CS 4375"), the autocomplete results were ranked purely by text relevance. This meant that obscure or graduate-level courses could appear above popular undergrad ones even when they were equally relevant matches; for example, CS 6375 showed up before CS 4375.
This PR fixes that by using enrollment data as a tiebreaker. Here is what changed:
- `src/scripts/generateCourseNameTable.ts`: The script that builds `course_name_table.json` now computes a `totalStudents` value for each course entry by summing `total_students` across every section across every academic session in the aggregated data. This follows the same pattern already used in `generateAutocompleteGraph.ts` from PR #572.
- `src/app/api/courseNameAutocomplete/route.ts`: After the existing relevance filter (distance + standard deviation cutoff), results are now sorted by `totalStudents` descending before being returned. Courses that more students have taken will rank above equally relevant but less popular ones.

Note: `course_name_table.json` needs to be regenerated via `npm run buildcoursenames` for the `totalStudents` values to appear in the output.

Closes #517
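The sort step described in the summary could be sketched as follows. The `CourseNameResult` shape and the `sortByPopularity` helper are hypothetical, and the enrollment figures are invented; treating a missing `totalStudents` as 0 is an assumption meant to cover a table that has not been regenerated yet.

```typescript
// Hypothetical result shape; totalStudents is optional in case the
// JSON table was built before the field existed.
interface CourseNameResult {
  title: string;
  totalStudents?: number;
}

// Sort already-filtered results by totalStudents, highest first.
// Copies the array so the caller's list is not mutated.
function sortByPopularity(results: CourseNameResult[]): CourseNameResult[] {
  return [...results].sort(
    (a, b) => (b.totalStudents ?? 0) - (a.totalStudents ?? 0),
  );
}

// Invented enrollment numbers, for illustration only.
const filteredResults: CourseNameResult[] = [
  { title: "Statistical Machine Learning", totalStudents: 400 },
  { title: "Introduction to Machine Learning", totalStudents: 2500 },
  { title: "Machine Learning Seminar" },
];

const sortedResults = sortByPopularity(filteredResults);
console.log(sortedResults.map((r) => r.title));
```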