Improve NLP Tool Output #8
Open
Labels
eventually: This is something we'll do eventually, but we don't know when, and it isn't on a critical path.
In #4 and FreeAndFair/TuskMobileVoting#60, we discussed the fact that the current output of the NLP tool is pretty rough: the raw output includes things like pieces of LaTeX equations, footnote markers, etc. I addressed this manually in #4 by running the combined histograms through an LLM with some manual cleanup stages ("eliminate everything that starts with a symbol", "eliminate everything that doesn't have at least one word in it", etc.); for the verb phrases, I also had it coalesce phrases with the same primary verb. For the future, we should consider some extensions to the NLP tool to: