Improve pattern matching #133
edoardottt
started this conversation in
Ideas
From @ocervell:
I came across huge matches like:
    [
      { "name": "PHP error", "match": "PHP error" },
      { "name": "MySQL error", "match": "warning_forbid_default_priv<MORE THAN 20000 LINES HERE>" }
    ]

which completely destroy my terminal 😄
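As a rough illustration of how such a match could be shortened before it is printed, here is a minimal sketch that keeps only a few characters of context around the regex hit. It is written in Python for brevity and is not cariddi's code; the function name `truncate_match` and the `context` parameter are hypothetical, and the `<truncated_output>` marker is simply borrowed from the example JSON further down in this thread.

```python
import re
from typing import Optional

# Marker borrowed from the example JSON in this thread; its exact placement is an assumption.
MARKER = "<truncated_output>"


def truncate_match(text: str, pattern: str, context: int = 20) -> Optional[str]:
    """Return the first regex hit plus `context` characters on each side.

    Returns None when the pattern does not match at all.
    """
    hit = re.search(pattern, text)
    if hit is None:
        return None
    start = max(hit.start() - context, 0)
    end = min(hit.end() + context, len(text))
    # Only add a marker on a side where text was actually cut off.
    prefix = MARKER if start > 0 else ""
    suffix = MARKER if end < len(text) else ""
    return prefix + text[start:end] + suffix


if __name__ == "__main__":
    body = "x" * 1000 + "Warning: mysqli error" + "y" * 1000
    print(truncate_match(body, r"(?i)Warning.*?mysqli?", context=5))
```

The output stays bounded by the length of the hit plus twice `context`, however large the response body is.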
So we might think about either:
We could end up with a JSON format like:
    [
      {
        "name": "MySQL Error",
        "results": [
          {
            "type": "Regex",
            "details": {
              "match": "Warning: ...<truncated_output>mysqli error: need new cache refresh... <truncated_output>",
              "regex": "(?i)Warning.*?mysqli?",
              "location": "line 42",
              "source": "body"
            }
          }
        ]
      }
    ]

Additionally, regexes have their limits. Ideally we would go one step further and create some kind of pattern-recognition algorithm, or even use ML for this kind of task. It could be a good evolution for cariddi ;)

The "type" key would be useful in that case to differentiate those matches from regex matches:

    [
      {
        "type": "PatternFinder",
        "details": {
          "match": "Warning: ...<truncated_output>mysqli error: need new cache refresh... <truncated_output>",
          "matcher": "error-finder",
          "version": "2.0.1"
        }
      },
      {
        "type": "ML",
        "details": {"model_name": "my-awesome-ml-model", "version": "0.0.1"}
      }
    ]

There is also room to improve the findings by filtering which ones are considered important or not, for instance:
licensing@<domain> or sales@<domain> is very common and not very sensitive, etc.
Those "rules" could first be hardcoded by us on a case-by-case basis and then learned by ML at some point, and a "severity" field could be set for each finding.

There might be a need to create separate issues for some of those points, since they are not directly linked to the JSON lines aggregation. Feel free to copy-paste some of my comments there.
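To make the hardcoded "rules" and severity idea above more concrete, here is a small sketch in Python (for illustration only; cariddi is written in Go). Everything named here is an assumption: the rule set `LOW_SEVERITY_MAILBOXES`, the labels "low" and "medium", the helper names, and the position of "severity" inside the finding are not taken from cariddi.

```python
import json

# Hypothetical hardcoded rule set: mailbox names treated as common and not very sensitive.
LOW_SEVERITY_MAILBOXES = {"licensing", "sales"}


def email_severity(email: str) -> str:
    """Return an assumed severity label ("low" or "medium") for an email finding."""
    mailbox = email.split("@", 1)[0].lower()
    return "low" if mailbox in LOW_SEVERITY_MAILBOXES else "medium"


def make_finding(name: str, kind: str, details: dict, severity: str) -> dict:
    """Build one finding in the shape proposed above.

    Putting "severity" next to "name" is an assumption, not an agreed format.
    """
    return {
        "name": name,
        "severity": severity,
        "results": [{"type": kind, "details": details}],
    }


if __name__ == "__main__":
    email = "sales@example.com"
    finding = make_finding(
        "Email", "Regex", {"match": email, "source": "body"}, email_severity(email)
    )
    print(json.dumps(finding))
```

A lookup table like this is easy to extend case by case, and a learned model could later replace `email_severity` without changing the output format.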