[uss_qualifier/reports] Add test artifacts obfuscator#1477
[uss_qualifier/reports] Add test artifacts obfuscator#1477BenjaminPelletier wants to merge 7 commits into
Conversation
| Detailed information on command-line options and toggles can be retrieved using the `--help` flag: | ||
|
|
||
| ```bash | ||
| PYTHONPATH=. uv run --index https://pypi.org/simple python monitoring/uss_qualifier/reports/obfuscate.py --help |
There was a problem hiding this comment.
nit: This could be simplified to PYTHONPATH=. uv run monitoring/uss_qualifier/reports/obfuscate.py --help
(python is not needed, --index is environment-specific)
| elif k == "manager" and isinstance(v, str): | ||
| participant_ids.add(v) | ||
| elif isinstance(v, str): | ||
| for url in find_urls(v): |
There was a problem hiding this comment.
reuse scan_text bellow? it do the same loop
|
|
||
|
|
||
| def test_find_urls(): | ||
| text = "Check http://dss1.uss1.localutm/dss/v1, or go to https://github.com/interuss. Also see (http://localhost:8082/status)." |
There was a problem hiding this comment.
Should we test thoses extra cases?
http://user@localhost
http://user:password@localhost
http://localhost?q=a2
There was a problem hiding this comment.
Sure, sounds good.
| # 2. Obfuscate hostnames | ||
| if config.obfuscate_hostnames: | ||
| for h, mapped_h in hostname_map.items(): | ||
| s = re.sub(rf"\b{re.escape(h)}\b", mapped_h, s) |
There was a problem hiding this comment.
That an edge case, but if we have url in uppercase for some reason, urlparse normalize it to lowercase and it won't be replaced there.
urlparse("http://TEST").hostname
'test'
| s = re.sub(rf"\b{re.escape(h)}\b", mapped_h, s) | |
| s = re.sub(rf"\b{re.escape(h)}\b", mapped_h, s, flags=re.IGNORECASE) |
There was a problem hiding this comment.
Added failing test and fixed; nice esoteric catch and I learned something new
|
|
||
| # 1. Obfuscate tokens | ||
| if config.obfuscate_tokens: | ||
| s = re.sub(r"(?i)\bBearer\s+\S+", "Bearer REDACTED", s) |
There was a problem hiding this comment.
| s = re.sub(r"(?i)\bBearer\s+\S+", "Bearer REDACTED", s) | |
| s = re.sub(r"(?i)\bBearer\s+\S+", "Bearer REDACTED", s, flags=re.IGNORECASE) |
To handle cases like bEarER that may be valid
| # Generate maps | ||
| participant_map = {} | ||
| for idx, pid in enumerate( | ||
| sorted(sorted(participant_ids), key=len, reverse=True), start=1 |
There was a problem hiding this comment.
Yes; that's where length-equal keys are alphabetized
| scan_json(v, participant_ids, hostnames) | ||
| elif isinstance(obj, list): | ||
| for item in obj: | ||
| scan_json(item, participant_ids, hostnames) |
There was a problem hiding this comment.
should scan_json do something on strings ?
There is an isinstance(v, str) in dict branch, but not in list, meaning str in dict and list are not handled the same way
There was a problem hiding this comment.
That isinstance is checking whether the dict contains {"Authorization": "Bearer xxx"}. A list cannot contain this key-value pair.
There was a problem hiding this comment.
That the other loop, that one is collecting participants and urls
And it will only collect urls that are single items of dict.
With a file like that (tested with test_obfuscate_directory):
report_data = {
"participant_id": "uss1_core",
"participants": ["uss1_core", "uss2_core"],
"url": "http://dss1.uss3.localutm/dss/v1",
"urls": ["http://dss1.uss4.localutm/dss/v1"],
"report": {
"queries": [
{
"request": {
"url": "http://dss1.uss1.localutm/dss/v1",
"urls": ["http://dss1.uss2.localutm/dss/v1"],
"headers": {"Authorization": "Bearer mytoken"},
}
}
]
},
}
dss1.uss2.localutm and dss1.uss4.localutm won't be detected.
I don't know if it's a possibility (and I would guess in a report the probably to see the hostname in a simple location is high), but that probably better to handle this case no?
Co-authored-by: Maximilien Cuony <the-glu@users.noreply.github.com>
Some users may be hesitant to include full test artifacts when reporting issues or requesting help because those artifacts may contain semi-sensitive information. This PR adds a tool that attempts to ease this problem somewhat by obfuscating sensitive information while still retaining nearly-full diagnostic value for the artifacts.
Written mostly with Gemini.