-
Notifications
You must be signed in to change notification settings - Fork 89
Revert "feat(log-surgeon)!: Support using a single capture group in schema rules to achieve parity with the heuristic parser." #1946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…chema ru…" This reverts commit 4246ada.
WalkthroughThis refactoring updates token handling and dictionary processing across the parser and archive writer, switching from optional/getter-based access patterns to pointer-based patterns for timestamps and type IDs. Changes include removing static text and escape methods from the public API, rewriting the archive message encoding logic with new token iteration and switch-based handling, and refactoring test infrastructure. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@components/core/src/clp/streaming_archive/writer/Archive.cpp`:
- Around line 392-399: Replace the negated boolean checks that use the "!"
operator with the repository style "false == <expression>" form in the
Archive.cpp block where you call
EncodedVariableInterpreter::convert_string_to_representable_integer_var(token.to_string(),
encoded_var) and related conditional checks (also the similar checks around
lines 409-416 and 438-439); e.g., change `if
(!EncodedVariableInterpreter::convert_string_to_representable_integer_var(...))`
to `if (false ==
EncodedVariableInterpreter::convert_string_to_representable_integer_var(...))`
and apply the same pattern for other negations in this function that involve
encode/encode_var_dict_id/m_var_dict.add_entry logic to keep conditionals
consistent with project style.
- Around line 390-422: When TokenInt or TokenFloat falls back to the variable
dictionary (inside the branches for
static_cast<int>(log_surgeon::SymbolId::TokenInt) and TokenFloat) you currently
call m_var_dict.add_entry(...) and set encoded_var but you never record the
dictionary id in m_var_ids, so update_segment_indices misses these entries;
after m_var_dict.add_entry(token.to_string(), id) and after setting encoded_var
= EncodedVariableInterpreter::encode_var_dict_id(id), append the id to m_var_ids
(e.g. m_var_ids.push_back(id)) in both branches so the ids are tracked for
update_segment_indices.
In `@components/core/tests/test-ParserWithUserSchema.cpp`:
- Around line 159-181: The commented-out tests (the TEST_CASE blocks referencing
generate_log_parser and compress and the TODO notes) must be either tracked with
an issue or removed; open a tracking ticket and replace the commented blocks
with a single short TODO comment that includes the issue ID and the original
test names ("Test error for creating log file with delimiter in regex pattern" /
"Test error for missing log file") plus the referenced symbols
generate_log_parser and compress, or delete the commented tests entirely and add
a test-placeholder comment referencing the new issue so coverage gaps are
traceable and the intent to re-enable behavior is preserved.
| case static_cast<int>(log_surgeon::SymbolId::TokenInt): { | ||
| encoded_variable_t encoded_var; | ||
| if (!EncodedVariableInterpreter::convert_string_to_representable_integer_var( | ||
| token.to_string(), | ||
| encoded_var | ||
| )) | ||
| { | ||
| variable_dictionary_id_t id; | ||
| m_var_dict.add_entry(token.to_string(), id); | ||
| encoded_var = EncodedVariableInterpreter::encode_var_dict_id(id); | ||
| m_logtype_dict_entry.add_dictionary_var(); | ||
| } else { | ||
| m_logtype_dict_entry.add_int_var(); | ||
| } | ||
| m_encoded_vars.push_back(encoded_var); | ||
| break; | ||
| } | ||
| case static_cast<int>(log_surgeon::SymbolId::TokenFloat): { | ||
| encoded_variable_t encoded_var; | ||
| if (!EncodedVariableInterpreter::convert_string_to_representable_float_var( | ||
| token.to_string(), | ||
| encoded_var | ||
| )) | ||
| { | ||
| variable_dictionary_id_t id; | ||
| m_var_dict.add_entry(token.to_string(), id); | ||
| encoded_var = EncodedVariableInterpreter::encode_var_dict_id(id); | ||
| m_logtype_dict_entry.add_dictionary_var(); | ||
| } else { | ||
| m_logtype_dict_entry.add_float_var(); | ||
| } | ||
| m_encoded_vars.push_back(encoded_var); | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dictionary IDs aren’t tracked for int/float fallbacks.
When an int/float token falls back to the variable dictionary, the ID isn’t added to m_var_ids, so update_segment_indices won’t record it. This can leave segment indices incomplete for those values.
✅ Suggested fix
if (!EncodedVariableInterpreter::convert_string_to_representable_integer_var(
token.to_string(),
encoded_var
))
{
variable_dictionary_id_t id;
m_var_dict.add_entry(token.to_string(), id);
+ m_var_ids.push_back(id);
encoded_var = EncodedVariableInterpreter::encode_var_dict_id(id);
m_logtype_dict_entry.add_dictionary_var();
} else {
m_logtype_dict_entry.add_int_var();
}
@@
if (!EncodedVariableInterpreter::convert_string_to_representable_float_var(
token.to_string(),
encoded_var
))
{
variable_dictionary_id_t id;
m_var_dict.add_entry(token.to_string(), id);
+ m_var_ids.push_back(id);
encoded_var = EncodedVariableInterpreter::encode_var_dict_id(id);
m_logtype_dict_entry.add_dictionary_var();
} else {
m_logtype_dict_entry.add_float_var();
}🤖 Prompt for AI Agents
In `@components/core/src/clp/streaming_archive/writer/Archive.cpp` around lines
390 - 422, When TokenInt or TokenFloat falls back to the variable dictionary
(inside the branches for static_cast<int>(log_surgeon::SymbolId::TokenInt) and
TokenFloat) you currently call m_var_dict.add_entry(...) and set encoded_var but
you never record the dictionary id in m_var_ids, so update_segment_indices
misses these entries; after m_var_dict.add_entry(token.to_string(), id) and
after setting encoded_var = EncodedVariableInterpreter::encode_var_dict_id(id),
append the id to m_var_ids (e.g. m_var_ids.push_back(id)) in both branches so
the ids are tracked for update_segment_indices.
| if (!EncodedVariableInterpreter::convert_string_to_representable_integer_var( | ||
| token.to_string(), | ||
| encoded_var | ||
| )) | ||
| { | ||
| variable_dictionary_id_t id; | ||
| m_var_dict.add_entry(token.to_string(), id); | ||
| encoded_var = EncodedVariableInterpreter::encode_var_dict_id(id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Use false == instead of ! for these checks.
This aligns with the repo’s style rule and keeps conditionals consistent.
♻️ Suggested refactor
- if (!EncodedVariableInterpreter::convert_string_to_representable_integer_var(
+ if (false == EncodedVariableInterpreter::convert_string_to_representable_integer_var(
token.to_string(),
encoded_var
))
@@
- if (!EncodedVariableInterpreter::convert_string_to_representable_float_var(
+ if (false == EncodedVariableInterpreter::convert_string_to_representable_float_var(
token.to_string(),
encoded_var
))
@@
- if (!m_logtype_dict_entry.get_value().empty()) {
+ if (false == m_logtype_dict_entry.get_value().empty()) {As per coding guidelines: Prefer false == <expression> rather than !<expression>.
Also applies to: 409-416, 438-439
🤖 Prompt for AI Agents
In `@components/core/src/clp/streaming_archive/writer/Archive.cpp` around lines
392 - 399, Replace the negated boolean checks that use the "!" operator with the
repository style "false == <expression>" form in the Archive.cpp block where you
call
EncodedVariableInterpreter::convert_string_to_representable_integer_var(token.to_string(),
encoded_var) and related conditional checks (also the similar checks around
lines 409-416 and 438-439); e.g., change `if
(!EncodedVariableInterpreter::convert_string_to_representable_integer_var(...))`
to `if (false ==
EncodedVariableInterpreter::convert_string_to_representable_integer_var(...))`
and apply the same pattern for other negations in this function that involve
encode/encode_var_dict_id/m_var_dict.add_entry logic to keep conditionals
consistent with project style.
| // TODO: This test doesn't currently work because delimiters are allowed in | ||
| // schema files, and there is no option to disable this yet | ||
| // TEST_CASE("Test error for creating log file with delimiter in regex pattern", | ||
| // "[LALR1Parser]SchemaParser]") { | ||
| // std::string file_path = "../tests/test_schema_files/schema_with_delimiter_in_regex_error.txt"; | ||
| // std::string file_name = boost::filesystem::canonical(file_path).string(); | ||
| // REQUIRE_THROWS_WITH(generate_log_parser(file_path), | ||
| // file_name + | ||
| // ":2: error: 'equals' has regex pattern which contains delimiter '='.\n" | ||
| // + " equals:.*=.*\n" | ||
| // + " ^^^^^\n"); | ||
| //} | ||
|
|
||
| // TODO: This error check is performed correctly by CLP, but it is handled by | ||
| // something different now so this test will fail as is | ||
| // TEST_CASE("Test error for missing log file", "[LALR1Parser][LogParser]") { | ||
| // std::string file_name = "../tests/test_log_files/missing_log.txt"; | ||
| // std::string file_path = boost::filesystem::weakly_canonical(file_name).string(); | ||
| // REQUIRE_THROWS(compress("../tests/test_archives", file_name, | ||
| // "../tests/test_schema_files/schema_that_does_not_exist.txt"), | ||
| // "Specified schema file does not exist."); | ||
| //} | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Track or remove the disabled tests.
Leaving tests commented out can create permanent coverage gaps; please link these TODOs to an issue or re-enable when the dependency is ready.
🧰 Tools
🪛 Cppcheck (2.19.0)
[style] 161-161: The function 'push_back_all' is never used.
(unusedFunction)
[style] 160-160: The function 'set_is_split' is never used.
(unusedFunction)
[style] 166-166: The function 'get_orig_path' is never used.
(unusedFunction)
[style] 168-168: The function 'get_orig_file_id' is never used.
(unusedFunction)
[style] 170-170: The function 'get_orig_file_id_as_string' is never used.
(unusedFunction)
[style] 176-176: The function 'get_id_as_string' is never used.
(unusedFunction)
[style] 178-178: The function 'get_begin_ts' is never used.
(unusedFunction)
[style] 180-180: The function 'get_end_ts' is never used.
(unusedFunction)
🤖 Prompt for AI Agents
In `@components/core/tests/test-ParserWithUserSchema.cpp` around lines 159 - 181,
The commented-out tests (the TEST_CASE blocks referencing generate_log_parser
and compress and the TODO notes) must be either tracked with an issue or
removed; open a tracking ticket and replace the commented blocks with a single
short TODO comment that includes the issue ID and the original test names ("Test
error for creating log file with delimiter in regex pattern" / "Test error for
missing log file") plus the referenced symbols generate_log_parser and compress,
or delete the commented tests entirely and add a test-placeholder comment
referencing the new issue so coverage gaps are traceable and the intent to
re-enable behavior is preserved.
Reverts #1273
Summary by CodeRabbit
Release Notes
Bug Fixes
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.