filter_lookup: added filter for key value lookup#10620
filter_lookup: added filter for key value lookup#10620olegmukhin wants to merge 7 commits intofluent:masterfrom
Conversation
|
Test configuration Fluent Bit YAML Configuration To test new filter we will load a range of log values including, strings (different cases), integer, boolean, embedded quotes and other value types. devices.log {"hostname": "server-prod-001"}
{"hostname": "Server-Prod-001"}
{"hostname": "db-test-abc"}
{"hostname": 123}
{"hostname": true}
{"hostname": " host with space "}
{"hostname": "quoted \"host\""}
{"hostname": "unknown-host"}
{}
{"hostname": [1,2,3]}
{"hostname": {"sub": "val"}}
{"hostname": " "}CSV configuration will aim to test key overwrites, different types of strings, use and escaping of quotes. device-bu.csv When executed with verbose flag the following out is produced. Test output Output shows correct matching and handling of different value types and correct output when no match is detected. Valgrind summary (after run with multiple types of lookups): |
|
Documentation for this filter has been submitted as part of #fluent/fluent-bit-docs/pull/1953. |
|
Added unit tests for lookup filter. All tests pass: Valgrind results are showing appropriate memory management. |
|
Added fix for failing checks on Cent OS 7 and Windows. Please rerun. |
|
Last check is failing due to Cent OS 7 incompatibility in unit test file - fix in last commit. Please rerun. |
|
Could this please get one more run at the checks? I didn't realise we need this to compile on Cent OS 7 - should be good now with last commit. |
|
Can you rebase and push so it reruns tests? |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (10)
plugins/filter_lookup/lookup.h (1)
28-31: Use an enum for metric IDs and confirm legacy metrics exposureMinor: prefer an enum to group these IDs and avoid macro leakage. Also, since the runtime tests fetch metrics via flb_metrics_get_id(), please confirm the plugin populates the legacy metrics list (f_ins->metrics) in addition to cmetrics; otherwise tests on builds without FLB_HAVE_METRICS will fail.
Apply:
-#define FLB_LOOKUP_METRIC_PROCESSED 200 -#define FLB_LOOKUP_METRIC_MATCHED 201 -#define FLB_LOOKUP_METRIC_SKIPPED 202 +enum { + FLB_LOOKUP_METRIC_PROCESSED = 200, + FLB_LOOKUP_METRIC_MATCHED = 201, + FLB_LOOKUP_METRIC_SKIPPED = 202 +};tests/runtime/filter_lookup.c (2)
125-145: Don’t assume null-terminated output; respect sizeThe lib output callback receives a buffer and length. Using strstr() on an assumed NUL-terminated buffer is brittle. Copy to a NUL-terminated scratch buffer first.
static int cb_check_result_json(void *record, size_t size, void *data) { - char *p; - char *expected; - char *result; + char *p; + char *expected; + char *result; expected = (char *) data; - result = (char *) record; + result = flb_malloc(size + 1); + if (!result) { + flb_free(record); + return -1; + } + memcpy(result, record, size); + result[size] = '\0'; p = strstr(result, expected); TEST_CHECK(p != NULL); if (p == NULL) { flb_error("Expected to find: '%s' in result '%s'", expected, result); } - flb_free(record); + flb_free(result); + flb_free(record); return 0; }
449-497: Strengthen “no match” test: assert result_key is absentCurrently this test only checks that another field exists. Add a negative assertion to ensure the filter didn’t inject result_key on misses.
+/* Callback to assert a substring is NOT present */ +static int cb_assert_not_contains(void *record, size_t size, void *data) +{ + char *needle = (char *) data; + char *buf = flb_malloc(size + 1); + if (!buf) { flb_free(record); return -1; } + memcpy(buf, record, size); + buf[size] = '\0'; + TEST_CHECK(strstr(buf, needle) == NULL); + if (strstr(buf, needle) != NULL) { + flb_error("Unexpected substring '%s' found in: '%s'", needle, buf); + } + flb_free(buf); + flb_free(record); + return 0; +} @@ - /* Should NOT contain the result_key since no match was found */ - cb_data.cb = cb_check_result_json; - cb_data.data = "\"other_field\":\"test\""; + /* Should NOT contain result_key since no match was found */ + cb_data.cb = cb_assert_not_contains; + cb_data.data = "\"user_name\":";plugins/filter_lookup/lookup.c (7)
766-768: Float formatting may not match CSV keys; use compact representation.
%fprints 6 decimals (e.g.,1.0→1.000000). Prefer%.15gto preserve intent and improve match rates.- case FLB_RA_FLOAT: - required_size = snprintf(NULL, 0, "%f", rval->o.via.f64); + case FLB_RA_FLOAT: + required_size = snprintf(NULL, 0, "%.15g", rval->o.via.f64); break; ... - case FLB_RA_FLOAT: - printed = snprintf(dynamic_val_buf, required_size + 1, "%f", rval->o.via.f64); + case FLB_RA_FLOAT: + printed = snprintf(dynamic_val_buf, required_size + 1, "%.15g", rval->o.via.f64); break;Also applies to: 815-817
46-74: Metrics macros: avoid static mutable storage and guard counter pointers.
static char* labels_array[1];is shared across threads. Also,cmt_counter_create()can fail; guardctx->cmt_*in the macros.-#define INCREMENT_SKIPPED_METRIC(ctx, ins) do { \ - uint64_t ts = cfl_time_now(); \ - static char* labels_array[1]; \ - labels_array[0] = (char*)flb_filter_name(ins); \ - cmt_counter_add(ctx->cmt_skipped, ts, 1, 1, labels_array); \ - flb_metrics_sum(FLB_LOOKUP_METRIC_SKIPPED, 1, ins->metrics); \ -} while(0) +#define INCREMENT_SKIPPED_METRIC(ctx, ins) do { \ + uint64_t ts = cfl_time_now(); \ + char* labels_array[1]; \ + labels_array[0] = (char *) flb_filter_name(ins); \ + if ((ctx)->cmt_skipped) { cmt_counter_add((ctx)->cmt_skipped, ts, 1, 1, labels_array); } \ + if ((ins)->metrics) { flb_metrics_sum(FLB_LOOKUP_METRIC_SKIPPED, 1, (ins)->metrics); } \ +} while (0)Apply the same pattern to
INCREMENT_MATCHED_METRICandINCREMENT_PROCESSED_METRIC.
529-547: Defensive checks on metric creation.If any
cmt_counter_create()returns NULL, later increments will segfault without guards. Consider logging and continuing without those counters.- ctx->cmt_processed = cmt_counter_create(ins->cmt, + ctx->cmt_processed = cmt_counter_create(ins->cmt, "fluentbit", "filter", "lookup_processed_records_total", "Total number of processed records", 1, labels_name); + if (!ctx->cmt_processed) { flb_plg_warn(ins, "failed to create processed counter"); } ... - ctx->cmt_matched = cmt_counter_create(ins->cmt, + ctx->cmt_matched = cmt_counter_create(ins->cmt, "fluentbit", "filter", "lookup_matched_records_total", "Total number of matched records", 1, labels_name); + if (!ctx->cmt_matched) { flb_plg_warn(ins, "failed to create matched counter"); } ... - ctx->cmt_skipped = cmt_counter_create(ins->cmt, + ctx->cmt_skipped = cmt_counter_create(ins->cmt, "fluentbit", "filter", "lookup_skipped_records_total", "Total number of skipped records due to errors", 1, labels_name); + if (!ctx->cmt_skipped) { flb_plg_warn(ins, "failed to create skipped counter"); }
261-269: Always skipping the first CSV line assumes a header. Make this configurable.Not all CSVs include a header; the first record would be dropped. Add a
skip_header(bool) option or auto-detect, and document the behavior.I can draft the config plumb-through and tests if you want it in this PR.
606-609: Avoid relying on internalht->total_count.Accessing struct internals risks future ABI churn. Track a local “loaded entries” count in
load_csv()and log that instead.
919-938: Precomputeresult_keylength and usememcmp.Minor perf/readability tweak in the hot path.
- if (log_event.body && log_event.body->type == MSGPACK_OBJECT_MAP) { + if (log_event.body && log_event.body->type == MSGPACK_OBJECT_MAP) { int i; + size_t rkey_len = strlen(ctx->result_key); for (i = 0; i < log_event.body->via.map.size; i++) { msgpack_object_kv *kv = &log_event.body->via.map.ptr[i]; if (kv->key.type == MSGPACK_OBJECT_STR && - kv->key.via.str.size == strlen(ctx->result_key) && - strncmp(kv->key.via.str.ptr, ctx->result_key, kv->key.via.str.size) == 0) { + kv->key.via.str.size == rkey_len && + memcmp(kv->key.via.str.ptr, ctx->result_key, rkey_len) == 0) { continue; }
183-244: CSV support limitations (multiline fields).Reader splits on newline before CSV parsing; embedded newlines within quoted fields aren’t supported. If out-of-scope, document explicitly.
Also applies to: 291-417
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
cmake/plugins_options.cmake(1 hunks)plugins/CMakeLists.txt(1 hunks)plugins/filter_lookup/CMakeLists.txt(1 hunks)plugins/filter_lookup/lookup.c(1 hunks)plugins/filter_lookup/lookup.h(1 hunks)tests/runtime/CMakeLists.txt(1 hunks)tests/runtime/filter_lookup.c(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/filter_lookup/lookup.c (7)
include/fluent-bit/flb_mem.h (2)
flb_free(126-128)flb_calloc(84-96)src/flb_hash_table.c (4)
flb_hash_table_add(401-494)flb_hash_table_create(99-137)flb_hash_table_destroy(197-215)flb_hash_table_get(496-522)include/fluent-bit/flb_filter.h (1)
flb_filter_config_map_set(125-129)src/flb_record_accessor.c (3)
flb_ra_create(271-358)flb_ra_destroy(232-248)flb_ra_get_value_object(803-814)src/flb_log_event_encoder.c (7)
flb_log_event_encoder_begin_record(246-254)flb_log_event_encoder_set_timestamp(276-287)flb_log_event_encoder_commit_record(256-274)flb_log_event_encoder_rollback_record(241-244)flb_log_event_encoder_init(42-74)flb_log_event_encoder_claim_internal_buffer_ownership(118-124)flb_log_event_encoder_destroy(99-116)src/flb_log_event_decoder.c (3)
flb_log_event_decoder_init(99-116)flb_log_event_decoder_destroy(147-179)flb_log_event_decoder_next(310-406)src/flb_ra_key.c (1)
flb_ra_key_value_destroy(842-851)
tests/runtime/filter_lookup.c (4)
src/flb_lib.c (11)
flb_create(138-220)flb_input(261-271)flb_input_set(300-330)flb_filter(287-297)flb_output(274-284)flb_output_set(515-546)flb_stop(942-985)flb_destroy(223-258)flb_filter_set(613-644)flb_start(914-925)flb_lib_push(774-801)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)plugins/filter_lookup/lookup.c (3)
dynbuf_init(141-151)dynbuf_append_char(154-169)dynbuf_destroy(172-180)src/flb_metrics.c (1)
flb_metrics_get_id(62-75)
🔇 Additional comments (3)
cmake/plugins_options.cmake (1)
89-89: New lookup filter option — LGTMOption is correctly added and follows the FLB_MINIMAL override behavior.
plugins/filter_lookup/lookup.c (2)
1-1028: Overall: solid, careful implementation.CSV parsing with proper escaping, case-insensitive matching via normalized keys, RA integration, encoder/decoder usage, and exhaustive cleanup paths are well-structured. Nice work.
990-1007: Ownership verified — hashtable copies/frees its own buffer when val_size > 0 (no double-free).entry_set_value() (flb_hash_table.c) allocates and copies the supplied buffer when val_size > 0 and flb_hash_table_entry_free()/flb_hash_table_destroy() free that internal copy; the plugin allocates val_heap, stores it in ctx->val_list and frees it in cb_lookup_exit — these are distinct allocations, so the current cleanup is correct.
|
@patrick-stephens rebaselined to latest commit as advised, built/deployed without issues (related to the plug-in), and also addressed all the new review comments from the AI reviewer in the last commit. Let me know if there are any further changes required - thanks. I see some potential improvements (such as multiline CSV data support, JSON file support, support for occasionally checking for new data in lookup files, etc.), but I think these should be separate PRs to keep things clean. |
|
Looking forward to testing this new filter out! 🎉 It would be great if there are features such as:
Also I see that it is intended for static datasets but you could add a section on using hot-reload to update. Although not ideal; could be a way for dynamic updating of a CSV file eg. |
patrick-stephens
left a comment
There was a problem hiding this comment.
Triggering a full package build to ensure this builds for all targets as it is enabled by default.
|
@patrick-stephens everything seems to build. Only flb-it-aws_credentials_sts unit test fails in some jobs because I don't have the f879a93 in my branch. Do you want me to rebase? |
|
Yeah let's get it in to confirm, we really want green CI even if (we think) we know why :) |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
cmake/plugins_options.cmake(1 hunks)plugins/CMakeLists.txt(1 hunks)plugins/filter_lookup/CMakeLists.txt(1 hunks)plugins/filter_lookup/lookup.c(1 hunks)plugins/filter_lookup/lookup.h(1 hunks)tests/runtime/CMakeLists.txt(1 hunks)tests/runtime/filter_lookup.c(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- cmake/plugins_options.cmake
- plugins/filter_lookup/CMakeLists.txt
- tests/runtime/CMakeLists.txt
🧰 Additional context used
🧬 Code graph analysis (2)
tests/runtime/filter_lookup.c (4)
src/flb_lib.c (11)
flb_create(143-225)flb_input(266-276)flb_input_set(305-335)flb_filter(292-302)flb_output(279-289)flb_output_set(520-551)flb_stop(1011-1055)flb_destroy(228-263)flb_filter_set(618-649)flb_start(983-994)flb_lib_push(843-870)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)plugins/filter_lookup/lookup.c (3)
dynbuf_init(148-158)dynbuf_append_char(161-179)dynbuf_destroy(182-190)src/flb_metrics.c (1)
flb_metrics_get_id(62-75)
plugins/filter_lookup/lookup.c (7)
tests/runtime/filter_lookup.c (3)
dynbuf_init(682-689)dynbuf_append_char(691-702)dynbuf_destroy(704-711)include/fluent-bit/flb_mem.h (2)
flb_free(126-128)flb_calloc(84-96)src/flb_hash_table.c (4)
flb_hash_table_add(401-494)flb_hash_table_create(99-137)flb_hash_table_destroy(197-215)flb_hash_table_get(496-522)src/flb_record_accessor.c (3)
flb_ra_create(271-358)flb_ra_destroy(232-248)flb_ra_get_value_object(803-814)src/flb_log_event_encoder.c (7)
flb_log_event_encoder_begin_record(246-254)flb_log_event_encoder_set_timestamp(276-287)flb_log_event_encoder_commit_record(256-274)flb_log_event_encoder_rollback_record(241-244)flb_log_event_encoder_init(42-74)flb_log_event_encoder_claim_internal_buffer_ownership(118-124)flb_log_event_encoder_destroy(99-116)src/flb_log_event_decoder.c (3)
flb_log_event_decoder_init(103-120)flb_log_event_decoder_destroy(151-178)flb_log_event_decoder_next(309-456)src/flb_ra_key.c (1)
flb_ra_key_value_destroy(842-851)
|
@patrick-stephens addressed new AI review comment and rebased🤞. Let me know if there is anything else outstanding here. Thanks. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
plugins/filter_lookup/lookup.c (2)
348-550: Consider removing redundantgoto next_linestatements.The
next_linelabel at line 548 simply executescontinue, making thegotostatements at lines 348, 364, and 380 unnecessary. You can replace them with directcontinuestatements for clearer code flow.Apply this diff to simplify:
if (dynbuf_append_char(&key_buf, '"') != 0) { flb_plg_debug(ctx->ins, "Buffer allocation failed for line %d", line_num); dynbuf_destroy(&key_buf); dynbuf_destroy(&val_buf); flb_free(line); line_num++; - goto next_line; + continue; }Apply similar changes at lines 364 and 380, and remove the
next_line:label entirely.
1008-1008: Use precomputedresult_key_leninstead ofstrlen.Line 628 precomputes
ctx->result_key_lenspecifically for hot-path optimization, but this line callsstrlen(ctx->result_key)instead. Use the precomputed value for consistency and to avoid redundant string traversal.Apply this diff:
- ret = flb_log_event_encoder_append_body_string(&log_encoder, ctx->result_key, strlen(ctx->result_key)); + ret = flb_log_event_encoder_append_body_string(&log_encoder, ctx->result_key, ctx->result_key_len);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
plugins/filter_lookup/lookup.c(1 hunks)plugins/filter_lookup/lookup.h(1 hunks)tests/runtime/filter_lookup.c(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/runtime/filter_lookup.c
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/filter_lookup/lookup.c (6)
tests/runtime/filter_lookup.c (3)
dynbuf_init(691-698)dynbuf_append_char(700-711)dynbuf_destroy(713-720)include/fluent-bit/flb_mem.h (2)
flb_free(126-128)flb_calloc(84-96)src/flb_hash_table.c (4)
flb_hash_table_add(401-494)flb_hash_table_create(99-137)flb_hash_table_destroy(197-215)flb_hash_table_get(496-522)src/flb_record_accessor.c (3)
flb_ra_create(271-358)flb_ra_destroy(232-248)flb_ra_get_value_object(803-814)src/flb_log_event_encoder.c (7)
flb_log_event_encoder_begin_record(246-254)flb_log_event_encoder_set_timestamp(276-287)flb_log_event_encoder_commit_record(256-274)flb_log_event_encoder_rollback_record(241-244)flb_log_event_encoder_init(42-74)flb_log_event_encoder_claim_internal_buffer_ownership(118-124)flb_log_event_encoder_destroy(99-116)src/flb_ra_key.c (1)
flb_ra_key_value_destroy(842-851)
🔇 Additional comments (2)
plugins/filter_lookup/lookup.h (1)
1-56: LGTM! Well-structured plugin header.The header properly defines the plugin's public API with appropriate include guards, conditional metrics compilation, and clear field documentation. The precomputed
result_key_lenfield (line 40) is a good optimization for the hot path.plugins/filter_lookup/lookup.c (1)
1056-1094: LGTM! Proper resource cleanup and plugin registration.The exit callback correctly frees all resources in the proper order: val_list nodes, record accessor, hash table, and context. The plugin registration with config_map is well-structured with appropriate field types and descriptions.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
plugins/filter_lookup/lookup.c (1)
348-348: Optional:goto next_lineis redundant—replace withcontinue.Multiple error paths use
goto next_line(lines 348, 364, 380, 413, 429, 444), but thenext_linelabel at line 548 immediately executescontinue. Since cleanup is already complete before the goto, you can replace allgoto next_linestatements withcontinuedirectly, eliminating the unnecessary label.Also applies to: 364-364, 380-380, 413-413, 429-429, 444-444
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
plugins/filter_lookup/lookup.c(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/filter_lookup/lookup.c (6)
include/fluent-bit/flb_mem.h (2)
flb_free(126-128)flb_calloc(84-96)src/flb_hash_table.c (4)
flb_hash_table_add(401-494)flb_hash_table_create(99-137)flb_hash_table_destroy(197-215)flb_hash_table_get(496-522)src/flb_record_accessor.c (3)
flb_ra_create(271-358)flb_ra_destroy(232-248)flb_ra_get_value_object(803-814)src/flb_log_event_encoder.c (7)
flb_log_event_encoder_begin_record(246-254)flb_log_event_encoder_set_timestamp(276-287)flb_log_event_encoder_commit_record(256-274)flb_log_event_encoder_rollback_record(241-244)flb_log_event_encoder_init(42-74)flb_log_event_encoder_claim_internal_buffer_ownership(118-124)flb_log_event_encoder_destroy(99-116)src/flb_log_event_decoder.c (3)
flb_log_event_decoder_init(103-120)flb_log_event_decoder_destroy(151-178)flb_log_event_decoder_next(309-456)src/flb_ra_key.c (1)
flb_ra_key_value_destroy(842-851)
🔇 Additional comments (2)
plugins/filter_lookup/lookup.c (2)
987-1011: LGTM: Error recovery correctly breaks out of for-loop.The
goto next_recordpattern at line 1002 is correct. Whenappend_body_valuesfails during key copying, the code:
- Rolls back the incomplete record
- Emits the original record
- Jumps to
next_recordlabel (line 1007)- Checks the error condition (line 1009) and continues the outer while-loop
This correctly skips the rest of the record construction (appending result_key and committing) and moves to the next log event. The past review comment suggesting this was incorrect was mistaken—the current implementation properly handles the error case.
1-1100: Well-implemented CSV lookup filter with robust error handling.This is a comprehensive and well-structured implementation of a CSV-based lookup filter. The code demonstrates:
✅ Solid design decisions:
- Proper separation of concerns with helper functions
- Robust CSV parsing with quote handling and dynamic buffers
- Comprehensive error handling and cleanup paths
- Efficient hash table usage for lookups
- Support for non-string type conversion with explicit array/map skipping
✅ Memory safety:
- Past review issues (unmatched quote detection, val_node allocation order) have been properly addressed
- Consistent cleanup patterns throughout
- Proper ownership tracking via val_list
✅ Good practices:
- Platform-specific code properly guarded
- Metrics integration with proper guards
- Configurable behavior via config_map
- Comprehensive logging at appropriate levels
The implementation follows Fluent Bit patterns correctly and is production-ready.
|
@patrick-stephens I think maybe the runner ran out of disk space for 1 of the jobs? I don't think this is related to this PR, but let me know if you need anything :) |
|
We fixed no left device error in the current master. |
New filter aims to address use case of simple data enrichment using static key value lookup. The filter loads first two columns of CSV file into memory as a hash table. When a specified record value matches the key in the hash table the value will be appended to the record (based on key name defined in the filter inputs).) Tested with valgrind. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
- Removed unecessary FLB_FILTER_LOOKUP build flag now LookUp is enabled by default like other filters (without flag). - Fixed critical use-after-free bug in numeric value lookups. - Added processed_records_total, matched_records_total and skipped_records_total metrics to enable operational visibility - Added unit tests to cover handling of different data types, CSV loading/handling and metrics tests. Tested with valgrind - no memory leaks. All unit tests pass. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
- fix variable declarations and remove C99 features - Conditional compilation for Windows vs Unix headers/functions - Replace bool with int, fix format specifiers, update comments All 15 unit tests for filter passed. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
- fix variable declarations and remove C99 features for unit tests - Conditional compilation for Windows for unit test features All 15 unit tests for filter passed. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Addressed following issues: Fix potential memory leak when val_node allocation fails Wrap test metrics code with FLB_HAVE_METRICS guards Replace metric macros with enum to prevent namespace pollution Gate plugin registration on FLB_RECORD_ACCESSOR option Add unmatched quote detection after key parsing in CSV loader Replace magic numbers with semantic msgpack type checking Fix thread safety in lookup filter metrics macros Eliminated potential segfaults from null pointer dereferences Added defensive checks to the metric creation code Optimise hot path by eliminating repeated strlen calls Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
- Added skip_header_row config option (defaults to false for headerless CSVs) - Renamed 'file' parameter to 'data_source' for future URL/database support - Added skip_header_row test to verify both true/false behavior Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
Fix bug where continue inside for-loop breaks error recovery. The fix uses goto to jump past the key copying code and explicitly checks encoder state before proceeding. Signed-off-by: Oleg Mukhin <oleg.v.mukhin@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
plugins/filter_lookup/lookup.c (1)
1014-1014: Minor optimization: use precomputedresult_key_len.Line 1014 calls
strlen(ctx->result_key)in the hot path, butresult_key_lenis already precomputed during initialization (line 628) for exactly this purpose.Apply this diff:
- ret = flb_log_event_encoder_append_body_string(&log_encoder, ctx->result_key, strlen(ctx->result_key)); + ret = flb_log_event_encoder_append_body_string(&log_encoder, ctx->result_key, ctx->result_key_len);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
cmake/plugins_options.cmake(1 hunks)plugins/CMakeLists.txt(1 hunks)plugins/filter_lookup/CMakeLists.txt(1 hunks)plugins/filter_lookup/lookup.c(1 hunks)plugins/filter_lookup/lookup.h(1 hunks)tests/runtime/CMakeLists.txt(1 hunks)tests/runtime/filter_lookup.c(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/runtime/filter_lookup.c
🚧 Files skipped from review as they are similar to previous changes (4)
- cmake/plugins_options.cmake
- plugins/filter_lookup/CMakeLists.txt
- plugins/CMakeLists.txt
- tests/runtime/CMakeLists.txt
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/filter_lookup/lookup.c (9)
tests/runtime/filter_lookup.c (3)
dynbuf_init(691-698)dynbuf_append_char(700-711)dynbuf_destroy(713-720)include/fluent-bit/flb_mem.h (2)
flb_free(126-128)flb_calloc(84-96)src/flb_hash_table.c (4)
flb_hash_table_add(401-494)flb_hash_table_create(99-137)flb_hash_table_destroy(197-215)flb_hash_table_get(496-522)lib/cmetrics/src/cmt_counter.c (1)
cmt_counter_create(26-81)src/flb_metrics.c (1)
flb_metrics_add(129-178)src/flb_record_accessor.c (3)
flb_ra_create(271-358)flb_ra_destroy(232-248)flb_ra_get_value_object(803-814)src/flb_log_event_encoder.c (6)
flb_log_event_encoder_begin_record(246-254)flb_log_event_encoder_commit_record(256-274)flb_log_event_encoder_rollback_record(241-244)flb_log_event_encoder_init(42-74)flb_log_event_encoder_claim_internal_buffer_ownership(118-124)flb_log_event_encoder_destroy(99-116)src/flb_log_event_decoder.c (3)
flb_log_event_decoder_init(103-120)flb_log_event_decoder_destroy(151-178)flb_log_event_decoder_next(309-456)src/flb_ra_key.c (1)
flb_ra_key_value_destroy(842-851)
🔇 Additional comments (8)
plugins/filter_lookup/lookup.h (1)
1-56: LGTM! Clean and well-structured header.The header file is well-organized with:
- Proper header guards
- Clear separation of concerns (metrics enum, context struct, plugin declaration)
- Good inline documentation for the hot-path optimization (
result_key_len)- Appropriate conditional compilation for metrics
plugins/filter_lookup/lookup.c (7)
45-77: LGTM! Well-structured metric macros.The metric increment macros properly handle both the CMT (CMetrics) system and the legacy metrics system, with appropriate guards for when
FLB_HAVE_METRICSis not defined.
88-260: LGTM! Robust helper functions with proper error handling.The helper functions are well-implemented:
normalize_and_trimcorrectly handles whitespace and case normalization with clear return codes- Dynamic buffer operations use standard 2× growth strategy
read_line_dynamicproperly handles arbitrary line lengths, EOF conditions, and line ending normalization
262-554: LGTM! CSV parser with comprehensive error handling.The CSV loading function correctly handles:
- Optional header row skipping based on configuration
- Quote escaping (doubled quotes within quoted fields)
- Unmatched quote detection for both keys and values
- Proper memory management with val_node tracking before hash insertion
- Dynamic buffer cleanup on all error paths
All previously flagged issues have been properly addressed.
556-679: LGTM! Comprehensive initialization with proper validation.The initialization callback is well-structured with:
- Both CMT and legacy metrics setup
- Required parameter validation
- Cross-platform file accessibility checks
- Hot-path optimization:
result_key_lenprecomputed for fast comparison- Thorough error cleanup ensuring no resource leaks
681-708: LGTM! Clean helper for emitting original records.The
emit_original_recordhelper properly encapsulates the record emission pattern with appropriate error handling and metric tracking.
710-1005: LGTM! Robust record processing with comprehensive error handling.The filter callback is well-implemented with:
- Proper decoder/encoder lifecycle management
- Thorough type handling for lookup values (primitives converted to strings, arrays/maps skipped)
- Correct error recovery using goto/label pattern (addressing previous review)
- Comprehensive error handling with rollback and original record emission
- Efficient
any_modifiedtracking to avoid unnecessary buffer copies
1062-1100: LGTM! Complete cleanup and plugin registration.The exit callback and plugin registration are properly implemented:
- Safe list traversal and cleanup of all tracked values
- Destruction of record accessor and hash table
- Comprehensive config map with clear descriptions
- Standard plugin struct definition
|
@cosmo0920 @patrick-stephens thank you both. Rebased. 🤞 I'll pick up the nitpick comment from CodeRabbit in next PR - just want to get this one over the line. |
|
Can you link the docs PR? |
|
@patrick-stephens the docs PR is here: fluent/fluent-bit-docs#1953. Seems like some connectivity errors in some jobs? I don't see anything related to my PR that could have caused these so will need guidance if you need anything from my side. @edsiper this is just a first iteration of what I think is a useful plugin - our team had to develop workarounds to enable something similar. If this gets merged I will add PRs on supporting data reload, other data formats (e.g. JSON) and some performance optimisations in the near future. |
|
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
|
Just for my curiosity but is this patch can be replaced with using processor_content_modifier? |
Added a new LookUp filter to address use case when enrichment of record is required based on simple static key value lookup.
The filter loads a CSV file into a hash table for performance. It consider first column of the CSV to be the key and the second column to be the value. All other columns are ignored.
Where a record value (identified by
lookup_keyinput) matches the key from the CSV, the value from the CSV row is added under a new key (defined byresult_keyinput) to the record.Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.