in_tail: fix file_cache_advise causing high disk IOPS #11660

Draft
singholt wants to merge 1 commit into fluent:master from singholt:in_tail-fix-fadvise-iops
singholt commented Apr 2, 2026

Problem

The file_cache_advise option (added in #8422) calls posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) before read() in flb_tail_file_chunk(). Because len=0 means "through the end of the file", a call with offset=0, len=0 asks the kernel to evict the entire file from the page cache on every chunk cycle, including pages the kernel had already read ahead for the next read. Each subsequent read() then becomes a cache miss and must go to disk, causing high IOPS.
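
To make the range semantics concrete, here is a minimal sketch of which bytes a posix_fadvise() call covers. The helper advised_bytes() is hypothetical and exists only for illustration; it models the POSIX rule that len == 0 means "from offset to end of file" and ignores the page-granularity rounding a real kernel applies.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical model (not part of Fluent Bit): number of bytes covered by
 * posix_fadvise(fd, offset, len, ...) on a file of file_size bytes.
 * Per POSIX, len == 0 means "everything from offset to the end of file". */
static off_t advised_bytes(off_t offset, off_t len, off_t file_size)
{
    assert(offset >= 0 && len >= 0 && file_size >= 0);

    if (offset >= file_size) {
        return 0;                       /* range starts past the data */
    }

    off_t remaining = file_size - offset;
    if (len == 0 || len > remaining) {
        return remaining;               /* covers through end of file */
    }
    return len;                         /* covers exactly the given range */
}
```

With a 1 MiB file, advised_bytes(0, 0, 1048576) covers the whole file, while advised_bytes(0, 4096, 1048576) covers only the first 4096 bytes, which is the difference between the old call and a call scoped to the consumed range.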

Fix

Move the posix_fadvise call to after read and processing, and scope it to only the already-consumed byte range (0 to stream_offset) so that kernel readahead pages for upcoming reads are preserved.

Before

flb_tail_file_chunk()
├─ check buffer capacity, resize if needed
├─ posix_fadvise(fd, 0, 0, DONTNEED)   ← evicts ENTIRE file from page cache
├─ read(fd, buf, size)                 ← cache miss, must hit disk every time
├─ process_content()
├─ update stream_offset
└─ return

After

flb_tail_file_chunk()
├─ check buffer capacity, resize if needed
├─ read(fd, buf, size)                 ← hits page cache normally
├─ process_content()
├─ update stream_offset
├─ posix_fadvise(fd, 0, stream_offset, DONTNEED)   ← evicts only consumed bytes; readahead pages preserved
└─ return
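
The "After" ordering can be sketched in C roughly as follows. This is not the actual flb_tail_file_chunk() code: read_chunk_then_advise() and its parameters are hypothetical names, processing is elided, and the guard against a zero-length advise is an assumption added here because len == 0 would again mean "whole file".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the real flb_tail_file_chunk()): read one chunk
 * from the current file position, advance *stream_offset, then ask the
 * kernel to drop only the already-consumed range [0, *stream_offset). */
static ssize_t read_chunk_then_advise(int fd, char *buf, size_t size,
                                      off_t *stream_offset)
{
    assert(buf != NULL && stream_offset != NULL);

    ssize_t bytes = read(fd, buf, size);  /* may be served from page cache */
    if (bytes <= 0) {
        return bytes;                     /* EOF or error: nothing consumed */
    }

    /* ... process_content() would run here ... */

    *stream_offset += bytes;

    /* A length of 0 would mean "to end of file", so advise only when > 0. */
    if (*stream_offset > 0) {
        /* The call is advisory; this sketch ignores its return value. */
        (void) posix_fadvise(fd, 0, *stream_offset, POSIX_FADV_DONTNEED);
    }

    return bytes;
}
```

Because the advise call comes after the read and stops at stream_offset, pages beyond the consumed range are not named in the request, so readahead for the next chunk is left alone.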

Impact

  • Same memory benefit: processed data no longer lingers in page cache
  • No IOPS penalty: kernel readahead works as intended for sequential reads
  • Customers tailing many files under high append rates should see significantly reduced disk I/O

PENDING TESTING, DO NOT MERGE

The posix_fadvise(POSIX_FADV_DONTNEED) call was placed before the
read and evicted the entire file from the page cache on every chunk
cycle. This forced every subsequent read to go to disk, causing
excessive IOPS.

Move the call to after read and processing, and scope it to only
the already-consumed byte range (0 to stream_offset) so that kernel
readahead pages for upcoming reads are preserved.

Signed-off-by: Anuj Singh <singholt@amazon.com>
coderabbitai bot commented Apr 2, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.
