[Feature Request]: Deleted files from data source (S3 or S3-compatible) still exist in Ragflow Dataset #11460

@Furkan-Demir

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Describe your problem

Version: v0.22.1 (Docker, CPU)

Files deleted from the data source are not removed from the Ragflow dataset. Is this a bug, or is it out of scope?

Steps to Reproduce:

  1. Add an S3 (or S3-compatible) bucket as a data source.
  2. Upload several files to the bucket, let's say a.pdf, b.pdf, c.pdf.
  3. Wait for Ragflow’s sync job to complete.
  4. Confirm that the files appear in the Ragflow dataset (sync works as expected).
  5. Delete b.pdf from the data source.
  6. Wait for the next Ragflow sync job.
  7. Observe that b.pdf still appears in the Ragflow dataset.

Expected Behavior:

Deleted files in the data source should also be removed from the Ragflow dataset after the sync job.

Actual Behavior:

The deleted file (b.pdf) remains in the Ragflow dataset even after multiple sync cycles.
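
To illustrate the expected behavior: it amounts to a set difference between the keys currently in the bucket and the documents already in the dataset. Below is a minimal sketch in Python. The names `SyncPlan` and `plan_sync` are hypothetical and are not taken from Ragflow's code; how Ragflow actually lists bucket keys or deletes documents is not assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Result of comparing the data source with the dataset (hypothetical type)."""
    to_add: list[str]      # keys present in the source but missing from the dataset
    to_remove: list[str]   # keys present in the dataset but gone from the source


def plan_sync(source_keys, dataset_keys):
    """Compute which documents a sync pass would need to add or remove.

    Both arguments are iterables of object keys (file names). This is only
    an illustration of the reconciliation step, not Ragflow's implementation.
    """
    source = set(source_keys)
    dataset = set(dataset_keys)
    return SyncPlan(
        to_add=sorted(source - dataset),
        to_remove=sorted(dataset - source),
    )


# Scenario from the steps above: b.pdf was deleted from the bucket,
# but the dataset still holds all three files.
plan = plan_sync(["a.pdf", "c.pdf"], ["a.pdf", "b.pdf", "c.pdf"])
print(plan.to_remove)  # ['b.pdf']
```

With a step like this in the sync job, `to_remove` would contain b.pdf after step 5, and the dataset could then drop it.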
