Describe your problem
Version: v0.22.1 (Docker, CPU)
Files deleted from the data source are not removed from the Ragflow dataset. Is this a bug, or is it out of scope?
Steps to Reproduce:
- Add an S3 (or S3-compatible) bucket as a data source.
- Upload several files to the bucket, for example a.pdf, b.pdf, and c.pdf.
- Wait for Ragflow’s sync job to complete.
- Confirm that the files appear in the Ragflow dataset (sync works as expected).
- Delete b.pdf from the data source.
- Wait for the next Ragflow sync job.
- Observe that b.pdf still appears in the Ragflow dataset.
Expected Behavior:
Deleted files in the data source should also be removed from the Ragflow dataset after the sync job.
Actual Behavior:
The deleted file (b.pdf) remains in the Ragflow dataset even after multiple sync cycles.
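To make the expected behavior concrete, here is a minimal sketch of the reconciliation I would expect a sync job to perform. This is not Ragflow's actual code: `plan_sync` is a hypothetical helper, and it assumes the sync job can list both the object keys currently in the bucket and the file names already in the dataset.

```python
def plan_sync(source_keys, dataset_keys):
    """Return (to_add, to_delete) so that the dataset mirrors the source.

    Hypothetical helper for illustration only, not part of Ragflow.
    """
    source = set(source_keys)
    dataset = set(dataset_keys)
    to_add = sorted(source - dataset)      # in the bucket, missing from the dataset
    to_delete = sorted(dataset - source)   # in the dataset, gone from the bucket
    return to_add, to_delete


# State after b.pdf has been deleted from the bucket
source_keys = ["a.pdf", "c.pdf"]
dataset_keys = ["a.pdf", "b.pdf", "c.pdf"]

to_add, to_delete = plan_sync(source_keys, dataset_keys)
print(to_add, to_delete)  # [] ['b.pdf']
```

With the reproduction above, the second list is what I would expect the sync job to remove. What I observe instead looks as if only the first list is ever acted on.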