Make detection of binary files more robust#515
Make detection of binary files more robust#515martenlienen wants to merge 2 commits intoxenodium:mainfrom
Conversation
| (if (and mime-type | ||
| (not (or (string-prefix-p "text/" mime-type) | ||
| (string-match-p "\\(?:json\\|xml\\|yaml\\)$" mime-type)))) | ||
| ;; Read contents as base-64 encoded binary |
There was a problem hiding this comment.
Won't small non-image binary files (PDFs, .docx, etc.) that pass the size check will now be embedded as "resource" blocks with base64-encoded content in the text field? Before this PR, json-serialize would error on non-UTF-8 bytes and the condition-case would catch it, falling back to a plain text mention, somewhat accidentally the right behavior.
Maybe resource embed branch in the caller probably needs a (not (map-elt file :base64-p)) guard so binary non-image files fall through to resource_link instead?
There was a problem hiding this comment.
I worked on this, because I @-mentioned a small PDF file that was then embedded in the message but not base64 encoded. agent-shell then refused any further cooperation, when sending the request failed on json-serialize in acp.
| (push `((type . "resource") | ||
| (resource . ((uri . ,(concat "file://" resolved-path)) | ||
| (text . ,(map-elt file :content)) | ||
| (,(if (map-elt file :base64-p) 'blob 'text) . ,(map-elt file :content)) |
There was a problem hiding this comment.
This embeds small non-text files as binary resources now as documented here. And I think at least claude would also support embedded PDFs, if these API docs also apply to ACP.
Regarding and agent-shell-text-file-capabilities above, why is this necessary to work with embedded resources? From the naming and the pattern in the image case, I would think that supports-embedded-context would be the right test to check if the agent can deal with embedded context.
The previous heuristic would try to embed non-image files as text, which would fail, for example, for small PDFs. For those files, which contain non-UTF-8 bytes,
json-serializewould raise an error, because it would be an invalid UTF-8 string.The new function has more robust MIME type detection based on Emacs'
mailcapand ensures that only files with valid multi-byte content will not be base64 encoded.Checklist
M-x checkdocandM-x byte-compile-file.