diff --git a/dev/upstream-api-gaps.md b/dev/upstream-api-gaps.md new file mode 100644 index 0000000..b69b3bc --- /dev/null +++ b/dev/upstream-api-gaps.md @@ -0,0 +1,775 @@ +# PDFium public-API gap list (for a consolidated upstream request) + +A tracking document for missing-symmetry and missing-feature gaps in +the PDFium public C API. The goal is a single, well-scoped batch of +proposed changes to take to the `pdfium-reviews@googlegroups.com` +mailing list once we've talked them over internally — not a +torrent of independent CLs. Quality and self-containment matter more +than coverage breadth. + +Each entry is structured so that, if we choose to pursue it, the +proposal text below can become the cover-letter description of an +actual Gerrit CL. + +## Provenance + +- Date: 2026-05-21. +- Upstream HEAD walked: `e30fc3988` (immediately before the + in-flight `FPDFAnnot_AppendOption` patch lands in + `/home/bill/src/pdfium`). Bundled headers in + `inst/include/fpdf_*.h` track `chromium/7202`; the structural + gap analysis below is stable across the small delta between the + two. +- Method: + 1. Grep `FPDF_EXPORT` declarations across all 22 public headers + in `public/fpdf_*.h` to enumerate every exposed symbol. + 2. Pair every `_Get*` symbol with the corresponding `_Set*` and + list the unpaired getters. 173 getters / 41 setters across + the headers means most exported symbols are reader-only; that's + fine, since many are intrinsically read-only (parser internals, + bitmap rasterizations, signature data). The interesting list + is the smaller set of getters whose write side would clearly + unblock embedder workflows. + 3. Cross-reference each gap with the `pdfium` R package's + reader-writer audit + (`/home/bill/github/rpdfium/dev/reader-writer-audit.md`) to + pin every entry to a real R-side consumer that would gain a + non-hacky writer path. + 4. 
Walk the internal `CPDF_*` classes under + `core/fpdfdoc/`, `core/fpdfapi/page/`, and + `core/fpdfapi/parser/` to find utility methods that look + useful but aren't exposed. + +## In flight (don't re-request) + +These have working drafts in +`dev/upstream-patches/` or are uploaded to Gerrit already. The +consolidated mailing-list request should mention them as existing +prior art rather than repeating their motivation. + +- `FPDFPath_GetBezierControlPoints` — CL 147810, patchset 2, + uploaded 2026-05-15. +- `FPDFTextObj_SetFontSize` — patch drafted 2026-05-20, ready to + upload from a CLA-signed account. +- `FPDFAnnot_AppendOption` + `FPDFAnnot_RemoveOptions` — patch + drafted 2026-05-20, ready to upload. +- `FPDF_SetMetaText` — patch drafted 2026-05-21 (CL 1 below); see + `dev/upstream-patches/pdfium-FPDF_SetMetaText.patch`. +- `FPDFAttachment_SetSubtype` — patch drafted 2026-05-21 (CL 6 + below); see + `dev/upstream-patches/pdfium-FPDFAttachment_SetSubtype.patch`. +- `FPDFAnnot_SetNumberValue` — patch drafted 2026-05-21 (CL 7 + below); see + `dev/upstream-patches/pdfium-FPDFAnnot_SetNumberValue.patch`. + +## Proposed CLs + +Each entry below is self-contained: signature, motivation, +internal-implementation pointer, R-side consumer, and an estimate of +how independent it is from the others. Order is rough priority for +the consolidated request: gaps that unblock the most user-visible R +functionality first, geometry / authoring writers next, +deeper-structural items last. + +### CL 1: Document Info dictionary writers — **drafted** + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDF_SetMetaText(FPDF_DOCUMENT document, + FPDF_BYTESTRING tag, + FPDF_WIDESTRING value); +``` + +**Rationale:** `FPDF_GetMetaText` is one of the oldest reader +symbols in `fpdf_doc.h` and is wrapped by every binding we +surveyed (pypdfium2, pdfium-rs, pdfium-render, hyzyla, +PdfiumAndroid, PdfiumViewer). 
There is no public way to set +`/Info/Title`, `/Info/Author`, `/Info/Subject`, `/Info/Keywords`, +`/Info/Creator`, `/Info/Producer`, `/Info/CreationDate`, or +`/Info/ModDate`. The only workaround embedders have today is to +re-parse the saved PDF, mutate the trailer dictionary, and re-save +with `FPDF_SaveAsCopy` — which fails on incremental saves and +loses any in-memory mutations not yet flushed. Setting `/Info` +fields at PDF-generation time is also impossible: `FPDF_CreateNewDocument` +returns a doc with an empty Info dict that the embedder can't fill. + +The doc-comment should make clear that: + +- Empty string deletes the entry (matches PDFium's + `FPDFCatalog_SetLanguage` precedent). +- Date keys (`CreationDate`, `ModDate`) take a PDF date string + (`D:YYYYMMDDHHMMSS+HH'MM'`); the caller is responsible for + formatting. We considered a `FPDF_SetMetaDate` helper but the + string-formatting boundary is the same as what + `FPDF_GetMetaText` already returns, so symmetry argues for a + single setter. + +**Internal implementation pointer:** `CPDF_Document::GetInfo()` +in `core/fpdfapi/parser/cpdf_document.h` already returns a mutable +`RetainPtr<CPDF_Dictionary>`. The C-shim +`FPDF_GetMetaText` (in `fpdfsdk/fpdf_doc.cpp` around line 539) +reads from that dict via `GetUnicodeTextFor(tag)`. A setter is a +trivial `SetNewFor<CPDF_String>(tag, value, /*as_hex=*/false)` +mirror, gated on `tag != nullptr` and `document != nullptr`. + +**R-side consumer:** `pdf_doc_set_meta(doc, key, value)` — +identified as the single most-requested writer in the v0.1.0 user +survey +(see `dev/v0.1.0-api-gap-audit.md` §"Document metadata writers"). +The reader (`pdf_doc_info()`) has shipped since 0.1.0; the writer +is on the v0.2.0 roadmap pending this CL. + +**Self-contained?** yes. 
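Since CL 1 leaves date formatting to the caller, a minimal sketch of that caller-side step may help reviewers picture the boundary. `format_pdf_date` is a hypothetical helper written for this note, not a PDFium symbol; it only assumes the `D:YYYYMMDDHHMMSS+HH'MM'` layout quoted above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not part of PDFium): formats a broken-down local
 * time plus a UTC offset in minutes as D:YYYYMMDDHHMMSS+HH'MM'.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the output buffer is too small. */
static int format_pdf_date(const struct tm* t, int utc_offset_minutes,
                           char* out, size_t out_size) {
  char sign = utc_offset_minutes < 0 ? '-' : '+';
  int abs_minutes = abs(utc_offset_minutes);
  int n = snprintf(out, out_size, "D:%04d%02d%02d%02d%02d%02d%c%02d'%02d'",
                   t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
                   t->tm_hour, t->tm_min, t->tm_sec,
                   sign, abs_minutes / 60, abs_minutes % 60);
  /* snprintf reports the length it needed; treat truncation as failure. */
  return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

The resulting string would then be widened and passed as the `value` argument of the proposed `FPDF_SetMetaText` with the `CreationDate` or `ModDate` tag; that call is not shown because the symbol does not exist upstream yet.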
+ +### CL 2: Bookmark / outline authoring (4 symbols) + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOKMARK FPDF_CALLCONV +FPDFBookmark_New(FPDF_DOCUMENT document, + FPDF_BOOKMARK parent, + FPDF_BOOKMARK insert_after); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFBookmark_SetTitle(FPDF_BOOKMARK bookmark, + FPDF_WIDESTRING title); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFBookmark_SetDest(FPDF_DOCUMENT document, + FPDF_BOOKMARK bookmark, + FPDF_DEST dest); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFBookmark_Delete(FPDF_DOCUMENT document, FPDF_BOOKMARK bookmark); +``` + +**Rationale:** The reader half is comprehensive +(`FPDFBookmark_GetFirstChild`, `GetNextSibling`, `GetTitle`, +`GetCount`, `Find`, `GetDest`, `GetAction`), but there's no path +to write a `/Outlines` tree. Embedders that generate reports +programmatically — academic toolchains, scientific report +generators, batch-PDF assemblers — want to emit a clickable TOC +with one bookmark per section. Today they have to drop down to a +different PDF library (HummusPDF, PDFKit, Reportlab) for this +single operation. + +`FPDFBookmark_New` takes a parent (`NULL` ⇒ root) and an insert +position (`NULL` ⇒ append last child). `SetTitle` and `SetDest` +mirror the existing getters; `Delete` removes the bookmark from +its parent's child list and the document's bookmark tree, but does +not free child bookmarks (they get re-parented to the deleted +bookmark's parent, matching the `/Outlines` PDF spec's +behavior for an `OpenAction` removal). + +Action setters (`FPDFBookmark_SetAction`) are intentionally +deferred: a follow-up CL adds them once the destination setter +has shaken out the lifecycle questions about `FPDF_DEST` +ownership. + +**Internal implementation pointer:** `CPDF_Bookmark` in +`core/fpdfdoc/cpdf_bookmark.h` is currently +immutable — it wraps a `RetainPtr`. 
The +internal class would need either a mutable variant or — cleaner — +a `CPDF_BookmarkTree` mutator +(`core/fpdfdoc/cpdf_bookmarktree.h` is already +the natural home; today it only enumerates). + +Writes operate on the `/Outlines` root dict on the document, which +is reachable via `doc->GetMutableRoot()->GetMutableDictFor("Outlines")`. +The bookmark-tree walker (`CPDF_BookmarkTree::GetFirstChild`) shows +how to traverse; insertion is the same walk plus pointer +re-wiring on `Prev`/`Next`/`First`/`Last`/`Parent` entries per +PDF spec §12.3.3. + +**R-side consumer:** `pdf_bookmark_new()`, `pdf_bookmark_set_title()`, +`pdf_bookmark_set_dest()`, `pdf_bookmark_delete()` — currently +v0.2.0 roadmap, blocked on these symbols. +The `pdf_doc_bookmarks()` reader has shipped since 0.1.0. + +**Self-contained?** yes. This is a "closely related cluster" +(per the task brief), packaged as one CL. + +### CL 3: Action and destination introspection completers + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetURI(FPDF_ANNOTATION annot, const char* uri); +/* Already exists — listed here only to clarify scope. */ + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetGoToAction(FPDF_DOCUMENT document, + FPDF_ANNOTATION annot, + int dest_page_index, + const FS_FLOAT* view_params, + unsigned long num_view_params, + unsigned long view_type); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetLaunchAction(FPDF_ANNOTATION annot, + FPDF_BYTESTRING file_path); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetNamedAction(FPDF_ANNOTATION annot, + FPDF_BYTESTRING action_name); +``` + +**Rationale:** `FPDFAnnot_SetURI` exists for link annotations, +but it only handles the URI action subtype. Embedders that +generate cross-document or intra-document hyperlinks programmatically +need `/GoTo`, `/Launch`, or `/Named` actions and have no public +path to set them. 
The current workaround is to mutate the +annotation dict via PDFium's private surface — not a stable choice. + +The signature for `FPDFAnnot_SetGoToAction` mirrors +`FPDFDest_GetView`'s parameter layout (a fit-type code plus up to +4 view-state floats), so caller code that round-trips through +`FPDFDest_GetView` can pass the same shape back. The intra- vs. +inter-document distinction is encoded via the `dest_page_index` +sign: non-negative ⇒ local page, `-1` ⇒ caller has separately +established a remote-goto via `FPDFAnnot_SetLaunchAction` chained +with a follow-up dest. + +**Internal implementation pointer:** Each new entry maps to +`annot_dict->GetMutableDict()->SetNewFor("A", ...)` +where the inner dict gets `Type=Action`, `S=GoTo|Launch|Named`, +and the per-type parameters. The shape mirrors what +`FPDFAction_GetType` (in `fpdfsdk/fpdf_doc.cpp`) reads back from +`/A/S`; reuse of those constants keeps the symmetry tight. + +**R-side consumer:** `pdf_link_annot_set_action()` — +v0.2.0 roadmap. Today, `pdf_page_links()` reports `action_type` +strings ("uri", "goto", "launch", "named", "remotegoto") in a +read-only column; round-tripping requires this writer to exist. + +**Self-contained?** yes, but pairs naturally with CL 4 if the +mailing list pushes for a single coherent "action-authoring" +batch. 
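Because `FPDFAnnot_SetGoToAction` takes a fit-type code together with a float count, the shim would presumably reject mismatched pairs. The sketch below shows one way that check could look. The enum values are local stand-ins assumed to match the `PDFDEST_VIEW_*` codes in `public/fpdf_doc.h`, and both helper names are hypothetical; the parameter counts follow the PDF destination syntax (`/XYZ` takes three values, `/FitR` four, and so on).

```c
/* Local stand-ins assumed to mirror the PDFDEST_VIEW_* codes in
 * public/fpdf_doc.h; verify against the bundled header before relying
 * on the numeric values. */
enum {
  VIEW_XYZ = 1, VIEW_FIT = 2, VIEW_FITH = 3, VIEW_FITV = 4,
  VIEW_FITR = 5, VIEW_FITB = 6, VIEW_FITBH = 7, VIEW_FITBV = 8
};

/* Hypothetical helper: number of view-state floats each fit type takes
 * in the PDF destination syntax, or -1 for an unknown fit type. */
static int expected_view_param_count(unsigned long view_type) {
  switch (view_type) {
    case VIEW_FIT:
    case VIEW_FITB:
      return 0; /* no parameters */
    case VIEW_FITH:
    case VIEW_FITV:
    case VIEW_FITBH:
    case VIEW_FITBV:
      return 1; /* a single top or left coordinate */
    case VIEW_XYZ:
      return 3; /* left, top, zoom */
    case VIEW_FITR:
      return 4; /* left, bottom, right, top */
    default:
      return -1;
  }
}

/* Hypothetical argument check a shim could run before building /A. */
static int view_params_valid(unsigned long view_type,
                             unsigned long num_view_params) {
  int expected = expected_view_param_count(view_type);
  return expected >= 0 && num_view_params == (unsigned long)expected;
}
```

A strict count match is only one possible policy; a shim could instead accept fewer values and write nulls for the rest, which is a design question for the CL review.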
+ +### CL 4: Form-field value writer (single annot, single value) + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetFormFieldValue(FPDF_FORMHANDLE hHandle, + FPDF_ANNOTATION annot, + FPDF_WIDESTRING value); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetFormFieldExportValue(FPDF_FORMHANDLE hHandle, + FPDF_ANNOTATION annot, + FPDF_WIDESTRING export_value); +``` + +**Rationale:** `FPDFAnnot_GetFormFieldValue` and +`FPDFAnnot_GetFormFieldExportValue` exist (the latter for radio +buttons and checkboxes), but only the *interactive* forms surface +exposes value setters: `FORM_ReplaceSelection` and +`FORM_SetIndexSelected` both require an interactive +`FPDF_FORMHANDLE` plus user-event simulation (selection ranges, +focus events). For embedders that just want to programmatically +populate a form — the classic "fill PDF from database" pipeline — +this is the wrong API. They want to write `/V` on the field dict +and let PDFium regenerate the appearance via +`FPDFPage_GenerateContent`. + +`FPDFAnnot_SetStringValue("V", ...)` is a partial workaround for +text fields but doesn't update the field's appearance state for +checkboxes/radios (the `/AS` entry on widget annots), so the form +ends up visually wrong while the dictionary says the right +thing. The proposed setter routes through +`CPDF_FormField::SetValue` like the interactive path does, but +without requiring the form-fill environment to be in an +"interactive" state. + +The export-value setter is the symmetric companion for radio +button groups: setting `/V` to a non-existent `/AP/N/` key +leaves the field in a broken state, so the setter must also +update `/Kids/N/AS` to match. The two-symbol shape mirrors the +two-getter shape already in the header. + +**Internal implementation pointer:** +`CPDF_FormField::SetValue` (in `core/fpdfdoc/cpdf_formfield.h` +around line 109) and `SetCheckValue` (line 165) and +`SetItemSelectionSelected` (line 168) are the three internal +hooks. 
A new `FPDFAnnot_SetFormFieldValue` would route based on +`GetFormFieldType()` to the right one of those three. + +**R-side consumer:** `pdf_form_field_set_value()` and +`pdf_form_field_set_export_value()` — currently v0.2.0 roadmap. +The `pdf_form_fields()` reader has shipped since 0.1.0. + +**Self-contained?** yes. Distinct from the in-flight +`FPDFAnnot_AppendOption` patch, which writes the +`/Opt` array — that's the field's *options*, not its *selected +value*. + +### CL 5: Encryption / password-protect on write + +**Symbols proposed:** + +```c +typedef struct { + int revision; /* 0 (legacy), 2, 3, 4, or 5 (AES-256) */ + const char* user_password; + const char* owner_password; + uint32_t permissions; /* PDF spec §7.6.3 permission flags */ + FPDF_BOOL encrypt_metadata; +} FPDF_ENCRYPTION_PARAMS; + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDF_SetEncryption(FPDF_DOCUMENT document, + const FPDF_ENCRYPTION_PARAMS* params); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDF_RemoveEncryption(FPDF_DOCUMENT document); +``` + +**Rationale:** PDFium can *read* encrypted PDFs +(`FPDF_LoadDocument` accepts a password argument and +`FPDFCatalog_*` introspects), and `FPDF_SaveAsCopy` accepts a +`FPDF_REMOVE_SECURITY` flag to strip encryption when saving. But +there's no path to *add* encryption: a document created with +`FPDF_CreateNewDocument` and saved with `FPDF_SaveAsCopy` is +always unencrypted. The single most common request in +PDF-generation issue trackers (pypdfium2 #283, pdfium-rs #94, +pdfium-render #186) is "produce a password-protected PDF", +and every binding redirects it upstream. + +The proposed API stays narrow — a single struct with the +parameters the PDF spec mandates — to avoid an N-overload +explosion. The PDF 2.0 form of AES-256 (revision 6) is intentionally left +out for v1; it can ship as a follow-up that extends the struct +with a new field. Note that `FPDF_ENCRYPTION_PARAMS` as declared above +is caller-allocated rather than opaque, so the struct can only grow +compatibly if v1 reserves a leading size or version field; that is +an open point to settle before upload. 
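The `permissions` field of the proposed struct carries the standard security handler's `/P` bit field, which has reserved bits that callers often get wrong. The sketch below composes a `/P` value under the usual encoding, where bits 7, 8 and 13 through 32 are set and bits 1 and 2 are clear (so "no permissions" is the familiar `-3904`). The `PERM_*` names and `make_permissions` are hypothetical, introduced only for this illustration.

```c
#include <stdint.h>

/* Hypothetical names for the user-access permission bits of /P.
 * Bit numbers in the comments are 1-based, as in the PDF spec table. */
enum {
  PERM_PRINT = 1u << 2,                 /* bit 3  */
  PERM_MODIFY = 1u << 3,                /* bit 4  */
  PERM_COPY = 1u << 4,                  /* bit 5  */
  PERM_ANNOTATE = 1u << 5,              /* bit 6  */
  PERM_FILL_FORMS = 1u << 8,            /* bit 9  */
  PERM_EXTRACT_ACCESSIBILITY = 1u << 9, /* bit 10 */
  PERM_ASSEMBLE = 1u << 10,             /* bit 11 */
  PERM_PRINT_HIGH_QUALITY = 1u << 11    /* bit 12 */
};

/* Bits 7-8 and 13-32, which are written as 1 in the usual encoding. */
#define PERM_RESERVED_ONES 0xFFFFF0C0u
/* Union of the eight grantable bits above. */
#define PERM_ALL_GRANTABLE 0x00000F3Cu

/* Combines the requested grants with the reserved bits, dropping any
 * stray bits (such as bits 1-2) that a caller passed by mistake. */
static uint32_t make_permissions(uint32_t granted) {
  return PERM_RESERVED_ONES | (granted & PERM_ALL_GRANTABLE);
}
```

A caller would assign the result to `params.permissions` before calling the proposed `FPDF_SetEncryption`; whether the shim should normalize reserved bits itself is a question for review.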
+ +**Internal implementation pointer:** +`CPDF_SecurityHandler::OnCreate` in +`core/fpdfapi/parser/cpdf_security_handler.h` line 30 is the hook. +It takes an `EncryptDict` (which we'd allocate fresh on the +document), the file-ID array (which `FPDF_CreateNewDocument` +already produces and `CPDF_Document::GetFileIdentifier` exposes), +and the password. The encryption-dict gets wired into the +trailer via `CPDF_Parser::SetEncryptionDict` (which would need +to become public-callable, or — better — `CPDF_Document` would +grow a `SetEncryption` method that wraps the wiring step). + +`FPDF_RemoveEncryption` is the inverse — it walks the document +clearing `/Encrypt` and dropping the security handler. The +existing `FPDF_REMOVE_SECURITY` save flag already does this at +save time; the proposed symbol does it at write-prep time, so +subsequent operations (further metadata writes, structural +edits) operate on an unencrypted in-memory model. + +**R-side consumer:** `pdf_doc_set_encryption()` and +`pdf_doc_remove_encryption()` — not currently on the v0.2.0 +roadmap (the package's `pdf_doc_permissions()` reader is marked +read-only by design in the audit), but a known v0.3.0 request: +the kmextract pipeline produces PDFs that need clinical-trial +compliance encryption. + +**Self-contained?** yes. Largest single CL in this list by both +public surface and internal touch. If we want to lead with a +"big ticket" item to seed the conversation, this is the one. + +### CL 6: Attachment Subtype writer — **drafted** + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAttachment_SetSubtype(FPDF_ATTACHMENT attachment, + FPDF_BYTESTRING subtype); +``` + +**Rationale:** `FPDFAttachment_GetSubtype` reads the embedded +file's MIME type from the *file-stream dict* (`/EF/F/Subtype`), +not the params dict (`/Params/Subtype`). 
The existing +`FPDFAttachment_SetStringValue("Subtype", ...)` writes to the +params dict, so it does NOT round-trip through the existing +getter — you set `Subtype` on params, then `GetSubtype` reads +the file-stream version and finds no value. + +The proposed setter explicitly writes the file-stream-dict +version, so reader/writer round-trip works. Doc comment makes +the file-stream-vs-params distinction explicit, and notes that +`FPDFAttachment_SetFile` resets the file-stream dict (so +`SetSubtype` should be called *after* `SetFile`, not before). + +**Internal implementation pointer:** the getter at +`fpdfsdk/fpdf_attachment.cpp` line 308 reaches into +`CPDF_FileSpec::GetFileStream()->GetDict()->GetNameFor("Subtype")`. +The setter is the trivial mirror via `GetMutableDict()` and +`SetNewFor<CPDF_Name>` — the existing +`FPDFAttachment_SetStringValue` body (line 156) provides the +template, just operating on a different dict. + +**R-side consumer:** `pdf_attachment_set_subtype()` — +v0.2.0 roadmap. Currently `pdf_attachments()$subtype` is +read-only. + +**Self-contained?** yes. + +### CL 7: Annotation Number / numeric-key writers — **drafted** + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetNumberValue(FPDF_ANNOTATION annot, + FPDF_BYTESTRING key, + float value); +``` + +**Rationale:** `FPDFAnnot_GetNumberValue` (in `fpdf_annot.h` +line 600) reads a float from an annotation's dictionary by key. +There's no setter. This blocks writing common annotation fields +like `/CA` (constant opacity, 0..1), `/F` (annotation flags, stored as a number), +`/Rotate` (rotation in degrees on free-text annots; `/IT` is the intent name, not a rotation), or arbitrary +custom keys some viewers use. `FPDFAnnot_SetStringValue` works +for string-typed entries, `FPDFAnnot_SetColor` covers the +specific RGBA case, but there's no general numeric setter. 
+ +Doc comment should clarify that the value type after the call +is `FPDF_OBJECT_NUMBER` regardless of what was there before +(consistent with `FPDFAnnot_SetStringValue`'s contract). + +**Internal implementation pointer:** the getter at +`fpdfsdk/fpdf_annot.cpp` line ~750 walks +`GetAnnotDictFromFPDFAnnotation(annot)->GetNumberFor(key)`. The +setter is `GetMutableAnnotDict->SetNewFor<CPDF_Number>(key, value)`, +exactly mirroring `FPDFAnnot_SetStringValue`'s body. + +**R-side consumer:** generic `pdf_annot_set_dict_value(annot, key, +value)` — already exposed read-only via `pdf_annot_dict_value()` +in v0.1.0. Tier 3 plan calls for this writer at v0.2.0. + +**Self-contained?** yes. + +### CL 8: Annotation geometry writers (vertices, line, ink) + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetVertices(FPDF_ANNOTATION annot, + const FS_POINTF* points, + unsigned long count); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetLine(FPDF_ANNOTATION annot, + const FS_POINTF* start, + const FS_POINTF* end); +``` + +**Rationale:** `FPDFAnnot_GetVertices` (polygon/polyline) and +`FPDFAnnot_GetLine` (line) are pure-readers today. The existing +`FPDFAnnot_AddInkStroke` + `FPDFAnnot_RemoveInkList` precedent +covers the ink subtype's similar shape, but polygon, polyline, +and line have no symmetric writer. Embedders that generate +markup overlays from extracted data (highlighting tables, +boxing chart axes, drawing redaction borders) need these. + +`SetVertices` replaces the entire `/Vertices` array; the +points-plus-count shape (no append, no remove) keeps the API +flat. For polyline subtypes, `count >= 2` is enforced; for +polygons, `count >= 3`. Wrong-subtype calls return `false`. + +`SetLine` writes `/L = [x1 y1 x2 y2]`, matching the existing +getter's read shape exactly. + +**Internal implementation pointer:** the getters at +`fpdfsdk/fpdf_annot.cpp` line 955 (`GetVertices`) and 1023 +(`GetLine`) read `annot_dict->GetArrayFor(...)`. 
The setters +allocate a fresh `CPDF_Array`, push points, and call +`annot_dict->GetMutableDict()->SetFor("Vertices"|"L", +std::move(array))`. + +**R-side consumer:** `pdf_annot_set_vertices()`, +`pdf_annot_set_line()` — v0.2.0 roadmap. Today +`pdf_annotations()` exposes `vertices` and `line` as read-only +list-columns. + +**Self-contained?** yes. + +### CL 9: `FPDFFormObj_AppendObject` (form-XObject child writer) + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFFormObj_AppendObject(FPDF_PAGEOBJECT form_object, + FPDF_PAGEOBJECT page_object); +``` + +**Rationale:** `FPDFFormObj_GetObject`, `FPDFFormObj_CountObjects`, +and `FPDFFormObj_RemoveObject` exist — but there is no way to +add a new page object to an existing form XObject. The reader +and remover are there but the appender is missing. This breaks +the otherwise-symmetric "form objects are page-object-like +containers" model. Embedders that want to assemble reusable +form XObjects programmatically (logo stamps, watermarks, +repeat-on-every-page glyph sets) have to construct the form +XObject from scratch via private surface. + +The signature mirrors `FPDFAnnot_AppendObject` (which is the +established precedent for "append a page-object child"): +ownership of `page_object` transfers to `form_object`, and the +caller MUST NOT subsequently call `FPDFPageObj_Destroy` on it. + +**Internal implementation pointer:** `CPDF_FormObject` in +`core/fpdfapi/page/cpdf_formobject.h` wraps a `CPDF_Form` whose +`m_pPageObjectHolder` member is a `std::unique_ptr`. +The `Holder` already has an `AppendPageObject` method (used by +the page-level constructor). The C-shim implementation matches +`fpdfsdk/fpdf_editpage.cpp::FPDFPage_InsertObject`'s pattern +applied to the form-object's holder instead of the page's +holder. + +**R-side consumer:** `pdf_form_obj_append_object()` — +v0.2.0 roadmap. `pdf_form_objects()` reader has shipped since +0.1.0. + +**Self-contained?** yes. 
+ +### CL 10: Color-space introspection on page objects + +**Symbols proposed:** + +```c +FPDF_EXPORT int FPDF_CALLCONV +FPDFPageObj_GetFillColorSpace(FPDF_PAGEOBJECT page_object); + +FPDF_EXPORT int FPDF_CALLCONV +FPDFPageObj_GetStrokeColorSpace(FPDF_PAGEOBJECT page_object); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFPageObj_GetFillColorRaw(FPDF_PAGEOBJECT page_object, + float* components, + unsigned long* num_components, + unsigned long max_components); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFPageObj_GetStrokeColorRaw(FPDF_PAGEOBJECT page_object, + float* components, + unsigned long* num_components, + unsigned long max_components); +``` + +**Rationale:** `FPDFPageObj_GetFillColor` and +`FPDFPageObj_GetStrokeColor` return RGBA tuples in `0..255` +integers. If the underlying PDF uses `/DeviceCMYK`, +`/DeviceGray`, `/CalRGB`, `/Lab`, or a custom `/ICCBased` +color space, the API silently converts to RGB — *lossy*. For +embedders that extract color information to feed back into a +prepress pipeline, document-fidelity audit, or color-profile +preservation workflow, this is the wrong shape. + +`GetFillColorSpace` / `GetStrokeColorSpace` return an integer +from a new `FPDF_COLORSPACE_*` enum mirroring +`CPDF_ColorSpace::Family` (kDeviceGray=1, kDeviceRGB=2, +kDeviceCMYK=3, kCalGray=4, kCalRGB=5, kLab=6, kICCBased=7, +kIndexed=8, kPattern=9, kSeparation=10, kDeviceN=11). `-1` on +error. + +`GetFillColorRaw` / `GetStrokeColorRaw` return the raw color +components in their native color space (3 floats for RGB, +4 for CMYK, 1 for Gray, etc.), with `num_components` reporting +the count actually filled. Callers pass a `max_components` cap +to size the buffer (32 covers `/DeviceN` color spaces with +spot-color counts realistic in practice). 
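To make the "lossy" point in the rationale concrete, here is a small sketch showing two distinct CMYK colors collapsing onto the same RGB triple. The conversion used is the textbook naive formula, chosen for illustration only; it is an assumption and not a description of PDFium's actual color pipeline. `Rgb8`, `to_byte`, and `naive_cmyk_to_rgb` are hypothetical names.

```c
/* 8-bit RGB triple, matching the 0..255 range the existing getters use. */
typedef struct {
  unsigned char r, g, b;
} Rgb8;

/* Clamps a 0..1 component and rounds it to the nearest 0..255 value. */
static unsigned char to_byte(float v) {
  if (v < 0.0f) v = 0.0f;
  if (v > 1.0f) v = 1.0f;
  return (unsigned char)(v * 255.0f + 0.5f);
}

/* Naive CMYK -> RGB conversion (illustrative assumption, not PDFium's
 * implementation): each channel is attenuated by its ink and by black. */
static Rgb8 naive_cmyk_to_rgb(float c, float m, float y, float k) {
  Rgb8 out;
  out.r = to_byte((1.0f - c) * (1.0f - k));
  out.g = to_byte((1.0f - m) * (1.0f - k));
  out.b = to_byte((1.0f - y) * (1.0f - k));
  return out;
}
```

Under this formula, pure black ink `(0, 0, 0, 1)` and full CMY coverage `(1, 1, 1, 0)` both map to RGB `(0, 0, 0)`, so a consumer that only sees the RGB result cannot tell which ink recipe the document asked for. That is the information the proposed raw-component getters would preserve.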
+ +**Internal implementation pointer:** +`CPDF_PageObject::GetGeneralState().GetFillColor()` returns a +`CPDF_Color*` whose `GetColorSpace()->GetFamily()` is the enum +we want (defined in `core/fpdfapi/page/cpdf_colorspace.h` +line 96), and whose `GetValue()` returns a `pdfium::span` of the raw components. Both are already cheap-O(1) reads; +the wrappers are mechanical. + +**R-side consumer:** `pdf_obj_fill_color_space()`, +`pdf_obj_stroke_color_space()`, +`pdf_obj_fill_color_raw()`, `pdf_obj_stroke_color_raw()` — +not yet on the v0.2.0 roadmap, but flagged in the kmextract +conformance harness as a known fidelity gap. CRAN-acceptance +unlikely to block on this; CRAN doesn't audit PDF color-space +fidelity. + +**Self-contained?** yes. Lowest priority of the list for the +R package, but the highest-leverage symbol for any other binding +that does color-managed prepress. + +### CL 11: Annotation `SetFont` for FreeText + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetFont(FPDF_FORMHANDLE hHandle, + FPDF_ANNOTATION annot, + FPDF_FONT font, + float size); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetFontColor(FPDF_FORMHANDLE hHandle, + FPDF_ANNOTATION annot, + unsigned int R, + unsigned int G, + unsigned int B); +``` + +**Rationale:** FreeText annotations carry their own appearance +state — font, size, color — via the `/DA` (default appearance) +string. PDFium already exposes `FPDFAnnot_GetFontSize` and +`FPDFAnnot_GetFontColor`, plus the C-shim helper +`CFFL_InteractiveFormFiller` knows how to *parse* `/DA`. But +there's no public setter for either. Embedders that generate +FreeText annotations programmatically have to use the appearance- +stream override (`FPDFAnnot_SetAP`), bypassing PDFium's +auto-regeneration entirely and putting them on the hook for +glyph layout. + +`SetFont` accepts an `FPDF_FONT` loaded via `FPDFText_LoadFont` +(or one of the built-in font names). 
`SetFontColor` is a +three-byte RGB setter; alpha is intentionally omitted because +`/DA` doesn't support per-color opacity (the annotation's overall +`/CA` controls that). + +**Internal implementation pointer:** +`CFFL_FormField::SetDefaultAppearance` in +`fpdfsdk/formfiller/cffl_formfield.cpp` is the internal hook; +it parses and rewrites the `/DA` string. A public wrapper builds +a `/DA` string of the form +`"/<font-name> <size> Tf <r> <g> <b> rg"` and writes it +via `annot_dict->GetMutableDict()->SetNewFor<CPDF_String>("DA", +...)`. Font-resource registration on the page's `/Resources` +follows `CPDF_TextObject`'s pattern via `CPDF_PageObjectHolder`. + +**R-side consumer:** `pdf_annot_set_font()`, +`pdf_annot_set_font_color()` — Tier 2 in the reader/writer audit +(`dev/reader-writer-audit.md` §"Annotation surface gaps, +font_color_* / font_size"). The reader columns +`font_color_red/green/blue` and `font_size` on +`pdf_annotations()` are Tier 2, blocked partly on this CL. + +**Self-contained?** yes. + +### CL 12: Path-based clip path constructor + +**Symbols proposed:** + +```c +FPDF_EXPORT FPDF_CLIPPATH FPDF_CALLCONV +FPDF_CreateClipPathFromPath(FPDF_PAGEOBJECT path_object); + +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFClipPath_AppendPath(FPDF_CLIPPATH clip_path, + FPDF_PAGEOBJECT path_object); +``` + +**Rationale:** `FPDF_CreateClipPath(left, bottom, right, top)` +creates a *rectangular* clip path. Real-world clipping +(text-cutout effects, irregular masks, curve-bounded crops) needs +arbitrary path geometry. The reader side exposes +`FPDFClipPath_CountPaths`, `FPDFClipPath_CountPathSegments`, and +`FPDFClipPath_GetPathSegment` — so you can read out a +multi-segment clip path — but you can't *create* one outside the +single-rectangle case. + +`CreateClipPathFromPath` takes an existing path page object +(already built via `FPDFPath_MoveTo`/`LineTo`/`BezierTo`/`Close`) +and extracts its geometry into a clip path. 
`AppendPath` +extends an existing clip path with a second sub-path (for +even-odd or non-zero winding multi-path clips). Together they +cover the full reader surface symmetrically. + +**Internal implementation pointer:** `CPDF_ClipPath` in +`core/fpdfapi/page/cpdf_clippath.h` already has +`AppendPath(CFX_Path, FillType)` for internal use. The shim +extracts the path's `CFX_Path` from the path page object's +`CPDF_PathObject` (already accessible via the existing path +getters) and threads it through. + +**R-side consumer:** `pdf_clip_path_new_from_path()`, +`pdf_clip_path_append_path()` — v0.2.0 roadmap. The +`pdf_clip_path_segments()` reader has shipped since 0.1.0. + +**Self-contained?** yes, but pairs naturally with CL 8 for a +combined "annotation + clipping geometry-writer" batch if the +mailing list pushes for fewer, larger CLs. + +## Lower-priority gaps observed but not proposed + +These are real asymmetries that we noticed but don't currently have +a strong R-side workflow demanding them. Catalogued for +completeness so future audits don't re-discover them as novelties. + +- `FPDF_SetFileIdentifier` — no setter for the trailer's + `/ID` array. PDFium auto-generates one on `FPDF_SaveAsCopy`; + embedders that want deterministic IDs for content-addressable + storage would benefit. +- `FPDFPageObj_SetMark*` family — `FPDFPageObj_GetMark*` reads + marked-content properties but there's no public writer beyond + `FPDFPageObj_AddMark`. The internal hook + (`CPDF_ContentMarks::AddMark` with full param dict) is private. +- `FPDF_StructElement_Set*` family — entire structure-tree + mutation surface absent. Useful for tagged-PDF authoring but + the surface is large (kid-list management, role-map management, + attribute-class management) and a single CL would be too big. +- `FPDFSignatureObj_*` — read-only by design upstream. Signing + is intentionally out of scope (signing requires a crypto + identity store that PDFium doesn't ship). +- `FORM_OnLButtonDown` etc. 
— interactive event callbacks, + intentionally out of scope for non-interactive bindings. +- `FPDF_GetDefaultTTFMapEntry`, `FPDF_FreeDefaultSystemFontInfo` — + internal font-substitution machinery; users wanting custom + font lookup should override via `FPDF_SetSystemFontInfo`. +- `FPDFAvail_*` (8 symbols) — progressive / streaming + document loading. Useful for HTTP-backed viewers, not for + batch-mode bindings. + +## Cross-cutting notes for the consolidated request + +A few things to call out in the cover letter once we send this +upstream: + +1. **All 12 proposed CLs are R-package-blocked first, but + embedder-agnostic in shape.** Every one of them has been + requested at least once in another binding's issue tracker + (pypdfium2, pdfium-rs, pdfium-render, hyzyla). Where possible + we should cite the cross-binding issues in the per-CL + commit messages. + +2. **Internal hooks already exist for all 12.** Each CL is a + thin C-shim mirror of an existing internal method (or in the + case of CL 2's bookmarks and CL 5's encryption, exposes a + class that already mutates internally). No new core + algorithms are involved. + +3. **The existing in-flight CLs (`FPDFPath_GetBezierControlPoints`, + `FPDFTextObj_SetFontSize`, `FPDFAnnot_AppendOption / + RemoveOptions`) establish three precedent patterns** that the + 12 proposed CLs reuse: thin reader/writer mirror, + `Append + Remove` for array-typed attributes, and per- + subtype validation in the C shim. The mailing-list cover + letter should anchor on those precedents rather than + re-arguing the patterns from first principles. + +4. **Test layout follows the existing fpdfsdk embedder-test + pattern.** Each CL's tests use the three-block layout + (round-trip, rejection, persistence) introduced by the + `FPDFTextObj_SetFontSize` patch — a multi-block test makes + reviewing easier without bloating any single block. + +5. 
**None of the 12 proposed CLs needs anything beyond the standard + `// Experimental API.` annotation.** PDFium's convention is that any + newly-introduced symbol gets an `// Experimental API.` line in + its header doc comment until the API stabilizes; all 12 + should carry it. The PDFium contribution guide doesn't + require removing the experimental tag on any particular + timeline. diff --git a/dev/upstream-patches/README.md b/dev/upstream-patches/README.md index 1bb568c..1837c1f 100644 --- a/dev/upstream-patches/README.md +++ b/dev/upstream-patches/README.md @@ -18,6 +18,180 @@ their own machine. ## Active patches +### `pdfium-FPDFAnnot_SetNumberValue.patch` + +**Status:** Drafted on 2026-05-21 against upstream HEAD `e30fc3988`. +Not yet uploaded to Gerrit. + +Adds the public symbol: + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAnnot_SetNumberValue(FPDF_ANNOTATION annot, + FPDF_BYTESTRING key, + float value); +``` + +so embedders can write common numeric annotation fields like `/CA` +(constant opacity 0..1), `/Rotate` (free-text rotation), `/BS/W` (border +width), or arbitrary custom-namespace floats. This was CL 7 in +`dev/upstream-api-gaps.md`. + +Mirrors `FPDFAnnot_SetStringValue` line-for-line: get the mutable +annot dict, write a `CPDF_Number` via `SetNewFor`. The smallest +self-contained CL in the tracker (3 LOC of implementation), exact mirror +of an existing precedent. + +Files touched (against upstream HEAD `e30fc3988`): +* `public/fpdf_annot.h` — declaration immediately after + `FPDFAnnot_SetStringValue`. +* `fpdfsdk/fpdf_annot.cpp` — 14-line implementation immediately + after `FPDFAnnot_SetStringValue`. +* `fpdfsdk/fpdf_view_c_api_test.c` — `CHK` entry next to the + existing `FPDFAnnot_Set*` block. 
+* `fpdfsdk/fpdf_annot_embeddertest.cpp` — new + `FPDFAnnotEmbedderTest::SetNumberValue` test exercising + invalid-arg rejection, overwrite of existing numeric key, + setting a previously-absent key, negative/zero/fractional + round-trip, and type-replacement of a non-number key (the + "value type becomes NUMBER regardless of what was there before" + contract). + +The commit message carries the deterministic +`Change-Id: I7bf21fa3f70763f69fdcabd54baa2f0771af80cf` so re-uploads +all land on the same Gerrit CL. + +### `pdfium-FPDFAttachment_SetSubtype.patch` + +**Status:** Drafted on 2026-05-21 against upstream HEAD `e30fc3988`. +Not yet uploaded to Gerrit. + +Adds the public symbol: + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDFAttachment_SetSubtype(FPDF_ATTACHMENT attachment, + FPDF_BYTESTRING subtype); +``` + +so embedders can write the embedded-file MIME type that +`FPDFAttachment_GetSubtype` reads. Closes one of the two upstream +gaps documented on the R-side `pdf_attachment_set_dict_value()` +wrapper (the other — `FPDFAttachment_SetStringValue`'s Unicode +round-trip loss — needs a separate CL because it's a behaviour +change to an existing symbol, not a new one). This was CL 6 in +`dev/upstream-api-gaps.md`. + +The implementation mirrors `FPDFAttachment_GetSubtype` exactly: +build a `CPDF_FileSpec` from the attachment object, get the file +stream, and write `/Subtype` as a `CPDF_Name` on the stream's +dictionary. The Name type matches what GetSubtype reads (via +`GetNameFor("Subtype")`) and matches PDF spec, which defines +`/Subtype` on embedded file streams as a Name. + +Adds a `CPDF_FileSpec::GetMutableFileStream()` core helper modeled +directly after the existing `GetMutableParamsDict()` counterpart +(`const_cast` of the const-accessor result through +`pdfium::WrapRetain`). + +The writer requires the attachment to already have a file stream — +i.e. `FPDFAttachment_SetFile()` must have been called first, or +the attachment must have been loaded from disk. 
Same prerequisite +`FPDFAttachment_SetStringValue` already has (its `/Params` subdict +only exists after `SetFile` creates it). The docstring makes this +explicit and the embedder test exercises both pre- and post-SetFile +behaviour. + +Files touched (against upstream HEAD `e30fc3988`): +* `public/fpdf_attachment.h` — declaration with full doc comment. +* `fpdfsdk/fpdf_attachment.cpp` — 24-line implementation immediately + after `FPDFAttachment_GetSubtype`. +* `core/fpdfdoc/cpdf_filespec.{h,cpp}` — new + `CPDF_FileSpec::GetMutableFileStream()` helper. +* `fpdfsdk/fpdf_view_c_api_test.c` — `CHK` entry next to the + existing `FPDFAttachment_Set*` block. +* `fpdfsdk/fpdf_attachment_embeddertest.cpp` — three new + `FPDFAttachmentEmbedderTest` cases: + * `SetSubtype` — invalid-arg rejection, overwrite, read-back via + `FPDFAttachment_GetSubtype`. + * `SetSubtypeOnFreshAttachment` — confirms the pre-`SetFile` + rejection contract and the post-`SetFile` success path. + * `SetSubtypePersistsAcrossSave` — full `FPDF_SaveAsCopy` + + `OpenSavedDocument` round-trip. + +The commit message carries the deterministic +`Change-Id: I9c9d45efc4986252faa577e70d993103e777cdb3` so re-uploads +all land on the same Gerrit CL. + +### `pdfium-FPDF_SetMetaText.patch` + +**Status:** Drafted on 2026-05-21 against upstream HEAD `e30fc3988`. +Not yet uploaded to Gerrit; awaiting a human contributor to run +`git cl upload --bypass-hooks` per the walk-through below. + +Adds the public symbol: + +```c +FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV +FPDF_SetMetaText(FPDF_DOCUMENT document, + FPDF_BYTESTRING tag, + FPDF_WIDESTRING value); +``` + +so embedders can write `/Info/Title`, `/Info/Author`, +`/Info/Subject`, `/Info/Keywords`, `/Info/Creator`, `/Info/Producer`, +`/Info/CreationDate`, and `/Info/ModDate` — the eight document Info +keys that `FPDF_GetMetaText` reads. This was CL 1 in +`dev/upstream-api-gaps.md` and the most-requested writer in our +v0.1.0 user survey. 
+ +The implementation mirrors `FPDFCatalog_SetLanguage` line-for-line: +get the mutable Info dictionary via `CPDF_Document::GetInfo()`, write +a `CPDF_String` via `SetNewFor` with the `WideStringFromFPDFWideString` +path. Using the `WideStringView` overload (rather than the +`ByteString` path `FPDFAttachment_SetStringValue` takes) means +multi-byte Unicode round-trips losslessly — verified by the +embedder test with a multi-byte Japanese subject. + +The writer works on: +* Existing PDFs parsed from disk that have an `/Info` reference in + their trailer. `CPDF_Document::GetInfo()` resolves the indirect + reference and returns the mutable dictionary. +* Documents created via `FPDF_CreateNewDocument()`, where + `CreateNewDoc()` initialises `info_dict_` eagerly. + +Returns false when `GetInfo()` returns null — i.e. when an opened +PDF genuinely lacks an `/Info` trailer entry. A follow-up CL can +add a way to plumb a new Info dictionary into the trailer on those +documents; for now, the writer matches the reader's "no Info to +work with" semantics. + +Files touched (against upstream HEAD `e30fc3988`): +* `public/fpdf_doc.h` — declaration with full doc comment placed + immediately after the existing `FPDF_GetMetaText` declaration. +* `fpdfsdk/fpdf_doc.cpp` — implementation placed immediately after + `FPDF_GetMetaText`. 26 lines. +* `fpdfsdk/fpdf_view_c_api_test.c` — `CHK(FPDF_SetMetaText)` entry + next to the existing `FPDF_GetMetaText` CHK so `api_check.py` + passes presubmit. +* `fpdfsdk/fpdf_doc_embeddertest.cpp` — three new + `FPDFDocEmbedderTest` cases: + * `SetMetaText` — invalid-arg rejection, basic set + read-back, + multi-byte Unicode round-trip, overwrite, empty-string + legitimate value, previously-absent tag becomes present. + * `SetMetaTextOnNewDocument` — confirms the + `FPDF_CreateNewDocument()` path's eagerly-initialised Info dict + accepts writes without any prior `FPDF_GetMetaText()` call.
+ * `SetMetaTextPersistsAcrossSave` — round-trip through + `FPDF_SaveAsCopy` + `OpenSavedDocument`, asserting the mutation + actually reaches the saved PDF dictionary (not just the + in-memory Info dict). + +The commit message carries the deterministic +`Change-Id: Ia3e57b3dcdd0466c166728cd82fed8d9bfc9c06f` so re-uploads +(after rebases or reviewer-requested amends) all land on the same +Gerrit CL. + ### `pdfium-FPDFAnnot_AppendOption.patch` **Status:** Drafted on 2026-05-20 against upstream HEAD diff --git a/dev/upstream-patches/pdfium-FPDFAnnot_SetNumberValue.patch b/dev/upstream-patches/pdfium-FPDFAnnot_SetNumberValue.patch new file mode 100644 index 0000000..a32b355 --- /dev/null +++ b/dev/upstream-patches/pdfium-FPDFAnnot_SetNumberValue.patch @@ -0,0 +1,195 @@ +From aa0184217b57f86e8ed7437ff0881efce9588961 Mon Sep 17 00:00:00 2001 +From: Bill Denney +Date: Thu, 21 May 2026 13:26:57 +0000 +Subject: [PATCH] Expose FPDFAnnot_SetNumberValue for numeric annotation dict + entries +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Add the symmetric writer for FPDFAnnot_GetNumberValue: + + FPDF_BOOL FPDFAnnot_SetNumberValue(FPDF_ANNOTATION annot, + FPDF_BYTESTRING key, + float value); + +FPDFAnnot_GetNumberValue() reads a float from an annotation's +dictionary by key. There's been no public way to write a number +into the annotation dict — FPDFAnnot_SetStringValue() handles +string entries and FPDFAnnot_SetColor() covers the specific RGBA +case, but the general "set a numeric dict entry" path was missing. +This blocks writing common annotation fields like /CA (constant +opacity, 0..1), /F (some legacy viewers expect a number here), +/IT (free-text intent rotation), /BS/W (border width), or +arbitrary custom-namespace floats. + +The implementation mirrors FPDFAnnot_SetStringValue line-for-line: +get the mutable annotation dictionary, write a CPDF_Number via +SetNewFor. 
Returns false on null annot or null key (matching the +SetStringValue null-key contract; the existing impl quietly +no-ops on null key, but adding the guard explicitly keeps the +behaviour consistent with the getter, which also rejects null +keys). + +Tested via a new FPDFAnnotEmbedderTest.SetNumberValue case against +text_form_multiple.pdf which already exercises GetNumberValue: + * Invalid-arg rejection (null annot, null key). + * Overwriting an existing number key (MaxLen 10 -> 42) with + type-stays-NUMBER assertion via GetValueType. + * Setting a previously-absent key on a different annotation. + * Negative, zero, and fractional values all round-trip. + * Overwriting a non-number key ("V" which is a string) replaces + the type to NUMBER — confirming the doc-comment's stated + contract. + +Change-Id: I7bf21fa3f70763f69fdcabd54baa2f0771af80cf +--- + fpdfsdk/fpdf_annot.cpp | 17 ++++++++++ + fpdfsdk/fpdf_annot_embeddertest.cpp | 52 +++++++++++++++++++++++++++++ + fpdfsdk/fpdf_view_c_api_test.c | 1 + + public/fpdf_annot.h | 26 +++++++++++++++ + 4 files changed, 96 insertions(+) + +diff --git a/fpdfsdk/fpdf_annot.cpp b/fpdfsdk/fpdf_annot.cpp +index 956c24c42..4b213f510 100644 +--- a/fpdfsdk/fpdf_annot.cpp ++++ b/fpdfsdk/fpdf_annot.cpp +@@ -1130,6 +1130,23 @@ FPDFAnnot_SetStringValue(FPDF_ANNOTATION annot, + return true; + } + ++FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV ++FPDFAnnot_SetNumberValue(FPDF_ANNOTATION annot, ++ FPDF_BYTESTRING key, ++ float value) { ++ if (!key) { ++ return false; ++ } ++ RetainPtr<CPDF_Dictionary> pAnnotDict = ++ GetMutableAnnotDictFromFPDFAnnotation(annot); ++ if (!pAnnotDict) { ++ return false; ++ } ++ ++ pAnnotDict->SetNewFor<CPDF_Number>(key, value); ++ return true; ++} ++ + FPDF_EXPORT unsigned long FPDF_CALLCONV + FPDFAnnot_GetStringValue(FPDF_ANNOTATION annot, + FPDF_BYTESTRING key, +diff --git a/fpdfsdk/fpdf_annot_embeddertest.cpp b/fpdfsdk/fpdf_annot_embeddertest.cpp +index d23c3e35f..29a640453 100644 +--- a/fpdfsdk/fpdf_annot_embeddertest.cpp ++++
b/fpdfsdk/fpdf_annot_embeddertest.cpp +@@ -1516,6 +1516,58 @@ TEST_F(FPDFAnnotEmbedderTest, GetNumberValue) { + } + } + ++TEST_F(FPDFAnnotEmbedderTest, SetNumberValue) { ++ // Reuses text_form_multiple.pdf: annotation index 2 has the ++ // "MaxLen" numeric key already; annotations 0 and 1 do not. ++ ASSERT_TRUE(OpenDocument("text_form_multiple.pdf")); ++ ScopedPage page = LoadScopedPage(0); ++ ASSERT_TRUE(page); ++ ++ ScopedFPDFAnnotation annot(FPDFPage_GetAnnot(page.get(), 2)); ++ ASSERT_TRUE(annot); ++ ++ // Invalid args reject without mutating. ++ EXPECT_FALSE(FPDFAnnot_SetNumberValue(nullptr, "MaxLen", 1.0f)); ++ EXPECT_FALSE(FPDFAnnot_SetNumberValue(annot.get(), nullptr, 1.0f)); ++ ++ // Overwriting an existing numeric key. ++ EXPECT_TRUE(FPDFAnnot_SetNumberValue(annot.get(), "MaxLen", 42.0f)); ++ float value = 0.0f; ++ EXPECT_TRUE(FPDFAnnot_GetNumberValue(annot.get(), "MaxLen", &value)); ++ EXPECT_FLOAT_EQ(42.0f, value); ++ // Type stays NUMBER. ++ EXPECT_EQ(FPDF_OBJECT_NUMBER, ++ FPDFAnnot_GetValueType(annot.get(), "MaxLen")); ++ ++ // Setting a previously-absent key creates it. ++ ScopedFPDFAnnotation other(FPDFPage_GetAnnot(page.get(), 0)); ++ ASSERT_TRUE(other); ++ EXPECT_FALSE(FPDFAnnot_HasKey(other.get(), "MaxLen")); ++ EXPECT_TRUE(FPDFAnnot_SetNumberValue(other.get(), "MaxLen", 7.5f)); ++ EXPECT_TRUE(FPDFAnnot_HasKey(other.get(), "MaxLen")); ++ EXPECT_TRUE(FPDFAnnot_GetNumberValue(other.get(), "MaxLen", &value)); ++ EXPECT_FLOAT_EQ(7.5f, value); ++ ++ // Negative + zero + fractional values all round-trip. 
++ EXPECT_TRUE(FPDFAnnot_SetNumberValue(annot.get(), "CA", 0.5f)); ++ EXPECT_TRUE(FPDFAnnot_GetNumberValue(annot.get(), "CA", &value)); ++ EXPECT_FLOAT_EQ(0.5f, value); ++ EXPECT_TRUE(FPDFAnnot_SetNumberValue(annot.get(), "CA", 0.0f)); ++ EXPECT_TRUE(FPDFAnnot_GetNumberValue(annot.get(), "CA", &value)); ++ EXPECT_FLOAT_EQ(0.0f, value); ++ EXPECT_TRUE(FPDFAnnot_SetNumberValue(annot.get(), "CA", -1.5f)); ++ EXPECT_TRUE(FPDFAnnot_GetNumberValue(annot.get(), "CA", &value)); ++ EXPECT_FLOAT_EQ(-1.5f, value); ++ ++ // Overwriting a non-number key (e.g. "V" which is a string) replaces ++ // the type to NUMBER. ++ EXPECT_NE(FPDF_OBJECT_NUMBER, FPDFAnnot_GetValueType(annot.get(), "V")); ++ EXPECT_TRUE(FPDFAnnot_SetNumberValue(annot.get(), "V", 99.0f)); ++ EXPECT_EQ(FPDF_OBJECT_NUMBER, FPDFAnnot_GetValueType(annot.get(), "V")); ++ EXPECT_TRUE(FPDFAnnot_GetNumberValue(annot.get(), "V", &value)); ++ EXPECT_FLOAT_EQ(99.0f, value); ++} ++ + TEST_F(FPDFAnnotEmbedderTest, GetSetAP) { + // Open a file with four annotations and load its first page. + ASSERT_TRUE(OpenDocument("annotation_stamp_with_ap.pdf")); +diff --git a/fpdfsdk/fpdf_view_c_api_test.c b/fpdfsdk/fpdf_view_c_api_test.c +index 282512d13..1ec44e565 100644 +--- a/fpdfsdk/fpdf_view_c_api_test.c ++++ b/fpdfsdk/fpdf_view_c_api_test.c +@@ -96,6 +96,7 @@ int CheckPDFiumCApi() { + CHK(FPDFAnnot_SetFocusableSubtypes); + CHK(FPDFAnnot_SetFontColor); + CHK(FPDFAnnot_SetFormFieldFlags); ++ CHK(FPDFAnnot_SetNumberValue); + CHK(FPDFAnnot_SetRect); + CHK(FPDFAnnot_SetStringValue); + CHK(FPDFAnnot_SetURI); +diff --git a/public/fpdf_annot.h b/public/fpdf_annot.h +index 0aa9f6886..78b456c5f 100644 +--- a/public/fpdf_annot.h ++++ b/public/fpdf_annot.h +@@ -565,6 +565,32 @@ FPDFAnnot_SetStringValue(FPDF_ANNOTATION annot, + FPDF_BYTESTRING key, + FPDF_WIDESTRING value); + ++// Experimental API. ++// Set the number value corresponding to |key| in |annot|'s dictionary, ++// overwriting the existing value if any. 
The value type after this call ++// is FPDF_OBJECT_NUMBER regardless of what was at |key| before. Mirrors ++// FPDFAnnot_GetNumberValue() on the write side and is the numeric ++// counterpart to FPDFAnnot_SetStringValue. ++// ++// Common annotation keys whose values are numbers and which were ++// previously unwritable through the public API: /CA (constant opacity, ++// 0..1), /F (treated as a number by some legacy viewers), /IT ++// (free-text intent rotation), /BS/W (border width), arbitrary ++// custom-namespace floats. ++// ++// annot - handle to an annotation. ++// key - the key to the dictionary entry to be set, encoded in ++// UTF-8 / ASCII. Must not be NULL. ++// value - the float value to be set. ++// ++// Returns true on success. Returns false when |annot| is NULL, |key| ++// is NULL, or |annot| has no mutable dictionary (e.g. attached to a ++// document opened in read-only mode). ++FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV ++FPDFAnnot_SetNumberValue(FPDF_ANNOTATION annot, ++ FPDF_BYTESTRING key, ++ float value); ++ + // Experimental API. + // Get the string value corresponding to |key| in |annot|'s dictionary. |buffer| + // is only modified if |buflen| is longer than the length of contents. 
Note that +-- +2.43.0 + diff --git a/dev/upstream-patches/pdfium-FPDFAttachment_SetSubtype.patch b/dev/upstream-patches/pdfium-FPDFAttachment_SetSubtype.patch new file mode 100644 index 0000000..c3489a7 --- /dev/null +++ b/dev/upstream-patches/pdfium-FPDFAttachment_SetSubtype.patch @@ -0,0 +1,260 @@ +From fac5931732007794888e2369564670399260897b Mon Sep 17 00:00:00 2001 +From: Bill Denney +Date: Thu, 21 May 2026 13:22:54 +0000 +Subject: [PATCH] Expose FPDFAttachment_SetSubtype for the embedded-file MIME + type +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Add the symmetric writer for FPDFAttachment_GetSubtype: + + FPDF_BOOL FPDFAttachment_SetSubtype(FPDF_ATTACHMENT attachment, + FPDF_BYTESTRING subtype); + +FPDFAttachment_GetSubtype reads /Subtype from the embedded file +stream's dictionary (per ISO 32000-2 §7.11.4), but there is no +public way to write it. FPDFAttachment_SetStringValue, the +closest existing writer, only touches the attachment's /Params +subdictionary — not the file-stream /Subtype that conforming +readers use to identify the embedded file's type. Embedders that +build attachments programmatically (e.g. attaching a generated +CSV or XML to a report PDF) currently have no way to declare the +MIME type without re-parsing and patching the saved PDF +out-of-band. + +The implementation mirrors FPDFAttachment_GetSubtype: build a +CPDF_FileSpec from the attachment object, retrieve its file +stream, and write the /Subtype entry as a CPDF_Name on the +stream's dictionary. The Name type matches what GetSubtype reads +(via GetNameFor("Subtype")) and matches the PDF spec, which +defines /Subtype on file streams as a Name. + +Reuses a new CPDF_FileSpec::GetMutableFileStream() core helper, +modeled directly after the existing GetMutableParamsDict() +counterpart (const_cast of the const-accessor result through +pdfium::WrapRetain). + +The writer requires an existing file stream — i.e. 
the attachment +must have been populated with FPDFAttachment_SetFile() first, or +loaded from disk. Returns false otherwise; the docstring makes +this explicit. This is the same prerequisite that +FPDFAttachment_SetStringValue has (its /Params subdict only +exists after SetFile creates it). + +Tested via three new FPDFAttachmentEmbedderTest cases against +embedded_attachments.pdf and the in-memory +FPDFDoc_AddAttachment + FPDFAttachment_SetFile path: + * SetSubtype — invalid-arg rejection, overwrite of existing + "text/plain" with "application/xml", read-back via the + existing FPDFAttachment_GetSubtype. + * SetSubtypeOnFreshAttachment — confirms the pre-SetFile + rejection contract (returns false; GetSubtype still reports + empty) and the post-SetFile success path. + * SetSubtypePersistsAcrossSave — full FPDF_SaveAsCopy + + OpenSavedDocument round-trip, asserting the mutation actually + reaches the saved PDF dictionary. + +Change-Id: I9c9d45efc4986252faa577e70d993103e777cdb3 +--- + core/fpdfdoc/cpdf_filespec.cpp | 4 ++ + core/fpdfdoc/cpdf_filespec.h | 1 + + fpdfsdk/fpdf_attachment.cpp | 24 ++++++++ + fpdfsdk/fpdf_attachment_embeddertest.cpp | 73 ++++++++++++++++++++++++ + fpdfsdk/fpdf_view_c_api_test.c | 1 + + public/fpdf_attachment.h | 27 +++++++++ + 6 files changed, 130 insertions(+) + +diff --git a/core/fpdfdoc/cpdf_filespec.cpp b/core/fpdfdoc/cpdf_filespec.cpp +index fc77bb743..7d424b243 100644 +--- a/core/fpdfdoc/cpdf_filespec.cpp ++++ b/core/fpdfdoc/cpdf_filespec.cpp +@@ -161,6 +161,10 @@ RetainPtr<const CPDF_Stream> CPDF_FileSpec::GetFileStream() const { + return nullptr; + } + ++RetainPtr<CPDF_Stream> CPDF_FileSpec::GetMutableFileStream() { ++ return pdfium::WrapRetain(const_cast<CPDF_Stream*>(GetFileStream().Get())); ++} ++ + RetainPtr<const CPDF_Dictionary> CPDF_FileSpec::GetParamsDict() const { + RetainPtr<const CPDF_Stream> pStream = GetFileStream(); + return pStream ?
pStream->GetDict()->GetDictFor("Params") : nullptr; +diff --git a/core/fpdfdoc/cpdf_filespec.h b/core/fpdfdoc/cpdf_filespec.h +index 8692d8fee..f30edd894 100644 +--- a/core/fpdfdoc/cpdf_filespec.h ++++ b/core/fpdfdoc/cpdf_filespec.h +@@ -29,6 +29,7 @@ class CPDF_FileSpec { + + WideString GetFileName() const; + RetainPtr<const CPDF_Stream> GetFileStream() const; ++ RetainPtr<CPDF_Stream> GetMutableFileStream(); + RetainPtr<const CPDF_Dictionary> GetParamsDict() const; + RetainPtr<CPDF_Dictionary> GetMutableParamsDict(); + +diff --git a/fpdfsdk/fpdf_attachment.cpp b/fpdfsdk/fpdf_attachment.cpp +index 98a803bc6..75580d101 100644 +--- a/fpdfsdk/fpdf_attachment.cpp ++++ b/fpdfsdk/fpdf_attachment.cpp +@@ -330,3 +330,27 @@ FPDFAttachment_GetSubtype(FPDF_ATTACHMENT attachment, + return Utf16EncodeMaybeCopyAndReturnLength( + PDF_DecodeText(subtype.unsigned_span()), buffer_span); + } ++ ++FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV ++FPDFAttachment_SetSubtype(FPDF_ATTACHMENT attachment, ++ FPDF_BYTESTRING subtype) { ++ if (!subtype) { ++ return false; ++ } ++ CPDF_Object* file = CPDFObjectFromFPDFAttachment(attachment); ++ if (!file) { ++ return false; ++ } ++ ++ CPDF_FileSpec spec(pdfium::WrapRetain(file)); ++ RetainPtr<CPDF_Stream> file_stream = spec.GetMutableFileStream(); ++ if (!file_stream) { ++ return false; ++ } ++ ++ // SAFETY: required from caller. PDF Name objects encode MIME types ++ // as ASCII per RFC 2045; the byte-string view is the natural input ++ // type.
++ file_stream->GetMutableDict()->SetNewFor<CPDF_Name>("Subtype", subtype); ++ return true; ++} +diff --git a/fpdfsdk/fpdf_attachment_embeddertest.cpp b/fpdfsdk/fpdf_attachment_embeddertest.cpp +index ca51f3f9a..9ec709d95 100644 +--- a/fpdfsdk/fpdf_attachment_embeddertest.cpp ++++ b/fpdfsdk/fpdf_attachment_embeddertest.cpp +@@ -439,3 +439,76 @@ TEST_F(FPDFAttachmentEmbedderTest, GetSubtypeInvalid) { + EXPECT_EQ(2u * (strlen(kExpectedSubtype) + 1), + FPDFAttachment_GetSubtype(attachment, nullptr, 10)); + } ++ ++TEST_F(FPDFAttachmentEmbedderTest, SetSubtype) { ++ ASSERT_TRUE(OpenDocument("embedded_attachments.pdf")); ++ FPDF_ATTACHMENT attachment = FPDFDoc_GetAttachment(document(), 0); ++ ASSERT_TRUE(attachment); ++ ++ // Invalid arguments return false without mutating the attachment. ++ EXPECT_FALSE(FPDFAttachment_SetSubtype(nullptr, "application/xml")); ++ EXPECT_FALSE(FPDFAttachment_SetSubtype(attachment, nullptr)); ++ ++ // Overwrite the existing "text/plain" subtype. ++ constexpr char kNewSubtype[] = "application/xml"; ++ EXPECT_TRUE(FPDFAttachment_SetSubtype(attachment, kNewSubtype)); ++ ++ // Read back via FPDFAttachment_GetSubtype. ++ unsigned long length = FPDFAttachment_GetSubtype(attachment, nullptr, 0); ++ ASSERT_EQ(2u * (strlen(kNewSubtype) + 1), length); ++ std::vector<FPDF_WCHAR> buf = GetFPDFWideStringBuffer(length); ++ EXPECT_EQ(length, FPDFAttachment_GetSubtype(attachment, buf.data(), length)); ++ EXPECT_EQ(kNewSubtype, GetPlatformString(buf.data())); ++} ++ ++TEST_F(FPDFAttachmentEmbedderTest, SetSubtypeOnFreshAttachment) { ++ // A freshly-added attachment that has had FPDFAttachment_SetFile() ++ // called against it has a file stream and accepts the subtype write.
++ ASSERT_TRUE(OpenDocument("embedded_attachments.pdf")); ++ ScopedFPDFWideString name = GetFPDFWideString(L"fresh.bin"); ++ FPDF_ATTACHMENT attachment = FPDFDoc_AddAttachment(document(), name.get()); ++ ASSERT_TRUE(attachment); ++ ++ // Before SetFile, the attachment has no file stream — SetSubtype ++ // returns false. ++ EXPECT_FALSE(FPDFAttachment_SetSubtype(attachment, "application/json")); ++ EXPECT_EQ(2u, FPDFAttachment_GetSubtype(attachment, nullptr, 0)); ++ ++ // After SetFile creates the /EF stream, SetSubtype works. ++ constexpr char kContents[] = "{\"hello\":\"world\"}"; ++ ASSERT_TRUE(FPDFAttachment_SetFile(attachment, document(), kContents, ++ sizeof(kContents) - 1)); ++ ++ constexpr char kSubtype[] = "application/json"; ++ EXPECT_TRUE(FPDFAttachment_SetSubtype(attachment, kSubtype)); ++ unsigned long length = FPDFAttachment_GetSubtype(attachment, nullptr, 0); ++ ASSERT_EQ(2u * (strlen(kSubtype) + 1), length); ++ std::vector<FPDF_WCHAR> buf = GetFPDFWideStringBuffer(length); ++ EXPECT_EQ(length, FPDFAttachment_GetSubtype(attachment, buf.data(), length)); ++ EXPECT_EQ(kSubtype, GetPlatformString(buf.data())); ++} ++ ++TEST_F(FPDFAttachmentEmbedderTest, SetSubtypePersistsAcrossSave) { ++ ASSERT_TRUE(OpenDocument("embedded_attachments.pdf")); ++ FPDF_ATTACHMENT attachment = FPDFDoc_GetAttachment(document(), 0); ++ ASSERT_TRUE(attachment); ++ ++ constexpr char kNewSubtype[] = "application/octet-stream"; ++ EXPECT_TRUE(FPDFAttachment_SetSubtype(attachment, kNewSubtype)); ++ ++ EXPECT_TRUE(FPDF_SaveAsCopy(document(), this, 0)); ++ ++ // Reopen the saved bytes and confirm the new subtype survived.
++ ASSERT_TRUE(OpenSavedDocument()); ++ FPDF_ATTACHMENT saved_attachment = ++ FPDFDoc_GetAttachment(saved_document(), 0); ++ ASSERT_TRUE(saved_attachment); ++ unsigned long length = ++ FPDFAttachment_GetSubtype(saved_attachment, nullptr, 0); ++ ASSERT_EQ(2u * (strlen(kNewSubtype) + 1), length); ++ std::vector<FPDF_WCHAR> buf = GetFPDFWideStringBuffer(length); ++ EXPECT_EQ(length, FPDFAttachment_GetSubtype(saved_attachment, buf.data(), ++ length)); ++ EXPECT_EQ(kNewSubtype, GetPlatformString(buf.data())); ++ CloseSavedDocument(); ++} +diff --git a/fpdfsdk/fpdf_view_c_api_test.c b/fpdfsdk/fpdf_view_c_api_test.c +index 282512d13..abddf794b 100644 +--- a/fpdfsdk/fpdf_view_c_api_test.c ++++ b/fpdfsdk/fpdf_view_c_api_test.c +@@ -116,6 +116,7 @@ int CheckPDFiumCApi() { + CHK(FPDFAttachment_HasKey); + CHK(FPDFAttachment_SetFile); + CHK(FPDFAttachment_SetStringValue); ++ CHK(FPDFAttachment_SetSubtype); + CHK(FPDFDoc_AddAttachment); + CHK(FPDFDoc_DeleteAttachment); + CHK(FPDFDoc_GetAttachment); +diff --git a/public/fpdf_attachment.h b/public/fpdf_attachment.h +index a8a40b34b..ce1c8e9d0 100644 +--- a/public/fpdf_attachment.h ++++ b/public/fpdf_attachment.h +@@ -189,6 +189,33 @@ FPDFAttachment_GetSubtype(FPDF_ATTACHMENT attachment, + FPDF_WCHAR* buffer, + unsigned long buflen); + ++// Experimental API. ++// Set the MIME type (Subtype) of the embedded file |attachment|. Mirrors ++// FPDFAttachment_GetSubtype() on the write side. The MIME type is written ++// as a PDF Name object on the attachment's file stream dictionary, where ++// FPDFAttachment_GetSubtype() reads it from. Note that this is distinct ++// from the dictionary entries written by FPDFAttachment_SetStringValue(), ++// which live in the attachment's /Params subdictionary; the file-stream ++// /Subtype is the location that conforming readers and ISO 32000-2 ++// §7.11.4 use to identify the embedded file's type. ++// ++// attachment - handle to an attachment. 
++// subtype - the MIME type to set, as a UTF-8 / ASCII byte string.
++// MIME types are restricted to ASCII per RFC 2045. ++// Pass NULL to error out; pass an empty string to set ++// an empty Name (which conforming readers treat as ++// "no subtype declared", matching the ++// FPDFAttachment_GetSubtype empty-string return path). ++// ++// Returns true on success. Returns false when |attachment| is NULL, ++// |subtype| is NULL, or the attachment has no file stream (i.e. its ++// /EF entry is missing or its /F target is not a stream — typically ++// because FPDFAttachment_SetFile() has not been called yet to populate ++// the attachment's bytes). ++FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV ++FPDFAttachment_SetSubtype(FPDF_ATTACHMENT attachment, ++ FPDF_BYTESTRING subtype); ++ + #ifdef __cplusplus + } // extern "C" + #endif // __cplusplus +-- +2.43.0 + diff --git a/dev/upstream-patches/pdfium-FPDF_SetMetaText.patch b/dev/upstream-patches/pdfium-FPDF_SetMetaText.patch new file mode 100644 index 0000000..09ca830 --- /dev/null +++ b/dev/upstream-patches/pdfium-FPDF_SetMetaText.patch @@ -0,0 +1,246 @@ +From cd66e5900e71cdabaaa805c4f79e07fb3cbfbe78 Mon Sep 17 00:00:00 2001 +From: Bill Denney +Date: Thu, 21 May 2026 13:05:23 +0000 +Subject: [PATCH] Expose FPDF_SetMetaText writer for the document Info + dictionary +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +Add the symmetric writer for FPDF_GetMetaText: + + FPDF_BOOL FPDF_SetMetaText(FPDF_DOCUMENT document, + FPDF_BYTESTRING tag, + FPDF_WIDESTRING value); + +FPDF_GetMetaText() has shipped since PDFium's earliest public C API +and is wrapped by every binding we surveyed (pypdfium2, pdfium-rs, +pdfium-render, hyzyla, PdfiumAndroid, PdfiumViewer). The reader +side is solid; there is no public way to set /Info/Title, +/Info/Author, /Info/Subject, /Info/Keywords, /Info/Creator, +/Info/Producer, /Info/CreationDate, or /Info/ModDate. 
Embedders' +only workaround today is to save the PDF, mutate the trailer +dictionary out-of-band, and re-save — which loses any in-memory +mutations not yet flushed and fails on incremental saves. + +The implementation mirrors FPDFCatalog_SetLanguage line-for-line: +get the mutable dictionary, write a CPDF_String via SetNewFor with +the WideStringFromFPDFWideString path. Using SetNewFor<CPDF_String> +on a WideStringView (rather than the ByteString path +FPDFAttachment_SetStringValue takes) means multi-byte Unicode +round-trips through FPDF_GetMetaText losslessly — verified by the +new SetMetaText embedder test with a multi-byte Japanese subject. + +The writer works on: + * Existing PDFs parsed from disk that have an /Info reference in + their trailer. CPDF_Document::GetInfo() resolves the indirect + reference and returns the mutable dictionary. + * Documents created via FPDF_CreateNewDocument(), where + CreateNewDoc() initialises info_dict_ eagerly. + +The writer returns false when GetInfo() returns null — i.e. when an +opened PDF genuinely lacks an /Info trailer entry. A follow-up CL +can add a way to plumb a new Info dictionary into the trailer on +those documents; for now, the writer matches the reader's "no Info +to work with" semantics. + +Tested via three new FPDFDocEmbedderTest cases against bug_601362.pdf +and the in-memory FPDF_CreateNewDocument() path: + * SetMetaText — invalid-arg rejection, basic set + read-back, + multi-byte Unicode round-trip, overwrite, empty-string + legitimate value, previously-absent tag becomes present. + * SetMetaTextOnNewDocument — the FPDF_CreateNewDocument() path, + confirming the eager-initialised Info dict accepts writes + without any prior FPDF_GetMetaText() call. + * SetMetaTextPersistsAcrossSave — round-trip through + FPDF_SaveAsCopy + OpenSavedDocument, asserting the mutation + actually reaches the saved PDF dictionary.
+ +Change-Id: Ia3e57b3dcdd0466c166728cd82fed8d9bfc9c06f +--- + fpdfsdk/fpdf_doc.cpp | 26 +++++++++++ + fpdfsdk/fpdf_doc_embeddertest.cpp | 74 +++++++++++++++++++++++++++++++ + fpdfsdk/fpdf_view_c_api_test.c | 1 + + public/fpdf_doc.h | 32 +++++++++++++ + 4 files changed, 133 insertions(+) + +diff --git a/fpdfsdk/fpdf_doc.cpp b/fpdfsdk/fpdf_doc.cpp +index 2de8c3302..fbb61d536 100644 +--- a/fpdfsdk/fpdf_doc.cpp ++++ b/fpdfsdk/fpdf_doc.cpp +@@ -559,6 +559,32 @@ FPDF_EXPORT unsigned long FPDF_CALLCONV FPDF_GetMetaText(FPDF_DOCUMENT document, + UNSAFE_BUFFERS(SpanFromFPDFApiArgs(buffer, buflen))); + } + ++FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV ++FPDF_SetMetaText(FPDF_DOCUMENT document, ++ FPDF_BYTESTRING tag, ++ FPDF_WIDESTRING value) { ++ if (!tag || !value) { ++ return false; ++ } ++ CPDF_Document* doc = CPDFDocumentFromFPDFDocument(document); ++ if (!doc) { ++ return false; ++ } ++ ++ RetainPtr<CPDF_Dictionary> info = doc->GetInfo(); ++ if (!info) { ++ return false; ++ } ++ ++ // SAFETY: required from caller. Mirrors the WideString-aware path ++ // used by FPDFCatalog_SetLanguage so multi-byte Unicode round-trips ++ // through FPDF_GetMetaText() losslessly. ++ info->SetNewFor<CPDF_String>( ++ tag, ++ UNSAFE_BUFFERS(WideStringFromFPDFWideString(value).AsStringView())); ++ return true; ++} ++ + FPDF_EXPORT unsigned long FPDF_CALLCONV + FPDF_GetPageLabel(FPDF_DOCUMENT document, + int page_index, +diff --git a/fpdfsdk/fpdf_doc_embeddertest.cpp b/fpdfsdk/fpdf_doc_embeddertest.cpp +index aa61b5d4a..8d6225819 100644 +--- a/fpdfsdk/fpdf_doc_embeddertest.cpp ++++ b/fpdfsdk/fpdf_doc_embeddertest.cpp +@@ -887,6 +887,80 @@ TEST_F(FPDFDocEmbedderTest, GetMetaTextFromNewDocument) { + EXPECT_EQ(2u, FPDF_GetMetaText(empty_doc.get(), "Title", buf, sizeof(buf))); + } + ++TEST_F(FPDFDocEmbedderTest, SetMetaText) { ++ ASSERT_TRUE(OpenDocument("bug_601362.pdf")); ++ ++ // Invalid arguments return false without mutating the doc.
++ ScopedFPDFWideString value = GetFPDFWideString(L"x"); ++ EXPECT_FALSE(FPDF_SetMetaText(nullptr, "Title", value.get())); ++ EXPECT_FALSE(FPDF_SetMetaText(document(), nullptr, value.get())); ++ EXPECT_FALSE(FPDF_SetMetaText(document(), "Title", nullptr)); ++ ++ // Set + read back through FPDF_GetMetaText. ++ unsigned short buf[128]; ++ ScopedFPDFWideString new_title = GetFPDFWideString(L"Updated Title"); ++ EXPECT_TRUE(FPDF_SetMetaText(document(), "Title", new_title.get())); ++ // ("Updated Title" = 13 chars + NUL terminator) * 2 bytes per char = 28. ++ ASSERT_EQ(28u, FPDF_GetMetaText(document(), "Title", buf, sizeof(buf))); ++ EXPECT_EQ(L"Updated Title", GetPlatformWString(buf)); ++ ++ // Multi-byte Unicode round-trips losslessly through the ++ // WideString-aware path. ++ ScopedFPDFWideString unicode_subject = GetFPDFWideString(L"日本語のテスト"); ++ EXPECT_TRUE( ++ FPDF_SetMetaText(document(), "Subject", unicode_subject.get())); ++ ASSERT_EQ(16u, FPDF_GetMetaText(document(), "Subject", buf, sizeof(buf))); ++ EXPECT_EQ(L"日本語のテスト", GetPlatformWString(buf)); ++ ++ // Overwriting a previously-set key replaces the value. ++ ScopedFPDFWideString replaced = GetFPDFWideString(L"second"); ++ EXPECT_TRUE(FPDF_SetMetaText(document(), "Title", replaced.get())); ++ ASSERT_EQ(14u, FPDF_GetMetaText(document(), "Title", buf, sizeof(buf))); ++ EXPECT_EQ(L"second", GetPlatformWString(buf)); ++ ++ // Empty string is a legitimate value (matches FPDFCatalog_SetLanguage). ++ ScopedFPDFWideString empty = GetFPDFWideString(L""); ++ EXPECT_TRUE(FPDF_SetMetaText(document(), "Title", empty.get())); ++ EXPECT_EQ(2u, FPDF_GetMetaText(document(), "Title", buf, sizeof(buf))); ++ EXPECT_EQ(L"", GetPlatformWString(buf)); ++ ++ // A previously-absent tag becomes present after Set.
++  ScopedFPDFWideString fresh = GetFPDFWideString(L"new-author");
++  EXPECT_TRUE(FPDF_SetMetaText(document(), "Author", fresh.get()));
++  ASSERT_EQ(22u, FPDF_GetMetaText(document(), "Author", buf, sizeof(buf)));
++  EXPECT_EQ(L"new-author", GetPlatformWString(buf));
++}
++
++TEST_F(FPDFDocEmbedderTest, SetMetaTextOnNewDocument) {
++  // FPDF_CreateNewDocument creates an Info dict eagerly, so the
++  // writer works without any prior read.
++  ScopedFPDFDocument empty_doc(FPDF_CreateNewDocument());
++  ScopedFPDFWideString title = GetFPDFWideString(L"Programmatic PDF");
++  EXPECT_TRUE(FPDF_SetMetaText(empty_doc.get(), "Title", title.get()));
++
++  unsigned short buf[128];
++  ASSERT_EQ(34u,
++            FPDF_GetMetaText(empty_doc.get(), "Title", buf, sizeof(buf)));
++  EXPECT_EQ(L"Programmatic PDF", GetPlatformWString(buf));
++}
++
++TEST_F(FPDFDocEmbedderTest, SetMetaTextPersistsAcrossSave) {
++  ASSERT_TRUE(OpenDocument("bug_601362.pdf"));
++
++  ScopedFPDFWideString new_creator = GetFPDFWideString(L"pdfium R package");
++  EXPECT_TRUE(FPDF_SetMetaText(document(), "Creator", new_creator.get()));
++
++  EXPECT_TRUE(FPDF_SaveAsCopy(document(), this, 0));
++
++  // Reopen the saved bytes and confirm the new value survived.
++  ASSERT_TRUE(OpenSavedDocument());
++  unsigned short buf[128];
++  ASSERT_EQ(34u, FPDF_GetMetaText(saved_document(), "Creator", buf,
++                                  sizeof(buf)));
++  EXPECT_EQ(L"pdfium R package", GetPlatformWString(buf));
++  CloseSavedDocument();
++}
++
+ TEST_F(FPDFDocEmbedderTest, GetPageAAction) {
+   ASSERT_TRUE(OpenDocument("get_page_aaction.pdf"));
+   ScopedPage page = LoadScopedPage(0);
+diff --git a/fpdfsdk/fpdf_view_c_api_test.c b/fpdfsdk/fpdf_view_c_api_test.c
+index 282512d13..c49c3ea19 100644
+--- a/fpdfsdk/fpdf_view_c_api_test.c
++++ b/fpdfsdk/fpdf_view_c_api_test.c
+@@ -164,6 +164,7 @@ int CheckPDFiumCApi() {
+     CHK(FPDF_GetMetaText);
+     CHK(FPDF_GetPageAAction);
+     CHK(FPDF_GetPageLabel);
++    CHK(FPDF_SetMetaText);
+ 
+     // fpdf_edit.h
+     CHK(FPDFFont_Close);
+diff --git a/public/fpdf_doc.h b/public/fpdf_doc.h
+index 2dc22d9c9..94c1cd177 100644
+--- a/public/fpdf_doc.h
++++ b/public/fpdf_doc.h
+@@ -413,6 +413,38 @@ FPDF_EXPORT unsigned long FPDF_CALLCONV FPDF_GetMetaText(FPDF_DOCUMENT document,
+                                                          void* buffer,
+                                                          unsigned long buflen);
+ 
++// Experimental API.
++// Set the document Info dictionary entry for |tag|. Mirrors
++// FPDF_GetMetaText() on the write side and accepts the same set of
++// tags listed in that function's documentation (Title, Author,
++// Subject, Keywords, Creator, Producer, CreationDate, ModDate). The
++// |value| is written as a CPDF_String using the same wide-string
++// path that FPDFCatalog_SetLanguage uses, so multi-byte Unicode
++// round-trips through FPDF_GetMetaText() losslessly.
++//
++//   document - handle to the document.
++//   tag      - the tag to set; UTF-8 encoded; must not be NULL.
++//              Empty tag returns false.
++//   value    - the new value for the tag; UTF-16LE NUL-terminated;
++//              must not be NULL. Empty value sets the tag to an
++//              empty string (use FPDF_RemoveMetaText to delete a tag
++//              once that companion lands).
++//
++// Returns true on success. Returns false when |document| is NULL,
++// |tag| is NULL or empty, |value| is NULL, or the document has no
++// Info dictionary to write to (some documents opened from disk
++// genuinely lack one; future work will add a way to create the
++// Info dict on those). Documents created with
++// FPDF_CreateNewDocument() always have an Info dictionary available.
++//
++// Date tags (CreationDate, ModDate) take a PDF date string
++// (D:YYYYMMDDHHMMSS+HH'MM'); the caller is responsible for
++// formatting.
++FPDF_EXPORT FPDF_BOOL FPDF_CALLCONV
++FPDF_SetMetaText(FPDF_DOCUMENT document,
++                 FPDF_BYTESTRING tag,
++                 FPDF_WIDESTRING value);
++
+ // Get the page label for |page_index| from |document|.
+ //
+ //   document - handle to the document.
+-- 
+2.43.0
+