Describe the bug
arrow-csv writes `\` characters from `Utf8` columns as a bare `\` in the output. Lousier CSV parsers, like some of those written in C/C++, interpret it as the start of a string escape sequence and corrupt the data they read from the output stream.
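To illustrate the failure mode, here is a minimal sketch of a reader that, like many hand-rolled C/C++ parsers, treats `\` inside a field as the start of an escape sequence. `c_style_unescape` is hypothetical and models one assumed behavior; real parsers differ in which escapes they recognize.

```rust
/// Hypothetical model of a C-style field reader: `\n` and `\t` become control
/// characters, `\\` becomes `\`, and any other `\x` collapses to plain `x`.
fn c_style_unescape(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    let mut chars = field.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            // Covers `\\` -> `\` as well as unknown escapes like `\d` -> `d`.
            Some(other) => out.push(other),
            // A trailing lone backslash is silently dropped.
            None => {}
        }
    }
    out
}

fn main() {
    // A Utf8 value as arrow-csv writes it today: backslashes pass through verbatim.
    let written = r"C:\new\data";
    // The reader turns `\n` into a newline and `\d` into a plain `d`.
    let read_back = c_style_unescape(written);
    assert_eq!(read_back, "C:\newdata");
    println!("{:?}", read_back);

    // Writing `\\` instead lets the same reader recover the original value.
    assert_eq!(c_style_unescape(r"C:\\new\\data"), r"C:\new\data");
}
```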
To Reproduce
Expected behavior
Arguably those bad CSV parsers should be less bad, but IMHO it's a safe operation to convert `\` to `\\` in the output stream out of an abundance of caution.
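A minimal sketch of the proposed transformation, assuming it is applied to the already-formatted text of each string cell before the writer adds quoting. `escape_backslashes` is a hypothetical helper, not part of the arrow-csv API. Returning `Cow` lets cells without a backslash, presumably the common case, skip the extra allocation.

```rust
use std::borrow::Cow;

/// Hypothetical helper: doubles every `\` so C-style readers decode it back
/// to a single `\`. Borrows the input unchanged when it has no backslash.
fn escape_backslashes(value: &str) -> Cow<'_, str> {
    if value.contains('\\') {
        Cow::Owned(value.replace('\\', r"\\"))
    } else {
        Cow::Borrowed(value)
    }
}

fn main() {
    assert_eq!(escape_backslashes(r"\"), r"\\");
    assert_eq!(escape_backslashes(r"C:\new\data"), r"C:\\new\\data");
    // No backslash: the original slice is returned without allocating.
    assert!(matches!(escape_backslashes("world"), Cow::Borrowed("world")));
    println!("{}", escape_backslashes(r"C:\new\data"));
}
```

One caveat worth weighing: RFC 4180 gives `\` no special meaning, so a strictly compliant reader will read the escaped value back as two backslashes rather than one.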
Additional context
From 2a7615200965a68c4808efe021b0414e6e155135 Mon Sep 17 00:00:00 2001
From: "R. Tyler Croy" <rtyler@brokenco.de>
Date: Thu, 2 Apr 2026 18:24:19 +0000
Subject: [PATCH] chore: properly escape backslashes in CSV output of
strings
Signed-off-by: R. Tyler Croy <rtyler@brokenco.de>
---
arrow-csv/src/writer.rs | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/arrow-csv/src/writer.rs b/arrow-csv/src/writer.rs
index c38d1cdec33..8c7f50b3ca8 100644
--- a/arrow-csv/src/writer.rs
+++ b/arrow-csv/src/writer.rs
@@ -293,6 +293,13 @@ impl<W: Write> Writer<W> {
))
})?;
+ let data_type = batch.column(col_idx).data_type();
+
+ if matches!(data_type, DataType::Utf8 | DataType::LargeUtf8) && buffer.contains('\\') {
+ // Double each backslash so C-style CSV readers don't treat it as an escape
+ buffer = buffer.replace('\\', "\\\\");
+ }
+
let field_bytes =
self.get_trimmed_field_bytes(&buffer, batch.column(col_idx).data_type());
byte_record.push_field(field_bytes);
@@ -1358,4 +1365,28 @@ sed do eiusmod tempor,-556132.25,1,,2019-04-18T02:45:55.555,23:46:03,foo
write_quote_style_with_null(&batch, QuoteStyle::Always, "NULL")
);
}
+
+ #[test]
+ fn test_write_with_backslashes() {
+ let schema = Schema::new(vec![
+ Field::new("text", DataType::Utf8, true),
+ Field::new("number", DataType::Int32, true),
+ ]);
+
+ let text = StringArray::from(vec![Some(r"\"), None, Some("world")]);
+ let number = Int32Array::from(vec![Some(1), Some(2), None]);
+
+ let batch =
+ RecordBatch::try_new(Arc::new(schema), vec![Arc::new(text), Arc::new(number)]).unwrap();
+
+ // Test with QuoteStyle::Always
+ assert_eq!(
+ r#""text","number"
+"\\","1"
+"","2"
+"world",""
+"#,
+ write_quote_style(&batch, QuoteStyle::Always)
+ );
+ }
}
--
2.43.0