Certain file systems allow file names to contain code points that have a special meaning in Unicode, such as the surrogate U+D800, and/or characters that are not valid Unicode.
The extended bodyfile 3 format currently does not specify how to handle these characters. The proposal is to escape such characters as "\u####" and "\U########", preferring the short form over the long form where possible.
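A minimal Python sketch of the proposed escaping, to make the short-form/long-form rule concrete. The function name `escape_name`, the `extra` parameter, and the doubling of a literal backslash (so that escaped output stays unambiguous) are assumptions for illustration and not part of the format. The ranges are the control characters listed in this issue plus the surrogate block.

```python
# Code point ranges to escape: the control characters named in this issue
# plus the surrogate block U+D800-U+DFFF (assumed to be what "such as
# U+d800" refers to).
ESCAPE_RANGES = (
    (0x01, 0x08), (0x0B, 0x0C), (0x0E, 0x1F),
    (0x7F, 0x84), (0x86, 0x9F),
    (0xD800, 0xDFFF),
)


def escape_name(name: str, extra: frozenset = frozenset()) -> str:
    """Escape selected code points as \\u#### or \\U########.

    `extra` is a hypothetical hook for additional code points to escape.
    """
    out = []
    for ch in name:
        cp = ord(ch)
        if ch == "\\":
            # Assumption: a literal backslash is doubled so that the
            # escaped form can be told apart from an escape sequence.
            out.append("\\\\")
        elif cp in extra or any(lo <= cp <= hi for lo, hi in ESCAPE_RANGES):
            # Prefer the short form; fall back to the long form only when
            # the code point does not fit in four hex digits.
            out.append(f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}")
        else:
            out.append(ch)
    return "".join(out)


escaped_surrogate = escape_name("a\ud800b")
escaped_control = escape_name("x\x01")
escaped_long = escape_name("\U0001f600", extra=frozenset({0x1F600}))
```

Here a lone surrogate and a control character both take the short form, while a code point above U+FFFF (only escaped here because it was passed in `extra`) takes the long form.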
Control characters U+1-U+8, U+B-U+C, U+E-U+1F, U+7F-U+84, U+86-U+9F (already covered)
What if the original path uses a specific codepage (encoding) and is converted to Unicode, but the resulting Unicode string can be encoded back into more than one byte sequence in the original encoding, e.g. U+2252 in cp932? What if 2 paths decode to the same string? How can the original path best be preserved?
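The cp932 case can be reproduced with a short Python sketch. The two byte sequences below are taken from the commonly documented cp932 mapping and are an assumption here, as is the behaviour of Python's built-in `cp932` codec; the issue itself only names U+2252.

```python
# Two distinct cp932 byte sequences that are both expected to decode to
# U+2252 (assumed values, not stated in the issue).
raw_a = b"\x81\xe0"
raw_b = b"\x87\x90"

text_a = raw_a.decode("cp932")
text_b = raw_b.decode("cp932")

# Both paths become the same Unicode string after decoding.
same_text = text_a == text_b

# Re-encoding can only produce one byte sequence, so at least one of the
# original byte strings cannot be recovered from the Unicode form alone.
lossy = [r for r in (raw_a, raw_b) if r.decode("cp932").encode("cp932") != r]
```

If `same_text` is true, two file names that differ on disk become indistinguishable once stored as Unicode, which is the preservation problem raised above.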
What if a file name contains a path segment separator (e.g. \ or /)? If it is not escaped this leads to ambiguity, e.g. if / is the path segment separator, is 'test/1234' a single file name or a path?
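The ambiguity can be shown with a small Python sketch. Escaping a separator inside a segment as "\u002f" is only one hypothetical option, chosen here to match the escape style proposed in this issue; the names `join_naive` and `join_escaped` are likewise illustrative.

```python
SEP = "/"


def join_naive(segments):
    # Joins segments without escaping; a separator inside a segment is
    # indistinguishable from a real segment boundary.
    return SEP.join(segments)


def join_escaped(segments):
    # Hypothetical scheme: double literal backslashes first, then replace
    # a separator inside a segment with its short-form escape.
    return SEP.join(
        s.replace("\\", "\\\\").replace(SEP, "\\u002f") for s in segments
    )


one_file = ["dir", "test/1234"]       # file named 'test/1234' inside 'dir'
nested = ["dir", "test", "1234"]      # file '1234' inside 'dir/test'

ambiguous = join_naive(one_file) == join_naive(nested)
distinct = join_escaped(one_file) != join_escaped(nested)
```

Without escaping, both layouts serialize to the same path string; with the separator escaped per segment they stay distinct.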
Open questions
A related discussion: dfxml-working-group/dfxml_schema#34
Also consider whether the format should be extended with a header that specifies its encoding.