Related: TSV2 Proposal

Update: This is done, and it's now called QSN: Quoted String Notation. See the qsn/ directory.


Issue 582 is to implement CSTR

Intro

Rationale: ls --escape and stat print filenames with 0xFF bytes differently! We want to document and formalize this small format.

Naming

CSTR doesn't stand for anything; it's basically short for "C String". It's spelled a bit like JSON.

Rough Sketch

It's basically a single-quoted string with \ escapes that can express any byte string. We use single rather than double quotes to reduce confusion with JSON.

These are valid strings in the CSTR format:

Diff from JSON

It could be easier to describe CSTR as a "diff" from the JSON string format.

Parser

The parser can be implemented in any number of ways, but CSTR is a regular language, so Oil's common style with re2c should work very well.

http://www.oilshell.org/blog/2019/12/22.html#appendix-a-oils-lexer-uses-two-stages-of-code-generation
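
Because the format is regular, a decoder can be driven by a single regular expression. The following is a minimal Python sketch that uses the standard re module as a stand-in for re2c; it is not Oil's implementation. The name cstr_decode and the escape set (\\, \', \n, \r, \t, and \xHH) are assumptions for illustration, since this page does not fix the exact escapes.

```python
import re

# Hypothetical token grammar for the body between the quotes.
_TOKEN = re.compile(rb"""
    \\x([0-9a-fA-F]{2})   # group 1: hex byte escape
  | \\([\\'nrt])          # group 2: one-character escape
  | ([^\\']+)             # group 3: run of literal bytes
""", re.VERBOSE)

# Assumed mapping from the escape character to the byte it denotes.
_SIMPLE = {b"\\": b"\\", b"'": b"'", b"n": b"\n", b"r": b"\r", b"t": b"\t"}


def cstr_decode(s: bytes) -> bytes:
    """Decode a single-quoted, backslash-escaped string into raw bytes."""
    if len(s) < 2 or s[:1] != b"'" or s[-1:] != b"'":
        raise ValueError("expected a single-quoted string")
    body = s[1:-1]
    out = []
    pos = 0
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            raise ValueError("invalid escape or bare quote at offset %d" % pos)
        if m.group(1) is not None:
            out.append(bytes([int(m.group(1).decode("ascii"), 16)]))
        elif m.group(2) is not None:
            out.append(_SIMPLE[m.group(2)])
        else:
            out.append(m.group(3))
        pos = m.end()
    return b"".join(out)
```

Each loop iteration consumes exactly one token, so the decoder runs in a single left-to-right pass and rejects anything the token grammar does not cover.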

Printer
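
A printer has to turn an arbitrary byte string into the quoted form. The following is a minimal Python sketch, not Oil's implementation. The name cstr_encode and the exact escape set (\\, \', \n, \r, \t, and \xHH for every other byte outside printable ASCII) are assumptions for illustration; this page only fixes single quotes and backslash escapes.

```python
# Hypothetical escape table: byte value -> escaped text.
_NAMED = {0x5C: "\\\\", 0x27: "\\'", 0x0A: "\\n", 0x0D: "\\r", 0x09: "\\t"}


def cstr_encode(data: bytes) -> str:
    """Encode arbitrary bytes as a single-quoted, backslash-escaped string."""
    parts = []
    for b in data:
        if b in _NAMED:
            parts.append(_NAMED[b])        # backslash, quote, and common controls
        elif 0x20 <= b < 0x7F:
            parts.append(chr(b))           # printable ASCII passes through
        else:
            parts.append("\\x%02x" % b)    # any other byte becomes \xHH
    return "'" + "".join(parts) + "'"


print(cstr_encode(b"bob"))
print(cstr_encode(b"a\tb\xff"))
```

Escaping every byte outside printable ASCII keeps the output safe to print on any terminal, at the cost of making valid UTF-8 text less readable.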

Relation to TSV2

CSTR is a subset of TSV2. TSV2 might not be implemented in Oil v1, but CSTR is necessary for basic shell functionality like displaying filenames and argv arrays.

Unquoted variant. "bob" is valid because it doesn't contain TABs.

name   age
bob    10

Quoted variant:

name   age
'bob'  10

Relation to Python's repr()

https://docs.python.org/2/library/codecs.html#python-specific-encodings
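
For comparison, here is how Python 3's built-in repr() renders byte strings. The link above describes Python 2 codecs; this sketch uses Python 3 and only illustrates the similarity. The variable names are hypothetical.

```python
# repr() of bytes uses backslash escapes much like the format sketched on this page.
plain = repr(b"a\tb\xff")
print(plain)   # b'a\tb\xff'

# Unlike a format with fixed single quotes, Python switches to double
# quotes when the data contains a single quote and no double quote.
quoted = repr(b"it's")
print(quoted)  # b"it's"
```

The b prefix and the changing quote character are two reasons repr() output cannot be used directly as a fixed interchange format.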

Related

Unix Tools lists tools like find which understand backslash escapes.