Now named QTT: Quoted, Typed Tables
QTT is an interchange format for tables that's a graceful upgrade of TSV. A TSV file is a valid QTT file.
It uses QSN (Quoted String Notation) as a building block.
TSV2 is a textual interchange format for tables, like JSON is for nested data. It's most easily described as a TSV file with some nice properties:
'foo\n' and 'nul byte: \x00 !' and '\u1234'. Most fields should not be quoted.
In canonical form, a TSV2 file is a valid UTF-8 text file. There may be reasons to use non-canonical forms (e.g. for compatibility with old TSV readers).
NOTE: the → characters are tabs (written in HTML as &rarr;).
The default column type is string.
This example is inconsistent to illustrate the principle that single quotes are optional in some cases but required in others. They're recommended when a field contains spaces, but required when a field contains newlines or tabs.
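The quoting rule above can be sketched as a small encoder. This is only an illustration under stated assumptions: `encode_field` and `encode_row` are hypothetical names, and the choice to also force quoting for backslashes, single quotes, and other control characters is an assumption, not something the format text settles.

```python
def encode_field(value: str) -> str:
    """Return a field as it might appear in a row (hypothetical helper).

    Quotes are REQUIRED for tabs and newlines, per the text above.
    Assumption: backslashes, single quotes, and other control characters
    also force quoting. Quotes are merely RECOMMENDED for spaces, so this
    sketch applies them there as well.
    """
    must_quote = any(ch in "\t\n\r'\\" or ord(ch) < 0x20 for ch in value)
    should_quote = " " in value
    if not (must_quote or should_quote):
        return value  # the common case: emitted exactly as plain TSV

    escapes = {"\\": "\\\\", "'": "\\'", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
    out = []
    for ch in value:
        if ch in escapes:
            out.append(escapes[ch])
        elif ord(ch) < 0x20:
            out.append("\\x%02x" % ord(ch))  # e.g. NUL becomes \x00
        else:
            out.append(ch)
    return "'" + "".join(out) + "'"


def encode_row(fields):
    """Join encoded fields with literal tab characters."""
    return "\t".join(encode_field(f) for f in fields)


print(encode_row(["name", "two words", "a\tb"]))
```

Because every tab or newline inside a field is escaped, each output line stays exactly one row, which is the property line-oriented tools rely on.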
rows, rather than faking it with -n 4 for a row of 4 items.
-I {} string.
$1 $2 $3.
ps and ls can be massaged into Tables.
find command (TBD)
A line is not a record in CSV, e.g.
"record
with
spaces",10
is a single record. This doesn't work well with line-oriented shell tools.
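A quick way to see the mismatch is to compare a line count with a record count for the example above. The sketch below uses Python's standard `csv` module; the variable names are illustrative only.

```python
import csv
import io

# The CSV example from above: one record whose first field spans three lines.
text = '"record\nwith\nspaces",10\n'

# What a line-oriented tool (wc -l, grep, head) would see.
physical_lines = text.splitlines()

# What a CSV parser sees. newline="" leaves embedded newlines untouched.
records = list(csv.reader(io.StringIO(text, newline="")))

print(len(physical_lines))  # 3
print(records)              # [['record\nwith\nspaces', '10']]
```

Three physical lines, one record: any tool that splits on newlines will cut this record into pieces.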
Also, CSV in practice tends to be ill-specified. (e.g. hardware outputs really broken CSV files.)
NUL bytes like '\x00'.
Because JSON uses double quotes, and JSON represents Unicode strings. TSV2 represents byte strings, which can be UTF-8 encoded.
Strictly speaking, no. Hence the "major version number bump". However:
'spaces and single quotes'<TAB>'another field', rather than no single quotes<TAB>another field.
' characters. This only happens for "unprintable" characters and tabs, so in the common case there are NO corrections necessary.
Based on JSON primitive types, but less tied to JavaScript:
string (default)
bytes (can represent arbitrary bytes like invalid UTF-8)
int and float, because most languages have that distinction
boolean
Nothing as far as the TSV2 format is concerned. string is an indication to the language's library to try to decode UTF-8 and escape sequences into a string data structure, if applicable. (e.g. Python, Java, and JavaScript provide random code point access; Go and Rust use UTF-8 internally.)
See the "regular expressions" at json.org. We use true, false, and null.
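As a rough sketch of what "use the JSON grammar" could mean for a reader, the helper below converts one cell according to its column type by delegating to Python's `json` module. The name `parse_cell`, the `allow_null` flag, and the decision to reject NaN and Infinity are assumptions made for illustration. Whether null is permitted in a given column type is an open question in this document, so the flag defaults to off.

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN/Infinity by default; JSON itself does not.
    raise ValueError("not a JSON number: %s" % name)


def parse_cell(text, col_type, allow_null=False):
    """Convert one cell's text by column type (hypothetical helper).

    Handles only string, int, float and boolean; bytes columns and quoted
    strings are out of scope for this sketch.
    """
    if col_type == "string":
        return text
    if col_type not in ("int", "float", "boolean"):
        raise ValueError("unsupported column type: %r" % col_type)

    # json.loads enforces the JSON number grammar and true/false/null.
    value = json.loads(text, parse_constant=_reject_constant)

    if value is None:
        if allow_null:
            return None
        raise ValueError("null not allowed in %s column" % col_type)

    is_bool = isinstance(value, bool)  # bool is a subclass of int in Python
    if col_type == "boolean" and is_bool:
        return value
    if col_type == "int" and isinstance(value, int) and not is_bool:
        return value
    if col_type == "float" and isinstance(value, (int, float)) and not is_bool:
        return float(value)
    raise ValueError("%r is not a valid %s" % (text, col_type))


print(parse_cell("42", "int"), parse_cell("-1.5e3", "float"), parse_cell("true", "boolean"))
```

Leaning on an existing JSON parser keeps number syntax identical to JSON (for instance, `01` is rejected) without writing a separate grammar.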
[a-zA-Z_][a-zA-Z0-9_]+. (Should we allow spaces? Language can convert to _ ?)
' but doesn't end with one.
NUL bytes are written '\x00' (only allow \xAB in bytes columns?)
'fields with spaces have single quotes' -- this makes TSV readers less likely to work on TSV2 files, but that's OK. You have the option to emit non-canonical TSV2.
'' !\xFF codes, it should work? Well there's also the "bell" character and such.
align and trim are useful.
null
int and boolean columns can't either
float columns? null could be useful for NA. It can be represented by IEEE floats.