Up: Structured Data in Oil
Takeaway from http://www.oilshell.org/blog/2017/09/19.html
Minimal Solution, That Basically Exists
- Use % format strings, as is a common convention.
- Implement a way to output the NUL byte. git log uses %x00, and find -printf uses \0.
- Use UTF-8 encoding for strings. The byte \0 never appears in the UTF-8 encoding of any character other than NUL itself, so it is safe to use as a terminator.
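As a rough illustration of the consuming side, here is a minimal Python sketch that splits a byte stream of NUL-terminated fields into records. The function name `parse_nul_records`, the fixed `fields_per_record` count, and the sample bytes are all assumptions made for this example. Real tools may also insert their own record separators (such as newlines), which this sketch does not handle.

```python
from typing import Iterator, List


def parse_nul_records(data: bytes, fields_per_record: int) -> Iterator[List[str]]:
    """Yield records from a byte stream in which every field ends with NUL."""
    # Every field is NUL-terminated, so the final split element is empty.
    chunks = data.split(b"\0")
    if chunks and chunks[-1] == b"":
        chunks.pop()
    if len(chunks) % fields_per_record != 0:
        raise ValueError("truncated record in input")
    for i in range(0, len(chunks), fields_per_record):
        # Decode only after splitting: UTF-8 never uses byte 0x00 inside a
        # multi-byte sequence, so splitting on it cannot cut a character.
        yield [c.decode("utf-8") for c in chunks[i:i + fields_per_record]]


# Hypothetical sample: two records of (hash, subject), each field NUL-terminated.
sample = b"abc123\0Fix caf\xc3\xa9 menu\0def456\0Add tests\0"
records = list(parse_nul_records(sample, 2))
print(records)
```

Because the fields are split on a byte that ordinary text never contains, spaces, quotes, and newlines inside a field need no escaping.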
Advantages:
- It's already a common practice. See Unix Tools.
- Works with xargs -0 (which was meant for find -print0)
- %x00 is a trivial patch to a tool that already has % format strings.
- You can save serialization cost by selecting the fields you want.
- In an escaping context, you can make this safe against adversarial input.
Disadvantages:
- %s is not that readable. But this can be mitigated by Oil Metaprogramming, that is, by turning it into "hash: $hash commit: $commit".
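One way to picture that mitigation is a small translator from a readable template to a % format string. The sketch below is only an illustration in Python, not Oil's mechanism: the function `compile_template` and the name-to-placeholder table are hypothetical, and it assumes the target tool spells a literal percent sign as `%%`. The codes `%H` and `%s` are git's pretty-format placeholders for the commit hash and the subject line.

```python
import re
from typing import Dict

# Hypothetical mapping from readable names to % placeholders.
PLACEHOLDERS: Dict[str, str] = {"hash": "%H", "subject": "%s"}


def compile_template(template: str, placeholders: Dict[str, str]) -> str:
    """Translate '$name' references into % format codes."""
    # Assumption: the target tool spells a literal percent sign as '%%'.
    escaped = template.replace("%", "%%")

    def substitute(match):
        name = match.group(1)
        if name not in placeholders:
            raise KeyError(f"unknown field: {name}")
        return placeholders[name]

    # A function replacement is inserted verbatim, with no escape processing.
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", substitute, escaped)


fmt = compile_template("hash: $hash subject: $subject", PLACEHOLDERS)
print(fmt)
```

The user writes named fields, and the terse % codes become an implementation detail of the translation step.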
Other Solutions
- JSON for structured data (with proper escaping)
- CSV for tabular data (and proper escaping)
- Also need a foo.csv_schema for the types. JSON has types in the data encoding, but CSV doesn't.
- Provide %#s for a length prefix, for truly binary data. What use cases exist?
- Alternative: base64 encode
- Alternative: pass the file system path of the file (could be in memory on tmpfs).
- Netstrings -- for fixed formats
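To make the length-prefix idea concrete, here is a minimal Python sketch of netstrings, which encode a payload as its decimal length, a colon, the raw bytes, and a trailing comma. The function names are made up for this example, and the sketch says nothing about how a `%#s` format code would actually be implemented.

```python
from typing import List


def encode_netstring(payload: bytes) -> bytes:
    """Encode as b'<length>:<payload>,' where length is decimal ASCII."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstrings(data: bytes) -> List[bytes]:
    """Decode a concatenation of netstrings into a list of payloads."""
    payloads = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)  # raises ValueError if no colon is left
        length_field = data[pos:colon]
        if not length_field.isdigit():
            raise ValueError("malformed length")
        start = colon + 1
        end = start + int(length_field)
        if data[end:end + 1] != b",":
            raise ValueError("missing trailing comma or truncated payload")
        payloads.append(data[start:end])
        pos = end + 1
    return payloads


# The payload may contain NUL bytes and newlines; the length prefix makes that safe.
blob = b"binary\0with\nNUL and newline"
wire = encode_netstring(b"hello") + encode_netstring(blob)
print(wire)
print(decode_netstrings(wire))
```

Unlike NUL termination, a length prefix places no restriction on the payload bytes, at the cost of output that is harder to read and to produce by hand.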
Languages with Pipe-Like Features
- Elm: Understanding Pipes in Elm -- Elm has two pipe operators, |> (pipe forward) and <| (pipe backward). They show the direction in which the data is passed.
- Elixir
- R -- with %>% and -> and magrittr
- Tulip shell-like / Haskell-like language
- Clojure ?
- Haskell ?
I don't think Julia has it.