Turn Oil's expression grammar into an AST #387
Related: Tips on Using pgen2
Demo:
bin/osh -n -c 'var x = 1 + 2 * 3;'
This already works. (Right now a semicolon or a newline is accepted as the terminator; we should also accept EOF.)
- grammar.pgen2 is literally Python 3's grammar!!!
- expr_parse.py contains the public interface that the rest of the code uses. It turns a stream of tokens into an AST, which is two steps under the hood. (tokens -> parse tree, then parse tree -> AST)
  - " starts a double quoted string, and " ends one. In OSH, the lexer modes are dependent on the control flow of the recursive descent parser.
- expr_to_ast.py -- the "transformer", i.e. the parse tree -> AST step
- frontend/syntax.asdl is the unified OSH and Oil code representation
  - OIL LANGUAGE, and then everything we care about is under the expr type.
  - command.OilAssign is where Oil and OSH are integrated. That is, ls -l and var x = [1,2,3] are both commands in OSH. The latter is an Oil expression.
- frontend/lex.py -- the huge unified OSH and Oil lexer. Lexer modes for Oil are toward the bottom.
  - lex_mode_e.Expr is the main one for Oil expressions. But we also have different ones for:
    - "string $interp"
    - $/ d+ /
    - @[myprog --foo --bar=1]
- osh/word_parse.py has the integration point between OSH and Oil
  - enode, last_token = self.parse_ctx.ParseOilAssign(self.lexer, grammar_nt.oil_var) -- that indicates that we're using the oil_var production in grammar.pgen2

find in https://github.com/oilshell/oil/pull/386, which also has a "transformer"

expr nodes can appear on both LHS and RHS, and others can only appear on the RHS.

- ~~
- a if cond else b
- in, not in, is, is not
- // is div
- ** is ^ (following R and other mathematical languages)
- ^ is xor
- @ operator instead?)
- 3 < x <= 5
- f(x, y=3). Includes method calls with . operator, e.g. mydict.clear()
- x,
- true and false, following C, Java, JS, etc.
- True and False because types are generally capitalized: Str, Dict, List
- {key1, key2} taking their values from surrounding scope
- @[ mycommand --flag1 --flag2 ] -- uses the "command" lexer mode for "bare words"
- @[1 2 3]

token -- I try not to preprocess these too much, to allow more options for downstream tools. Tokens have location information which makes it easy to generate precise error messages.

Generally I test things very quickly with osh -n -c, or an interactive shell, but we should somehow record those tests. The simplest thing to do is to write some Python unit tests that take strings and print out the AST. Maybe they don't even need to make assertions?
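A minimal sketch of what such a print-only test could look like. It is not Oil's test driver: because grammar.pgen2 is Python 3's grammar, Python's own stdlib `ast` module stands in for the Oil parser here, and the names `CASES`, `dump_expr` and `ExprPrintTest` are made up for illustration.

```python
import ast
import unittest

# Hypothetical test inputs, taken from the expression forms listed above.
CASES = [
    '1 + 2 * 3',
    'a if cond else b',
    '3 < x <= 5',
    'f(x, y=3)',
]


def dump_expr(code: str) -> str:
    """Parse one expression and return a printable AST.

    Stand-in only: this uses Python's stdlib parser, not Oil's.
    """
    return ast.dump(ast.parse(code, mode='eval').body)


class ExprPrintTest(unittest.TestCase):

    def testPrintAsts(self) -> None:
        # No assertions on purpose: the test passes if nothing raises,
        # and the printed ASTs are inspected by eye.
        for code in CASES:
            print('%-20s %s' % (code, dump_expr(code)))
```

Saved to a file, this can be run with `python3 -m unittest`. A test like this still catches crashes and syntax errors, even though it checks nothing about the shape of the AST.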
Update: I added a test driver, which you can run like this:
test/unit.sh unit oil_lang/expr_parse_test.py
It takes lines of code and prints out an AST.
If you want to print out the parse tree, turn on print_parse_tree in frontend/parse_lib.py ParseOilAssign.
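To make the parse tree vs. AST distinction concrete, here is a toy version of the two-step pipeline (tokens -> parse tree, then parse tree -> AST). It is a sketch under heavy assumptions: it uses a hand-written parser for a three-rule grammar instead of pgen2, and every name in it (`PNode`, `Const`, `Binary`, `to_ast`, and so on) is hypothetical rather than taken from expr_parse.py or expr_to_ast.py.

```python
import re
from dataclasses import dataclass
from typing import List, NamedTuple, Union


class Token(NamedTuple):
    kind: str  # 'NUMBER', 'OP', or 'EOF'
    text: str


@dataclass
class PNode:
    """Parse tree node: mirrors the grammar rule that produced it."""
    typ: str
    children: List[Union['PNode', Token]]


@dataclass
class Const:
    value: int


@dataclass
class Binary:
    op: str
    left: 'Expr'
    right: 'Expr'


Expr = Union[Const, Binary]

TOKEN_RE = re.compile(r'\s*(?:(\d+)|(\S))')


def tokenize(s: str) -> List[Token]:
    return [Token('NUMBER', num) if num else Token('OP', op)
            for num, op in TOKEN_RE.findall(s)]


class Parser:
    """Toy grammar:  expr: term ('+' term)*   term: atom ('*' atom)*   atom: NUMBER"""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return Token('EOF', '')

    def next(self) -> Token:
        tok = self.peek()
        self.pos += 1
        return tok

    def _repeat(self, typ: str, op: str, parse_child) -> PNode:
        children: List[Union[PNode, Token]] = [parse_child()]
        while self.peek() == Token('OP', op):
            children.append(self.next())
            children.append(parse_child())
        return PNode(typ, children)

    def parse_expr(self) -> PNode:
        return self._repeat('expr', '+', self.parse_term)

    def parse_term(self) -> PNode:
        return self._repeat('term', '*', self.parse_atom)

    def parse_atom(self) -> PNode:
        tok = self.next()
        if tok.kind != 'NUMBER':
            raise SyntaxError('expected a number, got %r' % (tok.text,))
        return PNode('atom', [tok])


def parse(s: str) -> PNode:
    """Step 1: tokens -> parse tree."""
    p = Parser(tokenize(s))
    tree = p.parse_expr()
    if p.peek().kind != 'EOF':
        raise SyntaxError('unexpected %r' % (p.peek().text,))
    return tree


def to_ast(node: PNode) -> Expr:
    """Step 2: parse tree -> AST (the "transformer")."""
    if node.typ == 'atom':
        return Const(int(node.children[0].text))
    # expr / term: children alternate operand, operator, operand, ...
    result = to_ast(node.children[0])
    for i in range(1, len(node.children), 2):
        result = Binary(node.children[i].text, result, to_ast(node.children[i + 1]))
    return result


if __name__ == '__main__':
    tree = parse('1 + 2 * 3')
    print(tree)          # verbose: one node per grammar rule
    print(to_ast(tree))  # compact: only what later stages need
```

The parse tree keeps an `expr`/`term`/`atom` wrapper around every operand, even when there is no operator, while the AST collapses those wrappers. That difference is why printing the parse tree is mostly useful when debugging the grammar itself.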
NOTE: The way I hacked everything together was with pgen2/pgen2-test.sh all. (You can run less with a particular function in that file, like parse-exprs or oil-productions.) This worked pretty nicely, but I won't be surprised if others don't like this style or get confused by it :-/
The whole front end is statically typed with MyPy now. The types/osh-parse.sh script checks it in Travis.
I usually get the code working, and then add types. However, filling in types first is conceivable. ASDL types map to MyPy types in a straightforward way.
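As a rough illustration of that mapping, here is a hand-written sketch. The schema in the comment is hypothetical (it is not copied from frontend/syntax.asdl), and the class names are illustrative: they are not necessarily what the ASDL code generator emits into _devbuild/gen/syntax_asdl.py.

```python
# Hypothetical ASDL schema, for illustration only:
#
#   expr = Const(int i)
#        | Binary(string op, expr left, expr right)
#
# One plausible typed-Python rendering: a base class for the sum type,
# and one subclass per variant with annotated fields.


class expr_t:
    """Base class for the 'expr' sum type."""


class Const(expr_t):
    def __init__(self, i: int) -> None:
        self.i = i


class Binary(expr_t):
    def __init__(self, op: str, left: expr_t, right: expr_t) -> None:
        self.op = op
        self.left = left
        self.right = right


def Eval(node: expr_t) -> int:
    """A consumer of the typed tree; MyPy can check each field access."""
    if isinstance(node, Const):
        return node.i
    if isinstance(node, Binary):
        left = Eval(node.left)
        right = Eval(node.right)
        if node.op == '+':
            return left + right
        if node.op == '*':
            return left * right
        raise ValueError('unknown operator %r' % (node.op,))
    raise AssertionError('unexpected node %r' % (node,))
```

With annotations like these, a type checker narrows `node` inside each `isinstance` branch, so a typo such as `node.lhs` is reported statically instead of failing at runtime.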
See Contributing, but
build/dev.sh minimal
should be enough (on an Ubuntu/Debian machine).
Important: make sure to re-run this when changing frontend/syntax.asdl. The file _devbuild/gen/syntax_asdl.py needs to be regenerated.