diff --git a/text/camlp5-associativity.md b/text/camlp5-associativity.md new file mode 100644 index 00000000..8f3e1305 --- /dev/null +++ b/text/camlp5-associativity.md @@ -0,0 +1,146 @@ +- Title: Towards a more flexible and accurate model for grammar levels and associativity + +- Drivers: Hugo Herbelin + +---- + +# Summary + +The Camlp5/gramlib model for associativity of levels in extensible grammar has a couple of issues. We reflect how to solve them. + +# Detailed description of the current problems + +## Non associativity + +Camlp5 let us believe that non associativity is possible and we let Coq users believe in turn that it was the case too. However, non associativity is *not* a feature of Camlp5. As said in the section [Associativity](https://camlp5.readthedocs.io/en/latest/grammars.html?highlight=associativity#extensible-grammars) of the Camlp5 manual, the keyword `NONA` behaves as `LEFTA`. And indeed, there is no mechanism to support nor emulate non associativity of infix or postfix operators in Camlp5. + +Also, non associativity (equivalently right associativity) is inactive on postfix operators (e.g., a rule such as `[ RIGHTA [ t = SELF; "!" -> ... ] ]` accepts parsing `x!!`. This is for the same reason as non-associativity is not supported (one would need to call continuations only once and not arbitrarily recursively many often, see remark at end of camlp5/camlp5#100). + +For another reason, non associativity (equivalently left associativity) is not emulable on prefix operators (e.g., a rule such as `[ LEFTA [ "?"; t = SELF -> ... ] ]`, or `[ LEFTA [ "?"; t = NEXT -> ... ] ]`, accepts parsing `??x`); this times it is because of a fallback which replace `SELF` by `SELF TOP` if `SELF` has failed. + +Note that the non support for non associativity in the parser is currently a cause of asymmetry with the printer, since the printer respects non associativity. + +## Each Camlp5 level is in practice two levels + +Indeed, a Camlp5 level is made of two blocks: +- a block of rules of the form `SELF; suffix` +- the block of rules not of this form + +In the first block, `SELF` is implemented by a call to the 2nd block of rules. As a consequence, if an infix notation is put at Coq's lower level 0, it still succeeds to parse its argument by finding closed subterms in the second block of level 0, such as an ident, or `( term )`. + +This results in bug reports such as coq/coq#15336 where the parser parses the notation but the printer does not print it. + +## Mixing associativity in the same level is technically possible and would solve the issue of combining developments that set different associativities to the same level + +In practice, a level can mix different associativities. If a rule of the form `SELF; "+"; NEXT -> ...` is put in a right-associative level, it parses `+` in a left associative way. And if a rule of the form `SELF; "+"; SELF LEVEL "foo" -> ...` is put in the left-associative level `"foo"`, then, `+` is parsed in a right associative way. This is because the *only* effect of the `RIGHTA`/`LEFTA` keywords at some level, say `"foo"`, is to give a *default* interpretation to the occurrences of `SELF` on the right-hand side of rules (that is such `SELF` is then interpreted as `NEXT` for `LEFTA` and as `SELF LEVEL "foo"` for `RIGHTA`). In particular, we can decide the associativity per rule by just not using `SELF` on right-hand sides. + +Note that in this case, rules with right associativity (i.e. ending with `SELF LEVEL "foo"`) yield precedence (i.e. bind wider) than rules with left associativity. + +See e.g. coq/coq#17859 and camlp5/camlp5#100 for more details. + +Note that an example of conflicts on setting different associativities to the same level is told to be when combining Iris and MathComp. + +## Camlp5 associativity shown with `Print Grammar constr` leads to confusion + +The `RIGHTA`/`LEFTA` status of a level is thus indicative, and not constraining. And, actually, Coq do override it by setting itself the right-hand side of rules to `NEXT` or `SELF LEVEL "foo"`. + +When no official associativity is assigned to a level, Camlp5 decides it is left associative while the Coq table decides (confusingly) that it is right associative. Since the associativity is only a default value for rules which can be overriden per rule, one confusingly sees a level reported as left associative while Coq allows to add only right associative rules to the level (see coq/coq#12373). + +Also, the current Camlp5 implementation does not treat associativity in the presence of `LIST1` and `LIST0` on the border of a rule. E.g., in `...; LIST1 SELF -> ...`, `SELF` is interpreted as `SELF TOP` and thus does not follow associativity. This is visible e.g. in the `"choice"; LIST1 rewstrategy` rule for `rewstrategy`. + +## Camlp5/gramlib does not fully respect levels + +When no rule matches, there is a recovery mechanism (see coq/coq#9008) made of 4 fallbacks that bypasses the level specified by the user. + +For instance, `True \/ forall x, x = 0` is tolerated even though `forall` is at level 200 and `/\` at level 80. Or `x~0%Z` is tolerated even though `%Z` expects its arguments to be at level 1. + +# Proposal + +We propose to go towards a new model: +- that respects levels +- that respects no associativity +- is more flexible about mixing different associativities in the same level +- remains mostly compatible with the existing situation + +# Solutions + +## Removal of the recovery mechanism + +Removing the recovery mechanism breaks quite common situations such as `True \/ forall x, x = 0`, or `2^-n`, or `x~0%Z`. + +One could try to automatically adapt the scripts but I believe that the simplest is to move `forall` (and all `binder_constr` rules spanning until the end of the sentence) to a lower level, such as 10, just above the level of application. + +The drawback is that we need to modify the printer so that it knows how to print such cycle made of a rule at some level (here 10) which is lower than the level on the right-hand side of the rule (here 200). This is however already made possible by PR #17875. + +For `-`, there are several solutions, such as: +- to set it at level 10 with arguments at level 40 (the level of `*`) so that `2 ^ - x * y` is `2 ^ - (x * y)` but `2 ^ - x + y` is `(2 ^ - x) + y` (??) +- to change scripts and add parentheses + +For `x~0%Z`, there are hopefully few occurrences. + +## Granting non associativity + +In our experience, the changes are relatively small in the standard library: +- it impacts the family of notations `=`, `=?`, `<`, `