Because these languages are so closely related, their grammars and language descriptions are generated from a common source to ensure consistency, and the editors of these specifications work together closely.
This document specifies a grammar for XPath 3.1, using the same basic EBNF notation used in [XML 1.0]. Unless otherwise noted (see A.2 Lexical structure), whitespace is not significant in expressions. Grammar productions are introduced together with the features that they describe, and a complete grammar is also presented in the appendix [A XPath 3.1 Grammar]. The appendix is the normative version.
In the grammar productions in this document, named symbols are underlined and literal text is enclosed in double quotes. For example, the following productions describe the syntax of a static function call:
[Definition: The first three components of the dynamic context (context item, context position, and context size) are called the focus of the expression. ] The focus enables the processor to keep track of which items are being processed by the expression. If any component in the focus is defined, both the context item and context position are known.
If any component in the focus is defined, context size is usually defined as well. However, when streaming, the context size cannot be determined without lookahead, so it may be undefined. If so, expressions like last() will raise a dynamic error because the context size is undefined.
Certain language constructs, notably the path operator E1/E2, the simple map operator E1!E2 , and the predicate E1[E2], create a new focus for the evaluation of a sub-expression. In these constructs, E2 is evaluated once for each item in the sequence that results from evaluating E1. Each time E2 is evaluated, it is evaluated with a different focus. The focus for evaluating E2 is referred to below as the inner focus, while the focus for evaluating E1 is referred to as the outer focus. The inner focus is used only for the evaluation of E2. Evaluation of E1 continues with its original focus unchanged.
[Definition: The context item is the item currently being processed.] [Definition: When the context item is a node, it can also be referred to as the context node.] The context item is returned by an expression consisting of a single dot (.). When an expression E1/E2 or E1[E2] is evaluated, each item in the sequence obtained by evaluating E1 becomes the context item in the inner focus for an evaluation of E2.
[Definition: The context position is the position of the context item within the sequence of items currently being processed.] It changes whenever the context item changes. When the focus is defined, the value of the context position is an integer greater than zero. The context position is returned by the expression fn:position(). When an expression E1/E2 or E1[E2] is evaluated, the context position in the inner focus for an evaluation of E2 is the position of the context item in the sequence obtained by evaluating E1. The position of the first item in a sequence is always 1 (one). The context position is always less than or equal to the context size.
[Definition: The context size is the number of items in the sequence of items currently being processed.] Its value is always an integer greater than zero. The context size is returned by the expression fn:last(). When an expression E1/E2 or E1[E2] is evaluated, the context size in the inner focus for an evaluation of E2 is the number of items in the sequence obtained by evaluating E1.
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XPath 3.1 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XPath 3.1 Grammar].
The XPath 3.1 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas. The name ExprSingle denotes an expression that does not contain a top-level comma operator (despite its name, an ExprSingle may evaluate to a sequence containing more than one item.)
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.
For each item in the input sequence, the predicate expression is evaluated using an inner focus, defined as follows: The context item is the item currently being tested against the predicate. The context size is the number of items in the input sequence. The context position is the position of the context item within the input sequence.
Each operation E1/E2 is evaluated as follows: Expression E1 is evaluated, and if the result is not a (possibly empty) sequence S of nodes, a type error is raised [err:XPTY0019]. Each node in S then serves in turn to provide an inner focus (the node as the context item, its position in S as the context position, the length of S as the context size) for an evaluation of E2, as described in 2.1.2 Dynamic Context. The sequences resulting from all the evaluations of E2 are combined as follows:
The focus for evaluation of the return clause of a for expression is the same as the focus for evaluation of the for expression itself. The following example, which attempts to find the total value of a set of order-items, is therefore incorrect:
Each operation E1!E2 is evaluated as follows: Expression E1 is evaluated to a sequence S. Each item in S then serves in turn to provide an inner focus (the item as the context item, its position in S as the context position, the length of S as the context size) for an evaluation of E2 in the dynamic context. The sequences resulting from all the evaluations of E2 are combined as follows: Every evaluation of E2 returns a (possibly empty) sequence of items. These sequences are concatenated and returned. The returned sequence preserves the orderings within and among the subsequences generated by the evaluations of E2 .
[Definition: A terminal is a symbol or string or pattern that can appear in the right-hand side of a rule, but never appears on the left-hand side in the main grammar, although it may appear on the left-hand side of a rule in the grammar for terminals.] The following constructs are used to match strings of one or more characters in a terminal:
As written, the grammar in A XPath 3.1 Grammar is ambiguous for some forms using the '+' and '*' occurrence indicators. The ambiguity is resolved as follows: these operators are tightly bound to the SequenceType expression, and have higher precedence than other uses of these symbols. Any occurrence of '+' and '*', as well as '?', following a sequence type is assumed to be an occurrence indicator, which binds to the last ItemType in the SequenceType.
Comments are allowed everywhere that ignorable whitespace is allowed, and the Comment symbol does not explicitly appear on the right-hand side of the grammar (except in its own production). See A.2.4.1 Default Whitespace Handling.
[Definition: Whitespace and Comments function as symbol separators. For the most part, they are not mentioned in the grammar, and may occur between any two terminal symbols mentioned in the grammar, except where that is forbidden by the /* ws: explicit */ annotation in the EBNF, or by the /* xgc: xml-version */ annotation.]
The grammar in A.1 EBNF normatively defines built-in precedence among the operators of XPath. These operators are summarized here to make clear the order of their precedence from lowest to highest. The associativity column indicates the order in which operators of equal precedence in an expression are applied. 2b1af7f3a8