Compound Rules

The rules shown so far define terminal symbols, which represent indivisible units of grammar. To parse more complex input we use a parser combinator (or compound rule): a rule that accepts one or more rules as parameters and combines them into a higher-order algorithm. This section introduces the compound rules provided by the library and shows how they can be used to express more complex grammars.
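The idea can be illustrated without the library. The following is a minimal sketch in standard C++17; every name in it (char_rule, seq_rule, seq, parse_all) is hypothetical and is not part of the library's API. It assumes a rule is any object whose parse member takes an iterator pair and returns a std::optional, and models a combinator as a template that builds a new rule out of existing ones:

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

// Toy model, not the library's API: a "rule" is any object with a member
// parse(it, end) that returns std::optional<value>. On success the iterator
// is advanced past the matched characters; std::nullopt signals a mismatch.

// A terminal rule: matches exactly one given character.
struct char_rule
{
    char c;
    std::optional<char> parse(char const*& it, char const* end) const
    {
        if (it == end || *it != c)
            return std::nullopt;
        return *it++;
    }
};

// A combinator: accepts two rules and returns a new rule that matches the
// first followed by the second, yielding both values as a std::pair.
template <class R1, class R2>
struct seq_rule
{
    R1 r1;
    R2 r2;

    auto parse(char const*& it, char const* end) const
    {
        using V1 = typename decltype(r1.parse(it, end))::value_type;
        using V2 = typename decltype(r2.parse(it, end))::value_type;
        using result = std::optional<std::pair<V1, V2>>;
        auto v1 = r1.parse(it, end);
        if (!v1)
            return result{};
        auto v2 = r2.parse(it, end);
        if (!v2)
            return result{};
        return result(std::pair<V1, V2>(*v1, *v2));
    }
};

template <class R1, class R2>
seq_rule<R1, R2> seq(R1 r1, R2 r2)
{
    return seq_rule<R1, R2>{r1, r2};
}

// Succeeds only if the rule matches the entire input.
template <class Rule>
auto parse_all(std::string_view s, Rule const& r)
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto v = r.parse(it, end);
    if (v && it != end)
        v.reset();   // leftover input counts as a failure
    return v;
}
```

Because seq_rule is itself a rule, it can be passed back into seq, which is what lets small rules compose into large grammars.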

Tuple Rule

Consider the following grammar:

version = "v" dec-octet "." dec-octet

We can express this using tuple_rule, which matches one or more specified rules in sequence. The following defines a sequence using two character literals and two decimal octets, which is a fancy way of saying numbers between 0 and 255:

constexpr auto version_rule = tuple_rule(
    delim_rule( 'v' ),
    dec_octet_rule,
    delim_rule( '.' ),
    dec_octet_rule );

This rule has a value type of std::tuple, whose element types correspond to the value types of the rules specified upon construction. The decimal octets are represented by dec_octet_rule, which stores its result in an unsigned char:

system::result< std::tuple< core::string_view, unsigned char, core::string_view, unsigned char > > rv =
    parse( "v42.44", version_rule );
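To show what a sequence combinator does with the rules it is given, here is a toy variadic version written in standard C++17. It is a sketch under stated assumptions, not the library's implementation: tuple_toy, delim_toy, dec_octet_toy, make_tuple_toy and parse_all are hypothetical names, and std::optional stands in for system::result. The value type is a std::tuple with one element per rule, so elements are read back with std::get. An input such as "v42.44800" is rejected because its second number is not a decimal octet.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <tuple>
#include <utility>

// Hypothetical stand-in for delim_rule: matches one literal character.
struct delim_toy
{
    char c;
    std::optional<std::string_view> parse(char const*& it, char const* end) const
    {
        if (it == end || *it != c)
            return std::nullopt;
        return std::string_view(it++, 1);
    }
};

// Hypothetical stand-in for dec_octet_rule: 0 to 255, no leading zeros.
struct dec_octet_toy
{
    std::optional<unsigned char> parse(char const*& it, char const* end) const
    {
        char const* p = it;
        unsigned v = 0;
        int n = 0;
        while (p != end && *p >= '0' && *p <= '9')
        {
            if (n == 3)
                return std::nullopt;                // a fourth digit
            v = v * 10 + static_cast<unsigned>(*p++ - '0');
            ++n;
        }
        if (n == 0 || v > 255 || (n > 1 && *it == '0'))
            return std::nullopt;                    // empty, too large, or leading zero
        it = p;
        return static_cast<unsigned char>(v);
    }
};

// Hypothetical sequence combinator: matches every rule in order and
// collects their values into a std::tuple.
template <class... Rules>
struct tuple_toy
{
    std::tuple<Rules...> rules;

    template <std::size_t... I>
    auto parse_impl(char const*& it, char const* end, std::index_sequence<I...>) const
    {
        using value = std::tuple<
            typename decltype(std::get<I>(rules).parse(it, end))::value_type...>;
        std::tuple<decltype(std::get<I>(rules).parse(it, end))...> parts;
        // The && fold runs left to right and stops at the first failing rule.
        bool const ok =
            ((std::get<I>(parts) = std::get<I>(rules).parse(it, end)).has_value() && ...);
        if (!ok)
            return std::optional<value>{};
        return std::optional<value>(value(*std::get<I>(parts)...));
    }

    auto parse(char const*& it, char const* end) const
    {
        return parse_impl(it, end, std::index_sequence_for<Rules...>{});
    }
};

template <class... Rules>
tuple_toy<Rules...> make_tuple_toy(Rules... r)
{
    return tuple_toy<Rules...>{std::tuple<Rules...>(r...)};
}

// Succeeds only if the rule matches the entire input.
template <class Rule>
auto parse_all(std::string_view s, Rule const& r)
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto v = r.parse(it, end);
    if (v && it != end)
        v.reset();
    return v;
}
```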

To extract elements from a std::tuple, the function std::get must be used. In this case we do not need the values of the matching character literals. The tuple_rule discards match results whose value type is void, so we can use the squelch compound rule to convert a matching value type to void and reformulate our rule:

constexpr auto version_rule = tuple_rule(
    squelch( delim_rule( 'v' ) ),
    dec_octet_rule,
    squelch( delim_rule( '.' ) ),
    dec_octet_rule );

system::result< std::tuple< unsigned char, unsigned char > > rv = parse( "v42.44", version_rule );

When all but one of the value types is void, the std::tuple is elided and the remaining value type is promoted to the result of the match:

// port     = ":" unsigned-short

constexpr auto port_rule = tuple_rule( squelch( delim_rule( ':' ) ), unsigned_rule< unsigned short >{} );

system::result< unsigned short > rv = parse( ":443", port_rule );
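The composed port_rule behaves like "match a ':' and keep only the number that follows". The sketch below spells that out with a hypothetical two-rule combinator, preceded_toy, which parses one rule, throws its value away, and returns the value of the second; it corresponds to the special case tuple_rule( squelch( a ), b ). All names are hypothetical, std::optional stands in for system::result, and only the standard library is used.

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical stand-in for delim_rule: matches one literal character.
struct delim_toy
{
    char c;
    std::optional<std::string_view> parse(char const*& it, char const* end) const
    {
        if (it == end || *it != c)
            return std::nullopt;
        return std::string_view(it++, 1);
    }
};

// Hypothetical stand-in for unsigned_rule<U>: one or more digits that fit in U.
// Assumes U is narrower than unsigned long long; leading zeros are accepted here.
template <class U>
struct unsigned_toy
{
    std::optional<U> parse(char const*& it, char const* end) const
    {
        char const* p = it;
        unsigned long long v = 0;
        while (p != end && *p >= '0' && *p <= '9')
        {
            v = v * 10 + static_cast<unsigned long long>(*p++ - '0');
            if (v > std::numeric_limits<U>::max())
                return std::nullopt;                // does not fit in U
        }
        if (p == it)
            return std::nullopt;                    // no digits at all
        it = p;
        return static_cast<U>(v);
    }
};

// Hypothetical combinator: match Skip, discard its value, then return Keep's value.
// The single remaining value is returned directly, with no tuple around it.
template <class Skip, class Keep>
struct preceded_toy
{
    Skip skip;
    Keep keep;

    auto parse(char const*& it, char const* end) const
    {
        using result = decltype(keep.parse(it, end));
        if (!skip.parse(it, end))
            return result{};
        return keep.parse(it, end);
    }
};

// Succeeds only if the rule matches the entire input.
template <class Rule>
auto parse_all(std::string_view s, Rule const& r)
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto v = r.parse(it, end);
    if (v && it != end)
        v.reset();
    return v;
}
```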

Optional Rule

BNF elements in brackets denote optional components. These are expressed using optional_rule, whose value type is an optional. For example, we can adapt the port rule from above to be an optional component:

// port     = [ ":" unsigned-short ]

constexpr auto port_rule = optional_rule(
    tuple_rule(
        squelch( delim_rule( ':' ) ),
        unsigned_rule< unsigned short >{} ) );

system::result< boost::optional< unsigned short > > rv = parse( ":8080", port_rule );

assert( rv->has_value() && rv->value() == 8080 );
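An optional component must not fail the overall parse when it is absent, and it should not consume input when it does not match. The hypothetical optional_toy below (standard C++17, not the library) sketches that behaviour under those assumptions: it remembers the starting position, restores it on a mismatch, and always reports success, carrying either the inner value or an empty std::optional. The outer std::optional plays the role of system::result, and port_toy is a hypothetical hand-written stand-in for the port rule above.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical stand-in for
// tuple_rule( squelch( delim_rule( ':' ) ), unsigned_rule< unsigned short >{} ).
// Leading zeros are not checked in this sketch.
struct port_toy
{
    std::optional<unsigned short> parse(char const*& it, char const* end) const
    {
        if (it == end || *it != ':')
            return std::nullopt;
        char const* const digits = it + 1;
        char const* p = digits;
        unsigned long v = 0;
        // Stop as soon as the value exceeds the range of unsigned short.
        while (p != end && *p >= '0' && *p <= '9' && v <= 65535)
            v = v * 10 + static_cast<unsigned long>(*p++ - '0');
        if (p == digits || v > 65535)
            return std::nullopt;
        it = p;
        return static_cast<unsigned short>(v);
    }
};

// Hypothetical optional combinator: succeeds whether or not Rule matches.
template <class Rule>
struct optional_toy
{
    Rule r;

    auto parse(char const*& it, char const* end) const
    {
        using inner = decltype(r.parse(it, end));   // std::optional<T>
        char const* const start = it;
        inner v = r.parse(it, end);
        if (!v)
            it = start;                             // consume nothing on a mismatch
        // Outer optional: "the rule succeeded" (always true here).
        // Inner optional: "the component was present".
        return std::optional<inner>(std::in_place, v);
    }
};

// Succeeds only if the rule matches the entire input.
template <class Rule>
auto parse_all(std::string_view s, Rule const& r)
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto v = r.parse(it, end);
    if (v && it != end)
        v.reset();
    return v;
}
```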

In this example we build up a rule to represent an endpoint as an IPv4 address with an optional port:

// ipv4_address = dec-octet "." dec-octet "." dec-octet "." dec-octet
//
// port         = ":" unsigned-short
//
// endpoint     = ipv4_address [ port ]

constexpr auto endpoint_rule = tuple_rule(
    tuple_rule(
        dec_octet_rule, squelch( delim_rule( '.' ) ),
        dec_octet_rule, squelch( delim_rule( '.' ) ),
        dec_octet_rule, squelch( delim_rule( '.' ) ),
        dec_octet_rule ),
    optional_rule(
        tuple_rule(
            squelch( delim_rule( ':' ) ),
            unsigned_rule< unsigned short >{} ) ) );

This can be simplified: the library provides ipv4_address_rule, whose value type is ipv4_address, which offers more utility than representing the address as a plain collection of four numbers:

constexpr auto endpoint_rule = tuple_rule(
    ipv4_address_rule,
    optional_rule(
        tuple_rule(
            squelch( delim_rule( ':' ) ),
            unsigned_rule< unsigned short >{} ) ) );

system::result< std::tuple< ipv4_address, boost::optional< unsigned short > > > rv =
    parse( "192.168.0.1:443", endpoint_rule );
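For comparison, here is a hand-written, self-contained equivalent of the endpoint grammar in standard C++17. It is hypothetical code, not the library: endpoint_toy, dec_octet and parse_endpoint are made-up names, and the address is packed into a single std::uint32_t to illustrate why a dedicated address type can be more convenient than four separate numbers. As with an optional component, a missing port yields an empty std::optional, while a ':' not followed by a valid port leaves unconsumed input and fails the parse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// dec-octet: one to three digits, no leading zero, value at most 255.
static std::optional<unsigned> dec_octet(std::string_view& s)
{
    std::size_t n = 0;
    unsigned v = 0;
    // Read at most four digits so that a fourth one can be detected and rejected.
    while (n < s.size() && n < 4 && s[n] >= '0' && s[n] <= '9')
        v = v * 10 + static_cast<unsigned>(s[n++] - '0');
    if (n == 0 || n > 3 || v > 255 || (n > 1 && s[0] == '0'))
        return std::nullopt;
    s.remove_prefix(n);
    return v;
}

struct endpoint_toy
{
    std::uint32_t addr;                  // four octets packed, most significant first
    std::optional<unsigned short> port;  // empty when there is no ":port" suffix
};

// endpoint = dec-octet "." dec-octet "." dec-octet "." dec-octet [ ":" port ]
static std::optional<endpoint_toy> parse_endpoint(std::string_view s)
{
    std::uint32_t addr = 0;
    for (int i = 0; i < 4; ++i)
    {
        if (i > 0)
        {
            if (s.empty() || s[0] != '.')
                return std::nullopt;
            s.remove_prefix(1);
        }
        auto octet = dec_octet(s);
        if (!octet)
            return std::nullopt;
        addr = (addr << 8) | *octet;
    }
    std::optional<unsigned short> port;
    if (!s.empty() && s[0] == ':')
    {
        s.remove_prefix(1);
        unsigned long v = 0;
        std::size_t n = 0;
        while (n < s.size() && s[n] >= '0' && s[n] <= '9' && v <= 65535)
            v = v * 10 + static_cast<unsigned long>(s[n++] - '0');
        if (n == 0 || v > 65535)
            return std::nullopt;
        s.remove_prefix(n);
        port = static_cast<unsigned short>(v);
    }
    if (!s.empty())
        return std::nullopt;             // trailing characters
    return endpoint_toy{addr, port};
}
```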

Variant Rule

BNF elements separated by unquoted slashes represent a set of alternatives from which one element may match. We represent them using variant_rule, whose value type is a variant. Consider the following HTTP production rule, which comes from RFC 7230:

request-target = origin-form
                / absolute-form
                / authority-form
                / asterisk-form

The request-target can be exactly one of these things. Here we define the rule using origin_form_rule, absolute_uri_rule, and authority_rule, which come with the library, plus delim_rule for the asterisk-form, and then obtain a result from parsing a string:

constexpr auto request_target_rule = variant_rule(
    origin_form_rule,
    absolute_uri_rule,
    authority_rule,
    delim_rule('*') );

system::result< variant2::variant< url_view, url_view, authority_view, core::string_view > > rv =
    parse( "/results.htm?page=4", request_target_rule );
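The sketch below shows one way an ordered-choice combinator can work, assuming alternatives are tried from left to right and the first one that matches wins. variant_toy, make_variant_toy, slash_rest_toy, delim_toy and parse_all are hypothetical names written in standard C++17, not the library's implementation. Because each result is constructed with std::in_place_index, two alternatives with the same value type (as with the two url_view entries above) remain distinguishable through variant::index().

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <tuple>
#include <utility>
#include <variant>

// Hypothetical stand-in for delim_rule: matches one literal character.
struct delim_toy
{
    char c;
    std::optional<std::string_view> parse(char const*& it, char const* end) const
    {
        if (it == end || *it != c)
            return std::nullopt;
        return std::string_view(it++, 1);
    }
};

// Hypothetical, crude stand-in for origin_form_rule: '/' followed by the rest of the input.
struct slash_rest_toy
{
    std::optional<std::string_view> parse(char const*& it, char const* end) const
    {
        if (it == end || *it != '/')
            return std::nullopt;
        std::string_view sv(it, static_cast<std::size_t>(end - it));
        it = end;
        return sv;
    }
};

// Hypothetical ordered-choice combinator: tries each rule from left to right
// and returns the first match, tagged with the index of the rule that matched.
template <class... Rules>
struct variant_toy
{
    std::tuple<Rules...> rules;

    using value_type = std::variant<typename decltype(std::declval<Rules const&>().parse(
        std::declval<char const*&>(), std::declval<char const*>()))::value_type...>;

    template <std::size_t I = 0>
    std::optional<value_type> parse(char const*& it, char const* end) const
    {
        if constexpr (I == sizeof...(Rules))
        {
            return std::nullopt;                    // no alternative matched
        }
        else
        {
            char const* const start = it;
            if (auto v = std::get<I>(rules).parse(it, end))
                return value_type(std::in_place_index<I>, *v);
            it = start;                             // rewind before the next alternative
            return this->template parse<I + 1>(it, end);
        }
    }
};

template <class... Rules>
variant_toy<Rules...> make_variant_toy(Rules... r)
{
    return variant_toy<Rules...>{std::tuple<Rules...>(r...)};
}

// Succeeds only if the rule matches the entire input.
template <class Rule>
auto parse_all(std::string_view s, Rule const& r)
{
    char const* it = s.data();
    char const* const end = it + s.size();
    auto v = r.parse(it, end);
    if (v && it != end)
        v.reset();
    return v;
}
```

In this sketch the order of the alternatives matters: if an earlier alternative matches, later ones are never tried.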

In the next section we discuss facilities to parse a repeating number of elements.