Parser - Evaluating Parser Options

Before we start digging into the Rust code, we should first cover the grammar file.

It looks a lot like our BNF grammar; the biggest difference is that we have the opportunity to use some more flexible notation. For example, instead of having one rule for optional values and another for non-optional ones, we can use ? to say that any existing rule is optional. When noting that values repeat, we can use + to indicate 1 or more and * to indicate 0 or more. These might be familiar to you if you have used regular expressions.

Some other things to keep in mind when using the pest grammar syntax: the right-hand side of a rule needs to be wrapped in curly braces, and each segment of a sequence should be separated with ~. There are some more advanced things you can do with this style, but we don't need them here.

Starting from the bottom again, we first define our Decimal rule, which is really just an alias for the ASCII_DIGIT rule provided by pest.

Decimal = { ASCII_DIGIT }

Next we have Integer which is 1 or more decimals.

Integer = { Decimal+ }

Then Remainder, a period followed by an Integer. Notice that strings need to be wrapped in double quotes.

Remainder = { "." ~ Integer }

Now we can define our Number rule as either an Integer with an optional Remainder or an optional Integer followed by a Remainder.

Number = { (Integer ~ Remainder?) |
           (Integer? ~ Remainder)
}
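
To make the ?, + and | notation concrete, here is a rough hand-written equivalent of the Integer, Remainder and Number rules in plain Rust. This is only a sketch for comparison, not what pest generates: the function names and the choice to return the number of matched bytes are assumptions made for this illustration.

```rust
/// Integer = { Decimal+ }: one or more ASCII digits.
/// Returns how many bytes were matched, or None if there were no digits.
fn integer(input: &str) -> Option<usize> {
    let len = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len >= 1 { Some(len) } else { None }
}

/// Remainder = { "." ~ Integer }: a literal period followed by an Integer.
fn remainder(input: &str) -> Option<usize> {
    let rest = input.strip_prefix('.')?;
    // Add 1 for the period itself.
    integer(rest).map(|len| len + 1)
}

/// Number = { (Integer ~ Remainder?) | (Integer? ~ Remainder) }
fn number(input: &str) -> Option<usize> {
    // First alternative: Integer ~ Remainder?
    if let Some(int_len) = integer(input) {
        // `?` means a failed Remainder is fine; it just matches nothing.
        let rem_len = remainder(&input[int_len..]).unwrap_or(0);
        return Some(int_len + rem_len);
    }
    // Second alternative: Integer? ~ Remainder.
    // Integer already failed above, so only the Remainder is left to try.
    remainder(input)
}
```

The alternatives are tried in order, so "12.5Y" is matched by the first one and ".5Y" falls through to the second.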

Above that are all of our unit/value pairs.

Year = { Number ~ "Y" }
Week = { Number ~ "W" }
Day = { Number ~ "D" }
Hour = { Number ~ "H" }
MinuteOrMonth = { Number ~ "M" }
Second = { Number ~ "S" }

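Followed by the DateSection and TimeSection rules. Each alternative makes a different unit mandatory, so a section has to contain at least one unit/value pair, and the units have to appear in order.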

DateSection = {
    (Year? ~ MinuteOrMonth? ~ Week? ~ Day) |
    (Year? ~ MinuteOrMonth? ~ Week ~ Day?) |
    (Year? ~ MinuteOrMonth ~ Week? ~ Day?) |
    (Year ~ MinuteOrMonth? ~ Week? ~ Day?)
}
TimeSection = { "T" ~ (
    (Hour? ~ MinuteOrMonth? ~ Second) |
    (Hour? ~ MinuteOrMonth ~ Second?) |
    (Hour ~ MinuteOrMonth? ~ Second?)
    )
}

All the way at the top we have the Duration rule.

Duration = {
    "P" ~ ((DateSection ~ TimeSection?) | (DateSection? ~ TimeSection))
}
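
As a simplified model of what the Duration rule describes, the sketch below splits an input into its date part and its time part using only the standard library. The function name split_sections is hypothetical, it does not check the units inside each part, and treating a trailing "T" as an error is an assumption of this sketch rather than something the grammar above is guaranteed to do.

```rust
/// Splits "P<date>T<time>" into its optional date and time parts.
/// Returns None when the leading "P" is missing or both parts are absent.
fn split_sections(input: &str) -> Option<(Option<&str>, Option<&str>)> {
    // Every duration starts with a literal "P".
    let body = input.strip_prefix('P')?;
    // The time section, if present, starts at the first "T".
    let (date, time) = match body.split_once('T') {
        Some((date, time)) => (date, Some(time)),
        None => (body, None),
    };
    let date = if date.is_empty() { None } else { Some(date) };
    // A "T" with nothing after it is rejected in this sketch.
    if matches!(time, Some("")) {
        return None;
    }
    // At least one of the two sections has to be present.
    if date.is_none() && time.is_none() {
        return None;
    }
    Some((date, time))
}
```

Seeing the two parts side by side also shows why the "M" unit is ambiguous on its own: in "P2MT3M" the first M sits in the date part and the second in the time part.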

Now for the Rust part. To start, we are going to use a derive provided by pest for their Parser trait. The derive takes a grammar attribute, which should be assigned the relative path to the grammar file. We apply both to a unit struct; I called mine DurationParser.


#[derive(Parser)]
#[grammar = "duration.pest"]
pub struct DurationParser;

This will create an enum called Rule that will have one variant for each of the rules in our grammar file. Here it would look something like this.


enum Rule {
    Duration,
    DateSection,
    TimeSection,
    Year,
    Week,
    Day,
    Hour,
    MinuteOrMonth,
    Second,
    Number,
    Remainder,
    Integer,
    Decimal,
}

Inside of the parse function, the first thing we do is call DurationParser::parse, providing the rule we are looking to parse, in this case Rule::Duration, and the &str we want to parse.


pub fn parse(s: &str) -> Result<Duration, String> {
    let duration = DurationParser::parse(Rule::Duration, s)
                .map_err(|e| format!("{}", e))?
                .next()
                .unwrap();
    let ret = assemble_parts(duration)?;
    Ok(ret)
}

This is going to return a Result with a Pairs in the success position. Pairs is an iterator over Pair. For our case, we just need the first Pair, so we can call next to get that. Once we have it, we can pass it off to assemble_parts, which will take the Pair and pull out the inner rules. You can think about that in the same way our grammar is laid out: the Duration rule had DateSection and TimeSection in its definition, so each inner pair will be one of these two variants of the Rule enum.
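
To picture that nesting, here is a small hand-built tree for the input "P1YT2M3S". The Node type is a hypothetical stand-in written for this illustration and is not pest's Pair API; it only mimics the idea that each matched rule carries its text and the rules matched inside it. The tree is cut off at the unit level and the Rule enum is trimmed to the variants used here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rule {
    Duration,
    DateSection,
    TimeSection,
    Year,
    MinuteOrMonth,
    Second,
}

/// Hypothetical stand-in for a matched rule: which rule it was,
/// the text it covered, and the rules matched inside it.
struct Node {
    rule: Rule,
    text: &'static str,
    inner: Vec<Node>,
}

/// Builds a node with nothing nested inside it.
fn leaf(rule: Rule, text: &'static str) -> Node {
    Node { rule, text, inner: Vec::new() }
}

/// Builds, by hand, the shape we would expect for "P1YT2M3S".
fn sample_tree() -> Node {
    Node {
        rule: Rule::Duration,
        text: "P1YT2M3S",
        inner: vec![
            Node {
                rule: Rule::DateSection,
                text: "1Y",
                inner: vec![leaf(Rule::Year, "1Y")],
            },
            Node {
                rule: Rule::TimeSection,
                text: "T2M3S",
                inner: vec![leaf(Rule::MinuteOrMonth, "2M"), leaf(Rule::Second, "3S")],
            },
        ],
    }
}

/// Lists the rules directly inside a node, one level down.
fn inner_rules(node: &Node) -> Vec<Rule> {
    node.inner.iter().map(|n| n.rule).collect()
}
```

Walking one level down from the Duration node gives the two sections, and walking one level down from a section gives its unit/value rules.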


fn assemble_parts(pair: Pair<Rule>) -> Result<Duration, String> {
    let mut ret = Duration::new();
    for part in pair.into_inner() {
        match part.as_rule() {
            Rule::DateSection => {
                assemble_part(&mut ret, part, false)?;
            },
            Rule::TimeSection => {
                assemble_part(&mut ret, part, true)?;
            },
            _ => unreachable!()
        }
    }
    Ok(ret)
}

Once we have the inner pairs, we loop over them and pass each one off to assemble_part, along with a flag saying whether it came from the time section.


fn assemble_part(d: &mut Duration, pair: pest::iterators::Pair<Rule>, time: bool) -> Result<(), String> {
    for ref part in pair.into_inner() {
        update_duration(d, part, time)?;
    }
    Ok(())
}

This is again going to pull out the inner pairs, each of which should be one of the unit/value rules. Each of those pairs is then passed off to update_duration.


fn update_duration(d: &mut Duration, pair: &Pair<Rule>, time: bool) -> Result<(), String> {
    let f = get_float(pair)?;
    match pair.as_rule() {
        Rule::Year => {
            d.set_years(f);
        },
        Rule::MinuteOrMonth => {
            if time {
                //minute
                d.set_minutes(f);
            } else {
                //month
                d.set_months(f);
            }
        },
        Rule::Week => {
            d.set_weeks(f);
        },
        Rule::Day => {
            d.set_days(f);
        }
        Rule::Hour => {
            d.set_hours(f);
        }
        Rule::Second => {
            d.set_seconds(f);
        },
        _ => unreachable!()
    }
    Ok(())
}

fn get_float(pair: &Pair<Rule>) -> Result<f32, String> {
    let s = pair.as_str();
    let s = &s[..s.len() - 1];
    s.parse().map_err(|e| format!("error parsing float: {:?} {}", s, e))
}

Here we first get the float value from the pair. We do this by calling as_str on the Pair, which gives us the full slice of the original input that the rule matched. We know the last character is the unit, so we call parse on the substring that excludes it. Now that we have the value, we can match on Pair::as_rule, which will be one of our unit variants; for MinuteOrMonth, the time flag tells us whether the M means minutes or months. At each stage we have passed down a mutable reference to the Duration we are assembling, making it easy to update as needed. That is it; we offloaded quite a bit of the logic to the parser generator.
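
The same two ideas, stripping the trailing unit before parsing and using the time flag to decide what M means, can be tried out without pest. In this sketch the Fields struct is a hypothetical stand-in for the duration crate's type, the helper names are made up for the illustration, and only three units are handled to keep it short.

```rust
/// Hypothetical stand-in for the real Duration type; only three fields.
#[derive(Debug, Default, PartialEq)]
struct Fields {
    months: f32,
    minutes: f32,
    seconds: f32,
}

/// Splits matched text such as "1.5S" into its value and trailing unit letter.
fn split_value_and_unit(matched: &str) -> Result<(f32, char), String> {
    let unit = matched
        .chars()
        .last()
        .ok_or_else(|| String::from("empty match"))?;
    // Everything before the last character should be the number.
    let digits = &matched[..matched.len() - unit.len_utf8()];
    let value = digits
        .parse::<f32>()
        .map_err(|e| format!("error parsing float: {:?} {}", digits, e))?;
    Ok((value, unit))
}

/// Updates `fields` in place; `time` says whether the text came from the time section.
fn apply(fields: &mut Fields, matched: &str, time: bool) -> Result<(), String> {
    let (value, unit) = split_value_and_unit(matched)?;
    match (unit, time) {
        // The same letter means minutes after the "T" and months before it.
        ('M', true) => fields.minutes = value,
        ('M', false) => fields.months = value,
        ('S', true) => fields.seconds = value,
        (other, _) => return Err(format!("unsupported unit {:?}", other)),
    }
    Ok(())
}
```

Calling apply with "2M" and time set to false fills in months, while "3M" with time set to true fills in minutes, which is the same decision update_duration makes for Rule::MinuteOrMonth.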

Here are the full grammar and Rust files.


Duration = {
    "P" ~ ((DateSection ~ TimeSection?) | (DateSection? ~ TimeSection))
}
DateSection = {
    (Year? ~ MinuteOrMonth? ~ Week? ~ Day) |
    (Year? ~ MinuteOrMonth? ~ Week ~ Day?) |
    (Year? ~ MinuteOrMonth ~ Week? ~ Day?) |
    (Year ~ MinuteOrMonth? ~ Week? ~ Day?)
}
TimeSection = { "T" ~ (
    (Hour? ~ MinuteOrMonth? ~ Second) |
    (Hour? ~ MinuteOrMonth ~ Second?) |
    (Hour ~ MinuteOrMonth? ~ Second?)
    )
}
Year = { Number ~ "Y" }
Week = { Number ~ "W" }
Day = { Number ~ "D" }
Hour = { Number ~ "H" }
MinuteOrMonth = { Number ~ "M" }
Second = { Number ~ "S" }
Number = { (Integer ~ Remainder?) |
           (Integer? ~ Remainder)
}
Remainder = { "." ~ Integer }
Integer = { Decimal+ }
Decimal = { ASCII_DIGIT }


extern crate duration;
extern crate pest;
#[macro_use]
extern crate pest_derive;

use duration::Duration;
use pest::{
    Parser,
    iterators::Pair,
};

#[derive(Parser)]
#[grammar = "duration.pest"]
pub struct DurationParser;

pub fn parse(s: &str) -> Result<Duration, String> {
    let duration = DurationParser::parse(Rule::Duration, s)
                .map_err(|e| format!("{}", e))?
                .next()
                .unwrap();
    let ret = assemble_parts(duration)?;
    Ok(ret)
}

fn assemble_parts(pair: Pair<Rule>) -> Result<Duration, String> {
    let mut ret = Duration::new();
    for part in pair.into_inner() {
        match part.as_rule() {
            Rule::DateSection => {
                assemble_part(&mut ret, part, false)?;
            },
            Rule::TimeSection => {
                assemble_part(&mut ret, part, true)?;
            },
            _ => unreachable!()
        }
    }
    Ok(ret)
}

fn assemble_part(d: &mut Duration, pair: pest::iterators::Pair<Rule>, time: bool) -> Result<(), String> {
    for ref part in pair.into_inner() {
        update_duration(d, part, time)?;
    }
    Ok(())
}

fn update_duration(d: &mut Duration, pair: &Pair<Rule>, time: bool) -> Result<(), String> {
    let f = get_float(pair)?;
    match pair.as_rule() {
        Rule::Year => {
            d.set_years(f);
        },
        Rule::MinuteOrMonth => {
            if time {
                //minute
                d.set_minutes(f);
            } else {
                //month
                d.set_months(f);
            }
        },
        Rule::Week => {
            d.set_weeks(f);
        },
        Rule::Day => {
            d.set_days(f);
        }
        Rule::Hour => {
            d.set_hours(f);
        }
        Rule::Second => {
            d.set_seconds(f);
        },
        _ => unreachable!()
    }
    Ok(())
}

fn get_float(pair: &Pair<Rule>) -> Result<f32, String> {
    let s = pair.as_str();
    let s = &s[..s.len() - 1];
    s.parse().map_err(|e| format!("error parsing float: {:?} {}", s, e))
}

#[cfg(test)]
mod test {
    use super::*;
    #[test]
    fn one_of_each() {
        parse("P1Y1M1W1DT1H1M1.1S").unwrap();
    }
}
