Pest
Before we start digging into the Rust code, we should first cover the grammar file.
It looks a lot like our BNF grammar. The biggest difference is that pest gives us some more flexible notation. For example, instead of having one rule for optional values and another for non-optional ones, we can use `?` to mark any existing rule as optional. When values repeat, we can use `+` to indicate one or more and `*` to indicate zero or more. These will be familiar if you have used regular expressions.
A few other things to keep in mind when using pest's grammar syntax: the right-hand side of a rule needs to be wrapped in curly braces, and the items in a sequence are separated with `~`. Alternatives are separated with `|`, and pest tries them in order, using the first one that matches. There are more advanced things you can do with this syntax, but we don't need them here.
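To make these operators concrete, here is a hypothetical hand-written sketch in plain Rust. It is not pest's API, and it is not how pest is implemented. Each "parser" is an ordinary function that, on success, returns the part of the input it did not consume. The illustrative helpers `seq`, `optional`, `zero_or_more` and `one_or_more` play the roles of `~`, `?`, `*` and `+`:

```rust
// Hypothetical illustration of pest's `~`, `?`, `*` and `+` (not pest's API).
// A step takes the input and, on success, returns the unconsumed rest.
type Step = fn(&str) -> Option<&str>;

/// Matches a single ASCII digit, like pest's built-in `ASCII_DIGIT`.
fn digit(s: &str) -> Option<&str> {
    let c = s.chars().next()?;
    if c.is_ascii_digit() {
        Some(&s[1..]) // ASCII digits are one byte wide
    } else {
        None
    }
}

/// `a ~ b`: match `a`, then match `b` on whatever `a` left over.
fn seq<'a>(a: Step, b: Step, s: &'a str) -> Option<&'a str> {
    a(s).and_then(b)
}

/// `p?`: match `p` if possible; otherwise consume nothing and still succeed.
fn optional<'a>(p: Step, s: &'a str) -> &'a str {
    p(s).unwrap_or(s)
}

/// `p*`: match `p` as many times as possible (zero or more).
fn zero_or_more<'a>(p: Step, mut s: &'a str) -> &'a str {
    while let Some(rest) = p(s) {
        if rest.len() == s.len() {
            break; // `p` consumed nothing; stop instead of looping forever
        }
        s = rest;
    }
    s
}

/// `p+`: match `p` once, then zero or more further times.
fn one_or_more<'a>(p: Step, s: &'a str) -> Option<&'a str> {
    p(s).map(|rest| zero_or_more(p, rest))
}
```

With these helpers, a rule like `Integer = { Decimal+ }` corresponds to `one_or_more(digit, s)`.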
Starting from the bottom again, we first define our `Decimal` rule, which is really just an alias for the `ASCII_DIGIT` rule built into pest.

```
Decimal = { ASCII_DIGIT }
```
Next we have `Integer`, which is one or more `Decimal`s.

```
Integer = { Decimal+ }
```
Then `Remainder`, a period followed by an `Integer`. Notice that string literals need to be wrapped in double quotes.

```
Remainder = { "." ~ Integer }
```
Now we can define our `Number` rule as either an `Integer` with an optional `Remainder`, or an optional `Integer` followed by a `Remainder`.

```
Number = {
    (Integer ~ Remainder?) |
    (Integer? ~ Remainder)
}
```
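Because `|` is an ordered choice, `Number` accepts `1`, `1.5` and `.5`, but a trailing period as in `1.` is not part of a number. The stdlib-only recognizer below is a hypothetical sketch that follows the three rules (`Integer`, `Remainder`, `Number`) one function each. Every function returns the input left over after its match:

```rust
/// Integer = { Decimal+ }: one or more ASCII digits.
fn integer(s: &str) -> Option<&str> {
    let n = s.bytes().take_while(|b| b.is_ascii_digit()).count();
    if n == 0 { None } else { Some(&s[n..]) }
}

/// Remainder = { "." ~ Integer }
fn remainder(s: &str) -> Option<&str> {
    integer(s.strip_prefix('.')?)
}

/// Number = { (Integer ~ Remainder?) | (Integer? ~ Remainder) }
fn number(s: &str) -> Option<&str> {
    // First alternative: an Integer, optionally followed by a Remainder.
    if let Some(rest) = integer(s) {
        return Some(remainder(rest).unwrap_or(rest));
    }
    // Second alternative: `Integer?` matched nothing (the first branch just
    // failed), so a Remainder is required.
    remainder(s)
}
```

For example, `number("1.Y")` stops before the `.` and leaves `".Y"`. A following `"Y"` literal therefore would not match, so `1.Y` is not a valid `Year`.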
Above that are all of our unit/value pairs.

```
Year = { Number ~ "Y" }
Week = { Number ~ "W" }
Day = { Number ~ "D" }
Hour = { Number ~ "H" }
MinuteOrMonth = { Number ~ "M" }
Second = { Number ~ "S" }
```
These are followed by the `DateSection` and `TimeSection` rules.

```
DateSection = {
    (Year? ~ MinuteOrMonth? ~ Week? ~ Day) |
    (Year? ~ MinuteOrMonth? ~ Week ~ Day?) |
    (Year? ~ MinuteOrMonth ~ Week? ~ Day?) |
    (Year ~ MinuteOrMonth? ~ Week? ~ Day?)
}

TimeSection = { "T" ~ (
    (Hour? ~ MinuteOrMonth? ~ Second) |
    (Hour? ~ MinuteOrMonth ~ Second?) |
    (Hour ~ MinuteOrMonth? ~ Second?)
  )
}
```
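Each alternative makes a different unit mandatory. Taken together, the four date alternatives (and the three time alternatives) say: every unit is optional, units appear in a fixed order, and at least one must be present. The hypothetical check below states that constraint directly over just the unit letters. The function name and the constants are illustrative only:

```rust
/// Hypothetical helper: true when `units` is non-empty, every unit appears in
/// `order`, the units follow that order, and no unit repeats.
fn in_order_nonempty(units: &[char], order: &[char]) -> bool {
    if units.is_empty() {
        return false; // every alternative requires at least one unit
    }
    let mut next = 0; // first position in `order` still available
    for u in units {
        match order[next..].iter().position(|o| o == u) {
            Some(i) => next += i + 1, // move past this unit so it cannot repeat
            None => return false,     // out of order, repeated, or unknown unit
        }
    }
    true
}

const DATE_UNITS: [char; 4] = ['Y', 'M', 'W', 'D'];
const TIME_UNITS: [char; 3] = ['H', 'M', 'S'];
```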
All the way at the top we have the `Duration` rule.

```
Duration = {
    "P" ~ ((DateSection ~ TimeSection?) | (DateSection? ~ TimeSection))
}
```
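The two alternatives of `Duration` mean "a `P`, then a date section, a time section, or both". A hypothetical plain-Rust sketch of that outer shape follows. It splits on the `T` and does not validate the contents of either part, so it only illustrates the structure:

```rust
/// Hypothetical helper mirroring the outer shape of the `Duration` rule.
/// Returns the date part (possibly empty) and the time part after "T", if any.
fn split_sections(s: &str) -> Option<(&str, Option<&str>)> {
    let body = s.strip_prefix('P')?;
    match body.split_once('T') {
        // "T" must be followed by at least one time unit.
        Some((_, "")) => None,
        // The date part may be empty here because a time section is present.
        Some((date, time)) => Some((date, Some(time))),
        // With no time section, the date section is required.
        None if body.is_empty() => None,
        None => Some((body, None)),
    }
}
```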
Now for the Rust part. To start, we use the derive that the `pest_derive` crate provides for pest's `Parser` trait. The derive reads a `grammar` attribute, which should be set to the path of the grammar file relative to the crate's `src` directory. We apply both to a unit struct; I called mine `DurationParser`.

```rust
#[derive(Parser)]
#[grammar = "duration.pest"]
pub struct DurationParser;
```
This will create an enum called `Rule` with one variant for each rule in our grammar file. Here it would look something like this:

```rust
enum Rule {
    Duration,
    DateSection,
    TimeSection,
    Year,
    Week,
    Day,
    Hour,
    MinuteOrMonth,
    Second,
    Number,
    Remainder,
    Integer,
    Decimal,
}
```
Inside our `parse` function, the first thing we do is call `DurationParser::parse`, passing the rule we want to parse (in this case `Rule::Duration`) and the `&str`.

```rust
pub fn parse(s: &str) -> Result<Duration, String> {
    let duration = DurationParser::parse(Rule::Duration, s)
        .map_err(|e| format!("{}", e))?
        .next()
        .unwrap();
    let ret = assemble_parts(duration)?;
    Ok(ret)
}
```
This returns a `Result` with a `Pairs` in the success position. `Pairs` is an iterator over `Pair`. In our case we only need the first `Pair`, so we call `next` to get it. Since a successful match of `Rule::Duration` always produces a `Duration` pair, the `unwrap` will not panic. We then pass that pair off to `assemble_parts`, which takes the `Pair` and pulls out its inner pairs. You can think about these the same way our grammar is laid out: the `Duration` rule has `DateSection` and `TimeSection` in its definition, so each inner pair will be one of those two variants of the `Rule` enum.
```rust
fn assemble_parts(pair: Pair<Rule>) -> Result<Duration, String> {
    let mut ret = Duration::new();
    for part in pair.into_inner() {
        match part.as_rule() {
            // `false`: we are in the date section, so "M" means months
            Rule::DateSection => {
                assemble_part(&mut ret, part, false)?;
            }
            // `true`: we are in the time section, so "M" means minutes
            Rule::TimeSection => {
                assemble_part(&mut ret, part, true)?;
            }
            _ => unreachable!(),
        }
    }
    Ok(ret)
}
```
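It can help to picture the parser's output as a tree. Below is a hypothetical stand-in for pest's `Pair` tree, built by hand for `P1Y2DT3H`. The `Node` type, `leaf`, `example_tree` and `flatten` are illustrative and are not pest's API. The `Number`/`Integer`/`Decimal` children are left out to keep it short. Walking the tree depth-first shows the nesting that `into_inner` exposes one level at a time:

```rust
// Hypothetical stand-in for pest's `Pair` tree; not pest's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rule {
    Duration,
    DateSection,
    TimeSection,
    Year,
    Day,
    Hour,
}

/// One matched rule: which rule it was, the slice of input it covered,
/// and the rules matched inside it (what `into_inner` would yield).
struct Node<'i> {
    rule: Rule,
    text: &'i str,
    inner: Vec<Node<'i>>,
}

fn leaf(rule: Rule, text: &str) -> Node<'_> {
    Node { rule, text, inner: Vec::new() }
}

/// The tree for "P1Y2DT3H", built by hand.
fn example_tree() -> Node<'static> {
    Node {
        rule: Rule::Duration,
        text: "P1Y2DT3H",
        inner: vec![
            Node {
                rule: Rule::DateSection,
                text: "1Y2D",
                inner: vec![leaf(Rule::Year, "1Y"), leaf(Rule::Day, "2D")],
            },
            Node {
                rule: Rule::TimeSection,
                text: "T3H", // the "T" literal is in the text but is not its own pair
                inner: vec![leaf(Rule::Hour, "3H")],
            },
        ],
    }
}

/// Depth-first walk collecting (rule, text) for every node.
fn flatten<'i>(node: &Node<'i>, out: &mut Vec<(Rule, &'i str)>) {
    out.push((node.rule, node.text));
    for child in &node.inner {
        flatten(child, out);
    }
}
```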
Once we have the inner pairs, we loop over them and pass each one off to `assemble_part`. The `bool` records whether we are in the time section, which we need later to tell minutes from months.
```rust
fn assemble_part(d: &mut Duration, pair: Pair<Rule>, time: bool) -> Result<(), String> {
    // Each inner pair is one unit/value rule, e.g. Year or MinuteOrMonth.
    for part in pair.into_inner() {
        update_duration(d, &part, time)?;
    }
    Ok(())
}
```
This again pulls out the inner `Pair`s, each of which should be one of the unit/value rules, and passes each one to `update_duration`.
```rust
fn update_duration(d: &mut Duration, pair: &Pair<Rule>, time: bool) -> Result<(), String> {
    let f = get_float(pair)?;
    match pair.as_rule() {
        Rule::Year => {
            d.set_years(f);
        }
        Rule::MinuteOrMonth => {
            if time {
                // minute
                d.set_minutes(f);
            } else {
                // month
                d.set_months(f);
            }
        }
        Rule::Week => {
            d.set_weeks(f);
        }
        Rule::Day => {
            d.set_days(f);
        }
        Rule::Hour => {
            d.set_hours(f);
        }
        Rule::Second => {
            d.set_seconds(f);
        }
        _ => unreachable!(),
    }
    Ok(())
}

fn get_float(pair: &Pair<Rule>) -> Result<f32, String> {
    // The matched text is the number followed by a one-character unit, e.g. "1.5S".
    let s = pair.as_str();
    let s = &s[..s.len() - 1];
    s.parse()
        .map_err(|e| format!("error parsing float: {:?} {}", s, e))
}
```
Here we first get the float value from the pair. Calling `as_str` on the `Pair` gives the full slice of the original input that the rule matched (for example `1.5S`). We know the last character is the unit, so we call `parse` on the substring without it. Now that we have the value, we can match on `Pair::as_rule`, which will be one of our unit variants. Because `M` means months in the date section and minutes in the time section, the `time` flag decides which setter `MinuteOrMonth` calls. At each stage we have passed down a mutable reference to the `Duration` we are assembling, which makes it easy to update as needed. That is it: we offloaded quite a bit of the logic to the parser generator.
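To see the unit handling in isolation, here is a hypothetical stdlib-only version of `get_float` and `update_duration`. The `Dur` struct is an illustrative stand-in for the `Duration` type from the author's `duration` crate, which isn't shown here. The sketch also matches on the unit letter rather than on a `Rule` variant:

```rust
/// Hypothetical stand-in for the `Duration` type used in the chapter.
#[derive(Debug, Default)]
struct Dur {
    years: f32,
    months: f32,
    weeks: f32,
    days: f32,
    hours: f32,
    minutes: f32,
    seconds: f32,
}

/// Same idea as `get_float`: split off the trailing unit letter, parse the rest.
fn split_unit(s: &str) -> Result<(f32, char), String> {
    let unit = s.chars().next_back().ok_or("empty unit/value pair")?;
    let num = &s[..s.len() - unit.len_utf8()];
    let value = num
        .parse::<f32>()
        .map_err(|e| format!("error parsing float: {:?} {}", num, e))?;
    Ok((value, unit))
}

/// `time` plays the same role as in `update_duration`: it decides whether
/// 'M' means minutes (time section) or months (date section).
fn apply(d: &mut Dur, pair: &str, time: bool) -> Result<(), String> {
    let (v, unit) = split_unit(pair)?;
    match (unit, time) {
        ('Y', false) => d.years = v,
        ('M', false) => d.months = v,
        ('W', false) => d.weeks = v,
        ('D', false) => d.days = v,
        ('H', true) => d.hours = v,
        ('M', true) => d.minutes = v,
        ('S', true) => d.seconds = v,
        _ => return Err(format!("unexpected unit {:?}", unit)),
    }
    Ok(())
}
```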
Here are the full grammar file (`duration.pest`) and Rust file.
```
Duration = {
    "P" ~ ((DateSection ~ TimeSection?) | (DateSection? ~ TimeSection))
}

DateSection = {
    (Year? ~ MinuteOrMonth? ~ Week? ~ Day) |
    (Year? ~ MinuteOrMonth? ~ Week ~ Day?) |
    (Year? ~ MinuteOrMonth ~ Week? ~ Day?) |
    (Year ~ MinuteOrMonth? ~ Week? ~ Day?)
}

TimeSection = { "T" ~ (
    (Hour? ~ MinuteOrMonth? ~ Second) |
    (Hour? ~ MinuteOrMonth ~ Second?) |
    (Hour ~ MinuteOrMonth? ~ Second?)
  )
}

Year = { Number ~ "Y" }
Week = { Number ~ "W" }
Day = { Number ~ "D" }
Hour = { Number ~ "H" }
MinuteOrMonth = { Number ~ "M" }
Second = { Number ~ "S" }

Number = {
    (Integer ~ Remainder?) |
    (Integer? ~ Remainder)
}
Remainder = { "." ~ Integer }
Integer = { Decimal+ }
Decimal = { ASCII_DIGIT }
```
```rust
extern crate duration;
extern crate pest;
#[macro_use]
extern crate pest_derive;

use duration::Duration;
use pest::{Parser, iterators::Pair};

#[derive(Parser)]
#[grammar = "duration.pest"]
pub struct DurationParser;

pub fn parse(s: &str) -> Result<Duration, String> {
    let duration = DurationParser::parse(Rule::Duration, s)
        .map_err(|e| format!("{}", e))?
        .next()
        .unwrap();
    let ret = assemble_parts(duration)?;
    Ok(ret)
}

fn assemble_parts(pair: Pair<Rule>) -> Result<Duration, String> {
    let mut ret = Duration::new();
    for part in pair.into_inner() {
        match part.as_rule() {
            Rule::DateSection => {
                assemble_part(&mut ret, part, false)?;
            }
            Rule::TimeSection => {
                assemble_part(&mut ret, part, true)?;
            }
            _ => unreachable!(),
        }
    }
    Ok(ret)
}

fn assemble_part(d: &mut Duration, pair: Pair<Rule>, time: bool) -> Result<(), String> {
    for part in pair.into_inner() {
        update_duration(d, &part, time)?;
    }
    Ok(())
}

fn update_duration(d: &mut Duration, pair: &Pair<Rule>, time: bool) -> Result<(), String> {
    let f = get_float(pair)?;
    match pair.as_rule() {
        Rule::Year => {
            d.set_years(f);
        }
        Rule::MinuteOrMonth => {
            if time {
                // minute
                d.set_minutes(f);
            } else {
                // month
                d.set_months(f);
            }
        }
        Rule::Week => {
            d.set_weeks(f);
        }
        Rule::Day => {
            d.set_days(f);
        }
        Rule::Hour => {
            d.set_hours(f);
        }
        Rule::Second => {
            d.set_seconds(f);
        }
        _ => unreachable!(),
    }
    Ok(())
}

fn get_float(pair: &Pair<Rule>) -> Result<f32, String> {
    let s = pair.as_str();
    let s = &s[..s.len() - 1];
    s.parse()
        .map_err(|e| format!("error parsing float: {:?} {}", s, e))
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn one_of_each() {
        parse("P1Y1M1W1DT1H1M1.1S").unwrap();
    }
}
```