Combine
To start, I want to point out how combine
is able to create combinators w/o using macros. It heavily leverages the impl Trait
feature released this year. looking the signature of the float
parser:
# #![allow(unused_variables)] #fn main() { fn float<I>() -> impl Parser<Input = I, Output = f32> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( many1(digit()), optional(( char('.'), many1::<String, _>(digit()) )) ).map(|(int, rem)| { let f = if let Some(rem) = rem { format!("{}.{}", int, rem.1) } else { int }; f.parse().unwrap() }) } #}
We are saying that this function will return a parser who's input is of type I
and output is an f32
. Additionally we constrain I
saying it needs to implement Stream
, where the Item
type is a char
, and the Error
type of I
needs to implement ParseError
. ParseError
is also generic so we pass along the properties of I
, Item
, Range
, and Position
. This is pretty verbose, but I found that building a simple parser I didn't need to worry a ton about what it meant, instead just using it as a blueprint for all of my parser functions. That is to say, I just copied and pasted this whenever I wanted to add another parser to a source file.
One thing to keep in mind is that these functions are not parsing the input but returning a parser that will be able to parse the input. We define these by combining them together and mapping over the result if successful. To indicate that parsers are chained together in a sequence we wrap them in either an array if they are all the exact same type otherwise we would use a tuple. Looking at the body of float
we are using a tuple, with two parsers inside it. The first parser is many1
this takes a parser as an argument and applies it a minimum of 1 times but will collect the results until in argument parser fails, we passed in the result of calling digit
which is a parser that will get us a single digit number. Next we have optional
this takes in a parser and wraps the result in an Option
, we pass it a tuple of 2 parsers the char
parser, this is a single character and another many1
with digit
as its argument. Notice in this second call to many1
we included a type annotation, many1
operates similar to the collect
method on an iterator
, the result could be a number of different collections we are just telling it it should be a String
.
We now call map
on our tuple, this essentially is saying if you find the pattern we defined, call this closure to generate the Output
. In this closure we are going to check and see if the remainder exists and coalesce that into a string with the integer portion if not just use the integer portion. We then call str::parse
on that coalesced value.
Moving up the file we next see value_pair
the signature is similar to float
though the Output
type is DurationPart
and it takes in an argument time
to determine if M
is a minute or a month.
# #![allow(unused_variables)] #fn main() { fn value_pair<I>(time: bool) -> impl Parser<Input = I, Output = DurationPart> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( float(), choice([ char('Y'), char('M'), char('W'), char('D'), char('H'), char('S'), ])).map(move |(v, c): (f32, char)| { match c { 'Y' => DurationPart::Years(v), 'M' => if time { DurationPart::Minutes(v) } else { DurationPart::Months(v) }, 'W' => DurationPart::Weeks(v), 'D' => DurationPart::Days(v), 'H' => DurationPart::Hours(v), 'S' => DurationPart::Seconds(v), _ => unreachable!() } }) } #}
Again here we have a tuple, the first parser is the float
parser we just defined, next is a choice
parser, this takes in an array or tuple of parsers and tries each one starting at the top, stopping at the first successful. We have passed choice
an array of char
parsers with our unit characters. We map our tuple, this time our closure has the move
keyword to take ownership of the time
argument. We have annotated the argument to our closure to help the compiler figure out what we are trying to do here. map
on a parser always takes 1 argument, this will be a tuple of the results, for us that is f32
and char
. We simply match on the char
and generate the correct DurationPart
variant as per the letter and the time flag.
Next we have the time_part
parser.
# #![allow(unused_variables)] #fn main() { fn time_part<I>() -> impl Parser<Input = I, Output = Vec<DurationPart>> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( char('T'), many1(value_pair(true)) ).map(|(_, p)| p) } #}
Here we have another tuple, the first parser is a call to char
with our T
indicator that this is time based values, the second is a many1
call to the value_pair
parser we just defined, passing true
for the time
flag. The map here is simply discarding the T
character, that make the Output
type annotation work for p
which means it also work for many1
.
After time_part
we have date_part
.
# #![allow(unused_variables)] #fn main() { fn date_part<I>() -> impl Parser<Input = I, Output = Vec<DurationPart>> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( many1(value_pair(false)) ) } #}
This one is just the many1
call to value_pair
passing false
for the time
flag. Notice that we don't need a map
here since no additional changes need to be made to match the Output
.
The last of the parsers we are going to define is duration
# #![allow(unused_variables)] #fn main() { fn duration<I>() -> impl Parser<Input = I, Output = Duration> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( char('P'), optional(date_part()), optional(time_part()), ).map(|(_, d, t)| { let mut ret = Duration::new(); for part in d.unwrap_or(Vec::new()).iter().chain(t.unwrap_or(Vec::new()).iter()) { match part { DurationPart::Years(v) => ret.set_years(*v), DurationPart::Months(v) => ret.set_months(*v), DurationPart::Weeks(v) => ret.set_weeks(*v), DurationPart::Days(v) => ret.set_days(*v), DurationPart::Hours(v) => ret.set_hours(*v), DurationPart::Minutes(v) => ret.set_minutes(*v), DurationPart::Seconds(v) => ret.set_seconds(*v), } } ret }) } #}
This is another tuple indicating the sequence of a char
of P
followed by an optional
date_part
followed by an optional
time_part
. The argument to the map
closure would be of type (char, Option<Vec<DurationPart>>, Option<Vec<DurationPart>>)
. In the body we loop over those two Vec
s if they exist, adding them to a Duration
returning the result.
Finally there is the parse
function.
# #![allow(unused_variables)] #fn main() { pub fn parse(s: &str) -> Result<Duration, String> { let d = duration().parse(s).map_err(|e| format!("{}", e))?; Ok(d.0) } #}
Which creates a Duration
parser by calling duration
and then passes a &str
to the parse
method, this is one of the methods defined on the Parser
trait. Parse
returns a Result
and in the Ok
position we have a tuple, the first item has the same type as Output
and the second is the remaining input to parse. For us the Output
is going to be a Duration
so we return Ok(d.0)
Here is the full source file for the combine
parser.
# #![allow(unused_variables)] #fn main() { extern crate duration; use duration::{Duration, DurationPart}; extern crate combine; use combine::{ choice, char::{char, digit}, many1, optional, Parser, ParseError, Stream, }; pub fn parse(s: &str) -> Result<Duration, String> { let d = duration().parse(s).map_err(|e| format!("{}", e))?; Ok(d.0) } fn duration<I>() -> impl Parser<Input = I, Output = Duration> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( char('P'), optional(date_part()), optional(time_part()), ).map(|(_, d, t)| { let mut ret = Duration::new(); for part in d.unwrap_or(Vec::new()).iter().chain(t.unwrap_or(Vec::new()).iter()) { match part { DurationPart::Years(v) => ret.set_years(*v), DurationPart::Months(v) => ret.set_months(*v), DurationPart::Weeks(v) => ret.set_weeks(*v), DurationPart::Days(v) => ret.set_days(*v), DurationPart::Hours(v) => ret.set_hours(*v), DurationPart::Minutes(v) => ret.set_minutes(*v), DurationPart::Seconds(v) => ret.set_seconds(*v), } } ret }) } fn date_part<I>() -> impl Parser<Input = I, Output = Vec<DurationPart>> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( many1(value_pair(false)) ) } fn time_part<I>() -> impl Parser<Input = I, Output = Vec<DurationPart>> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( char('T'), many1(value_pair(true)) ).map(|(_, p)| p) } fn value_pair<I>(time: bool) -> impl Parser<Input = I, Output = DurationPart> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( float(), choice([ char('Y'), char('M'), char('W'), char('D'), char('H'), char('S'), ])).map(move |(v, c): (f32, char)| { match c { 'Y' => DurationPart::Years(v), 'M' => if time { DurationPart::Minutes(v) } else { DurationPart::Months(v) }, 'W' => DurationPart::Weeks(v), 'D' => DurationPart::Days(v), 'H' => DurationPart::Hours(v), 'S' => DurationPart::Seconds(v), _ => unreachable!() } }) } fn float<I>() -> impl Parser<Input = I, Output = f32> where I: Stream<Item = char>, I::Error: ParseError<I::Item, I::Range, I::Position>, { ( many1(digit()), optional(( char('.'), many1::<String, _>(digit()) )) ).map(|(int, rem)| { let f = if let Some(rem) = rem { format!("{}.{}", int, rem.1) } else { int }; f.parse().unwrap() }) } #}