zetta-parser is an easy-to-use parser combinator library: you parse strings by composing simple parsers into more powerful ones.
Basic parsers can be found in zetta.parser.seq. These parsers work with almost any type of item found on a stream; others, such as string and number, expect a stream of characters.
To install with Leiningen, add the following dependency to your project.clj:
[org.van-clj/zetta-parser "0.1.0"]
zetta-parser provides several namespaces, each with a specific purpose:
- zetta.core: Holds the basic functions needed to start using zetta-parser, such as the parser runners and a monadic implementation for parsers.
- zetta.parser.seq: Holds the most basic parsers. You can use these parsers out of the box, or build more complex ones with the zetta.combinators namespace.
- zetta.parser.string: Reimplements some of the zetta.parser.seq parsers so that they always return a string result rather than a seq.
- zetta.combinators: Contains useful parser transformers such as many and sep-by. These functions extend simple parsers so that they can handle more complex inputs.
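To make the idea of a parser transformer concrete, here is a minimal sketch in plain Clojure. It is a toy model and not zetta-parser's implementation: the parser representation and every name in it (one-char, digit, many*, sep-by*) are assumptions made for illustration only.

```clojure
;; Toy model, NOT zetta-parser's implementation: a parser is a function
;; from an input string to {:value v :rest remaining}, or nil on failure.
;; All names below are hypothetical.

(defn one-char
  "Matches exactly the character c."
  [c]
  (fn [input]
    (when (= c (first input))
      {:value c :rest (subs input 1)})))

(def digit
  "Matches a single decimal digit."
  (fn [input]
    (when-let [c (first input)]
      (when (Character/isDigit (char c))
        {:value c :rest (subs input 1)}))))

(defn many*
  "Models `many`: applies p zero or more times and collects the results.
  Assumes p consumes input whenever it succeeds."
  [p]
  (fn [input]
    (loop [in input, acc []]
      (if-let [r (p in)]
        (recur (:rest r) (conj acc (:value r)))
        {:value acc :rest in}))))

(defn sep-by*
  "Models `sep-by`: zero or more p, separated by sep."
  [p sep]
  (fn [input]
    (if-let [r (p input)]
      (loop [in (:rest r), acc [(:value r)]]
        ;; Consume a separator only when another item follows it.
        (let [s   (sep in)
              nxt (when s (p (:rest s)))]
          (if nxt
            (recur (:rest nxt) (conj acc (:value nxt)))
            {:value acc :rest in})))
      {:value [] :rest input})))

((many* digit) "42x")
;; => {:value [\4 \2], :rest "x"}

((sep-by* digit (one-char \,)) "1,2,3;")
;; => {:value [\1 \2 \3], :rest ";"}
```

The transformers never inspect characters themselves; they only decide how often, and in which order, the simpler parsers run.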
do-parser is a macro that lets you implement parsers using a monadic notation like the one provided by the bwo monads library. This kind of parser is really handy when the behavior of the parser changes as you parse through the input.
(ns example
  (:require
   [zetta.core :refer :all]
   [zetta.parser.seq :as pseq]
   [zetta.parser.string :as pstr]
   [zetta.combinators :as pc]))

;; sub parsers that are going to be used
(def parse-movie ...)
(def parse-patient ...)
(def parse-program ...)

(def parse-professional
  (do-parser
    pseq/skip-spaces
    name <- (pstr/take-till #(Character/isSpace %))
    ;; ^ assigns name to a string up until a space
    pseq/skip-spaces
    profession <- (pstr/take-till #(Character/isSpace %))
    ;; ^ assigns profession to a string up until a space

    ;; check for profession to change parse strategies
    ;; dynamically
    (cond
      (= profession "actor")
      (do-parser
        movies <- (pc/sep-by1 parse-movie (pseq/char \,))
        ;; ^ parse many movies separated by commas, using
        ;; the parser of a single movie
        (always (Actor. name movies)))

      (= profession "doctor")
      (do-parser
        patients <- (pc/sep-by1 parse-patient (pseq/char \,))
        ;; ^ parse many patients separated by commas, using
        ;; the parser of a single patient
        (always (Doctor. name patients)))

      (= profession "programmer")
      (do-parser
        programs <- (pc/sep-by1 parse-program (pseq/char \,))
        ;; ^ parse many programs separated by commas, using
        ;; the parser of a single program
        (always (Programmer. name programs)))

      :else
      (fail-parser (str "Invalid profession: " profession))
      ;; ^ fail parser if an invalid profession is given
      )))
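The key property of the monadic style is that a later parser can be chosen from an earlier result. The following is a minimal sketch of that idea in plain Clojure. It is a toy model rather than zetta-parser's implementation, and bind*, always*, fail*, span* and the example parsers are hypothetical names introduced only for this illustration.

```clojure
;; Toy model, NOT zetta-parser's implementation: a parser is a function
;; from an input string to {:value v :rest remaining}, or nil on failure.
;; All names below are hypothetical.

(defn bind*
  "Runs p, then passes its result to f, which returns the next parser."
  [p f]
  (fn [input]
    (when-let [r (p input)]
      ((f (:value r)) (:rest r)))))

(defn always*
  "A parser that consumes nothing and yields v."
  [v]
  (fn [input] {:value v :rest input}))

(def fail*
  "A parser that always fails."
  (fn [_] nil))

(defn span*
  "Consumes one or more characters matching pred."
  [pred]
  (fn [input]
    (let [m (apply str (take-while pred input))]
      (when (seq m)
        {:value m :rest (subs input (count m))}))))

(def tag    (span* #(Character/isLetter (char %))))
(def colon  (span* #(= \: %)))
(def number (bind* (span* #(Character/isDigit (char %)))
                   #(always* (Long/parseLong %))))

(def tagged-value
  ;; Read a tag first, then choose the parser for the payload from it.
  (bind* tag
         (fn [t]
           (bind* colon
                  (fn [_]
                    (case t
                      "int" number
                      "str" tag
                      fail*))))))

(tagged-value "int:42")   ;; => {:value 42, :rest ""}
(tagged-value "str:abc")  ;; => {:value "abc", :rest ""}
(tagged-value "zzz:1")    ;; => nil
```

In this model the tag plays the role that profession plays in parse-professional: its value selects which parser runs next.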
Most of the time, however, the behavior of your parser won't change depending on the input you are parsing. This is where the with-parser macro and the applicative functor macros come in handy: zetta-parser provides higher-order macros to go through the input and return the types you want.
The <$> function receives a normal function as its first parameter; the remaining parameters are parsers. The result of each parser is passed, in order, as an argument to the function given in the first parameter.
The *> macro receives multiple parsers, executes each of them in order, and returns the value of the last (rightmost) parser. The <* macro does the same thing, but returns the value of the first (leftmost) parser.
Example:
(def parse-programmer
  (<$> #(Programmer. %1 %2)
       ; ^ A function that receives two parameters; two parsers
       ;   should follow this parameter.
       (*> spaces (many-till space)) ; this is %1
       ; ^ this parser parses the leading spaces, ignores them, and returns
       ;   the result of the (many-till space) parser; this becomes %1 in
       ;   the function given in the first parameter.
       (*> spaces (many-till space))))
       ; ^ this does the same as the parser given in the second parameter
       ;   of <$>; its return value becomes %2 in the function given in
       ;   the first parameter.
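The semantics described above can be sketched in a few lines of plain Clojure. This is a toy model and not how zetta-parser is implemented; fmap-all, keep-right and keep-left are hypothetical stand-ins for <$>, *> and <*, and the other helpers are likewise assumptions.

```clojure
;; Toy model, NOT zetta-parser's implementation: a parser is a function
;; from an input string to {:value v :rest remaining}, or nil on failure.
;; All names below are hypothetical.

(defn take-while1
  "Consumes one or more characters matching pred."
  [pred]
  (fn [input]
    (let [matched (apply str (take-while pred input))]
      (when (seq matched)
        {:value matched :rest (subs input (count matched))}))))

(def skip-blanks
  "Consumes zero or more spaces; always succeeds."
  (fn [input]
    (let [n (count (take-while #(= \space %) input))]
      {:value nil :rest (subs input n)})))

(defn fmap-all
  "Models <$>: runs the parsers in order and applies f to their results."
  [f & parsers]
  (fn [input]
    (loop [ps parsers, in input, acc []]
      (if (empty? ps)
        {:value (apply f acc) :rest in}
        (when-let [r ((first ps) in)]
          (recur (next ps) (:rest r) (conj acc (:value r))))))))

(defn keep-right
  "Models *>: runs all parsers and keeps the last result."
  [& parsers]
  (apply fmap-all (fn [& vs] (last vs)) parsers))

(defn keep-left
  "Models <*: runs all parsers and keeps the first result."
  [& parsers]
  (apply fmap-all (fn [& vs] (first vs)) parsers))

(def word (take-while1 #(not= \space %)))

(def name-and-job
  (fmap-all (fn [n j] {:name n :job j})
            (keep-right skip-blanks word)    ; becomes n
            (keep-right skip-blanks word)))  ; becomes j

(name-and-job "  ada programmer")
;; => {:value {:name "ada", :job "programmer"}, :rest ""}
```

If any parser in the chain fails, the whole combined parser fails, which is why name-and-job returns nil for an input that contains only spaces.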
Most parser libraries only work when all the input you want to parse is available at once. This is limiting, because sometimes the full input is not available yet (for example, input streaming from a connection).
zetta-parser provides the parse function, which parses the given input using a given parser. If there is not enough input to either fail or return a result, parse returns a continuation function that accepts the remaining input once it is available. When the continuation receives a string, it returns a parsed result, a failure, or another continuation. If you pass an empty string to a continuation, the parser stops and returns either a failure or a successfully parsed result.
The parse-once function behaves like the parse function of other parser libraries: it expects all the input at once.
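The continuation protocol can be illustrated with a small self-contained sketch in plain Clojure. It is a toy model that does not use zetta-parser's parse function or its result types; feed-digits and the shape of its return values are assumptions made for this illustration.

```clojure
;; Toy model, NOT zetta-parser's implementation. feed-digits (a
;; hypothetical name) collects digits and returns one of:
;;   {:done n :rest s}  a number was parsed; s is the unconsumed input
;;   {:fail msg}        no digits were found
;;   a function         more input is needed (the continuation)
(defn feed-digits
  ([input] (feed-digits "" input))
  ([seen input]
   (let [digits    (apply str (take-while #(Character/isDigit (char %)) input))
         seen      (str seen digits)
         remaining (subs input (count digits))]
     (cond
       ;; A non-digit follows, or an empty chunk signals end of input.
       (or (seq remaining) (empty? input))
       (if (seq seen)
         {:done (Long/parseLong seen) :rest remaining}
         {:fail "expected at least one digit"})

       ;; The chunk ended while still inside the number: ask for more.
       :else
       (fn [more] (feed-digits seen more))))))

(def k (feed-digits "12"))  ; a function: the number may continue

(k "3 apples")  ;; => {:done 123, :rest " apples"}
(k "")          ;; => {:done 12, :rest ""}
```

The first chunk cannot settle whether the number is complete, so the caller receives a function. Feeding it more text or an empty string resolves the parse either way.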
For more info, please clone the project and run lein marg to get a great summary of the zetta.parser.seq and zetta.combinators namespaces.
(ns zetta.examples.csv
  "Naive CSV parser"
  (:refer-clojure :exclude [char])
  (:require [clojure.core :as core]
            [clojure.string :as str]
            [clojure.java.io :as io]
            [zetta.core :refer :all]
            [zetta.combinators
             :refer [sep-by1 around many many1 choice]]
            [zetta.parser.seq
             :refer [char not-char spaces eol end-of-input]]))

(defrecord CSVFile [titles values])

;; separator between the fields of a row
(def csv-sep
  (char \,))

;; a single field: one or more characters that are neither a comma
;; nor a newline, joined into a string
(def csv-key
  (<$> str/join
       (many1
        (around spaces (not-char #{\, \newline})))))

;; a row: fields separated by commas, ended by a line break or
;; the end of the input
(def csv-entry
  (<* (sep-by1 csv-key
               csv-sep)
      (<|> eol end-of-input)))

;; a file: the first row holds the titles, the rest hold the values
(def csv-file
  (<$> #(CSVFile. %1 %2)
       csv-entry
       (many csv-entry)))
Copyright (C) 2012-2015 Roman Gonzalez and contributors.
Distributed under the Eclipse Public License, the same as Clojure.