Can anyone explain the regexps in #match and #consume #7

adamakhtar · 2012-07-30T04:03:03Z

@codereading/readers

def consume(pattern)
  return unless match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

  path, *vars = match.captures

  env["SCRIPT_NAME"] += "/#{path}"
  env["PATH_INFO"] = "#{vars.pop}#{match.post_match}"

  captures.push(*vars)
end
private :consume

def match(matcher, segment = "([^\\/]+)")
  case matcher
  when String then consume(matcher.gsub(/:\w+/, segment))
  when Regexp then consume(matcher)
  when Symbol then consume(segment)
  when Proc   then matcher.call
  else
    matcher
  end
end

consume(matcher.gsub(/:\w+/, segment))

and

match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

just what exactly are these two regexps doing?

adamakhtar · 2012-07-30T08:43:09Z

Ok got it.

When we define a route with the DSL, such as

on "posts/:post_id/comment/:id" do |post_id, id| 
...

Cuba needs a way to compare that with every request that comes in to see if the actual path matches our route.

The most obvious way would be to use a regexp, as in this pseudocode:

if actual_request_path.match(our_dsl_definition) then return result

But as it stands, our DSL definition won't work as a regexp, so Cuba needs to interpret our rule and convert it into one.

This happens in #match

matcher.gsub(/:\w+/, segment)

where segment is a parameter whose default value is the string "([^\\/]+)"

and matcher is our DSL rule "posts/:post_id/comment/:id".

The result of the gsub is
"posts/([^\\/]+)/comment/([^\\/]+)"

and that is the regexp source Cuba will use to check against the request path.

The ([^\/]+) regex simply means:

match one or more characters that are not a / (the backslash only escapes the slash, it is not excluded itself). For example, the 56 in posts/56 would match this.
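
To make the conversion concrete, here is a minimal sketch that can be pasted into irb. It reuses the default segment string from #match and the anchoring from #consume; the sample path /posts/56/comment/7 is only an assumption for illustration.

```ruby
# Default segment pattern copied from #match. The Ruby string "([^\\/]+)"
# holds the regexp source ([^\/]+): one or more characters that are not "/".
segment = "([^\\/]+)"

# The route as written in the DSL.
route = "posts/:post_id/comment/:id"

# Replace every :name placeholder with the segment pattern.
pattern = route.gsub(/:\w+/, segment)
puts pattern

# Anchor it the same way #consume does and try it against a sample path.
regexp = /\A\/(#{pattern})((?:\/|\z))/
match = regexp.match("/posts/56/comment/7")

# The first capture is the whole matched route; the rest are the placeholder
# values plus the trailing "/" or empty end-of-string capture.
path, *vars = match.captures
p path
p vars
```

Here path comes out as "posts/56/comment/7" and vars as ["56", "7", ""]; the final empty string is the end-of-string capture, which #consume pops off before pushing the rest onto captures.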

adamakhtar · 2012-07-30T08:49:13Z

Consume

Now in #consume Cuba uses the previously constructed regex to check if it matches the given path.

return unless match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

Now whilst I can understand what the above is doing, I don't know how it is doing it.

Can any @codereading/readers help with this regex?

I understand the pattern part, that's our regexp from before. The last part

((?:\/|\z))

is difficult to understand.

Why is it in double brackets?

And what is the /\A\/ at the beginning checking for?
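
For reference, the effect of #consume on the request can be sketched outside Cuba. This is only an approximation: env and captures below are plain local stand-ins (assumptions) for the Rack env and Cuba's captures array, and the sample path is made up.

```ruby
# Hypothetical stand-ins for the Rack env and Cuba's captures array.
env = { "SCRIPT_NAME" => "", "PATH_INFO" => "/posts/56/comment/7/edit" }
captures = []

# Regexp source produced by #match for "posts/:post_id/comment/:id".
pattern = "posts/([^\\/]+)/comment/([^\\/]+)"

# \A\/ requires the path to start with "/"; the final group requires the
# route to be followed by another "/" or by the end of the string.
match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

path, *vars = match.captures

# Move the consumed part of the path from PATH_INFO to SCRIPT_NAME.
env["SCRIPT_NAME"] += "/#{path}"
# vars.pop is the trailing "/" (or ""), which is put back in front of the rest.
env["PATH_INFO"] = "#{vars.pop}#{match.post_match}"

# What is left in vars are the values for the block arguments.
captures.push(*vars)

p env
p captures
```

After this runs, SCRIPT_NAME is "/posts/56/comment/7", PATH_INFO is "/edit", and captures holds ["56", "7"], which is what lets a nested matcher continue from the remaining path.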

ericgj · 2012-07-31T13:07:43Z

\A and \z match the beginning and end of the string respectively. I didn't know about the ?: construct before, but according to the docs,

The (?:…) construct provides grouping without capturing.

Useful. But I'm not sure of the advantage here, since the double parens mean that it does capture the closing forward slash or end of string. So I don't see any difference between this and /\A\/(#{pattern})(\/|\z)/. Maybe it's for performance reasons.
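
A quick way to check that the two forms capture the same thing is to run both against a few paths. This is just a sketch: the literal posts stands in for #{pattern}, and the sample paths are assumptions.

```ruby
# The original form, with a non-capturing group wrapped in a capturing one.
with_noncapture = /\A\/(posts)((?:\/|\z))/
# The simpler form with a single capturing group.
plain_group     = /\A\/(posts)(\/|\z)/

paths = ["/posts", "/posts/", "/posts/1", "/postsx"]

# For each path, collect the captures of both regexps (nil when no match).
results = paths.map do |path|
  a = with_noncapture.match(path)
  b = plain_group.match(path)
  [path, a && a.captures, b && b.captures]
end

results.each do |path, a, b|
  puts "#{path.inspect}: #{a.inspect} vs #{b.inspect}"
end
```

Both regexps give identical captures for every path here. "/postsx" matches neither, because the route must be followed by a / or by the end of the string.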

theldoria · 2012-08-01T05:18:54Z

I don't think /\A\/(#{pattern})((?:\/|\z))/ performs better than /\A\/(#{pattern})(\/|\z)/, and I could not find any evidence of a difference when running a simple benchmark (see https://gist.github.com/3223768):

posts/0/comment/1 -- 
posts/0/comment/1 -- /
posts/0/comment/1 -- /x
Rehearsal --------------------------------------------
regexp_a   5.562000   0.110000   5.672000 (  5.687500)
----------------------------------- total: 5.672000sec

               user     system      total        real
regexp_a   5.594000   0.109000   5.703000 (  5.937500)
Rehearsal --------------------------------------------
regexp_b   5.594000   0.062000   5.656000 (  5.671875)
----------------------------------- total: 5.656000sec

               user     system      total        real
regexp_b   5.578000   0.032000   5.610000 (  5.609375)
Rehearsal --------------------------------------------
regexp_c   5.547000   0.093000   5.640000 (  6.046875)
----------------------------------- total: 5.640000sec

               user     system      total        real
regexp_c   5.641000   0.047000   5.688000 (  5.687500)

You may note that I tried a third regexp as well, because I guess the expression should not only match / but also \ at the end.

By the way, you may find the regexp idiom ((?:x|y)+) when a repeated group should be captured. For example ((?:ab|cd)+) matches abcd and captures abcd, while (ab|cd)+ also matches abcd, but captures only cd.
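
The difference between the two forms is easy to see in irb; a small sketch:

```ruby
# Capturing group wrapped around a repeated non-capturing group:
# the capture covers the whole repetition.
wrapped = /((?:ab|cd)+)/.match("abcd")

# Repeated capturing group: the capture keeps only the last iteration.
repeated = /(ab|cd)+/.match("abcd")

p wrapped[0], wrapped[1]
p repeated[0], repeated[1]
```

Both match the full string "abcd", but wrapped[1] is "abcd" while repeated[1] is only "cd".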

cyx · 2012-08-06T07:26:03Z

Very nice catch guys! I think @soveran has already pushed the revision (07d77d4).

@codereading == win! :-)

adamakhtar · 2012-08-13T15:28:09Z

@ericgj and @theldoria sorry for the late reply. Thanks very much for explaining that, and it's great that it contributed to a revision.

foca pushed a commit to foca/cuba that referenced this issue Apr 11, 2014

Remove unnecessary lookahead capture.

07d77d4

The issue was this covered in this thread: codereading#7 (comment) Turns out it was something we overlooked, based on a previous implementation.
