Can anyone explain the regexps in #match and #consume #7

Open
adamakhtar opened this issue Jul 30, 2012 · 6 comments

Comments

@adamakhtar
Member

@codereading/readers

  def consume(pattern)
    return unless match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)
    path, *vars = match.captures

    env["SCRIPT_NAME"] += "/#{path}"
    env["PATH_INFO"] = "#{vars.pop}#{match.post_match}"

    captures.push(*vars)
  end
  private :consume

  def match(matcher, segment = "([^\\/]+)")
    case matcher
    when String then consume(matcher.gsub(/:\w+/, segment))
    when Regexp then consume(matcher)
    when Symbol then consume(segment)
    when Proc   then matcher.call
    else
      matcher
    end
  end

consume(matcher.gsub(/:\w+/, segment))

and

match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

Just what exactly are these two regexps doing?

@adamakhtar
Member Author

OK, got it.

When we define a route with the DSL, such as

on "posts/:post_id/comment/:id" do |post_id, id| 
...

Cuba needs a way to compare that with every request that comes in to see if the actual path matches our route.

The most obvious way would be to use regexps, as in this pseudocode:

if actual_request_path.match(our_dsl_definition) then return result

But as it stands, our DSL definition won't work as a regexp, so Cuba needs to interpret our rule and convert it into one.

This happens in #match

matcher.gsub(/:\w+/, segment)

where segment is a parameter with a default value of "([^\/]+)"

and matcher is our DSL rule "posts/:post_id/comment/:id".

The result of the gsub is
"posts/([^\\/]+)/comment/([^\\/]+)"

and that is the pattern Cuba interpolates into a regexp in #consume to check against the request path (PATH_INFO).
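A minimal sketch of just this conversion step, run outside Cuba. The route string and the default segment are copied from #match above; the local variable names are chosen for illustration only.

```ruby
# Default capture segment from Cuba's #match: one or more non-slash characters.
segment = "([^\\/]+)"

route = "posts/:post_id/comment/:id"

# Replace every ":name" placeholder with the capture segment.
pattern = route.gsub(/:\w+/, segment)

puts pattern # prints posts/([^\/]+)/comment/([^\/]+)

# Anchored on both ends, the pattern captures one value per placeholder.
md = /\A#{pattern}\z/.match("posts/56/comment/7")
p md.captures # prints ["56", "7"]
```

Note that the placeholder names (:post_id, :id) are thrown away; only their positions survive, which is why the block parameters in `on` are matched up by order.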

The ([^\/]+) regex simply means

match one or more characters that are not a forward slash, i.e. the 56 in posts/56 would match this. (The \ only escapes the /; a backslash itself is not excluded.)
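A quick check of what the default segment accepts, anchored so that the whole string has to match. The test strings are made up for illustration.

```ruby
# The default segment as a standalone, fully anchored regexp.
segment = /\A([^\/]+)\z/

p segment.match?("56")   # true: no slash
p segment.match?("5/6")  # false: contains a slash
p segment.match?("5\\6") # true: a backslash is allowed
p segment.match?("")     # false: + needs at least one character
```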

@adamakhtar
Member Author

Consume

Now, in #consume, Cuba uses the previously constructed pattern to check whether it matches the request path.

return unless match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)

Now, whilst I can understand what the above is doing, I don't know how it is doing it.

Can any @codereading/readers help with this regex?

I understand the pattern part; that's our regexp from before. The last part

((?:\/|\z))

is difficult to understand.

Why is it in double parentheses?

And what is this checking for:

/\A\/ at the beginning?
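To see the pieces in action, here is a standalone approximation of #consume. It is a sketch under stated assumptions: `env` is a plain Hash rather than a Rack env, and the method returns the captured values instead of pushing them onto Cuba's `captures` array. The regexp is the one quoted above. `\A\/` requires the path to start with a slash, group 1 captures the part our route describes, and `((?:\/|\z))` requires the match to end at a segment boundary (a "/" or the end of the string) and captures that "/" (or "") so it can be put back in front of the remaining path.

```ruby
# Hypothetical stand-in for Cuba's #consume, not Cuba's actual API:
# env is a plain Hash and the extracted values are returned.
def consume(env, pattern)
  # \A\/         the path must start with a slash
  # (#{pattern}) group 1: the part of the path our route describes
  # ((?:\/|\z))  last group: the next "/" or "" at the end of the string
  match = env["PATH_INFO"].match(/\A\/(#{pattern})((?:\/|\z))/)
  return unless match

  # Group 1, then the pattern's own groups, then the trailing "/" or "".
  path, *vars = match.captures

  env["SCRIPT_NAME"] += "/#{path}"
  # Put the trailing "/" (or "") back in front of whatever is left.
  env["PATH_INFO"] = "#{vars.pop}#{match.post_match}"
  vars
end

env = { "SCRIPT_NAME" => "", "PATH_INFO" => "/posts/56/comment/7/edit" }
vars = consume(env, "posts/([^\\/]+)/comment/([^\\/]+)")

p vars               # prints ["56", "7"]
p env["SCRIPT_NAME"] # prints "/posts/56/comment/7"
p env["PATH_INFO"]   # prints "/edit"

# Without a "/" or end of string after "posts", there is no match.
p consume({ "SCRIPT_NAME" => "", "PATH_INFO" => "/postsfoo" }, "posts") # prints nil
```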

@ericgj
Member

ericgj commented Jul 31, 2012

\A and \z match the beginning and end of the string respectively. I didn't know about the ?: construct before, but according to the docs,

The (?:…) construct provides grouping without capturing.

Useful. But I'm not sure of the advantage here, since the outer parens mean that it does capture the closing forward slash (or the empty string at the end of the string). So I don't see any difference between this and /\A\/(#{pattern})(\/|\z)/. Maybe it's for performance reasons.
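A small sketch comparing the two forms on a few sample paths (the paths and the shortened pattern are made up for illustration). Both produce the same captures, because wrapping a non-capturing group in a capturing one captures exactly what a single capturing group would.

```ruby
pattern = "posts/([^\\/]+)"

with_nc = /\A\/(#{pattern})((?:\/|\z))/ # non-capturing group inside a capturing one
plain   = /\A\/(#{pattern})(\/|\z)/     # a single capturing group

["/posts/56", "/posts/56/", "/posts/56/edit", "/postsfoo"].each do |path|
  a = with_nc.match(path)&.captures
  b = plain.match(path)&.captures
  puts "#{path.inspect}: #{a.inspect} #{a == b ? '==' : '!='} #{b.inspect}"
end
```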

@theldoria

I don't think /\A\/(#{pattern})((?:\/|\z))/ performs better than /\A\/(#{pattern})(\/|\z)/, and a simple benchmark turned up no evidence otherwise (see https://gist.github.com/3223768):

posts/0/comment/1 -- 
posts/0/comment/1 -- /
posts/0/comment/1 -- /x
Rehearsal --------------------------------------------
regexp_a   5.562000   0.110000   5.672000 (  5.687500)
----------------------------------- total: 5.672000sec

               user     system      total        real
regexp_a   5.594000   0.109000   5.703000 (  5.937500)
Rehearsal --------------------------------------------
regexp_b   5.594000   0.062000   5.656000 (  5.671875)
----------------------------------- total: 5.656000sec

               user     system      total        real
regexp_b   5.578000   0.032000   5.610000 (  5.609375)
Rehearsal --------------------------------------------
regexp_c   5.547000   0.093000   5.640000 (  6.046875)
----------------------------------- total: 5.640000sec

               user     system      total        real
regexp_c   5.641000   0.047000   5.688000 (  5.687500)

You may note that I tried a third regexp as well, because I guess the expression should not only match / but also \ at the end.

By the way, you may find the regexp idiom ((?:x|y)+) when a repeated group should be captured. For example ((?:ab|cd)+) matches abcd and captures abcd, while (ab|cd)+ also matches abcd, but captures only cd.
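The idiom can be checked directly in Ruby:

```ruby
outer = "abcd".match(/((?:ab|cd)+)/)
inner = "abcd".match(/(ab|cd)+/)

# The capturing group wraps the whole repetition.
p outer[0], outer[1] # prints "abcd" then "abcd"

# A repeated capturing group keeps only its last iteration.
p inner[0], inner[1] # prints "abcd" then "cd"
```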

@cyx

cyx commented Aug 6, 2012

Very nice catch guys! I think @soveran has already pushed the revision (07d77d4).

@codereading == win! :-)

@adamakhtar
Member Author

@ericgj and @theldoria, sorry for the late reply. Thanks very much for explaining that, and it's great that it contributed to a revision.

foca pushed a commit to foca/cuba that referenced this issue Apr 11, 2014
The issue was covered in this thread:

codereading#7 (comment)

Turns out it was something we overlooked, based on a previous
implementation.