forked from marcomaggi/cre2

C language wrapper for RE2 the regular expressions library from Google

License: Unknown (COPYING), BSD-3-Clause (LICENSE.re2)

tsingakbar/cre2

 
 

This is a fork of http://github.com/marcomaggi/cre2/ with the following changes:

  • use cmake to replace autoconf; the build only generates a static lib.
  • add a shortcut in DEFINE_MATCH_REX_FUN for the case where no matches need to be returned, to speed up my own usage.
  • include go wrappers adapted from https://github.com/wordijp/golang-re2, which patched cre2's header for better cgo integration.
  • modify the go wrapper to compile with newer Go compilers (see the note below), plus some modifications to accommodate the DEFINE_MATCH_REX_FUN optimization in the C binding.
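
The shortcut idea can be illustrated with a small self-contained sketch. This is not the fork's actual DEFINE_MATCH_REX_FUN code: span_t, toy_find() and match_with_shortcut() are hypothetical names, and a literal substring search stands in for the RE2 engine so the example compiles on its own. It only shows the general pattern of skipping scratch allocation and copy-out when the caller asks for no matches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cre2_string_t: a pointer plus a length. */
typedef struct {
    const char *data;
    int length;
} span_t;

/* Toy "engine": searches text for a literal needle. It stands in for the
   real regex engine. If group is not NULL it receives the matched span.
   Returns 1 on a match, 0 otherwise. */
static int toy_find(const char *needle, const span_t *text, span_t *group)
{
    int n = (int)strlen(needle);
    for (int i = 0; i + n <= text->length; ++i) {
        if (memcmp(text->data + i, needle, (size_t)n) == 0) {
            if (group != NULL) {
                group->data = text->data + i;
                group->length = n;
            }
            return 1;
        }
    }
    return 0;
}

/* Match wrapper with a fast path for callers that only want yes/no. */
int match_with_shortcut(const char *needle, const span_t *text,
                        span_t *match, int nmatch)
{
    assert(needle != NULL && text != NULL && nmatch >= 0);

    /* Fast path: no matches requested, so no scratch buffer and no copy. */
    if (nmatch == 0 || match == NULL)
        return toy_find(needle, text, NULL);

    /* Slow path: allocate zeroed scratch space, run, copy results out. */
    span_t *scratch = calloc((size_t)nmatch, sizeof *scratch);
    if (scratch == NULL)
        return 0;
    int ok = toy_find(needle, text, &scratch[0]);
    if (ok)
        memcpy(match, scratch, (size_t)nmatch * sizeof *scratch);
    free(scratch);
    return ok;
}
```

Calling match_with_shortcut("world", &text, NULL, 0) takes the fast path, which is the same calling shape as cre2_partial_match_re(regexp, &text, NULL, 0) in the C binding example.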

to use the C binding

  1. install the re2 package provided by your linux distro, or build re2 yourself; either way, make sure the static lib libre2.a is available.
  2. mkdir build && cd build
  3. cmake .. will try to use the re2 found in the system paths. if it is not found, build re2 yourself and point cmake at it, like cmake -Dre2_DIR=/<re2-src>/build/install/lib/cmake/re2 ..
    • re2_DIR should contain a usable cmake import script like re2Config.cmake
    • use -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-g" to build the most optimized (-O3) version with debugging symbols
    • use -DCMAKE_BUILD_TYPE=RelWithDebInfo to build the regular optimized (-O2) version with debugging symbols
  4. run make install to get the installed result in build/install/
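// demo: time 100000 partial matches; no capture groups are requested
#include <chrono>
#include <iostream>
#include <string>
#include <cre2.h>

bool cre2_demo(const std::string& regstr, const std::string& textstr) {
    bool ret = false;
    auto opt = cre2_opt_new();
    cre2_opt_set_log_errors(opt, 0);
    cre2_opt_set_max_mem(opt, 1024*1024*10);  // 10 MiB
    auto regexp = cre2_new(regstr.c_str(), int(regstr.length()), opt);
    if (!regexp || cre2_error_code(regexp)) {  // allocation or pattern compilation failed
        if (regexp) cre2_delete(regexp);
        cre2_opt_delete(opt);
        return false;
    }
    const cre2_string_t text = {textstr.c_str(), int(textstr.length())};
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 100000; ++i) {
        // NULL and 0 mean "return no matches", which takes the shortcut path
        ret = cre2_partial_match_re(regexp, &text, NULL, 0) != 0;
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(finish-start).count() << "ns\n";
    cre2_delete(regexp);
    cre2_opt_delete(opt);
    return ret;
}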

to use the go binding

  1. make sure the C binding is properly compiled.
  2. copy or soft link libre2.a and libstdc++.a next to the static C binding libcre2.a.
  3. go get -v github.com/tsingakbar/cre2
  4. import this package and use it in your code. in my tests the go wrapper is currently 1.5-2.0x slower than the C binding, but it is still much faster than Go's stdlib regexp package, which claims to implement the same algorithm as RE2 but is written entirely in Go.
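// regexpStr (string) and textbytes ([]byte) are supplied by the caller
var (
	regexpFilter *re2.Regexp
	closer       *re2.Closer
	err          error
)
if regexpFilter, closer, err = re2.Compile(regexpStr); err != nil {
	panic(err)
}
var result bool
var bEpoch = time.Now()
// time 100000 matches against the same input
for i := 0; i < 100000; i++ {
	result = regexpFilter.Match(textbytes)
}
fmt.Printf("cre2 %v\n", time.Since(bEpoch))
fmt.Println(result)
// release the C-side resources held by the compiled regexp
closer.Close(regexpFilter)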

NOTE: Since Go 1.6, cgo no longer lets you pass C a struct such as cre2_string_t that lives in Go memory when one of its fields points to other Go memory, for example when cre2_string_t::data is set to a slice's data pointer. To work around this, you have to write wrappers that allocate such C structs in C memory (stack space is preferred). So far I have only covered cre2_partial_match_re() to fulfill my own needs. You will probably need to implement the others in re2.go, so pull requests are welcome.
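
A minimal sketch of the wrapper pattern described in the note, written so that it compiles without cre2.h. The names demo_string_t, demo_match() and demo_match_wrapper() are hypothetical and are not taken from re2.go or the cre2 header: demo_string_t only mirrors the shape of cre2_string_t as it is initialized in the C binding example (a data pointer and an int length), and demo_match() is a toy stand-in for a function such as cre2_partial_match_re(). The point is that the wrapper takes the pointer and the length as separate arguments and builds the struct on the C stack.

```c
#include <assert.h>
#include <string.h>

/* Same shape as cre2_string_t (data pointer + length); declared here
   only so the sketch is self-contained. */
typedef struct {
    const char *data;
    int length;
} demo_string_t;

/* Hypothetical stand-in for a matcher that takes a struct pointer.
   Toy behavior: reports whether the text contains the byte 'a'. */
static int demo_match(const demo_string_t *text)
{
    return memchr(text->data, 'a', (size_t)text->length) != NULL;
}

/* Wrapper intended to be called through cgo. The caller passes a plain
   byte pointer and a length; the struct that holds the pointer is
   created here, on the C stack, and never exists in Go memory. */
int demo_match_wrapper(const char *data, int length)
{
    assert(data != NULL && length >= 0);
    demo_string_t text = { data, length };  /* lives on the C stack */
    return demo_match(&text);
}
```

On the Go side, a call would then look roughly like C.demo_match_wrapper((*C.char)(unsafe.Pointer(&b[0])), C.int(len(b))) for a non-empty []byte b. This should be acceptable under the cgo pointer rules because the slice's backing array contains no Go pointers itself.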
