Running (uncompiled?) regular expressions, it seems.

mikeryan · on Nov 19, 2014

So I was a bit unclear on the parent's post but I don't think this time was on a route lookup if I'm reading the thread and post correctly the static file handler getting inserted multiple times. This handler will generally match on any route but then is doing something like "if file exists, return static file, if not look for the next handler" in this case the "if file exists" part was the "path check" thats taking 1 ms and was happening multiple times.

I could be wrong but it seems like the design of the route lookup mechanism (the global array) was actually a bit of a red herring, the real issue was the ability to attach multiple instances of the same handler to the same route.

Something was adding the same Express.js provided static route handler 10 times an hour. Further benchmarking revealed merely iterating through each of these handler instances cost about 1 ms of CPU time.

TheLoneWolfling · on Nov 20, 2014

A simple "if file exists" check shouldn't take 1ms on average.

OSes cache directory entries for a reason.

I mean, even Python manages 40,000 checks / second:

  >>> timeit.timeit("os.path.exists(data)", setup="import os; import random; import string; data = os.path.join(r'C:\Windows\System32', ''.join(random.choice(string.ascii_letters) for _ in range(10)))", number=40000)
  0.9998181355403517

baudehlo · on Nov 19, 2014

V8 compiles regular expressions to machine code.

simi_ · on Nov 19, 2014

You're the author of Haraka, a tool I'm prepared to try in production. This makes me _really_ worried that I'm making the right choice.

(To explain: obviously C++ churns out machine code, the parent was talking about compiled vs uncompiled regexes – if I'm not terribly wrong, the compilation step is turning the regex into a finite automaton.)

baudehlo · on Nov 19, 2014

V8 literally compiles regexps to X86 machine code the first time they are executed. They are not compiled into an FSA that gets walked in the traditional sense.

Hopefully that lowers your concern level.

TheLoneWolfling · on Nov 20, 2014

Doesn't that mean that you have exponential worst-case complexity?

jbnicolai · on Nov 20, 2014

Absolutely! Even worse, you may get different results than the expected method (FSA compilation) would yield. Observe:

Javascript, executed in the console of a recent Chrome:

    > 'ab'.match(/a|ab/)  
    ["a"]

BSD Grep:

    > egrep 'a|ab' <<< ab  
    ab

TheLoneWolfling · on Nov 20, 2014

Yow...

I was under the impression that all major regex engines used NFAs converted to DFAs lazily, with fallbacks to a slower engine for features that cannot (or cannot practically) be implemented using an NFA (unbounded backtracking, that sort of thing.)

What is the advantage to doing this?

simi_ · on Nov 19, 2014

Interesting, thanks for the info!