UnixLib regex implementation
jmb202 at ecs.soton.ac.uk
Sat Jul 23 16:24:14 PDT 2005
On Sun, 24 Jul 2005, John Tytgat wrote:
> In message <Pine.LNX.4.44.0507232314440.5119-100000 at tarrant.ecs.soton.ac.uk>
> John-Mark Bell <jmb202 at ecs.soton.ac.uk> wrote:
> > UnixLib's current regex implementation is horrendously slow (for
> > reference, it's currently the same as the one in NetBSD libc). The
> > original author of that implementation has produced another regex
> > implementation which is significantly faster.
> Any figures available ?
IRC log follows:
<@jmb> rite. having integrated the newer henry spencer regex
implementation into UnixLib and then profiled it with use of NetSurf, we
come to the following conclusions:
<@jmb> url_join mean execution time is 8,500 with new regex
implementation. it was 45,000 with the slow implementation and 3000 with
the previous fast implementation
The units of the above figures are: 1 unit == 64 clock cycles (the
profiling tool used makes use of the XScale's clock cycle counter, which
increments once every 64 cycles on an Iyonix by default).
url_join is a function in NetSurf that makes use of regexec (and a number
of other UnixLib functions, but the only factor that has changed between
the tests is the underlying regex implementation in UnixLib).
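
For readers unfamiliar with the POSIX regex interface that url_join exercises, the following is a minimal sketch of a regcomp/regexec/regfree round trip. It is not NetSurf's url_join: the helper name url_scheme and the pattern are hypothetical, chosen only to show where the time inside the regex library is spent (compilation, then matching).

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT NetSurf's url_join): copy the scheme part of
 * `url` (e.g. "http") into `out`. Returns 0 on success, -1 if there is
 * no match, the buffer is too small, or the pattern fails to compile. */
int url_scheme(const char *url, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first group */
    int rc = -1;

    if (outlen == 0)
        return -1;

    /* Illustrative pattern: a letter, then letters/digits/+/./-, then ':' */
    if (regcomp(&re, "^([A-Za-z][A-Za-z0-9+.-]*):", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, url, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, url + m[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);      /* always release the compiled pattern */
    return rc;
}
```

Both regcomp and regexec are supplied by the C library's regex implementation, so a caller written like this would see a change in timing when that implementation is swapped, without any change to its own code.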
"new regex implementation" is the one from PostgreSQL
"slow regex implementation" is the current one in UnixLib
"fast regex implementation" is GNU regex that was in UnixLib prior to the
Take the accuracy of these figures with a pinch of salt, but they at least
show the magnitude of the possible improvement.
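
As a rough worked example of the arithmetic behind these figures, the sketch below converts profiler units to clock cycles and computes the ratio between two measurements. The function names are illustrative only, and the 64-cycles-per-unit factor is simply the one stated above.

```c
#include <stdio.h>

/* From the post: 1 profiler unit == 64 clock cycles. */
#define CYCLES_PER_UNIT 64UL

/* Convert a profiler reading to clock cycles. */
unsigned long units_to_cycles(unsigned long units)
{
    return units * CYCLES_PER_UNIT;
}

/* How many times faster `after` is than `before` (both in units). */
double speedup(unsigned long before_units, unsigned long after_units)
{
    return (double)before_units / (double)after_units;
}

/* Print the three url_join figures quoted in the IRC log. */
void print_summary(void)
{
    const unsigned long slow_units = 45000; /* current UnixLib regex */
    const unsigned long new_units  = 8500;  /* PostgreSQL regex      */
    const unsigned long fast_units = 3000;  /* earlier GNU regex     */

    printf("slow: %lu cycles\n", units_to_cycles(slow_units));
    printf("new:  %lu cycles\n", units_to_cycles(new_units));
    printf("fast: %lu cycles\n", units_to_cycles(fast_units));
    printf("new vs slow: %.1fx faster\n", speedup(slow_units, new_units));
    printf("fast vs new: %.1fx faster\n", speedup(new_units, fast_units));
}
```

On these numbers the new implementation is a little over five times faster than the current one in url_join, and the earlier GNU implementation was a little under three times faster again.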
> > b) The implementation is theoretically wide-character aware, but my
> > changes have removed this support.
> Just curious : what was the reason to remove the wide-character support ?
My lack of knowledge about wchar-related issues. I erred on the safe side
and simply used standard 8-bit chars.
> If you check this in, make sure to appropriately update unixlib/Docs/Copyright
> (which I guess needs more licenses attached, no ?).
Yes. The COPYRIGHT file contained in the regexp directory contains the
relevant licence details already.