UnixLib regex implementation

John-Mark Bell jmb202 at ecs.soton.ac.uk
Sat Jul 23 16:24:14 PDT 2005


On Sun, 24 Jul 2005, John Tytgat wrote:

> In message <Pine.LNX.4.44.0507232314440.5119-100000 at tarrant.ecs.soton.ac.uk>
>           John-Mark Bell <jmb202 at ecs.soton.ac.uk> wrote:
> 
> > UnixLib's current regex implementation is horrendously slow (for 
> > reference, it's currently the same as the one in NetBSD libc). The 
> > original author of that implementation has produced another regex 
> > implementation which is significantly faster.
> 
> Any figures available ?

IRC log follows:

---

<@jmb> rite. having integrated the newer henry spencer regex  
implementation into UnixLib and then profiled it with use of NetSurf, we 
come to the following conclusions:

<@jmb> url_join mean execution time is 8,500 with new regex  
implementation. it was 45,000 with the slow implementation and 3000 with 
the previous fast implementation

---

The above figures are in profiler units, where 1 unit == 64 clock cycles. 
The profiling tool reads the XScale's clock cycle counter, which by 
default increments once every 64 cycles on an Iyonix.
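
To put the figures in raw clock cycles, the conversion can be sketched as 
below. This is only an illustration of the arithmetic, not code from the 
profiling tool; the names CYCLES_PER_UNIT and units_to_cycles are made up 
for this example.

```c
#include <assert.h>
#include <stdint.h>

/* From the text above: one profiler unit is 64 clock cycles. */
#define CYCLES_PER_UNIT 64u

/* Hypothetical helper: convert a profiler reading to clock cycles. */
uint64_t units_to_cycles(uint64_t units)
{
    /* Refuse readings whose product would overflow 64 bits. */
    assert(units <= UINT64_MAX / CYCLES_PER_UNIT);
    return units * CYCLES_PER_UNIT;
}
```

With the means quoted in the log, units_to_cycles(8500) gives 544,000 
cycles, units_to_cycles(45000) gives 2,880,000 and units_to_cycles(3000) 
gives 192,000.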

url_join is a NetSurf function that calls regexec. It also calls a number 
of other UnixLib functions, but the only thing that changed between the 
tests was the underlying regex implementation in UnixLib.
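
For readers unfamiliar with the interface, the following is a minimal 
sketch of the kind of regcomp/regexec call such a function makes. It is 
not NetSurf's url_join: the helper names are hypothetical, and the pattern 
is only the leading part of the generic URI regex given in RFC 2396, 
Appendix B, chosen as an assumption for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile a pattern that captures a leading URL
 * scheme. Returns 0 on success, exactly as regcomp does. */
int compile_scheme_regex(regex_t *re)
{
    assert(re != NULL);
    return regcomp(re, "^(([^:/?#]+):)?", REG_EXTENDED);
}

/* Hypothetical helper: length of the scheme in url, or 0 if none. */
int url_scheme_len(const regex_t *re, const char *url)
{
    regmatch_t m[3];

    assert(re != NULL && url != NULL);
    if (regexec(re, url, 3, m, 0) != 0)
        return 0;
    /* An optional group that took no part in the match has offset -1. */
    if (m[2].rm_so == -1)
        return 0;
    return (int)(m[2].rm_eo - m[2].rm_so);
}
```

The pattern is compiled once and reused, so repeated calls pay only for 
regexec, which is the part that depends on the regex implementation being 
compared here. The caller releases the compiled pattern with regfree.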

For clarity:

"new regex implementation" is the one from PostgreSQL.
"slow implementation" is the current one in UnixLib.
"previous fast implementation" is the GNU regex that was in UnixLib prior 
to the licence change.

Take the accuracy of these figures with a pinch of salt, but they at least 
show the magnitude of the possible improvement.
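
As a rough worked example of that magnitude, the ratios between the three 
means in the log can be computed as below. The function name 
slowdown_factor is invented for this sketch; the inputs are the figures 
quoted above, so the results are only as reliable as those figures.

```c
#include <assert.h>

/* Hypothetical helper: how many times longer `slower` takes than `faster`. */
double slowdown_factor(double slower, double faster)
{
    assert(faster > 0.0);
    return slower / faster;
}
```

slowdown_factor(45000, 8500) is about 5.3, so the PostgreSQL-derived code 
is roughly five times faster than the current implementation. 
slowdown_factor(8500, 3000) is about 2.8, so it is still nearly three 
times slower than the old GNU regex, which in turn was 15 times faster 
than the current one.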

> > b) The implementation is theoretically wide-character aware, but my 
> >    changes have removed this support.
> 
> Just curious : what was the reason to remove the wide-character support ?

My lack of knowledge about wchar-related issues. I erred on the safe side 
and simply used standard 8-bit chars.
 
> If you check this in, make sure to appropriately update unixlib/Docs/Copyright
> (which I guess needs more licenses attached, no ?).

Yes. The COPYRIGHT file in the regexp directory already contains the 
relevant licence details.


John.



