Tycoon Talk
Become a Big fish!
The number 1 forum for online business!
Post topics, ask questions, share your knowledge.
Tycoon Talk is part of Freelancer.com - find skilled workers online at a fraction of the cost.

Coding Forum


You are currently viewing our Coding Forum as a guest. Please register to participate.
Login



Reply
Is RegEx single threaded?
Old 07-14-2008, 08:50 PM Is RegEx single threaded?
Learning Newbie's Avatar
Defies a Status

Latest Blog Post:
Astounding Republican Paranoia
Posts: 5,662
Name: John Alexander
Trades: 0
Is regular expression processing inherently single threaded? On a 24 way machine, I can process about the same number of extractions if I use 1 thread, 20 threads (leaving 4 for the OS) or all 24.

It seems this should be a CPU bound operation, so sending the work out to every available core should help. But it doesn't. And I've narrowed it down to the line of code that does the regular expression matching. Is there a kernel mode jump, some sort of lock, or something else that would grind everything to a halt like this?
__________________

Please login or register to view this content. Registration is FREE


Please login or register to view this content. Registration is FREE
Learning Newbie is offline
Reply With Quote
View Public Profile
 
 
Register now for full access!
Old 07-17-2008, 08:27 PM Re: Is RegEx single threaded?
willcode4beer's Avatar
Super Moderator

Posts: 1,533
Name: Paul Davis
Location: San Francisco
Trades: 1
It depends on the implementation of the regex engine you are using.

Most that I know of (sed/awk/grep/perl/pike/java) are single threaded.
__________________

Please login or register to view this content. Registration is FREE

willcode4beer is offline
Reply With Quote
View Public Profile
 
Old 07-18-2008, 04:07 PM Re: Is RegEx single threaded?
Learning Newbie's Avatar
Defies a Status

Latest Blog Post:
Astounding Republican Paranoia
Posts: 5,662
Name: John Alexander
Trades: 0
I'm using the C# .NET implementation of a regular expression. It's single threaded, which is to be expected, but instantiating a RegularExpression object is, well, a critical path - it takes a lock.

Another developer had written a class that used Regular Expressions to parse out some information from a string. The class was written so that every parsing created a new reg ex object, instead of doing that once, and calling its match method. I broke this out onto each CPU on the host machine, and with 24 threads, I'd get 40 parses per second. With 1 thread, I'd also get 40 parses per second. Fixing this bug and making the class stateful, to reuse the object it had created, I got the performance up to almost 2,000 parses per second, using all 24 cores.

What that tells me, especially that using more threads on more CPUs got us nothing, is that the .NET RegularExpression object takes a lock in the cctor. That you can only create 1 object at a time, no matter how powerful the host machine is.
__________________

Please login or register to view this content. Registration is FREE


Please login or register to view this content. Registration is FREE
Learning Newbie is offline
Reply With Quote
View Public Profile
 
Old 07-21-2008, 09:12 PM Re: Is RegEx single threaded?
willcode4beer's Avatar
Super Moderator

Posts: 1,533
Name: Paul Davis
Location: San Francisco
Trades: 1
Quote:
Originally Posted by Learning Newbie View Post
I'm using the C# .NET implementation of a regular expression. It's single threaded, which is to be expected, but instantiating a RegularExpression object is, well, a critical path - it takes a lock.

Another developer had written a class that used Regular Expressions to parse out some information from a string. The class was written so that every parsing created a new reg ex object, instead of doing that once, and calling its match method. I broke this out onto each CPU on the host machine, and with 24 threads, I'd get 40 parses per second. With 1 thread, I'd also get 40 parses per second. Fixing this bug and making the class stateful, to reuse the object it had created, I got the performance up to almost 2,000 parses per second, using all 24 cores.

What that tells me, especially that using more threads on more CPUs got us nothing, is that the .NET RegularExpression object takes a lock in the cctor. That you can only create 1 object at a time, no matter how powerful the host machine is.
another reason not to use .net
Just kidding (not really).

Have you considered using a 3rd parth regx library instead of the one provided by microsoft?
__________________

Please login or register to view this content. Registration is FREE

willcode4beer is offline
Reply With Quote
View Public Profile
 
Old 07-22-2008, 04:32 PM Re: Is RegEx single threaded?
Learning Newbie's Avatar
Defies a Status

Latest Blog Post:
Astounding Republican Paranoia
Posts: 5,662
Name: John Alexander
Trades: 0
Actually, we got a satisfactory solution. What needs to happen is for the same regex match to be calculated against every row in a a god awful database. By moving the code that instantiated the RegularExpression out of the parse method into the class constructor, we got about a 50x performance increase. The problem was with the way a particular developer implemented the reg ex, and not so much with the .NET BCL itself.

I'm sure that could also be improved, but I doubt the gains from moving to Perl or whatever would be more than a few percent, and an implementation change liket that would set the project back weeks as the software is redesigned and rewritten, then has to go through the dreaded quality assurance process.
__________________

Please login or register to view this content. Registration is FREE


Please login or register to view this content. Registration is FREE
Learning Newbie is offline
Reply With Quote
View Public Profile
 
Old 07-23-2008, 12:26 PM Re: Is RegEx single threaded?
willcode4beer's Avatar
Super Moderator

Posts: 1,533
Name: Paul Davis
Location: San Francisco
Trades: 1
I didn't mean perl, just a different library for .net.
Anyway, you did the right thing by moving the instantiation. Compiling the regx once instead of every iteration will definitely make a difference :-)

For fine tuning the regx I highly recommend the Regular Expressions book by O'Reilly. It has a lot of information to help optimize regx's for performance. It is also the driest, most boring book ever written. Toook me 2 months to read it because it kept putting me to sleep. Still, chock full of very good stuff.

On a side note: you've got to educate your QA guys on the magic of automated tests :-)
__________________

Please login or register to view this content. Registration is FREE

willcode4beer is offline
Reply With Quote
View Public Profile
 
Reply     « Reply to Is RegEx single threaded?
 

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off





   
RSS Feed  Feeds: RSS   JS   XML
RSS Feed  Feeds for this forum: RSS   JS   XML



Page generated in 0.39983 seconds with 12 queries