Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

What way is faster to check for a substring within a null-terminated string...

Posted on 2014-10-06
21
Medium Priority
?
204 Views
Last Modified: 2014-11-17
Hi guys... assuming I have an array of AnsiChar called "buffer" that contains some characters, and I want to find out if a certain substring "substr" exists within there, what would be the fastest/most efficient way to do it?

- do a "SearchBuf" command on the array of Char looking for "substr"
- or convert both "buffer" and "substr" to PChar, and do a  AnsiStrPos command?
- or possibly some other way...?

Thanks!
    Shawn
0
Comment
Question by:shawn857
  • 9
  • 4
  • 4
  • +2
21 Comments
 
LVL 19

Expert Comment

by:Thommy
ID: 40365311
Regarding the Delphi impementation of the different search functions SearchBuf is slower than Pos/PosEx.

But also take into account, that SearchBuf has options like start offset, forward/backward search, whole word search and case sensitive search, which could make it faster compared to Pos/PosEx with additional code.

Anyway, before making a decision you should try it out in your application with real data...
0
 
LVL 19

Accepted Solution

by:
MerijnB earned 2000 total points
ID: 40365464
Hi Shawn,

as long as you are sure you don't need unicode, QStrings is still a very fast option. The code is very old, so you need to find replace stuff like:

string -> ansistring
char -> ansichar
pchar -> pansichar

I use it in XE7 and it works fine. Comments are unreadable, but function names tell a lot.
0
 

Author Comment

by:shawn857
ID: 40367038
Thanks guys. Merijn - so if I change string to ansistring, etc within QStrings, then it will work for Unicode?

Thanks
   Shawn
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:shawn857
ID: 40367580
Hmmm, I think should give you guys the "bigger picture" as to what I'm trying to do overall. Basically I have a large file of records (this could be gigabytes big), each record terminated by a CR/LF. I also have a file of search strings or "search criteria" that I need to check if any exist in each record of my file. These search criteria will usually be short text strings - up to 50 or so chars max. I could have a few hundred or more of these search criteria strings. I need to find the fastest way to compare each record of my disk file to all of these individual search criteria... to see if they exist within the record or not. Googling around, I saw some algorithms such as Aho-Corasick and Rabin-Karp... and I'm wondering if a pre-made Delphi unit exists that would implement one of these superfast algorithms.

Should I be asking this as a separate question...?

Thanks!
    Shawn
0
 
LVL 19

Expert Comment

by:MerijnB
ID: 40367773
so if I change string to ansistring, etc within QStrings, then it will work for Unicode?

No, it will never work for unicode, but if you do this you can use it for ansi strings in a unicode aware Delphi
0
 
LVL 38

Expert Comment

by:Geert Gruwez
ID: 40372742
ugh ... database ?
insert the text in a database and then run an sql

i thought that's what databases were for ... :)
0
 

Author Comment

by:shawn857
ID: 40375192
Does anyone know of a Delphi implementation or unit of a very fast/efficient way to scan for multiple sub-strings within a string?

Thanks
   Shawn
0
 

Author Comment

by:shawn857
ID: 40393351
Anybody still with me?

Cheers
    Shawn
0
 
LVL 38

Expert Comment

by:Geert Gruwez
ID: 40393740
just an idea

for very large texts
you could use gabr's omnithreadlibrary to start searching in several places at the same time
each thread searches a portion of text (including an overlap of x = search text length)
and returns info found in a message queue

the otl can keep threads live in background so it doesn't have to create/destroy them each time
it also has a non-blocking message queue

should work the same for search text in file
you can open a file and move to a certain place within each thread providing it gets the starting search point
0
 
LVL 7

Expert Comment

by:kfoster11
ID: 40396118
"I have a large file of records (this could be gigabytes big), each record terminated by a CR/LF"

Are you trying to search this large file of records on disk, or is it loaded into memory?
0
 

Author Comment

by:shawn857
ID: 40396192
Thanks guys...

Geert - I am still on D7, appears OmniThreadLibrary needs Delphi 2007 at least.

Kfoster - I am reading from disk always... I never load into memory. And I must search *each* line (or record) for all the substrings.

Thanks!
    Shawn
0
 
LVL 7

Expert Comment

by:kfoster11
ID: 40396350
So basically, you need to search gigabytes of strings for a given string on disk.  Personally, I would first get this working, and then try and put a memory cache in the middle.

While it is no longer being maintained, AsyncCalls is an open source Delphi 7 compatible library that manages a thread pool.  While I have  not tried to use it, I believe the following article on delphi.about.com is doing basically what you are trying to do.

http://delphi.about.com/od/kbthread/a/delphi-thread-pool-example-using-asynccalls.htm

Also, just because 10 threads speeds up the process, does not mean that 10,000 will speed it up 1000 times more.  The following blog explains thread optimization
http://blogs.technet.com/b/markrussinovich/archive/2009/07/08/3261309.aspx

Good Luck.
0
 

Author Comment

by:shawn857
ID: 40400543
Thanks kFoster... I will have a look at that AsyncCalls. I'm pretty new to threading though.. it might be over my head.

You also mentioned about "putting a memory cache in the middle". Could you explain a bit what that is exactly?

Thanks!
   Shawn
0
 
LVL 7

Expert Comment

by:kfoster11
ID: 40408321
A memory cache is used to put pieces of the file into memory as you are reading it.  When you do a search, your routines would need to first check to see if the file has changed since the data was stored (Cached) into memory.  If it was, then the cache is dumped and not used, if it hasn't changed then you search through the memory cache instead of reading the disk.  Disk reads are time expensive, where memory is not.

If you are reading a file that is changing a lot, then the cache is probably not useful, but if it doesn't change much then it will greatly improve your performance.  

If it is new, and you haven't done threading then it may be more than you want to tackle.  Also, with today's operating systems and smart drives a lot of this is cached anyway.
0
 

Author Comment

by:shawn857
ID: 40420154
Thanks kfoster. My file's data does not change - it's just a static "flat file".. but it could be very large (many gigs) so it's not practical to hold it all in memory at once. I do already read in a sizable chunk of this file at a time into a buffer using TFileStream (65000+ bytes for each read), and then parse each block line-by-line (looking for CR/LF's to detect the end of each line). This would kinda be like the "memory cache" approach you mention, no?

Thanks
    Shawn
0
 
LVL 38

Expert Comment

by:Geert Gruwez
ID: 40420987
depending on what you want to give your users
you could go for a text solution like this:
http://www.oracle.com/technetwork/database/12coracletexttwp-1961244.pdf
0
 

Author Comment

by:shawn857
ID: 40429265
Thanks Geert, but that would be a whole other can of worms, it seems. Maybe a bit "overkill".

Thanks
   Shawn
0
 
LVL 7

Expert Comment

by:kfoster11
ID: 40429924
Why not just import the text file into a table in SQL Express and let the RDMS do the work?

Select fieldname from tablename where fieldname like '%strtofind%'

No need to reinvent the wheel since your table is static.  SQL Express is free for databases up to 10GIGS
0
 
LVL 19

Expert Comment

by:MerijnB
ID: 40446724
Is this correct? I haven't been participating in the discussion for about a month but do get all the points?
0
 

Author Comment

by:shawn857
ID: 40447827
Yours was the best solution after all was said and done Merijn!

Thanks
   Shawn
0
 
LVL 19

Expert Comment

by:MerijnB
ID: 40447972
ok, glad to hear it worked :)
0

Featured Post

Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article explains how to create forms/units independent of other forms/units object names in a delphi project. Have you ever created a form for user input in a Delphi project and then had the need to have that same form in a other Delphi proj…
Introduction Raise your hands if you were as upset with FireMonkey as I was when I discovered that there was no TListview.  I use TListView in almost all of my applications I've written, and I was not going to compromise by resorting to TStringGrid…
This lesson discusses how to use a Mainform + Subforms in Microsoft Access to find and enter data for payments on orders. The sample data comes from a custom shop that builds and sells movable storage structures that are delivered to your property. …
Loops Section Overview
Suggested Courses

877 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question