What way is faster to check for a substring within a null-terminated string...

Hi guys... assuming I have an array of AnsiChar called "buffer" that contains some characters, and I want to find out if a certain substring "substr" exists within there, what would be the fastest/most efficient way to do it?

- do a "SearchBuf" command on the array of Char looking for "substr"
- or convert both "buffer" and "substr" to PChar, and do a  AnsiStrPos command?
- or possibly some other way...?

Thanks!
    Shawn
shawn857Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

ThommyCommented:
Regarding the Delphi impementation of the different search functions SearchBuf is slower than Pos/PosEx.

But also take into account, that SearchBuf has options like start offset, forward/backward search, whole word search and case sensitive search, which could make it faster compared to Pos/PosEx with additional code.

Anyway, before making a decision you should try it out in your application with real data...
0
MerijnBSr. Software EngineerCommented:
Hi Shawn,

as long as you are sure you don't need unicode, QStrings is still a very fast option. The code is very old, so you need to find replace stuff like:

string -> ansistring
char -> ansichar
pchar -> pansichar

I use it in XE7 and it works fine. Comments are unreadable, but function names tell a lot.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
shawn857Author Commented:
Thanks guys. Merijn - so if I change string to ansistring, etc within QStrings, then it will work for Unicode?

Thanks
   Shawn
0
Become a Certified Penetration Testing Engineer

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

shawn857Author Commented:
Hmmm, I think should give you guys the "bigger picture" as to what I'm trying to do overall. Basically I have a large file of records (this could be gigabytes big), each record terminated by a CR/LF. I also have a file of search strings or "search criteria" that I need to check if any exist in each record of my file. These search criteria will usually be short text strings - up to 50 or so chars max. I could have a few hundred or more of these search criteria strings. I need to find the fastest way to compare each record of my disk file to all of these individual search criteria... to see if they exist within the record or not. Googling around, I saw some algorithms such as Aho-Corasick and Rabin-Karp... and I'm wondering if a pre-made Delphi unit exists that would implement one of these superfast algorithms.

Should I be asking this as a separate question...?

Thanks!
    Shawn
0
MerijnBSr. Software EngineerCommented:
so if I change string to ansistring, etc within QStrings, then it will work for Unicode?

No, it will never work for unicode, but if you do this you can use it for ansi strings in a unicode aware Delphi
0
Geert GOracle dbaCommented:
ugh ... database ?
insert the text in a database and then run an sql

i thought that's what databases were for ... :)
0
shawn857Author Commented:
Does anyone know of a Delphi implementation or unit of a very fast/efficient way to scan for multiple sub-strings within a string?

Thanks
   Shawn
0
shawn857Author Commented:
Anybody still with me?

Cheers
    Shawn
0
Geert GOracle dbaCommented:
just an idea

for very large texts
you could use gabr's omnithreadlibrary to start searching in several places at the same time
each thread searches a portion of text (including an overlap of x = search text length)
and returns info found in a message queue

the otl can keep threads live in background so it doesn't have to create/destroy them each time
it also has a non-blocking message queue

should work the same for search text in file
you can open a file and move to a certain place within each thread providing it gets the starting search point
0
Kyle FosterCEOCommented:
"I have a large file of records (this could be gigabytes big), each record terminated by a CR/LF"

Are you trying to search this large file of records on disk, or is it loaded into memory?
0
shawn857Author Commented:
Thanks guys...

Geert - I am still on D7, appears OmniThreadLibrary needs Delphi 2007 at least.

Kfoster - I am reading from disk always... I never load into memory. And I must search *each* line (or record) for all the substrings.

Thanks!
    Shawn
0
Kyle FosterCEOCommented:
So basically, you need to search gigabytes of strings for a given string on disk.  Personally, I would first get this working, and then try and put a memory cache in the middle.

While it is no longer being maintained, AsyncCalls is an open source Delphi 7 compatible library that manages a thread pool.  While I have  not tried to use it, I believe the following article on delphi.about.com is doing basically what you are trying to do.

http://delphi.about.com/od/kbthread/a/delphi-thread-pool-example-using-asynccalls.htm

Also, just because 10 threads speeds up the process, does not mean that 10,000 will speed it up 1000 times more.  The following blog explains thread optimization
http://blogs.technet.com/b/markrussinovich/archive/2009/07/08/3261309.aspx

Good Luck.
0
shawn857Author Commented:
Thanks kFoster... I will have a look at that AsyncCalls. I'm pretty new to threading though.. it might be over my head.

You also mentioned about "putting a memory cache in the middle". Could you explain a bit what that is exactly?

Thanks!
   Shawn
0
Kyle FosterCEOCommented:
A memory cache is used to put pieces of the file into memory as you are reading it.  When you do a search, your routines would need to first check to see if the file has changed since the data was stored (Cached) into memory.  If it was, then the cache is dumped and not used, if it hasn't changed then you search through the memory cache instead of reading the disk.  Disk reads are time expensive, where memory is not.

If you are reading a file that is changing a lot, then the cache is probably not useful, but if it doesn't change much then it will greatly improve your performance.  

If it is new, and you haven't done threading then it may be more than you want to tackle.  Also, with today's operating systems and smart drives a lot of this is cached anyway.
0
shawn857Author Commented:
Thanks kfoster. My file's data does not change - it's just a static "flat file".. but it could be very large (many gigs) so it's not practical to hold it all in memory at once. I do already read in a sizable chunk of this file at a time into a buffer using TFileStream (65000+ bytes for each read), and then parse each block line-by-line (looking for CR/LF's to detect the end of each line). This would kinda be like the "memory cache" approach you mention, no?

Thanks
    Shawn
0
Geert GOracle dbaCommented:
depending on what you want to give your users
you could go for a text solution like this:
http://www.oracle.com/technetwork/database/12coracletexttwp-1961244.pdf
0
shawn857Author Commented:
Thanks Geert, but that would be a whole other can of worms, it seems. Maybe a bit "overkill".

Thanks
   Shawn
0
Kyle FosterCEOCommented:
Why not just import the text file into a table in SQL Express and let the RDMS do the work?

Select fieldname from tablename where fieldname like '%strtofind%'

No need to reinvent the wheel since your table is static.  SQL Express is free for databases up to 10GIGS
0
MerijnBSr. Software EngineerCommented:
Is this correct? I haven't been participating in the discussion for about a month but do get all the points?
0
shawn857Author Commented:
Yours was the best solution after all was said and done Merijn!

Thanks
   Shawn
0
MerijnBSr. Software EngineerCommented:
ok, glad to hear it worked :)
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Delphi

From novice to tech pro — start learning today.