Solved

What way is faster to check for a substring within a null-terminated string...

Posted on 2014-10-06
21
171 Views
Last Modified: 2014-11-17
Hi guys... assuming I have an array of AnsiChar called "buffer" that contains some characters, and I want to find out if a certain substring "substr" exists within there, what would be the fastest/most efficient way to do it?

- do a "SearchBuf" command on the array of Char looking for "substr"
- or convert both "buffer" and "substr" to PChar, and do a  AnsiStrPos command?
- or possibly some other way...?

Thanks!
    Shawn
0
Comment
Question by:shawn857
  • 9
  • 4
  • 4
  • +2
21 Comments
 
LVL 19

Expert Comment

by:Thommy
Comment Utility
Regarding the Delphi impementation of the different search functions SearchBuf is slower than Pos/PosEx.

But also take into account, that SearchBuf has options like start offset, forward/backward search, whole word search and case sensitive search, which could make it faster compared to Pos/PosEx with additional code.

Anyway, before making a decision you should try it out in your application with real data...
0
 
LVL 19

Accepted Solution

by:
MerijnB earned 500 total points
Comment Utility
Hi Shawn,

as long as you are sure you don't need unicode, QStrings is still a very fast option. The code is very old, so you need to find replace stuff like:

string -> ansistring
char -> ansichar
pchar -> pansichar

I use it in XE7 and it works fine. Comments are unreadable, but function names tell a lot.
0
 

Author Comment

by:shawn857
Comment Utility
Thanks guys. Merijn - so if I change string to ansistring, etc within QStrings, then it will work for Unicode?

Thanks
   Shawn
0
 

Author Comment

by:shawn857
Comment Utility
Hmmm, I think should give you guys the "bigger picture" as to what I'm trying to do overall. Basically I have a large file of records (this could be gigabytes big), each record terminated by a CR/LF. I also have a file of search strings or "search criteria" that I need to check if any exist in each record of my file. These search criteria will usually be short text strings - up to 50 or so chars max. I could have a few hundred or more of these search criteria strings. I need to find the fastest way to compare each record of my disk file to all of these individual search criteria... to see if they exist within the record or not. Googling around, I saw some algorithms such as Aho-Corasick and Rabin-Karp... and I'm wondering if a pre-made Delphi unit exists that would implement one of these superfast algorithms.

Should I be asking this as a separate question...?

Thanks!
    Shawn
0
 
LVL 19

Expert Comment

by:MerijnB
Comment Utility
so if I change string to ansistring, etc within QStrings, then it will work for Unicode?

No, it will never work for unicode, but if you do this you can use it for ansi strings in a unicode aware Delphi
0
 
LVL 36

Expert Comment

by:Geert Gruwez
Comment Utility
ugh ... database ?
insert the text in a database and then run an sql

i thought that's what databases were for ... :)
0
 

Author Comment

by:shawn857
Comment Utility
Does anyone know of a Delphi implementation or unit of a very fast/efficient way to scan for multiple sub-strings within a string?

Thanks
   Shawn
0
 

Author Comment

by:shawn857
Comment Utility
Anybody still with me?

Cheers
    Shawn
0
 
LVL 36

Expert Comment

by:Geert Gruwez
Comment Utility
just an idea

for very large texts
you could use gabr's omnithreadlibrary to start searching in several places at the same time
each thread searches a portion of text (including an overlap of x = search text length)
and returns info found in a message queue

the otl can keep threads live in background so it doesn't have to create/destroy them each time
it also has a non-blocking message queue

should work the same for search text in file
you can open a file and move to a certain place within each thread providing it gets the starting search point
0
 
LVL 7

Expert Comment

by:kfoster11
Comment Utility
"I have a large file of records (this could be gigabytes big), each record terminated by a CR/LF"

Are you trying to search this large file of records on disk, or is it loaded into memory?
0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 

Author Comment

by:shawn857
Comment Utility
Thanks guys...

Geert - I am still on D7, appears OmniThreadLibrary needs Delphi 2007 at least.

Kfoster - I am reading from disk always... I never load into memory. And I must search *each* line (or record) for all the substrings.

Thanks!
    Shawn
0
 
LVL 7

Expert Comment

by:kfoster11
Comment Utility
So basically, you need to search gigabytes of strings for a given string on disk.  Personally, I would first get this working, and then try and put a memory cache in the middle.

While it is no longer being maintained, AsyncCalls is an open source Delphi 7 compatible library that manages a thread pool.  While I have  not tried to use it, I believe the following article on delphi.about.com is doing basically what you are trying to do.

http://delphi.about.com/od/kbthread/a/delphi-thread-pool-example-using-asynccalls.htm

Also, just because 10 threads speeds up the process, does not mean that 10,000 will speed it up 1000 times more.  The following blog explains thread optimization
http://blogs.technet.com/b/markrussinovich/archive/2009/07/08/3261309.aspx

Good Luck.
0
 

Author Comment

by:shawn857
Comment Utility
Thanks kFoster... I will have a look at that AsyncCalls. I'm pretty new to threading though.. it might be over my head.

You also mentioned about "putting a memory cache in the middle". Could you explain a bit what that is exactly?

Thanks!
   Shawn
0
 
LVL 7

Expert Comment

by:kfoster11
Comment Utility
A memory cache is used to put pieces of the file into memory as you are reading it.  When you do a search, your routines would need to first check to see if the file has changed since the data was stored (Cached) into memory.  If it was, then the cache is dumped and not used, if it hasn't changed then you search through the memory cache instead of reading the disk.  Disk reads are time expensive, where memory is not.

If you are reading a file that is changing a lot, then the cache is probably not useful, but if it doesn't change much then it will greatly improve your performance.  

If it is new, and you haven't done threading then it may be more than you want to tackle.  Also, with today's operating systems and smart drives a lot of this is cached anyway.
0
 

Author Comment

by:shawn857
Comment Utility
Thanks kfoster. My file's data does not change - it's just a static "flat file".. but it could be very large (many gigs) so it's not practical to hold it all in memory at once. I do already read in a sizable chunk of this file at a time into a buffer using TFileStream (65000+ bytes for each read), and then parse each block line-by-line (looking for CR/LF's to detect the end of each line). This would kinda be like the "memory cache" approach you mention, no?

Thanks
    Shawn
0
 
LVL 36

Expert Comment

by:Geert Gruwez
Comment Utility
depending on what you want to give your users
you could go for a text solution like this:
http://www.oracle.com/technetwork/database/12coracletexttwp-1961244.pdf
0
 

Author Comment

by:shawn857
Comment Utility
Thanks Geert, but that would be a whole other can of worms, it seems. Maybe a bit "overkill".

Thanks
   Shawn
0
 
LVL 7

Expert Comment

by:kfoster11
Comment Utility
Why not just import the text file into a table in SQL Express and let the RDMS do the work?

Select fieldname from tablename where fieldname like '%strtofind%'

No need to reinvent the wheel since your table is static.  SQL Express is free for databases up to 10GIGS
0
 
LVL 19

Expert Comment

by:MerijnB
Comment Utility
Is this correct? I haven't been participating in the discussion for about a month but do get all the points?
0
 

Author Comment

by:shawn857
Comment Utility
Yours was the best solution after all was said and done Merijn!

Thanks
   Shawn
0
 
LVL 19

Expert Comment

by:MerijnB
Comment Utility
ok, glad to hear it worked :)
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Introduction The parallel port is a very commonly known port, it was widely used to connect a printer to the PC, if you look at the back of your computer, for those who don't have newer computers, there will be a port with 25 pins and a small print…
Introduction I have seen many questions in this Delphi topic area where queries in threads are needed or suggested. I know bumped into a similar need. This article will address some of the concepts when dealing with a multithreaded delphi database…
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
You have products, that come in variants and want to set different prices for them? Watch this micro tutorial that describes how to configure prices for Magento super attributes. Assigning simple products to configurable: We assigned simple products…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now