VB.NET: search for an ID in a text file

I have a CSV file and I am using the function below to search it line by line for a specific ID. The file contains about 2 million lines, and it takes a few seconds for the function to find the ID. Is there a way to speed things up?

    Function Search_inFile(ByVal sFind As String, ByVal strFile As String) As Boolean
        If System.IO.File.Exists(strFile) Then
            Using reader As New System.IO.StreamReader(strFile, True)
                While Not reader.EndOfStream
                    If reader.ReadLine().Contains(sFind) Then
                        Return True
                    End If
                End While
            End Using
        End If
        Return False
    End Function




You could "Exit While" from the function as soon as sFind is found.

XK8ERAuthor Commented:
Return True already does that.
Robert SchuttSoftware EngineerCommented:
It's a pretty general description of your situation (a small but representative sample of the file contents would be useful); here are some thoughts:

- You might be able to read bigger chunks of the file into memory to speed things up, but then you may need to check for newlines yourself, shifting the bottleneck from I/O to memory access/processing power.

- Your check using .Contains() seems risky to me: it can generate false positives unless the IDs are very specific text strings (and even then a line could contain a reference to another ID in another field).

- A binary search would be possible if the CSV file is sorted by ID.

- A simple but often overlooked option: if the CSV file is exported from another source, sorting it descending can make your original routine sufficient, provided the IDs looked up most often are the newest ones.

- If the IDs are numeric, generating an index file for the CSV (a binary file containing each ID and its original file position, sorted by ID) could prove useful, like an index on an SQL table; that would definitely make a binary search possible, or even storage in memory.

- Storage in memory may still be possible for alphanumeric IDs using a hash table, but that is only useful when the program stays active and needs to look up lots of IDs during its lifetime, like a (web) service.

- Even more general advice: move the file to a RAM disk or SSD, or make sure OS caching (read prefetch) is used for the drive containing the file.

- And one you probably don't want to hear: put your data in a database ;-)
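On the .Contains() point above, a minimal sketch of an exact comparison instead. The assumption that the ID is the first comma-separated field is mine; adjust the field index to your actual layout:

```vbnet
' Sketch: compare the ID field exactly instead of using Contains(),
' which also matches substrings and IDs appearing in other fields.
' Assumes (my assumption) the ID is the first comma-separated field.
Function Search_inFile_Exact(ByVal sFind As String, ByVal strFile As String) As Boolean
    If Not System.IO.File.Exists(strFile) Then Return False
    Using reader As New System.IO.StreamReader(strFile, True)
        While Not reader.EndOfStream
            Dim line As String = reader.ReadLine()
            Dim commaPos As Integer = line.IndexOf(","c)
            Dim id As String = If(commaPos >= 0, line.Substring(0, commaPos), line)
            If id = sFind Then Return True
        End While
    End Using
    Return False
End Function
```

This is no faster than the original, but it won't report a hit when the ID merely appears as a substring of a longer value.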

XK8ERAuthor Commented:
I am willing to do anything as long as it makes things fast. It's taking about 5 seconds to find one record.

Which of the approaches you mentioned do you recommend for the fastest performance?
Robert SchuttSoftware EngineerCommented:
It depends on your application flow: is this function called once, or many times as in a service? If it's a service, reading all IDs into memory (a hash table) is a possibility, and that would probably be the fastest in that case.
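A minimal sketch of that in-memory hash-table idea, for a long-lived process. The field layout (ID as the first comma-separated field) is my assumption:

```vbnet
' Sketch: load every ID into a HashSet once at startup; each lookup is then O(1).
' Only worth it for a long-lived process (e.g. a service) doing many lookups.
' Assumes (my assumption) the ID is the first comma-separated field.
Private idSet As HashSet(Of String)

Sub LoadIds(ByVal strFile As String)
    idSet = New HashSet(Of String)()
    For Each line As String In System.IO.File.ReadLines(strFile)
        Dim commaPos As Integer = line.IndexOf(","c)
        idSet.Add(If(commaPos >= 0, line.Substring(0, commaPos), line))
    Next
End Sub

Function Search_inMemory(ByVal sFind As String) As Boolean
    Return idSet.Contains(sFind)
End Function
```

Note the trade-off: LoadIds still reads the whole file once (a few seconds), but every lookup afterwards is effectively instant.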

But it also depends on your data: are there only IDs in the file? Probably not, since you used the term CSV. Are the IDs numeric? Probably not, since you currently use .Contains(), which would return false positives for simple numbers. Is sorting a possibility?

Very generally speaking, a database is probably the fastest by far if you're talking about a lot of data. A typical query in a situation like this would take milliseconds, not seconds.

The next best thing is probably looking into sorting the file (how is it generated/maintained now?) and implementing a binary search if that's possible. Response time should be a few hundred milliseconds at most.
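A sketch of a binary search directly on the file. The assumptions are mine: the file is sorted ascending on the first comma-separated field in ordinal string order, encoded as ASCII/UTF-8 without a BOM, with LF or CRLF line endings:

```vbnet
' Sketch: binary search over a sorted text file by seeking to the middle,
' aligning to the next line start, and comparing the ID field there.
Function Search_Sorted(ByVal sFind As String, ByVal strFile As String) As Boolean
    Using fs As New System.IO.FileStream(strFile, System.IO.FileMode.Open, System.IO.FileAccess.Read)
        Dim lo As Long = 0          ' always points at the start of a line
        Dim hi As Long = fs.Length
        While True
            Dim mid As Long = lo + (hi - lo) \ 2
            fs.Position = mid
            If mid > lo Then SkipPastNewline(fs) ' align to the next line start
            Dim lineStart As Long = fs.Position
            If lineStart >= hi Then
                ' No line starts between mid and hi: scan the last few lines linearly.
                fs.Position = lo
                While fs.Position < hi
                    If IdOf(ReadOneLine(fs)) = sFind Then Return True
                End While
                Return False
            End If
            Dim cmp As Integer = String.CompareOrdinal(sFind, IdOf(ReadOneLine(fs)))
            If cmp = 0 Then Return True
            If cmp < 0 Then hi = lineStart Else lo = fs.Position
        End While
    End Using
End Function

Private Sub SkipPastNewline(ByVal fs As System.IO.Stream)
    Dim b As Integer = fs.ReadByte()
    While b <> -1 AndAlso b <> 10 ' 10 = LF
        b = fs.ReadByte()
    End While
End Sub

Private Function ReadOneLine(ByVal fs As System.IO.Stream) As String
    Dim sb As New System.Text.StringBuilder()
    Dim b As Integer = fs.ReadByte()
    While b <> -1 AndAlso b <> 10
        If b <> 13 Then sb.Append(ChrW(b)) ' 13 = CR, dropped
        b = fs.ReadByte()
    End While
    Return sb.ToString()
End Function

Private Function IdOf(ByVal line As String) As String
    Dim i As Integer = line.IndexOf(","c)
    Return If(i >= 0, line.Substring(0, i), line)
End Function
```

Each iteration halves the byte range [lo, hi), so a 2-million-line file needs only around 20 seeks instead of a full scan.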

The last option for me, if sorting can't be done and caching is already in place but doesn't help, would be reading bigger chunks of the file. It should be the easiest to implement, but it still means reading through the entire file in the worst case. A very rough estimate (wet finger in the air): it should be possible to get the response time under a second.
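A minimal sketch of that chunked approach. Again, "the ID is the first field" is my assumption, which lets the search look for start-of-line + ID + comma:

```vbnet
' Sketch: read the file in large blocks and scan the block for the ID,
' cutting per-line overhead. Still O(n) over the file in the worst case.
' Assumes (my assumption) the ID is the first comma-separated field.
Function Search_Chunked(ByVal sFind As String, ByVal strFile As String) As Boolean
    Dim needle As String = vbLf & sFind & "," ' matches only at a line start
    Using reader As New System.IO.StreamReader(strFile, True)
        Dim buffer(4 * 1024 * 1024 - 1) As Char ' 4 MB chunks
        Dim carry As String = vbLf ' so a match on the very first line works
        Do
            Dim count As Integer = reader.ReadBlock(buffer, 0, buffer.Length)
            If count = 0 Then Exit Do
            Dim chunk As String = carry & New String(buffer, 0, count)
            If chunk.Contains(needle) Then Return True
            ' keep a tail long enough to catch a needle split across chunks
            carry = chunk.Substring(Math.Max(0, chunk.Length - needle.Length))
        Loop
    End Using
    Return False
End Function
```

The carried-over tail handles the case where the ID straddles a chunk boundary; with CRLF line endings the `vbLf`-prefixed needle still matches, since LF immediately precedes the next line.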

XK8ERAuthor Commented:
>>Next best thing probably looking into the possibility to sort the file
>>and implement a binary search if it's possible

how exactly do I do this method?
XK8ERAuthor Commented:
Perfect, I got it. Thanks so much!