vb.net search id from text file

Posted on 2013-09-26
Medium Priority
Last Modified: 2013-09-26
I have a CSV file and I am using this function to search it line by line for a specific ID. The file contains about 2 million lines and it takes a few seconds for the function to find the ID. Is there a way to speed things up?

    Function Search_inFile(ByVal sFind As String, ByVal strFile As String) As Boolean
        If System.IO.File.Exists(strFile) Then
            Using reader As New System.IO.StreamReader(strFile, True)
                While Not reader.EndOfStream
                    If reader.ReadLine().Contains(sFind) Then
                        Return True
                    End If
                End While
            End Using
        End If
        Return False
    End Function


Question by:XK8ER

Expert Comment

ID: 39524239

You could "Exit While" from the loop as soon as sFind is found.


Author Comment

ID: 39524253
Return True already does that.

Expert Comment

by:Robert Schutt
ID: 39524417
It's a pretty general description of your situation (a small but representative sample of the file contents would be useful). Here are some thoughts:

- you might be able to read bigger chunks of the file into memory to speed it up, but then you may need to check for newlines yourself, changing the bottleneck from IO to memory access/processing power

- your check using .Contains() seems tricky to me: it could generate false positives unless the IDs are very specific text strings (and even then, a line could contain a reference to another ID in another field)

- it could be possible to use a binary search if the CSV file is sorted by id

- a simple but often overlooked one: if the CSV file is exported from another source, sorting it descending can make your original routine sufficient if the data most often looked for are the newest id's

- if id's are numeric, generating an index file for the CSV (binary file containing id and original file position, sorted by id) could prove useful, like an index on an SQL table, also making binary search definitely possible, or maybe even storage in memory

- storage in memory may still be possible for alphanumerical id's using a hash table but only useful when this is a program that stays active and needs to look up lots of id's during its lifetime, like for example a (web) service

- even more general advice could be: move the file to a RAM disk or SSD, or make sure OS caching (read prefetch) is used for the HD containing the file

- one you probably don't want to hear: put your data in a database ;-)
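
To illustrate the hash table idea, here's a minimal sketch. It assumes the ID is the first comma-separated field of each line; the module and method names are just for illustration:

```vbnet
Imports System.IO

Module IdLookup
    ' Built once at startup; each lookup afterwards is O(1).
    Private ReadOnly Ids As New HashSet(Of String)

    Sub LoadIds(ByVal strFile As String)
        For Each line As String In File.ReadLines(strFile)
            ' Assumes the ID is the first comma-separated field.
            Ids.Add(line.Split(","c)(0))
        Next
    End Sub

    Function Search_inMemory(ByVal sFind As String) As Boolean
        Return Ids.Contains(sFind)
    End Function
End Module
```

Loading 2 million IDs takes a few seconds once, but as said, this only pays off if the program stays resident and performs many lookups.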


Author Comment

ID: 39524465
I am willing to do anything as long as it makes things fast. It's taking about 5 seconds to find one record.

Which of the approaches you mentioned do you recommend for the fastest performance?

Accepted Solution

Robert Schutt earned 2000 total points
ID: 39524573
It depends on your application flow: is this function called once, or many times, as in a service? If it's a service, you could read all IDs into memory (a hash table); that would probably be the fastest in that case.

But it also depends on your data: are there only IDs in the file? Probably not, since you used the term CSV. Are the IDs numeric? Probably not, since you currently use .Contains(), which would return false positives for simple numbers. Is it possible to sort the file?

Very generally speaking, a database is probably the fastest by far if you're talking about a lot of data. A typical query in a situation like this would take milliseconds, not seconds.

The next best thing is probably looking into sorting the file (how is it generated/maintained now?) and implementing a binary search, if that's possible. Response time should be a couple of hundred milliseconds at most.
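
A rough sketch of what such a binary search over the file could look like. It assumes the file is sorted ordinally on its first field (the ID), in single-byte-compatible text (ASCII); this is a sketch, not production code:

```vbnet
Imports System.IO

Module SortedFileSearch
    ' Binary search over a CSV file sorted on its first field (the ID).
    ' Seeks to the middle byte, discards the partial line, reads the next
    ' full line and compares its ID to steer the search.
    Function Search_Sorted(ByVal sFind As String, ByVal strFile As String) As Boolean
        Using fs As New FileStream(strFile, FileMode.Open, FileAccess.Read)
            Dim lo As Long = 0, hi As Long = fs.Length
            While lo < hi
                Dim mid As Long = (lo + hi) \ 2
                fs.Seek(mid, SeekOrigin.Begin)
                ' Fresh StreamReader each pass so no stale buffered data
                ' is reused after the Seek (fine for a sketch).
                Dim reader As New StreamReader(fs)
                If mid > 0 Then reader.ReadLine() ' skip the partial line
                Dim line As String = reader.ReadLine()
                If line Is Nothing Then
                    hi = mid
                    Continue While
                End If
                Dim id As String = line.Split(","c)(0)
                Dim cmp As Integer = String.CompareOrdinal(sFind, id)
                If cmp = 0 Then Return True
                If cmp < 0 Then hi = mid Else lo = mid + 1
            End While
        End Using
        Return False
    End Function
End Module
```

Each probe halves the search range, so 2 million lines need only about 21 probes instead of a full scan.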

The last option for me, if sorting can't be done and you already have caching in place but it doesn't help, would be reading bigger chunks of the file. That should be the easiest to implement, but it still means reading through the entire file in the worst case. A very rough estimate (wet finger in the air) is that it should get the response time under a second.
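
A sketch of the chunked variant, again assuming the ID is the first field and is followed by a comma. Searching for newline + ID + comma also sidesteps the false positives a bare .Contains() can give:

```vbnet
Imports System.IO

Module ChunkSearch
    ' Scan the file in 1 MB blocks instead of line by line: same worst
    ' case, but far fewer read calls.
    Function Search_Chunked(ByVal sFind As String, ByVal strFile As String) As Boolean
        ' Match only at start-of-line, first field (works for LF and CRLF).
        Dim needle As String = vbLf & sFind & ","
        Using reader As New StreamReader(strFile)
            Dim buffer(1048575) As Char
            Dim carry As String = vbLf ' pretend the file starts after a newline
            Do
                Dim count As Integer = reader.Read(buffer, 0, buffer.Length)
                If count = 0 Then Exit Do
                Dim chunk As String = carry & New String(buffer, 0, count)
                If chunk.Contains(needle) Then Return True
                ' keep a tail in case the needle straddles a block boundary
                carry = chunk.Substring(Math.Max(0, chunk.Length - needle.Length + 1))
            Loop
        End Using
        Return False
    End Function
End Module
```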

Author Comment

ID: 39526479
>>Next best thing probably looking into the possibility to sort the file
>>and implement a binary search if it's possible

How exactly do I implement that method?

Author Closing Comment

ID: 39526544
Perfect, I got it. Thanks so much!
