Fast file search for a text string in Visual Basic

Posted on 2009-12-16
Last Modified: 2012-05-08
I am looking for code that can quickly open a large text file full of email addresses (say, 200,000 lines) and check whether a given email address exists in it. I know I could write code to look through it line by line, but I am wondering if there is a different way of doing it, with a Windows API call or something similar. I also know I could load the email addresses into a database, index them, and search that way. So I am looking for a different solution if possible.
Question by:onemorecoke

    Expert Comment

    Create a fixed-size string buffer like this:
    Dim buffer As String
    buffer = Space(4096)       ' <-- 4 KB per read

    Then, with the file opened For Binary as #1, do this:

    Get #1, , buffer
    Each call reads 4,096 bytes from the file into the buffer. Put it in a loop until the end of the file.

    After each read, search the 4 KB block for the address you are looking for:

    InStr(1, buffer, "")

    You can use larger blocks, such as 100 KB; it's up to you. Reading the file in blocks like this is a very fast way to do it.
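
    The steps above can be put together roughly as follows. This is only a sketch under some assumptions: the file is single-byte (ANSI) text smaller than 2 GB, and the name FileContains, the 64 KB chunk size, and the carried-over tail are illustrative choices that do not come from the comment itself. The tail keeps the last few characters of each block so that an address split across two reads is still found.

```vb
Option Explicit

' Hypothetical helper: True if sFind occurs anywhere in the file at sPath.
' Assumes single-byte (ANSI) text and a file smaller than 2 GB.
Public Function FileContains(ByVal sPath As String, ByVal sFind As String) As Boolean
    Const CHUNK As Long = 65536     ' bytes per read; an arbitrary choice
    Dim f As Integer                ' VB file number
    Dim sBuf As String              ' current block
    Dim sTail As String             ' end of the previous block
    Dim lLeft As Long               ' bytes still unread
    Dim lRead As Long               ' size of the next read

    If Len(sFind) = 0 Then Exit Function   ' nothing to search for

    f = FreeFile
    Open sPath For Binary Access Read As #f
    lLeft = LOF(f)

    Do While lLeft > 0
        If lLeft < CHUNK Then lRead = lLeft Else lRead = CHUNK
        sBuf = Space$(lRead)        ' Get fills exactly Len(sBuf) characters
        Get #f, , sBuf
        lLeft = lLeft - lRead

        ' Search the previous tail plus the new block, so a match that
        ' straddles the boundary between two reads is not missed.
        If InStr(1, sTail & sBuf, sFind, vbBinaryCompare) > 0 Then
            FileContains = True
            Exit Do
        End If

        ' Keep the last Len(sFind) - 1 characters for the next pass.
        sTail = Right$(sTail & sBuf, Len(sFind) - 1)
    Loop

    Close #f
End Function
```

    Usage would look like Debug.Print FileContains("d:\emails.txt", "someone@example.com"), where the address is a placeholder. Note that this is a substring search: looking for "bob@example.com" also matches a line containing "jbob@example.com", so include the line break in the search term if whole-line matches matter.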

    Accepted Solution

    I agree with CSecurity's comment.

    Make sure the email addresses in the text file are all lower case, and lower-case the address you search for as well, so that a plain binary comparison works and the search needs no extra case-insensitive overhead.
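
    As a small illustration of the lower-case point, the search term can be normalised once before the file is scanned. The helper names NormalizeEmail and ContainsNoCase are hypothetical; the second function shows the slower case-insensitive alternative for a file that cannot be normalised in advance.

```vb
Option Explicit

' Hypothetical helper: trim and lower-case an address once, up front,
' so the file scan itself can use a fast binary comparison.
Public Function NormalizeEmail(ByVal sEmail As String) As String
    NormalizeEmail = LCase$(Trim$(sEmail))
End Function

' Alternative when the file is in mixed case: a text-mode InStr ignores
' case, but it does more work per call than a binary search.
Public Function ContainsNoCase(ByVal sText As String, ByVal sFind As String) As Boolean
    ContainsNoCase = (InStr(1, sText, sFind, vbTextCompare) > 0)
End Function
```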

    I'm not sure about the performance a database can offer compared to a direct file read. You might consider trying to compare the speeds and decide which is best. I would assume direct file reading is faster but I can't be sure.
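
    One way to compare speeds is to time each approach over the same file. The sketch below times the simple line-by-line baseline mentioned in the question; LineScan and TimeLineScan are hypothetical names, and the same timing loop could be wrapped around any other lookup routine. It assumes lRuns is greater than zero and that the run does not cross midnight, because Timer restarts from zero then.

```vb
Option Explicit

' Baseline from the question: read the file line by line and compare
' each line with the address being looked for (exact, case-sensitive).
Public Function LineScan(ByVal sPath As String, ByVal sEmail As String) As Boolean
    Dim f As Integer
    Dim sLine As String

    f = FreeFile
    Open sPath For Input As #f
    Do While Not EOF(f)
        Line Input #f, sLine
        If sLine = sEmail Then
            LineScan = True
            Exit Do
        End If
    Loop
    Close #f
End Function

' Hypothetical harness: average seconds per lookup over lRuns runs.
Public Function TimeLineScan(ByVal sPath As String, ByVal sEmail As String, _
                             ByVal lRuns As Long) As Double
    Dim i As Long
    Dim dStart As Double

    dStart = Timer                  ' seconds since midnight
    For i = 1 To lRuns
        LineScan sPath, sEmail
    Next i
    TimeLineScan = (Timer - dStart) / lRuns
End Function
```

    Repeated runs over the same file are usually served from the operating system's file cache, so the first run is the one closest to a cold lookup.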

    I calculated that with about 200,000 lines at a maximum email length of 320 bytes, your file shouldn't be any larger than about 64 MB (200,000 x 320 = 64,000,000 bytes). The actual size will vary, most likely somewhere between 1 KB and 64 MB, which should be processed very quickly. If the email address is near the beginning of the file the search finishes sooner; if it is towards the end, or not present at all, it takes longer because the whole file has to be read.
    Option Explicit
    Private Const BUFF_SIZE As Long = 65536 '64kb
    Private Const INVALID_HANDLE_VALUE As Long = (-1)
    Private Const GENERIC_READ As Long = &H80000000
    Private Const OPEN_EXISTING As Long = &H3&
    Private Declare Function CreateFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long
    Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
    Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, ByVal lpBuffer As Long, ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, ByVal lpOverlapped As Long) As Long
    Public Function EmailExists( _
      ByVal szFile As String, _
      ByVal szEmail As String) As Boolean
      Dim Buffer()  As Byte ' raw buffer
      Dim hFile     As Long ' file handle
      Dim dwBytes   As Long ' The read bytes.
      EmailExists = False
      szEmail = StrConv(szEmail, vbFromUnicode)
      hFile = CreateFileW(StrPtr("\\?\" & szFile), GENERIC_READ, 0, 0, OPEN_EXISTING, 0, 0)
      If hFile = INVALID_HANDLE_VALUE Then
        Debug.Print "Error= " & Err.LastDllError
        Exit Function ' _leave
      End If
      ReDim Buffer(BUFF_SIZE - 1) As Byte
      Do
        If ReadFile(hFile, VarPtr(Buffer(0)), BUFF_SIZE, dwBytes, 0) Then
          ' Note: a match that straddles two 64 KB chunks is not detected.
          If InStrB(1, Buffer, szEmail) Then
            EmailExists = True
            Exit Do ' _leave
          End If
        Else
          Exit Do ' _leave (read failed)
        End If
      Loop Until dwBytes = 0
      CloseHandle hFile
      Erase Buffer
    End Function
    Private Sub Command1_Click()
      Debug.Print EmailExists("d:\emails.txt", "")
    End Sub
