
  • Status: Solved

Fast file search for a text string in Visual Basic

I am looking for code that can quickly open a large text file full of email addresses (say, 200,000 lines) and check whether a given email address exists in it. I know I could write code to look through it line by line, but I am wondering if there is a different way of doing it with a Windows API call or something similar. I also know I could load the email addresses into a database, index them, and search that way. So I am looking for a different solution if possible.
1 Solution
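Create a fixed-size string buffer like this:
Dim buffer As String
buffer = Space(4096)       ' 4 KB per read

Then open the file in Binary mode (as file #1 here) and read into the buffer:

Get #1, , buffer
Each call reads 4096 bytes from the file into memory. Put it in a loop until the file ends.

Then search each 4 KB block for the string you are looking for:

Instr(1, buffer, "test@test.com")

You can use larger blocks, such as 100 KB; it's up to you. Block reads like this are usually much faster than reading line by line. Keep in mind that an address can be split across two blocks, so a plain per-block search can miss it unless you carry the tail of the previous block into the next search.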
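The block-reading idea above can be sketched as a complete function. This is only a minimal sketch under some assumptions: the name FileContainsText and its parameters are hypothetical (they are not from the original answer), the file is assumed to be single-byte ANSI text, and the path is assumed to point at an existing file. It also carries the last Len(Target) - 1 characters of each block into the next search, so an address split across two blocks is still found.

```vb
Option Explicit

' Hypothetical helper: searches a text file for Target by reading
' fixed-size blocks with Get in Binary mode.
' Assumes single-byte ANSI text and an existing file at Path.
Public Function FileContainsText(ByVal Path As String, ByVal Target As String, _
                                 Optional ByVal BlockSize As Long = 4096) As Boolean
  Dim f As Integer
  Dim remaining As Long
  Dim chunk As String
  Dim carry As String

  ' Nothing sensible to search for, or an unusable block size.
  If Len(Target) = 0 Or BlockSize < 1 Then Exit Function

  f = FreeFile
  Open Path For Binary Access Read As #f
  remaining = LOF(f)

  Do While remaining > 0
    ' Size the buffer to what is left so the last block is not padded.
    If remaining < BlockSize Then BlockSize = remaining
    chunk = Space$(BlockSize)
    Get #f, , chunk
    remaining = remaining - BlockSize

    ' Prepend the tail of the previous block so a match that
    ' straddles two blocks is still detected.
    If InStr(1, carry & chunk, Target, vbBinaryCompare) > 0 Then
      FileContainsText = True
      Exit Do
    End If
    carry = Right$(carry & chunk, Len(Target) - 1)
  Loop

  Close #f
End Function
```

A call such as FileContainsText("d:\emails.txt", "test@test.com", 65536) would use 64 KB blocks. Note that this is a substring search, so "test@test.com" also matches a line containing "mytest@test.com"; matching whole lines only would need the line breaks included in the comparison.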
I agree with CSecurity's comment.

Make sure the email addresses in the text file are all lower case, so the search can use a plain binary comparison without the extra overhead of case-insensitive matching.
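As a rough illustration of that point, the search term can be lower-cased once and then compared with vbBinaryCompare, instead of asking InStr for vbTextCompare (which has to fold case during the comparison and is typically slower). The helper name BlockHasEmail is hypothetical, and the sketch assumes the block came from a file that is already stored in lower case.

```vb
Option Explicit

' Hypothetical helper: assumes Block was read from a file whose
' contents are already lower case.
Public Function BlockHasEmail(ByVal Block As String, ByVal Email As String) As Boolean
  ' An empty search term would otherwise count as a match.
  If Len(Email) = 0 Then Exit Function

  ' Fold only the search term; the block itself is never converted.
  BlockHasEmail = (InStr(1, Block, LCase$(Email), vbBinaryCompare) > 0)
End Function
```

The cost of LCase$ is paid once per search term rather than once per block, which is why the file contents are normalized ahead of time.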

I'm not sure how a database would perform compared to a direct file read. You might benchmark both and pick whichever is faster. I would expect direct file reading to be faster, but I can't be sure.

I calculated that about 200,000 lines at a maximum email length of 320 bytes comes to 200,000 x 320 = 64,000,000 bytes, so your file shouldn't be any larger than roughly 64 MB. The actual size will vary, most likely somewhere between 1 KB and 64 MB, which can be processed very quickly. If the email address is near the beginning of the file the search will finish sooner; if it's towards the end it will take longer.
Option Explicit
Private Const BUFF_SIZE As Long = 65536 '64kb
Private Const INVALID_HANDLE_VALUE As Long = (-1)
Private Const GENERIC_READ As Long = &H80000000
Private Const OPEN_EXISTING As Long = &H3&
Private Declare Function CreateFileW Lib "kernel32" (ByVal lpFileName As Long, ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long
Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long
Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, ByVal lpBuffer As Long, ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, ByVal lpOverlapped As Long) As Long
Public Function EmailExists( _
  ByVal szFile As String, _
  ByVal szEmail As String) As Boolean
  Dim Buffer()  As Byte ' raw buffer
  Dim hFile     As Long ' file handle
  Dim dwBytes   As Long ' The read bytes.
  EmailExists = False
  szEmail = StrConv(szEmail, vbFromUnicode)
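  hFile = CreateFileW(StrPtr("\\?\" & szFile), GENERIC_READ, 0, 0, OPEN_EXISTING, 0, 0)
  If hFile = INVALID_HANDLE_VALUE Then
    Debug.Print "Error= " & Err.LastDllError
    Exit Function ' _leave
  End If
  ReDim Buffer(BUFF_SIZE - 1) As Byte
  ' Note: each 64 KB read is searched on its own, so a match that
  ' straddles two reads is not detected by this loop.
  Do
    If ReadFile(hFile, VarPtr(Buffer(0)), BUFF_SIZE, dwBytes, 0) Then
      If InStrB(1, Buffer, szEmail) Then
        EmailExists = True
        Exit Do ' _leave
      End If
    Else
      ' ReadFile failed; stop reading.
      Exit Do ' _leave
    End If
  Loop Until dwBytes = 0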
  CloseHandle hFile
  Erase Buffer
End Function
Private Sub Command1_Click()
  Debug.Print EmailExists("d:\emails.txt", "3660208E02@ee.com")
End Sub
