05.15.2004 at 11:36AM PDT, ID: 20990596
Searching a big file

Asked by happispider in Visual Basic Programming

Tags: search, large, file

I'm making a program which requires me to extract URLs, or just search for things in files of various sizes.
http://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/Q_20978383.html
That question helped me out a bit, and I'm looking into the Dim buff() As Byte (byte array) part of it.

Summary...  How do I search a large file?  The file is read in separate blocks and InStr searches only 1 block at a time, so InStr will miss the search string whenever a block ends before the search string does.  For example, search for "href":  "hr" is at the end of block 1 and "ef" is at the start of block 2.  How am I to find "href"?  There are more blocks than these 2, so I think I'd have to search the entire file as 1 single string if I want to use InStr.  Is there an alternative?  I want to search/extract from larger files.

'buffer$ = String(1000000000, " ") 'Returns "out of string space" error!
'buffer$ = String(100000000, " ") 'Jams VB for 1.5 mins!
buffer$ = String(32767, " ") 'Works, except for future problems
Get #1, , buffer$
If EOF(fileNumber) = False Then Get #1, , sBuffer$

Allocating 1GB returns an error, and allocating 100MB makes VB hang for about 1.5 minutes.  What I originally tried were the several things I mention at the bottom of this message.  Copying everything from a file into 1 string seems to me either not to work at all for large files or to go too slowly.
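
One common way around the block-boundary problem described above is to keep the last Len(search string) - 1 characters of each block and put them in front of the next block before calling InStr, so a match split across two blocks is still seen in one piece. The following is only a minimal sketch of that idea, not a tested solution: FindInFile, CHUNK_SIZE, filePath and needle are hypothetical names made up for the illustration, and it assumes single-byte (ANSI) text so that string positions line up with byte offsets in the file.

```vb
' Hypothetical helper (not from the original post): returns the 1-based
' byte offset of the first occurrence of needle in the file, or 0 if absent.
' Assumes single-byte (ANSI) text.
Public Function FindInFile(ByVal filePath As String, ByVal needle As String) As Long
    Const CHUNK_SIZE As Long = 32767    ' block size used in the question
    Dim f As Integer
    Dim total As Long, pos As Long, n As Long, hit As Long
    Dim buf As String, carry As String, hay As String

    FindInFile = 0
    If Len(needle) = 0 Then Exit Function

    f = FreeFile
    Open filePath For Binary Access Read As #f
    total = LOF(f)
    pos = 1                             ' file position of the next block

    Do While pos <= total
        ' Read at most CHUNK_SIZE bytes; the last block may be shorter.
        n = total - pos + 1
        If n > CHUNK_SIZE Then n = CHUNK_SIZE
        buf = String$(n, 0)
        Get #f, pos, buf

        ' Search the tail of the previous block plus the current block.
        hay = carry & buf
        hit = InStr(1, hay, needle, vbBinaryCompare)
        If hit > 0 Then
            ' hay starts Len(carry) bytes before pos in the file.
            FindInFile = pos - Len(carry) + hit - 1
            Exit Do
        End If

        ' Keep just enough characters to complete a match that
        ' straddles the boundary with the next block.
        carry = Right$(hay, Len(needle) - 1)
        pos = pos + n
    Loop

    Close #f
End Function
```

It could be called as Debug.Print FindInFile("C:\page.html", "href"), where the path is just a placeholder. Memory use stays at roughly one block plus the carried tail regardless of file size. Passing vbTextCompare instead of vbBinaryCompare would also match "HREF", at some cost in speed.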

History of attempts to fix this...  This is really tedious so you probably shouldn't read too much of it.
Attempts
1.  Simple as could be:  I'd search for "<a href=", then check to see if there was a quote or apostrophe after "<a href=", then copy the URL until it ran into either quote mark, apostrophe, or ">".  This didn't work out because some URLs didn't end before the magic 32767 block-byte-border.

2.  Buffers were 32767 bytes.  32767 bytes were read into a primary buffer (pBuffer$) and 32767 bytes into a secondary buffer (sBuffer$)...  It looks like it goes nowhere...  I thought that maybe I could do this:  32767 - Len(searchstring) or SOMETHING to allow InStr to work on the buffer.  Or I could do InStr(1, pBuffer & sBuffer, "href"), but then the same problem would occur at the next block boundary.
pBuffer$ = String(32767, " ")
Get #1, , pBuffer$
If EOF(fileNumber) = False Then Get #1, , sBuffer$
That one was a great big failure and it looked funny.

3.  Inside the buffer/block whatever of 32767 bytes I'd search for "=", then copy 32 bytes backwards...  If either "href" or "src" was found I'd copy 8192 bytes forward (where 8192 is max. size of a URL) like this:
____________BORING CODE____________
i=1
do
    i=instr(buffer,"=",i) 'Middle
    if i<>0 then
        back=instrrev(mid(buffer,i-32,32),"href") 'Search 32 bytes backwards for "href"
        forward=instr(left(buffer,8192),q) 'q=chr$(034) which is quotation mark.  Also was another statement for apostrophe and >, which are sometimes the characters right at the end of URLs.
    end if
curChunk=mid(buffer,back,forward) 'Or something
loop until something or other
____________BORING CODE____________

That didn't work out because it was going nowhere and was a big huge pain in the neck.
05.16.2004 at 01:25AM PDT, ID: 11080284

About this solution

Zone: Visual Basic Programming
Tags: search, large, file
Solution Provided By: ameba
Participating Experts: 1
Solution Grade: A
 
 