I'm making a program which requires me to extract URLs, or just search for things in files of various sizes.
http://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/Q_20978383.html
That question helped me out a bit, and I'm looking into its Dim buff() As Byte (byte array) part.
Summary... How do I search a large file? The file is read in blocks, and InStr searches only 1 block at a time, so InStr misses the search string whenever a block ends before the search string does. Say I search for "href": "hr" is at the end of block 1 and "ef" is at the start of block 2. How am I to find "href"? There are more blocks than these 2, so I think I'd have to read the entire file into 1 single string if I want to use InStr. Is there an alternative? I want to search/extract from larger files.
'buffer$ = String(1000000000, " ") Returns "out of string space" error!
'buffer$ = String(100000000, " ") Jams VB for 1.5 mins!
buffer$ = String(32767, " ") 'Works, except for future problems
Get #1, , buffer$
If EOF(fileNumber) = False Then Get #1, , sBuffer$
Allocating 1GB returns an error, and allocating 100MB makes VB hang for about 1.5 minutes. What I originally tried is listed at the bottom of this message. Copying an entire file into 1 string seems to me either not to work at all for large files or to go too slowly.
History of attempts to fix this... This is really tedious so you probably shouldn't read too much of it.
Attempts
1. Simple as could be: I'd search for "<a href=", then check to see if there was a quote or apostrophe after "<a href=", then copy the URL until it ran into either quote mark, apostrophe, or ">". This didn't work out because some URLs didn't end before the magic 32767 block-byte-border.
2. Buffers were 32767 bytes. 32767 bytes were read into a primary buffer (pBuffer$) and 32767 bytes into a secondary one (sBuffer$)... It looks like it goes nowhere... I thought that maybe I could do this: 32767 - Len(searchstring) or SOMETHING to allow InStr to work across the buffers. Or I could do InStr(1, pBuffer & sBuffer, "href"), but then the same problem would occur at the end of sBuffer.
pBuffer$ = String(32767, " ")
sBuffer$ = String(32767, " ")
Get #1, , pBuffer$
If EOF(1) = False Then Get #1, , sBuffer$
That one was a great big failure and it looked funny.
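For what it's worth, the "32767 - Len(searchstring)" idea from attempt 2 could be sketched roughly as below: keep the last Len(needle) - 1 characters of each block and glue them onto the front of the next one, so a match split across a block border is still seen by InStr. This is only a sketch under assumptions. FindInFile, CHUNK_SIZE, tail and hay are made-up names, not anything from the linked question, and it has not been tried on real files.

```vb
' Hypothetical helper: returns the 1-based file position of the first
' occurrence of needle, or 0 if it is not found.
' Positions are Longs, so this assumes files under 2 GB.
Function FindInFile(ByVal path As String, ByVal needle As String) As Long
    Const CHUNK_SIZE As Long = 32767
    Dim f As Integer
    Dim buf As String, tail As String, hay As String
    Dim remaining As Long, n As Long, hit As Long
    Dim basePos As Long          ' file position of the first character of hay

    f = FreeFile
    Open path For Binary Access Read As #f
    remaining = LOF(f)
    basePos = 1

    Do While remaining > 0
        n = CHUNK_SIZE
        If remaining < n Then n = remaining
        buf = Space$(n)          ' Get reads exactly Len(buf) bytes
        Get #f, , buf
        remaining = remaining - n

        hay = tail & buf         ' carried-over characters + new block
        hit = InStr(1, hay, needle, vbBinaryCompare)
        If hit > 0 Then
            FindInFile = basePos + hit - 1
            Exit Do
        End If

        ' Keep just enough characters to complete a match that straddles
        ' the border (Right$ returns all of hay if hay is shorter).
        tail = Right$(hay, Len(needle) - 1)
        basePos = basePos + Len(hay) - Len(tail)
    Loop

    Close #f
End Function
```

Usage would be something like pos = FindInFile(someFile, "href"), where someFile is whatever path is being searched. Memory stays at roughly one block no matter how big the file is, and vbTextCompare could be swapped in if "HREF" should match too.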
3. Inside the buffer/block whatever of 32767 bytes I'd search for "=", then copy 32 bytes backwards... If either "href" or "src" was found I'd copy 8192 bytes forward (where 8192 is max. size of a URL) like this:
____________BORING CODE____________
i = 1
Do
    i = InStr(i, buffer, "=") 'Middle
    If i = 0 Then Exit Do
    lo = i - 32
    If lo < 1 Then lo = 1
    back = InStrRev(Mid$(buffer, lo, i - lo), "href") 'Search 32 bytes backwards for "href"
    If back <> 0 Then
        'q = Chr$(34), which is the quotation mark. There was also another statement for apostrophe and >, which are sometimes the characters right at the end of URLs.
        'Assumes the opening quote is at i + 1, so the URL starts at i + 2.
        forward = InStr(i + 2, Left$(buffer, i + 8192), q)
        If forward <> 0 Then curChunk = Mid$(buffer, i + 2, forward - i - 2) 'Or something
    End If
    i = i + 1
Loop
____________BORING CODE____________
That didn't work out because it was going nowhere and was a big huge pain in the neck.
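A simpler take on attempt 3 might be to search for "href=" directly and, whenever the closing quote has not arrived yet, carry the unfinished piece over into the next block. The sketch below does that under several assumptions: ListHrefs, CHUNK_SIZE, MARKER and MAX_URL are invented names, only quoted values are handled (unquoted ones are skipped), and 8192 is the maximum URL size mentioned above. It is an untested illustration, not a finished extractor.

```vb
' Hypothetical sketch: prints every quoted href value in a file to the
' Immediate window, block by block.
Sub ListHrefs(ByVal path As String)
    Const CHUNK_SIZE As Long = 32767
    Const MARKER As String = "href="
    Const MAX_URL As Long = 8192
    Dim f As Integer
    Dim buf As String, hay As String, q As String
    Dim remaining As Long, n As Long
    Dim p As Long, s As Long, v As Long, e As Long, keep As Long

    f = FreeFile
    Open path For Binary Access Read As #f
    remaining = LOF(f)

    Do While remaining > 0
        n = CHUNK_SIZE
        If remaining < n Then n = remaining
        buf = Space$(n)
        Get #f, , buf
        remaining = remaining - n

        hay = hay & buf              ' carried-over text + new block
        p = 1
        keep = 0
        Do
            s = InStr(p, hay, MARKER, vbTextCompare)
            If s = 0 Then Exit Do
            v = s + Len(MARKER)      ' position of the opening quote
            q = Mid$(hay, v, 1)      ' "" if the block ends right here
            e = 0
            If q = """" Or q = "'" Then e = InStr(v + 1, hay, q)
            If e > 0 Then
                Debug.Print Mid$(hay, v + 1, e - v - 1)
                p = e + 1
            ElseIf (q = "" Or q = """" Or q = "'") And remaining > 0 _
                   And Len(hay) - s < MAX_URL Then
                keep = s             ' unfinished URL: wait for more data
                Exit Do
            Else
                p = v                ' unquoted or over-long value: skip it
            End If
        Loop

        If keep > 0 Then
            hay = Mid$(hay, keep)    ' carry the unfinished match
        Else
            hay = Right$(hay, Len(MARKER) - 1)   ' carry a possible split marker
        End If
    Loop

    Close #f
End Sub
```

The MAX_URL check is there so that a stray quote with no partner cannot make the carried-over string grow without limit.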