Avatar of ramses
ramses

asked on

Need 4 Speed

Hello all.

I'm writing a VB program (VB5CCE) that reads in a file in a home-made format.

I'm not going to give all the details about the program or the file format, as it is confidential, but you'll see that isn't necessary.

Inside the file we have a 2-char constant, followed by a designation number (also 2 chars).  After the designation number, the info for that particular record is stored.

I need a *F*A*S*T* routine for finding any given designation number in the file.

Let's say that the 2char constant is //
That would mean that somewhere in the file, you would have:

...first part of file//01recordinfo//02recordinfo...rest of file

Although it's not that important: the designation number is stored as raw bytes rather than as its hex text, so a number whose hex representation takes four characters fits in two bytes.

Example:

65535 = FFFF = #255#255
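In VB, decoding those two raw bytes back into a number might look like this (a minimal sketch; buffer, p - the position of the first byte - and the high-byte-first order are my assumptions):

Dim lngDesignation As Long
'*** high byte first (an assumption): #255#255 -> 256 * 255 + 255 = 65535 = FFFF
lngDesignation = 256& * Asc(Mid$(buffer, p, 1)) + Asc(Mid$(buffer, p + 1, 1))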

At first, I read the file in a loop and checked the buffer each time, but then I figured that if I read the file only once (entirely), it HAS to be faster than reading from it x times.

Sample code:

Dim buffer As String, Handle As Integer

Handle = FreeFile
Open file For Binary Access Read Lock Read Write As Handle
buffer = Space$(LOF(Handle))
Get #Handle, 1, buffer
Close Handle

Should you need more info, please ask


Kind roOOars


Ramses
Avatar of pierrecampe
pierrecampe

position=instr(buffer,"//" & designation)

Avatar of ramses

ASKER

Well, it might be that simple, but that string might be repeated in the file, and there are also some rules about which chars follow the designation to validate it.  For example:

//#0#1#x#y#z

where #x#y#z are variable numbers that provide some sort of checksum (not really a checksum, but a way to know this isn't a false alarm)

Ramses
So you need a fast function to search the buffer for record separators?
Am I right?
Avatar of ramses

ASKER

Kinda. I tried with the InStr function first too, but since it IS a binary file, it is likely that the constant+designation number is repeated inside the file without actually being a record separator (can you still follow me?). That's why I've included a 3-char field after each record separator that checks whether this is REALLY a record separator.

Don't break your head over that; when providing sample code, just pass the entire extracted string (2-char constant + designation + checksum) to the function IsSeperator, which returns a bool value.

Also, what I'd like is a faster way to load the file (API?) and a faster implementation of InStr, because it is actually slow.


Ramses
Avatar of ramses

ASKER

BTW: I want the whole process to take about 1/10th sec/MB


Ramses
ASKER CERTIFIED SOLUTION
Avatar of MartijnB
MartijnB

well the complete syntax is:
InStr([start, ]string1, string2[, compare])
so if the first match does not qualify, you just keep on looping until you get the correct one (just put the return value of InStr + 4 in the start variable)
but if you have some kind of checksum, then do include it in the search:
position = InStr(buffer, "//" & designation & checksum)
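for example, a looped search with the validation might look like this (a sketch; IsSeperator is the validation function ramses described, and the 7-char width - constant + designation + checksum - is taken from his description):

Dim position As Long
position = InStr(buffer, "//" & designation)
Do While position > 0
    '*** real separator? validate constant + designation + checksum
    If IsSeperator(Mid$(buffer, position, 7)) Then Exit Do
    position = InStr(position + 4, buffer, "//" & designation)
Loop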
reading in the file faster with the API:
I doubt very much you'll get faster speed with the API.
>>I want the whole process to take about 1/10th sec/MB<<
the speed depends for 99 percent on the hardware used and not on the algorithm used (unless of course you write an extremely slow algorithm)
InStr is VERY SLOW without a modified algorithm.
Here it is.

This code will search for ALL "\\" occurrences in a file.
It doesn't check for mistakes, but you can add that better than I can, since you have the full specs.

On my PIII 450 laptop it finds 28000 occurrences in a 3.5 MB file in 0.31 s (not compiled).
You should be able to add the extra check within the 0.10 sec/MB margin.

To experiment, change lngBlockSize to 100 or 10000.
You will see a dramatic drop in speed.

Martijn

'*****************************************************
Option Explicit

Private Sub Command1_Click()
  Dim sinS             As Single
  Dim sinE             As Single
  Dim lngFound         As Long
 
  sinS = Timer
  lngFound = fcnSearch("d:\temp\data.dat")
  sinE = Timer
  MsgBox "found " & lngFound & " time: " & sinE - sinS
End Sub

Public Function fcnSearch(strFileName As String) As Long
  Dim lngBlockSize     As Long
  Dim lngFile          As Long
  Dim lngLen           As Long
  Dim lngLOF           As Long
  Dim lngFilePos       As Long
  Dim lngNoBlocks      As Long
  Dim lngExtra         As Long
  Dim lngMain          As Long
  Dim arrBlock()       As Byte
  Dim lngChar          As Long
  Dim lngBlockLen      As Long
  Dim lngCurrentState  As Long
  Dim lngCount         As Long
  Dim arrDelimit(1)    As Byte
  Dim lngFound         As Long
 
  arrDelimit(0) = AscB("\")
  arrDelimit(1) = AscB("\")

  lngBlockSize = 2000&
  '***  instr base compatibility: the ' -1& ' would slow down the parser
  ReDim arrBlock(1& To lngBlockSize)

  lngFile = FreeFile
  Open strFileName For Binary As lngFile
  lngLOF = LOF(lngFile)
  lngNoBlocks = lngLOF \ lngBlockSize
  lngExtra = lngLOF Mod lngBlockSize
  lngMain = lngNoBlocks * lngBlockSize

  lngCurrentState = 0&

  For lngFilePos = 1& To lngMain Step lngBlockSize
    Get #lngFile, lngFilePos, arrBlock()
    '***  sorry but gosub is faster than a function call
    GoSub tagSearch
  Next

  If lngExtra <> 0 Then
    ReDim arrBlock(1& To lngExtra)
    Get #lngFile, lngFilePos, arrBlock()
    '***  sorry but gosub is faster than a function call
    GoSub tagSearch
  End If
  Close lngFile
 
  fcnSearch = lngFound
  Exit Function

tagSearch:

  lngChar = 1&
  If lngCurrentState = 1& Then
    If arrBlock(1&) = arrDelimit(1&) Then
      '***  found a delimiter - remove the slow print function
      'Debug.Print "found at: " & lngFilePos + lngChar - 1&
      lngFound = lngFound + 1&
      lngCurrentState = 0&
      lngChar = 2&
    End If
  End If

  lngBlockLen = UBound(arrBlock) + 1&
  Do
    '***  searching
    lngChar = InStrB(lngChar, arrBlock, arrDelimit, vbBinaryCompare)
    If lngChar = 0& Then
      lngChar = lngBlockLen
    Else

      lngChar = lngChar + 1&
      '***  found a delimiter - remove the slow print function
      'Debug.Print "found at: " & lngFilePos + lngChar - 1&
      lngFound = lngFound + 1&
    End If
  Loop Until lngChar = lngBlockLen
 
  If arrBlock(lngBlockLen - 1&) = arrDelimit(0) Then
    '***  delimit was cut
    lngCurrentState = 1&
  End If
  Return

End Function
Avatar of ramses

ASKER

MartijnB, that looks EXACTLY like what I had in mind.  I'll see if I can implement it, and if so, I'll award the points and close this question.  If not, let's keep on it.

Please allow +/- max 2hrs after this comment to implement (might be faster, but you never know)


C ya soon


Ramses
Avatar of ramses

ASKER

What's with the "sorry, but gosub is faster than a function call"?

Did your teacher also refer to gotos and gosubs as habits of lazy programmers?
"and gosubs as habits of lazy programmers?"

Some ppl may think that :-)
I never use Gosub myself, but it made some difference inside the loop.

Avatar of ramses

ASKER

On my system (without the extra check), it comes to a speed of 0.3 sec/MB [checked with a 13 MB file (uncached)]

I am almost satisfied.

Martijn,

I once read an article somewhere about a certain API call to quickly load large files (and small ones too, I hope). Maybe that can provide the extra 0.2 seconds of speed that I need.  I can't remember where I read it, though.

About my system: it's a Compaq Deskpro 2000 (P166) with 96 MB RAM (non-EDO).  I know it's not much, but the minimum system requirement for the program will have to be an 80486 @ 100 MHz.

I suspect you all know the H*U*G*E* difference between a P166 and a 486.

0.2 secs on a P166 is maybe 3 secs on a 486.


Sorry, those standards are not mine, but I have to obey them.



Ramses
Aha,

I checked, you're right, the file had been cached. 1.92 secs for my 3.5 MB file.

Benchmarked my system: 6 MB/s sequential read (uncached).
It's scaled 5 times faster than 2GB EIDE (486).
pierrecampe may be right on the hardware point...

But there is a margin:
3.5 MB in 1.92 s
means 1.92 - 0.3 s of VB time = 1.6 s of disk read time.
At 6 MB/s sequential, the minimum
disk read time for the 3.5 MB could be 0.7 s.

I'm interested in finding a faster read routine myself.
I'll have a look.

Martijn
Avatar of ramses

ASKER

tnx
Alright, here is the changed version.

I changed the file read to read the 3.5 MB file into a single array.
This seems to be faster. InStrB still works on the blocks.
CopyMemory copies the data from one array to the other.

I've found no evidence of API calls that could be faster than the binary read.

There are just a few changes:

'*************************************************
Option Explicit

Public Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" ( _
  ByRef Destination As Any, _
  ByRef Source As Any, _
  ByVal numbytes As Long)

Public Function fcnSearch2(strFileName As String) As Long
  Dim lngBlockSize     As Long
  Dim lngFile          As Long
  Dim lngLen           As Long
  Dim lngLOF           As Long
  Dim lngFilePos       As Long
  Dim lngNoBlocks      As Long
  Dim lngExtra         As Long
  Dim lngMain          As Long
  Dim arrBlock()       As Byte
  Dim lngChar          As Long
  Dim lngBlockLen      As Long
  Dim lngCurrentState  As Long
  Dim lngCount         As Long
  Dim arrDelimit(1)    As Byte
  Dim lngFound         As Long
  Dim arrFile()        As Byte
 
  arrDelimit(0) = AscB("\")
  arrDelimit(1) = AscB("\")

  lngBlockSize = 3200&
  '***  instr base compatibility: the ' -1& ' would slow down the parser
  ReDim arrBlock(1& To lngBlockSize)

  '***  load the entire file here
  '***  this could cause memory issues
  lngFile = FreeFile
  Open strFileName For Binary As lngFile
  lngLOF = LOF(lngFile)
  ReDim arrFile(lngLOF)
  Get #lngFile, 1&, arrFile()
  Close lngFile
  '*** end of file read

  lngNoBlocks = lngLOF \ lngBlockSize
  lngExtra = lngLOF Mod lngBlockSize
  lngMain = lngNoBlocks * lngBlockSize

  lngCurrentState = 0&

  For lngFilePos = 1& To lngMain Step lngBlockSize
    ''''''''''Get #lngFile, lngFilePos, arrBlock()
    CopyMemory arrBlock(1), arrFile(lngFilePos), lngFilePos
    '***  sorry but gosub is faster than a function call
    GoSub tagSearch
  Next

  If lngExtra <> 0 Then
    ReDim arrBlock(1& To lngExtra)
    ''''''''''Get #lngFile, lngFilePos, arrBlock()
    CopyMemory arrBlock(1), arrFile(lngFilePos), lngExtra
    '***  sorry but gosub is faster than a function call
    GoSub tagSearch
  End If

  fcnSearch2 = lngFound
  Exit Function

tagSearch:

  lngChar = 1&
  If lngCurrentState = 1& Then
    If arrBlock(1&) = arrDelimit(1&) Then
      '***  found a delimiter - remove the slow print function
      'Debug.Print "found at: " & lngFilePos + lngChar - 1&
      lngFound = lngFound + 1&
      lngCurrentState = 0&
      lngChar = 2&
    End If
  End If

  lngBlockLen = UBound(arrBlock) + 1&
  Do
    '***  searching
    lngChar = InStrB(lngChar, arrBlock, arrDelimit, vbBinaryCompare)
    If lngChar = 0& Then
      lngChar = lngBlockLen
    Else

      lngChar = lngChar + 1&
      '***  found a delimiter - remove the slow print function
      'Debug.Print "found at: " & lngFilePos + lngChar - 1&
      lngFound = lngFound + 1&
    End If
  Loop Until lngChar = lngBlockLen
 
  If arrBlock(lngBlockLen - 1&) = arrDelimit(0) Then
    '***  delimit was cut
    lngCurrentState = 1&
  End If
  Return

End Function
Avatar of ramses

ASKER

Test results:

Uncached: 0.4 sec/MB
Cached:   0.2 sec/MB

This approach takes 0.1 sec/MB longer than the previous one with uncached files, and 0.1 sec/MB less with cached files.

Also, to use this approach we have to add extra code for memory management, and that will also slow things down.

About the test results: I use TimePassed or something, but I can't give you the details because it's in a typelib with some benchmarking functions that we have to use to test our processes and subprocesses for speed.


I'm going to bed now, and I'll be back on @ about 10am GMT+1


See you then, and remember:

Sleep tight, don't let the bedbugs bite :-)


Ramses
I think ramses expects too much...
Small improvements are possible:
Use a string buffer and InStr (which is very fast, BTW), and read the file in 2 KB (2048 byte) chunks.
To measure time use GetTickCount, and for the file handle use an Integer.
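The declare, for reference (GetTickCount is a standard kernel32 API; it returns milliseconds since boot):

Private Declare Function GetTickCount Lib "kernel32" () As Long

Dim lngStart As Long
lngStart = GetTickCount
'*** ... work to be timed ...
Debug.Print "elapsed ms: " & (GetTickCount - lngStart)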
Quote: "Use string buffer and Instr (which is very fast, BTW), and read file in 2 KB (2048) chunks."

That's exactly what the code does.

Instr can be beaten by plain VB code:
a for - next with an inline cascading if - then.
If a string is over 30kb instr crowls.
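Such a cascading scan might look like this (a sketch over a 1-based byte array, reusing the names from the code above; the checksum validation is left out):

Dim i As Long, lngLen As Long
lngLen = UBound(arrBlock)
i = 1&
Do While i < lngLen
    If arrBlock(i) = arrDelimit(0) Then
        If arrBlock(i + 1) = arrDelimit(1) Then
            lngFound = lngFound + 1&
            i = i + 1&      '*** don't re-match the second delimiter byte
        End If
    End If
    i = i + 1&
Loop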

BTW:
there is a very dangerous bug in the second code chunk:

For lngFilePos = 1& To lngMain Step lngBlockSize
   ''''''''''Get #lngFile, lngFilePos, arrBlock()
'''BAD:: CopyMemory arrBlock(1), arrFile(lngFilePos), lngFilePos
   CopyMemory arrBlock(1), arrFile(lngFilePos), lngBlockSize
   '***  sorry but gosub is faster than a function call
   GoSub tagSearch
Next
>That's exactly what the code does
I don't see InStr and a string, I see InStrB and a byte array.

I cannot show the code, I didn't save it - I renamed my function and tested the second sample :-)
>   Get #lngFile, lngFilePos, arrBlock()
this is a little bit faster:
     Get #lngFile, , arrBlock()
OK - InStrB is not exactly InStr.
InStr and a string are not an option here due to the unicode conversion (the data is binary).

Do you claim that Instr is faster than InstrB?
Get #lngFile, , arrBlock()
Agreed :-)
>Do you claim that Instr is faster than InstrB?

Yes. But they are for different data types.
A byte array will load faster than a string, but searching it is slower.
Of course, I'm talking about case-sensitive InStr; the vbTextCompare version is ~10 times slower than the default vbBinaryCompare.
Well this beats me:
Instr is 764 times faster than InstrB.

Really ridiculous. Do you have an explanation for this?




Avatar of ramses

ASKER

Guys,

It's true that -I- might be expecting too much, but I got certain guidelines from the principal.  He stated that the subprocess that reads a certain record mustn't take longer than 0.5 sec/MB on a P166.

That means:

- skipping to the record in the file
- reading the record
- parsing the data and filling a listview with it

As you can see, we can tweak the algorithms for reading the record and parsing the data, but there is always the matter of the listview, which is quite slow to add data to, even when used with the LockWindowUpdate API (without it, it's SOOOOOOO slow).

If it should prove "impossible" to speed up the record search routine, maybe you could suggest a faster way to load the listview.  Maybe there is some API call to load a large set of data in one step.


Ramses
Avatar of ramses

ASKER

From my own research, I found something on MSKB for loading what MS called back then "Huge" files (when referring to bitmaps), but it was for Win3.1, and since those are mostly 16-bit APIs I don't think they'd provide any increase in speed.

When you've got a sec, take a look at this page:

http://support.microsoft.com/support/kb/articles/Q100/5/13.asp?LN=NL&SD=gn&FR=0

Ramses
MartijnB
>Instr is 764 times faster than InstrB.

I don't understand - are you making fun of my statements? Don't you believe InStr is faster?


ramses
>maybe you could suggest a faster way to load the listview

I use a listview as my main list control, and I can fill it very fast.  If you give your example, I can suggest changes.

>take a look at this page:

There are 32-bit versions of those APIs, but I don't expect loading the whole file can be faster than what MartijnB suggested - loading small chunks.
Ameba,

I tested Instr and InstrB on a 400k string/bytearray.
Finding all 2000 substrings.
Instr was 764 times faster.

On smaller strings the difference is not this extreme.
But it puzzles me.

Martijn
Avatar of ramses

ASKER

Does the Seek function take much time?

I mean, it might be interesting, in large files, to split them in four when searching for a certain record separator.

Pseudo:

Read file, start+pos
Read file, center-pos
Read file, center+pos
Read file, end-pos

Avatar of ramses

ASKER

>I use a listview as my main list control, and I can fill it very fast. If you give your example, I can suggest changes

I don't mean the algorithm to add the items; after all, we all know how to add items to a listview:

Dim lItem As ListItem

Set lItem=ListView.ListItems.Add(,"item description",I_large,I_small)
lItem.SubItems(1)="subitem text 1"
lItem.SubItems(2)="Subitem text 2"
...


But I mean, suppose I'd make a UDT array like this:

Type llItemType
  Text As String
  Icon_L As String
  Icon_S As String
  SubItems() As String
End Type

Then pass the whole UDT array to some API which adds them all at once.

I know this is a bit too optimistic to expect, but are you guys aware of anything like it?  I mean, some programs I've seen start in 2 secs and have a listview with 200.000 items, while when I add them it takes about half a minute.


Ramses
Not too much.

The first code I sent splits the file into blocks
that are optimized for InStrB. This also has the advantage
that at any point in time, just 2000 bytes of the file are
in memory.

The second one reads the whole file. This is faster - if you remove the bug - on systems with enough memory.

I see the following optimizations:
Read in blocks of 64 KB and search in blocks of 2 KB.
Somehow convert the data to a string to search.

Martijn


MartijnB
>I tested Instr and InstrB on a 400k string/bytearray.

Ah, so... OK.  InStr is faster because the MS team optimized it.
Theoretically, InStrB could be optimized to be 2 times faster, but no-one took the time to do that.


ramses,
>after all, we all know how to add items to a listview
>
>Set lItem=ListView.ListItems.Add(,"item description",I_large,I_small)

Well, that line can work only once, since "item description" cannot be the Key for a second item.
ListView.ListItems is also very slow - you are asking for the control property "ListItems" when adding each item.
And do you really need the ListItem object "lItem"?

>some api which adds them all at once
No, the nearest possible is to set the initial size of the listview:
     SendMessage LV1.hwnd, LVM_SETITEMCOUNT, 10000, 0&

>listview with 200.000 items
That is a virtual listview, but it has slow scrolling - each item is redrawn "on request".
LVM_SETITEMCOUNT makes filling the ListView only 3-4% faster.
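For completeness, the declares behind that call (SendMessage is the standard user32 API; LVM_SETITEMCOUNT is LVM_FIRST + 47 in commctrl.h):

Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" ( _
  ByVal hwnd As Long, ByVal wMsg As Long, _
  ByVal wParam As Long, ByVal lParam As Long) As Long

Private Const LVM_FIRST As Long = &H1000&
Private Const LVM_SETITEMCOUNT As Long = LVM_FIRST + 47

'*** hint the expected item count before the Add loop
SendMessage LV1.hwnd, LVM_SETITEMCOUNT, 10000, 0&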
On the slowness or fastness of finding a record:
the apparent slowness is largely due to the fact that you read in large chunks of data while you apparently need to read in just 1 record.
Also, the speed of what you are doing largely depends on the hardware, and even on the same hardware the speed will differ greatly between disk reads, the speed of which depends on the length of a disk cluster, the fragmentation, the interleave factor, the place of the file on disk, etc...
I think you will have to rethink your problem.
Maybe sacrifice disk room for speed: random access will increase the record-finding speed many times; if that's not possible, then an index may also increase speed many times.
Maybe by just rethinking the problem you can limit the number of disk accesses / number of bytes read.
I suppose a record has some unique identifier.
Now, if there is a maximum length a record can have, then just multiply that maximum length by 3 or 4 to be sure, and use random access even if that increases the file size from say 5 MB to 50 MB - disk space is cheap.
Then you will need just 1 read to get a record.
If you want to keep the file as it is, make a random access index on it; that will limit the disk access to 2 reads.
Maybe exactly this is not possible, but there probably is a way to limit the disk reads / bytes read.

Also:
>>I'm not going to give all the details about the program or<<
>>the file format, as it is confidential, but<<
>>you'll see that isn't necessary.<<
Well, I think the details about the prog are not needed,
but if you don't give the details about the file format, then nobody will be able to help you.

I agree with you Pierre,

It may also be wise to create an index file.

The reason I am interested in this is that I am writing a program that reads/parses large XML files.
Avatar of ramses

ASKER

Thank you for your comments and suggestions so far.

Ameba
>Well, that line can work only once, since "item description" cannot be the Key for a second item

So I forgot to type an extra comma before "item description".  I'm not that kind of novice programmer, I just happened to forget to type it in here, you know (let's please not be picky).

OK guys, I guess you're right about the fact that it is actually guessing if you don't know what you're dealing with, so I'll disclose some info on the file format.

The thing I call a Record has a variable length, however.

indexfile specification
-----------------------

Byte number

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890

CBNNNFF\folder10\folder20\folder300~?dnfile10?SSSDDDDDddddd0


Description

CB           -> identifier string, always CB
NNN          -> 3-char field, number of files in this file
                (max number of files: 16.777.215)
FF           -> 2-char field, number of folders in this file
                (max number of folders: 65.535)
\folder1     -> name and level of folder1, followed by #0

#0 ends the folder list

When the file is read, each folder gets a two-byte designation number (four hex
digits); the first folder is 0000, the 2nd folder 0001, the 3rd folder 0002, etc...
Maximum number of folders: FFFF = 65.535

~?           -> start of folder
dn           -> designation number of the folder
                (i.e. the files in here belong to folder dn)
file1        -> name and extension of file 1, followed by #0
a            -> attributes of file (1/2 byte)
                No attribs=0, Archive=1, Hidden=2, Read-Only=4, System=8
                Combinations are possible by ORing values
e            -> unit of file size (1/2 byte)
                Can be any of the following:
                0=bytes, 1=Kbytes, 2=Mbytes, 3=Gigabytes, 4=Terabytes
SSS          -> 3-char field, size of file (when expanded, pad to 6 chars by
                adding 0's at the start, then take the first three as the whole
                part and the last three as the decimal part, and multiply by
                unit e to get the file size)
ddddd        -> 5-byte field, creation date & time
DDDDD        -> 5-byte field, last modified date & time
0            -> terminator, end of file list in current folder
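A sketch of reading the header fields out of a loaded buffer (the byte order and exact offsets are my assumptions from the spec above):

Dim lngFiles As Long, lngFolders As Long
If Left$(buffer, 2) <> "CB" Then
    MsgBox "not a CB file"
Else
    '*** NNN: 3 raw bytes at offsets 3..5, FF: 2 raw bytes at offsets 6..7
    lngFiles = 65536 * Asc(Mid$(buffer, 3, 1)) _
             + 256& * Asc(Mid$(buffer, 4, 1)) _
             + Asc(Mid$(buffer, 5, 1))
    lngFolders = 256& * Asc(Mid$(buffer, 6, 1)) + Asc(Mid$(buffer, 7, 1))
End If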

Avatar of ramses

ASKER

replace the ? in the master string with ae (as one char)
Avatar of ramses

ASKER

Also note that the alignment of this specification has been completely ruined because of EE's lack of HTML support.  For optimal viewing, please copy the entire specification to an app that is capable of rendering it in a fixed-width font (like Courier New).


Ramses
ramses
>So I forgot to type an extra comma before "item description". I'm not that kind of novice programmer,
>I just happened to forget to type it in here, you know (let's please not be picky)

Sorry if it looked 'picky', but "one comma missing" was not obvious to me - using or not using the Key argument makes a big difference in performance.
To my knowledge, many programmers use Key to identify the row (they put the record ID into the listitem Key).
Avatar of ramses

ASKER

In this case, Key is not needed nor used, as all the data about a particular item is static and already provided by its subitems when the list is loaded.  Should the user want some details about an item when in Icon view, I just extract them from the subitems collection of the selected item.


Ramses

No need to apologize, no harm done
OK. ... Trying to understand that secret formatting of the secret file-system you are creating :-)

Maybe a shorter way to save the folders
- if you have 2 folders, each with 2 files:

c:\temp\bin\
c:\temp\help\

You can save it like this:
Folders part: ID,parentID,folderName
Files part: folderID, filenames
-------------------------------
1,0,temp
2,1,bin
3,1,help
2
file
another file
3
file
another file
-------------------------------

and - avoid 'full path' ??

\temp10\bin20\
\temp10\help20\
Avatar of ramses

ASKER

I'll explain it, as it is maybe not that obvious.

The first two bytes of the file are always CB; it's an identification string (like MZ for exes). Following that, you have a three-char field that specifies how many files are in this file (just for statistics), then comes a 2-char field that holds the number of folders in this file, followed by the actual folder names, each separated by a #0 char.

When the file is processed, each folder gets a 2-char designation number (in mem), i.e. the first folder name in the file will be designation #0#1, the second #0#2, and so on.

After the list of folder names, another #0 follows; then comes the IsFolder constant (~?) followed by its designation number.  This means that all entries following this belong to the folder with designation xx.

Now come the files:

after the filename there is a #0 character, followed by a 14-char fileinfo field.  If after that a #0 character is found, it means end of the file list for this folder; otherwise the next file comes.


If this is the file:
c:\temp\help\readme.txt
what is the folder? Is it one folder or two?
What is the UI?  Treeview on the left to select folder, and Listview with filenames on the right?
yes, and are those 'folders' hierarchical?
can a folder have sub-folders?
if so, can it go to any depth?
if so, can every folder have files?
is it a kind of file system?

 
Avatar of ramses

ASKER

Yes pierrecampe, it's an entire tree.  At the start of the file, all folders are listed in the order they are collected with the FindFirst, FindNext API calls.  Their position in the file also makes up their designation number.  Subfolders are treated exactly like folders.  And every folder can have files, but doesn't have to.  If there are no files in a certain folder, then after the folder designation number the text EMPTY is stored in the file.


Yes ameba, kinda like that


Ramses

You didn't answer all the questions, but OK.

You have two parts:
1. Folders info
2. Files in each folder (for each folder you have one "block")

Add indexing (as already suggested by pierrecampe):
- calculate size of each 'file block'

block 1, contains file info for all files in folder 1
size 30500
block 2, contains file info for all files in folder 2
size 20100

During the file creation, add that info near each folder.


In your program - reader of that file:

When you load your form, open the file (and keep it open until the end of the program).
At the start, read only the "Folders Info" part, and fill the treeview.

When the user clicks on folder 3, calculate the position:
   position = size of "Folders info" + 30500 (size of first block) + 20100 (size of second block)

and use Seek to go to that position (Seek doesn't read anything from the file, it only moves the 'file pointer').

No need to search/reread the whole file whenever the user selects some folder.
The indexing info can also be in a separate file, or you can create it whenever you open the file.

"Keep the file open" - this is not very nice, but that is what any database does.
Avatar of ramses

ASKER

wooow, wait a sec, there is no touching the file format. It's not mine to change, I have to obey the specs, you know.

Second, as already mentioned, the directory listing records are variable length.


Third, keeping the file open all the time is quite impossible because there are (or will be) hundreds of them used in one session.
ramses,
>Second, as already mentioned, the directory listing records are variable length.

Of course; if they were fixed, there would be no need to calculate or save the block length. It would be trivial: recNo * size.

>keeping the file open all the time is quite impossible

That is good to know - then you'll have to reopen the file and use Seek whenever some folder is selected.
Avatar of ramses

ASKER

First, I'd like to explain what I think about EE and asking questions.

I'm not looking on EE for a Question-Answer-Period approach; I want it to be a discussion with colleagues about an issue I'm facing.  If, during the debate, I should come up with the right solution, we'll work something out to make sure the right persons (not me) get the credits for their efforts.  Splitting points, when necessary, is also an option.

Having said this, I'd like to throw something into the debate.

Let's say I like the idea of keeping the file open, not for the entire session, but for as long as the user is working with it (i.e. the form which shows/alters/processes the data is shown).

Secondly, how about indexing the file positions (in memory)?

I'll explain.

Before the form is shown/updated, I'll enumerate all directories; that's easy because they're always at the beginning of the file.  First I read two chars at offset 6, which tells me how many folders to expect; then, starting at offset 8, the folders are listed.

When I reach the end of the folders, I know that the first folder's listing follows, so I could set the treeview node's tag for that folder to the current position.

At startup, the Root item (folder 1) gets programmatically selected to update the listview.  Since we already have that starting position, we don't need to search for it.  At the end of that folder listing, we know it's the start of the second folder, so we can already set the tag of the node with index 2 to that seek position.

If the user selects a treeview node that is a child of a node that has a tag value, and hasn't got a tag value of its own (yet), we know it's "near" its position too, or at least we already have an offset to start from.

Please tell me, if you know, which is faster:

offset=TreeView1.Nodes(index).Tag

or

offset=OffsetArray(index)

Right, that would mean an additional array that will also have to be exposed to the Module-level procedures, and -following the specs- that would mean passing it as a parameter to all procedures, not declaring it globally.


What I would like to know is: will the overhead that this approach causes really make a difference (more lines of code = slower execution)?


Please feel free to suggest anything you think might help; as stated above in this comment, I'm not looking for a Question-Answer-Period type of thing.


Ramses
Before opening the file and reusing the previously done indexing, you should check whether the file has changed - use FileDateTime() - and if it has, do the indexing again (into memory or into some external .ndx file).
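e.g. (datLastIndexed and RebuildIndex are hypothetical names):

If FileDateTime(strFileName) <> datLastIndexed Then
    '*** file changed since the index was built - redo it
    RebuildIndex strFileName
    datLastIndexed = FileDateTime(strFileName)
End If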
Avatar of ramses

ASKER

It's not likely to change.  The only thing that might happen to the file is that it's upgraded to a new version, and in that case (WHEN that happens) I'll make sure to feed my comments to the people who come up with the format.

Even if the files are re-created, there won't be so much as a single byte of difference (unless upgraded).


Ramses
We posted in the same minute..

>Please tell me, if you know, which is faster

.Tag is very fast, but I would use the array and save that useful Tag thingie for something else.
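e.g. (a sketch; lngFolders comes from the header, and Seek() returns the current file position):

'*** module level, instead of Node.Tag
Dim lngOffset() As Long

'*** after reading the folder count from the header:
ReDim lngOffset(1 To lngFolders)

'*** whenever a position becomes known:
lngOffset(lngFolderIndex) = Seek(lngFile)

'*** when a node is clicked:
offset = lngOffset(Node.Index)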
Avatar of ramses

ASKER

Guys (dunno if there are any girls here; if so, SPEAK UP please), I'm off to bed now.  Be back @ 10am GMT+1


Have a good night.


Ramses
>>there is no touching on the file format<<
OK, so let's see if I know what the file-writing program does:
it finds the first dir in the root, writes it to the file, and that's folder1;
then it finds the first dir in that dir, writes it to the file, and that's folder2... etc.
After it finds no more dirs, it starts over with folder1 and writes each of its files to the written file, then folder2... etc.
If that's what happens, you can make an index file like this:
*dir1,12345,12378,12401,etc
*dir2,24576,24591,etc
etc...
The numbers are the positions of each file in the written file.
This index file can be read very fast; it can even be kept in memory in an array of variants.
And for each dir you know where its files start in the written file, and as ameba said you can seek immediately to any file; and if the index is kept in memory you'll need only 1 disk read to access any file.
And if you want, you can even read in all the files in a dir in 1 disk read.

hey guys, seems everybody is writing at the same time
Sleep ??? - use Sleep API (Apis are fast) and you'll need 2-3 hours less. ;-)
Avatar of ramses

ASKER

Before I go to bed: you're almost right, pierrecampe.

You're right about the first part (folders), but when it comes to the folder listing, there is no value inside the file that tells me how many files there are in a certain folder (only how many files in total).

Each file listing for a particular folder begins with a 2-char constant followed by a 2-char folder designation number, followed by file1, followed by a nullchar, followed by a 14-char field which holds information like the attributes of the file, the file size unit (i.e. KB, MB, ...), the file size, the creation date, and the modified date; then follows file 2, ...  At the end of the file listing for each folder follows a nullchar.  Filenames are not in 8.3 format but long filenames, and there is no option to pad each filename to 255 chars, as this would make the file unnecessarily large; besides, I can't do that because I must obey the specs.


Ramses
Avatar of ramses

ASKER

Good one, ameba. Please give me the declares so I can implement it in my system (body).  Be sure to be kind enough to include the necessary hardware to program the routines into my brain. :)))


Ramses


C ya all tomorrow (actually, here it's already 1.13am 11/19/01)
:-)
ramses, I like your idea.
>>-following the specs- that would mean passing it as a parameter to all procedures, not
declaring it globally<<
Come on, if the specs ask for speed, then tell the specs-makers that means globals.
If the specs said "the programmer has to kill himself after coding because this thing is so secret", what would you do? :-)
So the specs say no globals, but gosub is OK?
I see, so you have no say about the program that writes that file, right? What I meant was that the index has to be written by the program that writes that file,
and you may not add that index yourself, right?
You have my sympathy.
Avatar of ramses

ASKER

Actually, the only Goto we're allowed to use is On Error Goto 0

Even Error Handling must be done at the place where the exception can be raised

ie not

Dim a As Long, b As Long

On Error Goto ErrHandler

a=100
b=0

MsgBox CStr(a/b)

Exit Sub

ErrHandler:
Msgbox "An error occured"


But

Dim a as Long, b As Long

a=100
b=0

On Error Resume Next
MsgBox CStr(a/b)
If Err<>0 Then
  Err.Source=CurrentProcess
  If Not ErrorHandler(Err) Then Exit Sub
End If
On Error Goto 0

Active error trapping is allowed only at a place where an inevitable exception may occur.  So in the example I just gave, I'm not allowed to use an error handler, since all exceptions that can be raised by performing mathematical calculations can be avoided by evaluating the input variables (i.e., check that the divisor is not zero, etc.)
Avatar of Asta Cu
Please update the experts here who have so willingly stepped in to help you, since much time has passed since your last comments, and e-mail notifications may not have been generated to the participating experts here due to some problems at that time.  If you've been helped, accept the respective comment by that expert to grade and close the question.

Somewhat off-topic, but important.

****************************** ALERT********************************
WindowsUpdate - Critical Update alert March 28, 2002 from Microsoft
http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/bulletin/ms02-015.asp
Synopsis:
Microsoft Security Bulletin MS02-015  
28 March 2002 Cumulative Patch for Internet Explorer
Originally posted: March 28, 2002
Summary
Who should read this bulletin: Customers using Microsoft® Internet Explorer
Impact of vulnerability: Two vulnerabilities, the most serious of which would allow script to run in the Local Computer Zone.
Maximum Severity Rating: Critical
Recommendation: Consumers using the affected version of IE should install the patch immediately.
Affected Software:
Microsoft Internet Explorer 5.01
Microsoft Internet Explorer 5.5
Microsoft Internet Explorer 6.0

Thought you'd appreciate knowing this.
":0)
Asta
Avatar of ramses

ASKER

OK, thanks for the help, guys.  I was unable to log in for a few months because of a lost password.  Anyway... since this is a question from so long ago, I don't know who is the right person to award the points to:

MartijnB
ameba
pierrecampe

or if you guys want to accept a 3-way split, just tell me


Ramses
Thanks, Ramses, sorry you had login problems, and happy to see that it's now fine.

I have processed the point splits for you, as requested.

Points for ameba -> https://www.experts-exchange.com/jsp/qShow.jsp?qid=20314982
Points for pierrecampe -> https://www.experts-exchange.com/jsp/qShow.jsp?qid=20314983

Moondancer - EE Moderator