Avatar of ramses
ramses

asked on

Need 4 Speed

Hello all.

I'm writing a VB program (VB5CCE) that reads in a file in a home-made format.

I'm not going to give all the details about the program or the file format, as it is confidential, but you'll see that isn't necessary.

Inside the file we have a 2-char constant, followed by a designation number (also 2 chars).  After the designation number, the info for that particular record is stored.

I need a *F*A*S*T* routine for finding any given designation number in the file.

Let's say that the 2char constant is //
That would mean that somewhere in the file, you would have:

...first part of file//01recordinfo//02recordinfo...rest of file

Although it's not that important: the designation number is stored as raw bytes rather than as its hex text, so a number whose hex representation takes four characters fits in two bytes.

Example:

65535 = FFFF = #255#255
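In VB, decoding those two raw bytes back into a number might look like this (a minimal sketch; buffer, p - the position of the first byte - and the high-byte-first order are my assumptions):

Dim lngDesignation As Long
'*** high byte first (an assumption): #255#255 -> 256 * 255 + 255 = 65535 = FFFF
lngDesignation = 256& * Asc(Mid$(buffer, p, 1)) + Asc(Mid$(buffer, p + 1, 1))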

At first, I read the file in a loop and checked the buffer each time, but then I figured that if I read the file only once (entirely), it HAS to be faster than reading from it x times.

Sample code:

Dim buffer As String, Handle As Integer

Handle = FreeFile
Open file For Binary Access Read Lock Read Write As Handle
buffer = Space$(LOF(Handle))
Get #Handle, 1, buffer
Close Handle

Should you need more info, please ask


Kind roOOars


Ramses
Avatar of pierrecampe
pierrecampe

position=instr(buffer,"//" & designation)

Avatar of ramses

ASKER

Well, it might be that simple, but that string might be repeated in the file, and there are also some rules about which chars follow the designation to validate it.  For example:

//#0#1#x#y#z

where #x#y#z are variable numbers that provide some sort of checksum (not really a checksum, but a way to know this isn't a false alarm)

Ramses
So you need a fast function to search the buffer for record separators?
Am I right?
Avatar of ramses

ASKER

Kinda. I tried with the InStr function first too, but since it IS a binary file, it is likely that the constant+designation number is repeated inside the file without actually being a record separator (can you still follow me?). That's why I've included a 3-char field after each record separator that checks whether this is REALLY a record separator.

Don't break your head over that; when providing sample code, just pass the entire extracted string (2-char constant + designation + checksum) to the function IsSeperator, which returns a bool value.

Also, what I'd like is a faster way to load the file (API?) and a faster implementation of InStr, because it is actually slow.


Ramses
Avatar of ramses

ASKER

BTW: I want the whole process to take about 1/10th sec/MB


Ramses
ASKER CERTIFIED SOLUTION
Avatar of MartijnB
MartijnB

well the complete syntax is:
InStr([start, ]string1, string2[, compare])
so if the first match does not qualify, you just keep on looping until you get the correct one (just put the return value of InStr + 4 in the start variable)
but if you have some kind of checksum, then do include it in the search:
position = InStr(buffer, "//" & designation & checksum)
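for example, a looped search with the validation might look like this (a sketch; IsSeperator is the validation function ramses described, and the 7-char width - constant + designation + checksum - is taken from his description):

Dim position As Long
position = InStr(buffer, "//" & designation)
Do While position > 0
    '*** real separator? validate constant + designation + checksum
    If IsSeperator(Mid$(buffer, position, 7)) Then Exit Do
    position = InStr(position + 4, buffer, "//" & designation)
Loop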
reading in the file faster with the API:
I doubt very much you'll get faster speed with the API.
>>I want the whole process to take about 1/10th sec/MB<<
the speed depends for 99 percent on the hardware used and not on the algorithm used (unless of course you write an extremely slow algorithm)
InStr is VERY SLOW without a modified algorithm.
Here it is.

This code will search for ALL "\\" occurrences in a file.
It doesn't check for mistakes, but you can add that better than I can, since you have the full specs.

On my PIII 450 laptop it finds 28000 occurrences in a 3.5 MB file in 0.31 s (not compiled).
You should be able to add the extra check within the 0.10 sec/MB margin.

To experiment, change lngBlockSize to 100 or 10000.
You will see a dramatic drop in speed.

Martijn

'*****************************************************
Option Explicit

Private Sub Command1_Click()
  Dim sinS             As Single
  Dim sinE             As Single
  Dim lngFound         As Long
 
  sinS = Timer
  lngFound = fcnSearch("d:\temp\data.dat")
  sinE = Timer
  MsgBox "found " & lngFound & " time: " & sinE - sinS
End Sub

Public Function fcnSearch(strFileName As String) As Long
  Dim lngBlockSize     As Long
  Dim lngFile          As Long
  Dim lngLen           As Long
  Dim lngLOF           As Long
  Dim lngFilePos       As Long
  Dim lngNoBlocks      As Long
  Dim lngExtra         As Long
  Dim lngMain          As Long
  Dim arrBlock()       As Byte
  Dim lngChar          As Long
  Dim lngBlockLen      As Long
  Dim lngCurrentState  As Long
  Dim lngCount         As Long
  Dim arrDelimit(1)    As Byte
  Dim lngFound         As Long
 
  arrDelimit(0) = AscB("\")
  arrDelimit(1) = AscB("\")

  lngBlockSize = 2000&
  '***  instr base compatibility: the ' -1& ' would slow down the parser
  ReDim arrBlock(1& To lngBlockSize)

  lngFile = FreeFile
  Open strFileName For Binary As lngFile
  lngLOF = LOF(lngFile)
  lngNoBlocks = lngLOF \ lngBlockSize
  lngExtra = lngLOF Mod lngBlockSize
  lngMain = lngNoBlocks * lngBlockSize

  lngCurrentState = 0&

  For lngFilePos = 1& To lngMain Step lngBlockSize
    Get #lngFile, lngFilePos, arrBlock()
    '***  sorry but gosub is faster than a function call
    GoSub tagSearch
  Next

  If lngExtra <> 0 Then
    ReDim arrBlock(1& To lngExtra)
    Get #lngFile, lngFilePos, arrBlock()
    '***  sorry but gosub is faster than a function call
    GoSub tagSearch
  End If
  Close lngFile
 
  fcnSearch = lngFound
  Exit Function

tagSearch:

  lngChar = 1&
  If lngCurrentState = 1& Then
    If arrBlock(1&) = arrDelimit(1&) Then
      '***  found a delimiter - remove the slow print function
      'Debug.Print "found at: " & lngFilePos + lngChar - 1&
      lngFound = lngFound + 1&
      lngCurrentState = 0&
      lngChar = 2&
    End If
  End If

  lngBlockLen = UBound(arrBlock) + 1&
  Do
    '***  searching
    lngChar = InStrB(lngChar, arrBlock, arrDelimit, vbBinaryCompare)
    If lngChar = 0& Then
      lngChar = lngBlockLen
    Else

      lngChar = lngChar + 1&
      '***  found a delimiter - remove the slow print function
      'Debug.Print "found at: " & lngFilePos + lngChar - 1&
      lngFound = lngFound + 1&
    End If
  Loop Until lngChar = lngBlockLen
 
  If arrBlock(lngBlockLen - 1&) = arrDelimit(0) Then
    '***  delimit was cut
    lngCurrentState = 1&
  End If
  Return

End Function
Avatar of ramses

ASKER

MartijnB, that looks EXACTLY like what I had in mind.  I'll see if I can implement it, and if so, I'll award the points and close this question.  If not, let's keep on it.

Please allow +/- max 2hrs after this comment to implement (might be faster, but you never know)


C ya soon


Ramses
Avatar of ramses

ASKER

What's with the "sorry, but gosub is faster than a function call"?

Did your teacher also refer to gotos and gosubs as habits of lazy programmers?
"and gosubs as habits of lazy programmers?"

Some ppl may think that :-)
I never use Gosub myself, but it made some difference inside the loop.

Avatar of ramses

ASKER

On my system (without the extra check), it comes to a speed of 0.3 sec/MB [checked with a 13 MB file (uncached)]

I am almost satisfied.

Martijn,

I once read an article somewhere about a certain API call to quickly load large files (and small ones too, I hope). Maybe that can provide the extra 0.2 seconds of speed that I need.  I can't remember where I read it, though.

About my system: it's a Compaq Deskpro 2000 (P166) with 96 MB RAM (non-EDO).  I know it's not much, but the minimum system requirement for the program will have to be an 80486 @ 100 MHz.

I suspect you all know the H*U*G*E* difference between a P166 and a 486.

0.2 secs on a P166 is maybe 3 secs on a 486.


Sorry, those standards are not mine, but I have to obey them.



Ramses
Aha,

I checked, you're right, the file had been cached. 1.92 secs for my 3.5 MB file.

Benchmarked my system: 6 MB/s sequential read (uncached).
It's scaled 5 times faster than 2GB EIDE (486).
pierrecampe may be right on the hardware point...

But there is a margin:
3.5 MB in 1.92 s
means 1.92 - 0.3 s of VB time = 1.6 s of disk read time.
At 6 MB/s sequential, the minimum
disk read time for the 3.5 MB could be 0.7 s.

I'm interested in finding a faster read routine myself.
I'll have a look.

Martijn
Avatar of ramses

ASKER

tnx
Alright, here is the changed version.

I changed the file read to read the 3.5 MB file into a single array.
This seems to be faster. InStrB still works on the blocks.
CopyMemory copies the data from one array to the other.

I've found no evidence of API calls that could be faster than the binary read.

There are just a few changes:

'*************************************************
Option Explicit

Public Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" ( _
  ByRef Destination As Any, _
  ByRef Source As Any, _
  ByVal numbytes As Long)

Public Function fcnSearch2(strFileName As String) As Long
  Dim lngBlockSize     As Long
  Dim lngFile          As Long
  Dim lngLen           As Long
  Dim lngLOF           As Long
  Dim lngFilePos       As Long
  Dim lngNoBlocks      As Long
  Dim lngExtra         As Long
  Dim lngMain          As Long
  Dim arrBlock()       As Byte
  Dim lngChar          As Long
  Dim lngBlockLen      As Long
  Dim lngCurrentState  As Long
  Dim lngCount         As Long
  Dim arrDelimit(1)    As Byte
  Dim lngFound         As Long
  Dim arrFile()        As Byte
 
  arrDelimit(0) = AscB("\")
  arrDelimit(1) = AscB("\")

  lngBlockSize = 3200&
  '***  instr base compatibility: the ' -1& ' would slow down the parser
  ReDim arrBlock(1& To lngBlockSize)

  '***  load the entire file here
  '***  this could cause memory issues
  lngFile = FreeFile
  Open strFileName For Binary As lngFile
  lngLOF = LOF(lngFile)
  ReDim arrFile(lngLOF)
  Get #lngFile, 1&, arrFile()
  Close lngFile
  '*** end of file read

  lngNoBlocks = lngLOF \ lngBlockSize
  lngExtra = lngLOF Mod lngBlockSize
  lngMain = lngNoBlocks * lngBlockSize

  lngCurrentState = 0&

  For lngFilePos = 1& To lngMain Step lngBlockSize
    ''''''''''Get #lngFile, lngFilePos, arrBlock()
    CopyMemory arrBlock(1), arrFile(lngFilePos), lngFilePos
    '***  sorry but gosub is faster than a function call
    GoSub tagSearch
  Next

  If lngExtra <> 0 Then
    ReDim arrBlock(1& To lngExtra)
    ''''''''''Get #lngFile, lngFilePos, arrBlock()
    CopyMemory arrBlock(1), arrFile(lngFilePos), lngExtra
    '***  sorry but gosub is faster than a function call
    GoSub tagSearch
  End If

  fcnSearch2 = lngFound
  Exit Function

tagSearch:

  lngChar = 1&
  If lngCurrentState = 1& Then
    If arrBlock(1&) = arrDelimit(1&) Then
      '***  found a delimiter - remove the slow print function
      'Debug.Print "found at: " & lngFilePos + lngChar - 1&
      lngFound = lngFound + 1&
      lngCurrentState = 0&
      lngChar = 2&
    End If
  End If

  lngBlockLen = UBound(arrBlock) + 1&
  Do
    '***  searching
    lngChar = InStrB(lngChar, arrBlock, arrDelimit, vbBinaryCompare)
    If lngChar = 0& Then
      lngChar = lngBlockLen
    Else

      lngChar = lngChar + 1&
      '***  found a delimiter - remove the slow print function
      'Debug.Print "found at: " & lngFilePos + lngChar - 1&
      lngFound = lngFound + 1&
    End If
  Loop Until lngChar = lngBlockLen
 
  If arrBlock(lngBlockLen - 1&) = arrDelimit(0) Then
    '***  delimit was cut
    lngCurrentState = 1&
  End If
  Return

End Function
Avatar of ramses

ASKER

Test results:

Uncached: 0.4 sec/MB
Cached:   0.2 sec/MB

This approach takes 0.1 sec/MB longer than the previous one with uncached files, and 0.1 sec/MB less with cached files.

Also, to use this approach we have to add extra code for memory management, and that will also slow things down.

About the test results: I use TimePassed or something, but I can't give you the details because it's in a typelib with some benchmarking functions that we have to use to test our processes and subprocesses for speed.


I'm going to bed now, and I'll be back on @ about 10am GMT+1


See you then, and remember:

Sleep tight, don't let the bedbugs bite :-)


Ramses
I think ramses expects too much...
Small improvements are possible:
Use a string buffer and InStr (which is very fast, BTW), and read the file in 2 KB (2048 byte) chunks.
To measure time use GetTickCount, and for the file handle use an Integer.
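The declare, for reference (GetTickCount is a standard kernel32 API; it returns milliseconds since boot):

Private Declare Function GetTickCount Lib "kernel32" () As Long

Dim lngStart As Long
lngStart = GetTickCount
'*** ... work to be timed ...
Debug.Print "elapsed ms: " & (GetTickCount - lngStart)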
Quote: "Use string buffer and Instr (which is very fast, BTW), and read file in 2 KB (2048) chunks."

That's exactly what the code does.

Instr can be beaten by plain VB code:
a for - next with an inline cascading if - then.
If a string is over 30kb instr crowls.
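Such a cascading scan might look like this (a sketch over a 1-based byte array, reusing the names from the code above; the checksum validation is left out):

Dim i As Long, lngLen As Long
lngLen = UBound(arrBlock)
i = 1&
Do While i < lngLen
    If arrBlock(i) = arrDelimit(0) Then
        If arrBlock(i + 1) = arrDelimit(1) Then
            lngFound = lngFound + 1&
            i = i + 1&      '*** don't re-match the second delimiter byte
        End If
    End If
    i = i + 1&
Loop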

BTW:
there is a very dangerous bug in the second code chunk:

For lngFilePos = 1& To lngMain Step lngBlockSize
   ''''''''''Get #lngFile, lngFilePos, arrBlock()
'''BAD:: CopyMemory arrBlock(1), arrFile(lngFilePos), lngFilePos
   CopyMemory arrBlock(1), arrFile(lngFilePos), lngBlockSize
   '***  sorry but gosub is faster than a function call
   GoSub tagSearch
Next
>That's exactly what the code does
I don't see InStr and a string, I see InStrB and a byte array.

I cannot show the code, I didn't save it - I renamed my function and tested the second sample :-)
>   Get #lngFile, lngFilePos, arrBlock()
this is a little bit faster:
     Get #lngFile, , arrBlock()
OK - InStrB is not exactly InStr.
InStr and a string are not an option here due to the unicode conversion (the data is binary).

Do you claim that Instr is faster than InstrB?
Get #lngFile, , arrBlock()
Agreed :-)
>Do you claim that Instr is faster than InstrB?

Yes. But they are for different data types.
A byte array will load faster than a string, but searching it is slower.
Of course, I'm talking about case-sensitive InStr; the vbTextCompare version is ~10 times slower than the default vbBinaryCompare.
Well this beats me:
Instr is 764 times faster than InstrB.

Really ridiculous. Do you have an explanation for this?




Avatar of ramses

ASKER

Guys,

It's true that -I- might be expecting too much, but I got certain guidelines from the principal.  He stated that the subprocess that reads a certain record mustn't take longer than 0.5 sec/MB on a P166.

That means:

- skipping to the record in the file
- reading the record
- parsing the data and filling a listview with it

As you can see, we can tweak the algorithms for reading the record and parsing the data, but there is always the matter of the listview, which is quite slow to add data to, even when used with the LockWindowUpdate API (without it, it's SOOOOOOO slow).

If it should prove "impossible" to speed up the record search routine, maybe you could suggest a faster way to load the listview.  Maybe there is some API call to load a large set of data in one step.


Ramses
Avatar of ramses

ASKER

From my own research, I found something on MSKB for loading what MS called back then "Huge" files (when referring to bitmaps), but it was for Win3.1, and since those are mostly 16-bit APIs I don't think they'd provide any increase in speed.

When you've got a sec, take a look at this page:

http://support.microsoft.com/support/kb/articles/Q100/5/13.asp?LN=NL&SD=gn&FR=0

Ramses
MartijnB
>Instr is 764 times faster than InstrB.

I don't understand - are you making fun of my statements? Don't you believe InStr is faster?


ramses
>maybe you could suggest a faster way to load the listview

I use a listview as my main list control, and I can fill it very fast.  If you give your example, I can suggest changes.

>take a look at this page:

There are 32-bit versions of those APIs, but I don't expect loading the whole file can be faster than what MartijnB suggested - loading small chunks.
Ameba,

I tested Instr and InstrB on a 400k string/bytearray.
Finding all 2000 substrings.
Instr was 764 times faster.

On smaller strings the difference is not this extreme.
But it puzzles me.

Martijn
Avatar of ramses

ASKER

Does the Seek function take much time?

I mean, it might be interesting, in large files, to split them in four when searching for a certain record separator.

Pseudo:

Read file, start+pos
Read file, center-pos
Read file, center+pos
Read file, end-pos

Avatar of ramses

ASKER

>I use a listview as my main list control, and I can fill it very fast. If you give your example, I can suggest changes

I don't mean the algorithm to add the items; after all, we all know how to add items to a listview:

Dim lItem As ListItem

Set lItem=ListView.ListItems.Add(,"item description",I_large,I_small)
lItem.SubItems(1)="subitem text 1"
lItem.SubItems(2)="Subitem text 2"
...


But I mean, suppose I'd make a UDT array like this:

Type llItemType
  Text As String
  Icon_L As String
  Icon_S As String
  SubItems() As String
End Type

Then pass the whole UDT array to some API which adds them all at once.

I know this is a bit too optimistic to expect, but are you guys aware of anything like it?  I mean, some programs I've seen start in 2 secs and have a listview with 200.000 items, while when I add them it takes about half a minute.


Ramses
Not too much.

The first code I sent splits the file into blocks
that are optimized for InStrB. This also has the advantage
that at any point in time, just 2000 bytes of the file are
in memory.

The second one reads the whole file. This is faster - if you remove the bug - on systems with enough memory.

I see the following optimizations:
Read in blocks of 64 KB and search in blocks of 2 KB.
Somehow convert the data to a string to search.

Martijn


MartijnB
>I tested Instr and InstrB on a 400k string/bytearray.

Ah, so... OK.  InStr is faster because the MS team optimized it.
Theoretically, InStrB could be optimized to be 2 times faster, but no-one took the time to do that.


ramses,
>after all, we all know how to add items to a listview
>
>Set lItem=ListView.ListItems.Add(,"item description",I_large,I_small)

Well, that line can work only once, since "item description" cannot be the Key for a second item.
ListView.ListItems is also very slow - you are asking for the control property "ListItems" when adding each item.
And do you really need the ListItem object "lItem"?

>some api which adds them all at once
No, the nearest possible is to set the initial size of the listview:
     SendMessage LV1.hwnd, LVM_SETITEMCOUNT, 10000, 0&

>listview with 200.000 items
That is a virtual listview, but it has slow scrolling - each item is redrawn "on request".
LVM_SETITEMCOUNT makes filling the ListView only 3-4% faster.
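For completeness, the declares behind that call (SendMessage is the standard user32 API; LVM_SETITEMCOUNT is LVM_FIRST + 47 in commctrl.h):

Private Declare Function SendMessage Lib "user32" Alias "SendMessageA" ( _
  ByVal hwnd As Long, ByVal wMsg As Long, _
  ByVal wParam As Long, ByVal lParam As Long) As Long

Private Const LVM_FIRST As Long = &H1000&
Private Const LVM_SETITEMCOUNT As Long = LVM_FIRST + 47

'*** hint the expected item count before the Add loop
SendMessage LV1.hwnd, LVM_SETITEMCOUNT, 10000, 0&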
On the slowness or fastness of finding a record:
the apparent slowness is largely due to the fact that you read in large chunks of data while you apparently need to read in just 1 record.
Also, the speed of what you are doing largely depends on the hardware, and even on the same hardware the speed will differ greatly between disk reads, the speed of which depends on the length of a disk cluster, the fragmentation, the interleave factor, the place of the file on disk, etc...
I think you will have to rethink your problem.
Maybe sacrifice disk room for speed: random access will increase the record-finding speed many times; if that's not possible, then an index may also increase speed many times.
Maybe by just rethinking the problem you can limit the number of disk accesses / number of bytes read.
I suppose a record has some unique identifier.
Now, if there is a maximum length a record can have, then just multiply that maximum length by 3 or 4 to be sure, and use random access even if that increases the file size from say 5 MB to 50 MB - disk space is cheap.
Then you will need just 1 read to get a record.
If you want to keep the file as it is, make a random access index on it; that will limit the disk access to 2 reads.
Maybe exactly this is not possible, but there probably is a way to limit the disk reads / bytes read.

Also:
>>I'm not going to give all the details about the program or<<
>>the file format, as it is confidential, but<<
>>you'll see that isn't necessary.<<
Well, I think the details about the prog are not needed,
but if you don't give the details about the file format, then nobody will be able to help you.

I agree with you Pierre,

It may also be wise to create an index file.

The reason I am interested in this is that I am writing a program that reads/parses large XML files.
Avatar of ramses

ASKER

Thank you for your comments and suggestions so far.

Ameba
>Well, that line can work only once, since "item description" cannot be the Key for a second item

So I forgot to type an extra comma before "item description".  I'm not that kind of novice programmer, I just happened to forget to type it in here, you know (let's please not be picky).

OK guys, I guess you're right about the fact that it is actually guessing if you don't know what you're dealing with, so I'll disclose some info on the file format.

The thing I call a Record has a variable length, however.

indexfile specification
-----------------------

Byte number

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890

CBNNNFF\folder10\folder20\folder300~?dnfile10?SSSDDDDDddddd0


Description

CB           -> identifier string, always CB
NNN          -> 3-char field, number of files in this file
                (max number of files: 16.777.215)
FF           -> 2-char field, number of folders in this file
                (max number of folders: 65.535)
\folder1     -> name and level of folder1, followed by #0

#0 ends the folder list

When the file is read, each folder gets a two-byte designation number (four hex
digits); the first folder is 0000, the 2nd folder 0001, the 3rd folder 0002, etc...
Maximum number of folders: FFFF = 65.535

~?           -> start of folder
dn           -> designation number of the folder
                (i.e. the files in here belong to folder dn)
file1        -> name and extension of file 1, followed by #0
a            -> attributes of file (1/2 byte)
                No attribs=0, Archive=1, Hidden=2, Read-Only=4, System=8
                Combinations are possible by ORing values
e            -> unit of file size (1/2 byte)
                Can be any of the following:
                0=bytes, 1=Kbytes, 2=Mbytes, 3=Gigabytes, 4=Terabytes
SSS          -> 3-char field, size of file (when expanded, pad to 6 chars by
                adding 0's at the start, then take the first three as the whole
                part and the last three as the decimal part, and multiply by
                unit e to get the file size)
ddddd        -> 5-byte field, creation date & time
DDDDD        -> 5-byte field, last modified date & time
0            -> terminator, end of file list in current folder
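A sketch of reading the header fields out of a loaded buffer (the byte order and exact offsets are my assumptions from the spec above):

Dim lngFiles As Long, lngFolders As Long
If Left$(buffer, 2) <> "CB" Then
    MsgBox "not a CB file"
Else
    '*** NNN: 3 raw bytes at offsets 3..5, FF: 2 raw bytes at offsets 6..7
    lngFiles = 65536 * Asc(Mid$(buffer, 3, 1)) _
             + 256& * Asc(Mid$(buffer, 4, 1)) _
             + Asc(Mid$(buffer, 5, 1))
    lngFolders = 256& * Asc(Mid$(buffer, 6, 1)) + Asc(Mid$(buffer, 7, 1))
End If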

Avatar of ramses

ASKER

replace the ? in the master string with ae (as one char)
Avatar of ramses

ASKER

Also note that the alignment of this specification has been completely ruined because of EE's lack of HTML support.  For optimal viewing, please copy the entire specification to an app that is capable of rendering it in a fixed-width font (like Courier New).


Ramses
ramses
>So I forgot to type an extra comma before "item description". I'm not that kind of novice programmer,
>I just happened to forget to type it in here, you know (let's please not be picky)

Sorry if it looked 'picky', but "one comma missing" was not obvious to me - using or not using the Key argument makes a big difference in performance.
To my knowledge, many programmers use Key to identify the row (they put the record ID into the listitem Key).
Avatar of ramses

ASKER

In this case, Key is not needed nor used, as all the data about a particular item is static and already provided by its subitems when the list is loaded.  Should the user want some details about an item when in Icon view, I just extract them from the subitems collection of the selected item.


Ramses

No need to apologize, no harm done
OK. ... Trying to understand that secret formatting of the secret file-system you are creating :-)

Maybe a shorter way to save the folders
- if you have 2 folders, each with 2 files:

c:\temp\bin\
c:\temp\help\

You can save it like this:
Folders part: ID,parentID,folderName
Files part: folderID, filenames
-------------------------------
1,0,temp
2,1,bin
3,1,help
2
file
another file
3
file
another file
-------------------------------

and - avoid 'full path' ??

\temp10\bin20\
\temp10\help20\
Avatar of ramses

ASKER

I'll explain it, as it is maybe not that obvious.

The first two bytes of the file are always CB; it's an identification string (like MZ for exes). Following that, you have a three-char field that specifies how many files are in this file (just for statistics), then comes a 2-char field that holds the number of folders in this file, followed by the actual folder names, each separated by a #0 char.

When the file is processed, each folder gets a 2-char designation number (in mem), i.e. the first folder name in the file will be designation #0#1, the second #0#2, and so on.

After the list of folder names, another #0 follows; then comes the IsFolder constant (~?) followed by its designation number.  This means that all entries following this belong to the folder with designation xx.

Now come the files:

after the filename there is a #0 character, followed by a 14-char fileinfo field.  If after that a #0 character is found, it means end of the file list for this folder; otherwise the next file comes.


If this is the file:
c:\temp\help\readme.txt
what is the folder? Is it one folder or two?
What is the UI?  Treeview on the left to select folder, and Listview with filenames on the right?
yes, and are those 'folders' hierarchical?
can a folder have sub-folders?
if so, can it go to any depth?
if so, can every folder have files?
is it a kind of file system?

 
Avatar of ramses

ASKER

Yes pierrecampe, it's an entire tree.  At the start of the file, all folders are listed in the order they are collected with the FindFirst, FindNext API calls.  Their position in the file also makes up their designation number.  Subfolders are treated exactly like folders.  And every folder can have files, but doesn't have to.  If there are no files in a certain folder, then after the folder designation number the text EMPTY is stored in the file.


Yes ameba, kinda like that


Ramses

You didn't answer all the questions, but OK.

You have two parts:
1. Folders info
2. Files in each folder (for each folder you have one "block")

Add indexing (as already suggested by pierrecampe):
- calculate size of each 'file block'

block 1, contains file info for all files in folder 1
size 30500
block 2, contains file info for all files in folder 2
size 20100

During the file creation, add that info near each folder.


In your program - reader of that file:

When you load your form, open the file (and keep it open until the end of the program).
At the start, read only the "Folders Info" part, and fill the treeview.

When the user clicks on folder 3, calculate the position:
   position = size of "Folders info" + 30500 (size of first block) + 20100 (size of second block)

and use Seek to go to that position (Seek doesn't read anything from the file, it only moves the 'file pointer').

No need to search/reread the whole file whenever the user selects some folder.
The indexing info can also be in a separate file, or you can create it whenever you open the file.

"Keep the file open" - this is not very nice, but that is what any database does.
Avatar of ramses

ASKER

wooow, wait a sec, there is no touching the file format. It's not mine to change, I have to obey the specs, you know.

Second, as already mentioned, the directory listing records are variable length.


Third, keeping the file open all the time is quite impossible because there are (or will be) hundreds of them used in one session.
ramses,
>Second, as already mentioned, the directory listing records are variable length.

Of course; if they were fixed, there would be no need to calculate or save the block length. It would be trivial: recNo * size.

>keeping the file open all the time is quite impossible

That is good to know - then you'll have to reopen the file and use Seek whenever some folder is selected.
Avatar of ramses

ASKER

First, I'd like to explain what I think about EE and asking questions.

I'm not looking on EE for a Question-Answer-Period approach; I want it to be a discussion with colleagues about an issue I'm facing.  If, during the debate, I should come up with the right solution, we'll work something out to make sure the right persons (not me) get the credits for their efforts.  Splitting points, when necessary, is also an option.

Having said this, I'd like to throw something into the debate.

Let's say I like the idea of keeping the file open, not for the entire session, but for as long as the user is working with it (i.e. the form which shows/alters/processes the data is shown).

Secondly, how about indexing the file positions (in memory)?

I'll explain.

Before the form is shown/updated, I'll enumerate all directories; that's easy because they're always at the beginning of the file.  First I read two chars at offset 6, which tells me how many folders to expect; then, starting at offset 8, the folders are listed.

When I reach the end of the folders, I know that the first folder's listing follows, so I could set the treeview node's tag for that folder to the current position.

At startup, the Root item (folder 1) gets programmatically selected to update the listview.  Since we already have that starting position, we don't need to search for it.  At the end of that folder listing, we know it's the start of the second folder, so we can already set the tag of the node with index 2 to that seek position.

If the user selects a treeview node that is a child of a node that has a tag value, and hasn't got a tag value of its own (yet), we know it's "near" its position too, or at least we already have an offset to start from.

Please tell me, if you know, which is faster:

offset=TreeView1.Nodes(index).Tag

or

offset=OffsetArray(index)

Right, that would mean an additional array that will also have to be exposed to the Module-level procedures, and -following the specs- that would mean passing it as a parameter to all procedures, not declaring it globally.


What I would like to know is: will the overhead that this approach causes really make a difference (more lines of code = slower execution)?


Please feel free to suggest anything you think might help; as stated above in this comment, I'm not looking for a Question-Answer-Period type of thing.


Ramses
Before opening the file and reusing the previously done indexing, you should check whether the file has changed - use FileDateTime() - and if it has, do the indexing again (into memory or into some external .ndx file).
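e.g. (datLastIndexed and RebuildIndex are hypothetical names):

If FileDateTime(strFileName) <> datLastIndexed Then
    '*** file changed since the index was built - redo it
    RebuildIndex strFileName
    datLastIndexed = FileDateTime(strFileName)
End If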
Avatar of ramses

ASKER

It's not likely to change.  The only thing that might happen to the file is that it's upgraded to a new version, and in that case (WHEN that happens) I'll make sure to feed my comments to the people who come up with the format.

Even if the files are re-created, there won't be so much as a single byte of difference (unless upgraded).


Ramses
We posted in the same minute..

>Please tell me, if you know, which is faster

.Tag is very fast, but I would use the array and save that useful Tag thingie for something else.
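e.g. (a sketch; lngFolders comes from the header, and Seek() returns the current file position):

'*** module level, instead of Node.Tag
Dim lngOffset() As Long

'*** after reading the folder count from the header:
ReDim lngOffset(1 To lngFolders)

'*** whenever a position becomes known:
lngOffset(lngFolderIndex) = Seek(lngFile)

'*** when a node is clicked:
offset = lngOffset(Node.Index)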
Avatar of ramses

ASKER

Guys (dunno if there are any girls here; if so, SPEAK UP please), I'm off to bed now.  Be back @ 10am GMT+1


Have a good night.


Ramses
>>there is no touching on the file format<<
OK, so let's see if I know what the file-writing program does:
it finds the first dir in the root, writes it to the file, and that's folder1;
then it finds the first dir in that dir, writes it to the file, and that's folder2... etc.
After it finds no more dirs, it starts over with folder1 and writes each of its files to the written file, then folder2... etc.
If that's what happens, you can make an index file like this:
*dir1,12345,12378,12401,etc
*dir2,24576,24591,etc
etc...
The numbers are the positions of each file in the written file.
This index file can be read very fast; it can even be kept in memory in an array of variants.
And for each dir you know where its files start in the written file, and as ameba said you can seek immediately to any file; and if the index is kept in memory you'll need only 1 disk read to access any file.
And if you want, you can even read in all the files in a dir in 1 disk read.

hey guys, seems everybody is writing at the same time
Sleep ??? - use Sleep API (Apis are fast) and you'll need 2-3 hours less. ;-)
Avatar of ramses

ASKER

Before I go to bed: you're almost right, pierrecampe.

You're right about the first part (folders), but when it comes to the folder listing, there is no value inside the file that tells me how many files there are in a certain folder (only how many files in total).

Each file listing for a particular folder begins with a 2-char constant followed by a 2-char folder designation number, followed by file1, followed by a nullchar, followed by a 14-char field which holds information like the attributes of the file, the file size unit (i.e. KB, MB, ...), the file size, the creation date, and the modified date; then follows file 2, ...  At the end of the file listing for each folder follows a nullchar.  Filenames are not in 8.3 format but long filenames, and there is no option to pad each filename to 255 chars, as this would make the file unnecessarily large; besides, I can't do that because I must obey the specs.


Ramses
Avatar of ramses

ASKER

Good one, ameba. Please give me the declares so I can implement it in my system (body).  Be sure to be kind enough to include the necessary hardware to program the routines into my brain. :)))


Ramses


C ya all tomorrow (actually, here it's already 1.13am 11/19/01)
:-)
ramses, I like your idea.
>>-following the specs- that would mean passing it as a parameter to all procedures, not
declaring it globally<<
Come on, if the specs ask for speed, then tell the specs-makers that means globals.
If the specs said "the programmer has to kill himself after coding because this thing is so secret", what would you do? :-)
So the specs say no globals, but gosub is OK?
I see, so you have no say about the program that writes that file, right? What I meant was that the index has to be written by the program that writes that file,
and you may not add that index yourself, right?
You have my sympathy.
Avatar of ramses

ASKER

Actually, the only Goto we're allowed to use is On Error Goto 0

Even Error Handling must be done at the place where the exception can be raised

ie not

Dim a As Long, b As Long

On Error Goto ErrHandler

a=100
b=0

MsgBox CStr(a/b)

Exit Sub

ErrHandler:
Msgbox "An error occured"


But

Dim a as Long, b As Long

a=100
b=0

On Error Resume Next
MsgBox CStr(a/b)
If Err<>0 Then
  Err.Source=CurrentProcess
  If Not ErrorHandler(Err) Then Exit Sub
End If
On Error Goto 0

Active error trapping is allowed only at a place where an inevitable exception may occur.  So in the example I just gave, I'm not allowed to use an error handler, since all exceptions that can be raised by performing mathematical calculations can be avoided by evaluating the input variables (i.e., check that the divisor is not zero, etc.)
Avatar of Asta Cu
Please update the experts here who have so willingly stepped in to help you, since much time has passed since your last comments, and e-mail notifications may not have been generated to the participating experts here due to some problems at that time.  If you've been helped, accept the respective comment by that expert to grade and close the question.

Somewhat off-topic, but important.

****************************** ALERT********************************
WindowsUpdate - Critical Update alert March 28, 2002 from Microsoft
http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/bulletin/ms02-015.asp
Synopsis:
Microsoft Security Bulletin MS02-015  
28 March 2002 Cumulative Patch for Internet Explorer
Originally posted: March 28, 2002
Summary
Who should read this bulletin: Customers using Microsoft® Internet Explorer
Impact of vulnerability: Two vulnerabilities, the most serious of which would allow script to run in the Local Computer Zone.
Maximum Severity Rating: Critical
Recommendation: Consumers using the affected version of IE should install the patch immediately.
Affected Software:
Microsoft Internet Explorer 5.01
Microsoft Internet Explorer 5.5
Microsoft Internet Explorer 6.0

Thought you'd appreciate knowing this.
":0)
Asta
Avatar of ramses

ASKER

OK, thanks for the help, guys.  I was unable to log in for a few months because of a lost password.  Anyway... since this is a question from so long ago, I don't know who is the right person to award the points to:

MartijnB
ameba
pierrecampe

or if you guys want to accept a 3-way split, just tell me


Ramses
Thanks, Ramses, sorry you had login problems, and happy to see that it's now fine.

I have processed the point splits for you, as requested.

Points for ameba -> https://www.experts-exchange.com/jsp/qShow.jsp?qid=20314982
Points for pierrecampe -> https://www.experts-exchange.com/jsp/qShow.jsp?qid=20314983

Moondancer - EE Moderator