Solved

Code Optimization

Posted on 2008-10-13
26
397 Views
Last Modified: 2010-04-05
I was wondering if any of the other experts can use there eagle eye's to spot a speed flaw or a way to increase the speed. The usual way of increasing the block size(which is limited to 2048 bytes) can't be changed.

I was thinking of:

A) Threading. Compress two blocks at the same time.
B) Cache data. Write to drive every 4 blocks.

The threading option is probably a must. But how exactly would I pull it off? Would I fire up two or more threads dealing with seperate blocks, wait for their return, combine the blocks and write it. Or would I let the threads write to the file, how would I maintain order?

I tried method B but it didn't give me a speed increase, what are the effecient ways to Move memory without having a large overhead?

Cheers- ThievingSix
function CompressCISO(FileIn, FileOut: TFileStream; Level: Integer = 1): Boolean;

var

  CISO : TCISOHeader;

  FileSize : Int64;

  WritePos : Int64;

  IndexSize : Integer;

  Block : Integer;

  Buffer : Array[0..63] Of Char;

  CmpSize : Integer;

  Status : Integer;

  CISOTotalBlocks : Integer;

  IndexBuffer : Array Of Cardinal;

  BlockBuffer1 : PChar;

  BlockBuffer2 : PChar;

  ZStream : TZStreamRec;

  Align,

  AlignB,

  AlignM : Integer;

begin

  Result := False;

  CISO := InitializeHeader(FileIn,CISOTotalBlocks);

  FileSize := CISO.TotalBytes;

  If FileSize < 0 Then

    begin

    Raise Exception.CreateFmt('%d',[INVALID_FILE]);

    Exit;

  end;

  IndexSize := (CISOTotalBlocks + 1) * 4;

  SetLength(IndexBuffer,IndexSize);

  BlockBuffer1 := AllocMem(CISO.BlockSize);

  BlockBuffer2 := AllocMem(CISO.BlockSize * 2);

  If (IndexBuffer = nil) Or (BlockBuffer1 = nil) Or (BlockBuffer2 = nil) Then

    begin

    Raise Exception.CreateFmt('%d',[NOT_ENOUGH_MEMORY]);

    Exit;

  end;

  FillChar(Buffer,64,0);

  ZStream.zalloc := nil;

  ZStream.zfree := nil;

  ZStream.opaque := nil;

  FileOut.Write(CISO,SizeOf(CISO));

  FileOut.Write(IndexBuffer[0],IndexSize);

  WritePos := SizeOf(CISO) + IndexSize;

  AlignB := 1 SHL CISO.Align;

  AlignM := AlignB - 1;

  For Block := 0 To CISOTotalBlocks - 1 Do

    begin

    If DeflateInit2(ZStream,Level,Z_DEFLATED,-15,8,Z_DEFAULT_STRATEGY) <> Z_OK Then

      begin

      Raise Exception.CreateFmt('%d:%s',[DEFLATEINIT_ERROR,ZStream.msg]);

      Exit;

    end;

    Align := WritePos And AlignM;

    If Align <> 0 Then

      begin

      Align := AlignB - Align;

      If FileOut.Write(Buffer[0],Align) <> Align Then

        begin

        Raise Exception.CreateFmt('%d',[INVALID_FILE]);

        Exit;

      end;

      Inc(WritePos,Align);

    end;

    IndexBuffer[Block] := WritePos SHR CISO.Align;

    ZStream.next_out := BlockBuffer2;

    ZStream.avail_out := CISO.BlockSize * 2;

    ZStream.next_in := BlockBuffer1;

    ZStream.avail_in := FileIn.Read(BlockBuffer1[0],CISO.BlockSize);

    If ZStream.avail_in <> CISO.BlockSize Then

      begin

      Raise Exception.CreateFmt('%d:%d',[INVALID_FILE,Block]);

      Exit;

    end;

    Status := Deflate(ZStream,Z_FINISH);

    If Status <> Z_STREAM_END Then

      begin

      Raise Exception.CreateFmt('%d:%d:%s',[DEFLATE_ERROR,Block,ZStream.msg]);

      Exit;

    end;

    CmpSize := (CISO.BlockSize * 2) - ZStream.avail_out;

    If CmpSize >= CISO.BlockSize Then

      begin

      CmpSize := CISO.BlockSize;

      CopyMemory(BlockBuffer2,BlockBuffer1,CmpSize);

      IndexBuffer[Block] := IndexBuffer[Block] Or $80000000;

    end;

    If FileOut.Write(BlockBuffer2[0],CmpSize) <> CmpSize Then

      begin

      Raise Exception.CreateFmt('%d:%d',[INVALID_FILE,Block]);

      Exit;

    end;

    Inc(WritePos,CmpSize);

    If DeflateEnd(ZStream) <> Z_OK Then

      begin

      Raise Exception.CreateFmt('%d:%s',[DEFLATE_END_ERROR,ZStream.msg]);

      Exit;

    end;

  end;

  IndexBuffer[Block] := WritePos SHR CISO.Align;

  FileOut.Position := SizeOf(CISO);

  FileOut.Write(IndexBuffer[0],IndexSize);

  Result := True;

end;

Open in new window

0
Comment
Question by:ThievingSix
  • 11
  • 10
  • 2
  • +3
26 Comments
 
LVL 17

Expert Comment

by:Shanmuga Sundaram
ID: 22701258
I am not a master in Delphi. This is just a thought. Better try threading. This will sure increase the performance. You can create as many threads as you need. These sites will help you on threading

http://delphi.about.com/od/kbthread/Threading_in_Delphi.htm
http://dn.codegear.com/article/22411
http://www.eonclash.com/Tutorials/Multithreading/MartinHarvey1.1/Ch2.html
http://www.delphifaq.com/faq/delphi/vcl/f269.shtml
0
 
LVL 28

Accepted Solution

by:
ciuly earned 400 total points
ID: 22701427
the definite speed increase is using a file buffer. create a new class, internally using a file, which buffers reads and writes with at least 64kb blocks. that will speed up file access by a factor of 30 (which is not the total speed, just the file access speed)

the threading model only provides speed gain in case of parallel solutions. doing the same stuff in the same stuff, or different stuff in the same time. the gain is because while some threads are bloking some resources, other threads continue to work on tasks that do not require the blocked resources.

your solution is liniar, so threads are out of the question from optimizaiton point of view UNLESS there are more than 1 operations that are speed consumers and are independed (pretty important detail)

for example. your solutiuon would be;

for each block do
  independent operation 1
  independent operation 2
  independent operation 3
  independent operation 4
end for each
all 4 operations are independent and take a lot of time to process and use different resources. you fire up 4 threads doing the operations and "waitfor" all of them.

from a first glance at your code, you don't have anything there that can vbe done in parallel except file reading and compressing, with the note that compressing is done on the previous file block read, otherwise the 2 operations are dependent.

so, using this transformation model, you can create as many threads as semi-independent operations you have, with the restriciton that some operations will depend on the previous result of another operation.

so you clould do threading, the only question is, is it feasable? in order to answer this, you will have to measure the time of each thread-able operation.

with this "delayed" model in mind, I think you might be able to pull off some speed increase, provided that there are at least 10 blocks of data to process (10 is relative, could be 5, could be 15, but I think around 10 is a optimal minimum)
one other thing you will need to keep in mind with this scenario is that creating and destroying threads consumes times, so you will have to create and destroy them once. and if you call the method many many times, then you will have to consider using a thread pool, otherwise whatever you gain with threads, will loose with bad thread management.
and also keep in mind that thread sycnhronizationalso costs time and that is why I kep saying that the operations must be independent, so that there will be as few locks as possible.
0
 
LVL 21

Expert Comment

by:developmentguru
ID: 22704454
To optimize the disk I/O I would read the file into memory, compress it in memory, then write the results out to a file.  Of course, this will only work on files that could fit in memory.  For larger files it could be broken into one megabyte chunks and done the same way (one Meg at a time).  The more of it you can do in memory, the more can be read or written at one time, the faster it will be.
0
 
LVL 28

Expert Comment

by:ciuly
ID: 22705132
that is called a file buffer :P
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22707419
Something like this?


constructor TBufferStream.Create(InputStream: TFileStream);

begin

  fFileStream := InputStream;

  fMemoryStream := TMemoryStream.Create;

  fMemoryStream.SetSize(64 * 1024);

  fBufferSize := 0;

end;
 

destructor TBufferStream.Destroy;

begin

  inherited Destroy;

  FreeAndNil(fMemoryStream);

end;
 

procedure TBufferStream.DumpBuffer;

begin

  fMemoryStream.Position := 0;

  fFileStream.CopyFrom(fMemoryStream,fBufferSize);

end;
 

function TBufferStream.WriteBuffer(const Buffer; Count: Int64): Int64;

begin

  If Count + fMemoryStream.Position > fMemoryStream.Size Then

    begin

    DumpBuffer;

    fBufferSize := 0;

    fMemoryStream.Position := 0;

  end;

  Result := 0;

  Result := fMemoryStream.Write(Buffer,Count);

  Inc(fBufferSize,Result);

end;

Open in new window

0
 
LVL 28

Expert Comment

by:ciuly
ID: 22709504
something like that, but move
    fBufferSize := 0;
    fMemoryStream.Position := 0;
in dumpbuffer and call dumpbuffer in destroy and in dumpbuffer check if buffersize>0 before operating. and don't forget to buffer the read from file as well if that is not done in big chunks either.
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22709582
Completely forgot about the read! Will do.
0
 
LVL 28

Expert Comment

by:ciuly
ID: 22709630
one minor detail: you need to support writing (reading) big chunks (it won't happen in your particular case, but you might want to use this code for other projects later on and it might happen then ;)
sure, the code as it is will work just fine with the side effect that your memory buffer will increase and thus the behaviour will change. it's what I call a logical error.

  If Count + fMemoryStream.Position > fMemoryStream.Size Then
   begin
    DumpBuffer;
    fBufferSize := 0;
    fMemoryStream.Position := 0;
    If Count + fMemoryStream.Position > fMemoryStream.Size Then
    begin
      Result := fFileStream.Write(Buffer,Count);
      exit;
    end;
  end;
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22709680
Took me about 5 re reads of that code before I realized why you did what you did. =P But I see why now.
0
 
LVL 28

Expert Comment

by:ciuly
ID: 22709741
:)

of course
    fBufferSize := 0;
    fMemoryStream.Position := 0;
shouldn't be there. copy-paste left-overs :P
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22715424
Hmm, how would I take advantage of dual cores and hyper threading processors(such as mine)? Wouldn't that mean I would have to thread anyway? I never hit more than 50% on this app meaning I never hit my second virtual core.
0
 
LVL 28

Expert Comment

by:ciuly
ID: 22715992
I never touched that subject so I don't know. but I can google :P
if you don't want to waste time, you can make a separate small test app with 2 threads and let them rip doing nothing. keep in mind that your system might freeze :)
next will be to set the cpu affinity. a few articles I found:
http://www.delphi3000.com/articles/article_2605.asp?SK=thread
http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_23039948.html
http://dn.codegear.com/article/27267

but this will only make sense if you have CPU intesive operatiosn. keep in mind that even if you write 100 threads, there is onl;y one data BUS leading to only one harddrive. so threads is not an anwser for boosting single IO operation speed, but it is an answer, as already explained, when you can break that up into more IO operations and other operations which do not lock eachother out.
0
 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 50 total points
ID: 22717935
@ThievingSix

1. Have you done any performance profiling of your code?
2. Is your code I/O bound or CPU bound?

benchmark reference:
http://www.softpedia.com/get/System/Hard-Disk-Utils/AssignFile-TFileStream-Speed-Test.shtml

3. What do DeflateInit2(), Deflate(), and DeflateEnd() functions do?
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 13

Author Comment

by:ThievingSix
ID: 22717964
This is a file compression routine. DeflateInit2(), Deflate(), and DeflateEnd() are part of the zlib compression/decompression library. No, I haven't done any performance profiling. Can you recommend one for delphi?
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22718206
Here is what I have for the reading side of it. Looks like I screwed up somewhere because I either get a stream read error, or a longer completion time(which isn't supposed to happen =P). Been fiddling with it for a while and I will continue to do so unless someone here spots it first.
constructor TReadBufferStream.Create(InputStream: TFileStream);

begin

  fFileStream := InputStream;

  fMemoryStream := TMemoryStream.Create;

  fMemoryStream.SetSize(64 * 1024);

  fBufferSize := 0;

end;
 

destructor TReadBufferStream.Destroy;

begin

  FreeAndNil(fMemoryStream);

  inherited Destroy;

end;
 

procedure TReadBufferStream.FillBuffer;

begin

  If fBufferSize >= fMemoryStream.Size Then Exit;

  fMemoryStream.Position := fBufferSize;

  fBufferSize := fBufferSize + fMemoryStream.CopyFrom(fFileStream,fMemoryStream.Size - fBufferSize);

  fMemoryStream.Position := fMemoryStream.Position - fBufferSize;

end;
 

function TReadBufferStream.ReadBuffer(var Buffer; Count: Int64): Int64;

var

  BytesRead : Int64;

  TotalBytesRead : Int64;

  OldPosition : Int64;

  BigBlock : PChar;

begin

  If Count > fMemoryStream.Size Then

    begin

    OldPosition := Count;

    TotalBytesRead := 0;

    BytesRead := fMemoryStream.Read(Buffer,fBufferSize);

    Inc(TotalBytesRead,BytesRead);

    Dec(fBufferSize,BytesRead);

    Dec(OldPosition,TotalBytesRead);

    BigBlock := AllocMem(OldPosition);

    Result := fFileStream.Read(BigBlock[0],OldPosition);

    PChar(Buffer) := TotalBytesRead + PChar(Buffer);

    MoveMemory(PChar(Buffer) + TotalBytesRead,BigBlock,OldPosition);

    Result := Result + TotalBytesRead;

    fBufferSize := 0;

    FillBuffer;

    Exit;

  end;

  If fBufferSize - Count < 0 Then

    begin

    FillBuffer;

  end;

  Result := fMemoryStream.Read(Buffer,Count);

  Dec(fBufferSize,Result);

end;

Open in new window

0
 
LVL 28

Expert Comment

by:ciuly
ID: 22719313
I don't understand why you are allocating memory. your algorithm is way off. no memory allocation is needed (othjer then what the buffer uses, but that is done by the memorystream).
keep in mind that reads and writes can happen alternativly in the buffer.

I wanted to write the read function myself but the whole thing needs to be changed. writing the way it is done will not work correctly. you need to keep a virtual position and size which corelates with the actual file position and size if you were not using the bufer.
why?
because the buffer is fixed size 64 kb. when writing, this either overwrites a part of the file or extends it. when reading from the end of the file but still the middle of the buffer you must return 0 becuse even though the buffer has space, the file with the actual data from buffer does not.

I'm sure you can write everything from scratch yourself, but if you need help let me know. If you need me to write it for you I think I can squeeze it in my shedule sometime tonight (that's about 5-6 hours from now).
0
 
LVL 45

Expert Comment

by:aikimark
ID: 22720441
What version of zlib are you using?  There have been performance boosts in some of the zlib wrappers.
Examples:
http://www.programmersheaven.com/download/24171/download.aspx
http://www.dellapasqua.com/delphizlib/

Once you allocate the correct output size, you should use stream-to-stream operations, bypassing intermediate variables.
Example:
http://coding.derkeiler.com/Archive/Delphi/borland.public.delphi.thirdpartytools.general/2004-01/0552.html
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22720701
I'm using an updated version of the second zlib you posted, so I'm not going to get much better there. I'll try the stream to stream option after I fix my messy TReadBufferStream =P
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22737610
My second attempt. Doesn't slow it down this time, doesn't speed it up either. Looks like there is only so much speed to gain.
constructor TReadBufferStream.Create(InputStream: TFileStream);

begin

  fFileStream := InputStream;

  fMemoryStream := TMemoryStream.Create;

  fMemoryStream.SetSize(64 * 1024);

  fBufferSize := 0;

  FillBuffer;

end;
 

destructor TReadBufferStream.Destroy;

begin

  FreeAndNil(fMemoryStream);

  inherited Destroy;

end;
 

procedure TReadBufferStream.FillBuffer;

var

  BytesToRead : Int64;

  BytesRead : Int64;

  Buffer : PChar;

begin

  If fBufferSize < fMemoryStream.Size Then

    begin

    BytesToRead := fMemoryStream.Size - fBufferSize;

    fMemoryStream.Position := fBufferSize;

    GetMem(Buffer,BytesToRead);

    Try

      BytesRead := fFileStream.Read(Buffer^,BytesToRead);

      fMemoryStream.Write(Buffer^,BytesRead);

      Inc(fBufferSize,BytesRead);

      fMemoryStream.Position := 0;

    Finally

      FreeMem(Buffer,BytesToRead);

    end;

  end;

end;
 

function TReadBufferStream.ReadBuffer(var Buffer; Count: Int64): Int64;

var

  BytesToRead : Int64;

begin

  BytesToRead := Count;

  If BytesToRead > fBufferSize Then

    begin

    FillBuffer;

    If BytesToRead > fBufferSize Then

      begin

      BytesToRead := fBufferSize;

    end;

  end;

  Result := fMemoryStream.Read(Buffer,BytesToRead);

  Dec(fBufferSize,Result);

end;

Open in new window

0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22737686
Changed the buffer sizes to 128kb's which offered a bit of a speed increase. Anything more didn't help much.
0
 
LVL 28

Expert Comment

by:ciuly
ID: 22738744
that is because you are still doing memory allocation while filling the buffer. I told you not to do that.
do it once in the constructor.

second step will be to to some profiling as aikimark suggested. measure the time/clock cycles whatever of every major operation and see where you have big numbers. that's where you might be able to imporive. no sense in wasting time improving something that takes under 1 ms when you might have operations that take 10 ms or more.
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22738836
     BytesRead := fFileStream.Read(Buffer^,BytesToRead);
      fMemoryStream.Write(Buffer^,BytesRead);
      Inc(fBufferSize,BytesRead);

Thats the same thing TStream.CopyFrom does, is there a more efficient way to move the data from the file stream to the memory stream?

Also, I used ProDelphi to profile it. Seems that the 100k iterations are just going to take that much time. The TReadBufferStream.FillBuffer() and TWriteBufferStream.DumpBuffer() seem to take the most time as individuals, but the main part is just in the data loop.
0
 

Assisted Solution

by:birbilis
birbilis earned 50 total points
ID: 22738864
try moving all loop invariants outside of the for loop. For example why calculate CISO.BlockSize * 2 several times, if the CISO structure is initialized once outside of the for loop? Just define a local variable with CISO_BlockSize2 = CISO.BlockSize * 2 before the for loop and do a replace on the for loop body (select it and replace on the selection only) all "CISO.BlockSize * 2" with "CISO_BlockSize2". Similarly, avoid even field access like CISO.BlockSize inside the for loop and assign such fields to local variables, then use them inside the for loop (assuming the loop executes many times, not just a few)
0
 
LVL 13

Author Comment

by:ThievingSix
ID: 22738890
@cuily: I just realized what you were probably saying. Something like fFileStream.Read(PChar(fMemoryStream.Memory + fMemoryStream.Position)^,Count);?

@birbilis: Ah, makes sense. Will do.
0
 
LVL 28

Expert Comment

by:ciuly
ID: 22739027
you got it all wrong. TStream.CopyFrom does it in a non-speed fashion.

I was saying more like this (copy paste follows with needed modifications)

in read buffer you will need to make a loop, in case Count is bigger then MaxBuffSize, so that you will do multiple reads if needed.
then, further optimizaiton can be done for this case, in the sense that if you need to read more than MaxBuffSize, you will read once, directly from the file, but not before reading whatever is left in the memory buffer.
you will need to do pointer operations for this. in case you don't know how, shortly, it's liek this:
read(pointer((@Buffer+delta))^, whatever count is left);
const MaxBuffSize = 64 * 1024; // ---- constant
 

constructor TReadBufferStream.Create(InputStream: TFileStream);

begin

  fFileStream := InputStream;

  fMemoryStream := TMemoryStream.Create;

  fMemoryStream.SetSize(MaxBuffSize);

  fBufferSize := 0;

  GetMem(Buffer,MaxBuffSize); // -------- this

  FillBuffer;

end;

 

destructor TReadBufferStream.Destroy;

begin

  FreeAndNil(fMemoryStream);

  FreeMem(Buffer,MaxBuffSize); //  ----------- this

  inherited Destroy;

end;

 

procedure TReadBufferStream.FillBuffer;

var

  BytesToRead : Int64;

  BytesRead : Int64;

  Buffer : PChar;

begin

  If fBufferSize < fMemoryStream.Size Then

    begin

    BytesToRead := fMemoryStream.Size - fBufferSize;

    fMemoryStream.Position := fBufferSize;

    BytesRead := fFileStream.Read(Buffer^,BytesToRead);

    fMemoryStream.Write(Buffer^,BytesRead);

    Inc(fBufferSize,BytesRead);

    fMemoryStream.Position := 0;

  end;

end;

 

function TReadBufferStream.ReadBuffer(var Buffer; Count: Int64): Int64;

var

  BytesToRead : Int64;

begin

  BytesToRead := Count;

  If BytesToRead > fBufferSize Then

    begin

    FillBuffer;

    If BytesToRead > fBufferSize Then

      begin

      BytesToRead := fBufferSize;

    end;

  end;

  Result := fMemoryStream.Read(Buffer,BytesToRead);

  Dec(fBufferSize,Result);

end;

Open in new window

0
 
LVL 28

Expert Comment

by:ciuly
ID: 22739036
>> FileStream.Read(PChar(fMemoryStream.Memory + fMemoryStream.Position)^,Count)

that's is actually a pretty good idea. this way you eliminate one memory copy. brilliant :) dunno how I didn't think of this :|
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

In this tutorial I will show you how to use the Windows Speech API in Delphi. I will only cover basic functions such as text to speech and controlling the speed of the speech. SAPI Installation First you need to install the SAPI type library, th…
In my programming career I have only very rarely run into situations where operator overloading would be of any use in my work.  Normally those situations involved math with either overly large numbers (hundreds of thousands of digits or accuracy re…
In this tutorial you'll learn about bandwidth monitoring with flows and packet sniffing with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're interested in additional methods for monitoring bandwidt…
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now