stuayre
asked on
need a fast search and replace in a very large file (4mb)
Hi,
I have a very large 4mb text file with stock place holders like this %%%stock_DJB400%%%
and then i have a large list of stock codes and qtys (7000+) in a csv file that i need to replace in the large txt file.
I wrote an application to load both the files into a string and then search and replace in the other but it is really sssllloww.
how can i make things faster?
can anyone help?
cheers
Stu
I have a very large 4mb text file with stock place holders like this %%%stock_DJB400%%%
and then i have a large list of stock codes and qtys (7000+) in a csv file that i need to replace in the large txt file.
I wrote an application to load both the files into a string and then search and replace in the other but it is really sssllloww.
how can i make things faster?
can anyone help?
cheers
Stu
Does you need application or just combine these 2 files?
Try to open txt file in MSExcel2007 which limitation is 1mln rows and use VLOOKUP to combine these 2 files
Try to open txt file in MSExcel2007 which limitation is 1mln rows and use VLOOKUP to combine these 2 files
ASKER
ciuly: thanks, I will try your suggestions, I had the large txt file loaded in a memo and was reading each line out and doing the replace but it was still slow.
veaceslavz: I could do that but I need to write something programmaticly, it needs to sit in the background on my pc and do the job repeatedly.
veaceslavz: I could do that but I need to write something programmaticly, it needs to sit in the background on my pc and do the job repeatedly.
memo is visual it eats up unnecessary resources. use a TStringList instead which is non-visual. if you want to modify directly in the stringlist, you will aoso have to do soemthing like this:
list.beginupdate;
try
operate many operations on the list
finally
list.endupdate;
end;
list.beginupdate;
try
operate many operations on the list
finally
list.endupdate;
end;
I would recommend using a TFileStream to open and read your file. While reading from the stream make your changes saving to another TFileStream. I would use parsing to locate your replacement then instead of writing it out as is write out the replacement value.
A file stream takes care of buffering a set amount of the file read in memory so it is fast. This also allows you to do a parsing technique without worrying about the speed impact.
If you are in control of the replacement strings I would make a change there as well. Instead of using what you showed %%%stock_DJB400%%% use something that better indicates the start and stop of the replacement area such as XML <TagA></TagA> or just <<<stock_DJB400>>> The indicators then show begin and end. Just a suggestion.
If you want to give me an example replacement file and an example file that contains the replacements to be done, I will do an example for you.
A file stream takes care of buffering a set amount of the file read in memory so it is fast. This also allows you to do a parsing technique without worrying about the speed impact.
If you are in control of the replacement strings I would make a change there as well. Instead of using what you showed %%%stock_DJB400%%% use something that better indicates the start and stop of the replacement area such as XML <TagA></TagA> or just <<<stock_DJB400>>> The indicators then show begin and end. Just a suggestion.
If you want to give me an example replacement file and an example file that contains the replacements to be done, I will do an example for you.
>> A file stream takes care of buffering a set amount of the file read in memory so it is fast.
do you have anything to back this up? because I looked at the sources (up until delphi 7) and there is no buffering in any of the streams, including file stream.
do you have anything to back this up? because I looked at the sources (up until delphi 7) and there is no buffering in any of the streams, including file stream.
ASKER
thanks developmentguru & ciuly, im going to try the TFileStream
can anyone help me with the tFileStream? I have this code but how do i read the stream into a var so i can do my search replace?
var
FileS: TFilestream;
I: Integer;
Buff: string;
Teststring: String;
begin
FileS := TFilestream.Create('c:\myt extfile.tx t',fmOpenR ead);
try
for I := 0 to FileS.Size do
begin
//some code here?
end;
finally
FileS.Free;
end;
end;
can anyone help me with the tFileStream? I have this code but how do i read the stream into a var so i can do my search replace?
var
FileS: TFilestream;
I: Integer;
Buff: string;
Teststring: String;
begin
FileS := TFilestream.Create('c:\myt
try
for I := 0 to FileS.Size do
begin
//some code here?
end;
finally
FileS.Free;
end;
end;
@ciuly I was quoting from memory and seem to have been mistaken. I can't find any automatic buffering either. Of course it is not a terrible problem to handle the buffering.
@stuayre I am working on an example now.
@stuayre I am working on an example now.
ASKER
ok thanks
I generated a file that I could test the replace on. The file is roughly 4.5 MB I had another file that contained my list of search tags versus replacement values (SearchTag=ReplacementValu e on each line). I used this file to call the TFileReplace.AddCriteria. I then called replace and got the expected results in about 26.82 seconds.
The class I made here is FAR from optimized.
1) I started with the idea of using an out buffer, never used it.
2) When creating the strings to compare for start and end tag, tag name, etc. This could also be optimized to not recreate the string several times.
Just a couple of things off the top of my head.
It does show you the use of a file stream with a parsing replacement method.
Let me know if you need more.
The class I made here is FAR from optimized.
1) I started with the idea of using an out buffer, never used it.
2) When creating the strings to compare for start and end tag, tag name, etc. This could also be optimized to not recreate the string several times.
Just a couple of things off the top of my head.
It does show you the use of a file stream with a parsing replacement method.
Let me know if you need more.
unit uFileReplace;
interface
uses
SysUtils, Classes;
type
//replace array of PChar with array of PWideChar for unicode
TFileReplace = class
private
fInBuffer : PChar;
fOutBuffer : PChar;
fInFile : TFileStream;
fOutFile : TFileStream;
fInFileSector : integer;
fInFileIndex : integer;
fOutFileSector : integer;
fOutFileIndex : integer;
fSearchList : TStringList;
fReplaceList : TStringList;
fStartDelimiter : string;
fEndDelimiter : string;
fBufferSize : integer;
function GetChar(Index: integer): char;
protected
procedure SortSearchCriteria;
property Chars[Index : integer] : char read GetChar;
public
constructor Create(InFileName, OutFileName, StartDelim, EndDelim : string;
BufferSize : integer);
destructor Destroy; override;
procedure AddCriteria(SearchFor, ReplaceWith : string);
procedure Replace;
end;
implementation
var
SearchList : TStringList;
{ TFileReplace }
function CompareSearchList(List: TStringList; Index1,
Index2: Integer): Integer;
var
A, B : string;
begin
Result := 0;
A := '';
B := '';
if Index1 < SearchList.Count then
A := SearchList[Index1];
if Index2 < SearchList.Count then
B := SearchList[Index2];
if (Index1 < SearchList.Count) and (Index2 < SearchList.Count) then
if A > B then
Result := 1
else
if A < B then
Result := -1;
end;
procedure TFileReplace.AddCriteria(SearchFor, ReplaceWith: string);
begin
fSearchList.Add(SearchFor);
fReplaceList.Add(ReplaceWith);
end;
constructor TFileReplace.Create(InFileName, OutFileName, StartDelim,
EndDelim : string; BufferSize: integer);
begin
fBufferSize := BufferSize;
GetMem(fInBuffer, BufferSize);
GetMem(fOutBuffer, BufferSize);
fInFile := TFileStream.Create(InFileName, fmOpenRead);
fOutFile := TFileStream.Create(OutFileName, fmCreate);
fSearchList := TStringList.Create;
fReplaceList := TStringList.Create;
fStartDelimiter := StartDelim;
fEndDelimiter := EndDelim;
fInFileSector := -1; //points nowhere
end;
destructor TFileReplace.Destroy;
begin
fReplaceList.Free;
fSearchList.Free;
fOutFile.Free;
fInFile.Free;
FreeMem(fOutBuffer);
FreeMem(fInBuffer);
inherited;
end;
function TFileReplace.GetChar(Index: integer): char;
var
TargetSector : integer;
Ptr : integer;
begin
TargetSector := Index div fBufferSize;
if TargetSector <> fInFileSector then
begin
fInFile.Seek(TargetSector * fBufferSize, soFromBeginning);
fInFile.Read(fInBuffer^, fBufferSize);
fInFileSector := TargetSector;
end;
Ptr := Index mod fBufferSize;
Result := Char(PByteArray(fInBuffer)^[Ptr]);
end;
procedure TFileReplace.Replace;
var
Last, SIndex : integer;
Comp : string;
I : integer;
FoundEnd : boolean;
TagName : string;
TBuf : PChar;
Value : string;
TagIndex : integer;
ChunkSize : integer;
begin
Assert(fStartDelimiter <> '', 'Start delimiter not set');
Assert(fEndDelimiter <> '', 'End delimiter not set');
SortSearchCriteria;
Last := 0;
repeat
//find a possible start
while (fInFileIndex < fInFile.Size) and
(Chars[fInFileIndex] <> fStartDelimiter[1]) do
inc(fInFileIndex);
if Chars[fInFileIndex] = fStartDelimiter[1] then
begin
//is this the full delimiter?
Comp := '';
for I := 0 to Length(fStartDelimiter) - 1 do
Comp := Comp + Chars[fInFileIndex + I];
if Comp = fStartDelimiter then
begin
inc(fInFileIndex, Length(fStartDelimiter));
//search for end delimiter
sIndex := fInFileIndex;
repeat
//find a possible end
while (fInFileIndex < fInFile.Size) and
(Chars[fInFileIndex] <> fStartDelimiter[1]) do
inc(fInFileIndex);
Comp := '';
for I := 0 to Length(fStartDelimiter) - 1 do
Comp := Comp + Chars[fInFileIndex + I];
FoundEnd := Comp = fEndDelimiter;
until (fInFileIndex >= fInFile.Size) or FoundEnd;
if FoundEnd then
begin
TagName := '';
for I := sIndex to fInFileIndex - 1 do
TagName := TagName + Chars[I];
//time to do the replace
//first, copy the file from the original to the new from Last to
//current
ChunkSize := SIndex - Last - Length(fStartDelimiter);
GetMem(TBuf, ChunkSize);
try
fInFile.Seek(Last, soFromBeginning);
fInFile.Read(TBuf^, ChunkSize);
fOutFile.WriteBuffer(TBuf^, ChunkSize);
finally
FreeMem(TBuf);
end;
//copy in the replacement
TagIndex := fSearchList.IndexOf(TagName);
Value := '';
if TagIndex > -1 then
Value := fReplaceList[TagIndex];
if Value <> '' then
fOutFile.Write(Value[1], Length(Value));
//advance past the end of the tag
inc(fInFileIndex, Length(fEndDelimiter));
Last := fInFileIndex;
end;
end;
end;
until fInFileIndex >= fInFile.Size;
end;
procedure TFileReplace.SortSearchCriteria;
var
SL, Values : TstringList;
I : Integer;
begin
SL := TStringList.Create;
try
SL.Assign(fSearchList);
SL.Sort;
Values := TStringList.Create;
try
for I := 0 to SL.Count - 1 do
Values.Add(fReplaceList[fSearcHlist.IndexOf(SL[I])]);
fSearchList.Assign(SL);
fReplaceList.Assign(Values);
finally
Values.Free;
end;
finally
SL.Free;
end;
end;
end.
The first comment should not have the word array in it...
Just to put things in perspective, my test was the following.
10,000 randomly generated text lines of varying length from 6 characters to 1025 characters built of random words including the replacement tags. The example file size was 4,760,311 bytes. There were over 157,500 replacements to be done. I did not bother with an exact count, I let the IDE search find all instances of ::: and it found more than 315,000 (two per tag). Processed into a new file with the replacements in 26.82 seconds. Is that still too slow?
10,000 randomly generated text lines of varying length from 6 characters to 1025 characters built of random words including the replacement tags. The example file size was 4,760,311 bytes. There were over 157,500 replacements to be done. I did not bother with an exact count, I let the IDE search find all instances of ::: and it found more than 315,000 (two per tag). Processed into a new file with the replacements in 26.82 seconds. Is that still too slow?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hey guys sorry for the delay!
first thanks for the massive effort you both put in.
i've had a look at both soultions and WOW rllibby's wins it does it in about 1 second!
amazing, i think Borland should use your functions!
developmentguru yours was good too and i want to thank you for the effort and time you put in to post a solution.
cheers
Stu
first thanks for the massive effort you both put in.
i've had a look at both soultions and WOW rllibby's wins it does it in about 1 second!
amazing, i think Borland should use your functions!
developmentguru yours was good too and i want to thank you for the effort and time you put in to post a solution.
cheers
Stu
I think one of the key differences between the two solutions is that my solution does all replacements in one pass. Using rlibby's solution will need to do a separate pass for every replacement.
I am going to play with rlibby's solution and check it out. My previous statement was based on assumptions and not on watching the code run.
Actually, the replace is done in memory using a 2 pass approach. First pass performs the matching logic, and maintains the following:
- source index of match
- length of match
- substitution string
This allows the replacement string to be allocated once, and the second pass does the work of moving the source string over to the destination. It is extremely efficient, at the cost of memory. But regardless, it does NOT perform a pass for every replacement match.
Russell
- source index of match
- length of match
- substitution string
This allows the replacement string to be allocated once, and the second pass does the work of moving the source string over to the destination. It is extremely efficient, at the cost of memory. But regardless, it does NOT perform a pass for every replacement match.
Russell
I copied the code out as is and ran it. Very fast, but where exactly is the output?
Same file...
// Resource protection
try
// Load the test file
regTest.LoadFromFile(FILE_ TEST); <--- load test file
// Perform replacement callback
regTest.ReplaceCallback(Re placeTag); <--- manage replacements
// Save back to original file
regTest.SaveToFile(FILE_TE ST); <---- save back to file (see comment above)
finally
// Free regex
regTest.Free;
end;
// Resource protection
try
// Load the test file
regTest.LoadFromFile(FILE_
// Perform replacement callback
regTest.ReplaceCallback(Re
// Save back to original file
regTest.SaveToFile(FILE_TE
finally
// Free regex
regTest.Free;
end;
The only other thing... I thought that the person who wrote the question wanted a streaming solution so memory would not be an issue. I will want to try mine against your sample data and see how fast it is in memory. Sorry if this is annoying to anyone, I am having fun.
Its a tradeoff; memory / speed....
Imho, if you are dealing with a known size (in this case 4MB), then it is perfectly acceptable to load the data into memory to process. Also for this size, there should be no issues if the dest result is handled in memory before writing to disk.
You could process the source in memory, and generate the results directly to the disk file. But this tends to be slightly slower due to all the small IO writes to disk. Below is code that handles the replacements manually, and depending on the $define statement, will dump the result directly to disk, or a memory stream (fast version) which will then write the final result to disk.
Using the memory stream I was seeing run times around 500ms. File based stream runs in 850ms
Russell
-----
unit Unit1;
interface
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
//
// Download RegExprEx from:
//
// http://users.adelphia.net/~rllibby/downloads/regexprex.zip
//
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
uses
Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
RegExprEx, StdCtrls;
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// Node definitions for hashing elements
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
type
PHashNode = ^THashNode;
THashNode = packed record
Item: PChar;
Data: Pointer;
Next: PHashNode;
end;
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// Hashing constants
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
const
HASH_SIZE = 12583;
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// Callback for optional freeing of the data items
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
type
THashDeallocator = procedure(Sender: TObject; P: Pointer) of object;
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// Hash class definition
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
type
THash = class(TObject)
private
// Private declarations
FHash: Array [0..Pred(HASH_SIZE)] of PHashNode;
FDeallocator: THashDeallocator;
FCareCase: Boolean;
FCount: Integer;
protected
// Protected declarations
function NewNode(Item: PChar; Data: Pointer): PHashNode;
public
// Public declarations
constructor Create(Deallocator: THashDeallocator = nil);
destructor Destroy; override;
procedure Clear;
function Delete(Item: PChar): Boolean;
function Extract(Item: PChar; out Data: Pointer): Boolean;
function Add(Item: PChar; Data: Pointer): Boolean;
function Find(Item: PChar; out Data: Pointer): Boolean;
property Count: Integer read FCount;
property CaseSensitive: Boolean read FCareCase write FCareCase;
end;
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// TFastMemStream
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
type
TFastMemStream = class(TMemoryStream)
protected
// Protected declarations
function Realloc(var NewCapacity: Longint): Pointer; override;
end;
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// Hashing functions
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
function HashFunc(Key: PChar; CareCase: Boolean): Integer;
function HashCompare(Item1, Item2: PChar; CareCase: Boolean): Boolean;
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// TForm1
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
type
TForm1 = class(TForm)
Button1: TButton;
btnRun: TButton;
procedure FormCreate(Sender: TObject);
procedure FormDestroy(Sender: TObject);
procedure Button1Click(Sender: TObject);
procedure btnRunClick(Sender: TObject);
private
// Private declarations
FTestCodes: TStringList;
FHashCodeQty: THash;
protected
// Protected declarations
procedure GenerateSampleCodes;
procedure GenerateSampleCodeQtyFile;
procedure GenerateSampleTestFile;
procedure GenerateSampleFiles;
public
// Public declarations
end;
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// File constants
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
const
// File names for testing
FILE_TEST = 'testing.txt';
FILE_CODEQTY = 'codeqty.txt';
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// RegEx constants
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
const
// Eg StockItem1 = 103
REG_TAGQTY = '\s*(:<Tag>[^=\s]+)\s*=\s* (:<Value>\ d+)';
// Eg %%%StockItem1%%%
REG_TAG = '%%%(:<Tag>!(%%%)+)%%%';
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
// Global variables
////////////////////////// ////////// ////////// ////////// ////////// ////////// ////
var
Form1: TForm1;
implementation
{$R *.DFM}
//// Example of what you would need to do to perform the replacement logic /////
procedure TForm1.btnRunClick(Sender: TObject);
var strmBase: TStream;
regCodeQty: TRegExpr;
regTest: TRegExpr;
ptrValue: Pointer;
dwBackPos: Integer;
dwSize: Integer;
dwMark: LongWord;
szReplace: String;
lpszSource: PChar;
begin
// Mark time
dwMark:=GetTickCount;
// Resource protection
try
// Clear hash table
FHashCodeQty.Clear;
// Create regex to load hash table
regCodeQty:=TRegExpr.Creat ePattern(R EG_TAGQTY) ;
// Resource protection
try
// Load the data from file
regCodeQty.LoadFromFile(FI LE_CODEQTY );
// While match
if regCodeQty.MatchFirst then
begin
// Repeat
repeat
// Add hash item
FHashCodeQty.Add(PChar(reg CodeQty.Na medBackRef erence['Ta g']), Pointer(StrToInt(regCodeQt y.NamedBac kReference ['Value']) ));
// Stop when we fail to match next
until not(regCodeQty.MatchNext);
end;
finally
// Free regex
regCodeQty.Free;
end;
//
// Undef to use file based destination stream
//
{$DEFINE USE_MEM}
// Create regex to load the test file
regTest:=TRegExpr.CreatePa ttern(REG_ TAG);
// Resource protection
try
// Load the test file
regTest.LoadFromFile(FILE_ TEST);
// Create stream to dump the results
{$IFDEF USE_MEM}
strmBase:=TFastMemStream.C reate;
{$ELSE}
strmBase:=TFileStream.Crea te(FILE_TE ST, fmCreate);
{$ENDIF}
// Resource protection
try
// Get source pointer from regex parser
lpszSource:=regTest.Source ;
// Set back marker
dwBackPos:=0;
// Start match
if regTest.MatchFirst then
begin
// Repeat while match
repeat
// Get size of data up to match
dwSize:=regTest.MatchPos - dwBackPos;
// write preceding text over
if (dwSize > 0) then strmBase.Write(lpszSource^ , dwSize);
// Push source pointer past the match
Inc(lpszSource, dwSize + regTest.MatchLen);
// Lookup the tag in the hash to get the quantitiy
if FHashCodeQty.Find(PChar(re gTest.Name dBackRefer ence['Tag' ]), ptrValue) then
begin
// Pointer is really numeric value, convert to string
Str(Integer(ptrValue), szReplace);
// Write replacement value to file
strmBase.Write(Pointer(szR eplace)^, Length(szReplace));
end;
// Update the back marker
dwBackPos:=regTest.MatchPo s + regTest.MatchLen;
// Match next
until not(regTest.MatchNext);
end;
// Write remaining block of text
strmBase.Write(lpszSource^ , StrLen(lpszSource));
finally
// Save to file (if not a file stream)
if not(strmBase is THandleStream) then
begin
TCustomMemoryStream(strmBa se).SaveTo File(FILE_ TEST);
end;
// Free the stream
strmBase.Free;
end;
finally
// Free regex
regTest.Free;
end;
finally
// Set caption
Caption:=IntToStr(GetTickC ount - dwMark);
end;
end;
procedure TForm1.GenerateSampleFiles ;
begin
// Generate the sample codes
GenerateSampleCodes;
// Generate sample code = qty file
GenerateSampleCodeQtyFile;
// Generate the sample test file
GenerateSampleTestFile;
end;
procedure TForm1.GenerateSampleTestF ile;
var listSample: TStringList;
dwIndex: Integer;
begin
// Generate a sample file with the following attributes:
//
// - 50000 total lines
// - 3 replacement tags per line
// - 150000 total replacement tags in the file
// Create list
listSample:=TStringList.Cr eate;
// Resource protection
try
// Walk sample codes
for dwIndex:=1 to 50000 do
begin
// Create sample line
listSample.Add(
StringOfChar('a', Random(100)) + '(%%%' +
FTestCodes[Random(FTestCod es.Count)] + '%%%)' +
StringOfChar('b', Random(100)) + '(%%%' +
FTestCodes[Random(FTestCod es.Count)] + '%%%)' +
StringOfChar('c', Random(100)) + '(%%%' +
FTestCodes[Random(FTestCod es.Count)] + '%%%)' +
StringOfChar('d', Random(100))
);
end;
listSample.Add('zzz(%%%' + FTestCodes[Random(FTestCod es.Count)] + '%%%)Q');
// Save to file
listSample.SaveToFile(FILE _TEST);
finally
// Free the list
listSample.Free;
end;
end;
procedure TForm1.GenerateSampleCodeQ tyFile;
var listCodeQty: TStringList;
dwIndex: Integer;
begin
// Create list
listCodeQty:=TStringList.C reate;
// Resource protection
try
// Walk sample codes
for dwIndex:=0 to Pred(FTestCodes.Count) do
begin
// Generate random quantity values
listCodeQty.Add(Format('%s = %d', [FTestCodes[dwIndex], Random(100000)]));
end;
// Save to file
listCodeQty.SaveToFile(FIL E_CODEQTY) ;
finally
// Free the list
listCodeQty.Free;
end;
end;
procedure TForm1.GenerateSampleCodes ;
var dwIndex: Integer;
begin
// Lock list
FTestCodes.BeginUpdate;
// Resource protection
try
// Clear list
FTestCodes.Clear;
// Create set of stock codes
for dwIndex:=1 to 10000 do
begin
// Add sample stock code
FTestCodes.Add(Format('sto ck_DJB%d', [dwIndex]));
end;
finally
// Unlock list
FTestCodes.EndUpdate;
end;
end;
procedure TForm1.FormCreate(Sender: TObject);
begin
// Create string list for codes
FTestCodes:=TStringList.Cr eate;
// Create hash for loading codes and quantities
FHashCodeQty:=THash.Create ;
end;
procedure TForm1.FormDestroy(Sender: TObject);
begin
// Free objects
FTestCodes.Free;
FHashCodeQty.Free;
end;
//// Generate sample data ////////////////////////// ////////// ////////// ////////
procedure TForm1.Button1Click(Sender : TObject);
var dwMark: LongWord;
begin
// Mark time
dwMark:=GetTickCount;
// Resource protection
try
// Generate the sample data files
GenerateSampleFiles;
finally
// Set caption
Caption:=IntToStr(GetTickC ount - dwMark);
end;
end;
//// TFastMemStream ////////////////////////// ////////// ////////// ////////// ////
function TFastMemStream.Realloc(var NewCapacity: Longint): Pointer;
var dwDelta: Integer;
lpMemory: Pointer;
begin
// Get current memory pointer
lpMemory:=Memory;
// Resource protection
try
// Calculate the delta to be applied to the capacity
if (NewCapacity > 0) then
begin
// Check new capacity
if (NewCapacity > MaxWord) then
// Delta is 1/4 of desired capacity
dwDelta:=NewCapacity div 4
else
// Minimum allocation of 64 KB
dwDelta:=MaxWord;
// Update by delta
Inc(NewCapacity, dwDelta);
end;
// Determine if capacity has changed
if not(NewCapacity = Capacity) then
begin
// Check for nil alloc
if (NewCapacity = 0) then
begin
// Release the memory
FreeMem(lpMemory);
// Clear result
lpMemory:=nil;
end
else
begin
// Check current capacity
if (Capacity = 0) then
// Allocate memory
lpMemory:=AllocMem(NewCapa city)
else
// Reallocate memory
ReallocMem(lpMemory, NewCapacity);
end;
end;
finally
// Return modified pointer
result:=lpMemory;
end;
end;
//// Hashing functions ////////////////////////// ////////// ////////// ////////// /
function HashCompare(Item1, Item2: PChar; CareCase: Boolean): Boolean;
begin
// Null assignment checks
if (Item1 = nil) then
// Check for Item2 being nil
result:=(Item2 = nil)
else if (Item2 = nil) then
// Item1 is not null, so no possible match
result:=False
// Check case
else if CareCase then
// Case sensitive compare
result:=(StrComp(Item1, Item2) = 0)
else
// In-case sensitive compare
result:=(StrIComp(Item1, Item2) = 0);
end;
function HashFunc(Key: PChar; CareCase: Boolean): Integer;
var bChar: Byte;
begin
// Set starting result
result:=0;
// Generate hash index for key
while (Key^ > #0) do
begin
if CareCase then
bChar:=Byte(Key^)
else if (Key^ in ['A'..'Z']) then
bChar:=Byte(Key^) + 32
else
bChar:=Byte(Key^);
Inc(result, (result shl 5) + (bChar shl 3));
Inc(Key);
end;
// Keep result in bounds of array
result:=LongWord(result) mod HASH_SIZE;
end;
//// THash ////////////////////////// ////////// ////////// ////////// ////////// ///
constructor THash.Create(Deallocator: THashDeallocator = nil);
begin
// Perform inherited
inherited Create;
// Set default values
ZeroMemory(@FHash, SizeOf(FHash));
FDeallocator:=Deallocator;
FCareCase:=False;
FCount:=0;
end;
destructor THash.Destroy;
begin
// Resource protection
try
// Clear the hash
Clear;
finally
// Perform inherited
inherited Destroy;
end;
end;
procedure THash.Clear;
var phNode1: PHashNode;
phNode2: PHashNode;
dwIndex: Integer;
begin
// Resource protection
try
// Iterate the array and clear the hash nodes
for dwIndex:=0 to Pred(HASH_SIZE) do
begin
// Get bucket node
phNode1:=FHash[dwIndex];
// Walk the nodes
while Assigned(phNode1) do
begin
// Get pointer to next item
phNode2:=phNode1^.Next;
// Free memory for node name
if Assigned(phNode1^.Item) then FreeMem(phNode1^.Item);
// Callback
if (Assigned(phNode1^.Data) and Assigned(FDeallocator)) then FDeallocator(Self, phNode1^.Data);
// Free node item
FreeMem(phNode1);
// Set iterator to next item
phNode1:=phNode2;
end;
end;
finally
// Clear all top level buckets
for dwIndex:=0 to Pred(HASH_SIZE) do FHash[dwIndex]:=nil;
// Reset the count
FCount:=0;
end;
end;
function THash.Extract(Item: PChar; out Data: Pointer): Boolean;
var phNode1: PHashNode;
phNode2: PHashNode;
dwIndex: Integer;
begin
// Get the hash index
dwIndex:=HashFunc(Item, FCareCase);
// Get the hash bucket item
phNode1:=FHash[dwIndex];
// Did top level item exist?
if Assigned(phNode1) then
begin
// Prepare for loop
phNode2:=phNode1;
// Walk the nodes
while Assigned(phNode2) do
begin
// Match key
if HashCompare(phNode2^.Item, Item, FCareCase) then break;
// Save current node
phNode1:=phNode2;
// Move to the next node in the chain
phNode2:=phNode2^.Next;
end;
// Check to see if the node is still set
if Assigned(phNode2) then
begin
// Set out param data value
Data:=phNode2^.Data;
// Check to see if this is the top level item
if (phNode2 = phNode1) then
// Link next node into the bucket
FHash[dwIndex]:=phNode2^.N ext
else
// Link over this node
phNode1^.Next:=phNode2^.Ne xt;
// Free memory for node name
if Assigned(phNode2^.Item) then FreeMem(phNode2^.Item);
// Free the node
FreeMem(phNode2);
// Decrement the node count
Dec(FCount);
// Success
result:=True;
end
else
// Did not find the node
result:=False;
end
else
// No nodes in bucket
result:=False;
end;
function THash.Delete(Item: PChar): Boolean;
var lpData: Pointer;
begin
// Extract the item
result:=Extract(Item, lpData);
// Check result, perform callback if needed
if (result and Assigned(lpData) and Assigned(FDeallocator)) then FDeallocator(Self, lpData);
end;
function THash.Add(Item: PChar; Data: Pointer): Boolean;
var phNode1: PHashNode;
phNode2: PHashNode;
dwIndex: Integer;
begin
// Get the hash bucket item index
dwIndex:=HashFunc(Item, FCareCase);
// Resource protection
try
// Get the hash bucket item
phNode1:=FHash[dwIndex];
// Is the bucket empty
if (phNode1 = nil) then
// Add new cell item
FHash[dwIndex]:=NewNode(It em, Data)
else
begin
// Save current node
phNode2:=phNode1;
// Walk nodes
while Assigned(phNode2) do
begin
// Match the key
if HashCompare(phNode2^.Item, Item, FCareCase) then
begin
// Check for data change
if (phNode2^.Data <> Data) then
begin
// Callback
if (Assigned(phNode2^.Data) and Assigned(FDeallocator)) then FDeallocator(Self, phNode2^.Data);
// Set new data item
phNode2^.Data:=Data;
end;
// Break loop
break;
end;
// Save current node
phNode1:=phNode2;
// Walk next node
phNode2:=phNode2^.Next;
end;
// Do we need to add a new item to the end of the chain?
if (phNode2 = nil) then
begin
// Create hash node
phNode2:=NewNode(Item, Data);
// Link the node in
phNode1^.Next:=phNode2;
end;
end;
finally
// Always success
result:=True;
end;
end;
function THash.Find(Item: PChar; out Data: Pointer): Boolean;
var phNode: PHashNode;
begin
// Get the hash bucket item
phNode:=FHash[HashFunc(Ite m, FCareCase)];
// Resource protection
try
// Walk the items
while Assigned(phNode) do
begin
// Compare the key
if HashCompare(phNode^.Item, Item, FCareCase) then
begin
// Key exists, set out return data
Data:=phNode^.Data;
break;
end;
// Walk the next item
phNode:=phNode^.Next;
end;
finally
// Success if node is assigned
result:=Assigned(phNode);
end;
end;
function THash.NewNode(Item: PChar; Data: Pointer): PHashNode;
begin
// Allocate memory for new node
result:=AllocMem(SizeOf(TH ashNode));
// Set the structure fields
result^.Item:=StrCopy(Allo cMem(Succ( StrLen(Ite m))), Item);
result^.Data:=Data;
// Increment the count
Inc(FCount);
end;
initialization
// Seed random number generator
Randomize;
end.
-- dfm ---
object Form1: TForm1
Left = 351
Top = 319
Width = 453
Height = 177
Caption = 'Form1'
Color = clBtnFace
Font.Charset = ANSI_CHARSET
Font.Color = clWindowText
Font.Height = -11
Font.Name = 'Tahoma'
Font.Style = []
OldCreateOrder = False
OnCreate = FormCreate
OnDestroy = FormDestroy
PixelsPerInch = 96
TextHeight = 13
object Button1: TButton
Left = 12
Top = 12
Width = 165
Height = 25
Caption = 'Generate Sample Data'
TabOrder = 0
OnClick = Button1Click
end
object btnRun: TButton
Left = 12
Top = 44
Width = 165
Height = 25
Caption = 'Run Test'
TabOrder = 1
OnClick = btnRunClick
end
end
Imho, if you are dealing with a known size (in this case 4MB), then it is perfectly acceptable to load the data into memory to process. Also for this size, there should be no issues if the dest result is handled in memory before writing to disk.
You could process the source in memory, and generate the results directly to the disk file. But this tends to be slightly slower due to all the small IO writes to disk. Below is code that handles the replacements manually, and depending on the $define statement, will dump the result directly to disk, or a memory stream (fast version) which will then write the final result to disk.
Using the memory stream I was seeing run times around 500ms. File based stream runs in 850ms
Russell
-----
unit Unit1;
interface
//////////////////////////
//
// Download RegExprEx from:
//
// http://users.adelphia.net/~rllibby/downloads/regexprex.zip
//
//////////////////////////
uses
Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
RegExprEx, StdCtrls;
//////////////////////////
// Node definitions for hashing elements
//////////////////////////
type
PHashNode = ^THashNode;
THashNode = packed record
Item: PChar;
Data: Pointer;
Next: PHashNode;
end;
//////////////////////////
// Hashing constants
//////////////////////////
const
HASH_SIZE = 12583;
//////////////////////////
// Callback for optional freeing of the data items
//////////////////////////
type
THashDeallocator = procedure(Sender: TObject; P: Pointer) of object;
//////////////////////////
// Hash class definition
//////////////////////////
type
THash = class(TObject)
private
// Private declarations
FHash: Array [0..Pred(HASH_SIZE)] of PHashNode;
FDeallocator: THashDeallocator;
FCareCase: Boolean;
FCount: Integer;
protected
// Protected declarations
function NewNode(Item: PChar; Data: Pointer): PHashNode;
public
// Public declarations
constructor Create(Deallocator: THashDeallocator = nil);
destructor Destroy; override;
procedure Clear;
function Delete(Item: PChar): Boolean;
function Extract(Item: PChar; out Data: Pointer): Boolean;
function Add(Item: PChar; Data: Pointer): Boolean;
function Find(Item: PChar; out Data: Pointer): Boolean;
property Count: Integer read FCount;
property CaseSensitive: Boolean read FCareCase write FCareCase;
end;
//////////////////////////
// TFastMemStream
//////////////////////////
type
TFastMemStream = class(TMemoryStream)
protected
// Protected declarations
function Realloc(var NewCapacity: Longint): Pointer; override;
end;
//////////////////////////
// Hashing functions
//////////////////////////
function HashFunc(Key: PChar; CareCase: Boolean): Integer;
function HashCompare(Item1, Item2: PChar; CareCase: Boolean): Boolean;
//////////////////////////
// TForm1
//////////////////////////
type
TForm1 = class(TForm)
Button1: TButton;
btnRun: TButton;
procedure FormCreate(Sender: TObject);
procedure FormDestroy(Sender: TObject);
procedure Button1Click(Sender: TObject);
procedure btnRunClick(Sender: TObject);
private
// Private declarations
FTestCodes: TStringList;
FHashCodeQty: THash;
protected
// Protected declarations
procedure GenerateSampleCodes;
procedure GenerateSampleCodeQtyFile;
procedure GenerateSampleTestFile;
procedure GenerateSampleFiles;
public
// Public declarations
end;
//////////////////////////
// File constants
//////////////////////////
const
// File names for testing
FILE_TEST = 'testing.txt';
FILE_CODEQTY = 'codeqty.txt';
//////////////////////////
// RegEx constants
//////////////////////////
const
// Eg StockItem1 = 103
REG_TAGQTY = '\s*(:<Tag>[^=\s]+)\s*=\s*
// Eg %%%StockItem1%%%
REG_TAG = '%%%(:<Tag>!(%%%)+)%%%';
//////////////////////////
// Global variables
//////////////////////////
var
Form1: TForm1;
implementation
{$R *.DFM}
//// Example of what you would need to do to perform the replacement logic /////
procedure TForm1.btnRunClick(Sender:
var strmBase: TStream;
regCodeQty: TRegExpr;
regTest: TRegExpr;
ptrValue: Pointer;
dwBackPos: Integer;
dwSize: Integer;
dwMark: LongWord;
szReplace: String;
lpszSource: PChar;
begin
// Mark time
dwMark:=GetTickCount;
// Resource protection
try
// Clear hash table
FHashCodeQty.Clear;
// Create regex to load hash table
regCodeQty:=TRegExpr.Creat
// Resource protection
try
// Load the data from file
regCodeQty.LoadFromFile(FI
// While match
if regCodeQty.MatchFirst then
begin
// Repeat
repeat
// Add hash item
FHashCodeQty.Add(PChar(reg
// Stop when we fail to match next
until not(regCodeQty.MatchNext);
end;
finally
// Free regex
regCodeQty.Free;
end;
//
// Undef to use file based destination stream
//
{$DEFINE USE_MEM}
// Create regex to load the test file
regTest:=TRegExpr.CreatePa
// Resource protection
try
// Load the test file
regTest.LoadFromFile(FILE_
// Create stream to dump the results
{$IFDEF USE_MEM}
strmBase:=TFastMemStream.C
{$ELSE}
strmBase:=TFileStream.Crea
{$ENDIF}
// Resource protection
try
// Get source pointer from regex parser
lpszSource:=regTest.Source
// Set back marker
dwBackPos:=0;
// Start match
if regTest.MatchFirst then
begin
// Repeat while match
repeat
// Get size of data up to match
dwSize:=regTest.MatchPos - dwBackPos;
// write preceding text over
if (dwSize > 0) then strmBase.Write(lpszSource^
// Push source pointer past the match
Inc(lpszSource, dwSize + regTest.MatchLen);
// Lookup the tag in the hash to get the quantitiy
if FHashCodeQty.Find(PChar(re
begin
// Pointer is really numeric value, convert to string
Str(Integer(ptrValue), szReplace);
// Write replacement value to file
strmBase.Write(Pointer(szR
end;
// Update the back marker
dwBackPos:=regTest.MatchPo
// Match next
until not(regTest.MatchNext);
end;
// Write remaining block of text
strmBase.Write(lpszSource^
finally
// Save to file (if not a file stream)
if not(strmBase is THandleStream) then
begin
TCustomMemoryStream(strmBa
end;
// Free the stream
strmBase.Free;
end;
finally
// Free regex
regTest.Free;
end;
finally
// Set caption
Caption:=IntToStr(GetTickC
end;
end;
procedure TForm1.GenerateSampleFiles
begin
// Generate the sample codes
GenerateSampleCodes;
// Generate sample code = qty file
GenerateSampleCodeQtyFile;
// Generate the sample test file
GenerateSampleTestFile;
end;
procedure TForm1.GenerateSampleTestF
var listSample: TStringList;
dwIndex: Integer;
begin
// Generate a sample file with the following attributes:
//
// - 50000 total lines
// - 3 replacement tags per line
// - 150000 total replacement tags in the file
// Create list
listSample:=TStringList.Cr
// Resource protection
try
// Walk sample codes
for dwIndex:=1 to 50000 do
begin
// Create sample line
listSample.Add(
StringOfChar('a', Random(100)) + '(%%%' +
FTestCodes[Random(FTestCod
StringOfChar('b', Random(100)) + '(%%%' +
FTestCodes[Random(FTestCod
StringOfChar('c', Random(100)) + '(%%%' +
FTestCodes[Random(FTestCod
StringOfChar('d', Random(100))
);
end;
listSample.Add('zzz(%%%' + FTestCodes[Random(FTestCod
// Save to file
listSample.SaveToFile(FILE
finally
// Free the list
listSample.Free;
end;
end;
procedure TForm1.GenerateSampleCodeQ
var listCodeQty: TStringList;
dwIndex: Integer;
begin
// Create list
listCodeQty:=TStringList.C
// Resource protection
try
// Walk sample codes
for dwIndex:=0 to Pred(FTestCodes.Count) do
begin
// Generate random quantity values
listCodeQty.Add(Format('%s
end;
// Save to file
listCodeQty.SaveToFile(FIL
finally
// Free the list
listCodeQty.Free;
end;
end;
procedure TForm1.GenerateSampleCodes
var dwIndex: Integer;
begin
// Lock list
FTestCodes.BeginUpdate;
// Resource protection
try
// Clear list
FTestCodes.Clear;
// Create set of stock codes
for dwIndex:=1 to 10000 do
begin
// Add sample stock code
FTestCodes.Add(Format('sto
end;
finally
// Unlock list
FTestCodes.EndUpdate;
end;
end;
procedure TForm1.FormCreate(Sender: TObject);
begin
// Create string list for codes
FTestCodes:=TStringList.Cr
// Create hash for loading codes and quantities
FHashCodeQty:=THash.Create
end;
procedure TForm1.FormDestroy(Sender:
begin
// Free objects
FTestCodes.Free;
FHashCodeQty.Free;
end;
//// Generate sample data //////////////////////////
procedure TForm1.Button1Click(Sender
var dwMark: LongWord;
begin
// Mark time
dwMark:=GetTickCount;
// Resource protection
try
// Generate the sample data files
GenerateSampleFiles;
finally
// Set caption
Caption:=IntToStr(GetTickC
end;
end;
//// TFastMemStream //////////////////////////
function TFastMemStream.Realloc(var
var dwDelta: Integer;
lpMemory: Pointer;
begin
// Get current memory pointer
lpMemory:=Memory;
// Resource protection
try
// Calculate the delta to be applied to the capacity
if (NewCapacity > 0) then
begin
// Check new capacity
if (NewCapacity > MaxWord) then
// Delta is 1/4 of desired capacity
dwDelta:=NewCapacity div 4
else
// Minimum allocation of 64 KB
dwDelta:=MaxWord;
// Update by delta
Inc(NewCapacity, dwDelta);
end;
// Determine if capacity has changed
if not(NewCapacity = Capacity) then
begin
// Check for nil alloc
if (NewCapacity = 0) then
begin
// Release the memory
FreeMem(lpMemory);
// Clear result
lpMemory:=nil;
end
else
begin
// Check current capacity
if (Capacity = 0) then
// Allocate memory
lpMemory:=AllocMem(NewCapa
else
// Reallocate memory
ReallocMem(lpMemory, NewCapacity);
end;
end;
finally
// Return modified pointer
result:=lpMemory;
end;
end;
//// Hashing functions //////////////////////////
function HashCompare(Item1, Item2: PChar; CareCase: Boolean): Boolean;
begin
// Null assignment checks
if (Item1 = nil) then
// Check for Item2 being nil
result:=(Item2 = nil)
else if (Item2 = nil) then
// Item1 is not null, so no possible match
result:=False
// Check case
else if CareCase then
// Case sensitive compare
result:=(StrComp(Item1, Item2) = 0)
else
// In-case sensitive compare
result:=(StrIComp(Item1, Item2) = 0);
end;
function HashFunc(Key: PChar; CareCase: Boolean): Integer;
var bChar: Byte;
begin
// Set starting result
result:=0;
// Generate hash index for key
while (Key^ > #0) do
begin
if CareCase then
bChar:=Byte(Key^)
else if (Key^ in ['A'..'Z']) then
bChar:=Byte(Key^) + 32
else
bChar:=Byte(Key^);
Inc(result, (result shl 5) + (bChar shl 3));
Inc(Key);
end;
// Keep result in bounds of array
result:=LongWord(result) mod HASH_SIZE;
end;
//// THash //////////////////////////
constructor THash.Create(Deallocator: THashDeallocator = nil);
begin
// Perform inherited
inherited Create;
// Set default values
ZeroMemory(@FHash, SizeOf(FHash));
FDeallocator:=Deallocator;
FCareCase:=False;
FCount:=0;
end;
destructor THash.Destroy;
begin
// Resource protection
try
// Clear the hash
Clear;
finally
// Perform inherited
inherited Destroy;
end;
end;
procedure THash.Clear;
var phNode1: PHashNode;
phNode2: PHashNode;
dwIndex: Integer;
begin
// Resource protection
try
// Iterate the array and clear the hash nodes
for dwIndex:=0 to Pred(HASH_SIZE) do
begin
// Get bucket node
phNode1:=FHash[dwIndex];
// Walk the nodes
while Assigned(phNode1) do
begin
// Get pointer to next item
phNode2:=phNode1^.Next;
// Free memory for node name
if Assigned(phNode1^.Item) then FreeMem(phNode1^.Item);
// Callback
if (Assigned(phNode1^.Data) and Assigned(FDeallocator)) then FDeallocator(Self, phNode1^.Data);
// Free node item
FreeMem(phNode1);
// Set iterator to next item
phNode1:=phNode2;
end;
end;
finally
// Clear all top level buckets
for dwIndex:=0 to Pred(HASH_SIZE) do FHash[dwIndex]:=nil;
// Reset the count
FCount:=0;
end;
end;
function THash.Extract(Item: PChar; out Data: Pointer): Boolean;
var phNode1: PHashNode;
phNode2: PHashNode;
dwIndex: Integer;
begin
// Get the hash index
dwIndex:=HashFunc(Item, FCareCase);
// Get the hash bucket item
phNode1:=FHash[dwIndex];
// Did top level item exist?
if Assigned(phNode1) then
begin
// Prepare for loop
phNode2:=phNode1;
// Walk the nodes
while Assigned(phNode2) do
begin
// Match key
if HashCompare(phNode2^.Item,
// Save current node
phNode1:=phNode2;
// Move to the next node in the chain
phNode2:=phNode2^.Next;
end;
// Check to see if the node is still set
if Assigned(phNode2) then
begin
// Set out param data value
Data:=phNode2^.Data;
// Check to see if this is the top level item
if (phNode2 = phNode1) then
// Link next node into the bucket
FHash[dwIndex]:=phNode2^.N
else
// Link over this node
phNode1^.Next:=phNode2^.Ne
// Free memory for node name
if Assigned(phNode2^.Item) then FreeMem(phNode2^.Item);
// Free the node
FreeMem(phNode2);
// Decrement the node count
Dec(FCount);
// Success
result:=True;
end
else
// Did not find the node
result:=False;
end
else
// No nodes in bucket
result:=False;
end;
function THash.Delete(Item: PChar): Boolean;
var lpData: Pointer;
begin
// Extract the item
result:=Extract(Item, lpData);
// Check result, perform callback if needed
if (result and Assigned(lpData) and Assigned(FDeallocator)) then FDeallocator(Self, lpData);
end;
function THash.Add(Item: PChar; Data: Pointer): Boolean;
var phNode1: PHashNode;
phNode2: PHashNode;
dwIndex: Integer;
begin
// Get the hash bucket item index
dwIndex:=HashFunc(Item, FCareCase);
// Resource protection
try
// Get the hash bucket item
phNode1:=FHash[dwIndex];
// Is the bucket empty
if (phNode1 = nil) then
// Add new cell item
FHash[dwIndex]:=NewNode(It
else
begin
// Save current node
phNode2:=phNode1;
// Walk nodes
while Assigned(phNode2) do
begin
// Match the key
if HashCompare(phNode2^.Item,
begin
// Check for data change
if (phNode2^.Data <> Data) then
begin
// Callback
if (Assigned(phNode2^.Data) and Assigned(FDeallocator)) then FDeallocator(Self, phNode2^.Data);
// Set new data item
phNode2^.Data:=Data;
end;
// Break loop
break;
end;
// Save current node
phNode1:=phNode2;
// Walk next node
phNode2:=phNode2^.Next;
end;
// Do we need to add a new item to the end of the chain?
if (phNode2 = nil) then
begin
// Create hash node
phNode2:=NewNode(Item, Data);
// Link the node in
phNode1^.Next:=phNode2;
end;
end;
finally
// Always success
result:=True;
end;
end;
function THash.Find(Item: PChar; out Data: Pointer): Boolean;
var phNode: PHashNode;
begin
// Get the hash bucket item
phNode:=FHash[HashFunc(Ite
// Resource protection
try
// Walk the items
while Assigned(phNode) do
begin
// Compare the key
if HashCompare(phNode^.Item, Item, FCareCase) then
begin
// Key exists, set out return data
Data:=phNode^.Data;
break;
end;
// Walk the next item
phNode:=phNode^.Next;
end;
finally
// Success if node is assigned
result:=Assigned(phNode);
end;
end;
function THash.NewNode(Item: PChar; Data: Pointer): PHashNode;
begin
// Allocate memory for new node
result:=AllocMem(SizeOf(TH
// Set the structure fields
result^.Item:=StrCopy(Allo
result^.Data:=Data;
// Increment the count
Inc(FCount);
end;
initialization
// Seed random number generator
Randomize;
end.
-- dfm ---
object Form1: TForm1
Left = 351
Top = 319
Width = 453
Height = 177
Caption = 'Form1'
Color = clBtnFace
Font.Charset = ANSI_CHARSET
Font.Color = clWindowText
Font.Height = -11
Font.Name = 'Tahoma'
Font.Style = []
OldCreateOrder = False
OnCreate = FormCreate
OnDestroy = FormDestroy
PixelsPerInch = 96
TextHeight = 13
object Button1: TButton
Left = 12
Top = 12
Width = 165
Height = 25
Caption = 'Generate Sample Data'
TabOrder = 0
OnClick = Button1Click
end
object btnRun: TButton
Left = 12
Top = 44
Width = 165
Height = 25
Caption = 'Run Test'
TabOrder = 1
OnClick = btnRunClick
end
end
ASKER
Russell is there a way to mod the code so if the qty doesn't exist it will replace it with 0?
I can't work out how your code works :)
cheers
Stu
I can't work out how your code works :)
cheers
Stu
Stu,
Yes, its very easy. All you need to do is check the hash find; if it fails, set ptrValue to nil (which will convert to zero when cast as int).
Eg (first example I gave)
-------------------------- --------
procedure TForm1.ReplaceTag(Sender: TObject; var Replace: String);
var ptrValue: Pointer;
begin
// Lookup the tag in the hash to get the quantitiy
if not(FHashCodeQty.Find(PCha r(TRegExpr (Sender).N amedBackRe ference['T ag']), ptrValue)) then ptrValue:=nil;
// Pointer is really numeric value, set replacement text
Str(Integer(ptrValue), Replace);
end;
Eg (second example I gave, mod the code in TForm1.btnRunClick)
-------------------------- --------
// Lookup the tag in the hash to get the quantitiy
if not(FHashCodeQty.Find(PCha r(regTest. NamedBackR eference[' Tag']), ptrValue)) then ptrValue:=nil;
// Pointer is really numeric value, convert to string
Str(Integer(ptrValue), szReplace);
// Write replacement value to file
strmBase.Write(Pointer(szR eplace)^, Length(szReplace));
Yes, its very easy. All you need to do is check the hash find; if it fails, set ptrValue to nil (which will convert to zero when cast as int).
Eg (first example I gave)
--------------------------
procedure TForm1.ReplaceTag(Sender: TObject; var Replace: String);
var ptrValue: Pointer;
begin
// Lookup the tag in the hash to get the quantitiy
if not(FHashCodeQty.Find(PCha
// Pointer is really numeric value, set replacement text
Str(Integer(ptrValue), Replace);
end;
Eg (second example I gave, mod the code in TForm1.btnRunClick)
--------------------------
// Lookup the tag in the hash to get the quantitiy
if not(FHashCodeQty.Find(PCha
// Pointer is really numeric value, convert to string
Str(Integer(ptrValue), szReplace);
// Write replacement value to file
strmBase.Write(Pointer(szR
In case you were interested I was able to significantly increase the speed of what I had previously done by doing a couple of optimizations. The process should be far easier to understand although it is still not quite as fast (the hashing algorithm is roughly2-3 times as fast). I timed the StrPos routine that comes with Delphi and it was taking 15ms or more per search on my machine which is ridiculous. I made my own and it routinely tests as 0ms. Before anyone comments, I know such routines exist. It took me less time to write than to look through the available libraries and pick one (let alone test them all). I used the same data generator as rlibby created with one important difference. I changed the GenerateSampleCodes to generate them this way:
procedure TForm1.GenerateSampleCodes ;
var dwIndex: Integer;
begin
// Lock list
FTestCodes.BeginUpdate;
// Resource protection
try
// Clear list
FTestCodes.Clear;
// Create set of stock codes
for dwIndex:=1 to 10000 do
begin
// Add sample stock code
FTestCodes.Add(Format('sto ck_DJB%.5d ', [dwIndex]));
end;
finally
// Unlock list
FTestCodes.EndUpdate;
end;
end;
This allowed me to presort them and do a simple binary search for the name within a regular string list.
I plan to work on it more as time permits. Let me know if you have questions.
procedure TForm1.GenerateSampleCodes
var dwIndex: Integer;
begin
// Lock list
FTestCodes.BeginUpdate;
// Resource protection
try
// Clear list
FTestCodes.Clear;
// Create set of stock codes
for dwIndex:=1 to 10000 do
begin
// Add sample stock code
FTestCodes.Add(Format('sto
end;
finally
// Unlock list
FTestCodes.EndUpdate;
end;
end;
This allowed me to presort them and do a simple binary search for the name within a regular string list.
I plan to work on it more as time permits. Let me know if you have questions.
unit uFileReplace;
interface
uses
Windows, SysUtils, Classes;
type
//replace array of PChar with array of PWideChar for unicode
TFileReplace = class
private
fInFile : TMemoryStream;
fOutFile : TMemoryStream;
fStartDelimiter : string;
fEndDelimiter : string;
fSearchList : TStringList;
fOutFileName : string;
protected
public
constructor Create(InFileName, OutFileName, ReplaceFileName, StartDelim,
EndDelim : string);
destructor Destroy; override;
procedure Replace;
function PCFind(Source, Value : pchar) : PChar;
function IndexOfName(Name : string) : integer;
end;
implementation
{ TFileReplace }
constructor TFileReplace.Create(InFileName, OutFileName, ReplaceFileName,
StartDelim, EndDelim : string);
var
FS : TFileStream;
begin
FS := TFileStream.Create(InFileName, fmOpenRead);
try
fInFile := TMemoryStream.Create;
fInFile.LoadFromStream(FS);
finally
FS.Free;
end;
fSearchList := TStringList.Create;
fSearchList.LoadFromFile(ReplaceFileName);
fOutFile := TMemoryStream.Create;
fOutFile.SetSize(fInFile.Size * 2);
fStartDelimiter := StartDelim;
fEndDelimiter := EndDelim;
fOutFileName := OutFileName;
end;
destructor TFileReplace.Destroy;
begin
fOutFile.Free;
fSearchList.Free;
fInFile.Free;
inherited;
end;
function TFileReplace.IndexOfName(Name: string): integer;
var
I, L, H, G : integer;
GName : string;
begin
Result := -1;
L := 0;
H := fSearchList.Count - 1;
repeat
G := ((H - L) shr 1) + L;
GName := fSearchList.Names[G];
if Name > GName then
L := G
else
if Name < GName then
H := G;
until (H - L < 3) or (Name = GName);
if Name = GName then
begin
Result := G;
exit;
end;
I := L;
while (Name <> fSearchList.Names[I]) and (I < H) do
inc(I);
if Name = fSearchList.Names[I] then
Result := I;
end;
function TFileReplace.PCFind(Source, Value: pchar): PChar;
begin
Result := nil;
if not Assigned(Source) then
exit;
if not Assigned(Value) then
exit;
try
if Source[0] = #0 then
exit;
except
end;
if Value[0] = #0 then
exit;
asm
//ecx Value
//edx Source
Mov ESI, Source
@ResetValue:
Mov EDI, Value; //set the pointer to the value
@Loop:
Mov BL, [ESI] //read in a byte of the source
Cmp BL, [EDI] //compare to what is being searched for
JE @TestFullValue //if found then continue
@NextChar:
inc ESI //otherwise increment the source pointer
CMP [ESI], 0
JE @NotFound
JMP @Loop //and search further
@TestFullValue:
//this is where we need to see if we found the entire value
Mov EDX, ESI
@LoopTestFullValue:
inc ESI
inc EDI
Mov BL, [ESI]
Cmp BL, [EDI]
JE @LoopTestFullValue
Mov BL, [EDI]
Cmp BL, 0
JE @Found
JMP @ResetValue
@Found:
mov Result, EDX //set the result
@NotFound:
end;
end;
procedure TFileReplace.Replace;
var
Bookmark, SIndex, EIndex : PChar;
LInt : integer absolute Bookmark;
TagName : string;
Value : string;
TagIndex : integer;
ChunkSize : integer;
P : integer;
begin
Assert(fStartDelimiter <> '', 'Start delimiter not set');
Assert(fEndDelimiter <> '', 'End delimiter not set');
Bookmark := fInFile.Memory;
repeat
// SIndex := StrPos(Bookmark, PChar(fStartDelimiter));
SIndex := PCFind(Bookmark, PChar(fStartDelimiter));
if Assigned(SIndex) then
begin
//figure out how much can be copied RAW
ChunkSize := integer(SIndex) - integer(Bookmark);
//copy it
if ChunkSize > 0 then
fOutFile.WriteBuffer(Bookmark^, ChunkSize);
//skip the delimiter
inc(SIndex, Length(fStartDelimiter));
//find the ending delimiter
// EIndex := StrPos(SIndex, PChar(fEndDelimiter));
EIndex := PCFind(SIndex, PChar(fEndDelimiter));
//up to 100% to this point
//figure out how big the tag is
ChunkSize := integer(EIndex) - integer(SIndex);
//set the tagname
SetLength(TagName, ChunkSize);
StrLCopy(@TagName[1], SIndex, ChunkSize);
//move past the end delimiter
inc(EIndex, Length(fEndDelimiter));
//update Bookmark
Bookmark := EIndex;
//up to 100% to this point
//replace the tag
TagIndex := IndexOfName(TagName);
Value := '';
if TagIndex > -1 then
begin
Value := fSearchList[TagIndex];
P := Pos('=', Value);
Value := Copy(Value, P + 1, Length(Value));
end;
if Value <> '' then
fOutFile.Write(Value[1], Length(Value));
end;
until not Assigned(SIndex);
fOutFile.Seek(0, soFromBeginning);
fOutFile.SaveToFile(fOutFileName);
end;
end.
Any chance that this question will get closed?
Russell
Russell
ASKER
Sorry!!!!!!!!!!!!!!!!!!!!! !!!!!!!! Russle you're a genius! thanks for all your help :)
kind regards
Stu
kind regards
Stu
ASKER
p.s thanks to developmentguru too :o)
Stu,
No problem. Figured I would throw a reminder out there (its easy to lose track sometimes)
Russell
No problem. Figured I would throw a reminder out there (its easy to lose track sometimes)
Russell
I suggest rewriting your code to do something like this:
- read from file a small buffer (of type string, for being easy. since it's small, the speed is not a big problem. lets say the minimum size of a buffer is 1024 characters)
- if the buffer does not contains enough data for one operation (like in your case replace a stockholder, or whatever), read some more into the buffer
- if you reached end of file, break (no sense in reading nothing forever :P)
- do your operation (reaplce, whatever)
- write the modified data (not entire buffer) to file (not same file, anotehr one whichyou create empty)
the idea is this: you basically copy the file, but while doing so, you also operate the modifications you want ;)
if you need to keep going through that 7000+ whatevr file, then you should definetly:
- either load it in memory (if it's not too big)
- import it into some database-like system (for example access, or sql lite, or whatever) so that you can do sql selects, which will be faster then you liniarly searching for stuff.
now, in case you just need to take the first element from the 7000+ file and reaplce every occurance in the 4 MB file, then you don't do the above loading/importing, but instead read it one record at a time and :
- load the entire 4 mb file in memory
- while loading it, save into a list the index in the file of the data that needs to be replaced, and what it is
- after this is done, in another list you will save the actual modified data.
example.
at index 10 you need to change A
at index 15 you need to change B
at index 23 you need to change A
in the list you have ((10, A), (15, B), (23, A)) (use a TList and a record for example. or use 2 lists, one for integers and one for strings. which ever is easier for you.
then you load the 7000+ file and see that
A needs to be changed into Y. so you iterate through the list (or the string list) and change A to Y). same for B.
then you save the file like this:
repeat
copy from original file to new file every byte until the next index from the list
save into new file the modified value
until end of file
I don't have time right now to make a complete demo, but if you are having trouble with some specific point, just ask.