I have 2linked list,
i need to see if any of the records on list2 are on list1. And i need to do that in a verry fast way.

type  plist =^list
list= record
name:string
size:integer;
next: plist
end
LVL 1
Who is Participating?

Commented:
I think I understood you perfectly, but the problem is you cannot run the code as it is. There are too many things missing. It is to show you the way to do what you want. My contribution here is to give you the ideas and show you the direction. The work still has to be done by yourself.

Ok, I gave some working code too (Quicksort, the compare function etc.) but you should consider this as bonus. From your request to give you declarations like PVirtualNode I conclude that you have not yet tried to understand what I gave you nor have you even read the code, otherwise you would have seen that I told you twice how you must modify the array entry (replacing PVirtualNode by a string).

Well, let me try it a last time. You probably don't need such a fast solution as I posted here. You can do it easier with some speed penalty:

1) Fill List1 and List2 with values
2) sort both lists using the QuickSort algorithm from above
3) Use the following modified compare function to determine whether files in both lists:

type
TCacheEntry = record
Name: String;
Size: Cardinal;
end;

TCache = array of TCacheEntry;

function ContainsAnyFile(List1, List2: TCache): Boolean;

// Determines whether files from List2 are in List1.
// Returns True if at least one file of List2 is in List1, otherwise False.
// Note: Both lists must be sorted alphabetically with shorter names placed before longer names!

var
I, J,
Len1, Len2: Cardinal;

begin
// make sure we don't try to access empty lists
Result := Assigned(List1) and Assigned(List2);
if Result then
begin
I := 0;
J := 0;
Len1 := Length(List1);
Len2 := Length(List2);
repeat
// loop until we found one entry in both lists
Result := List1[I].Name = List2[J].Name;
if Result then Break;
// skip entries in List1 which are smaller than the first entry in List2
while (I < Len1) and (List1[I].Name < List2[J].Name) do Inc(I);
// Result is False at this point
if I = Len1 then Break;
// skip entries in List2 which are smaller than the first entry in List1
while (J < Len2) and (List2[J].Name < List1[I].Name) do Inc(J);
until J = Len2;
end;
end;

Ciao, Mike
0

Commented:
I didn't test it but I think this should be it:

function LinkedListCompare(list1, list2: plist): Boolean;
var found: Boolean;
i, j: plist;
begin
i := list2;
repeat
found := false;
j := list1;
repeat
if i.name = j.name then
found := true;
j := j.next;
until (found) or (j = nil);
i := i.next;
until (not found) or (i = nil);
result := found;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
if LinkedListCompare(list1, list2) then
ShowMessage('All items found')
else
ShowMessage('One or more items missing');
end;
0

Commented:
Hi Eps, looks alright, but you will get an exception, if one of the lists is "nil".   :-)

0

Commented:
Just assuming that PD1 points on the first element of List1 and PD2 points on the first element of List2

Function Search : Boolean;
var Found : Boolean;
L1,L2 : PList;
Begin
L1 := PD1;
Found := False;
While (L1 <> nil) and (not Found) do
Begin
L2 := PD2;
While (L2 <> Nil) and (not Found) do
Begin
Found := L2^.Name = L1^.Name;
L2 := L2^.Next;
End; // While
L1 := L1^.Next;
End; // While

Result := Found;
End; // Function

0

Commented:
Thanks Madshi for pointing that out. I think this should be safe:

function LinkedListCompare(list1, list2: plist): Boolean;
var found: Boolean;
i, j: plist;
begin
found := false;
if (list1 = nil) or (list2 = nil) then
begin
// We have 3 situations here
if (list1 = nil) and (list2 = nil) then
found := true
else if list2 = nil then
found := true
else if list1 = nil then
found := false;
end
else begin
i := list2;
repeat
found := false;
j := list1;
repeat
if i.name = j.name then
found := true;
j := j.next;
until (found) or (j = nil);
i := i.next;
until (not found) or (i = nil);
end;
result := found;
end;
0

Commented:
Hi experts :-)

You both left out the size member in the record.

lfrodrigues, is speed a significant matter in this question or is it merely the algorithm you are after?

If speed is an issue and we can make some assumption then comparison can greatly be improved.

Ciao, Mike
0

Author Commented:
i know how to do that the way you showed me.
I need a very fast way of comparing
0

Commented:
Well then, let's talk about assumptions we can make to ease the comparison. How are the lists organized? What I mean is:

1) Are they sorted in any way?
2) What does the size member specify?

3) Have the node texts specific properties like a fixed length, only upper case or lower case or similar?
4) Do you need case sensitve comparison?
5) What actually is to be considered as being equal lists? I assume:
a) strings must be equal
b) nodes must be in the same order
c) size member must be equal
d) count of nodes must be equal
Is this correct? What more, what less?
6) About what dimensions are we talking here (1 million nodes or runs, less, more)?
7) Can it be that a node appears in both lists or are the memory locations always different? If a node would be in both lists then this would mean that all the nodes following it would be in both lists too.

Ciao, Mike
0

Commented:
>i know how to do that the way you showed me.

Your question does not tell me that so how could I know that. Now it seems that I have been wasting my time   :o(
0

Author Commented:
Lischke
1)
the nods aren't ordered,
2)
name: a filename
size: the file size
4) case sensitive would be usefull but not mandatory
5)
to be equal the names must be equal.
probably less than one million but could be more

what i want is a way to find if any of the filenames on list2 are on list1.

hope that helps you

-------------------------
sorry epsylon i thought that way was to basic and that anyone would know a better way.

0

Commented:
It depends on how often you have to look through that list. If you do that only once, then I think, there's no better way than Eps'. If you do that more than once, it might be faster to sort one list.
But you can save time when we talk about string comparison. If the names are file names, then the compare logic should be case insensitive, that means "c:\autoexec.bat" should be the same as "C:\Autoexec.Bat".
Which function do you use now? CompareText? Or "UpperCase(str1) = UpperCase(str2)"? Doesn't matter, the following function is faster anyway:

var lowCharTable : array [#0..#\$FF] of char =
(#\$00,#\$01,#\$02,#\$03,#\$04,#\$05,#\$06,#\$07,#\$08,#\$09,#\$0A,#\$0B,#\$0C,#\$0D,#\$0E,#\$0F,
#\$10,#\$11,#\$12,#\$13,#\$14,#\$15,#\$16,#\$17,#\$18,#\$19,#\$1A,#\$1B,#\$1C,#\$1D,#\$1E,#\$1F,
#\$20,#\$21,#\$22,#\$23,#\$24,#\$25,#\$26,#\$27,#\$28,#\$29,#\$2A,#\$2B,#\$2C,#\$2D,#\$2E,#\$2F,
#\$30,#\$31,#\$32,#\$33,#\$34,#\$35,#\$36,#\$37,#\$38,#\$39,#\$3A,#\$3B,#\$3C,#\$3D,#\$3E,#\$3F,
#\$40,#\$61,#\$62,#\$63,#\$64,#\$65,#\$66,#\$67,#\$68,#\$69,#\$6A,#\$6B,#\$6C,#\$6D,#\$6E,#\$6F,
#\$70,#\$71,#\$72,#\$73,#\$74,#\$75,#\$76,#\$77,#\$78,#\$79,#\$7A,#\$5B,#\$5C,#\$5D,#\$5E,#\$5F,
#\$60,#\$61,#\$62,#\$63,#\$64,#\$65,#\$66,#\$67,#\$68,#\$69,#\$6A,#\$6B,#\$6C,#\$6D,#\$6E,#\$6F,
#\$70,#\$71,#\$72,#\$73,#\$74,#\$75,#\$76,#\$77,#\$78,#\$79,#\$7A,#\$7B,#\$7C,#\$7D,#\$7E,#\$7F,
#\$80,#\$81,#\$82,#\$83,#\$84,#\$85,#\$86,#\$87,#\$88,#\$89,#\$9A,#\$8B,#\$9C,#\$8D,#\$9E,#\$8F,
#\$90,#\$91,#\$92,#\$93,#\$94,#\$95,#\$96,#\$97,#\$98,#\$99,#\$9A,#\$9B,#\$9C,#\$9D,#\$9E,#\$FF,
#\$B0,#\$B1,#\$B2,#\$B3,#\$B4,#\$B5,#\$B6,#\$B7,#\$B8,#\$B9,#\$BA,#\$BB,#\$BC,#\$BD,#\$BE,#\$BF,
#\$E0,#\$E1,#\$E2,#\$E3,#\$E4,#\$E5,#\$E6,#\$E7,#\$E8,#\$E9,#\$EA,#\$EB,#\$EC,#\$ED,#\$EE,#\$EF,
#\$F0,#\$F1,#\$F2,#\$F3,#\$F4,#\$F5,#\$F6,#\$D7,#\$F8,#\$F9,#\$FA,#\$FB,#\$FC,#\$FD,#\$FE,#\$DF,
#\$E0,#\$E1,#\$E2,#\$E3,#\$E4,#\$E5,#\$E6,#\$E7,#\$E8,#\$E9,#\$EA,#\$EB,#\$EC,#\$ED,#\$EE,#\$EF,
#\$F0,#\$F1,#\$F2,#\$F3,#\$F4,#\$F5,#\$F6,#\$F7,#\$F8,#\$F9,#\$FA,#\$FB,#\$FC,#\$FD,#\$FE,#\$FF);

function IsTextEqual(const s1, s2: string) : boolean;
var c1, c2 : cardinal;
begin
c1 := Length(s1);
result := cardinal(Length(s2)) = c1;
if result then
for c2 := 1 to c1 do
if lowCharTable[s1[c2]] <> lowCharTable[s2[c2]] then begin
result := false;
break;
end;
end;

0

Commented:
Ah yes, this simplifies the stuff a lot and I do agree with Madshi that Epsylon's solution is the best one so far (plus the case insensitive match).

But consider that this approach needs (in worst case) M * N comparisons (and inner loops) where M is the number of names in list one and N the number of names in list two. The time for the search will so increase quadratic with increasing file numers. Hence I strongly recommend to use a kind of sorted lists. Since you have linked lists using pointers there will be only a small penalty to insert new nodes in a sorted manner (no need to move large memory blocks around, like in linear lists). Doing a binary search (which would be the fastest way) is still a bit expensive because you cannot directly determine mid points in a linked list. So a binary tree (patricia tree, whatever) would be the best option.

Since there's significant more work to do it depends on the importance of a fast comparison.

Ciao, Mike
0

Author Commented:
Lischke  you are correct i need to use the worst case. How do i sort the list1 and how do i search the two lists in a REALLY fast way?
0

Commented:
Hi Rodrigues,

here's my solution for you. First the code then the discussion:

type
PFileList = ^TFileList;
TFileList = record
Name: String;
Size: Cardinal; // perhaps Int64 is better here for files > 4GB
Next: PFileList;
end;

procedure InsertNodeSorted(var List: PFileList; Name: String; Size: Cardinal);

// Inserts a new node with the given values into the list List where the position is
// determined by the given Name (alpha sorted list).
// Note: To make this work you must have a dummy node as root which is not allocated here
//       and should not be passed to the compare function (Use Root1.Next and Root2.Next instead).

var
NewEntry: PFileList;

begin
// find last node which is less than Name
while Assigned(List.Next) and (List.Next.Name < Name) do List := List.Next;
// create new node...
New(NewEntry);
NewEntry.Name := Name;
NewEntry.Size := Size;
// ...and insert it
NewEntry.Next := List.Next;
List.Next := NewEntry;
end;

function ContainsAnyFile(List1, List2: PFileList): Boolean;

// Determines whether files from List2 are in List1.
// Returns True if at least one file of List2 is in List1, otherwise False.
// Note: Both lists must be sorted alphabetically with shorter names placed before longer names!

begin
// make sure we don't try to access empty lists
Result := Assigned(List1) and Assigned(List2);
if Result then
begin
repeat
// loop until we found one entry in both lists
Result := List1.Name = List2.Name;
if Result then Break;
// skip entries in List1 which are smaller than the first entry in List2
while Assigned(List1) and (List1.Name < List2.Name) do List1 := List1.Next;
// Result is False at this point
if List1 = nil then Break;
// skip entries in List2 which are smaller than the first entry in List1
while Assigned(List2) and (List2.Name < List1.Name) do List2 := List2.Next;
until List2 = nil;
end;
end;

Under consideration that we are talking about file lists I assumed they don't become very large, so insertion should not be optimized too much for the cost of much more complex code.

On a linked list with N + 1 entries it takes on average N /2 comparisons to find a node sequentially. This is only important for the sorted insertion here!

The searched speeds up to M + N + 1 comparisons in worst case. E.g. 10,000 'a's and 1 'z' in list 1, 10,000 'b's in list 2 lead to 20,001 comparisons here in opposition to 1,000,000 comparisons in the old case!

Until now the source code is still very trivial. If you still need a faster search then you need to implement binary trees or an array instead of a linked list. The array approach is much better because of its property to be binary searchable, which results in an average comparison count of at most ~ lg N. This is for 1,000,000 entries 20! But in this case insertion cost a lot of time because possibly large memory areas have to be moved around. With N items in list2 and M items in list 1 this results in about N * lg M comparisons on average. Quite impressive, isn't it? This is the reason why I prefer arrays, but you would need to stop using linked lists.

Binary trees are better when inserting in a sorted manner, but need quite complex code and searching costs on average 2 * lg N comparisons (worst case N comparisons). But this is for special needs only and has other drawbacks.

Ciao, Mike
0

Author Commented:
Its not fast enouhgt, you are probably right it would be better to use an array, but how do i use binary search with strings?
0

Commented:
This is easy. Look at this code which is directly copied from my virtual treeview implementation which handles one million nodes very smoothly:

function TBaseVirtualTree.FindInNodeCache(Node: PVirtualNode; var Index: Integer): Boolean;

// Looks through the node cache and returns the index of the found entry in Index. When successful then the result
// is True else False (in which case Index contains a value as nearest as possible to the given node).

var
L, H, I, C: Integer;

begin
Result := False;
L := 0;
H := High(FNodeCache);
while L <= H do
begin
I := (L + H) shr 1;
C := Integer(FNodeCache[I].Node) - Integer(Node);
if C < 0 then L := I + 1 else
begin
H := I - 1;
if C = 0 then
begin
Result := True;
L := I;
end;
end;
end;
Index := L;
end;

FNodeCache is of type TCache which is defined as:

TCacheEntry = record
Node: PVirtualNode;
AbsoluteTop: Cardinal;
end;

TCache = array of TCacheEntry;

You need of course to modify the cache entry to:

TCacheEntry = record
Name: String;
Size: Cardinal;
end;

Now the binary search whether a string is in the cache is:

function TBaseVirtualTree.FindInNodeCache(Name: String; var Index: Integer): Boolean;

// Looks through the node cache and returns the index of the found entry in Index. When successful then the result
// is True else False (in which case Index contains a value as nearest as possible to the given node).

var
L, H, I, C: Integer;

begin
Result := False;
L := 0;
H := High(FNodeCache);
while L <= H do
begin
I := (L + H) shr 1;
// compare not case sensitive
C := CompareText(FNodeCache[I].Name, Name);
if C < 0 then L := I + 1 else
begin
H := I - 1;
if C = 0 then
begin
Result := True;
L := I;
end;
end;
end;
Index := L;
end;

This code is extremely efficient, as I already pointed out. The worst case costs ld N (at most 20 comparations for 1,000,000 entries). But the average cost is of course much lesser.

But now comes the hard stuff to insert new entries as fast as possible into the cache, but this one will you excellent response times:

procedure TBaseVirtualTree.InternalAddToSelection(const NewItems: TNodeArray; NewLength: Integer;
ForceInsert: Boolean);

// Internal version of method AddToSelection which does not trigger OnChange events

var
I, J: Integer;
CurrentEnd: Integer;

begin
// The idea behind this code is to use a kind of reverse merge sort. QuickSort is quite fast
// and would do the job here too but has a serious problem with already sorted lists like FSelection.

// 1) Remove already selected items, mark all other as being selected
// here I have snipped some code not related to your problem
// PackArray removes all nil entries from NewItems (if there are any) and returns the number of remaining entries
I := PackArray(NewItems, NewLength);
if I > -1 then NewLength := I;
if NewLength > 0 then
begin
// 2) Sort the new item list so we can easily traverse it.
if NewLength > 1 then QuickSort(NewItems, 0, NewLength - 1);
// 3) Make room in FSelection for the new items.
if FSelectionCount + NewLength >= Length(FSelection) then
begin
SetLength(FSelection, FSelectionCount + NewLength);
end;

// 4) Merge in new items
J := NewLength - 1;
CurrentEnd := FSelectionCount - 1;

while J >= 0 do
begin
// First insert all new entries which are greater than the greatest entry in the old list.
// If the current end marker is < 0 then there's nothing more to move in the selection
// array and only the remaining new items must be inserted.
if CurrentEnd >= 0 then
begin
while (J >= 0) and (Cardinal(NewItems[J]) > Cardinal(FSelection[CurrentEnd])) do
begin
FSelection[CurrentEnd + J + 1] := NewItems[J];
Dec(J);
end;
// early out if nothing more needs to be copied
if J < 0 then Break;
end
else
begin
// insert remaining new entries at position 0
Move(NewItems[0], FSelection[0], (J + 1) * SizeOf(Pointer));
// nothing more to do so exit main loop
Break;
end;

// find the last entry in the remaining selection list which is smaller then the largest
// entry in the remaining new items list
FindNodeInSelection(NewItems[J], I, 0, CurrentEnd);
Dec(I);
// move all entries which are greater than the greatest entry in the new items list up
// so the remaining gap travels down to where new items must be inserted
Move(FSelection[I + 1], FSelection[I + J + 2], (CurrentEnd - I) * SizeOf(Pointer));
CurrentEnd := I;
end;

Inc(FSelectionCount, NewLength);
end;
end;

The Quicksort algotithm (as well as the binary search) are used in a different manner in the VCL too (see TStringList). For you I post my code here:

procedure QuickSort(const TheArray: TCache; L, R: Integer);

var
I, J, M: Integer;
T: TCacheEntry;

begin
repeat
I := L;
J := R;
M := (L + R) shr 1;
repeat
while CompareText(TheArray[I].Name, TheArray[M].Name) < 0 do Inc(I);
while Cardinal(TheArray[J].Name, TheArray[P].Name) > 0 do Dec(J);
if I <= J then
begin
T := TheArray[I];
TheArray[I] := TheArray[J];
TheArray[J] := T;
Inc(I);
Dec(J);
end;
until I > J;
if L < J then QuickSort(TheArray, L, J);
L := I;
until I >= R;
end;

Is this now fast enough?

Ciao, Mike
0

Commented:
Oops, the second line should of course be:

while CompareText(TheArray[J].Name, TheArray[P].Name) > 0 do Dec(J);

Ciao, Mike
0

Author Commented:
hi mike

What is FNodeCache? in FindInNodeCache
What is FindNodeInSelection? in InternalAddToSelection
What is PackArray? in InternalAddToSelection
The code is just to confusing since there are many missing declarations,
could you please give some example on how to use your source.

bye LR
0

Commented:
Have you read all the comments in the text? This stuff is indeed not easy (to understand). I told it to you initially. On the other hand it is the fastest code I could come up with (after quite long research, I even have asked some "algorithm peoples" here at E-E which gave me the final idea).

FNodeCache is of type TCache which is defined as (copy from above):

TCacheEntry = record
Node: PVirtualNode;
AbsoluteTop: Cardinal;
end;

TCache = array of TCacheEntry;

You need of course to modify the cache entry to:

TCacheEntry = record
Name: String;
Size: Cardinal;
end;

Both Find* functions are just binary searches in their caches.

PackArray is also described in the text. It is used to quickly remove items too. This is for your case not relevant. Just ignore the related lines.

The code itself (InternalAddToSelection) is quite complex, I know, but nonetheless fast. Keep in mind it is ONLY to quickly add a range of items into a local array, nothing else. Once you have managed this you can very quick compare two lists.

Ciao, Mike
0

Author Commented:
Mike

I probably did made myself clear enough.
Have you tried to run the code i gave me?
Some types and functions are not declared eg: PVirtualNode ,FindNodeInSelection
I would like to have those types and functions.

bye
LR
0

Author Commented:
Yes you are correct i really didn't spent more than 5 min loking at the source. I just thought that for 200 points you would bother to give me a full working solution. I really need this i don't have the time to understand the source and make the changes. Well i will acept you anwser anyway.

Thanks
LR
0

Commented:
Well, at last I gave you working code. Perhaps next time it would be wise to tell already in the question that you want fully working source code. I'm not a friend to make it standard for an acceptable answer to provide full source code with each and every bit worked out. But if you make the code a condition explicitly I can in advance decide how much time I willing to spend on your question. Usually writing the code is only a small excercise once you actually understood the problem and the provided solution. If you do not understand it then you will never be able to fix bugs or enhance it...

Ciao, Mike
0

Author Commented:
Don't worry i've been exeminig the source for the last 30 min and i find it easy enought to understant. Thanks for everything

Bye
LR
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.