trevor1940
asked on
C#: How do you read a text file into an array, removing duplicate lines?
I have a text file with over 2,000 lines
I need to process each line, but how do I check that it's unique first?
File.ReadAllLines shows how to read the file into an array, but it doesn't test for duplicates first.
Pseudocode just to illustrate:
// Open the file to read from.
string[] List = File.ReadAllLines(path);
foreach (string Line in List) {
    unless (Seen the Line) { // Check not processed the Line already
        // Do stuff with Line
        // Add To Seen
    }
}
ASKER
Each line in the file is a URL.
In the sample below, the first 2 are unique and the 3rd is a duplicate.
Ignore any white space at the start or, more likely, the end of each line.
https://example.com/threads/3669713-trevor-19060
https://example.com/threads/3669714-trevor-19059
https://example.com/threads/3669713-trevor-19060
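One way to do this is sketched below; it is a minimal example, not necessarily what the accepted answers proposed. It trims each line, then uses HashSet<string>.Add, which returns false when the value is already in the set, to keep only the first occurrence. The class and method names (UniqueLines, ReadUnique) are made up for illustration, and skipping blank lines is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UniqueLines
{
    // Returns the lines of the file at `path`, trimmed, with repeats dropped.
    // The first occurrence of each line is kept, in file order.
    public static List<string> ReadUnique(string path)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var unique = new List<string>();

        foreach (string raw in File.ReadLines(path))
        {
            string line = raw.Trim();        // ignore white space at start/end
            if (line.Length == 0) continue;  // assumption: blank lines are not wanted
            if (seen.Add(line))              // false if the line was seen before
                unique.Add(line);
        }
        return unique;
    }
}
```

For the three sample URLs above this should yield two entries. Calling ReadUnique(path).ToArray() gives an array if one is needed (ToArray is a List<T> method, so no extra using is required).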
ASKER
Thanks, I didn't know .NET had a means of processing hashes!
Are these similar to a Perl hash, in that order isn't maintained?
If so, it suggests the need to record completed / processed lines to an external log file and read that in first (so the program can be stopped and restarted).
As suggested, this example is simple, so a list is probably good enough.
With both options, how would I check that the current line isn't a duplicate and isn't in the log file?
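On the order question: HashSet<T> does not guarantee any enumeration order, so it is safer to use it only for the "have I seen this?" test and to take the processing order from the file itself. A rough sketch of the restartable version follows. It assumes the log file holds one completed line per row, and the names ResumableProcessor, Run, inputPath, logPath and process are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ResumableProcessor
{
    // Processes each not-yet-seen line of `inputPath` once, in file order,
    // appending every completed line to `logPath` so a later run skips it.
    // Returns the number of lines processed in this run.
    public static int Run(string inputPath, string logPath, Action<string> process)
    {
        // Seed the "seen" set with lines completed by earlier runs, if any.
        var seen = File.Exists(logPath)
            ? new HashSet<string>(File.ReadLines(logPath), StringComparer.Ordinal)
            : new HashSet<string>(StringComparer.Ordinal);

        int processed = 0;
        using (var log = new StreamWriter(logPath, append: true))
        {
            foreach (string raw in File.ReadLines(inputPath))
            {
                string line = raw.Trim();
                // Skip blanks, duplicates within the file, and lines already logged.
                if (line.Length == 0 || !seen.Add(line)) continue;

                process(line);        // "Do stuff with Line"
                log.WriteLine(line);  // record it only after it succeeded
                log.Flush();          // so a stop loses at most the line in flight
                processed++;
            }
        }
        return processed;
    }
}
```

Because one set is seeded from the log and then updated as lines are read, a single seen.Add call covers both checks: duplicate in the current file and already present in the log. The same idea works with a List<string> and Contains, which is fine at around 2,000 lines but has to scan the whole list on each lookup.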
ASKER
Thank you
Not a problem, glad to help.
Checking for a duplicate line in a text file may be more difficult than Line1 == Line2, because two lines may differ by a single space character. For example, two otherwise identical lines are not duplicates if one has an extra space between the final t and the question mark at the end. To help, we need to know what you mean by unique. Is there an account number in each line that makes them unique, or something else?
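To make the white space point concrete, here is a small sketch of two possible "key" functions. Which one is right depends on what unique means for the data; the names LineKeys, TrimEnds and StripAllWhitespace are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineKeys
{
    // Key that ignores white space at the start and end of the line only.
    public static string TrimEnds(string line) => line.Trim();

    // Key that ignores all white space, wherever it appears in the line.
    public static string StripAllWhitespace(string line) =>
        Regex.Replace(line, @"\s+", "");
}
```

Taking two made-up lines, "Is this a duplicate?" and "Is this a duplicate ?", TrimEnds gives different keys, so they count as distinct, while StripAllWhitespace gives the same key for both. Whichever key is chosen is what would be added to the "seen" collection in place of the raw line.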