trevor1940

asked on

C#: How do you read a text file into an array, removing duplicate lines?

I have a text file with over 2,000 lines

I need to process each line, but how do I check that it's unique first?
File.ReadAllLines shows how to read the file into an array, but it doesn't test for duplicates.

Pseudocode, just to illustrate:
        // Open the file to read from.
        string[] List= File.ReadAllLines(path);
        foreach (string Line in List){
           unless(Seen the Line) {// Check not processed the Line already
            // Do stuff with Line
            // Add To Seen
            }

        }


Fernando Soto

Hi trevor1940;

Checking for duplicate lines of text may be more difficult than Line1 == Line2, because two lines may differ by a single space character. For example, the following are not duplicates:
I need to process each line but how do check its unique first?
I need to process each line but how do check its unique first ?


If you look at the end of the second line, you will see an extra space between the t and the question mark. In order to help, we need to know what you mean by unique. Is there an account number in each line that makes them unique? Or something else?
trevor1940

ASKER

Each line in the file is a URL:

https://example.com/threads/3669713-trevor-19060
https://example.com/threads/3669714-trevor-19059
https://example.com/threads/3669713-trevor-19060



The first two are unique; the third is a duplicate.

Any white space at the start or, more likely, the end of each line should be ignored.
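
The requirement described here (one URL per line, surrounding white space ignored, each distinct line handled once) can be sketched with a HashSet<string>. This is a minimal illustration, not the answer posted in the thread: the class name UniqueLines and the method name Distinct are hypothetical, and skipping blank lines is an assumption that the question does not state. It relies on HashSet<T>.Add returning false when the item is already in the set.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueLines
{
    // Returns the distinct lines in first-seen order, ignoring leading and
    // trailing white space. Blank lines are skipped (an assumption).
    public static List<string> Distinct(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0)
            {
                continue;
            }

            // Add returns false when the line has been seen before.
            if (seen.Add(line))
            {
                result.Add(line);
            }
        }

        return result;
    }
}
```

With the file from the question, the call would look like UniqueLines.Distinct(File.ReadLines(path)), followed by a foreach over the returned list to "do stuff" with each URL. The comparison is case-sensitive; if URLs differing only in case should count as duplicates, StringComparer.OrdinalIgnoreCase could be passed to the HashSet constructor instead.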
SOLUTION
ste5an

This solution is only available to members.
ASKER CERTIFIED SOLUTION
Thanks, I didn't know .NET had a means of processing hashes!
Are these similar to a Perl hash, in that order isn't maintained?

If so, it suggests the need to record completed / processed lines in an external log file and read that in first (so the program can be stopped and restarted).
As suggested, this example is simple, so a list is probably good enough.

With both options, how would I check that the current line isn't a duplicate and isn't in the log file?
Thank you
Not a problem, glad to help.
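
For the follow-up about stopping and restarting, one possible approach is to seed the "seen" set from the log file before reading the input, so a single check covers both duplicates in the file and lines finished in an earlier run. The sketch below is an assumption-laden illustration rather than anything posted in the thread: ResumableProcessor, Run, inputPath, logPath and the process callback are all hypothetical names. HashSet<T> does not guarantee any enumeration order, much like a Perl hash, but that does not matter here because the set is only used for membership tests; lines are still processed in the order they appear in the input file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ResumableProcessor
{
    // Processes each distinct, trimmed line of inputPath that is not already
    // recorded in logPath, appending it to the log once it has been handled.
    // Returns the number of lines processed in this run.
    public static int Run(string inputPath, string logPath, Action<string> process)
    {
        // Seed the set with lines completed by earlier runs, if any.
        var seen = new HashSet<string>(StringComparer.Ordinal);
        if (File.Exists(logPath))
        {
            foreach (string done in File.ReadLines(logPath))
            {
                string entry = done.Trim();
                if (entry.Length > 0)
                {
                    seen.Add(entry);
                }
            }
        }

        int processed = 0;

        // AutoFlush writes each entry out immediately, so the log stays
        // current if the program is stopped part-way through.
        using (var log = new StreamWriter(logPath, append: true) { AutoFlush = true })
        {
            foreach (string raw in File.ReadLines(inputPath))
            {
                string line = raw.Trim();

                // Skip blank lines, duplicates, and lines already in the log.
                if (line.Length == 0 || !seen.Add(line))
                {
                    continue;
                }

                process(line);
                log.WriteLine(line);
                processed++;
            }
        }

        return processed;
    }
}
```

A line is logged only after process returns, so if the program is stopped in the middle of handling a line, that line will be handled again on the next run.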