De-Duping a Text File

I have a text file of 100,000 lines, each 100 characters long. I need to de-dupe this file, and the de-duped output file must be in the same order as the input. What is a good way to process the file?
I have worked with streamreader/writer. It is de-duping when the duplicates appear in random order that I can't see how to do.
Could you include an example with the solution?
garyinmiami2003 Asked:

strickdd Commented:
using (StreamReader sr = new StreamReader("TestFile.txt")) 
{
	string line;
	List<string> lineList = new List<string>();
	List<int> deleteIndexes = new List<int>();
	
	while ((line = sr.ReadLine()) != null) 
	{
	    lineList.Add(line);	    
	}
	
	//Find duplicate indexes
	for(int i = 0; i<lineList.Count - 1; i++)
	{
		for(int j = i+1; j<lineList.Count; j++)
		{
			if(lineList[i] == lineList[j])
			{
				deleteIndexes.Add(j); //remove 2nd duplication of line
			}
		}
	}
	
	deleteIndexes.Sort(); //indexes are found out of order, so sort before reversing
	deleteIndexes.Reverse(); //start deletion from the last index to prevent shifting index issues
	
	foreach(int index in deleteIndexes)
	{
		lineList.RemoveAt(index);
	}
	
	//write lineList to streamwriter (sw is assumed to be an already-open StreamWriter)
	foreach(string newLine in lineList)
	{
		sw.WriteLine(newLine);
	}
}

garyinmiami2003 (Author) Commented:
Same code in vb.net?
strickdd Commented:
This should be close; I just put it through a converter:

Using sr As New StreamReader("TestFile.txt")
	Dim line As String
	Dim lineList As New List(Of String)()
	Dim deleteIndexes As New List(Of Integer)()

	While (InlineAssignHelper(line, sr.ReadLine())) IsNot Nothing
		lineList.Add(line)
	End While

	'Find duplicate indexes
	For i As Integer = 0 To lineList.Count - 2
		For j As Integer = i + 1 To lineList.Count - 1
			If lineList(i) = lineList(j) Then
					'remove 2nd duplication of line
				deleteIndexes.Add(j)
			End If
		Next
	Next

	'indexes are found out of order, so sort before reversing
	deleteIndexes.Sort()
	'start deletion from the last index to prevent shifting index issues
	deleteIndexes.Reverse()
	For Each index As Integer In deleteIndexes
		lineList.RemoveAt(index)
	Next

	'write lineList to streamwriter (sw is assumed to be an already-open StreamWriter)
	For Each newLine As String In lineList
		sw.WriteLine(newLine)
	Next
End Using

garyinmiami2003 (Author) Commented:
strickdd:

a couple of errors:
sw.WriteLine(newLine)
Not a member of StreamReader

InlineAssignHelper not declared
strickdd Commented:
When you said "I have worked with streamreader/writer," I figured you already had a StreamWriter variable declared and opened. Add the declaration, put your variable name in place of "sw", and it should work.
Ambusy Commented:
The previous code is missing an If Not Contains test: if a line occurs 3 times, things go wrong, because the same index is deleted more than once.
 
        Dim lineList As New List(Of String)
        Dim deleteIndexes As New List(Of Integer)
        Using sr As StreamReader = New StreamReader("a.txt")
            Dim line As String = sr.ReadLine()
            While Not (line Is Nothing)
                lineList.Add(line)
                line = sr.ReadLine()
            End While
            sr.Close()
        End Using
        'Find duplicate indexes
        For i As Integer = 0 To lineList.Count - 1
            For j As Integer = i + 1 To lineList.Count - 1
                If lineList(i) = lineList(j) Then
                    If Not deleteIndexes.Contains(j) Then
                        deleteIndexes.Add(j) 'remove 2nd duplication of line
                    End If
                End If
            Next
        Next
        deleteIndexes.Sort() 'indexes are found out of order, so sort before reversing
        deleteIndexes.Reverse() 'start deletion from the last index to prevent shifting index issues
        For Each index As Integer In deleteIndexes
            lineList.RemoveAt(index)
        Next
        'write lineList to streamwriter
        Using sw As New StreamWriter("a.txt")
            For Each newLine As String In lineList
                sw.WriteLine(newLine)
            Next
        End Using

Mike Tomlinson (Middle School Assistant Teacher) Commented:
Here's another one:
Dim FileName As String = "C:\Users\Mike\Documents\SomeFile.txt"

        Dim lines As New List(Of String)
        lines.AddRange(System.IO.File.ReadAllLines(FileName))

        Dim index As Integer = 0
        Dim keys As New Dictionary(Of String, String)
        While index < lines.Count
            If Not keys.ContainsKey(lines(index)) Then
                keys.Add(lines(index), Nothing)
                index = index + 1
            Else
                lines.RemoveAt(index)
            End If
        End While

        System.IO.File.WriteAllLines(FileName, lines.ToArray)
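
For comparison, here is a minimal sketch of the same keep-the-first-occurrence idea using a HashSet(Of String), whose Add method returns False when the item is already present. It is not taken from any of the posts in this thread; the module name, the Dedupe function and the sample lines are hypothetical. It makes a single pass over the lines, so it needs neither nested loops nor RemoveAt.

```vbnet
Imports System
Imports System.Collections.Generic

Module DedupeSketch
    ' Hypothetical helper: returns the lines with later duplicates dropped,
    ' keeping the first occurrence of each line in its original position.
    Function Dedupe(lines As IEnumerable(Of String)) As List(Of String)
        Dim seen As New HashSet(Of String)()
        Dim result As New List(Of String)()
        For Each line As String In lines
            ' HashSet.Add returns False when the line has already been seen
            If seen.Add(line) Then
                result.Add(line)
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up sample data, with duplicates that are not adjacent
        Dim sample As String() = {"A", "B", "B", "A", "C"}
        For Each line As String In Dedupe(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

To apply it to a file, the lines from System.IO.File.ReadAllLines could be passed to Dedupe and the returned list written out with System.IO.File.WriteAllLines. The comparison is case-sensitive, as in the code posted above.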

garyinmiami2003 (Author) Commented:
strickdd - I never got his to work, but I'm sure it is close.

Ambusy - I think that would work. I had a (hidden) double extension on my test file; my problem.

Wound up using idle mind and got it to work.
My thanks to you all, and I hope you feel the points went out fairly.
Ambusy Commented:
You sure did well.
Visual Basic.NET

Question has a verified solution.
