Remove Duplicate Lines When Parsing CSV

Posted on 2007-12-04
Last Modified: 2008-02-01
I am using Java to read lines from a text file, in this case a CSV.  I parse the file and process each line according to my application's needs.  I noticed that the file sometimes contains duplicate lines.  Instead of processing a duplicate line multiple times, I would like to remove the duplicates from the CSV and then process each remaining line.  Can someone point me to one of the more efficient ways of doing this?  Thanks
Question by:pcarrollnf
LVL 86

Accepted Solution

CEHJ earned 125 total points
ID: 20405022
If the file is not too big, you can read its lines into a Set<String>. That will ensure uniqueness.
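
A minimal sketch of the Set<String> idea, assuming the whole file fits in memory and that two lines count as duplicates only when they match character for character. The class and method names (UniqueCsvLines, readUniqueLines) are made up for illustration, and UTF-8 is an assumed encoding. A LinkedHashSet is used so the lines come back in their original order; a plain HashSet would also guarantee uniqueness but not order.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class UniqueCsvLines {

    // Reads every line of the file into a set, so exact duplicates collapse to one entry.
    // LinkedHashSet keeps the first-seen order of the lines.
    static Set<String> readUniqueLines(Path csv) throws IOException {
        Set<String> unique = new LinkedHashSet<>();
        // UTF-8 is an assumption; use whatever encoding the CSV was written in.
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                unique.add(line); // add() is a no-op when the line is already present
            }
        }
        return unique;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UniqueCsvLines <file.csv>");
            return;
        }
        for (String line : readUniqueLines(Paths.get(args[0]))) {
            System.out.println(line); // replace with the application's per-line processing
        }
    }
}
```

Memory use grows with the number of distinct lines, so this fits best when the file is small or heavily duplicated.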

Author Comment

ID: 20405043
That would be an issue.  These CSV files can become large and may contain thousands of lines.
LVL 86

Expert Comment

ID: 20405108
What OS are you using?

Author Comment

ID: 20405193
Windows 2000, 2003
LVL 86

Expert Comment

ID: 20405968
Get a Windows port of textutils from You can then do

cat orig.csv | sort | uniq >uniq.csv

Doubt if you'll get much more efficient than that

Expert Comment

ID: 20406102
There is a program called Clippy that does what you want.
LVL 92

Expert Comment

ID: 20406402
You don't need to store the entire CSV in memory, just the unique keys.
That will allow you to check each line as you process it.
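
One way this could look, as a rough sketch rather than a definitive implementation: the file is read one line at a time, and only a set of keys already seen is kept in memory. The names StreamingDedup, processUnique, keyOf and handler are hypothetical. What counts as a "key" depends on the data; the example in main assumes the first column identifies a record and splits on the first comma, which does not handle quoted fields containing commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper class, not part of any library.
class StreamingDedup {

    // Streams the file line by line. Each line is passed to handler only the first
    // time its key is seen. Returns the number of lines skipped as duplicates.
    static int processUnique(Path csv,
                             Function<String, String> keyOf,
                             Consumer<String> handler) throws IOException {
        Set<String> seenKeys = new HashSet<>(); // only keys are retained, not whole lines
        int skipped = 0;
        // UTF-8 is an assumption; use the encoding the CSV was written in.
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (seenKeys.add(keyOf.apply(line))) {
                    handler.accept(line); // first occurrence: process it
                } else {
                    skipped++;            // key already seen: ignore the line
                }
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StreamingDedup <file.csv>");
            return;
        }
        // Assumed key: everything before the first comma (naive, ignores CSV quoting).
        int skipped = processUnique(Paths.get(args[0]),
                line -> line.split(",", 2)[0],
                System.out::println);
        System.err.println("skipped " + skipped + " duplicate line(s)");
    }
}
```

Passing Function.identity() as keyOf treats the whole line as the key, which matches the exact-duplicate case described in the question.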
LVL 86

Expert Comment

ID: 20781397


Question has a verified solution.
