Remove Duplicate Lines When Parsing CSV

Posted on 2007-12-04
Last Modified: 2008-02-01
I am using Java to read lines from a text file, in this case a CSV.  I parse the file and process each line according to my application's needs.  I noticed that the file sometimes contains duplicate lines.  Instead of processing a duplicate line multiple times, I would like to remove the duplicate lines from the CSV and then process each remaining line.  Can someone point me to one of the more efficient ways of doing this?  Thanks
Question by:pcarrollnf
LVL 86

Accepted Solution

CEHJ earned 125 total points
ID: 20405022
If it's not too big, you can read the file into a Set<String>. That will ensure uniqueness.
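A minimal sketch of that idea, assuming the whole file fits in memory. The class and method names (UniqueLines, readUnique) are made up for illustration, and a LinkedHashSet is used here only so the first-seen line order is kept; a plain HashSet would also guarantee uniqueness.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

class UniqueLines {
    // Reads every line into a LinkedHashSet, which silently ignores
    // exact duplicates and remembers the order of first appearance.
    static Set<String> readUnique(BufferedReader reader) throws IOException {
        Set<String> unique = new LinkedHashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            unique.add(line);
        }
        return unique;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample data stands in for the real CSV file.
        String csv = "a,1\nb,2\na,1\nc,3\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            for (String line : readUnique(reader)) {
                System.out.println(line); // process each unique line here
            }
        }
    }
}
```

For a real file, the StringReader would be replaced by a FileReader on the CSV path. Note that this compares whole lines as strings, so two rows that differ only in whitespace count as different.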

Author Comment

ID: 20405043
That would be an issue.  These CSV files can become large and may contain thousands of lines.
LVL 86

Expert Comment

ID: 20405108
What OS are you using?

Author Comment

ID: 20405193
Windows 2000, 2003
LVL 86

Expert Comment

ID: 20405968
Get a Windows port of textutils. You can then do

cat orig.csv | sort | uniq >uniq.csv

I doubt you'll get much more efficient than that.
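For reference, uniq only drops lines that repeat the line immediately before them, which is why the sort step comes first. If the sorting is done outside Java, the uniq step itself could be sketched in Java roughly as below; the names AdjacentUniq and writeUniqueAdjacent are hypothetical, and the input is assumed to be already sorted. Only the previous line is held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class AdjacentUniq {
    // Copies lines from in to out, skipping any line equal to the one
    // directly before it. On sorted input this removes all duplicates.
    static void writeUniqueAdjacent(BufferedReader in, Writer out) throws IOException {
        String previous = null;
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.equals(previous)) {
                out.write(line);
                out.write('\n');
            }
            previous = line;
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Inline, already-sorted sample data stands in for the sorted CSV.
        String sorted = "a,1\na,1\nb,2\nc,3\nc,3\n";
        StringWriter out = new StringWriter();
        writeUniqueAdjacent(new BufferedReader(new StringReader(sorted)), out);
        System.out.print(out);
    }
}
```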

Expert Comment

ID: 20406102
There is a program called Clippy that does what you want.
LVL 92

Expert Comment

ID: 20406402
You don't need to store the entire CSV in memory, just the unique keys.
That will allow you to check each line as you process it.
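A rough sketch of this approach, under the assumption that the first comma-separated field identifies a row. That key choice, and the names StreamingDedup, keyOf and processUnique, are illustrative only; if no shorter key exists, the whole line can serve as the key. It relies on Set.add returning false when the element is already present.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

class StreamingDedup {
    // Assumed key: everything before the first comma (or the whole
    // line if there is no comma). Adjust to the real data.
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // Streams the input one line at a time. Only the keys seen so far
    // are kept in memory; a line is handed to the processor only the
    // first time its key appears.
    static void processUnique(BufferedReader reader, Consumer<String> processor)
            throws IOException {
        Set<String> seenKeys = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (seenKeys.add(keyOf(line))) {
                processor.accept(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Inline sample data stands in for the real CSV file.
        String csv = "id1,foo\nid2,bar\nid1,foo\nid3,baz\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            processUnique(reader, System.out::println);
        }
    }
}
```

Unlike the sort-based route, this keeps the original line order and needs no second pass over the file, at the cost of holding one key per unique row in memory.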
LVL 86

Expert Comment

ID: 20781397

Question has a verified solution.