?
Solved

Remove Duplicate Lines When Parsing CSV

Posted on 2007-12-04
8
Medium Priority
?
1,000 Views
Last Modified: 2008-02-01
I am using Java to read lines from a text file.  In this case, it is a CSV.  I parse the file and process each line based on my application.  I noticed that sometimes the file contains duplicate lines.  Instead of processing the duplicate line multiple times, I would like to somehow remove the duplicate lines from the CSV and then process each line in the file.  Can someone instruct me on the one of the more efficient ways of doing this?  Thanks
0
Comment
Question by:pcarrollnf
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 20405022
If it's not too big you can save the file to a Set<String>. That will ensure uniqueness
0
 

Author Comment

by:pcarrollnf
ID: 20405043
That's would be an issue.  These CSV files can become large and may contain thousands of lines.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20405108
What OS are you using?
0
Optimize your web performance

What's in the eBook?
- Full list of reasons for poor performance
- Ultimate measures to speed things up
- Primary web monitoring types
- KPIs you should be monitoring in order to increase your ROI

 

Author Comment

by:pcarrollnf
ID: 20405193
Windows 2000, 2003
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20405968
Get a Windows port of textutils from http://gnuwin32.sourceforge.net. You can then do

cat orig.csv | sort | uniq >uniq.csv

Doubt if you'll get much more efficient than that
0
 
LVL 9

Expert Comment

by:brunoguimaraes
ID: 20406102
There is a software called Clippy that does what you want.

http://www.snapfiles.com/get/clippy.html
0
 
LVL 92

Expert Comment

by:objects
ID: 20406402
you don't need to store the entire csv in memory, just the unique keys.
That will allow you to check each line as you process it
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 20781397
:-)
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article is the first of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article explains our test automation goals. Then rationale is given for the tools we use to a…
In this post we will learn different types of Android Layout and some basics of an Android App.
Viewers will learn about the regular for loop in Java and how to use it. Definition: Break the for loop down into 3 parts: Syntax when using for loops: Example using a for loop:
This video teaches viewers about errors in exception handling.
Suggested Courses
Course of the Month10 days, 12 hours left to enroll

764 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question