Solved

Remove Duplicate Lines When Parsing CSV

Posted on 2007-12-04
Last Modified: 2008-02-01
I am using Java to read lines from a text file; in this case, a CSV. I parse the file and process each line according to my application's logic. I noticed that the file sometimes contains duplicate lines. Instead of processing a duplicate line multiple times, I would like to remove the duplicates from the CSV and then process each line in the file. Can someone suggest one of the more efficient ways of doing this? Thanks
Question by:pcarrollnf
8 Comments
 
LVL 86

Accepted Solution

by:
CEHJ earned 125 total points
ID: 20405022
If the file isn't too big, you can read its lines into a Set<String>. That will ensure uniqueness.
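A minimal sketch of the Set<String> suggestion, assuming one CSV record per physical line (quoted fields that contain line breaks are not handled). It uses a LinkedHashSet so the first occurrence of each line keeps its original position; the class and method names are illustrative, not from the thread. Every distinct line is held in memory, so this only suits files that fit comfortably in the heap.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

public class SetDedup {
    // Reads every line into a LinkedHashSet: duplicates are dropped and the
    // order of first appearance is kept. All distinct lines end up in memory.
    static Set<String> uniqueLines(Reader source) throws IOException {
        Set<String> lines = new LinkedHashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);  // add() ignores a line that is already present
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,alice\n2,bob\n1,alice\n";
        for (String line : uniqueLines(new StringReader(csv))) {
            System.out.println(line);  // process each unique line here
        }
    }
}
```

To read a real file, pass something like Files.newBufferedReader(Paths.get("orig.csv")) instead of the StringReader.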
 

Author Comment

by:pcarrollnf
ID: 20405043
That would be an issue. These CSV files can become large and may contain thousands of lines.
 
LVL 86

Expert Comment

by:CEHJ
ID: 20405108
What OS are you using?
 

Author Comment

by:pcarrollnf
ID: 20405193
Windows 2000, 2003

 
LVL 86

Expert Comment

by:CEHJ
ID: 20405968
Get a Windows port of textutils from http://gnuwin32.sourceforge.net. You can then do

cat orig.csv | sort | uniq >uniq.csv

I doubt you'll get much more efficient than that.
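The pipeline copes with large files because uniq only compares each line with the one immediately before it, so it needs almost no memory; the sort step is what brings identical lines together (GNU sort can spill to temporary files when the input is larger than memory). A minimal Java sketch of the uniq half, assuming the input has already been sorted; AdjacentDedup and uniq are illustrative names. Sorting changes the line order, so a header row will no longer be first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class AdjacentDedup {
    // Mimics the uniq step: copies lines from a SORTED input, skipping any line
    // equal to the one just before it. Only the previous line is kept in memory,
    // so on unsorted input non-adjacent duplicates would survive.
    static void uniq(Reader sortedSource, Writer out) throws IOException {
        BufferedReader in = new BufferedReader(sortedSource);
        String previous = null;
        String line;
        while ((line = in.readLine()) != null) {
            if (!line.equals(previous)) {
                out.write(line);
                out.write(System.lineSeparator());
                previous = line;
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        uniq(new StringReader("1,alice\n1,alice\n2,bob\n"), out);
        System.out.print(out);
    }
}
```

GNU sort's -u option combines both steps in a single command.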
 
LVL 9

Expert Comment

by:brunoguimaraes
ID: 20406102
There is a program called Clippy that does what you want.

http://www.snapfiles.com/get/clippy.html
 
LVL 92

Expert Comment

by:objects
ID: 20406402
You don't need to store the entire CSV in memory, just the unique keys.
That lets you check each line as you process it.
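A minimal sketch of this streaming approach: lines are processed as they are read, and only a key for each line already seen is kept in a HashSet. The key function is an assumption; firstField (the text before the first comma, ignoring CSV quoting) is a hypothetical example that only makes sense if the first column uniquely identifies a record. With Function.identity() as the key, whole lines are stored and memory use matches the Set<String> approach, so any saving comes from choosing a short key.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

public class StreamingDedup {
    // Streams the input one line at a time. Only the key of each line already
    // seen is stored, not the line itself; a line whose key is already in the
    // set is skipped.
    static void processUnique(Reader source,
                              Function<String, String> keyOf,
                              Consumer<String> process) throws IOException {
        Set<String> seenKeys = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (seenKeys.add(keyOf.apply(line))) {  // add() returns false for a repeat key
                    process.accept(line);
                }
            }
        }
    }

    // Hypothetical key: the first comma-separated field (naive, ignores quoting).
    static String firstField(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    public static void main(String[] args) throws IOException {
        String csv = "1,alice\n2,bob\n1,alice\n";
        List<String> processed = new ArrayList<>();
        processUnique(new StringReader(csv), StreamingDedup::firstField, processed::add);
        System.out.println(processed);  // [1,alice, 2,bob]
    }
}
```

With a column key, two lines that share the key but differ elsewhere count as duplicates, so the key must genuinely identify a record.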
 
LVL 86

Expert Comment

by:CEHJ
ID: 20781397
:-)