Solved

How to read a file and display the values without duplicates

Posted on 2015-01-31
Last Modified: 2015-02-07
Hi Experts,

I have a file that contains comma-separated values.
I have to read the file and display its lines without duplicates.
Lines 1 and 4 below are the same, so I have to remove the duplicate and display the remaining lines.
Can someone suggest how to do this?

My text file contains these values:
172.38,185.62,tcp,u16
145.23,125.89,pcp,o16
143.75,178.98,tcp,p16
172.38,185.62,tcp,u16

Thanks,
filereader.txt
Question by:srikotesh
32 Comments
 
LVL 86

Expert Comment

by:CEHJ
Read the file into a LinkedHashSet<String>
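A minimal sketch of this suggestion, assuming Java 7 or later. The class name UniqueLines and the default file name are hypothetical. A LinkedHashSet drops repeated lines and keeps the remaining ones in the order they were first read:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueLines {

    // Returns the distinct lines in first-seen order.
    static Set<String> unique(List<String> lines) {
        return new LinkedHashSet<>(lines);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real one as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "filereader.txt");
        List<String> all = Files.readAllLines(path, StandardCharsets.UTF_8);
        for (String line : unique(all)) {
            System.out.println(line);
        }
    }
}
```

Files.readAllLines loads the whole file into memory, which is fine for small files; a very large file would be better read line by line.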
 
LVL 16

Expert Comment

by:krakatoa
Something like this:

import java.util.*;

class DupeFree {

    public static void main(String[] args) {

        // A Set silently ignores a value that is already present
        Set<String> hs = new HashSet<String>();

        hs.add("172.38,185.62,tcp,u16");
        hs.add("145.23,125.89,pcp,o16");
        hs.add("143.75,178.98,tcp,p16");
        hs.add("172.38,185.62,tcp,u16"); // duplicate of the first line, not stored again

        // Note: HashSet does not keep insertion order
        for (String line : hs) {
            System.out.println(line);
        }
    }
}

 
LVL 1

Author Comment

by:srikotesh
Hi,
I got the output with the program below.
I am able to remove duplicates using a HashSet.
I have one more problem. Take these lines:
172.38,185.62,tcp,u16
145.23,125.89,pcp,o16
143.75,178.98,tcp,p16
172.38,185.62,tcp,z16

Lines 1 and 4 above are equal up to tcp.
I don't want these kinds of duplicates either.
What should I do in this scenario?


package org.com;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class FileReaderExample {

    /**
     * Reads the file and prints each distinct line once.
     *
     * @param filename path of the file to read
     * @throws IOException if the file cannot be read
     */
    public void removeDuplicatesFromFile(String filename) throws IOException {
        Set<String> lines = new HashSet<String>(10000); // maybe should be bigger
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        for (String lineValue : lines) {
            System.out.println("reading each line" + lineValue);
        }
    }

    public static void main(String[] args) throws IOException {
        FileReaderExample fre = new FileReaderExample();
        String txtFile = "G:\\JAVA\\file1.txt";
        System.out.println("filename is" + txtFile);
        fre.removeDuplicatesFromFile(txtFile);
    }

}

 
LVL 86

Expert Comment

by:CEHJ
You'd be far better off writing a Java class that encapsulates the data. You can then implement equals() so that it ignores the last field. Your Set will then work as you expect, provided you override hashCode() as well as equals().
 
LVL 37

Expert Comment

by:zzynx
>> what to do in this scenario?
Then you could use a Map instead of a Set and use the part that should be unique as the key and the whole line as the value (in case you need it):

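    public void removeDuplicatesFromFile(String filename) throws IOException {
        Map<String, String> lines = new HashMap<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int lastComma = line.lastIndexOf(",");
                if (lastComma < 0) {
                    continue; // blank or malformed line: substring(0, -1) would throw
                }
                String key = line.substring(0, lastComma);
                lines.put(key, line); // a later line with the same key replaces the earlier one
            }
        }
        for(String key : lines.keySet()){
            System.out.println("reading each line: " + key);
        }
    }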


If you don't need the full line, you can keep using the Set, but only store the (to be unique) part of the line:

    public void removeDuplicatesFromFile(String filename) throws IOException {
        Set<String> lines = new HashSet<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int lastComma = line.lastIndexOf(",");
                if (lastComma < 0) {
                    continue; // blank or malformed line: substring(0, -1) would throw
                }
                String part = line.substring(0, lastComma);
                lines.add(part);
            }
        }
        for(String lineValue : lines){
            System.out.println("reading each line: " + lineValue);
        }
    }
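
A variation on the Map idea, sketched under two assumptions that are not in the posts above: Java 8 or later (for putIfAbsent), and a wish to keep the first line seen for each key rather than the last. The class name FirstWins is hypothetical. A LinkedHashMap also keeps the lines in file order:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstWins {

    // The key is everything before the last comma; the value is the full line.
    static Collection<String> dedupe(List<String> lines) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String line : lines) {
            int cut = line.lastIndexOf(',');
            String key = cut < 0 ? line : line.substring(0, cut);
            byKey.putIfAbsent(key, line); // later lines with the same key are ignored
        }
        return byKey.values();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "172.38,185.62,tcp,u16",
                "145.23,125.89,pcp,o16",
                "143.75,178.98,tcp,p16",
                "172.38,185.62,tcp,z16");
        for (String line : dedupe(sample)) {
            System.out.println(line);
        }
    }
}
```

For the sample lines this prints the u16, o16 and p16 lines in that order; the z16 line is dropped because its key was already seen.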

 
LVL 16

Expert Comment

by:krakatoa
>> are equal upto tcp

You might want to tell us whether "up to tcp" is an absolute condition, or whether an exclusion would apply if one of the other comma-delimited fields were a duplicate instead, and the 'tcp' one wasn't. Like this :

 172.38,185.62,xcp,u16
 172.38,185.62,ycp,z16

as you've already changed your mind on what constitutes uniqueness from the original question parameters.
 
LVL 1

Author Comment

by:srikotesh
Hi CEHJ,

Could you please provide code showing how to implement this with the equals method?

Hi zzynx,

I executed the above code with both the Map and the Set, but it prints the output only up to tcp.
I need the complete line.
 
LVL 86

Expert Comment

by:CEHJ
>> Could you plz provide me code how to implement with equals method .
I can do that if you tell me what each field of your CSV file line means.
 
LVL 1

Author Comment

by:srikotesh
172.38,185.62,tcp,u16
The fields are the start of a network range (172.38), the end of the range (185.62) and the protocol type (tcp). I am not sure what the last field is.
 
LVL 86

Expert Comment

by:CEHJ
Try

public class Data {
    private String startIP;
    private String endIP;
    private String protocol;
    private String X;

    public Data() {
    }

    public Data(String startIP, String endIP, String protocol, String X) {
        this.startIP = startIP;
        this.endIP = endIP;
        this.protocol = protocol;
        this.X = X;
    }

    public String getStartIP() {
        return this.startIP;
    }

    public String getEndIP() {
        return this.endIP;
    }

    public String getProtocol() {
        return this.protocol;
    }

    public String getX() {
        return this.X;
    }

    public void setStartIP(String startIP) {
        this.startIP = startIP;
    }

    public void setEndIP(String endIP) {
        this.endIP = endIP;
    }

    public void setProtocol(String protocol) {
        this.protocol = protocol;
    }

    public void setX(String X) {
        this.X = X;
    }

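    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        // Guard against null and foreign types instead of casting blindly
        if (!(other instanceof Data)) {
            return false;
        }
        Data that = (Data) other;

        // The last field (X) is deliberately left out of the comparison
        return java.util.Objects.equals(this.startIP, that.startIP) &&
        java.util.Objects.equals(this.endIP, that.endIP) &&
        java.util.Objects.equals(this.protocol, that.protocol);
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals() compares
        return java.util.Objects.hash(startIP, endIP, protocol);
    }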

    @Override
    public String toString() {
        return String.format("%s=%s,%s=%s,%s=%s,%s=%s", "startIP", startIP,
            "endIP", endIP, "protocol", protocol, "X", X);
    }
}

 
LVL 1

Author Comment

by:srikotesh
How do I use this Data class in the file-reading program above?
 
LVL 86

Expert Comment

by:CEHJ
Create a LinkedHashSet<Data> instead
 
LVL 1

Author Comment

by:srikotesh
Hi CEHJ,
Could you please look at the Java comments below and tell me how to change those two steps?
    BufferedReader reader = new BufferedReader(new FileReader(filename));
    Set<Data> lines = new LinkedHashSet<Data>();
    String line;
    // how do I change the String value here?
    while ((line = reader.readLine()) != null) {
        // how do I add Data objects instead of the String value?
        lines.add(line);
    }

 
LVL 86

Assisted Solution

by:CEHJ
CEHJ earned 250 total points
//lines.add(line);
String[] fields = line.split("\\s*,\\s*");
lines.add(new Data(fields[0], fields[1], fields[2], fields[3]));
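
Put together, the parsing step and the Set could look roughly like the sketch below. It assumes Java 8 or later, and Row is a hypothetical, trimmed-down stand-in for the Data class (same equality rule, no setters) so that the example compiles on its own. The length check for short lines is an addition, not part of the snippet above:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class DataDemo {

    // Stand-in for Data: equality ignores the last field.
    static final class Row {
        final String startIP, endIP, protocol, x;

        Row(String startIP, String endIP, String protocol, String x) {
            this.startIP = startIP;
            this.endIP = endIP;
            this.protocol = protocol;
            this.x = x;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Row)) return false;
            Row r = (Row) o;
            return startIP.equals(r.startIP) && endIP.equals(r.endIP)
                    && protocol.equals(r.protocol);
        }

        @Override
        public int hashCode() {
            return Objects.hash(startIP, endIP, protocol);
        }

        @Override
        public String toString() {
            return String.join(",", startIP, endIP, protocol, x);
        }
    }

    static Set<Row> parse(List<String> lines) {
        Set<Row> rows = new LinkedHashSet<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s*,\\s*");
            if (fields.length < 4) continue; // skip blank or malformed lines
            rows.add(new Row(fields[0], fields[1], fields[2], fields[3]));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row row : parse(Arrays.asList(
                "172.38,185.62,tcp,u16", "145.23,125.89,pcp,o16",
                "143.75,178.98,tcp,p16", "172.38,185.62,tcp,z16"))) {
            System.out.println(row);
        }
    }
}
```

Set.add leaves the existing element in place when an equal one arrives, so the u16 line survives here and the z16 line is discarded.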

 
LVL 37

Assisted Solution

by:zzynx
zzynx earned 250 total points
>> hi zzynx,
>> i executed the above code with both map and set but it is printing the output upto tcp only
>> i need complete line.
I wrote:
Then you could use a Map instead of a Set and use the part that should be unique as the key and the whole line as the value (in case you need it):

Apparently, you need it.
Well, instead of printing out the key, you print out the value:

In the first code part I posted you replace
System.out.println("reading each line: " + key);

by
System.out.println("reading each line: " + lines.get(key));

 
LVL 86

Expert Comment

by:CEHJ
For convenience, you could give the class Data another ctor with String[] as the parameter
 
LVL 1

Author Comment

by:srikotesh
Hi CEHJ/zzynx,

I got the expected result in both scenarios.

The lines object now holds the unique values, so I have to write these unique values into another file.

I tried the code below, but it is not working for me:

    File file = new File("G:\\JAVA\\file2.txt");
    FileOutputStream f = new FileOutputStream(file);
    ObjectOutputStream s = new ObjectOutputStream(f);
    s.writeObject(lines);


I have attached the output file.
Could someone suggest how to write the unique values into a new file?
file2.txt
 
LVL 37

Expert Comment

by:zzynx
>> i got the expected result in both the scenarios.
Good!

>> Could some one suggest me how to write unique values into a new file
have a look at these examples
 
LVL 86

Expert Comment

by:CEHJ
s.writeObject(lines);

That's fine (as long as you close 's') but it will produce a binary file.

You could give Data a toCsv method or implement toString such that CSV is output if you want to write to a text file
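
A minimal sketch of the toCsv idea, assuming Java 8 or later for String.join. CsvRow is a hypothetical stand-in for Data, reduced to its four fields so that the example compiles on its own:

```java
class CsvRow {
    private final String startIP;
    private final String endIP;
    private final String protocol;
    private final String x;

    CsvRow(String startIP, String endIP, String protocol, String x) {
        this.startIP = startIP;
        this.endIP = endIP;
        this.protocol = protocol;
        this.x = x;
    }

    // Rebuilds the original comma-separated line from the fields.
    String toCsv() {
        return String.join(",", startIP, endIP, protocol, x);
    }

    // Delegating toString() to toCsv() makes println(row) emit CSV as well.
    @Override
    public String toString() {
        return toCsv();
    }

    public static void main(String[] args) {
        System.out.println(new CsvRow("172.38", "185.62", "tcp", "u16"));
        // prints 172.38,185.62,tcp,u16
    }
}
```

This simple join is only safe while no field contains a comma or a quote; data like that would need proper CSV escaping.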
 
LVL 1

Author Comment

by:srikotesh
Hi CEHJ,

I have written a separate method to write the data into a txt file.

public void writeDataIntoTxt(Map<String, String> lines) throws IOException {
    File file = new File("G:\\JAVA\\file2.txt");
    FileOutputStream f = new FileOutputStream(file);
    ObjectOutputStream s = new ObjectOutputStream(f);
    s.writeObject(lines);
    s.close();
}


But I got the same output as in the file I attached earlier.
Can you please provide sample code to implement the toString() method here?

Hi zzynx,

I tried the examples from the above link but did not get the expected result.
I guess BufferWriter won't work here.
bw.write(lines) // the compiler asks me to change the parameter to an int
I tried every approach from that link.
 
LVL 37

Assisted Solution

by:zzynx
zzynx earned 250 total points
>> i tried in all the ways from that link
I'm afraid you didn't try hard enough.

With this method

    private void writeToFile(Map<String, String> content) {
        Writer writer = null;
        try {
            // Using OutputStreamWriter you don't have to convert the String to byte[]
            writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("C:\\temp\\file2.txt"), "utf-8"));
            for (String key : content.keySet()) {
                String line = content.get(key);
                line += System.getProperty("line.separator");
                writer.write(line);
            }
        } catch (IOException e) {
            e.printStackTrace(); // report the failure instead of swallowing it
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (Exception e) {

                }
            }
        }
    }


you can just add the following line as last in your removeDuplicatesFromFile() method and you're done.
writeToFile(lines);


On my machine, the result is a file2.txt in C:\temp with the following content:

143.75,178.98,tcp,p16
172.38,185.62,tcp,z16
145.23,125.89,pcp,o16

>> I guess BufferWriter wont work here.
No. It's BufferedWriter

PS. Now that you got two answers for one question, I guess it's time to close this one.
 
LVL 1

Author Comment

by:srikotesh
Thanks, I got the output.

This was my mistake:
I failed to write the iteration logic for the map.
I had not written these two lines properly:
for (String key : content.keySet()) {
                String line = content.get(key);

This is where I got stuck:

  for (String line : lines) {
}
When I tried it like this I got the error below, so I dropped it.
Can only iterate over an array or an instance of java.lang.Iterable
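
The error appears because java.util.Map does not implement Iterable, so a for-each loop cannot run over the map itself. It can run over one of the map's views: keySet(), values() or entrySet(). A small sketch (the class name MapIteration is hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapIteration {

    // values() is a Collection, so it can be used in a for-each loop.
    static List<String> fullLines(Map<String, String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines.values()) {
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> lines = new LinkedHashMap<>();
        lines.put("172.38,185.62,tcp", "172.38,185.62,tcp,z16");
        lines.put("145.23,125.89,pcp", "145.23,125.89,pcp,o16");

        // entrySet() gives key and value together, without a second lookup
        for (Map.Entry<String, String> e : lines.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        System.out.println(fullLines(lines));
    }
}
```

When only the full lines are needed, looping over values() avoids the content.get(key) call on every pass.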
 
LVL 37

Expert Comment

by:zzynx
>> thanks,i got the output.
Good!
 
LVL 86

Accepted Solution

by:CEHJ
CEHJ earned 250 total points
>> i have written separate method to write data into a txt file.
I've already said that will not produce a text file. It will produce a binary file.

I've also told you how to get a text file.

    private void writeToFile(Set<Data> content, String path) {
        PrintWriter writer = null;

        try {
            writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(path), "utf-8"));
            for (Data d : content) {
                writer.println(d);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (Exception e) {
                }
            }
        }
    }
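
On Java 7 or later, try-with-resources can take over the job of the finally block. The sketch below is a variation and not CEHJ's code: the class and method names are hypothetical, and it writes whatever toString() returns for each element, so the elements need a CSV-style toString() if the file should contain CSV.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collection;

class TextFileWriter {

    // Writes one element per line; the writer is closed automatically,
    // even when a write fails.
    static void writeLines(Collection<?> content, Path path) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (Object item : content) {
                writer.write(String.valueOf(item));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the real output path.
        Path out = Files.createTempFile("file2", ".txt");
        writeLines(Arrays.asList("143.75,178.98,tcp,p16", "145.23,125.89,pcp,o16"), out);
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

Unlike PrintWriter, which swallows I/O errors unless checkError() is called, BufferedWriter throws an IOException that the caller can handle.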

 
LVL 37

Expert Comment

by:zzynx
>> I've also told you how to get  a text file.
Are you sure about that, CEHJ? Because I didn't and I don't see it. ;-)
 
LVL 1

Author Comment

by:srikotesh
Thanks CEHJ
I got two solutions for one question
Happy learning.
 
LVL 86

Expert Comment

by:CEHJ
>> Are you sure about that, CEHJ? Because I didn't and I don't see it. ;-)
"You could give Data a toCsv method or implement toString such that CSV is output if you want to write to a text file"

(I don't mean the actual file writing code - that's easy to find ;))
 
LVL 1

Author Closing Comment

by:srikotesh
Excellent
 
LVL 37

Expert Comment

by:zzynx
Thanks for accepting.
 
LVL 86

Expert Comment

by:CEHJ
:)
 
LVL 16

Expert Comment

by:krakatoa
Although I missed the party, this turned out to be trickier than I thought. But here is code that accepts an arg as the field on which uniqueness depends (or no arg if the entire record should be unique). It tests with a million entries in a Vector, acting as a pseudo file.

import java.util.*;

class DupeFree {

    static boolean bsalreadyin = false; // true once a record containing the blacksheep has been kept
    static String blacksheep;           // value that may appear in at most one kept record
    static Set<String> hs = new HashSet<String>();
    static String[] sA = {"172.38,185.62,tcp,u16","145.23,125.89,pcp,o16","143.75,178.98,tcp,p16","172.38,185.62,tcp,u16","143.75,178.98,xcp,p16"};
    static Vector<String> pseudoFile = new Vector<String>(Arrays.asList(sA));

    public static void main(String[] args) {

        // Pad the pseudo file to one million records by cycling through the samples
        for (int g = 0; g < 1000000 - sA.length; g++) {
            pseudoFile.add(pseudoFile.get(g));
        }

        blacksheep = args.length > 0 ? args[0].trim() : "";

        for (String record : pseudoFile) {
            boolean hit = record.indexOf(blacksheep) > -1;

            // Skip every further record that contains the blacksheep
            if (bsalreadyin && hit) {
                continue;
            }
            hs.add(record);

            // With no argument the whole record has to be unique, so nothing is blacklisted
            if (hit && !blacksheep.equals("")) {
                bsalreadyin = true;
            }
        }

        for (String record : hs) {
            System.out.println(record);
        }
    }
}


Here is the output from the different args provided:

C:\EE_Q_CODE>java DupeFree tcp
143.75,178.98,xcp,p16
145.23,125.89,pcp,o16
172.38,185.62,tcp,u16

C:\EE_Q_CODE>java DupeFree 143.75
143.75,178.98,tcp,p16
145.23,125.89,pcp,o16
172.38,185.62,tcp,u16

C:\EE_Q_CODE>java DupeFree 145.23
143.75,178.98,xcp,p16
143.75,178.98,tcp,p16
145.23,125.89,pcp,o16
172.38,185.62,tcp,u16

C:\EE_Q_CODE>java DupeFree
143.75,178.98,xcp,p16
143.75,178.98,tcp,p16
145.23,125.89,pcp,o16
172.38,185.62,tcp,u16
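
The code above matches the argument as a substring anywhere in the record. A different technique, sketched here under stated assumptions, is to select the field by its column index and keep the first record for each value of that field. The class name, method name and the convention that a negative index means "whole line" are all hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FieldDedupe {

    // Keeps the first line for each distinct value of the chosen column.
    // A negative index, or a line with too few fields, uses the whole line as the key.
    static List<String> uniqueByField(List<String> lines, int fieldIndex) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            String key = (fieldIndex < 0 || fieldIndex >= fields.length)
                    ? line
                    : fields[fieldIndex].trim();
            if (seen.add(key)) { // add() returns false when the key was already present
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "172.38,185.62,tcp,u16", "145.23,125.89,pcp,o16",
                "143.75,178.98,tcp,p16", "172.38,185.62,tcp,u16",
                "143.75,178.98,xcp,p16");
        System.out.println(uniqueByField(sample, 2));  // one line per protocol
        System.out.println(uniqueByField(sample, -1)); // whole-line uniqueness
    }
}
```

Because the kept lines go into a List, the result preserves file order, which a plain HashSet does not guarantee.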
 
LVL 16

Expert Comment

by:krakatoa
This version manufactures 10 million records, puts them in a file, reads the file into a HashSet based on selectable field uniqueness, and writes the resulting records to a new file. Approximate time taken to process 10M records: 20 seconds.

import java.util.*;
import java.io.*;

class DupeFree {

    static boolean bsalreadyin = false;
    static String blacksheep;
    static Set<String> hs = new HashSet<String>();
    static String[] sA = {"172.38,185.62,tcp,u16","145.23,125.89,pcp,o16","143.75,178.98,tcp,p16","172.38,185.62,tcp,u16","143.75,178.98,xcp,p16"};
    static Vector<String> pseudoFile = new Vector<String>(Arrays.asList(sA));
    static File inFile = new File("C:/EE_Q_CODE/primer.txt");
    static File outFile = new File("C:/EE_Q_CODE/outFile.txt");
    static long startTime, endTime;

    public static void main(String[] args) {

        blacksheep = args.length > 0 ? args[0].trim() : "";

        // Make a 10 million record array
        for (int g = 0; g < 10000000 - sA.length; g++) {
            pseudoFile.add(pseudoFile.get(g));
        }

        try {
            // Write the array to a file
            BufferedWriter bw0 = new BufferedWriter(new FileWriter(inFile));
            for (String record : pseudoFile) {
                bw0.write(record);
                bw0.newLine();
            }
            bw0.close();

            startTime = System.nanoTime();

            // Read the file and add the records to a hashset based on <=1 uniqueness criteria
            BufferedReader br = new BufferedReader(new FileReader(inFile));
            String inStr;
            while ((inStr = br.readLine()) != null) {
                boolean hit = inStr.indexOf(blacksheep) > -1;
                if (bsalreadyin && hit) {
                    continue;
                }
                hs.add(inStr);
                if (hit && !blacksheep.equals("")) {
                    bsalreadyin = true;
                }
            }
            br.close();

            // Write the hashset to a new file, displaying the values on screen for each write
            BufferedWriter bw = new BufferedWriter(new FileWriter(outFile));
            for (String record : hs) {
                System.out.println(record);
                bw.write(record);
                bw.newLine();
            }
            bw.close();
        } catch (IOException ex) {
            // Report the failure instead of silently swallowing it
            ex.printStackTrace();
        }
        endTime = System.nanoTime();
        System.out.println(((endTime - startTime) / 1000000000) + " seconds to process a file of " + pseudoFile.size() + " lines.");
    }
}
