java question

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

public class FreeBaseDiffr {
	private final String newF;
	private final String oldF;
	private final String encoding;

	FreeBaseDiffr(String newF, String oldF, String encoding) {
		this.newF = newF;
		this.oldF = oldF;
		this.encoding = encoding;
	}

	// Opens a gzip-compressed text file as a line reader.
	// Each reader gets its own FileInputStream; two GZIPInputStreams cannot share one.
	private BufferedReader open(String file) throws IOException {
		return new BufferedReader(
				new InputStreamReader(new GZIPInputStream(new FileInputStream(file)), encoding));
	}

	void runDiff() {
		// try-with-resources closes both readers, even if reading fails
		try (BufferedReader bufferedOld = open(oldF);
				BufferedReader bufferedNew = open(newF)) {
			int c = 0;
			String lineNew;
			String lineOld;
			/* Loop begins here (limited to the first 50 line pairs) */
			while (c < 50
					&& (lineNew = bufferedNew.readLine()) != null
					&& (lineOld = bufferedOld.readLine()) != null) {
				System.out.println("new");
				System.out.println(lineNew);
				System.out.println("old");
				System.out.println(lineOld);
				String[] splitOld = lineOld.split("\\s+");
				String[] splitNew = lineNew.split("\\s+");
				int resultSubjId = splitOld[0].compareToIgnoreCase(splitNew[0]);
				if (resultSubjId > 0) {
					// read all subsequent lines from the new file with same subject ID
					// (this is the part I am stuck on)
				}
				c++;
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	public static void main(String[] args) {
		FreeBaseDiffr fbr = new FreeBaseDiffr("a.gz", "b.gz", "UTF-8");
		fbr.runDiff();
	}
}

Hi

There are two files in gzip format, each containing lines of 3 tab-separated strings.


while there are lines left in both files:
    read a line from each file
    if the id (the first column) from the old file is greater than the id from the new file:
        read all subsequent lines from the new file with the same id
        add this id to the list of "added topics"
        add all lines with this subject to the list of "added facts"

I am trying to convert this into Java code, but I am stuck on how to convert "read all subsequent lines from the new file with same id" into Java.
I know scanners are typically used, but since I have gz files, I am not sure if that would work.
I have attached my code.

Can you help me out?
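
On the Scanner question: `java.util.Scanner` accepts any `InputStream`, so it can be layered over a `GZIPInputStream` the same way a `BufferedReader` can. The following is a minimal sketch under that assumption; the class name `GzipScannerSketch`, the method `readLines`, and the temp-file sample data are made up for illustration and are not part of the posted code.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipScannerSketch {
    /** Reads every line of a gzip-compressed text file through a Scanner. */
    static List<String> readLines(Path gzFile) throws IOException {
        List<String> lines = new ArrayList<>();
        // Closing the Scanner also closes the underlying gzip and file streams.
        try (Scanner scanner = new Scanner(
                new GZIPInputStream(Files.newInputStream(gzFile)), "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data, written to a temp file so the sketch is self-contained.
        Path tmp = Files.createTempFile("sample", ".gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(tmp)), StandardCharsets.UTF_8)) {
            w.write("aaaaa\tbbbbbb\tcccccccc\nddddd\teeeeeee\tfffffffffff\n");
        }
        for (String line : readLines(tmp)) {
            System.out.println(line);
        }
        Files.delete(tmp);
    }
}
```

For large files a `BufferedReader` is usually the faster choice, and the reading loop looks the same either way; the compression is handled entirely by the stream underneath.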
VlearnsAsked:
VlearnsAuthor Commented:
This is the algorithm I am trying to implement:
while there are lines left in both files:
    read a line from each file
    if the ID from the old file is greater than the ID from the new file:
        read all subsequent lines from the new file with the same ID
        add this ID to the list of "a topics"
        add all lines with this ID to the list of "a all"
    else if the ID from the old file is less than the ID from the new file:
        read all subsequent lines from the old file with the same ID
        add this subject to the list of "r topics"
        add all lines with this subject to the list of "r all"
    else if the IDs in both files match:
        read all subsequent lines from both files with the same ID
        diff the lines to figure out which ones have been added/removed
        add all added lines to the list of "added a"
        add all removed lines to the list of "r all"
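
The algorithm above could be sketched roughly as follows. This is one possible reading of it, not a definitive implementation. It assumes both files are sorted by ID in the same case-insensitive order that `compareToIgnoreCase` uses, and that lines with the same ID are adjacent. The names `SortedDiffSketch`, `Peekable`, `takeGroup`, and the four list names are invented for the example, and the handling of leftover lines once one file runs out is an addition that the pseudocode does not specify. The readers would normally wrap `GZIPInputStream`s; `StringReader`s are used here only to keep the sketch self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SortedDiffSketch {
    final List<String> addedTopics = new ArrayList<>();
    final List<String> addedFacts = new ArrayList<>();
    final List<String> removedTopics = new ArrayList<>();
    final List<String> removedFacts = new ArrayList<>();

    /** The ID is the first whitespace-separated column of a line. */
    private static String idOf(String line) {
        return line.split("\\s+", 2)[0];
    }

    /** One-line lookahead over a reader, so a group can end without losing a line. */
    private static final class Peekable {
        final BufferedReader in;
        String head; // next unread line, or null at end of input

        Peekable(BufferedReader in) throws IOException {
            this.in = in;
            this.head = in.readLine();
        }

        /** Consumes all consecutive lines whose ID equals the current head's ID. */
        List<String> takeGroup() throws IOException {
            String id = idOf(head);
            List<String> group = new ArrayList<>();
            while (head != null && idOf(head).equalsIgnoreCase(id)) {
                group.add(head);
                head = in.readLine();
            }
            return group;
        }
    }

    void diff(BufferedReader oldIn, BufferedReader newIn) throws IOException {
        Peekable oldP = new Peekable(oldIn);
        Peekable newP = new Peekable(newIn);
        while (oldP.head != null && newP.head != null) {
            String oldId = idOf(oldP.head);
            String newId = idOf(newP.head);
            int cmp = oldId.compareToIgnoreCase(newId);
            if (cmp > 0) {          // ID exists only in the new file
                addedTopics.add(newId);
                addedFacts.addAll(newP.takeGroup());
            } else if (cmp < 0) {   // ID exists only in the old file
                removedTopics.add(oldId);
                removedFacts.addAll(oldP.takeGroup());
            } else {                // same ID in both: compare the two groups
                Set<String> oldLines = new LinkedHashSet<>(oldP.takeGroup());
                Set<String> newLines = new LinkedHashSet<>(newP.takeGroup());
                for (String l : newLines) if (!oldLines.contains(l)) addedFacts.add(l);
                for (String l : oldLines) if (!newLines.contains(l)) removedFacts.add(l);
            }
        }
        // Assumption beyond the pseudocode: whatever is left in one file is added/removed.
        while (newP.head != null) {
            addedTopics.add(idOf(newP.head));
            addedFacts.addAll(newP.takeGroup());
        }
        while (oldP.head != null) {
            removedTopics.add(idOf(oldP.head));
            removedFacts.addAll(oldP.takeGroup());
        }
    }

    public static void main(String[] args) throws IOException {
        SortedDiffSketch d = new SortedDiffSketch();
        d.diff(new BufferedReader(new StringReader("a\tp\t1\nb\tp\t1\nb\tq\t2\n")),
               new BufferedReader(new StringReader("b\tp\t1\nb\tr\t3\nc\tp\t1\n")));
        System.out.println("added topics:   " + d.addedTopics);
        System.out.println("removed topics: " + d.removedTopics);
        System.out.println("added facts:    " + d.addedFacts.size());
        System.out.println("removed facts:  " + d.removedFacts.size());
    }
}
```

The key point is that each file is read through a one-line lookahead instead of reading one line from each file per iteration, so the line that ends a group is still available for the next comparison.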
CEHJCommented:
Always a good idea in these cases to post (attach) example files
VlearnsAuthor Commented:
hi

did you mean input files

those gz files contain lines like


aaaaa    bbbbbb   cccccccc
ddddd   eeeeeee  fffffffffff

where the spaces are tabs

i have already put my current code

can you help?
CEHJCommented:
"did you mean input files"
Yes
CEHJCommented:
This doesn't look like a Java question, but looks like an MQL question. I would ask for it to be moved somewhere more apt
VlearnsAuthor Commented:
Hi

My question is much more basic

When I am reading a gz file using the code I pasted above,
I want to collect all subsequent rows whose first token matches the one I just read, by scanning forward through the rows. All rows are grouped by first token, so matching rows are next to each other.

how do i perform this iteration in java?

usually a scanner is used to  do this.
not sure how to do the same thing over gz files.
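
One common way to do this kind of forward scan is to keep a single line of lookahead: read one line ahead, and stop the group as soon as the first token changes, holding on to that line for the next group. The sketch below shows the idea on any `BufferedReader`, so it would work the same over a `GZIPInputStream`. The class `GroupReader` and its methods `idOf` and `nextGroup` are hypothetical names for illustration, and the sample input is invented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: hands out consecutive lines that share the same first token. */
public class GroupReader {
    private final BufferedReader reader;
    private String pending; // the line read ahead but not yet returned

    GroupReader(BufferedReader reader) throws IOException {
        this.reader = reader;
        this.pending = reader.readLine();
    }

    /** The first whitespace-separated token of a line. */
    static String idOf(String line) {
        return line.split("\\s+", 2)[0];
    }

    /** Returns the next run of lines with the same first token, or null at end of input. */
    List<String> nextGroup() throws IOException {
        if (pending == null) {
            return null;
        }
        String id = idOf(pending);
        List<String> group = new ArrayList<>();
        while (pending != null && idOf(pending).equalsIgnoreCase(id)) {
            group.add(pending);
            pending = reader.readLine(); // may belong to the next group; kept for later
        }
        return group;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new BufferedReader(new InputStreamReader(new GZIPInputStream(...)))
        BufferedReader in = new BufferedReader(
                new StringReader("a\tx\t1\na\ty\t2\nb\tz\t3\n"));
        GroupReader groups = new GroupReader(in);
        for (List<String> g = groups.nextGroup(); g != null; g = groups.nextGroup()) {
            System.out.println(idOf(g.get(0)) + " -> " + g.size() + " line(s)");
        }
    }
}
```

With the sample input this prints `a -> 2 line(s)` followed by `b -> 1 line(s)`. The comparison is case-insensitive only because the posted code uses `compareToIgnoreCase`; `equals` would work the same way if the IDs are case-sensitive.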
CEHJCommented:
The fact that file is zipped is not what's important. What's important is that it's not a text file. You should read and treat the file in the format for which it's designed. You would not try to read a pdf file with a BufferedReader
VlearnsAuthor Commented:
hi cehj

not sure i understand.
it is a text file, each line has 3 strings separated by a tab.

if you run the code above it will read and print the files.
what i am trying to do is understand, how we can look at a particular line and then find all subsequent lines that are equal to the first line.

is this not a generic problem?
thanks
CEHJCommented:
OK, i'll have another look at this. Can you post a short example file?