hi4ppl

How can I do this in Java? I have written some code but I am stuck!

Hi,

This is how the filenames look in a folder:

g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH

All I want is to read them from a folder in Windows and compare the part I put in bold with the previous file; if any digit is missing, it should print out the file before that...

See below: between these two files, two digits are missing

g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH

so I would like to print

g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087642_xxxx_105_yyy.DWH

This is what I did, but I am stuck...


import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GapCheck {

	private String path = "D:\\20141217";

	public static void main(String[] args) {
		GapCheck gp = new GapCheck();
	}

	public GapCheck() {
		File dir = new File(path);
		File[] directoryListing = dir.listFiles();
		List<Integer> filenames = new ArrayList<>();

		for (File child : directoryListing) {
			String filename = child.getName();
			Integer sequenceNo = Integer.parseInt(filename.split("_")[4]);
			// System.out.println(sequenceNo);
			filenames.add(sequenceNo);
		}
		Collections.sort(filenames);

		int previousInSequence = filenames.get(0);
		for (Integer currentSequence : filenames) {
			if (currentSequence - previousInSequence > 1) {
				System.out.println(currentSequence - previousInSequence);
			}
		}
	}
}


krakatoa

So what output do you get? (Something like all the values being > 1?)
You might want to jiggle things around to this tune:

import java.util.*;
import java.io.*;


class HoleFinder {


	public static void main(String[] args) {
	
	String[] sA = {"g_2015_00000000087539_xxxx_101_yyy.DWH","g_2015_00000000087540_xxxx_103_yyy.DWH","g_2015_00000000087643_xxxx_105_yyy.DWH","g_2015_00000000087644_xxxx_108_yyy.DWH"};

		List<Integer> filenames = new ArrayList<Integer>();
		
		
		for (String s : sA) {
			String wanted = s.substring(27,30);
			Integer sequenceNo = Integer.parseInt(wanted);
		
			//System.out.println(sequenceNo);
			filenames.add(sequenceNo);
			
		}
		Collections.sort(filenames);

		int previousInSequence = filenames.get(0);
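		for (Integer currentSequence : filenames) {
			if (currentSequence > previousInSequence + 1) {
				// Report every number strictly between the two neighbours
				for (int p = 1; p < currentSequence - previousInSequence; p++) {
					System.out.println("Missing sequence number here . . . " + (previousInSequence + p));
				}
			}
			// Always advance; updating only after a gap would measure the next gap from a stale value
			previousInSequence = currentSequence;
		}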
	}


}


Oops - that's not pointing at the right part of your data.
This is the nearest I can do for now :

import java.util.*;


class HoleFinder {


	public static void main(String[] args) {

	int arrayPointer = 0;
	int lowMissing = 0;
	int highMissing = 0;

	// The names must already be in sequence order, because arrayPointer walks sA in step with the sorted list
	String[] sA = {"g_2015_00000000087539_xxxx_101_yyy.DWH","g_2015_00000000087540_xxxx_103_yyy.DWH","g_2015_00000000087643_xxxx_105_yyy.DWH","g_2015_00000000087644_xxxx_108_yyy.DWH"};

		List<Integer> filenames = new ArrayList<Integer>();

		for (String s : sA) {
			// Characters 7 to 20 hold the 14-digit, zero-padded sequence number
			String wanted = s.substring(7,21);
			Integer sequenceNo = Integer.parseInt(wanted);

			//System.out.println(sequenceNo);
			filenames.add(sequenceNo);

		}
		Collections.sort(filenames);

		int previousInSequence = filenames.get(0);
		for (Integer currentSequence : filenames) {

			if (currentSequence > previousInSequence +1) {

				lowMissing = previousInSequence+1;
				highMissing = currentSequence-1;
				// Re-pad to 14 digits and splice each number into the neighbouring file name
				String lowName = sA[arrayPointer-1].replace(sA[arrayPointer-1].substring(7,21),String.format("%014d",lowMissing));
				String highName = sA[arrayPointer].replace(sA[arrayPointer].substring(7,21),String.format("%014d",highMissing));
				System.out.println("Missing sequence(s) are : "+lowName+" through "+highName);

			}
			// Advance on every file, not only after a gap
			previousInSequence = currentSequence;
			arrayPointer++;
		}
	}


}


SOLUTION
mccarl
Yeah - I made a meal out of it with my approach, no denying.
hi4ppl

ASKER

Hi,

Thank you guys for your time... but I want this to be able to scan the directory and show me the files that are missing in the sequence...

@mccarl, it only displays static records, like a loop from 11 till 100... I want that part to scan the directory and give me the missing files in the sequence...
First, to be sure.
You said:

between these two files, two digits are missing

g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH

so I would like to print

g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087642_xxxx_105_yyy.DWH

I think you meant:

I would like to print
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_???_yyy.DWH
g_2015_00000000087543_xxxx_???_yyy.DWH
...
g_2015_00000000087599_xxxx_???_yyy.DWH
g_2015_00000000087600_xxxx_???_yyy.DWH
g_2015_00000000087601_xxxx_???_yyy.DWH
...
g_2015_00000000087642_xxxx_???_yyy.DWH

Right? Since your two files were
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH

I want this to be able to scan the directory and show me those files that are missing in sequence
That's what mccarl's code does.
This
File[] directoryListing = dir.listFiles();

does scan the directory.
After that, the code starts with the sequence number of the first found file and it stops with checking that of the last found file.
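
A minimal sketch of that scanning step, assuming every file name follows the g_<year>_<sequence>_... layout from the question. The class name SequenceScanner, the helper parseSequence and the .DWH filter are illustrative choices, not taken from mccarl's code; the default path is the one from the original post.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SequenceScanner {

    // Extracts the third underscore-separated field (the 14-digit sequence) as a number.
    // Assumption: the layout is always g_<year>_<sequence>_<...>.
    static long parseSequence(String fileName) {
        String[] parts = fileName.split("_");
        return Long.parseLong(parts[2]);
    }

    // Lists the directory and returns the sequence numbers of all .DWH files, sorted ascending.
    static List<Long> scan(File dir) {
        List<Long> sequences = new ArrayList<>();
        File[] listing = dir.listFiles();
        if (listing == null) {
            // listFiles() returns null when the path is not a directory or cannot be read
            return sequences;
        }
        for (File child : listing) {
            String name = child.getName();
            if (child.isFile() && name.endsWith(".DWH")) {
                sequences.add(parseSequence(name));
            }
        }
        Collections.sort(sequences);
        return sequences;
    }

    public static void main(String[] args) {
        // Path taken from the original post; pass another one as the first argument
        File dir = new File(args.length > 0 ? args[0] : "D:\\20141217");
        List<Long> sequences = scan(dir);
        if (sequences.isEmpty()) {
            System.out.println("No .DWH files found in " + dir);
        } else {
            System.out.println("First sequence: " + sequences.get(0));
            System.out.println("Last sequence:  " + sequences.get(sequences.size() - 1));
        }
    }
}
```

A file whose third field is not numeric would make parseSequence throw a NumberFormatException, so a real folder with mixed content may need a stricter filter.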
Ok, yeah, this is all a bit confusing. Can we make this a bit simpler? I think what you are after is essentially some code that finds gaps in sequences and reports on those gaps. Now the point to confirm is... when you find a gap, do you want every number in that gap to be reported, or just the start and end sequence of the gap? As a simple example, say you had these numbers

1   2   7   8   9

There is obviously a gap here, when reporting this gap do you want to see...

3   4   5   6

OR, do you just want to see the start and end of that gap, ie...

3 - 6
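
Both reporting styles can be sketched in a few lines. This is only an illustration of the two options on the numbers above; the class and method names are invented for the example and the input list is assumed to be sorted already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GapReport {

    // Option 1: returns every number missing between neighbours of a sorted list.
    static List<Long> missingNumbers(List<Long> sorted) {
        List<Long> missing = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            for (long n = sorted.get(i - 1) + 1; n < sorted.get(i); n++) {
                missing.add(n);
            }
        }
        return missing;
    }

    // Option 2: returns one "start - end" entry per gap instead of every number.
    static List<String> missingRanges(List<Long> sorted) {
        List<String> ranges = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            long start = sorted.get(i - 1) + 1;
            long end = sorted.get(i) - 1;
            if (start <= end) {
                ranges.add(start + " - " + end);
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<Long> numbers = Arrays.asList(1L, 2L, 7L, 8L, 9L);
        System.out.println(missingNumbers(numbers)); // prints [3, 4, 5, 6]
        System.out.println(missingRanges(numbers));  // prints [3 - 6]
    }
}
```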
hi4ppl

ASKER

Hi,

Thanks mccarl, that is what I'm looking for... you simplified it very well, so the result I am looking for is:

3   4   5   6

thanks
that is what I'm looking for
Good to know.
So, your initial example was indeed wrong.
Because, given these files:

g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH

you would like this to be printed:
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_???_yyy.DWH
g_2015_00000000087543_xxxx_???_yyy.DWH
...
g_2015_00000000087599_xxxx_???_yyy.DWH
g_2015_00000000087600_xxxx_???_yyy.DWH
g_2015_00000000087601_xxxx_???_yyy.DWH
...
g_2015_00000000087642_xxxx_???_yyy.DWH

Well, as I said before, that's exactly what mccarl's code does.
hi4ppl

ASKER

Hi,

Okay, thanks, and sorry, my bad... but I didn't understand this part:
System.out.printf("g_2015_%014d_xxxx_103_yyy.DWH\r\n", prev);
prev++;



When I run the script, even though I put the path of the folder in there, it always prints the output below instead of scanning the folder. I might be doing it wrong, not sure...

g_2015_00000000000102_xxxx_103_yyy.DWH
g_2015_00000000000104_xxxx_103_yyy.DWH
g_2015_00000000000105_xxxx_103_yyy.DWH
g_2015_00000000000106_xxxx_103_yyy.DWH
g_2015_00000000000107_xxxx_103_yyy.DWH
g_2015_00000000000108_xxxx_103_yyy.DWH
g_2015_00000000000109_xxxx_103_yyy.DWH
g_2015_00000000000110_xxxx_103_yyy.DWH
g_2015_00000000000111_xxxx_103_yyy.DWH
g_2015_00000000000112_xxxx_103_yyy.DWH


>> I didn't understand this part
You can't just use it as it is.
As mccarl said in the comment:
// Note that it is unclear exactly how you determine the other parts of the filename, so I'll leave that part up to you!

You didn't tell us how you will determine the other parts of the file name for files that are NOT present. (I'm talking about the xxxx, ??? and yyy in the file names in my comments.)
Ok, so the sequence numbers are doing what you want (I think that's what you are saying, anyway). So now we just have to help you with the rest of the filename.

You need to be able to explain to us what the filenames of the missing files should be, based on the file before and after it. The example that you gave in the initial question is ambiguous in this regard. Of the 2 files that are "in the gap", the first appears to be based on the file BEFORE the gap and the second based on the file AFTER the gap!?

If you fully define what you are after and also tell us if the general format of the filename will remain constant, then we can help further.
hi4ppl

ASKER

Hi...

Thanks for the help... Well, I want it to be based on the file before: if the sequence is 101 and the next one is 103, it should display the missing sequence... The file name and format will always be the same, no changes; only the sequence will change after each file...

Sorry for the confusion about the ambiguous part.
Maybe give an example.
Given these four files in a directory (your initial set-up):

g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH
what do you want the outcome to be?
will always be the same, no changes; only the sequence will change after each file
Ok, so to be 100% clear... in the initial example that you gave, the sequence number is changing but the field between the _xxxx_ and the _yyy_ (the 3 digit number) WON'T change, i.e. the example is wrong to have 101, 103, 105, 108, and in reality these numbers are all the same for every file in the folder?
hi4ppl

ASKER

Hi,

Taking the given example:

g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH

I will break this down:

1- g_ = will always be G
2- 2015_ = the date, so it will change according to the date
3- 00000000087643_ = this is the sequence
4- xxxx_ = this is sometimes variable and sometimes static; I don't care about it
5- _108 = the number of records in the file
6- _yyy = this is variable and will change; I don't care about it

But the overall file structure will stay the same; the _ placement will always be there.
My main goal is number (3), which is the counter.

So the output I want to be able to see is:

g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_103_yyy.DWH

as after

g_2015_00000000087540_xxxx_103_yyy.DWH

I expect to see

g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_103_yyy.DWH

and then
g_2015_00000000087543_xxxx_103_yyy.DWH

regards
So, for parts 1, 2, 4, 5 and 6 of the output that the program should give, is it OK to take them as found in, e.g., the first file?

Well, then get the parts out of the first file found and apply them instead of using:

System.out.printf("g_2015_%014d_xxxx_103_yyy.DWH\r\n", prev);   
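
One way to do that, sketched under the assumption that the sequence is always the third underscore-separated field. The class name MissingName and the method withSequence are made up for this illustration; they reuse whichever existing file name you pass in as the template.

```java
public class MissingName {

    // Rebuilds a file name from a template name, swapping in a new sequence number.
    static String withSequence(String templateName, long sequence) {
        String[] parts = templateName.split("_");
        // Field 2 is the sequence; keep its original width so the zero padding survives
        int width = parts[2].length();
        parts[2] = String.format("%0" + width + "d", sequence);
        return String.join("_", parts);
    }

    public static void main(String[] args) {
        String previous = "g_2015_00000000087540_xxxx_103_yyy.DWH";
        System.out.println(withSequence(previous, 87541L));
        // prints g_2015_00000000087541_xxxx_103_yyy.DWH
    }
}
```

Because every other field is copied from the template, the record count (103 here) is only a placeholder for a file that does not exist.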


Your original question contained this statement that is confusing -
>>See below: between these two files, two digits are missing
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH<<

In reality, there are 102 digits missing. Please confirm whether you want 2 or 102 filenames printed. If 2, which 2 and why?
ASKER CERTIFIED SOLUTION
hi4ppl

ASKER

Hi,

The part I put in bold is the actual sequence, not the end part... thanks
>> The part I put in bold is the actual sequence, not the end part.
That's something I already understood. (and I guess the other experts too)

Can you please answer the question I raised in comment ID: 40616621.

Given only these four files in a directory (your initial set-up):

g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH

what do you want the outcome to be?

And I kindly ask you to give us the complete outcome, line by line.
The above four files are the input. What is the exact output line by line that you expect?
hi4ppl

ASKER

Hi,

Taking this as an example:

g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH




I want to be able to see:
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087642_xxxx_105_yyy.DWH




regards
So, although 102 sequences are missing (87643 - 87540 - 1), you only want TWO of them to be printed out: the first one of the gap (87541) and the last one of the gap (87642)?

In an overview:
00000000087540  - available
00000000087541  -  not available and you want this printed out (1st of the sequence gap)
00000000087542  -  also not available but you don't want this printed out
...
00000000087641  - also not available but you don't want this printed out
00000000087642  -  not available and you want this printed out (last of the sequence gap)
00000000087643  -  available

Can you confirm that we understand that correctly?

And so, to construct the complete name of the first missing file name to be printed out, you want the program to take the name of the previous available file ("g_2015_00000000087540_xxxx_103_yyy.DWH") and replace its sequence part with 00000000087541 resulting in
g_2015_00000000087541_xxxx_103_yyy.DWH



To construct the complete name of the last missing file name to be printed out, you want the program to take the name of the next available file ("g_2015_00000000087643_xxxx_105_yyy.DWH") and replace its sequence with 00000000087642 resulting in
g_2015_00000000087642_xxxx_105_yyy.DWH


Can you also confirm that we understand that correctly?
hi4ppl

ASKER

Hi,

I'm really sorry everyone, I made a mistake here. The files are supposed to be like:

g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087543_xxxx_105_yyy.DWH
g_2015_00000000087544_xxxx_108_yyy.DWH



I mistakenly put in the wrong digits; they are all supposed to start with 875.

Apologies for the confusion.
>> I mistakenly put in the wrong digits
Well, that's a pity! I already pointed this out in my first comment a week ago...

Assuming that three files were missing, the available files being:
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087544_xxxx_105_yyy.DWH
g_2015_00000000087545_xxxx_108_yyy.DWH

Then which output do you want:

g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087543_xxxx_105_yyy.DWH

or

g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_???_yyy.DWH
g_2015_00000000087543_xxxx_105_yyy.DWH

In other words, do you want
1) the first missing (first of the gap) and the last missing (last of the gap)
or
2) all missing files (in this case three)
printed out?

Can you also please confirm the second part of my previous comment? Do we understand that correctly?
(It's really very helpful - and it avoids lots of misused time - if you answer the questions experts ask you one by one. Really.)
Using zzynx's assumption that the files in the folder were

g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087544_xxxx_105_yyy.DWH
g_2015_00000000087545_xxxx_108_yyy.DWH

My code (ID: 40621739) would produce the following:

g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_103_yyy.DWH
g_2015_00000000087543_xxxx_103_yyy.DWH

Is that what you want?
I believe mccarl's code will likely produce the same.
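
For readers without access to the members-only solutions, here is a rough sketch that reproduces the three lines listed above from the four example names. It is not mccarl's code or the accepted answer; it assumes the sequence is the third underscore-separated field, 14 digits wide, and that each missing name is modelled on the file just before the gap. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MissingFiles {

    // Numeric value of the third underscore-separated field
    static long sequenceOf(String name) {
        return Long.parseLong(name.split("_")[2]);
    }

    // For every gap, emits one name per missing sequence, modelled on the file before the gap.
    static List<String> missingNames(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        // Order by numeric sequence rather than by text
        sorted.sort((a, b) -> Long.compare(sequenceOf(a), sequenceOf(b)));

        List<String> missing = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            String previous = sorted.get(i - 1);
            String[] parts = previous.split("_");
            for (long n = sequenceOf(previous) + 1; n < sequenceOf(sorted.get(i)); n++) {
                // Swap only the sequence field; everything else comes from the previous file
                parts[2] = String.format("%014d", n);
                missing.add(String.join("_", parts));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "g_2015_00000000087539_xxxx_101_yyy.DWH",
                "g_2015_00000000087540_xxxx_103_yyy.DWH",
                "g_2015_00000000087544_xxxx_105_yyy.DWH",
                "g_2015_00000000087545_xxxx_108_yyy.DWH");
        for (String name : missingNames(names)) {
            System.out.println(name);
        }
    }
}
```

To run it against a real folder, the list of names could be filled from File.listFiles() instead of the hard-coded array.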