hi4ppl
How can I do this in Java? I have written some code but I'm stuck!
Hi,
This is how the filenames look in a folder:
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH
All I want is to read them from a folder in Windows and compare the part I put in bold with the previous file. If any digit is missing, it should print out the file before that...
See below: between these two files, two digits are missing
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
so I would like to print
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087642_xxxx_105_yyy.DWH
This is what I did, but I'm stuck...
private String path = "D:\\20141217";
public static void main(String[] args) {
GapCheck gp = new GapCheck();
}
public GapCheck() {
File dir = new File(path);
File[] directoryListing = dir.listFiles();
List<Integer> filenames = new ArrayList<>();
//ArrayList<Integer> names = new ArrayList<Integer>(new);
for (File child : directoryListing) {
String filename = child.getName();
Integer sequenceNo = Integer.parseInt(filename.split("_")[4]);
// System.out.println(sequenceNo);
filenames.add(sequenceNo);
}
Collections.sort(filenames);
int previousInSequence = filenames.get(0);
for (Integer currentSequence : filenames) {
if (currentSequence - previousInSequence > 1) {
System.out.println(currentSequence- previousInSequence);
}
}
}
So what output do you get? (You get like all the values are >1) ?
You might want to jiggle things around to this tune :
import java.util.*;
import java.io.*;
class HoleFinder {
public static void main(String[] args) {
String[] sA = {"g_2015_00000000087539_xxxx_101_yyy.DWH","g_2015_00000000087540_xxxx_103_yyy.DWH","g_2015_00000000087643_xxxx_105_yyy.DWH","g_2015_00000000087644_xxxx_108_yyy.DWH"};
List<Integer> filenames = new ArrayList<Integer>();
for (String s : sA) {
String wanted = s.substring(27,30);
Integer sequenceNo = Integer.parseInt(wanted);
//System.out.println(sequenceNo);
filenames.add(sequenceNo);
}
Collections.sort(filenames);
int previousInSequence = filenames.get(0);
for (Integer currentSequence : filenames) {
if (currentSequence > previousInSequence +1) {
for(int p=1;p<currentSequence-previousInSequence;p++){
System.out.println("Missing sequence number here . . . "+(previousInSequence+p));
}
}
// Update on every iteration, not only when a gap was found.
previousInSequence = currentSequence;
}
}
}
Oops - that's not pointing at the right part of your data.
This is the nearest I can do for now :
import java.util.*;
import java.io.*;
class HoleFinder {
public static void main(String[] args) {
int noughts;
int arrayPointer = -1;
int p;
int lowMissing = 0;
int highMissing =0;
String[] sA = {"g_2015_00000000087539_xxxx_101_yyy.DWH","g_2015_00000000087540_xxxx_103_yyy.DWH","g_2015_00000000087643_xxxx_105_yyy.DWH","g_2015_00000000087644_xxxx_108_yyy.DWH"};
List<Integer> filenames = new ArrayList<Integer>();
for (String s : sA) {
String wanted = s.substring(7,21);
Integer sequenceNo = Integer.parseInt(wanted);
//System.out.println(sequenceNo);
filenames.add(sequenceNo);
}
Collections.sort(filenames);
int previousInSequence = filenames.get(0);
for (Integer currentSequence : filenames) {
if (currentSequence > previousInSequence +1) {
char[] cA = {'0','0','0','0','0','0','0','0','0','0','0','0','0','0'};
noughts = cA.length-(String.valueOf(currentSequence).length());
String nothings = new String(cA,0,noughts);
lowMissing = previousInSequence+1;
highMissing = currentSequence-1;
// First missing name is built from the file before the gap, last missing name from the file after it.
System.out.println("Missing sequence(s) are : "+(sA[arrayPointer].replace(sA[arrayPointer].substring(7,21),nothings+String.valueOf(lowMissing)))+" through "+(sA[arrayPointer+1].replace(sA[arrayPointer+1].substring(7,21),nothings+String.valueOf(highMissing))));
}
// Move on for every file, not only after a gap.
previousInSequence = currentSequence;
arrayPointer++;
}
}
}
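For comparison, here is a minimal sketch of the same gap check under two assumptions: the sequence number is always the third underscore-separated field of the name (rather than a fixed character range), and every file name follows the pattern shown in the question. The class name `SequenceGaps` and the method `missingSequences` are made up for this illustration, and `long` is used so that a full 14-digit sequence cannot overflow an `int`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SequenceGaps {
    // Returns every sequence number that is absent between the lowest and the highest one found.
    static List<Long> missingSequences(String[] fileNames) {
        List<Long> sequences = new ArrayList<>();
        for (String name : fileNames) {
            // Assumption: field 2 (0-based) of g_2015_00000000087539_xxxx_101_yyy.DWH is the sequence.
            sequences.add(Long.parseLong(name.split("_")[2]));
        }
        Collections.sort(sequences);

        List<Long> missing = new ArrayList<>();
        for (int i = 1; i < sequences.size(); i++) {
            long previous = sequences.get(i - 1);
            long current = sequences.get(i);
            // Everything strictly between two neighbours is a hole.
            for (long s = previous + 1; s < current; s++) {
                missing.add(s);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String[] names = {
            "g_2015_00000000087539_xxxx_101_yyy.DWH",
            "g_2015_00000000087540_xxxx_103_yyy.DWH",
            "g_2015_00000000087643_xxxx_105_yyy.DWH",
            "g_2015_00000000087644_xxxx_108_yyy.DWH"
        };
        List<Long> missing = missingSequences(names);
        // Prints: 102 missing, from 87541 to 87642
        System.out.println(missing.size() + " missing, from " + missing.get(0)
                + " to " + missing.get(missing.size() - 1));
    }
}
```

Comparing each value with its immediate neighbour in the sorted list avoids having to carry a separate "previous" variable that must be updated by hand.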
SOLUTION
This solution is only available to members.
Yeah - I made a meal out of it with my approach, no denying.
ASKER
Hi,
Thank you guys for your time... but I want this to be able to scan the directory and show me those files that are missing in the sequence...
@mccarl it only displays static records, like a loop from 11 till 100... I want that part to scan the directory and give me the missing files in the sequence...
First, to be sure.
You said:
between these two files two digits are missing
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
so I would like to print
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087642_xxxx_105_yyy.DWH
I think you meant:
I would like to print
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_???_yyy.DWH
g_2015_00000000087543_xxxx_???_yyy.DWH
...
g_2015_00000000087599_xxxx_???_yyy.DWH
g_2015_00000000087600_xxxx_???_yyy.DWH
g_2015_00000000087601_xxxx_???_yyy.DWH
...
g_2015_00000000087642_xxxx_???_yyy.DWH
Right? Since your two files were
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
>> I want this to be able to scan the directory and show me those files that are missing in sequence
That's what mccarl's code does.
This
File[] directoryListing = dir.listFiles();
does scan the directory.
After that, the code starts with the sequence number of the first found file and it stops with checking that of the last found file.
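To make the scanning step concrete, here is a small sketch of reading the sequence numbers from a folder. It is not mccarl's code (which is not shown in this thread). The class `DirectoryScan`, the method `readSequences` and the `.DWH` filter are assumptions made for the illustration, and the sequence is assumed to be the third underscore-separated field of each name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DirectoryScan {
    // Collects the sequence numbers of all *.DWH files in the given folder, sorted ascending.
    static List<Long> readSequences(File dir) {
        List<Long> sequences = new ArrayList<>();
        File[] listing = dir.listFiles();   // null if dir is not a readable directory
        if (listing == null) {
            return sequences;
        }
        for (File child : listing) {
            String name = child.getName();
            String[] parts = name.split("_");
            if (!child.isFile() || !name.endsWith(".DWH") || parts.length < 3) {
                continue;   // skip sub-folders and unrelated files
            }
            try {
                sequences.add(Long.parseLong(parts[2]));
            } catch (NumberFormatException e) {
                // Third field is not numeric, so this is not one of our files: ignore it.
            }
        }
        Collections.sort(sequences);
        return sequences;
    }

    public static void main(String[] args) {
        // "D:\\20141217" is the folder from the question; pass another path as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "D:\\20141217");
        List<Long> sequences = readSequences(dir);
        if (sequences.isEmpty()) {
            System.out.println("No matching files found in " + dir);
            return;
        }
        System.out.println("Lowest sequence: " + sequences.get(0)
                + ", highest: " + sequences.get(sequences.size() - 1));
    }
}
```

The lowest and highest values printed here are the bounds between which a gap check would run, so files missing before the first or after the last existing file cannot be detected this way.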
Ok, yeah, this is all a bit confusing. Can we make this a bit simpler? I think what you are after is essentially some code that finds gaps in sequences and reports on those gaps. Now the point to confirm is... when you find a gap, do you want every number in that gap to be reported or just the start and end sequence of the gap? I.e., as a simple example, say you had these numbers
1 2 7 8 9
There is obviously a gap here, when reporting this gap do you want to see...
3 4 5 6
OR, do you just want to see the start and end of that gap, ie...
3 - 6
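The two reporting styles can be sketched side by side as follows. This is only an illustration of the question being asked; the class `GapReport` and both method names are invented here, and the input list is assumed to be sorted already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GapReport {
    // Style 1: every missing number, e.g. 3 4 5 6
    static List<Integer> everyMissing(List<Integer> sorted) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            for (int n = sorted.get(i - 1) + 1; n < sorted.get(i); n++) {
                missing.add(n);
            }
        }
        return missing;
    }

    // Style 2: only the start and end of each gap, e.g. "3 - 6"
    static List<String> gapRanges(List<Integer> sorted) {
        List<String> ranges = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            int first = sorted.get(i - 1) + 1;
            int last = sorted.get(i) - 1;
            if (first <= last) {
                ranges.add(first + " - " + last);
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 7, 8, 9);
        System.out.println(everyMissing(numbers)); // prints [3, 4, 5, 6]
        System.out.println(gapRanges(numbers));    // prints [3 - 6]
    }
}
```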
ASKER
Hi,
Thanks mccarl, that is what I'm looking for... You simplified it very well, so the result I'm looking for is:
3 4 5 6
thanks
>> that is what i'm looking for
Good to know.
So, your initial example was indeed wrong.
Since, if you have these files:
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
then you would like this to be printed:
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_???_yyy.DWH
g_2015_00000000087543_xxxx_???_yyy.DWH
...
g_2015_00000000087599_xxxx_???_yyy.DWH
g_2015_00000000087600_xxxx_???_yyy.DWH
g_2015_00000000087601_xxxx_???_yyy.DWH
...
g_2015_00000000087642_xxxx_???_yyy.DWH
Well, as I said before, that's exactly what mccarl's code does.
ASKER
Hi,
Okay thanks, sorry, my bad then... but I didn't understand this part:
System.out.printf("g_2015_%014d_xxxx_103_yyy.DWH\r\n", prev);
prev++;
When I run the script, even though I put the path of the folder in there, it always prints the output below and does not seem to scan the folder. I might be doing it wrong, not sure...
g_2015_00000000000102_xxxx_103_yyy.DWH
g_2015_00000000000104_xxxx_103_yyy.DWH
g_2015_00000000000105_xxxx_103_yyy.DWH
g_2015_00000000000106_xxxx_103_yyy.DWH
g_2015_00000000000107_xxxx_103_yyy.DWH
g_2015_00000000000108_xxxx_103_yyy.DWH
g_2015_00000000000109_xxxx_103_yyy.DWH
g_2015_00000000000110_xxxx_103_yyy.DWH
g_2015_00000000000111_xxxx_103_yyy.DWH
g_2015_00000000000112_xxxx_103_yyy.DWH
>> I didn't understand this part
You can't just use it as it is.
As mccarl said in the comment:
// Note that it is unclear exactly how you determine the other parts of the filename, so I'll leave that part up to you!
You didn't tell us how you will determine the other parts of the file name for files that are NOT present. (I'm talking about the xxxx, ???? and yyyy in the file names in my comments)
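As a small illustration of what the quoted `printf` line does: `%014d` formats the number as a decimal that is 14 characters wide and padded on the left with zeros, while everything else in the format string is literal text. That is why the output always contains `g_2015_`, `xxxx`, `103` and `yyy`, whatever is in the folder. The class `PaddingDemo` and the method `paddedName` below are made up for this sketch.

```java
class PaddingDemo {
    // Builds a file name around a sequence number; all other parts are hard-coded literals.
    static String paddedName(long sequence) {
        // %014d = decimal number, minimum width 14, left-padded with zeros
        return String.format("g_2015_%014d_xxxx_103_yyy.DWH", sequence);
    }

    public static void main(String[] args) {
        long prev = 87541;
        System.out.println(paddedName(prev)); // prints g_2015_00000000087541_xxxx_103_yyy.DWH
        prev++;                               // move on to the next missing sequence number
        System.out.println(paddedName(prev)); // prints g_2015_00000000087542_xxxx_103_yyy.DWH
    }
}
```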
Ok, so the sequence numbers are doing what you want (I think that's what you are saying, anyway). So now we just have to help you with the rest of the filename.
You need to be able to explain to us what the filenames of the missing files should be, based on the file before and after it. The example that you gave in the initial question is ambiguous in this regard. Of the 2 files that are "in the gap", the first appears to be based on the file BEFORE the gap and the second based on the file AFTER the gap!?
If you fully define what you are after and also tell us if the general format of the filename will remain constant, then we can help further.
ASKER
Hi...
Thanks for the help... Well, I want it based on the sequence before, e.g. if one sequence is 101 and the next is 103, it should display the missing sequence... The file name and format will always be the same, no changes; only the sequence will change after each file....
Sorry for the confusion in the ambiguous part.
Maybe give an example.
Given these four files in a directory (your initial set-up):
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH
what do you want the outcome to be?
>> will always be the same no changes only the sequence will change after each file
Ok, so to be 100% clear... in the initial example that you gave, the sequence number is changing but the field between the _xxxx_ and the _yyy_ (the 3 digit number) WON'T change, i.e. the example is wrong to have 101, 103, 105, 108, and in reality these numbers are all the same for every file in the folder?
ASKER
Hi,
As in the given example:
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH
I will break this down:
1- g_ = will always be G
2- 2015_ = the date, so it will change according to the date
3- 00000000087643_ = this is the sequence
4- xxxx_ = this is variable and sometimes static; I don't care about it
5- _108 = the number of records in the file
6- _yyy = this is variable; it will change and I don't care about it
But overall the file structure will be the same; the _ placement will always be there....
My main goal is number (3), which is the counter....
So the output I want to be able to see is
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_103_yyy.DWH
as after
g_2015_00000000087540_xxxx_103_yyy.DWH
I expect to see
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_103_yyy.DWH
and then
g_2015_00000000087543_xxxx_103_yyy.DWH
Regards
So, for parts 1, 2, 4, 5 and 6 of the output that the program should give, is it OK to take them as found in, e.g., the first file?
Well, then get the parts out of the first file found and apply them instead of using:
System.out.printf("g_2015_%014d_xxxx_103_yyy.DWH\r\n", prev);
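One way to take the parts from an existing file, sketched under the assumption that the sequence is always the third underscore-separated field and that its zero-padded width should be preserved. The class `NameTemplate` and the method `withSequence` are hypothetical names for this illustration.

```java
class NameTemplate {
    // Rebuilds an existing file name with a different sequence number,
    // keeping every other field and the zero-padded width of the sequence.
    static String withSequence(String existingName, long sequence) {
        String[] parts = existingName.split("_");
        int width = parts[2].length();                          // 14 in the examples of this thread
        parts[2] = String.format("%0" + width + "d", sequence); // e.g. "%014d"
        return String.join("_", parts);
    }

    public static void main(String[] args) {
        String existing = "g_2015_00000000087540_xxxx_103_yyy.DWH";
        // prints g_2015_00000000087541_xxxx_103_yyy.DWH
        System.out.println(withSequence(existing, 87541));
    }
}
```

Which existing file serves as the template (the first file found, or the file just before the gap) is exactly the open question in this thread, so the sketch leaves that choice to the caller.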
Your original question contained this statement that is confusing -
>>see below between these two files two digits are missing
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH<<
In reality, there are 102 digits missing. Please confirm whether or not you want 2 or 102 filenames printed. If 2, which 2 and why?
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ASKER
Hi,
where I bold the text is the actual sequence not the end part... thanks
>> where I bold the text is the actual sequence not the end part.
That's something I already understood. (and I guess the other experts too)
Can you please answer the question I raised in comment ID: 40616621.
Given only these four files in a directory (your initial set-up):
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH
what do you want the outcome to be?
And I kindly ask you to give us the complete outcome, line by line.
The above four files are the input. What is the exact output line by line that you expect?
ASKER
Hi,
Taking this as an example:
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087643_xxxx_105_yyy.DWH
g_2015_00000000087644_xxxx_108_yyy.DWH
I want to be able to see:
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087642_xxxx_105_yyy.DWH
Regards
So, although 102 sequences are missing (87642 - 87541 + 1), you only want TWO of them to be printed out: the first one of the gap (87541) and the last one of the gap (87642)?
In an overview:
00000000087540 - available
00000000087541 - not available and you want this printed out (1st of the sequence gap)
00000000087542 - also not available but you don't want this printed out
...
00000000087641 - also not available but you don't want this printed out
00000000087642 - not available and you want this printed out (last of the sequence gap)
00000000087643 - available
Can you confirm that we understand that correctly?
And so, to construct the complete name of the first missing file to be printed out, you want the program to take the name of the previous available file ("g_2015_00000000087540_xxxx_103_yyy.DWH") and replace its sequence part with 00000000087541, resulting in
g_2015_00000000087541_xxxx_103_yyy.DWH
To construct the complete name of the last missing file to be printed out, you want the program to take the name of the next available file ("g_2015_00000000087643_xxxx_105_yyy.DWH") and replace its sequence with 00000000087642, resulting in
g_2015_00000000087642_xxxx_105_yyy.DWH
Can you also confirm that we understand that correctly?
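If that reading were confirmed, the rule could be sketched roughly as below: for each gap, print the first missing name built from the file before the gap and the last missing name built from the file after it. This is only an illustration of the interpretation under discussion, not code from this thread; `GapEdges` and its methods are invented names, and the sequence is assumed to be the third underscore-separated field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GapEdges {
    static long sequenceOf(String name) {
        return Long.parseLong(name.split("_")[2]);
    }

    // Same name, different sequence number (zero-padded to the original width).
    static String withSequence(String name, long sequence) {
        String[] parts = name.split("_");
        parts[2] = String.format("%0" + parts[2].length() + "d", sequence);
        return String.join("_", parts);
    }

    // For every gap: first missing name (from the previous file), last missing name (from the next file).
    static List<String> gapEdges(String[] fileNames) {
        String[] sorted = fileNames.clone();
        Arrays.sort(sorted, Comparator.comparingLong(GapEdges::sequenceOf));
        List<String> edges = new ArrayList<>();
        for (int i = 1; i < sorted.length; i++) {
            long first = sequenceOf(sorted[i - 1]) + 1;
            long last = sequenceOf(sorted[i]) - 1;
            if (first > last) {
                continue;                                   // no gap between these two files
            }
            edges.add(withSequence(sorted[i - 1], first));
            if (last > first) {
                edges.add(withSequence(sorted[i], last));   // a one-file gap is reported only once
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        String[] names = {
            "g_2015_00000000087539_xxxx_101_yyy.DWH",
            "g_2015_00000000087540_xxxx_103_yyy.DWH",
            "g_2015_00000000087643_xxxx_105_yyy.DWH",
            "g_2015_00000000087644_xxxx_108_yyy.DWH"
        };
        // prints g_2015_00000000087541_xxxx_103_yyy.DWH and g_2015_00000000087642_xxxx_105_yyy.DWH
        for (String edge : gapEdges(names)) {
            System.out.println(edge);
        }
    }
}
```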
ASKER
Hi,
I'm really sorry everyone, I made a mistake here. The files are supposed to be like:
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087543_xxxx_105_yyy.DWH
g_2015_00000000087544_xxxx_108_yyy.DWH
I mistakenly put the wrong digits; they are all supposed to start with 875.
Apologies for the confusion.
>> I mistakenly put wrong digits
Well, that's a pity! I already pointed this out in my first comment a week ago...
Assuming that three files were missing, the available files being:
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087544_xxxx_105_yyy.DWH
g_2015_00000000087545_xxxx_108_yyy.DWH
Then which output do you want:
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087543_xxxx_105_yyy.DWH
or
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_???_yyy.DWH
g_2015_00000000087543_xxxx_105_yyy.DWH
In other words, do you want
1) the first missing (first of the gap) and the last missing (last of the gap)
or
2) all missing files (in this case three)
printed out?
Can you also please confirm the second part of my previous comment? Do we understand that correctly?
(It's really very helpful - and it avoids lots of misused time - if you answer the questions experts ask you one by one. Really.)
Using zzynx's assumption that the files in the folder were
g_2015_00000000087539_xxxx_101_yyy.DWH
g_2015_00000000087540_xxxx_103_yyy.DWH
g_2015_00000000087544_xxxx_105_yyy.DWH
g_2015_00000000087545_xxxx_108_yyy.DWH
My code (ID: 40621739) would produce the following:
g_2015_00000000087541_xxxx_103_yyy.DWH
g_2015_00000000087542_xxxx_103_yyy.DWH
g_2015_00000000087543_xxxx_103_yyy.DWH
Is that what you want?
I believe mccarl's code will likely produce the same.
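The code referred to above is not visible in this thread, so the following is only a sketch that is consistent with the described output: every missing sequence is printed, and each missing name reuses the other fields of the file just before the gap. `MissingFiles` and its methods are hypothetical names, the `.DWH` filter is an assumption, and every matching file is assumed to carry a numeric sequence as its third underscore-separated field.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MissingFiles {
    static long sequenceOf(String name) {
        return Long.parseLong(name.split("_")[2]);
    }

    // Same name, different sequence number (zero-padded to the original width).
    static String withSequence(String name, long sequence) {
        String[] parts = name.split("_");
        parts[2] = String.format("%0" + parts[2].length() + "d", sequence);
        return String.join("_", parts);
    }

    // Every missing file name, each one modelled on the file just before its gap.
    static List<String> missingNames(String[] fileNames) {
        String[] sorted = fileNames.clone();
        Arrays.sort(sorted, Comparator.comparingLong(MissingFiles::sequenceOf));
        List<String> missing = new ArrayList<>();
        for (int i = 1; i < sorted.length; i++) {
            long next = sequenceOf(sorted[i]);
            for (long s = sequenceOf(sorted[i - 1]) + 1; s < next; s++) {
                missing.add(withSequence(sorted[i - 1], s));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "D:\\20141217" is the folder from the question; pass another path as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "D:\\20141217");
        // list(...) returns null when the path is not a readable directory
        String[] names = dir.list((d, n) -> n.endsWith(".DWH"));
        if (names == null) {
            System.out.println("Cannot read " + dir);
            return;
        }
        for (String name : missingNames(names)) {
            System.out.println(name);
        }
    }
}
```

For the four files assumed above, `missingNames` yields the three names ending in `87541`, `87542` and `87543`, all with `_103_`, because the file before the gap is `g_2015_00000000087540_xxxx_103_yyy.DWH`.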