sdc248
How to download files from a web site knowing only part of the filename
Hi:
I am coding a Java program that downloads files from a web page daily. The problem is that the file names change from day to day, although they do follow a pattern, for example:
today's date + some digits (probably 6 but not sure) + ".csv"
How can I form a valid URL in this situation?
Thanks.
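For illustration, the naming pattern described in the question can be written as a regular expression and used to recognise a candidate file name. This is only a sketch: the class and method names are made up for the example, and the yyyyMMdd date layout is an assumption, since the thread never says how the date part is written.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

final class DailyFileName {
    // Assumption: the date part is written as yyyyMMdd; adjust to the site's real layout.
    private static final DateTimeFormatter DATE_PART = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Builds a pattern for "<date><one or more digits>.csv" for the given day. */
    static Pattern patternFor(LocalDate day) {
        return Pattern.compile(Pattern.quote(day.format(DATE_PART)) + "\\d+\\.csv");
    }

    /** True if fileName is the given day's date, then at least one digit, then ".csv". */
    static boolean matches(LocalDate day, String fileName) {
        return patternFor(day).matcher(fileName).matches();
    }
}
```

A pattern like this cannot be put into a URL directly; it only helps to pick the right name out of a list of names obtained some other way.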
ASKER
Hi:
If I know for sure that there are 6 digits and that they represent the time (hhmmss) at which the file was modified, but we have no clue when that was, would there be a more efficient way to search?
Check conn.getLastModified()
i.e. see how close that is to the time written in the file name
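A rough sketch of that comparison, assuming the six digits are HHmmss in a known time zone. The class and method names are hypothetical; the only thing taken from the thread is the value returned by getLastModified(), which is milliseconds since the epoch (0 if the server did not send the header).

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class StampCheck {
    private static final DateTimeFormatter HHMMSS = DateTimeFormatter.ofPattern("HHmmss");

    /**
     * Returns (time in the file name) minus (time of day of the last-modified value),
     * in seconds. A small positive result means the name was stamped shortly
     * after the file was modified.
     */
    static long secondsBetween(long lastModifiedMillis, String hhmmss, ZoneId zone) {
        LocalTime modified = Instant.ofEpochMilli(lastModifiedMillis).atZone(zone).toLocalTime();
        LocalTime named = LocalTime.parse(hhmmss, HHMMSS);
        return Duration.between(modified, named).getSeconds();
    }
}
```

The zone matters: the header is an absolute instant, while the digits in the name are presumably in the server's local time, which is a guess on my part.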
Yeah, I reckon scan backwards from the current time (then loop round if nothing found)
Might be quicker...
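Something like this would produce the candidate stamps for a backwards scan, assuming HHmmss stamps. It is a sketch only, with made-up names, and it just lists the stamps; probing each resulting URL is left out.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class BackwardScan {
    private static final DateTimeFormatter HHMMSS = DateTimeFormatter.ofPattern("HHmmss");
    private static final int SECONDS_PER_DAY = 24 * 60 * 60;

    /** Lists up to `limit` HHmmss stamps, starting at `start` and stepping back one second at a time. */
    static List<String> candidates(LocalTime start, int limit) {
        // Never produce more than one full day of stamps, and tolerate a negative limit.
        int count = Math.max(0, Math.min(limit, SECONDS_PER_DAY));
        List<String> stamps = new ArrayList<>(count);
        LocalTime t = start.withNano(0);
        for (int i = 0; i < count; i++) {
            stamps.add(t.format(HHMMSS));
            t = t.minusSeconds(1); // LocalTime wraps from 00:00:00 round to 23:59:59
        }
        return stamps;
    }
}
```

Calling candidates(LocalTime.now(), 86400) covers the whole day, so the worst case is still 86,400 requests.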
If the file was last modified shortly before the time stamped in its name, you could try it this way, scanning forward from the last-modified time. It would be safer, though, to scan from the beginning to the end of the day:
long fileDateAsLong = conn.getLastModified();
Calendar fileTimestamp = Calendar.getInstance();
fileTimestamp.setTime(new Date(fileDateAsLong));
Calendar midnightToday = Calendar.getInstance();
midnightToday.set(Calendar.HOUR_OF_DAY, 0);
midnightToday.set(Calendar.MINUTE, 0);
midnightToday.set(Calendar.SECOND, 0);
midnightToday.add(Calendar.DATE, 1);
System.out.println(fileTimestamp.getTime());
System.out.println(midnightToday.getTime());
DateFormat df = new SimpleDateFormat("HHmmss");
while (fileTimestamp.getTime().before(midnightToday.getTime())) {
    System.out.println(df.format(fileTimestamp.getTime()));
    fileTimestamp.add(Calendar.SECOND, 1);
}
ASKER
Hi CEHJ:
I recently had a chance to learn what a spider is, and it turned out to be a much more efficient way to do the job.
As you recommended, I spidered the page that publishes the file, searched for the string that starts with the partial file pathname I have, and got the complete file name and thus the URL.
Sorry I am not able to give you points now. But I really want you to know how much I appreciate your help.
Best,
Denise
>>Sorry I am not able to give you points now.
Well you could if you wanted in fact, or split them. This can be arranged ;-)
Spiders: http://tinyurl.com/b57ns
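For anyone finding this later, the search step of the spider approach described above could look roughly like this. It is a minimal sketch, not the code that was actually used: the class and method names are invented, and it assumes the page's HTML has already been read into a String and that the known prefix is followed by digits and ".csv".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkFinder {
    /**
     * Returns the first substring of `html` that starts with `prefix`,
     * continues with one or more digits and ends in ".csv".
     */
    static Optional<String> findFile(String html, String prefix) {
        Pattern p = Pattern.compile(Pattern.quote(prefix) + "\\d+\\.csv");
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

The match is only the path as it appears in the page, so it still has to be resolved against the page's own address (java.net.URI.resolve can do that) before it is downloaded.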