sdc248 asked:

How to download files from a web site knowing only part of the filename

Hi:

I am coding a Java program that downloads files from a web page daily. The problem is that the file names change from day to day, although they do follow a pattern, for example:

today's date + some digits (probably 6 but not sure) + ".csv"

How can I form a valid URL in this situation?

Thanks.
ASKER CERTIFIED SOLUTION
TimYates
Spider a page that contains links to the file(s)

Spiders:  http://tinyurl.com/b57ns
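
A minimal sketch of the spidering idea, assuming the page source has already been downloaded into a String and that "today's date" is written as yyyyMMdd (the thread does not state the date format). The class and method names are hypothetical. It uses a regular expression to pull out every name of the form date + digits + ".csv":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvLinkFinder {

    // Returns every file name in the HTML that looks like
    // <datePrefix><digits>.csv. The datePrefix format (e.g. "20240131")
    // is an assumption, not something the thread confirms.
    static List<String> findCsvNames(String html, String datePrefix) {
        // \d+ rather than \d{6}, because the digit count is not certain
        Pattern p = Pattern.compile(Pattern.quote(datePrefix) + "\\d+\\.csv");
        Matcher m = p.matcher(html);
        List<String> names = new ArrayList<>();
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample page content for illustration only
        String html = "<a href=\"/data/20240131093015.csv\">today</a>"
                    + "<a href=\"/data/20240130101500.csv\">yesterday</a>";
        System.out.println(findCsvNames(html, "20240131"));
    }
}
```

With the sample page above, only the link carrying the requested date prefix is returned, so no guessing at the trailing digits is needed.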
sdc248 (ASKER)

Hi:

If I know for sure that the number of digits is 6 and that it represents the time (hhmmss) the file was modified, though we have no idea when that was, would there be a more efficient way to search?
Check conn.getLastModified()
i.e. see how close that is to the written file name as a time
Yeah, I reckon scan backwards from the current time (then loop round if nothing found)

Might be quicker...
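
A rough sketch of the backward scan, assuming the six digits are hhmmss as described above. The names are hypothetical; it only generates the candidate stamps in the order they would be tried (newest first, wrapping past 000000 to 235959), and leaves the actual URL probing to the caller:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BackwardTimeScan {

    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    // Formats a second-of-day (0..86399) as a six-digit hhmmss stamp.
    static String toStamp(int secondOfDay) {
        int h = secondOfDay / 3600;
        int m = (secondOfDay % 3600) / 60;
        int s = secondOfDay % 60;
        return String.format(Locale.ROOT, "%02d%02d%02d", h, m, s);
    }

    // Candidate stamps starting at startSecond and walking backwards,
    // wrapping around midnight so the whole day is covered at most once.
    static List<String> candidates(int startSecond, int limit) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < limit && i < SECONDS_PER_DAY; i++) {
            int sec = Math.floorMod(startSecond - i, SECONDS_PER_DAY);
            out.add(toStamp(sec));
        }
        return out;
    }

    public static void main(String[] args) {
        int now = java.time.LocalTime.now().toSecondOfDay();
        // Print the first few stamps that would be tried
        System.out.println(candidates(now, 5));
    }
}
```

In the worst case this still means 86,400 requests per day, which is why reading the real name off the page is usually the cheaper route.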
On the basis that the file may have been modified prior to, but around the same time as the file name stamp, you could try it this way, but it would be safer to scan from the beginning to the end of the day:


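            // Needs: java.text.DateFormat, java.text.SimpleDateFormat,
            //        java.util.Calendar, java.util.Date
            // conn is assumed to be an already-opened java.net.URLConnection

            // Last-modified time reported by the server (0 if not known)
            long fileDateAsLong = conn.getLastModified();
            Calendar fileTimestamp = Calendar.getInstance();
            fileTimestamp.setTime(new Date(fileDateAsLong));

            // Midnight at the end of today: the upper bound of the scan
            Calendar midnightToday = Calendar.getInstance();
            midnightToday.set(Calendar.HOUR_OF_DAY, 0);
            midnightToday.set(Calendar.MINUTE, 0);
            midnightToday.set(Calendar.SECOND, 0);
            midnightToday.set(Calendar.MILLISECOND, 0);
            midnightToday.add(Calendar.DATE, 1);

            System.out.println(fileTimestamp.getTime());
            System.out.println(midnightToday.getTime());

            // Print each candidate hhmmss stamp, one second at a time
            DateFormat df = new SimpleDateFormat("HHmmss");
            while (fileTimestamp.getTime().before(midnightToday.getTime())) {
                  System.out.println(df.format(fileTimestamp.getTime()));
                  fileTimestamp.add(Calendar.SECOND, 1);
            }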
sdc248 (ASKER)

Hi CEHJ:

I recently had a chance to learn what a spider is, and it turned out to be a much more efficient way to do the job.

As you recommended, I spidered the page that publishes the file and searched for the string that starts with the partial file pathname I have. That gave me the complete file name and thus the URL.
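
For anyone reading later, the search described here might look roughly like the sketch below. It is not the code actually used; the class, method names, and example URLs are made up for illustration. It finds the known partial path in the page source, extends it over the trailing digits up to ".csv", and resolves the result against the page's own address:

```java
import java.net.URI;

class PartialNameResolver {

    // Looks for partialPath followed only by digits and then ".csv".
    // Returns the completed path, or null if the page has no such link.
    static String completePath(String html, String partialPath) {
        int start = html.indexOf(partialPath);
        while (start >= 0) {
            int i = start + partialPath.length();
            while (i < html.length() && Character.isDigit(html.charAt(i))) {
                i++;
            }
            if (html.startsWith(".csv", i)) {
                return html.substring(start, i + ".csv".length());
            }
            // Not a match here, so try the next occurrence of the prefix
            start = html.indexOf(partialPath, start + 1);
        }
        return null;
    }

    // Resolves a (possibly relative) path against the URL of the page.
    static URI toAbsolute(String pageUrl, String path) {
        return URI.create(pageUrl).resolve(path);
    }

    public static void main(String[] args) {
        // Hypothetical page content and address
        String html = "<a href=\"/files/20240131093015.csv\">report</a>";
        String path = completePath(html, "/files/20240131");
        System.out.println(toAbsolute("http://example.com/reports/index.html", path));
    }
}
```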

Sorry I am not able to give you points now. But I really want you to know how much I appreciate your help.

Best,
Denise
>>Sorry I am not able to give you points now.

Well you could if you wanted in fact, or split them. This can be arranged ;-)