dkim18
asked on
jar file that contains manifest file
I am testing my WebCrawler program, which retrieves a web page and the images it contains to local storage, so that I can view the page loaded from my local file system.
This program is a command-line application that takes two arguments:
The first argument represents the download directory into which the web page will be downloaded. This argument can be relative to the directory where the java command is executed or an absolute directory. If the directory does not exist, throw a CrawlerException.
The second argument represents the absolute URL of the web page to download. This URL WILL end with .../<filename>.html. The original <filename>.html will be used to save the HTML of the page in the download directory.
Now, if I run it like this, everything is OK.
java -classpath c:\classes dkim18.crawler.WebCrawler c:\classes\ http://webdev.apl.jhu.edu/%%7Emed/summer03/homework/05LibrarySwing.html
However, if I make a jar file, my program doesn't create the subdirectory into which all the relative images are supposed to be downloaded.
This is my manifest file.
Manifest-Version: 1.0
Main-Class: dkim18.crawler.WebCrawler
jar -cvmf myManiFest dkim18.jar dkim18/
(all .class files are under c:\classes\dkim18\crawler\)
So, I should be able to run it from any directory location via: java -jar dkim18.jar <download dir> <url>
Any idea?
ASKER
O.K
Here is the problem. If this program is compiled and run on Windows, it works, but it doesn't work on a Sun Solaris system. I changed \ to /, but it doesn't write anything to the HTML file or download any images into the subdirectory. I know this is too much to ask, but here is my code.
-----------
package dkim18.crawler;
import java.io.*;
import java.util.*;
import java.lang.*;
import java.net.*;
import java.util.regex.*;
/**
* WebCrawler class is used to retrieve a web page and the
* images it contains to local storage as project description
*
* @author: Daniel Kim
*/
public class WebCrawler {
private URL url; //url to be retrieved
/**
* Initializes url
*
* @param: url
*/
public void setURL(URL url) {
this.url = url;
}
/**
* returns url
*
* @return : url
*/
public URL getURL() {
return url;
}
/**
* Writes contents to the html file that was created
*
* @param : file, web contents, directory
*
*/
static public void writeContents(File aFile, String aContents, String dir) throws
FileNotFoundException, IOException {
if (aFile == null) {
throw new IllegalArgumentException("File should not be null.");
}
if (!aFile.exists()) {
throw new FileNotFoundException("File does not exist: " + aFile);
}
if (!aFile.isFile()) {
throw new IllegalArgumentException("Should not be a directory: " + aFile);
}
if (!aFile.canWrite()) {
throw new IllegalArgumentException("File cannot be written: " + aFile);
}
Writer output = null;
try {
output = new BufferedWriter(new FileWriter(aFile));
output.write(aContents);
}
finally {
if (output != null)
output.close();
}
}
/**
* Makes sub directory name for storing image files
*
* @param : file name
*/
public String makeSubDirName(String fileName) {
int splitIndex = fileName.indexOf(".");
String concatName = fileName.substring(0, splitIndex);
String fileN = (concatName + "_html_files");
return fileN;
}
/**
* Creates sub directory
*
* @param : sub dir from html file, sub directory name
*/
public String makeSubDirectory(String subDirName, String subDir) {
String newDir = subDir+subDirName;
boolean success = (new File(newDir)).mkdir();
if (!success) {
//System.out.println("Failed");
}
success = (new File(subDir)).mkdirs();
if (!success) {
//System.out.println("Failed");
}
return newDir;
}
/**
* Detects images source directory from html file and
* store in array of string
*
* @param : contents of html, sub directory name
* @return : array of string that contains images directory
*/
public String[] getImgSrcDir(String html, String subDirName ) {
final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL ;
final String IMG_PATTERN = "<img\\s+src\\s*=\\s*('|\")(.*?)('|\")";
String imgPatternArr[] = new String[50];
Pattern myPattern = Pattern.compile(IMG_PATTERN, FLAGS);
Matcher myMatcher = myPattern.matcher(html);
int counter = 0;
while (myMatcher.find()) {
String img = myMatcher.group(1);
String imagesTag = myMatcher.group();
int startTag = myMatcher.start();
int startImages = myMatcher.start(1);
int endImages = myMatcher.end(1);
int endTag = myMatcher.end();
imgPatternArr[counter] = imagesTag;
String REGEX_SPACE = "\\s*";
String REGEX_QUOTE = "('|\")";
String REGEX_TAG = "<(.*?)=";
//get rid of spaces if there is any
String REPLACE = "";
Pattern p = Pattern.compile(REGEX_SPACE);
Matcher m = p.matcher(imgPatternArr[counter]);
imgPatternArr[counter] = m.replaceAll(REPLACE);
p = Pattern.compile(REGEX_QUOTE);
m = p.matcher(imgPatternArr[counter]);
imgPatternArr[counter] = m.replaceAll(REPLACE);
p = Pattern.compile(REGEX_TAG);
m = p.matcher(imgPatternArr[counter]);
imgPatternArr[counter] = m.replaceAll(REPLACE);
counter++;
}
String imgArr[] = new String[counter];
for (int i = 0; i < counter; i++) {
imgArr[i] = imgPatternArr[i];
}
return imgArr;
}
/**
* Replaces all relative directories (<a href) in the html
*
* @param : htmlpage, url, html file
* @return : updated html
*/
public static String replaceLinkSrcDir(String htmlWebPage, String url, String htmlFile){
int splitIndex = 0;
String newURL = null;
splitIndex = url.indexOf(htmlFile);
newURL = url.substring(0, splitIndex);
final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL ;
final String FIND_PATTERN = "(<a href=\"*)(([/\\.]*)([^>\"]+))";
final String replace_str = "$1"+ newURL+ "$4";
Pattern myPattern = Pattern.compile(FIND_PATTERN, FLAGS);
Matcher myMatcher = myPattern.matcher(htmlWebPage);
StringBuffer buffy = new StringBuffer();
while (myMatcher.find()) {
try {
URL uri = new URL(myMatcher.group(4));
}catch(MalformedURLException e) {
myMatcher.appendReplacement(buffy, replace_str);
}
}
myMatcher.appendTail(buffy);
String newHtml=buffy.toString();
return newHtml;
}
/**
* Replaces images patterns in html file
*
* @param : html, sub dir name, number of images tag in the html file
* @return : updated html
*/
public static String patternReplace(String htmlWebPage, String subDirName, String[] counter){
final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL ;
final String REPLACE_PATTERN = "<img\\s+src\\s*=\\s*('|\")(.*?)/";
String replace_str = "<img src=\"" + subDirName + "/";
Pattern myPattern = Pattern.compile(REPLACE_PATTERN, FLAGS);
Matcher myMatcher = myPattern.matcher(htmlWebPage);
StringBuffer buffy = new StringBuffer();
for(int i = 0; i < counter.length ; i++){
if (myMatcher.find()) {
myMatcher.appendReplacement(buffy, replace_str);
}
}
myMatcher.appendTail(buffy);
String newHtml=buffy.toString();
return newHtml;
}
/**
* Add all path and name to make full images path
*
* @param : url, html file name, string array contains part of tag
* @return : string of array contain updated img tag
*/
public String[] makeFullImgPath(String urlPath, String htmlFileName, String[] imgDir){
int splitIndex = 0;
String path = null;
splitIndex = urlPath.indexOf(htmlFileName);
path = urlPath.substring(0, splitIndex);
for(int i =0; i < imgDir.length ; i++){
imgDir[i] = path+imgDir[i];
}
return imgDir;
}
/**
* Downloads images from relative directory in html
*
* @param : name of sub dir name, string array of full images path, images dir
*/
public void downloadImg(String subDirPathName, String[] fullImgPath, String[] imgDir) {
String targetDirAndFile = null;
String imgFileName = null;
for (int i = 0; i < fullImgPath.length; i++){
try {
URL url = new URL(fullImgPath[i]);
URLConnection urlConnection = url.openConnection();
InputStream is = urlConnection.getInputStream();
int splitIndex = 0;
splitIndex = fullImgPath[i].lastIndexOf("/");
imgFileName = fullImgPath[i].substring(++splitIndex);
targetDirAndFile = subDirPathName + "/" + imgFileName;
FileOutputStream file = new FileOutputStream(targetDirAndFile);
int len;
byte[] buf = new byte[256];
while ( (len = is.read(buf)) >= 0) {
file.write(buf, 0, len);
}
is.close();
file.close();
}
catch (EOFException e) {
System.out.println(e);
}
catch (Exception e) {
System.out.println(e.toString());
System.exit(1);
}
}
}
/**
* Checks usage and throws crawler exception
*
* @param : command line args
*/
private static void checkUsage(String[] args)throws CrawlerException{
if (args.length != 2){
throw new CrawlerException();
}
File dir = new File(args[0]);
if(!(dir.exists())){
throw new CrawlerException();
}
try{
URL url = new URL(args[1]);
}catch(MalformedURLException mue){
throw new CrawlerException();
}
}
/**
* Gets html page by line
*
* @return : html contents
*/
public String getPage() throws IOException{
final String LINE_SEPARATOR = System.getProperty("line.separator");
BufferedReader bin = null;
StringBuffer buffy = new StringBuffer("");
try{
bin = new BufferedReader(new InputStreamReader(url.openStream()));
String line = null;
while((line = bin.readLine()) != null){
buffy.append(line);
buffy.append(LINE_SEPARATOR);
}
}finally{
if (bin != null) {
bin.close();
}
}
return buffy.toString() ;
}
/**
* main that drives this program
*
* @param : command line args
*/
public static void main(String[] args) throws CrawlerException {
checkUsage(args);
try{
String webPage = null;
String htmlFileName = null;
WebCrawler myReader = new WebCrawler();
myReader.setURL(new URL(args[1]));
webPage = myReader.getPage();
File file = new File(args[1]);
//index.html
htmlFileName = file.getName();
//index_html_files
String subDirName = myReader.makeSubDirName(htmlFileName);
//make sub dir name
String subDirPathName = myReader.makeSubDirectory(subDirName, args[0]);
//get img dir
String imgDir[] = myReader.getImgSrcDir(webPage, subDirName);
//replace img patterns
String newWebPage = patternReplace(webPage, subDirName, imgDir );
//replace relative link
String newWebPageURL= myReader.replaceLinkSrcDir(newWebPage, args[1], htmlFileName);
String targetDir = args[0]; //target dir
URL url = new URL(args[1]);
String s = url.getFile();
if (s != null && s.length() > 0) {
s = s.substring(s.lastIndexOf("/"));
File f = new File(targetDir + s);
f.createNewFile();
writeContents(f, newWebPageURL, args[0]);
//store final img path
String fullImgPath[] = myReader.makeFullImgPath(args[1], htmlFileName, imgDir);
myReader.downloadImg( subDirPathName, fullImgPath, imgDir);
}
}
catch (IOException e) {
System.out.println(e);
}
}
}
ASKER
If I leave targetDirAndFile = subDirPathName + "\\" + imgFileName; like this on the Sun Solaris system, the HTML file does contain all the text. But there are still no downloaded images in the subdirectory.
public void downloadImg(String subDirPathName, String[] fullImgPath, String[] imgDir) {
String targetDirAndFile = null;
String imgFileName = null;
for (int i = 0; i < fullImgPath.length; i++){
try {
URL url = new URL(fullImgPath[i]);
URLConnection urlConnection = url.openConnection();
InputStream is = urlConnection.getInputStream();
int splitIndex = 0;
splitIndex = fullImgPath[i].lastIndexOf("/");
imgFileName = fullImgPath[i].substring(++splitIndex);
targetDirAndFile = subDirPathName + "\\" + imgFileName;
FileOutputStream file = new FileOutputStream(targetDirAndFile);
int len;
byte[] buf = new byte[256];
while ( (len = is.read(buf)) >= 0) {
file.write(buf, 0, len);
}
is.close();
file.close();
}
catch (EOFException e) {
System.out.println(e);
}
catch (Exception e) {
System.out.println(e.toString());
System.exit(1);
}
}
}
What exact command are you giving on Solaris?
ASKER
java -classpath classes dkim18.crawler.WebCrawler classes/ http://webdev.apl.jhu.edu/%7Emed/summer03/homework/05LibrarySwing.html
(all on one line)
It just doesn't download images into the subdirectory (classes/05LibrarySwing_html_files)
I assume you still have write access to classes directory?
What is the code inside your class that manipulates that directory parameter?
ASKER
I didn't go through all the code you posted, but I saw this line (in the middle of your last post):
targetDirAndFile = subDirPathName + "\\" + imgFileName;
This (and any other relevant lines) should be changed to:
>>Again, if I leave targetDirAndFile = subDirPathName + "\\" + imgFileName; like this on the Sun Solaris system, the HTML file does contain all the text. But there are still no downloaded images in the subdirectory.
>>Again, this program works fine on Windows!!!
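One portable way to avoid hard-coding either "\\" or "/" is to let java.io.File join the path parts. The following is only a minimal sketch, not code from this thread: the class PortablePaths and the method imageTarget are hypothetical names. It also sidesteps the assumption in makeSubDirectory that subDir + subDirName forms a valid path, which holds only when the download-directory argument ends with a separator.

```java
import java.io.File;

// Hypothetical helper for illustration; not part of the posted WebCrawler class.
final class PortablePaths {
    private PortablePaths() {}

    // Builds <downloadDir>/<subDirName>/<imgFileName> with the platform's own
    // separator, whether or not downloadDir was given with a trailing separator.
    static File imageTarget(File downloadDir, String subDirName, String imgFileName) {
        File subDir = new File(downloadDir, subDirName);
        return new File(subDir, imgFileName);
    }

    public static void main(String[] args) {
        File target = imageTarget(new File("classes"), "05LibrarySwing_html_files", "hw5.gif");
        // Prints the path with "\" on Windows and "/" on Solaris.
        System.out.println(target.getPath());
    }
}
```

With this approach, the subdirectory could be created with mkdirs() on the parent of the returned File before opening a FileOutputStream on it.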
Add some debug to determine whether the problem is with the download, or with the file writing.
ASKER
I checked the HTML source and it contains this line:
<img src="05LibrarySwing_html_files/hw5.gif" alt="Hw5 snapshot">
which means this did change
<img src="hw5/hw5.gif" alt="Hw5 snapshot"> to
<img src="05LibrarySwing_html_files/hw5.gif" alt="Hw5 snapshot">
but it just doesn't download the images...
Add some debug to print out the urls of things it is downloading and the file path it is saving it to.
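A minimal sketch of that kind of debug output, assuming a hypothetical helper (DownloadDebug.describe) that would be called just before each image is opened and written. It is not part of the posted code, and the example.com URL is a placeholder. It reports the source URL, the absolute target path, and whether the target directory exists and is writable, which should help show whether the failure is on the download side or the file-writing side.

```java
import java.io.File;

// Hypothetical debugging helper; not part of the posted WebCrawler class.
final class DownloadDebug {
    private DownloadDebug() {}

    // Returns a multi-line report for one image about to be downloaded.
    static String describe(String imageUrl, File target) {
        File dir = target.getAbsoluteFile().getParentFile();
        StringBuilder sb = new StringBuilder();
        sb.append("downloading: ").append(imageUrl).append('\n');
        sb.append("saving to:   ").append(target.getAbsolutePath()).append('\n');
        sb.append("directory exists: ").append(dir != null && dir.isDirectory()).append('\n');
        sb.append("directory writable: ").append(dir != null && dir.canWrite());
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder URL and path, for illustration only.
        File target = new File(new File("classes", "05LibrarySwing_html_files"), "hw5.gif");
        System.err.println(describe("http://example.com/hw5/hw5.gif", target));
    }
}
```

Since the posted downloadImg prints only e.toString() before calling System.exit(1), replacing that print with e.printStackTrace() would also show where the exception was thrown.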
ASKER
If I want to detect (<img...src="..."...>) in an HTML file,
is this "<img\\s+src=\"(.*?)\".*?>" correct?
Not quite - for one thing, sometimes urls are not in quotes
ASKER
For this project, the pattern is always like this: <img...src="..."...>
> Is this "<img\\s+src=\"(.*?)\".*?>" correct?
are you accessing the same url as you were when running windoze?
ASKER
are you accessing the same url as you were when running windoze?
>>yes
then pattern matching should be the same shouldn't it.
Have you added debug to determine exactly where the problem is occurring (I'm too lazy to wade thru all that code:) )
ASKER
I am trying...but it is weird...now I get an empty HTML file when I run it.
This will match with or without quotes:
String re = "<img\\s+src=\"*([^>\"]+)\"*>";
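Note that a pattern of this shape expects the closing > right after the src value, so a tag with a quoted src followed by another attribute, such as alt="...", would not match. The sketch below is one possible alternative under stated assumptions, not the pattern used in the posted code: the class ImgSrcExtractor is a hypothetical name, and the regex accepts double-quoted, single-quoted, or unquoted src values with other attributes before or after. A regex is not a full HTML parser, so unusual markup may still slip through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the posted WebCrawler class.
final class ImgSrcExtractor {
    private ImgSrcExtractor() {}

    // Group 1: double-quoted value, group 2: single-quoted value, group 3: unquoted value.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    // Returns the src value of every <img> tag found, in document order.
    static List<String> extract(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            String src = m.group(1);
            if (src == null) src = m.group(2);
            if (src == null) src = m.group(3);
            result.add(src);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<img src=\"hw5/hw5.gif\" alt=\"Hw5 snapshot\"><IMG ALT=x SRC=pic.png>";
        System.out.println(extract(html));
    }
}
```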
> File file = new File(args[1]);
args[1] is a URL, not a file.
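A minimal sketch of taking the file name from the URL itself instead of wrapping the URL string in a File. The class UrlFileName is a hypothetical name, not part of the posted code. URL.getPath() leaves out any query string or fragment, so only the path segment after the last "/" is returned.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; not part of the posted WebCrawler class.
final class UrlFileName {
    private UrlFileName() {}

    // Returns the last path segment of the URL, e.g. "index.html".
    static String of(String pageUrl) {
        try {
            String path = new URL(pageUrl).getPath();
            return path.substring(path.lastIndexOf('/') + 1);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + pageUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(of("http://webdev.apl.jhu.edu/%7Emed/summer03/homework/05LibrarySwing.html"));
    }
}
```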
ASKER
args[1] is a URL, not a file.
>>The next statement will get html file name and I am using this file name.
>>htmlFileName = file.getName();
String re = "<img\\s+src=\"*([^>\"]+)\"*>";
>>didn't work on Windows or Solaris
print out all these values and post the results:
File file = new File(args[1]);
//index.html
htmlFileName = file.getName();
//index_html_files
String subDirName = myReader.makeSubDirName(htmlFileName);
//make sub dir name
String subDirPathName = myReader.makeSubDirectory(subDirName, args[0]);
//get img dir
String imgDir[] = myReader.getImgSrcDir(webPage, subDirName);
//replace img patterns
String newWebPage = patternReplace(webPage, subDirName, imgDir );
//replace relative link
String newWebPageURL= myReader.replaceLinkSrcDir(newWebPage, args[1], htmlFileName);
String targetDir = args[0]; //target dir
URL url = new URL(args[1]);
String s = url.getFile();
if (s != null && s.length() > 0) {
s = s.substring(s.lastIndexOf("/"));
File f = new File(targetDir + s);
isn't the RE working already in Windoze?
ASKER