arifdgr8

How to save web content from URL using java

Hi, I am new to Java, and here is what I need to do:

- First, connect to a web site such as http://thinks.com/daily_crossword.htm

- Second, download the page source and keep only the Java/JavaScript code

- Third, convert the Java/JavaScript code to J2ME


I am able to connect and download the page source into a txt file, but I am not able to remove the HTML tags and content.


Please send me the solution (code) as soon as possible.

Thanks

-------------------------------------------------------------------------------------------------------
[My code is below]

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class test
{
      public static void main(String[] args)
      {
            try
            {
                  URL pageUrl = new URL("http://thinks.com/daily_crossword.htm");
                  URLConnection conn = pageUrl.openConnection();
                  // establish the connection
                  conn.connect();
                  String textFileName = "C:/javatest/crossword.txt";
                  // save the page source into a text file; try-with-resources
                  // closes the reader and the writer even if an exception is thrown
                  try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                       BufferedWriter writer = new BufferedWriter(new FileWriter(textFileName)))
                  {
                        String line;
                        while ((line = reader.readLine()) != null)
                        {
                              writer.write(line);
                              writer.newLine();
                        }
                  }
            }//end try
            catch (IOException io)
            {
                  System.out.println(io);
            }
      }
}
SprudeVI

Unfortunately, there are some false assumptions in your question:
* The applet on "http://thinks.com/daily-crossword/" is compiled code and is not contained within the HTML page.
* They use a piece of software that is sold commercially at "http://www.crossword-compiler.com/?lang=en". You can either download a demo or buy it there.
* However, you CAN download the data file from the page. Just look into the HTML source and you will find a line similar to:

  <PARAM value="/daily-crossword/puzzles/2008-03/dc1-2008-03-26.bin" name="DATAFILE" />

i.e. for today you have to load the file

  http://thinks.com/daily-crossword/puzzles/2008-03/dc1-2008-03-26.bin (paste it into your browser's URL field)

* Even if you paid the developers of "ccjava" enough to give you the source code, it would be much harder to transform a JSE applet into a JME MIDlet than to write the whole thing from scratch on your own: these are *completely* different APIs. It can by no means be done automatically.
* There is, however, an open-source crossword puzzle generator that I found: http://tea.ch/en/applet_tutorial.php.
Maybe that would be a good starting point for your research.
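
For illustration only, here is a minimal sketch of how the DATAFILE line mentioned above could be located in the downloaded page source and turned into a full address. The class name DataFileLocator and its method find are hypothetical, and the regular expressions assume the attribute values are wrapped in double quotes, as in the example line; a real HTML parser would be more robust.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the DATAFILE parameter in applet markup. */
final class DataFileLocator {

    // Matches one <param ...> tag; the attributes may appear in any order.
    private static final Pattern PARAM_TAG =
            Pattern.compile("<param\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern NAME_ATTR =
            Pattern.compile("\\bname\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern VALUE_ATTR =
            Pattern.compile("\\bvalue\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URI of the data file, if the page declares one. */
    static Optional<URI> find(String html, URI pageUri) {
        Matcher tag = PARAM_TAG.matcher(html);
        while (tag.find()) {
            Matcher name = NAME_ATTR.matcher(tag.group());
            Matcher value = VALUE_ATTR.matcher(tag.group());
            if (name.find() && "DATAFILE".equalsIgnoreCase(name.group(1)) && value.find()) {
                // Resolve the site-relative path against the page address.
                return Optional.of(pageUri.resolve(value.group(1)));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<PARAM value=\"/daily-crossword/puzzles/2008-03/dc1-2008-03-26.bin\" name=\"DATAFILE\" />";
        URI page = URI.create("http://thinks.com/daily-crossword/");
        // prints http://thinks.com/daily-crossword/puzzles/2008-03/dc1-2008-03-26.bin
        System.out.println(find(html, page).orElse(null));
    }
}
```

The returned address could then be opened with the same URLConnection approach used in the question's code.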
arifdgr8 (ASKER)

Hi SprudeVI

Thanks for your comment. I do not need the source code. After downloading the page source into a txt file, I want to keep the <PARAM value="/daily-crossword/puzzles/2008-03/dc1-2008-03-26.bin" name="DATAFILE" /> line and all <script type="text/javascript" language="JavaScript1.2"> ... </script> blocks, and remove the rest of the code and content from the txt file.

Can you please tell me how I can do this? Code, please.

Thanks in advance
ASKER CERTIFIED SOLUTION
SprudeVI

Hi SprudeVI

Thank you very much. The code you provided is a great help to me. This is only for my own testing, not for commercial use.

Everything is OK except the JavaScript part. I want to keep every JavaScript block in the txt file, both the tags and the code inside them, like below:

<script type="text/javascript">

var dc_UnitID = 14;
var dc_PublisherID = 22684;
var dc_AdLinkColor = 'blue';
var dc_isBoldActive= 'no';
var dc_adprod='ADL';

</script>

Can you please tell me how I can do this? Code, please.

Thanks in advance
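
One possible way to keep whole script blocks, shown here only as a sketch: read the page source into a single String (rather than filtering it line by line) and collect every match of a regular expression that spans from the opening script tag to the closing one. The class name ScriptBlockExtractor is hypothetical, and a regular expression is a simplification that can be fooled by unusual markup, for example a literal closing script tag inside a JavaScript string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects complete script blocks, tags included. */
final class ScriptBlockExtractor {

    // DOTALL lets '.' cross line breaks; the reluctant '.*?' stops at the
    // first closing tag, so each block is matched separately.
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every script block found in the page source, in order. */
    static List<String> extract(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SCRIPT_BLOCK.matcher(html);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = String.join("\n",
                "<html><body>",
                "<p>Some page text</p>",
                "<script type=\"text/javascript\">",
                "var dc_UnitID = 14;",
                "var dc_PublisherID = 22684;",
                "</script>",
                "</body></html>");
        // Prints only the script block; the surrounding HTML is dropped.
        extract(html).forEach(System.out::println);
    }
}
```

Because the match covers everything between the tags, it does not matter how many lines the block contains.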
Hi SprudeVI

Thanks for your code. Can you please tell me how I can keep the whole JavaScript block in my txt file?

<script type="text/javascript">

var dc_UnitID = 14;
var dc_PublisherID = 22684;
var dc_AdLinkColor = 'blue';
var dc_isBoldActive= 'no';
var dc_adprod='ADL';

(More lines may be added here. I do not know exactly how many lines are inside the tag. I just want to keep everything inside the script tag in the txt file.)

</script>


Can you please tell me how I can do this? Code, please.

Thanks in advance
Hi SprudeVI

Following your advice, I am now using the free Guardian website for the crossword program. It is legal. The web address is http://www.guardian.co.uk/crossword/free/interactive.

First, I downloaded the source code. Then I tried to use the code you provided. However, I noticed that the crossword clues are not static. They change every day.

I want to keep everything from the start of the applet to its end: <applet> ... </applet>. How can I do this?

Please reply with code (urgent, please help).

thanks

The code I posted above already takes the changing crossword clues into account. For that, I need the calendar (line 12).
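
As a rough illustration of how a date can feed into the download address: the data-file path quoted earlier in this thread contains the date twice, so one day's path could be built as sketched below. The class name DailyPuzzlePath is hypothetical, and the sketch assumes the pattern seen in that single example path (2008-03-26) also holds for other days, which has not been verified.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper: builds the dated data-file path for one day. */
final class DailyPuzzlePath {

    private static final DateTimeFormatter MONTH =
            DateTimeFormatter.ofPattern("yyyy-MM", Locale.ROOT);
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ROOT);

    /** Assumed pattern: /daily-crossword/puzzles/<year-month>/dc1-<year-month-day>.bin */
    static String forDate(LocalDate date) {
        return "/daily-crossword/puzzles/" + date.format(MONTH)
                + "/dc1-" + date.format(DAY) + ".bin";
    }

    public static void main(String[] args) {
        // prints /daily-crossword/puzzles/2008-03/dc1-2008-03-26.bin
        System.out.println(forDate(LocalDate.of(2008, 3, 26)));
    }
}
```

Passing LocalDate.now() instead of a fixed date would give the path for the current day under the same assumption.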
I forgot to add the code:
 
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class LoadCrossword {
    private static final String OUTPUT_FILE = "C:/javatest/c1.txt";

    public static void main(String[] args) throws Exception {
        PrintWriter writer = new PrintWriter(OUTPUT_FILE);
        // writer = new PrintWriter(new OutputStreamWriter(System.out));
        writer.println("<applet archive=\"/external/crossword/java/puzzle.jar\" name=\"puzzle\" code=\"crossword/Crossword.class\" width=\"640\" height=\"342\">");
        writer.println("<param name=\"pixels\" value=\"21\">");
        writer.println("<param name=\"gridwidth\" value=\"15\">");
        writer.println("<param name=\"gridheight\" value=\"15\">");
        // the param values after this point change every day
        writer.println("</applet>");
        writer.flush();
        writer.close();
    }
}
 


SprudeVI

I don't understand what you mean in your comment above.
I want to read the applet code below from a txt file, like this:

document.writeln('<applet archive="/external/crossword/java/puzzle.jar" name="puzzle" code="crossword/Crossword.class" width="640" height="324">');
document.writeln('<param name="pixels" value="21">');

document.writeln('<param name="across" VALUE ="1,A1,7;5,A10,6;9,C1,8;10,C10,6;12,E4,12;15,G1,10;17,G13,3;19,I1,3;20,I6,10;22,K1,12;26,M1,6;27,M8,8;28,O1,6;29,O9,7;">'); // here VALUE is not static

document.writeln('<param name="down" VALUE  ="1,A1,4;2,A3,4;3,A5,8; 18,H11,8;21,J5,6;">'); // here VALUE is not static

document.writeln('<param name="solutions" VALUE="holywararnoldaasparadaa">'); // here VALUE is not static
      
document.writeln('<param name="clue1" value="194352|1|1|1|A struggle to maintain one\'s faith|4,3|">');
      
document.writeln('<param name="clue2" value="194353|5|1|5|A service attended by ancient head of Rugby|6|">');

document.writeln ...
document.writeln ...
document.writeln ...

[these clues are not static; they change every time]


</applet>

My code is:

import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class LoadCrossword {
    private static final String OUTPUT_FILE = "C:/javatest/crossword.txt";

    public static void main(String[] args) throws Exception {
        PrintWriter writer = new PrintWriter(OUTPUT_FILE);
        // writer = new PrintWriter(new OutputStreamWriter(System.out));
        writer.println("<applet archive=\"/external/crossword/java/puzzle.jar\" name=\"puzzle\"   code=\"crossword/Crossword.class\" width=\"640\" height=\"342\">");
        writer.println("<param name=\"pixels\" value=\"21\">");
        writer.println("<param name=\"gridwidth\" value=\"15\">");
        writer.println("<param name=\"gridheight\" value=\"15\">");
        // from here param value is not static.
        writer.println("</applet>");
        writer.flush();
        writer.close();
    }
}

How can I do this? Please reply with code (urgent, please help).

thanks
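
A sketch of one way the applet markup could be read back from the saved text, assuming each piece of markup sits inside a single-quoted document.writeln('...') call as in the lines quoted above. The class name AppletMarkupReader is hypothetical. It unwraps each call, undoes the \' escaping, and keeps the lines from the opening applet tag through the closing one, so the changing param values are copied as they are found rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: recovers applet markup from document.writeln lines. */
final class AppletMarkupReader {

    // Captures the single-quoted argument of document.writeln('...');
    // a backslash-escaped character such as \' may appear inside it.
    private static final Pattern WRITELN = Pattern.compile(
            "document\\.writeln\\s*\\(\\s*'((?:\\\\.|[^'\\\\])*)'\\s*\\)");

    /** Returns the markup lines from the opening applet tag to the closing one. */
    static List<String> appletLines(List<String> sourceLines) {
        List<String> result = new ArrayList<>();
        boolean inside = false;
        for (String line : sourceLines) {
            Matcher m = WRITELN.matcher(line);
            // Unwrap a writeln call; otherwise fall back to the raw line.
            String markup = m.find() ? m.group(1).replace("\\'", "'") : line.trim();
            if (!markup.startsWith("<")) {
                continue; // not markup, e.g. other JavaScript statements
            }
            String lower = markup.toLowerCase(Locale.ROOT);
            if (!inside && lower.contains("<applet")) {
                inside = true;
            }
            if (inside) {
                result.add(markup);
                if (lower.contains("</applet>")) {
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "document.writeln('<applet archive=\"/external/crossword/java/puzzle.jar\" name=\"puzzle\" code=\"crossword/Crossword.class\" width=\"640\" height=\"324\">');",
                "document.writeln('<param name=\"pixels\" value=\"21\">');",
                "document.writeln('</applet>');");
        appletLines(source).forEach(System.out::println);
    }
}
```

The source lines could come from the BufferedReader loop already used to download the page, and the returned lines could be written out with the PrintWriter from LoadCrossword.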
Hi SprudeVI

Can you please check my question above? Please reply with code (urgent, please help).