• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 274

Java program for feed is not working

http://www.experts-exchange.com/Programming/Languages/Java/Q_26961318.htm

I am trying the solution that for_yan suggested in the question linked above:
 static String [] urls =  {"http://newsrack.in/crawled.feeds/frontline.rss.xml","http://rss.cnn.com/rss/cnn_topstories.rss", "http://viralpatel.net/blogs/feed"};

With the link "http://newsrack.in/crawled.feeds/frontline.rss.xml" it does not remove the duplicates, but for the other 2 links it removes the duplicates. So if we run the feed check every 2 seconds, everything is printed 2 times.

I am confused why that is. I have checked the feeds and they look the same, and we are writing them into the file. Can you please let me know what the issue with the program is?
Asked by: vkchaitu82

2 Solutions
 
for_yanCommented:
What do you mean by every 2 seconds? Do you check the feed every 2 seconds?
 
for_yanCommented:
Maybe you mean every 2 minutes?
 
for_yanCommented:
Our initial interval was every 10 minutes:

Thread.currentThread().sleep(600000);

How often do you check it now?
 
for_yanCommented:


The reason for the problem with this feed is that it returns links with an extra space before the link. Because we use a StringTokenizer when reading from the file, that extra space gets automatically stripped by the tokenizer, so the link goes into the file without the space. But the next time the link comes from the feed, it again contains the space and does not match what is in the file. I fixed it by adding .trim() to the link as it comes from the feed.
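A minimal, self-contained demonstration of the mismatch (the URL here is just a placeholder):

```java
import java.util.StringTokenizer;

public class TrimDemo {
    public static void main(String[] args) {
        // The feed returns the link with a leading space
        String fromFeed = " http://example.com/item";

        // When the visited file is read back, StringTokenizer strips the
        // whitespace, so the stored link has no leading space
        StringTokenizer t = new StringTokenizer("1234 " + fromFeed.trim());
        t.nextToken(); // skip the timestamp
        String fromFile = t.nextToken();

        System.out.println(fromFile.equals(fromFeed));        // false: leading space differs
        System.out.println(fromFile.equals(fromFeed.trim())); // true once the feed link is trimmed
    }
}
```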

This is the code (the corrections are the two added .trim() calls: one where the link is read back from the file, and, most importantly, one on the link as it comes from the feed):


import java.io.*;
import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.StringTokenizer;

import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class Reader {

    static String[] urls = {
        "http://rss.cnn.com/rss/cnn_topstories.rss",
        "http://www.nytimes.com/services/xml/rss/nyt/World.xml",
        "http://www.nytimes.com/services/xml/rss/nyt/US.xml",
        "http://newsrack.in/crawled.feeds/frontline.rss.xml"
    };

    public static void main(String[] args) throws Exception {

        XmlReader reader = null;

        try {
            while (true) {

                long now = new Date().getTime();
                System.out.println("checking at " + new Date(now));
                long week_before = now - (24L * 3600L * 7L * 1000L);

                ArrayList<String> list = new ArrayList<String>();

                // BufferedReader replaces the deprecated DataInputStream.readLine()
                BufferedReader in = new BufferedReader(new FileReader("C:\\temp\\test\\visited.txt"));
                PrintStream psout = new PrintStream(new FileOutputStream("C:\\temp\\test\\visited1.txt"));

                String buff;
                while ((buff = in.readLine()) != null) {
                    StringTokenizer t = new StringTokenizer(buff);
                    String ttime = t.nextToken();
                    if (Long.parseLong(ttime) < week_before) continue; // expire entries older than a week
                    String llink = t.nextToken().trim();
                    list.add(llink);
                    psout.println(ttime + " " + llink);
                }

                for (int jj = 0; jj < urls.length; jj++) {
                    URL url = new URL(urls[jj]);
                    reader = new XmlReader(url);
                    SyndFeed feed = new SyndFeedInput().build(reader);

                    for (Iterator i = feed.getEntries().iterator(); i.hasNext();) {
                        SyndEntry entry = (SyndEntry) i.next();
                        String title = entry.getTitle();
                        // trim() is the key fix: this feed returns the link with a leading space
                        String link = entry.getUri().trim();
                        if (list.contains(link)) continue;
                        Date date = entry.getPublishedDate();
                        String description;
                        if (entry.getDescription() == null) {
                            description = "";
                        } else {
                            description = entry.getDescription().getValue();
                        }
                        String cleanDescription =
                            description.replaceAll("\\<.*?>", "").replaceAll("\\s+", " ");
                        System.out.println(title);
                        System.out.println(link);
                        System.out.println(cleanDescription);
                        System.out.println("");
                        System.out.println("");
                        psout.println(now + " " + link);
                    }
                }

                in.close();
                psout.close();

                // replace the old visited file with the refreshed copy
                File f0 = new File("C:\\temp\\test\\visited.txt");
                File f1 = new File("C:\\temp\\test\\visited1.txt");
                f0.delete();
                f1.renameTo(f0);

                Thread.sleep(600000); // check every 10 minutes
            }
        } finally {
            if (reader != null)
                reader.close();
        }
    }
}

 
objectsCommented:
You don't actually check for duplicates; an RSS feed can quite correctly repeat the same URL.

I have made the changes you need and cleaned up that code a bit for you.

Let me know if you have any questions.

import java.net.URL;
import java.util.Date;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class Reader1 {

	static String[] urls = {
	 "http://newsrack.in/crawled.feeds/frontline.rss.xml",
	 "http://rss.cnn.com/rss/cnn_topstories.rss",
	"http://viralpatel.net/blogs/feed" };

	public static void main(String[] args) throws Exception {

		Set<String> uris = new HashSet<String>();
		
		// URL url = new URL("http://viralpatel.net/blogs/feed");

		// URL url = new URL("http://rss.cnn.com/rss/cnn_topstories.rss");
		XmlReader reader = null;

		try {

			while (true) {

				long now = (new java.util.Date()).getTime();

				System.out.println("checking at " + (new Date(now)).toString());

				for (int jj = 0; jj < urls.length; jj++) {
					URL url = new URL(urls[jj]);
					reader = new XmlReader(url);
					SyndFeed feed = new SyndFeedInput().build(reader);

					for (Iterator<?> i = feed.getEntries().iterator(); i.hasNext();) {
						SyndEntry entry = (SyndEntry) i.next();
						String title = entry.getTitle();
						String link = entry.getUri();
						if (!uris.contains(link)) {
							uris.add(link);
							Date date = entry.getPublishedDate();
							String description;
							if (entry.getDescription() == null) {
								description = "";
							} else {
								description = entry.getDescription().getValue();
							}
							String cleanDescription = description.replaceAll(
									"\\<.*?>", "").replaceAll("\\s+", " ");
							System.out.println(date);
							System.out.println(title);
							System.out.println(link);
							System.out.println(cleanDescription);
	
							System.out.println("");
							System.out.println("");
						}
					}

				}
				Thread.sleep(6000);

			}

		} finally {
			if (reader != null)
				reader.close();
		}
	}
}
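The dedup here relies on java.util.HashSet, which stores each URI only once; a minimal illustration:

```java
import java.util.HashSet;
import java.util.Set;

public class DedupDemo {
    public static void main(String[] args) {
        Set<String> uris = new HashSet<String>();
        // add() returns true only the first time a value is seen,
        // so add() alone is enough to filter repeats
        System.out.println(uris.add("http://example.com/a")); // true
        System.out.println(uris.add("http://example.com/a")); // false: duplicate
        System.out.println(uris.size());                      // 1
    }
}
```

Note that, unlike the file-based version earlier in the thread, this set lives only in memory and is never expired, so it grows for as long as the program runs.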

 
objectsCommented:
If you have got the URLs in a file, then you can just use the following to read the lines into a list:
http://helpdesk.objects.com.au/java/how-do-i-read-a-text-file-line-by-line-into-a-list

nice and simple :)
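As a sketch of that approach (the file name urls.txt and the helper name readLines are examples, not necessarily the code behind the link):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FileLines {

    // Reads a text file line by line into a List<String>, trimming each line
    public static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line.trim());
            }
        } finally {
            in.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String url : readLines("urls.txt")) {
            System.out.println(url);
        }
    }
}
```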
