Solved

Sorting of multiple log files

Posted on 2002-04-16
Last Modified: 2010-03-31
Let's assume the case with two files.

File1.txt
911     Wed Oct 17 12:33:44 2001     2
913     Wed Oct 17 12:44:43 2001     4

File2.txt
912     Wed Oct 17 12:36:44 2001     3
914     Wed Oct 17 12:54:43 2001     5

I would first like to read these two files into a buffer and then run a sorting algorithm (preferably quicksort or anything faster) to sort their lines by timestamp. The sorted lines should be stored in the output file:

Output.txt
911     Wed Oct 17 12:33:44 2001     2
912     Wed Oct 17 12:36:44 2001     3
913     Wed Oct 17 12:44:43 2001     4
914     Wed Oct 17 12:54:43 2001     5

There can be cases where more than two files are fed to this Java class. How should I sort these files? Should I convert the date to some specific format inside the code itself for easier comparison?
Question by:alwayshunk
 
LVL 16

Expert Comment

by:heyhey_
> Should I convert the date in some specific format inside the code itself for easier comparison.

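If you use timestamps like
20011017125443
(year, month, day, hour, minute, second), it will be much easier to compare them, because plain string or numeric comparison then matches chronological order.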
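A minimal sketch of that conversion, assuming the timestamps always look like "Wed Oct 17 12:33:44 2001" with English day and month names. The class name TimestampKey and the method toSortableKey are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class TimestampKey {
    // Converts a log timestamp into a yyyyMMddHHmmss key.
    // Keys of this shape sort chronologically when compared as plain strings.
    static String toSortableKey(String logTimestamp) {
        // HH is the 24-hour clock; Locale.US is assumed for "Wed" and "Oct".
        SimpleDateFormat in = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US);
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        try {
            Date parsed = in.parse(logTimestamp);
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected timestamp: " + logTimestamp, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSortableKey("Wed Oct 17 12:54:43 2001")); // prints 20011017125443
    }
}
```

Both formatters use the default time zone, so the key reflects the same wall-clock time that appears in the log line.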
 
LVL 3

Expert Comment

by:saxaboo
The main issue here is to find a way not to code the quicksort, although you'll find it in virtually every computer science course ...
Java implements the mergesort algorithm for collections. All you have to do is make sure that the elements in your collection implement the Comparable interface.

The idea is :

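import java.util.Date;

public class LogEntry implements Comparable
{
     // getters/setters would be nicer ...
     // anyway, you get the idea
     public int mID;
     public Date mDate;
     public int mSeverity;

    // Orders entries chronologically. The cast is needed because the
    // parameter is declared as Object.
    public int compareTo(Object pOtherEntry)
    {
         return mDate.compareTo(((LogEntry) pOtherEntry).mDate);
    }
}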


Now in your main program :
- read the files and build a java.util.List (using the ArrayList class, for instance) containing all your entries. Let's call it theList

Now sort it :
java.util.Collections.sort(theList);

=> your list is sorted ! Isn't oo-programming great ?

Hope this helps,

-S
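For readers who want to see the Comparable idea end to end, here is a rough sketch using generics. It assumes each line has exactly seven whitespace-separated fields (id, five timestamp parts, last column). The names LogLine, SortLogLines, parse and sorted are invented for this example, and the sample lines are hard-coded instead of being read from files.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;

class LogLine implements Comparable<LogLine> {
    final int id;
    final Date date;
    final int lastColumn;
    final String text; // original line, kept so it can be written out unchanged

    LogLine(int id, Date date, int lastColumn, String text) {
        this.id = id;
        this.date = date;
        this.lastColumn = lastColumn;
        this.text = text;
    }

    // Parses a line such as "911     Wed Oct 17 12:33:44 2001     2".
    static LogLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 7) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        String stamp = String.join(" ", f[1], f[2], f[3], f[4], f[5]);
        try {
            Date d = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US).parse(stamp);
            return new LogLine(Integer.parseInt(f[0]), d, Integer.parseInt(f[6]), line);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected timestamp: " + stamp, e);
        }
    }

    // Natural order is chronological order.
    @Override
    public int compareTo(LogLine other) {
        return date.compareTo(other.date);
    }
}

class SortLogLines {
    // Parses every raw line, then sorts the entries by timestamp.
    static List<LogLine> sorted(List<String> rawLines) {
        List<LogLine> entries = new ArrayList<>();
        for (String raw : rawLines) {
            entries.add(LogLine.parse(raw));
        }
        Collections.sort(entries);
        return entries;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "913     Wed Oct 17 12:44:43 2001     4",
                "912     Wed Oct 17 12:36:44 2001     3",
                "914     Wed Oct 17 12:54:43 2001     5",
                "911     Wed Oct 17 12:33:44 2001     2");
        for (LogLine entry : sorted(raw)) {
            System.out.println(entry.text);
        }
    }
}
```

In a real program the raw lines would come from reading each input file in turn into the same list before sorting.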

 

Author Comment

by:alwayshunk
I'll convert the date to the format you suggested. But I still need to read multiple files and then sort their lines in ascending order. Any ideas?
 
LVL 2

Expert Comment

by:CSuvendra
Use TreeMap. Here is an Example. Pls. check for performance with regards to your requirement. I can post the full code if you are not clear about something.

/* Create TreeMap */
TreeMap tmp = new TreeMap();

/* Add Values */
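SimpleDateFormat sdf = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US); /* HH = 24-hour clock */

/* For each file, each Record */
/* s1 is input line from File(s); the offsets assume the fixed-width layout shown in the question */
Date dt = sdf.parse(s1.substring(8, 32));
tmp.put(dt, s1); /* note: a later line with the same timestamp replaces an earlier one */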

/* Now write the collection to file */

Iterator i = ((Collection)tmp.values()).iterator();
while(i.hasNext()) // Write to File here
    System.out.println(""+i.next());

/* Here is the output */
911     Wed Oct 17 12:33:44 2001     2
912     Wed Oct 17 12:36:44 2001     3
913     Wed Oct 17 12:44:43 2001     4
914     Wed Oct 17 12:54:43 2001     5
 

Author Comment

by:alwayshunk
Suvendra, your solution sounds interesting. I would appreciate it if you could pass me the complete code, starting from reading the files, putting them in a buffer, changing the timestamp format and sorting. Thanks in advance.
 

Expert Comment

by:jodear
> Should I convert the date in some specific format inside the code itself for easier comparison.

You can convert each line of text in your log file into a Date object so that it would be sortable using saxaboo's idea above.

To convert your text to a Date object, substr or StringTokenize the line of text so that you only get the date-time part as follows:

"Wed Oct 17 12:33:44 2001"

then use the Date.parse() method, which returns the time in milliseconds, and wrap the result in a Date as follows:

Date td = new Date(Date.parse("Wed Oct 17 12:33:44 2001"));

It's deprecated, though it still works better than the DateFormat.parse() method.  The Date object td can now be inserted/appended to your list.  Do this for each line of text in your log files, then sort your list.
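The tokenizing step mentioned above could look roughly like this. It is only a sketch, assuming the first token is the id and the next five tokens form the timestamp. TimestampField and extract are hypothetical names.

```java
import java.util.StringTokenizer;

class TimestampField {
    // Returns the date-time part of a log line, e.g. "Wed Oct 17 12:33:44 2001".
    // Throws NoSuchElementException if the line has fewer than six tokens.
    static String extract(String line) {
        StringTokenizer tokens = new StringTokenizer(line); // splits on whitespace
        tokens.nextToken(); // skip the leading id column
        StringBuilder stamp = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            if (i > 0) {
                stamp.append(' ');
            }
            stamp.append(tokens.nextToken());
        }
        return stamp.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("911     Wed Oct 17 12:33:44 2001     2"));
    }
}
```

Tokenizing avoids depending on exact column offsets, so it keeps working if the id grows to four digits or the spacing changes.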
 
LVL 2

Expert Comment

by:CSuvendra
/* Syntax: java Test File1.txt File2.txt */

**********************************************
import java.io.*;
import java.util.*;
import java.text.*;

public class Test {
    public static void main(String [] args) {

          TreeMap tmp = new TreeMap();
          BufferedReader f1;
          String s1 = new String();
          SimpleDateFormat sdf = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US); // HH = 24-hour clock

          for (int j=0; j<args.length; j++) {
               try {
                    f1 = new BufferedReader(new FileReader(args[j]));
                    while((s1=f1.readLine()) != null) {
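                         // Offsets assume the fixed-width layout shown in the question.
                         Date dt = sdf.parse(s1.substring(8, 32));
                         // A later line with the same timestamp replaces an earlier one.
                         tmp.put(dt, s1);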
                    }

                    f1.close();

               } catch (IOException e) {
                    e.printStackTrace();
               } catch (ParseException ex) {
                    ex.printStackTrace();
               }
          }

          Iterator i = ((Collection)tmp.values()).iterator();

          while(i.hasNext()) {
               System.out.println(""+i.next());
          }
    }
}
 
LVL 2

Expert Comment

by:CSuvendra
alwayshunk
this question has moved into the 'Answered' zone because 'iodear' has proposed not a comment but an answer. Since the suggestions mentioned by him were already proposed by me and/ or 'saxaboo', I would urge you to Reject the proposed answer.

iodear
Your answer did not seem to add any value to previous posts since you did not give any new ideas. See the 'Tips on Comments and Answers' at the bottom of this page, if you are unsure of the meanings.
 
LVL 2

Expert Comment

by:Andrey_Kulik
Hi
you could merge the two log files (the records within each log file are already in sorted order). Sorting takes much more time than merging.

Code:

java.io.BufferedReader r1 = new java.io.BufferedReader(new java.io.FileReader("d:/a.txt"));
java.io.BufferedReader r2 = new java.io.BufferedReader(new java.io.FileReader("d:/b.txt"));
java.io.Writer w = new java.io.FileWriter("d:/merged.txt");
try
{
     if (r1.ready() && r2.ready())
     {
          java.text.DateFormat dateParser = new java.text.SimpleDateFormat("EEE MMMM d HH:mm:ss yyyy", java.util.Locale.US); // HH = 24-hour clock

          // initialization    
          String s1 = r1.readLine() + "\n";
          java.util.Date timeStamp1 = dateParser.parse(s1.substring(s1.indexOf(" "), s1.lastIndexOf(" ")).trim());
         
          String s2 = null;
          java.util.Date timeStamp2 = null;
         
          boolean flag = true;
         
          while (true)
          {
               if (flag)
               {
                    if (!r2.ready())
                         break;
                    s2 = r2.readLine() + "\n";
                    timeStamp2 = dateParser.parse(s2.substring(s2.indexOf(" "), s2.lastIndexOf(" ")).trim());
               } else {
                    if (!r1.ready())
                         break;
                    s1 = r1.readLine() + "\n";
                    timeStamp1 = dateParser.parse(s1.substring(s1.indexOf(" "), s1.lastIndexOf(" ")).trim());
               }
               w.write((flag = timeStamp1.after(timeStamp2)) ? s2 : s1);
          }

          w.write((flag) ? s1 : s2);
     }

     while (r1.ready())
          w.write(r1.readLine() + "\n");
     while (r2.ready())
          w.write(r2.readLine() + "\n");
} finally {
     r1.close();
     r2.close();
     w.close();
}

 

Author Comment

by:alwayshunk
Your answer did not seem to add any value to previous posts since you did not give any new ideas
 

Author Comment

by:alwayshunk
Suvendra, it seems to work fine, but I have some performance issues. Can I merge the two files and then sort the result? Any ideas? Moreover, if the same timestamp appears in both files, I need to sort those lines on the last column.
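One way to get the tie-break on the last column is to sort a list with a two-level comparator rather than use a TreeMap keyed by Date, since a TreeMap keeps only one value per distinct key. The following is a sketch under the assumption that every line has an id, a five-part timestamp and an integer last column. TieBreakSort and its methods are illustrative names only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.Locale;

class TieBreakSort {
    // Parses the five timestamp fields that follow the id column.
    static Date timestampOf(String line) {
        String[] f = line.trim().split("\\s+");
        String stamp = String.join(" ", Arrays.copyOfRange(f, 1, 6));
        try {
            return new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US).parse(stamp);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected timestamp in: " + line, e);
        }
    }

    // Reads the last whitespace-separated field as an int.
    static int lastColumnOf(String line) {
        String[] f = line.trim().split("\\s+");
        return Integer.parseInt(f[f.length - 1]);
    }

    // Sorts by timestamp first; equal timestamps are ordered by the last column.
    static List<String> sortLines(List<String> lines) {
        Comparator<String> byTime = Comparator.comparing(TieBreakSort::timestampOf);
        Comparator<String> order = byTime.thenComparingInt(TieBreakSort::lastColumnOf);
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(order);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "914     Wed Oct 17 12:54:43 2001     5",
                "920     Wed Oct 17 12:33:44 2001     7",
                "911     Wed Oct 17 12:33:44 2001     2");
        for (String line : sortLines(lines)) {
            System.out.println(line);
        }
    }
}
```

This version re-parses a line on every comparison, which keeps the sketch short. If performance matters, parsing each line once into an object and sorting those objects would avoid the repeated work.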
 
LVL 2

Expert Comment

by:Andrey_Kulik
alwayshunk, see my previous comment. My code merges two log files into one with time complexity O(n). Any sorter (for example TreeMap) has time complexity O(n*log n).
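The original question also mentions more than two input files. A possible generalization of the merge idea, not taken from the code in this thread, is a k-way merge that keeps one pending line per input in a java.util.PriorityQueue. The sketch below assumes each input is already sorted by timestamp and contains no blank lines or trailing spaces. KWayMerge, Head, next and merge are invented names, and the result is collected in a list only to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.PriorityQueue;

class KWayMerge {
    // The next unwritten line of one input, with its parsed timestamp.
    private static final class Head implements Comparable<Head> {
        final String line;
        final Date time;
        final BufferedReader source;

        Head(String line, Date time, BufferedReader source) {
            this.line = line;
            this.time = time;
            this.source = source;
        }

        @Override
        public int compareTo(Head other) {
            return time.compareTo(other.time);
        }
    }

    // Reads one line from the input, or returns null at end of input.
    private static Head next(BufferedReader reader, SimpleDateFormat format) {
        try {
            String line = reader.readLine();
            if (line == null) {
                return null;
            }
            // The timestamp sits between the first and the last space of the line.
            String stamp = line.substring(line.indexOf(' '), line.lastIndexOf(' ')).trim();
            return new Head(line, format.parse(stamp), reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected timestamp", e);
        }
    }

    // Merges any number of individually sorted inputs into one sorted list.
    static List<String> merge(List<BufferedReader> inputs) {
        SimpleDateFormat format = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US);
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (BufferedReader reader : inputs) {
            Head first = next(reader, format);
            if (first != null) {
                heap.add(first);
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            merged.add(smallest.line);
            Head refill = next(smallest.source, format); // pull the next line from the same input
            if (refill != null) {
                heap.add(refill);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        BufferedReader a = new BufferedReader(new StringReader(
                "911     Wed Oct 17 12:33:44 2001     2\n913     Wed Oct 17 12:44:43 2001     4\n"));
        BufferedReader b = new BufferedReader(new StringReader(
                "912     Wed Oct 17 12:36:44 2001     3\n914     Wed Oct 17 12:54:43 2001     5\n"));
        for (String line : merge(Arrays.asList(a, b))) {
            System.out.println(line);
        }
    }
}
```

With k inputs and n lines in total, each line passes through the heap once, so the work is roughly O(n*log k). With real files, each reader would wrap a FileReader and each merged line would be written out directly instead of being collected.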
 
LVL 2

Expert Comment

by:Andrey_Kulik
You could try all implementation on big log files... for best choice :)

Best regards
 

Author Comment

by:alwayshunk
Andrey, I am new to Java. Can you pass me the code as a class which I can compile and try out with two input files as parameters?
Thanks
 
LVL 9

Expert Comment

by:Venci75
Are the lines in each log file sorted?
 
LVL 2

Accepted Solution

by:
Andrey_Kulik earned 175 total points
java science.MergeLog logFile1 logFile2 mergedLog

package science;
/**
 * @author: Kulik Andrey
 */
public class MergeLog implements Cloneable {
private MergeLog() {
     super();
}
/**
 * Usage: logFile1 logFile2 mergedLogFile
 */
public static void main(java.lang.String[] args) throws java.io.IOException, java.text.ParseException {
     java.io.BufferedReader r1 = new java.io.BufferedReader(new java.io.FileReader(args[0]));
     java.io.BufferedReader r2 = new java.io.BufferedReader(new java.io.FileReader(args[1]));
     java.io.Writer w = new java.io.FileWriter(args[2]);
     try
     {
          if (r1.ready() && r2.ready())
          {
               java.text.DateFormat dateParser = new java.text.SimpleDateFormat("EEE MMMM d HH:mm:ss yyyy", java.util.Locale.US); // HH = 24-hour clock

               // initialization    
               String s1 = r1.readLine() + "\n";
               java.util.Date timeStamp1 = dateParser.parse(s1.substring(s1.indexOf(" "), s1.lastIndexOf(" ")).trim());
               
               String s2 = null;
               java.util.Date timeStamp2 = null;
               
               boolean flag = true;
               
               while (true)
               {
                    if (flag)
                    {
                         if (!r2.ready())
                              break;
                         s2 = r2.readLine() + "\n";
                         timeStamp2 = dateParser.parse(s2.substring(s2.indexOf(" "), s2.lastIndexOf(" ")).trim());
                    } else {
                         if (!r1.ready())
                              break;
                         s1 = r1.readLine() + "\n";
                         timeStamp1 = dateParser.parse(s1.substring(s1.indexOf(" "), s1.lastIndexOf(" ")).trim());
                    }
                    // if timeStamps are equals then sort on last column
                    if (timeStamp1.equals(timeStamp2))
                         flag = (Integer.parseInt(s1.substring(s1.lastIndexOf(" ")).trim()) > Integer.parseInt(s2.substring(s2.lastIndexOf(" ")).trim()));
                    else
                         flag = timeStamp1.after(timeStamp2);
                         
                    w.write((flag) ? s2 : s1);
               }

               w.write((flag) ? s1 : s2);
          }

          while (r1.ready())
               w.write(r1.readLine() + "\n");
          while (r2.ready())
               w.write(r2.readLine() + "\n");
     } finally {
          r1.close();
          r2.close();
          w.close();
     }
}
}
 

Author Comment

by:alwayshunk
Andrey
The sorting happens properly. But the output has some special characters if I open it in notepad. Can u tell me how to remove them.
 
LVL 2

Expert Comment

by:Andrey_Kulik
What are the special characters? (hex code)
 

Author Comment

by:alwayshunk
If I open it in Notepad, the lines do not start on new lines. They are just separated by a box-like character. If I open it in MS Word, it works fine and each line starts on a new line. The problem seems to be caused by "\n". It happens in Notepad.
 
LVL 2

Expert Comment

by:Andrey_Kulik
OK
I see :)

Please change the source:
1.

try
{
     String separator = System.getProperty("line.separator");
     if (r1.ready() && r2.ready())
....

2.
replace all '"\n"' strings with 'separator' variable
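An alternative to carrying a separator variable around is to let java.io.BufferedWriter end each line, since its newLine() method writes the platform's line separator. A small sketch, with PlatformNewlines and writeLines as made-up names:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

class PlatformNewlines {
    // Writes each line followed by the platform line separator
    // ("\r\n" on Windows, "\n" on Unix-like systems).
    static void writeLines(List<String> lines, Writer target) {
        try {
            BufferedWriter out = new BufferedWriter(target);
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
            out.flush(); // the caller remains responsible for closing the target
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        writeLines(Arrays.asList(
                "911     Wed Oct 17 12:33:44 2001     2",
                "912     Wed Oct 17 12:36:44 2001     3"), buffer);
        System.out.print(buffer);
    }
}
```

To write to a file, pass a java.io.FileWriter as the target and close it afterwards.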

Good luck
 

Author Comment

by:alwayshunk
Andrey

Bingo... I don't have extra points, otherwise I would surely have given you some.

Thanks a ton.
 
LVL 2

Expert Comment

by:Andrey_Kulik
:) not at all ...