  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 228

Sorting of multiple log files

Let's assume the case of two files.

File1.txt
911     Wed Oct 17 12:33:44 2001     2
913     Wed Oct 17 12:44:43 2001     4

File2.txt
912     Wed Oct 17 12:36:44 2001     3
914     Wed Oct 17 12:54:43 2001     5

I would first read these two files into a buffer and then run a sorting algorithm (preferably quicksort or anything faster) to sort the combined entries by timestamp. The sorted entries should be stored in the output file.

Output.txt
911     Wed Oct 17 12:33:44 2001     2
912     Wed Oct 17 12:36:44 2001     3
913     Wed Oct 17 12:44:43 2001     4
914     Wed Oct 17 12:54:43 2001     5

There can be a case where multiple files are fed to this Java class. How should I sort these files? Should I convert the date to some specific format inside the code itself for easier comparison?
Asked by: alwayshunk

heyhey_ Commented:
> Should I convert the date in some specific format inside the code itself for easier comparison.

If you use timestamps like
20011017125443
(year, month, day, hour, minute, second), it will be much easier to compare them.
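
A minimal sketch of that conversion, assuming the timestamp layout shown in the question ("Wed Oct 17 12:54:43 2001"). The class and method names are made up for illustration; only SimpleDateFormat comes from the JDK.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class TimestampKey {
    // Converts e.g. "Wed Oct 17 12:54:43 2001" into "20011017125443".
    static String toSortableKey(String logTimestamp) {
        // Assumed input layout, taken from the sample lines; HH is the 24-hour field.
        SimpleDateFormat in = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US);
        // Fixed-width, most-significant-field-first layout, so plain string
        // comparison orders the keys chronologically.
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        try {
            Date parsed = in.parse(logTimestamp);
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + logTimestamp, e);
        }
    }

    public static void main(String[] args) {
        String a = toSortableKey("Wed Oct 17 12:33:44 2001");
        String b = toSortableKey("Wed Oct 17 12:54:43 2001");
        // The earlier timestamp compares as the smaller string.
        System.out.println(a + " < " + b + " : " + (a.compareTo(b) < 0));
    }
}
```

Because every field has a fixed width and the fields run from year down to second, comparing two such keys with String.compareTo gives the same order as comparing the dates.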
 
saxaboo Commented:
The main issue here is how to avoid coding quicksort yourself, although you'll find it in virtually every computer science course ...
Java already implements a merge sort for collections (java.util.Collections.sort). All you have to do is make sure that the elements in your collection implement the Comparable interface.

The idea is:

import java.util.Date;

public class LogEntry implements Comparable<LogEntry>
{
     // getters/setters would be nicer ...
     // anyway, you get the idea
     public int mID;
     public Date mDate;
     public int mSeverity;

    // Order entries chronologically by their timestamp
    public int compareTo(LogEntry pOtherEntry)
    {
         return mDate.compareTo(pOtherEntry.mDate);
    }
}


Now, in your main program, read the files and build a java.util.List (using the ArrayList class, for instance) containing all your entries. Let's call it theList.

Now sort it:
java.util.Collections.sort(theList);

=> Your list is sorted! Isn't OO programming great?
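
Put together, the steps above could look roughly like the sketch below. It assumes each line has the form "id, five-token timestamp, severity" separated by whitespace, as in the sample files; SortLogEntries, Entry, parse and sortLines are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;

class SortLogEntries {
    // Hypothetical entry type: keeps the original line plus its parsed timestamp.
    static class Entry implements Comparable<Entry> {
        final String line;
        final Date date;

        Entry(String line, Date date) { this.line = line; this.date = date; }

        @Override
        public int compareTo(Entry other) { return date.compareTo(other.date); }
    }

    // Assumes exactly 7 whitespace-separated tokens: id, 5 timestamp tokens, severity.
    static Entry parse(String line, SimpleDateFormat format) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 7) throw new IllegalArgumentException("Unexpected line: " + line);
        String stamp = String.join(" ", Arrays.copyOfRange(fields, 1, 6));
        try {
            return new Entry(line, format.parse(stamp));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp in: " + line, e);
        }
    }

    // Builds theList from all lines, sorts it, and returns the lines in order.
    static List<String> sortLines(List<String> lines) {
        SimpleDateFormat format = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US);
        List<Entry> theList = new ArrayList<>();
        for (String line : lines) theList.add(parse(line, format));
        Collections.sort(theList);
        List<String> sorted = new ArrayList<>();
        for (Entry e : theList) sorted.add(e.line);
        return sorted;
    }

    // Usage: java SortLogEntries File1.txt File2.txt ...
    public static void main(String[] args) throws IOException {
        List<String> all = new ArrayList<>();
        for (String file : args) all.addAll(Files.readAllLines(Paths.get(file)));
        for (String line : sortLines(all)) System.out.println(line);
    }
}
```

Any number of files can be passed; they are simply concatenated before the single sort.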

Hope this helps,

-S

 
alwayshunk (Author) Commented:
I'll convert the date to the format you suggested. But I still need to combine the entries from multiple files and sort them in ascending order. Any ideas?
 
CSuvendra Commented:
Use a TreeMap. Here is an example. Please check whether the performance meets your requirements. I can post the full code if something is unclear.

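/* Create TreeMap (it keeps its keys in ascending order) */
TreeMap<Date, String> tmp = new TreeMap<Date, String>();

/* Date parser: HH is the 24-hour field; hh (1-12) would parse 12:33 as 00:33 */
SimpleDateFormat sdf = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US);

/* For each file, each record */
/* s1 is an input line from the file(s); characters 8..31 hold the */
/* timestamp in the sample lines, so adjust the offsets to your layout */
Date dt = sdf.parse(s1.substring(8, 32));
/* Note: a record whose timestamp equals an earlier key replaces that record */
tmp.put(dt, s1);

/* Now write the collection to file */

Iterator<String> i = tmp.values().iterator();
while(i.hasNext()) // Write to file here
    System.out.println(i.next());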

/* Here is the output */
911     Wed Oct 17 12:33:44 2001     2
912     Wed Oct 17 12:36:44 2001     3
913     Wed Oct 17 12:44:43 2001     4
914     Wed Oct 17 12:54:43 2001     5
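
One thing to watch with a TreeMap keyed on the timestamp: put() with a key that is already present replaces the stored value, so two records with the same timestamp collapse into one. A possible workaround, sketched here under the assumption of Java 8 or later and the whitespace-separated line layout from the question, is to store a list of lines per timestamp. GroupedTreeMapSort is an illustrative name.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

class GroupedTreeMapSort {
    // Returns the lines ordered by timestamp; lines sharing a timestamp are all
    // kept, in the order in which they were read.
    static List<String> sort(List<String> lines) {
        SimpleDateFormat sdf = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US);
        TreeMap<Date, List<String>> byTime = new TreeMap<>();
        for (String line : lines) {
            // Assumed layout: id, five timestamp tokens, last column.
            String[] fields = line.trim().split("\\s+");
            String stamp = String.join(" ", Arrays.copyOfRange(fields, 1, 6));
            Date key;
            try {
                key = sdf.parse(stamp);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Bad timestamp in: " + line, e);
            }
            // Create the bucket on first use, then append instead of overwriting.
            byTime.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        List<String> result = new ArrayList<>();
        for (List<String> group : byTime.values()) result.addAll(group);
        return result;
    }
}
```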
 
alwayshunk (Author) Commented:
Suvendra, your solution sounds interesting. I would appreciate it if you could pass me the complete code, starting from reading the files, putting them in a buffer, changing the timestamp format, and sorting. Thanks in advance.
 
jodear Commented:
> Should I convert the date in some specific format inside the code itself for easier comparison.

You can convert the timestamp on each line of your log file into a Date object so that the entries become sortable using saxaboo's idea above.

To convert your text to a Date object, use substring or a StringTokenizer on the line so that you only get the date-time part, as follows:

"Wed Oct 17 12:33:44 2001"

then use the Date.parse() method, which returns the time in milliseconds, and wrap the result in a Date:

Date td = new Date(Date.parse("Wed Oct 17 12:33:44 2001"));

Date.parse() is deprecated, but it still works and is more forgiving about the input format than DateFormat.parse(). The Date object td can now be appended to your list. Do this for each line of text in your log files, then sort your list.
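
For readers who would rather avoid the deprecated call, here is a rough sketch of a different route. It swaps in java.time (LocalDateTime and DateTimeFormatter, available from Java 8), which is not what this post uses, and it assumes the whitespace-separated layout of the sample lines. LogTimestamp is an illustrative name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Locale;

class LogTimestamp {
    // Pattern assumed from the sample lines, e.g. "Wed Oct 17 12:33:44 2001".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss yyyy", Locale.US);

    // Extracts and parses the timestamp from a full log line such as
    // "911     Wed Oct 17 12:33:44 2001     2".
    static LocalDateTime parse(String logLine) {
        String[] fields = logLine.trim().split("\\s+");
        if (fields.length != 7) {
            throw new IllegalArgumentException("Unexpected line: " + logLine);
        }
        // Tokens 1..5 are day name, month, day, time and year.
        String stamp = String.join(" ", Arrays.copyOfRange(fields, 1, 6));
        return LocalDateTime.parse(stamp, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse("911     Wed Oct 17 12:33:44 2001     2"));
    }
}
```

LocalDateTime is Comparable, so the parsed values can be sorted in the same way as Date objects.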
 
CSuvendra Commented:
/* Syntax: java Test File1.txt File2.txt */

**********************************************
import java.io.*;
import java.util.*;
import java.text.*;

public class Test {
    public static void main(String [] args) {

          // TreeMap keeps its keys (the parsed timestamps) in ascending order.
          // Note: a record whose timestamp equals an earlier key replaces it.
          TreeMap<Date, String> tmp = new TreeMap<Date, String>();
          BufferedReader f1;
          String s1;
          // HH is the 24-hour field; hh (1-12) would parse 12:33 as 00:33
          SimpleDateFormat sdf = new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US);

          for (int j=0; j<args.length; j++) {
               try {
                    f1 = new BufferedReader(new FileReader(args[j]));
                    while((s1=f1.readLine()) != null) {
                         // Characters 8..31 hold the timestamp in the sample
                         // lines; adjust the offsets to your layout
                         Date dt = sdf.parse(s1.substring(8, 32));
                         tmp.put(dt, s1);
                    }

                    f1.close();

               } catch (IOException e) {
                    e.printStackTrace();
               } catch (ParseException ex) {
                    ex.printStackTrace();
               }
          }

          // Walk the values in key (timestamp) order
          Iterator<String> i = tmp.values().iterator();

          while(i.hasNext()) {
               System.out.println(i.next());
          }
    }
}
 
CSuvendra Commented:
alwayshunk
This question has moved into the 'Answered' zone because 'iodear' has proposed an answer rather than a comment. Since his suggestions were already proposed by me and/or 'saxaboo', I would urge you to reject the proposed answer.

iodear
Your answer did not seem to add any value to the previous posts since you did not give any new ideas. See the 'Tips on Comments and Answers' at the bottom of this page if you are unsure of the meanings.
 
Andrey_Kulik Commented:
Hi,
you could merge the two log files instead (assuming the records within each log file are already in sorted order). Sorting takes much more time than merging.

Code:

java.io.BufferedReader r1 = new java.io.BufferedReader(new java.io.FileReader("d:/a.txt"));
java.io.BufferedReader r2 = new java.io.BufferedReader(new java.io.FileReader("d:/b.txt"));
java.io.Writer w = new java.io.FileWriter("d:/merged.txt");
try
{
     if (r1.ready() && r2.ready())
     {
          // HH is the 24-hour field; hh (1-12) would parse 12:33 as 00:33
          java.text.DateFormat dateParser = new java.text.SimpleDateFormat("EEE MMMM d HH:mm:ss yyyy", java.util.Locale.US);

          // initialization    
          String s1 = r1.readLine() + "\n";
          java.util.Date timeStamp1 = dateParser.parse(s1.substring(s1.indexOf(" "), s1.lastIndexOf(" ")).trim());
         
          String s2 = null;
          java.util.Date timeStamp2 = null;
         
          boolean flag = true;
         
          while (true)
          {
               if (flag)
               {
                    if (!r2.ready())
                         break;
                    s2 = r2.readLine() + "\n";
                    timeStamp2 = dateParser.parse(s2.substring(s2.indexOf(" "), s2.lastIndexOf(" ")).trim());
               } else {
                    if (!r1.ready())
                         break;
                    s1 = r1.readLine() + "\n";
                    timeStamp1 = dateParser.parse(s1.substring(s1.indexOf(" "), s1.lastIndexOf(" ")).trim());
               }
               w.write((flag = timeStamp1.after(timeStamp2)) ? s2 : s1);
          }

          w.write((flag) ? s1 : s2);
     }

     while (r1.ready())
          w.write(r1.readLine() + "\n");
     while (r2.ready())
          w.write(r2.readLine() + "\n");
} finally {
     r1.close();
     r2.close();
     w.close();
}

 
alwayshunk (Author) Commented:
Your answer did not seem to add any value to previous posts since you did not give any new ideas.
 
alwayshunk (Author) Commented:
Suvendra, it seems to work fine, but I have some performance issues. Can I merge the two files and then sort the result? Any ideas? Moreover, if the same timestamp appears in both files, I need to sort those entries on the last column.
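
The tie-break described here (timestamp first, then the last column) can be expressed as a comparison function. The following is only a sketch under the assumption that the last column is an integer and the lines follow the sample layout; TieBreakSort and its methods are illustrative names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

class TieBreakSort {
    // Parses the five timestamp tokens that follow the id column.
    static Date timestampOf(String line) {
        String[] fields = line.trim().split("\\s+");
        String stamp = String.join(" ", Arrays.copyOfRange(fields, 1, 6));
        try {
            return new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US).parse(stamp);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp in: " + line, e);
        }
    }

    // Assumes the last whitespace-separated token is an integer.
    static int lastColumnOf(String line) {
        String[] fields = line.trim().split("\\s+");
        return Integer.parseInt(fields[fields.length - 1]);
    }

    // Primary key: timestamp. Secondary key: last column.
    static int compareLines(String a, String b) {
        int byTime = timestampOf(a).compareTo(timestampOf(b));
        if (byTime != 0) return byTime;
        return Integer.compare(lastColumnOf(a), lastColumnOf(b));
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sort(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort(TieBreakSort::compareLines);
        return copy;
    }
}
```

This version re-parses each line on every comparison, which keeps the sketch short but is wasteful for large logs; parsing once into an entry object and comparing the parsed fields would be cheaper.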
 
Andrey_Kulik Commented:
alwayshunk, see my previous comment. My code merges two log files into one with time complexity O(n). Any sorter (for example TreeMap) has time complexity O(n log n).
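
Since the question mentions more than two files, the same merge idea can be extended to k inputs with a priority queue that holds one pending line per input, which costs O(n log k) overall. This is a sketch of that generalisation, not the code posted in this thread; it assumes every input is already sorted and uses the sample line layout. KWayMerge and Head are illustrative names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.PriorityQueue;

class KWayMerge {
    // The next unwritten line of one input, ordered by its timestamp.
    static class Head implements Comparable<Head> {
        final String line;
        final Date time;
        final Iterator<String> source;

        Head(String line, Iterator<String> source) {
            this.line = line;
            this.time = timestampOf(line);
            this.source = source;
        }

        @Override
        public int compareTo(Head other) { return time.compareTo(other.time); }
    }

    static Date timestampOf(String line) {
        String[] fields = line.trim().split("\\s+");
        String stamp = String.join(" ", Arrays.copyOfRange(fields, 1, 6));
        try {
            return new SimpleDateFormat("EEE MMM d HH:mm:ss yyyy", Locale.US).parse(stamp);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp in: " + line, e);
        }
    }

    // Each iterator must yield its lines in ascending timestamp order.
    static List<String> merge(List<Iterator<String>> inputs) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (Iterator<String> in : inputs) {
            if (in.hasNext()) heap.add(new Head(in.next(), in));
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();          // earliest pending line
            merged.add(smallest.line);
            if (smallest.source.hasNext()) {      // refill from the same input
                heap.add(new Head(smallest.source.next(), smallest.source));
            }
        }
        return merged;
    }
}
```

For real files, an iterator can be obtained from a BufferedReader via lines().iterator(), and the merged lines can be written out as they are polled instead of being collected in a list.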
 
Andrey_Kulik Commented:
You could try all the implementations on big log files and pick the best one :)

Best regards
 
alwayshunk (Author) Commented:
Andrey, I am new to Java. Can you pass me the code as a class that I can compile and try out with two input files as parameters?
Thanks
 
Venci75 Commented:
Are the lines in each log file sorted?
 
Andrey_Kulik Commented:
java science.MergeLog logFile1 logFile2 mergedLog

package science;
/**
 * @author: Kulik Andrey
 */
public class MergeLog implements Cloneable {
private MergeLog() {
     super();
}
/**
 * Usage: logFile1 logFile2 mergedLogFile
 */
public static void main(java.lang.String[] args) throws java.io.IOException, java.text.ParseException {
     java.io.BufferedReader r1 = new java.io.BufferedReader(new java.io.FileReader(args[0]));
     java.io.BufferedReader r2 = new java.io.BufferedReader(new java.io.FileReader(args[1]));
     java.io.Writer w = new java.io.FileWriter(args[2]);
     try
     {
          if (r1.ready() && r2.ready())
          {
               // HH is the 24-hour field; hh (1-12) would parse 12:33 as 00:33
               java.text.DateFormat dateParser = new java.text.SimpleDateFormat("EEE MMMM d HH:mm:ss yyyy", java.util.Locale.US);

               // initialization    
               String s1 = r1.readLine() + "\n";
               java.util.Date timeStamp1 = dateParser.parse(s1.substring(s1.indexOf(" "), s1.lastIndexOf(" ")).trim());
               
               String s2 = null;
               java.util.Date timeStamp2 = null;
               
               boolean flag = true;
               
               while (true)
               {
                    if (flag)
                    {
                         if (!r2.ready())
                              break;
                         s2 = r2.readLine() + "\n";
                         timeStamp2 = dateParser.parse(s2.substring(s2.indexOf(" "), s2.lastIndexOf(" ")).trim());
                    } else {
                         if (!r1.ready())
                              break;
                         s1 = r1.readLine() + "\n";
                         timeStamp1 = dateParser.parse(s1.substring(s1.indexOf(" "), s1.lastIndexOf(" ")).trim());
                    }
                    // if timeStamps are equals then sort on last column
                    if (timeStamp1.equals(timeStamp2))
                         flag = (Integer.parseInt(s1.substring(s1.lastIndexOf(" ")).trim()) > Integer.parseInt(s2.substring(s2.lastIndexOf(" ")).trim()));
                    else
                         flag = timeStamp1.after(timeStamp2);
                         
                    w.write((flag) ? s2 : s1);
               }

               w.write((flag) ? s1 : s2);
          }

          while (r1.ready())
               w.write(r1.readLine() + "\n");
          while (r2.ready())
               w.write(r2.readLine() + "\n");
     } finally {
          r1.close();
          r2.close();
          w.close();
     }
}
}
 
alwayshunk (Author) Commented:
Andrey
The sorting works properly, but the output shows some special characters when I open it in Notepad. Can you tell me how to remove them?
 
Andrey_Kulik Commented:
What are the special characters? (hex code)
 
alwayshunk (Author) Commented:
If I open it in Notepad, the lines do not start on new lines; they are just separated by a box-like character. If I open it in MS Word, it works fine and each line starts on a new line. The problem seems to be caused by "\n", and it only shows up in Notepad.
 
Andrey_Kulik Commented:
OK
I see :)

Please change the source:
1.

try
{
     String separator = System.getProperty("line.separator");
     if (r1.ready() && r2.ready())
....

2.
Replace every "\n" string with the separator variable.

Good luck
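
An equivalent way to get platform line endings, sketched here as an alternative to the separator variable, is BufferedWriter.newLine(), which writes the value of the line.separator system property. The example writes into a StringWriter so that it is self-contained; for the merge program a FileWriter would take its place. PlatformNewlines is an illustrative name.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;

class PlatformNewlines {
    // Writes each line followed by the platform line separator
    // ("\r\n" on Windows, "\n" on Unix-like systems).
    static String render(List<String> lines) {
        StringWriter buffer = new StringWriter();
        try (BufferedWriter w = new BufferedWriter(buffer)) {
            for (String line : lines) {
                w.write(line);
                w.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and flushed) by try-with-resources.
        return buffer.toString();
    }
}
```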
 
alwayshunk (Author) Commented:
Andrey

Bingo... I don't have extra points, otherwise I would surely have given you some.

Thanks a ton.
 
Andrey_Kulik Commented:
:) not at all ...