• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 373
  • Last Modified:

Help needed to make this collections method much efficient

 private java.util.List[] getURLList(java.util.List list) {
    Hashtable ht = new Hashtable();
    for (int i = 0; i < list.size(); i++) {
      String url = list.get(i).toString();
      String array[] = url.split("/");

      String domain = array[0];
      if (array[0].toLowerCase().startsWith("http")) {
        domain = array[2];
        domain = domain + array[4];
      }
      else {
        domain = domain + array[2];
      }

      ArrayList sublist = (ArrayList) ht.get(domain);
      if (sublist == null) {
        sublist = new ArrayList();
      }
      sublist.add(url);
      ht.put(domain, sublist);
    }
    return (List[]) Collections.list(ht.elements()).toArray(new ArrayList[0]);
  }


Thanks
Koundinya
0
sudhakar_koundinya
Asked:
sudhakar_koundinya
  • 62
  • 39
  • 16
  • +2
12 Solutions
 
CEHJCommented:
Probably lots you could do to optimise this (possibly at the expense of extensibility) but i'd approach it like this:

      private java.util.List[] getUrlList(java.util.List list) {
            Hashtable hostToUrls = new Hashtable();
            try {
                  for (int i = 0; i < list.size(); i++) {
                        URL url = new URL((String) list.get(i));
                        String hostKey = url.getHost();
                        // Could be a Set if no duplicates required
                        ArrayList urlsForHost = (ArrayList)hostToUrls.get(hostKey);
                        if (urlsForHost == null) {
                              urlsForHost = new ArrayList();
                              urlsForHost.add(url);
                              hostToUrls.put(hostKey, urlsForHost);
                        }
                        else {
                              urlsForHost.add(url);
                        }
                  }
            }
            catch (MalformedURLException e) {
                  return null;
            }
            int numberOfLists = hostToUrls.size();
            return (java.util.List[]) Collections.list(hostToUrls.elements()).toArray(new ArrayList[numberOfLists]);
      }
0
 
CEHJCommented:
I used Lists of type URL as they're more versatile, but you may want to use String
0
 
sudhakar_koundinyaAuthor Commented:
suppose if the list contains both http and https based urls then how can we handle them??
0
Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
CEHJCommented:
Not quite sure what you mean. They'd appear together under the same host list
0
 
sudhakar_koundinyaAuthor Commented:
list will be some thing like this


            al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0101.EML");
          al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0102.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0103.EML");
      al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0104.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Deleted Items/TEST Message 0056.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Deleted Items/TEST Message 0055.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0105.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0106.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0107.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0108.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0109.EML");
        al.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0110.EML");
        al.add("http://192.12.0.26/exchange/t1/Inbox/testmailfromAD.EML");
        al.add("http://192.12.0.3/exchange/vijaya/Inbox/testattachment.EML");
        al.add("https://192.12.0.38:403/exchange/Administrator/Inbox/TEST Message 01.EML");
        al.add("https://192.12.0.3/exchange/Administrator/Inbox/TEST Message 011.EML");
        al.add("https://192.12.0.3/exchange/Administrator/Inbox/TEST Message 012.EML");
        al.add("https://192.12.0.38:403/exchange/Administrator/Inbox/TEST Message 013.EML");
0
 
CEHJCommented:
(slightly unprettily printed, but you'll get the idea ;-))
0
 
sudhakar_koundinyaAuthor Commented:
following is the TestCase created with my method and your method by adding url.getProtocol() as extra

Now your  results are much faster

My Case : above 7 Seconds
Your Case : above 5 Seconds

Is it possible to reduce still??

Thanks
Sudha



import junit.framework.*;
import java.util.*;
import java.net.*;
import junit.extensions.*;

/**
 * A sample test case, testing <code>java.util.ArrayList</code>.
 *
 */
public class TestURLList extends TestCase {
      
      ArrayList vect=new ArrayList();
      public TestURLList(String name) {
            super(name);
      }

      public static void main (String[] args) {
            junit.textui.TestRunner.run (suite());
      }
      protected void setUp() {
      }

      public static Test suite() {
            return new TestSuite(TestURLList.class);
      }
      public void testURLS()
      {
            for(int i=0;i<100;i++)
            {
                  for(int j=0;j<1000;j++)
                  {
                        vect.add("http://Test"+i+"/"+j+"/"+(i+j));
                  }
            }
            //List[] array=getUrlList(vect); //7501 seconds
            List[] array=getURLList(vect); // 5258 seconds
      }

//7501
        private java.util.List[] getUrlList(java.util.List list) {
    Hashtable ht = new Hashtable();
    for (int i = 0; i < list.size(); i++) {
      String url = list.get(i).toString();
      String array[] = url.split("/");

      String domain = array[0];
      if (array[0].toLowerCase().startsWith("http")) {
        domain = array[2];
        domain = domain + array[4];
      }
      else {
        domain = domain + array[2];
      }

      ArrayList sublist = (ArrayList) ht.get(domain);
      if (sublist == null) {
        sublist = new ArrayList();
      }
      sublist.add(url);
      ht.put(domain, sublist);
    }
    return (List[]) Collections.list(ht.elements()).toArray(new ArrayList[0]);
  }

// 5258 milli seconds
   private static java.util.List[] getURLList(java.util.List list) {
          Hashtable hostToUrls = new Hashtable();
          try {
               for (int i = 0; i < list.size(); i++) {
                    URL url = new URL((String) list.get(i));
                    String hostKey = url.getHost();
                              String protocol=url.getProtocol();
                        //      System.err.println(url.getProtocol() );
                    // Could be a Set if no duplicates required
                    ArrayList urlsForHost = (ArrayList)hostToUrls.get(protocol+hostKey);
                    if (urlsForHost == null) {
                         urlsForHost = new ArrayList();
                         urlsForHost.add(url);
                         hostToUrls.put(protocol+hostKey, urlsForHost);
                    }
                    else {
                         urlsForHost.add(url);
                    }
               }
          }
          catch (MalformedURLException e) {
               return null;
          }
          int numberOfLists = hostToUrls.size();
          return (java.util.List[]) Collections.list(hostToUrls.elements()).toArray(new ArrayList[numberOfLists]);
     }
}
0
 
CEHJCommented:
I'll think about it. So you want to keep http and https separate?

>>domain = domain + array[4];

What makes you think the above will always be in range?
0
 
sudhakar_koundinyaAuthor Commented:
there is a possibility of giving urls some thing like this

192.12.0.39/exchange/MailBox/Mail.eml or
http://192.12.0.39/exchange/MailBox/Mail.eml
0
 
sudhakar_koundinyaAuthor Commented:
>> So you want to keep http and https separate?

Yes including domain
0
 
MogalManicCommented:
You can squeeze a few more milliseconds by removing the call to list.size() in the for loop.  The function is called every iteration in your loop, in your example that is 100,000 times (every time it is the same answer).

Change the for loop to the following:
final int listSize=list.size();
for (int i = 0; i < listSize; i++) {
...

You could try making all of the string delcarations 'final'.  This would allow the java compiler to make better optomization choices using temporary variables or registers.

Change the return type from List[] to Collection.  This would save the final iteration through your results to convert it to an array.  The results would be a Collection of Array lists.  Like this:
            return hostToUrls.values();  //Return the Set of values in Map
0
 
CEHJCommented:
>>The function is called every iteration in your loop

That's not the case in fact
0
 
sudhakar_koundinyaAuthor Commented:

>> Change the return type from List[] to Collection.

Friend,
If that is the case why I should write this method?? I could somply use either ArrayList or Vector. Thanks for helping me

How Efficiently I could change this line??
 return (java.util.List[]) Collections.list(hostToUrls.elements()).toArray(new ArrayList[numberOfLists]);

Thanks and Best Regards
Sudhakar

0
 
sudhakar_koundinyaAuthor Commented:
>>results would be a Collection of Array lists.

My apologies, I didn't read entire comment
0
 
sudhakar_koundinyaAuthor Commented:
Let me try your Idea
0
 
CEHJCommented:
Or just return a List:

      public final static java.util.List getUrlList2(java.util.List list) {
            Hashtable hostToUrls = new Hashtable();
            try {
                  for (int i = 0; i < list.size(); i++) {
                        URL url = new URL((String) list.get(i));
                        final String key = url.getHost() + url.getProtocol();
                        // Could be a Set if no duplicates required
                        ArrayList urlsForHost = (ArrayList) hostToUrls.get(key);
                        if (urlsForHost == null) {
                              urlsForHost = new ArrayList();
                              urlsForHost.add(url);
                              hostToUrls.put(key, urlsForHost);
                        }
                        else {
                              urlsForHost.add(url);
                        }
                  }
            }
            catch (MalformedURLException e) {
                  return null;
            }
            return Collections.list(hostToUrls.elements());
      }
0
 
sudhakar_koundinyaAuthor Commented:
I compqred your idea with my logic (modified version from CEHJ)

And here are the sample results

My code : 1484 milli seconds
Your  code : 1485 milli seconds

And I tested the codes  on  Pentium 4/256 MB Ram. running On multiple tests what I have identified is some times ur idea is fast and some times my idea is fast and that too a difference of few milli seconds. Maximum difference I have identified is +/- 25 milli seconds. This may be because of other processes that are running.  So I don't think returning as collection is right solution.

More or less we are doing the same in both codes. My idea converts the list into list array by calee  whereas ur idea converts the list into list array by caller

Here is the test case

import java.util.*;
import java.net.*;
public class ArrayTest
{
      


   private static Collection getURLList(java.util.List list) {
          Hashtable hostToUrls = new Hashtable();
          try {
               for (int i = 0; i < list.size(); i++) {
                    URL url = new URL((String) list.get(i));
                    String hostKey = url.getHost();
                         String protocol=url.getProtocol();
                    //     System.err.println(url.getProtocol() );
                    // Could be a Set if no duplicates required
                    ArrayList urlsForHost = (ArrayList)hostToUrls.get(protocol+hostKey);
                    if (urlsForHost == null) {
                         urlsForHost = new ArrayList();
                         urlsForHost.add(url);
                         hostToUrls.put(protocol+hostKey, urlsForHost);
                    }
                    else {
                         urlsForHost.add(url);
                    }
               }
          }
          catch (MalformedURLException e) {
               return null;
          }
          int numberOfLists = hostToUrls.size();
       
      return hostToUrls.values();

     }


  private static java.util.List[] getUrlList(java.util.List list) {
          Hashtable hostToUrls = new Hashtable();
          try {
               for (int i = 0; i < list.size(); i++) {
                    URL url = new URL((String) list.get(i));
                    String hostKey = url.getHost();
                         String protocol=url.getProtocol();
                    //     System.err.println(url.getProtocol() );
                    // Could be a Set if no duplicates required
                    ArrayList urlsForHost = (ArrayList)hostToUrls.get(protocol+hostKey);
                    if (urlsForHost == null) {
                         urlsForHost = new ArrayList();
                         urlsForHost.add(url);
                         hostToUrls.put(protocol+hostKey, urlsForHost);
                    }
                    else {
                         urlsForHost.add(url);
                    }
               }
          }
          catch (MalformedURLException e) {
               return null;
          }
          int numberOfLists = hostToUrls.size();
          return (java.util.List[]) Collections.list(hostToUrls.elements()).toArray(new ArrayList[numberOfLists]);
     }


public static void main(String s[])
{
   
      java.util.Date start=new java.util.Date();
ArrayList al=new ArrayList();

  for(int i=0;i<100;i++)
          {
               for(int j=0;j<1000;j++)
               {
                    al.add("http://Test"+i+"/"+j+"/"+(i+j));
               }
          }

/**
   Following is your Idea
*/
Collection c=getURLList(al);
Object[] array=c.toArray();
for(int i=0;i<array.length;i++)
{

      List array1=(List)array[i];
//      System.err.println("Array "+i+"\n");
      for(int j=0;j<array1.size();j++)
      {
            //System.err.println(array1.get(j));
      }
      
} // This had taken 1485 milli seconds


/*
    Following is My Idea
*/

/*List array[]=getUrlList(al);
for(int i=0;i<array.length;i++)
{

      List array1=(List)array[i];
//      System.err.println("Array "+i+"\n");
      for(int j=0;j<array1.size();j++)
      {
            //System.err.println(array1.get(j));
      }
      
}*/ //this had taken 1484 milli seconds

java.util.Date end=new java.util.Date();
System.err.println(end.getTime()-start.getTime());

}
}
0
 
CEHJCommented:
i.e. return

List<List<URL>>

(per my example)

or

List<List<String>>
0
 
sudhakar_koundinyaAuthor Commented:
CEHJ,

I think even ur second idea also gives the same results more or less.
0
 
CEHJCommented:
Obviously try running these things repeatedly and take averages
0
 
sudhakar_koundinyaAuthor Commented:
How Efficiently I could change this line??
 return (java.util.List[]) Collections.list(hostToUrls.elements()).toArray(new ArrayList[numberOfLists]);
0
 
sudhakar_koundinyaAuthor Commented:
>>List<List<URL>>

The current JDK Version is 1.4.2, So that Idea may not be helpful for me
0
 
CEHJCommented:
>>The current JDK Version is 1.4.2, So that Idea may not be helpful for me

No - i mean conceptually - not necessarily literally in code ;-)

What are your ideal return types?
0
 
sudhakar_koundinyaAuthor Commented:
>>What are your ideal return types?

I did not get you
0
 
CEHJCommented:
What types do you want returned from the method, i.e. what would you like to work with ideally, or does it not matter?
0
 
sudhakar_koundinyaAuthor Commented:
Basically I need to return List[]

If this is not possible, I am open to know other solutions. Main thing is I need to have optimized solution

At runtime, method  need to work with 10000 - 1000000 mail objects(URLs) depending on conditions.

0
 
sudhakar_koundinyaAuthor Commented:
It's time to sleep now (1.55 AM). I will discuss with you tomorrow from my office PC.

Good Night

0
 
sudhakar_koundinyaAuthor Commented:
BTW,

What is your full name and what is your Native Place?? Just Curious

I am from INDIA
0
 
sudhakar_koundinyaAuthor Commented:
Hi,

Back again
0
 
CEHJCommented:
Were you asking me that q?
0
 
sudhakar_koundinyaAuthor Commented:
nope just intimating
0
 
MogalManicCommented:
If you change your return type from List[] to Collection, you need to change how you parse the results.  Instead of:
  Collection c=getURLList(al);
  Object[] array=c.toArray();
  for(int i=0;i<array.length;i++)
  {...
do the following:
  Collection c=getURLList(al);
  for(Iterator it=c.Iterator();it.hasNext();)
  {
     ArrayList eachList=(ArrayList) it.next();
     ...Process eachList...


The iterator will process the collection w/o moving things around in memory (In a sense you are still iterating through the original HashTable).  If you use toArray() you are effectively using the same code when you were just returning List[].

0
 
CEHJCommented:
:-)
0
 
CEHJCommented:
Can you confirm this is returning the right results?

      public final static java.util.List getUrlList4(final java.util.List list) {
            Hashtable hostToUrls = new Hashtable(list.size());
            try {
                  for (int i = 0; i < list.size(); i++) {
                        URL url = new URL((String) list.get(i));
                        final String key = url.getHost() + url.getProtocol() + url.getFile().split("/")[1];
                        // Could be a Set if no duplicates required
                        ArrayList urlsForHost = (ArrayList) hostToUrls.get(key);
                        if (urlsForHost == null) {
                              urlsForHost = new ArrayList();
                              urlsForHost.add(url);
                              hostToUrls.put(key, urlsForHost);
                        }
                        else {
                              urlsForHost.add(url);
                        }
                  }
            }
            catch (MalformedURLException e) {
                  return null;
            }
            return Collections.list(hostToUrls.elements());
      }
0
 
sudhakar_koundinyaAuthor Commented:
url.getFile().split("/")[2]; do the job

But I believe it also reduces the performance as discussed  already here
http://www.experts-exchange.com/Programming/Programming_Languages/Java/Q_21109363.html

Thanks
Sudhakar
0
 
CEHJCommented:
As mentioned in another q., could you post the link to a sizeable chunk of test data?
0
 
CEHJCommented:
OK, with the (i guess smallish) sample of data you posted earlier, this seems to return the correct result and is about 5 times faster than the fastest one i tested so far (yours actually):

      public final static List getUrlList6(java.util.List urls) {
            Hashtable hostToUrls = new Hashtable();
            StringBuffer accountBuffer = new StringBuffer(32);
            for (int i = 0; i < urls.size(); i++) {
                  accountBuffer.setLength(0);
                  String url = (String) urls.get(i);
                  int forwardSlashCount = 0;
                  for (int ixChar = 4; forwardSlashCount < 5 && ixChar < url.length(); ixChar++) {
                        if (url.charAt(ixChar) == '/') {
                              forwardSlashCount++;
                        }
                        if (forwardSlashCount == 4) {
                              accountBuffer.append(url.charAt(ixChar));
                        }
                  }
                  // end for
                  accountBuffer.append(url.charAt(4));
                  final String key = accountBuffer.toString();
                  ArrayList urlsForHost = (ArrayList) hostToUrls.get(key);
                  if (urlsForHost == null) {
                        urlsForHost = new ArrayList();
                        urlsForHost.add(url);
                        hostToUrls.put(key, urlsForHost);
                  }
                  else {
                        urlsForHost.add(url);
                  }
            }
            return Collections.list(hostToUrls.elements());
      }
0
 
sudhakar_koundinyaAuthor Commented:
0
 
sudhakar_koundinyaAuthor Commented:
Thanks CEHJ,

didn't see the comment.

I will test that and let you know
0
 
sudhakar_koundinyaAuthor Commented:
I am seeing the changes you have done, trying to understand the code
0
 
CEHJCommented:
Sorry - it's almost completely uncommented. More or less the same as before apart from counting slashes by String index
0
 
sudhakar_koundinyaAuthor Commented:
Although the change  improved the performance

The code is failed for following test case  :(

array.add("http://192.12.0.38/exchange/Administrator/Inbox/TEST Message 0237.EML");
array.add("http://192.12.0.39/exchange/Administrator/Inbox/TEST Message 0237.EML");

0
 
CEHJCommented:
LOL sorry - i've forgotten in the meantime that the host is part of the key too!
0
 
sudhakar_koundinyaAuthor Commented:
Just wrote this code

I beg your help to check if this is giving fast results than previous methods at your end

public static final List getUrlList7(List l)
    {
          //http://192.12.0.62/exchange/Administrator
          Hashtable ht=new Hashtable();
          for(int i=0;i<l.size();i++)
          {
                String url=(String)l.get(i);
                int hashCount=0;
                StringBuffer key=new StringBuffer();
                for(int j=0;j<url.length();j++)
                {
                      char c=url.charAt(j);
                      key.append(c);
                      if(c=='/')
                            hashCount++;
                      if(hashCount==5)
                            break;
                      
                }
                List array=(List)ht.get(key.toString());
                if(array==null)
                {
                      array=new ArrayList();
                      ht.put(key.toString(),array);
                }
                array.add(url);
                
          }
          return Collections.list(ht.elements());
   
          
    }

thanks,
sudhakar
0
 
sudhakar_koundinyaAuthor Commented:
nope that is  bad code i think :( giving poor results
0
 
CEHJCommented:
I'll try it. In the meantime - try this:

      public final static List getUrlList8(java.util.List urls) {
            Hashtable hostToUrls = new Hashtable(urls.size());
            StringBuffer accountBuffer = new StringBuffer(32);
            for (int i = 0; i < urls.size(); i++) {
                  accountBuffer.setLength(0);
                  String url = (String) urls.get(i);
                  int forwardSlashCount = 0;
                  int hostHashCode = 0;
                  for (int ixChar = 4; forwardSlashCount < 5 && ixChar < url.length(); ixChar++) {
                        if (url.charAt(ixChar) == '/') {
                              forwardSlashCount++;
                        }
                        if (forwardSlashCount == 2) {
                              // Compute hashcode of host
                              hostHashCode = 31 * hostHashCode + url.charAt(ixChar);
                        }
                        else if (forwardSlashCount == 4) {
                              accountBuffer.append(url.charAt(ixChar));
                        }
                  }
                  // end for
                  accountBuffer.append(url.charAt(4)).append(hostHashCode);
                  final String key = accountBuffer.toString();
                  ArrayList urlsForHost = (ArrayList) hostToUrls.get(key);
                  if (urlsForHost == null) {
                        urlsForHost = new ArrayList();
                        urlsForHost.add(url);
                        hostToUrls.put(key, urlsForHost);
                  }
                  else {
                        urlsForHost.add(url);
                  }
            }
            return Collections.list(hostToUrls.elements());
      }
0
 
CEHJCommented:
>>nope that is  bad code i think :( giving poor results

Well it's (getUrlListSud2) about 3 times faster than your original one (getURLList):
Following results in milliseconds for 1000 invocations against the test data. If you're wondering what the '3' is it's just me checking that my method is returning the right number of lists ;-)


C:\java\tiger\lang>jr TestUrlLists 1000
Executing getURLList...
40208.0
Executing getUrlListSud2...
12739.0
Executing getUrlList8...
8481.0
3

C:\java\tiger\lang>jr TestUrlLists 1000
Executing getURLList...
39727.0
Executing getUrlListSud2...
13149.0
Executing getUrlList8...
8671.0
3

C:\java\tiger\lang>jr TestUrlLists 1000
Executing getUrlList8...
8572.0
Executing getUrlListSud2...
12437.0
Executing getURLList...
40459.0
3
0
 
sudhakar_koundinyaAuthor Commented:
Ok I willtry that

And I  have done a small change in above code

If my assumtion is correct, this should work fast. I need to check with Exchange administrator for this.

I am assuming exchange is common.

public static final List getUrlList7(List l)
    {
          //http://192.12.0.62/exchange/Administrator
          Hashtable ht=new Hashtable();
          for(int i=0;i<l.size();i++)
          {
                String url=(String)l.get(i);
                int hashCount=0;
                StringBuffer key=new StringBuffer();
                for(int j=4;j<url.length();j++)
                {
                      //char c=url.charAt(j);
                      key.append(url.charAt(j));
                      if(key.charAt(key.length()-1)=='/')
                            hashCount++;
                      if(hashCount==3)
                            j=j+8;
                      if(hashCount==5)
                            break;
                      
                }
                List array=(List)ht.get(key.toString());
                if(array==null)
                {
                      array=new ArrayList();
                      ht.put(key.toString(),array);
                }
                array.add(url);
                
          }
            Enumeration e= ht.keys();
         while(e.hasMoreElements())
          System.err.println(e.nextElement());
          return Collections.list(ht.elements());
   
          
    }
0
 
sudhakar_koundinyaAuthor Commented:
it  improved performance by 300 milli seconds at my side

for folllowing test case

 public Vector getURLS() {
        Vector vect=new Vector();
        for(int i=0;i<100;i++) {
            for(int j=0;j<1000;j++) {
                vect.add("http://Test"+i+"/"+j+"/"+(i+j));
            }
        }
        //List[] array=getUrlList(vect); //7501 seconds
       return vect;
}
0
 
sudhakar_koundinyaAuthor Commented:
i mean this

public Vector getURLS() {
        Vector vect=new Vector();
        for(int i=0;i<100;i++) {
            for(int j=0;j<1000;j++) {
                vect.add("http://Test"+i+"/exchange/"+(i+j));
            }
        }
0
 
sudhakar_koundinyaAuthor Commented:
looking at ur code

if (forwardSlashCount == 2) {
                       // Compute hashcode of host
                       hostHashCode = 31 * hostHashCode + url.charAt(ixChar);
                  }
                  else if (forwardSlashCount == 4) {
                       accountBuffer.append(url.charAt(ixChar));
                  }


what you are trying to do on above snippet??

0
 
sudhakar_koundinyaAuthor Commented:
>>hostHashCode = 31 * hostHashCode + url.charAt(ixChar);

will this be unique for both http and https based ??
0
 
CEHJCommented:
Your last is actually slower than your previous:

C:\java\tiger\lang>jr TestUrlLists 10000
Executing getUrlListSud3...
130687.0
Executing getUrlList8...
84761.0
Executing getUrlListSud2...
126582.0
Executing getURLList...
394578.0

>>what you are trying to do on above snippet??

First block takes the hashcode of the host name
Second block distinguishes between http & https

The variable named 'accountBuffer' should better be called 'keyBuffer'
0
 
sudhakar_koundinyaAuthor Commented:
could you please post your whole test code??

I wanted to test the same at my end also

thanks
Sudhakar
0
 
sudhakar_koundinyaAuthor Commented:
0
 
sudhakar_koundinyaAuthor Commented:
>>hostHashCode = 31 * hostHashCode + url.charAt(ixChar);

Why specifically this one?? OR can we use any other logic also?? in identifying the hashcode
0
 
CEHJCommented:
>>Why specifically this one??

That's actually the central code of String's hashCode()

>>could you please post your whole test code??

Well there's quite a lot of it, but here are the essentials:

http://www.cehjohnson.uklinux.net/aire/MethodTimer.java

, which is 'work in progress' the obvious thing that's missing (great if you can put it in) is that it should be able to return the function's return value to the caller, as well as the time (simple custom class required) and this is being called from main thus:

            Class[] methodTypes = new Class[]{java.util.List.class};
            Object[] methodArgs = new Object[]{al};
            // Put test method names here
            final String[] METHOD_NAMES = {"getUrlList8", "getUrlListSud9"};
            //final String[] METHOD_NAMES = {"testStringTokenizer", "testSplit"};
            Stack methodStack = new Stack();
            methodStack.addAll(Arrays.asList(METHOD_NAMES));
            Collections.shuffle((java.util.List) methodStack);
            double aggregatedTime = 0.0;
            // Number of invocations passed in on the command line
            final int NUMBER_OF_INVOCATIONS = Integer.parseInt(args[0]);
            while (!methodStack.isEmpty()) {
                  String currentMethodName = (String) methodStack.pop();
                  System.out.println("Executing " + currentMethodName + "...");
                  for (int i = 0; i < NUMBER_OF_INVOCATIONS; i++) {
                        aggregatedTime += MethodTimer.timeStaticMethod("TestUrlLists", currentMethodName, methodTypes, methodArgs);
                  }
                  System.out.println(aggregatedTime);
                  aggregatedTime = 0.0;
            }



0
 
sudhakar_koundinyaAuthor Commented:
it seems ur last code is only the final way. Anyhow I will  wait for some other experts inputs for some time. If nobody  participate here then the points will be ur's ;-)

thanks
sudhakar

 
0
 
objectsCommented:
I'll have a look when I get in the office tomorrow, whats the current state?
0
 
sudhakar_koundinyaAuthor Commented:
CEHJ gave me this code


     public final static List getUrlList8(java.util.List urls) {
          Hashtable hostToUrls = new Hashtable(urls.size());
          StringBuffer accountBuffer = new StringBuffer(32);
          for (int i = 0; i < urls.size(); i++) {
               accountBuffer.setLength(0);
               String url = (String) urls.get(i);
               int forwardSlashCount = 0;
               int hostHashCode = 0;
               for (int ixChar = 4; forwardSlashCount < 5 && ixChar < url.length(); ixChar++) {
                    if (url.charAt(ixChar) == '/') {
                         forwardSlashCount++;
                    }
                    if (forwardSlashCount == 2) {
                         // Compute hashcode of host
                         hostHashCode = 31 * hostHashCode + url.charAt(ixChar);
                    }
                    else if (forwardSlashCount == 4) {
                         accountBuffer.append(url.charAt(ixChar));
                    }
               }
               // end for
               accountBuffer.append(url.charAt(4)).append(hostHashCode);
               final String key = accountBuffer.toString();
               ArrayList urlsForHost = (ArrayList) hostToUrls.get(key);
               if (urlsForHost == null) {
                    urlsForHost = new ArrayList();
                    urlsForHost.add(url);
                    hostToUrls.put(key, urlsForHost);
               }
               else {
                    urlsForHost.add(url);
               }
          }
         

After discussing the different test cases. At this moment it seems to better to me
0
 
objectsCommented:
> At this moment it seems to better to me

how much improvement is that giving you?
0
 
objectsCommented:
Get rid of that Hashtable for a start and replace it with a HashMap.
And give your ArrayList an initial size (you could make it as big as the number of elements remaining in urls).
0
 
sudhakar_koundinyaAuthor Commented:
My first posting took  7  seconds

where as CEHJ's took 1.2 seconds currently for 20 Milliion URLs  grouping

Thanks
Sudhakar
0
 
CEHJCommented:
Almost twice as fast as the last one, but needs more testing for accuracy. We really need more data than the test data in the link:

public final static List getUrlList10(java.util.List urls) {
            Hashtable hostToUrls = new Hashtable(urls.size());
            for (int i = 0; i < urls.size(); i++) {
                  String url = (String) urls.get(i);
                  int forwardSlashCount = 0;
                  int hostPlusAccountPlusProtocolHashcode = 0;
                  for (int ixChar = 4; forwardSlashCount < 5 && ixChar < url.length(); ixChar++) {
                        if (url.charAt(ixChar) == '/') {
                              forwardSlashCount++;
                        }
                        if (forwardSlashCount == 2 || forwardSlashCount == 4) {
                              // Compute hashcode of host
                              hostPlusAccountPlusProtocolHashcode = 31 * hostPlusAccountPlusProtocolHashcode + url.charAt(ixChar);
                        }
                  }
                  // end for
                  if (url.charAt(4) == 's') {
                        hostPlusAccountPlusProtocolHashcode = ~hostPlusAccountPlusProtocolHashcode;
                  }
                  Integer key = new Integer(hostPlusAccountPlusProtocolHashcode);
                  ArrayList urlsForHost = (ArrayList) hostToUrls.get(key);
                  if (urlsForHost == null) {
                        urlsForHost = new ArrayList();
                        urlsForHost.add(url);
                        hostToUrls.put(key, urlsForHost);
                  }
                  else {
                        urlsForHost.add(url);
                  }
            }
            return Collections.list(hostToUrls.elements());
      }
0
 
sudhakar_koundinyaAuthor Commented:
On higher end, no doubt that will help me.

curently I am working on exchange server project in which the current task is to download the 25 mails per minute using slide

Now currently I am able to reach the target of 15/min.

So I am currently searching for the code optimizations

thanks
sudhakar

 
0
 
sudhakar_koundinyaAuthor Commented:
>> Get rid of that Hashtable for a start and replace it with a HashMap.

Just discussing the same here
http://www.experts-exchange.com/Programming/Programming_Languages/Java/Q_21111028.html
0
 
CEHJCommented:
>>where as CEHJ's took 1.2 seconds currently for 20 Milliion URLs  grouping

Can you try the last one on the 20,000,000 and see if the result is accurate?
0
 
objectsCommented:
>> Get rid of that Hashtable for a start and replace it with a HashMap.
> Just discussing the same here

It will be faster.

0
 
sudhakar_koundinyaAuthor Commented:
yeah that's true took 985 milli secs
0
 
objectsCommented:
specifying an initial capacity for the list will give you further improvements.
Give it as much as you can afford (<= number of elements left in list)
0
 
objectsCommented:
wheres the list coming from anyway?
and do you have any analysis on the typical breakdown of the data?

0
 
sudhakar_koundinyaAuthor Commented:
CEHJ on several test runs

What I have identified is ur last  code is almost double the spped.

>>wheres the list coming from anyway?

Acutally we are developing the project along with the client. client's programmers are developing their own API and we need to integrate the API with our's. And we did not get their API also yet for testing. So it is difficult to get the actual data.

So what i did is, I have created test environment on 10 machines and    testing my API

>> specifying an initial capacity for the list will give you further improvements.
 I will do that.  assuming 10,000 to 50,1000 urls for single list. How much initial capacity I can give to that??

Suppose if the list contains only 10 urls and if set the size to 100, will it occupy extra  memory??



Thanks
sudhakar
0
 
CEHJCommented:
How and where are you planning to set the capacity of the List?
0
 
sudhakar_koundinyaAuthor Commented:
Your last code is really a good Idea :-)
0
 
objectsCommented:
> Suppose if the list contains only 10 urls and if set the size to 100, will it occupy extra  memory??

Sure but your goal is to make it go as fast as you can isn't it, not to use less memory :)
0
 
objectsCommented:
that code does not seem to behave exactly the same as the requirements were specified earlier.
Exactly what is the possible formats that the code would need to handle, and how should they be broken down.

>  client's programmers are developing their own API and we need to integrate the API with
> our's. And we did not get their API also yet for testing. So it is difficult to get the actual data.

If you really need to squeeze your msec out of this, then you should be discussing with them to ensure they pass it to you in a form that is as ideal as possible for processing.

0
 
objectsCommented:
Earlier you said you could also recieve the format:
192.12.0.39/exchange/MailBox/Mail.eml

Is this still the case?

What is the definition of possible formats?
0
 
sudhakar_koundinyaAuthor Commented:
>>192.12.0.39/exchange/MailBox/Mail.eml

Sorry objects,  I forgot to mention that. This case was removed
0
 
CEHJCommented:
>>Your last code is really a good Idea :-)

Was that directed at me sudhakar? (sometimes i don't know which of us you're addressing ;-))
0
 
MogalManicCommented:
No matter how you tweek the same routine It is still going to bo O(n) (Since the routine is just a for loop of each item in the list, the speed processing the list is some factor times the size of the list).

If your server is on a multi CPU box, you could get factor of O((n/m)lg(n/m)) by spliting the list into a number of M sublists.  Where M is the number of CPU's.  Here is the pseudocode:

int listSize=list.size();
for (m=0; m<#ofCPU's;m++) {
  BuildThreadWorker worker[m]=BuildThreadWorker(list, listSize*m, listSize*m+listSize/m);
  Thread t= new Thread(worker[m]);
  t.start();
}

//wait for workers to complete
int done=false
while(!done) {
  done=true;
  for(int m=0;m<#ofCPU's;m++) {
    if (worker[m].isDone()==false) {
         done=false;
        Thread.yield();  //Allow threads to work
        break;//check again
   }
}
MergeResults(list, listSize/m);

class BuildThreadWorker implements Runnable
{
 
  BuildThreadWorkder(List list, int start, int end)
  {
     this.list=list;
     this.start=start;
     this.end;
  }

  public void run()
  {
      for(int i=start;i<end;i++) {
          //Modified merge sort or quicksort of sublist
      }
  }
}

For large lists, it might be faster doing some sort of divide and conquer method.
0
 
sudhakar_koundinyaAuthor Commented:
>>Was that directed at me sudhakar?

Yes :-)
0
 
CEHJCommented:
Actually, the last code i posted should not be used. It's late here - more later ;-)
0
 
objectsCommented:
Also make a best guess for the initial capacity and load factor used by your Map.
0
 
sudhakar_koundinyaAuthor Commented:
>> Actually, the last code i posted should not be used. It's late here - more later ;-)


why??
0
 
sudhakar_koundinyaAuthor Commented:
>> load factor used by your Map.

Sorry, I didn't get you
0
 
objectsCommented:
A lower load factor should improve the lookup times.
0
 
sudhakar_koundinyaAuthor Commented:
Objects,
As earlier said, at following link
http://www.experts-exchange.com/Programming/Programming_Languages/Java/Q_21111028.html

other experts are getting negative results only regarding Hashmap. So I am not yet/able decide which to use
0
 
objectsCommented:
Both Maps use a load factor so you'd get an improvement whichever you use.
0
 
objectsCommented:
Can you email me a large set of test data and I'll do some benchmarking.
0
 
sudhakar_koundinyaAuthor Commented:
>> Can you email me a large set of test data and I'll do some benchmarking

I will try to do that tomorrow. As today is Festival (Rakhee) Holiday in India

thanks
Sudhakar
0
 
CEHJCommented:
>>why??

Let's carry on as normal and i'll explain this later:

>>Can you email me a large set of test data

Please don't email anything, but post the link instead.
0
 
CEHJCommented:
>>other experts are getting negative results only regarding Hashmap

HashMap will give you (in theory) only slightly better results, as the cost of synchronization is not as high as is commonly imagined
0
 
objectsCommented:
> Let's carry on as normal and i'll explain this later

????

> Please don't email anything

email is fine for me
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Moderators generally don't like seeing e-mail IDs or questions being solved through e-mail.
0
 
objectsCommented:
not solving it thru email, just need the test data.
And its too much to post here so email is the easiest, and its not directly relevant to the solution anyway.
0
 
Mayank SAssociate Director - Product EngineeringCommented:
Yeah, I agree to that :)
0
 
CEHJCommented:
There's no problem with posting a link to test data, and absolutely no need for emails, quite apart from it being discouraged by the moderators.
0
 
sudhakar_koundinyaAuthor Commented:
ok guys,

I have uploaded the testurls zip file at http://www.prithvisolutions.com/urllist.zip  I may remove this with in one or two days as I have uploaded at my comany's official site. I just get 2 days permission to place that zip file there.

OK coming to the scenario,

With the mail client that sends the multiple emails at a time I was able to send the mails to different exchange servers. All these are just test servers. So I am sorry because I am not able to give realtime Email URLS for this.

And using Slide API, I got the urllist that I have placed in a text file. It contains arround 16 lacs of urls list

Sridhar,
Welcome to this thread participation :-). And I hope u understand what exactly I am looking for by seeing the previous discussions.


Thanks,
Sudhakar
0
 
sudhakar_koundinyaAuthor Commented:
I am increasing the points to 500 as you guys are too patience in helping me :-)
0
 
objectsCommented:
> There's no problem with posting a link to test data

And no difference :-D Except some inconvenience on suds.
No more of your petty complaints please, noone else had a problem so why should you.
It not like you even asked for any test data.
0
 
Mayank SAssociate Director - Product EngineeringCommented:
>> And I hope u understand what exactly I am looking for by seeing the previous discussions

I've been listening to this but haven't been able to take out time for bench-marking something. BTW, I tried clicking on the link you posted (the ZIP file). It said Page cannot be displayed.
0
 
CEHJCommented:
sudhakar - it would be useful if you could post how many lists are meant to be produced by the test data
0
 
sudhakar_koundinyaAuthor Commented:
>> It said Page cannot be displayed.
???. It is working fine at my side. I just tested on both IE and NS
0
 
sudhakar_koundinyaAuthor Commented:

>>Let's carry on as normal and i'll explain this later:

CEHJ, will that won't work for some conditions??
0
 
CEHJCommented:
>>CEHJ, will that won't work for some conditions??

I'm not going to prejudice you against it at the moment - keep testing it ;-)

a. does it produce the right number of lists for the test data?
b. what *is* the right number of lists?
0
 
sudhakar_koundinyaAuthor Commented:
Please let me know if you have the copy of that zip file. I am revoing that by this evening

thanks
Sudhakar
0
 
CEHJCommented:
I've got it
0
 
sudhakar_koundinyaAuthor Commented:
objects shall I remove that file??
0
 
sudhakar_koundinyaAuthor Commented:
Ok CEHJ,

for the contents of that zip file

here is the key information

http://www.geocities.com/sudhakar_koundinya/urls.html
0
 
CEHJCommented:
Did my code produce the right number of lists? (i haven't tested)
0
 
sudhakar_koundinyaAuthor Commented:
CEHJ,

for 1823794 URLS your method get 1860 lists
where as actuals are 1984

Still trying to identify the problem
0
 
sudhakar_koundinyaAuthor Commented:
CEHJ,

here are the results http://www.geocities.com/sudhakar_koundinya/results.htm

for the following test case

import java.io.*;
import java.util.*;
class ReadURLS
{
static long urls=0;
      public static void main(String[] args)
      {
      try {
        BufferedReader in = new BufferedReader(new FileReader("c:\\urllist.txt"));
        String str;
        while ((str = in.readLine()) != null) {
                    getUrlList10(str);    //cehj          
                             process(str); //sudhakar
        }
                        System.err.println(ht.size());
                        Enumeration e=ht.keys();
                        while(e.hasMoreElements())
            {
                              String str1=(String)e.nextElement();
                              System.out.println("Key  : "+str1+"   Value : "+ht.get(str1));
            }
            
                        System.err.println("Current Size :"+ht.size()+" -> No of URLS "+urls);
        in.close();
            List l=Collections.list(hostToUrls.keys());
            long value=0;
            for(int i=0;i<l.size();i++)
            {
                  Long l1=(Long)l.get(i);
                              System.out.println("Key  : "+l1+"   Value : "+hostToUrls.get(l1));
            }
            System.out.println("No of Lists :"+l.size());


    } catch (IOException e) {
    }
      }
      public static void process(String str)
      {
            if(str==null) return;
            if(str.trim().length()<=0) return;
            StringBuffer sb=new StringBuffer();
            int counter=0;
            for(int i=0;i<str.length();i+=1)
            {
                  sb.append(str.charAt(i));
                  if(sb.charAt(sb.length()-1)=='/')
                  {
                        counter++;
                        if(counter==5)
                        {
                              Long i1=(Long)ht.get(sb.toString());
                              if(i1==null)
                              {
                                    i1=new Long(1);
                              }
                              else
                              {
                                    i1=new Long(i1.longValue()+1);
                              }
                              ht.put(sb.toString(),i1);
                              break;
                        }
                  }
            }
            urls++;
            if(urls%50000==0)
            System.err.println("Current Size :"+ht.size()+" -> No of URLS "+urls);
      }
      public final static void getUrlList10(String str) {

          //for (int i = 0; i < urls.size(); i++)
                    {
               String url = (String) str;
                                 if(str==null) return;
            if(str.trim().length()<=0) return;
               int forwardSlashCount = 0;
               int hostPlusAccountPlusProtocolHashcode = 0;
               for (int ixChar = 4; forwardSlashCount < 5 && ixChar < url.length(); ixChar++) {
                    if (url.charAt(ixChar) == '/') {
                         forwardSlashCount++;
                    }
                    if (forwardSlashCount == 2 || forwardSlashCount == 4) {
                         // Compute hashcode of host
                         hostPlusAccountPlusProtocolHashcode = 3 * hostPlusAccountPlusProtocolHashcode + url.charAt(ixChar);
                    }
               }
               // end for
               if (url.charAt(4) == 's') {
                    hostPlusAccountPlusProtocolHashcode = ~hostPlusAccountPlusProtocolHashcode;
               }
               Long key = new Long(hostPlusAccountPlusProtocolHashcode);
               Long urlsForHost = (Long) hostToUrls.get(key);
               if (urlsForHost == null) {
                    urlsForHost = new Long(1);
                    //urlsForHost.add(url);
                    hostToUrls.put(key, urlsForHost);
               }
               else {
                    urlsForHost=new Long(urlsForHost.longValue()+1);
                                                  hostToUrls.put(key, urlsForHost);
               }
                     _urls++;
                     if(_urls%100000==0)
                     System.err.println("Key :"+key+" Value :"+urlsForHost+"Urls Count :"+_urls);
          }


     }

    static      Hashtable hostToUrls = new Hashtable();
          static      Hashtable ht= new Hashtable();
            static long _urls=0;

}
0
 
sudhakar_koundinyaAuthor Commented:
CEHJ,

any updates??
0
 
sudhakar_koundinyaAuthor Commented:
Hello objects,

Are you able to find some benchmark solution??
0
 
CEHJCommented:
>>
for 1823794 URLS your method get 1860 lists
where as actuals are 1984
>>

Is that the one using String as the hash key or Integer?
0
 
sudhakar_koundinyaAuthor Commented:
that is Integer
0
 
CEHJCommented:
What did i tell you? ;-) Try the one using String(Buffer)
0
 
sudhakar_koundinyaAuthor Commented:
But the later one is faster know??. Whay it is failing to get the right hashcode??
0
 
CEHJCommented:
I know it's faster, but there's an  problem inherent to it: Integer's hashCode is essentially 'synonymous' with its equals method. That means that if a hashcode collision *does* occur, and this is more likely with such large amounts of data as you have [too large for me to handle here unfortunately], then the equals method will not 'save' it from replacing the previous entry in the table. This is not the case with String, as the String needs to be equal as well as returning the same hashcode before the previous entry will get kicked out of the table.

Of course, i'm not certain that this is why the anomaly occurs - it could be an error in my code ;-)
0
 
sudhakar_koundinyaAuthor Commented:
OK CEHJ,

I end up with ur StringBuffer related code

thanks
sudhakar
0
 
CEHJCommented:
:-)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 62
  • 39
  • 16
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now