Avatar of mmccy
mmccy

asked on

Cleaning and normalizing a 2D array

306,362,348,265,114,88,110,90,110,91,103,81,908,475,581,4248,416936,1
..............
320,378,361,298,113,89,109,90,113,93,105,90,926,469,590,4478,426774,1
346,387,349,278,92,80,95,87,86,78,78,70,868,428,538,4279,365147,2

1) I have a data cleaning and normalization problem. I need to read a text file (in the format above) into a 2D ArrayList. The last column is the userid and the first 16 values of each row are the user's sample. Keeping the last column in the 2D array as a userid marker is fine.

2) I need to compare each value with the median of its column (16 columns in total are used for the calculation; the userid column is only a marker). If a cell contains a value greater than 3 x the median, the whole ROW of that sample is deleted (cleaning).

3) After that I need to normalize the data in the cleaned ArrayList using min-max scaling, that is
normalized value (each cell) = [s - min(S)] / [max(S) - min(S)]
where s is the original value, min(S) is the minimum value of its COLUMN and max(S) is the maximum value of its COLUMN.

4) Finally, once the cleaned and normalized ArrayList has been created, print it out to a text file again.

Any ideas on the above? Sample code is preferred.

Thank you very much !
Avatar of CEHJ
CEHJ

2) It's definitely the median for the *column* i.e. reading vertically?
Avatar of mmccy
mmccy

ASKER

yes !!
Avatar of mmccy

ASKER

By the way, have you heard of 10-fold cross-validation and Euclidean distance?
I saw your previous posts.
What is the structure you use?
ArrayList, [][], or other?

Giant.
If you use an ArrayList you can delete a row directly with its remove method, without worrying about compacting the list.
If you use [][] you must use System.arraycopy to close the gap left by the deleted row.

Giant.
Avatar of mmccy

ASKER

Basically it doesn't matter, as long as it is easy for me to understand.
Now I know how to do it with an ArrayList of ArrayList (or an ArrayList of int[]). Can I use array[][]?
Avatar of mmccy

ASKER

oic !
I prefer an ArrayList of int[] then, because I need to do other calculations after the normalization.
Sure, but it's a little more difficult because you must write some methods yourself that ArrayList already implements.
This is the Euclidean Distance definition:
http://www.nist.gov/dads/HTML/euclidndstnc.html
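For reference, the Euclidean distance between two rows is the square root of the sum of squared differences over the sample columns. A minimal sketch, assuming each row is a float[] whose trailing entries (such as the userid) are skipped; the class name EuclideanSketch, the method distance and the columns parameter are made up for illustration and are not part of the code posted in this thread:

```java
// Hypothetical helper for illustration only.
class EuclideanSketch {
    /** Euclidean distance between two rows over the first `columns` entries. */
    static double distance(float[] a, float[] b, int columns) {
        double sum = 0.0;
        for (int k = 0; k < columns; k++) {
            double d = a[k] - b[k];   // difference in column k
            sum += d * d;             // accumulate the squared difference
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] p = {0f, 0f, 1f};   // last entry stands in for the userid
        float[] q = {3f, 4f, 2f};
        System.out.println(distance(p, q, 2)); // prints 5.0
    }
}
```

Passing the number of sample columns keeps the userid out of the distance, which matters because the userid is only a label.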
Avatar of mmccy

ASKER

An ArrayList of int[] makes it easier for me to calculate the distance between two different rows.
>An ArrayList of int[] makes it easier for me to calculate the distance between two different rows.
yes.
Avatar of mmccy

ASKER

Yes, ED! That is what I will finally do with the normalized ArrayList. I understand the concept but I need to turn it into code in a very short time.
     public ArrayList normalize(ArrayList original,int[] averages){
            int i=0;
            while (i<original.size()){//">=" would call get(i) one element past the end
                  int[]el=(int[])original.get(i);
                  boolean remove=false;
                  for (int k=0;k<el.length;k++){
                        if (el[k]>(averages[k]*3)){remove=true;break;}
                  }//end for k
                  if (remove){original.remove(i);}
                  else {i++;}
            }//end while
            return original;
      }
What did you mean with "S" and "s" ?
This reads the file and inserts the data into an ArrayList of int[]:
      public ArrayList readFile(String fileName) {
            ArrayList ret = new ArrayList();
            try {
                  RandomAccessFile fos = new RandomAccessFile(fileName, "r");
                  fos.seek(0);
                  String line = fos.readLine();
                  while (line != null) {
                        StringTokenizer t = new StringTokenizer(line, ",");
                        ArrayList lineArray = new ArrayList();
                        while (t.hasMoreTokens()) {
                              lineArray.add(t.nextToken());
                        }
                        line = fos.readLine();
                        int[] lineInt=new int[lineArray.size()];
                        for (int i=0;i<lineArray.size();i++){
                              lineInt[i]=Integer.parseInt(lineArray.get(i).toString());
                        }
                        ret.add(lineInt);//store the parsed int[] row, not the raw token list
                  }
                  fos.close();
            } catch (IOException ex) {
                  System.out.println(ex);
            }

            return ret;
      }

Avatar of mmccy

ASKER

s means the original data (before normalization).
min(S) means the minimum value of one column and max(S) means the maximum value of one column. A column is formed by index i of each int[i]: if I have 1000 rows then I have 1000 values in one column. Find the minimum and maximum of the column and normalize the data in each cell.
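The formula described here can be sketched as follows. This is only an illustration under stated assumptions: rows are float[] values, only the first `columns` entries are scaled (so the userid is copied through unchanged), a column whose values are all equal is mapped to 0 to avoid dividing by zero, and Java 5 or later is available for generics. The names MinMaxSketch and normalize are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class MinMaxSketch {
    /** Returns new rows with the first `columns` entries scaled to [0, 1]. */
    static List<float[]> normalize(List<float[]> rows, int columns) {
        float[] min = new float[columns];
        float[] max = new float[columns];
        Arrays.fill(min, Float.POSITIVE_INFINITY);
        Arrays.fill(max, Float.NEGATIVE_INFINITY);
        // First pass: find min(S) and max(S) for every column.
        for (float[] row : rows) {
            for (int k = 0; k < columns; k++) {
                min[k] = Math.min(min[k], row[k]);
                max[k] = Math.max(max[k], row[k]);
            }
        }
        // Second pass: apply (s - min(S)) / (max(S) - min(S)) to each cell.
        List<float[]> out = new ArrayList<>();
        for (float[] row : rows) {
            float[] scaled = row.clone(); // entries past `columns` stay as they are
            for (int k = 0; k < columns; k++) {
                float range = max[k] - min[k];
                // Assumption: a constant column maps to 0 instead of dividing by zero.
                scaled[k] = range == 0f ? 0f : (row[k] - min[k]) / range;
            }
            out.add(scaled);
        }
        return out;
    }
}
```

With the first column of the three sample rows from the question (306, 320, 346), this gives 0, 0.35 and 1.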
something like this to write:

      public void writeFile(String fileName,ArrayList original){
            try {
                  RandomAccessFile fos = new RandomAccessFile(fileName, "rw");//"w" is not a valid mode; use "rw" to write
                  fos.seek(0);
                  for (int i=0;i<original.size();i++){
                        String line=original.get(i).toString()+"\n";//or other thing you want to write
                        fos.write(line.getBytes());
                  }
                  fos.close();
            } catch (IOException ex) {
                  System.out.println(ex);
            }
      }
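One thing to watch in a writer like this: calling toString() on an array row prints a type-and-hash string rather than the values, and a RandomAccessFile opened for writing does not truncate an existing file. A minimal sketch of an alternative, assuming float[] rows, comma-separated output and three decimal places; the names RowWriterSketch, formatRow and writeRows are hypothetical:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only.
class RowWriterSketch {
    /** Joins one row as comma-separated values with three decimals. */
    static String formatRow(float[] row) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < row.length; k++) {
            if (k > 0) sb.append(',');
            // Locale.ROOT keeps '.' as the decimal separator on every machine.
            sb.append(String.format(Locale.ROOT, "%.3f", row[k]));
        }
        return sb.toString();
    }

    /** Writes one line per row; PrintWriter replaces any existing file content. */
    static void writeRows(String fileName, List<float[]> rows) throws IOException {
        try (PrintWriter out = new PrintWriter(fileName, "UTF-8")) {
            for (float[] row : rows) {
                out.println(formatRow(row));
            }
        }
    }
}
```

Keeping formatRow separate from the file handling makes the output format easy to check without touching the disk.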

Avatar of mmccy

ASKER

In my program I am trying to use float[], because the normalization produces decimal values.
ASKER CERTIFIED SOLUTION
Avatar of Giant2
Giant2

This gets the per-column minimum and maximum arrays:

      public float[] maxims(ArrayList list){
            float[] maxs= new float[((float[])list.get(0)).length];
            for (int i = 0; i < list.size(); i++) {
                  float[] a = (float[]) list.get(i);
                  for (int j = 0; j < a.length; j++) {
                        if (maxs[j]<a[j])maxs[j]=a[j];
                  }
            }
            return maxs;
      }
      
      public float[] minims(ArrayList list){
            float[] mins= (float[])((float[])list.get(0)).clone();//start from the first row; a zero-filled array would never be lowered by positive data
            for (int i = 0; i < list.size(); i++) {
                  float[] a = (float[]) list.get(i);
                  for (int j = 0; j < a.length; j++) {
                        if (mins[j]>a[j])mins[j]=a[j];
                  }
            }
            return mins;
      }
      

Hope this help you.
Bye, Giant.
Avatar of mmccy

ASKER

public static void main( String args[] )
    {
        readfile_test1  process = new readfile_test1();
            String file="testFile";      
        ArrayList uncleanedlist;
        try
        {
            uncleanedlist       = readfile_test1.readFile(file);
            ///how can I pass the uncleanedlist into the normalise with the float[] ??
           
         }
           
       
        catch ( FileNotFoundException e )
        {
            e.printStackTrace(  );
        }
    }
     public static void main( String args[] ){
            FileReadAndTokenize process=new FileReadAndTokenize();
            String file="testFile";    
            ArrayList uncleanedlist=new ArrayList();
            try
            {
                  uncleanedlist      = process.readFile(file);
                  ///how can I pass the uncleanedlist into the normalise with the float[] ??
                  float[] averages=process.average(uncleanedlist);
                  ArrayList cleanedlist=process.normalize(uncleanedlist,averages);
                  float[] maxims=process.maxims(cleanedlist);
                  float[] minims=process.minims(cleanedlist);
                  ArrayList normalizedlist=process.normalize2(cleanedlist,maxims,minims);
                  process.writeFile("outFile",normalizedlist);
             }
           
       
            catch ( FileNotFoundException e )
            {
                  e.printStackTrace(  );
            }
      }


I call the class FileReadAndTokenize, so I create in the main an instance of it (its name is process) with:
FileReadAndTokenize process = new FileReadAndTokenize();

Hope this help you.
Giant.
Avatar of mmccy

ASKER

Oh yes, I understand!
But there seems to be no method called average there?
Avatar of mmccy

ASKER

Also, it seems there is no need to use average!
     public float[] average(ArrayList list) {
            float[] tot = new float[((float[]) list.get(0)).length];
            float[] averages = new float[tot.length];
            for (int i = 0; i < list.size(); i++) {
                  float[] a = (float[]) list.get(i);
                  for (int j = 0; j < a.length; j++) {
                        tot[j] += a[j];
                  }
            }
            System.out.println("AVERAGES");
            int numRows = list.size();
            for (int i = 0; i < tot.length; i++) {
                  averages[i] = tot[i] / numRows;
                  System.out.println("column " + i + " average=" + averages[i]);
            }
            return averages;
      }
Avatar of mmccy

ASKER

Oh! That was from when I was first learning how to read data into a 2D array; it is not related.
Sorry about that. Basically I need to find the median of the column instead of the mean of the column !!!
>median of the column instead of the mean of the column !!!
??
Avatar of mmccy

ASKER

something like

static boolean delete( float[] listOfIntegers )
    {
        boolean delete = false;
        Arrays.sort( listOfIntegers );

        float median      = listOfIntegers[ listOfIntegers.length / 2 ];
        float targetValue = median * 3;

        for ( int i = 0; ( i < listOfIntegers.length ) && ( !delete ); i++ )
        {
            float testValue = listOfIntegers[ i ];
            delete = testValue > targetValue;
            System.out.println( "test=" + testValue + ", target=" + targetValue );
        }

        return delete;
    }
Avatar of mmccy

ASKER

I think this one is finding the median of one row !! am I right ?
Avatar of mmccy

ASKER

I want to find the median of each column, and once a value (in a cell) is greater than 3 x the median, delete the whole ROW.
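This requirement can be sketched as follows: read each column vertically, sort a copy of it to get the median, then keep only the rows in which no sample exceeds three times its column median. It is a sketch under assumptions, not a definitive implementation: rows are float[] values, only the first `columns` entries are samples, an even row count averages the two middle values, and Java 5 or later is available. The names MedianCleanSketch, columnMedians and clean are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class MedianCleanSketch {
    /** Median of each of the first `columns` columns, read vertically. */
    static float[] columnMedians(List<float[]> rows, int columns) {
        float[] medians = new float[columns];
        if (rows.isEmpty()) return medians; // nothing to measure
        float[] column = new float[rows.size()];
        for (int k = 0; k < columns; k++) {
            for (int i = 0; i < rows.size(); i++) {
                column[i] = rows.get(i)[k]; // copy column k so the rows stay unsorted
            }
            Arrays.sort(column);
            int mid = column.length / 2;
            // Assumption: for an even count, average the two middle values.
            medians[k] = column.length % 2 == 1
                    ? column[mid]
                    : (column[mid - 1] + column[mid]) / 2f;
        }
        return medians;
    }

    /** Keeps only rows where no sample is greater than 3 x its column median. */
    static List<float[]> clean(List<float[]> rows, int columns) {
        float[] medians = columnMedians(rows, columns);
        List<float[]> kept = new ArrayList<>();
        for (float[] row : rows) {
            boolean outlier = false;
            for (int k = 0; k < columns && !outlier; k++) {
                outlier = row[k] > medians[k] * 3f;
            }
            if (!outlier) kept.add(row);
        }
        return kept;
    }
}
```

Building a new list instead of removing rows in place avoids index bookkeeping while iterating.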
ah! Ok.

   public float[] medium(ArrayList list) {
          float[] medium= new float[((float[]) list.get(0)).length];
          int mediumPosition=list.size()/2;
               for (int j = 0; j < a.length; j++) {
                    medium[j]=((float[])list.get(mediumPosition))[j];
               }
          return medium;
     }
and this is the main:
   public static void main( String args[] ){
          FileReadAndTokenize process=new FileReadAndTokenize();
          String file="testFile";    
          ArrayList uncleanedlist=new ArrayList();
          try
          {
               uncleanedlist      = process.readFile(file);
               ///how can I pass the uncleanedlist into the normalise with the float[] ??
               float[] medium=process.medium(uncleanedlist);
               ArrayList cleanedlist=process.normalize(uncleanedlist,medium);
               float[] maxims=process.maxims(cleanedlist);
               float[] minims=process.minims(cleanedlist);
               ArrayList normalizedlist=process.normalize2(cleanedlist,maxims,minims);
               process.writeFile("outFile",normalizedlist);
           }
          catch ( FileNotFoundException e )
          {
               e.printStackTrace(  );
          }
     }

Hope this is what you are looking for.

Bye, Giant.
Avatar of mmccy

ASKER

public float[] medium(ArrayList list) {
          float[] medium= new float[((float[]) list.get(0)).length];
          int mediumPosition=list.size()/2;
               for (int j = 0; j < a.length; j++) {
                    medium[j]=((float[])list.get(mediumPosition))[j];
               }
          return medium;
     }

is it a = list ?
Avatar of mmccy

ASKER

Oh! a should be medium, right?
Avatar of mmccy

ASKER

null
press any key to continue............

such error after I run the program , any idea ?
Can you show us what output comes from running the answer on those first three rows of data?
...meaning the normalization process
Thanks for accepting.

>null
>press any key to continue............
>such error after I run the program

What do you use to run the program?
Avatar of mmccy

ASKER

I am using java 1.4.2_02
I run it by typing java classify in the jdk\bin\  (I put the class file and the text file in it)
classify is the class name !!
Are there any exceptions?
Avatar of mmccy

ASKER

java.lang.ClassCastException
        at classify.medium(classify.java:50)
        at classify.main(classify.java:115)
Press any key to continue...
See line 50 of the class classify.
What is on this line? (Post it, please.)
Avatar of mmccy

ASKER

float[] medium= new float[((float[]) list.get(0)).length];
Try replacing it with these lines:
System.out.println(list.get(0));
float[] medium= new float[((float[]) list.get(0)).length];
and tell me what it displays (I believe the error will then be at line 51).
Avatar of mmccy

ASKER

[321, 371, 361, 305, 112, 88, 109, 89, 115, 97, 101, 91, 922, 468, 586, 4333, 415047, 1]
java.lang.ClassCastException
        at classify.medium(classify.java:53)
        at classify.main(classify.java:118)
Press any key to continue...
What is the list object?

I believed the list object was an ArrayList of float[].
From the last post I understand it's an ArrayList of int[], isn't it?
Avatar of mmccy

ASKER

yes Arraylist of float[] !
because needed to calculate decimal numbers !
try this:

 public float[] medium(ArrayList list) {
System.out.println((list.get(0)).getClass().getName());
          float[] medium= new float[((float[]) list.get(0)).length];
          int mediumPosition=list.size()/2;
               for (int j = 0; j < medium.length; j++) {
                    medium[j]=((float[])list.get(mediumPosition))[j];
               }
          return medium;
     }

Avatar of mmccy

ASKER

java.util.ArrayList
java.lang.ClassCastException
        at classify.medium(classify.java:62)
        at classify.main(classify.java:128)
Press any key to continue...
So you have an ArrayList of ArrayList. Is it so?

public float[] medium(ArrayList list) {
          float[] medium= new float[((ArrayList) list.get(0)).size()];
          int mediumPosition=list.size()/2;
               for (int j = 0; j < medium.length; j++) {
                    medium[j]=(float)(((ArrayList)list.get(mediumPosition)).get(j));
               }
          return medium;
     }
Avatar of mmccy

ASKER

inconvertible types error after compile at
medium[j]=(float)(((ArrayList)list.get(mediumPosition)).get(j));
?????
Could you post the method you use to read data from the file?
Avatar of mmccy

ASKER

public ArrayList readFile(String fileName) {//read in the text file into the Arraylist of float[]
          ArrayList ret = new ArrayList();
          try {
               RandomAccessFile fos = new RandomAccessFile(fileName, "r");
               fos.seek(0);
               String line = fos.readLine();
               while (line != null) {
                    StringTokenizer t = new StringTokenizer(line, ",");
                    ArrayList lineArray = new ArrayList();
                    while (t.hasMoreTokens()) {
                         lineArray.add(t.nextToken());
                    }
                    line = fos.readLine();
                    float[] lineInt=new float[lineArray.size()];
                    for (int i=0;i<lineArray.size();i++){
                         lineInt[i]=Float.parseFloat(lineArray.get(i).toString());
                    }
                    ret.add(lineArray);
               }
               fos.close();
          } catch (IOException ex) {
               System.out.println(ex);
          }

          return ret;
     }

public static void main( String args[] ){
          
          classify process=new classify();
             
          ArrayList uncleanedlist=new ArrayList();
          try
          {
               uncleanedlist      = process.readFile("testFile");
               float[] medium2=process.medium(uncleanedlist);
               ArrayList cleanedlist=process.cleaning(uncleanedlist,medium2);
               float[] maxims=process.maxims(cleanedlist);
               float[] minims=process.minims(cleanedlist);
               ArrayList normalizedlist=process.normalize(cleanedlist,maxims,minims);
               process.writeFile("outFile",normalizedlist);
               
               System.out.print("total line read after normalizedlist is " + normalizedlist.size());
          }
           
          catch ( Exception e )
          {
               e.printStackTrace(  );
          }

     }
These are the corrected methods:

      public float[] medium(ArrayList list) {
            //System.out.println((list.get(0)).getClass().getName());
            float[] medium = new float[((ArrayList)list.get(0)).size()];
            int mediumPosition = list.size() / 2;
            for (int j = 0; j < medium.length; j++) {
                  medium[j] = Float.valueOf((((ArrayList)list.get(mediumPosition)).get(j)).toString()).floatValue();
            }
            return medium;
      }

      public float[] maxims(ArrayList list) {
            float[] maxs = new float[((ArrayList)list.get(0)).size()];
            for (int i = 0; i < list.size(); i++) {
                  ArrayList a = (ArrayList) list.get(i);
                  for (int j = 0; j < a.size(); j++) {
                        if (maxs[j] < Float.valueOf((String)a.get(j)).floatValue())
                              maxs[j] = Float.valueOf((String)a.get(j)).floatValue();
                  }
            }
            return maxs;
      }

      public float[] minims(ArrayList list) {
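            float[] mins = new float[((ArrayList)list.get(0)).size()];
            java.util.Arrays.fill(mins, Float.MAX_VALUE); //start high; a zero-filled array would never be lowered by positive data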
            for (int i = 0; i < list.size(); i++) {
                  ArrayList a = (ArrayList) list.get(i);
                  for (int j = 0; j < a.size(); j++) {
                        if (mins[j] > Float.valueOf((String)a.get(j)).floatValue())
                              mins[j] = Float.valueOf((String)a.get(j)).floatValue();
                  }
            }
            return mins;
      }

      public ArrayList readFile(String fileName) throws IOException {
            ArrayList ret = new ArrayList();
            RandomAccessFile fos = new RandomAccessFile(fileName, "r");
            fos.seek(0);
            String line = fos.readLine();
            while (line != null) {
                  StringTokenizer t = new StringTokenizer(line, ",");
                  ArrayList lineArray = new ArrayList();
                  while (t.hasMoreTokens()) {
                        lineArray.add(t.nextToken());
                  }
                  line = fos.readLine();
                  float[] lineInt = new float[lineArray.size()];
                  for (int i = 0; i < lineArray.size(); i++) {
                        lineInt[i] = Float.parseFloat(lineArray.get(i).toString());
                  }
                  ret.add(lineArray);
            }
            fos.close();
            return ret;
      }

      public void writeFile(String fileName, ArrayList original) throws IOException {
            RandomAccessFile fos = new RandomAccessFile(fileName, "rw");
            fos.seek(0);
            for (int i = 0; i < original.size(); i++) {
                  String line = original.get(i).toString() + "\n"; //or other thing you want to write
                  fos.write(line.getBytes());
            }
            fos.close();
      }

      public ArrayList normalize(ArrayList original, float[] averages) {
            int i = 0;
            while (original.size() > i) {
                  ArrayList el = (ArrayList) original.get(i);
                  boolean remove = false;
                  for (int k = 0; k < el.size(); k++) {
                        if (Float.valueOf((String)el.get(k)).floatValue() > (averages[k] * 3)) {
                              remove = true;
                              break;
                        }
                  } //end for k
                  if (remove) {
                        original.remove(i);
                  } else {
                        i++;
                  }
            } //end while
            return original;
      }

      public ArrayList normalize2(ArrayList original, float[] maxs, float[] mins) {
            for (int i = 0; i < original.size(); i++) {
                  ArrayList el = (ArrayList) original.get(i);
                  for (int k = 0; k < maxs.length; k++) {
                        el.set(k, String.valueOf((Float.valueOf((String)el.get(k)).floatValue() - mins[k]) / (maxs[k] - mins[k])));
                  } //end for k
            } //end for i
            return original;
      }

      public ArrayList[] divide(ArrayList original, int numOfSubset) {
            ArrayList[] ret = new ArrayList[numOfSubset];
            for (int s = 0; s < numOfSubset; s++) {
                  ret[s] = new ArrayList(); //each subset must be created before add() is called on it
            }
            int nrlen = original.size() / numOfSubset; //rows per subset
            for (int pos = 0; pos < original.size(); pos++) {
                  //leftover rows (and every row when nrlen is 0) go to the last subset
                  int subset = (nrlen == 0) ? numOfSubset - 1 : Math.min(pos / nrlen, numOfSubset - 1);
                  ret[subset].add(original.get(pos));
            }
            return ret;
      }

Tell me if all Ok now.
Giant.
Avatar of mmccy

ASKER

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.RangeCheck(ArrayList.java:507)
        at java.util.ArrayList.get(ArrayList.java:324)
        at classify1.normalize(classify1.java:97)
        at classify1.main(classify1.java:133)
Press any key to continue...


What is line 97 of the class classify1?
Avatar of mmccy

ASKER

el.set(k, String.valueOf((Float.valueOf((String)el.get(k)).floatValue() - mins[k]) / (maxs[k] - mins[k])));
If I remember correctly, the correct code is below. Try:

for (int k = 0; k < el.size(); k++) {
      el.set(k, String.valueOf((Float.valueOf((String)el.get(k)).floatValue() - mins[k]) / (maxs[k] - mins[k])));
} //end for k

Tell me if it's Ok.
Avatar of mmccy

ASKER

Yes it is! I renamed normalize2 to normalize and normalize to cleaning.
public ArrayList normalize(ArrayList original, float[] maxs, float[] mins) {
          for (int i = 0; i < original.size(); i++) {
               ArrayList el = (ArrayList) original.get(i);
               for (int k = 0; k < maxs.length; k++) {
                    el.set(k, String.valueOf((Float.valueOf((String)el.get(k)).floatValue() - mins[k]) / (maxs[k] - mins[k])));
               } //end for k
          } //end for i
          return original;
     }
ok.
Try what I posted:
public ArrayList normalize(ArrayList original, float[] maxs, float[] mins) {
          for (int i = 0; i < original.size(); i++) {
               ArrayList el = (ArrayList) original.get(i);
               for (int k = 0; k < el.length; k++) {//here I changed
                    el.set(k, String.valueOf((Float.valueOf((String)el.get(k)).floatValue() - mins[k]) / (maxs[k] - mins[k])));
               } //end for k
          } //end for i
          return original;
     }
Avatar of mmccy

ASKER

for (int k = 0; k < el.length; k++)
incompatible type, variable length!
is it el.size() ?
>is it el.size()
Yes, it is.
0.000,0.000,0.000,0.000,1.000,0.889,1.000,1.000,0.889,0.867,0.926,0.550,0.690,1.000,0.827,0.000,0.840
0.350,0.640,1.000,1.000,0.955,1.000,0.933,1.000,1.000,1.000,1.000,1.000,1.000,0.872,1.000,1.000,1.000
1.000,1.000,0.077,0.394,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.135,0.000

is what I get for those three rows of normalized data, to 3 decimal places.