
Data weighting

Posted on 2011-03-08
Last Modified: 2012-08-13
Dear all,
I ran into a problem today that I hadn't thought about before.
I have sets of features: some sets have only 1 feature, others 5 features.
Example: set1 = [f1], set2 = [f1,f2,f3,f4], set3 = [f1.....f20], etc.
As you can see, set2 will carry more weight than set1 in a classification, and set3 will carry more weight than all of them.


So my question is: how can I make set1 equal in weight to set2? Is there a way to give a weight to set1, or to normalize?
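One common way to give every set the same influence in a distance-based method such as k-means (which the author says later in the thread is what runs under the genetic algorithm) is to standardize every column and then scale each column of a k-column set by 1/sqrt(k). A minimal sketch, assuming squared Euclidean distance and already standardized features; `group_weights` and `weighted_sq_distance` are hypothetical names, not part of any library.

```python
import math


def group_weights(group_sizes):
    """Return one weight per column so every feature set counts equally.

    Assumes the columns are already standardized (mean 0, variance 1).
    Each column of a set with k columns gets weight 1/sqrt(k), so the
    set's total contribution to a squared Euclidean distance is comparable
    to that of a single-column set.
    """
    weights = []
    for k in group_sizes:
        weights.extend([1.0 / math.sqrt(k)] * k)
    return weights


def weighted_sq_distance(x, y, weights):
    """Squared Euclidean distance after multiplying each column by its weight."""
    return sum((w * (a - b)) ** 2 for a, b, w in zip(x, y, weights))


# set1 (1 column) and set2 (4 columns) from the question, laid out as 5 columns.
sizes = [1, 4]
x = [1.0, 1.0, 1.0, 1.0, 1.0]
y = [0.0, 0.0, 0.0, 0.0, 0.0]
print(weighted_sq_distance(x, y, [1.0] * 5))             # 5.0: set1 adds 1, set2 adds 4
print(weighted_sq_distance(x, y, group_weights(sizes)))  # 2.0: each set adds 1
```

In practice you would multiply each column of the standardized data by its weight once and then run k-means unchanged, since the weighted distance is just the ordinary distance on the rescaled columns.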

It's a very hard question. I had almost completed the tests when I thought of this problem.

Bad luck, I guess.
Anyway, thank you for your help!
Question by:dadadude

Expert Comment

by:aburr
ID: 35071205
It depends entirely on how the features relate to each other and what you want to do with them.
If one feature is color and another is weight, with no relation between them, it makes no difference.
More help would be possible if you could give an example of which features are included and how they relate to each other.

Author Comment

by:dadadude
ID: 35071706
Thank you.
Actually, I have 11 sets of features:
Sets with 1 feature:
Height
Width
Eccentricity
Global density
Direction
Perimeter
Compactness
-----------------------------------
Density: 9 features
Orientations: 10 features
Zernike moments: 25 features

Author Comment

by:dadadude
ID: 35071715
So suppose I select height and the Zernike moments for classification: won't the Zernike moments have more weight than height, which is of size 1?

Expert Comment

by:aburr
ID: 35072027
"Won't the Zernike moments have more weight than height, which is of size 1?"
Not necessarily. Why are you classifying them?
I am unsure why you are classifying them, and what do you mean by feature? (Perhaps value?)
If you are classifying them on the ability to float, density is critical but direction is immaterial.
For example, I do not understand what you mean when you say height has one feature and density has 9.

Expert Comment

by:TommySzalapski
ID: 35073986
Are you building a classifier with a neural network or some kind of regression analysis? The weights for the individual features should be based on how well each feature corresponds to whatever you are classifying. It doesn't matter which group a feature is in; each one should be considered as an individual.
I count 51 total features. How many data points do you have?
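To make "each one should be considered as an individual" concrete, here is a minimal filter-style sketch that scores every column on its own with the Fisher score (between-class variance divided by within-class variance). This is a swapped-in illustration, not the method from the thread: it assumes class labels are available, which k-means alone does not need, and it only measures how well a single feature separates the class means. `fisher_score` and `rank_features` are hypothetical names.

```python
from collections import defaultdict


def fisher_score(values, labels):
    """Between-class spread over within-class spread for one feature column."""
    overall = sum(values) / len(values)
    by_class = defaultdict(list)
    for v, c in zip(values, labels):
        by_class[c].append(v)
    between = within = 0.0
    for vs in by_class.values():
        mean = sum(vs) / len(vs)
        between += len(vs) * (mean - overall) ** 2
        within += sum((v - mean) ** 2 for v in vs)
    return between / within if within > 0 else float("inf")


def rank_features(columns, labels):
    """Score every column on its own, ignoring which set it belongs to."""
    scores = [(j, fisher_score(col, labels)) for j, col in enumerate(columns)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


labels = [0, 0, 1, 1]  # hypothetical class labels for 4 samples
columns = [[1, 2, 3, 4], [4, 1, 3, 2], [1, 2, 7, 8]]
print(rank_features(columns, labels))  # [(2, 36.0), (0, 4.0), (1, 0.0)]
```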

Author Comment

by:dadadude
ID: 35074720
Sorry for the delay; I'll attach the file to this message. Actually, Tommy, I am using genetic algorithms with k-means.
So for feature selection each choice is either 1 or 0, and that's where I found the problem.
Well, I made an error: actually I have 59 features and 151 variables, as you can see in the file.
As I said earlier, these 59 features are grouped into sets. So you can have a set of Zernike moments, or a set for the height.
For example, column 1 is the height and column 2 is the width; these are considered sets of 1 feature.
The Zernike moments are the last 25 columns, in other words a set of 25 features.

As I was saying, in the genetic algorithm I'll have something like [1 0 1 0 0 0 0 0 0 1 1], which represents the 11 sets: a 1 indicates that a set is selected and a 0 that it is not.
So if height is selected it's just 1 column, whereas if set 11 is selected it will be 11 columns.
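The set-level chromosome described above can be expanded into concrete column indices like this. A minimal sketch, assuming the columns of each set are contiguous; `GROUP_SIZES` and `selected_columns` are hypothetical names, and indices are 0-based, so the post's "column 1" (height) is index 0.

```python
# Hypothetical split of the 59 columns into 11 sets. The seven 1-column
# descriptors come first and the 25 Zernike moments last, as described in
# the thread; density (9) and orientations (10) are also from the thread,
# but their order and the size of the remaining set (8) are made up so the
# total is 59.
GROUP_SIZES = [1, 1, 1, 1, 1, 1, 1, 9, 10, 8, 25]


def selected_columns(chromosome, group_sizes):
    """Expand a per-set 0/1 chromosome into 0-based column indices."""
    if len(chromosome) != len(group_sizes):
        raise ValueError("need exactly one bit per feature set")
    columns = []
    start = 0
    for bit, size in zip(chromosome, group_sizes):
        if bit:
            columns.extend(range(start, start + size))
        start += size
    return columns


chromosome = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1]  # the example from the post
cols = selected_columns(chromosome, GROUP_SIZES)
print(len(cols))  # 35 columns from only 4 selected sets
```

The example makes the imbalance visible: four selected sets yield 35 columns, 25 of which come from the Zernike set.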

That's where I think there is a problem. Sorry if I am not clear; I'll explain it again if you want.

Thank you. feature145.txt

Author Comment

by:dadadude
ID: 35074736
So if I select height and the Zernike moments, won't the Zernike moments have more weight in the classification?
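The worry can be made concrete for k-means, which compares samples by squared Euclidean distance. Assume every column has been standardized to mean 0 and variance 1, and let x and y be two independently drawn samples. For a set G with k columns, the expected contribution to the distance grows linearly with k, so the 25 Zernike columns contribute about 25 times as much as height; weighting each column of the set by w_j = 1/\sqrt{k} equalizes the sets:

```latex
\begin{aligned}
D_G(x, y) &= \sum_{j \in G} (x_j - y_j)^2, \\
\mathbb{E}[D_G] &= \sum_{j \in G} \left( \operatorname{Var} x_j + \operatorname{Var} y_j \right) = 2k, \\
\mathbb{E}\Big[ \sum_{j \in G} w_j^2 (x_j - y_j)^2 \Big] &= k \cdot \frac{1}{k} \cdot 2 = 2
  \qquad \text{for } w_j = \frac{1}{\sqrt{k}}.
\end{aligned}
```

This is only a statement about expectations: it holds whether or not the columns inside a set are correlated with each other, but it says nothing about how informative each set is.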

Expert Comment

by:TommySzalapski
ID: 35076413
Why do you group the features like that? Shouldn't you just have an array of size 59 instead of 11?

Um, 151 data points is really low for the number of features you have. If this is an academic exercise, then don't worry about it; your professor just needs to improve his assignments. If this is for real, you need a lot more data.

The process of running the genetic algorithm/neural network/classifier will determine the weights for you. It should be good enough to pick the correct features and assign weights, if applicable, no matter what order they are in.

Author Comment

by:dadadude
ID: 35081856
I wish it were an assignment, lol; then I wouldn't have to worry about it.

I was thinking the same thing, using the 59 features. But they asked me to group them into descriptors. Actually, that problem only came to my mind yesterday.
What I am thinking is whether I can normalize a descriptor of, say, 11 features into 1, so that all the descriptors have an equal chance. Anyway, that's another question I have to solve.


Author Comment

by:dadadude
ID: 35082018
I have one last question; if you can help, that would be great.
I want to know if there is a method that merges several columns into one column. I just need the technical name, that's all; I am trying to look for it on Google.
Is that possible?
For example:
C1   C2    C3   C4    ====>    One column
1    0.1   .    .
2    0.2   .    .
3    0.3   .    .
4    0.4   .    .
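There is no standard name for merging arbitrary columns, but the closest well-known technique is dimensionality reduction (feature extraction), for example replacing a group of columns with each row's score on the group's first principal component; this is PCA, which the author brings up later in the thread. A pure-Python sketch, assuming numeric columns and at least two rows; `first_pc_scores` is a hypothetical name, and a real project would normally call a library PCA instead.

```python
import math


def first_pc_scores(rows, iterations=100):
    """Collapse several numeric columns into one: each row's score on the
    first principal component (the direction of greatest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d); needs at least 2 rows.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiplying by cov converges to its dominant
    # eigenvector, unless the start vector happens to be orthogonal to it.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


# C1 and C2 from the table above (C3 and C4 were left blank in the post).
table = [[1, 0.1], [2, 0.2], [3, 0.3], [4, 0.4]]
print([round(s, 3) for s in first_pc_scores(table)])
```

The sign of a principal component is arbitrary, and whatever variance the single column does not capture is lost, which is exactly the information loss Tommy warns about in his reply.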

Author Comment

by:dadadude
ID: 35082182
I GOT AN IDEA! TELL ME WHAT YOU THINK!
I'll combine the genetic algorithm with PCA!

So suppose I selected descriptor 1 and descriptor 2:
if desc2 has 11 columns, I'll select the best one. What do you think?

Author Comment

by:dadadude
ID: 35082186
Or a two-layer genetic algorithm, whatever... Is that a good idea?

Expert Comment

by:TommySzalapski
ID: 35096963
"I want to know if there is a method that merges columns into 1 column."
No. The only way that would make any sense is if there were a sensible way to merge. For example, if you had 4 columns that were 0 or 1 for the presence of certain defects, you could have a count column for total defects. But in general it is a bad idea to try to merge columns, because you lose information.
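The defect example is the kind of merge that does make sense. A minimal sketch, assuming each row holds 0/1 presence flags for four hypothetical defect types; `defect_count` is a hypothetical name. The first two rows also show the information loss: two different defect patterns collapse to the same count.

```python
def defect_count(rows):
    """Merge several 0/1 defect-presence columns into one count column."""
    return [sum(row) for row in rows]


# Four hypothetical 0/1 defect columns per row.
rows = [
    [1, 0, 0, 1],  # defects A and D
    [0, 1, 1, 0],  # defects B and C: a different pattern...
    [0, 0, 0, 0],  # no defects
]
print(defect_count(rows))  # [2, 2, 0]: ...but the same count as the first row
```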

Doing some kind of two-layer setup is the same as doing it with all 59 features.
If you just do it with the 11 feature sets (as you were instructed), then each of the 11 will have equal weight. But you might miss some important information: if one of the groups has one really good feature and 10 or so bad ones, you partly lose the benefit of the good one. Also, if there's multicollinearity (where some features are proportional to others), that can mess it up.
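A quick way to look for the multicollinearity mentioned above is to flag column pairs whose Pearson correlation is close to ±1. A minimal sketch; the 0.95 threshold and the names `pearson` and `correlated_pairs` are assumptions for illustration, and correlation only detects linear (for example proportional) relationships between pairs of columns.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_pairs(columns, threshold=0.95):
    """List (i, j, r) for every column pair with |r| above the threshold."""
    pairs = []
    for i, j in combinations(range(len(columns)), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) > threshold:
            pairs.append((i, j, r))
    return pairs


columns = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],  # proportional to column 0
    [4, 1, 3, 2],
]
for i, j, r in correlated_pairs(columns):
    print(f"columns {i} and {j}: r = {r:.3f}")  # columns 0 and 1: r = 1.000
```

Keeping only one column from each flagged pair is a common follow-up, but which one to keep is a judgment call.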

Accepted Solution

by:TommySzalapski (earned 500 total points)
ID: 35096981
But you still will never get very good results with only 151 data points. You need more. Tell whoever gave you the assignment that you need more data, or no matter how good your algorithm is, the results will be unreliable.