# Data weighting

Dear all,
I ran into a problem today that I hadn't thought about before.
I have sets of features.
Some sets have only 1 feature, others have 5.
Example: set1 = [f1], set2 = [f1, f2, f3, f4], set3 = [f1, ..., f20], etc.
As you can see, set2 will have more weight than set1 in a classification, and set3 will have more weight than both of them.

So my question is: how can I make set1 equal in weight to set2? Is there a way to give a weight to set1, or to normalize?

It's a very hard question. I had almost completed the tests when I thought of this problem.

Anyway, thank you for your help!
aburr

It depends entirely on how the features relate to each other and what you want to do with them.
If one of the features is color and another is weight, with no relation between them, it does not make any difference.
More help could be available if you could give an example of which features are included and the relationships between them.

Thank you.
Actually, I have 11 sets of features.
Sets with 1 feature:
Height
Width
Eccentricity
Global density
Direction
Perimeter
Compactness
-----------------------------------
Density: 9 features
Orientations: 10 features
Zernike moments: 25 features

So suppose that I select the height and the Zernike moments for classification. Won't the Zernike moments have more weight than height, which is of size 1?
"Won't the Zernike moments have more weight than height, which is of size 1?"
Not necessarily. Why are you classifying them? And what do you mean by feature (perhaps value)?
If you are classifying them on the ability to float, density is critical but direction is immaterial.
For example, I do not understand what you mean when you say height has one feature and density 9.
Are you building a classifier with a neural network or some kind of regression analysis? The weights for the individual features should be based on how well that feature corresponds to whatever you are classifying. It doesn't matter what features are in what group. Each one should be considered as an individual.
I count 51 total features. How many data points do you have?

Sorry for the delay; I'll attach the file to this message. Actually, Tommy, I am using genetic algorithms with k-means,
so for the feature selection each entry is either 1 or 0. That's where I found the problem.
Well, I made an error. Actually, I have 59 features and 151 variables, as you can see in the file.
As I said earlier, these 59 features are grouped into sets. So you can have a set of Zernike moments, and you can have a set for the height.
For example, column 1 is the height and column 2 is the width, each of which is considered a set of 1 feature.
The Zernike moments are the last 25 columns, or in other words a set of 25 features.

As I was saying, in the genetic algorithm I'll have something like [1 0 1 0 0 0 0 0 0 1 1], which represents the 11 sets. A 1 indicates that a set is selected and a 0 that it is not.
So if height is selected it's just 1 column, whereas if set 11 is selected it will be 11 columns.
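
The mapping from a 0/1 mask over sets to the actual data columns can be sketched as below. This is only an illustration, not the poster's code: the function name `group_mask_to_columns` is made up, and the group sizes are the ones listed earlier in the thread (10 sets, 51 columns), so they are placeholders rather than the real 11-set, 59-column layout.

```python
from itertools import accumulate

# Placeholder group sizes taken from the list earlier in the thread.
# They sum to 51 columns, not the 59 mentioned later, so adjust as needed.
GROUP_SIZES = [1, 1, 1, 1, 1, 1, 1, 9, 10, 25]


def group_mask_to_columns(mask, group_sizes):
    """Return the column indices selected by a 0/1 mask over groups."""
    if len(mask) != len(group_sizes):
        raise ValueError("mask and group_sizes must have the same length")
    # Start column of each group: 0, then the running total of earlier sizes.
    starts = [0] + list(accumulate(group_sizes))[:-1]
    columns = []
    for bit, start, size in zip(mask, starts, group_sizes):
        if bit:
            columns.extend(range(start, start + size))
    return columns


# Select height (first group) and the Zernike moments (last group).
mask = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
selected = group_mask_to_columns(mask, GROUP_SIZES)
print(len(selected))  # 26 columns: 1 for height + 25 Zernike moments
```

With this mask, 25 of the 26 columns handed to the clustering step come from one descriptor, which is the imbalance being asked about.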

That's where I think there is a problem. Sorry if I am not clear; I'll explain it again if you want.

Thank you. feature145.txt

So if I select height and the Zernike moments, won't the Zernike moments have more weight in the classification?
Why do you group the features like that? Shouldn't you just have an array of size 59 instead of 11?

Um, 151 data points is really low for the number of features you have. If this is an academic exercise, then don't worry about it; your professor just needs to improve his assignments. If this is for real, you need a lot more data.

The process of running the genetic algorithm/neural network/classifier will determine the weights for you. It should be good enough to pick the correct features and assign weights if applicable no matter what order they are in.

I wish it were an assignment, lol. Then I wouldn't have to worry about it.

I was thinking about the same thing, using the 59 features. But they asked me to group them into descriptors. Actually, that problem came to my mind yesterday.
What I am thinking is whether I can normalize a descriptor of, say, 11 features into 1, so that all the descriptors have an equal chance. Anyway, that's another question that I have to solve.
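
One way to give every descriptor an equal chance in a distance-based method such as k-means, without merging any columns, is to scale each column by 1/sqrt(size of its descriptor). This is a sketch of that idea only, not something proposed in the thread: `block_scale` is a hypothetical helper, and it assumes the columns have already been standardized (zero mean, unit variance). Whether equal weight per descriptor is actually desirable is a separate question.

```python
import math


def block_scale(row, group_sizes):
    """Divide every value in a group by sqrt(group size).

    Assumes the columns are already standardized, so that after scaling
    each group contributes roughly equally to a squared Euclidean distance.
    """
    scaled, start = [], 0
    for size in group_sizes:
        factor = 1.0 / math.sqrt(size)
        scaled.extend(value * factor for value in row[start:start + size])
        start += size
    return scaled


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy data: a single-feature group followed by a four-feature group.
group_sizes = [1, 4]
a = [0.0, 0.0, 0.0, 0.0, 0.0]
b = [1.0, 1.0, 1.0, 1.0, 1.0]

raw = squared_distance(a, b)
scaled = squared_distance(block_scale(a, group_sizes), block_scale(b, group_sizes))
print(raw)     # 5.0: the four-feature group supplies 4 of the 5 units
print(scaled)  # 2.0: each group now supplies 1 unit
```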

I have a last comment; if you can help, that would be great.
I want to know if there is a method that merges columns into 1 column. I just need a technical name, that's all. I am trying to look for it on Google.
Is that possible?
For example:
C1   C2    C3   C4                       One column
1    0.1   .
2    0.2   .         =====>>>>>
3    0.3   .
4    0.4   .

I GOT AN IDEA! TELL ME WHAT YOU THINK.
I'll combine the genetic algorithm with PCA!

So suppose that I selected descriptor 1 and descriptor 2.
If descriptor 2 has 11 columns, I'll select the best one! What do you think?

Or a 2-layer genetic algorithm, whatever... Is that a good idea?
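
For reference, the PCA idea above (reducing one descriptor to a single column) could look roughly like the sketch below. It reads "select the best one" as "keep the first principal component", which is an assumption. The function name `first_component_scores` is made up, and plain power iteration is used here only to avoid depending on a library. Note that the first component is the direction of largest variance, which is not necessarily the most useful direction for separating classes, and the sign of the scores is arbitrary.

```python
import math
import random


def first_component_scores(rows, iterations=100, seed=0):
    """Project each sample of one descriptor onto its first principal component.

    rows: list of equal-length lists (samples x columns of the descriptor).
    Returns one score per sample, i.e. a single merged column.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Random start so the vector is not orthogonal to the leading direction.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # descriptor has no variance at all
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]


# Toy descriptor with two perfectly proportional columns.
descriptor = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
scores = first_component_scores(descriptor)
print(scores)
```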
"I want to know if there is a method that merges columns into 1 column."
No. The only way that would make any sense would be if there were a sensible way to merge. For example, if you had 4 columns that were 0 or 1 for the presence of certain defects, you could have a count column for total defects. But in general it is a bad idea to try to merge columns, because you lose information.
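
The defect-count example, and the information loss it causes, can be shown with made-up data (the rows below are hypothetical):

```python
# Hypothetical rows: four 0/1 columns, one per defect type.
rows = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

# Merge the four columns into a single "total defects" column.
defect_counts = [sum(row) for row in rows]
print(defect_counts)  # [2, 2, 0, 3]

# The first two rows have different defects but the same count,
# so the merged column can no longer tell them apart.
print(rows[0] != rows[1] and defect_counts[0] == defect_counts[1])  # True
```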

Doing some kind of two-layer setup is the same as doing it with all 59 features.
If you just do it with the 11 groups (as you were instructed), then each of the 11 will have equal weight. You might miss some important information: if one of the groups has one really good feature and 10 or so bad ones, you lose some of the benefit of the good one. Also, if there's multicollinearity (where some features are proportional to others), that can mess it up.
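
A quick way to look for features that are proportional to each other is to check pairwise correlations, as in the sketch below. The helper names, the 0.95 threshold, and the toy columns are all assumptions for illustration. Pairwise correlation only catches relationships between two columns at a time; it will not reveal a feature that is a combination of several others.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant column: correlation is undefined, report 0
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.95):
    """Return (i, j, r) for every column pair with |r| >= threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= threshold:
                pairs.append((i, j, r))
    return pairs


# Toy columns (made-up values).
height = [1.0, 2.0, 3.0, 4.0]
width = [2.0, 4.0, 6.0, 8.0]        # exactly proportional to height
direction = [1.0, -1.0, 1.0, -1.0]  # unrelated to the other two

pairs = highly_correlated_pairs([height, width, direction])
print([(i, j) for i, j, _ in pairs])  # [(0, 1)]
```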