Solved

What are the best tools for performing data mining cluster analysis

Posted on 2011-02-22
26
1,246 Views
Last Modified: 2013-11-15
What are the best tools for performing data mining cluster analysis? It does not matter whether they are expensive/cheap, open source/proprietary software. All options are taken into consideration.

For example, I need to select or customize cluster fragmentation rules

P.S.: I couldn't find appropriate question zones, so I selected these two.
Question by:AlexKostrub
26 Comments
 
LVL 45

Expert Comment

by:patrickab
ID: 34950320
Might it be easiest to calculate the standard deviation and then specify how many standard deviations from the mean the data is considered to be within a 'cluster'?
0
 

Author Comment

by:AlexKostrub
ID: 34950726
That would work only for continuous values (i.e. numbers), but I need to analyze discrete values (for example, tuples of fixed cardinality drawn from some set of discrete values), where it is unclear how to define a metric.
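
One common way to get a metric for tuples of discrete values is to treat each tuple as a set and use the Jaccard distance. This is only a sketch under that assumption: the function name `jaccard_distance` and the sample rows are made up for illustration and are not taken from the attached file.

```python
def jaccard_distance(a, b):
    """Distance between two tuples treated as unordered sets of discrete values.

    0.0 means the tuples contain exactly the same values,
    1.0 means they share no values at all.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty tuples are considered identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical rows of six numbers each (illustrative only).
rows = [
    (3, 11, 19, 27, 35, 42),
    (3, 11, 19, 28, 36, 44),
    (1, 2, 5, 8, 13, 21),
]

# Pairwise distance matrix, usable as input to a distance-based
# clustering method such as hierarchical clustering.
distances = [[jaccard_distance(r, s) for s in rows] for r in rows]
```

A matrix like `distances` can be handed to any clustering tool that accepts precomputed distances, so no mean or standard deviation is needed.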
0
 
LVL 45

Expert Comment

by:patrickab
ID: 34951056
>for example: tuples of fixed cardinality from some set of discrete values

Not sure I understand the problem. If there's a set of discrete values, why can they not be treated as a continuous set?
0
 

Author Comment

by:AlexKostrub
ID: 34991011
I mean that each tuple holds 6 different discrete values, and it is unclear how to calculate a mean and standard deviation for it.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 34991024
AlexKostrub,

Please upload your file.

Patrick
0
 

Author Comment

by:AlexKostrub
ID: 34991351
Each row is a tuple which consists of 6 numbers. My aim is to group these tuples into several groups according to some patterns.
Report3.txt
0
 
LVL 45

Expert Comment

by:patrickab
ID: 34991419
What do the tuples represent?
0
 

Author Comment

by:AlexKostrub
ID: 34991574
Each tuple consists of the numbers from one lottery draw.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 34992271
If this is an attempt to predict lottery numbers then I cannot help.
0
 

Author Comment

by:AlexKostrub
ID: 34995538
Not to predict, but to group tuples of numbers into several different classes.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 34995800
>Not predict but group tuples of numbers into several different classes

What does that mean?
0
 

Author Comment

by:AlexKostrub
ID: 34999353
This means that I assume lottery winning numbers can be divided into groups. I treat a row from the file above as one unit. I want to divide these units into several groups.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 35000077
AlexKostrub,

I'm afraid I'm leaving this question to others.

Patrick
0
 
LVL 37

Accepted Solution

by:
TommySzalapski earned 250 total points
ID: 35045001
There is no possible way to effectively data mine or do any analysis of any kind on winning lottery numbers. They are random, independent events.
If you flip a fair coin and get heads 5 times in a row, what is the chance that the next flip will be tails? 50%. Every time. It doesn't matter at all, ever, what has happened in the past. There is no effect whatsoever on future events.
Please, do not waste your time and resources on trying to make patterns out of independent random data. It will never work. Guaranteed.
I have a bachelor's degree in math with a statistics emphasis. I also have a master's in computer science and have done my fair share of data mining. Everyone who studies either will agree with me.
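
The coin-flip point above can be checked with a quick simulation. This is a minimal sketch, assuming a fair coin modelled with Python's `random` module; the function name and parameters are made up for illustration.

```python
import random


def tails_rate_after_heads_streak(streak=5, flips=1_000_000, seed=0):
    """Estimate P(tails) on the flip that follows `streak` or more heads in a row."""
    rng = random.Random(seed)
    run = 0        # length of the current run of consecutive heads
    followups = 0  # flips observed right after a qualifying streak
    tails = 0      # how many of those follow-up flips were tails
    for _ in range(flips):
        is_heads = rng.random() < 0.5
        if run >= streak:
            followups += 1
            if not is_heads:
                tails += 1
        run = run + 1 if is_heads else 0
    return tails / followups


rate = tails_rate_after_heads_streak()
print(f"Share of tails right after 5+ heads: {rate:.3f}")
```

The printed share should sit close to 0.5, which is what independence predicts: the streak before the flip carries no information about the flip itself.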
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 35045006
Random data will appear to form patterns, but those patterns are mirages and are not real. The only possible way that analysis of lotteries could be of any use would be if they were generated by an especially poor random number generator, but no reasonably sized lottery or casino would dare do that.
0
 
LVL 10

Expert Comment

by:itcouple
ID: 35045794
Hi

I don't know data mining very well, but I disagree that you cannot find patterns in "chaos". The reason is that you have x numbers and you draw x numbers from them, so that is the base pattern in your "chaos" (so it is not true chaos). For example, it depends on the lottery, but if 20 numbers are drawn out of 80 and you want to hit 10 of them (the lottery I analyzed a while ago), there is a pattern: 10 numbers in a row happen very rarely, but they do happen. Also, if you split the 80 into odd / even, you can get a range of 3-17 odds/evens, so occasionally your lottery is 17 out of 40 where you want to hit 10 (very rare).

Apologies if it isn't applicable to this question, but I thought I would share that because I want to use data mining myself (I need to learn it at some point), not for the lottery but for predicting game results based on carefully selected data (which I don't have yet).
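
How rare a given odd/even split is can be computed directly, assuming the draw is uniform and without replacement (an assumption, since the thread does not name the lottery). In that case the number of odd balls follows a hypergeometric distribution. A minimal sketch, with `prob_odd_count` as a made-up helper name:

```python
from math import comb


def prob_odd_count(k, pool=80, drawn=20):
    """Probability that exactly k of the drawn numbers are odd.

    Assumes the balls are numbered 1..pool and every subset of size
    `drawn` is equally likely.
    """
    odd = (pool + 1) // 2   # how many odd numbers are in 1..pool
    even = pool - odd
    return comb(odd, k) * comb(even, drawn - k) / comb(pool, drawn)


# Full distribution of the odd count for a 20-out-of-80 draw.
distribution = {k: prob_odd_count(k) for k in range(0, 21)}

for k in (3, 10, 17):
    print(f"P({k} odd numbers out of 20) = {distribution[k]:.6f}")
```

The distribution is symmetric around 10 odd numbers, so a 17/3 split is exactly as likely as a 3/17 split, and both are far less likely than an even split.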

Regards
Emil
0
 
LVL 45

Expert Comment

by:patrickab
ID: 35045863
I agree with Tommy. I decided to leave this question to others as I believe it's a total waste of time analysing lottery results in the hope of predicting the winning numbers, and I have no intention of wasting any of my time on such a pointless venture.
Patrick
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 35046884
I never said you can't find patterns. You can. All the time. They just don't mean anything because the patterns you find have no bearing on the future. At all. Ever. The end. It's a mathematically proven fact.

Many people try to demonstrate otherwise and even write books and sell their ideas, but they are either seriously misguided or liars who are preying on the simple minded.
0
 
LVL 10

Expert Comment

by:itcouple
ID: 35047361
Hi

My understanding was that you don't know if a particular customer will buy a particular product from a particular range because you don't know the future, but you can identify more or less what he might buy and when if you have sufficient data.

From the lottery point of view (it depends on the lottery): if a lottery draws 20 balls out of 80 and I want to hit 10, and I know that 10 in a row appears once every 2 years, with a maximum of 4 years, and the last time that occurred was 3.5 years ago, then taking the same numbers (70 combinations) gives me a good chance of winning, if I'm lucky. So let's say it occurs after 8 months (above the max), it is a daily draw, and a ticket costs 2 zloty :) Then 70 x 2 zloty = 140 zloty per draw, x 8 months x 30 days = 33,600 zloty, and hitting 10 pays 100,000 zloty (plus many 9s, 8s and so on), so you win by predicting the future... unless you are very unlucky.
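
The arithmetic in this scenario can be written out as follows. The ticket count, price, duration and prize are the figures quoted above; the per-draw probability is an added calculation that assumes a uniform draw of 20 balls out of 80 without replacement, and `prob_all_picked_drawn` is a made-up helper name.

```python
from math import comb

# Figures quoted in the comment above.
tickets_per_draw = 70      # one ticket per "10 in a row" combination
price_per_ticket = 2       # zloty
days_played = 8 * 30       # 8 months of daily draws
prize_for_ten = 100_000    # zloty

total_cost = tickets_per_draw * price_per_ticket * days_played


def prob_all_picked_drawn(picked=10, pool=80, drawn=20):
    """Chance that one fixed set of `picked` numbers is entirely among the drawn balls."""
    return comb(pool - picked, drawn - picked) / comb(pool, drawn)


p_single_ticket = prob_all_picked_drawn()

print(f"Total stake over 8 months: {total_cost} zloty")
print(f"Chance one ticket hits all 10 in one draw: {p_single_ticket:.3e}")
```

Multiplying `p_single_ticket` by the number of tickets and draws gives a rough upper bound on the chance of hitting 10 over the whole period, which can then be weighed against the stake and the prize.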

Maybe this is not pure data mining but I think close enough ;)

Regards
Emil
0
 

Author Comment

by:AlexKostrub
ID: 35056920
It is widely known that the probability of any single number being drawn is statistically almost equal for all numbers. But I think that some combinations of numbers are more probable than others, and analysis of the drawn numbers of several lotteries shows that this is true. Statistically it is incorrect to draw conclusions about drawn numbers from a finite dataset of results, but in practice it makes sense. My aim is to find the combinations of numbers that are more probable than other combinations.
0
 
LVL 10

Expert Comment

by:itcouple
ID: 35057150
I'm not sure if data mining can answer this question, but my previous approach was odds vs. evens to increase the probability (less frequent but more probable to hit), obviously by always using the same numbers. This is more efficient if we wait for the right moment, but we still need to be lucky. Total balls in a row seems to be good in certain lotteries, but it is rare, may take a long time to occur again, and can be rather expensive, with the obvious risk of hitting an unusual scenario.

Anyway, forex seems to be easier than this, especially for short periods of time (minutes = low risk = low gain), using statistics only, as there are certain expected behaviours.

Regards
Emil

0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 35075335
>My aim is to find such combinations of numbers that are more probable than other combinations.

There are none. There are definitely some that have come up more than others in the past, but that means absolutely nothing. The future will do whatever it pleases.
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 35075371
>odd vs evens to increase probability (less frequent but more probable to hit) obviously by always using the same numbers.

That actually doesn't increase your chances at all.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 35121358
>My aim is to find such combinations of numbers that are more probable than other combinations.

I'm afraid that is just not going to happen.
0
