
Solved

Data Mining for what?

Posted on 2004-11-25
20
Medium Priority
879 Views
Last Modified: 2013-11-16
Why do we need data mining?

I know some of the concepts behind data mining, but not what it does in real life. For example, I have read that DM improves the quality of data using various techniques and methods, and that it can predict customer behaviour, etc.

Is DM related to data warehousing? I think data warehousing is different from DM. Please shed some light on this.

cheers
0
Question by:joy12345
20 Comments
 
LVL 7

Assisted Solution

by:wael_tahon
wael_tahon earned 120 total points
ID: 12672963
Datawarehouse:

Is a repository of integrated information, available for queries and analysis.

Datamining:

Through datamining, we can discover and target the most important aspects of our business, focusing on key issues, relationships, and markets.  Datamining systems improve an organization's effectiveness, efficiency and value by increasing the usefulness of the knowledge the organization possesses
---------------------------------------------------

So we use datawarehousing to move data from the production database into a new database (the datawarehouse), and we use datamining techniques to get intelligent information out of the datawarehouse.
0
 

Accepted Solution

by:
Hest42 earned 240 total points
ID: 12674794
Datamining can be viewed simply as advanced statistical analysis of multi-dimensional data. I disagree that datamining automatically improves anything. There are no guarantees that you will gain anything from it. It's in the term 'mining': if there is no gold in the mountain, mining for it will only waste your time.

There are many good introductions to datamining out there. My favorite is CRISP-DM (CRoss Industry Standard Process for Data Mining), a document that describes datamining as a process of six phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment. You will find it at www.CRISP-DM.org.

On the differences between datamining and datawarehousing: you are right that the two are related. In fact it is very common to see them combined. Datawarehousing stores data in a different structure in order to give a better overview of the context of the data elements. Oracle, for example, can use materialized views to store this new view of the data source. Datawarehousing allows quick access to, and an overview of, one area of your data without regard for other areas. That said, a datawarehouse is only for viewing and analyzing data; you do not change the data through this storage. In combination with datamining, a datawarehouse provides a very nice interface for retrieving the data you want to analyze.
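
To make the materialized-view idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions: SQLite (which has no materialized views) stands in for Oracle, a summary table built with CREATE TABLE ... AS SELECT plays the role of the materialized view, and the `orders` and `sales_by_region` tables and their rows are made up.

```python
import sqlite3

# Hypothetical production table; names and rows are invented for illustration.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
production.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("north", "widget", 10.0),
        ("north", "gadget", 25.0),
        ("south", "widget", 5.0),
        ("south", "widget", 7.5),
    ],
)

# SQLite has no materialized views, so a precomputed summary table
# stands in for one here.
production.execute(
    """
    CREATE TABLE sales_by_region AS
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region
    """
)


def sales_summary(connection):
    """Read the precomputed summary; the analysis only reads, never writes."""
    cursor = connection.execute(
        "SELECT region, order_count, total_amount "
        "FROM sales_by_region ORDER BY region"
    )
    return cursor.fetchall()


summary = sales_summary(production)
print(summary)
```

The analysis side only ever runs SELECT against the summary, which matches the point that a datawarehouse is for viewing and analyzing rather than changing data.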
0
 
LVL 6

Expert Comment

by:pedros7
ID: 12675092
Yes, they are two sides of the same coin.
"A datawarehouse is a repository of integrated information, available for queries and analysis.  Data and information are extracted from non-homogeneous sources as they are generated....  This makes it much easier and more efficient to run queries over data that originally came from different sources."


Have a look at these:
http://freedatawarehouse.com/

http://databases.about.com/od/datamining/
http://databases.about.com/library/weekly/aa100700a.htm

Good reading :)
Pedro




0

 

Author Comment

by:joy12345
ID: 12678627
Okay, thanks to all. I now have a few concepts of them.

What I have understood is that DM is a subset of DW: DM relates to one particular department and its concerns, while DW covers all aspects of the organization. So is DM just a part of DW that resides within it?

Are they just snapshots to make the processes faster? For example (this is just my thought): in the past we had to create tables etc. manually, but these days we have software to do that. Do DM and DW help in this way? Please explain. Why are people nowadays so focused on (motivated by) DM and DW?

Hest42, yes, I like your mining example: if there is no good data then mining is worthless. Could anyone please tell me some key features and benefits of DM and DW that push organizations to implement them?

I have read the CRISP-DM process but I have no idea how it is done in the real world!

Cheers

0
 

Expert Comment

by:Hest42
ID: 12679553
There are (I think) 4 basic steps that both DM and DW use:

1) Data gathering
2) Data cleaning
3) Data transformation
4) Feature selection

The last involves picking out the columns from the aggregated data you want to include in your project. This is not an easy thing to do!
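
For anyone who wants to see the four steps in motion, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the customer records are invented, "cleaning" just drops rows with unparseable numbers, "transformation" is simple min-max scaling, and the selected columns are arbitrary.

```python
def gather():
    """Step 1: collect raw records (hard-coded here; normally from several systems)."""
    return [
        {"customer": "a", "age": "34", "spend": "120.5", "notes": "x"},
        {"customer": "b", "age": "", "spend": "80", "notes": "y"},
        {"customer": "c", "age": "51", "spend": "n/a", "notes": "z"},
        {"customer": "d", "age": "29", "spend": "200", "notes": ""},
    ]


def clean(records):
    """Step 2: drop records whose numeric fields are missing or unparseable."""
    cleaned = []
    for record in records:
        try:
            age = float(record["age"])
            spend = float(record["spend"])
        except ValueError:
            continue
        cleaned.append({**record, "age": age, "spend": spend})
    return cleaned


def transform(records):
    """Step 3: rescale spend to the 0..1 range (min-max scaling)."""
    low = min(r["spend"] for r in records)
    high = max(r["spend"] for r in records)
    span = (high - low) or 1.0  # avoid dividing by zero when all values match
    return [{**r, "spend_scaled": (r["spend"] - low) / span} for r in records]


def select_features(records, columns):
    """Step 4: keep only the columns chosen for the project."""
    return [{c: r[c] for c in columns} for r in records]


prepared = select_features(transform(clean(gather())), ["age", "spend_scaled"])
print(prepared)
```

In a real project the hard part is deciding which columns to pass to the last step, which code like this cannot decide for you.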

Anyway, the point is that in these 4 steps DM and DW are very much alike. In DW, however, you want to store the resulting data in materialized views, in a format that represents the context you want. Examples of such formats are the star schema, the constellation schema and the snowflake schema. You then analyze the data using techniques such as drill-down. The point here is that these methods do not apply to DM.
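
As a rough illustration of a star schema and drill-down, here is a small Python sketch using SQLite. The tables (`fact_sales`, `dim_date`, `dim_store`), their columns and their rows are hypothetical, and a real datawarehouse would hold far more dimensions and data.

```python
import sqlite3

# A tiny star schema: one fact table surrounded by two dimension tables.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    """
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (date_id INTEGER, store_id INTEGER, amount REAL);

    INSERT INTO dim_date VALUES (1, 2004, 10), (2, 2004, 11);
    INSERT INTO dim_store VALUES (1, 'Oslo'), (2, 'Lisbon');
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0), (2, 2, 30.0);
    """
)


def total_by(columns):
    """Sum sales grouped by the given dimension columns.

    The column names come from this script, not from user input, so
    building the SQL with string formatting is acceptable in this sketch.
    """
    column_list = ", ".join(columns)
    query = f"""
        SELECT {column_list}, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        JOIN dim_store AS s ON s.store_id = f.store_id
        GROUP BY {column_list}
        ORDER BY {column_list}
    """
    return warehouse.execute(query).fetchall()


yearly = total_by(["d.year"])               # coarse view: one row per year
monthly = total_by(["d.year", "d.month"])   # drill down from year to month
print(yearly, monthly)
```

Drilling down is nothing more than grouping by one more level of a dimension, which is why the schema is organized around dimensions in the first place.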

In a DM project you either start from step 1, or if available you can actually use the DW to extract the data you need for your project. This is why they are often found to be combined. Both DM and DW have benefits, but the costs are not doubled when applying both to your data. DM then uses a more algorithmic approach to doing statistical analysis (Clustering or Classification). A few keywords on such topics would be, Bayesian Networks, Kohonen networks, K-means and the EM-algorithm.
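
To give a feel for what "a more algorithmic approach" means, here is a minimal K-means sketch in plain Python. It is an illustration, not a production implementation: the points are invented, only two dimensions are handled, and the starting centroids are passed in by hand rather than chosen randomly as is more usual.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points; starting centroids are supplied by the caller."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    (
                        sum(p[0] for p in cluster) / len(cluster),
                        sum(p[1] for p in cluster) / len(cluster),
                    )
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Nobody told the algorithm which points belong together; the two groups fall out of the data, which is the sense in which clustering discovers structure.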

The conclusion here is that, in my opinion, DM is not a subset of DW, but the two techniques definitely have a lot of basics in common.

I hope that helps :)
0
 
LVL 6

Expert Comment

by:pedros7
ID: 12680005
That's an idea reinforced by the links I provided above. DM isn't a subset of DW.

DWs are repositories of integrated information, available for queries and analysis.  It's not just storing data but actually gathering information (data as well as processed data) from different systems.
DM is a process for discovering new information or relationships in large data repositories.
(see http://www.research.ibm.com/people/a/almasi/sc95/postrhlp.html)

As they complement each other they are often seen in conjunction!

Cheers
Pedro
0
 
LVL 6

Assisted Solution

by:pedros7
pedros7 earned 200 total points
ID: 12680064
--==--==--=
DataWarehousing: a system that extracts data from these different sources, transforms it and loads it into a shared system from which it can be analysed. It is literally a Warehouse of the organisation's data. It should provide a single, properly structured source of data, packed for fast and efficient analysis.
also see http://www.andrews.edu/ITS/AS/dw/Andrews/WhatIsDW.html

--==--==--=

Data Mining: The main task of Data Mining is the transition from data to knowledge. This transition is done in two steps: First, subsymbolic descriptions of the data are produced by the help of self-organizing maps. Next, symbolic descriptions of the discovered structures are produced by the help of a knowledge conversion tool. This model of the Data Mining process leads to the following task list:

   1. inspection & preprocessing of the dataset
   2. discovery of spatial structures
   3. knowledge conversion
   4. construction of classifiers
   5. validation
also see http://www.mathematik.uni-marburg.de/~johnny/science/datamining.html

--==--==--=

HTH
0
 

Expert Comment

by:Hest42
ID: 12680153
I couldn't agree more on the main task of data mining. It is important to point out that data mining searches for new relationships in the data set, relationships that would be very hard to find using other tools such as datawarehousing.

Self-organizing maps, however, are only one way of doing clustering. I assume "subsymbolic description" is another term for clustering. The term datamining covers both clustering and classification techniques, and as such two datamining projects can vary a lot depending on the goal. I can follow the basics of the data mining task list, but I disagree about steps 2, 3 and 4, which are optional tasks for data mining.

Does 'knowledge conversion' cover interpretation of the results of 'discovery of spatial structures'? If so, I additionally disagree with the order of tasks. Classification can be the sole result of a datamining project, in which case interpretation can be done using e.g. decision trees.
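
To show why decision trees are easy to interpret, here is a minimal Python sketch of the smallest possible tree: a single yes/no test (a "decision stump") chosen by counting misclassified rows. The data, the feature meanings (age, income) and the labels are invented, and real tree learners use better split criteria and grow many levels.

```python
def best_split(rows, labels):
    """Find the single (feature, threshold) test that misclassifies the fewest rows."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            # Each side predicts its majority label.
            left_label = max(set(left), key=left.count)
            right_label = max(set(right), key=right.count)
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best


def classify(row, split):
    """Apply the learned test to a new row."""
    _, feature, threshold, left_label, right_label = split
    return left_label if row[feature] <= threshold else right_label


# Hypothetical rows of (age, income in thousands) and whether the customer bought.
rows = [(25, 20), (30, 60), (45, 30), (50, 80)]
labels = ["no", "yes", "no", "yes"]

split = best_split(rows, labels)
prediction = classify((35, 70), split)
print(split, prediction)
```

The result reads directly as a rule ("income above 30 means yes"), which is the kind of interpretation a decision tree gives you for free.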

A small note on step 1: I cannot stress enough the importance of preprocessing the dataset. It is crucial, but unfortunately can be very boring to perform :)

A very nice tool for an easy approach to data mining is "SPSS Clementine".
For datawarehousing I know "Oracle Warehouse Builder" is popular, but I am not personally very familiar with it.
0
 
LVL 6

Expert Comment

by:pedros7
ID: 12680207
I agree, 2, 3 and 4 are optional rather than mandatory. The paragraph on data mining was an extract from the link provided, intended as a good description of that particular model.
I mainly deal with DW using SQL Server and its OLAP implementation, so of course I would promote the Microsoft camp, but Oracle is a popular option too. :)

0
 

Author Comment

by:joy12345
ID: 12680256
What does OLAP do with DW? I know it is an analytic tool which helps to analyse the data.

Sorry, last time I was confusing DM with a data mart. A data mart is part of a DW and covers only a particular department. But what role does it play in the DW?

I have read that the main development phases of a DW are:
- Business model
- Dimensional (logical) model
- Model summarise
- Physical model

But I am confused: is a DW just for finding out the data requirements? Since DW follows the CLDS approach, it is data driven rather than requirement driven (SDLC). Does a DW just store the finest data?

Also, my understanding was different: I used to think that DW is a technology, but I have now found that it is just an architecture. Could you please explain more?

cheers
0
 
LVL 6

Expert Comment

by:pedros7
ID: 12680382
Are you using SQL Server? Even if you're not, Microsoft documentation, as ever, is most useful.

=--==-=--==-=--==-

OLAP
Whereas data warehouses and data marts are the data stores for analysis data, online analytical processing (OLAP) is the technology that enables client applications to efficiently access this data. OLAP provides many benefits to analytical users, for example:
    * An intuitive multidimensional data model makes it easy to select, navigate, and explore the data.
    * An analytical query language provides power to explore complex business data relationships.
    * Precalculation of frequently queried data enables very fast response time to ad hoc queries.
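
The precalculation point can be sketched in a few lines of Python. This is only a toy model of the idea, not how SQL Server's OLAP engine works internally: the fact rows and dimension names are invented, and the "cube" is a plain dictionary holding the sum for every combination of dimensions.

```python
from itertools import combinations

# Hypothetical fact rows: (year, region, product, amount).
facts = [
    (2003, "north", "widget", 10.0),
    (2003, "south", "widget", 20.0),
    (2004, "north", "gadget", 30.0),
    (2004, "north", "widget", 40.0),
]
DIMENSIONS = ("year", "region", "product")


def build_cube(rows):
    """Precompute SUM(amount) for every subset of the dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            totals = {}
            for row in rows:
                key = tuple(row[i] for i in dims)
                totals[key] = totals.get(key, 0.0) + row[-1]
            cube[tuple(DIMENSIONS[i] for i in dims)] = totals
    return cube


cube = build_cube(facts)

# Ad hoc questions are now dictionary lookups instead of scans over the facts.
grand_total = cube[()][()]
north_total = cube[("region",)][("north",)]
north_2004 = cube[("year", "region")][(2004, "north")]
print(grand_total, north_total, north_2004)
```

The trade-off is the usual one: the cube costs time and space to build up front, and in exchange queries over any mix of dimensions come back quickly.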

see
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/olapdmad/agaboutolap_3u7k.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/olapdmad/agaboutolap_4x83.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/olapdmad/agaboutolap_0lv7.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/olapdmad/agaboutolap_9i5s.asp
http://msdn.microsoft.com/SQL/sqlwarehouse/AnalysisServices/default.aspx


=--==-=--==-=--==-

Data Warehousing
see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/createdw/createdw_67qr.asp

=--==-=--==-=--==-

Data Marts
In some data warehouse implementations, a data mart is a miniature data warehouse; in others, it is just one segment of the data warehouse. Data marts are often used to provide information to functional segments of the organization.

see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/createdw/createdw_0oz7.asp

=--==-=--==-=--==-

DataMine
Data mining technology helps users analyze data in relational databases and multidimensional OLAP cubes to uncover patterns and trends that can be used to make predictions. The data mining capabilities in SQL Server 2000 are integrated tightly with both relational and OLAP data sources.

see http://www.microsoft.com/sql/evaluation/features/datamine.asp

=--==-=--==-=--==-

P
0
 

Expert Comment

by:Hest42
ID: 12680606
"Microsoft documentation as ever is most useful"

I do not even know where to begin on that comment! :)

I know a guy sitting on the street below this building every morning with a little cardboard sign stating 'feed my dog' - that pretty much sums up the msdn library for me.

Well, mildly overstated probably. Guess I'm just not really their biggest fan, sorry :)

I am very intrigued to hear of MS SQL Server 2000's data mining capabilities. Does it really have this kind of functionality? I've seen products before from MS that claim data mining capabilities, but as it turns out it is a cheap workaround to make the application look better. "Targit" comes to mind.

I am very interested in your opinions on this if you have tried using it.
0
 

Author Comment

by:joy12345
ID: 12680610
Hi Pedro, thanks for the information, but I am not very technical, I mean not deeply into this field. I would just like to know the basic concepts and the world of DM, DW and their subsets.

Actually I am not using any tools. I am just trying to learn the theory and have no intention of implementing anything soon; I would just like to know what they are.

cheers
0
 

Expert Comment

by:Hest42
ID: 12680657
Datamining is a means to perform statistical analysis on a set of data using clustering or classification techniques.

Datawarehousing is a means to provide a contextual overview of a set of data which can then be used to perform statistical analysis using a range of techniques of which datamining is only a subset.

That pretty much sums things up in my head :)
0
 
LVL 6

Expert Comment

by:pedros7
ID: 12680678
Y! Couldn't have summarised it better myself! :)
0
 

Author Comment

by:joy12345
ID: 12680682
In conclusion, I can say that they are just tools to purify data using various techniques and methodologies. However, DW is not pure, is that right? DW is a snapshot that shows an overview of all the data elements.

Thanks, Hest, for your conclusion. I am really confused now, too much information from you guys. Actually I was just looking for what they are, their features and benefits and, more importantly, why businesses are so motivated to use them.

cheers
0
 

Author Comment

by:joy12345
ID: 12680694
My last message was 1 min earlier than yours, Pedro.

Now that we have reached a conclusion, maybe it's time to split the points. But please let me know why businesses are so motivated towards it even if it is not pure. What is the key aspect that motivates organizations?

Sorry guys, I actually had no idea what they were before starting this question, so my questions and my participation may seem stupid. Please don't mind.

cheers
 
0
 
LVL 6

Expert Comment

by:pedros7
ID: 12680700
Hest42, it's good we're from completely different camps!
Well, when I started using MS SQL data warehousing I went on a couple of data warehousing and OLAP courses. And yeah, as you say, it's got data mining capabilities! But that's it, capabilities.
Don't get me wrong, it's been pretty useful for me and I've used it mainly as a reporting tool. On the other hand I think it's still a stopgap until they come up with a fully fledged data warehousing and data mining product, as MS does!
0
 

Expert Comment

by:Hest42
ID: 12680736
Well, their best feature in that context is that they enable third party tools for datamining. I truly hope that is never ruined, since better datamining techniques and tools are being developed every day right now.

The motivation for organizations to use these tools is mainly to gain a better overview. A better overview in turn can improve efficiency, decrease costs or indicate which fields to invest in. Datawarehousing is most popular here, since once built a datawarehouse will hopefully last for many years. Datamining is more risky, since you don't know whether anything will be found. Datamining is a one-shot process: when it is done, all you have are the results to act upon, if there are any at all.
0
 
LVL 1

Expert Comment

by:DeltaFix
ID: 12683200
For your situation, check out About.com's intro on the topics below.  Also for DW, I found (http://databases.about.com/gi/dynamic/offsite.htm?site=http%3A%2F%2Fwww.carleton.com.au%2FUnderstanding%2520Data%2520Warehousing%2520Strategically.htm)
very helpful.  Good luck :-)

"Data Mining: An Introduction

By this point in time, you've probably heard a good deal about data mining -- the database industry's latest buzzword.  What's this trend all about?  To use a simple analogy, it's finding the proverbial needle in the haystack.  In this case, the needle is that single piece of intelligence your business needs and the haystack is the large data warehouse you've built up over a long period of time.

Through the use of automated statistical analysis (or "data mining") techniques, businesses are discovering new trends and patterns of behavior that previously went unnoticed.  Once they've uncovered this vital intelligence, it can be used in a predictive manner for a variety of applications.  Brian James, assistant coach of the Toronto Raptors, uses data mining techniques to rack and stack his team against the rest of the NBA.  The Bank of Montreal's business intelligence and knowledge discovery program is used to gain insight into customer behavior.  CIO Magazine provides a great executive overview of data mining for business-minded professionals.

The first step toward building a productive data mining program is, of course, to gather data!  Most businesses already perform these data gathering tasks to some extent -- the key here is to locate the data critical to your business, refine it and prepare it for the data mining process.  If you're currently tracking customer data in a modern DBMS, chances are you're almost done.  Take a look at the article Mining Customer Data from DB2 Magazine for a great feature on preparing your data for the mining process.

At this point, take a moment to pat yourself on the back.  You have a data warehouse!  The next step is to choose one or more data mining algorithms to apply to your problem.  If you're just starting out, it's probably a good idea to experiment with several techniques to give yourself a feel for how they work.  Your choice of algorithm will depend upon the data you've gathered, the problem you're trying to solve and the computing tools you have available to you.  Let's take a brief look at two of the more popular algorithms.  

Regression is the oldest and most well-known statistical technique that the data mining community utilizes.  Basically, regression takes a numerical dataset and develops a mathematical formula that fits the data.  When you're ready to use the results to predict future behavior, you simply take your new data, plug it into the developed formula and you've got a prediction!  The major limitation of this technique is that it only works well with continuous quantitative data (like weight, speed or age).  If you're working with categorical data where order is not significant (like color, name or gender) you're better off choosing another technique.  For a humorous look at regression, read Multiple Regression with Ren and Stimpy from New Mexico State University's Psychology Department.

Working with categorical data or a mixture of continuous numeric and categorical data?  Classification analysis might suit your needs well.  This technique is capable of processing a wider variety of data than regression and is growing in popularity.  You'll also find output that is much easier to interpret.  Instead of the complicated mathematical formula given by the regression technique you'll receive a decision tree that requires a series of binary decisions.  Take a look at the Classification Trees chapter from the Electronic Statistics Textbook for in-depth coverage of this technique.

Regression and classification are two of the more popular data mining techniques, but they only form the tip of the iceberg.  For a detailed look at other data mining algorithms, look at this feature on Data Mining Techniques or the SPSS Data Mining page.

Data mining products are taking the industry by storm.  The major database vendors have already taken steps to ensure that their platforms incorporate data mining techniques. Oracle's Data Mining Suite (Darwin) implements classification and regression trees, neural networks, k-nearest neighbors, regression analysis and clustering algorithms.  Microsoft's SQL Server 2000 also offers data mining functionality through the use of classification trees and clustering algorithms.  If you're already working in a statistics environment, you're probably familiar with the data mining algorithm implementations offered by the advanced statistical packages SPSS and S-Plus.

Have we whetted your appetite for data mining knowledge?  For a more detailed look, check out Pilot Software's white paper on data mining which provides a detailed description of knowledge discovery terms and techniques.  You'll also find some excellent slide show presentations and a variety of other data mining resources on Megaputer.com's homepage.  If you're ready to get started but can't find any sample data, take a look at the various repositories listed in Data Sources for Knowledge Discovery.  Good luck with your data mining endeavors!  Stop by our forum and let us know how things are going!"

Reference: http://databases.about.com/library/weekly/aa100700a.htm
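
As a small footnote to the regression paragraph in the quoted article, here is a minimal Python sketch of "develop a formula that fits the data, then plug in new data". It assumes the simplest case, ordinary least squares with one input variable, and the age and spend numbers are made up so that the fitted line is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data that follows spend = 5 * age + 10 exactly.
ages = [20.0, 30.0, 40.0, 50.0]
spend = [110.0, 160.0, 210.0, 260.0]

slope, intercept = fit_line(ages, spend)

# "Plug in" a new customer aged 35 to get a prediction.
prediction = slope * 35.0 + intercept
print(slope, intercept, prediction)
```

Real data will not sit exactly on a line, so the fitted formula gives an estimate rather than a certainty.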
0


Question has a verified solution.
