Solved

Statistics/Math Question

Posted on 2011-02-23
10
297 Views
Last Modified: 2012-06-21
I need a stat head.  I used to work for a PhD in Statistics...I might have to hit him up.

This is not exactly a String Theory question.  I think it's basic.  I just need to take the best approach.

My manufacturing company makes widgets.  Every 10 minutes, we collect 200 points of data--mostly voltages, conveyor speeds, etc.  We store this data in SQL Server.

Usually widgets turn out fine, but sometimes widgets turn out poorly, quality-wise.  This is almost always because there's a "bad" voltage setting somewhere on the production line.  This can only be found by one of our "experts" who finds that Voltage 13 is incorrect.

What I would like: I can "train" the system by feeding it lots of data (on 10-minute intervals), and by showing it scenarios (hours-long collections of data) where the widget quality was good.

I can also train it by using "bad" scenarios--periods of time where poor-quality widgets were produced.  By comparing the data during the periods of bad quality with the data from good-producing periods, the system can "learn" what makes for bad quality.

The final step is that the system can also now "warn".  As soon as it encounters a voltage that has historically produced bad-quality widgets, we don't need to wait to see the results.  It can warn us now.

I'd love for some help getting started on this--methodology, for starters.  And ultimately, some SQL-specific direction would be great as well.

Is this regression analysis?  How would I approach it?  How would I do this "training" and scenario "feeding"?

Thanks very much.
0
Comment
Question by:bamapie
  • 3
  • 3
  • 2
  • +2
10 Comments
 
LVL 39

Expert Comment

by:lcohan
ID: 34964754
<<My manufacturing company makes widgets.  Every 10 minutes, we collect 200 points of data--mostly voltages, conveyor speeds, etc.  We store this data in SQL Server.

Usually widgets turn out fine, but sometimes widgets turn out poorly, quality-wise.  This is almost always because there's a "bad" voltage setting somewhere on the production line.  This can only be found by one of our "experts" who finds that Voltage 13 is incorrect.>>

In my opinion something could be done right away is you already store data in SQL as you mention above. You could set triggers on table.column(s) of interest and send alerts if values are outside limits imposed by you. All you need is to define the key columns and the values that should trigger the alerts.

As far as modeling/reporting maybe you should look at building some SSAS cubes using BI and do the analysis based on different scenarios.
0
 

Author Comment

by:bamapie
ID: 34964881
>You could set triggers on table.column(s) of interest and send alerts if values are outside limits imposed

We already do that.  I mean, we do have specs, and we do know when things are out of those specs.  We can even set off a flasher on the production line.

But if you have what I'm talking about, then it can go so much more deeply than just "they turned it up to 11".  Maybe it's learned that a Voltage A of 4.5 is fine UNLESS a Voltage of B is > 7 at any time when the sum of Voltage C and Voltage D is > 10.  In other words, more complex, more constantly-learning than humans can do without a lot of help.  It'll take us a year to figure out that it's only when this weird situation happens that a voltage of 4.5 is bad.  At any other time it's totally fine.
0
 
LVL 12

Expert Comment

by:PCableGuy
ID: 34966375
I think this is a form of supervised learning. Your output seems like it would be a form of classification with three discrete outcomes: good, bad or warn.

http://en.wikipedia.org/wiki/Supervised_learning

Unfortunately, I can't help you further than that, but some other math/science expert in this forum might agree or disagree with my amatuer opinion. In any case, this post might keep the dialogue going foward. Good luck with your problem.
0
 
LVL 12

Expert Comment

by:PCableGuy
ID: 34966877
This link might be helpful for the SQL.

http://msdn.microsoft.com/en-us/library/ms175595(v=SQL.90).aspx
0
 
LVL 39

Expert Comment

by:lcohan
ID: 34970239
As you just said - it may take a long time to figure out all the conditions for something to hapen however once that is determined then it's a waste if you do not convert all those complex conditions into triggers as well. What you are trying to achieve however is a intelligent system that could even predict events before they happen and send alerts....for that I wish you all the best and good luck.

What I suggested at a very high level was to use SQL tools you already have like SSRS and SSAS and write SQL code (stored procedures taking parameters) for different scenarios then give different values to these parameters will give you the results for each scenario/SP. The parameters could be also generated and passed by a stored procedure randomly yet controlled so you don't have to manually change them and execute the different scenarios.
Hope this makes sense and is in line with your thoughts.
0
Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 12

Expert Comment

by:PCableGuy
ID: 34976377
bamapie,

Have you ever tried to plot a bunch of the bad scenarios on a graph in Excel perhaps? The voltages would be input horizontally in the Excel cells. One line would display the state of all 200 voltages. The line would be a horizontal line which would show voltage spikes, dips, etc. You might notice a certain curve or trend line that the voltages make when the widgets go bad which might be easier to see on a graph than merely looking at all the numbers.

Of course, your graph would need to be scaled correctly, that is the Y axis would need to be scaled correctly to display spikes or dips in the voltages along the trend line.

Not sure how well 200 data points would display in an Excel graph though. Just something to put on the table.

0
 

Author Comment

by:bamapie
ID: 35028486
PCCableGuy:

I need an automated solution, and even if I see an issue with the data today, tomorrow or a year ago it will be a whole different issue stemming from a different scenario.  I could eyeball things today, sure, using your approach.  

But with the right mathematical approach, the only thing I would need to do is to have my users periodically train the tool by telling it when quality was good and when it was bad.  Upon each input, my logic would examine the data and look for causation.

Thanks
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 35038826
SQL Server has nothing like this nor was it designed to have any such functionality.  The best you can do is find some commercial software that does this.
0
 

Author Comment

by:bamapie
ID: 35168699
I didn't think that SQL had any such functionality on-board.  But my data is stored in SQL, and any commercial software like you mentioned can probably operate against data stored in SQL Server.  I'm leaving the question open in case someone has experience with such a software package doing what I've described.  Please, please, no links from a Google search.  Still looking for someone who's taken on a task like this.

Thanks.
0
 
LVL 4

Accepted Solution

by:
jazjef earned 500 total points
ID: 37347808
Multivariate logistic regression--statistical modeling that yields an equation. You have a dichotomous dependent variable---good quality VS bad quality (e.g. good or not good). Logistic regression will also allow for different variable types to be used in the analysis. In addition, what you want from the modeling equation is:
1) statistical coefficient weightings---these will tell you what variables contribute most to either a good or bad process.
2) the constant

Equation is attached as jpg.
     LogistRegress equation
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
This video discusses moving either the default database or any database to a new volume.
In this tutorial you'll learn about bandwidth monitoring with flows and packet sniffing with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're interested in additional methods for monitoring bandwidt…

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now