Solved

Statistics/Math Question

Posted on 2011-02-23
10
318 Views
Last Modified: 2012-06-21
I need a stat head.  I used to work for a PhD in Statistics...I might have to hit him up.

This is not exactly a String Theory question.  I think it's basic.  I just need to take the best approach.

My manufacturing company makes widgets.  Every 10 minutes, we collect 200 points of data--mostly voltages, conveyor speeds, etc.  We store this data in SQL Server.

Usually widgets turn out fine, but sometimes widgets turn out poorly, quality-wise.  This is almost always because there's a "bad" voltage setting somewhere on the production line.  This can only be found by one of our "experts" who finds that Voltage 13 is incorrect.

What I would like: I can "train" the system by feeding it lots of data (on 10-minute intervals), and by showing it scenarios (hours-long collections of data) where the widget quality was good.

I can also train it by using "bad" scenarios--periods of time where poor-quality widgets were produced.  By comparing the data during the periods of bad quality with the data from good-producing periods, the system can "learn" what makes for bad quality.

The final step is that the system can also now "warn".  As soon as it encounters a voltage that has historically produced bad-quality widgets, we don't need to wait to see the results.  It can warn us now.

I'd love for some help getting started on this--methodology, for starters.  And ultimately, some SQL-specific direction would be great as well.

Is this regression analysis?  How would I approach it?  How would I do this "training" and scenario "feeding"?

Thanks very much.
0
Comment
Question by:bamapie
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
  • 2
  • +2
10 Comments
 
LVL 40

Expert Comment

by:lcohan
ID: 34964754
<<My manufacturing company makes widgets.  Every 10 minutes, we collect 200 points of data--mostly voltages, conveyor speeds, etc.  We store this data in SQL Server.

Usually widgets turn out fine, but sometimes widgets turn out poorly, quality-wise.  This is almost always because there's a "bad" voltage setting somewhere on the production line.  This can only be found by one of our "experts" who finds that Voltage 13 is incorrect.>>

In my opinion something could be done right away is you already store data in SQL as you mention above. You could set triggers on table.column(s) of interest and send alerts if values are outside limits imposed by you. All you need is to define the key columns and the values that should trigger the alerts.

As far as modeling/reporting maybe you should look at building some SSAS cubes using BI and do the analysis based on different scenarios.
0
 

Author Comment

by:bamapie
ID: 34964881
>You could set triggers on table.column(s) of interest and send alerts if values are outside limits imposed

We already do that.  I mean, we do have specs, and we do know when things are out of those specs.  We can even set off a flasher on the production line.

But if you have what I'm talking about, then it can go so much more deeply than just "they turned it up to 11".  Maybe it's learned that a Voltage A of 4.5 is fine UNLESS a Voltage of B is > 7 at any time when the sum of Voltage C and Voltage D is > 10.  In other words, more complex, more constantly-learning than humans can do without a lot of help.  It'll take us a year to figure out that it's only when this weird situation happens that a voltage of 4.5 is bad.  At any other time it's totally fine.
0
 
LVL 12

Expert Comment

by:PCableGuy
ID: 34966375
I think this is a form of supervised learning. Your output seems like it would be a form of classification with three discrete outcomes: good, bad or warn.

http://en.wikipedia.org/wiki/Supervised_learning

Unfortunately, I can't help you further than that, but some other math/science expert in this forum might agree or disagree with my amatuer opinion. In any case, this post might keep the dialogue going foward. Good luck with your problem.
0
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

 
LVL 12

Expert Comment

by:PCableGuy
ID: 34966877
This link might be helpful for the SQL.

http://msdn.microsoft.com/en-us/library/ms175595(v=SQL.90).aspx
0
 
LVL 40

Expert Comment

by:lcohan
ID: 34970239
As you just said - it may take a long time to figure out all the conditions for something to hapen however once that is determined then it's a waste if you do not convert all those complex conditions into triggers as well. What you are trying to achieve however is a intelligent system that could even predict events before they happen and send alerts....for that I wish you all the best and good luck.

What I suggested at a very high level was to use SQL tools you already have like SSRS and SSAS and write SQL code (stored procedures taking parameters) for different scenarios then give different values to these parameters will give you the results for each scenario/SP. The parameters could be also generated and passed by a stored procedure randomly yet controlled so you don't have to manually change them and execute the different scenarios.
Hope this makes sense and is in line with your thoughts.
0
 
LVL 12

Expert Comment

by:PCableGuy
ID: 34976377
bamapie,

Have you ever tried to plot a bunch of the bad scenarios on a graph in Excel perhaps? The voltages would be input horizontally in the Excel cells. One line would display the state of all 200 voltages. The line would be a horizontal line which would show voltage spikes, dips, etc. You might notice a certain curve or trend line that the voltages make when the widgets go bad which might be easier to see on a graph than merely looking at all the numbers.

Of course, your graph would need to be scaled correctly, that is the Y axis would need to be scaled correctly to display spikes or dips in the voltages along the trend line.

Not sure how well 200 data points would display in an Excel graph though. Just something to put on the table.

0
 

Author Comment

by:bamapie
ID: 35028486
PCCableGuy:

I need an automated solution, and even if I see an issue with the data today, tomorrow or a year ago it will be a whole different issue stemming from a different scenario.  I could eyeball things today, sure, using your approach.  

But with the right mathematical approach, the only thing I would need to do is to have my users periodically train the tool by telling it when quality was good and when it was bad.  Upon each input, my logic would examine the data and look for causation.

Thanks
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 35038826
SQL Server has nothing like this nor was it designed to have any such functionality.  The best you can do is find some commercial software that does this.
0
 

Author Comment

by:bamapie
ID: 35168699
I didn't think that SQL had any such functionality on-board.  But my data is stored in SQL, and any commercial software like you mentioned can probably operate against data stored in SQL Server.  I'm leaving the question open in case someone has experience with such a software package doing what I've described.  Please, please, no links from a Google search.  Still looking for someone who's taken on a task like this.

Thanks.
0
 
LVL 4

Accepted Solution

by:
jazjef earned 500 total points
ID: 37347808
Multivariate logistic regression--statistical modeling that yields an equation. You have a dichotomous dependent variable---good quality VS bad quality (e.g. good or not good). Logistic regression will also allow for different variable types to be used in the analysis. In addition, what you want from the modeling equation is:
1) statistical coefficient weightings---these will tell you what variables contribute most to either a good or bad process.
2) the constant

Equation is attached as jpg.
     LogistRegress equation
0

Featured Post

Visualize your virtual and backup environments

Create well-organized and polished visualizations of your virtual and backup environments when planning VMware vSphere, Microsoft Hyper-V or Veeam deployments. It helps you to gain better visibility and valuable business insights.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article I will describe the Backup & Restore method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

756 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question