Statistics/Math Question

I need a stat head.  I used to work for a PhD in Statistics...I might have to hit him up.

This is not exactly a String Theory question.  I think it's basic.  I just need to take the best approach.

My manufacturing company makes widgets.  Every 10 minutes, we collect 200 points of data--mostly voltages, conveyor speeds, etc.  We store this data in SQL Server.

Usually widgets turn out fine, but sometimes widgets turn out poorly, quality-wise.  This is almost always because there's a "bad" voltage setting somewhere on the production line.  This can only be found by one of our "experts" who finds that Voltage 13 is incorrect.

What I would like: I can "train" the system by feeding it lots of data (on 10-minute intervals), and by showing it scenarios (hours-long collections of data) where the widget quality was good.

I can also train it by using "bad" scenarios--periods of time where poor-quality widgets were produced.  By comparing the data during the periods of bad quality with the data from good-producing periods, the system can "learn" what makes for bad quality.

The final step is that the system can also now "warn".  As soon as it encounters a voltage that has historically produced bad-quality widgets, we don't need to wait to see the results.  It can warn us now.

I'd love for some help getting started on this--methodology, for starters.  And ultimately, some SQL-specific direction would be great as well.

Is this regression analysis?  How would I approach it?  How would I do this "training" and scenario "feeding"?

Thanks very much.
Asked:
Who is Participating?

Improve company productivity with a Business Account.Sign Up

x

Commented:
Multivariate logistic regression--statistical modeling that yields an equation. You have a dichotomous dependent variable---good quality VS bad quality (e.g. good or not good). Logistic regression will also allow for different variable types to be used in the analysis. In addition, what you want from the modeling equation is:
1) statistical coefficient weightings---these will tell you what variables contribute most to either a good or bad process.
2) the constant

Equation is attached as jpg.

0

Database AnalystCommented:
<<My manufacturing company makes widgets.  Every 10 minutes, we collect 200 points of data--mostly voltages, conveyor speeds, etc.  We store this data in SQL Server.

Usually widgets turn out fine, but sometimes widgets turn out poorly, quality-wise.  This is almost always because there's a "bad" voltage setting somewhere on the production line.  This can only be found by one of our "experts" who finds that Voltage 13 is incorrect.>>

In my opinion something could be done right away is you already store data in SQL as you mention above. You could set triggers on table.column(s) of interest and send alerts if values are outside limits imposed by you. All you need is to define the key columns and the values that should trigger the alerts.

As far as modeling/reporting maybe you should look at building some SSAS cubes using BI and do the analysis based on different scenarios.
0

Author Commented:
>You could set triggers on table.column(s) of interest and send alerts if values are outside limits imposed

We already do that.  I mean, we do have specs, and we do know when things are out of those specs.  We can even set off a flasher on the production line.

But if you have what I'm talking about, then it can go so much more deeply than just "they turned it up to 11".  Maybe it's learned that a Voltage A of 4.5 is fine UNLESS a Voltage of B is > 7 at any time when the sum of Voltage C and Voltage D is > 10.  In other words, more complex, more constantly-learning than humans can do without a lot of help.  It'll take us a year to figure out that it's only when this weird situation happens that a voltage of 4.5 is bad.  At any other time it's totally fine.
0

Commented:
I think this is a form of supervised learning. Your output seems like it would be a form of classification with three discrete outcomes: good, bad or warn.

http://en.wikipedia.org/wiki/Supervised_learning

Unfortunately, I can't help you further than that, but some other math/science expert in this forum might agree or disagree with my amatuer opinion. In any case, this post might keep the dialogue going foward. Good luck with your problem.
0

Commented:
This link might be helpful for the SQL.

http://msdn.microsoft.com/en-us/library/ms175595(v=SQL.90).aspx
0

Database AnalystCommented:
As you just said - it may take a long time to figure out all the conditions for something to hapen however once that is determined then it's a waste if you do not convert all those complex conditions into triggers as well. What you are trying to achieve however is a intelligent system that could even predict events before they happen and send alerts....for that I wish you all the best and good luck.

What I suggested at a very high level was to use SQL tools you already have like SSRS and SSAS and write SQL code (stored procedures taking parameters) for different scenarios then give different values to these parameters will give you the results for each scenario/SP. The parameters could be also generated and passed by a stored procedure randomly yet controlled so you don't have to manually change them and execute the different scenarios.
Hope this makes sense and is in line with your thoughts.
0

Commented:
bamapie,

Have you ever tried to plot a bunch of the bad scenarios on a graph in Excel perhaps? The voltages would be input horizontally in the Excel cells. One line would display the state of all 200 voltages. The line would be a horizontal line which would show voltage spikes, dips, etc. You might notice a certain curve or trend line that the voltages make when the widgets go bad which might be easier to see on a graph than merely looking at all the numbers.

Of course, your graph would need to be scaled correctly, that is the Y axis would need to be scaled correctly to display spikes or dips in the voltages along the trend line.

Not sure how well 200 data points would display in an Excel graph though. Just something to put on the table.

0

Author Commented:
PCCableGuy:

I need an automated solution, and even if I see an issue with the data today, tomorrow or a year ago it will be a whole different issue stemming from a different scenario.  I could eyeball things today, sure, using your approach.

But with the right mathematical approach, the only thing I would need to do is to have my users periodically train the tool by telling it when quality was good and when it was bad.  Upon each input, my logic would examine the data and look for causation.

Thanks
0

Commented:
SQL Server has nothing like this nor was it designed to have any such functionality.  The best you can do is find some commercial software that does this.
0

Author Commented:
I didn't think that SQL had any such functionality on-board.  But my data is stored in SQL, and any commercial software like you mentioned can probably operate against data stored in SQL Server.  I'm leaving the question open in case someone has experience with such a software package doing what I've described.  Please, please, no links from a Google search.  Still looking for someone who's taken on a task like this.

Thanks.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Already a member? Login.

All Courses

From novice to tech pro — start learning today.