AndyPandy
asked on
Trend/Anomaly Analysis
Hi,
I am trying to put into words a concept that is not in my area of expertise.
What area of knowledge concerns itself with the following?
Specifics will be appreciated and the best, rewarded.
I am collecting daily data on many different metrics (numbers, measures).
For example:
I currently measure how far each new value is from the average, as an indicator of deviation from the norm, but this is not satisfactory.
I am envisioning a computer screen with a slider control that ranges from showing
all data at one end to showing only highly anomalous data at the other.
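A minimal sketch of the slider idea, assuming the "distance from the average" score described above is measured in standard deviations. The function names deviation_scores and filter_by_slider are hypothetical, and the sample values are the start of the page-count example given later in the thread.

```python
from statistics import mean, pstdev


def deviation_scores(values):
    """Score each value by how many standard deviations it lies from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A perfectly flat series has nothing to score.
        return [0.0] * len(values)
    return [abs(v - mu) / sigma for v in values]


def filter_by_slider(values, threshold):
    """Return (day_index, value) pairs whose score is at least `threshold`.

    A threshold of 0 shows all data; raising it leaves only the most
    anomalous points, like the slider described above.
    """
    scores = deviation_scores(values)
    return [(i, v) for i, (v, s) in enumerate(zip(values, scores)) if s >= threshold]


daily = [3, 4, 3, 4, 3, 4, 5, 6, 7, 6, 5, 4, 3]  # sample daily page counts
for slider in (0, 1.0, 1.5):
    print(slider, filter_by_slider(daily, slider))
```

The slider position is just the score cutoff. The weakness of this score is that the mean and standard deviation are computed from the same data, so a few extreme values stretch the yardstick that everything else is measured against.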
I want to be alerted to anomalous trends.
How should "anomalous" be defined?
What about filtering out data errors?
What differentiates a data anomaly from a data error?
Thanks for your input
ASKER
Hi D-glitch,
Thanks for the input.
To further clarify the context: I am collecting numbers from the web,
such as article counts, web page counts, prices, and indexes.
They are a variety of numbers with nothing in common, except that they generally move 'smoothly' up and down, as in a curve.
(If I later find correlations between seemingly dissimilar metrics, that would be very cool!)
Each metric or source is kept distinct from the others, i.e. the page count from site A is identified and kept separate from price X on page Y.
Example normal trends:
Page count: 3,4,3,4,3,4,5,6,7,6,5,4,3,2,2,2,2,21,2,3
Price X: .22,.21,.23,21,.22,.22,.21,.22,.23,.24,.25,.26
Anomalies:
Page count: 3,4,3,4,3,4,5,6,7,100223,6,5,4,3,2,2,2,2,21,2,3
Price X: .22,.21,.23,21,.22,.22,.00,.00,.00,.21,.22,.23,.24,.25,.26
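One way to make "distance from the average" less fragile is the modified z-score (often attributed to Iglewicz and Hoaglin), which replaces the mean with the median and the standard deviation with the median absolute deviation (MAD), so a single value like 100223 cannot inflate the yardstick. This is a sketch under assumptions: the 3.5 cutoff is the commonly cited value, each metric is scored on its own series, and the mean-absolute-deviation fallback for MAD = 0 is one common variant, not the only choice. Function names are hypothetical.

```python
from statistics import mean, median


def modified_z_scores(values):
    """Robust score per value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad > 0:
        return [0.6745 * (v - med) / mad for v in values]
    # MAD is zero when at least half the values equal the median;
    # fall back to the mean absolute deviation (a common variant).
    mean_ad = mean(abs_dev)
    if mean_ad == 0:
        return [0.0] * len(values)  # perfectly flat series
    return [(v - med) / (1.253314 * mean_ad) for v in values]


def flag_anomalies(values, cutoff=3.5):
    """Indices whose modified z-score magnitude exceeds the cutoff."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > cutoff]


# The anomaly examples from the question, one series per metric.
page_count = [3, 4, 3, 4, 3, 4, 5, 6, 7, 100223, 6, 5, 4, 3, 2, 2, 2, 2, 21, 2, 3]
price_x = [.22, .21, .23, 21, .22, .22, .00, .00, .00, .21, .22, .23, .24, .25, .26]

print(flag_anomalies(page_count))
print(flag_anomalies(price_x))
```

With the plain mean and standard deviation, the single 21 in the price series inflates the spread so much that the three .00 readings sit only about 0.3 standard deviations from the mean; the median-based score flags them. It also flags the 21 in both series, which the question lists under normal trends as well, so in practice the cutoff may need tuning per metric.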
I am seeking the language to describe variations and anomalies in these movements.
Is this the study of statistics? If so, is there a sub-specialty that deals with trends?
Probably the best description of what you are doing is data mining.
The Wikipedia article should help bring you up to speed on the jargon:
http://en.wikipedia.org/wiki/Data_mining
Certainly statistics is part of it. So are pattern recognition and many other
specialties.
'Outlier detection' is another buzz phrase you might want to research. The whole thing (data mining, pattern recognition, outlier detection, etc.) is a vast and extensively researched field, especially for internet traffic. Once you start searching, you'll find more info than you probably want!
If you really want to dive in, search Google Scholar for some of these terms and look for survey papers (papers that summarize a lot of ideas).
ASKER
Ok...thinkin.....give me a couple days.. ;)
ASKER CERTIFIED SOLUTION
ASKER
Thanks everyone!
Say you are looking at heights and weights of men and women, taken from
questionnaires filled in by hand at a local college and scanned in with OCR.
Most of the data will fit onto bell curves: there will be average values with
some spread around them. You can decide that anomalous data is anything more than
three standard deviations from the mean.
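A minimal sketch of the three-standard-deviation rule. The heights are synthetic, generated for illustration only and not real survey data; the function name and the mean of 68 inches and SD of 3 inches are assumptions.

```python
import random
from statistics import mean, stdev


def three_sigma_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) > k * sigma]


# Synthetic data for illustration only (not real survey results):
# 500 heights in inches drawn from a normal distribution with mean 68
# and standard deviation 3, plus one 84-inch (7-foot) basketball player.
rng = random.Random(42)
heights = [rng.gauss(68, 3) for _ in range(500)] + [84.0]
print(three_sigma_outliers(heights))
```

One caveat for small samples: with ten or fewer values, no single value can ever lie more than three sample standard deviations from the mean (Samuelson's inequality bounds it at (n - 1)/sqrt(n) standard deviations), so the rule only makes sense once enough data has accumulated.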
Among the points this flags, you might see basketball players, maybe some
double amputees, a few people with eating disorders. These would be anomalies:
real but unusual data.
You might also see some errors: a person who entered their data in
centimeters and kilograms instead of inches and pounds would stand out.
People who feel like they're ten feet tall or carrying the weight of the world
on their shoulders, and say so, would show up too. These would be bad data rather than anomalies.
And you might find some smudges or hanging chads in the scanning process. These would be data errors.
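The distinction between errors, bad data, and anomalies can be sketched as checks applied in order: can the entry be read at all, is it physically plausible, and only then is it statistically unusual. This is a sketch under stated assumptions: classify_height is a hypothetical function, the 48 to 96 inch plausibility limits are made up for illustration, and the mean and SD passed in would come from the already-cleaned data.

```python
def classify_height(raw, mean_in, sd_in, low=48.0, high=96.0, k=3.0):
    """Sort one scanned height entry into data error / bad data / anomaly / normal.

    `low` and `high` are hypothetical plausibility limits in inches;
    `mean_in` and `sd_in` would come from the cleaned data set.
    """
    try:
        inches = float(raw)
    except ValueError:
        # OCR smudge or stray mark: the entry cannot even be read.
        return "data error: unreadable"
    if not low <= inches <= high:
        if low <= inches / 2.54 <= high:
            # Plausible once converted, so probably entered in centimeters.
            return "data error: probably centimeters"
        # Not a believable human height in either unit we check.
        return "bad data: implausible value"
    if abs(inches - mean_in) > k * sd_in:
        return "anomaly: real but unusual"
    return "normal"


for entry in ["70", "84", "183", "120", "7O"]:
    print(entry, "->", classify_height(entry, mean_in=68.0, sd_in=3.0))
```

Running the readability and plausibility checks first keeps entries like the 183 (centimeters) or the 120 (ten feet) from inflating the mean and standard deviation used in the anomaly test.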