Trend/Anomaly Analysis

Posted on 2011-03-18
Medium Priority
Last Modified: 2012-05-11
I am trying to put into words a concept that is not in my area of expertise.
What area of knowledge concerns itself with the following?
Specifics will be appreciated, and the best answer rewarded.

I am collecting daily data on many different metrics (numbers, measures).

For example, I am measuring how much the current value varies from the average, as an indicator of deviation from the norm, but this is not satisfactory.

I am envisioning a computer screen with a slider control that goes from showing all data to showing only highly anomalous data.

I want to be alerted to anomalous trends.
How should anomalous be defined?
What about filtering data-errors?
What differentiates a data anomaly from a data error?

Thanks for your input
Question by: AndyPandy
LVL 27

Expert Comment

ID: 35166219
Depends on the data of course.

Say you are looking at heights and weights of men and women taken from
questionnaires filled in by hand at a local college and scanned in with OCR.

Most of the data will fit bell curves. There will be average values with
some spread around them. You can decide that anomalous data is anything
more than three standard deviations from the mean.
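A minimal Python sketch of the three-standard-deviation rule above. The function name `three_sigma_outliers` and the parameter `k` are illustrative, not from the original post. One caveat: the extreme value itself inflates the standard deviation, and with n points no value can lie more than (n - 1)/sqrt(n) sample standard deviations from the mean, so with ten or fewer points this rule can never flag anything.

```python
from statistics import mean, stdev

def three_sigma_outliers(values, k=3.0):
    """Return (index, value) pairs lying more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs at least two values
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Made-up heights in inches: twenty ordinary values and one very tall person.
print(three_sigma_outliers([68, 69] * 10 + [90]))  # [(20, 90)]
```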

The anomalies you might see this way are basketball players, maybe some
double amputees, a few people with eating disorders.

You might also see some errors:  A person who entered their data in
centimeters and kilograms instead of inches and pounds would stand out.
These would be anomalies.  Real but unusual data.

People who feel like they're ten feet tall or carrying the weight of the world
on their shoulders, and say so, would show up. These would be bad data rather than anomalies.

And you might find some smudges or hanging chads in the scanning process.  These would be data errors.
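One way to act on this distinction, sketched in Python under stated assumptions: first mark values that cannot be real (outside plausibility limits you choose per metric) as errors, then apply the three-standard-deviation rule only to the values that survive, so impossible entries do not distort the mean and spread. The limits in `PLAUSIBLE_HEIGHT_IN` and the function name `classify` are hypothetical. The sketch puts "bad data" and scanning errors in a single `error` bucket, since both show up as values no real respondent could have.

```python
from statistics import mean, stdev

# Hypothetical plausibility limits for adult height, in inches.
PLAUSIBLE_HEIGHT_IN = (48.0, 96.0)

def classify(values, limits, k=3.0):
    """Label each value 'error', 'anomaly', or 'normal'."""
    low, high = limits
    # Step 1: anything outside the plausible range cannot be real data.
    labels = [None if low <= v <= high else "error" for v in values]
    # Step 2: mean and spread come from plausible values only
    # (stdev needs at least two of them).
    plausible = [v for v, label in zip(values, labels) if label is None]
    mu, sigma = mean(plausible), stdev(plausible)
    for i, v in enumerate(values):
        if labels[i] is None:
            # Real but unusual values are anomalies; the rest are normal.
            labels[i] = "anomaly" if sigma > 0 and abs(v - mu) > k * sigma else "normal"
    return labels

heights = [68, 69] * 10 + [90, 120, 0]  # made-up sample: 120 in = 10 ft, 0 = smudge
print(classify(heights, PLAUSIBLE_HEIGHT_IN)[-3:])  # ['anomaly', 'error', 'error']
```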

Author Comment

ID: 35166460
Hi D-glitch,

Thanks for the input.

To further clarify the context: I am collecting numbers from the web.
Article counts, web page counts, prices, indexes.
A variety of numbers, having nothing in common except that they generally move 'smoothly' up and down, as in a curve.
(If I find later there are correlations between seemingly dissimilar metrics, that would be very cool!)
Each metric, or source, is kept separate from the other metrics, i.e. page count from site A is identified and kept separate from price X on page Y.
Example normal trends:
Page count: 3,4,3,4,3,4,5,6,7,6,5,4,3,2,2,2,2,21,2,3
Price X: .22,.21,.23,21,.22,.22,.21,.22,.23,.24,.25,.26
Example anomalous trends:
Page count: 3,4,3,4,3,4,5,6,7,100223,6,5,4,3,2,2,2,2,21,2,3
Price X: .22,.21,.23,21,.22,.22,.00,.00,.00,.21,.22,.23,.24,.25,.26

I am seeking the language to describe variations and anomalies in these movements.
Is this the study of statistics? If so, is there a sub-specialty that deals with trends?
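Because these series drift over time, comparing each value with one overall average may miss local jumps or flag normal drift. A minimal Python sketch of one alternative, swapped in here rather than taken from the thread: compare each value with the median of the few values before it, using the median absolute deviation (MAD) of that window as the yardstick. Medians are barely moved by a single huge value such as 100223, so one bad point does not hide the next. The name `rolling_mad_flags` and the defaults `window=5`, `k=5.0`, `min_scale=1.0` are hypothetical; `min_scale` is in the metric's own units (say 1 page, or 0.01 for a price) and keeps flat runs like 2,2,2,2 from producing a zero scale.

```python
from statistics import median

def rolling_mad_flags(series, window=5, k=5.0, min_scale=1.0):
    """Return indices of points that sit far from the recent past.

    Each point is compared with the median of the `window` points before it.
    The spread is the median absolute deviation (MAD) of that window, floored
    at `min_scale` so a flat run does not make every small change look huge.
    """
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        center = median(past)
        mad = median([abs(x - center) for x in past])
        scale = max(mad, min_scale)
        if abs(series[i] - center) > k * scale:
            flagged.append(i)
    return flagged

page_count = [3, 4, 3, 4, 3, 4, 5, 6, 7, 100223, 6, 5, 4, 3, 2, 2, 2, 2, 21, 2, 3]
print(rolling_mad_flags(page_count))  # [9, 18]
```

On the page-count series containing 100223, this flags index 9 (the 100223) and also index 18, the jump from a run of 2s to 21, a value the post also shows in the series it lists as normal. Whether such a jump counts as anomalous is exactly the threshold choice a slider would control. The first `window` points are never checked, and because flagged values stay in the window, a run of bad values about half the window long can start to pass unflagged.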
LVL 27

Expert Comment

ID: 35167008
Probably the best description of what you are doing is data mining.
The Wikipedia article should help bring you up to speed on the jargon:


Certainly statistics is part of it. So is pattern recognition and many other

LVL 37

Expert Comment

ID: 35167634
'Outlier detection' is another buzz phrase you might want to research. The whole thing (data mining, pattern recognition, outlier detection, etc.) is a vast and extensively researched field, especially for internet traffic. Once you start searching, you'll find more info than you probably want!
If you really want to dive in, search Google Scholar for some of these terms and look for survey papers (papers that summarize a lot of ideas).
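As one concrete, widely used outlier-detection rule (named here as an example; the thread does not prescribe it), Tukey's fences flag values more than 1.5 interquartile ranges below the first quartile or above the third. Quartiles, unlike the mean and standard deviation, are not dragged around by the extreme values themselves. A minimal Python sketch using `statistics.quantiles` (Python 3.8+); the function name `iqr_outliers` is illustrative.

```python
from statistics import quantiles

def iqr_outliers(values, fence=1.5):
    """Tukey's fences: flag values beyond fence * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method; needs >= 2 values
    iqr = q3 - q1
    low, high = q1 - fence * iqr, q3 + fence * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]

series = [3, 4, 3, 4, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 2, 2, 2, 21, 2, 3]
print(iqr_outliers(series))  # [(17, 21)]
```

Run on the page-count series from the question that has no 100223, it flags only the 21; whether that is a real spike or a data error is a judgment the numbers alone cannot make.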

Author Comment

ID: 35168378
Ok...thinkin.....give me a couple days.. ;)

Accepted Solution

InfoStranger earned 2000 total points
ID: 35204693
Yes, statistics is the main way to check your data. Data mining is typically used after the data has been collected, to analyze the information. It sounds like you are not analyzing information so much as checking data integrity. Your assumption is that your data is normally distributed, but the data you are presenting are not.

If you assume normality, you should consider using confidence intervals. As you receive new data, check whether each new value falls within the confidence interval.

1) If the data is within the confidence interval, include it in the next average.
2) If the data is outside the confidence interval, reject it (and have the system alert you).

In the simplest form, you can use the average of your "good" data as your mean, and use the same data for your standard deviation.

Assuming a 95% confidence interval, with n = number of data points:
mean ± 1.96 * (standard deviation / sqrt(n))
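A small Python sketch of the interval formula, assuming normally distributed data; it returns both bounds. It also computes a second width for comparison, which is an addition rather than part of the answer: s / sqrt(n) measures uncertainty in the mean itself and shrinks toward zero as n grows, so checking single new values against it would eventually reject almost everything. For screening one new value, the usual choice is a prediction interval, mean ± z * s * sqrt(1 + 1/n). Using 1.96 rather than a Student's t quantile is an approximation that matters when n is small. The function name `intervals` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def intervals(good, z=1.96):
    """Return ((ci_low, ci_high), (pi_low, pi_high)) from accepted data.

    ci: mean +/- z * s / sqrt(n), the confidence interval for the mean.
    pi: mean +/- z * s * sqrt(1 + 1/n), an approximate prediction interval
        for one new value. Needs at least two accepted values.
    """
    n = len(good)
    m, s = mean(good), stdev(good)
    ci_half = z * s / sqrt(n)
    pi_half = z * s * sqrt(1 + 1 / n)
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)

ci, pi = intervals([1, 2, 3, 4, 5])
print(ci)  # roughly (1.614, 4.386)
print(pi)  # roughly (-0.395, 6.395)
```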

The interval should change continuously: recompute it as good data arrives, so that it smoothly follows gradual increases or shifts in your data.
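The two numbered steps and the continuous update can be combined into a streaming check. A minimal Python sketch, assuming normally distributed data: it keeps a running mean and variance of accepted values with Welford's online algorithm (a standard technique, chosen here for illustration), accepts a new value if it lies within z * s * sqrt(1 + 1/n) of the running mean (a prediction-interval width, used instead of s / sqrt(n), which narrows toward zero as data accumulates), and otherwise rejects it so you can be alerted. The class name `RunningScreen` and the `warmup` count of unconditionally accepted seed values are hypothetical.

```python
from math import sqrt

class RunningScreen:
    """Accept or reject each new value against statistics of accepted values."""

    def __init__(self, z=1.96, warmup=5):
        self.z = z
        self.warmup = max(warmup, 2)  # need two values before a spread exists
        self.n = 0          # number of accepted values
        self.mean = 0.0     # running mean of accepted values
        self.m2 = 0.0       # running sum of squared deviations (Welford)

    def _accept(self, x):
        # Welford's update: fold x into the mean and sum of squares.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def check(self, x):
        """Return True and update the statistics if x is accepted, else False."""
        if self.n < self.warmup:
            self._accept(x)  # seed values are trusted unconditionally
            return True
        s = sqrt(self.m2 / (self.n - 1))  # sample standard deviation
        half_width = self.z * s * sqrt(1 + 1 / self.n)
        if abs(x - self.mean) <= half_width:
            self._accept(x)  # step 1: include it in the next average
            return True
        return False  # step 2: reject it and raise an alert

screen = RunningScreen()
print([screen.check(p) for p in [0.22, 0.21, 0.23, 0.22, 0.22, 0.21, 0.00]])
# [True, True, True, True, True, True, False]
```

Because rejected values never update the statistics, a genuine permanent level shift, or a warm-up run with zero spread such as 2, 2, 2, 2, 2, would be rejected indefinitely; in practice you would review alerts and reseed. At z = 1.96 roughly 1 in 20 normal values is expected to fall outside, so a larger z gives fewer false alarms.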

Author Closing Comment

ID: 35231131
Thanks everyone!
