Algorithm for dealing with unreliable event timestamps

Posted on 2013-02-05
Last Modified: 2013-02-11
I need to develop an algorithm for dealing with a timestamped event stream where the timestamps are created by the event source. The order of the received events is guaranteed to be correct (due to a file append operation), but due to potential problems with the clock at the event source, the timestamps may not be accurate and may not reflect the real order of the events.

Here's an example of an ordered event stream where one timestamp is in the wrong order.

0001
0005
0011
0008 (this timestamp is wrong)

I need to process these events in the order that they arrive but somehow deal with the incorrect timestamps (for example the timestamps may be incorrect due to issues with the clock that is used to create the timestamp not being set correctly, if only temporarily).

One approach is to reorder the events so that the timestamps define their ordering. However, this is wrong because I know that the order in which the events are received is the correct order.

I can ignore the incorrect timestamp and use an effective timestamp that is an infinitesimally small interval after the preceding timestamp so that the effective timestamps are in the same order as the received event stream. However, this is problematic if the erroneous timestamp is a long time in the future, as all following events will have an effective timestamp that is immediately after this one until the timestamps catch up.
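The effective-timestamp idea above can be sketched as follows. This is a minimal illustration in Python (the function name, the epsilon parameter, and the list-of-numbers representation are my own assumptions, not part of the original system), including the failure mode described: one large forward jump pins every following event to it.

```python
def effective_timestamps(raw, epsilon=1e-6):
    """Clamp reported timestamps so they are strictly increasing.

    A timestamp at or before its predecessor is replaced with the
    previous effective timestamp plus a tiny epsilon, preserving the
    received order. Note the downside discussed above: after a large
    forward jump, all subsequent events sit just after the jump until
    the reported timestamps catch up.
    """
    effective = []
    prev = None
    for ts in raw:
        if prev is not None and ts <= prev:
            ts = prev + epsilon
        effective.append(ts)
        prev = ts
    return effective

# With the example stream, the out-of-order 0008 becomes 11 + epsilon.
print(effective_timestamps([1, 5, 11, 8]))
```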

What would you advise?
Question by:Seiker
LVL 57

Expert Comment

ID: 38854842
Knowing nothing about your environment, my first question is:

How do you know the timestamp is wrong?  Could be the timestamp is correct, but that packet was delayed somewhere within the network and thus was just received later than the others.

However, if you know for a fact that the records were received in the correct order and immediately after being sent with no delay, then ignore the timestamps and process the records in the order they were received.

Author Comment

ID: 38854869
I think I need to add more information. The timestamps are used for determining duration between events and are usually correct. Unfortunately, sometimes due to incorrectly set clocks the timestamps may jump back or forward for a while.

Our policy so far has been to view the timestamps as correct and to redo any calculations that depend on them when one or more are received 'late'. However, this policy does not produce correct duration information (at least not for the event with the first incorrect timestamp) and has the additional problem of incorrectly reordering the events.

What I am after is a least-harmful policy to employ when the timestamp information is (perhaps temporarily) incorrect.
LVL 56

Expert Comment

by:Julian Hansen
ID: 38854882
If I understand you correctly, you can set the bad timestamp to something else, but you first need to know it is bad.

If we look at your example:

0001
0005
0011
0008

The question is: is 0011 the bad timestamp or 0008?

I guess you need more of a sequence to determine that. We also need to know how often this occurs: can you have two, three, or four bad timestamps in a row?

Is this on the right track?
LVL 57

Expert Comment

ID: 38854915
--> "The timestamps are used for determining duration between events and are usually correct."

I am assuming that these events occur on different devices/hosts/computers. Are you trying to figure out the duration between events across the different devices or on a single device? That is:

0001 device 1
0005 device 2
0011 device 1
0008 device 2

Do you want to know the difference between "0001" and "0005" (across devices) or "0001" and "0011", within a single device?

If you are trying to go across devices, then there's not a whole lot you can do. As julianH pointed out, you have no clue whether "0011" or "0008" is the wrong timestamp.
LVL 37

Expert Comment

ID: 38855227
So you process this after the fact when you have all the data, correct? This means you could look ahead in the data and make logical guesses about which time was wrong. However, what if the times should have been
1   5   7   13   15   20
but got reported as
1   5   7   17   19   20
because the clock got off by 4 for two cycles? What do you do then? You will just have bad data.

Can you use something to check the clock? Perhaps query the processor's clock? It would be much better to identify the bad data points and throw them out than to guess at corrections and try to fix the values.
LVL 84

Expert Comment

ID: 38855645
Without quantifying the likelihood of an incorrect timestamp and the expected sizes of the errors, it is difficult to justify one algorithm over any other, since any given set of assumptions about the nature of the inaccuracies might produce a different algorithm. But for some assumptions that seem not unreasonable to me, I might use an algorithm like:
Find the least-squares linear fit to all the timestamps.
Throw out the timestamp that deviates the most from the trend line.
Find a new least-squares fit to the remaining timestamps.
Repeat until the errors seem reasonable.
Assume the rejected timestamps fall along the resulting trend line.
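The iterative fit-and-reject procedure above can be sketched like this. It is only an illustration (the function name, the rejection threshold, and the use of NumPy's polyfit are my choices, not from the thread); event indices stand in for arrival order, and rejected timestamps can afterwards be replaced by their trend-line value.

```python
import numpy as np

def fit_and_reject(timestamps, threshold=2.0, max_iterations=10):
    """Iteratively fit a line to (index, timestamp) pairs and discard
    the worst outlier until all residuals fall within `threshold`
    standard deviations of the trend line.

    Returns the fitted (slope, intercept) and the indices kept.
    """
    idx = np.arange(len(timestamps), dtype=float)
    ts = np.asarray(timestamps, dtype=float)
    keep = np.ones(len(ts), dtype=bool)

    for _ in range(max_iterations):
        slope, intercept = np.polyfit(idx[keep], ts[keep], 1)
        residuals = ts[keep] - (slope * idx[keep] + intercept)
        sigma = residuals.std()
        # Stop once the fit is essentially exact or all points fit.
        if sigma < 1e-9 or np.abs(residuals).max() <= threshold * sigma:
            break
        if keep.sum() <= 2:  # not enough points left to refit
            break
        # Drop the single worst-fitting timestamp and refit.
        worst = np.argmax(np.abs(residuals))
        keep[np.flatnonzero(keep)[worst]] = False

    return slope, intercept, np.flatnonzero(keep)
```

For example, with the stream [0, 10, 20, 100, 40, 50] the fourth timestamp is rejected and the remaining points fit a line of slope 10 exactly, which would then supply the replacement value for the outlier.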

Author Comment

ID: 38855879
We believe the erroneous timestamps are due to the computers generating the events not having their clocks properly synchronised. Usually it is fine, but some customers seem unable to get this right at least in the beginning.

The problem we have is that we analyse the events on the fly and create pre-calculated data based on the timestamps. If we receive events with out-of-sequence timestamps then, taking the view that the timestamps are correct, we end up having to invalidate previously pre-calculated data back to before the out-of-sequence timestamp. This is complex, time consuming, and still doesn't produce a correct result.

I need a damage-limitation policy that also minimises complexity and computation time.

Yes you are quite right, given the sequence shown, we cannot say which timestamps are correct or even if any are correct.
LVL 57

Expert Comment

ID: 38855904
So the events are generated by customers.  

Can a customer have multiple computers generating events, or will a customer have a single computer sending you events?

Author Comment

ID: 38855937
Single computer for each event stream.
LVL 57

Expert Comment

ID: 38856343
So as long as you know which computer generated each event stream, you really don't care if the timestamp is correct or not, as long as its error is constant.

If the time changes, you would be able to detect if it moved "backwards", but you would not be able to detect if it moved forwards.

Meaning in my prior post:

You want to know the time difference on device #1 between "0001" and "0011" and then for device #2 between "0005" and "0008".  Right?
LVL 84

Expert Comment

ID: 38857718
Can you identify which computer generated a given timestamp?
Does it look like improperly synchronized clocks are usually set to the wrong timezone, with an error close to a multiple of an hour, or does a typical unsynchronized clock tend to be off by a few seconds? What is a typical interval between timestamps, and how much variation would you usually expect?

Accepted Solution

Seiker earned 0 total points
ID: 38858464
Each event stream is for a single device.

Not usually the wrong timezone, more out by a few seconds to several minutes. The timestamps should be in UTC.

I'm coming to the idea that there is no really acceptable solution to this problem. Basically the data is bad whichever way you look at it. The system relies on properly synchronised clocks.

We could stipulate validation rules along the lines of:
A timestamp must not be later than the current time on the event-stream consumer.
The timestamp of every event must not be earlier than the timestamp of the preceding event.

If either of these rules is violated then the event is put in an invalid message channel and not processed. We would need some way for the invalid events to be corrected and reprocessed if necessary.
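The two validation rules could be sketched like this. This is only an illustration (the class name, the injectable clock, and the list-based channels are my inventions; a real system would use actual message channels), not the actual implementation.

```python
import time
from dataclasses import dataclass, field

@dataclass
class StreamValidator:
    """Routes events that violate either rule to an invalid channel.

    Rule 1: a timestamp must not be later than the consumer's clock.
    Rule 2: a timestamp must not precede that of the last valid event.
    """
    now: callable = time.time          # injectable clock, for testing
    last_valid: float = None
    valid: list = field(default_factory=list)
    invalid: list = field(default_factory=list)

    def accept(self, ts):
        if ts > self.now():
            self.invalid.append(("future", ts))
        elif self.last_valid is not None and ts < self.last_valid:
            self.invalid.append(("out_of_order", ts))
        else:
            self.last_valid = ts
            self.valid.append(ts)
```

One consequence of rule 2 worth noting: after a large forward jump that is accepted, every genuinely correct event behind it is rejected as out of order until the stream catches up, so the invalid channel also needs to absorb that case.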
LVL 84

Expert Comment

ID: 38858569
If the problem is unsynchronized clocks on different computers, wouldn't a stream from a single device have self-consistent timestamps? If you need to compare timestamps from different streams, you should be able to constrain the difference between the clocks to a range that makes the relationships between the streams consistent.

If you see a timestamp for an event earlier than the timestamp of a preceding event, you don't know whether the timestamp of the preceding event, of the following event, or both are invalid, but you can split the difference and adjust all timestamps from the same stream by the same amount. This may require further adjustments if those shifts cause other events to become out of order.
Again, if you have a known probability distribution for the chances of clock errors of different amounts, you can simultaneously solve for the set of corrective shifts which gives a maximum-likelihood estimator for the observed data.
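The split-the-difference idea might look like the following single-pass sketch (function name and midpoint rule are my reading of the suggestion, not a definitive implementation; as noted, a shift can create a new inversion, which would call for another pass).

```python
def split_the_difference(timestamps):
    """When an event's timestamp precedes its predecessor's, replace
    both with their midpoint, trusting neither side over the other.

    Single pass; repeat if an adjustment introduces a new inversion.
    """
    ts = list(timestamps)
    for i in range(1, len(ts)):
        if ts[i] < ts[i - 1]:
            mid = (ts[i] + ts[i - 1]) / 2
            ts[i - 1] = ts[i] = mid
    return ts

# The example stream [1, 5, 11, 8] becomes [1, 5, 9.5, 9.5].
```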
LVL 32

Assisted Solution

phoffric earned 250 total points
ID: 38858815
>> I'm coming to the idea that there is no really acceptable solution to this problem. Basically the data is bad whichever way you look at it. The system relies on properly synchronised clocks.

I agree with your assessment. If there is nothing in the event data (excluding the time) that can give you a clue as to which event occurs first, and if knowing which event occurs first is critical to your application, then you need to develop an automatic process (leaving humans out of the loop) where all the devices have their clocks synchronized to UTC. Your program can update the clocks periodically to keep the error within reasonable tolerances. Here's a starter link that may help.
LVL 37

Assisted Solution

TommySzalapski earned 250 total points
ID: 38860187
>> If either of these rules are violated then the event is put in an invalid message channel and not processed.
I agree with both you and phoffric. If you can't force them to synchronize, then you have to detect and remove as much of the bad data as possible. Trying to guess at what the bad data was will only cause more problems.

Author Closing Comment

ID: 38875435
Thanks for the input everyone.
