Over the course of my career, one of the hardest problems for me to wrap my head around has been timezones and dealing with them. Over the last month, I had to solve a particularly nasty problem where timestamps, or what I thought to be timestamps, were being stored in a system.

The particular issue was rooted in how a particular time-sensitive records were being stored. The source data provided a time that was nothing more than a date-time(i.e. ‘2014-05-09 08:00:00’). What’s missing from this format? The UTC offset value. Without that piece of information, we can not know which ‘2014-05-09 08:00:00’ that particular record is referring to. There are, after all, 24 ‘todays’ occurring over any given 24 hour period.

In order to determine the precise time, to which a time refers, on earth, you need to know two things, the relative ‘when’ the date is captured, and ‘where’ the date is captured. Anything less and you can not know for certain the point in time which that date refers.

From the outside in, the rings represent different days: UTC, EST, PST

time-slice

Looking at 8 on the middle ring, you can see that corresponds with 5 PST (the middle ring) and 12 UTC (the outer ring)

In my case the result was that I had nothing more than a bunch of date-times, NOT timestamps. Refering to the image above, the data I had to work with could not tell me which ring it belonged to. Reporting on this data is now far more limited than it should be. For example, we can not query for all records that happened in a particular 24-hour period. We can only ask questions like: What are the values for records the happened at an 8:00. Looking at this, 2014-05-09 08:00:00, we don’t know which of the twenty-four “8:00s” occurring across the globe THIS particular record refers to.

As a side note, storing the date as only a date-time simplifies asking the question: What records occured at an 8:00? If we stored the timestamp instead, date math would need to be involved to compute 8:00 at each timezone offset. However, the downside of not knowing what particular point in absolute time a date refers, leaves some questions unanswerable.

The fix, of course, is to additionally supply the UTC timestamp when capturing the record. While it is impossible to know precisely what time records occurred before adding the UTC field, we can be certain going forward. Luckily, for this project, the data is somewhat transient and eventually we can migrate the reports and logic to deal with the new UTC value.

Why can’t I simply apply a timezone offset to the existing timestamps and call it a day? First, I would need to know where each date was recorded and when (in UTC) it was recorded. Without that information, I’d be unable to adjust for timezone and daylight savings time. Even with this information, daylight savings time does not follow a predictable pattern.

Moral of the story: Date-times are not timestamps, and always store your times as UTC or Unix time (aka POSIX or Epoch time).

See Also: Time dos and donts