Python datetime objects are naïve by default, in that they do not include time zone (or UTC offset) information. For example, one might be surprised to find that (datetime.now() - datetime.utcnow()).total_seconds() is basically the local time offset (28800 in my case, for UTC+08:00). I personally kind of expected a value near zero. That said, datetime is able to handle time zones, but the definitions of time zones are not included in the Python standard library. A third-party library is necessary for handling time zones. In our project, a developer introduced pytz in the beginning. It all looked well, until I found the following:
    >>> from datetime import datetime
    >>> from pytz import timezone
    >>> timezone('Asia/Shanghai')
    <DstTzInfo 'Asia/Shanghai' LMT+8:06:00 STD>
    >>> (datetime(2017, 6, 1, tzinfo=timezone('Asia/Shanghai'))
    ...  - datetime(2017, 6, 1, tzinfo=timezone('UTC'))
    ... ).total_seconds()
    -29160.0
Sh*t! Was pytz a joke? The time zone of Shanghai (or China) should be UTC+08:00, and I did not care a bit about its local mean time (I was, of course, expecting -28800 on the last line). What was the author thinking? Besides, it did not provide a local time zone function, and we had to hardcode our time zone to 'Asia/Shanghai', which was ugly. Disappointed, I searched for an alternative and found dateutil.tz. From then on, I routinely used code like the following:
    from datetime import datetime
    from dateutil.tz import tzlocal, tzutc
    …
    datetime.now(tzlocal())  # for local time
    datetime.now(tzutc())    # for UTC time
When answering a Stack Overflow question, I realized I had misunderstood pytz. I still thought it had some bad design decisions; however, it would have been able to achieve everything I needed, if I had read its manual carefully (I cannot help remembering the famous acronym ‘RTFM’). The manual explicitly states that passing a pytz time zone to the datetime constructor (as I did above) ‘“does not work” with pytz for many timezones’. One has to use the pytz localize method or the standard astimezone method of datetime.
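The difference between the two approaches can be sketched as follows. This is a minimal illustration, assuming pytz is installed; the variable names are mine and only chosen for the example.

```python
from datetime import datetime, timedelta

import pytz

shanghai = pytz.timezone('Asia/Shanghai')

# Not recommended with pytz: the constructor attaches the zone object as-is,
# without selecting the offset that applies on this date.
wrong = datetime(2017, 6, 1, tzinfo=shanghai)

# Recommended: localize() picks the offset valid at this wall-clock time.
right = shanghai.localize(datetime(2017, 6, 1))

# Also fine: start from an aware UTC datetime and convert with astimezone().
converted = pytz.utc.localize(datetime(2017, 5, 31, 16, 0)).astimezone(shanghai)

print(wrong.utcoffset())    # the local-mean-time offset seen in the session above
print(right.utcoffset())    # the expected UTC+08:00 offset
print(converted == right)   # both describe the same instant
```

Both localize and astimezone end up with the UTC+08:00 offset for this date, which is what I had expected from the constructor in the first place.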
As tzlocal and tzutc from dateutil.tz fulfilled all my needs and were easy to use, I continued to use them. The fact that I got a few downvotes on Stack Overflow certainly did not make me like pytz any better.
When introducing apscheduler to our project, we noticed that it required the time zone to be provided by pytz, which ruled out the use of dateutil.tz. I wondered what was special about it. I also became aware of a Python package called tzlocal, which was able to provide a pytz time zone conforming to the local system settings. More searching and reading revealed facts that I had missed so far:
- The Python datetime object does not store or handle daylight-saving status. Adding a timedelta to it does not alter its time zone information, and can result in an invalid local time (say, adding one day to the last day of daylight-saving time does not result in a datetime in standard time).
- The time zone provided by dateutil.tz does not handle all corner cases. E.g. it does not know that Russia observed all-year daylight-saving time from 2012 to 2014, and it does not know that China observed daylight-saving time from 1986 to 1991.
- The pytz localize and normalize methods can handle all these complexities, and this is partly the reason why pytz requires people to use its localize method instead of passing the time zone to datetime.
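The arithmetic pitfall from the first point, and the normalize fix from the last one, can be sketched like this. It is only an illustration, assuming pytz is installed; the 'America/New_York' zone and the 2017 date are my own choice, picked because daylight-saving time ended there on 5 November 2017.

```python
from datetime import datetime, timedelta

import pytz

new_york = pytz.timezone('America/New_York')

# Noon on the last full day of daylight-saving time in 2017 (UTC-04:00).
before = new_york.localize(datetime(2017, 11, 4, 12, 0))

# Plain arithmetic keeps the old offset, so the result still claims
# UTC-04:00 on a day when the zone is already back on standard time.
shifted = before + timedelta(days=1)

# normalize() re-derives the offset for that instant: 11:00 at UTC-05:00.
fixed = new_york.normalize(shifted)

print(shifted.utcoffset(), fixed.utcoffset())
print(fixed.hour)
```

Note that normalize keeps the instant and corrects the wall-clock time. If the same wall-clock time on the next day is wanted instead, one can do the arithmetic on the naïve datetime and call localize afterwards.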
So pytz can actually do more, and correctly. I can do things like finding out in which years China observed daylight-saving time:
    from datetime import datetime, timedelta
    from pytz import timezone

    china = timezone('Asia/Shanghai')
    utc = timezone('UTC')
    expect_diff = timedelta(hours=8)
    for year in range(1980, 2000):
        dt = datetime(year, 6, 1)
        if utc.localize(dt) - china.localize(dt) != expect_diff:
            print(year)
It is now clear to me that the pytz-style time zone is necessary when apscheduler handles a past or future local time.
A few benchmarks of the related functions in IPython (not that they are very important):
    from datetime import datetime
    import dateutil.tz
    import pytz
    import tzlocal

    dateutil_utc = dateutil.tz.tzutc()
    dateutil_local = dateutil.tz.tzlocal()
    pytz_utc = pytz.utc
    pytz_local = tzlocal.get_localzone()

    %timeit datetime.utcnow()
    310 ns ± 0.405 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    %timeit datetime.now()
    745 ns ± 1.65 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    %timeit datetime.now(dateutil_utc)
    924 ns ± 0.907 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    %timeit datetime.now(pytz_utc)
    2.28 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    %timeit datetime.now(dateutil_local)
    17.4 µs ± 29.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    %timeit datetime.now(pytz_local)
    5.54 µs ± 11.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
My final recommendations:
- One should consider using naïve UTC datetimes everywhere, as they are easy and fast to work with.
- The next best is using offset-aware UTC. Both dateutil.tz and pytz can be used in this case without any problems.
- In all other cases, pytz (as well as tzlocal) is preferred, but one should beware of the peculiar behaviour of pytz time zones.
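The first two recommendations could look roughly like the sketch below. To keep it free of dependencies I use the standard library's datetime.timezone.utc here rather than the UTC objects from dateutil.tz or pytz; that substitution, the two helper names, and the fixed UTC+08:00 offset are assumptions made for the illustration.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(dt):
    """Convert an offset-aware datetime to a naive datetime in UTC."""
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def to_aware_utc(naive_utc):
    """Label a naive datetime, already known to be in UTC, as offset-aware."""
    return naive_utc.replace(tzinfo=timezone.utc)


# 08:00 at a fixed UTC+08:00 offset is midnight UTC.
local = datetime(2017, 6, 1, 8, 0, tzinfo=timezone(timedelta(hours=8)))
stored = to_naive_utc(local)      # what one would keep internally
restored = to_aware_utc(stored)   # offset-aware UTC, same instant as `local`

print(stored)
print(restored == local)
```

The conversion to a real local time zone then only has to happen at the edges of the program (input and display), which is where pytz and tzlocal come in.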