How to determine the appropriate strftime format from a date string?

The dateutil parser does a great job of correctly guessing the date and time from a wide variety of sources.

We are processing files in which each file uses only one date/time format, but the format varies between files. Profiling shows that a lot of time is spent in dateutil.parser.parse. Since the format only needs to be determined once per file, implementing something that doesn't guess the format each time could speed things up.

I don't know the formats in advance, so I still need to infer the format. Something like:

    from MysteryPackage import date_string_to_format_string
    from datetime import datetime

    # e.g. mystring = '1 Jan 2016'
    myformat = None
    ...
    # somewhere in a loop reading from a file or connection:
    if myformat is None:
        myformat = date_string_to_format_string(mystring)
    # do the usual checks to see if that worked, then:
    mydatetime = datetime.strptime(mystring, myformat)

Is there such a function?

3 solutions collected from the web for “How to determine the appropriate strftime format from a date string?”

I don't have a ready-made solution, but this is a very tricky problem, and since too many brain-hours have already been spent on dateutil, instead of trying to replace it I'll propose an approach that incorporates it:

  1. Read the first N records and parse each date using dateutil
  2. For each date part, note where in the string the value shows up
  3. If all (or >90%) of the date-part positions match (e.g. "YYYY always comes after DD, separated by a comma and a space"), turn that information into a strptime format string
  4. Switch to using datetime.strptime() with a relatively good level of confidence that it will work with the rest of the file

Since you stated that "each file uses only one date/time format", this approach should work (assuming you have different dates in each file, so that mm/dd ambiguity can be resolved by comparing multiple date values).
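The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: infer_format, agree_on_format and DIRECTIVES are hypothetical names, the directive list is a small subset, fields are assumed to be zero-padded, and stand_in_parse stands in for dateutil.parser.parse so that the snippet runs with the standard library alone.

```python
from collections import Counter
from datetime import datetime

# Directives to look for, most distinctive first (an illustrative subset).
DIRECTIVES = ["%Y", "%H:%M:%S", "%H:%M", "%B", "%b", "%A", "%a", "%m", "%d"]

def infer_format(text, parsed):
    """Replace each rendered date part found in `text` with its directive."""
    fmt = text
    for directive in DIRECTIVES:
        rendered = parsed.strftime(directive)
        # Only the first occurrence is replaced; equal day and month values
        # stay ambiguous, which is why several samples are compared below.
        fmt = fmt.replace(rendered, directive, 1)
    return fmt

def agree_on_format(samples, parse, min_share=0.9):
    """Infer a format per sample; return the winner if enough samples agree."""
    if not samples:
        return None
    votes = Counter(infer_format(s, parse(s)) for s in samples)
    fmt, count = votes.most_common(1)[0]
    return fmt if count / len(samples) >= min_share else None

def stand_in_parse(s):
    # Assumption: in real use this would be dateutil.parser.parse.
    return datetime.strptime(s, "%d/%m/%Y %H:%M")

samples = ["25/12/2016 08:30", "26/12/2016 09:45", "27/12/2016 10:00"]
print(agree_on_format(samples, stand_in_parse))  # %d/%m/%Y %H:%M
```

Once a format has been agreed on, the rest of the file can go through datetime.strptime() directly, falling back to the slower parser only if a line fails to parse.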

This is a tricky one. My approach uses regular expressions and the (?(DEFINE)...) syntax, which is only supported by the newer regex module.

Essentially, DEFINE lets us define subroutines before matching them, so first of all we define all the bricks needed for our date-guessing function:

    bricks = re.compile(r"""
        (?(DEFINE)
            (?P<year_def>[12]\d{3})
            (?P<year_short_def>\d{2})
            (?P<month_def>January|February|March|April|May|June|
            July|August|September|October|November|December)
            (?P<month_short_def>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
            (?P<day_def>(?:0[1-9]|[1-9]|[12][0-9]|3[01]))
            (?P<weekday_def>(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)
            (?P<weekday_short_def>Mon|Tue|Wed|Thu|Fri|Sat|Sun)
            (?P<hms_def>\d{2}:\d{2}:\d{2})
            (?P<hm_def>\d{2}:\d{2})
            (?P<ms_def>\d{5,6})
            (?P<delim_def>([-/., ]+|(?<=\d|^)T))
        )

        # actually match them
        (?P<hms>^(?&hms_def)$)|(?P<year>^(?&year_def)$)|(?P<month>^(?&month_def)$)|(?P<month_short>^(?&month_short_def)$)|(?P<day>^(?&day_def)$)|
        (?P<weekday>^(?&weekday_def)$)|(?P<weekday_short>^(?&weekday_short_def)$)|(?P<hm>^(?&hm_def)$)|(?P<delim>^(?&delim_def)$)|(?P<ms>^(?&ms_def)$)
        """, re.VERBOSE)

After this, we need to think of possible delimiters:

    # delim
    delim = re.compile(r'([-/., ]+|(?<=\d)T)')
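To see what the delimiter pattern does on its own, here is a small check. It uses the standard-library re module, which is sufficient for this particular pattern (only the DEFINE construct needs regex). Because the pattern sits in a capturing group, split() keeps the delimiters in the result, so they can later be copied into the format string unchanged.

```python
import re  # the standard-library module is enough for this pattern

delim = re.compile(r'([-/., ]+|(?<=\d)T)')

for text in ['2006-11-02', 'Thu, 01 Jan 1970 00:00:00', '2017-06-07T22:02:05']:
    print(delim.split(text))
# ['2006', '-', '11', '-', '02']
# ['Thu', ', ', '01', ' ', 'Jan', ' ', '1970', ' ', '00:00:00']
# ['2017', '-', '06', '-', '07', 'T', '22:02:05']
```

The lookbehind (?<=\d) is what keeps the leading "T" of "Thu" from being treated as a date/time separator.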

Format mapping:

    # formats
    formats = {'ms': '%f', 'year': '%Y', 'month': '%B', 'month_dec': '%m', 'day': '%d', 'weekday': '%A',
               'hms': '%H:%M:%S', 'weekday_short': '%a', 'month_short': '%b', 'hm': '%H:%M', 'delim': ''}

The function GuessFormat() splits the string into parts using the delimiters, tries to match each part, and outputs the corresponding code for strftime():

    import regex as re  # third-party "regex" module, not the standard-library re

    def GuessFormat(datestring):

        # define the bricks
        bricks = re.compile(r"""
            (?(DEFINE)
                (?P<year_def>[12]\d{3})
                (?P<year_short_def>\d{2})
                (?P<month_def>January|February|March|April|May|June|
                July|August|September|October|November|December)
                (?P<month_short_def>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
                (?P<day_def>(?:0[1-9]|[1-9]|[12][0-9]|3[01]))
                (?P<weekday_def>(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)
                (?P<weekday_short_def>Mon|Tue|Wed|Thu|Fri|Sat|Sun)
                (?P<hms_def>T?\d{2}:\d{2}:\d{2})
                (?P<hm_def>T?\d{2}:\d{2})
                (?P<ms_def>\d{5,6})
                (?P<delim_def>([-/., ]+|(?<=\d|^)T))
            )

            # actually match them
            (?P<hms>^(?&hms_def)$)|(?P<year>^(?&year_def)$)|(?P<month>^(?&month_def)$)|(?P<month_short>^(?&month_short_def)$)|(?P<day>^(?&day_def)$)|
            (?P<weekday>^(?&weekday_def)$)|(?P<weekday_short>^(?&weekday_short_def)$)|(?P<hm>^(?&hm_def)$)|(?P<delim>^(?&delim_def)$)|(?P<ms>^(?&ms_def)$)
            """, re.VERBOSE)

        # delim
        delim = re.compile(r'([-/., ]+|(?<=\d)T)')

        # formats
        formats = {'ms': '%f', 'year': '%Y', 'month': '%B', 'month_dec': '%m', 'day': '%d', 'weekday': '%A',
                   'hms': '%H:%M:%S', 'weekday_short': '%a', 'month_short': '%b', 'hm': '%H:%M', 'delim': ''}

        parts = delim.split(datestring)
        out = []
        for index, part in enumerate(parts):
            try:
                # keep only the named group that actually matched this part
                brick = dict(filter(lambda x: x[1] is not None, bricks.match(part).groupdict().items()))
                key = next(iter(brick))

                # ambiguities: a day-like number at index 2 is treated as a month
                if key == 'day' and index == 2:
                    key = 'month_dec'

                item = part if key == 'delim' else formats[key]
                out.append(item)
            except AttributeError:
                # no brick matched: copy the part into the format verbatim
                out.append(part)

        return "".join(out)

A test at the end:

    import regex as re
    from datetime import datetime

    datestrings = [datetime.now().isoformat(), '2006-11-02', 'Thursday, 10 August 2006 08:42:51',
                   'August 9, 1995', 'Aug 9, 1995', 'Thu, 01 Jan 1970 00:00:00', '21/11/06 16:30',
                   '06 Jun 2017 20:33:10']

    # test
    for dt in datestrings:
        print("Date: {}, Format: {}".format(dt, GuessFormat(dt)))

This yields:

    Date: 2017-06-07T22:02:05.001811, Format: %Y-%m-%dT%H:%M:%S.%f
    Date: 2006-11-02, Format: %Y-%m-%d
    Date: Thursday, 10 August 2006 08:42:51, Format: %A, %m %B %Y %H:%M:%S
    Date: August 9, 1995, Format: %B %m, %Y
    Date: Aug 9, 1995, Format: %b %m, %Y
    Date: Thu, 01 Jan 1970 00:00:00, Format: %a, %m %b %Y %H:%M:%S
    Date: 21/11/06 16:30, Format: %d/%m/%d %H:%M
    Date: 06 Jun 2017 20:33:10, Format: %d %b %Y %H:%M:%S

You can write your own parser:

    import datetime

    class DateFormatFinder:
        def __init__(self):
            self.fmts = []

        def add(self, fmt):
            self.fmts.append(fmt)

        def find(self, ss):
            # return the first registered format that parses the string
            for fmt in self.fmts:
                try:
                    datetime.datetime.strptime(ss, fmt)
                    return fmt
                except ValueError:
                    pass
            return None

You can use it as follows:

    >>> df = DateFormatFinder()
    >>> df.add('%m/%d/%y %H:%M')
    >>> df.add('%m/%d/%y')
    >>> df.add('%H:%M')
    >>> df.find("01/02/06 16:30")
    '%m/%d/%y %H:%M'
    >>> df.find("01/02/06")
    '%m/%d/%y'
    >>> df.find("16:30")
    '%H:%M'
    >>> df.find("01/02/06 16:30")
    '%m/%d/%y %H:%M'
    >>> df.find("01/02/2006")

However, it is not that simple, because dates can be ambiguous and their format cannot be determined without some context.

    >>> from datetime import datetime
    >>> datetime.strptime("01/02/06 16:30", "%m/%d/%y %H:%M") # us format
    datetime.datetime(2006, 1, 2, 16, 30)
    >>> datetime.strptime("01/02/06 16:30", "%d/%m/%y %H:%M") # european format
    datetime.datetime(2006, 2, 1, 16, 30)
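One way to supply that context, when a file is known to use a single format, is to test several values from it and keep only the candidate formats that parse all of them. The sketch below is illustrative only: formats_matching_all and the CANDIDATES list are hypothetical names, not part of any library.

```python
from datetime import datetime

# Hypothetical candidate list; extend it with the formats you expect to see.
CANDIDATES = ["%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d"]

def formats_matching_all(samples, candidates=CANDIDATES):
    """Return the candidate formats that parse every sample string."""
    surviving = []
    for fmt in candidates:
        try:
            for s in samples:
                datetime.strptime(s, fmt)
        except ValueError:
            continue  # at least one sample does not fit this format
        surviving.append(fmt)
    return surviving

print(formats_matching_all(["01/02/2006"]))                # ['%m/%d/%Y', '%d/%m/%Y']
print(formats_matching_all(["01/02/2006", "25/02/2006"]))  # ['%d/%m/%Y']
```

A single value such as 01/02/2006 leaves both orderings possible, while a second value with a day above 12 rules out the month-first reading.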