I have lots of dates in a column in a CSV file that I need to convert from dd/mm/yyyy to yyyy-mm-dd format. For example, 17/01/2010 should be converted to 2010-01-17.
How can I do this in Perl or Python?
8 solutions
#1
17
>>> from datetime import datetime
>>> datetime.strptime('02/11/2010', '%d/%m/%Y').strftime('%Y-%m-%d')
'2010-11-02'
Or, in a more hackish way (that doesn't check the validity of the values):
>>> '-'.join('02/11/2010'.split('/')[::-1])
'2010-11-02'
>>> '-'.join(reversed('02/11/2010'.split('/')))
'2010-11-02'
#2
29
If you are guaranteed to have well-formed data consisting of nothing but a single date in DD/MM/YYYY format, then this works:
# FIRST METHOD
my $ndate = join("-" => reverse split(m[/], $date));
That works on a $date holding "07/04/1776" but fails on "this 17/01/2010 and that 01/17/2010 there". Instead, use:
# SECOND METHOD
($ndate = $date) =~ s{
\b
( \d \d )
/ ( \d \d )
/ ( \d {4} )
\b
}{$3-$2-$1}gx;
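For readers following along in Python, a rough equivalent of the second method might look like the sketch below. It assumes Python 3, where \d and \b are Unicode-aware for str patterns; the function name swap_dates is made up for this example.

```python
import re

# Same shape as the Perl pattern: two digits, two digits, four digits,
# delimited by word boundaries so embedded dates are found as well.
DATE_RE = re.compile(r"\b(\d\d)/(\d\d)/(\d{4})\b")


def swap_dates(text):
    """Rewrite every dd/mm/yyyy in text as yyyy-mm-dd."""
    return DATE_RE.sub(r"\3-\2-\1", text)


print(swap_dates("this 17/01/2010 and that 17/01/2010 there"))
```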
If you prefer a more "grammatical" regex, so that it’s easier to maintain and update, you can instead use this:
# THIRD METHOD
($ndate = $date) =~ s{
(?&break)
(?<DAY> (?&day) )
(?&slash) (?<MONTH> (?&month) )
(?&slash) (?<YEAR> (?&year) )
(?&break)
(?(DEFINE)
(?<break> \b )
(?<slash> / )
(?<year> \d {4} )
(?<month> \d {2} )
(?<day> \d {2} )
)
}{
join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
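Python's standard re module has no (?(DEFINE)...) block, but a similar separation of declaration from use can be approximated by naming the pieces as ordinary strings and interpolating them into a verbose pattern. This is only a sketch of that idea, not a translation of the Perl; the names DAY, MONTH, YEAR, and to_iso are assumptions.

```python
import re

# Building blocks, declared once and reused (a stand-in for the DEFINE block).
DAY = r"\d{2}"
MONTH = r"\d{2}"
YEAR = r"\d{4}"

DATE_RE = re.compile(
    rf"""
    \b
    (?P<day>   {DAY}   )
    /
    (?P<month> {MONTH} )
    /
    (?P<year>  {YEAR}  )
    \b
    """,
    re.VERBOSE,
)


def to_iso(text):
    """Replace each dd/mm/yyyy with yyyy-mm-dd using the named groups."""
    return DATE_RE.sub(lambda m: "-".join(m.group("year", "month", "day")), text)


print(to_iso("this 17/01/2010 and that 17/01/2010 there"))
```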
Finally, if you have Unicode data, you might want to be a bit more careful.
# FOURTH METHOD
($ndate = $date) =~ s{
(?&break_before)
(?<DAY> (?&day) )
(?&slash) (?<MONTH> (?&month) )
(?&slash) (?<YEAR> (?&year) )
(?&break_after)
(?(DEFINE)
(?<slash> / )
(?<start> \A )
(?<finish> \z )
# don't really want to use \D or [^0-9] here:
(?<break_before>
(?<= [\pC\pP\pS\p{Space}] )
| (?<= \A )
)
(?<break_after>
(?= [\pC\pP\pS\p{Space}]
| \z
)
)
(?<digit> \d )
(?<year> (?&digit) {4} )
(?<month> (?&digit) {2} )
(?<day> (?&digit) {2} )
)
}{
join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
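The standard re module in Python does not understand \pC, \pP, or \pS, so a sketch of the fourth method has to check the neighbouring characters in code, here with unicodedata.category. The helper names is_break and convert are hypothetical, and str.isspace is used as an approximation of \p{Space}. One behavioural difference from the Perl: when a candidate is rejected, scanning resumes after it rather than at the next character.

```python
import re
import unicodedata

DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def is_break(ch):
    """True at a string edge, or for whitespace, control/format, punctuation, symbol."""
    if ch == "":
        return True
    return ch.isspace() or unicodedata.category(ch)[0] in "CPS"


def convert(text):
    """Rewrite dd/mm/yyyy as yyyy-mm-dd only where both neighbours are breaks."""
    def repl(m):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        if is_break(before) and is_break(after):
            day, month, year = m.groups()
            return f"{year}-{month}-{day}"
        return m.group(0)  # leave anything else untouched

    return DATE_RE.sub(repl, text)


print(convert("from \u201c17/01/2010\u201d through\xa017/01/2010"))
```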
You can see how each of these four approaches performs when confronted with sample input strings like these:
my $sample = q(17/01/2010);
my @strings = (
$sample, # trivial case
# multiple case
"this $sample and that $sample there",
# multiple case with non-ASCII BMP code points
# U+201C and U+201D are LEFT and RIGHT DOUBLE QUOTATION MARK
"from \x{201c}$sample\x{201d} through\xA0$sample",
# multiple case with non-ASCII code points
# from both the BMP and the SMP
# code point U+02013 is EN DASH, props \pP \p{Pd}
# code point U+10179 is GREEK YEAR SIGN, props \pS \p{So}
# code point U+110BD is KAITHI NUMBER SIGN, props \pC \p{Cf}
"\x{10179}$sample\x{2013}\x{110BD}$sample",
);
Now letting $date be a foreach iterator through that array, we get this output:
Original is: 17/01/2010
First method: 2010-01-17
Second method: 2010-01-17
Third method: 2010-01-17
Fourth method: 2010-01-17
Original is: this 17/01/2010 and that 17/01/2010 there
First method: 2010 there-01-2010 and that 17-01-this 17
Second method: this 2010-01-17 and that 2010-01-17 there
Third method: this 2010-01-17 and that 2010-01-17 there
Fourth method: this 2010-01-17 and that 2010-01-17 there
Original is: from “17/01/2010” through 17/01/2010
First method: 2010-01-2010” through 17-01-from “17
Second method: from “2010-01-17” through 2010-01-17
Third method: from “2010-01-17” through 2010-01-17
Fourth method: from “2010-01-17” through 2010-01-17
Original is: 𐅹17/01/2010–𑂽17/01/2010
First method: 2010-01-2010–𑂽17-01-𐅹17
Second method: 𐅹2010-01-17–𑂽2010-01-17
Third method: 𐅹2010-01-17–𑂽2010-01-17
Fourth method: 𐅹2010-01-17–𑂽2010-01-17
Now let’s suppose that you actually do want to match non-ASCII digits. For example:
U+660 ARABIC-INDIC DIGIT ZERO
U+661 ARABIC-INDIC DIGIT ONE
U+662 ARABIC-INDIC DIGIT TWO
U+663 ARABIC-INDIC DIGIT THREE
U+664 ARABIC-INDIC DIGIT FOUR
U+665 ARABIC-INDIC DIGIT FIVE
U+666 ARABIC-INDIC DIGIT SIX
U+667 ARABIC-INDIC DIGIT SEVEN
U+668 ARABIC-INDIC DIGIT EIGHT
U+669 ARABIC-INDIC DIGIT NINE
or even
U+1D7F6 MATHEMATICAL MONOSPACE DIGIT ZERO
U+1D7F7 MATHEMATICAL MONOSPACE DIGIT ONE
U+1D7F8 MATHEMATICAL MONOSPACE DIGIT TWO
U+1D7F9 MATHEMATICAL MONOSPACE DIGIT THREE
U+1D7FA MATHEMATICAL MONOSPACE DIGIT FOUR
U+1D7FB MATHEMATICAL MONOSPACE DIGIT FIVE
U+1D7FC MATHEMATICAL MONOSPACE DIGIT SIX
U+1D7FD MATHEMATICAL MONOSPACE DIGIT SEVEN
U+1D7FE MATHEMATICAL MONOSPACE DIGIT EIGHT
U+1D7FF MATHEMATICAL MONOSPACE DIGIT NINE
So imagine you have a date in mathematical monospace digits, like this:
$date = "\x{1D7F7}\x{1D7FD}/\x{1D7F7}\x{1D7F6}/\x{1D7F8}\x{1D7F6}\x{1D7F7}\x{1D7F6}";
The Perl code will work just fine on that:
Original is: 𝟷𝟽/𝟷𝟶/𝟸𝟶𝟷𝟶
First method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Second method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Third method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Fourth method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
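For comparison, here is a small Python 3 sketch run against the same monospace-digit date. It assumes str patterns, where \d matches any Unicode decimal digit, and it adds an optional normalisation step to ASCII digits via unicodedata.decimal. The names swap_dates and to_ascii_digits are made up for the example.

```python
import re
import unicodedata

DATE_RE = re.compile(r"\b(\d\d)/(\d\d)/(\d{4})\b")


def swap_dates(text):
    """Rewrite dd/mm/yyyy as yyyy-mm-dd, keeping whatever digit script was used."""
    return DATE_RE.sub(r"\3-\2-\1", text)


def to_ascii_digits(text):
    """Map every Unicode decimal digit to its ASCII counterpart."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in text
    )


# 17/10/2010 written in MATHEMATICAL MONOSPACE digits.
date = "\U0001D7F7\U0001D7FD/\U0001D7F7\U0001D7F6/\U0001D7F8\U0001D7F6\U0001D7F7\U0001D7F6"
swapped = swap_dates(date)
print(swapped)
print(to_ascii_digits(swapped))
```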
I think you’ll find that Python has a pretty brain‐damaged Unicode model whose lack of support for abstract characters and strings irrespective of content makes it ridiculously difficult to write things like this.
It’s also tough to write legible regular expressions in Python where you decouple the declaration of the subexpressions from their execution, since (?(DEFINE)...) blocks are not supported by its standard re module. Heck, that module doesn’t even support Unicode properties such as \pP or \p{Space}. It’s just not suitable for Unicode regex work because of this.
But hey, if you think that’s bad in Python compared to Perl (and it certainly is), just try any other language. I haven’t found one that isn’t still worse for this sort of work.
As you see, you run into real problems when you ask for regex solutions from multiple languages. The solutions are difficult to compare, first because of the different regex flavors, and second because no other language can compare with Perl for power, expressivity, and maintainability in its regular expressions. This may become even more obvious once arbitrary Unicode enters the picture.
So if you just wanted Python, you should have asked for only that. Otherwise it’s a terribly unfair contest that Python will nearly always lose; it’s just too messy to get things like this correct in Python, let alone both correct and clean. That’s asking more of it than it can produce.
In contrast, Perl’s regexes excel at both those.
#3
11
Use Time::Piece (in core since 5.9.5). It is very similar to the accepted Python solution, as it provides the strptime and strftime functions:
use Time::Piece;
my $dt_str = Time::Piece->strptime('13/10/1979', '%d/%m/%Y')->strftime('%Y-%m-%d');
or
$ perl -MTime::Piece
print Time::Piece->strptime('13/10/1979', '%d/%m/%Y')->strftime('%Y-%m-%d');
1979-10-13
$
#4
6
Go with Perl: the datetime Python package is just broken. You could just do it with regexes to swap the date parts around, e.g.:
echo "17/01/2010" | perl -pe 's{(\d+)/(\d+)/(\d+)}{$3-$2-$1}g'
If you do need to parse these dates (e.g. to compute their day of the week or perform other calendar-type operations), look into DateTimeX::Easy (you can install it with apt-get under Ubuntu):
perl -MDateTimeX::Easy -e 'print DateTimeX::Easy->parse("17/01/2010")->ymd("-")'
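As an illustration of the "parse first, then compute" idea in Python, the sketch below parses the date with the standard datetime module and looks up the day of the week. The DAY_NAMES list and the day_of_week helper are assumptions for this example; a fixed English list is used so the result does not depend on the locale.

```python
from datetime import datetime

# datetime.weekday() numbers the days from Monday = 0 to Sunday = 6.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(text):
    """Parse a dd/mm/yyyy string and return the English day name."""
    parsed = datetime.strptime(text, "%d/%m/%Y")
    return DAY_NAMES[parsed.weekday()]


print(day_of_week("17/01/2010"))
```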
#5
5
Perl:
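while (<>) {
    # $1 and $5 hold the characters around the date; put them back
    # so the substitution does not swallow them
    s/(^|[^\d])(\d\d)\/(\d\d)\/(\d{4})($|[^\d])/$1$4-$3-$2$5/g;
    print $_;
}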
Then you just have to run:
perl MyScript.pl < oldfile.txt > newfile.txt
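The same line-by-line filter can be sketched in Python. Lookarounds are used here instead of capturing the neighbouring characters, so nothing around the date is consumed; the name convert_lines is hypothetical.

```python
import re

# (?<!\d) and (?!\d) require that no further digit touches the date,
# without consuming the characters on either side of it.
DATE_RE = re.compile(r"(?<!\d)(\d{2})/(\d{2})/(\d{4})(?!\d)")


def convert_lines(lines):
    """Yield each input line with dd/mm/yyyy rewritten as yyyy-mm-dd."""
    for line in lines:
        yield DATE_RE.sub(r"\3-\2-\1", line)


for converted in convert_lines(["id,17/01/2010,x\n", "no date here\n"]):
    print(converted, end="")
```

To use it as a filter over standard input, one option is sys.stdout.writelines(convert_lines(sys.stdin)) after importing sys.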
#6
1
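Perl:
# assumes $date already holds a dd/mm/yyyy string; declaring it with "my"
# in the same statement would run the substitution on an undefined value
$date =~ s/(\d+)\/(\d+)\/(\d+)/$3-$2-$1/;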
#7
0
In Perl you can do:
use strict;
while (<>) {
    chomp;
    my ($d, $m, $y) = split /\//;
    my $newDate = $y . '-' . $m . '-' . $d;
    print "$newDate\n";    # emit the converted date, one per line
}
#8
-2
In glorious perl-oneliner form (the result has to be assigned back to $_, because -p prints $_):
echo 17/01/2010 | perl -pe 'chomp; $_ = join("-", reverse split m{/}) . "\n"'
But seriously, I would do it like this:
#!/usr/bin/env perl
while (<>) {
chomp;
print join('-', reverse split /\//), "\n";
}
This works on a pipe, converting and printing one date per line.