as.Date produces an unexpected result in a sequence of week-based dates

Time: 2022-09-03 22:52:04

I am working on transforming week-based dates into month-based dates.

When checking my work, I found the following problem in my data, which results from a simple call to as.Date():

as.Date("2016-50-4", format = "%Y-%U-%u")
as.Date("2016-50-5", format = "%Y-%U-%u")
as.Date("2016-50-6", format = "%Y-%U-%u")
as.Date("2016-50-7", format = "%Y-%U-%u") # this is the problem

The code above yields the correct dates for the first three lines:

"2016-12-15"
"2016-12-16"
"2016-12-17"  

The last line, however, goes back one week:

 "2016-12-11"

Can anybody explain what is happening here?

3 answers

#1


7  

Working with weeks of the year can be tricky. You may try to convert the dates using the ISOweek package:

# create date strings in the format given by the OP
wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))

The result

#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"

is of class Date.

Note that the ISO week-based date format is yyyy-Www-d, with a capital W preceding the week number. The W is required to distinguish it from the standard month-based date format yyyy-mm-dd.

So, to convert the date strings provided by the OP with ISOweek2date(), a W has to be inserted after the first hyphen. This is accomplished by replacing the first - with -W in each string; str_replace() only replaces the first match.
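
The same substitution can also be done in base R, since sub() likewise replaces only the first match. A minimal sketch, assuming every string has the form yyyy-ww-d (the variable name iso is only illustrative):

```r
# Week-based date strings in the OP's format (yyyy-ww-d)
wd <- c("2016-50-4", "2016-50-7", "2016-51-1")

# sub() replaces only the first match, so only the first hyphen gets the W
iso <- sub("-", "-W", wd, fixed = TRUE)
iso
# [1] "2016-W50-4" "2016-W50-7" "2016-W51-1"
```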

Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year which belongs to an ISO week may differ from the calendar year. This can be seen from the sample dates above where the week-based date 2016-W52-7 is converted to 2017-01-01.
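
For readers who want to see the arithmetic behind ISO weeks, here is a rough base-R sketch. It is not how the ISOweek package is implemented; it only relies on the ISO 8601 rule that 4 January always falls in week 1. The helper name iso_week_to_date is hypothetical, and the input is assumed to be in the OP's yyyy-ww-d form with valid week and day numbers.

```r
# Hypothetical helper: convert "yyyy-ww-d" ISO week strings to Date
iso_week_to_date <- function(x) {
  parts <- do.call(rbind, strsplit(x, "-", fixed = TRUE))
  year <- as.integer(parts[, 1])
  week <- as.integer(parts[, 2])
  day  <- as.integer(parts[, 3])

  # 4 January always falls in ISO week 1
  jan4 <- as.Date(sprintf("%04d-01-04", year))
  # Step back to the Monday of that week (%u on output: Monday = 1, Sunday = 7)
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)

  # Add whole weeks, then the offset of the weekday within the week
  week1_monday + (week - 1) * 7 + (day - 1)
}

iso_week_to_date(c("2016-50-4", "2016-50-7", "2016-52-7"))
# [1] "2016-12-15" "2016-12-18" "2017-01-01"
```

The last element shows the year rollover described above: the Sunday of ISO week 52 of 2016 is already in calendar year 2017.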

About the ISOweek package

Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying, as I had to prepare weekly reports including week-on-week comparisons. I spent hours looking for a way to deal with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits, as %G, %g, and %V are accepted but ignored on input (see ?strptime for details).

#2


5  

As @lmo said in the comments, %u stands for the weekday as a decimal number (1–7, with Monday as 1) and %U stands for the week of the year as a decimal number (00–53) using Sunday as the first day of the week. Under %U, week 50 of 2016 starts on Sunday 2016-12-11, and under %u day 7 is Sunday, so as.Date("2016-50-7", format = "%Y-%U-%u") results in "2016-12-11".
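
To see the two conventions side by side, one can format the two Sundays in question with the week and weekday directives. This is only an illustration using format() on output; the names sundays and cmp are made up for the example.

```r
sundays <- as.Date(c("2016-12-11", "2016-12-18"))

cmp <- data.frame(
  date = format(sundays),
  U = format(sundays, "%U"),  # week of the year, weeks start on Sunday
  W = format(sundays, "%W"),  # week of the year, weeks start on Monday
  u = format(sundays, "%u"),  # weekday 1-7, Monday = 1 (so Sunday = 7)
  w = format(sundays, "%w"),  # weekday 0-6, Sunday = 0
  stringsAsFactors = FALSE
)
cmp
#         date  U  W u w
# 1 2016-12-11 50 49 7 0
# 2 2016-12-18 51 50 7 0
```

Both dates are weekday 7 under %u, but only 2016-12-11 belongs to week 50 under %U, which is why the fourth call in the question lands there.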

However, if that should give "2016-12-18", you need a week format that also uses Monday as the first day of the week. From the documentation in ?strptime you would expect the format "%Y-%V-%u" to give the correct output, where %V stands for the week of the year as a decimal number (01–53) with Monday as the first day.

Unfortunately, it doesn't:

> as.Date("2016-50-7", format = "%Y-%V-%u")
[1] "2016-01-18"

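However, the explanation of %V ends with "Accepted but ignored on input", meaning that it cannot be used for parsing. With the week ignored, the remaining fields do not determine a date, so the month and day shown above presumably come from the current date at the time the call was run.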
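
Since %V is only ignored when parsing, it can still be used when formatting, which gives a cheap way to check a candidate date against a week string. A small sketch (the variable names are arbitrary):

```r
day <- as.Date("2016-12-18")

# On output, %G (ISO year), %V (ISO week) and %u (ISO weekday) are all honoured
iso_label <- format(day, "%G-W%V-%u")
iso_label
# [1] "2016-W50-7"

# The ISO year (%G) can differ from the calendar year (%Y)
new_year <- format(as.Date("2017-01-01"), "%Y vs. %G-W%V-%u")
new_year
# [1] "2017 vs. 2016-W52-7"
```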

You can circumvent this behavior as follows to get the correct dates:

# create a vector of dates
d <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1")

# convert to the correct dates
as.Date(paste0(substr(d,1,8), as.integer(substring(d,9))-1), "%Y-%U-%w") + 1

which gives:

[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"

#3


2  

The issue arises because, for %u, 1 is Monday and 7 is Sunday. The problem is further complicated by the fact that %U assumes the week begins on Sunday.

Given that input and the documented behavior of format = "%Y-%U-%u", the output of the fourth line is consistent with the output of the previous three lines.

That is, if you want to use format = "%Y-%U-%u", you should pre-process your input. In this case, the fourth line would have to be as.Date("2016-51-7", format = "%Y-%U-%u"), as revealed by

format(as.Date("2016-12-18"), "%Y-%U-%u")
# "2016-51-7"

Instead, you are currently passing "2016-50-7".

A better way of doing it might be to use the approach suggested in Uwe Block's answer. Since you are happy with "2016-50-4" being transformed to "2016-12-15", I suspect that Monday is counted as 1 in your raw data too. You could also create a custom function that changes the week number passed to %U, counting it as if the week began on Monday, so that the output is as you expect.

# Function to change the value of %U so that the week begins on Monday.
# Expects a single string such as "2016-50-7" (it is not vectorized).
pre_process <- function(x, delim = "-") {
    y <- unlist(strsplit(x, delim, fixed = TRUE))
    if (y[2] == "53" && y[3] == "7") {
        # Sunday (7 for %u) of week 53: move it to week 00 of the next year.
        # I think there might be a better solution for this.
        x <- paste(as.integer(y[1]) + 1, "00", y[3], sep = delim)
    } else if (y[3] == "7") {
        # Any other Sunday (7 for %u): add 1 to the week number
        x <- paste(y[1], as.integer(y[2]) + 1, y[3], sep = delim)
    }
    return(x)
}

Usage would be:

as.Date(pre_process("2016-50-7"), format = "%Y-%U-%u")
# [1] "2016-12-18"

I'm not quite sure how to handle the case where the year ends on a Sunday.
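
One possible way around the year-end case is to let date arithmetic do the rollover: parse the Sunday that opens the %U week (weekday 0 under %w) and then add the weekday number as an offset in days. This is only a sketch under the same assumption as above, namely that weeks are numbered as in %U and that days 1 to 7 (Monday to Sunday) follow the opening Sunday. The name week_to_date is hypothetical, and week 00 is not handled.

```r
# Hypothetical vectorized alternative to pre-processing the week number
week_to_date <- function(x) {
  parts <- do.call(rbind, strsplit(x, "-", fixed = TRUE))
  # Sunday that opens the %U week: use weekday 0 and parse it with %w
  week_start <- as.Date(paste(parts[, 1], parts[, 2], "0", sep = "-"),
                        format = "%Y-%U-%w")
  # Days 1..7 follow that Sunday; adding days crosses a year end without special cases
  week_start + as.integer(parts[, 3])
}

week_to_date(c("2016-50-4", "2016-50-7", "2016-51-1", "2016-52-7"))
# [1] "2016-12-15" "2016-12-18" "2016-12-19" "2017-01-01"
```

Because the weekday is added as a plain number of days, a Sunday in the last week of the year simply becomes a date in January of the following year.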
