Vim - Comparing strings lexicographically with a regex (finding earlier/later dates)

Time: 2022-10-25 00:12:02

I want to write a simple regex, in vim, that will find all strings lexicographically smaller than another string.

Specifically, I want to use this to compare dates formatted as 2014-02-17. These dates are lexicographically sortable, which is why I use them.

My specific use case: I'm trying to run through a script and find all the dates that are earlier than today's date.

I'm also OK with comparing these as numbers, or any other solution.
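The "compare as numbers" route can be sketched in a few lines: strip the hyphens and parse what is left as a base-10 integer, so YYYYMMDD values compare with ordinary `<`. The helper name `DateToNumber` is hypothetical and not taken from any of the answers below.

```vim
" Hypothetical helper (not from the original post): turn 'YYYY-MM-DD' into
" the integer YYYYMMDD so that ordinary numeric comparison works.
function! DateToNumber(date) abort
  " Drop the hyphens, then parse explicitly as base 10 so that a leading
  " zero is never treated as an octal prefix.
  return str2nr(substitute(a:date, '-', '', 'g'), 10)
endfunction
```

For example, `:echo DateToNumber('2014-02-17') < DateToNumber(strftime('%Y-%m-%d'))` should print 1 for any date earlier than today (note that `strftime()` is not available on every system).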

4 solutions

#1


3  

I don't think there is any way to do this easily with a regex alone. To match any date earlier than a given date (such as the current one), you can run the functions below (some of the code was stolen from benjifisher).

function! Convert_to_char_class(cur) 
    if a:cur =~ '[2-9]'
        return '[0-' . (a:cur-1) . ']'
    endif
    return '0'
endfunction

function! Match_number_before(num)
    let branches = []
    let init = ''
    for i in range(len(a:num))
        if a:num[i] =~ '[1-9]'
            call add(branches, init . Convert_to_char_class(a:num[i]) . repeat('\d', len(a:num) - i - 1))
        endif 
        let init .= a:num[i]
    endfor
    return '\%(' . join(branches, '\|') .'\)'
endfunction

function! Match_date_before(date)
    if a:date !~ '\v^\d{4}-\d{2}-\d{2}$'
        echo "invalid date"
        return
    endif

    let branches =[]

    let parts = split(a:date, '-')
    call add(branches, Match_number_before(parts[0]) . '-\d\{2}-\d\{2}')
    call add(branches, parts[0] . '-' . Match_number_before(parts[1]) . '-\d\{2}')
    call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_before(parts[2]))

    return '\%(' . join(branches, '\|') .'\)'
endfunction

Use the following to search for all matches before 2014-02-24:

/<C-r>=Match_date_before('2014-02-24')<CR><CR>

You might be able to wrap it in a function to set the search register if you wanted to.
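A minimal sketch of such a wrapper, assuming `Match_date_before()` from above is already sourced; the command name `SearchDateBefore` is hypothetical. It is written as a command rather than a function because Vim restores the last search pattern when a function returns (see `:help function-search-undo`), so setting `@/` inside a function would not stick.

```vim
" Hypothetical command name; assumes Match_date_before() has been sourced.
" Setting @/ here (not inside a function) makes n/N and 'hlsearch' use it.
command! -nargs=1 SearchDateBefore let @/ = Match_date_before(<q-args>) | set hlsearch
```

Usage: `:SearchDateBefore 2014-02-24`, then `n` and `N` to step through the matches.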

The generated regex for dates before 2014-02-24 is the following.

\%(\%([0-1]\d\d\d\|200\d\|201[0-3]\)-\d\{2}-\d\{2}\|2014-\%(0[0-1]\)-\d\{2}\|2014-02-\%([0-1]\d\|2[0-3]\)\)

It does not validate the dates; it assumes that anything in that format is a date.
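If some validation is wanted, a stricter format check could be run on the input first. This is a hypothetical helper, not part of the answer, and it only checks ranges (01-12 for months, 01-31 for days), so an impossible date such as 2014-02-31 still passes.

```vim
" Hypothetical helper: accept only YYYY-MM-DD with month 01-12 and day 01-31.
" It does not know month lengths, so 2014-02-31 is still accepted.
function! IsPlausibleDate(date) abort
  return a:date =~# '^\d\{4}-\%(0[1-9]\|1[0-2]\)-\%(0[1-9]\|[12]\d\|3[01]\)$'
endfunction
```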

Here is the equivalent set of functions for matching dates after the passed-in date.

function! Convert_to_char_class_after(cur) 
    if a:cur =~ '[0-7]'
        return '[' . (a:cur+1) . '-9]'
    endif
    return '9'
endfunction

function! Match_number_after(num)
    let branches = []
    let init = ''
    for i in range(len(a:num))
        if a:num[i] =~ '[0-8]'
            call add(branches, init . Convert_to_char_class_after(a:num[i]) . repeat('\d', len(a:num) - i - 1))
        endif 
        let init .= a:num[i]
    endfor
    return '\%(' . join(branches, '\|') .'\)'
endfunction

function! Match_date_after(date)
    if a:date !~ '\v^\d{4}-\d{2}-\d{2}$'
        echo "invalid date"
        return
    endif

    let branches =[]

    let parts = split(a:date, '-')
    call add(branches, Match_number_after(parts[0]) . '-\d\{2}-\d\{2}')
    call add(branches, parts[0] . '-' . Match_number_after(parts[1]) . '-\d\{2}')
    call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_after(parts[2]))

    return '\%(' . join(branches, '\|') .'\)'
endfunction

The regex generated for dates after 2014-02-24 was:

\%(\%([3-9]\d\d\d\|2[1-9]\d\d\|20[2-9]\d\|201[5-9]\)-\d\{2}-\d\{2}\|2014-\%([1-9]\d\|0[3-9]\)-\d\{2}\|2014-02-\%([3-9]\d\|2[5-9]\)\)

#2


3  

You do not say how you want to use this; are you sure that you really want a regular expression? Perhaps you could get away with

if DateCmp(date, '2014-02-24') < 0
  " ...
endif

In that case, try this function.

" Compare formatted date strings:
" @param String date1, date2
"   dates in YYYY-MM-DD format, e.g. '2014-02-24'
" @return Integer
"   negative, zero, or positive according to date1 < date2, date1 == date2, or
"   date1 > date2
function! DateCmp(date1, date2)
  let [year1, month1, day1] = split(a:date1, '-')
  let [year2, month2, day2] = split(a:date2, '-')
  if year1 != year2
    return year1 - year2
  elseif month1 != month2
    return month1 - month2
  else
    return day1 - day2
  endif
endfun
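Because the dates are zero-padded YYYY-MM-DD strings, a case-sensitive string comparison (`<#`) orders them the same way `DateCmp()` does. As a sketch for the original use case, the hypothetical helper `LinesWithDateBefore()` keeps the lines whose first date is earlier than a cutoff; it assumes at most one date of interest per line.

```vim
" Hypothetical helper: return the lines whose first YYYY-MM-DD date sorts
" before a:cutoff.  Zero-padding makes string order equal to date order.
function! LinesWithDateBefore(lines, cutoff) abort
  let result = []
  for line in a:lines
    let date = matchstr(line, '\<\d\{4}-\d\{2}-\d\{2}\>')
    " <# compares strings case-sensitively, independent of 'ignorecase'.
    if !empty(date) && date <# a:cutoff
      call add(result, line)
    endif
  endfor
  return result
endfunction
```

For example, `:echo LinesWithDateBefore(getline(1, '$'), strftime('%Y-%m-%d'))` lists the lines of the current buffer dated before today.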

If you really want a regular expression, then try this:

" Construct a pattern that matches a formatted date string if and only if the
" date is less than the input date.  Usage:
" :echo '2014-02-24' =~ DateLessRE('2014-03-12')
function! DateLessRE(date)
  let init = ''
  let branches = []
  for c in split(a:date, '\zs')
    if c =~ '[1-9]'
      call add(branches, init . '[0-' . (c-1) . ']')
    endif
    let init .= c
  endfor
  return '\d\d\d\d-\d\d-\d\d\&\%(' . join(branches, '\|') . '\)'
endfun

Does that count as a "simple" regex? One way to use it would be to type :g/, then CTRL-R and =, then DateLessRE('2014-02-24') and Enter, followed by the rest of your command. In other words,

:g/<C-R>=DateLessRE('2014-02-24')<CR>/s/foo/bar

EDIT: I added a concat (:help /\&) that matches a complete "formatted date string". Now, there is no need to anchor the pattern.
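A small illustration of how the `\&` concat behaves (`g:concat_demo` is only an illustrative name): every concat must match at the same position, and the matched text is that of the last concat. With `DateLessRE()` this means a match may cover only the leading part of a date, for example `2013` in `2013-12-31`.

```vim
" "A complete YYYY-MM-DD date starts here" \& "2014"
let g:concat_demo = '\d\d\d\d-\d\d-\d\d\&2014'

" Prints 1: both concats match at position 0.
echo '2014-02-23' =~# g:concat_demo
" Prints 0: '2014' is present, but no complete date starts there.
echo '2014-2-23' =~# g:concat_demo
" Prints 2014: only the last concat is part of the match.
echo matchstr('2014-02-23', g:concat_demo)
```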

#3


1  

Use nested subpatterns. It starts simple, with the century:

[01]\d\d\d-\d\d-\d\d|20

For each digit that follows, use the pattern for that digit from the list below; you may want to replace .* by the appropriate sequence of \d and - for the remaining positions.

for 0:   (0
for 1:   (0.*|1
for 2:   ([01].*|2
for 3:   ([0-2].*|3
for 4:   ([0-3].*|4
for 5:   ([0-4].*|5
for 6:   ([0-5].*|6
for 7:   ([0-6].*|7
for 8:   ([0-7].*|8
for 9:   ([0-8].*|9

For the last digit you only need the digit range, e.g. for 7:

[0-6]

Finally, all parentheses should be closed:

)))))

In the example of 2014-02-17, this becomes:

[01]\d\d\d-\d\d-\d\d|20
(0\d-\d\d-\d\d|1
([0-3]-\d\d-\d\d|4
-
(0
([01]-\d\d|2
-
(0\d|1
[0-6]
)))))

Now in one line:

[01]\d\d\d-\d\d-\d\d|20(0\d-\d\d-\d\d|1([0-3]-\d\d-\d\d|4-(0([01]-\d\d|2-(0\d|1[0-6])))))

For Vim, don't forget to escape (, ) and |:

[01]\d\d\d-\d\d-\d\d\|20\(0\d-\d\d-\d\d\|1\([0-3]-\d\d-\d\d\|4-\(0\([01]-\d\d\|2-\(0\d\|1[0-6]\)\)\)\)\)
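To sanity-check the hand-built Vim pattern, a few sample dates can be tested with `=~#`; the variable name is only for illustration. Each line should print `ok`.

```vim
" Hand-built pattern from above for "earlier than 2014-02-17".
let g:before_20140217 = '[01]\d\d\d-\d\d-\d\d\|20\(0\d-\d\d-\d\d\|1\([0-3]-\d\d-\d\d\|4-\(0\([01]-\d\d\|2-\(0\d\|1[0-6]\)\)\)\)\)'

" Each pair is [date, expected result of =~#].
for [date, expected] in [['2014-02-16', 1], ['2014-02-17', 0], ['2013-12-31', 1], ['2014-03-01', 0], ['1999-12-31', 1]]
  echo date . ': ' . ((date =~# g:before_20140217) == expected ? 'ok' : 'MISMATCH')
endfor
```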

It would be best to generate this (much like in FDinoff's answer) rather than write it by hand.

Update: Here is a sample AWK script that generates the correct regex for any date yyyy-mm-dd. It uses switch statements, which are a GNU awk (gawk) extension.

#!/usr/bin/awk -f
# Note: "switch" below is a GNU awk extension, so run this script with gawk.

BEGIN {                 # possible overrides for non-VIM users
    switch (digit) {
        case "ascii"     : digit = "[0-9]";     break;
        case "posix"     : digit = "[[:digit:]]"; break;
        default          : digit = "\\d";
    }
    switch (metachar) {
        case "unescaped" : escape = "";         break;
        default          : escape = "\\";
    }
}

/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
    print BuildRegex($0);
}

function BuildRegex(s) {
    if (s ~ /^[1-9][^1-9]*$/) {
        regex = LessThanOnFirstDigit(s);
    }
    else {
        regex = substr(s, 1, 1) BuildRegex(substr(s, 2));    # recursive call
        if (s ~ /^[1-9]/) {
            regex = escape "(" LessThanOnFirstDigit(s) escape "|" regex escape ")";
        }
    }
    return regex;
}

function LessThanOnFirstDigit(s) {
    first = substr(s, 1, 1) - 1;
    rest = substr(s, 2);
    gsub(/[0-9]/, digit, rest);
    return (first ? "[0-" first "]" : "0") rest;
}

Call it like this:

echo 2014-02-17 | gawk -f genregex.awk
echo 2014-02-17 | gawk -v digit=posix -v metachar=unescaped -f genregex.awk

Of course, you can write such a simple generator in any language you like. It would be nice to do it in Vimscript, but I have no experience with that, so I will leave it as an exercise for the reader.
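Taking up that exercise, here is a sketch of a Vimscript port of the AWK generator. The function names are hypothetical, and it always emits Vim-flavoured metacharacters (`\d`, `\(`, `\|`, `\)`). For 2014-02-17 its output is equivalent to the hand-built pattern above. Like the AWK version, it checks only the YYYY-MM-DD shape, not whether the date exists.

```vim
" Pattern for strings that are smaller at the first digit of a:s:
" a digit range for that position, then \d for every later digit.
function! DateLessFirstDigit(s) abort
  let first = str2nr(a:s[0]) - 1
  let rest = substitute(a:s[1:], '\d', '\\d', 'g')
  return (first ? '[0-' . first . ']' : '0') . rest
endfunction

" Recursive step, mirroring BuildRegex() in the AWK script.
function! DateLessBuild(s) abort
  if a:s =~# '^[1-9][^1-9]*$'
    return DateLessFirstDigit(a:s)
  endif
  let regex = a:s[0] . DateLessBuild(a:s[1:])
  if a:s =~# '^[1-9]'
    let regex = '\(' . DateLessFirstDigit(a:s) . '\|' . regex . '\)'
  endif
  return regex
endfunction

" Entry point: pattern matching YYYY-MM-DD dates strictly earlier than a:date.
function! DateLessRegex(date) abort
  " The recursion needs a non-zero digit to stop, so 0000-00-00 (which has
  " no earlier date anyway) is rejected along with malformed input.
  if a:date !~# '^\d\{4}-\d\{2}-\d\{2}$' || a:date !~# '[1-9]'
    throw 'DateLessRegex: expected a YYYY-MM-DD date after 0000-00-00'
  endif
  return DateLessBuild(a:date)
endfunction
```

For example, `:let @/ = DateLessRegex('2014-02-17')` followed by `n` searches for earlier dates.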

#4


0  

If you wanted to search for all dates in 2014 up to and including 2014-11-23, you could use the following regex.

2014-(?:(?:0[1-9]|10)-\d\d|11-(?:[01]\d|2[0-3]))

For a detailed explanation of the regex, paste it into regex101.com; you can also test it on that site.

The basics of the regex are to search all dates that:

start with 2014-
are followed either by a month from 01 to 10 and any two-digit day
    or by month 11 and a day from 00 to 23
