How can I sort files from the Linux command line?

Okay, now this is more a rant about Linux than a question, but maybe someone knows how to do what I want. I know this can be achieved using the sort command, but I want a better solution because getting that to work is about as easy as writing a C program to do the same thing.

I have files. For argument's sake, let's say I have these (my real files are named the same way; I just have many more):

file-10.xml
file-20.xml
file-100.xml
file-k10.xml
file-k20.xml
file-k100.xml
file-M10.xml
file-M20.xml
file-M100.xml

Now this turns out to be the order I want them sorted in. Incidentally, this is the order in Windows that they are by default sorted into. That's nice. Windows groups consecutive numerical characters into one effective character which sorts alphabetically before letters.

If I type ls at the linux command line, I get the following garbage. Notice the 20 is displaced. This is a bigger deal when I have hundreds of these files that I want to view in a report, in order.

file-100.xml
file-10.xml
file-20.xml
file-k100.xml
file-k10.xml
file-k20.xml
file-M100.xml
file-M10.xml
file-M20.xml

I can use ls -1 | sort -n -k 1.6 to get the ones without 'k' or 'M' correct...

file-k100.xml
file-k10.xml
file-k20.xml
file-M100.xml
file-M10.xml
file-M20.xml
file-10.xml
file-20.xml
file-100.xml

I can use ls -1 | sort -n -k 1.7 to get none of it correct

file-100.xml
file-10.xml
file-20.xml
file-k10.xml
file-M10.xml
file-k20.xml
file-M20.xml
file-k100.xml
file-M100.xml

Okay, fine. Let's really get it right. ls -1 | grep "file-[0-9]*\.xml" | sort -n -k1.6 && ls -1 file-k*.xml | sort -n -k1.7 && ls -1 file-M*.xml | sort -n -k1.7

file-10.xml
file-20.xml
file-100.xml
file-k10.xml
file-k20.xml
file-k100.xml
file-M10.xml
file-M20.xml
file-M100.xml

Whew! Boy, am I glad the "power of the linux command line" saved me there. (This isn't practical for my situation, because instead of ls -1 I have a command that is another line or two long.)

Now, the Windows behavior is simple, elegant, and does what you want it to do 99% of the time. Why can't I have that in linux? Why oh why does sort not have an "automagic sort numbers in a way that doesn't make me bang my head into a wall" switch?

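Here's roughly what I mean, as a C++ sketch:

#include <cctype>
#include <string>

// Natural "less than": a run of digits counts as one number, numbers sort
// before letters, and letters compare case-insensitively.
bool compare_two_strings_to_avoid_head_injury(const std::string& a, const std::string& b)
{
    auto digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    auto lower = [](char c) { return std::tolower(static_cast<unsigned char>(c)); };
    std::string::size_type i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (digit(a[i]) && digit(b[j]))
        {
            // Gobble up both numbers, ignoring leading zeros.
            while (i < a.size() && a[i] == '0') ++i;
            while (j < b.size() && b[j] == '0') ++j;
            const std::string::size_type si = i, sj = j;
            while (i < a.size() && digit(a[i])) ++i;
            while (j < b.size() && digit(b[j])) ++j;
            // More digits means a bigger number; same length compares as text.
            if (i - si != j - sj) return i - si < j - sj;
            const int c = a.compare(si, i - si, b, sj, j - sj);
            if (c != 0) return c < 0;
        }
        else if (digit(a[i]) != digit(b[j])) return digit(a[i]);
        else if (lower(a[i]) != lower(b[j])) return lower(a[i]) < lower(b[j]);
        else { ++i; ++j; }
    }
    return i == a.size() && j < b.size();  // a proper prefix sorts first
}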

Was that so hard? Can someone put this in sort and send me a copy? Please?

3 Solutions

#1

This would be my first thought:

ls -1 | sed 's/\-\([kM]\)\?\([0-9]\{2\}\)\./-\10\2./' | sort | sed 's/0\([0-9]\{2\}\)/\1/'

Basically I just use sed to pad the number with zeros and then use it again afterwards to strip off the leading zero.
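
The same pad, sort, strip idea can be generalized so it does not depend on the numbers having exactly two digits. This is only a sketch, not the command above: it swaps sed for awk, the function name pad_sort is made up for illustration, and it assumes each name contains a single run of digits and no tab characters.

```shell
# pad_sort is a hypothetical helper name, not a standard command.
# It reads names on stdin and prints them sorted with the first number compared numerically.
# Usage: ls -1 | pad_sort
pad_sort() {
    awk '{
        key = $0
        # Zero-pad the first run of digits to 10 places to build a sort key.
        if (match($0, /[0-9]+/))
            key = substr($0, 1, RSTART - 1) sprintf("%010d", substr($0, RSTART, RLENGTH)) substr($0, RSTART + RLENGTH)
        # Lower-case the key so k and M sort alphabetically; keep the original name after a tab.
        print tolower(key) "\t" $0
    }' | LC_ALL=C sort | cut -f2-   # sort on the key, then strip the key off again
}
```

Keeping the original name next to the padded key means nothing has to be un-padded afterwards, so names that already contain a zero followed by digits come through unchanged.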

I don't know if it might be quicker in Perl.

#2

Try sort --version-sort -f

file-10.xml
file-20.xml
file-100.xml
file-k10.xml
file-k20.xml
file-k100.xml
file-M10.xml
file-M20.xml
file-M100.xml

The -f option is to ignore case (otherwise, it would put the k's and M's in the wrong order in this example). However, I don't think sort is interpreting the letters k and M as thousands and millions, if that was your goal; it's just alphabetical order.
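
To try this on the names from the question without touching real files, something like the following can be used. It assumes GNU sort, where -V is the short form of --version-sort; natural_sort is just an illustrative wrapper name.

```shell
# natural_sort is a hypothetical wrapper name; it assumes GNU coreutils sort.
# -V (--version-sort) compares runs of digits as numbers; -f folds case so k sorts before M.
natural_sort() {
    sort -V -f
}

# Feed it the names in the order ls printed them in the question.
printf '%s\n' file-100.xml file-10.xml file-20.xml \
              file-k100.xml file-k10.xml file-k20.xml \
              file-M100.xml file-M10.xml file-M20.xml | natural_sort
```

Because it only reads standard input, the same filter can go on the end of a longer command in place of ls -1.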

#3

ls -1v will get you pretty close. It just sorts all capital letters before lower case.
