Count the number of files for each extension in a directory

Time: 2022-02-07 10:31:56

I want to count the number of files for each extension in a directory, as well as the number of files without an extension.

I have tried a few options, but I haven't found a working solution yet:

  • find "$folder" -type f | sed 's/.*\.//' | sort | uniq -c is an option, but it doesn't work for files that have no extension, and I need to know how many files don't have one.

  • I have also tried collecting the output of a find loop into an array and then summing the results, but that code currently throws an undeclared variable error, though only outside the loop:

    declare -a arr
    arr=()
    echo ${arr[@]}
    

    This throws an undeclared variable error, and the same error appears once the find loop completes.

5 solutions

#1

find "$path" -type f | sed -e '/.*\/[^\/]*\.[^\/]*$/!s/.*/(none)/' -e 's/.*\.//' | LC_COLLATE=C sort | uniq -c

Explanation:

  • find "$path" -type f gets a recursive listing of all the files in the "$path" folder.
  • sed -e '/.*\/[^\/]*\.[^\/]*$/!s/.*/(none)/' -e 's/.*\.//' applies two expressions:
    • /.*\/[^\/]*\.[^\/]*$/!s/.*/(none)/ replaces every path whose file name (the part after the last /) contains no dot with (none).
    • s/.*\.// deletes everything up to and including the last dot, leaving the extension of the remaining files.
  • LC_COLLATE=C sort sorts the result in byte order, so (none) sorts before the alphanumeric extensions.
  • uniq -c counts adjacent repeated entries, which is why the input has to be sorted first.
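To make the sed logic concrete, here is a small Python sketch that applies the same two expressions to a list of paths (the sample paths are hypothetical) and tallies the keys in roughly the order LC_COLLATE=C sort | uniq -c would give. It shows that only the last path component matters: a file in a dotted directory, such as ./dir.d/README, still counts as (none).

```python
import re
from collections import Counter

# Same pattern as the first sed expression: the last path component
# (everything after the final "/") must contain a dot.
HAS_EXT = re.compile(r".*/[^/]*\.[^/]*$")


def sed_style_key(path: str) -> str:
    """Apply both sed expressions to one path printed by find."""
    if not HAS_EXT.match(path):
        # /.../!s/.*/(none)/ : no dot in the file name -> "(none)"
        return "(none)"
    # s/.*\.// : the greedy .* removes everything up to the last dot
    return path.rsplit(".", 1)[1]


def count_extensions(paths):
    """Rough equivalent of `LC_COLLATE=C sort | uniq -c` over the keys.

    Python compares str by code point, which matches C-locale byte order
    for UTF-8 names. Returns sorted (extension, count) pairs.
    """
    counts = Counter(sed_style_key(p) for p in paths)
    return sorted(counts.items())


if __name__ == "__main__":
    # Hypothetical find output
    sample = ["./a.txt", "./b.txt", "./dir.d/README", "./Makefile", "./c.tar.gz"]
    for ext, n in count_extensions(sample):
        print(f"{n:7d} {ext}")  # uniq -c style: padded count, then the key
```

Note that, like the sed version, a hidden file such as ./.bashrc gets the "extension" bashrc, and a name ending in a dot gets an empty one.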

#2

If you have GNU awk, you could do something like:

printf '%s\0' * | gawk 'BEGIN{RS="\0"; FS="."; OFS="\t"} 
  {a[(NF>1 ? $NF : "(none)")]++} 
  END{for(i in a) print a[i],i}
'

That is, build and increment an associative array keyed on the last .-separated field, or on some arbitrary fixed string such as (none) if there is no extension.

mawk doesn't seem to allow a null-byte record separator. If you are confident that you don't need to deal with newlines in your file names, you could use mawk with the default newline separator:

printf '%s\n' * | mawk 'BEGIN{FS="."; OFS="\t"} {a[(NF>1 ? $NF : "(none)")]++} END{for(i in a) print a[i],i}'
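The choice of record separator is what the mawk caveat is about. As a hedged illustration, the Python sketch below mimics awk's RS together with the NF>1 ? $NF : "(none)" key on a byte stream, using a hypothetical file name that contains a newline: with NUL separators it stays one record, while newline separators split it in two and miscount it.

```python
from collections import Counter


def awk_style_key(name: str) -> str:
    """Mirror NF>1 ? $NF : "(none)" with FS=".": the last dot-separated field."""
    fields = name.split(".")
    return fields[-1] if len(fields) > 1 else "(none)"


def count_records(data: bytes, sep: bytes) -> Counter:
    """Split a byte stream into records on `sep` (like awk's RS) and tally keys."""
    records = data.split(sep)
    if records and records[-1] == b"":
        records.pop()  # printf terminates every name, leaving an empty tail
    return Counter(awk_style_key(r.decode("utf-8", "surrogateescape")) for r in records)


if __name__ == "__main__":
    # Hypothetical directory listing; one name contains an embedded newline.
    names = ["a.txt", "weird\nname.md", "README"]
    nul_stream = b"".join(n.encode() + b"\0" for n in names)  # like printf '%s\0' *
    nl_stream = b"".join(n.encode() + b"\n" for n in names)   # like printf '%s\n' *
    print(sum(count_records(nul_stream, b"\0").values()))  # 3 records
    print(sum(count_records(nl_stream, b"\n").values()))   # 4 records
```

In the newline case the fragment weird is counted as an extra file with no extension, which is exactly the risk the mawk variant accepts.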

#3

Using Python:

import os
from collections import Counter
from pprint import pprint

lst = []
for file in os.listdir('./'):
    # splitext returns (root, ext); ext is '' when there is no extension
    name, ext = os.path.splitext(file)
    lst.append(ext)

pprint(Counter(lst))

The output:

Counter({'': 7,
         '.png': 4,
         '.mp3': 3,
         '.jpg': 3,
         '.mkv': 3,
         '.py': 1,
         '.swp': 1,
         '.sh': 1})
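Keep in mind that os.listdir() also returns subdirectory names, which get counted too (typically under ''), and that os.path.splitext() ignores leading dots, so a name like .bashrc counts as having no extension, unlike in the sed and awk answers. If you only want files, a minimal sketch based on os.walk() could look like this; it is recursive like the find answer, skips symlinks and other non-regular files the way find -type f does, and labels the empty extension "(none)".

```python
import os
from collections import Counter


def extension_key(name: str) -> str:
    """splitext() extension, or "(none)"; splitext('.bashrc') gives ''."""
    return os.path.splitext(name)[1] or "(none)"


def count_extensions(root: str = ".") -> Counter:
    """Count regular files under `root`, recursively, by extension."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like find -type f: skip symlinks, FIFOs, sockets, etc.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            counts[extension_key(name)] += 1
    return counts
```

For example, pprint(count_extensions('./')) prints a Counter in the same style as the output above, but without any directory entries.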

#4

With basic /bin/sh or even bash the task can be a little difficult, but as the other answers show, tools that work on aggregate data handle this kind of task particularly easily. One such tool is the sqlite database.

A very simple way to use a sqlite database is to create a .csv file with two fields: the file name and its extension, taken from the base name, with (none) for files whose base name has no dot. sqlite can then use the simple aggregate function COUNT() with GROUP BY ext to count the files for each extension.

$ { printf "file,ext\n"; find . -type f -exec sh -c 'f=${1##*/}; case $f in *.*) e=${f##*.};; *) e="(none)";; esac; printf "%s,%s\n" "$1" "$e"' sh {} \; ; } > files.csv
$ sqlite3 <<EOF
> .mode csv
> .import ./files.csv files_tb
> SELECT ext,COUNT(file) FROM files_tb GROUP BY ext;
> EOF
csv,1
mp3,6
txt,1
wav,27
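If you would rather skip the intermediate CSV file, Python's built-in sqlite3 module can run the same COUNT()/GROUP BY query on an in-memory table. This is only a sketch under a few assumptions: it reuses the table name files_tb from above, takes the extension as the text after the last dot of the base name (or "(none)"), and inserts rows with query parameters, so commas in file names cannot break the columns the way they can in a CSV line.

```python
import os
import sqlite3


def count_by_extension(root: str = "."):
    """Load (file, ext) rows into an in-memory SQLite table and GROUP BY ext."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk lists non-directory entries here (this may include symlinks)
        for name in filenames:
            ext = name.rsplit(".", 1)[1] if "." in name else "(none)"
            rows.append((os.path.join(dirpath, name), ext))

    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE files_tb (file TEXT, ext TEXT)")
        con.executemany("INSERT INTO files_tb VALUES (?, ?)", rows)
        return con.execute(
            "SELECT ext, COUNT(file) FROM files_tb GROUP BY ext ORDER BY ext"
        ).fetchall()
    finally:
        con.close()
```

Calling print(count_by_extension(".")) gives a list of (ext, count) tuples ordered by extension, equivalent to the csv-mode output above.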

#5

Using PowerShell, if that's an option:

Get-ChildItem -File | Group-Object Extension -NoElement

Or, shorter, using aliases:

ls -file | group -n Extension