I want to do this:
findstr /s /c:some-symbol *
or the grep equivalent
grep -R some-symbol *
but I need the utility to autodetect files encoded in UTF-16 (and friends) and search them appropriately. My files even have the byte-order mark (FF FE) in them, so I'm not even looking for heroic autodetection.
Any suggestions?
I'm referring to Windows Vista and XP.
7 answers
#1
Thanks for the suggestions. I was referring to Windows Vista and XP.
I also discovered this workaround, using the free Sysinternals strings.exe:
C:\> strings -s -b dir_tree_to_search | grep regexp
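Strings.exe extracts all of the strings it finds (it is meant for binaries, but works fine with text files too) and prefixes each result with the file name and a colon, so take that into account in the regexp (or use cut or another step in the pipeline). By default it extracts both ASCII and Unicode strings, which is why text in UTF-16 files turns up. The -s option makes it recurse into subdirectories, and -b just suppresses the banner message. (In current versions of Strings, -b instead sets the number of bytes to scan and the banner is suppressed with -nobanner; check strings -? for your version.)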
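Instead of folding the file-name prefix into the regexp, the strings output can be post-processed. Below is a minimal bash sketch (for example under Cygwin) that lists each file whose text matches a regex exactly once. The function name strings_matching_files is hypothetical, and it assumes each output line looks like "path: text", so that the first ": " separates the name from the text. That holds for Windows paths such as C:\dir\file.txt, because the drive-letter colon is followed by a backslash rather than a space.

```bash
# Hypothetical helper: reads "path: text" lines (the assumed strings -s output
# format) on stdin and prints, once each, the paths whose text part matches
# the extended regex given as $1.
strings_matching_files() {
  local pattern=$1 line path text
  while IFS= read -r line; do
    line=${line%$'\r'}                  # tolerate CRLF line endings
    [[ $line == *': '* ]] || continue   # skip lines without a "path: " prefix
    path=${line%%': '*}                 # text before the first ": "
    text=${line#*': '}                  # text after the first ": "
    if [[ $text =~ $pattern ]]; then
      printf '%s\n' "$path"
    fi
  done | sort -u
}
```

Usage: strings -s dir_tree_to_search | strings_matching_files 'some-symbol'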
Ultimately, I'm still kind of surprised that the flagship search utilities, GNU grep and findstr, don't handle Unicode character encodings natively.
#2
On Windows, you can also use find.exe.
find /i /n "YourSearchString" *.*
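The only problem is that this prints a header line (---------- FILENAME) for every file searched, whether or not it contains a match, followed by the matching lines. You can filter out the headers by piping the output to findstr, although that also drops the names of the files that do match: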
find /i /n "YourSearchString" *.* | findstr /i "YourSearchString"
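If you want to keep the file names instead of discarding them, the header lines can be folded into each match. Here is a rough bash sketch, assuming a bash shell on Windows such as Cygwin and assuming find.exe's output consists of a "---------- FILE" header per file followed by "[n]line" matches when /n is used. The function name label_find_output is hypothetical.

```bash
# Hypothetical helper: converts find.exe /n output, assumed to be a
# "---------- FILE" header per file followed by "[n]line" matches, into
# grep-style "FILE:n:line" records. Headers of files without matches vanish.
label_find_output() {
  local file='' line re='^\[([0-9]+)\](.*)$'
  while IFS= read -r line; do
    line=${line%$'\r'}                    # find.exe emits CRLF line endings
    if [[ $line == '---------- '* ]]; then
      file=${line#'---------- '}          # remember the file being reported
    elif [[ $line =~ $re ]]; then
      printf '%s:%s:%s\n' "$file" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done
}
```

Note that inside a Unix-like environment the Unix find command shadows find.exe, so the Windows one has to be called by its full path in the System32 directory and its output piped into label_find_output.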
#3
findstr /s /c:some-symbol *
can be replaced with the following encoding-aware command (typed at the prompt; in a batch file, write %%f instead of %f):
for /r %f in (*) do @find /i /n "some-symbol" "%f"
#4
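A workaround is to convert your UTF-16 file to ASCII/ANSI with TYPE, which decodes a BOM-marked UTF-16 file into the current code page (characters that code page cannot represent are lost):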
TYPE UTF-16.txt > ASCII.txt
Then you can use FINDSTR.
FINDSTR object ASCII.txt
#5
In newer versions of Windows, UTF-16 is supported out of the box. If not, try changing the active code page with the chcp command.
In my case, findstr alone failed on UTF-16 files, but it worked when combined with type:
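type *.* | findstr /s /c:some-symbol
Note that type *.* only reads files in the current directory, and because findstr is reading a pipe, the matches are not labeled with file names.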
#6
According to this blog article by Damon Cortesi, grep doesn't work with UTF-16 files, as you found out. However, it presents this workaround:
# quoting adjusted so that file names containing spaces or colons work
find . -type f -print0 | while IFS= read -r -d '' f; do
  if file -b "$f" | grep -q UTF-16; then
    iconv -f UTF-16 -t UTF-8 "$f" | grep -iH --label="$f" -- "$GREP_FOR"
  fi
done
This is obviously for Unix; I'm not sure what the equivalent on Windows would be. The author of that article also provides a shell script that does the above, which you can find on GitHub here.
This only searches files that are UTF-16; you would still grep your ASCII files the normal way.
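The loop above relies on file(1) to classify each file. Since the files in the question carry a byte-order mark, a variant can check the first two bytes directly and search UTF-16 and ASCII files in a single pass. Here is a minimal bash sketch, assuming bash, GNU grep (for --label) and iconv are available (for example under Cygwin). The function name bom_grep is hypothetical.

```bash
# Hypothetical helper: search every regular file under DIR (default .) for an
# extended regex, decoding files that start with a UTF-16 BOM (FF FE or FE FF)
# via iconv and grepping all other files as-is.
bom_grep() {
  local pattern=$1 dir=${2:-.} f bom
  find "$dir" -type f -print0 |
  while IFS= read -r -d '' f; do
    # First two bytes as hex, e.g. "fffe" for a little-endian UTF-16 BOM.
    bom=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    if [[ $bom == fffe || $bom == feff ]]; then
      # iconv's "UTF-16" input encoding uses the BOM to pick the byte order.
      iconv -f UTF-16 -t UTF-8 "$f" | grep -E -H --label="$f" -e "$pattern"
    else
      grep -E -H -e "$pattern" -- "$f"
    fi
  done
}
```

Usage: bom_grep 'some-symbol' dir_tree_to_search. Files with a UTF-8 BOM or no BOM at all fall through to plain grep, which is fine for ASCII search strings.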
#7
You didn't say which platform you want to do this on.
On Windows, you could use PowerGREP, which automatically detects Unicode files that start with a byte order mark. (There's also an option to auto-detect files without a BOM. The auto-detection is very reliable for UTF-8, but limited for UTF-16.)
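Without a BOM, a tool can only guess from the byte pattern, which helps explain why BOM-less UTF-16 detection is limited. Below is a rough bash sketch of one such guess. It is a hypothetical heuristic for illustration, not PowerGREP's actual algorithm: mostly-ASCII text stored as little-endian UTF-16 has a zero byte at every odd offset and none at even offsets. Text in other scripts, such as Chinese, does not follow this pattern, so the heuristic misses it. The function name looks_like_utf16le is hypothetical.

```bash
# Hypothetical heuristic: guess whether a BOM-less file is UTF-16LE by
# sampling its first 512 bytes and checking where the zero bytes fall.
looks_like_utf16le() {
  od -An -v -tu1 -N 512 "$1" | awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == 0) { if (n % 2) odd++; else even++ }  # n = 0-based offset
        n++
      }
    }
    # Succeed (exit 0) if at least 90% of odd offsets hold zero and no even one does.
    END { exit !(n >= 2 && even == 0 && odd >= 0.9 * int(n / 2)) }'
}
```

Usage: looks_like_utf16le notes.txt && echo "probably UTF-16LE"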