
Time: 2022-07-31 13:53:43

I want to do this:


 findstr /s /c:some-symbol *

or the grep equivalent


 grep -R some-symbol *

but I need the utility to autodetect files encoded in UTF-16 (and friends) and search them appropriately. My files even have the byte-order mark (FF FE) in them, so I'm not even looking for heroic autodetection.


Any suggestions?

I'm referring to Windows Vista and XP.

7 solutions


Thanks for the suggestions. I was referring to Windows Vista and XP.

I also discovered this workaround, using free Sysinternals strings.exe:

C:\> strings -s -b dir_tree_to_search | grep regexp 

Strings.exe extracts all of the strings it finds (from binaries, but works fine with text files too) and prepends each result with a filename and colon, so take that into account in the regexp (or use cut or another step in the pipeline). The -s makes it do a recursive extraction and -b just suppresses the banner message.

Ultimately, I'm still kind of surprised that the flagship searching utilities GNU grep and findstr don't handle Unicode character encodings natively.


On Windows, you can also use find.exe.


find /i /n "YourSearchString" *.*

The only problem is that this prints file names followed by matches. You can filter the output down to the matching lines by piping it to findstr:


find /i /n "YourSearchString" *.* | findstr /i "YourSearchString"


findstr /s /c:some-symbol *

can be replaced with the following character encoding aware command:


for /r %f in (*) do @find /i /n "some-symbol" "%f"


A workaround is to convert your UTF-16 file to ASCII or ANSI:


TYPE UTF-16.txt > ASCII.txt

Then you can use FINDSTR.


FINDSTR object ASCII.txt


In later versions of Windows, UTF-16 is supported out of the box. If not, try changing the active code page with the chcp command.


In my case, findstr on its own failed on UTF-16 files, but it worked when the files were piped through type:


type *.* | findstr /s /c:some-symbol


According to this blog article by Damon Cortesi, grep doesn't work with UTF-16 files, as you found out. However, it presents this workaround:

# GREP_FOR must hold the pattern to search for
for f in `find . -type f | xargs -I {} file {} | grep UTF-16 | cut -f1 -d\:`
        do iconv -f UTF-16 -t UTF-8 "$f" | grep -iH --label="$f" "${GREP_FOR}"
done

This is obviously for Unix; I'm not sure what the equivalent on Windows would be. The author of that article also provides a shell script to do the above that you can find on GitHub here.


This only greps files that are UTF-16. You'd also grep your ASCII files the normal way.
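
For a single pass that covers both kinds of file on Windows or Unix, the same idea (pick the decoder from the byte-order mark, then search the decoded text) can be sketched in Python. This is only an illustrative sketch, not the blog author's script: the names sniff_encoding and search_tree are made up here, and treating every BOM-less file as Latin-1 is an assumption that happens to be safe for ASCII search strings.

```python
import codecs
import os
import re


def sniff_encoding(path, default="latin-1"):
    """Pick a decoder from the file's byte-order mark (assumed fallback: latin-1)."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    # Test UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default


def search_tree(root, pattern):
    """Yield (path, line_number, line) for each line under root that matches pattern."""
    regex = re.compile(pattern)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                encoding = sniff_encoding(path)
                with open(path, encoding=encoding, errors="replace") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\r\n")
            except OSError:
                continue  # unreadable file: skip it and keep going
```

Iterating over search_tree(".", "some-symbol") and printing each tuple gives output roughly comparable to grep -Rn. Binary files are decoded as Latin-1 too, so they may produce noisy matches.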


You didn't say which platform you want to do this on.


On Windows, you could use PowerGREP, which automatically detects Unicode files that start with a byte order mark. (There's also an option to auto-detect files without a BOM. The auto-detection is very reliable for UTF-8, but limited for UTF-16.)
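
To see why detection without a BOM is limited for UTF-16, here is one common heuristic sketched in Python. It is not PowerGREP's actual method, which the answer does not describe. The function name guess_utf16_without_bom and the 0.4 threshold are assumptions chosen for illustration. The heuristic relies on ASCII-range text leaving a zero byte in every 16-bit code unit, so it gives no answer for text in scripts that do not.

```python
def guess_utf16_without_bom(data, threshold=0.4):
    """Return 'utf-16-le', 'utf-16-be', or None, judging by where NUL bytes fall.

    Assumption: the text is mostly ASCII-range characters, so one byte of
    each 16-bit code unit is 0x00. Other scripts defeat this heuristic.
    """
    sample = data[:4096]
    pairs = len(sample) // 2
    if pairs == 0:
        return None
    even_nuls = sample[0:pairs * 2:2].count(b"\x00")  # high bytes if big-endian
    odd_nuls = sample[1:pairs * 2:2].count(b"\x00")   # high bytes if little-endian
    if odd_nuls / pairs >= threshold and even_nuls == 0:
        return "utf-16-le"
    if even_nuls / pairs >= threshold and odd_nuls == 0:
        return "utf-16-be"
    return None
```

With a BOM present, none of this guessing is needed, which is why the BOM-based route is the dependable one for the files described in the question.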
