I need to convert all text to lowercase, but without using the traditional "tr" command, because it does not handle UTF-8 text properly.
Is there a nice way to do that? I need a UNIX filter so I can use it in a pipeline.
2 Solutions
#1
12
GNU sed should be able to handle Unicode. Try:
$ echo 'Some StrAngÉ LeTTeRs 123' | sed -e 's/./\L\0/g'
some strangé letters 123
#2
3
If you can use Python, a script like this can help:
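import sys
import codecs

# Wrap the underlying byte streams so text is decoded from and encoded to UTF-8
# (Python 3; on Python 2 pass sys.stdin and sys.stdout directly).
utf8input = codecs.getreader("utf-8")(sys.stdin.buffer)
utf8output = codecs.getwriter("utf-8")(sys.stdout.buffer)
# Read all of the input, lowercase it, and write it back out.
utf8output.write(utf8input.read().lower())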
On my Windows machine (sorry :) I can use it as a filter:
cat big.txt | python tolowerutf8.py > lower.txt
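The script above reads the whole input into memory before lowercasing it, which may be a problem for very large files. A minimal sketch of a line-by-line variant follows, assuming Python 3.7 or later. The names `lowercase_stream` and `main` are made up for this illustration and are not part of the original answer.

```python
import sys


def lowercase_stream(src, dst):
    """Copy text from src to dst one line at a time, lowercasing each line."""
    for line in src:
        dst.write(line.lower())


def main():
    # Assumption: Python 3.7+, where the standard streams support reconfigure().
    # Force UTF-8 on both ends instead of relying on the locale's encoding.
    sys.stdin.reconfigure(encoding="utf-8")
    sys.stdout.reconfigure(encoding="utf-8")
    lowercase_stream(sys.stdin, sys.stdout)
```

To run it as a filter, add a call to `main()` at the bottom of the file and pipe text through the script the same way as before. Because `lowercase_stream` only needs text streams, it can also be tried out on in-memory `io.StringIO` objects.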