用于从文件名中删除特殊字符的批处理文件脚本(Windows)

时间:2021-12-12 02:31:53

I have a large set of files, some of which contain special characters in the filename (e.g. ä,ö,%, and others). I'd like a script file to iterate over these files and rename them removing the special characters. I don't really mind what it does, but it could replace them with underscores for example e.g.

我有一大堆文件,其中一些文件包含特殊字符(例如ä,ö,%和其他)。我想要一个脚本文件迭代这些文件并重命名它们删除特殊字符。我真的不介意它做什么,但它可以用下划线代替它们,例如

Störung%20.doc would be renamed to St_rung_20.doc

Störung%20.doc将重命名为St_rung_20.doc

In order of preference:

按优先顺序排列:

  1. A Windiws batch file
  2. Windiws批处理文件

  3. A Windows script file to run with cscript (vbs)
  4. 使用cscript(vbs)运行的Windows脚本文件

  5. A third party piece of software that can be run from the command-line (i.e. no user interaction required)
  6. 可以从命令行运行的第三方软件(即不需要用户交互)

  7. Another language script file, for which I'd have to install an additional script engine
  8. 另一个语言脚本文件,我必须安装一个额外的脚本引擎

Background: I'm trying to encrypt these file with GnuPG on Windows but it doesn't seem to handle special characters in filenames with the --encrypt-files option.

背景:我正在尝试使用Windows上的GnuPG加密这些文件,但它似乎没有使用--encrypt-files选项处理文件名中的特殊字符。

5 个解决方案

#1


4  

Have you tried setting cmd.exe into another codepage before you feed the file names to gnupg? Issue chcp 65001 to set cmd.exe to Unicode beforehand and try again.

在将文件名提供给gnupg之前,您是否尝试将cmd.exe设置到另一个代码页中?发出chcp 65001以预先将cmd.exe设置为Unicode,然后重试。

If that fails, the following VBScript would do it:

如果失败,以下VBScript将执行此操作:

Option Explicit

Dim fso: Set fso = CreateObject("Scripting.FileSystemObject")
Dim invalidChars: Set invalidChars = New RegExp

' put all characters that you want to strip inside the brackets
invalidChars.Pattern =  "[äöüß&%]"
invalidChars.IgnoreCase = True
invalidChars.Global = True

If WScript.Arguments.Unnamed.Count = 0 Then
  WScript.Echo "Please give folder name as argument 1."
  WScript.Quit 1
End If

Recurse fso.GetFolder(WScript.Arguments.Unnamed(0))

Sub Recurse(f)
  Dim item

  For Each item In f.SubFolders
    Recurse item
    Sanitize item
  Next
  For Each item In f.Files
    Sanitize item
  Next
End Sub

Sub Sanitize(folderOrFile)
  Dim newName: newName = invalidChars.Replace(folderOrFile.Name, "_")
  If folderOrFile.Name = newName Then Exit Sub
  WScript.Echo folderOrFile.Name, " -> ", newName
  folderOrFile.Name = newName 
End Sub

call it like this:

这样叫:

cscript replace.vbs "c:\path\to\my\files"

You can also drag&drop a folder onto it in Windows Explorer.

您还可以在Windows资源管理器中将文件夹拖放到该文件夹​​中。

#2


3  

From http://www.robvanderwoude.com/bht.html:

use NT's SET's string substitution to replace or remove characters anywhere in a string:

使用NT的SET的字符串替换来替换或删除字符串中的任何字符:

SET STRING=[ABCDEFG]
SET STRING=%STRING:[=%
SET STRING=%STRING:]=%
ECHO String: %STRING%

will display 
String: ABCDEFG

SET STRING=[ABCDEFG]
SET STRING=%STRING:[=(%
SET STRING=%STRING:]=)%
ECHO String: %STRING%

will display 
String: (ABCDEFG)

SET STRING=[ABCDEFG]
SET STRING=%STRING:~1,7%
ECHO String: %STRING%

will display 
String: ABCDEFG

If you use this attempt, you will have to process each character you want to replace (e.g. Ä,Ö,Ü,ä,ö,ü,ß, but also á,à,é,è...) seperately.

如果您使用此尝试,则必须单独处理要替换的每个字符(例如Ä,Ö,Ü,ä,ö,ü,ß,还有á,à,é,è......)。

#3


3  

Thanks to Tomalak who actually pointed me in the right direction. Thought I'd post here for completeness.

感谢Tomalak,他实际上指出了我正确的方向。以为我会在这里发布完整性。

The problem seems to be that the codepage used by GPG is fixed (Latin I) independent of the codepage configured in the console. But once he pointed this out, I figured out how to workaraound this.

问题似乎是GPG使用的代码页是固定的(Latin I),与控制台中配置的代码页无关。但是一旦他指出这一点,我就想出了如何解决这个问题。

The trick is to change the codepage before generating the file list. This will actually make the filelist appear to be incorrect when viewed in the console. However, when passed to GPG, it works fine. GPG accepts the files and spits out the encrytped files with correct filenames.

诀窍是在生成文件列表之前更改代码页。这实际上会使文件列表在控制台中查看时显示不正确。但是,当传递给GPG时,它工作正常。 GPG接受文件并使用正确的文件名吐出加密文件。

The batch file looks something like this:

批处理文件看起来像这样:

chcp 1252
dir /b /s /a-d MyFolder >filelist.txt
gpg -r test@test.com --encrypt-files <filelist.txt

#4


1  

I'm using this batch to rename folders and seems to work fine so far... In my case codepage is 1252, yours might be different.

我正在使用这个批处理重命名文件夹,到目前为止似乎工作正常...在我的情况下代码页是1252,你的可能会有所不同。

mode con codepage select=1252
@echo off
Setlocal enabledelayedexpansion
::folder only (/D option)
for /R /D %%d in (*) do (

set an=%%~nd
set bn=!an:.=_!
set cn=!bn:-=_!
set dn=!cn: =_!
set en=!dn:Á=A!
set fn=!en:É=E!
set gn=!fn:Í=I!
set hn=!gn:Ó=O!
set in=!hn:Ú=U!
set jn=!in:Ü=U!
set kn=!jn:á=a!
set ln=!kn:é=e!
set mn=!ln:í=i!
set nn=!mn:ó=o!
set on=!nn:ú=u!
set pn=!on:ü=u!
set qn=!pn:Ñ=N!
set zn=!on:ñ=n!

set ax=%%~xd
set bx=!ax:.=_!
set cx=!bx:-=_!
set dx=!cx: =_!
set bx=!ax:.=_!
set cx=!bx:-=_!
set dx=!cx: =_!
set ex=!dx:Á=A!
set fx=!ex:É=E!
set gx=!fx:Í=I!
set hx=!gx:Ó=O!
set ix=!hx:Ú=U!
set jx=!ix:Ü=U!
set kx=!jx:á=a!
set lx=!kx:é=e!
set mx=!lx:í=i!
set nx=!mx:ó=o!
set ox=!nx:ú=u!
set px=!ox:ü=u!
set qx=!px:Ñ=N!
set zx=!ox:ñ=n!


if [!an!]==[] (set zn=)
if [!ax!]==[] (set zx=)

set newname=!zn!!zx!

if /i not [%%~nd%%~xd]==[!newname!] rename "%%d" !newname!

)

endlocal

pause

#5


1  

Following 'RenameFilesWithAccentedAndDiacriticalLatinChars.pl' PERL script renames files with accented and diacritical Latin characters :

在'RenameFilesWithAccentedAndDiacriticalLatinChars.pl'PERL脚本之后,重命名带有重音和变音拉丁字符的文件:

  • This PERL script starts from the folder given in parameter, or else from the current folder.
  • 此PERL脚本从参数中指定的文件夹开始,或者从当前文件夹开始。

  • It recursively searches for files with characters belonging to 80 - FF of CP 1250, CP 1252, CP 1254 and CP 1257 (mostly accented Latin characters) or Latin characters having diacritical marks.
  • 它递归地搜索具有属于80-FF的CP 1250,CP 1252,CP 1254和CP 1257(主要是重音拉丁字符)或具有变音符号的拉丁字符的字符的文件。

  • It calculates new file names by removing the accents and diacritical marks only from Latin characters (For example, Été --> Ete).
  • 它通过仅从拉丁字符中删除重音符号和变音符号来计算新文件名(例如,Été - > Ete)。

  • It displays all proposed renaming and perhaps conflicts, and asks the user for global approval.
  • 它显示所有建议的重命名和可能的冲突,并要求用户进行全局批准。

  • If the user has approved, it renames all files having no conflict.
  • 如果用户已批准,则会重命名所有没有冲突的文件。

Option '--batch' avoids interactive questions. Use with care.

选项'--batch'避免了交互式问题。小心使用。

Option '--' avoids the next parameter to be interpreted as option.

选项' - '避免将下一个参数解释为选项。

Special Warning :

特别警告:

  • This script was originally encoded in UTF-8, and should stay so.
  • 这个脚本最初是用UTF-8编码的,应该保持不变。

  • This script may rename a lot of files.
  • 此脚本可能会重命名很多文件。

  • Files names are theoretically all encoded only with UTF-8. But some file names may be found to contain also some characters having legacy encoding.
  • 理论上,文件名都只用UTF-8编码。但是可能发现一些文件名也包含一些具有遗留编码的字符。

  • The author has applied efforts for consistency checks, robustness, conflict detection and use of appropriate encoding. So this script should only rename files by removing accents and diacritical marks from Latin characters.
  • 作者已应用于一致性检查,健壮性,冲突检测和使用适当的编码。因此,此脚本应仅通过从拉丁字符中删除重音符号和变音符号来重命名文件。

  • But this script has been tested only under a limited number of OS (Windows, Mac OS X, Linux) and a limited number of terminal encodings (CP 850, ISO-8859-1, UTF-8).
  • 但是这个脚本仅在有限数量的操作系统(Windows,Mac OS X,Linux)和有限数量的终端编码(CP 850,ISO-8859-1,UTF-8)下进行了测试。

  • So, under weird circumstances, this script could rename many files with random names.
  • 因此,在奇怪的情况下,此脚本可以使用随机名称重命名许多文件。

  • Therefore, this script should be used with care, and modified with extreme care (beware encoding of internal strings, inputs, outputs and commands)
  • 因此,应谨慎使用此脚本,并小心修改(注意内部字符串,输入,输出和命令的编码)

#!/usr/bin/perl -w
#=============================================================================
#
#  Copyright 2010  Etienne URBAH
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details at
#  http://www.gnu.org/licenses/gpl.html
#
#  For usage and SPECIAL WARNING, see the 'Help' section below.
#
#=============================================================================
use 5.008_000;    #  For correct Unicode support
use warnings;
use strict;
use Encode;

$| = 1;           #  Autoflush STDOUT

#-----------------------------------------------------------------------------
#  Function ucRemoveEolUnderscoreDash :
#  Set Uppercase, remove End of line, Underscores and Dashes
#-----------------------------------------------------------------------------
sub ucRemoveEolUnderscoreDash
{
  local $_ = uc($_[0]);
  chomp;
  tr/_\-//d;
  $_;
}

#-----------------------------------------------------------------------------
#  Constants
#-----------------------------------------------------------------------------
my $Encoding_Western  = 'ISO-8859-1';
my $Encoding_Central  = 'ISO-8859-2';
my $Encoding_Baltic   = 'ISO-8859-4';
my $Encoding_Turkish  = 'ISO-8859-9';
my $Encoding_W_Euro   = 'ISO-8859-15';
my $Code_Page_OldWest =   850;
my $Code_Page_Central =  1250;
my $Code_Page_Western =  1252;
my $Code_Page_Turkish =  1254;
my $Code_Page_Baltic  =  1257;
my $Code_Page_UTF8    = 65001;

my $HighBitSetChars   = pack('C*', 0x80..0xFF);

my %SuperEncodings    =
  ( &ucRemoveEolUnderscoreDash($Encoding_Western), 'cp'.$Code_Page_Western,
    &ucRemoveEolUnderscoreDash($Encoding_Central), 'cp'.$Code_Page_Central,
    &ucRemoveEolUnderscoreDash($Encoding_Baltic),  'cp'.$Code_Page_Baltic,
    &ucRemoveEolUnderscoreDash($Encoding_Turkish), 'cp'.$Code_Page_Turkish,
    &ucRemoveEolUnderscoreDash($Encoding_W_Euro),  'cp'.$Code_Page_Western,
    &ucRemoveEolUnderscoreDash('cp'.$Code_Page_OldWest),
                                                   'cp'.$Code_Page_Western );

my %EncodingNames     = ( 'cp'.$Code_Page_Central, 'Central European',
                          'cp'.$Code_Page_Western, 'Western European',
                          'cp'.$Code_Page_Turkish, '    Turkish     ',
                          'cp'.$Code_Page_Baltic,  '     Baltic     ' );

my %NonAccenChars     = ( 
                          #--------------------------------#
'cp'.$Code_Page_Central,  #   Central European (cp1250)    #
                          #--------------------------------#
                          #€_‚_„…†‡_‰Š‹ŚŤŽŹ_‘’“”•–—_™š›śťžź#
                          'E_,_,.++_%S_STZZ_````.--_Ts_stzz'.

                          # ˇ˘Ł¤Ą¦§¨©Ş«¬­®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľż#
                          '_``LoAlS`CS_--RZ`+,l`uP.,as_L~lz'.

                          #ŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢß#
                          'RAAAALCCCEEEEIIDDNNOOOOxRUUUUYTS'.

                          #ŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙#
                          'raaaalccceeeeiiddnnoooo%ruuuuyt`',

                          #--------------------------------#
'cp'.$Code_Page_Western,  #   Western European  (cp1252)   #
                          #--------------------------------#
                          #€_‚ƒ„…†‡ˆ‰Š‹Œ_Ž__‘’“”•–—˜™š›œ_žŸ#
                          'E_,f,.++^%S_O_Z__````.--~Ts_o_zY'.

                          # ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿#
                          '_!cLoYlS`Ca_--R-`+23`uP.,10_qh3_'.

                          #ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß#
                          'AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTS'.

                          #àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ#
                          'aaaaaaaceeeeiiiidnooooo%ouuuuyty',

                          #--------------------------------#
'cp'.$Code_Page_Turkish,  #       Turkish  (cp1254)        #
                          #--------------------------------#
                          #€_‚ƒ„…†‡ˆ‰Š‹Œ____‘’“”•–—˜™š›œ__Ÿ#
                          'E_,f,.++^%S_O____````.--~Ts_o__Y'.

                          # ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿#
                          '_!cLoYlS`Ca_--R-`+23`uP.,10_qh3_'.

                          #ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞß#
                          'AAAAAAACEEEEIIIIGNOOOOOxOUUUUISS'.

                          #àáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ#
                          'aaaaaaaceeeeiiiignooooo%ouuuuisy',

                          #--------------------------------#
'cp'.$Code_Page_Baltic,   #       Baltic   (cp1257)        #
                          #--------------------------------#
                          #€_‚_„…†‡_‰_‹_¨ˇ¸_‘’“”•–—_™_›_¯˛_#
                          'E_,_,.++_%___``,_````.--_T___-,_'.

                          # �¢£¤�¦§Ø©Ŗ«¬­®Æ°±²³´µ¶·ø¹ŗ»¼½¾æ#
                          '__cLo_lSOCR_--RA`+23`uP.o1r_qh3a'.

                          #ĄĮĀĆÄÅĘĒČÉŹĖĢĶĪĻŠŃŅÓŌÕÖ×ŲŁŚŪÜŻŽß#
                          'AIACAAEECEZEGKILSNNOOOOxULSUUZZS'.

                          #ąįāćäåęēčéźėģķīļšńņóōõö÷ųłśūüżž˙#
                          'aiacaaeecezegkilsnnoooo%ulsuuzz`' );

my %AccentedChars;
my $AccentedChars     = '';
my $NonAccenChars     = '';
for ( $Code_Page_Central, $Code_Page_Western,
      $Code_Page_Turkish, $Code_Page_Baltic )
    {
      $AccentedChars{'cp'.$_}  = decode('cp'.$_, $HighBitSetChars);
      $AccentedChars          .= $AccentedChars{'cp'.$_};
      $NonAccenChars          .= $NonAccenChars{'cp'.$_};
    }
#print "\n", length($NonAccenChars), '  ', $NonAccenChars,"\n";
#print "\n", length($AccentedChars), '  ', $AccentedChars,"\n";

my $QuotedMetaNonAccenChars = quotemeta($NonAccenChars);

my $DiacriticalChars  = '';
for  ( 0x0300..0x036F, 0x1DC0..0x1DFF )
     { $DiacriticalChars .= chr($_) }

#-----------------------------------------------------------------------------
#  Parse options and parameters
#-----------------------------------------------------------------------------
my $b_Help        = 0;
my $b_Interactive = 1;
my $b_UTF8        = 0;
my $b_Parameter   = 0;
my $Folder;

for  ( @ARGV )
{
  if    ( lc($_) eq '--' )
        { $b_Parameter = 1 }
  elsif ( (not $b_Parameter) and (lc($_) eq '--batch') )
        { $b_Interactive = 0 }
  elsif ( (not $b_Parameter) and (lc($_) eq '--utf8') )
        { $b_UTF8 = 1 }
  elsif ( $b_Parameter or (substr($_, 0, 1) ne '-') )
        {
          if  ( defined($Folder) )
              { die "$0 accepts only 1 parameter\n" }
          else
              { $Folder = $_ }
        }
  else
        { $b_Help = 1 }
}

#-----------------------------------------------------------------------------
#  Help
#-----------------------------------------------------------------------------
if  ( $b_Help )
    {
      die << "END_OF_HELP"

$0  [--help]  [--batch]  [--]  [folder]

This script renames files with accented and diacritical Latin characters :

-  This PERL script starts from the folder given in parameter, or else from
   the current folder.
-  It recursively searches for files with characters belonging to 80 - FF of
   CP 1250, CP 1252, CP 1254 and CP 1257 (mostly accented Latin characters)
   or Latin characters having diacritical marks.
-  It calculates new file names by removing the accents and diacritical marks
   only from Latin characters  (For example,  Été --> Ete).
-  It displays all proposed renaming and perhaps conflicts, and asks the user
   for global approval.
-  If the user has approved, it renames all files having no conflict.

Option '--batch' avoids interactive questions.  Use with care.

Option '--'      avoids the next parameter to be interpreted as option.

SPECIAL WARNING :
-  This script was originally encoded in UTF-8, and should stay so.
-  This script may rename a lot of files.
-  Files names are theoretically all encoded only with UTF-8.  But some file
   names may be found to contain also some characters having legacy encoding.
-  The author has applied efforts for consistency checks, robustness, conflict
   detection and use of appropriate encoding.
   So this script should only rename files by removing accents and diacritical
   marks from Latin characters.
-  But this script has been tested only under a limited number of OS
   (Windows, Mac OS X, Linux) and a limited number of terminal encodings
   (CP 850, ISO-8859-1, UTF-8).
-  So, under weird circumstances, this script could rename many files with
   random names.
-  Therefore, this script should be used with care, and modified with extreme
   care (beware encoding of internal strings, inputs, outputs and commands)
END_OF_HELP
    }

#-----------------------------------------------------------------------------
#  If requested, change current folder
#-----------------------------------------------------------------------------
if  ( defined($Folder) )
    { chdir($Folder)  or  die  "Can NOT set '$Folder' as current folder\n" }

#-----------------------------------------------------------------------------
#  Following instruction is MANDATORY.
#  The return value should be non-zero, but on some systems it is zero.
#-----------------------------------------------------------------------------
utf8::decode($AccentedChars);
#  or  die "$0: '\$AccentedChars' should be UTF-8 but is NOT.\n";

#-----------------------------------------------------------------------------
#  Check consistency on 'tr'
#-----------------------------------------------------------------------------
$_ = $AccentedChars;
eval "tr/$AccentedChars/$QuotedMetaNonAccenChars/";
if  ( $@ )  { warn $@ }
if  ( $@ or ($_ ne $NonAccenChars) )
    { die "$0: Consistency check on 'tr' FAILED :\n\n",
          "Translated Accented Chars :  ", length($_), ' :  ', $_, "\n\n",
          "       Non Accented Chars :  ", length($NonAccenChars), ' :  ',
          $NonAccenChars, "\n" }

#-----------------------------------------------------------------------------
#  Constants depending on the OS
#-----------------------------------------------------------------------------
my $b_Windows = ( defined($ENV{'OS'}) and ($ENV{'OS'} eq 'Windows_NT') );

my ($Q, $sep, $sep2, $HOME, $Find, @List, $cwd, @Move);

if  ( $b_Windows )
    {
      $Q    = '"';
      $sep  = '\\';
      $sep2 = '\\\\';
      $HOME = $ENV{'USERPROFILE'};
      $Find = 'dir /b /s';
      @List = ( ( (`ver 2>&1` =~ m/version\s+([0-9]+)/i) and ($1 >= 6) ) ?
                ('icacls') :
                ( 'cacls') );
      $cwd  = `cd`;  chomp $cwd;  $cwd = quotemeta($cwd);
      @Move = ('move');
    }
else
    {
      $Q    = "'";
      $sep  = '/';
      $sep2 = '/';
      $HOME = $ENV{'HOME'};
      $Find = 'find .';
      @List = ('ls', '-d', '--');
      @Move = ('mv', '--');
      if  ( -w '/bin' )  { die "$0: For safety reasons, ",
                               "usage is BLOCKED to administrators.\n"}
    }

my $Encoding;
my $ucEncoding;
my $InputPipe = '-|';                                # Used as global variable

#-----------------------------------------------------------------------------
#  Under Windows, associate input and output encodings to code pages :
#  -  Get the original code page,
#  -  If it is not UTF-8, try to set it to UTF-8,
#  -  Define the input encoding as the one associated to the ACTIVE code page,
#  -  If STDOUT is the console, encode output for the ORIGINAL code page.
#-----------------------------------------------------------------------------
my $Code_Page_Original;
my $Code_Page_Active;

if  ( $b_Windows )
    {
      #-----------------------------------------------------------------------
      #  Get the original code page
      #-----------------------------------------------------------------------
      $_ = `chcp`;
      m/([0-9]+)$/  or  die "Non numeric Windows code page :  ", $_;
      $Code_Page_Original = $1;
      print 'Windows Original Code Page = ', $Code_Page_Original,
            ( $Code_Page_Original == $Code_Page_UTF8 ?
              ' = UTF-8, display is perhaps correct with a true type font.' :
              '' ), "\n\n";
      $Code_Page_Active = $Code_Page_Original ;

      #-----------------------------------------------------------------------
      #  The input encoding must be the same as the ACTIVE code page
      #-----------------------------------------------------------------------
      $Encoding = ( $Code_Page_Active == $Code_Page_UTF8 ?
                    'utf8' :
                    'cp'.$Code_Page_Active ) ;
      $InputPipe .= ":encoding($Encoding)";
      print "InputPipe = '$InputPipe'\n\n";

      #-----------------------------------------------------------------------
      #  If STDOUT is the console, output encoding must be the same as the
      #  ORIGINAL code page
      #-----------------------------------------------------------------------
      if  ( $Code_Page_Original != $Code_Page_UTF8 )
          {
            no  warnings  'unopened';
            @_ = stat(STDOUT);
            use warnings;
            if  ( scalar(@_) and ($_[0] == 1) )
                { binmode(STDOUT, ":encoding(cp$Code_Page_Original)") }
            else
                { binmode(STDOUT, ":encoding($Encoding)") }
          }
    }

#-----------------------------------------------------------------------------
#  Under *nix, if the 'LANG' environment variable contains an encoding,
#  verify that this encoding is supported by the OS and by PERL.
#-----------------------------------------------------------------------------
elsif ( defined($ENV{'LANG'}) and ($ENV{'LANG'} =~ m/\.([^\@]+)$/i) )
      {
        $Encoding = $1;

        my $Kernel = `uname -s`;
        chomp $Kernel;
        my $ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
        if  ( (lc($Kernel) ne 'darwin') and not grep {$_ eq $ucEncoding}
                       ( map { ($_, &ucRemoveEolUnderscoreDash($_)) }
                         `locale -m` ) )
            { die "Encoding = '$Encoding' or '$ucEncoding'  NOT supported ".
                  "by the OS\n" }

        my $ucLocale = &ucRemoveEolUnderscoreDash($ENV{'LANG'});
        if  ( not grep {$_ eq $ucLocale}
                       ( map { ($_, &ucRemoveEolUnderscoreDash($_)) }
                         `locale -a` ) )
            { die "Locale = '$ENV{LANG}' or '$ucLocale'  NOT supported ".
                  "by the OS\n" }

        if  ( not defined(Encode::find_encoding($Encoding)) )
            { die "Encoding = '$Encoding' or '$ucEncoding'  NOT supported ".
                  "by PERL\n" }

        print "Encoding = '$Encoding'  is supported by the OS and PERL\n\n";
        binmode(STDOUT, ":encoding($Encoding)");
      }

#-----------------------------------------------------------------------------
#  Check consistency between parameter of 'echo' and output of 'echo'
#-----------------------------------------------------------------------------
undef $_;
if  ( defined($Encoding)  )
    {
      $ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
      if    ( defined($SuperEncodings{$ucEncoding}) )
            { $_ = substr($AccentedChars{$SuperEncodings{$ucEncoding}},
                          0x20, 0x60) }
      elsif ( defined($AccentedChars{$Encoding}) )
            { $_ = $AccentedChars{$Encoding} }
      elsif ( $Encoding =~ m/^utf-?8$/i )
            { $_ = $AccentedChars }
    }
if  ( not defined($_) )                # Chosen chars are same in 4 code pages
    { $_ = decode('cp'.$Code_Page_Central,
                  pack('C*', 0xC9, 0xD3, 0xD7, 0xDC,                    # ÉÓ×Ü
                             0xE9, 0xF3, 0xF7, 0xFC)) }                 # éó÷ü
#print  $_, "  (Parameter)\n\n";
#system 'echo', $_;
utf8::decode($_);
#print  "\n", $_, "  (Parameter after utf8::decode)\n\n";
my @EchoCommand = ( $b_Windows ?
                    "echo $_" :
                    ('echo', $_) );
#system @EchoCommand;

open(ECHO, $InputPipe, @EchoCommand)  or  die 'echo $_: ', $!;
my $Output = join('', <ECHO>);
close(ECHO);
chomp $Output;
#print "\n", $Output, "  (Output of 'echo')\n";
utf8::decode($Output);
#print "\n", $Output, "  (Output of 'echo' after utf8::decode)\n\n";

if  ( $Output ne $_ )
    {
      warn "$0: Consistency check between parameter ",
                         "of 'echo' and output of 'echo' FAILED :\n\n",
           "Parameter of 'echo' :  ", length($_), ' :  ', $_, "\n\n",
           "   Output of 'echo' :  ", length($Output), ' :  ', $Output, "\n";
      exit 1;
    }

#-----------------------------------------------------------------------------
#  Print the translation table
#-----------------------------------------------------------------------------
if  ( defined($Encoding) )
{
  undef $_;
  $ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
  if    ( defined($SuperEncodings{$ucEncoding}) )
        {
          $_ = $SuperEncodings{$ucEncoding};
          print "--------- $EncodingNames{$_} ---------\n",
                '    ', substr($AccentedChars{$_}, 0x20, 0x20), "\n",
                '--> ', substr($NonAccenChars{$_}, 0x20, 0x20), "\n\n",
                '    ', substr($AccentedChars{$_}, 0x40, 0x20), "\n",
                '--> ', substr($NonAccenChars{$_}, 0x40, 0x20), "\n\n",
                '    ', substr($AccentedChars{$_}, 0x60, 0x20), "\n",
                '--> ', substr($NonAccenChars{$_}, 0x60, 0x20), "\n\n" }
  else
  {
    for ( 'cp'.$Code_Page_Central, 'cp'.$Code_Page_Western,
          'cp'.$Code_Page_Turkish, 'cp'.$Code_Page_Baltic )
    {
      if  ( ('cp'.$Encoding eq $_) or ($Encoding =~ m/^utf-?8$/i) )
          { print "--------- $EncodingNames{$_} ---------\n",
                  '    ', substr($AccentedChars{$_},    0, 0x20), "\n",
                  '--> ', substr($NonAccenChars{$_},    0, 0x20), "\n\n",
                  '    ', substr($AccentedChars{$_}, 0x20, 0x20), "\n",
                  '--> ', substr($NonAccenChars{$_}, 0x20, 0x20), "\n\n",
                  '    ', substr($AccentedChars{$_}, 0x40, 0x20), "\n",
                  '--> ', substr($NonAccenChars{$_}, 0x40, 0x20), "\n\n",
                  '    ', substr($AccentedChars{$_}, 0x60, 0x20), "\n",
                  '--> ', substr($NonAccenChars{$_}, 0x60, 0x20), "\n\n" }
    }
  }
}

#-----------------------------------------------------------------------------
#  Completely optional :
#  Inside the Unison file, find the accented file names to ignore
#-----------------------------------------------------------------------------
my $UnisonFile = $HOME.$sep.'.unison'.$sep.'common.unison';
my @Ignores;

if  ( open(UnisonFile, '<', $UnisonFile) )
    {
      print "\nUnison File '", $UnisonFile, "'\n";
      while  ( <UnisonFile> )
      {
        if  ( m/^\s*ignore\s*=\s*Name\s*(.+)/ )
            {
              $_ = $1 ;
              if  ( m/[$AccentedChars]/ )
                  { push(@Ignores, $_) }
            }
      }
      close(UnisonFile);
    }
print map("  Ignore: ".$_."\n", @Ignores);

#-----------------------------------------------------------------------------
#  Function OutputAndErrorFromCommand :
#
#  Execute the command given as array in parameter, and return STDOUT + STDERR
#
#  Reads global variable $InputPipe
#-----------------------------------------------------------------------------
sub OutputAndErrorFromCommand
{
  local $_;
  my @Command = @_;             # Protects content of @_ from any modification
  #---------------------------------------------------------------------------
  #  Under Windows, fork fails, so :
  #  -  Enclose into double quotes parameters containing blanks or simple
  #     quotes,
  #  -  Use piped open with redirection of STDERR.
  #---------------------------------------------------------------------------
  if  ( defined($ENV{'OS'}) and ($ENV{'OS'} eq 'Windows_NT') )
      {
        for  ( @Command )
             { s/^((-|.*(\s|')).*)$/$Q$1$Q/ }
        my $Command = join('  ', @Command);
        #print "\n", $Command;
        open(COMMAND, $InputPipe, "$Command  2>&1")  or  die '$Command: ', $!;
      }
  #---------------------------------------------------------------------------
  #  Under Unix, quoting is too difficult, but fork succeeds
  #---------------------------------------------------------------------------
  else
      {
        my $pid = open(COMMAND, $InputPipe);
        defined($pid) or die "Can't fork: $!";
        if  ( $pid == 0 )                           #  Child process
            {
              open STDERR, '>&=STDOUT';
              exec @Command;                        #  Returns only on failure
              die "Can't @Command";
            }
      }
  $_ = join('', <COMMAND>);                         #  Child's STDOUT + STDERR
  close COMMAND;
  chomp;
  utf8::decode($_);
  $_;
}

#-----------------------------------------------------------------------------
#  Find recursively all files inside the current folder.
#  Verify accessibility of files with accented names.
#  Calculate non-accented file names from accented file names.
#  Build the list of duplicates.
#-----------------------------------------------------------------------------
my %Olds;                                # $Olds{$New} = [ $Old1, $Old2, ... ]
my $Old;
my $Dir;
my $Command;
my $ErrorMessage;
my $New;
my %News;

print "\n\nFiles with accented name and the corresponding non-accented name ",
      ":\n";

open(FIND, $InputPipe, $Find)  or  die $Find, ': ', $!;

FILE:
while  ( <FIND> )
{
  chomp;
  #---------------------------------------------------------------------------
  #  If the file path contains UTF-8, following instruction is MANDATORY.
  #  If the file path does NOT contain UTF-8, it should NOT hurt.
  #---------------------------------------------------------------------------
  utf8::decode($_);

  if  ( $b_Windows )
      { s/^$cwd$sep2// }
  else
      { s/^\.$sep2// }

  #---------------------------------------------------------------------------
  #  From now on :  $_ = Dir/OldFilename
  #---------------------------------------------------------------------------
  push(@{$Olds{$_}}, $_);

  if  ( m/([^$sep2]+)$/ and
        ($1 =~ m/[$AccentedChars]|([\ -\~][$DiacriticalChars])/) )
      {
        if  ( $b_Windows and m/$Q/ )
            {
              print "\n    $Q$_$Q\n***  contains quotes.\n";
              next;
            }
        for  my $Ignore  ( @Ignores )
        {
          if  ( m/$Ignore$/ )
              { next FILE }
        }
        $Old = $_ ;
        m/^(.*$sep2)?([^$sep2]+)$/;
        $Dir = ( defined($1) ? $1 : '');
        $_   = $2;

        #---------------------------------------------------------------------
        #  From now on :  $Old = Dir/OldFilename
        #                 $_   = OldFilename
        #---------------------------------------------------------------------
        print "\n    $Q$Old$Q\n";
        $ErrorMessage = &OutputAndErrorFromCommand(@List, $Old);
        if  ( $? != 0 )
            { print "*** $ErrorMessage\n" }
        else
            {
              #---------------------------------------------------------------
              #  Change accented Latin chars to non-accented chars.
              #  Remove all diacritical marks after Latin chars.
              #---------------------------------------------------------------
              eval "tr/$AccentedChars/$QuotedMetaNonAccenChars/";
              s/([\ -\~])[$DiacriticalChars]+/$1/g;
              #---------------------------------------------------------------
              #  From now on :  $Old = Dir/OldFilename
              #                 $_   = NewFilename
              #---------------------------------------------------------------
              if  ( $@ )
                  { warn $@ }
              else
                  {
                    $New = $Dir.$_;
                    if  ( $b_Windows or (not utf8::is_utf8($Dir)) )    # Weird
                        { utf8::decode($New) }                 # but necessary
                    $News{$Old} = $New;
                    push(@{$Olds{$New}}, $Old);
                  }
              print "--> $Q$Dir$_$Q\n";
            }
      }
}

close(FIND);

#-----------------------------------------------------------------------------
#  Print list of duplicate non-accented file names
#-----------------------------------------------------------------------------
my $b_NoDuplicate = 1;

for  my $New  ( sort keys %Olds )
{
  if  ( scalar(@{$Olds{$New}}) > 1 )
      {
        if  ( $b_NoDuplicate )
            {
              print "\n\nFollowing files would have same non-accented name ",
                    ":\n";
              $b_NoDuplicate = 0;
            }
        print "\n", map('    '.$_."\n", @{$Olds{$New}}), '--> ', $New, "\n";
        for  ( @{$Olds{$New}} )
             { delete $News{$_} };
      }
}

#-----------------------------------------------------------------------------
#  If there are NO file to rename, then exit
#-----------------------------------------------------------------------------
my $Number = scalar(keys %News);

print "\n\n";
if  ( $Number < 1 )
    {
      print "There are NO file to rename\n";
      exit;
    }

#-----------------------------------------------------------------------------
#  Ask the user for global approval of renaming
#-----------------------------------------------------------------------------
if  ( $b_Interactive )
    {
      print "In order to really rename the ", $Number,
            " files which can safely be renamed, type 'rename' :  ";
      $_ = <STDIN>;
      sleep 1;                       # Gives time to PERL to handle interrupts
      if  ( not m/^rename$/i )
          { exit 1 }
    }
else
    { print $Number, " files will be renamed\n\n" }

#-----------------------------------------------------------------------------
#  Rename accented file names sorted descending by name size
#-----------------------------------------------------------------------------
$Number  = 0;
my $Move = join('  ', @Move);

for  ( sort {length($b) <=> length($a)} keys %News )
{
  $ErrorMessage = &OutputAndErrorFromCommand(@Move, $_, $News{$_});
  if  ( $? == 0 )
      { $Number++ }
  else
      { print "\n$Move  $Q$_$Q\n", (' ' x length($Move)),
        "  $Q$News{$_}$Q\n", ('*' x length($Move)), "  $ErrorMessage\n" }
}
print "\n$Number files have been successfully renamed\n";

__END__

#1


4  

Have you tried setting cmd.exe into another codepage before you feed the file names to gnupg? Issue chcp 65001 to set cmd.exe to Unicode beforehand and try again.

在将文件名提供给gnupg之前,您是否尝试将cmd.exe设置到另一个代码页中?发出chcp 65001以预先将cmd.exe设置为Unicode,然后重试。

If that fails, the following VBScript would do it:

如果失败,以下VBScript将执行此操作:

Option Explicit

Dim fso: Set fso = CreateObject("Scripting.FileSystemObject")
Dim invalidChars: Set invalidChars = New RegExp

' put all characters that you want to strip inside the brackets
invalidChars.Pattern =  "[äöüß&%]"
invalidChars.IgnoreCase = True
invalidChars.Global = True

If WScript.Arguments.Unnamed.Count = 0 Then
  WScript.Echo "Please give folder name as argument 1."
  WScript.Quit 1
End If

Recurse fso.GetFolder(WScript.Arguments.Unnamed(0))

Sub Recurse(f)
  Dim item

  For Each item In f.SubFolders
    Recurse item
    Sanitize item
  Next
  For Each item In f.Files
    Sanitize item
  Next
End Sub

Sub Sanitize(folderOrFile)
  Dim newName: newName = invalidChars.Replace(folderOrFile.Name, "_")
  If folderOrFile.Name = newName Then Exit Sub
  WScript.Echo folderOrFile.Name, " -> ", newName
  folderOrFile.Name = newName 
End Sub

call it like this:

这样叫:

cscript replace.vbs "c:\path\to\my\files"

You can also drag&drop a folder onto it in Windows Explorer.

您还可以在Windows资源管理器中将文件夹拖放到该文件夹​​中。

#2


3  

From http://www.robvanderwoude.com/bht.html:

use NT's SET's string substitution to replace or remove characters anywhere in a string:

使用NT的SET的字符串替换来替换或删除字符串中的任何字符:

SET STRING=[ABCDEFG]
SET STRING=%STRING:[=%
SET STRING=%STRING:]=%
ECHO String: %STRING%

will display 
String: ABCDEFG

SET STRING=[ABCDEFG]
SET STRING=%STRING:[=(%
SET STRING=%STRING:]=)%
ECHO String: %STRING%

will display 
String: (ABCDEFG)

SET STRING=[ABCDEFG]
SET STRING=%STRING:~1,7%
ECHO String: %STRING%

will display 
String: ABCDEFG

If you use this attempt, you will have to process each character you want to replace (e.g. Ä,Ö,Ü,ä,ö,ü,ß, but also á,à,é,è...) seperately.

如果您使用此尝试,则必须单独处理要替换的每个字符(例如Ä,Ö,Ü,ä,ö,ü,ß,还有á,à,é,è......)。

#3


3  

Thanks to Tomalak who actually pointed me in the right direction. Thought I'd post here for completeness.

感谢Tomalak,他实际上指出了我正确的方向。以为我会在这里发布完整性。

The problem seems to be that the codepage used by GPG is fixed (Latin I) independent of the codepage configured in the console. But once he pointed this out, I figured out how to workaraound this.

问题似乎是GPG使用的代码页是固定的(Latin I),与控制台中配置的代码页无关。但是一旦他指出这一点,我就想出了如何解决这个问题。

The trick is to change the codepage before generating the file list. This will actually make the filelist appear to be incorrect when viewed in the console. However, when passed to GPG, it works fine. GPG accepts the files and spits out the encrytped files with correct filenames.

诀窍是在生成文件列表之前更改代码页。这实际上会使文件列表在控制台中查看时显示不正确。但是,当传递给GPG时,它工作正常。 GPG接受文件并使用正确的文件名吐出加密文件。

The batch file looks something like this:

批处理文件看起来像这样:

chcp 1252
dir /b /s /a-d MyFolder >filelist.txt
gpg -r test@test.com --encrypt-files <filelist.txt

#4


1  

I'm using this batch to rename folders and seems to work fine so far... In my case codepage is 1252, yours might be different.

我正在使用这个批处理重命名文件夹,到目前为止似乎工作正常...在我的情况下代码页是1252,你的可能会有所不同。

mode con codepage select=1252
@echo off
Setlocal enabledelayedexpansion
::folder only (/D option)
for /R /D %%d in (*) do (

set an=%%~nd
set bn=!an:.=_!
set cn=!bn:-=_!
set dn=!cn: =_!
set en=!dn:Á=A!
set fn=!en:É=E!
set gn=!fn:Í=I!
set hn=!gn:Ó=O!
set in=!hn:Ú=U!
set jn=!in:Ü=U!
set kn=!jn:á=a!
set ln=!kn:é=e!
set mn=!ln:í=i!
set nn=!mn:ó=o!
set on=!nn:ú=u!
set pn=!on:ü=u!
set qn=!pn:Ñ=N!
set zn=!on:ñ=n!

set ax=%%~xd
set bx=!ax:.=_!
set cx=!bx:-=_!
set dx=!cx: =_!
set bx=!ax:.=_!
set cx=!bx:-=_!
set dx=!cx: =_!
set ex=!dx:Á=A!
set fx=!ex:É=E!
set gx=!fx:Í=I!
set hx=!gx:Ó=O!
set ix=!hx:Ú=U!
set jx=!ix:Ü=U!
set kx=!jx:á=a!
set lx=!kx:é=e!
set mx=!lx:í=i!
set nx=!mx:ó=o!
set ox=!nx:ú=u!
set px=!ox:ü=u!
set qx=!px:Ñ=N!
set zx=!ox:ñ=n!


if [!an!]==[] (set zn=)
if [!ax!]==[] (set zx=)

set newname=!zn!!zx!

if /i not [%%~nd%%~xd]==[!newname!] rename "%%d" !newname!

)

endlocal

pause

#5


1  

Following 'RenameFilesWithAccentedAndDiacriticalLatinChars.pl' PERL script renames files with accented and diacritical Latin characters :

在'RenameFilesWithAccentedAndDiacriticalLatinChars.pl'PERL脚本之后,重命名带有重音和变音拉丁字符的文件:

  • This PERL script starts from the folder given in parameter, or else from the current folder.
  • 此PERL脚本从参数中指定的文件夹开始,或者从当前文件夹开始。

  • It recursively searches for files with characters belonging to 80 - FF of CP 1250, CP 1252, CP 1254 and CP 1257 (mostly accented Latin characters) or Latin characters having diacritical marks.
  • 它递归地搜索具有属于80-FF的CP 1250,CP 1252,CP 1254和CP 1257(主要是重音拉丁字符)或具有变音符号的拉丁字符的字符的文件。

  • It calculates new file names by removing the accents and diacritical marks only from Latin characters (For example, Été --> Ete).
  • 它通过仅从拉丁字符中删除重音符号和变音符号来计算新文件名(例如,Été - > Ete)。

  • It displays all proposed renaming and perhaps conflicts, and asks the user for global approval.
  • 它显示所有建议的重命名和可能的冲突,并要求用户进行全局批准。

  • If the user has approved, it renames all files having no conflict.
  • 如果用户已批准,则会重命名所有没有冲突的文件。

Option '--batch' avoids interactive questions. Use with care.

选项'--batch'避免了交互式问题。小心使用。

Option '--' avoids the next parameter to be interpreted as option.

选项' - '避免将下一个参数解释为选项。

Special Warning :

特别警告:

  • This script was originally encoded in UTF-8, and should stay so.
  • 这个脚本最初是用UTF-8编码的,应该保持不变。

  • This script may rename a lot of files.
  • 此脚本可能会重命名很多文件。

  • Files names are theoretically all encoded only with UTF-8. But some file names may be found to contain also some characters having legacy encoding.
  • 理论上,文件名都只用UTF-8编码。但是可能发现一些文件名也包含一些具有遗留编码的字符。

  • The author has applied efforts for consistency checks, robustness, conflict detection and use of appropriate encoding. So this script should only rename files by removing accents and diacritical marks from Latin characters.
  • 作者已应用于一致性检查,健壮性,冲突检测和使用适当的编码。因此,此脚本应仅通过从拉丁字符中删除重音符号和变音符号来重命名文件。

  • But this script has been tested only under a limited number of OS (Windows, Mac OS X, Linux) and a limited number of terminal encodings (CP 850, ISO-8859-1, UTF-8).
  • 但是这个脚本仅在有限数量的操作系统(Windows,Mac OS X,Linux)和有限数量的终端编码(CP 850,ISO-8859-1,UTF-8)下进行了测试。

  • So, under weird circumstances, this script could rename many files with random names.
  • 因此,在奇怪的情况下,此脚本可以使用随机名称重命名许多文件。

  • Therefore, this script should be used with care, and modified with extreme care (beware encoding of internal strings, inputs, outputs and commands)
  • 因此,应谨慎使用此脚本,并小心修改(注意内部字符串,输入,输出和命令的编码)

#!/usr/bin/perl -w
#=============================================================================
#
#  Copyright 2010  Etienne URBAH
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details at
#  http://www.gnu.org/licenses/gpl.html
#
#  For usage and SPECIAL WARNING, see the 'Help' section below.
#
#=============================================================================
use 5.008_000;    #  For correct Unicode support
use warnings;
use strict;
use Encode;

$| = 1;           #  Autoflush STDOUT

#-----------------------------------------------------------------------------
#  Function ucRemoveEolUnderscoreDash :
#  Set Uppercase, remove End of line, Underscores and Dashes
#-----------------------------------------------------------------------------
sub ucRemoveEolUnderscoreDash
{
  local $_ = uc($_[0]);
  chomp;
  tr/_\-//d;
  $_;
}

#-----------------------------------------------------------------------------
#  Constants
#-----------------------------------------------------------------------------
my $Encoding_Western  = 'ISO-8859-1';
my $Encoding_Central  = 'ISO-8859-2';
my $Encoding_Baltic   = 'ISO-8859-4';
my $Encoding_Turkish  = 'ISO-8859-9';
my $Encoding_W_Euro   = 'ISO-8859-15';
my $Code_Page_OldWest =   850;
my $Code_Page_Central =  1250;
my $Code_Page_Western =  1252;
my $Code_Page_Turkish =  1254;
my $Code_Page_Baltic  =  1257;
my $Code_Page_UTF8    = 65001;

my $HighBitSetChars   = pack('C*', 0x80..0xFF);

my %SuperEncodings    =
  ( &ucRemoveEolUnderscoreDash($Encoding_Western), 'cp'.$Code_Page_Western,
    &ucRemoveEolUnderscoreDash($Encoding_Central), 'cp'.$Code_Page_Central,
    &ucRemoveEolUnderscoreDash($Encoding_Baltic),  'cp'.$Code_Page_Baltic,
    &ucRemoveEolUnderscoreDash($Encoding_Turkish), 'cp'.$Code_Page_Turkish,
    &ucRemoveEolUnderscoreDash($Encoding_W_Euro),  'cp'.$Code_Page_Western,
    &ucRemoveEolUnderscoreDash('cp'.$Code_Page_OldWest),
                                                   'cp'.$Code_Page_Western );

my %EncodingNames     = ( 'cp'.$Code_Page_Central, 'Central European',
                          'cp'.$Code_Page_Western, 'Western European',
                          'cp'.$Code_Page_Turkish, '    Turkish     ',
                          'cp'.$Code_Page_Baltic,  '     Baltic     ' );

my %NonAccenChars     = ( 
                          #--------------------------------#
'cp'.$Code_Page_Central,  #   Central European (cp1250)    #
                          #--------------------------------#
                          #€_‚_„…†‡_‰Š‹ŚŤŽŹ_‘’“”•–—_™š›śťžź#
                          'E_,_,.++_%S_STZZ_````.--_Ts_stzz'.

                          # ˇ˘Ł¤Ą¦§¨©Ş«¬­®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľż#
                          '_``LoAlS`CS_--RZ`+,l`uP.,as_L~lz'.

                          #ŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢß#
                          'RAAAALCCCEEEEIIDDNNOOOOxRUUUUYTS'.

                          #ŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙#
                          'raaaalccceeeeiiddnnoooo%ruuuuyt`',

                          #--------------------------------#
'cp'.$Code_Page_Western,  #   Western European  (cp1252)   #
                          #--------------------------------#
                          #€_‚ƒ„…†‡ˆ‰Š‹Œ_Ž__‘’“”•–—˜™š›œ_žŸ#
                          'E_,f,.++^%S_O_Z__````.--~Ts_o_zY'.

                          # ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿#
                          '_!cLoYlS`Ca_--R-`+23`uP.,10_qh3_'.

                          #ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß#
                          'AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTS'.

                          #àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ#
                          'aaaaaaaceeeeiiiidnooooo%ouuuuyty',

                          #--------------------------------#
'cp'.$Code_Page_Turkish,  #       Turkish  (cp1254)        #
                          #--------------------------------#
                          #€_‚ƒ„…†‡ˆ‰Š‹Œ____‘’“”•–—˜™š›œ__Ÿ#
                          'E_,f,.++^%S_O____````.--~Ts_o__Y'.

                          # ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿#
                          '_!cLoYlS`Ca_--R-`+23`uP.,10_qh3_'.

                          #ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏĞÑÒÓÔÕÖ×ØÙÚÛÜİŞß#
                          'AAAAAAACEEEEIIIIGNOOOOOxOUUUUISS'.

                          #àáâãäåæçèéêëìíîïğñòóôõö÷øùúûüışÿ#
                          'aaaaaaaceeeeiiiignooooo%ouuuuisy',

                          #--------------------------------#
'cp'.$Code_Page_Baltic,   #       Baltic   (cp1257)        #
                          #--------------------------------#
                          #€_‚_„…†‡_‰_‹_¨ˇ¸_‘’“”•–—_™_›_¯˛_#
                          'E_,_,.++_%___``,_````.--_T___-,_'.

                          # �¢£¤�¦§Ø©Ŗ«¬­®Æ°±²³´µ¶·ø¹ŗ»¼½¾æ#
                          '__cLo_lSOCR_--RA`+23`uP.o1r_qh3a'.

                          #ĄĮĀĆÄÅĘĒČÉŹĖĢĶĪĻŠŃŅÓŌÕÖ×ŲŁŚŪÜŻŽß#
                          'AIACAAEECEZEGKILSNNOOOOxULSUUZZS'.

                          #ąįāćäåęēčéźėģķīļšńņóōõö÷ųłśūüżž˙#
                          'aiacaaeecezegkilsnnoooo%ulsuuzz`' );

my %AccentedChars;
my $AccentedChars     = '';
my $NonAccenChars     = '';
for ( $Code_Page_Central, $Code_Page_Western,
      $Code_Page_Turkish, $Code_Page_Baltic )
    {
      $AccentedChars{'cp'.$_}  = decode('cp'.$_, $HighBitSetChars);
      $AccentedChars          .= $AccentedChars{'cp'.$_};
      $NonAccenChars          .= $NonAccenChars{'cp'.$_};
    }
#print "\n", length($NonAccenChars), '  ', $NonAccenChars,"\n";
#print "\n", length($AccentedChars), '  ', $AccentedChars,"\n";

my $QuotedMetaNonAccenChars = quotemeta($NonAccenChars);

my $DiacriticalChars  = '';
for  ( 0x0300..0x036F, 0x1DC0..0x1DFF )
     { $DiacriticalChars .= chr($_) }

#-----------------------------------------------------------------------------
#  Parse options and parameters
#-----------------------------------------------------------------------------
my $b_Help        = 0;
my $b_Interactive = 1;
my $b_UTF8        = 0;
my $b_Parameter   = 0;
my $Folder;

for  ( @ARGV )
{
  if    ( lc($_) eq '--' )
        { $b_Parameter = 1 }
  elsif ( (not $b_Parameter) and (lc($_) eq '--batch') )
        { $b_Interactive = 0 }
  elsif ( (not $b_Parameter) and (lc($_) eq '--utf8') )
        { $b_UTF8 = 1 }
  elsif ( $b_Parameter or (substr($_, 0, 1) ne '-') )
        {
          if  ( defined($Folder) )
              { die "$0 accepts only 1 parameter\n" }
          else
              { $Folder = $_ }
        }
  else
        { $b_Help = 1 }
}

#-----------------------------------------------------------------------------
#  Help
#-----------------------------------------------------------------------------
if  ( $b_Help )
    {
      die << "END_OF_HELP"

$0  [--help]  [--batch]  [--]  [folder]

This script renames files with accented and diacritical Latin characters :

-  This PERL script starts from the folder given in parameter, or else from
   the current folder.
-  It recursively searches for files with characters belonging to 80 - FF of
   CP 1250, CP 1252, CP 1254 and CP 1257 (mostly accented Latin characters)
   or Latin characters having diacritical marks.
-  It calculates new file names by removing the accents and diacritical marks
   only from Latin characters  (For example,  Été --> Ete).
-  It displays all proposed renaming and perhaps conflicts, and asks the user
   for global approval.
-  If the user has approved, it renames all files having no conflict.

Option '--batch' avoids interactive questions.  Use with care.

Option '--'      avoids the next parameter to be interpreted as option.

SPECIAL WARNING :
-  This script was originally encoded in UTF-8, and should stay so.
-  This script may rename a lot of files.
-  Files names are theoretically all encoded only with UTF-8.  But some file
   names may be found to contain also some characters having legacy encoding.
-  The author has applied efforts for consistency checks, robustness, conflict
   detection and use of appropriate encoding.
   So this script should only rename files by removing accents and diacritical
   marks from Latin characters.
-  But this script has been tested only under a limited number of OS
   (Windows, Mac OS X, Linux) and a limited number of terminal encodings
   (CP 850, ISO-8859-1, UTF-8).
-  So, under weird circumstances, this script could rename many files with
   random names.
-  Therefore, this script should be used with care, and modified with extreme
   care (beware encoding of internal strings, inputs, outputs and commands)
END_OF_HELP
    }

#-----------------------------------------------------------------------------
#  If requested, change current folder
#-----------------------------------------------------------------------------
if  ( defined($Folder) )
    { chdir($Folder)  or  die  "Can NOT set '$Folder' as current folder\n" }

#-----------------------------------------------------------------------------
#  Following instruction is MANDATORY.
#  The return value should be non-zero, but on some systems it is zero.
#-----------------------------------------------------------------------------
utf8::decode($AccentedChars);
#  or  die "$0: '\$AccentedChars' should be UTF-8 but is NOT.\n";

#-----------------------------------------------------------------------------
#  Check consistency on 'tr'
#-----------------------------------------------------------------------------
$_ = $AccentedChars;
eval "tr/$AccentedChars/$QuotedMetaNonAccenChars/";
if  ( $@ )  { warn $@ }
if  ( $@ or ($_ ne $NonAccenChars) )
    { die "$0: Consistency check on 'tr' FAILED :\n\n",
          "Translated Accented Chars :  ", length($_), ' :  ', $_, "\n\n",
          "       Non Accented Chars :  ", length($NonAccenChars), ' :  ',
          $NonAccenChars, "\n" }

#-----------------------------------------------------------------------------
#  Constants depending on the OS
#-----------------------------------------------------------------------------
my $b_Windows = ( defined($ENV{'OS'}) and ($ENV{'OS'} eq 'Windows_NT') );

my ($Q, $sep, $sep2, $HOME, $Find, @List, $cwd, @Move);

if  ( $b_Windows )
    {
      $Q    = '"';
      $sep  = '\\';
      $sep2 = '\\\\';
      $HOME = $ENV{'USERPROFILE'};
      $Find = 'dir /b /s';
      @List = ( ( (`ver 2>&1` =~ m/version\s+([0-9]+)/i) and ($1 >= 6) ) ?
                ('icacls') :
                ( 'cacls') );
      $cwd  = `cd`;  chomp $cwd;  $cwd = quotemeta($cwd);
      @Move = ('move');
    }
else
    {
      $Q    = "'";
      $sep  = '/';
      $sep2 = '/';
      $HOME = $ENV{'HOME'};
      $Find = 'find .';
      @List = ('ls', '-d', '--');
      @Move = ('mv', '--');
      if  ( -w '/bin' )  { die "$0: For safety reasons, ",
                               "usage is BLOCKED to administrators.\n"}
    }

my $Encoding;
my $ucEncoding;
my $InputPipe = '-|';                                # Used as global variable

#-----------------------------------------------------------------------------
#  Under Windows, associate input and output encodings to code pages :
#  -  Get the original code page,
#  -  If it is not UTF-8, try to set it to UTF-8,
#  -  Define the input encoding as the one associated to the ACTIVE code page,
#  -  If STDOUT is the console, encode output for the ORIGINAL code page.
#-----------------------------------------------------------------------------
my $Code_Page_Original;
my $Code_Page_Active;

if  ( $b_Windows )
    {
      #-----------------------------------------------------------------------
      #  Get the original code page
      #-----------------------------------------------------------------------
      $_ = `chcp`;
      m/([0-9]+)$/  or  die "Non numeric Windows code page :  ", $_;
      $Code_Page_Original = $1;
      print 'Windows Original Code Page = ', $Code_Page_Original,
            ( $Code_Page_Original == $Code_Page_UTF8 ?
              ' = UTF-8, display is perhaps correct with a true type font.' :
              '' ), "\n\n";
      $Code_Page_Active = $Code_Page_Original ;

      #-----------------------------------------------------------------------
      #  The input encoding must be the same as the ACTIVE code page
      #-----------------------------------------------------------------------
      $Encoding = ( $Code_Page_Active == $Code_Page_UTF8 ?
                    'utf8' :
                    'cp'.$Code_Page_Active ) ;
      $InputPipe .= ":encoding($Encoding)";
      print "InputPipe = '$InputPipe'\n\n";

      #-----------------------------------------------------------------------
      #  If STDOUT is the console, output encoding must be the same as the
      #  ORIGINAL code page
      #-----------------------------------------------------------------------
      if  ( $Code_Page_Original != $Code_Page_UTF8 )
          {
            no  warnings  'unopened';
            @_ = stat(STDOUT);
            use warnings;
            if  ( scalar(@_) and ($_[0] == 1) )
                { binmode(STDOUT, ":encoding(cp$Code_Page_Original)") }
            else
                { binmode(STDOUT, ":encoding($Encoding)") }
          }
    }

#-----------------------------------------------------------------------------
#  Under *nix, if the 'LANG' environment variable contains an encoding,
#  verify that this encoding is supported by the OS and by PERL.
#-----------------------------------------------------------------------------
elsif ( defined($ENV{'LANG'}) and ($ENV{'LANG'} =~ m/\.([^\@]+)$/i) )
      {
        $Encoding = $1;

        my $Kernel = `uname -s`;
        chomp $Kernel;
        my $ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
        if  ( (lc($Kernel) ne 'darwin') and not grep {$_ eq $ucEncoding}
                       ( map { ($_, &ucRemoveEolUnderscoreDash($_)) }
                         `locale -m` ) )
            { die "Encoding = '$Encoding' or '$ucEncoding'  NOT supported ".
                  "by the OS\n" }

        my $ucLocale = &ucRemoveEolUnderscoreDash($ENV{'LANG'});
        if  ( not grep {$_ eq $ucLocale}
                       ( map { ($_, &ucRemoveEolUnderscoreDash($_)) }
                         `locale -a` ) )
            { die "Locale = '$ENV{LANG}' or '$ucLocale'  NOT supported ".
                  "by the OS\n" }

        if  ( not defined(Encode::find_encoding($Encoding)) )
            { die "Encoding = '$Encoding' or '$ucEncoding'  NOT supported ".
                  "by PERL\n" }

        print "Encoding = '$Encoding'  is supported by the OS and PERL\n\n";
        binmode(STDOUT, ":encoding($Encoding)");
      }

#-----------------------------------------------------------------------------
#  Check consistency between parameter of 'echo' and output of 'echo'
#-----------------------------------------------------------------------------
undef $_;
if  ( defined($Encoding)  )
    {
      $ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
      if    ( defined($SuperEncodings{$ucEncoding}) )
            { $_ = substr($AccentedChars{$SuperEncodings{$ucEncoding}},
                          0x20, 0x60) }
      elsif ( defined($AccentedChars{$Encoding}) )
            { $_ = $AccentedChars{$Encoding} }
      elsif ( $Encoding =~ m/^utf-?8$/i )
            { $_ = $AccentedChars }
    }
if  ( not defined($_) )                # Chosen chars are same in 4 code pages
    { $_ = decode('cp'.$Code_Page_Central,
                  pack('C*', 0xC9, 0xD3, 0xD7, 0xDC,                    # ÉÓ×Ü
                             0xE9, 0xF3, 0xF7, 0xFC)) }                 # éó÷ü
#print  $_, "  (Parameter)\n\n";
#system 'echo', $_;
utf8::decode($_);
#print  "\n", $_, "  (Parameter after utf8::decode)\n\n";
my @EchoCommand = ( $b_Windows ?
                    "echo $_" :
                    ('echo', $_) );
#system @EchoCommand;

open(ECHO, $InputPipe, @EchoCommand)  or  die 'echo $_: ', $!;
my $Output = join('', <ECHO>);
close(ECHO);
chomp $Output;
#print "\n", $Output, "  (Output of 'echo')\n";
utf8::decode($Output);
#print "\n", $Output, "  (Output of 'echo' after utf8::decode)\n\n";

if  ( $Output ne $_ )
    {
      warn "$0: Consistency check between parameter ",
                         "of 'echo' and output of 'echo' FAILED :\n\n",
           "Parameter of 'echo' :  ", length($_), ' :  ', $_, "\n\n",
           "   Output of 'echo' :  ", length($Output), ' :  ', $Output, "\n";
      exit 1;
    }

#-----------------------------------------------------------------------------
#  Print the translation table
#-----------------------------------------------------------------------------
if  ( defined($Encoding) )
{
  undef $_;
  $ucEncoding = &ucRemoveEolUnderscoreDash($Encoding);
  if    ( defined($SuperEncodings{$ucEncoding}) )
        {
          $_ = $SuperEncodings{$ucEncoding};
          print "--------- $EncodingNames{$_} ---------\n",
                '    ', substr($AccentedChars{$_}, 0x20, 0x20), "\n",
                '--> ', substr($NonAccenChars{$_}, 0x20, 0x20), "\n\n",
                '    ', substr($AccentedChars{$_}, 0x40, 0x20), "\n",
                '--> ', substr($NonAccenChars{$_}, 0x40, 0x20), "\n\n",
                '    ', substr($AccentedChars{$_}, 0x60, 0x20), "\n",
                '--> ', substr($NonAccenChars{$_}, 0x60, 0x20), "\n\n" }
  else
  {
    for ( 'cp'.$Code_Page_Central, 'cp'.$Code_Page_Western,
          'cp'.$Code_Page_Turkish, 'cp'.$Code_Page_Baltic )
    {
      if  ( ('cp'.$Encoding eq $_) or ($Encoding =~ m/^utf-?8$/i) )
          { print "--------- $EncodingNames{$_} ---------\n",
                  '    ', substr($AccentedChars{$_},    0, 0x20), "\n",
                  '--> ', substr($NonAccenChars{$_},    0, 0x20), "\n\n",
                  '    ', substr($AccentedChars{$_}, 0x20, 0x20), "\n",
                  '--> ', substr($NonAccenChars{$_}, 0x20, 0x20), "\n\n",
                  '    ', substr($AccentedChars{$_}, 0x40, 0x20), "\n",
                  '--> ', substr($NonAccenChars{$_}, 0x40, 0x20), "\n\n",
                  '    ', substr($AccentedChars{$_}, 0x60, 0x20), "\n",
                  '--> ', substr($NonAccenChars{$_}, 0x60, 0x20), "\n\n" }
    }
  }
}

#-----------------------------------------------------------------------------
#  Completely optional :
#  Inside the Unison file, find the accented file names to ignore
#-----------------------------------------------------------------------------
my $UnisonFile = $HOME.$sep.'.unison'.$sep.'common.unison';
my @Ignores;

if  ( open(UnisonFile, '<', $UnisonFile) )
    {
      print "\nUnison File '", $UnisonFile, "'\n";
      while  ( <UnisonFile> )
      {
        if  ( m/^\s*ignore\s*=\s*Name\s*(.+)/ )
            {
              $_ = $1 ;
              if  ( m/[$AccentedChars]/ )
                  { push(@Ignores, $_) }
            }
      }
      close(UnisonFile);
    }
print map("  Ignore: ".$_."\n", @Ignores);

#-----------------------------------------------------------------------------
#  Function OutputAndErrorFromCommand :
#
#  Execute the command given as array in parameter, and return STDOUT + STDERR
#
#  Reads global variable $InputPipe
#-----------------------------------------------------------------------------
sub OutputAndErrorFromCommand
{
  local $_;
  my @Command = @_;             # Protects content of @_ from any modification
  #---------------------------------------------------------------------------
  #  Under Windows, fork fails, so :
  #  -  Enclose into double quotes parameters containing blanks or simple
  #     quotes,
  #  -  Use piped open with redirection of STDERR.
  #---------------------------------------------------------------------------
  if  ( defined($ENV{'OS'}) and ($ENV{'OS'} eq 'Windows_NT') )
      {
        for  ( @Command )
             { s/^((-|.*(\s|')).*)$/$Q$1$Q/ }
        my $Command = join('  ', @Command);
        #print "\n", $Command;
        open(COMMAND, $InputPipe, "$Command  2>&1")  or  die '$Command: ', $!;
      }
  #---------------------------------------------------------------------------
  #  Under Unix, quoting is too difficult, but fork succeeds
  #---------------------------------------------------------------------------
  else
      {
        my $pid = open(COMMAND, $InputPipe);
        defined($pid) or die "Can't fork: $!";
        if  ( $pid == 0 )                           #  Child process
            {
              open STDERR, '>&=STDOUT';
              exec @Command;                        #  Returns only on failure
              die "Can't @Command";
            }
      }
  $_ = join('', <COMMAND>);                         #  Child's STDOUT + STDERR
  close COMMAND;
  chomp;
  utf8::decode($_);
  $_;
}

#-----------------------------------------------------------------------------
#  Find recursively all files inside the current folder.
#  Verify accessibility of files with accented names.
#  Calculate non-accented file names from accented file names.
#  Build the list of duplicates.
#-----------------------------------------------------------------------------
my %Olds;                                # $Olds{$New} = [ $Old1, $Old2, ... ]
my $Old;
my $Dir;
my $Command;
my $ErrorMessage;
my $New;
my %News;

print "\n\nFiles with accented name and the corresponding non-accented name ",
      ":\n";

open(FIND, $InputPipe, $Find)  or  die $Find, ': ', $!;

FILE:
while  ( <FIND> )
{
  chomp;
  #---------------------------------------------------------------------------
  #  If the file path contains UTF-8, following instruction is MANDATORY.
  #  If the file path does NOT contain UTF-8, it should NOT hurt.
  #---------------------------------------------------------------------------
  utf8::decode($_);

  if  ( $b_Windows )
      { s/^$cwd$sep2// }
  else
      { s/^\.$sep2// }

  #---------------------------------------------------------------------------
  #  From now on :  $_ = Dir/OldFilename
  #---------------------------------------------------------------------------
  push(@{$Olds{$_}}, $_);

  if  ( m/([^$sep2]+)$/ and
        ($1 =~ m/[$AccentedChars]|([\ -\~][$DiacriticalChars])/) )
      {
        if  ( $b_Windows and m/$Q/ )
            {
              print "\n    $Q$_$Q\n***  contains quotes.\n";
              next;
            }
        for  my $Ignore  ( @Ignores )
        {
          if  ( m/$Ignore$/ )
              { next FILE }
        }
        $Old = $_ ;
        m/^(.*$sep2)?([^$sep2]+)$/;
        $Dir = ( defined($1) ? $1 : '');
        $_   = $2;

        #---------------------------------------------------------------------
        #  From now on :  $Old = Dir/OldFilename
        #                 $_   = OldFilename
        #---------------------------------------------------------------------
        print "\n    $Q$Old$Q\n";
        $ErrorMessage = &OutputAndErrorFromCommand(@List, $Old);
        if  ( $? != 0 )
            { print "*** $ErrorMessage\n" }
        else
            {
              #---------------------------------------------------------------
              #  Change accented Latin chars to non-accented chars.
              #  Remove all diacritical marks after Latin chars.
              #---------------------------------------------------------------
              eval "tr/$AccentedChars/$QuotedMetaNonAccenChars/";
              s/([\ -\~])[$DiacriticalChars]+/$1/g;
              #---------------------------------------------------------------
              #  From now on :  $Old = Dir/OldFilename
              #                 $_   = NewFilename
              #---------------------------------------------------------------
              if  ( $@ )
                  { warn $@ }
              else
                  {
                    $New = $Dir.$_;
                    if  ( $b_Windows or (not utf8::is_utf8($Dir)) )    # Weird
                        { utf8::decode($New) }                 # but necessary
                    $News{$Old} = $New;
                    push(@{$Olds{$New}}, $Old);
                  }
              print "--> $Q$Dir$_$Q\n";
            }
      }
}

close(FIND);

#-----------------------------------------------------------------------------
#  Print list of duplicate non-accented file names
#-----------------------------------------------------------------------------
my $b_NoDuplicate = 1;

for  my $New  ( sort keys %Olds )
{
  if  ( scalar(@{$Olds{$New}}) > 1 )
      {
        if  ( $b_NoDuplicate )
            {
              print "\n\nFollowing files would have same non-accented name ",
                    ":\n";
              $b_NoDuplicate = 0;
            }
        print "\n", map('    '.$_."\n", @{$Olds{$New}}), '--> ', $New, "\n";
        for  ( @{$Olds{$New}} )
             { delete $News{$_} };
      }
}

#-----------------------------------------------------------------------------
#  If there are NO file to rename, then exit
#-----------------------------------------------------------------------------
my $Number = scalar(keys %News);

print "\n\n";
if  ( $Number < 1 )
    {
      print "There are NO file to rename\n";
      exit;
    }

#-----------------------------------------------------------------------------
#  Ask the user for global approval of renaming
#-----------------------------------------------------------------------------
if  ( $b_Interactive )
    {
      print "In order to really rename the ", $Number,
            " files which can safely be renamed, type 'rename' :  ";
      $_ = <STDIN>;
      sleep 1;                       # Gives time to PERL to handle interrupts
      if  ( not m/^rename$/i )
          { exit 1 }
    }
else
    { print $Number, " files will be renamed\n\n" }

#-----------------------------------------------------------------------------
#  Rename accented file names sorted descending by name size
#-----------------------------------------------------------------------------
$Number  = 0;
my $Move = join('  ', @Move);

for  ( sort {length($b) <=> length($a)} keys %News )
{
  $ErrorMessage = &OutputAndErrorFromCommand(@Move, $_, $News{$_});
  if  ( $? == 0 )
      { $Number++ }
  else
      { print "\n$Move  $Q$_$Q\n", (' ' x length($Move)),
        "  $Q$News{$_}$Q\n", ('*' x length($Move)), "  $ErrorMessage\n" }
}
print "\n$Number files have been successfully renamed\n";

__END__