How do I convert the messages in an mbox file to UTF-8?

Posted: 2022-06-14 20:12:23

I am trying to modify the below program to ensure each msg is converted to utf-8 using Encode::decode(), but I am unsure of how and where to place this to make it work.

#!/usr/bin/perl
use warnings;
use strict;
use Mail::Box::Manager;

open(MYFILE, '>>', 'data.txt') or die "data.txt: Unable to open: $!\n";
binmode(MYFILE, ':encoding(UTF-8)');


my $file = shift || $ENV{MAIL};
my $mgr = Mail::Box::Manager->new(
    access          => 'r',
);

my $folder = $mgr->open( folder => $file )
or die "$file: Unable to open: $!\n";

for my $msg ( sort { $a->timestamp <=> $b->timestamp } $folder->messages)
{
    my $to          = join( ', ', map { $_->format } $msg->to );
    my $from        = join( ', ', map { $_->format } $msg->from );
    my $date        = localtime( $msg->timestamp );
    my $subject     = $msg->subject;
    my $body        = $msg->decoded->string;

    # Strip all quoted text
    $body =~ s/^>.*$//msg;

    print MYFILE <<"";
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body

}

2 answers

#1


Nothing in the script seems to specify what encoding you expect the input to be in. Normally that's important, since auto-detection of character encodings is hard (and not usually supported by encoding libraries).
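To make that concrete, here is a minimal sketch of what naming the input encoding looks like with the core Encode module. The helper name decode_with_fallback and the ISO-8859-1 fallback are my own assumptions for illustration, not something taken from the script or from Mail::Box. It tries a strict UTF-8 decode first and only falls back to the guessed charset when the bytes are not valid UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Mail::Box): turn raw bytes into a
# Perl character string, trying strict UTF-8 first and then a fallback
# charset chosen by the caller.
sub decode_with_fallback {
    my ($raw_bytes, $fallback) = @_;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # decode() from modifying $raw_bytes while it works.
    my $chars = eval {
        decode('UTF-8', $raw_bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $chars if defined $chars;

    # Not valid UTF-8, so treat the bytes as the fallback charset.
    return decode($fallback, $raw_bytes);
}

# "café" once as UTF-8 bytes and once as ISO-8859-1 bytes.
for my $raw ("caf\xC3\xA9", "caf\xE9") {
    my $chars = decode_with_fallback($raw, 'ISO-8859-1');
    printf "%d bytes -> %d characters\n", length($raw), length($chars);
}
```

Both inputs come out as the same four-character string. The fallback is still a guess: bytes in some other single-byte charset would decode without error but to the wrong characters, which is why knowing the real input encoding matters.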

#2


From the documentation I suspect you want to replace

my $body        = $msg->decoded->string;

with

my $body        = $msg->decoded('UTF-8')->string;

Though I'm not completely sure and it may not matter at all.
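One reason it may not matter: if the body string is already a Perl character string (an assumption here, the thread does not confirm what decoded returns), then the ':encoding(UTF-8)' layer the script sets on its output handle does the conversion when printing. A small sketch of that behaviour, using an in-memory buffer in place of data.txt and a made-up $body value:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A character string containing U+00E9; this stands in for a message
# body that has already been decoded to characters (assumption).
my $body = "caf\x{E9}";

# Print through an :encoding(UTF-8) layer into an in-memory buffer
# instead of a real file, so the resulting bytes can be inspected.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer)
    or die "Unable to open in-memory buffer: $!\n";
print {$out} $body;
close($out) or die "Unable to close in-memory buffer: $!\n";

printf "%d characters written as %d bytes\n", length($body), length($buffer);
```

The buffer ends up holding the UTF-8 byte sequence for the string, so no separate encode step is needed before printing. If the body were still raw bytes in some other charset, it would have to be decoded first.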
