How can I determine if text is in Cyrillic characters?

Time: 2021-09-11 18:17:03

My junk mail folder has been filling up with messages composed in what appears to be the Cyrillic alphabet. If a message body or a message subject is in Cyrillic, I want to permanently delete it.

On my screen I see Cyrillic characters, but when I iterate through the messages in VBA within Outlook, the "Subject" property of the message returns question marks.

How can I determine if the subject of the message is in Cyrillic characters?

(Note: I have examined the "InternetCodepage" property - it's usually Western European.)

3 answers

#1


3  

The String datatype in VB/VBA can handle Unicode characters, but the IDE itself has trouble displaying them (hence the question marks).
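
One way to confirm that the string data is intact, even though the IDE shows question marks, is to print the numeric code units rather than the characters themselves. The following is a minimal sketch; `DumpCodeUnits` is a hypothetical helper name that does not appear in the rest of this answer.

```vba
' Hypothetical debugging helper (an illustration, not part of the answer's code).
' Prints each UTF-16 code unit of sText as U+XXXX in the Immediate window.
Public Sub DumpCodeUnits(ByVal sText As String)

    Dim i As Long
    Dim nCode As Long

    For i = 1 To Len(sText)

        ' AscW returns a signed Integer, so mask it into the range 0..65535.
        nCode = AscW(Mid$(sText, i, 1)) And &HFFFF&

        ' Left-pad the hex value to four digits.
        Debug.Print "U+" & Right$("0000" & Hex$(nCode), 4)

    Next i

End Sub
```

A genuine question mark prints as U+003F, whereas a Cyrillic letter such as "Д" prints as U+0414, so the output shows whether the characters were really lost or are only being displayed badly.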

I wrote an IsCyrillic function that might help you out. The function takes a single String argument and returns True if the string contains at least one Cyrillic character. I tested this code with Outlook 2007 and it seems to work fine. To test it, I sent myself a few e-mails with Cyrillic text in the subject line and verified that my test code could correctly pick out those e-mails from among everything else in my Inbox.

So, I actually have two code snippets:

  • The code that contains the IsCyrillic function. This can be copy-pasted into a new VBA module or added to the code you already have.

  • The Test routine I wrote (in Outlook VBA) to test that the code actually works. It demonstrates how to use the IsCyrillic function.

The Code

Option Explicit

Public Const errInvalidArgument = 5

' Returns True if sText contains at least one Cyrillic character.
' NOTE: Assumes UTF-16 encoding (which is how VBA stores a String).

Public Function IsCyrillic(ByVal sText As String) As Boolean

    Dim i As Long

    ' Loop through each char. If we hit a Cyrillic char, return True.

    For i = 1 To Len(sText)

        If IsCharCyrillic(Mid(sText, i, 1)) Then
            IsCyrillic = True
            Exit Function
        End If

    Next

End Function

' Returns True if the given character is part of the Cyrillic alphabet.
' NOTE: Assumes UTF-16 encoding.

Private Function IsCharCyrillic(ByVal sChar As String) As Boolean

    ' The Unicode blocks Cyrillic (U+0400-U+04FF) and
    ' Cyrillic Supplement (U+0500-U+052F) are contiguous.

    Const CYRILLIC_START As Integer = &H400
    Const CYRILLIC_END  As Integer = &H52F

    ' A single UTF-16 code unit is two bytes long.

    If LenB(sChar) <> 2 Then
        Err.Raise errInvalidArgument, _
            "IsCharCyrillic", _
            "sChar must be a single Unicode character"
    End If

    ' Get the Unicode value of the character. AscW returns a signed
    ' Integer, so code units above U+7FFF come back negative and
    ' simply fall outside the range tested below.

    Dim nCharCode As Integer
    nCharCode = AscW(sChar)

    ' Is the char code in the range of the Cyrillic characters?

    If (nCharCode >= CYRILLIC_START And nCharCode <= CYRILLIC_END) Then
        IsCharCyrillic = True
    End If

End Function


Example Usage

' This code iterates through the folder currently selected in Outlook
' (my Inbox when I ran it). On your machine, you may have to switch to
' the folder you want to scan before running this code.
' I placed this code in `ThisOutlookSession` in the VBA editor. I called
' it in the Immediate window by typing `ThisOutlookSession.TestIsCyrillic`.

Public Sub TestIsCyrillic()

    Dim oItem As Object
    Dim oMailItem As MailItem

    For Each oItem In ThisOutlookSession.ActiveExplorer.CurrentFolder.Items

        If TypeOf oItem Is MailItem Then

            Set oMailItem = oItem

            If IsCyrillic(oMailItem.Subject) Then

                ' I just printed out the offending subject line
                ' (it will display as ? marks, but I just
                ' wanted to see it output something).
                ' In your case, you could change this line to:
                '
                '     oMailItem.Delete
                '
                ' to actually delete the message. Be aware that
                ' deleting items inside a For Each loop can cause
                ' some items to be skipped; looping backwards by
                ' index is the safer pattern when deleting.

                Debug.Print oMailItem.Subject

            End If

        End If

    Next

End Sub
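
If the goal is to delete the matching messages rather than print them, a variation along the following lines may be safer than calling `Delete` inside the `For Each` loop, because removing an item while enumerating the collection can cause later items to be skipped. This is only a sketch: `DeleteCyrillicMessages` is a hypothetical name, the check of the message body is an addition based on the question (the tested routine above only looks at the subject), and the code assumes the `IsCyrillic` function shown earlier is available in the same project.

```vba
' Hypothetical variation on the test routine (not tested by the answer's
' author). Deletes every mail item in the currently selected folder whose
' subject or body contains at least one Cyrillic character.
Public Sub DeleteCyrillicMessages()

    Dim oItems As Outlook.Items
    Dim oItem As Object
    Dim i As Long

    Set oItems = Application.ActiveExplorer.CurrentFolder.Items

    ' Walk the collection backwards so that removing an item does not
    ' shift the positions of the items that are still to be visited.
    For i = oItems.Count To 1 Step -1

        Set oItem = oItems.Item(i)

        If TypeOf oItem Is MailItem Then

            If IsCyrillic(oItem.Subject) Or IsCyrillic(oItem.Body) Then
                oItem.Delete
            End If

        End If

    Next i

End Sub
```

As far as I know, `Delete` on an item in an ordinary folder moves it to Deleted Items rather than removing it permanently, so a truly permanent delete would need a second pass over that folder. Try the routine on a folder of test messages first.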

#2


0  

the "Subject" property of the message returns a bunch of question marks.

A classic string encoding problem. Sounds like that property is returning ASCII but you want UTF-8 or Unicode.

#3


0  

It seems to me you have an easy solution already - just look for any subject line with (say) 5 question marks in it.
