Python: Ignore 'Incorrect padding' error when base64 decoding

Time: 2021-07-06 18:33:47

I have some data that is base64 encoded that I want to convert back to binary even if there is a padding error in it. If I use

base64.decodestring(b64_string)

it raises an 'Incorrect padding' error. Is there another way?

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit and miss so I decided to try openssl. The following command worked a treat:

openssl enc -d -base64 -in b64string -out binary_data

12 solutions

#1


63  

As said in other responses, there are various ways in which base64 data could be corrupted.

However, as Wikipedia says, removing the padding (the '=' characters at the end of base64 encoded data) is "lossless":

From a theoretical point of view, the padding character is not needed, since the number of missing bytes can be calculated from the number of Base64 digits.
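
As a rough illustration of that quoted point, the sketch below derives both the decoded byte count and the number of stripped '=' characters from the digit count alone. The helper names decoded_length and missing_padding are made up for this example and are not part of the base64 module.

```python
import base64


def decoded_length(num_digits):
    """Number of bytes represented by num_digits Base64 digits ('=' excluded)."""
    if num_digits % 4 == 1:
        # A single leftover digit carries only 6 bits, less than one byte.
        raise ValueError("Base64 data cannot have 4n+1 digits")
    return num_digits * 3 // 4


def missing_padding(num_digits):
    """Number of '=' characters needed to reach a multiple of four."""
    return -num_digits % 4


digits = base64.b64encode(b"hello").rstrip(b"=")  # padding stripped
restored = digits + b"=" * missing_padding(len(digits))
print(decoded_length(len(digits)), base64.b64decode(restored))
```

A digit count of the form 4n+1 is rejected because no byte sequence encodes to it, which is one way to tell truncated data from data that merely lost its padding.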

So if this is really the only thing "wrong" with your base64 data, the padding can just be added back. I came up with this to be able to parse "data" URLs in WeasyPrint, some of which were base64 without padding:

import base64

def decode_base64(data):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :returns: The decoded byte string.

    """
    missing_padding = len(data) % 4
    if missing_padding != 0:
        data += b'=' * (4 - missing_padding)
    # base64.decodestring() was removed in Python 3.9; b64decode() exists in 2.x and 3.x.
    return base64.b64decode(data)

Tests for this function: weasyprint/tests/test_css.py#L68

#2


21  

If there's a padding error, it probably means your string is corrupted; base64-encoded strings should have a length that is a multiple of four. You can try adding the padding character (=) yourself to make the length a multiple of four, but it should already be one unless something is wrong.

#3


21  

Just add padding as required. Heed Michael's warning, however.

b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh

#4


21  

"Incorrect padding" can mean not only "missing padding" but also (believe it or not) "incorrect padding".

If suggested "adding padding" methods don't work, try removing some trailing bytes:

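import base64
import binascii

# Drop the trailing partial group (or the last full group of four) and retry.
lens = len(strg)
lenx = lens - (lens % 4 if lens % 4 else 4)
try:
    result = base64.b64decode(strg[:lenx])
except binascii.Error:  # Python 3; Python 2 raised TypeError here
    result = None  # still not decodable; inspect the data more closely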

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.
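
A minimal sketch of that ordering, assuming the input is a str whose only defects are embedded whitespace and stripped '=' characters. The function name decode_ignoring_whitespace is invented for this example.

```python
import base64


def decode_ignoring_whitespace(data):
    """Remove all whitespace first, then restore padding, then decode."""
    compact = "".join(data.split())          # drops spaces, tabs and newlines
    compact += "=" * (-len(compact) % 4)     # length is now a multiple of four
    return base64.b64decode(compact)


# "aGVsbG8" is "hello" without its single '='. The embedded newline makes the
# raw length 8, so a length check on the uncleaned string would add no padding.
print(decode_ignoring_whitespace("aGVs\nbG8"))
```

Computing the padding on the raw string here would leave the data one '=' short, which is the upset length calculation the update warns about.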

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print(repr(sample)).

Update 2: It is possible that the encoding has been done in an url-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_')

If you can't see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they'll be the ones that aren't in [A-Za-z0-9]. Then you'll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode()
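
One way to run that experiment is sketched below. It assumes the data is a str and that both alternate characters actually occur in it; try_altchar_orders is a hypothetical helper, not a library function. Both orders will usually decode without error, so the results still have to be judged by eye.

```python
import base64
import binascii
import string
from itertools import permutations

# Characters that need no explanation: the shared alphabet, padding, whitespace.
KNOWN = set(string.ascii_letters + string.digits + "=" + string.whitespace)


def try_altchar_orders(data):
    """Return {altchars: decoded bytes} for each order that decodes cleanly."""
    extras = sorted(set(data) - KNOWN)
    if len(extras) != 2:
        raise ValueError("expected exactly two alternate characters, found %r" % extras)
    compact = "".join(data.split())
    compact += "=" * (-len(compact) % 4)
    results = {}
    for pair in permutations(extras):
        altchars = "".join(pair)
        try:
            results[altchars] = base64.b64decode(compact, altchars.encode("ascii"))
        except binascii.Error:
            pass  # this order does not yield decodable base64
    return results


sample = base64.b64encode(b"\xfb\xff\xbf", altchars=b"._").decode("ascii")
for order, decoded in try_altchar_orders(sample).items():
    print(repr(order), decoded)
```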

Update 3: If your data is "company confidential":
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or by other formatting or extraneous characters.

One such avenue would be to examine what non-"standard" characters are in your data, e.g.

import string
from collections import defaultdict

# Count every character outside [A-Za-z0-9]; your_data is assumed to be a str.
d = defaultdict(int)
s = set(string.ascii_letters + string.digits)
for c in your_data:
    if c not in s:
        d[c] += 1
print(d)

#5


13  

Use

string += '=' * (-len(string) % 4)  # restore stripped '='s

Credit goes to a comment somewhere here.

>>> import base64
>>> enc = base64.b64encode(b'1')
>>> enc
b'MQ=='
>>> base64.b64decode(enc)
b'1'
>>> enc = enc.rstrip(b'=')
>>> enc
b'MQ'
>>> base64.b64decode(enc)
Traceback (most recent call last):
  ...
binascii.Error: Incorrect padding
>>> base64.b64decode(enc + b'=' * (-len(enc) % 4))
b'1'

#6


2  

Check the documentation of the data source you're trying to decode. Is it possible that you meant to use base64.urlsafe_b64decode(s) instead of base64.b64decode(s)? That's one reason you might have seen this error message.

Decode string s using a URL-safe alphabet, which substitutes - instead of + and _ instead of / in the standard Base64 alphabet.

This is for example the case for various Google APIs, like Google's Identity Toolkit and Gmail payloads.
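
Such APIs often drop the trailing '=' as well, so a sketch that combines the URL-safe decoder with restored padding may be useful. The name decode_urlsafe is made up here, and the input is assumed to be a str.

```python
import base64


def decode_urlsafe(data):
    """Decode URL-safe base64 ('-' and '_'), tolerating stripped padding."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded)


raw = b"\xfb\xff\xbf\xfe"
token = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
print(token, decode_urlsafe(token) == raw)
```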

#7


1  

I don't have the rep to comment, but a nice thing to note is that (at least in Python 3.x) base64.b64decode will truncate any extra padding provided there is enough in the first place.

So, something like: b'abc=' works just as well as b'abc=='.

What this means is that you can just add the maximum number of padding characters that you would ever need—which is three (b'===')—and base64 will truncate any unnecessary ones.

Basically: base64.b64decode(s + b'===') is cleaner than base64.b64decode(s + b'=' * (-len(s) % 4)).
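
A small check of that claim, as observed with CPython 3 in its default lenient mode (validate left at False). The helper name decode_with_extra_padding is invented for this sketch. A string whose length is 4n+1 still fails, because no amount of padding makes it valid.

```python
import base64
import binascii


def decode_with_extra_padding(s):
    """Append the maximum padding ever needed and let b64decode ignore the excess."""
    return base64.b64decode(s + b"===")


for unpadded in (b"MQ", b"MTI", b"MTIz"):
    print(unpadded, decode_with_extra_padding(unpadded))

try:
    decode_with_extra_padding(b"M")  # one digit is never a complete byte
except binascii.Error as exc:
    print("still invalid:", exc)
```

The exact-padding form, s + b'=' * (-len(s) % 4), is the safer choice when the decoder might be strict, since it does not depend on excess '=' being ignored.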

#8


0  

Simply append "=" characters until the length is a multiple of 4 before you try decoding the target string value. Something like:

import base64

# Append '=' until the length is a multiple of 4, then decode.
while len(value) % 4 != 0:
    value = value + "="
req_str = base64.b64decode(value)

#9


0  

Adding the padding is rather... fiddly. Here's the function I wrote with the help of the comments in this thread as well as the wiki page for base64 (it's surprisingly helpful) https://en.wikipedia.org/wiki/Base64#Padding.

import base64
import binascii
import logging

def base64_decode(s):
    """Add missing padding to string and return the decoded base64 string."""
    log = logging.getLogger()
    s = str(s).strip()  # s is assumed to be a str, not bytes
    try:
        return base64.b64decode(s)
    except binascii.Error:  # Python 3; Python 2 raised TypeError here
        padding = len(s) % 4
        if padding == 1:
            # No amount of padding can fix a length of 4n+1.
            log.error("Invalid base64 string: {}".format(s))
            return b''
        elif padding == 2:
            s += '=='
        elif padding == 3:
            s += '='
        return base64.b64decode(s)

#10


0  

In case this error came from a web server: try URL-encoding your POST value. I was POSTing via "curl" and discovered I wasn't URL-encoding my base64 value, so characters like "+" were not escaped, and the web server's URL-decode logic converted each + to a space.
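
The effect can be reproduced with the standard library alone. In the sketch below, unquote_plus stands in for whatever form decoding the server performs (an assumption about the server), and escape_for_form and repair_spaces are hypothetical helper names.

```python
import base64
from urllib.parse import quote, unquote_plus


def escape_for_form(b64_value):
    """Percent-encode '+', '/' and '=' so a form decoder leaves them intact."""
    return quote(b64_value, safe="")


def repair_spaces(mangled):
    """Undo the damage after the fact by turning spaces back into '+'."""
    return mangled.replace(" ", "+")


value = base64.b64encode(b"\xfb\xef\xbe").decode("ascii")  # consists of '+' only
mangled = unquote_plus(value)  # what form decoding does to an unescaped '+'
print(repr(value), repr(mangled), repr(repair_spaces(mangled)))
print(unquote_plus(escape_for_form(value)) == value)
```

Escaping on the client side is the cleaner fix; replacing spaces on the server is a fallback that only works if the payload could not have contained real spaces.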

"+" is a valid base64 character and perhaps the only character which gets mangled by an unexpected url-decode.

#11


0  

In my case, I hit this error while parsing an email. I got the attachment as a base64 string and extracted it via re.search. It turned out there was a strange additional substring at the end.

dHJhaWxlcgo8PCAvU2l6ZSAxNSAvUm9vdCAxIDAgUiAvSW5mbyAyIDAgUgovSUQgWyhcMDAyXDMz
MHtPcFwyNTZbezU/VzheXDM0MXFcMzExKShcMDAyXDMzMHtPcFwyNTZbezU/VzheXDM0MXFcMzEx
KV0KPj4Kc3RhcnR4cmVmCjY3MDEKJSVFT0YK

--_=ic0008m4wtZ4TqBFd+sXC8--

When I deleted --_=ic0008m4wtZ4TqBFd+sXC8-- and stripped the string, the parsing worked.

So my advice is to make sure that you are decoding a correct base64 string.
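
One possible way to enforce that, sketched under the assumption that the payload arrives as text with one base64 run per line: keep only the lines that consist entirely of base64 characters, so a trailing MIME boundary is dropped before decoding. The function name decode_base64_lines and the sample boundary string are made up for this example.

```python
import base64
import re

# A whole line of standard-alphabet digits with at most two '=' at the end.
_B64_LINE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def decode_base64_lines(text):
    """Decode only the lines that look like pure base64, skipping everything else."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if _B64_LINE.fullmatch(line):
            kept.append(line)
    return base64.b64decode("".join(kept))


payload = base64.encodebytes(b"x" * 100).decode("ascii")
payload += "\n--_=example+boundary--\n"  # stray trailer, like the one in the email
print(decode_base64_lines(payload) == b"x" * 100)
```

A stray line made up solely of base64 characters would slip through this filter, so it is a heuristic and not a substitute for parsing the message properly.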

#12


0  

You should use

base64.b64decode(b64_string, ' /')

By default, the altchars are '+/'.
