S3 URL: get the bucket name and path

Date: 2021-12-25 10:44:51

I have a variable that holds an AWS S3 URL:

s3://bucket_name/folder1/folder2/file1.json

I want to get the bucket_name in one variable and the rest, i.e. /folder1/folder2/file1.json, in another variable. I tried regular expressions and could get the bucket_name as shown below, but I'm not sure if there is a better way.

import re
m = re.search(r'(?<=s3://)[^/]+', 's3://bucket_name/folder1/folder2/file1.json')
print(m.group(0))

How do I get the rest, i.e. folder1/folder2/file1.json?

I checked whether boto3 has a feature to extract the bucket_name and key from the URL, but couldn't find one.

4 answers

#1


15  

Since it's just a normal URL, you can use urlparse to get all the parts of the URL.

>>> from urlparse import urlparse
>>> o = urlparse('s3://bucket_name/folder1/folder2/file1.json')
>>> o
ParseResult(scheme='s3', netloc='bucket_name', path='/folder1/folder2/file1.json', params='', query='', fragment='')
>>> o.netloc
'bucket_name'
>>> o.path
'/folder1/folder2/file1.json'

You may have to remove the leading slash from the key, as the next answer suggests.

o.path.lstrip('/')

In Python 3, urlparse moved to urllib.parse, so use:

from urllib.parse import urlparse
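
Putting the pieces of this answer together, a minimal Python 3 sketch might look like the following. The helper name parse_s3_url and the scheme check are illustrative additions, not part of the original answer. Note that urlparse also splits off anything after a ? or # as query or fragment, so a key containing those characters would not come back intact with this approach.

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split an s3:// URL into (bucket, key). Hypothetical helper name."""
    parsed = urlparse(url)
    # Reject anything that is not s3://<bucket>/...
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # The path starts with '/', which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = parse_s3_url("s3://bucket_name/folder1/folder2/file1.json")
print(bucket)  # bucket_name
print(key)     # folder1/folder2/file1.json
```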

#2


4  

For those who, like me, were trying to use urlparse to extract the key and bucket in order to create an object with boto3, there's one important detail: remove the slash from the beginning of the key.

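import boto3
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

o = urlparse('s3://bucket_name/folder1/folder2/file1.json')
bucket = o.netloc
# Strip the leading slash so the key matches the intended object name
key = o.path.lstrip('/')
client = boto3.client('s3')
client.put_object(Body='test', Bucket=bucket, Key=key)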

It took me a while to realize this, because boto3 doesn't throw any exception.

#3


2  

If you want to do it with regular expressions, you can do the following:

>>> import re
>>> uri = 's3://my-bucket/my-folder/my-object.png'
>>> match = re.match(r's3:\/\/(.+?)\/(.+)', uri)
>>> match.group(1)
'my-bucket'
>>> match.group(2)
'my-folder/my-object.png'

This has the advantage that you can check for the s3 scheme rather than allowing anything there.
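
As a rough sketch of that scheme check, the same idea can be wrapped in a function that fails loudly when the input is not an s3:// URI, since re.match returns None on a mismatch and calling .group on it would raise an AttributeError. The function name split_s3_uri and the slightly stricter pattern ([^/]+ for the bucket) are assumptions made for illustration.

```python
import re

# Bucket: everything up to the first '/'; key: everything after it.
S3_URI = re.compile(r"s3://([^/]+)/(.+)")


def split_s3_uri(uri):
    """Return (bucket, key), or raise if uri is not s3://bucket/key."""
    match = S3_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return match.group(1), match.group(2)


bucket, key = split_s3_uri("s3://my-bucket/my-folder/my-object.png")
print(bucket)  # my-bucket
print(key)     # my-folder/my-object.png
```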

#4


2  

A solution that works without urllib or re (and also handles the leading slash):

def split_s3_path(s3_path):
    path_parts=s3_path.replace("s3://","").split("/")
    bucket=path_parts.pop(0)
    key="/".join(path_parts)
    return bucket, key

To run:

bucket, key = split_s3_path("s3://my-bucket/some_folder/another_folder/my_file.txt")

Returns:

bucket: my-bucket
key: some_folder/another_folder/my_file.txt
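
A possible variation on the same idea, sketched here with str.partition: it removes the s3:// prefix only when it appears at the start of the string (replace would also remove it from the middle of a key) and splits on the first slash only. The function name split_s3_path_partition is hypothetical.

```python
def split_s3_path_partition(s3_path):
    """Split 's3://bucket/key' into (bucket, key) without urllib or re."""
    prefix = "s3://"
    # Only drop the scheme if it is actually a prefix of the string.
    if s3_path.startswith(prefix):
        s3_path = s3_path[len(prefix):]
    # partition splits on the first '/' only; key is '' if there is none.
    bucket, _, key = s3_path.partition("/")
    return bucket, key


bucket, key = split_s3_path_partition(
    "s3://my-bucket/some_folder/another_folder/my_file.txt"
)
print("bucket:", bucket)  # bucket: my-bucket
print("key:", key)        # key: some_folder/another_folder/my_file.txt
```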
