I am looking for a way to extract a filename and extension from a particular URL using Python.
Let's say a URL looks as follows:
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
How would I go about getting the following?
filename = "da4ca3509a7b11e19e4a12313813ffc0_7"
file_ext = ".jpg"
6 solutions
#1
30
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
from os.path import splitext, basename
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
disassembled = urlparse(picture_page)
filename, file_ext = splitext(basename(disassembled.path))
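Because basename() drops everything up to the last /, filename comes out as da4ca3509a7b11e19e4a12313813ffc0_7 with no leading slash. Only if you call splitext() on disassembled.path directly will the filename keep a preceding /, which you can always remove yourself.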
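A quick check of what basename() contributes here, using the question's URL (Python 3 is assumed): without it, the leading / from the URL path stays attached to the name.

```python
from os.path import basename, splitext
from urllib.parse import urlparse

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
path = urlparse(picture_page).path  # the path component, starting with '/'

# Without basename(): the leading '/' stays in the name part.
without_basename = splitext(path)
# With basename(): the directory part (here just '/') is dropped first.
with_basename = splitext(basename(path))

print(without_basename)  # ('/da4ca3509a7b11e19e4a12313813ffc0_7', '.jpg')
print(with_basename)     # ('da4ca3509a7b11e19e4a12313813ffc0_7', '.jpg')
```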
#2
12
Try urlsplit (urlparse.urlsplit in Python 2, urllib.parse.urlsplit in Python 3) to split the URL, and then os.path.splitext to retrieve the filename and extension (use os.path.basename to keep only the last path component):
import os.path
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
print(os.path.splitext(os.path.basename(urlsplit(picture_page).path)))
# ('da4ca3509a7b11e19e4a12313813ffc0_7', '.jpg')
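One practical advantage of going through urlsplit is that the query string and fragment end up in separate fields, so they never leak into the extension. A small sketch with a made-up URL (the example.com address and its query are hypothetical), compared with a plain string split:

```python
import os.path
from urllib.parse import urlsplit

# Hypothetical URL with a query string and a fragment, for illustration only.
url = "http://example.com/images/photo_7.jpg?size=large#preview"

parts = urlsplit(url)
print(parts.path)   # /images/photo_7.jpg
print(parts.query)  # size=large

name, ext = os.path.splitext(os.path.basename(parts.path))
print(name, ext)    # photo_7 .jpg

# A naive split on the raw string keeps the query and fragment in the "extension".
naive_ext = "." + url.split(".")[-1]
print(naive_ext)    # .jpg?size=large#preview
```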
#3
10
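# Assumes picture_page is defined as in the question.
filename = picture_page.split('/')[-1].split('.')[0]  # last segment, up to its first '.'
file_ext = '.' + picture_page.split('.')[-1]  # '.' + text after the last '.' of the whole URL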
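This works for the question's URL, but the two lines split on different things (the first on the last path segment, the second on the whole URL). A sketch with two hypothetical URLs shows where that matters, next to a variant that hands the last segment to os.path.splitext instead:

```python
import os.path

def split_naive(url):
    """The split-based approach from this answer, wrapped in a function."""
    filename = url.split('/')[-1].split('.')[0]
    file_ext = '.' + url.split('.')[-1]
    return filename, file_ext

def split_with_splitext(url):
    """Same idea, but let os.path.splitext find the last dot of the last segment."""
    return os.path.splitext(url.split('/')[-1])

# Hypothetical URLs chosen to show the edge cases.
dotted = "http://example.com/files/report.final.pdf"
no_ext = "http://example.com/download"

print(split_naive(dotted))          # ('report', '.pdf')  '.final' is lost
print(split_with_splitext(dotted))  # ('report.final', '.pdf')
print(split_naive(no_ext))          # ('download', '.com/download')
print(split_with_splitext(no_ext))  # ('download', '')
```

Neither variant strips a ?query or #fragment; that still needs urlparse/urlsplit.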
#4
4
# Here's your link:
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
#Here's your filename and ext:
filename, ext = (picture_page.split('/')[-1].split('.'))
When you call picture_page.split('/'), it returns a list of the pieces of your URL that are separated by /. If you know Python list indexing, you'll know that index -1 gives you the last element of the list. In your case, that is the filename: da4ca3509a7b11e19e4a12313813ffc0_7.jpg
Splitting that on the delimiter . gives two values, da4ca3509a7b11e19e4a12313813ffc0_7 and jpg, as expected, because they are separated by the period you used as the delimiter in your split() call.
Since this last split returns exactly two values, you can unpack them straight into two variables. The result is effectively:
filename,ext = ('da4ca3509a7b11e19e4a12313813ffc0_7', 'jpg')
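The two-name unpacking only succeeds when the last segment contains exactly one period; with zero or two or more it raises ValueError. A minimal sketch of a more forgiving variant that splits only on the last period with str.rsplit (the helper name name_and_ext and the example.com URLs are hypothetical):

```python
def name_and_ext(url):
    """Split the last path segment on its last '.', or return '' if there is none."""
    last = url.split('/')[-1]
    if '.' not in last:
        return last, ''
    filename, ext = last.rsplit('.', 1)  # maxsplit=1 always yields exactly two parts here
    return filename, ext

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
print(name_and_ext(picture_page))  # ('da4ca3509a7b11e19e4a12313813ffc0_7', 'jpg')

# The unpacking from the answer fails when there is more than one '.':
try:
    filename, ext = "archive.tar.gz".split('.')
except ValueError as err:
    print("ValueError:", err)

print(name_and_ext("http://example.com/archive.tar.gz"))  # ('archive.tar', 'gz')
print(name_and_ext("http://example.com/download"))        # ('download', '')
```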
#5
1
os.path.splitext will help you extract the filename and extension once you have extracted the relevant string (the path) from the URL using urlparse:
import os.path
fName, ext = os.path.splitext('yourImage.jpg')  # ('yourImage', '.jpg')
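If you are on Python 3.4 or later, pathlib can do the same split on the path that urlparse extracts; this swaps in PurePosixPath in place of os.path.splitext, and is only a sketch (the url_path helper and the example.com URL are hypothetical). PurePosixPath is used because URL paths always use /, whatever the local OS.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def url_path(url):
    # URL paths always use '/', so PurePosixPath is the right flavour on any OS.
    return PurePosixPath(urlparse(url).path)

p = url_path("http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg")
print(p.stem, p.suffix)  # da4ca3509a7b11e19e4a12313813ffc0_7 .jpg

q = url_path("http://example.com/dist/archive.tar.gz?mirror=1")
print(q.stem, q.suffix)  # archive.tar .gz  (same split as os.path.splitext)
print(q.suffixes)        # ['.tar', '.gz']
```

stem and suffix match what os.path.splitext(basename(...)) returns; suffixes is handy when you care about multi-part extensions such as .tar.gz.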
#6
-2
>>> import re
>>> s = 'picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"'
>>> re.findall(r'\/([a-zA-Z0-9_]*)\.[a-zA-Z]*\"$',s)[0]
'da4ca3509a7b11e19e4a12313813ffc0_7'
>>> re.findall(r'([a-zA-Z]*)\"$',s)[0]
'jpg'
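The regexes above are written against the whole assignment line, including the closing quote. A sketch of the same idea applied to the URL string itself, where the last path segment is split at its final period and an optional ?query or #fragment is allowed after it (the pattern, the helper name regex_name_ext and the example.com URLs are hypothetical, not from the answer):

```python
import re

# Last path segment split into (name, extension); an optional ?query or #fragment
# may follow. The character classes stop at '/', '?' and '#'.
LAST_SEGMENT = re.compile(r'([^/?#]+)\.([A-Za-z0-9]+)(?:[?#].*)?$')

def regex_name_ext(url):
    m = LAST_SEGMENT.search(url)
    return (m.group(1), m.group(2)) if m else None

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
print(regex_name_ext(picture_page))                          # ('da4ca3509a7b11e19e4a12313813ffc0_7', 'jpg')
print(regex_name_ext("http://example.com/cat.png?w=200"))   # ('cat', 'png')
print(regex_name_ext("http://example.com/download"))        # None
```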