Extracting the file extension from a string using a regular expression

Time: 2022-09-13 11:24:13

I have the following String:

"data:audio/mp3;base64,ABC..."

And I'm extracting the file extension (in this case "mp3") out of it.

The string varies according to the file type. Some examples:

"data:image/jpeg;base64,ABC..."
"data:image/png;base64,ABC..."
"data:audio/wav;base64,ABC..."
"data:audio/mp3;base64,ABC..."

Here's how I've done it:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    private static final String BASE64_HEADER_EXP = "^data:.+;base64,";

    private static final Pattern PATTERN_BASE64_HEADER = Pattern.compile(BASE64_HEADER_EXP);

    private String data;

    private String fileName;

    public String getFileName() {
        Matcher base64HeaderMatcher = PATTERN_BASE64_HEADER.matcher(data);
        return String.format("%s.%s", getFilenameWithoutExtension(), getExtension(base64HeaderMatcher));
    }

    private String getFilenameWithoutExtension() {
        return fileName.split("\\.")[0];
    }

    private String getExtension(Matcher base64HeaderMatcher) {
        if (base64HeaderMatcher.find()) {
            String base64Header = base64HeaderMatcher.group(0);
            return base64Header.split("/")[1].split(";")[0];
        }
        return fileName.split("\\.")[1];
    }

}

What I want is a way to do this without having to split and access array positions as I'm doing above, perhaps by extracting the extension with a regular expression.

I'm able to do it on the RegExr site using this expression:

(?<=^data:.*/)(.*)(?=;)

But when I try to use the same regex in Java, I get the error "Require that the characters immediately before the position do" because, apparently, Java doesn't support repetition inside lookbehind.

3 solutions

#1


2  

How about using capturing groups?

private static final String BASE64_HEADER_EXP = "^data:[^/]+/([^;]+);base64,";

This way you can call base64HeaderMatcher.group(1), once find() has returned true, and get the file type.
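
As a rough illustration of the capturing-group approach, here is a minimal, self-contained sketch. The class name DataUriExtension and the method extensionOf are hypothetical names chosen for this example; only the pattern itself comes from the answer. It assumes the input is a single data URI string and returns an empty Optional when the header is missing.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question's code.
final class DataUriExtension {

    // Group 1 captures everything between the "/" and the ";" of the header.
    private static final Pattern PATTERN_BASE64_HEADER =
            Pattern.compile("^data:[^/]+/([^;]+);base64,");

    // Returns the captured subtype, or an empty Optional if no header is found.
    static Optional<String> extensionOf(String data) {
        Matcher matcher = PATTERN_BASE64_HEADER.matcher(data);
        // group(1) may only be read after find() has returned true.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("data:audio/mp3;base64,ABC...").orElse("(none)"));
        System.out.println(extensionOf("not a data URI").orElse("(none)"));
    }
}
```

Because the character classes [^/] and [^;] cannot run past their delimiters, no lookbehind is needed, and the no-match case is handled explicitly rather than by indexing into a split array.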

#2


0  

This should do it for the examples you gave:

(?<=data:)(?:[A-Za-z]+)/(.*?);

Explanation:

Positive lookbehind

(?<=data:)

Non-capturing group to account for image, audio, etc.

(?:[A-Za-z]+)

Match / literally, capture group for the file extension, match ; literally

/(.*?);
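
A possible way to use this pattern from Java is sketched below. The class name LookbehindExtension and the method extensionOf are hypothetical, and the letter class is written out as [A-Za-z] in this sketch. The lookbehind here has a fixed length ("data:"), so it should compile in Java, unlike a lookbehind containing .*.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class used only to illustrate the pattern.
final class LookbehindExtension {

    // Fixed-length lookbehind, a non-capturing group for the media type,
    // then group 1 for the text between "/" and the first ";".
    private static final Pattern PATTERN =
            Pattern.compile("(?<=data:)(?:[A-Za-z]+)/(.*?);");

    // Returns the captured extension, or null when the pattern is not found.
    static String extensionOf(String data) {
        Matcher matcher = PATTERN.matcher(data);
        // group(0) would be the whole match (e.g. "image/png;"), so read group(1).
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("data:image/png;base64,ABC..."));
        System.out.println(extensionOf("data:audio/wav;base64,ABC..."));
    }
}
```

Note that the full match still includes the media type and the trailing semicolon, so the extension has to be read from the capturing group rather than from the match as a whole.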

#3


0  

"Strings in Java have built-in support for regular expressions. Strings have four built-in methods for regular expressions, i.e., the matches(), split(), replaceFirst() and replaceAll() methods." - http://www.vogella.com/tutorials/JavaRegularExpressions/article.html

Using this info, we can quickly write a regex and test it against our string.

// In a regex, each pair of parentheses is a capturing group that can later be
// referenced in the replacement string as $1, $2, etc.
// The regex below breaks the string into four groups.

String pattern = "(^data:)(\\w+?/)(\\w+?)(;.*$)";

// Group 1
// Matches the start of the input (^) followed by "data:".

// Group 2
// Matches one or more word characters (\\w+) followed by a "/".
// The "?" after the "+" makes the quantifier reluctant: it matches as few
// characters as possible. This group effectively captures a series of
// word characters followed by a slash.

// Group 3
// This is the group we want; we reference it with $3.
// It matches one or more word characters, reluctantly.

// Group 4
// Captures the rest of the string, including the ";".


// Now extract the extension from this test string.

String test = "data:image/jpeg;base64,ABC...";
String testExtension = "";

// Replace the contents of testExtension with the third capturing group of
// our regex pattern applied to the test string:

testExtension = test.replaceAll(pattern, "$3");

// This invokes the String class's replaceAll() method.

// testExtension should now contain "jpeg".
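
One thing to watch with the replaceAll() approach: when the pattern does not match, replaceAll() returns the input unchanged, so the "extension" would be the whole original string. The sketch below guards against that. The class name ReplaceAllExtension, the method extensionOf, and the choice to return an empty string on no match are assumptions made for illustration.

```java
// Hypothetical helper class wrapping the replaceAll() technique.
final class ReplaceAllExtension {

    // Same four-group pattern as in the answer above.
    private static final String PATTERN = "(^data:)(\\w+?/)(\\w+?)(;.*$)";

    // Returns the third group, or an empty string when the input does not match.
    static String extensionOf(String data) {
        // replaceAll() leaves non-matching input untouched, so check first.
        if (!data.matches(PATTERN)) {
            return "";
        }
        return data.replaceAll(PATTERN, "$3");
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("data:image/jpeg;base64,ABC..."));
        // Without the guard, this call would hand back "hello.txt" itself.
        System.out.println("[" + extensionOf("hello.txt") + "]");
    }
}
```

Both matches() and replaceAll() compile the pattern on every call, so for repeated use a precompiled Pattern is likely the better choice.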
