I have the following String:
"data:audio/mp3;base64,ABC..."
And I'm extracting the file extension (in this case "mp3") out of it.
The String varies according to the file type. Some examples:
"..."
"..."
"data:audio/wav;base64,ABC..."
"data:audio/mp3;base64,ABC..."
Here's how I've done it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    private static final String BASE64_HEADER_EXP = "^data:.+;base64,";
    private static final Pattern PATTERN_BASE64_HEADER = Pattern.compile(BASE64_HEADER_EXP);

    private String data;
    private String fileName;

    public String getFileName() {
        Matcher base64HeaderMatcher = PATTERN_BASE64_HEADER.matcher(data);
        return String.format("%s.%s", getFilenameWithoutExtension(), getExtension(base64HeaderMatcher));
    }

    private String getFilenameWithoutExtension() {
        return fileName.split("\\.")[0];
    }

    private String getExtension(Matcher base64HeaderMatcher) {
        // Prefer the subtype from the data header, e.g. "mp3" from "data:audio/mp3;base64,"
        if (base64HeaderMatcher.find()) {
            String base64Header = base64HeaderMatcher.group(0);
            return base64Header.split("/")[1].split(";")[0];
        }
        // Otherwise fall back to the extension of the original file name
        return fileName.split("\\.")[1];
    }
}
What I want is a way to do it without having to split and access array positions like I'm doing above. Maybe the extension could be extracted with a regular expression.
I'm able to do it on the RegExr site using this expression:
(?<=^data:.*/)(.*)(?=;)
But when I try to use the same regex in Java, I get the error "Require that the characters immediately before the position do" because, apparently, Java doesn't support repetition such as .* inside a lookbehind.
3 solutions
#1
2
How about using capturing groups?
private static final String BASE64_HEADER_EXP = "^data:[^/]+/([^;]+);base64,";
This way you can call base64HeaderMatcher.group(1) after a successful find() and get the file type.
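As a minimal sketch of how that capture group could be read, the snippet below wraps the expression from this answer in a small helper. The class name DataUriSubtype and the method extensionOf are hypothetical names chosen for illustration, and returning an Optional is just one way to signal "no header found".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DataUriSubtype {
    // Same expression as in the answer: group 1 is the text between "/" and ";"
    private static final Pattern HEADER = Pattern.compile("^data:[^/]+/([^;]+);base64,");

    // Hypothetical helper: returns the subtype, or empty if the input has no such header
    static Optional<String> extensionOf(String data) {
        Matcher m = HEADER.matcher(data);
        // group(1) may only be read after find() has succeeded
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("data:audio/mp3;base64,ABC...").orElse("none")); // mp3
        System.out.println(extensionOf("data:audio/wav;base64,ABC...").orElse("none")); // wav
        System.out.println(extensionOf("no header here").orElse("none"));               // none
    }
}
```

No split() calls or array indexing are needed, and an input without a header yields an empty Optional instead of an exception.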
#2
0
This should do it for the examples you gave:
(?<=data:)(?:[A-Za-z]+)/(.*?);
Explanation:
Positive look-behind
(?<=data:)
Non-capturing group to account for image, audio, etc. (written as [A-Za-z] because [A-z] would also match the punctuation characters that sit between Z and a)
(?:[A-Za-z]+)
Match / literally, capture group for the file extension, match ; literally. The extension ends up in group 1.
/(.*?);
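A short sketch of this pattern in Java, with the letter class spelled out as [A-Za-z] (an assumption about what the type part was meant to match). The class name LookBehindDemo and the method match are hypothetical. The point it illustrates is that the whole match still contains the type, the slash and the semicolon, so the extension has to be read from group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookBehindDemo {
    // The look-behind "data:" has a fixed length, which Java accepts
    static final Pattern P = Pattern.compile("(?<=data:)(?:[A-Za-z]+)/(.*?);");

    // Hypothetical helper: returns {whole match, group 1}, or null if nothing matched
    static String[] match(String data) {
        Matcher m = P.matcher(data);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] result = match("data:audio/wav;base64,ABC...");
        System.out.println(result[0]); // audio/wav;
        System.out.println(result[1]); // wav
    }
}
```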
#3
0
"Strings in Java have built-in support for regular expressions. Strings have four built-in methods for regular expressions, i.e., the matches(), split(), replaceFirst() and replaceAll() methods." - http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
Using this info, we can quickly make a regex and test it against our string.
//In a regex, each set of () represents a capture field which can later be
//referenced with $1, $2, etc.
//The regex below breaks the string into four fields
String pattern = "(^data:)(\\w+?/)(\\w+?)(;.*$)";
//First field
//This field matches the start of the input (^) followed by "data:"
//Second field
//This matches one or more (+) word characters (\\w) followed by a "/".
//The "?" after the + means match reluctantly, i.e. match as few
//characters as possible. This field effectively captures a series of
//letters followed by a slash
//Third field
//This is the field we want to capture; we reference it with $3.
//It matches one or more word characters (\\w), reluctantly
//Fourth field
//This captures the rest of the string, including the ";"
//Now to extract the extension from this test string
String test = "...";
String testExtension = "";
//Replace the contents of testExtension with the 3rd capture field of
//our regex pattern applied to our test string, like so
testExtension = test.replaceAll(pattern, "$3");
//This invokes the String class's replaceAll() method.
//And now our string testExtension should contain "jpeg"
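One caveat with this approach is that replaceAll() returns the input unchanged when the pattern does not match, so a string without a data header would come back whole instead of as an extension. A possible guard, sketched below, is to check matches() first. The class name ReplaceAllDemo and the method extensionOrNull are hypothetical, and the sample input is the mp3 string from the question.

```java
final class ReplaceAllDemo {
    // Same four-field pattern as above; field 3 is the extension
    static final String PATTERN = "(^data:)(\\w+?/)(\\w+?)(;.*$)";

    // Hypothetical helper: returns the extension, or null if the input has no data header
    static String extensionOrNull(String data) {
        // matches() requires the whole input to match, which is what this pattern is written for
        if (!data.matches(PATTERN)) {
            return null;
        }
        return data.replaceAll(PATTERN, "$3");
    }

    public static void main(String[] args) {
        System.out.println(extensionOrNull("data:audio/mp3;base64,ABC...")); // mp3
        System.out.println(extensionOrNull("plain text"));                   // null
        // Without the guard, the unmatched input is returned as-is
        System.out.println("plain text".replaceAll(PATTERN, "$3"));          // plain text
    }
}
```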