I have the following string: SL2.40ch12:53884872-53885197.
I would like to assign SL2.40ch12 to $chromosome, 53884872 to $start and 53885197 to $end. What's an efficient way to do this using a regular expression?
Here's how I tried doing it but my regex is off.
my $string = SL2.40ch12:53884872-53885197
my $chromosome =~ /^*\:$/
my $start =~ /^+d\-$/
my $end =~ /^-+d\/
thanks
3 solutions
#1
2
For that particular string, you can do something simple like this:
my $string = "SL2.40ch12:53884872-53885197";
my ($chr, $start, $end) = split /[:-]/, $string, 3;
If you want it a little stricter, do the two splits separately:
my ($chr, $range) = split /:/, $string, 2;
my ($start, $end) = split /-/, $range;
This is, of course, assuming that you will not have colons or dashes appearing elsewhere in your data.
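If that assumption might not hold, here is a rough sketch of one way to guard against it. The input string below is made up for illustration (it is not from the question), and cutting at the last colon is just one possible choice:

```perl
use strict;
use warnings;

# Hypothetical input: the name before the last colon contains ':' and '-'.
my $string = "scaffold-7:alt:100-200";

# Cut at the LAST colon, so the whole prefix stays in the chromosome name.
my $pos = rindex($string, ':');
die "No colon in '$string'\n" if $pos < 0;
my $chr   = substr($string, 0, $pos);
my $range = substr($string, $pos + 1);

# The range must be exactly two fields around a single dash.
my @parts = split /-/, $range, -1;
die "Bad range '$range'\n" unless @parts == 2;
my ($start, $end) = @parts;

print "$chr $start $end\n";
```

This still trusts that the two range fields are numbers; add a check such as /^\d+$/ on each if your data needs it.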
#2
1
Here is a regex that may do what you want:
# Bind the match to $string; in list context it returns the three captures.
my ($chromosome, $start, $end) = $string =~ /^(.*):(.*)-(.*)$/;
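A slightly stricter variant, as a sketch: it assumes the coordinates are always plain digits and that the chromosome name never contains a colon, neither of which the question actually states. It also checks that the match succeeded before using the captures.

```perl
use strict;
use warnings;

my $string = "SL2.40ch12:53884872-53885197";

my ($chromosome, $start, $end);
if ($string =~ /^([^:]+):(\d+)-(\d+)$/) {
    # $1..$3 are only meaningful after a successful match.
    ($chromosome, $start, $end) = ($1, $2, $3);
} else {
    die "Unexpected format: $string\n";
}

print "$chromosome $start $end\n";
```

Using [^:]+ and \d+ instead of .* means malformed input fails loudly rather than being split in a surprising place.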
#3
0
I'm not really familiar with Perl, but if it uses common regexp syntax, then your $start and $chromosome lines have a mistake. '$' means end-of-line, so those patterns will try to find the colon or the dash at the very end of the line.
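For what it's worth, here is a sketch of what the three lines from the question could look like with the anchors in the right places. It keeps the one-pattern-per-variable shape of the original attempt and assumes the coordinates are plain digits:

```perl
use strict;
use warnings;

my $string = "SL2.40ch12:53884872-53885197";

# '^' anchors at the start: everything up to the first colon.
my ($chromosome) = $string =~ /^([^:]+):/;

# No anchors: the digits sitting between the colon and the dash.
my ($start) = $string =~ /:(\d+)-/;

# '$' anchors at the end: the digits after the dash.
my ($end) = $string =~ /-(\d+)$/;

print "$chromosome $start $end\n";
```

Note that the string is on the left of =~ and the variable receives the capture in list context; the original attempt applied =~ to the empty variables instead.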