I have a file containing a series of random A's, G's, C's and T's that looks like this:
>Mary
ACGTACGTACGTAC
>Jane
CCCGGCCCCTA
>Arthur
AAAAAAAAAAT
I took those letters and concatenated them to end up with ACGTACGTACGTACCCCGGCCCCTAAAAAAAAAAT. I now have a series of positions within that concatenated sequence that are of interest to me, and I want to find the names that correspond to those positions (coordinates). I'm using the Perl function length to calculate the length of each sequence, and then associating the cumulative length with the name in a hash. So far I have:
#! /usr/bin/perl -w
use strict;
my $seq_input = $ARGV[0];
my $coord_input = $ARGV[1];
my %idSeq; #Stores sequence and associated ID's.
open (my $INPUT, "<$seq_input") or die "unable to open $seq_input";
open (my $COORD, "<$coord_input") or die "unable to open $coord_input";
while (<$INPUT>) {
if ($_ = /^[AGCT/) {
$idSeq{$_
my $id = ( /^[>]/)
#put information into a hash
#loop through hash looking for coordinates that are lower than the cumulative length
foreach $id
$totallength = $totallength + length($seq)
$lengthId{$totalLength} = $id
foreach $position
foreach $length
if ($length >= $position) { print; last }
close $INPUT;
close $COORD;
print "Done!\n";
So far I'm having trouble reading the file into a hash. Also, would I need an array to print the hash?
1 solution
#1
It's not completely clear what you want; maybe this:
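my $seq = '';    # concatenated sequence read so far
my %idSeq;       # name => 0-based offset where that name's sequence starts
while ( my $line = <$INPUT> ) {
    if ( my ($name) = $line =~ /^>(.*)/ ) {
        # Header line: record where this name's sequence begins.
        $idSeq{$name} = length $seq;
    }
    else {
        # Sequence line: strip the newline and append it.
        chomp $line;
        $seq .= $line;
    }
}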
which produces:
$seq = 'ACGTACGTACGTACCCCGGCCCCTAAAAAAAAAAAT';
%idSeq = (
    'Mary'   => 0,
    'Jane'   => 14,
    'Arthur' => 25,
);
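To get from those start offsets back to a name for a given position, something like the following might work. It is only a sketch under a few assumptions: positions are 0-based (subtract 1 first if your coordinate file is 1-based), the sample data is inlined from the question rather than read from $ARGV[0], and name_at and @starts are made-up names for illustration. It keeps the offsets in an array of [name, start] pairs instead of a hash, because a hash does not preserve file order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the question; a real script would read a file.
my $fasta = <<'END';
>Mary
ACGTACGTACGTAC
>Jane
CCCGGCCCCTA
>Arthur
AAAAAAAAAAT
END

my @starts;      # [ name, 0-based start offset ] pairs, in file order
my $seq = '';    # concatenated sequence

# Read the sample through an in-memory filehandle.
open my $in, '<', \$fasta or die "cannot read sample data: $!";
while ( my $line = <$in> ) {
    chomp $line;
    if ( $line =~ /^>(.*)/ ) {
        # Header: this name starts at the current concatenated length.
        push @starts, [ $1, length $seq ];
    }
    else {
        $seq .= $line;
    }
}
close $in;

# Hypothetical helper: name whose sequence covers a 0-based position,
# or undef when the position lies outside the concatenated sequence.
sub name_at {
    my ($pos) = @_;
    return undef if $pos < 0 || $pos >= length $seq;
    my $name;
    for my $entry (@starts) {
        last if $entry->[1] > $pos;    # this sequence starts after $pos
        $name = $entry->[0];           # last start that is <= $pos
    }
    return $name;
}

for my $pos ( 0, 13, 14, 24, 25 ) {
    my $name = name_at($pos);
    printf "%d\t%s\n", $pos, defined $name ? $name : 'out of range';
}
```

With the sample data, positions 0 and 13 map to Mary, 14 and 24 to Jane, and 25 to Arthur. As for printing: no extra array is strictly needed to print a hash (you can loop over keys %idSeq), but the order would be arbitrary unless you sort, which is one reason the sketch uses an ordered array. For many positions and many sequences, a binary search over @starts would scale better than this linear scan.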