How to read a DNA sequence from a text file and store it in an array in C?

How can I read a DNA sequence from a text file in C, store it in an array, and extract all substrings of a given length starting from each nucleotide position?

For example, the sequence appears in the text file as follows:

cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat

I need all the substrings from every starting position.

If the substring length is 3, the result would be:

cct, ctg, tga, gat, ..., cat

1 solution

#1

Is C compulsory for you?

I would move to a higher-level language such as Python, where this function would do:

def iterate_fragments(sequence, size):
    """Take a string and yield every substring of the given size."""
    # Slicing never raises IndexError, so stop at the last full-length window.
    for start in range(len(sequence) - size + 1):
        yield sequence[start:start + size]

# dna_sequence is assumed to already hold the sequence read from the file.
for fragment in iterate_fragments(dna_sequence, 3):
    print(fragment)

This simple code prints each DNA fragment of 3 nucleotides, one per starting position.
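The snippet above takes the sequence as already loaded, so here is a rough sketch of the file-reading step as well. The helper names `read_sequence` and `extract_substrings` are made up for illustration, and the temporary file only stands in for your real text file. I am also assuming the file contains nothing but the sequence, possibly wrapped over several lines.

```python
import os
import tempfile


def read_sequence(path):
    """Read a DNA sequence from a text file, joining lines and dropping whitespace."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle)


def extract_substrings(sequence, size):
    """Return every substring of length `size`, one per starting position."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


# Demo: write the example sequence to a temporary file (a stand-in for the
# real input file), then read it back.
example = "cctgatagacgctatctggctatccaggtacttaggtcctctgtgcgaatctatgcgtttccaaccat"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(example + "\n")

dna_sequence = read_sequence(tmp.name)
os.remove(tmp.name)

fragments = extract_substrings(dna_sequence, 3)
print(fragments[:4], "...", fragments[-1])
```

With your own data you would call `read_sequence` on the path of your file instead of the temporary one. If the file has a header line (FASTA style, for instance), that line would need to be skipped first.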
