I have this data frame, mydf. The nucleotide column can contain the letters 'A', 'T', 'G', and 'C'. If the strand column is '-', I want to change A to T, C to G, G to C, and T to A. How do I do it?
mydf <- structure(list(seqnames = structure(c(1L, 1L, 1L, 1L), .Label = c("chr1",
"chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
"chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
"chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX",
"chrY", "chrM"), class = "factor"), pos = c(115258748, 115258748,
115258748, 115258748), strand = structure(c(1L, 2L, 1L, 2L), .Label = c("+",
"-", "*"), class = "factor"), nucleotide = structure(c(2L, 2L,
2L, 2L), .Label = c("A", "C", "G", "T", "N", "=", "-"), class = "factor")), .Names = c("seqnames",
"pos", "strand", "nucleotide"), row.names = c(NA, 4L), class = "data.frame")
Desired result:
seqnames pos strand nucleotide
1 chr1 115258748 + C
2 chr1 115258748 - G
3 chr1 115258748 + C
4 chr1 115258748 - G
1 Answer
For one-to-one character translation, you can use chartr().
within(mydf, {
  # complement the nucleotide only on rows where the strand is "-"
  nucleotide[strand == "-"] <- chartr("ACGT", "TGCA", nucleotide[strand == "-"])
})
# seqnames pos strand nucleotide
# 1 chr1 115258748 + C
# 2 chr1 115258748 - G
# 3 chr1 115258748 + C
# 4 chr1 115258748 - G
Note that I used within() here to avoid writing mydf$ four times and to avoid changing the original data: within() returns a modified copy and leaves mydf itself untouched. You can also write the following, but keep in mind that it will modify the original data.
mydf$nucleotide[mydf$strand == "-"] <-
with(mydf, chartr("ACGT", "TGCA", nucleotide[strand == "-"]))
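To make the two points above concrete, here is a minimal sketch on a small made-up data frame (toy and toy_comp are hypothetical names, not part of the question's data). It assumes factor columns like those in mydf, and it illustrates that chartr() leaves letters outside the mapping (such as N) untouched and that the result of within() has to be assigned if you want to keep it.

```r
# Hypothetical toy data (not the mydf from the question), with factor columns
toy <- data.frame(
  strand     = factor(c("+", "-", "+", "-"), levels = c("+", "-", "*")),
  nucleotide = factor(c("A", "A", "G", "T"), levels = c("A", "C", "G", "T", "N"))
)

# chartr() maps each character of "ACGT" to the character at the same
# position in "TGCA"; characters not listed (here "N") are left as they are
chartr("ACGT", "TGCA", c("A", "C", "G", "T", "N"))
# [1] "T" "G" "C" "A" "N"

# within() returns a modified copy, so assign it to keep the result
toy_comp <- within(toy, {
  nucleotide[strand == "-"] <- chartr("ACGT", "TGCA", nucleotide[strand == "-"])
})

as.character(toy_comp$nucleotide)  # rows 2 and 4 are complemented
# [1] "A" "T" "G" "A"
as.character(toy$nucleotide)       # the original data frame is unchanged
# [1] "A" "A" "G" "T"
```

If the complemented values were not already among the factor levels, the assignment would produce NA with a warning, so with other data it may be safer to convert the column with as.character() first.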