I would like to build a word cloud with R (I have done so with the package wordcloud) and then color specific words a certain color. Currently the behavior of the function is to color words according to frequency (which can be useful) but word size already does this so I'd want to use color for additional meaning.
Any idea on how to color specific words in wordcloud? (If there's another wordcloud function in R I'm unaware of I'm more than willing to go that route.)
A mock example and my attempt (I tried to treat the color argument the same way I would for a regular plot made with the plot function):
library(wordcloud)
x <- paste(rep("how do keep the two words as one chunk in the word cloud", 3),
collapse = " ")
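# word frequencies: Var1 = word, Freq = count
X <- data.frame(table(strsplit(x, " ")))
# red for the words of interest, black for everything else
COL <- ifelse(X$Var1 %in% c("word", "cloud", "words"), "red", "black")
wordcloud(X$Var1, X$Freq, colors=COL)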
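In the non-ordered branch of the function reproduced in the answer below (cc <- ceiling(nc*normedFreq[i])), a word's color is picked by its frequency bin, not by its position in the colors vector, which would explain why passing COL on its own has no per-word effect. A minimal sketch of that mapping, assuming the installed package behaves like that copied code; freq_bin_colors() is a hypothetical name used only for illustration:

```r
# Hypothetical helper: reproduces the frequency-bin color choice used when
# ordered.colors = FALSE. Frequencies are scaled to (0, 1] and split into
# length(colors) bins, so colors[i] is NOT tied to the i-th word.
freq_bin_colors <- function(freq, colors) {
  nc <- length(colors)
  normedFreq <- freq / max(freq)      # most frequent word gets 1
  colors[ceiling(nc * normedFreq)]    # bin index in 1..nc
}

freq_bin_colors(c(3, 6, 9), c("black", "red"))
```

With two colors, only the frequency decides the result: the two more frequent words both land in the "red" bin.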
EDIT: The new version of wordcloud (Jan 10, 2010; version 2.0) now has this feature, along with some other terrific additions [thank you Ian Fellows & David Robinson]. Here is the code to accomplish the original goal with wordcloud:
# ordered.colors=TRUE applies COL[i] to the i-th word instead of coloring by frequency
wordcloud(X$Var1, X$Freq, colors=COL, ordered.colors=TRUE, random.color=FALSE)
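The ifelse() above handles one highlighted group. For several groups, one option is a named lookup vector that maps words to colors, with a default for everything else. This is a sketch: word_colors() and the lookup contents are hypothetical, and the wordcloud call is left commented out so the block runs without the package:

```r
# Hypothetical helper (not part of wordcloud): map each word to a color via a
# named lookup vector; words missing from the lookup get `default`.
word_colors <- function(words, lookup, default = "black") {
  words <- as.character(words)      # table() produces factors; use the labels
  cols <- unname(lookup[words])     # exact name matching; NA when absent
  cols[is.na(cols)] <- default
  cols
}

lookup <- c(word = "red", words = "red", cloud = "blue", chunk = "darkgreen")
word_colors(c("the", "word", "cloud", "chunk"), lookup)

# With the data frame X from the question:
# wordcloud(X$Var1, X$Freq, colors = word_colors(X$Var1, lookup),
#           ordered.colors = TRUE, random.color = FALSE)
```

Because the result has one entry per word, in the same order as the words, it fits the "vector of the same length as words" form that ordered.colors expects.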
Answer
EDIT: As mentioned in the comments, the feature described below has since been added to the wordcloud package.
My approach was to take the R function's code and customize it. It required changing only a few lines, and the function can now take either a single color or a vector of colors of the same length as the words vector.
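The color rule just described (one color for every word, or exactly one color per word) can be isolated in a small helper. This is a sketch; recycle_colors() is a hypothetical name, and unlike a bare length check it also expands a single color so it can later be subset in step with the words:

```r
# Hypothetical helper: accept a single color (recycled to every word) or a
# vector with exactly one color per word; anything else is an error.
recycle_colors <- function(colors, words) {
  n <- length(words)
  if (length(colors) == 1) return(rep(colors, n))
  if (length(colors) != n)
    stop("Length of colors does not match length of words vector")
  colors
}

recycle_colors("black", c("a", "b", "c"))
recycle_colors(c("red", "blue"), c("a", "b"))
```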
library(wordcloud)
colored.wordcloud <- function(words,freq,scale=c(4,.5),min.freq=3,max.words=Inf,random.order=TRUE,random.color=FALSE,
rot.per=.1,colors="black",ordered.colors=FALSE,use.r.layout=FALSE,...) {
tails <- "g|j|p|q|y"
last <- 1
nc<- length(colors)
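if (ordered.colors) {
if (length(colors) != 1 && length(colors) != length(words)) {
stop(paste("Length of colors does not match length of words",
"vector"))
}
# recycle a single color so it can be subset alongside words below
if (length(colors) == 1) colors <- rep(colors, length(words))
}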
overlap <- function(x1, y1, sw1, sh1) {
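# .overlap is an internal (unexported) helper in some wordcloud versions, so it
# cannot be called by name from user code; fall back to the R layout if absent
if(!use.r.layout && exists(".overlap", envir=asNamespace("wordcloud"), inherits=FALSE))
return(get(".overlap", envir=asNamespace("wordcloud"))(x1,y1,sw1,sh1,boxes))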
s <- 0
if (length(boxes) == 0)
return(FALSE)
for (i in c(last,1:length(boxes))) {
bnds <- boxes[[i]]
x2 <- bnds[1]
y2 <- bnds[2]
sw2 <- bnds[3]
sh2 <- bnds[4]
if (x1 < x2)
overlap <- x1 + sw1 > x2-s
else
overlap <- x2 + sw2 > x1-s
if (y1 < y2)
overlap <- overlap && (y1 + sh1 > y2-s)
else
overlap <- overlap && (y2 + sh2 > y1-s)
if(overlap){
last <<- i
return(TRUE)
}
}
FALSE
}
ord <- rank(-freq, ties.method = "random")
words <- words[ord<=max.words]
freq <- freq[ord<=max.words]
if (ordered.colors) {
colors <- colors[ord<=max.words]
}
if(random.order)
ord <- sample.int(length(words))
else
ord <- order(freq,decreasing=TRUE)
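words <- words[ord]
freq <- freq[ord]
if (ordered.colors) {
colors <- colors[ord]
}
# compute the mask once, before freq is filtered, so words, freq and
# colors are all filtered identically
keep <- freq>=min.freq
words <- words[keep]
freq <- freq[keep]
if (ordered.colors) {
colors <- colors[keep]
}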
thetaStep <- .1
rStep <- .05
plot.new()
op <- par("mar")
par(mar=c(0,0,0,0))
plot.window(c(0,1),c(0,1),asp=1)
normedFreq <- freq/max(freq)
size <- (scale[1]-scale[2])*normedFreq + scale[2]
boxes <- list()
for(i in 1:length(words)){
rotWord <- runif(1)<rot.per
r <-0
theta <- runif(1,0,2*pi)
x1<-.5
y1<-.5
wid <- strwidth(words[i],cex=size[i],...)
ht <- strheight(words[i],cex=size[i],...)
#mind your ps and qs
if(grepl(tails,words[i]))
ht <- ht + ht*.2
if(rotWord){
tmp <- ht
ht <- wid
wid <- tmp
}
isOverlaped <- TRUE
while(isOverlaped){
if(!overlap(x1-.5*wid,y1-.5*ht,wid,ht) &&
x1-.5*wid>0 && y1-.5*ht>0 &&
x1+.5*wid<1 && y1+.5*ht<1){
if (!random.color) {
if (ordered.colors) {
cc <- colors[i]
}
else {
cc <- ceiling(nc*normedFreq[i])
cc <- colors[cc]
}
} else {
cc <- colors[sample(1:nc,1)]
}
text(x1,y1,words[i],cex=size[i],offset=0,srt=rotWord*90,
col=cc,...)
#rect(x1-.5*wid,y1-.5*ht,x1+.5*wid,y1+.5*ht)
boxes[[length(boxes)+1]] <- c(x1-.5*wid,y1-.5*ht,wid,ht)
isOverlaped <- FALSE
}else{
if(r>sqrt(.5)){
warning(paste(words[i],
"could not be fit on page. It will not be plotted."))
isOverlaped <- FALSE
}
theta <- theta+thetaStep
r <- r + rStep*thetaStep/(2*pi)
x1 <- .5+r*cos(theta)
y1 <- .5+r*sin(theta)
}
}
}
par(mar=op)
invisible()
}
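The key invariant for ordered.colors=TRUE is that words, freq and colors go through the same permutation and the same frequency mask. If the mask is computed after freq has already been filtered, it is all TRUE and shorter than colors, so R recycles it and colors silently falls out of step with words. A standalone sketch of the aligned version; align_colors() is a hypothetical name:

```r
# Hypothetical helper: sort by decreasing frequency and drop rare words while
# keeping each word's color attached to it.
align_colors <- function(words, freq, colors, min.freq = 3) {
  ord <- order(freq, decreasing = TRUE)   # one permutation for all three vectors
  words <- words[ord]; freq <- freq[ord]; colors <- colors[ord]
  keep <- freq >= min.freq                # one mask, computed before filtering
  data.frame(word = words[keep], freq = freq[keep], color = colors[keep],
             stringsAsFactors = FALSE)
}

align_colors(c("a", "b", "c", "d"), c(2, 9, 5, 4),
             c("red", "blue", "green", "gold"))
```

Here "a" (frequency 2) is dropped together with its color "red", and the remaining rows keep their original word-color pairs.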
Some code to try it out:
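colors <- c("blue", "red", "orange", "green")
# each word is its own color name; ordered.colors=TRUE applies colors[i] to words[i]
colored.wordcloud(colors, c(10, 5, 3, 9), colors=colors, ordered.colors=TRUE)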
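For reference, the placement loop in colored.wordcloud() tries candidate centers along an Archimedean spiral. It starts at (0.5, 0.5), advances theta by thetaStep on each attempt, and grows r by rStep per full turn, giving up once r exceeds sqrt(.5). A sketch that lists those candidate positions in isolation; spiral_candidates() is a hypothetical name, and the real loop starts from a random theta:

```r
# Hypothetical helper: the first n candidate centers the placement loop would
# try for one word, starting from angle theta0.
spiral_candidates <- function(n, theta0 = 0, thetaStep = 0.1, rStep = 0.05) {
  k <- 0:(n - 1)                               # attempt number (0 = centre)
  theta <- theta0 + thetaStep * k
  r <- rStep * thetaStep / (2 * pi) * k        # r grows by rStep per full turn
  cbind(x = 0.5 + r * cos(theta), y = 0.5 + r * sin(theta))
}

pts <- spiral_candidates(200)
head(pts, 3)
```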