How does one find the degree centrality of nodes in a table like the following?
article   u1  u2  u3  u4  u5  u6  u7
1          1   1   1   0   0   0   0
2          0   1   0   1   1   0   0
3          1   0   0   1   0   1   1
This is just an example of my data; I have a very large file consisting of 1533 articles and about 52,000 users.
I want to find the centrality of the articles and the centrality of the users in this matrix.
1 solution
#1
Degree centrality simply counts the number of other nodes that each node is "connected" to. So to do this for users, for example, we have to define what it means to be connected to another user. The simplest approach asserts a connection if a user has at least one article in common with another user. A slightly more complex (and probably better) approach weights connectivity by the number of articles in common. So if user 1 has 10 articles in common with user 2 and 3 articles in common with user 3, we say that user 1 is "more connected" to user 2 than to user 3. In what follows, I'll use the latter approach.
This code creates a sample matrix with 15 articles and 30 users, sparsely connected. It then calculates a 30 x 30 adjacency matrix for users, where the [i,j] element is the number of articles user i has in common with user j. Then we create a weighted igraph object from this matrix and let igraph calculate the degree centrality.
Since degree centrality does not take the weights into account, we also calculate eigenvector centrality (which does take the weights into account). In this very simple example, the differences are subtle but instructive.
# this just sets up the sample - you have the matrix M already
n.articles <- 15
n.users <- 30
set.seed(1)   # for reproducibility
M <- matrix(sample(0L:1L, n.articles*n.users, prob=c(0.8,0.2), replace=TRUE),
            ncol=n.users)
# you start here...
# user-user adjacency matrix: [i,j] = number of articles users i and j have in common
m.adj <- matrix(0L, ncol=n.users, nrow=n.users)
for (i in 1:(n.users-1)) {
  for (j in (i+1):n.users) {
    m.adj[i,j] <- sum(M[,i]*M[,j])
    m.adj[j,i] <- m.adj[i,j]   # fill both triangles so the matrix is symmetric
  }
}
library(igraph)
g <- graph.adjacency(m.adj, weighted=TRUE, mode="undirected")
palette <- c("purple","blue","green","yellow","orange","red")
par(mfrow=c(1,2))
# degree centrality
c.d <- degree(g)
# map each centrality score onto one of the 6 palette colors
col <- as.integer(5*(c.d-min(c.d))/diff(range(c.d))+1)
set.seed(1)
plot(g, vertex.color=palette[col], main="Degree Centrality",
     layout=layout.fruchterman.reingold)
# eigenvalue centrality
c.e <- evcent(g)$vector
col <- as.integer(5*(c.e-min(c.e))/diff(range(c.e))+1)
set.seed(1)
plot(g, vertex.color=palette[col], main="Eigenvalue Centrality",
     layout=layout.fruchterman.reingold)
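As a side note (not part of the original answer), since M contains only 0s and 1s the double loop above can be replaced by a single matrix product, which scales much better as the matrix grows:

# vectorized equivalent of the double loop: t(M) %*% M counts, for every pair
# of users, the number of articles they have in common
m.adj2 <- crossprod(M)
diag(m.adj2) <- 0L   # drop self-connections
# m.adj2 gives the same user-user counts as the loop above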
So in both cases node 15 has the highest centrality. However, node 28 has a higher degree centrality and a lower eigenvalue centrality than node 27. This is because node 28 is connected to more nodes, but the strength of the connections is lower.
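An aside that is not in the original answer: igraph can also report the weighted degree directly via strength(), which sums the edge weights at each node. Comparing it with the plain degree makes the "more neighbours vs. stronger connections" distinction concrete.

# degree counts neighbours; strength sums the weights (shared-article counts)
data.frame(degree = degree(g), strength = strength(g))[c(27, 28), ]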
The same approach can of course be used to calculate article centrality; just use the transpose of M.
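A minimal sketch of that, reusing the objects defined above (the a.* and g.articles names are just illustrative):

# article centrality: transpose M so articles play the role that users played above
Mt <- t(M)   # now n.users rows x n.articles columns
a.adj <- matrix(0L, ncol=n.articles, nrow=n.articles)
for (i in 1:(n.articles-1)) {
  for (j in (i+1):n.articles) {
    a.adj[i,j] <- sum(Mt[,i]*Mt[,j])   # users that articles i and j share
    a.adj[j,i] <- a.adj[i,j]
  }
}
g.articles <- graph.adjacency(a.adj, weighted=TRUE, mode="undirected")
c.d.articles <- degree(g.articles)              # article degree centrality
c.e.articles <- evcent(g.articles)$vector       # article eigenvalue centrality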
This approach will not work with 52,000 users - the adjacency matrix will contain > 2.5 billion elements. I'm not aware of a workaround for this - perhaps someone else is; I'd like to hear it. So if you need to tabulate a centrality score for each of the 52,000 users, I can't help you. On the other hand, if you want to see patterns, it might be possible to carry out the analysis on a random sample of users (say, 10%).
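If the sampling route looks acceptable, a minimal sketch (the 10% fraction is just a placeholder, and keep is an illustrative name) would be:

# work on a random 10% sample of users: sample columns of the full article x user
# matrix, then run the same adjacency/centrality pipeline on the smaller matrix
set.seed(1)
keep <- sort(sample(ncol(M), size=ceiling(0.1*ncol(M))))
M.sample <- M[, keep]
# ...then build m.adj, g, degree(g) and evcent(g) from M.sample exactly as above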