[Paper Reading | Skim] Lemane: Learning Based Proximity Matrix Factorization for Node Embedding

Posted: 2023-01-12 22:59:04


Table of Contents

  • Preface
  • About
  • ABSTRACT
  • 1 INTRODUCTION
  • 2 RELATED WORK
  • 3 LEMANE FRAMEWORK
  • 3.1 Trainable Proximity Measure
  • 3.2 Training process of Lemane
  • 3.3 Training Lemane with sub-sampling
  • 4 GENERALIZED PUSH
  • 4.1 Generalized Push Algorithm
  • 4.2 Final Embedding
  • 5 EXPERIMENTS
  • 5.1 Experimental settings
  • 5.2 Link Prediction
  • 5.3 Node Classification
  • 6 CONCLUSION
  • Post-reading Summary
  • 2022/07/14 First read
  • Closing


Preface

Hello!
Thank you very much for reading 海轰's article. If you spot any mistakes, please feel free to point them out~

About me ଘ(੭ˊᵕˋ)੭
Nickname: 海轰
Tags: programmer | C++ player | student
Bio: got into programming through the C language, then transferred into a computer science major; won a national scholarship and a few national and provincial competition awards; already recommended for postgraduate admission.
Learning tips: solid fundamentals + take plenty of notes + write plenty of code + think a lot + learn English well!

Nothing but hard work!

Know what it is, and know why it is so!

This post only records the content I found interesting.

About

Venue: KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (CCF A)

Code: https://github.com/Josiah96Zhang/Lemane

Published: 2021/08/14

ABSTRACT

Node embedding learns a low-dimensional representation for each node in a graph.

Recent progress on node embedding shows that proximity matrix factorization methods achieve superior performance and scale to large graphs with millions of nodes.

Existing approaches

  • first define a proximity matrix,
  • and then learn embeddings that fit the proximity by matrix factorization.

Most existing matrix factorization methods adopt the same proximity for different tasks, while it is observed that different tasks and datasets may require different proximities, which limits their representation power.

Motivated by this, we propose Lemane, a framework with a trainable proximity measure that can be learned to best suit the dataset and task at hand automatically.

Our method is end-to-end: it incorporates a differentiable SVD into the pipeline so that the parameters can be trained via backpropagation.

However, this learning process is still expensive on large graphs.

To improve scalability, we train the proximity measure only on carefully sub-sampled graphs, and then apply standard proximity matrix factorization on the original graph using the learned proximity.

Note that it is still expensive to compute the learned proximity for every pair of nodes on large graphs, and existing techniques for computing proximity scores are not applicable to the learned proximity.

Thus, we present generalized push techniques to make our solution scalable to large graphs with millions of nodes.

Extensive experiments show that our proposed solution outperforms existing solutions on both link prediction and node classification tasks on almost all datasets.

1 INTRODUCTION

Node embedding is the task of mapping the nodes of the original graph to low-dimensional representations.

For each node, it outputs an embedding vector, and these embedding vectors play an important role in preserving the structural information and other underlying properties of the graph.

The vectors can then be fed into machine learning models to facilitate a wide range of machine learning tasks, such as node classification [20, 33, 37], link prediction [40, 44], graph reconstruction [45, 47], and recommendation [16, 48].


An important class of node embedding methods with superior performance is the proximity matrix factorization approach.

These solutions first define a proximity matrix S over the nodes of the input graph,

where S[u, v] is the proximity measure of node v with respect to node u.

Different methods may adopt different proximity measures, and personalized PageRank (PPR) is a popular choice of proximity measure in node embedding.

For example:

  • NRP [45] adopts PPR as the proximity.
  • In STRAP [47], the authors further propose to adopt π(u, v) + π^T(v, u) as the proximity,
  • where π(u, v) is the PPR of v with respect to u,
  • and π^T(v, u) is the PPR of u with respect to v on the transpose graph G^T,
  • obtained by reversing the direction of each edge in G. It derives two embedding vectors X_u and Y_v for each node such that X_u · Y_v ≈ π(u, v) + π^T(v, u).

For existing solutions in this class, the embedding vectors are usually obtained via singular value decomposition (SVD) or eigen-decomposition, either on S itself or on a sparse matrix closely related to S.

Despite their success, all existing matrix factorization methods aim to learn embeddings that preserve a chosen proximity, regardless of whether that proximity suits the task on the dataset at hand.

However, it can be observed that different tasks and datasets may need different proximities to achieve high performance, which limits the representation power of such solutions.

Moreover, existing node embedding methods such as STRAP [47] and NetMF [36] show that non-linear operations on the proximity matrix (e.g., taking the logarithm or the softmax) help improve the representation power of the embeddings.

However, most recent matrix factorization methods, e.g., HOPE [31], AROPE [52], and NRP [45], do not explicitly derive the proximity matrix, which restricts their representation power.

As for the methods that do explicitly derive the proximity matrix, the matrix factorization has to be applied to the final proximity matrix.

Therefore, to derive a trainable proximity measure, the model needs to be end-to-end, incorporating both the proximity computation and the SVD into the training process.

This adds to the challenge, especially when the input graph has millions of nodes.


Contributions:

  • Motivated by the limitations of existing solutions, we propose Lemane, an effective framework with a trainable proximity measure that can be learned to best suit the dataset and task at hand automatically.
  • Our trainable proximity measure is inspired by personalized PageRank (PPR) [32]. The PPR π(u, v) can be defined as the probability that an α-discounted random walk starting from u stops at node v, where an α-discounted random walk starting from a source node u stops at the current node with probability α and, with probability (1 − α), randomly jumps to an out-neighbor of the current node.
  • For α-discounted random walks, the majority will always be the one-hop random walks, which may not be the most representative for the task.
  • This motivates us to learn a more representative random walk as our trainable proximity measure.
  • Instead of fixing the stopping probability at each step to α, we define the trainable proximity over supervised random walks, where the stopping probability at the k-th hop is learned through a defined loss function.
  • Then, our trainable proximity of node v with respect to u, dubbed the supervised PPR πs(u, v), is the probability that a supervised random walk starting from u stops at v.

To learn the stopping probability at each hop of the supervised random walks, we design different loss functions for different tasks, so as to learn a proximity that is more representative of the task at hand.


This paper focuses on two popular tasks in node embedding: link prediction and node classification.

Given the loss functions and the trainable parameters, we design an end-to-end method that incorporates a differentiable SVD into the pipeline so that the parameters can be trained via backpropagation.

Our solution is mainly inspired by previous work on learning-based low-rank approximation [23], which includes a differentiable SVD that allows gradients to flow easily through the framework to solve the low-rank approximation problem.

In our framework, with the differentiable SVD, gradients can flow from the loss function, through the differentiable SVD, to our proximity matrix computation, which is determined by the training parameters, i.e., the stopping probability at each hop of the supervised random walks.


However, the training process above is too expensive and does not scale to large graphs.

To improve scalability, we train the stopping probabilities of the supervised random walks only on carefully sub-sampled graphs, and then apply standard proximity matrix factorization on the original graph with the learned proximity.

Our key observation is that the stopping probability at each hop of the supervised random walk is node-independent.

Hence, on a carefully sub-sampled graph, the stopping probabilities learned for a task at each hop should still be similar to those learned on the input graph.

Based on this, we propose an efficient subgraph-sampling-based method that trains the parameters on multiple sampled subgraphs, which improves the scalability of our Lemane.


Finally, given the learned probabilities, computing the learned proximity for every node pair is still expensive on large graphs, and existing techniques for computing proximity scores are not applicable to the learned proximity.

Therefore, we present a generalized push technique that makes our solution scalable to large graphs with millions of nodes.

Given a source node s, our generalized push algorithm computes the approximate supervised PPR score of every node with respect to s at O(1/ε) cost,

  • where ε is a parameter controlling the quality of the approximate supervised PPR scores.

The proximity matrix can then be computed at O(n/ε) cost and in O(n/ε) space.

A sparse SVD algorithm is applied to the proximity matrix to derive the final embeddings, at a running cost proportional to its O(n/ε) non-zero entries times the embedding dimensionality d.


In our experiments, we compare our Lemane against 15 existing node embedding methods on link prediction and node classification tasks, using 6 real datasets, the largest with 3 million nodes and 117 million edges.

Extensive experiments show that our Lemane outperforms existing methods on both link prediction and node classification tasks on almost all datasets.

2 RELATED WORK

There are three main categories of node embedding methods: skip-gram methods, matrix factorization methods, and neural network methods.

Next, we briefly review the existing work in each category.

Skip-gram methods

This category of methods is inspired by the great success of the word2vec model [29] for natural language processing.

  • DeepWalk [33] first proposes to train embedding vectors by feeding truncated random walks into the Skip-gram model. Nodes sampled from the random walks are treated as positive samples. Subsequent methods try to explore more representative random walks to feed into the Skip-gram model.
  • LINE [37] adopts one-hop and two-hop random walks.
  • Node2vec [20] proposes to explore higher-order random walks that exploit the DFS and BFS nature of the graph.
  • VERSE [40] and APP [53] adopt α-discounted random walks to obtain positive samples.
  • Recently, InfiniteWalk [9] studies DeepWalk in the limit as the window size goes to infinity, connecting DeepWalk to the graph Laplacian matrix.

Matrix factorization methods

Another idea for node embedding is to apply matrix factorization to a chosen proximity matrix.

  • Explicitly deriving the proximity matrix, as in the case of NetMF [36], typically incurs O(n²) overhead, which is too expensive for large graphs.
  • To avoid the O(n²) running cost, HOPE [31], AROPE [52], and NRP [45] propose to derive the embeddings without explicitly computing the proximity matrix.
  • For example, instead of first computing the all-pair proximity scores and then factorizing the proximity matrix S, NRP turns to an SVD of the adjacency matrix, which is sparse for most real graphs, reducing the computational cost of the embedding.
  • The other way to avoid the O(n²) cost is to compute a sparsified proximity matrix S. The representative is STRAP [47], which imposes a threshold δ and returns for each node at most O(1/δ) proximity scores no smaller than δ, so that the proximity matrix has O(n/δ) size. SVD is then applied to the sparse proximity matrix. Since this second solution explicitly derives the proximity matrix, it allows non-linear operations on the proximity matrix, improving the representation power.

Neural network methods

Deep learning provides another solution for generating node embeddings.

  • SDNE [43] and DNGR [8] adopt multi-layer auto-encoders with a target matrix to generate the embeddings.
  • DRNE [41] uses layer-normalized LSTMs [22] to generate node embeddings by recursively aggregating the representations of each node's neighbors.
  • GraphGAN [44] adopts the well-known generative adversarial networks [18] for graph representation learning via an adversarial minimax game.
  • AW [5] proposes a novel attention model on the power series of the transition matrix, which guides the random walk to pay attention to important positions within the walk by optimizing an upstream objective.

The bottleneck of these solutions is their expensive computational cost, which restricts such methods to small graphs.

Other methods

There are also methods that do not belong to the three categories above.

For example:

  • GraphWave [14] learns node representations in an unsupervised way by leveraging heat wavelet diffusion patterns.
  • NetHiex [28] captures the underlying hierarchical taxonomy of the graph to learn node representations consisting of multiple components.
  • RaRE [21] proposes a node embedding method that considers both the social rank and the proximity of nodes, and learns two separate representations for each node. AutoNE [42] integrates AutoML into node embedding, automatically optimizing the hyperparameters on subgraphs of the original graph.
  • PBG [25] presents a distributed embedding system that uses a block decomposition of the adjacency matrix as a partitioning method to scale to arbitrarily large graphs.
  • A recent work, GraphZOOM [12], first generates subgraphs based on a fused graph, and then applies existing methods to generate the node embeddings.

Since these methods do not preserve the proximity of any node pair, their main issue is insufficient effectiveness on downstream tasks, as we will show in the experiments.

There are also various graph embedding methods designed for specific kinds of graphs, such as dynamic graphs [19, 39] and heterogeneous networks [13, 17].

In this paper, we focus on the most fundamental setting, where the network is static and no feature vectors are given.

3 LEMANE FRAMEWORK

3.1 Trainable Proximity Measure

Recall from Section 1 that the personalized PageRank (PPR) π(u, v) of node v with respect to node u is the probability that an α-discounted random walk starting from u stops at node v,

  • where an α-discounted random walk stops at the current node with probability α, and with probability (1 − α) randomly jumps to one of its out-neighbors.

Define the transition matrix P = D⁻¹·A, where D is the diagonal matrix such that D[i, i] is the out-degree (resp. degree) of node v_i if G is directed (resp. undirected), and A is the adjacency matrix.

The PPR proximity matrix can then be expressed as:

S = Σ_{k=0}^{∞} α·(1 − α)^k · P^k
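To make this series concrete, here is a tiny NumPy sketch (my own illustration, not code from the paper) that truncates the sum and checks it against the equivalent closed form S = α·(I − (1 − α)·P)⁻¹; the toy graph is just an assumed example:

```python
import numpy as np

# Toy directed graph (assumed example): 4 nodes, adjacency matrix A.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)   # transition matrix P = D^-1 * A

alpha, L = 0.15, 60
S = np.zeros_like(P)
walk = np.eye(len(P))                  # P^0
for k in range(L + 1):
    S += alpha * (1 - alpha) ** k * walk   # walks stopping exactly at hop k
    walk = walk @ P

# The same series in closed form: S = alpha * (I - (1 - alpha) * P)^-1
S_closed = alpha * np.linalg.inv(np.eye(len(P)) - (1 - alpha) * P)
print(np.abs(S - S_closed).max())      # truncation error ~ (1 - alpha)^(L+1)
```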

In our trainable proximity measure, instead of fixing the stopping probability at each step to α, we allow the stopping probability α_k of the random walk at the k-th hop to be trainable.

For now, we assume that the stopping probabilities α_0, α_1, …, α_L are given, and we will show how to train them in Section 3.2.

A random walk that follows such learned stopping probabilities at each step is denoted as a supervised random walk, and the proximity derived from supervised random walks is denoted as supervised PPR.

The supervised PPR proximity matrix can be defined as:

S = Σ_{k=0}^{∞} ( Π_{i=0}^{k−1} (1 − α_i) ) · α_k · P^k

To explain, the probability that a supervised random walk stops exactly at the k-th hop is α_k · Π_{i=0}^{k−1} (1 − α_i).

Hence, by summing up the probabilities of stopping at each hop, we obtain the supervised PPR scores.

The exact supervised PPR scores can then be computed in the iterative manner shown in Algorithm 1 when the number of hops goes to infinity.

[Figure omitted: Algorithm 1, iterative computation of the supervised PPR proximity matrix]
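Based on the formula above, this NumPy sketch is how I read the iteration in Algorithm 1 (the function name and interface are my assumptions, not the authors' code):

```python
import numpy as np

def supervised_ppr(P, alphas):
    """Supervised PPR proximity matrix, truncated at L = len(alphas) - 1.
    alphas[k] is the learned stopping probability at the k-th hop."""
    n = P.shape[0]
    S = np.zeros((n, n))
    walk = np.eye(n)      # distribution of walks still alive: currently P^k
    survive = 1.0         # prod_{i < k} (1 - alpha_i)
    for a_k in alphas:
        S += survive * a_k * walk   # fraction stopping exactly at this hop
        survive *= 1.0 - a_k
        walk = walk @ P
    return S              # each row sums to 1 - prod_k (1 - alpha_k)
```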

However, we observe that when L is sufficiently large, the probability that a supervised random walk stops at a hop larger than L is close to zero.

Therefore, we discard all supervised random walks that stop at a hop larger than L.


The following theorem shows the quality of the supervised PPR scores computed by Algorithm 1.

Theorem 1. Let Ŝ be the supervised PPR proximity matrix derived by Algorithm 1, and let ‖M‖_∞ denote the infinity norm of a matrix M. Then we have:

‖S − Ŝ‖_∞ ≤ Π_{i=0}^{L} (1 − α_i)

All proofs of our theorems can be found in the appendix.

According to our observation, when we set L = 15, the probability that a supervised random walk stops at a hop larger than L is almost zero.

Therefore, L is set to 15 in our experiments.

Algorithm 1 builds a close connection between the supervised PPR proximity matrix and our training parameters, i.e., the stopping probabilities α_0, …, α_L, allowing gradients to flow via backpropagation from the derived supervised PPR proximity matrix back to the training parameters.

Let n and m denote the number of nodes and edges, respectively. Since P is a sparse matrix with m non-zero entries, the time complexity of the algorithm is O(L·m·n).


Note that learning the stopping probability at each hop is not a new idea in graph neural networks;

see, e.g., [10, 24]. It is much easier there, since for a neural network the stopping probabilities can simply be treated as additional weights to learn.

However, incorporating this idea into node embedding, especially into matrix factorization methods, is more challenging.

We present an efficient and effective solution that scales to large graphs, which is non-trivial.

3.2 Training process of Lemane

Algorithm 2 shows the pseudo-code of the training process of Lemane.

[Figure omitted: Algorithm 2, training process of Lemane]

  1. First, it initializes the stopping probabilities α_k for 0 ≤ k ≤ L such that the probability of a random walk stopping at the k-th hop follows some standard distribution, e.g., the uniform, geometric, or Poisson distribution (Line 1).
  2. Next, with this initial setting of the stopping probabilities, it derives the supervised PPR scores by invoking Algorithm 1 (Line 3).
  3. Given the supervised PPR scores, it eliminates all proximity scores that are too small: a threshold δ is imposed, and all scores smaller than δ are set to 0 (Lines 4-5).
  4. Then, a non-linear operation log(·) is applied to the proximity matrix S; the proximity scores are divided by δ to ensure that each entry is non-negative after taking the logarithm (Line 6).
  5. Next, a differentiable SVD, as in [23], is applied to the new matrix M with input parameter d. The SVD yields two n×d matrices U and V and a d×d diagonal matrix Σ such that M ≈ U·Σ·Vᵀ.
  6. Given these three matrices, it returns U·Σ^{1/2} as the first embedding matrix X and V·Σ^{1/2} as the second embedding matrix Y.

Subsequently, the two embedding matrices are fed into the loss function for link prediction or node classification (Line 9).

The loss functions will be discussed shortly.

The stopping probabilities are then updated according to the loss function via backpropagation.

The training process terminates when the loss function converges (Line 2).

Note that since the computation chain is very long, writing down the explicit form of the gradients is infeasible.
However, as with modern deep neural networks, we can use the autograd feature of PyTorch to numerically compute the gradients with respect to the training parameters.
Moreover, the SVD built into PyTorch supports gradient computation, so we directly adopt the PyTorch implementation.
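To see how such an end-to-end step can look in PyTorch, here is a small sketch under my own assumptions (the function name, the sigmoid parametrization of the stopping probabilities, and the clamp-based variant of the thresholding are mine, not the paper's code):

```python
import torch

def lemane_forward(P, theta, delta=1e-4, d=16):
    """One forward pass in the spirit of Algorithm 2 on a small dense graph.
    P: n x n transition matrix (torch tensor); theta: trainable logits,
    where sigmoid(theta[k]) plays the role of the stopping probability alpha_k."""
    alphas = torch.sigmoid(theta)              # keep each alpha_k inside (0, 1)
    n = P.shape[0]
    S = torch.zeros_like(P)
    walk = torch.eye(n, dtype=P.dtype)
    survive = torch.ones((), dtype=P.dtype)
    for a_k in alphas:                         # truncated supervised PPR series
        S = S + survive * a_k * walk
        survive = survive * (1 - a_k)
        walk = walk @ P
    # Non-linearity: divide by delta and take log; clamping at 1 keeps the
    # result non-negative and differentiable (small scores contribute 0).
    M = torch.log(torch.clamp(S / delta, min=1.0))
    U, sig, Vh = torch.linalg.svd(M, full_matrices=False)  # differentiable SVD
    half = torch.diag(torch.sqrt(sig[:d]))
    X = U[:, :d] @ half                        # first embedding matrix
    Y = Vh[:d, :].T @ half                     # second embedding matrix
    return X, Y

# Training-loop sketch: gradients flow from the loss, through the SVD,
# back to the stopping probabilities theta.
# theta = torch.zeros(16, requires_grad=True)
# optim = torch.optim.Adam([theta], lr=1e-2)
# X, Y = lemane_forward(P, theta); loss = task_loss(X, Y)  # task loss: see below
# optim.zero_grad(); loss.backward(); optim.step()
```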

Below, we elaborate on the details of the loss functions.

Loss function for link prediction


For link prediction, our loss function has two parts.

The first part aims to ensure that the total information of the supervised random walks starting from a node u stays close to its out-degree. Let ŝ(u, v) = X_u · Y_v denote the proximity recovered from the embeddings.

The first part of the loss function is then:

ℒ₁ = (1/n) · Σ_{u∈V} ( Σ_{v∈V} ŝ(u, v) − d_out(u) )²   (2)


where

  • X_u is the u-th row of X,
  • Y_v is the v-th row of Y,
  • d_out(u) is the out-degree of u.

Equation 2 states that the embeddings learned via the supervised random walks should preserve the out-degree information as much as possible.

The second part of the link prediction loss function is the average cross-entropy over the edges:

ℒ₂ = −(1/m) · Σ_{(u,v)} [ g(u, v)·log σ(ŝ(u, v)) + (1 − g(u, v))·log(1 − σ(ŝ(u, v))) ]   (3)

  • where σ(·) is the sigmoid function,
  • and g(u, v) = 1 if the edge (u, v) exists in the input graph, and g(u, v) = 0 otherwise.

The final loss function for link prediction is:

ℒ_lp = β·ℒ₁ + γ·ℒ₂   (4)


where β and γ are two balancing hyperparameters.
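To make Equations 2-4 concrete, here is a small PyTorch sketch of how I read this loss (the helper name and summing the cross-entropy over all node pairs are my simplifications, not the paper's code):

```python
import torch
import torch.nn.functional as F

def link_pred_loss(X, Y, A, beta=1.0, gamma=1.0):
    """A: dense 0/1 adjacency matrix, so A[u, v] plays the role of g(u, v)."""
    scores = X @ Y.T                              # s_hat(u, v) = X_u . Y_v
    out_deg = A.sum(dim=1)
    # Part 1: the total learned proximity from u should match its out-degree.
    l1 = ((scores.sum(dim=1) - out_deg) ** 2).mean()
    # Part 2: cross-entropy between sigmoid(scores) and the edge indicator,
    # averaged here over all pairs for simplicity.
    l2 = F.binary_cross_entropy_with_logits(scores, A)
    return beta * l1 + gamma * l2
```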

Loss function for node classification


The loss function for node classification also has two parts.

We first concatenate the two output embedding matrices X and Y to obtain a single embedding matrix Z = [X | Y].

Then, following the fact that two randomly chosen nodes have different labels with high probability, we randomly sample a small set of negative node pairs E_neg in each iteration.

  • Let G_k = (V_k, E_k) denote the complete graph formed by the nodes of G that share the same label k, with E_k as its edge set,
  • and let L_k be the unnormalized Laplacian matrix of G_k.

Inspired by the relaxation proposed in [50], the first loss function for node classification is defined as follows:

ℒ₁ = Σ_{k=1}^{K} Σ_{(u,v)∈E_k} ‖z_u − z_v‖² − Σ_{(u,v)∈E_neg} ‖z_u − z_v‖²   (5)

where

  • z_u is the concatenated representation of node u,
  • K is the total number of class labels in the graph,
  • L_k is the unnormalized Laplacian matrix of G_k (so the first term can equivalently be written as Σ_k tr(Zᵀ·L_k·Z)),
  • and the goal of ℒ₁ is to minimize the pairwise distances between node pairs sharing the same class label while maximizing the pairwise distances between the negative samples.

Next, we use an activation function to normalize the output embeddings into a probability distribution over the predicted class labels: P = softmax(Z·W + b),

where

  • W is a fixed 2d×K mapping matrix generated from the uniform distribution,
  • b is the bias term,
  • P[u, k] denotes the probability that node u has class label k,
  • and each row of P sums to 1.

Let Y* denote the n×K label matrix,

where Y*[u, k] = 1 if node u has class label k, and Y*[u, k] = 0 otherwise.

Then, the second part of the node classification loss function is the average cross-entropy over all nodes:

ℒ₂ = −(1/n) · Σ_{u∈V} Σ_{k=1}^{K} Y*[u, k] · log P[u, k]   (6)


The final loss function for node classification is defined from ℒ₁ and ℒ₂ as follows:

ℒ_nc = β·ℒ₁ + γ·ℒ₂   (7)


where β and γ are two balancing hyperparameters.
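Likewise, assuming Equations 5-7 as written above, a rough PyTorch sketch (the names and the explicit pairwise-distance loops are mine):

```python
import torch
import torch.nn.functional as F

def node_class_loss(X, Y, labels, neg_pairs, W, b, beta=1.0, gamma=1.0):
    """labels: LongTensor of class ids; neg_pairs: sampled negative node pairs;
    W (2d x K) and b: the fixed random projection and bias from the text."""
    Z = torch.cat([X, Y], dim=1)                  # concatenated embeddings
    # Part 1: pull same-label pairs together, push negative pairs apart.
    l1 = torch.zeros(())
    for k in labels.unique():
        Zk = Z[labels == k]
        l1 = l1 + torch.cdist(Zk, Zk).pow(2).sum()   # intra-class distances
    for u, v in neg_pairs:
        l1 = l1 - (Z[u] - Z[v]).pow(2).sum()         # negative samples
    # Part 2: average cross-entropy of softmax(Z W + b) against the labels.
    l2 = F.cross_entropy(Z @ W + b, labels)
    return beta * l1 + gamma * l2
```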


Note that the training algorithm of Lemane for node classification requires additional label information; we randomly sample 5% of the labeled nodes for training.

For a fair comparison against our competitors when training the downstream classifier, if x% of the data is used for training and the remaining (100 − x)% for testing,
our new training data contains only (x − 5)% of the data. This guarantees that our Lemane accesses the same amount of labeled data as the other competitors.
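A small sketch of how I understand this split (the function and its handling of the percentages are my own):

```python
import numpy as np

def split_labeled_nodes(labeled_ids, x_percent, seed=0):
    """Hold out 5% of the labeled nodes for training Lemane itself, so that
    the downstream classifier sees only (x - 5)% of the labels."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(labeled_ids)
    n = len(ids)
    n_lemane = int(0.05 * n)                       # 5% for Lemane's training
    n_clf = int((x_percent - 5) / 100 * n)         # (x - 5)% for the classifier
    return (ids[:n_lemane],                        # Lemane training nodes
            ids[n_lemane:n_lemane + n_clf],        # classifier training nodes
            ids[n_lemane + n_clf:])                # remaining (100 - x)% to test
```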

3.3 Training Lemane with sub-sampling

The training process above requires computing dense n×n matrices, which incurs at least O(n²) running cost and cannot scale to large graphs.

At first glance, this cost seems unavoidable, since we need to obtain the proximity matrix to do the backpropagation.

After careful analysis, we make the following two observations that help us avoid the high running cost.

  • Our first key observation is that the parameters we need to train are only the stopping probabilities at each hop, which are node-independent. That is, if we can find a subgraph of the input graph G such that the stopping probabilities learned on this subgraph are the same as those learned on G, then we can simply learn the parameters on the smaller subgraph and apply the learned parameters directly to the original graph, reducing the computational cost.
  • The other observation is that subgraphs of G with similar connectivity should have learned stopping probabilities similar to those of the input graph on the same task.

Based on this, we propose a sub-sampling-based method for training Lemane on large graphs.

Obviously, a simple solution is to sample n_s nodes and then consider the subgraph induced by these n_s nodes.
However, such a solution severely degrades the connectivity between nodes.
A simple edge sampling strategy faces a similar dilemma, as it damages the connectivity between the sampled nodes.

To preserve the connectivity between the sampled nodes, we apply a BFS-style traversal for the subgraph sampling.

Our goal is still to sample a subgraph with a constant number n_s of nodes:

  • We first randomly sample a source node s from the input graph G, and then conduct a BFS traversal from s to explore the local community of node s.
  • If the number of nodes visited by the BFS from s is smaller than n_s, another node s′ is randomly sampled as the source of a new BFS. The BFS sampling stops as soon as a total of n_s nodes have been visited. (A sketch of this procedure is shown below.)
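A minimal Python sketch of this BFS-style sub-sampling, assuming an adjacency-list dict and a graph with at least n_s nodes (all names are mine):

```python
import random
from collections import deque

def bfs_sample(adj, ns, seed=None):
    """adj: dict mapping node -> list of (out-)neighbors.
    Returns a set of ns visited nodes; the induced subgraph is then
    used for one training iteration."""
    rng = random.Random(seed)
    nodes = list(adj)
    visited = set()
    while len(visited) < ns:
        s = rng.choice(nodes)                # (re)start from a random source
        if s in visited:
            continue
        visited.add(s)
        queue = deque([s])
        while queue and len(visited) < ns:   # explore s's local community
            v = queue.popleft()
            for u in adj[v]:
                if u not in visited:
                    visited.add(u)
                    queue.append(u)
                    if len(visited) >= ns:
                        break
    return visited
```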

However, the weights trained on a single subgraph may be biased, making the learned stopping probabilities not generalizable to the original input graph G.

To make the learned parameters generalizable to the input graph, we sample multiple subgraphs with the above strategy to learn the parameters.

In particular, in each iteration we sample one subgraph with the BFS strategy, and then update the training parameters, i.e., the stopping probabilities, according to the loss function defined on that subgraph.

The loss functions are modified accordingly on the subgraphs:

  • the total number n of nodes in Equations 2 and 6 is replaced by the sample size n_s;
  • the total number m of edges in Equation 3 is replaced by the number m_s of edges in the sampled subgraph;
  • the set of nodes with label k in Equation 5 becomes V_s ∩ V_k, where V_s is the node set of the subgraph and V_k is the set of nodes with label k in G;
  • the set of negative samples in Equation 5 is changed to negative node pairs sampled within the subgraph.

With this sampling technique, the time complexity of Algorithm 2 is bounded by T times a cost polynomial in the constant sample size n_s, where T is the number of training iterations before Algorithm 2 converges.

Since n_s is a controllable constant, we set n_s so that the proximity matrix fits into GPU memory, allowing more efficient training.

4 GENERALIZED PUSH

Given the learned stopping probabilities, a straightforward solution is to invoke Algorithm 1 to derive the supervised PPR scores.

However, this incurs O(L·m·n) running cost, which is prohibitive for large graphs.

To tackle this issue, we propose a generalized push algorithm that computes the supervised PPR proximity matrix efficiently, at O(n/ε) cost, where ε is a parameter that controls the computational cost and the sparsity of the proximity matrix.

4.1 Generalized Push Algorithm

The main idea behind our computation of the supervised PPR proximity matrix is to derive a sparse proximity matrix in which the supervised PPR scores no larger than a threshold δ can be safely discarded.

But how much does it cost to obtain sufficiently accurate approximate proximity scores?

In STRAP [47], the authors present a solution whose PPR estimations take O(m/δ) cost.

But can we further reduce the running cost without sacrificing the embedding quality?

Here, we give a positive answer. Our solution is inspired by the local graph clustering algorithm Local-Push [6], which returns approximate PPR scores with respect to a source s in O(1/ε) running time, with the guarantee that

|π(s, v) − π̂(s, v)| ≤ ε·d(v)

on undirected graphs, where d(v) is the degree of node v.

The Local-Push algorithm shows that the running time can be reduced if we only want to compute the approximate PPR scores around the local graph cluster of a source.

In our case, the aim of node embedding is to find representative nodes for each node, and the nodes in its local graph cluster are perfect representatives.

However, the Local-Push algorithm works only for PPR, not for our supervised PPR.

Therefore, we propose a generalized push algorithm that works with arbitrary stopping probabilities.


Algorithm 3 shows the pseudo-code of our generalized push algorithm.

[Figure omitted: Algorithm 3, generalized push]

Given a source node s, a vector π̂(s, ·) is maintained to store the fraction of the supervised random walks that have stopped at each node; these are the estimated supervised PPR scores with respect to the source s.

In addition, for each hop k with 0 ≤ k ≤ L (L is the maximum length of the truncated supervised random walks introduced in Section 3.1), an additional residue vector r_k(s, ·) is maintained.

The entry r_k(s, v) represents the fraction of the supervised random walks starting from s that currently stay at node v at the k-th hop and have not stopped yet.

Hence, if all residue vectors are zero, the exact supervised PPR scores are returned.

Initially, all residues are zero except r_0(s, s) = 1 (Lines 1-2), indicating that all supervised random walks initially stay at s and have not stopped yet.

Then, whenever any entry r_k(s, v) of a residue vector rises above ε·d_out(v) (Line 3), a push operation is invoked (Lines 4-7).

In particular, the push first converts α_k·r_k(s, v) into π̂(s, v) (Line 4).

To explain, the α_k·r_k(s, v) fraction of the random walks stops at the k-th hop.

Next, the remaining (1 − α_k)·r_k(s, v) fraction of the random walks randomly jumps to the out-neighbors of v (Lines 5-6).

Hence, for each out-neighbor u of v, the residue r_{k+1}(s, u) is incremented by (1 − α_k)·r_k(s, v)/d_out(v).

After the push operation finishes, the residue r_k(s, v) is set to zero (Line 7). The algorithm terminates when no residue r_k(s, v) is larger than ε·d_out(v) for any k.
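The sketch below is how I read this push procedure (the dict-based residues, the worklist in place of the Line 3 scan, and skipping dangling nodes are my simplifications):

```python
from collections import defaultdict

def generalized_push(adj, s, alphas, eps):
    """adj: node -> list of out-neighbors; alphas[k]: stopping probability at
    hop k (k = 0..L); push a node at hop k while r_k(s, v) > eps * d_out(v)."""
    L = len(alphas) - 1
    pi = defaultdict(float)                       # estimated supervised PPR
    r = [defaultdict(float) for _ in range(L + 1)]
    r[0][s] = 1.0                                 # all walks start alive at s
    work = [(0, s)]
    while work:
        k, v = work.pop()
        d = len(adj[v])
        mass = r[k][v]
        if d == 0 or mass <= eps * d:             # below threshold (or stale)
            continue
        pi[v] += alphas[k] * mass                 # fraction stopping at hop k
        if k < L:
            spread = (1 - alphas[k]) * mass / d   # rest jumps to out-neighbors
            for u in adj[v]:
                r[k + 1][u] += spread
                if r[k + 1][u] > eps * len(adj[u]):
                    work.append((k + 1, u))
        r[k][v] = 0.0                             # residue consumed by the push
    return pi                                     # pi[v] underestimates by design
```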


Theorem 2. Algorithm 3 runs in O(L/ε) time.


Theorem 3. Let π_s(s, v) be the supervised PPR that considers random walks within L hops. Then, on undirected graphs, Algorithm 3 returns an estimate π̂(s, v) for each node v such that:

π_s(s, v) − L·ε·d(v) ≤ π̂(s, v) ≤ π_s(s, v)

By setting ε′ = ε/L, the running time of Algorithm 3 becomes O(L²/ε), and the error in Algorithm 3 can be bounded by ε·d(v).

Since L can be treated as a constant, the running time of Algorithm 3 is still O(1/ε), and for an arbitrary node v we have:

π_s(s, v) − ε·d(v) ≤ π̂(s, v) ≤ π_s(s, v)


The above analysis shows that our generalized push algorithm provides the same result quality as the Local-Push algorithm at the same asymptotic running cost, and thus returns high-quality results for the representative nodes of the source node.

4.2 Final Embedding

Given the generalized push algorithm, we finally show how to output the embeddings of the graph.

Following STRAP [47], we compute the supervised PPR scores on both the input graph G and the transpose graph G^T, obtained by reversing the direction of each edge (u, v) in E to (v, u),

  • where π_s(u, v) is the supervised PPR of v with respect to u on G, and π_s^T(u, v) is the supervised PPR of v with respect to u on the transpose graph G^T.

Note that, to reduce the computational cost, we do not bring this part into our training phase, since we use the same stopping probabilities for the input graph G and the transpose graph G^T.

Algorithm 4 shows how the approximate proximity scores are computed.

[Figure omitted: Algorithm 4, computing the final embeddings]

For each node u, we compute the approximate supervised PPR scores on the input graph G and on the transpose graph G^T (Lines 3-4).

An approximate supervised PPR score π̂(u, v) is added to S[u, v] only if it is larger than the threshold δ.

Similarly, an approximate supervised PPR score π̂^T(u, v) on G^T is added to S[v, u] only if it is larger than δ (Lines 5-9).

With this pruning strategy, the proximity matrix S is sparsified and contains O(n/δ) non-zero entries.

Then, a non-linear operation log(·) is applied to S.

Note that, before taking the logarithm, all entries are divided by δ to ensure that the values are non-negative (Line 10).

The resulting matrix M is then fed into a sparse SVD to derive the final embedding matrices X and Y (Lines 11-12).
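A sketch of these last lines using SciPy's sparse truncated SVD (the COO-style input and the names are my assumptions):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def final_embedding(rows, cols, vals, n, delta, d):
    """rows/cols/vals: entries of the pruned proximity matrix S (all > delta).
    Applies log(S / delta), then a sparse SVD, and returns X, Y."""
    logged = np.log(np.asarray(vals, dtype=float) / delta)   # non-negative
    M = sp.csr_matrix((logged, (rows, cols)), shape=(n, n))
    U, sig, Vt = svds(M, k=d)              # sparse truncated SVD: M ~ U Sig Vt
    half = np.sqrt(sig)
    X = U * half                           # X = U . Sigma^(1/2)
    Y = Vt.T * half                        # Y = V . Sigma^(1/2)
    return X, Y
```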

We have the following theorem on the running time and the factorization quality for M.

[Figure omitted: Theorem 4 on the running time and the factorization quality of M]

5 EXPERIMENTS

5.1 Experimental settings

[Figures omitted: datasets and experimental settings]

5.2 Link Prediction

[Figures omitted: link prediction results]

5.3 Node Classification

[Figures omitted: node classification results]

6 CONCLUSION

In this paper, we propose Lemane, which learns a trainable proximity measure so as to best suit the dataset and task at hand automatically.

Experimental results show that Lemane learns more representative embeddings compared with the state-of-the-art methods.

Post-reading Summary

2022/07/14 First read

Hot weather; not in my best shape.

Overall, this paper still computes node embeddings via matrix factorization.

I had not read anything in this area before, so my understanding is not deep.

After reading it, I only grasped the surface:

  • PPR is used to compute the proximity between nodes; here the authors propose adopting a different proximity for different tasks, one that is supervised and trainable.
  • For large graphs, the authors build on the Local-Push algorithm, which works for ordinary PPR computation but is infeasible for the trainable PPR the authors propose.
  • So the authors generalize the algorithm so that it applies to their own method (really impressive!).
  • They also use BFS sampling to extract subgraphs for training.

After finishing, emm, I guess I am still too green.

I understood only a little, but still came away with something:

Step by step, bit by bit: if the original algorithm cannot solve the problem, improve the algorithm.

Closing

This post is just a personal study note, recording a journey from 0 to 1.

I hope it helps you a little; if there are mistakes, corrections from fellow readers are welcome.
