[transformers] PyTorch Basics

Source: https://transformers.run/c2/2021-12-14-transformers-note-3/

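PyTorch fundamentals

The tensor is the core data structure.
Topics to cover:

Constructing tensors
Tensor computation
Automatic differentiation
Reshaping
Broadcasting
Indexing and slicing
Adding and removing dimensions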

Tensor

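Tensor: think of it as a vector generalized to higher dimensions and you are most of the way there.

Constructing tensors:

Use torch.tensor() to build from Python data
Use torch.from_numpy() to build from a NumPy array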

>>> import torch
>>> array = [[1.0, 3.8, 2.1], [8.6, 4.0, 2.4]]
>>> torch.tensor(array)
tensor([[1.0000, 3.8000, 2.1000],
        [8.6000, 4.0000, 2.4000]])
>>> import numpy as np
>>> array = np.array([[1.0, 3.8, 2.1], [8.6, 4.0, 2.4]])
>>> torch.from_numpy(array)
tensor([[1.0000, 3.8000, 2.1000],
        [8.6000, 4.0000, 2.4000]], dtype=torch.float64)
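
The two constructors differ in more than the resulting dtype. The sketch below illustrates that torch.tensor() copies its input, while torch.from_numpy() shares memory with the source array. The variable names are chosen for illustration only.

```python
import numpy as np
import torch

arr = np.array([[1.0, 3.8, 2.1], [8.6, 4.0, 2.4]])

copied = torch.tensor(arr)      # makes an independent copy of the data
shared = torch.from_numpy(arr)  # shares the same memory as arr

arr[0, 0] = -1.0  # modify the NumPy array in place

print(copied[0, 0].item())  # 1.0, the copy is unaffected
print(shared[0, 0].item())  # -1.0, the shared tensor sees the change

# NumPy defaults to float64; convert explicitly if float32 is wanted
as_float32 = shared.float()
print(as_float32.dtype)  # torch.float32
```

If the array must stay untouched by later tensor operations, torch.tensor() is the safer choice; torch.from_numpy() avoids the copy.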

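Tensor computation:

Tensors support basic addition, subtraction, multiplication, and division, all applied element by element, as well as dot products and matrix multiplication.

For example: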


>>> x = torch.tensor([1, 2, 3], dtype=torch.double)
>>> y = torch.tensor([4, 5, 6], dtype=torch.double)
>>> print(x + y)
tensor([5., 7., 9.], dtype=torch.float64)
>>> print(x - y)
tensor([-3., -3., -3.], dtype=torch.float64)
>>> print(x * y)
tensor([ 4., 10., 18.], dtype=torch.float64)
>>> print(x / y)
tensor([0.2500, 0.4000, 0.5000], dtype=torch.float64)

>>> x.dot(y)
tensor(32., dtype=torch.float64)
>>> x.sin()
tensor([0.8415, 0.9093, 0.1411], dtype=torch.float64)
>>> x.exp()
tensor([ 2.7183,  7.3891, 20.0855], dtype=torch.float64)
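
The session above covers element-wise operations and the dot product but not matrix multiplication. A minimal sketch, with small example matrices chosen only for illustration:

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[5., 6.], [7., 8.]])

elementwise = A * B        # multiplies matching entries
matmul = A @ B             # matrix product
same = torch.matmul(A, B)  # equivalent to the @ operator

vec = torch.tensor([1., 1.])
mv = A @ vec               # matrix-vector product

print(elementwise)  # tensor([[ 5., 12.], [21., 32.]])
print(matmul)       # tensor([[19., 22.], [43., 50.]])
print(mv)           # tensor([3., 7.])
```

Note that * is always element-wise; use @ or torch.matmul for a matrix product.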

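Beyond mathematical operations, PyTorch also provides many tensor manipulation functions, such as aggregation, concatenation, comparison, random sampling, and serialization. See the official PyTorch documentation for details.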
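
A few of these operations in a short sketch (aggregation, concatenation, and comparison). The tensor values are arbitrary examples:

```python
import torch

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]])

# Aggregation: reduce over all elements or along one dimension
total = x.sum()                # sum of every element
col_mean = x.mean(dim=0)       # mean of each column
row_max = x.max(dim=1).values  # largest value in each row

# Concatenation: cat joins along an existing dimension,
# stack creates a new one
rows = torch.cat([x, x], dim=0)       # shape (4, 3)
stacked = torch.stack([x, x], dim=0)  # shape (2, 2, 3)

# Comparison: produces a boolean tensor of the same shape
mask = x > 3

print(total.item())          # 21.0
print(col_mean)              # tensor([2.5000, 3.5000, 4.5000])
print(row_max)               # tensor([3., 6.])
print(rows.shape, stacked.shape)
print(mask)
```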

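Automatic differentiation

PyTorch can compute gradients automatically using the backpropagation algorithm. A gradient is always computed for a specific expression, evaluated at specific values of its independent variables.

Steps:

Create the independent-variable tensors with requires_grad=True.
Build the expression for the dependent variable.
Call backward() on the resulting tensor.
The computed gradients are then available in the .grad attribute of each independent variable.

For example: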


>>> x = torch.tensor([2.], requires_grad=True)
>>> y = torch.tensor([3.], requires_grad=True)
>>> z = (x + y) * (y - 2)
>>> print(z)
tensor([5.], grad_fn=<MulBackward0>)
>>> z.backward()
>>> print(x.grad, y.grad)
tensor([1.]) tensor([6.])
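
The printed gradients can be checked by hand. Using the product rule on the same expression and the same values (x = 2, y = 3):

```latex
\begin{align*}
z &= (x + y)(y - 2) \\
\frac{\partial z}{\partial x} &= y - 2 = 3 - 2 = 1 \\
\frac{\partial z}{\partial y} &= (y - 2) + (x + y) = 1 + 5 = 6
\end{align*}
```

These agree with the values of x.grad and y.grad above.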

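Reshaping

There are three kinds of shape adjustment:

Shape conversion
Transposition
Swapping dimensions

Shape conversion is usually done with {tensor}.reshape({desired shape}).

I did not fully follow transposition and dimension swapping, so I am leaving them aside for now.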
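
For reference, a rough sketch of all three operations. It assumes that "transposition" refers to t() / transpose() and "swapping dimensions" to permute(); the tensors are arbitrary examples.

```python
import torch

x = torch.arange(6)      # tensor([0, 1, 2, 3, 4, 5])

# Shape conversion: same elements in the same order, new shape
m = x.reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

# Transposition: swap two dimensions (rows become columns)
t = m.t()                # 2-D shorthand, shape (3, 2)
t2 = m.transpose(0, 1)   # same result, works for any two dims

# Swapping dimensions: permute reorders all dimensions at once
cube = torch.arange(24).reshape(2, 3, 4)
p = cube.permute(2, 0, 1)  # shape (4, 2, 3); p[k, i, j] == cube[i, j, k]

print(m.shape, t.shape, p.shape)
print(t)  # tensor([[0, 3], [1, 4], [2, 5]])
```

reshape keeps the flat element order, whereas transpose and permute change which axis each index refers to, so m.reshape(3, 2) and m.t() give different results.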

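Broadcasting

When an operation receives two tensors whose shapes differ but are compatible, torch automatically broadcasts them to a common shape before computing. Shapes are compared from the last dimension backwards, and each pair of sizes must either match or contain a 1.
The result has the broadcast shape as well.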
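
A minimal sketch of broadcasting, including a case where the shapes are not compatible. The shapes are chosen only for illustration:

```python
import torch

a = torch.tensor([[1.], [2.], [3.]])  # shape (3, 1)
b = torch.tensor([10., 20.])          # shape (2,)

# a is stretched along columns, b along rows: result has shape (3, 2)
c = a + b
print(c.shape)  # torch.Size([3, 2])
print(c)        # tensor([[11., 21.], [12., 22.], [13., 23.]])

# Incompatible shapes: trailing sizes 2 and 3 differ and neither is 1
failed = False
try:
    torch.ones(3, 2) + torch.ones(3)
except RuntimeError:
    failed = True
print(failed)  # True
```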

Indexing and slicing

This works much like indexing Python lists and similar structures, so the code speaks for itself:


>>> x = torch.arange(12).view(3, 4)
>>> x
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
>>> x[1, 3] # element at row 1, column 3
tensor(7)
>>> x[1] # all elements in row 1
tensor([4, 5, 6, 7])
>>> x[1:3] # elements in row 1 & 2
tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
>>> x[:, 2] # all elements in column 2
tensor([ 2,  6, 10])
>>> x[:, 2:4] # elements in column 2 & 3
tensor([[ 2,  3],
        [ 6,  7],
        [10, 11]])
>>> x[:, 2:4] = 100 # set elements in column 2 & 3 to 100
>>> x
tensor([[  0,   1, 100, 100],
        [  4,   5, 100, 100],
        [  8,   9, 100, 100]])

Adding and removing dimensions

unsqueeze adds a dimension of size 1, and squeeze removes dimensions of size 1:


>>> a = torch.tensor([1, 2, 3, 4])
>>> a.shape
torch.Size([4])
>>> b = torch.unsqueeze(a, dim=0)
>>> print(b, b.shape)
tensor([[1, 2, 3, 4]]) torch.Size([1, 4])
>>> b = a.unsqueeze(dim=0) # another way to unsqueeze tensor
>>> print(b, b.shape)
tensor([[1, 2, 3, 4]]) torch.Size([1, 4])
>>> c = b.squeeze()
>>> print(c, c.shape)
tensor([1, 2, 3, 4]) torch.Size([4])
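
The session above only uses dim=0 and an argument-free squeeze(). A short sketch of the other common variants, with example shapes chosen for illustration:

```python
import torch

a = torch.tensor([1, 2, 3, 4])

# unsqueeze at other positions
col = a.unsqueeze(dim=1)    # shape (4, 1), a column
last = a.unsqueeze(dim=-1)  # negative dims count from the end, also (4, 1)
indexed = a[None, :]        # indexing with None also adds a dim: (1, 4)

x = torch.zeros(1, 4, 1)
all_removed = x.squeeze()       # removes every size-1 dim: (4,)
first_only = x.squeeze(dim=0)   # removes only dim 0: (4, 1)
unchanged = x.squeeze(dim=1)    # dim 1 has size 4, so nothing happens

print(col.shape, last.shape, indexed.shape)
print(all_removed.shape, first_only.shape, unchanged.shape)
```

Passing dim to squeeze is usually safer than the bare call, which would also drop a batch dimension that happens to have size 1.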

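Data loading

Data generally goes through the following sequence:
load the data -> shuffle -> split into minibatches -> feed into the model for training.

PyTorch relies on two main abstractions for data loading:

Dataset: stores the samples and defines how an index maps to a sample. Roughly speaking, it ends up behaving like an array that supports arr[idx] access.
DataLoader: iterates over the data during training and handles the steps that come before the data reaches the model, such as shuffling and batching.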
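
A minimal sketch of the load -> shuffle -> minibatch flow. SquaresDataset is a hypothetical toy class invented for this example, not part of PyTorch:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Supports dataset[idx] access, like an array
        return torch.tensor(float(idx)), torch.tensor(float(idx * idx))


dataset = SquaresDataset(10)

# The DataLoader shuffles the indices and groups samples into minibatches
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for features, targets in loader:
    # A real training loop would pass `features` to the model here
    batch_sizes.append(features.shape[0])

print(batch_sizes)  # [4, 4, 2]
```

The last batch is smaller because 10 is not a multiple of 4; passing drop_last=True would discard it instead.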

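A bit more on Dataset. Depending on how the data is accessed, there are two main kinds (this is my own understanding):

Iterable-style datasets: the data already comes in a fixed order, so providing an iterator is all that is needed.
Map-style datasets: essentially a map structure whose keys can be anything, so you have to define what is returned for a given key. If the keys are integers, the DataLoader builds the index sequence on its own and the dataset can be traversed just like the iterable kind. If the keys are not integers, you also have to supply a rule that yields the keys in a traversable order. That rule is the sampler.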
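
The two kinds can be sketched as follows. CountingStream, KeyedDataset, and KeySampler are hypothetical names invented for this example; only the base classes come from torch.utils.data.

```python
from torch.utils.data import DataLoader, Dataset, IterableDataset, Sampler


class CountingStream(IterableDataset):
    """Iterable-style: only an iterator is provided."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))


class KeyedDataset(Dataset):
    """Map-style with string keys instead of integer indices."""

    def __init__(self, table):
        self.table = table

    def __len__(self):
        return len(self.table)

    def __getitem__(self, key):
        return self.table[key]


class KeySampler(Sampler):
    """Yields the keys in the order they should be visited."""

    def __init__(self, keys):
        self.keys = list(keys)

    def __iter__(self):
        return iter(self.keys)

    def __len__(self):
        return len(self.keys)


stream_loader = DataLoader(CountingStream(5), batch_size=2)
stream_batches = [batch.tolist() for batch in stream_loader]
print(stream_batches)  # [[0, 1], [2, 3], [4]]

table = {"a": 1.0, "b": 2.0, "c": 3.0}
keyed_loader = DataLoader(
    KeyedDataset(table), batch_size=2, sampler=KeySampler(table.keys())
)
keyed_batches = [batch.tolist() for batch in keyed_loader]
print(keyed_batches)  # [[1.0, 2.0], [3.0]]
```

Without the custom sampler, the DataLoader would request integer indices 0, 1, 2 from KeyedDataset, and the dictionary lookup would fail.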
