Understanding shuffle=True:

I previously didn't know what shuffle actually does. Suppose the data is a, b, c, d and batch_size=2; after shuffling, which of the following happens?
1. Batches are taken in order first, then shuffled within each batch, i.e. take a, b and shuffle just those two;
2. The whole dataset is shuffled first, then split into batches.
It turns out to be the second case.
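For intuition, here is a minimal sketch (my addition, not from the original post) that runs a four-element dataset through a DataLoader. With shuffle=True a batch can pair any two elements, which rules out the within-batch-only interpretation:

import torch
from torch.utils.data import DataLoader

torch.manual_seed(0)  # fix the RNG so the run is repeatable
loader = DataLoader(['a', 'b', 'c', 'd'], batch_size=2, shuffle=True)
for batch in loader:
    # Prints e.g. ['d', 'b'] then ['a', 'c']: elements from the original
    # (a, b) and (c, d) pairs end up mixed across batches.
    print(batch)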
The DataLoader docstring and source make this explicit:

shuffle (bool, optional): set to ``True`` to have the data reshuffled
    at every epoch (default: ``False``).

if shuffle:
    sampler = RandomSampler(dataset)  # this yields indices, not data
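To watch the sampler at work directly, here is a small sketch (my addition): RandomSampler yields a fresh permutation of the indices on each pass, and BatchSampler then chunks that permutation into batches:

import torch
from torch.utils.data import RandomSampler, BatchSampler

torch.manual_seed(0)
sampler = RandomSampler(range(4))
print(list(sampler))   # e.g. [2, 0, 1, 3] -- a permutation of the indices
batches = BatchSampler(sampler, batch_size=2, drop_last=False)
print(list(batches))   # e.g. [[3, 1], [0, 2]] -- permute first, then chunk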
Addendum: a quick test of how shuffle=True actually behaves in PyTorch's DataLoader.
Let's look at the code:
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class DealDataset(Dataset):
    def __init__(self):
        # Load the CSV; every column except the last is a feature,
        # the last column is the label.
        xy = np.loadtxt(open('./iris.csv', 'rb'), delimiter=',', dtype=np.float32)
        # Equivalent with pandas:
        # data = pd.read_csv("iris.csv", header=None)
        # xy = data.values
        self.x_data = torch.from_numpy(xy[:, 0:-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])
        self.len = xy.shape[0]

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

deal_dataset = DealDataset()
train_loader2 = DataLoader(dataset=deal_dataset,
                           batch_size=2,
                           shuffle=True)
# print(deal_dataset.x_data)
for i, data in enumerate(train_loader2):
    inputs, labels = data
    # inputs, labels = Variable(inputs), Variable(labels)  # legacy API, no longer needed
    print(inputs)
    # print("batch", i, "inputs", inputs.data.size(), "labels", labels.data.size())
[Screenshot in the original post: the simple dataset and the shuffled output.]
After shuffling, the result is different on every run: the whole dataset is randomly permuted first and then split into mini-batches of size n.
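One practical consequence: because the permutation is redrawn at every epoch, two passes over the same loader produce different batch orders. A minimal sketch (my addition; the seeded generator argument is only there to make the whole run reproducible):

import torch
from torch.utils.data import DataLoader

g = torch.Generator()
g.manual_seed(42)  # seed the loader's RNG so the run is reproducible
loader = DataLoader(range(6), batch_size=2, shuffle=True, generator=g)
for epoch in range(2):
    # Each epoch draws a fresh permutation, so the two printed lines differ.
    print([batch.tolist() for batch in loader])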
The above is based on my personal experience; I hope it gives everyone a useful reference, and I hope you will continue to support 服务器之家.
Original post: https://blog.csdn.net/qq_35248792/article/details/109510917