顶会论文复现:PROVING TEST SET CONTAMINATION IN BLACK BOX LANGUAGE MODELS-4 结果

时间:2024-10-08 11:44:23

在这里插入图片描述
在这里插入图片描述

{“pval”: 0.1696232809691942, “permutations_per_shard”: 25, “num_shards”: 5, “canonical_logprobs”: [-3.3727318899495287, -26.976947896884596, -8.280387231770165, -11.1112375389544, -15.317197114102502], “shuffled_logprobs”: [[-30.909075783811705, -1.9435003494767502, -6.068986559756483, -26.58900908523132, -16.12269960305306, -4.46379730144066, -2.1558121800502787, -2.7554792693991, -31.682284527334303, -2.6273379016797502, -29.87468835264795, -29.607210920316206, -21.57213257741471, -11.95329938606544, -9.972366131973049, -1.09951527892729, -24.01362313224146, -12.456106343552321, -13.67304127957505, -8.3861853631837, -1.19935666177955, -1.3937543557773802, -1.8002136455626179, -18.009020852617073, -17.578153150829802], [-1.048417377427077, -2.3941244789539704, -8.412327189846044, -24.544644694362702, -5.74892321528065, -1.055017764263738, -9.581590557731854, -7.1433768327109, -2.737799236512142, -23.983025399790144, -9.26424030310054, -14.993957307794997, -2.9504655498746724, -12.080805583617936, -30.364195487487766, -1.9539559864239302, -9.9784173216152, -5.28962663901724, -15.477334895809188, -12.511526170812552, -2.6651197975443917, -7.0888789550949, -3.2381118496705326, -9.586995443264478, -19.668003974102017], [-4.233909482393801, -7.066849438078104, -5.8159291705155995, -10.790016564647766, -5.962899019632103, -5.1748830459693185, -2.6900913199189995, -14.64220487293797, -12.072412084641194, -7.0405692357728995, -3.757379365485161, -4.3277333949891, -18.239703727872094, -1.2460728438048796, -1.5030126277381202, -3.4466863958886114, -4.680143685284249, -4.651795277712018, -20.354000748485447, -1.4471048784444498, -10.138775905959701, -18.178129422154928, -8.530598762226427, -7.489915270562131, -3.1585280028892795], [-11.69612743268735, -7.769248518398181, -6.86862614461919, -6.518956643516701, -3.803943939116793, -13.014972477420848, -8.689137998628949, -15.809698635575, -7.99394916393605, -10.31342520238305, -15.928287922934501, -4.502634127164701, -8.7768807739485, -4.220711983509, -28.029167855395826, -2.81686953962755, -9.31479084741635, -11.158157243828, -6.721924684535599, -10.066673909472277, -4.500597344717, -21.480636945940205, -9.300124195747498, -11.9573958015312, -14.577081120902347], [-6.629690439094301, -10022.289585636432, -7.999280733182201, -11.308060008804496, -5.2692347206537, -7.0436708680337015, -4.26530791126701, -4.130906993623899, -5.9630648153422, -7.950204238607298, -7.942466538914552, -12.449199286857947, -25.78205265161044, -17.262547473382632, -5.530209510927001, -8.1570511425078, -5.8230390775959995, -5.532957394563099, -16.9681575425189, -5.454541042652198, -4.4566292699186, -2.14531132561838, -38.43645063084328, -33.65827228043184, -2.714607539457402]]}