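[File properties]:
File name: Google's paper on Goods, high-resolution original English version
File size: 336KB
File format: ZIP
Last updated: 2022-03-13 09:30:34
Tags: Goods, Google paper, metadata management, data management architecture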
Goods: Organizing Google’s Datasets
Alon Halevy², Flip Korn¹, Natalya F. Noy¹, Christopher Olston¹, Neoklis Polyzotis¹, Sudip Roy¹, Steven Euijong Whang¹
¹ Google Research
² Recruit Institute of Technology
alon@recruit.ai, {flip, noy, olston, npolyzotis, sudipr, swhang}@google.com
1. INTRODUCTION
Most large enterprises today witness an explosion in the number of datasets that they generate internally for use in ongoing research and development. The reason behind this explosion is simple: by allowing engineers and data scientists to consume and generate datasets in an unfettered manner, enterprises promote fast development cycles, experimentation, and ultimately innovation that drives their competitive edge. As a result, these internally generated datasets often become a prime asset of the company, on par with source code and internal infrastructure. However, while enterprises have developed a strong culture on how to manage the latter, with source-code development tools and methodologies that we now consider “standard” in the industry (e.g., code versioning and indexing, reviews, or testing), similar approaches do not generally exist for managing datasets. We argue that developing principled and flexible approaches to dataset management has become imperative, lest companies run the risk of internal siloing of datasets, which, in turn, results in significant losses in productivity and opportunities, duplication of work, and mishandling of data.
[File preview]:
Goods谷歌论文-英文原本.pdf