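File name: Google's paper on Goods, high-definition original English version
File size: 336KB
File format: ZIP
Updated: 2022-03-13 09:30:34
Tags: Goods, Google paper, metadata management, data management architecture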
Goods: Organizing Google’s Datasets

Alon Halevy², Flip Korn¹, Natalya F. Noy¹, Christopher Olston¹, Neoklis Polyzotis¹, Sudip Roy¹, Steven Euijong Whang¹
¹ Google Research  ² Recruit Institute of Technology
alon@recruit.ai, {flip, noy, olston, npolyzotis, sudipr, swhang}@google.com

1. INTRODUCTION

Most large enterprises today witness an explosion in the number of datasets that they generate internally for use in ongoing research and development. The reason behind this explosion is simple: by allowing engineers and data scientists to consume and generate datasets in an unfettered manner, enterprises promote fast development cycles, experimentation, and ultimately innovation that drives their competitive edge. As a result, these internally generated datasets often become a prime asset of the company, on par with source code and internal infrastructure. However, while enterprises have developed a strong culture on how to manage the latter, with source-code development tools and methodologies that we now consider “standard” in the industry (e.g., code versioning and indexing, reviews, or testing), similar approaches do not generally exist for managing datasets. We argue that developing principled and flexible approaches to dataset management has become imperative, lest companies run the risk of internal siloing of datasets, which, in turn, results in significant losses in productivity and opportunities, duplication of work, and mishandling of data
File preview:
Goods谷歌论文-英文原本.pdf