File name: Large Scale and Big Data Processing and Management.pdf
File size: 83.01 MB
File format: PDF
Last updated: 2021-10-18 17:29:48
Big Data
Information from multiple sources is growing at a staggering rate. The number of Internet users reached 2.27 billion in 2012. Every day, Twitter generates more than 12 TB of tweets, Facebook generates more than 25 TB of log data, and the New York Stock Exchange captures 1 TB of trade information. About 30 billion radio-frequency identification (RFID) tags are created every day. Add to this mix the data generated by the hundreds of millions of GPS devices sold every year and by the more than 30 million networked sensors currently in use (a number growing faster than 30% per year). These data volumes are expected to double every two years over the next decade.

At the same time, many companies generate up to petabytes of information in the course of a year: web pages, blogs, clickstreams, search indices, social media forums, instant messages, text messages, email, documents, consumer demographics, sensor data from active and passive systems, and more. By many estimates, as much as 80% of this data is semistructured or unstructured. Companies are always seeking to become more nimble in their operations and more innovative in their data analysis and decision-making processes, and they are realizing that time lost in these processes can lead to missed business opportunities.

In principle, the core of the Big Data challenge is for companies to gain the ability to analyze and understand Internet-scale information just as easily as they can now analyze and understand smaller volumes of structured information. In particular, the characteristics of these overwhelming flows of data, which are produced at multiple sources, are currently subsumed under the notion of Big Data with its 3Vs (volume, velocity, and variety). Volume refers to the scale of data, from terabytes to zettabytes; velocity reflects streaming data and large-volume data movements; and variety refers to the complexity of data in many different structures, ranging from relational data to logs to raw text.
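
As a rough sanity check on the growth figures quoted above, the following minimal sketch works out what "doubling every two years over the next decade" and "growing faster than 30% per year" imply under a simple compound-growth assumption. The function names are illustrative and are not taken from the text; the steady-compounding model is itself an assumption.

```python
import math


def growth_factor(years, doubling_period_years):
    """Total multiplication in volume after `years`, assuming steady doubling."""
    return 2.0 ** (years / doubling_period_years)


def annual_rate_from_doubling(doubling_period_years):
    """Equivalent compound annual growth rate for a given doubling period."""
    return 2.0 ** (1.0 / doubling_period_years) - 1.0


def doubling_time(annual_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Doubling every two years for ten years is five doublings.
print(growth_factor(10, 2))  # 32.0

# The same trend expressed as a yearly rate (roughly 41% per year).
print(round(annual_rate_from_doubling(2), 3))

# A quantity growing at 30% per year doubles in a bit over 2.6 years.
print(round(doubling_time(0.30), 2))
```

Under these assumptions, a decade of doubling every two years amounts to a 32-fold increase in volume, which is somewhat faster than the 30% annual growth quoted for networked sensors.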