File name: Practical Big Data Analytics_Hands-Packt Publishing(2018).pdf
File size: 56.98MB
File format: PDF
Last updated: 2021-02-10 15:45:25
BigData Hadoop Spark NoSQL
This book introduces the reader to a broad spectrum of topics related to big data as used in the enterprise. Big data is a vast area that encompasses elements of technology, statistics, visualization, business intelligence, and many other related disciplines. To get true value from data that often remains inaccessible, whether because of volume or technical limitations, companies must leverage the proper tools at both the software and the hardware level.
To that end, the book covers the theoretical and practical aspects of big data and supplements them with high-level topics such as the use of big data in the enterprise, big data and data science initiatives, and key considerations such as resources and the hardware/software stack. These discussions should be useful for IT departments in organizations that are planning to implement or upgrade their big data and/or data science platform.
The book focuses on three primary areas:
1. Data mining on large-scale datasets
Big data is ubiquitous today, just as the term data warehouse was omnipresent not too long ago. There is a myriad of solutions in the industry. In particular, Hadoop and products in the Hadoop ecosystem have become both popular and increasingly common in the enterprise. Further, more recent innovations such as Apache Spark have also found a permanent presence in the enterprise: Hadoop clients, realizing that they may not need the complexity of the Hadoop framework, have shifted to Spark in large numbers. Finally, NoSQL solutions such as MongoDB, Redis and Cassandra, and commercial solutions such as Teradata, Vertica and kdb+, have taken the place of more conventional database systems.
This book covers these areas in a fair degree of depth. Hadoop and related products such as Hive, HBase, Pig Latin and others are covered. Spark is covered as well, with explanations of key Spark concepts such as Actions and Transformations. NoSQL solutions such as MongoDB and kdb+ are also covered to a fair extent, and hands-on tutorials are provided.