File name: HDInsight Succinctly (Syncfusion, 2014)
File size: 1.89 MB
File format: PDF
Last updated: 2018-04-09 03:39:38
hdinsight hadoop azure bigdata
Microsoft’s Big Data Platforms

The world of data is changing in a big way, and expectations about how to interact with and analyze that data are changing as a result. Microsoft offers a broad and scalable portfolio of data storage capabilities for structured, unstructured, and streaming data, both on-premises and in the cloud.

Microsoft has been present in the traditional BI space through the SQL Server platform, which scales quite satisfactorily into the hundreds-of-gigabytes range without much need for specialist hardware or clever configuration. Since approximately 2010, Microsoft has also offered a couple of specialist appliances to scale higher: the SQL Server Fast Track Data Warehouse for anything up to 100 terabytes, and the SQL Server Parallel Data Warehouse (PDW) for anything entering the petabyte scale.

However, these platforms deal only with relational data, and the open-source movement overtook Microsoft (and indeed many other vendors) with the emergence of Hadoop. Microsoft did have a similar platform internally, called Dryad, but shortly before Dryad was expected to go live it was dropped in favor of creating a distribution of Hadoop in conjunction with Hortonworks.

From that decision point, various previews of the platform were made available as on-premises or cloud versions. Early in 2013, the HDInsight name was adopted for the preview (replacing the original “Hadoop on Azure” name), and the cloud platform became generally available in October 2013. The on-premises version is, at the time of this writing, still in preview with no firm release date.

Aspects of these technologies are working their way back into the relational world: the 2.0 version of the Parallel Data Warehouse features support for Hadoop, including a technology called PolyBase that allows a single query to combine relational and nonrelational data.