File name: Modeling the Internet and the Web: Probabilistic Methods and Algorithms
File size: 2.11MB
File format: PDF
Last updated: 2013-11-29 09:46:25
Modeling Internet Probabilistic Methods
By its very nature, a very large distributed, decentralized, self-organized, and evolving system necessarily yields uncertain and incomplete measurements and data. Probability and statistics are the fundamental mathematical tools that allow us to model, reason, and proceed with inference in uncertain environments. Not only are probabilistic methods needed to deal with noisy measurements, but many of the underlying phenomena, including the dynamic evolution of the Internet and the Web, are themselves probabilistic in nature. As in the systems studied in statistical mechanics, regularities may emerge from the more or less random interactions of myriads of small factors. Aggregation can only be captured probabilistically. Furthermore, and not unlike biological systems, the Internet is a very high-dimensional system, where measurement of all relevant variables becomes impossible. Most variables remain hidden and must be 'factored out' by probabilistic methods.

There is one more important reason why probabilistic modeling is central to this book. At a fundamental level the Web is concerned with information retrieval and the semantics, or meaning, of that information. While the modeling of semantics remains largely an open research problem, probabilistic methods have achieved remarkable successes and are widely used in information retrieval, machine translation, and more. Although these probabilistic methods bypass or fake semantic understanding, they are, for instance, at the core of the search engines we use every day. As it happens, the Internet and the Web themselves have greatly aided the development of such methods by making available large corpora of data from which statistical regularities can be extracted. Thus, probabilistic methods pervasively apply to diverse areas of Internet and Web modeling and analysis, such as network traffic, graphical structure, information retrieval engines, and customer behavior.