Deep Learning Selected Readings (II)

Date: 2022-10-09 05:38:22
Learning Deep Architectures for AI
Here, we assume that the computational machinery necessary to express complex behaviors (which one might label "intelligent") requires highly varying mathematical functions that are highly non-linear in terms of raw sensory inputs, and display a very large number of variations (ups and downs) across the domain of interest. We view the raw input to the learning system as a high-dimensional entity, made of many observed variables, which are related by unknown intricate statistical relationships. For example, using knowledge of the 3D geometry of solid objects and lighting, we can relate small variations in underlying physical and geometric factors (such as position, orientation, lighting of an object) with changes in pixel intensities for all the pixels in an image. We call these factors of variation because they are different aspects of the data that can vary separately and often independently. In this case, explicit knowledge of the physical factors involved allows one to get a picture of the mathematical form of these dependencies, and of the shape of the set of images (as points in a high-dimensional space of pixel intensities) associated with the same 3D object. If a machine captured the factors that explain the statistical variations in the data, and how they interact to generate the kind of data we observe, we would be able to say that the machine understands those aspects of the world covered by these factors of variation. Unfortunately, in general and for most factors of variation underlying natural images, we do not have an analytical understanding of these factors of variation. We do not have enough formalized prior knowledge about the world to explain the observed variety of images, even for such an apparently simple abstraction as MAN, illustrated in Figure 1. A high-level abstraction such as MAN has the property that it corresponds to a very large set of possible images, which might be very different from each other from the point of view of simple Euclidean distance in the space of pixel intensities. The set of images for which that label could be appropriate forms a highly convoluted region in pixel space that is not even necessarily a connected region. The MAN category can be seen as a high-level abstraction with respect to the space of images. What we call abstraction here can be a category (such as the MAN category) or a feature, a function of sensory data, which can be discrete (e.g., the input sentence is in the past tense) or continuous (e.g., the input video shows an object moving at 2 meters/second). Many lower-level and intermediate-level concepts (which we also call abstractions here) would be useful to construct a MAN-detector. Lower-level abstractions are more directly tied to particular percepts, whereas higher-level ones are what we call "more abstract" because their connection to actual percepts is more remote, and through other, intermediate-level abstractions.

This passage points out that the mathematical functions behind complex behaviors are themselves complex: for a 3D scene, for instance, changes in position and lighting affect almost all of the image's pixels at once. These factors of variation are often separable and independent of one another, and each physical factor can in principle be characterized mathematically. The goal of machine learning is to "learn" the relationship between these factors and the output. Unfortunately, such relationships can rarely be written down analytically; the description has to be built up by characterizing high-level categories in terms of low-level features.
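A minimal sketch of the point about Euclidean distance in pixel space, not taken from the paper: a toy "factor of variation" (the horizontal position of a bright square) is varied, and we measure how far apart the resulting images are in raw pixel distance. The image size, square size, and shifts are arbitrary choices for illustration only.

```python
import numpy as np

def render(x_pos, size=32, side=8):
    """Render a size x size image with a bright side x side square
    whose left edge starts at column x_pos."""
    img = np.zeros((size, size))
    top = (size - side) // 2
    img[top:top + side, x_pos:x_pos + side] = 1.0
    return img

base = render(4)
for shift in (0, 2, 8, 16):
    moved = render(4 + shift)
    dist = np.linalg.norm(base - moved)
    print(f"shift={shift:2d}  pixel-space Euclidean distance={dist:.2f}")

# Once the square no longer overlaps its original location, the distance
# saturates: very different positions of the *same* object all look equally
# "far apart" in pixel space, even though only a single factor changed.
```

This is the sense in which images sharing one high-level label can form a convoluted, spread-out region in pixel space while differing in only a few underlying factors.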
In addition to the difficulty of coming up with the appropriate intermediate abstractions, the number of visual and semantic categories (such as MAN) that we would like an "intelligent" machine to capture is rather large. The focus of deep architecture learning is to automatically discover such abstractions, from the lowest-level features to the highest-level concepts. Ideally, we would like learning algorithms that enable this discovery with as little human effort as possible, i.e., without having to manually define all necessary abstractions or having to provide a huge set of relevant hand-labeled examples. If these algorithms could tap into the huge resource of text and images on the web, it would certainly help to transfer much human knowledge into machine-interpretable form.

This passage essentially describes how complex it is to characterize a phenomenon. When recognizing images of a given category, different viewpoints can attach different labels; which labels characterize the category most meaningfully? (This amounts to treating such a label feature as a random variable and asking first about its variance, i.e., how much statistical information it carries: by the Cauchy-Schwarz inequality, |Cov(X, Y)| ≤ sqrt(Var(X) Var(Y)), so a large variance is a necessary condition for a large covariance with the target.) The deep-learning viewpoint is to let the data itself drive the characterization, with as little human intervention as possible (minimal hand-picking of non-linear transformations and minimal manual feature selection). Its significance therefore lies in subjecting the model to an unguided kind of overfitting, where the factors being overfit are shared, abstracted regularities rather than idiosyncratic details. An example of overfitting to idiosyncratic details is an unpruned decision tree, whose overfitting is driven by unabstracted particulars (individual variables); the overfitting of a neural network, by contrast, is driven by abstracted commonalities (such as edges captured by convolutions and the small translation invariance captured by max-pooling, as sketched below). In this sense, deep learning is a method that uses large amounts of data to train models whose overfitting is "harmless".
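A minimal sketch of the max-pooling remark above, my own illustration rather than anything from the text: a 1-D signal containing a single peak is shifted by one position; the raw values change, but a 2-wide non-overlapping max-pooled summary stays identical. The pool width and the toy signals are arbitrary choices.

```python
import numpy as np

def max_pool_1d(x, width=2):
    """Non-overlapping 1-D max pooling with the given window width."""
    return x.reshape(-1, width).max(axis=1)

a = np.array([0., 0., 1., 0., 0., 0., 0., 0.])   # peak at index 2
b = np.array([0., 0., 0., 1., 0., 0., 0., 0.])   # same peak shifted by 1

print("raw signals equal:   ", np.array_equal(a, b))                            # False
print("pooled signals equal:", np.array_equal(max_pool_1d(a), max_pool_1d(b)))  # True
```

The pooled representation discards exactly the idiosyncratic detail (the precise position within the window) while keeping the shared, abstracted feature (a peak is present), which is the kind of "commonality-level" regularity the commentary contrasts with a decision tree's variable-by-variable splits.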