Teaching Machines to Understand Us, Part 2: The History of Deep Learning

Date: 2022-06-19 13:25:33

Deep history

The roots of deep learning reach back further than LeCun’s time at Bell Labs. He and a few others who pioneered the technique were actually resuscitating a long-dead idea in artificial intelligence.

When the field got started, in the 1950s, biologists were just beginning to develop simple mathematical theories of how intelligence and learning emerge from signals passing between neurons in the brain. The core idea — still current today — was that the links between neurons are strengthened if those cells communicate frequently. The fusillade of neural activity triggered by a new experience adjusts the brain’s connections so it can understand it better the second time around.
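
That core idea maps onto a very small amount of code. As a minimal sketch (the learning rate, the toy activity vectors, and the `hebbian_update` helper are illustrative assumptions, not anything from the article), a Hebbian-style update in Python looks like this:

```python
import numpy as np

# Hebbian idea: a connection strengthens when the two neurons
# it links are active at the same time.
def hebbian_update(w, pre, post, eta=0.1):
    """Grow each weight in proportion to correlated activity."""
    return w + eta * np.outer(post, pre)

pre = np.array([1.0, 0.0, 1.0])   # activity of three input neurons
post = np.array([1.0, 0.0])      # activity of two downstream neurons
w = np.zeros((2, 3))             # connection strengths, initially zero

# Repeated co-activation strengthens only the links between
# neurons that fire together; the rest stay at zero.
for _ in range(5):
    w = hebbian_update(w, pre, post)
print(w)
```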

In 1956, the psychologist Frank Rosenblatt used those theories to invent a way of making simple simulations of neurons in software and hardware. The New York Times announced his work with the headline “Electronic ‘Brain’ Teaches Itself.”  Rosenblatt’s perceptron, as he called his design, could learn how to sort simple images into categories—for instance, triangles and squares. Rosenblatt usually implemented his ideas on giant machines thickly tangled with wires, but they established the basic principles at work in artificial neural networks today.

One computer he built had eight simulated neurons, made from motors and dials connected to 400 light detectors. Each of the neurons received a share of the signals from the light detectors, combined them, and, depending on what they added up to, spit out either a 1 or a 0. Together those digits amounted to the perceptron’s “description” of what it saw. Initially the results were garbage. But Rosenblatt used a method called supervised learning to train a perceptron to generate results that correctly distinguished different shapes. He would show the perceptron an image along with the correct answer. Then the machine would tweak how much attention each neuron paid to its incoming signals, shifting those “weights” toward settings that would produce the right answer. After many examples, those tweaks endowed the computer with enough smarts to correctly categorize images it had never seen before. Today’s deep-learning networks use sophisticated algorithms and have millions of simulated neurons, with billions of connections between them. But they are trained in the same way.
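
That training loop is simple enough to reproduce. Here is a minimal sketch in Python of a single simulated neuron trained by supervised learning; the tiny two-signal “images”, the learning rate, and the `predict` helper are illustrative assumptions rather than a reconstruction of Rosenblatt’s 400-detector hardware:

```python
import numpy as np

rng = np.random.default_rng(0)

# One simulated neuron: weigh the incoming signals, add them up,
# and spit out either a 1 or a 0.
def predict(w, b, x):
    return 1 if np.dot(w, x) + b > 0 else 0

# Toy supervised-learning data: each row is an "image" reduced to
# two detector signals, labeled 1 or 0 for two shape categories.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array([1, 1, 0, 0])

w = rng.normal(size=2)   # random initial weights: early answers are garbage
b = 0.0
eta = 0.1                # how far each mistake shifts the weights

# Show an example with the correct answer; when the neuron is wrong,
# shift the "attention" (weight) paid to each input toward the answer.
for _ in range(20):
    for x, target in zip(X, y):
        error = target - predict(w, b, x)
        w += eta * error * x
        b += eta * error

print([predict(w, b, x) for x in X])  # after training: [1, 1, 0, 0]
```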

Rosenblatt predicted that perceptrons would soon be capable of feats like greeting people by name, and his idea became a linchpin of the nascent field of artificial intelligence. Work focused on making perceptrons with more complex networks, arranged into a hierarchy of multiple learning layers. Passing images or other data successively through the layers would allow a perceptron to tackle more complex problems. Unfortunately, Rosenblatt’s learning algorithm didn’t work on multiple layers. In 1969 the AI pioneer Marvin Minsky, who had gone to high school with Rosenblatt, published a book-length critique of perceptrons that killed interest in neural networks at a stroke. Minsky claimed that getting more layers working wouldn’t make perceptrons powerful enough to be useful. Artificial intelligence researchers abandoned the idea of making software that learned. Instead, they turned to using logic to craft working facets of intelligence—such as an aptitude for chess. Neural networks were shoved to the margins of computer science.

Nonetheless, LeCun was mesmerized when he read about perceptrons as an engineering student in Paris in the early 1980s. “I was amazed that this was working and wondering why people abandoned it,” he says. He spent days at a research library near Versailles, hunting for papers published before perceptrons went extinct. Then he discovered that a small group of researchers in the United States were covertly working on neural networks again. “This was a very underground movement,” he says. In papers carefully purged of words like “neural” and “learning” to avoid rejection by reviewers, they were working on something very much like Rosenblatt’s old problem of how to train neural networks with multiple layers.

LeCun joined the underground after he met its central figures in 1985, including a wry Brit named Geoff Hinton, who now works at Google and the University of Toronto. They immediately became friends, mutual admirers—and the nucleus of a small community that revived the idea of neural networking. They were sustained by a belief that using a core mechanism seen in natural intelligence was the only way to build artificial intelligence. “The only method that we knew worked was a brain, so in the long run it had to be that systems something like that could be made to work,” says Hinton.

LeCun’s success at Bell Labs came about after he, Hinton, and others perfected a learning algorithm for neural networks with multiple layers. It was known as backpropagation, and it sparked a rush of interest from psychologists and computer scientists. But after LeCun’s check-reading project ended, backpropagation proved tricky to adapt to other problems, and a new way to train software to sort data was invented by a Bell Labs researcher down the hall from LeCun. It didn’t involve simulated neurons and was seen as mathematically more elegant. Very quickly it became a cornerstone of Internet companies such as Google, Amazon, and LinkedIn, which use it to train systems that block spam or suggest things for you to buy.
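
Backpropagation’s key move is to carry error derivatives backward through the layers by the chain rule, which is exactly what Rosenblatt’s single-layer rule could not do. Below is a minimal sketch, assuming a tiny two-layer network on the XOR task; the layer sizes, learning rate, and task are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR: a classic task a single-layer perceptron cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    # Forward pass: data flows through both layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the chain rule carries the error signal
    # from the output layer back into the hidden layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Nudge every weight downhill on the error.
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())  # typically close to [0, 1, 1, 0]
```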

After LeCun got to NYU in 2003, he, Hinton, and a third collaborator, University of Montreal professor Yoshua Bengio, formed what LeCun calls “the deep-learning conspiracy.” To prove that neural networks would be useful, they quietly developed ways to make them bigger, train them with larger data sets, and run them on more powerful computers. LeCun’s handwriting recognition system had had five layers of neurons, but now they could have 10 or many more. Around 2010, what was now dubbed deep learning started to beat established techniques on real-world tasks like sorting images. Microsoft, Google, and IBM added it to speech recognition systems. But neural networks were still alien to most researchers and not considered widely useful. In early 2012 LeCun wrote a fiery letter—initially published anonymously—after a paper claiming to have set a new record on a standard vision task was rejected by a leading conference. He accused the reviewers of being “clueless” and “negatively biased.”

Everything changed six months later. Hinton and two grad students used a network like the one LeCun made for reading checks to rout the field in the leading contest for image recognition. Known as the ImageNet Large Scale Visual Recognition Challenge, it asks software to identify 1,000 types of objects as diverse as mosquito nets and mosques. The Toronto entry correctly identified the object in an image within five guesses about 85 percent of the time, more than 10 percentage points better than the second-best system (see Innovator Under 35 Ilya Sutskever, page 47). The deep-learning software’s initial layers of neurons optimized themselves for finding simple things like edges and corners, with the layers after that looking for successively more complex features like basic shapes and, eventually, dogs or people.
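
That edges-to-objects progression is what stacked convolutional layers provide. Here is a minimal sketch in Python using PyTorch; the layer counts and sizes are illustrative assumptions, not the Toronto team’s actual architecture:

```python
import torch
import torch.nn as nn

# Early layers respond to simple local patterns; deeper layers
# combine them into shapes and, finally, whole-object evidence.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low level: edges, corners
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid level: simple shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # high level: object parts
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 1000),   # one score per class, as in the 1,000-class contest
)

scores = model(torch.randn(1, 3, 224, 224))  # one fake RGB image
print(scores.shape)  # torch.Size([1, 1000])

# The contest metric counts a hit if the true class is among the
# five highest scores ("within five guesses").
top5 = scores.topk(5, dim=1).indices
```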

LeCun recalls seeing the community that had mostly ignored neural networks pack into the room where the winners presented a paper on their results. “You could see right there a lot of senior people in the community just flipped,” he says. “They said, ‘Okay, now we buy it. That’s it, now—you won.’”

Journey of Acceptance

1956: Psychologist Frank Rosenblatt uses theories about how brain cells work to design the perceptron, an artificial neural network that can be trained to categorize simple shapes.

1969: AI pioneers Marvin Minsky and Seymour Papert write a book critical of perceptrons that quashes interest in neural networks for decades.

1986: Yann LeCun and Geoff Hinton perfect backpropagation to train neural networks that pass data through successive layers of artificial neurons, allowing them to learn more complex skills.

1987: Terry Sejnowski at Johns Hopkins University creates a system called NETtalk that can be trained to pronounce text, going from random babbling to recognizable speech.

1990: At Bell Labs, LeCun uses backpropagation to train a network that can read handwritten text. AT&T later uses it in machines that can read checks.

1995: Bell Labs mathematician Vladimir Vapnik publishes an alternative method for training software to categorize data such as images. This sidelines neural networks again.

2006: Hinton’s research group at the University of Toronto develops ways to train much larger networks with tens of layers of artificial neurons.

June 2012: Google uses deep learning to cut the error rate of its speech recognition software by 25 percent.

October 2012: Hinton and two colleagues from the University of Toronto win the largest challenge for software that recognizes objects in photos, almost halving the previous error rate.

March 2013: Google buys DNN Research, the company founded by the Toronto team to develop their ideas. Hinton starts working at Google.

March 2014: Facebook starts using deep learning to power its facial recognition feature, which identifies people in uploaded photos.

May 2015: Google Photos launches. The service uses deep learning to group photos of the same people and let you search your snapshots using terms like “beach” or “dog.”

Academics working on computer vision quickly abandoned their old methods, and deep learning suddenly became one of the main strands in artificial intelligence. Google bought a company founded by Hinton and the two others behind the 2012 result, and Hinton started working there part time on a research team known as Google Brain. Microsoft and other companies created new projects to investigate deep learning. In December 2013, Facebook CEO Mark Zuckerberg stunned academics by showing up at the largest neural-network research conference, hosting a party where he announced that LeCun was starting FAIR (though he still works at NYU one day a week).

LeCun still harbors mixed feelings about the 2012 research that brought the world around to his point of view. “To some extent this should have come out of my lab,” he says. Hinton shares that assessment. “It was a bit unfortunate for Yann that he wasn’t the one who actually made the breakthrough system,” he says. LeCun’s group had done more work than anyone else to prove out the techniques used to win the ImageNet challenge. The victory could have been his had student graduation schedules and other commitments not prevented his own group from taking on ImageNet, he says. LeCun’s hunt for deep learning’s next breakthrough is now a chance to even the score.
