Week 1: Bird recognition in the city of Peacetopia (case study)
1. Problem Statement
This example is adapted from a real production application, but with details disguised to protect confidentiality.
You are a famous researcher in the City of Peacetopia. The people of Peacetopia have a common characteristic: they are afraid of birds. To save them, you have to build an algorithm that will detect any bird flying over Peacetopia and alert the population.
The City Council gives you a dataset of 10,000,000 images of the sky above Peacetopia, taken from the city’s security cameras. They are labelled:
y = 0: There is no bird in the image
y = 1: There is a bird in the image
Your goal is to build an algorithm able to classify new images taken by security cameras from Peacetopia.
There are a lot of decisions to make:
- What is the evaluation metric?
- How do you structure your data into train/dev/test sets?

Metric of success
The City Council tells you that they want an algorithm that:
- Has high accuracy.
- Runs quickly and takes only a short time to classify a new image.
- Can fit in a small amount of memory, so that it can run in a small processor that the city will attach to many different security cameras.

Note: Having three evaluation metrics makes it harder for you to quickly choose between two different algorithms, and will slow down the speed with which your team can iterate. True/False?
【 】True 【 】False
Answer
True
2. After further discussions, the city narrows down its criteria to:
“We need an algorithm that can let us know a bird is flying over Peacetopia as accurately as possible.”
“We want the trained model to take no more than 10 sec to classify a new image.”
“We want the model to fit in 10 MB of memory.”
If you had the following four models, which one would you choose?
Choice | Test Accuracy | Runtime | Size |
---|---|---|---|
【 】 | 97% | 1 sec | 3 MB |
【 】 | 99% | 13 sec | 9 MB |
【 】 | 97% | 3 sec | 2 MB |
【 】 | 98% | 9 sec | 9 MB |
Answer
98%, 9 sec, 9 MB. It is the most accurate model that still meets both the 10 sec and the 10 MB limits.
3. Based on the city’s requests, which of the following would you say is true?
【 】Accuracy is an optimizing metric; running time and memory size are satisficing metrics.
【 】Accuracy is a satisficing metric; running time and memory size are optimizing metrics.
【 】Accuracy, running time and memory size are all optimizing metrics because you want to do well on all three.
【 】Accuracy, running time and memory size are all satisficing metrics because you have to do sufficiently well on all three for your system to be acceptable.
Answer
Accuracy is an optimizing metric; running time and memory size are satisficing metrics.
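The selection rule behind questions 2 and 3 can be sketched in a few lines of Python: first discard every model that violates a satisficing threshold, then pick the best remaining model on the optimizing metric. This is only an illustrative sketch. The `Candidate` class, the model names A to D, and the `pick_model` function are hypothetical, and the runtime of the 99% model is read as 13 sec.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical label for a row of the table
    accuracy: float    # optimizing metric: higher is better
    runtime_s: float   # satisficing metric: must not exceed the limit
    size_mb: float     # satisficing metric: must not exceed the limit


def pick_model(candidates, max_runtime_s=10.0, max_size_mb=10.0):
    """Return the most accurate candidate that meets both limits, or None."""
    feasible = [
        c for c in candidates
        if c.runtime_s <= max_runtime_s and c.size_mb <= max_size_mb
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


candidates = [
    Candidate("A", 0.97, 1, 3),
    Candidate("B", 0.99, 13, 9),   # most accurate, but too slow
    Candidate("C", 0.97, 3, 2),
    Candidate("D", 0.98, 9, 9),
]

best = pick_model(candidates)
print(best.name, best.accuracy)
```

Once a model is under the 10 sec and 10 MB limits, being even faster or smaller earns no extra credit in this scheme; only accuracy separates the feasible candidates.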
4. Structuring your data
Before implementing your algorithm, you need to split your data into train/dev/test sets. Which of these do you think is the best choice?
Choice | Train | Dev | Test |
---|---|---|---|
【 】 | 6,000,000 | 1,000,000 | 3,000,000 |
【 】 | 6,000,000 | 3,000,000 | 1,000,000 |
【 】 | 9,500,000 | 250,000 | 250,000 |
【 】 | 3,333,334 | 3,333,333 | 3,333,333 |
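To compare the options, it can help to convert each row into percentages of the 10,000,000 images. The sketch below does this for three of the rows; `split_percentages` is a hypothetical helper, not part of any library. With a dataset this large, even 2.5% still leaves 250,000 examples each for dev and test, so the 9,500,000/250,000/250,000 row keeps the most data for training while the evaluation sets stay large in absolute terms.

```python
TOTAL_IMAGES = 10_000_000  # size of the City Council dataset


def split_percentages(train, dev, test):
    """Summarize a train/dev/test split as a total and percentages."""
    total = train + dev + test
    return {
        "total": total,
        "train_pct": 100 * train / total,
        "dev_pct": 100 * dev / total,
        "test_pct": 100 * test / total,
    }


options = [
    (6_000_000, 1_000_000, 3_000_000),
    (9_500_000, 250_000, 250_000),
    (3_333_334, 3_333_333, 3_333_333),
]

for train, dev, test in options:
    summary = split_percentages(train, dev, test)
    # Sanity check: every option should use all 10,000,000 images.
    assert summary["total"] == TOTAL_IMAGES
    print(
        f"train {summary['train_pct']:.1f}% | "
        f"dev {summary['dev_pct']:.1f}% | "
        f"test {summary['test_pct']:.1f}%"
    )
```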
5. After setting up your train/dev/test sets, the City Council comes across another 1,000,000 images, called the “citizens’ data”. Apparently the citizens of Peacetopia are so scared of birds that they volunteered to take pictures of the sky and label them, thus contributing these additional 1,000,000 images. These images come from a different distribution than the images the City Council originally gave you, but you think they could help your algorithm.
You should not add the citizens’ data to the training set, because this will cause the training and dev/test set distributions to become different, thus hurting dev and test set performance. True/False?
【 】True 【 】False
Answer
False
6. One member of the City Council knows a little about machine learning, and thinks you should add the 1,000,000 citizens’ data images to the test set. You object because:
【 】This would cause the dev and test set distributions to become different. This is a bad idea because you’re not aiming where you want to hit.
【 】The 1,000,000 citizens’ data images do not have the same x→y mapping as the rest of the data (similar to the New York City/Detroit housing prices example from lecture).
【 】A bigger test set will slow down the speed of iterating because of the computational expense of evaluating models on the test set.
【 】The test set no longer reflects the distribution of data (security cameras) you most care about.
Answer
【★】This would cause the dev and test set distributions to become different. This is a bad idea because you’re not aiming where you want to hit.
【★】The test set no longer reflects the distribution of data (security cameras) you most care about. (Blogger’s note: the training set was taken by the security cameras, so testing a camera-trained system on pictures taken by other people is bound to lower the measured accuracy. If the data is added at all, it should be added to the whole dataset so that everything stays in one distribution.)
7. You train a system, and its errors are as follows (error = 100% - accuracy):
Training set error | 4.0% |
---|---|
Dev set error | 4.5% |
This suggests that one good avenue for improving performance is to train a bigger network so as to drive down the 4.0% training error. Do you agree?
【 】Yes, because having 4.0% training error shows you have high bias.
【 】Yes, because this shows your bias is higher than your variance.
【 】No, because this shows your variance is higher than your bias.
【 】No, because there is insufficient information to tell.
Answer
【★】No, because there is insufficient information to tell.
8. You ask a few people to label the dataset so as to find out what is human-level performance. You find the following error rates:
Bird watching expert #1 | 0.3% error |
---|---|
Bird watching expert #2 | 0.5% error |
Normal person #1 (not a bird watching expert) | 1.0% error |
Normal person #2 (not a bird watching expert) | 1.2% error |
If your goal is to have “human-level performance” be a proxy (or estimate) for Bayes error, how would you define “human-level performance”?
【 】0.0% (because it is impossible to do better than this)
【 】0.3% (error rate of expert #1)
【 】0.4% (average of 0.3 and 0.5)
【 】0.75% (average of all four numbers above)
Answer
0.3% (error rate of expert #1)
9. Which of the following statements do you agree with?
【 】A learning algorithm’s performance can be better than human-level performance but it can never be better than Bayes error.
【 】A learning algorithm’s performance can never be better than human-level performance but it can be better than Bayes error.
【 】A learning algorithm’s performance can never be better than human-level performance nor better than Bayes error.
【 】A learning algorithm’s performance can be better than human-level performance and better than Bayes error.
Answer
【★】A learning algorithm’s performance can be better than human-level performance but it can never be better than Bayes error.
10. You find that a team of ornithologists debating and discussing an image gets an even better 0.1% error, so you define that as “human-level performance.” After working further on your algorithm, you end up with the following:
Human-level performance | 0.1% |
---|---|
Training set error | 2.0% |
Dev set error | 2.1% |
Based on the evidence you have, which two of the following four options seem the most promising to try? (Check two options.)
【 】Try increasing regularization.
【 】Get a bigger training set to reduce variance.
【 】Try decreasing regularization.
【 】Train a bigger model to try to do better on the training set.
Answer
【★】Try decreasing regularization.
【★】Train a bigger model to try to do better on the training set.
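The reasoning behind this answer can be written out as simple arithmetic. The sketch below treats human-level performance as a proxy for Bayes error, takes avoidable bias as training error minus human-level error, and takes variance as dev error minus training error. The function name `diagnose` and the rule of focusing on whichever gap is larger are simplifying assumptions for illustration.

```python
def diagnose(human_error, train_error, dev_error):
    """Compare avoidable bias with variance (all values in percent).

    Assumes human-level error is a usable proxy for Bayes error.
    """
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


bias, variance, focus = diagnose(human_error=0.1, train_error=2.0, dev_error=2.1)
print(f"avoidable bias: {bias:.1f}%")
print(f"variance:       {variance:.1f}%")
print(f"focus on reducing: {focus}")
```

With a gap of about 1.9% to human level and only about 0.1% between training and dev error, bias-reducing steps such as a bigger model or less regularization look more promising than variance-reducing ones. The same arithmetic shows why question 7 cannot be decided: without an estimate of Bayes error there is nothing to subtract from the 4.0% training error.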
11. You also evaluate your model on the test set, and find the following:
Human-level performance | 0.1% |
---|---|
Training set error | 2.0% |
Dev set error | 2.1% |
Test set error | 7.0% |
What does this mean? (Check the two best options.)
【 】You have underfit to the dev set.
【 】You should try to get a bigger dev set.
【 】You should get a bigger test set.
【 】You have overfit to the dev set.
Answer
【★】You should try to get a bigger dev set.
【★】You have overfit to the dev set.
12. After working on this project for a year, you finally achieve:
Human-level performance | 0.10% |
---|---|
Training set error | 0.05% |
Dev set error | 0.05% |
What can you conclude? (Check all that apply.)
【 】It is now harder to measure avoidable bias, thus progress will be slower going forward.
【 】This is a statistical anomaly (or must be the result of statistical noise) since it should not be possible to surpass human-level performance.
【 】With only 0.09% further progress to make, you should quickly be able to close the remaining gap to 0%.
【 】If the test set is big enough for the 0.05% error estimate to be accurate, this implies Bayes error is ≤ 0.05%.
Answer
【★】It is now harder to measure avoidable bias, thus progress will be slower going forward.
【★】If the test set is big enough for the 0.05% error estimate to be accurate, this implies Bayes error is ≤ 0.05%.
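Whether a set is "big enough" for a 0.05% error estimate can be gauged with the usual binomial standard error, sqrt(p(1 - p) / n), where p is the measured error rate and n is the number of evaluation examples. The sketch below is a rough illustration under the assumption that examples are independent; the set sizes are hypothetical (250,000 matches one of the splits offered in question 4), and `error_standard_error` is a made-up helper name.

```python
import math


def error_standard_error(error_rate, n_examples):
    """Binomial standard error of a measured error rate (as a fraction)."""
    return math.sqrt(error_rate * (1.0 - error_rate) / n_examples)


measured_error = 0.0005  # 0.05% expressed as a fraction

for n in (10_000, 250_000, 3_000_000):  # hypothetical evaluation set sizes
    se = error_standard_error(measured_error, n)
    print(f"n = {n:>9,}: error = {measured_error:.3%} +/- {se:.4%}")
```

The uncertainty shrinks with the square root of the set size, so an error rate this small needs a large evaluation set before the estimate says much about Bayes error.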
13. It turns out Peacetopia has hired one of your competitors to build a system as well. Your system and your competitor’s both have about the same running time and memory size. However, your system has higher accuracy! Yet when Peacetopia tries out both systems, they conclude they actually like your competitor’s system better, because even though you have higher overall accuracy, you have more false negatives (failing to raise an alarm when a bird is in the air). What should you do?
【 】Look at all the models you’ve developed during the development process and find the one with the lowest false negative error rate.
【 】Ask your team to take into account both accuracy and false negative rate during development.
【 】Rethink the appropriate metric for this task, and ask your team to tune to the new metric.
【 】Pick false negative rate as the new metric, and use this new metric to drive all further development.
Answer
【★】Rethink the appropriate metric for this task, and ask your team to tune to the new metric.
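One possible way to rethink the metric, shown here only as a sketch, is a weighted error in which mistakes on images that contain a bird (the false negatives) count more than mistakes on empty-sky images. The weight of 10, the function name `weighted_error`, and the toy labels are all hypothetical choices for illustration; the right weighting would depend on how much the city actually cares about each kind of mistake.

```python
def weighted_error(y_true, y_pred, fn_weight=10.0):
    """Weighted error rate: examples with a bird (y = 1) carry fn_weight,
    so a missed bird costs more than a false alarm."""
    total_weight = 0.0
    error_weight = 0.0
    for truth, pred in zip(y_true, y_pred):
        weight = fn_weight if truth == 1 else 1.0
        total_weight += weight
        if truth != pred:
            error_weight += weight
    return error_weight / total_weight


# Toy data: 2 images with a bird, 8 without.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
yours = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]       # one missed bird
competitor = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # two false alarms

for name, preds in (("yours", yours), ("competitor", competitor)):
    plain = weighted_error(y_true, preds, fn_weight=1.0)
    weighted = weighted_error(y_true, preds)
    print(f"{name}: plain error {plain:.3f}, weighted error {weighted:.3f}")
```

On this toy data the plain error favors your system, while the weighted error favors the competitor, which mirrors the city’s stated preference.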
14. You’ve handily beaten your competitor, and your system is now deployed in Peacetopia and is protecting the citizens from birds! But over the last few months, a new species of bird has been slowly migrating into the area, so the performance of your system slowly degrades because your system is being tested on a new type of data. (Blogger’s note: the system’s performance is being measured on images of birds it was never trained on.)
You have only 1,000 images of the new species of bird. The city expects a better system from you within the next 3 months. Which of these should you do first?
【 】Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team.
【 】Put the 1,000 images into the training set so as to try to do better on these birds.
【 】Try data augmentation/data synthesis to get more images of the new type of bird.
【 】Add the 1,000 images into your dataset and reshuffle into a new train/dev/test split.
Answer
【★】Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team.
15. The City Council thinks that having more Cats in the city would help scare off birds. They are so happy with your work on the Bird detector that they also hire you to build a Cat detector. (Wow, Cat detectors are just incredibly useful, aren’t they?) Because of years of working on Cat detectors, you have such a huge dataset of 100,000,000 cat images that training on this data takes about two weeks. Which of the statements do you agree with? (Check all that apply.)
【 】Needing two weeks to train will limit the speed at which you can iterate.
【 】Buying faster computers could speed up your team’s iteration speed and thus your team’s productivity.
【 】If 100,000,000 examples is enough to build a good enough Cat detector, you might be better off training with just 10,000,000 examples to gain a ≈10x improvement in how quickly you can run experiments, even if each model performs a bit worse because it’s trained on less data.
【 】Having built a good Bird detector, you should be able to take the same model and hyperparameters and just apply it to the Cat dataset, so there is no need to iterate.
Answer
【★】Needing two weeks to train will limit the speed at which you can iterate.
【★】Buying faster computers could speed up your team’s iteration speed and thus your team’s productivity.
【★】If 100,000,000 examples is enough to build a good enough Cat detector, you might be better off training with just 10,000,000 examples to gain a ≈10x improvement in how quickly you can run experiments, even if each model performs a bit worse because it’s trained on less data.