Machine Learning and Distributed Machine Learning
If you manage a tech team as a product or project manager, here is what you need to know about machine learning.
Machine learning and deep learning have been popular buzzwords for the last five years. The demand for .ai domains has skyrocketed.
But beyond all the hype surrounding machine learning, the basic concepts are hard to grasp if you are an absolute beginner.
Given the pervasive nature of ML and AI, almost every product can have a machine learning use case now. So in this article, we'll look at machine learning in depth and equip you with the knowledge you need for your next technical conversation.
What is Machine Learning?
Machine learning is a branch of Artificial Intelligence. Artificial Intelligence as a whole includes many general concepts that aim to simulate human thinking.
Machine learning focuses on only one key aspect: making machines learn.
Machine learning is the science of getting computers to make decisions without being explicitly programmed.
In the past decade, machine learning has given us self-driving cars, face recognition, chatbots, and many other useful applications. Machine learning powers many of the tools we use on a daily basis.
How Does Machine Learning Work?
Machine learning uses algorithms to analyze massive amounts of data and draw conclusions from it. When you combine large datasets with high computing power, these algorithms can pick up patterns and relationships in the data.
For example, let's look at a simple dataset:
x = 1,2,3,4,5
y = 1,4,9,16,25
If you look at the above numbers, you'll see that y is the square of x (that is, y = x²).
In machine learning, the job of an algorithm is to find the function that defines this relationship between the input and the output. Once this function has been established, it is easy to predict future values.
For example, if x is 10, y is 100.
Though this example is very simple, it should give you an idea of how machine learning models work.
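To make the idea of "finding the function" concrete, here is a minimal sketch in plain Python. It assumes the relationship has the form y = a * x² and estimates the single coefficient a from the five data points by least squares. The function names are made up for illustration, and a real learning algorithm would not be told the form of the function in advance.

```python
# Toy data from the example above.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]


def fit_square_coefficient(xs, ys):
    """Least-squares estimate of a in the assumed model y = a * x**2."""
    numerator = sum((x ** 2) * y for x, y in zip(xs, ys))
    denominator = sum(x ** 4 for x in xs)
    return numerator / denominator


# "Training": estimate the coefficient from the known examples.
a = fit_square_coefficient(xs, ys)


def predict(x):
    """Use the learned function to predict y for a new x."""
    return a * x ** 2


print(a)            # learned coefficient
print(predict(10))  # prediction for an input the model has not seen
```

With this data the learned coefficient comes out as 1, so the prediction for x = 10 is 100, matching the example in the text.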
Consider a more complex problem, like predicting housing prices.
The dataset will contain area codes, square footage, and many other inputs, with the price as the output. If you have a dataset with thousands of examples, each listing these input features and the final price, you can train a model to predict the price for new inputs.
Machine learning problems usually involve finding the relationship between the inputs and outputs, which is called the ‘hypothesis function’. In our earlier example, the hypothesis function was y = x².
Real-world hypothesis functions are much more complex than this. Once we have found the function, we use it to predict answers for new inputs.
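As a rough illustration of a hypothesis function for the housing example, the sketch below fits a straight line (price = slope * square footage + intercept) with ordinary least squares. The four data points are invented for illustration, and a real model would use many more examples and features such as the area code.

```python
# Hypothetical training data: square footage -> sale price.
square_feet = [1000, 1500, 2000, 2500]
prices = [150000, 200000, 250000, 300000]


def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(square_feet, prices)


def hypothesis(sq_ft):
    """The learned hypothesis function: predicted price for a given size."""
    return slope * sq_ft + intercept


print(hypothesis(1800))  # predicted price for a house not in the training data
```

Because the invented data lies exactly on a line, the fit recovers it exactly. Real housing data is noisy, so the learned function would only approximate the prices.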
In a nutshell, machine learning, in most cases, is advanced statistics combined with computational capacity. Today, machine learning powers technologies like facial recognition, sentiment analysis, and many others.
Types of Learning Algorithms
Let's look at the types of problems you will come across when working with machine learning. First, there are three ways in which you can make machines learn.
Supervised Learning
In supervised learning, you provide labeled inputs to a machine learning algorithm. Each example comes with the expected answer, so the algorithm knows what it should learn from the data.
For example, to recognize the difference between a cat and a dog, you train an algorithm with thousands of images, each labeled as a cat or a dog.
Once you run this data through the algorithm, it learns the differences. It can then predict, with reasonable accuracy, whether a new image is a cat or a dog.
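Here is a small sketch of supervised learning using a 1-nearest-neighbor classifier. It is a stand-in for the image example: instead of pixels, each animal is described by two made-up numeric features (weight and height), and all values are hypothetical. Real image classifiers typically use neural networks rather than this technique.

```python
import math

# Hypothetical labeled examples: (weight_kg, height_cm) -> label.
training_data = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
]


def predict_label(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training_data, key=lambda item: math.dist(item[0], features))
    return nearest[1]


print(predict_label((4.5, 24.0)))   # a small animal, close to the cat examples
print(predict_label((25.0, 58.0)))  # a large animal, close to the dog examples
```

The labels attached to the training examples are what make this supervised: the algorithm is told the right answer for every example it learns from.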
Unsupervised Learning
Labeled data is essential for building a supervised model. However, companies collect large datasets on a daily basis, and labeling all of this data by hand just to make the model's job easier does not scale.
This is where unsupervised learning comes in. You can use unsupervised learning algorithms to cluster data based on available attributes. This data can then be fed into supervised learning models to achieve higher prediction accuracy.
Unsupervised learning models are more challenging to build and evaluate than supervised learning models. You can find more information and examples here, and you can learn more about important machine learning algorithms here.
Reinforcement Learning
No machine learning algorithm is 100% accurate. The level of accuracy depends on the dataset you train the algorithm with.
This means that after you train an algorithm, new data can become available. This data might have the potential to improve the accuracy of your model considerably.
You can use reinforcement learning for these types of scenarios. In reinforcement learning, a model learns from feedback (rewards or penalties) on the actions it takes, so it can keep improving while it is in production instead of being trained only once.
For example, a self-driving car can learn about a new type of terrain after it has traveled through that terrain. The car's algorithm will take this into account the next time it has to choose a route.
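The route example can be sketched as a simple feedback loop. The code below keeps a score per route and nudges it toward the reward observed after each trip, which is one of the basic building blocks of reinforcement learning. The route names, rewards, and learning rate are all hypothetical, and real self-driving systems are far more involved.

```python
def update_estimate(old_estimate, reward, learning_rate=0.5):
    """Move the estimate part of the way toward the newly observed reward."""
    return old_estimate + learning_rate * (reward - old_estimate)


def choose_route(estimates):
    """Pick the route with the highest estimated reward so far."""
    return max(estimates, key=estimates.get)


# Initial knowledge: no preference between routes.
estimates = {"highway": 0.0, "gravel_road": 0.0}

# Hypothetical feedback collected while driving: (route taken, reward received).
trips = [("highway", 1.0), ("gravel_road", -1.0), ("highway", 0.8)]

for route, reward in trips:
    estimates[route] = update_estimate(estimates[route], reward)

print(estimates)
print(choose_route(estimates))  # the route the car would prefer next time
```

Each trip changes the estimates a little, so the choice of route reflects experience gathered after the initial setup.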
Types of Machine Learning Problems
Machine learning problems can be classified into four subcategories based on the type of result you are looking for.
Classification
Classification models produce a result that belongs to a finite set of categories. Examples include spam/not spam, 0 or 1 (binary classification), positive/negative/neutral, and so on.
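A minimal sketch of binary classification: the function below turns two email features into a probability with a logistic function, then maps that probability to one of two labels. The feature names, weights, and bias are hypothetical. In practice they would be learned from labeled emails rather than written by hand.

```python
import math

# Hypothetical weights; a trained model would learn these from labeled emails.
WEIGHTS = {"num_links": 1.2, "free_count": 0.8}
BIAS = -2.0


def spam_probability(email):
    """Logistic function applied to a weighted sum of the email's features."""
    score = BIAS + sum(WEIGHTS[name] * email[name] for name in WEIGHTS)
    return 1 / (1 + math.exp(-score))


def classify(email, threshold=0.5):
    """Map the probability to one label from a finite set."""
    return "spam" if spam_probability(email) >= threshold else "not spam"


print(classify({"num_links": 3, "free_count": 2}))
print(classify({"num_links": 0, "free_count": 0}))
```

Whatever the input, the output is always one of the two labels, which is what makes this a classification problem.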
Regression
Regression models produce results that fall within a continuous range. Examples include predicting stock market prices, forecasting temperatures, and more. Because the output is not limited to a finite set of values, these are called regression problems.
Clustering
Clustering is a key concept in unsupervised learning. Clustering helps you group data points that have similar attributes. Once these groups have been established, it becomes easier to train supervised models on them.
Learn more about clustering here.
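To show what grouping similar data looks like, here is a small sketch of k-means clustering on one-dimensional data. The customer spend figures and the starting centers are invented for illustration. Production code would normally rely on a library implementation and handle multiple features.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers (1-D k-means)."""
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        # A center with an empty cluster stays where it is.
        centers = [
            sum(cluster) / len(cluster) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical monthly spend per customer; note that no labels are provided.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = k_means_1d(spend, centers=[0, 50])

print(centers)
print(clusters)
```

The algorithm separates the low spenders from the high spenders without ever being told which group a customer belongs to.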
Dimensionality Reduction
Dimensionality reduction is another unsupervised learning technique. Using dimensionality reduction, you can reduce a complex dataset with thousands of features into a simpler one with maybe a hundred inputs.
Similar to clustering, dimensionality reduction is often used to reduce noise in large datasets before feeding them into supervised training models.
You can find a more in-depth article on Dimensionality reduction here.
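One of the simplest ways to reduce the number of inputs is to drop features that never change, since they carry no information. The sketch below does this with a variance filter. This is a basic feature-selection technique, not a full method such as PCA, and the dataset and column names are hypothetical.

```python
from statistics import pvariance

# Hypothetical dataset: each row is one house, each column one feature.
feature_names = ["square_feet", "bedrooms", "has_roof", "country_code"]
rows = [
    [1000, 2, 1, 44],
    [1500, 3, 1, 44],
    [2000, 3, 1, 44],
    [2500, 4, 1, 44],
]


def drop_low_variance(rows, names, threshold=0.0):
    """Keep only the columns whose variance is above the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, column in enumerate(columns) if pvariance(column) > threshold]
    reduced_rows = [[row[i] for i in keep] for row in rows]
    reduced_names = [names[i] for i in keep]
    return reduced_rows, reduced_names


reduced_rows, reduced_names = drop_low_variance(rows, feature_names)

print(reduced_names)
print(reduced_rows)
```

Here the two constant columns are removed, so the model downstream receives two features instead of four.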
What is Deep Learning?
Deep learning is machine learning on steroids.
There are many algorithms in machine learning. One that stands out is the neural network.
The difference between other machine learning algorithms and a neural network is that you can stack layers of a neural network on top of each other, as many as you want. A network with many stacked layers is called a deep neural network.
This helps us solve complex problems like facial recognition and self-driving, since these types of problems come with thousands of inputs in real time.
Using neural networks, you can solve many complex problems with high accuracy, if you have the data and computing power needed to train and run the model.
Neural networks have been around for decades, but it was the availability of large datasets and computing power that brought them back to life. Now deep learning is one of the most exciting fields in the industry.
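The idea of stacking can be sketched in a few lines. Below, each layer computes weighted sums of its inputs and applies a ReLU activation, and the output of one layer becomes the input of the next. The weights are hand-picked, hypothetical numbers. A real network learns its weights from data during training, which this sketch does not cover.

```python
def relu(value):
    """Common activation function: negative values become zero."""
    return max(0.0, value)


def dense_layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then ReLU."""
    return [
        relu(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through every stacked layer in order."""
    for weights, biases in layers:
        inputs = dense_layer(inputs, weights, biases)
    return inputs


# Two stacked layers: 2 inputs -> 3 neurons -> 1 output.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 1.0]], [0.5]),
]

print(forward([1.0, 2.0], layers))
```

Adding another entry to the layers list makes the network deeper without changing any other code, which is the sense in which layers can be stacked as many times as you want.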
Why Do You Need Machine Learning?
Let's look at some popular machine learning solutions that we use every day.
Voice assistants
Ever wondered how Siri can understand and interpret your voice commands? The answer is machine learning. You can find a voice assistant in almost every smartphone now, thanks to advancements in Natural Language Processing.
Even though it is hard for computers to understand natural language, machine learning has given us Alexa, Cortana, and Siri.
Product recommendations
Recommendation engines are a profitable use case for e-commerce companies. If you can find the right products to recommend, chances are your customers will make multiple purchases.
Machine learning algorithms can learn user behavior from past purchases. This lets them recommend similar products when a customer is shopping on your website.
Recommendations are not limited to e-commerce. The same applies to products like Spotify or Netflix, which recommend music or movies you are likely to enjoy.
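A very simple way to recommend products from past purchases is to count which items are bought together. The sketch below does this with invented order data. Real recommendation engines use much richer signals and models, so treat this only as an illustration of the idea.

```python
from collections import Counter

# Hypothetical purchase history: one set of products per order.
orders = [
    {"laptop", "mouse"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"phone", "case"},
]


def recommend(product, orders, top_n=2):
    """Recommend the items most often bought together with the given product."""
    co_purchases = Counter()
    for order in orders:
        if product in order:
            # Count every other item that appeared in the same order.
            co_purchases.update(order - {product})
    return [item for item, _ in co_purchases.most_common(top_n)]


print(recommend("laptop", orders))
print(recommend("phone", orders))
```

A customer looking at a laptop would be shown a mouse first, because that pairing appears most often in the order history.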
Chatbots
Customer support can make or break your company, especially if you are a startup. The more users you attract, the more customer support you have to provide.
Chatbots are a huge time saver when it comes to interacting with customers. Since the majority of your customers will have common questions, you can design a chatbot that answers these repetitive questions.
You don't have to employ additional customer service professionals or make your customers wait in a queue. Chatbots are saving businesses time and money, thanks to machine learning.
Spam filtering
Spam filtering is a simple yet powerful application of machine learning. It is the reason why Gmail or Outlook can filter out spam emails for you with high accuracy.
Spam filtering systems are also built to learn from experience. When you mark an email as spam, the filter uses that feedback to understand your preferences better, in the same spirit as the reinforcement learning described earlier.
We now have cleaner inboxes, thanks to machine learning.
Language translation
What would we do without Google Translate? Machine learning-based language translation engines save businesses millions every year.
Before machine learning, translation services were entirely human-powered. Thanks to machine learning, you can translate large volumes of text into many languages in a matter of minutes.
Tools and Frameworks
Machine learning and deep learning are accomplished by using different libraries and frameworks. Though other languages have their own tools, Python is usually the preferred language for machine learning.
Here are a few Python frameworks you can use to build your next machine learning or deep learning product.
- Scikit-learn: Popular for classical machine learning problems. Great community support. Not suitable for complex deep learning models.
- TensorFlow: One of the most popular deep learning frameworks. Built by Google. Supports complex deep learning models like CNNs and RNNs.
- PyTorch: Built by Facebook, scalable, and offers high performance.
I recently wrote a blog post on popular deep learning frameworks if you are interested.
Conclusion
Machine learning has the potential to transform every industry. From voice assistants to self-driving cars, the applications of machine learning are all around us today. It can help you understand your customers better and make smarter decisions with data.
I hope this article helped you get a good grasp of machine learning and deep learning concepts. If you are as fascinated by machine learning as I am, check out the Machine Learning course on Coursera by Prof. Andrew Ng.
I regularly write about Machine Learning, Cyber Security, and DevOps. You can sign up for my weekly newsletter here.
Translated from: https://www.freecodecamp.org/news/machine-learning-for-managers-what-you-need-to-know/