程序员如何学数学

时间:2021-03-11 00:31:13

最近在找一些传统离散数学以外的数学书,想从其他角度补习一下计算机科学相关的数学知识,偶然间就看到一些人都推荐了这篇文章Math For Programmers,通读了一遍果然不错。但文章有点长,所以没逐字逐句地翻译,只是对每个部分做一下总结,并标注了一些写得很不错的地方。

非常难能可贵地是,作者并没有像老师或者大牛一样说教或者“炫技”,而是一直在强调两点:兴趣热情和解决问题的直觉。不管通篇作者说了多少东西,他都希望你能保持热情,哪怕每天学一点点,只要能有用能进步就好。另外就是我们不是要成为数学家,最重要的是培养直觉,独立发现问题、解决问题的思路。

遗憾的是,作者只推荐了通过Wiki“泛学”数学这一个途径,没有推荐具体的书籍和资料。目前看Dover系列的书好像比较符合作者的要求,比如《Concepts of Modern Mathematics》、《Introduction to Graph Theory》、《Introduction to Probability Theory》口碑都还不错,但因为还没看过所以也不敢妄自推荐。等读过一些之后,再给大家分享一些心得和建议吧。

Math For Programmers

作者最近一年半一直在想法恢复自己生疏的数学技巧,于是读了一大堆数学书,还有更大一堆没读的。那就听听作者在这一年半经历后有什么心得吧。

I’ve been working for the past 15 months on repairing my rusty math skills, ever since I read a biography of Johnny von Neumann. I’ve read a huge stack of math books, and I have an even bigger stack of unread math books. And it’s starting to come together. Let me tell you about it.


Conventional Wisdom Doesn’t Add Up

作者一上来先谈了一下程序员到底需不需要学点数学知识。语言比较风趣,用词有些戏谑:程序员不需要学数学,极端点说,你甚至不需要会编程,因为你还可以做项目管理、UI设计、技术写作、系统管理员等等非编码或轻编码类的职位。实际上,你不需要知道任何事情,能保证继续活着就行 :)

之后,作者说到其实程序员学数学是很有优势的,你会发现原来数学挺简单的。只是这里说的数学不是我们学校里的数学,或者说学校的教学方法不对。看了这一部分后,对重拾数学建立起了些许的信心。作者后面部分甚至说拿起微积分教材,很快就看完了,我也想试试……

First: programmers don’t think they need to know math. I hear that so often; I hardly know anyone who disagrees. Even programmers who were math majors tell me they don’t really use math all that much! They say it’s better to know about design patterns, object-oriented methodologies, software tools, interface design, stuff like that.

And you know what? They’re absolutely right. You can be a good, solid, professional programmer without knowing much math. But hey, you don’t really need to know how to program, either. Let’s face it: there are a lot of professional programmers out there who realize they’re not very good at it, and they still find ways to contribute.

If you’re suddenly feeling out of your depth, and everyone appears to be running circles around you, what are your options? Well, you might discover you’re good at project management, or people management, or UI design, or technical writing, or system administration, any number of other important things that “programmers” aren’t necessarily any good at. You’ll start filling those niches (because there’s always more work to do), and as soon as you find something you’re good at, you’ll probably migrate towards doing it full-time. In fact, I don’t think you need to know anything, as long as you can stay alive somehow.

So they’re right: you don’t need to know math, and you can get by for your entire life just fine without it. But a few things I’ve learned recently might surprise you:

  1. Math is a lot easier to pick up after you know how to program. In fact, if you’re a halfway decent programmer, you’ll find it’s almost a snap.

  2. They teach math all wrong in school. Way, WAY wrong. If you teach yourself math the right way, you’ll learn faster, remember it longer, and it’ll be much more valuable to you as a programmer.

  3. Knowing even a little of the right kinds of math can enable you do write some pretty interesting programs that would otherwise be too hard. In other words, math is something you can pick up a little at a time, whenever you have free time.

  4. Nobody knows all of math, not even the best mathematicians. The field is constantly expanding, as people invent new formalisms to solve their own problems. And with any given math problem, just like in programming, there’s more than one way to do it. You can pick the one you like best.

  5. Math is… ummm, please don’t tell anyone I said this; I’ll never get invited to another party as long as I live. But math, well… I’d better whisper this, so listen up: (it’s actually kinda fun.)


The Math You Learned (And Forgot)

我们在学校里学习的数学,只是为了让大家以后成为科学家或工程师时能有一些预备知识,而不是专为编程而设计的课程。所以程序员觉得不用学数学也就不足为奇了,因为一提到数学首先想到的就是学校里的,那些感觉对自己编程职业生涯没什么用的“数学”。

Here’s the math I learned in school, as far as I can remember:

  • Grade School: Numbers, Counting, Arithmetic, Pre-Algebra (“story problems”)
  • High School: Algebra, Geometry, Advanced Algebra, Trigonometry, Pre-Calculus (conics and limits)
  • College: Differential and Integral Calculus, Differential Equations, Linear Algebra, Probability and Statistics, Discrete Math

How’d they come up with that particular list for high school, anyway? It’s more or less the same courses in most U.S. high schools. I think it’s very similar in other countries, too, except that their students have finished the list by the time they’re nine years old. (Americans really kick butt at monster-truck competitions, though, so it’s not a total loss.)

Algebra? Sure. No question. You need that. And a basic understanding of Cartesian geometry, too. Those are useful, and you can learn everything you need to know in a few months, give or take. But the rest of them? I think an introduction to the basics might be useful, but spending a whole semester or year on them seems ridiculous.

I’m guessing the list was designed to prepare students for science and engineering professions. The math courses they teach in and high school don’t help ready you for a career in programming, and the simple fact is that the number of programming jobs is rapidly outpacing the demand for all other engineering roles.

And even if you’re planning on being a scientist or an engineer, I’ve found it’s much easier to learn and appreciate geometry and trig after you understand what exactly math is — where it came from, where it’s going, what it’s for. No need to dive right into memorizing geometric proofs and trigonometric identities. But that’s exactly what high schools have you do.

So the list’s no good anymore. Schools are teaching us the wrong math, and they’re teaching it the wrong way. It’s no wonder programmers think they don’t need any math: most of the math we learned isn’t helping us.


The Math They Didn’t Teach You

最重要的区别,学校里学的都是连续数学,而我们需要的则是离散数学,关注整数。对程序员来说,最重要的数学分支是概率论,它是你在学校学完算数后最应该立刻学的东西。当你在想有多少种或多大几率时(即计数和计算概率),都是概率问题。此外,还有统计、线性代数、逻辑、信息论也都是对编程很有用的数学分支

如果一门数学课程,能花一整周,用最生动有趣的方式先讲讲这门学科的来龙去脉,为什么我们要学习它的话,那将多美妙!感觉这也应该是很多想求学的人的心声。很多时候,你很想学会一门技术,但当你扎进知识的海洋里迷路时,就会失去最初的冲动和兴趣。正确的引导,有趣的背景知识,让我们知道自己为什么要学,才是正确的方式。不知道这种方式可不可以也类比成“自顶向下”,而不是Dynamic Programming自底向上的方式。即有了宏观的、高层次的知识,再向下学习时会事半功倍。

The math computer scientists use regularly, in real life, has very little overlap with the list above. For one thing, most of the math you learn in grade school and high school is continuous: that is, math on the real numbers. For computer scientists, 95% or more of the interesting math is discrete: i.e., math on the integers.

I’m going to talk in a future blog about some key differences between computer science, software engineering, programming, hacking, and other oft-confused disciplines. I got the basic framework for these (upcoming) insights in no small part from Richard Gabriel’s Patterns Of Software, so if you absolutely can’t wait, go read that. It’s a good book.

For now, though, don’t let the term “computer scientist” worry you. It sounds intimidating, but math isn’t the exclusive purview of computer scientists; you can learn it all by yourself as a closet hacker, and be just as good (or better) at it than they are. Your background as a programmer will help keep you focused on the practical side of things.

The math we use for modeling computational problems is, by and large, math on discrete integers. This is a generalization. If you’re with me on today’s blog, you’ll be studying a little more math from now on than you were planning to before today, and you’ll discover places where the generalization isn’t true. But by then, a short time from now, you’ll be confident enough to ignore all this and teach yourself math the way you want to learn it.

For programmers, the most useful branch of discrete math is probability theory. It’s the first thing they should teach you after arithmetic, in grade school. What’s probability theory, you ask? Why, it’s counting. How many ways are there to make a Full House in poker? Or a Royal Flush? Whenever you think of a question that starts with “how many ways…” or “what are the odds…”, it’s a probability question. And as it happens (what are the odds?), it all just turns out to be “simple” counting. It starts with flipping a coin and goes from there. It’s definitely the first thing they should teach you in grade school after you learn Basic Calculator Usage.

I still have my discrete math textbook from college. It’s a bit heavyweight for a third-grader (maybe), but it does cover a lot of the math we use in “everyday” computer science and computer engineering.

Oddly enough, my professor didn’t tell me what it was for. Or I didn’t hear. Or something. So I didn’t pay very close attention: just enough to pass the course and forget this hateful topic forever, because I didn’t think it had anything to do with programming. That happened in quite a few of my comp sci courses in college, maybe as many as 25% of them. Poor me! I had to figure out what was important on my own, later, the hard way.

I think it would be nice if every math course spent a full week just introducing you to the subject, in the most fun way possible, so you know why the heck you’re learning it. Heck, that’s probably true for every course.

Aside from probability and discrete math, there are a few other branches of mathematics that are potentially quite useful to programmers, and they usually don’t teach them in school, unless you’re a math minor. This list includes:

  • Statistics, some of which is covered in my discrete math book, but it’s really a discipline of its own. A pretty important one, too, but hopefully it needs no introduction.

  • Algebra and Linear Algebra (i.e., matrices). They should teach Linear Algebra immediately after algebra. It’s pretty easy, and it’s amazingly useful in all sorts of domains, including machine learning.

  • Mathematical Logic. I have a really cool totally unreadable book on the subject by Stephen Kleene, the inventor of the Kleene closure and, as far as I know, Kleenex. Don’t read that one. I swear I’ve tried 20 times, and never made it past chapter 2. If anyone has a recommendation for a better introduction to this field, please post a comment. It’s obviously important stuff, though.

  • Information Theory and Kolmogorov Complexity. Weird, eh? I bet none of your high schools taught either of those. They’re both pretty new. Information theory is (veeery roughly) about data compression, and Kolmogorov Complexity is (also roughly) about algorithmic complexity. I.e., how small you can you make it, how long will it take, how elegant can the program or data structure be, things like that. They’re both fun, interesting and useful.

There are others, of course, and some of the fields overlap. But it just goes to show: the math that you’ll find useful is pretty different from the math your school thought would be useful.

What about calculus? Everyone teaches it, so it must be important, right? Well, calculus is actually pretty easy. Before I learned it, it sounded like one of the hardest things in the universe, right up there with quantum mechanics. Quantum mechanics is still beyond me, but calculus is nothing. After I realized programmers can learn math quickly, I picked up my Calculus textbook and got through the entire thing in about a month, reading for an hour an evening.

Calculus is all about continuums — rates of change, areas under curves, volumes of solids. Useful stuff, but the exact details involve a lot of memorization and a lot of tedium that you don’t normally need as a programmer. It’s better to know the overall concepts and techniques, and go look up the details when you need them.

Geometry, trigonometry, differentiation, integration, conic sections, differential equations, and their multidimensional and multivariate versions — these all have important applications. It’s just that you don’t need to know them right this second. So it probably wasn’t a great idea to make you spend years and years doing proofs and exercises with them, was it? If you’re going to spend that much time studying math, it ought to be on topics that will remain relevant to you for life.


The Right Way To Learn Math

前面作者已经列举了一些重要的数学分支,那如何学呢?要广度优先,而不是深度优先。就像图和递归算法里的BFS策略一样,每一门都“浅尝辄止”:忘掉具体的算法和证明,了解它的名字、用处、有什么限制、谁在什么情况下发明的、大概是计算什么的。这里作者用了一个很妙的比喻,将程序员学习数学的方式想象成是数学专业的文科学位

数学最排外的部分可能就是它抽象的符号化,但作者也说这对程序员来说根本不是问题,比如求和符号西格玛与代码里循环的关系。

The right way to learn math is breadth-first, not depth-first. You need to survey the space, learn the names of things, figure out what’s what.

To put this in perspective, think about long division. Raise your hand if you can do long division on paper, right now. Hands? Anyone? I didn’t think so.

I went back and looked at the long-division algorithm they teach in grade school, and damn if it isn’t annoyingly complicated. It’s deterministic, sure, but you never have to do it by hand, because it’s easier to find a calculator, even if you’re stuck on a desert island without electricity. You’ll still have a calculator in your watch, or your dental filling, or something.

Why do they even teach it to you? Why do we feel vaguely guilty if we can’t remember how to do it? It’s not as if we need to know it anymore. And besides, if your life were on the line, you know you could perform long division of any arbitrarily large numbers. Imagine you’re im*ed in some slimy 3rd-world dungeon, and the dictator there won’t let you out until you’ve computed 219308862/103503391. How would you do it? Well, easy. You’d start subtracting the denominator from the numerator, keeping a counter, until you couldn’t subtract it anymore, and that’d be the remainder. If pressed, you could figure out a way to continue using repeated subtraction to estimate the remainder as decimal number (in this case, 0.1185678219, or so my Emacs M-x calc tells me. Close enough!)

You could figure it out because you know that division is just repeated subtraction. The intuitive notion of division is deeply ingrained now.

The right way to learn math is to ignore the actual algorithms and proofs, for the most part, and to start by learning a little bit about all the techniques: their names, what they’re useful for, approximately how they’re computed, how long they’ve been around, (sometimes) who invented them, what their limitations are, and what they’re related to. Think of it as a Liberal Arts degree in mathematics.

Why? Because the first step to applying mathematics is problem identification. If you have a problem to solve, and you have no idea where to start, it could take you a long time to figure it out. But if you know it’s a differentiation problem, or a convex optimization problem, or a boolean logic problem, then you at least know where to start looking for the solution.

There are lots and lots of mathematical techniques and entire sub-disciplines out there now. If you don’t know what combinatorics is, not even the first clue, then you’re not very likely to be able to recognize problems for which the solution is found in combinatorics, are you?

But that’s actually great news, because it’s easier to read about the field and learn the names of everything than it is to learn the actual algorithms and methods for modeling and computing the results. In school they teach you the Chain Rule, and you can memorize the formula and apply it on exams, but how many students really know what it “means”? So they’re not going to be able to know to apply the formula when they run across a chain-rule problem in the wild. Ironically, it’s easier to know what it is than to memorize and apply the formula. The chain rule is just how to take the derivative of “chained” functions — meaning, function x() calls function g(), and you want the derivative of x(g()). Well, programmers know all about functions; we use them every day, so it’s much easier to imagine the problem now than it was back in school.

Which is why I think they’re teaching math wrong. They’re doing it wrong in several ways. They’re focusing on specializations that aren’t proving empirically to be useful to most high-school graduates, and they’re teaching those specializations backwards. You should learn how to count, and how to program, before you learn how to take derivatives and perform integration.

I think the best way to start learning math is to spend 15 to 30 minutes a day surfing in Wikipedia. It’s filled with articles about thousands of little branches of mathematics. You start with pretty much any article that seems interesting (e.g. String theory, say, or the Fourier transform, or Tensors, anything that strikes your fancy.) Start reading. If there’s something you don’t understand, click the link and read about it. Do this recursively until you get bored or tired.

Doing this will give you amazing perspective on mathematics, after a few months. You’ll start seeing patterns — for instance, it seems that just about every branch of mathematics that involves a single variable has a more complicated multivariate version, and the multivariate version is almost always represented by matrices of linear equations. At least for applied math. So Linear Algebra will gradually bump its way up your list, until you feel compelled to learn how it actually works, and you’ll download a PDF or buy a book, and you’ll figure out enough to make you happy for a while.

With the Wikipedia approach, you’ll also quickly find your way to the Foundations of Mathematics, the Rome to which all math roads lead. Math is almost always about formalizing our “common sense” about some domain, so that we can deduce and/or prove new things about that domain. Metamathematics is the fascinating study of what the limits are on math itself: the intrinsic capabilities of our formal models, proofs, axiomatic systems, and representations of rules, information, and computation.

One great thing that soon falls by the wayside is notation. Mathematical notation is the biggest turn-off to outsiders. Even if you’re familiar with summations, integrals, polynomials, exponents, etc., if you see a thick nest of them your inclination is probably to skip right over that sucker as one atomic operation.

However, by surveying math, trying to figure out what problems people have been trying to solve (and which of these might actually prove useful to you someday), you’ll start seeing patterns in the notation, and it’ll stop being so alien-looking. For instance, a summation sign (capital-sigma) or product sign (capital-pi) will look scary at first, even if you know the basics. But if you’re a programmer, you’ll soon realize it’s just a loop: one that sums values, one that multiplies them. Integration is just a summation over a continuous section of a curve, so that won’t stay scary for very long, either.

Once you’re comfortable with the many branches of math, and the many different forms of notation, you’re well on your way to knowing a lot of useful math. Because it won’t be scary anymore, and next time you see a math problem, it’ll jump right out at you. “Hey,” you’ll think, “I recognize that. That’s a multiplication sign!”

And then you should pull out the calculator. It might be a very fancy calculator such as R, Matlab, Mathematica, or a even C library for support vector machines. But almost all useful math is heavily automatable, so you might as well get some automated servants to help you with it.


When Are Exercises Useful?

如何练习和实践呢?就像读代码一样!了解作者的意图、设计思想,主要是要解决什么问题,计算什么数据。读数学资料也是一样的,让直觉引导你,了解大意,当你发现一些东西与你的直觉不一致时,再深入进去了解。最重要的,不要让任何东西削减你的热情

After a year of doing part-time hobbyist catch-up math, you’re going to be able to do a lot more math in your head, even if you never touch a pencil to a paper. For instance, you’ll see polynomials all the time, so eventually you’ll pick up on the arithmetic of polynomials by osmosis. Same with logarithms, roots, transcendentals, and other fundamental mathematical representations that appear nearly everywhere.

I’m still getting a feel for how many exercises I want to work through by hand. I’m finding that I like to be able to follow explanations (proofs) using a kind of “plausibility test” — for instance, if I see someone dividing two polynomials, I kinda know what form the result should take, and if their result looks more or less right, then I’ll take their word for it. But if I see the explanation doing something that I’ve never heard of, or that seems wrong or impossible, then I’ll dig in some more.

That’s a lot like reading programming-language source code, isn’t it? You don’t need to hand-simulate the entire program state as you read someone’s code; if you know what approximate shape the computation will take, you can simply check that their result makes sense. E.g. if the result should be a list, and they’re returning a scalar, maybe you should dig in a little more. But normally you can scan source code almost at the speed you’d read English text (sometimes just as fast), and you’ll feel confident that you understand the overall shape and that you’ll probably spot any truly egregious errors.

I think that’s how mathematically-inclined people (mathematicians and hobbyists) read math papers, or any old papers containing a lot of math. They do the same sort of sanity checks you’d do when reading code, but no more, unless they’re intent on shooting the author down.

With that said, I still occasionally do math exercises. If something comes up again and again (like algebra and linear algebra), then I’ll start doing some exercises to make sure I really understand it.

But I’d stress this: don’t let exercises put you off the math. If an exercise (or even a particular article or chapter) is starting to bore you, move on. Jump around as much as you need to. Let your intuition guide you. You’ll learn much, much faster doing it that way, and your confidence will grow almost every day.


How Will This Help Me?

经过不断地学习,作者本人已经能够写一些“数学味”比较浓重的代码了,比如神经网络、基因算法、贝叶斯分类器、聚类算法、图像匹配等等非常酷的东西,之后还可以拿给朋友炫耀 :)

当你掌握了足够多的知识后,你会发现那些数学符号其实是在让事情变简单而不是让你冒冷汗,就像一段优雅的代码一样。不了解语法语义时可能摸不着头脑,熟悉之后就能看出其简洁和美妙。

最后作者反复强调的一点,一定要保持兴趣。你可以花整个周末看数学,也可能几个月都没继续,但只要每次你看一点都能有所收获,就可以了。不要让任何规矩束缚你,学到了就好

Well, it might not — not right away. Certainly it will improve your logical reasoning ability; it’s a bit like doing exercise at the gym, and your overall mental fitness will get better if you’re pushing yourself a little every day.

For me, I’ve noticed that a few domains I’ve always been interested in (including artificial intelligence, machine learning, natural language processing, and pattern recognition) use a lot of math. And as I’ve dug in more deeply, I’ve found that the math they use is no more difficult than the sum total of the math I learned in high school; it’s just different math, for the most part. It’s not harder. And learning it is enabling me to code (or use in my own code) neural networks, genetic algorithms, bayesian classifiers, clustering algorithms, image matching, and other nifty things that will result in cool applications I can show off to my friends.

And I’ve gradually gotten to the point where I no longer break out in a cold sweat when someone presents me with an article containing math notation: n-choose-k, differentials, matrices, determinants, infinite series, etc. The notation is actually there to make it easier, but (like programming-language syntax) notation is always a bit tricky and daunting on first contact. Nowadays I can follow it better, and it no longer makes me feel like a plebian when I don’t know it. Because I know I can figure it out.

And that’s a good thing.

And I’ll keep getting better at this. I have lots of years left, and lots of books, and articles. Sometimes I’ll spend a whole weekend reading a math book, and sometimes I’ll go for weeks without thinking about it even once. But like any hobby, if you simply trust that it will be interesting, and that it’ll get easier with time, you can apply it as often or as little as you like and still get value out of it.

Math every day. What a great idea that turned out to be!