Name: Linus Torvalds (torvalds@linux-foundation.org) 6/8/10
anon2 (anon@anons.com) on 6/8/10 wrote:
>
>But productivity is a different thing when it comes to
>kernel code. Linux devs are working practically for free.
>So the same budget can get you a whole lot more work
>done.
Actually, this is wrong.
People working for free still doesn't mean that it's fine
to make the work take more effort - people still work for
other kinds of compensation, and being able to get
productive work done without feeling excessively frustrated
by the tools (including the language) is a big issue.
So if a language change were to make people much more
productive, that would be a good thing regardless of how
much people end up getting paid. It's definitely not about
the money.
But the thing is, "lines of code" isn't even remotely close
to being a measure of productivity, or even the gating
issue. The gating issue in any large project is pretty much
all about (a) getting the top people and (b) communication.
In the kernel, we have roughly a thousand people being
credited in each and every kernel release (and releases
are about three months apart). Now, there's a long tail, and
the hundred (or even fifty) top contributors do most of
the bulk work, but even then, the biggest issue that I end
up worrying about is not even the code, but the "flow" of
code and development.
For example, I personally don't even write much code any
more, and haven't for years. I mainly merge (and to a
large degree - don't merge: a large portion of what
I do is telling people "No, I won't take this, because of
xyz". Even if rejection ends up being the rare case, it's
actually the main reason for me existing. Anybody can say
"yes". Somebody needs to say "no").
And the best way to make things work is to not need
to communicate at all. It's exactly the same issue as in
parallel programming - any communication inevitably is the
main bottleneck.
And the best way to avoid communication is to have some
"culture" - which is just another way to say "collection of
rules that don't even need to be written down/spoken, since
people are aware of them". Sure, we obviously have a lot of
documentation about how things are supposed to be done,
but exactly as with any regular human culture, documentation
is kind of secondary.
(Put another way: there are lots of books about culture,
and you can get a PhD in anthropology and spend all your
life just studying it - but for 99% of all people, you
don't read a book about your culture, you learn it by
being part of the community).
And there is a very strong "culture" of C (and UNIX, for
that matter). And this is also where it's so important for
the language to be simple and unambiguous. One of the
absolute worst features of C++ is how it makes a
lot of things so context-dependent - which just means
that when you look at the code, a local view simply seldom
gives enough context to know what is going on.
That is a huge problem for communication. It immediately
makes it much harder to describe things, because you have
to give a much bigger context. It's one big reason why I
detest things like overloading - not only can you not grep
for things, but it makes it much harder to see what a
snippet of code really does.
Put another way: when you communicate in fragments (think
"patches"), it's always better to see "sctp_connect()"
than to see just "connect()" where some unseen context is
what makes the compiler know that it is in the sctp module.
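
To make the fragment argument concrete, here is a minimal sketch in
C. It is not kernel code: the struct, the function bodies, the port
number and every demo_ name are assumptions made up for illustration.
The only point is that the call site names its module explicitly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical connection record; not a real kernel structure. */
struct demo_conn {
    const char *proto;  /* which protocol module set this up */
    int port;
};

/* Each module spells out its prefix, so the name alone says who owns it. */
int sctp_connect(struct demo_conn *c, int port)
{
    c->proto = "sctp";
    c->port = port;
    return 0;
}

int tcp_connect(struct demo_conn *c, int port)
{
    c->proto = "tcp";
    c->port = port;
    return 0;
}

/*
 * A one-line patch hunk touching this call is readable on its own:
 * no enclosing class, namespace or argument type is needed to know
 * which connect routine runs, and "grep sctp_connect" finds it.
 */
int demo_open_association(struct demo_conn *c)
{
    return sctp_connect(c, 9899);
}

/* Small self-check: the call above really went to the sctp routine. */
void demo_selfcheck(void)
{
    struct demo_conn c = { 0 };

    assert(demo_open_association(&c) == 0);
    assert(strcmp(c.proto, "sctp") == 0);
}
```

With an overloaded or member-function "connect()", the same hunk
would only make sense once you also knew the type of the object it
was called on, which is exactly the context a patch does not carry.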
And you have to communicate in fragments in order
to communicate efficiently. And I don't mean "efficiently"
as in network bandwidth - I mean as in "in general". The
reason we use patches instead of sending the whole project
(or even just a couple of whole files) around is not because
it's denser in email - it's because the only thing that
matters is the change, not the end result.
So that is a very fundamental reason for development to
avoid ambiguity and context. And that, btw, has absolutely
nothing to do particularly with "kernel programming". It's
true in general in any sw project, but it's true in real
life too: speaking or writing ambiguously is not good in
normal human communication either.
So a simple and clear language is a good thing. You don't
want to be unnecessarily verbose (meaningless syntactic
fluff is always bad), but at the same time you do not want
to require too much context either.
[ Lots of implicit context is fine if everybody is an
expert on the subject. Which is why really esoteric
scientific literature is basically unreadable unless
you're an expert - it requires huge amounts of context
to make sense at all. But that is simply not possible
in a large project that has many different areas.
For example, I know the VM and core kernel really well,
but I still need to be able to read the code of various
filesystems and networking code. So even for somebody
like me, the code needs to be written without hidden
context. ]
And C is a largely context-free language. When you see a
C expression, you know what it does. A function call does
one thing, and one thing only - there will not be some
subtle issue about "which version" of a function it calls.
Of course, you can use the preprocessor and inline functions
to select between versions of a function, but even then you
have to be pretty explicit: you can still grep for that
preprocessor symbol, and it's all pretty "direct".
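
As a rough sketch of what that explicit kind of selection can look
like in C: the DEMO_FAST_COPY symbol and all demo_ names below are
invented for this example and do not come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic byte-by-byte copy; hypothetical helper for illustration. */
static inline void *demo_copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* "Fast" variant; here it simply defers to the C library's memcpy(). */
static inline void *demo_copy_fast(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/*
 * The choice of version is made in exactly one visible place.
 * DEMO_FAST_COPY is a made-up build-time symbol: grep for it (or for
 * demo_copy) and you land right here.
 */
#ifdef DEMO_FAST_COPY
#define demo_copy(dst, src, n) demo_copy_fast((dst), (src), (n))
#else
#define demo_copy(dst, src, n) demo_copy_generic((dst), (src), (n))
#endif

/* Small self-check: whichever variant was selected, the bytes arrive. */
static inline void demo_copy_selfcheck(void)
{
    char buf[6];

    demo_copy(buf, "hello", sizeof(buf));
    assert(strcmp(buf, "hello") == 0);
}
```

Which variant runs depends only on a symbol you can search for, not
on the types of the arguments at the call site.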
Now, in other situations you do want more language support,
and you do want the language to do memory allocation etc
for you (ie GC - I'm not talking about that idiotic
"new" keyword in C++, or other crap). In the kernel, we
couldn't do that anyway. Similarly, in the kernel, we do
really require very specialized locking and direct control
over memory ordering etc, so a language that exposes its
own model of concurrency would almost certainly be too
limiting for us there as well.
So there are particular reasons why I think C is "as simple
as possible, but no simpler" for the particular case of an
OS kernel, or systems programming more broadly. That's why
I'm absolutely not saying that you should use C for all
projects.
But C++? I really don't think the "good features" of it
are very good at all. If you leave C behind, do it properly
and get some real features that matter. GC, some
concurrency support, dynamic code generation, whatever.
Linus