[p2p-research] Linux Versus E. coli | The Loom

Thu May 6 04:17:43 CEST 2010

  Sent to you by Ryan via Google Reader: Linux Versus E. coli | The Loom
via Discover Main Feed on 5/3/10

In 1991, a 21-year-old Finnish computer science student named Linus
Torvalds got annoyed. He had bought a personal computer to use at home,
but he couldn’t find an operating system for it that was as robust as
Unix, the system he used on the computers at the University of
Helsinki. So he wrote one. He posted it online, free for anyone to
download. But he required that anyone who figured out a way to make it
better would have to share the improvement with everyone else who used the
system. Torvalds would later tell Wired that his motives were not
noble. “I didn’t want the headache of trying to deal with parts of the
operating system that I saw as the crap work,” he said. “I wanted help.”

In his quest to avoid crap work, Torvalds unleashed a monster. People
began to download the system, dubbed Linux, all over the world. Within
a few weeks, Torvalds was getting emails from hundreds of users,
explaining how to fix bugs and how to add new bells and whistles.
People began to write programs that would only work on Linux computers.
They founded companies around Linux-based software. Millions of people
chose Linux for their computers, and major computer companies like
Microsoft and Dell began to support the system. Along the way, Linux
evolved. Torvalds’s first version contained 10,000 lines of code. Linux
now holds over 12 million lines.

Those 12 million lines may seem like a hopeless thicket, but the code
actually has a hidden structure. It’s divided up into chunks, each of
which carries out a particular task. All told, they carry out 12,391
separate functions. The functions are also connected. If Linux carries
out one function, the system will direct the computer to carry out
other functions. You can think of Linux as a network, with the
functions joined together by links of control. Computer programmers can
map out that network as a so-called “call graph.”
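
As a rough illustration of what a call graph is, here is a minimal
Python sketch. The function names are invented for the example and are
not real Linux functions; the only idea taken from the text is that
each function is a node and each "this function calls that one" is a
link.

```python
# Toy call graph: each key is a function, each value lists the functions it calls.
# All names here are hypothetical, chosen only to illustrate the idea.
call_graph = {
    "start_system": ["load_driver", "mount_disk"],
    "load_driver": ["copy_bytes"],
    "mount_disk": ["read_block", "copy_bytes"],
    "read_block": ["copy_bytes"],
    "copy_bytes": [],
}


def count_links(graph):
    """Return the total number of caller -> callee links in the graph."""
    return sum(len(callees) for callees in graph.values())


def callers_of(graph, target):
    """Return the sorted list of functions that directly call `target`."""
    return sorted(f for f, callees in graph.items() if target in callees)


if __name__ == "__main__":
    print(count_links(call_graph))
    print(callers_of(call_graph, "copy_bytes"))
```

In this toy graph, copy_bytes is called by several other functions, the
kind of heavily reused function the article later calls a workhorse.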

Linux bears an uncanny resemblance to the genes in a living cell. Many
genes make proteins that act as switches for other genes. The proteins
clamp onto DNA near a target gene, allowing the cell to read the gene
and make a new protein. And that new protein may, in turn, grab onto
many other genes. Thanks to this hierarchy of switches, cells can
respond to changes in their environment and quickly carry out complex
behaviors, such as reorganizing themselves to feed on a new kind of
food.

A number of scientists have begun to compare natural and manmade
networks. A lot of the same rules appear to be at work in the growth of
the Internet, airport connections, brain wiring, ecosystem food webs,
and gene networks. But very often, scientists are finding, it’s the
differences between natural and manmade networks that are most
revealing, offering clues to the different ways in which people and
evolution build complex things.

In the Proceedings of the National Academy of Sciences this week,
Koon-Kiu Yang of Yale and his colleagues present the first detailed
comparison of Linux’s network to a gene network. (The paper will be
here.) Thanks to the open-source nature of Linux, the scientists could
look at every line of code in every version of the system over the past
two decades, from Torvalds’s first primitive stab to its current
sophisticated form. And for a living cell, Yang and his colleagues
turned to the living equivalent of Linux–a biological network they
could analyze from top to bottom. They chose E. coli, since it is
the best-studied species on Earth. (Why E. coli? There’s a certain book
that will explain it to you.)

Over the past fifty years, scientists have mapped 1,378 interactions
among E. coli genes. Out of that research, Yang and his colleagues
built a microbial call graph. They assigned each gene to one of three
categories. If a gene switched on one or more genes, but was not itself
switched on by another gene, they called it a “master regulator.” If a
gene was switched on by a different gene and then, in turn, switched on
other genes, the scientists dubbed it a “middle manager.” And if the
gene was switched on but did not then switch on any other genes, they
called it a “workhorse.” The scientists drew the network of master
regulators, middle managers, and workhorses.
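
The three-way sorting rule described above can be sketched in a few
lines of Python. This is only an illustration of the rule as the
article states it, not the scientists' actual method or code. The edge
list and node names are made up, and ignoring a node's link to itself
is an assumption on my part, since the article speaks of being switched
on by "another" gene.

```python
def classify_nodes(edges):
    """Label every node as master regulator, middle manager, or workhorse.

    `edges` is an iterable of (regulator, target) pairs. Self-links are
    skipped (an assumption, not something the article specifies).
    """
    has_outgoing = set()
    has_incoming = set()
    for regulator, target in edges:
        if regulator == target:
            continue  # assumed: a node switching itself on does not count
        has_outgoing.add(regulator)
        has_incoming.add(target)

    labels = {}
    for node in has_outgoing | has_incoming:
        if node in has_outgoing and node not in has_incoming:
            labels[node] = "master regulator"  # switches others on, nothing above it
        elif node in has_outgoing:
            labels[node] = "middle manager"    # switched on, and switches others on
        else:
            labels[node] = "workhorse"         # switched on, switches nothing on
    return labels


# Hypothetical toy network, not real E. coli data.
toy_edges = [
    ("reg1", "mid1"),
    ("mid1", "work1"),
    ("mid1", "work2"),
    ("reg1", "work3"),
]

if __name__ == "__main__":
    for node, label in sorted(classify_nodes(toy_edges).items()):
        print(node, "->", label)
```

The same function works for either network, since it only needs to
know who controls whom: gene pairs for a cell, caller and callee pairs
for a program.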

The scientists sorted all the functions in Linux by the same rules.
Here is the picture that emerged.

(N.B.: for the sake of clarity, the scientists only used 10% of the
nodes in the full Linux call graph. But the complete picture would look
the same.)

Both Linux and E. coli are organized into hierarchies. But their
hierarchies have different shapes. E. coli’s genome is dominated by
workhorses. Middle managers and master regulators make up less than 5%
of the total number of genes. In Linux, by contrast, over 80% of the
functions are in the upper echelons. Each workhorse in Linux is
controlled by many middle managers. In E. coli, on the other hand, each
workhorse gene is typically controlled either by a few genes or just
one. And so in E. coli it’s the higher levels where genes have the most
links, not the workhorses.
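
To make "shape of the hierarchy" concrete, here is a small Python
sketch that turns a labelled network into the kind of percentages
quoted above. The labels in the example are invented, so the number it
produces says nothing about Linux or E. coli; it only shows the
arithmetic.

```python
from collections import Counter


def tier_shares(labels):
    """Return the fraction of nodes in each tier, given a node -> tier mapping."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {tier: count / total for tier, count in counts.items()}


def upper_echelon_share(labels):
    """Fraction of nodes that are master regulators or middle managers."""
    shares = tier_shares(labels)
    return shares.get("master regulator", 0.0) + shares.get("middle manager", 0.0)


# Invented toy labelling, not data from the study.
toy_labels = {
    "A": "master regulator",
    "B": "middle manager",
    "C": "workhorse",
    "D": "workhorse",
    "E": "workhorse",
}

if __name__ == "__main__":
    print(tier_shares(toy_labels))
    print(upper_echelon_share(toy_labels))
```

A bottom-heavy network like the one described for E. coli would give a
small upper-echelon share; a top-heavy one like Linux would give a
large one.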

Once Yang and his colleagues had drawn the two networks, they looked at
the paths information takes as it flows from master regulators down to
workhorses. E. coli’s genes are organized into relatively distinct
modules. When a master regulator swings into action–in response, say,
to a spike in temperature–it switches on a set of other genes with
relatively little overlap with the genes switched on by other master
regulators. Linux, by contrast, has blurry boundaries. Four out of five
Linux modules overlap, in contrast to 5% of E. coli’s.
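
One way to picture module overlap is the sketch below. It assumes that
a "module" is everything reachable downstream of one master regulator,
and that two modules "overlap" when they share at least one node. Both
definitions are my simplification for illustration; the article does
not spell out how the paper defines them, and the two toy networks are
invented.

```python
from itertools import combinations


def downstream(graph, start):
    """Return every node reachable from `start` by following links."""
    seen = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def overlapping_fraction(graph, masters):
    """Fraction of modules that share at least one node with another module.

    A module is assumed to be the downstream set of one master regulator.
    """
    modules = {m: downstream(graph, m) for m in masters}
    overlapping = set()
    for a, b in combinations(masters, 2):
        if modules[a] & modules[b]:
            overlapping.update((a, b))
    return len(overlapping) / len(masters) if masters else 0.0


# Hypothetical networks: one with distinct modules, one with a shared workhorse.
modular = {"m1": ["a", "b"], "m2": ["c"], "a": [], "b": [], "c": []}
tangled = {"m1": ["a", "shared"], "m2": ["shared"], "m3": ["b"],
           "a": [], "b": [], "shared": []}

if __name__ == "__main__":
    print(overlapping_fraction(modular, ["m1", "m2"]))
    print(overlapping_fraction(tangled, ["m1", "m2", "m3"]))
```

In the tangled example, a single reused node is enough to make two of
the three modules overlap, which is the pattern the article attributes
to Linux.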

The networks in E. coli and Linux don’t just look different. They also
grew in different ways. The oldest genes in E. coli’s
network–the ones shared by many other species of microbes–are its
workhorses. The genes higher up in the E. coli hierarchy have emerged
more recently. Those higher-ranking genes have also been undergoing a
lot of evolutionary change since they first emerged. The old genes, by
contrast, have changed little.

The history of Linux has played out differently. A lot of the oldest
functions in Linux are middle managers or master regulators, not
workhorses as in E. coli. And while old genes in E. coli haven’t
evolved much, programmers have heavily rewritten Linux’s old functions.

Both networks developed, step by step, as increasingly sophisticated
systems for operating things–computers or cells. But the Linux network
was the work of programmers, while E. coli is the product of four
billion years of evolution. The differences in the history and shape of
the two networks emerge from the ways in which they developed. The
programmers who built Linux did not have the time to invent entirely
new workhorse functions. It was simpler for them to just use the old
workhorse functions in new modules. But this strategy leaves Linux a
lot more fragile than a biological network. Its modules overlap, so
that in many cases, a workhorse function is essential for many
different modules at once. As a result, Linux gets buggy and prone to
crashing. And so as programmers improve Linux, they’ve had to fine-tune
its all-purpose functions at every step of the way.

E. coli is far more rugged. Mutations crop up all the time as the
bacteria multiply, and yet they generally don’t suffer a catastrophic
network crash. One reason E. coli is so robust is that its modules have
evolved to be distinct. Overlapping modules make cells particularly
vulnerable to mutations, because a single mutation can shut down a lot
of their essential biology. Natural selection favors organisms with a
more rugged network.

Because E. coli is the product of evolution, rather than of
programmers, parts of its genome have changed relatively little over
billions of years. The oldest parts of the network are the workhorse
genes–the ones that encode primitive proteins that do the fundamental
work of life, like building new pieces of DNA. They can tolerate very
little change. It’s much easier instead for E. coli to evolve new ways
of controlling those workhorses.

This kind of comparison is very new, and it’s not clear yet what
scientists will find when they compare Linux to other
genomes–particularly to the genomes of more complex species like
ourselves. E. coli has only about 4300 genes. We have 20,000
protein-coding genes. A lot of those genes control other genes. Indeed,
a typical human gene has a lot of switches, all of which have to be
thrown in order for the gene to make a protein in a certain situation.
The human genome is also packed with thousands of genes that don’t
encode proteins, but which may encode RNA molecules that also switch
genes on and off. Scientists just don’t know enough yet about the human
genome to map its network the way they’ve mapped E. coli. But it’s
possible that when they finally do, it will be a lot more top-heavy,
with a lot more overlapping modules and multi-tasking workhorses.

If that turns out to be the case, biologists will have a new question
to keep them busy for a long time to come: how did Linus get to be so
much like Linux?

[Update: Fixed Torvalds's name and other typos. Thanks for the
proofing!]
