RE: Software/Hardware Architectures

From: Billy Brown (ewbrownv@mindspring.com)
Date: Wed Jul 14 1999 - 10:55:28 MDT


I don't think we were really getting anywhere with the previous line of
responses, so I decided to try it again from the beginning. Here goes:

Regarding Current Software
Current PC software is written for hardware that is actually in use, not
hypothetical designs that might or might not ever be built. This is
perfectly logical, and I don't think it makes sense to blame anyone for it.
If a better architecture becomes available, we can expect ordinary market
forces to lead vendors to support it in short order (look at Microsoft's
efforts with regard to the only-marginally-superior Alpha chip, for
example).

The more sophisticated vendors (and like it or not, that includes Microsoft)
have been writing 100% object-oriented, multithreaded code for several years
now. They use asynchronous communication anywhere there is a chance that it
might be useful, and they take full advantage of what little multiprocessor
hardware is actually available. There is also a trend currently underway
towards designing applications to run distributed across multiple machines
on a network, and this seems likely to become the standard approach for
high-performance software in the near future.
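To make "asynchronous communication" concrete, here is a rough sketch of the
basic pattern (modern C++ purely for illustration - this is not anyone's
shipping code): fire off a request on another thread, keep working, and only
block when you actually need the result.

#include <future>
#include <iostream>
#include <string>

// Hypothetical slow operation, e.g. a network or database round trip.
std::string fetch_record(int id) {
    return "record #" + std::to_string(id);
}

int main() {
    // Launch the request asynchronously and get a future for the result.
    std::future<std::string> pending =
        std::async(std::launch::async, fetch_record, 42);

    // The calling thread is free to do other work here...
    std::cout << "doing other work while the request is in flight\n";

    // ...and only synchronizes when it actually needs the answer.
    std::cout << pending.get() << "\n";
    return 0;
}

The same pattern scales up to distributed applications; the request just
crosses a network instead of a thread boundary.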

Regarding Fine-Grained Parallelism
Parallel processing is not a new idea. The supercomputer industry has been
doing it for some time now, and they've done plenty of experimenting with
different kinds of architectures. They have apparently decided that it
makes more sense to link 1,000 big, fast CPUs with large memory caches than
100,000 small, cheap CPUs with tiny independent memory blocks. That fits
perfectly with what I know about parallel computing - the more nodes you
have, the higher your overhead tends to be, and tiny nodes can easily end up
spending 100% of their resources on system overhead.
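To see why, try a back-of-the-envelope model (my own made-up numbers, not a
supercomputer benchmark): charge every node a fixed coordination cost per
time step and compare a few big nodes against a swarm of tiny ones.

#include <cstdio>

// Useful throughput = nodes * (per-node work rate minus per-node overhead).
// The overhead figure is an assumption chosen purely for illustration.
double useful_throughput(int nodes, double per_node_rate, double overhead) {
    double useful = per_node_rate - overhead;  // cycles left for real work
    if (useful < 0) useful = 0;                // node swamped by overhead
    return nodes * useful;
}

int main() {
    // Both configurations have the same raw capacity (100,000 units).
    std::printf("1,000 big nodes:    %.0f\n", useful_throughput(1000, 100.0, 0.9));
    std::printf("100,000 tiny nodes: %.0f\n", useful_throughput(100000, 1.0, 0.9));
    return 0;
}

With those assumptions the thousand fast nodes deliver roughly ten times the
useful work of the hundred thousand tiny ones, even though the raw capacity
is identical.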

Now, if someone has found a new technique that changes the picture, great.
But if this is something you've thought up yourself, I suggest you do some
more research (or at least propose a more complete design). When one of the
most competitive (and technically proficient) industries on the planet has
already tried something and discarded it as unworkable, it's going to take
more than arm-waving to convince me that they are wrong.

Regarding the Applicability of Parallelism
The processes on a normal computer span a vast continuum between the
completely serial and the massively parallel, but most of them cluster near
the serial end of the spectrum. Yes, you have a few hundred processes in
memory on your computer at any given time, but only a few of them are
actually doing anything. Once you've allocated two or three fast CPUs (or a
dozen or so slow ones) to the OS and any running applications, there isn't
much left to do on a typical desktop machine. Even things that in theory
should be parallel, like spell checking, don't actually get much benefit
from multiple processors (after all, the user only responds to one dialog
box at a time).

On servers there is more going on, and thus more opportunity for
parallelism. However, the performance bottleneck is usually in the network
or disk access, not CPU time. You can solve these problems by introducing
more parallelism into the system, but ultimately it isn't cost-effective.
For 99% of the applications out there, it makes more sense to buy 5
standardized boxes for <$5,000 each than one $100,000 mega-server (and you
get better performance, too).

Of course, there are many processes that are highly amenable to being run in
a parallel manner (video rendering, simulation of any kind, and lots of
other things), but most of them are seldom actually done on PCs. The one
example that has become commonplace (video rendering) is usually handled by
a specialized board with 1-8 fast DSP chips run by custom driver-level
software (once again, the vendors have decided that a few fast, expensive
chips are more economical than a lot of slow, cheap ones).

Side Issues
1) Most parallel tasks require that a large fraction of the data in the
system be shared among all of your CPUs. Thus, your system needs to provide
for a lot of shared memory if it is going to be capable of tackling
molecular CAD, atmospheric simulations, neural networks, etc. That brings
up all those issues of caching, inter-node communication and general
overhead you were trying to avoid.
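A toy example of what I mean (my own sketch, not real molecular CAD or
simulation code): even a trivial update step forces each worker to read
cells owned by its neighbors, so the whole grid has to be visible to every
CPU.

#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int N = 1000000;
    std::vector<double> grid(N, 1.0), next(N, 0.0);
    const int workers = 4;

    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Each worker updates its own slice, but reads cells owned by
            // others at the boundaries - the classic shared-memory pattern.
            int begin = w * N / workers, end = (w + 1) * N / workers;
            for (int i = std::max(begin, 1); i < std::min(end, N - 1); ++i)
                next[i] = 0.5 * (grid[i - 1] + grid[i + 1]);
        });
    }
    for (auto& t : pool) t.join();
    std::printf("next[500000] = %f\n", next[500000]);
    return 0;
}

On a shared-memory machine that is a cache-coherency problem; spread across
100,000 independent memory blocks it becomes a flood of inter-node messages.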

2) You also can't get away from context switching. Any reasonably complex
task is going to have to be broken down into procedures, and each processor
will have to call a whole series of them in order to get any useful work
done. This isn't just an artifact of the way we currently write software,
either. It is an inevitable result of the fact that any interesting
computation requires a long series of distinct operations, each of which may
require very different code and/or data from the others.
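A trivial illustration (hypothetical procedures, not any real application):
even a single logical task walks through a series of unrelated procedures,
each with its own code and data.

#include <cstdio>
#include <string>
#include <vector>

// Four stages of one task, each touching different code and data.
std::string load(const std::string& path)        { return "raw bytes from " + path; }
std::string parse(const std::string& raw)        { return "parsed(" + raw + ")"; }
std::vector<int> index_terms(const std::string&) { return {1, 2, 3}; }
void render(const std::vector<int>& terms)       { std::printf("%zu terms\n", terms.size()); }

int main() {
    // One logical task, four distinct bodies of code and data in sequence.
    render(index_terms(parse(load("report.doc"))));
    return 0;
}

No matter how you divide that work among processors, each CPU still has to
step through a series of very different procedures.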

Billy Brown, MCSE+I
ewbrownv@mindspring.com


