Software/Hardware Architectures (WAS: RE: Human minds on Windows(?))

From: Eugene Leitl (eugene.leitl@lrz.uni-muenchen.de)
Date: Wed Jul 14 1999 - 00:28:46 MDT


Billy Brown writes:

> > Hey, so they really call OS.send.message() every few 100 machine
> > instructions, or so, and context-switch every 1 us? Really?
>
> Why on Earth would you want to do that? Even for massively parallel

Because that's the only way to do things on a fine-grain
maspar system. I believe we were talking about portability/migration
issues...

> architectures you are better off either using large CPU/memory blocks, or

Uh, there _are_ no large contiguous memory blocks/monster CPUs
in a maspar fine-grain system. For price (yield), reliability,
footprint and thermal dissipation reasons you shouldn't imagine
something like the ASCI Red but a midi tower stuffed with a few 100...1000
VLIW CPUs, each with a few MBit of on-die RAM, interconnected by multiple/
redundant fast (multi-GBps, <<10 ns latency) serial links
running a primitive switchable protocol. But I seem to be repeating
myself...

> running conventional apps in a virtual machine. At any rate, Redmond

As I said you *could* run conventional apps, but not at practical
speeds. It would be even less usable than http://www.bochs.com/

> designs for the hardware that is actually in use (big surprise), so they
> only context switch every millisecond or two.
 
Context switches are intrinsically expensive if done on anything other
than (bi)stack machines http://www.cs.cmu.edu/~koopman/stack_computers/ .
These cannot be adequately programmed in C-type languages.
 
> They haven't gotten around to re-writing the entire OS this way yet, but

Thank god, or open-source OSes would be in trouble ;)

> everything new *is* done that way. In Office 2000, for instance, every
> spreadsheet cell is indeed an object (and so is every other recognizable
> program element). The same goes for ADO, MTS, and recent versions of most

Yes, but are these asynchronous objects?

> On a modern CPU a context switch isn't any big deal. You don't want to do

No, it is a very big deal because you have to save the context, and
there is a lot of it. Register sets, stack frames, you name it. You
might not notice the overhead machinery and the delays it introduces
in a 50 MTransistor bloatware CPU, but we're not talking about any
such thing. You certainly can't beat the context switch times of a
modern MISC CPU.
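
To put rough numbers on the disagreement, here is a back-of-the-envelope
sketch in Python. The 5 us switch cost is a made-up illustrative figure,
not a measurement of any CPU, and overhead_fraction and its parameters
are names introduced only for this example. The two intervals are the
ones mentioned in this thread: a switch every millisecond and a switch
every microsecond.

```python
def overhead_fraction(switch_cost_us, work_between_switches_us):
    """Fraction of total time spent switching rather than doing work."""
    return switch_cost_us / (switch_cost_us + work_between_switches_us)


# Hypothetical switch cost, chosen only for illustration.
SWITCH_COST_US = 5.0

for work_us in (1000.0, 1.0):
    share = overhead_fraction(SWITCH_COST_US, work_us)
    print(f"{work_us:7.1f} us of work per switch: {share:.1%} overhead")
```

Under that assumed cost the overhead is negligible at millisecond
granularity and dominant at microsecond granularity, which is why the
cost of a single switch matters so much for fine-grain objects.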

> it every other instruction, but there isn't any good reason to do that in
> the first place. You can certainly do it anywhere there is a reason to
> without having to worry about it affecting your performance.
 
Of course context switching is a bad idea, but we're using it only to
simulate something essentially parallel. After all, the same code
should run on a single-node and multi-node machine, right?
 
> > Multithreading!=asynchronous message passing on many tiny objects.
> > We're talking about several thousand primitive (few kBytes) objects
> > which send message packets which are routed by hardware directly --
> > while the originator code may or may not wait for the ack/result
> > to arrive. If this exists at all, it is an academic curiosity at best
> > (Thinking Machines might qualify, though I really doubt they exploited
> > their options fully every time).
>
> It doesn't exist because there is no reason to do it. The current model

Of course there is a reason: we need to migrate to massively parallel
hardware yet keep our apps for the time being, remember?

> does exactly the same thing, but the objects are 10-100 times as big and
> communicate about 10% as often. Asynchronous calls are used whenever
> they actually do something for you (in most cases they don't work, because
> you can't proceed with the current operation until you get your results
> back).
 
This is strange, for the world is an intrinsically parallel
place. Many things are happening simultaneously, and are only coupled
locally, if at all. This is most naturally expressed as a large number
of asynchronous objects which are most naturally run on a large
parallel machine.
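
A minimal sketch of what such objects could look like, in Python. The
Cell class, run_single_node and ring_demo are hypothetical names made up
for this illustration; they do not come from any existing system. Each
cell owns its state and a mailbox, is coupled only to its neighbour, and
never waits for an ack. On a single node a round-robin loop stands in
for the parallel hardware; on a multi-node machine each cell would
presumably sit on its own node and send() would go out over a link.

```python
from collections import deque


class Cell:
    """A tiny object: private state plus a mailbox, coupled only by messages."""

    def __init__(self, name):
        self.name = name
        self.total = 0
        self.mailbox = deque()

    def send(self, target, value):
        # Fire and forget: the sender does not wait for an ack or a result.
        target.mailbox.append(value)

    def step(self):
        """Handle at most one pending message; return True if work was done."""
        if not self.mailbox:
            return False
        self.total += self.mailbox.popleft()
        return True


def run_single_node(cells):
    """Round-robin over all cells until every mailbox is empty."""
    busy = True
    while busy:
        busy = False
        for cell in cells:
            if cell.step():
                busy = True


def ring_demo(n):
    """Each cell sends its index to its right-hand neighbour in a ring."""
    cells = [Cell(i) for i in range(n)]
    for i, cell in enumerate(cells):
        cell.send(cells[(i + 1) % n], i)
    run_single_node(cells)
    return [cell.total for cell in cells]
```

Calling ring_demo(4) leaves every cell holding the index of its
left-hand neighbour. Nothing in Cell depends on how many processors run
the loop, which is the property that matters for migration.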
 
If you think that most of the world is sequential, you must be a
programmer ;)

> Or, to put it another way, the reason current apps wouldn't benefit from
> your fast chip/small memory parallel processing architecture is because most
> of the tasks they do are inherently linear, not because they are poorly
> written. The only way to speed up a linear process is to give it a single
> very fast thread of execution. That's why massively parallel machines are
> generally reserved for inherently parallel sorts of computation.

Sorry, but this is nonsense. That particular machine by my desk is
running ~100 processes, each of which has some massively parallel
aspects. There shouldn't be more than one process/CPU in most cases,
and hence there is no need for context switching, nor MMU for
address space protection (nor cache, because the stuff is on-die).

Searching is intrinsically parallel, so is rendering, so is neural
DSP, so is simulation of any kind. While I type this into emacs
sequentially, don't tell me GC can't profit from a little
parallelism, or the GUI, or the mp3 process in the background,
the file system reindexing, the web server, the molecular
dynamics process, the FASTA search, the parallel make, etc. All this is
off the top of my head; there is surely lots more.
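
As one illustration of the search case, a sketch in Python: the data is
cut into chunks, each chunk is scanned independently, and the partial
results are concatenated. search_chunk and parallel_search are
hypothetical names for this example. Threads are used only to show the
decomposition; in CPython they will not give a real speedup for
CPU-bound scanning, so read this as the shape of the problem rather
than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(chunk, offset, needle):
    """Return global indices of needle within one chunk."""
    return [offset + i for i, item in enumerate(chunk) if item == needle]


def parallel_search(items, needle, workers=4):
    """Split items into chunks and search every chunk independently."""
    size = max(1, -(-len(items) // workers))  # ceiling division
    chunks = [(items[s:s + size], s) for s in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps chunk order, so the hits come back sorted by index.
        parts = list(pool.map(lambda c: search_chunk(c[0], c[1], needle), chunks))
    hits = []
    for part in parts:
        hits.extend(part)
    return hits
```

No chunk needs anything from any other chunk until the final merge,
which is what makes this kind of search easy to spread over many small
nodes.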

> Now, if you take a close look at modern PC architectures you'll see that
> there is an emerging trend towards increasing parallelism in the areas where
> it is useful. Servers often have multiple CPUs, since they have to handle

There is no noticeable parallelism in the modern PC except in the
CPU. Also, I wouldn't say that little parallelism is in there
because it is particularly useful. People wouldn't be building
Beowulfs if one could purchase them on the free market, right?

> many different requests simultaneously. Video subsystems often incorporate

In the absence of crossbars to memory, SMPs suck massively. Global
shared memory is a myth anyway, so we'd better get used to
message-passing.

> several DSPs, and the trend seems to be towards using more and more of them.

There are no parallel DSP arrays in any PC video subsystem I am aware
of. In fact I cannot readily think of any DSP array applications
anywhere in the mainstream.

> With modems, sound cards and other specialized functions turning into
> software for DSP chips, it would not be at all surprising if the PC of the

Right now people attempt to reduce hardware prices by moving
functionality into software. See Windows printers, memory mapped
video, etc.

> future had a large array of DSPs for parallel-processing tasks. However, it
> will still need that fast Pentium-whatever chip for handling more linear
> jobs in a timely fashion.

Non sequitur. A Pentium is a legacy processor if I ever saw one. There
is absolutely no need for a dedicated head in a lattice of modern
DSPs.


