Re: Blue Gene

From: Eugene Leitl (eugene.leitl@lrz.uni-muenchen.de)
Date: Tue Dec 07 1999 - 12:55:09 MST


Robert J. Bradbury writes:

> So would I. I think they would publish it, you can't effectively use a
> machine unless you can work with it at multiple levels. I've rarely
> seen a compiler that I can't out-code. The question is whether they

Deeply pipelined VLIW stuff is a nightmare to hand-optimize. The TI
'C6x and Itanium come to mind here.

> will publish it before the machine becomes available, for example do we
> even have the Merced instruction set (or the Playstation instruction set?).
 
http://developer.intel.com/design/ia64/index.htm

As to a PSX2 Beowulf (a project named Wulfstation) there is a site
   http://wulfstation.org/
and a mailing list
   http://www.onelist.com/community/beowulf-psx2
though so far it is mostly fluff (we're waiting for the dev kits).

The PSX2 CPU is mostly vanilla MIPS. If you know MIPS, you'll find
yourself immediately at home.

> > But even if I'm right, the task of designing software to make full use
> > of the machine's capabilities may be so daunting that no one else will
> > want to take it on, effectively making it a single-purpose machine.
>
> Not really, if it is general purpose, there are already software
> models (e.g. the Oxford Bulk Synchronous Parallel (BSP) model,
> the OpenMP API for Shared Memory Programming, and the BIP message
> passing model (for Myrinet)) for programming similar machines.
> The only difference between programming something for a Beowulf
> cluster and Blue Gene is the granularity of the processor units.

Hopefully, IBM will integrate the switch/router into the CPU and
implement direct hardware support for most MPI calls (i.e. the basic
MPI calls would each be replaced by one or a few machine instructions).

Otherwise, latencies will be prohibitive. The best we can do with
cheap off-the-shelf code and hardware is GAMMA (with an MPI wrapper)
and M-VIA:

    http://www.disi.unige.it/project/gamma/
    http://www.nersc.gov/research/FTG/via/

The latencies here are mostly due to software overhead. I don't see
why one shouldn't be able to bring message-passing latency down to a
few tens of nanoseconds with the proper hardware support.
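
For reference, the number being measured is the round trip of a
ping-pong microbenchmark like the sketch below (plain MPI-1 calls,
nothing exotic; two ranks, iteration count arbitrary). On commodity
hardware, almost all of what it reports is software overhead, not
wire time:

    /* Minimal MPI ping-pong latency sketch -- run with two ranks,
       e.g. mpirun -np 2 ./pingpong */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, i, iters = 10000;
        char byte = 0;
        double t0, t1;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {        /* send, then wait for the echo */
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) { /* echo everything back */
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)  /* half the round trip = one-way latency */
            printf("one-way latency: %g us\n",
                   (t1 - t0) / (2.0 * iters) * 1e6);

        MPI_Finalize();
        return 0;
    }

With MPI in hardware, the per-iteration cost above collapses to a
couple of instructions plus the physical hop.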

As to the optimal type of code for MD, I strongly suspect integer
lattice gases might be up to the challenge.

        http://xxx.lanl.gov/archive/comp-gas

Progress is slow, but steady. A generic forcefield engine implemented
as an integer lattice gas would be trivial to cast in hardware. And
you certainly can't beat the speed.
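
To see why it casts into hardware so directly, here is a toy sketch
of one update step of an HPP-style lattice gas, the simplest integer
lattice gas (lattice size and bit layout are made up, and a real
forcefield engine would of course be much richer):

    /* One update step of an HPP-style integer lattice gas -- a toy
       sketch, not a forcefield engine. Each cell holds four one-bit
       particles, one per lattice direction. */
    #include <string.h>

    #define NX 64
    #define NY 64
    #define EAST  1
    #define WEST  2
    #define NORTH 4
    #define SOUTH 8

    static unsigned char grid[NY][NX], next[NY][NX];

    void step(void)
    {
        int x, y;

        /* collision: the only nontrivial HPP rule -- a head-on pair
           scatters into the perpendicular pair (pure boolean logic) */
        for (y = 0; y < NY; y++)
            for (x = 0; x < NX; x++) {
                unsigned char c = grid[y][x];
                if (c == (EAST | WEST))        c = NORTH | SOUTH;
                else if (c == (NORTH | SOUTH)) c = EAST | WEST;
                grid[y][x] = c;
            }

        /* streaming: each bit hops to the neighbor it points at
           (periodic boundaries) -- in hardware this is just wiring */
        memset(next, 0, sizeof(next));
        for (y = 0; y < NY; y++)
            for (x = 0; x < NX; x++) {
                unsigned char c = grid[y][x];
                if (c & EAST)  next[y][(x + 1) % NX]      |= EAST;
                if (c & WEST)  next[y][(x + NX - 1) % NX] |= WEST;
                if (c & NORTH) next[(y + NY - 1) % NY][x] |= NORTH;
                if (c & SOUTH) next[(y + 1) % NY][x]      |= SOUTH;
            }
        memcpy(grid, next, sizeof(grid));
    }

Collisions are pure boolean logic and streaming is just routing bits
to the neighbor cell, so an ASIC version is a plane of identical
gates clocked in lockstep.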

> What IBM probably did was ask themselves what the failure rate
> was going to be in the processor units. With 1M processors
> it might be quite high. Customers aren't going to be happy
> if your machine is down most of the time getting boards replaced.

I surmise one can keep a pool of fresh CPUs and checkpoint
periodically. When a CPU fails, you fall back to the last snapshot
state, thus losing only minutes of computation. Of course this means
lots of I/O activity every few minutes, so each CPU should have its
own disk. OTOH, Big Blue is probably up to the challenge of building
a monolithic monster RAID.
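
The loop I have in mind looks roughly like the sketch below (state
size, interval and file names are made up for illustration):

    /* Periodic checkpoint to local disk -- a sketch of the scheme
       described above; state layout, interval and file names are
       hypothetical. */
    #include <stdio.h>
    #include <stdlib.h>

    #define STATE_WORDS (1 << 20)   /* ~8 MB of state per CPU */
    #define CKPT_EVERY  1000        /* steps between snapshots */

    static double state[STATE_WORDS];

    static void checkpoint(long step)
    {
        FILE *f = fopen("snapshot.tmp", "wb");
        if (!f) { perror("checkpoint"); exit(1); }
        fwrite(&step, sizeof step, 1, f);
        fwrite(state, sizeof state[0], STATE_WORDS, f);
        fclose(f);
        /* atomic rename: a crash mid-write still leaves the previous
           consistent snapshot to fall back to */
        rename("snapshot.tmp", "snapshot.dat");
    }

    static long restore(void)
    {
        long step = 0;
        FILE *f = fopen("snapshot.dat", "rb");
        if (f) {
            fread(&step, sizeof step, 1, f);
            fread(state, sizeof state[0], STATE_WORDS, f);
            fclose(f);
        }
        return step;                /* 0 means a fresh start */
    }

    int main(void)
    {
        long step;
        for (step = restore(); step < 1000000; step++) {
            /* ... one timestep of the real computation ... */
            if (step % CKPT_EVERY == 0)
                checkpoint(step);
        }
        return 0;
    }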

> This is now solved in multiprocessor & clustered architectures
> where you can afford to take out a node for a few minutes to
> hours to replace parts. However if you are running integrated
> calculations (i.e. this isn't a client-server architecture) that
> take days to weeks and the data in one node interacts with *all*
> of the other data, then when you pull a node you slow down the
> entire calculation. The clever trick is going to be detecting
> the failures (you don't want soft failures, you want hard failures)
> and having the data arranged so that multiple processors/nodes can
> rapidly get to it.

The good part about MD is that the interactions are mostly local (see
particle-in-cell for a very good illustration), and long-range
interactions (mostly Coulomb) can be simulated by propagating the
information through the node lattice a la bucket brigade.
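
In code, the locality reduces the whole communication pattern to a
nearest-neighbor halo exchange; the sketch below uses a 1-d ring for
simplicity (sizes and names are illustrative). Long-range information
crosses P nodes in P steps, bucket-brigade fashion:

    /* Nearest-neighbor halo exchange on a 1-d ring of nodes -- a
       sketch of the communication pattern for a local-interaction
       code; NCELLS and the step count are arbitrary. */
    #include <mpi.h>

    #define NCELLS 1024

    int main(int argc, char **argv)
    {
        double cell[NCELLS + 2];  /* [0], [NCELLS+1] are ghost cells */
        int rank, size, left, right, step, i;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        left  = (rank + size - 1) % size;
        right = (rank + 1) % size;

        for (i = 0; i < NCELLS + 2; i++)
            cell[i] = 0.0;

        for (step = 0; step < 100; step++) {
            /* ship my boundary cells out, receive the neighbors'
               boundaries into my ghost cells -- purely local traffic */
            MPI_Sendrecv(&cell[NCELLS], 1, MPI_DOUBLE, right, 0,
                         &cell[0],      1, MPI_DOUBLE, left,  0,
                         MPI_COMM_WORLD, &st);
            MPI_Sendrecv(&cell[1],          1, MPI_DOUBLE, left,  1,
                         &cell[NCELLS + 1], 1, MPI_DOUBLE, right, 1,
                         MPI_COMM_WORLD, &st);
            /* ... local force evaluation and update would go here ... */
        }

        MPI_Finalize();
        return 0;
    }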

> This is a new level in computer architecture and getting very close
> to what goes on in the brain. If they get the architecture right
> and the fault tolerance right and because they have solved the
> bandwidth problem, you can expect a simple instruction set to
> gradually expand as people come up with other applications
> and declining feature sizes give you more chip real-estate to
> work with.

For a glimpse of what is possible with PIM-type devices (caution,
self-plug again), have a look at this (old, obsolete, half-baked, etc.):

     http://www.lrz-muenchen.de/~ui22204/.html/txt/8uliw.txt

> > And this is likely the only one they will build, like Deep Blue.
>
> IBM is one of the most clever marketing organizations in the world.
> Unlike Deep Blue, they aren't doing this for publicity. (After all
> how many machines are you going to sell when you know you are going
> to lose the game...) They realize the market for these machines is
> in the dozens (major pharma & govmnts), thousands (universities &
> small-biotech), and potentially workstation quantities (individual
> researchers). I'll predict with this one they are planning to do
> the software investment and then use that to follow the declining
> hardware costs to make the machines available to larger markets.

If we're talking about 5 years before they deliver, there is a fair
chance Beowulfs will be able to meet the challenge. You can't argue
with economies of scale.
 
> > P.S. I apologize for my sloppy editing on my original post
> > (which was truly my first post to this list).
>
> No problem. The information was quite helpful and appreciated.
>
> Robert


