Re: COMP: Moore's Law

Eugene Leitl (eugene.leitl@lrz.uni-muenchen.de)
Fri, 11 Jun 1999 14:19:40 -0700 (PDT)

mark@unicorn.com writes:

> Yet every attempt to do this I remember ended up running a lot slower than
> dedicated hardware; firstly because they had to keep reconfiguring the
> chip to do different things, which took a long time, and secondly because
> they couldn't run as fast as dedicated chips.

That's because all current reconfigurable architectures are based on FPGAs, which is certainly not the best way to do it. Suboptimal as it is, we're going to get FPGA areas in DSPs/consumer devices before very long.

> Perhaps... but there's a big difference between what's theoretically better
> and what's practically better. So far there's no good reason for believing
> that this kind of hardware is really better.

Well, one cannot really argue with the physics of computation. Not in the long run. So we're going to have reversible, reconfigurable computing before very long.

> Yet we've seen probably a million-fold improvement in computing performance
> in that time, and probably a thousand-fold reduction in cost. What more
> would a 'revolution' have given us?

What we've got is iteration of the computational mill/dumb storage paradigm, first in a mechanical, then electromechanical, then vacuum tube/electromagnetic, then semiconductor/electromagnetic/optical incarnation. The speed has increased; the architecture has remained the same. Whether punching cards or typing in a text editor, whether hardwiring the program or writing to FPGA cells, there is no qualitative difference. I cannot help but think that the one billion transistors in a current PC could be utilized much more efficiently. Considering that that billion constitutes a small fraction of all processed silicon (the defective rest goes into the scrap bin), I think we can do much better for the money.

> >Of course. The essence of Wintel's success. What I don't understand is
> >why after all these years people are still buying it, hook and sinker.
>
> a) it's cheap.

Economies of scale are not specific to any particular architecture. And perpetuating a braindead architecture is much more damaging in the long run. The market is irrational and short-term, though.

> b) it runs all your old software.

Which should more properly be addressed with emulation. Why doesn't emulation work very well today? Because the enhancement iterations are gradual and don't leave enough performance headroom to run the last architecture snappily.

> c) it mostly does the job.

Which is arguable. It certainly doesn't do mine very well.

> However, that's changing now; the K7 looks like it could give the P-III a

The K7 is just a minor variation on the CPU motif, really. As is the Alpha.

> real run for its money, and Windows is making up a large fraction of the
> cost of cheap PCs. Plus open source software greatly simplifies the process

Actually, a Californian company is selling a complete K6-2/350, 32 MByte RAM, 4 GByte EIDE, CDROM etc. computer, sans CRT and OS, for $299. For $30 more they'll install RedHat on it for you. I hear it sells extremely well.

> of changing CPU architectures; just recompile and you can run all the
> software you used to run.

Open Source is nice. But how much of it is written in a rigorous OOP fashion, as a mosaic of tiny objects? Using threaded code? How much of it is written using asynchronous OO message passing? It's hard enough to make people think in the MPI/Beowulf way. Linux sucks. Beowulf is the wrong way to do it. But it at least gets you started.
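To make "asynchronous message passing" concrete, here is a minimal sketch using MPI's nonblocking primitives -- the buffer sizes and the rank pairing are purely illustrative, not anybody's production code:

  /* minimal asynchronous message-passing sketch (compile with mpicc) */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, nprocs, peer, i;
      double outbuf[1024], inbuf[1024];
      MPI_Request sreq, rreq;
      MPI_Status st;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

      for (i = 0; i < 1024; i++)
          outbuf[i] = rank;                 /* something to send */
      peer = rank ^ 1;                      /* pair up neighbouring ranks */

      if (peer < nprocs) {
          /* post the receive and the send, then keep computing
             instead of blocking -- that is the asynchronous part */
          MPI_Irecv(inbuf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &rreq);
          MPI_Isend(outbuf, 1024, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &sreq);

          /* ... useful work overlapped with communication ... */

          MPI_Wait(&rreq, &st);
          MPI_Wait(&sreq, &st);
      }
      MPI_Finalize();
      return 0;
  }

Now imagine every object in the system talking like this by default, instead of the occasional hand-tuned solver.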

> >Why paying for one expensive, legacy-ballast CPU and invest in nine
> >others hideously complex designs (possibly more complex than the CPU
> >itself), each requiring individual resources on the fab if you
> >could churn out ~500-1000 CPUs for roughly $500 production costs?
>
> Last I checked, a Z80 was a dollar or two a chip. Why aren't we all running
> massively parallel Z80 machines? Perhaps because building a machine with
> 500 CPUs will be much more expensive than buying them and writing software
> to do anything useful on them will be a monumental task?

Um, why aren't we using abacuses? Your comparison with the Z80 is about as meaningless. A valid comparison would be a MISC CPU, like the i21. A 32-bit version of it would have ~30 kTransistors and should outperform your PII 200, in some cases a PII 400. Scaled to a 1..2 kBit bus width with embedded RAM, it would make an interesting comparison to a quad Xeon.

And of course a 500-CPU machine, if mass-produced, would be less expensive than your desktop. There is no packaging, since you go WSI. The die yield takes care of itself, because you adjust the grain size to get >80% good dies per wafer. The wafer yield is 100%. Testing is trivial: software does it. There is only one kind of chip to produce. There is no motherboard. Etc.
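To put a rough number on "adjust the grain size": under the standard Poisson defect model, die yield is Y = exp(-D*A). With an assumed defect density of 0.5 defects/cm^2 (my number, purely for illustration):

  /* back-of-envelope die yield, Y = exp(-D * A) */
  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
      double D = 0.5;                       /* defects per cm^2 (assumed) */
      double A[] = { 0.04, 0.45, 2.0 };     /* die areas in cm^2 */
      int i;

      for (i = 0; i < 3; i++)
          printf("A = %4.2f cm^2  ->  yield %2.0f%%\n",
                 A[i], 100.0 * exp(-D * A[i]));
      return 0;
  }

A 4 mm^2 grain comes out near 98%, a ~45 mm^2 grain still makes the 80% mark, and a 2 cm^2 monolithic die is down around 37%. That's the whole argument for small grains in one loop.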

> >You can implement a pretty frisky 32 bit CPU core plus networking
> >in ~30 kTransistors, and I guess have semiquantitive die yield assuming
> >1 MBit grains.
>
> But what good will it do for me? I used to work with Transputers, which
> were going to lead to these massively parallel computers built from cheap
> CPUs. Didn't happen, because there were few areas where massively parallel
> CPUs had benefits over a single monolithic CPU.

There are no monolithic supercomputers in existence. They are all consumer-class CPUs glued together with custom networking. Currently we've got Alpha Beowulfs with Myrinet in several instances beating the crap out of current SGI/Cray machines in terms of absolute performance. Clustering rules supreme. Want to bet on when the first consumer PC with clustering integrated in one case will appear?

> >Engines look very differently if you simultaneously operate on
> >an entire screen line, or do things the voxel way.
>
> I find discussion of voxel rendering pretty bizarre from someone who
> complains about my regarding 32MB as a 'reasonable amount' of memory for
> a graphics chip. Reasonable voxel rendering is likely to need gigabytes
> of RAM, not megabytes.

Who says you need to keep the entire voxelset in one grain? There is certainly no known memory which lets you process several GBytes at 100 Hz. With few-MBit grains, it's easy. Voxels love embarrassingly parallel fine-grain architectures.
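A rough sketch of what I mean -- carve the volume into bricks sized to the grain memory. The numbers are illustrative (1 Mbit grains, 8-bit voxels, a 512^3 volume), not a spec:

  /* scatter a 512^3 voxel volume over 1 Mbit grains:
     each grain holds one 64x64x32 brick = 128 KByte */
  #include <stdio.h>

  #define BX 64
  #define BY 64
  #define BZ 32
  #define VX 512
  #define VY 512
  #define VZ 512

  /* which grain owns voxel (x,y,z)? */
  static int grain_of(int x, int y, int z)
  {
      int gx = x / BX, gy = y / BY, gz = z / BZ;
      return (gz * (VY / BY) + gy) * (VX / BX) + gx;
  }

  int main(void)
  {
      printf("%d grains of 128 KByte each\n",
             (VX / BX) * (VY / BY) * (VZ / BZ));          /* 1024 */
      printf("voxel (100,200,300) lives in grain %d\n",
             grain_of(100, 200, 300));
      return 0;
  }

Each grain works on its own brick locally; only ray segments or partial projections ever have to cross the network.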

> >The reason's why we don't have WSI yet are mostly not technical. It is
> >because people don't want to learn.
>
> Anamartic were doing wafer-scale integration of memory chips more than
> a decade ago; from what I remember, they needed a tremendous amount of

There was no viable embedded RAM process as late as last year. Most of the industry doesn't have access to one. Sony isn't producing the Playstation 2 in quantity yet. Anamartic never had a ghost of a chance of succeeding. Embedded RAM alone takes several G$ to develop, WSI plus embedded RAM at least as much again. Once you have the hardware, you need the nanokernel OS and the development environment to support it, and hordes of programmers to train.

> work to test each wafer and work out how to link up the chips which worked
> and avoid the chips which didn't. This is the kind of practical issue

You don't avoid dead dies; that's a software problem. Choose a die on the wafer at random, touch down a testing pin, and boot from the link. The picokernel tests the dies, forks off clones all over the wafer in a few ms (redundant links and integrated routers route around dead dies), runs tests, and collects the results. The testing machine gathers the results and computes the wafer quality from the fraction of good dies reported. If you don't get a good die after a few random tries, the whole wafer is sour, which indicates major production trouble. Next wafer.
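In code, the tester-side loop is about this complicated -- every function name below is made up; it's a sketch of the procedure just described, not an existing tool:

  /* probe-and-boot wafer test, tester side (sketch only) */
  #define MAX_TRIES     5
  #define GOOD_FRACTION 0.8

  struct wafer;
  struct die;
  struct die *probe_random_die(struct wafer *w);   /* hypothetical */
  int         boot_from_link(struct die *d);       /* hypothetical */
  double      collect_results(struct wafer *w);    /* hypothetical */

  /* returns the fraction of good dies, 0.0 if the wafer is sour */
  double test_wafer(struct wafer *w)
  {
      int try;

      for (try = 0; try < MAX_TRIES; try++) {
          struct die *d = probe_random_die(w);   /* touch down the pin */
          if (!boot_from_link(d))                /* dead die, try another */
              continue;
          /* the picokernel forks clones across the wafer, routing
             around dead dies, runs the self-tests, reports back */
          return collect_results(w);
      }
      return 0.0;                                /* no bootable die found */
  }

If test_wafer() comes back below GOOD_FRACTION, bin the wafer and look at the production line, exactly as above.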

> which theoreticians just gloss over, and then 'can't understand' why
> people don't accept their theories.

I understand the collective inertia of millions of programmers, the whole of the semiconductor industry, and especially the market. However, people tend to confuse these very real issues with purely technical ones, which they are not. Amdahl's law is not Scripture.

> The reason we still use monolithic chips is not that people are afraid of
> trying other solutions, but because we have tried those other solutions,
> and so far they've failed. That may change, but I don't see any good evidence
> of that.

We _haven't_ tried other solutions, and currently we're not even attempting to. This will have to change when current technology saturates, which should be soon enough. Perhaps sooner, if we understand the mechanisms underlying our inertia.

> Mark