Re: COMP: Moore's Law

From: Eugene Leitl (eugene.leitl@lrz.uni-muenchen.de)
Date: Thu Jun 10 1999 - 15:35:25 MDT


mark@unicorn.com writes:

> >physics itself. Let's shift the complexity load to the software,
> >where it belongs.
>
> Even with a factor of ten slowdown compared to hardware?

Mapping an algorithm into reconfigurable hardware is *faster* than
running it on all-purpose hardware, unless you have a silicon
foundry within your computer which can churn out new dedicated
ASICs at a MHz rate. With reconfigurable architectures, you swap
out virtual circuitry, not code. In fact, reconfigurable hardware
allows the creation of very dynamic, hyperactive machines
with unified data/code, the most efficient things theoretically
possible. These things are very new, so even academia doesn't
quite know how to tackle them yet. You certainly can't go Darwin
in machina (the next thing after OOP) without them.
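
To make "Darwin in machina" on virtual circuitry a bit more concrete,
here is a toy sketch in C. Everything in it -- the single 4-input
lookup table standing in for a reconfigurable cell, the 16-bit
"bitstream", the parity target -- is my own illustration, not any
real FPGA flow:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* target behaviour for the virtual circuit: 4-bit parity */
    static int target(int in)
    {
        int p = 0;
        while (in) { p ^= in & 1; in >>= 1; }
        return p;
    }

    /* score a 16-bit LUT configuration against the target */
    static int fitness(unsigned cfg)
    {
        int in, score = 0;
        for (in = 0; in < 16; in++)
            if (((cfg >> in) & 1) == target(in))
                score++;
        return score;                 /* 16 == perfect match */
    }

    int main(void)
    {
        unsigned cfg;
        int gen = 0;
        srand((unsigned)time(NULL));
        cfg = rand() & 0xFFFF;        /* random initial "bitstream" */
        while (fitness(cfg) < 16) {
            unsigned mut = cfg ^ (1u << (rand() % 16)); /* flip one config bit */
            if (fitness(mut) >= fitness(cfg))
                cfg = mut;            /* keep the fitter circuit */
            gen++;
        }
        printf("parity LUT evolved in %d mutations: 0x%04X\n", gen, cfg);
        return 0;
    }

The point is that the unit of change is the configuration word
itself: you mutate circuitry, not source code.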
 
> No, I was pointing out that Intel have been able to keep putting out new,
> faster CPUs at a tremendous rate by reusing as much of their old design
> as possible, rather than trying to come up with brand-new architectures;
> when they try that -- with Merced, for example -- they fail. Whether the
> chips themselves are any good is a completely different matter.
 
Merced is yet another 64-bit CPU. It is not particularly innovative,
and it was mostly designed by HP, not Intel.
 
> That's precisely my point; new technologies tend to take longer to build
> and cost more... and that's getting worse.
 
What makes it worse: a widespread low-risk attitude, which is gaining
ground. Of course, it is exactly these global technology stalls which
allow dramatic revolutionary bursts. Essentially, we haven't seen a
single revolution in computing since the 1940s.

> >Oh, but there is. These gate delays add up after a while. One needs to
> >streamline/radically simplify the architecture to go to higher clock
> >rates. Structure shrink alone can't do it forever, you know.
>
> Yes, but there's still plenty of room in the 80x86 architecture, and they

The 80x86 architecture is mostly emulated these days. I'm really
looking forward to what Transmeta is going to produce.

> could bolt on a 64-bit kludge just like the prior 32-bit kludge. The main
> aim of Merced and the Camino chip seems to be locking people into a
> proprietary Intel architecture so they can eliminate competition and boost
> profits, not any essential technical improvements.
 
Of course. That is the essence of Wintel's success. What I don't
understand is why, after all these years, people are still buying it
hook, line and sinker.

> >There is no need to go dedicated. If there are hundreds or thousands
> >identical CPUs in each desktop there is sufficient horsepower to do
> >anything in software.
>
> Why pay for hundreds of expensive CPUs if you can do the same job with one
> CPU and nine dedicated support chips?
 
Why pay for one expensive, legacy-ballasted CPU and invest in nine
other hideously complex designs (possibly more complex than the CPU
itself), each requiring individual resources at the fab, if you
could churn out ~500-1000 CPUs for roughly $500 in production costs?

> >God, we can do embedded RAM now.
>
> But it's very hard to do anything useful with embedded RAM because any
> reasonable amount bloats the die size so much. My graphics card has 32MB

That's perhaps because people have a strange notion of what a reasonable
amount is. You can implement a pretty frisky 32-bit CPU core plus networking
in ~30 kTransistors, and I'd guess get semiquantitative die yield assuming
1 MBit grains. You can fit a nanokernel OS in 4..12 kBytes,
especially if you use threaded code (which requires a bi-stack
architecture; since the shallow stacks are part of the CPU, there is
context-switch overhead). Of course, few people are comfortable with a
style of coding where subroutine calls make up well over 20% of all
instructions. Now assume programming in an asynchronous message-passing
OOP model with an average object size of a few hundred bytes and hard
memory grains of about a MBit, and you see why seasoned programmers
have a problem with that.
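
For the curious, here is a minimal sketch in C of what such a
bi-stack, threaded-code inner loop looks like. The opcode names and
the 16-cell stack depths are my own illustration, not any real
core's; note that CALL and RET are single cells each, which is why
call density gets so high:

    #include <stdio.h>

    enum { LIT, ADD, CALL, RET, PRINT, HALT };

    static int ds[16], dsp = -1;   /* shallow on-chip data stack   */
    static int rs[16], rsp = -1;   /* shallow on-chip return stack */

    static void run(const int *code)
    {
        int ip = 0;
        for (;;) {
            switch (code[ip++]) {
            case LIT:   ds[++dsp] = code[ip++];          break;
            case ADD:   dsp--; ds[dsp] += ds[dsp + 1];   break;
            case CALL:  rs[++rsp] = ip + 1;  /* save return address */
                        ip = code[ip];                   break;
            case RET:   ip = rs[rsp--];                  break;
            case PRINT: printf("%d\n", ds[dsp--]);       break;
            case HALT:  return;
            }
        }
    }

    int main(void)
    {
        static const int code[] = {
            LIT, 2, LIT, 3,   /* 0: push 2, push 3                  */
            CALL, 8,          /* 4: call the "+" word below         */
            PRINT, HALT,      /* 6: print result, stop              */
            ADD, RET          /* 8: the called word: ( a b -- a+b ) */
        };
        run(code);            /* prints 5 */
        return 0;
    }

With both stacks held on the die, CALL and RET cost next to nothing,
so factoring programs into tiny words is natural -- but, as noted,
switching tasks means spilling those stacks to memory.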

I have no idea how that CPU architecture scales to a 1 kBit bus, but I
strongly suspect it scales roughly linearly.

> of RAM; you're not going to fit that into a single chip with a graphics
> controller that already has close to ten million transistors and get any
> kind of affordable yield. Plus, of course, once you've built your chip
> with 32MB of RAM you can't then expand it without replacing the entire
> chip.
 
That's another reason people don't do it: they operate in the
context of unvoiced assumptions. Sony's design uses 4 MByte grains,
which is right at the edge of feasibility, imo. If I were going to build
a rendering engine, I'd distribute it either by bitplanes or as a display
mosaic. Engines look very different if you simultaneously operate on
an entire screen line, or do things the voxel way.
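
The mosaic variant, for instance, is nothing more than a static
ownership map from screen patches to grains. A minimal sketch in C,
where the 1024x768 screen and the 8x6 grain mosaic are assumed
figures picked purely for round numbers:

    #include <stdio.h>

    #define SCREEN_W 1024
    #define SCREEN_H 768
    #define GRAINS_X 8                     /* mosaic of 8 x 6 = 48 grains */
    #define GRAINS_Y 6
    #define TILE_W (SCREEN_W / GRAINS_X)   /* 128 pixels */
    #define TILE_H (SCREEN_H / GRAINS_Y)   /* 128 pixels */

    /* which grain owns pixel (x, y)? */
    static int grain_of(int x, int y)
    {
        return (y / TILE_H) * GRAINS_X + (x / TILE_W);
    }

    int main(void)
    {
        printf("pixel (0,0)      -> grain %d\n", grain_of(0, 0));       /* 0  */
        printf("pixel (1023,767) -> grain %d\n", grain_of(1023, 767));  /* 47 */
        return 0;
    }

Each grain clips incoming primitives against its own patch and
refreshes only that patch, so no shared framebuffer is needed;
distributing by bitplanes would instead give every grain one bit of
every pixel.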

> >There is a lot of silicon
> real estate out there on these 300 mm wafers. And quantitative yield can
> >do wonders to prices.
>
> You should really talk to the people who've tried WSI before making claims
> as to how wonderful it's going to be. The only company I know of who ever
> did it are Anamartic, and they had a hell of a time making it work; do they
> even exist anymore?

The processes allowing RAM/logic integration are brand new, and thus
currently accessible only to major players. There is no way a
small company could go WSI and succeed; also consider that you
couldn't sell such architectures. You could emulate a legacy system
on them, but it would be no faster, and possibly slower, due to the
intrinsically sequential nature of legacy systems.
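
(A back-of-the-envelope way to see this, using the standard Amdahl's
law formulation: on N processors, speedup = 1 / ((1 - P) + P/N),
where P is the parallelizable fraction of the workload. For a legacy
system with P near zero, even N = 1000 grains yields a speedup of
about 1 -- nearly all the silicon idles.)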

The reasons why we don't have WSI yet are mostly not technical. It is
because people don't want to learn.
 
> Mark


