From: Eugen Leitl (eugen@leitl.org)
Date: Fri Apr 12 2002 - 06:28:28 MDT
On 12 Apr 2002, Mike Linksvayer wrote:
> For any given hardware requirement it'll be here
> relatively/predictably soon, which Moravec and Kurzweil make much hay
> of. The path to removing/widening hardware bottleneck after
The technically naive public tends to assume that integration density and
switching speed automatically translate into performance, specifically
all-purpose performance. Unfortunately, the naive public is mistaken. The
only true way to find out is to run a benchmark, and the only really
relevant benchmark is your own code. (Strangely, you won't see this basic
fact mentioned in the glossy marketing brochures selling the latest and
greatest in hardware).
> bottleneck is exceedingly clear. AFAICT the path to adequate software
> is nearly as murky as the hardware path is clear.
The hardware path is pretty clear; unfortunately, the architectural baggage
prevents significant progress along it. Recent narrow margins (earmarks of
a mature industry) and high and rising prototyping costs result in
pronounced risk aversion in R&D. I don't see this changing until we abandon
photolithography and turn to self-assembling molecular devices. We *might*
get cellular hardware before then, though. And we will certainly get
embedded memory.
The software path is somewhat less clear, but here the orthodoxy is even
more stifling. (It's a mature industry indeed; the smell is at times
overpowering.)
> I was referring to cpu speed. As measured by the STREAM benchmark
> memory bandwidth has improved by 10^2, not shabby. See historical
> plots at <http://www.cs.virginia.edu/stream/analyses.html>.
Okay, let's see why benchmarks don't translate into real-world all-purpose
performance. STREAM does the following four tests (roughly the four loops
sketched below):
``Copy'' measures transfer rates in the absence of arithmetic.
``Scale'' adds a simple arithmetic operation.
``Sum'' adds a third operand to allow multiple load/store ports on vector
machines to be tested.
``Triad'' allows chained/overlapped/fused multiply/add operations.
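For concreteness, here is a minimal sketch of those four kernels in C. The
array length and element type are my assumptions rather than the official
STREAM source, but the loop bodies are the operations the names refer to:

  /* Sketch of the four STREAM kernels: three large double arrays
     and a scalar, touched in strictly sequential order. */
  #define N (2 * 1024 * 1024)
  static double a[N], b[N], c[N];
  static const double s = 3.0;

  void stream_kernels(void)
  {
      long j;
      for (j = 0; j < N; j++) c[j] = a[j];            /* Copy  */
      for (j = 0; j < N; j++) b[j] = s * c[j];         /* Scale */
      for (j = 0; j < N; j++) c[j] = a[j] + b[j];      /* Sum   */
      for (j = 0; j < N; j++) a[j] = b[j] + s * c[j];  /* Triad */
  }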
As the name implies, it is a streaming benchmark: it streams through memory
in sequential order. As such, it measures and exercises only one aspect of
the memory subsystem. In the mid-80s, RAM was RAM; access latency was flat
across the address space, regardless of access pattern. Things are very
different now.
In terms of access latency, it was ballpark 120 ns then, and it's ballpark
50 ns now (the cost of an access might actually have increased, because of
the elaborate mechanisms in the CPU, the memory interface, and within the
memory itself). That's a factor of two, and since we've got broader buses
now (a factor of eight), all I see is a factor of 16 for nonstreaming
memory access, which is obviously not 100. It would actually be interesting
to write a NONSTREAM benchmark and run it on a number of systems, vintage
and modern. I think the results would be surprisingly dismal, because so
much hardware must be engaged before you can get at the contents of a
remote memory location.
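A minimal sketch of what such a NONSTREAM benchmark might look like: chase
pointers through a randomly permuted array, so every load depends on the
previous one and nothing can be prefetched or overlapped. The array size,
step count, and use of Sattolo's shuffle are my own illustrative choices,
not anything standardized:

  /* Pointer-chasing latency probe. Build a single random cycle over
     a large array, then walk it; each access depends on the one
     before it, so the hardware cannot hide the latency. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (16 * 1024 * 1024)      /* 64 MB of ints, far beyond any cache */
  #define STEPS (10 * 1000 * 1000)

  int main(void)
  {
      int *next = malloc(N * sizeof *next);
      long i, j;
      if (!next) return 1;

      for (i = 0; i < N; i++) next[i] = i;
      srand(1);
      for (i = N - 1; i > 0; i--) {   /* Sattolo's shuffle: one big cycle */
          j = rand() % i;
          int t = next[i]; next[i] = next[j]; next[j] = t;
      }

      int p = 0;
      clock_t t0 = clock();
      for (i = 0; i < STEPS; i++)
          p = next[p];                /* the dependent-load chain */
      clock_t t1 = clock();

      printf("%.1f ns per dependent access (sink: %d)\n",
             1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / STEPS, p);
      return 0;
  }

Run on a vintage box and on a current one, the per-access numbers should
differ far less than the clock rates or the STREAM figures do.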
> Of course hardware is relevant to real-world problem solving, that's
> the economic engine that promotes investment in ever faster computers.
> How much of that hardware is being used to tackle problems on a
> critical path towards AI?
Not much. And that's partly a good thing. Insect-grade AI would do *lots*
for industrial automation, though, including self-replicating systems,
which ramp up very quickly and are very cheap.