On Fri, 11 Feb 2000, Ramez Naam wrote:
> From: Eugene Leitl
> >
> > One should compare apples with apples. TI 'c6x MIPSen look good on
> > paper, but do not translate very well in real-time performance.
>
> I suspect as much (given that it's a DSP) but don't know any particulars of
> the chip.
Well, for things like emulating neurons, DSPs should do a pretty good
job. Aren't there some examples (@ Columbia?) of large DSP arrays
doing neural net stuff? If neurons do much more than multiply and
add, I'd be very surprised. If you have a thumbtack-sized chip
with 10^12 OPs, it's only going to take ~100-1000 of them to get you
human brain "equivalence". That's a box that fits under my desk.
Can't take it to bed with me though, as it's apt to be a bit *hot*
[double entendre intended if you are following the other threads].
As I've indicated before, it looks like the Moravec/Kurzweil timelines
for brain-equivalent computers (> 2020) are very conservative.
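To make the multiply-and-add point concrete, here is a minimal C
sketch (the names and sizes are illustrative, not any particular
DSP's API) of a single model-neuron update, with the chip-count
arithmetic in the comments:

    /* One model-neuron update = a dot product of inputs against
     * synaptic weights plus a squashing nonlinearity -- i.e. the
     * multiply-accumulate (MAC) operation DSPs are built around. */
    #include <stdio.h>
    #include <math.h>

    #define N_SYNAPSES 1000        /* order-of-magnitude synapses per neuron */

    static double neuron_update(const double w[], const double x[], int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += w[i] * x[i];    /* multiply-accumulate */
        return tanh(sum);          /* output nonlinearity */
    }

    int main(void)
    {
        static double w[N_SYNAPSES], x[N_SYNAPSES];
        for (int i = 0; i < N_SYNAPSES; i++) { w[i] = 0.001; x[i] = 1.0; }

        /* Back-of-envelope arithmetic behind the 100-1000 chip figure:
         *   1e12 ops/s/chip * 100 chips  ~= 1e14 ops/s
         *   1e12 ops/s/chip * 1000 chips ~= 1e15 ops/s
         * which brackets the oft-cited Moravec-style estimate of
         * roughly 1e14 ops/s for human-brain "equivalence". */
        printf("one neuron output: %f\n", neuron_update(w, x, N_SYNAPSES));
        return 0;
    }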
>
> Yes, this is why for the new Intel / AMD / IBM chips, clock speed seems to
> be the only good indicator of actual performance in mainline tasks.*
...
> * = note that I'm talking about chip performance here, rather than system
> performance. For system performance, these days a fast bus, big cache, and
> lots of RAM seem to contribute more for most tasks than processor speed.
I suspect you are up against programming granularity. Given the rates
the chips are operating at, you have to be making function calls at
a phenomenal rate. In the large programs we currently have, you
are going to be flushing the caches at a rather high rate, meaning
your bus bandwidth is the real constraint (rough numbers in the
sketch below). That's why you see things like double-data-rate
memory and RAMBUS coming down the pipe.
This will not ultimately be solved until we get processor-in-memory.
It'll be interesting to see published reports on progress on this
(e.g. the Berkeley group's work). For "brain"-like computations,
fast chips with a small amount of embedded PROM seem best.
Of course this is pretty much what the "Blue Gene" architecture is.
Anyone know if deGaris has published any of the technical details
of his chip architectures?
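To put a rough number on that bus-bandwidth ceiling, here is a small C
sketch (the array size and the SDRAM figures in the comment are
illustrative, assuming circa-2000 hardware):

    /* Streaming multiply-add over a working set much larger than any
     * cache: the arithmetic is trivial, so the loop runs at whatever
     * rate the memory bus can feed it, not at the core clock rate. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N (8L * 1024 * 1024)   /* 8M doubles = 64 MB, far bigger than L2 */

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        if (!a) return 1;
        for (long i = 0; i < N; i++) a[i] = 1.0;

        /* Rough ceiling for a 100 MHz, 64-bit SDRAM bus:
         *   100e6 transfers/s * 8 bytes    ~= 800 MB/s
         *   800 MB/s / 8 bytes per double  ~= 100M MACs/s
         * regardless of how fast the core is clocked -- hence DDR,
         * RAMBUS, and ultimately processor-in-memory. */
        double sum = 0.0;
        for (long i = 0; i < N; i++)
            sum += 0.5 * a[i];

        printf("sum = %f\n", sum);
        free(a);
        return 0;
    }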
> Since
> they seem to have maxed out the number of functional units and improvements
> are trending towards providing specialized multi-media instructions, we're
> no longer getting general purpose clock-independent performance boosts from
> the new architectures.
>
> Hopefully VLIW will start to address this in the next few years.
>
They seem to be slowly adding functional units, but clock
distribution, heat dissipation, and chip pin counts are
restricting this to a fair degree. The next "architecture"
fix will most likely involve real-time profiling feeding into
things like branch prediction and speculative execution,
followed by pushing the hot instruction sequences into
actual gate hardware in real time (hardware compiling). We
probably aren't too far away from things like gate-level
JAVA interpreters or MS-Word.
I'll just add as a footnote that everything I do (except the
simulations for dismantling Jupiter & Saturn) works quite
quickly on dual 200MHz Pentium Pro machines, until I get to the
point where I'm running a dozen or more Netscape & IE windows
and MS-Excel and MS-Word and ... So I finally broke down and
upgraded one system to 256MB of 60ns ECC RAM. Now I can keep
windows open for days and things are still very fast.
Forget the processor. Buy memory. The net is still the
limiting factor *even* over an ADSL line.
R.