Re: Fwd: A question of Web entropy

From: Jacques Du Pasquier (jacques@dtext.com)
Date: Sat Dec 29 2001 - 22:21:41 MST


Google are benefactors of humankind. They should get a Nobel prize or
something.

From: "Samantha Atkins" <samantha@objectent.com>
> It could/should be many times more "adequate". I would like to
> be able to exclude/include at will mailing list archives, portal
> pages and so on. When I say I want only English that should be
> all I get, it is not necessarily so today. Quoted phrases are
> not always taken as such. Exclusions do not always exclude. I
> haven't looked into it but it would be much more "adequate" if a
> programmatic interface was public so the hit list could more
> easily be fed to something else of one's own devising.

All these things you will get.

I don't even bother with exclusions, I think up a discriminant
additional term.

> > I never suffered from "too many pages", or outdated pages. I would be
> > happy for EVERY page ever put online to stay forever.
>
> What? You have never performed a search that gave you hundreds
> of thousands of hits, most of them irrelevant?

Yes, sometimes I want to find the homepage of my friend John, and I type
"John", and I can't find it in the top results. The damn thing is
broken!

Seriously, no. I almost never have this problem. I find that the ranking
algorithm based on link structure gives very predictable results. I
decide beforehand whether Google will help, and if I think it will, it
does. (This is a thing I love about the Internet: it magically reflects
expectations. You think of a useful utility; you reflect that it is
easy to implement, and that many people would use it; you can almost
DEDUCE that it exists. It's like you can search the Internet in your own
mind.) I don't expect in all cases to arrive straight at the right
place, but Google gets me into a link structure that gets me there.

Were it not off topic (but it is!!), I'd ask you for some examples of
failed searches. I think in most cases the reason should be obvious, so
that you can actually expect the failure before doing the search. You
can send some samples to me in private if you want and we can have a
look.

A new thing I noticed is that if you make a spelling mistake it will
actually spot it and suggest another search! And if/when you use
Windows, try the Google Toolbar, it's great. Oh, and now you can also
POST to the newsgroups with Google!

> You have never
> clicked on broken links?

Of course, but it is rarely a big deal. If the page still exists, I can
almost instantly find it. Dvorak doesn't like it because he needs to
update his silly "best links in all categories" page... In the ODP for
example dead links are automatically checked and removed.

> I would also be much happier if the
> low level protocols also included date created and date last
> updated.

HTTP does include the last-modified date (the Last-Modified header) and
an expiry date (the Expires header, if people care to set it).
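
To make that concrete, here is a minimal sketch in Python of reading
those two dates out of a response's headers. It assumes the headers have
already been fetched into a plain dict with exactly these capitalized
keys; the function name page_dates and the sample values are made up for
illustration.

```python
from email.utils import parsedate_to_datetime


def page_dates(headers):
    """Return (last_modified, expires) as datetimes.

    Each element is None when the header is absent or cannot be parsed.
    `headers` is assumed to be a plain dict keyed by header name.
    """
    def parse(name):
        value = headers.get(name)
        if value is None:
            return None
        try:
            return parsedate_to_datetime(value)
        except (TypeError, ValueError):
            # Servers sometimes send malformed dates; treat them as missing.
            return None

    return parse("Last-Modified"), parse("Expires")


# Hypothetical header values, not taken from a real server.
sample = {
    "Last-Modified": "Sat, 29 Dec 2001 22:21:41 GMT",
    "Expires": "Sun, 30 Dec 2001 22:21:41 GMT",
}
last_modified, expires = page_dates(sample)
```

Note that this gives the last-modified date only; a creation date is
not covered by either of these headers.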

> Accounts go dead and any links to those pages are now dangling
> pointers. This is to some small degree an increase in
> disorder.

Yes, OK. But to me it's not the big deal that Dvorak says it is.

Also, there are solutions to this. You can start a registry/redirection
service with stable references, so that someone moving only needs to
update his entry there. If there really was demand for it, it would
exist. But there is not, because it is a minor problem and the added
complication is not worth it. (Actually such services exist, but they
are little used, for said reasons.) The DNS already offers such an
abstraction level: if you get your own domain then you can keep it when
you physically move.
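
The registry idea fits in a few lines. This is only a sketch of the
principle, in Python: the class RedirectRegistry, the name
"john-homepage" and both URLs are hypothetical, and a real service would
of course need storage, authentication and an HTTP front end.

```python
class RedirectRegistry:
    """Maps a stable name to whatever URL the page currently lives at."""

    def __init__(self):
        self._targets = {}

    def register(self, stable_name, url):
        # Registering again under the same name overwrites the old
        # target, which is all a page owner has to do after moving.
        self._targets[stable_name] = url

    def resolve(self, stable_name):
        # Returns None for an unknown name instead of raising.
        return self._targets.get(stable_name)


registry = RedirectRegistry()
registry.register("john-homepage", "http://old-host.example/~john/")
# John changes provider; links that use the stable name keep working.
registry.register("john-homepage", "http://new-host.example/john/")
current = registry.resolve("john-homepage")
```

Everyone links to the stable name, and only the one registry entry
changes when the page moves.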

> > Again there's an obvious parallel between the Internet and a
> > foreseeable physical world, set free of usual physical limits. In an
> > extropic world, with almost infinite space, energy, and resources, the
> > people, like web pages, will not feel the effects of entropy anymore
> > and will just ADD one to the other, without replacing each other.
>
> In the case of Web pages this would not be a good thing. Much
> information has an expiration date implicit to the type of
> information. Keeping it all on the open web indefinitely (vs.
> archives) would be like keeping every newspaper, book, magazine
> you had ever seen in your office.

With 2 billion newspaper pages on your desk you would be embarrassed,
that's for sure.

I never read the newspapers so I don't have this problem. I do have a
few printouts of siginst.org around however :-)

Jacques


