Message-ID: <4E65CEE6.7030002@gmail.com>
Date: Tue, 06 Sep 2011 17:42:30 +1000
From: Steve <shadders.del@gmail.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.21) Gecko/20110831 Thunderbird/3.1.13
MIME-Version: 1.0
To: bitcoin-development@lists.sourceforge.net
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Bitcoin-development] Building a node crawler to map network
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: shadders.del@gmail.com
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Tue, 06 Sep 2011 07:42:44 -0000

Hi All,

I started messing around today with building a node crawler to try and 
map out the bitcoin network and hopefully provide some useful 
statistics.  It's very basic so far, using a mutilated bitcoinj to
connect (due to me being a Java developer and not having a clue with
C/C++).  If it's worthwhile I'll hack bitcoinj some more to run on top
of Netty to take advantage of its NIO architecture (Netty has been shown
to handle half a million concurrent connections, so it would be ideal
for the purpose).

Hoping to get a bit of input into what would be useful, as well as a
strategy for getting the max possible connections without distorting
the data.  I seem to recall Gavin talking about the need for some kind
of network health monitoring, so I assume there's a need for something
like this...

Firstly, at the moment I'm basically just storing the version message
and the results of getaddr for each node that I can connect to.  Is
there any other useful info that can be extracted from a node that's
worth collecting?

Second, and the main issue, is how to connect.  From my first very basic
probing it seems the vast majority of nodes don't accept incoming
connections, no doubt due to lack of UPnP.  So it seems the active crawl
approach is not really ideal for the purpose.  Even if it were used, the
resultant data would be hopelessly distorted, since only listening nodes
would be sampled.
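
To make the limitation of the active crawl concrete, here is a minimal
sketch of a breadth-first crawl over getaddr results.  It is not the
bitcoinj-based crawler itself: `fetch_peers` is a hypothetical callable
standing in for "connect, handshake, send getaddr", and the toy
`network` dict replaces real sockets.

```python
from collections import deque


def crawl(seeds, fetch_peers):
    """Breadth-first crawl starting from the seed addresses.

    fetch_peers(addr) is a hypothetical callable: it returns the list of
    addresses from that node's getaddr reply, or None if the node
    refused the connection or timed out.
    """
    reachable, unreachable = set(), set()
    seen = set(seeds)
    queue = deque(seen)
    while queue:
        addr = queue.popleft()
        peers = fetch_peers(addr)
        if peers is None:
            # Known only by hearsay: we cannot ask it anything.
            unreachable.add(addr)
            continue
        reachable.add(addr)
        for peer in peers:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return reachable, unreachable


# Toy network: "c" and "d" do not accept incoming connections,
# so they have no entry and the lookup returns None.
network = {
    "a": ["b", "c"],
    "b": ["a", "d"],
}
reachable, unreachable = crawl(["a"], network.get)
```

In this toy run only "a" and "b" can be queried; "c" and "d" show up
purely as addresses advertised by others, which is the distortion
described above.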

A honeypot approach would probably be better if there was some way to 
make a node 'attractive' to other nodes to connect to.  That way it 
could capture non-listening nodes as well.  If there is some way to 
influence other nodes to connect to the crawler node that solves the 
problem.  If there isn't, which I suspect is the case, then perhaps
another approach is to build an easy-to-deploy crawler node that many
volunteers could run and that could then upload collected data to a 
central repository.
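
As a rough illustration of the central repository side, the sketch
below merges reports uploaded by several volunteer crawlers, keeping
the most recent observation of each address.  The report layout
(address, last-seen time, protocol version) and all the values in the
example are made up for illustration; nothing here is an existing
upload format.

```python
def merge_reports(reports):
    """Merge per-volunteer observations into one table keyed by address.

    Each report is a list of (address, last_seen, version) tuples.
    This layout is hypothetical.  When two volunteers saw the same
    address, the observation with the newer last_seen wins.
    """
    merged = {}
    for report in reports:
        for address, last_seen, version in report:
            current = merged.get(address)
            if current is None or last_seen > current[0]:
                merged[address] = (last_seen, version)
    return merged


# Illustrative data only.
report_a = [
    ("1.2.3.4:8333", 1315290000, 32400),
    ("5.6.7.8:8333", 1315291000, 32300),
]
report_b = [
    ("1.2.3.4:8333", 1315294000, 40000),
]
merged = merge_reports([report_a, report_b])
```

Keying on the address means a node seen by many volunteers is counted
once, which matters if the goal is an estimate of network size rather
than a raw log of sightings.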

While I'm asking questions I'll add one more regarding the getaddr 
message.  It seems most nodes return about 1000 addresses in response to 
this message.  Obviously most of these nodes haven't actually talked to
all 1000 on the list, so where does this list come from?  Is it a
mixture of addresses obtained from other nodes, somehow sorted by
timestamp?
Does it include some nodes discovered by IRC/DNS? Or are those only used 
to find the first nodes to connect to?
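
One way to look into the timestamp question empirically is to decode
the addr replies and inspect the per-entry timestamps.  The sketch
below assumes the addr payload layout as I understand it for protocol
versions that carry timestamps: a variable-length count, then 30-byte
entries of 4-byte little-endian time, 8-byte services, 16-byte
IPv6-mapped address and 2-byte big-endian port.  That layout is an
assumption, not something stated in this mail, and the helper names
are hypothetical.

```python
import ipaddress
import struct


def read_varint(buf, pos):
    """Decode a variable-length integer, returning (value, new_pos)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    if first == 0xFD:
        return struct.unpack_from("<H", buf, pos + 1)[0], pos + 3
    if first == 0xFE:
        return struct.unpack_from("<I", buf, pos + 1)[0], pos + 5
    return struct.unpack_from("<Q", buf, pos + 1)[0], pos + 9


def parse_addr_payload(payload):
    """Return a list of (timestamp, ip, port) from an addr payload."""
    count, pos = read_varint(payload, 0)
    entries = []
    for _ in range(count):
        timestamp, _services = struct.unpack_from("<IQ", payload, pos)
        ip6 = ipaddress.IPv6Address(payload[pos + 12:pos + 28])
        (port,) = struct.unpack_from(">H", payload, pos + 28)
        # Show IPv4-mapped addresses in dotted-quad form.
        ip = str(ip6.ipv4_mapped or ip6)
        entries.append((timestamp, ip, port))
        pos += 30
    return entries


def make_entry(timestamp, ip4, port):
    """Build one test entry (illustrative helper, services fixed to 1)."""
    mapped = b"\x00" * 10 + b"\xff\xff" + ipaddress.IPv4Address(ip4).packed
    return struct.pack("<IQ", timestamp, 1) + mapped + struct.pack(">H", port)


payload = (
    bytes([2])
    + make_entry(1315290000, "10.0.0.1", 8333)
    + make_entry(1315294000, "10.0.0.2", 8333)
)
entries = parse_addr_payload(payload)
newest_first = sorted(entries, key=lambda e: e[0], reverse=True)
```

This does not say where a node's list comes from, but comparing the
order of `entries` with `newest_first` across many replies would show
whether nodes send addresses sorted by timestamp.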

Thanks for any input... Hopefully I can build something that's useful 
for the network...