Received: from sog-mx-1.v43.ch3.sourceforge.com ([172.29.43.191]
	helo=mx.sourceforge.net)
	by sfs-ml-4.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
	(envelope-from <etotheipi@gmail.com>) id 1SaqRQ-00039w-B6
	for bitcoin-development@lists.sourceforge.net;
	Sat, 02 Jun 2012 15:40:36 +0000
Received-SPF: pass (sog-mx-1.v43.ch3.sourceforge.com: domain of gmail.com
	designates 209.85.212.47 as permitted sender)
	client-ip=209.85.212.47; envelope-from=etotheipi@gmail.com;
	helo=mail-vb0-f47.google.com; 
Received: from mail-vb0-f47.google.com ([209.85.212.47])
	by sog-mx-1.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128)
	(Exim 4.76) id 1SaqRP-0003GB-Ly
	for bitcoin-development@lists.sourceforge.net;
	Sat, 02 Jun 2012 15:40:36 +0000
Received: by vbbfr13 with SMTP id fr13so2168973vbb.34
	for <bitcoin-development@lists.sourceforge.net>;
	Sat, 02 Jun 2012 08:40:30 -0700 (PDT)
Received: by 10.52.93.75 with SMTP id cs11mr5871413vdb.52.1338651630037;
	Sat, 02 Jun 2012 08:40:30 -0700 (PDT)
Received: from [192.168.1.85] (c-76-111-96-126.hsd1.md.comcast.net.
	[76.111.96.126])
	by mx.google.com with ESMTPS id o15sm8430311vdi.15.2012.06.02.08.40.28
	(version=SSLv3 cipher=OTHER); Sat, 02 Jun 2012 08:40:28 -0700 (PDT)
Message-ID: <4FCA33EB.5030706@gmail.com>
Date: Sat, 02 Jun 2012 11:40:27 -0400
From: Alan Reiner <etotheipi@gmail.com>
User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64;
	rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: Bitcoin Dev <bitcoin-development@lists.sourceforge.net>
Content-Type: multipart/alternative;
	boundary="------------010405000806020803010308"
X-Spam-Score: -0.6 (/)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
	See http://spamassassin.org/tag/ for more details.
	-1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
	sender-domain
	0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
	(etotheipi[at]gmail.com)
	-0.0 SPF_PASS               SPF: sender matches SPF record
	1.0 HTML_MESSAGE           BODY: HTML included in message
	-0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
	author's domain
	0.1 DKIM_SIGNED            Message has a DKIM or DK signature,
	not necessarily valid
	-0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
X-Headers-End: 1SaqRP-0003GB-Ly
Subject: [Bitcoin-development] Full Clients in the future - Blockchain
	management
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Sat, 02 Jun 2012 15:40:36 -0000

This is a multi-part message in MIME format.
--------------010405000806020803010308
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Devs,

I have decided to upgrade Armory's blockchain utilities, partly out of 
necessity due to a poor code decision I made before I had even decided 
I was making a client.  In an effort to avoid such mistakes again, I 
want to do it "right" this time around, and I realize that this is a 
good discussion for all the devs who will have to deal with this 
eventually...

The part I'm having difficulty with is the idea that, a few years from 
now, it may simply not be feasible to hold transaction file-/pointers/ 
in RAM, because even that would overwhelm standard RAM sizes.  Without 
any degree of blockchain compression, I see that the most general, 
scalable solution is probably a complicated one.

On the other hand, the point where this fails may be the point at which 
we have already predicted that the network will have to split into 
"super-nodes" and "lite nodes."  In that case, this discussion is still 
a good one, just directed more towards the super-nodes.  But there may 
still be a point at which super-nodes don't have enough RAM to hold 
this data...

(1)  As for how small you can get the data:  my original idea was that 
the entire blockchain is stored on disk as blkXXXX.dat files.  I store 
all transactions as 10-byte "file-references."  The 10 bytes would be:

     -- X in blkX.dat (2 bytes)
     -- Tx start byte (4 bytes)
     -- Tx size bytes (4 bytes)

The file-refs would be stored in a multimap indexed by the first 6 bytes 
of the tx-hash.  In this way, when I search the multimap, I potentially 
get a list of file-refs, and I might have to retrieve a couple of tx 
from disk before finding the right one, but it would be a good trade-off 
compared to storing all 32 bytes (that's assuming that multimap nodes 
don't have too much overhead).
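
The scheme in (1) can be sketched roughly as follows.  This is only an 
illustration under stated assumptions, not Armory's actual code:  the 
names FILE_REF, TxIndex, read_tx and double_sha256 are hypothetical, the 
little-endian field order is an assumption, and a dict of lists stands 
in for the multimap.

```python
import hashlib
import struct
from collections import defaultdict

# Assumed layout of the 10-byte file-ref: 2-byte file number (X in blkX.dat),
# 4-byte tx start offset, 4-byte tx size.  "<" means little-endian, no padding.
FILE_REF = struct.Struct("<HII")
PREFIX_LEN = 6  # index key: first 6 bytes of the 32-byte tx hash


def double_sha256(raw_tx):
    """Bitcoin tx hash: SHA-256 applied twice to the serialized tx."""
    return hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()


class TxIndex:
    """Hypothetical multimap: 6-byte hash prefix -> list of packed file-refs."""

    def __init__(self):
        self._refs = defaultdict(list)

    def add(self, tx_hash, file_num, start, size):
        ref = FILE_REF.pack(file_num, start, size)
        self._refs[tx_hash[:PREFIX_LEN]].append(ref)

    def candidates(self, tx_hash):
        """All file-refs whose key matches the prefix (may include collisions)."""
        packed = self._refs.get(tx_hash[:PREFIX_LEN], [])
        return [FILE_REF.unpack(ref) for ref in packed]

    def find(self, tx_hash, read_tx):
        """Read each candidate from disk until the full 32-byte hash matches.

        read_tx(file_num, start, size) is a caller-supplied function that
        returns the raw tx bytes from the corresponding blk file.
        """
        for file_num, start, size in self.candidates(tx_hash):
            raw = read_tx(file_num, start, size)
            if double_sha256(raw) == tx_hash:
                return raw
        return None
```

The trade-off is visible in find():  a prefix collision costs an extra 
disk read, in exchange for storing 6 key bytes per tx instead of 32.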

But even with this, suppose there are 1,000,000,000 transactions in the 
blockchain and each node is probably 48 bytes (16 bytes of data, the 
6-byte key plus the 10-byte file-ref, plus map/container overhead).  
Then you're talking about 48 GB to track all the data in RAM.  mmap() 
may help here, but I'm not sure it's the right solution.
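
To make the arithmetic concrete, and to show one way mmap() might be 
applied, here is a rough sketch.  It swaps the in-RAM multimap for a 
flat file of 16-byte records sorted by hash prefix, which is an 
assumption of this example rather than a settled design;  RECORD, 
index_bytes and find_refs are hypothetical names.

```python
import mmap
import struct

# Assumed on-disk record: 6-byte hash prefix followed by the 10-byte file-ref
# (2-byte file number, 4-byte start, 4-byte size), little-endian, no padding.
RECORD = struct.Struct("<6sHII")
PREFIX_LEN = 6


def index_bytes(num_tx, bytes_per_entry):
    """Total index size in bytes for a given per-entry cost."""
    return num_tx * bytes_per_entry


def find_refs(mm, prefix):
    """Binary-search a memory-mapped array of records sorted by prefix.

    Returns every (file_num, start, size) whose key equals prefix.
    The OS pages in only the parts of the file that the search touches.
    """
    count = len(mm) // RECORD.size
    lo, hi = 0, count
    while lo < hi:  # find the first record with key >= prefix
        mid = (lo + hi) // 2
        off = mid * RECORD.size
        if mm[off:off + PREFIX_LEN] < prefix:
            lo = mid + 1
        else:
            hi = mid
    refs = []
    while lo < count:  # collect the run of equal keys
        key, file_num, start, size = RECORD.unpack_from(mm, lo * RECORD.size)
        if key != prefix:
            break
        refs.append((file_num, start, size))
        lo += 1
    return refs


if __name__ == "__main__":
    num_tx = 1_000_000_000
    print(index_bytes(num_tx, 48) / 1e9, "GB as 48-byte map nodes")
    print(index_bytes(num_tx, RECORD.size) / 1e9, "GB as packed 16-byte records")
```

The file would be opened with something like 
mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ).  Keeping it sorted 
as new blocks arrive is the hard part, and this sketch does not address 
it.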

(2) What other ways are there, besides some kind of blockchain 
compression, to maintain a multi-terabyte blockchain, assuming that 
storing references to each tx would overwhelm available RAM?   Maybe 
that assumption isn't necessary, but I think it prepares for the worst.

Or maybe I'm too narrow in my focus.  How do other people envision this 
will be handled in the future?  I've heard so many vague notions of 
"well we could do /this/ or /that/, or it wouldn't be hard to do /that/" 
but I haven't heard any serious proposals for it.  And while I believe 
that blockchain compression will become ubiquitous in the future, not 
everyone believes that, and there will undoubtedly be users/devs who 
/want/ to maintain everything under all circumstances.

-Alan

--------------010405000806020803010308--