Received: from sog-mx-2.v43.ch3.sourceforge.com ([172.29.43.192] helo=mx.sourceforge.net) by sfs-ml-3.v29.ch3.sourceforge.com with esmtp (Exim 4.76) (envelope-from ) id 1Sh3CP-0005Z6-F5 for bitcoin-development@lists.sourceforge.net; Tue, 19 Jun 2012 18:30:45 +0000 Received-SPF: pass (sog-mx-2.v43.ch3.sourceforge.com: domain of gmail.com designates 209.85.213.47 as permitted sender) client-ip=209.85.213.47; envelope-from=etotheipi@gmail.com; helo=mail-yw0-f47.google.com; Received: from mail-yw0-f47.google.com ([209.85.213.47]) by sog-mx-2.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-MD5:128) (Exim 4.76) id 1Sh3CO-0008UN-Ng for bitcoin-development@lists.sourceforge.net; Tue, 19 Jun 2012 18:30:45 +0000 Received: by yhjj56 with SMTP id j56so5258909yhj.34 for ; Tue, 19 Jun 2012 11:30:39 -0700 (PDT) Received: by 10.236.74.133 with SMTP id x5mr23653316yhd.126.1340130639286; Tue, 19 Jun 2012 11:30:39 -0700 (PDT) Received: from [192.168.1.85] (c-76-111-96-126.hsd1.md.comcast.net. [76.111.96.126]) by mx.google.com with ESMTPS id g4sm82399900yhf.12.2012.06.19.11.30.38 (version=SSLv3 cipher=OTHER); Tue, 19 Jun 2012 11:30:38 -0700 (PDT) Message-ID: <4FE0C538.3090001@gmail.com> Date: Tue, 19 Jun 2012 14:30:16 -0400 From: Alan Reiner User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Mark Friedenbach References: <4FE0B7EB.6000100@gmail.com> In-Reply-To: Content-Type: multipart/alternative; boundary="------------040508030500040107050704" X-Spam-Score: -0.8 (/) X-Spam-Report: Spam Filtering performed by mx.sourceforge.net. See http://spamassassin.org/tag/ for more details. -1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for sender-domain 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (etotheipi[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 1.0 HTML_MESSAGE BODY: HTML included in message -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.2 AWL AWL: From: address is in the auto white-list X-Headers-End: 1Sh3CO-0008UN-Ng Cc: bitcoin-development@lists.sourceforge.net Subject: Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node X-BeenThere: bitcoin-development@lists.sourceforge.net X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Jun 2012 18:30:45 -0000 This is a multi-part message in MIME format. --------------040508030500040107050704 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 06/19/2012 02:18 PM, Mark Friedenbach wrote: > On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner > wrote: > > If we were to use a raw trie structure, then we'd have all the above > issues solved: a trie has the same configuration no matter how > elements > are inserted or deleted, and accesses to elements in the tree are > constant time -- O(1). There is no such thing as an unbalanced trie. > But overall space-efficiency is an issue. > > A PATRICIA tree/trie would be ideal, in my mind, as it also has a > completely deterministic structure, and is an order-of-magnitude more > space-efficient. Insert, delete and query times are still O(1). > However, it is not a trivial implementation. I have occasionally > looked > for implementations, but not found any that were satisfactory. > > > No, a trie of any sort is dependent upon distribution of input data > for balancing. As Peter Todd points out, a malicious actor could > construct transaction or address hashes in such a way as to grow some > segment of the trie in an unbalanced fashion. It's not much of an > attack, but in principle exploitable under particular timing-sensitive > circumstances. > > Self-balancing search trees (KVL, RB, 2-3-4, whatever) don't suffer > from this problem. > > Mark I was using "unbalanced" to refer to "query time" (and also insert/delete time). If your trie nodes branch based on the next byte of your key hash, then the max depth of your trie is 32. Period. No one can do anything to ever make you do more than 32 hops to find/insert/delete your data. And if you're using a raw trie, you'll always use /exactly/ 32 hops regardless of the distribution of the underlying data. Hence, the trie structure is deterministic (history-independent) and cannot become unbalanced in terms of access time. My first concern was that a malicious actor could linearize parts of the tree and cause access requests to take much longer than log(N) time. With the trie, that's not only impossible, you're actually accessing in O(1) time. However, you are right that disk space can be affected by a malicious actor. The more branching he can induce, the more branch nodes that are created to support branches with only one leaf. --------------040508030500040107050704 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit On 06/19/2012 02:18 PM, Mark Friedenbach wrote:
On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner <etotheipi@gmail.com> wrote:

If we were to use a raw trie structure, then we'd have all the above
issues solved:  a trie has the same configuration no matter how elements
are inserted or deleted, and accesses to elements in the tree are
constant time -- O(1).  There is no such thing as an unbalanced trie.
But overall space-efficiency is an issue.

A PATRICIA tree/trie would be ideal, in my mind, as it also has a
completely deterministic structure, and is an order-of-magnitude more
space-efficient.  Insert, delete and query times are still O(1).
However, it is not a trivial implementation.  I have occasionally looked
for implementations, but not found any that were satisfactory.

No, a trie of any sort is dependent upon distribution of input data for balancing. As Peter Todd points out, a malicious actor could construct transaction or address hashes in such a way as to grow some segment of the trie in an unbalanced fashion. It's not much of an attack, but in principle exploitable under particular timing-sensitive circumstances.

Self-balancing search trees (KVL, RB, 2-3-4, whatever) don't suffer from this problem.

Mark

I was using "unbalanced" to refer to "query time" (and also insert/delete time).  If your trie nodes branch based on the next byte of your key hash, then the max depth of your trie is 32.  Period.  No one can do anything to ever make you do more than 32 hops to find/insert/delete your data.   And if you're using a raw trie, you'll always use exactly 32 hops regardless of the distribution of the underlying data.  Hence, the trie structure is deterministic (history-independent) and cannot become unbalanced in terms of access time.

My first concern was that a malicious actor could linearize parts of the tree and cause access requests to take much longer than log(N) time.  With the trie, that's not only impossible, you're actually accessing in O(1) time.

However, you are right that disk space can be affected by a malicious actor.  The more branching he can induce, the more branch nodes that are created to support branches with only one leaf. 


--------------040508030500040107050704--