From: Gregory Maxwell <gmaxwell@gmail.com>
To: Jeff Garzik
Date: Tue, 9 Oct 2012 20:03:52 -0400
Cc: Bitcoin Development <bitcoin-development@lists.sourceforge.net>
Subject: Re: [Bitcoin-development] On bitcoin testing

On Tue, Oct 9, 2012 at 7:12 PM, Jeff Garzik wrote:
> * Data-driven tests: If possible, write software-neutral, data-driven
> tests. This enables clients other than the reference one (Satoshi
> client) to be tested. Embed tests in testnet3 chain, if possible.

The mention of testnet3 here reminds me to make a point:

Confirmation bias is a common problem for software testing -- people
often over-test the success cases and under-test the failure cases.
This is certainly the case in Bitcoin: for example, testnet3 plus the
packaged tests exercise all the branches inside the interior script
evaluation engine _except_ the rejection cases.

For us, failure cases can be harder to package up (e.g. they can't be
placed in testnet), but Matt's node-simulation-based tester provides a
good example of how to create a data-driven test set that covers both
failure cases and dynamic behavior (e.g. reorgs).

Testing of failure cases is absolutely critical for testing
implementation compatibility: a difference in what gets rejected by a
widely deployed alternative node could result in an utterly
devastating network split.
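A software-neutral, data-driven test set of the kind described above might be a shared file of vectors, each flagged as must-accept or must-reject, that any client implementation can load. The sketch below is purely illustrative (a toy stack-machine evaluator and a made-up vector format, not the actual format used by Matt's tester or the reference client):

```python
import json

# Hypothetical software-neutral test vectors: any client can load the
# same file.  Note that rejection cases sit alongside acceptance cases.
VECTORS = json.loads("""
[
    {"script": "PUSH 1 PUSH 1 ADD PUSH 2 EQUAL", "valid": true},
    {"script": "PUSH 1 PUSH 1 ADD PUSH 3 EQUAL", "valid": false},
    {"script": "ADD",                            "valid": false}
]
""")

def evaluate(script):
    """Toy stack-machine evaluator standing in for a real script engine."""
    stack = []
    tokens = iter(script.split())
    try:
        for tok in tokens:
            if tok == "PUSH":
                stack.append(int(next(tokens)))
            elif tok == "ADD":
                stack.append(stack.pop() + stack.pop())
            elif tok == "EQUAL":
                stack.append(stack.pop() == stack.pop())
            else:
                return False  # unknown opcode: reject
        # Evaluation succeeds only if exactly one truthy result remains.
        return len(stack) == 1 and stack[-1] is True
    except (IndexError, StopIteration, ValueError):
        return False  # stack underflow / malformed token: reject, don't crash

for v in VECTORS:
    assert evaluate(v["script"]) == v["valid"], v["script"]
print("all vectors pass")
```

The point of the shape, rather than the toy engine, is that two independent implementations can be run against the identical vector file, so any divergence in what gets rejected shows up before it can split the network.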
Generally, every test of something which must succeed should be
matched by a test of something that must fail. Personally, I like to
test the boundary cases -- e.g. if something has an allowed range of
[0-8], I'll test -1, 0, 8, and 9 at a minimum. Though reasoning trumps
rules of thumb.

Confirmation bias is another reason why it's important to have a more
diverse collection of testers than the core developers. People who
work closely with the software have strong expectations of how it
should behave and are less likely to test crazy corner cases because
they "know" the outcome, sometimes erroneously.

To reinforce Jeff's list of different approaches: I've long found that
each mechanism of software testing has diminishing returns the more of
it you apply. So you're best off using many different approaches a
little each, rather than spending all your resources going as deep as
possible with any one approach.

There are also some kinds of testing which are synergistic: almost all
testing is enhanced enormously by combining it with valgrind, because
it substantially lowers the threshold of issue detection (e.g.
detecting bogus memory accesses which aren't _currently_ causing a
crash for you but could). If I could only test one of "with valgrind"
or "without", I'd pick "with" every time. Sadly valgrind doesn't exist
on Windows and it's rather slow. Dr. Memory
(http://code.google.com/p/drmemory/) may be an alternative on Windows,
and there is work to port ASAN to GCC, so it may be possible to
produce mingw ASAN builds before too long.

I've also found that any highly automatable testing (coded
data-driven, unit, and fuzz testing) combines well with diverse
compilation, e.g. building on as many system types and architectures
-- including production-irrelevant ones -- as possible, in the hopes
that some system-specific quirk makes a bug easier to detect.
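The boundary-case rule above (for an allowed range of [0-8], probe -1, 0, 8, and 9 at a minimum, with every must-succeed case matched by a must-fail case) can be sketched as a small table-driven check. The `in_range` validator here is a hypothetical stand-in for whatever range check is actually under test:

```python
def in_range(value, lo=0, hi=8):
    """Toy validator for an allowed range of [0-8] (inclusive)."""
    return lo <= value <= hi

# Probe just outside and exactly on each edge of the range, pairing
# every must-succeed case with a must-fail neighbour.
CASES = [(-1, False), (0, True), (8, True), (9, False)]

for value, expected in CASES:
    assert in_range(value) == expected, value
print("boundary cases pass")
```

A table like CASES is also trivially extensible by outside testers, which helps against the "I already know the outcome" bias described earlier.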