From: Olivier Langlois
Organization: Trillion01 Inc
To: slush
Date: Tue, 01 Oct 2013 00:03:39 -0400
Cc: bitcoin-development@lists.sourceforge.net
Subject: Re: [Bitcoin-development] bitcoind stops responding

On Mon, 2013-09-30 at 22:44 +0200, slush wrote:
> Hi,
>
> for several weeks I've been observing more and more frequent issues
> with bitcoind. The problem is that bitcoind stops responding to RPC
> calls, but there's no other suspicious activity in the bitcoind log,
> CPU usage is low, disk I/O is normal, etc.
>
> I observed this problem with version 0.8.2, but it is still happening
> with 0.8.5. Originally this happened just once or twice per day. Today
> my monitoring scripts restarted bitcoind more than 30 times, which
> sounds alarming. This happens on various backends, so it isn't a
> problem with one specific node. Is anybody else observing a similar
> problem?

What a coincidence. I have observed the same thing, right now with 0.8.5.

I am writing a small app. My JSON-RPC client is programmed to time out after 2 seconds, and I did see a couple of timeouts once in a while. So I wrote a simple test app that just hammers bitcoind with 3 RPC requests every 30 seconds and aborts as soon as it encounters a timeout. The 3-request burst is performed on the same kept-alive HTTP/1.1 connection; then I disconnect. If I launch the test app before leaving in the morning, I am pretty sure to have a core dump waiting for me when I come back.

I chose very simple calls: getinfo, getaccount

I added a couple of traces in the RPC handling code. (BTW, timestamps in traces would be tremendously useful for tracking down problems.) I can see that bitcoind receives my request, but there is no trace yet showing that the reply is sent.
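The reproduction harness described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual test app: `BuildRpcRequest` and `WaitForReply` are hypothetical names, the request shape assumes bitcoind's usual JSON-RPC-over-HTTP POST with Basic authentication, and the 2-second client timeout is modelled with POSIX `poll()` on the connected socket.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Hypothetical helper: builds one JSON-RPC call as an HTTP/1.1 POST that asks
// the server to keep the connection open, so several calls can share it.
// `auth_b64` is assumed to be the base64 encoding of "rpcuser:rpcpassword".
std::string BuildRpcRequest(const std::string& method, int id,
                            const std::string& auth_b64) {
    std::string body = "{\"method\":\"" + method + "\",\"params\":[],\"id\":" +
                       std::to_string(id) + "}";
    return "POST / HTTP/1.1\r\n"
           "Host: 127.0.0.1\r\n"
           "Connection: keep-alive\r\n"
           "Authorization: Basic " + auth_b64 + "\r\n"
           "Content-Type: application/json\r\n"
           "Content-Length: " + std::to_string(body.size()) + "\r\n"
           "\r\n" + body;
}

// Hypothetical helper: waits up to `timeout_ms` for `fd` to become readable,
// i.e. for the reply to start arriving. Returns false on timeout (or on a
// poll() error such as EINTR, which this sketch does not retry); the test
// client treats false as "bitcoind stopped responding".
bool WaitForReply(int fd, int timeout_ms) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    int rc = poll(&pfd, 1, timeout_ms);
    return rc > 0 && (pfd.revents & POLLIN);
}
```

In the harness, the three calls would be written back to back on the same socket, each followed by `WaitForReply(fd, 2000)`; a `false` return is the point where the test app gives up and dumps core.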
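On the timestamp point: a trace helper along these lines would make it easy to see how long a request sits between "received" and "reply sent". `FormatTimestamp` and `TraceLine` are hypothetical helpers, not part of bitcoind, and the sketch assumes a POSIX system (for `gmtime_r`) whose system clock counts from the Unix epoch.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper: formats a time point as "YYYY-MM-DD HH:MM:SS.mmm" (UTC).
std::string FormatTimestamp(std::chrono::system_clock::time_point tp) {
    using namespace std::chrono;
    auto since_epoch = tp.time_since_epoch();
    // Truncate explicitly to whole seconds, then keep the millisecond remainder.
    std::time_t secs =
        static_cast<std::time_t>(duration_cast<seconds>(since_epoch).count());
    auto ms = duration_cast<milliseconds>(since_epoch).count() % 1000;
    std::tm utc{};
    gmtime_r(&secs, &utc);  // POSIX, thread-safe variant of gmtime
    char date[32];
    std::strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", &utc);
    char out[40];
    std::snprintf(out, sizeof out, "%s.%03d", date, static_cast<int>(ms));
    return out;
}

// Hypothetical trace function: prefixes a message with its timestamp.
std::string TraceLine(const std::string& msg,
                      std::chrono::system_clock::time_point tp =
                          std::chrono::system_clock::now()) {
    return FormatTimestamp(tp) + " " + msg;
}
```

Millisecond resolution is a design choice here: with a 2-second client timeout, whole-second timestamps would be too coarse to tell where the time goes.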
I am not sure yet exactly where the problem is, but my current #1 suspect is:

    LOCK2(cs_main, pwalletMain->cs_wallet);

with some kind of lock contention with the other threads.
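One way to test the lock-contention theory is to temporarily replace the blocking two-lock acquisition with a timed one that reports which lock it is stuck on. This is a sketch under assumptions: `std::recursive_timed_mutex` stands in for bitcoind's own lock type, `TryLock2` and `DemoWalletContention` are hypothetical, and a timeout only shows that another thread held the lock for too long, not why.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <string>
#include <thread>

// Result of a diagnostic two-lock acquisition.
struct Lock2Result {
    bool acquired;           // true if both locks are now held by the caller
    std::string blocked_on;  // name of the lock that timed out, empty on success
};

// Hypothetical diagnostic stand-in for LOCK2(a, b): takes the two locks in the
// same order, but gives up after `timeout` on each one and reports which lock
// it could not get instead of blocking forever. On success the caller owns
// both locks and must unlock them (b first, then a).
Lock2Result TryLock2(std::recursive_timed_mutex& a, const std::string& name_a,
                     std::recursive_timed_mutex& b, const std::string& name_b,
                     std::chrono::milliseconds timeout) {
    if (!a.try_lock_for(timeout)) return {false, name_a};
    if (!b.try_lock_for(timeout)) {
        a.unlock();  // do not keep holding the first lock while reporting
        return {false, name_b};
    }
    return {true, ""};
}

// Simulates the suspected situation: another thread sits on the "wallet" lock,
// so an RPC-style caller that needs both locks cannot make progress.
Lock2Result DemoWalletContention() {
    std::recursive_timed_mutex cs_main, cs_wallet;  // local stand-ins only
    std::promise<void> locked, release;
    std::future<void> released = release.get_future();
    std::thread holder([&] {
        std::lock_guard<std::recursive_timed_mutex> guard(cs_wallet);
        locked.set_value();  // tell the caller the lock is now held
        released.wait();     // keep holding it until told to let go
    });
    locked.get_future().wait();  // make sure cs_wallet is really held
    Lock2Result r = TryLock2(cs_main, "cs_main", cs_wallet, "cs_wallet",
                             std::chrono::milliseconds(50));
    release.set_value();
    holder.join();
    return r;
}
```

If some other code path takes `cs_wallet` before `cs_main` (the reverse of the LOCK2 order), two threads can deadlock outright; logging the name of the lock that timed out, ideally with a timestamp, narrows down which path to inspect.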