From lnml at tnull.de Thu Aug 3 08:54:06 2023
From: lnml at tnull.de (Elias Rohrer)
Date: Thu, 03 Aug 2023 10:54:06 +0200
Subject: [Lightning-dev] Jamming Mitigation Dry Run
In-Reply-To:
References:
Message-ID: <0C731DA9-A5BA-4874-BB07-A17E04A14B13@tnull.de>

Hi Carla + Clara,

I want to preface this by saying that I'm very familiar with how limiting the lack of available real-world datasets can be for conducting significant simulations and empirical experiments on Lightning.

However, it may be noteworthy that long-term collection of the proposed fields could potentially allow re-identifying the anonymized channel counterparties using heuristics that correlate the data with the public graph, especially when datasets from multiple (possibly neighbouring) collection points end up being combined. Subsequently, this might allow drawing further conclusions about transferred amounts, channel liquidities at particular times, and, as HTLC settlement/failure timestamps are recorded in nanosecond resolution, potentially even the payment destination's identity (cf. [1]).

As surrendering this kind of data therefore requires a good level of trust in the researchers, it might be helpful (and best practice) if you could clarify upfront whether you intend to time-box the collection period, where the data would be stored, and who would have access to it. From my point of view, clearly defining the collection period would also be mandatory, as we don't want to incentivise node operators to collect and store HTLC data longer-term, especially if it's to this degree of detail.

Best,
Elias

[1]: https://arxiv.org/pdf/2006.12143.pdf

> ### 1. Collect Anonymized Data
> We're aware that we are dealing with sensitive and private
> information.
> For this reason, we propose defining a common data format so that
> analysis tooling can be built around it, and so that node operators
> can run the analysis locally if desired. Fields marked with [P]
> *MUST* be randomized if exported to research teams.
>
> The proposed format is a CSV file with the following fields:
> * version (uint8): set to 1, included to future-proof ourselves
> against the need to change this format.
> * channel_in (uint64)[P]: the short channel ID of the incoming channel
> that forwarded the HTLC.
> * channel_out (uint64)[P]: the short channel ID of the outgoing
> channel that forwarded the HTLC.
> * peer_in (hex string)[P]: the hex encoded pubkey of the remote peer
> for the channel_in.
> * peer_out (hex string)[P]: the hex encoded pubkey of the remote peer
> for the channel_out.
> * fee_msat (uint64): the fee offered by the HTLC, expressed in msat.
> * outgoing_liquidity (float64): the portion of
> `max_htlc_value_in_flight` that is occupied on channel_out after the
> HTLC has been forwarded.
> * outgoing_slots (float64): the portion of `max_accepted_htlcs` that
> is occupied on channel_out after the HTLC has been forwarded.
> * ts_added_ns (uint64): the unix timestamp that the HTLC was added,
> expressed in nanoseconds.
> * ts_removed_ns (uint64): the unix timestamp that the HTLC was
> removed, expressed in nanoseconds.
> * htlc_settled (bool): set to 0 if the HTLC failed, and 1 if it was
> settled.
> * incoming_endorsed (int16): an integer indicating the endorsement
> status of the incoming HTLC (-1 if not present, otherwise set to the
> value in the incoming endorsement TLV).
> * outgoing_endorsed (int16): an integer indicating the endorsement
> status of the outgoing HTLC (-1 if not set, otherwise set to the
> value set in the outgoing endorsement TLV).
>
> Before we add endorsement signaling and setting via an experimental
> TLV, the last two values here will always be -1. The data is still
> incredibly useful in the meantime, and allows for easy update once the
> TLV is propagated through the network.
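
As an illustration of the [P] requirement in the quoted proposal, the randomization step might look roughly like the following. This is only a minimal sketch under stated assumptions, not the proposers' tooling: the function name `randomize_private_fields` is hypothetical, the CSV is assumed to carry a header row with the field names above, and each distinct channel or peer is assumed to map to one random value per export (the proposal does not say whether the mapping has to be consistent). Note that it leaves the timestamps untouched, so it does not address the timing correlation raised above.

```python
import csv
import io
import secrets

# Fields marked [P] in the proposed format, grouped so that the same
# channel (or peer) gets the same replacement on either side of a forward.
PRIVATE_FIELDS = {
    "channel_in": "channel",
    "channel_out": "channel",
    "peer_in": "peer",
    "peer_out": "peer",
}


def _random_value(kind):
    """Return a random replacement that keeps the column's type."""
    if kind == "channel":
        # uint64, like a short channel ID.
        return str(secrets.randbits(64))
    # 33 random bytes as hex, the same length as a compressed pubkey.
    return secrets.token_hex(33)


def randomize_private_fields(rows):
    """Return copies of rows with every [P] field replaced.

    Assumption: one consistent mapping per call, so per-channel and
    per-peer analysis stays possible on the exported data.
    """
    mapping = {"channel": {}, "peer": {}}
    randomized = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows unmodified
        for field, kind in PRIVATE_FIELDS.items():
            original = row[field]
            if original not in mapping[kind]:
                mapping[kind][original] = _random_value(kind)
            new_row[field] = mapping[kind][original]
        randomized.append(new_row)
    return randomized


if __name__ == "__main__":
    # Made-up sample rows with a subset of the proposed columns.
    sample = (
        "version,channel_in,channel_out,peer_in,peer_out,fee_msat\n"
        "1,100,200,02aa,03bb,1000\n"
        "1,200,100,03bb,02aa,250\n"
    )
    rows = list(csv.DictReader(io.StringIO(sample)))
    for row in randomize_private_fields(rows):
        print(row)
```

The mapping is held only in memory and discarded after the call, so the exported values cannot be reversed from the output alone; a node operator who wanted to link several exports would have to persist it, which is a design decision the proposal leaves open.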