Private Discovery in p2panda
For the last six years I’ve been following friends down the peer-to-peer (p2p) rabbit hole. It started with my introduction to secure scuttlebutt, a kind of slow burn social network built on complex technology and a few very elegant ideas.
Lately some of these friends have been developing p2panda, a library to help build a new generation of p2p applications. p2panda covers most of the stack from setting up raw connections between peers, syncing data back and forth, and handling encryption and higher level permissions in groups. You can learn more about p2panda from their recent conference talk.
I’ve been lucky enough to be able to contribute to discovery in p2panda in the form of a private topic protocol.
Discovery in p2p
A big difference in building a p2p system compared with a centralized approach is the problem of discovery. You generally need to find other peers who are online exactly when you are (peer discovery) and also the subset of those peers who are interested in the same content as you (topic discovery). Without centralization or trusted authorities peers are largely on their own to “float around the network” and find this information for themselves.
A topic in p2panda is simply a string of 32 bytes which functions as an ID for a particular set of data. These topics should be derived from something hard to guess, like a public key or a content hash. They are not intended to be human-readable topics like “football in berlin”, though they might stand in for something like that in the application layer.
A naive implementation of topic discovery is trivial: you send the other peer a list of the topics you’re interested in and they send you theirs. Then you can initiate sync protocols to catch up on the data represented by those topics. This is however very bad when you’re concerned with privacy!
Because peers are generally happy to perform the chosen discovery protocol with you, it’s possible to “scan” the entire network by triggering this over and over again. Given that peers would generally be connecting over the public internet, an interested party can correlate these topics with the IP addresses of everyone participating along with the schedule of their network availability! If you could keep up a continuous scanning over the long term it’d even be possible to see when various peers joined a topic over time leading full-scale surveillance of social groups.
Remember that metadata is as dangerous to people’s privacy as the actual contents of communications! What is the point of building a p2p system if it’s design just ends up handing the same data straight to anyone with an internet connection?
Private set intersection
Our solution to this comes in the form of private set intersection (PSI), a class of techniques for discovering what both parties have without revealing the actual values to each other. PSI techniques are used by banks and other brokers of client information to protect their data while performing transactions between institutions.
Most PSI protocols use public key cryptography to perform true multi-party computations. Each party transforms their data using a private key and trades this with the other peer’s results. Then they each apply the transformation again on the traded data. If both sides end up with the same result they know they have a match, but neither has revealed the actual values to each other. For p2panda we went with a simpler less cryptographically clever system.
It’s worth noting that this is an interactive protocol. Each run of a PSI protocol requires the participants to send data back and forth over several rounds. While this isn’t required to be done in real time a slow implementation using caching on some third peer or DHT isn’t that useful. After all even if we don’t mind waiting, the goal is to find peers who are online now who are interested in our topics. The protocol is also interactive in the sense that it must be unique for each run in order to prevent replay attacks.
In the end after talking with a few cryptography-inclined friends we landed on simply hashing the topics. This is secure because we assume the topics themselves are random enough to be unguessable. If the topics had low entry, like phone numbers, then hashing would not be secure. p2panda requires the topics to be 32 bytes long, and though some implementer could theoretically use low entry topics and pad them out, the interface and documentation all point away from that. Additionally it’s pretty convenient to use long strings as identifiers in p2p systems as most data is already identifable via cryptographically random means (derived from another key, the content hash of a specific operation).
Hashing and Salts
To make hashing secure against replay attacks we have both peers generate some randomness that is concatenated into a shared session_salt. This will be unique to this run of the protocol and never used again.
To make this protocol fair for both sides, the final salt used is actually session_salt concatenated with a single bit, say 0 for Alice and 1 for Bob, resulting in alice_salt and bob_salt. This is necessary because of the nature of network communication, Alice and Bob cannot simultaneously reveal their results to each other (at least without a third party involved). This means if they used symmetric salts for hashing the person going second could replay what they were sent.
The Actual Protocol
-
Alice generates 32 random bytes and sends it to Bob.
-
Bob generates his own random 32 bytes to send to Alice. With the two halves in hand, Bob knows
session_salt, and thealice_saltandbob_saltderived from it. Bob hashes all his topics usingbob_saltand sends them along with his 32 random bytes back to Alice. -
Alice now knows the two halves required to compute the salts. She hashes her topics using the
bob_saltand compares the results with what bob sent her. Alice can locally verify the intersection of these sets and known whats held in common. Alice now hashes all of her topics usingalice_saltand sends them to bob. She also sends along a list of peers she knows about that are interested in the common topics. -
Bob hashes his topics using
alice_saltand compares the results with what he received from Alice. Bob can now locally verify the intersection himself just as Alice did before. Bob sends alice a list of peers he knows about from the topic intersection too.
The End
The protocol only requires a few back and forth rounds and the sizes of the messages are pretty much just a 32 byte hash per topic. I hope this means it’s good enough as a default topic discovery for most applications. Users of p2panda should be able to pick defaults and end up with a relatively performant and secure application.
The work is now merged into the net-rewrite branch which should be available in the next major version release sometime in 2026.
Thanks to Aljoscha, Jan, and Ajinkya for guidance and the p2panda crew for inviting me in.
More reading
Cade Diehm - This is Fine: Optimism & Emergency in the P2P Network