p2pcopy: C# console app to transfer files peer to peer

Thursday, October 06, 2016

Back in late August, I was working from home. Then someone asked me to send a VMware image to the office. The virtual machine was over 17 GB in size, so using Slack was not an option. Since I wasn't at the office, using a USB drive was not an option either.

I had some spare time so I thought: "well, writing a small program using UDT's rendezvous mode should be straightforward" (this mode basically lets you establish a true P2P connection)... By the way, I could have used something better like https://reep.io but... hey, I just wanted to do some coding :-)

File transfer... the last frontier

I only found the joke above a few days later. The people at xkcd describe how the "file transfer problem" was not yet solved.

Disclaimer: WebRTC already does this

Besides reep.io, I also found later some JavaScript WebRTC code doing exactly what I was trying to do from command line.

I tried to find a WebRTC implementation for C# and I found IceLink (http://www.frozenmountain.com/purchase#icelink) from Frozen Mountain. I haven't given it a try yet but it looks great.

Before we start: what is P2P?

Just a short clarification: by P2P I mean connecting 2 computers behind 2 different firewalls.

If the computers are on the same local network or if they are directly accessible from the internet, I won't consider it P2P in this blogpost (it can be theoretically, but in order to get them connected, traditional socket operations are enough).

Look Ma: no hands!

Well, actually it should read "look Ma, no server!".

To establish a P2P connection between two computers behind 2 different firewalls, you first need to obtain and exchange their public IPs and ports.

Then each of the computers "tries" to connect to the other IP:port and, if you are lucky, they "traverse" the NATs and get connected (you know, your firewall/NAT lets you use private IPs while having a public one).

But you need a central server to coordinate the process: a server that each client computer connects to, and that is used to obtain and exchange their IPs.

Our own Tube server could play this role (it is already acting as "relay server" for our own pseudo-P2P operation, where all traffic goes through the central server, which is something I want to avoid for true P2P).

But I wanted to write a small console application so simple that it didn't need to "trust" any intermediate server, so I didn't want to use a central server to do the initial negotiation (or signaling).

I wanted a "server-less" P2P setup...

So the goal was for p2pcopy to start, show its public IP and port (you need a server for this) and ask you for the public IP and port of the other peer. The users of p2pcopy would then exchange their public IP:port pairs by Slack or any other messaging system, and the data transfer would start... NO server needed.
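Since the peer's address arrives as text pasted from a chat, the program has to turn "ip:port" into something it can connect to. Here is a minimal sketch of that parsing step; the PeerAddress class and its TryParse method are names I made up for illustration (they are not taken from the p2pcopy source), and it assumes IPv4 addresses only.

```csharp
using System;
using System.Net;

static class PeerAddress
{
    // Hypothetical helper: parses "a.b.c.d:port" as typed by the user.
    // Returns false instead of throwing when the input is not valid.
    public static bool TryParse(string text, out IPEndPoint endPoint)
    {
        endPoint = null;

        if (string.IsNullOrWhiteSpace(text))
            return false;

        // Assumes IPv4, so there must be exactly one ':' separator.
        string[] parts = text.Trim().Split(':');
        if (parts.Length != 2)
            return false;

        IPAddress address;
        int port;
        if (!IPAddress.TryParse(parts[0], out address))
            return false;
        if (!int.TryParse(parts[1], out port))
            return false;
        if (port < IPEndPoint.MinPort || port > IPEndPoint.MaxPort)
            return false;

        endPoint = new IPEndPoint(address, port);
        return true;
    }
}
```

Returning false on bad input lets the console app simply ask again when somebody pastes half an address.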

A word on UDT

So far I've taken UDT for granted. UDT is a clever protocol written on top of UDP that outperforms TCP on high-latency / high-bandwidth networks.

What does it mean? Well, if your ping takes like 100 ms but your bandwidth is high, then UDT will run circles around TCP.

We tested this with Plastic for game studios with teams in South Asia and Northern Europe and we were able to transfer data about 3 times faster.

There is a wrapper (Windows only though) to use UDT in C#.

Coding the initial version

Coding a UDT rendezvous-mode program was very easy. It is almost nothing else than this:

    // client, socket, retry, remoteAddr and remotePort are declared
    // elsewhere in the program.
    client = new Udt.Socket(AddressFamily.InterNetwork, SocketType.Stream);

    // Rendezvous mode: both peers call Connect, nobody listens or accepts.
    client.SetSocketOption(Udt.SocketOptionName.Rendezvous, true);

    client.Bind(socket);

    Console.Write("\r{0} - Trying to connect to {1}:{2}.  ",
        retry++, remoteAddr, remotePort);

    client.Connect(remoteAddr, remotePort);

    Console.WriteLine("Connected successfully to {0}:{1}",
        remoteAddr, remotePort);

And if the two peers execute the same code, they actually get connected!

Yes, they both "connect()" instead of using a connect/accept pair, and it works. That's the magic of NAT traversal.

I tested this manually with 2 laptops, one on Codice's VPN and the other just on my LAN (Wi-Fi) and... it worked! Although I had to run the two peers concurrently for it to work... which is something I initially underestimated when I thought "I will code it in 5 minutes".

Getting the external IP:port on each node

Each node running p2pcopy has to obtain its external IP and port to be exchanged with the other peer. In order to do that I just used (and cleaned up) some code I found on CodeProject: an implementation of a STUN client in C#.

Failing to synchronize to get the peers connected

So far coding the entire thing was super-quick, just a few minutes and it was up and running.

I tested on my LAN vs VPN and the data transfer speed was not good, but I thought it could be because of the setup I was using (at the end of the day the two computers were on the same physical network, but data had to go outside on one side to use Codice's VPN).

So I decided to ask one of my colleagues at the office for help.

We both started our apps, exchanged the IPs, clicked ENTER and... nothing! It didn't work. Ouch!!

Then we started synchronizing manually using Slack so we could run the connect code simultaneously (something that is not that hard when you are hitting ENTER on two laptops on the same table). And it still didn't work.

The worst part is when your colleague starts making fun of you, you know, "yes, what great invention you got here!" :-D

It took us a while to sync correctly and see a successful data transfer.

If it was going to work like that, my p2pcopy thing was not going to be very useful after all...

Enter the Internet time

First test with a "real user" was a total failure. My "10 minutes of coding p2pcopy thing" was going to take a little bit more after all.

Initially I thought: "ok, it can't be done without a central server, that's all". But, well, I kept relaunching manually on my two laptops and it sort of worked most of the time. I was worried because of the low data transfer speed (1 to 1.5MB/s seemed to be the upper limit, with many transfers under 700KB/s).

Then I realized it was just a matter of getting synchronized. But, how do you sync two peers if there is no actual connection between them? I mean, you can't start the program from CLI with the other peer's IP and port because you don't know them beforehand so...

What if both peers checked the time and waited till a given second arrived? Then they would actually start at the same time... But what if their clocks are not in sync?

The solution is to retrieve the exact time from an internet clock. Google provided a bunch of C# implementations and I just took one from StackOverflow.

After the user entered the other peer's IP:port pair, the program retrieved the internet time and waited till a given second arrived in the minute. Something like wait for second 0, 10, 20, 30, 40, 50 before starting the simultaneous connect.
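The waiting part boils down to a bit of arithmetic on the seconds of the current time. The following is only a sketch of that idea: SyncPoint and SecondsUntilNextSlot are hypothetical names, the 10-second slot comes from the description above, and it assumes "now" was already obtained from the internet clock. Milliseconds are ignored to keep it simple.

```csharp
using System;

static class SyncPoint
{
    // Peers start connecting at seconds 0, 10, 20, 30, 40 and 50.
    const int SlotSeconds = 10;

    // Hypothetical helper: how many seconds to sleep until the next slot.
    // A peer that is exactly on a slot waits a full slot (10 s) instead of 0,
    // so the other peer still has time to press ENTER.
    public static int SecondsUntilNextSlot(DateTime now)
    {
        return SlotSeconds - (now.Second % SlotSeconds);
    }
}
```

For instance, at 18:35:52 this gives 8 seconds, so two peers whose users hit ENTER at 18:35:52 and 18:35:57 would both start connecting at 18:36:00.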

And it worked! I tested on my LAN/VPN setup, I also tested it accessing a remote machine on the VPN through remote desktop, and it worked!

Later I ran tests with two different "real users" and while it wasn't perfect, most of the time you could enter the other peer and hit enter without having to "Slack-synchronize" and it worked fine.

Improving the data transfer speed

Once I got it working I created a GitHub repo to share the code, including the full history of how I developed it, checkin by checkin (needless to say I actually develop in Plastic and I just push to GitHub :P).

But my biggest concern was data transfer speed.

I sent the famous 17GB VMware image to the office, and data transfer was always around 800KB/s, only occasionally peaking at 1MB/s.

It was slow.

I then started re-reading and studying the UDT options, but without success. No combination seemed to make data transfer faster.

So I thought I had to find a better way to send the data, and I decided to test parallel data transfer: what if I split the file in 2 (or more) parts and try to make better use of the available bandwidth by sending the fragments in parallel?
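Splitting the file is mostly a matter of computing byte ranges, so each sender thread knows where to seek and how much to read. This is a rough sketch under my own assumptions: FilePart and FileSplitter are illustrative names, not the code on the master-multi-thread branch, and the actual sending of each range is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: a contiguous byte range of the file.
struct FilePart
{
    public long Offset;
    public long Length;
}

static class FileSplitter
{
    // Divides fileSize bytes into 'parts' consecutive ranges.
    // The first (fileSize % parts) ranges get one extra byte so that
    // the ranges cover the whole file without gaps or overlaps.
    public static List<FilePart> Split(long fileSize, int parts)
    {
        if (fileSize < 0) throw new ArgumentOutOfRangeException("fileSize");
        if (parts < 1) throw new ArgumentOutOfRangeException("parts");

        var result = new List<FilePart>();
        long baseLength = fileSize / parts;
        long remainder = fileSize % parts;
        long offset = 0;

        for (int i = 0; i < parts; i++)
        {
            long length = baseLength + (i < remainder ? 1 : 0);
            result.Add(new FilePart { Offset = offset, Length = length });
            offset += length;
        }

        return result;
    }
}
```

Each range could then be sent over its own connection, with the receiver writing it at the same offset in the destination file.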

I made some tests on branch master-multi-thread but they were not conclusive. I mean, sometimes it was faster in parallel, but sometimes not.

My local Wi-Fi was broken

I then coded an option to send / receive the data using plain TCP, to test it on LAN. Nothing to do with P2P, just to measure and compare data transfer.

And then I discovered that my local Wi-Fi was only doing between 1MB/s and 3MB/s at most... Better than what I was getting from the VPN, but not that much.

Hole-punching on LAN

Then I tested the real UDT rendezvous (NAT hole punching after all) on my LAN to compare results between plain TCP and UDT:

tcp

p2pcopy.exe sender --tcp --tcpremotepeer 192.168.1.39:7070 --file C:\Users\pablo\Downloads\03183u.tif
Connected to 192.168.1.39:7070
-[###################################]  181.64 MB / 181.64 MB.     2.3 MB/s

udt

p2pcopy.exe sender --localport 4300 --file C:\Users\pablo\Downloads\03183u.tif
Using local port: 4300
Your firewall is FullCone
Tell this to your peer: 88.41.37.87:4300


Enter the ip:port of your peer: 192.168.1.39:21300
Your firewall is FullCone
[18:35:52] - Waiting 8 sec to sync with other peer
Your firewall is FullCone
0 - Trying to connect to 192.168.1.39:21300.  Connected successfully to 192.168.1.39:21300
-[#################################]  181.64 MB / 181.64 MB.    1.75 MB/s

UDT: 1.75MB/s vs TCP: 2.3MB/s (I had to open the ports for the TCP test). There was a difference, but not that much!

Please note that in UDT mode (without the --tcp flag) I'm entering the local IP address, ignoring the public one.

INTERESTING FINDING: you can use NAT hole punching to traverse each computer's firewall on LAN. This is pretty interesting because on LAN you're protected by your own firewall, but here NAT hole-punching works too. So, the implementation of a better P2P based Plastic Tube can detect if the two peers have the same public IP (they are on the same LAN) and instead of hole-punching to the public IPs (which indeed won't work), just do it with the private address.
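The detection itself could be as small as comparing public addresses. This is just a sketch of the idea, not something p2pcopy does (in the session above I typed the private address by hand): PunchTarget and Choose are hypothetical names, and it assumes the peers exchange their private endpoints in addition to the public ones.

```csharp
using System.Net;

static class PunchTarget
{
    // Hypothetical helper: picks the endpoint to hole-punch to.
    // If both peers show the same public IP they are assumed to sit
    // behind the same NAT, so the private endpoint is used instead.
    public static IPEndPoint Choose(
        IPEndPoint myPublic, IPEndPoint peerPublic, IPEndPoint peerPrivate)
    {
        bool sameLan = myPublic.Address.Equals(peerPublic.Address);
        return sameLan ? peerPrivate : peerPublic;
    }
}
```

Only the addresses are compared, not the ports, because two machines behind the same NAT will normally get different external ports.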

Here is where I realized that the data transfer was not that bad after all, since my local Wi-Fi was super slow. I ran some tests connecting by cable and I tested the network speed using iperf. I got about 20-25 Mbps on Wi-Fi, and it got better when I connected one of the laptops by cable (the other laptop doesn't have a LAN connector!).

In fact, this issue with the Wi-Fi made me work on PeerFinder as I shared here.

TCP hole punching

After seeing the perf results, including the multi-thread data transfer, I thought there could be a problem with UDT, and that probably plain TCP would do better.

So I tried to make TCP hole punching work.

The theory says you have to do as follows:

  • Each peer starts 2 threads.
  • One thread tries to "connect" to the other peer.
  • Another thread listens.
  • Both threads create sockets, but they bind to the same local port taking advantage of the "reuse address" functionality.

Later I read that the "accept part" is not even needed, and in fact in all my tests I never saw an accept working.
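The key detail in the steps above is that every socket is bound to the same local port with "reuse address" turned on. Here is a minimal sketch of how such a socket can be created with the standard System.Net.Sockets API; TcpPunchSockets and CreateBound are names I picked for illustration, and whether two sockets can really share a port this way depends on the operating system.

```csharp
using System.Net;
using System.Net.Sockets;

static class TcpPunchSockets
{
    // Hypothetical helper: creates a TCP socket bound to localPort with
    // "reuse address" enabled, so that a listening socket and a connecting
    // socket can both use the same local port.
    public static Socket CreateBound(int localPort)
    {
        var socket = new Socket(
            AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

        // Must be set before Bind, otherwise the second bind fails.
        socket.SetSocketOption(
            SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);

        socket.Bind(new IPEndPoint(IPAddress.Any, localPort));
        return socket;
    }
}
```

The listener thread would then call Listen and Accept on one of these sockets, while the connector thread keeps calling Connect to the remote ip:port on another one created with the same local port.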

I started testing on LAN, and it worked: where simple tcp failed to work unless I disabled the firewall on the laptop doing the accept (connect/accept, no hole punching, naïve code), my tcp-hole-punch code worked! Connections were immediately established and data was sent.

A sample session with --tcpholepunch goes like this:

p2pcopy\bin\Debug\p2pcopy.exe sender --tcpholepunch --localport 7070 --tcpremotepeer 192.168.3.35:7070 --file PeerFinder.exe
Running tcp hole punch
Trying to connect
Connector correctly connected
\[############################################################]      10.5 KB / 10.5 KB.

Each peer sets its local port and the remote ip:port beforehand for this test. I never implemented the exchange part because I was not using STUN for TCP (in fact, the STUN code is just for UDP).

After the initial LAN success, I tested on VPN. It didn't work. I sort of saw one of the peers connecting, but the other ALWAYS failed.

Later I tested with one of my colleagues connected from his home... it didn't work either. I tested with another one from the office and got the same result.

I might be mistaken, but TCP hole punching doesn't seem to be reliable enough outside the local network.

INTERESTING FINDING: you can use TCP hole punch to directly connect peers on the same LAN so they don't have to open ports to communicate. It can certainly be used to implement nice P2P features on LAN for Plastic.

TCP vs UDP hole punching

It seems it is more likely for UDP hole punching to work than TCP, as I was able to see in action myself. It is possible to find some good explanations mentioning SYN packets and how TCP is established and so on.

But basically, it means that on the internet, you'd better stick to UDP, which means you need to build some reliability layer on top, like UDT.

There is a very famous series of articles explaining how real-time games need to use UDP instead of TCP too, that I think is worth mentioning. The guy builds his own layer on top of UDP, but in his case, dropping packets is not an issue.

WebRTC, libjingle, libnice and pseudo-tcp

WebRTC does true P2P out of the box

Since P2P is something modern web applications take advantage of, I started reading a little bit to learn how they do it.

WebRTC is the technology they all use. It lets you implement video-conferencing and audio very easily in JavaScript on almost any modern browser.

Besides the video/audio thing, WebRTC provides something called "RTCDataChannel" for bi-directional peer-to-peer arbitrary data transfers.

I tried to find an implementation of WebRTC in C# but I didn't find any. Using WebRTC, implementing the Tube with true P2P would be straightforward, because the library takes care of the hole punching, decides if it has to go through relays or not, handles the situation where both nodes are on the same LAN, and much more.

I kept searching but I didn't find anything for C# except XSockets, a commercial product for realtime communication, but it wasn't clear to me whether they do the entire RTC thing.

Then I found it would be possible to do WebRTC by embedding a browser on a form or something and automating it, but it looked like overkill.

libjingle

More reading drove me to libjingle (again, because I think I already read about it before). It is a library developed by Google to ease developing P2P applications.

And it seems that WebRTC is totally built on top of libjingle (I mean, you can find includes like this #include "webrtc/p2p/base/pseudotcp.h" in the code).

It uses ICE, TURN, STUN, XMPP-or-something, and other protocols to create the connections in a reliable way. It seems 92% of the time they can achieve P2P directly, and only 8% goes relayed. It seems this is what they used for Google Hangouts (not sure they still use it).

By the way, I didn't find *any* C# implementation or wrapper.

Do libjingle and WebRTC use TCP hole punching?

This was my question: how do these guys achieve a TCP point-to-point connection?

I mean, I thought that was the case because I tried https://reep.io connecting from home to the office and it achieved 2MB/s, while p2pcopy was only on 1.2MB/s or so. Somehow they were faster and I wanted to know why.

I kept reading but I didn't find a good answer until, browsing the code, I found something about pseudotcp. At first I thought it was just an option, but later I found this: https://developers.google.com/talk/libjingle/file_share

PseudoTcpChannel enables sending TCP-like packets through a firewall. It is typically easier to make a UDP connection through a NAT than to make a TCP connection. Therefore, PseudoTcpChannel is provided to enable TCP-like functionality to UDP packets. Each FileShareSession object creates its own PseudoTcpChannel object when a connection is established. It creates a TransportChannel, which provides the external data connection, by calling Session::CreateChannel on the Session object passed into its constructor. It exposes a StreamInterface used by internal components to read/write data to the remote computer. In the file share application, PseudoTcpChannel acts as an intermediary between the channel and HttpServer or HttpClient to wrapping or unwrapping data with a pseudo-TCP layer. PseudoTcpChannel is created by FileShareSession when its associated Session object receives an informational XML stanza with a QN_SHARE_CHANNEL member.

Well, so, at the end of the day, they are doing UDP hole punching like I'm doing, but instead of UDT, they place their pseudotcp on top. So maybe pseudotcp is more efficient than UDT!

The code for pseudotcp, which seems to be just a single file, can be read here.

By the way, there are no wrappers or implementations of pseudotcp in C# either.

I found an interesting thread about libjingle and even a possible C# port here.

libnice

I started looking for info about pseudotcp and I found libnice. It is another P2P library and it seems they copied the pseudotcp code from Google and recently improved it. I found an article about it:

https://www.collabora.com/about-us/blog/2014/10/31/recent-improvements-in-libnice

Finally, FIN/ACK support has been added to libnice's pseudo-TCP implementation. The code was originally based on Google's libjingle pseudo-TCP, establishing a reliable connection over UDP by encapsulating TCP-like packets within UDP. This implemented the basics of TCP, but left things like the closing FIN/ACK handshake to higher-level protocols. Fine for Google, but not for our use case, so we added support for that. Furthermore, we needed to layer TLS over a pseudo-TCP connection using GTlsConnection, which required implementing half-duplex close support and fixing a few nasty leaks in GTlsConnection.

No wrapper or anything in C# for libnice either, although I read that there used to be a GTK# application that used DBus in C# to use it. Here they mention Chatter, a Telepathy GUI, and Telepathy is another P2P thing. I also found some references on a forum thread but didn't find any code.

The libnice pseudotcp code is in C instead of C++ and can be found here:

https://cgit.freedesktop.org/libnice/libnice/tree/agent/pseudotcp.c

Full source code and binaries

You can get the source code from GitHub https://github.com/psantosl/p2pcopy and there is also one binary release available in case you feel like using it :-P

Future steps: implement pseudotcp in C#

I would like to give this idea a try and see how data transfer speed improves using pseudotcp instead of UDT.

Wrapping up

I admit I love all the network programming related stuff. Not that I'm a big expert, but I certainly enjoy it.

My motivation with all this P2P thing, though, is to add real peer-to-peer connectivity to our Plastic Tube feature. Right now, as I mentioned, all traffic goes through a relay. We don't cache any data at all, but obviously it would be more effective both in cost and transfer speed if it used true P2P. Unlike p2pcopy, the Tube uses a server to negotiate the connection and to serve as "directory service" so you can easily reach remote repositories by email like this: your_repo@tube:robin@batman.com, which is very handy.

7 comments:

  1. Really interesting read - thanks!

  2. Here are a couple of webrtc stacks you can use with c#

    https://github.com/radioman/WebRtc.NET

    http://www.meshcommander.com/webrtc

  3. this thing is really cool. are you going to put any more work into it? i noticed the github hasn't been updated in 3 months. 50% success rate maxes out about 1 mb/second, crashes partway through send when sending large files. this was with the compiled binary.

    Replies
    1. Hey! Thanks for sharing!

      Yep, we got a little bit more than 1MB/s but not too much. I have to say that we don't get much more with reep.io.

      My goal is to use this code in our main product, Plastic SCM, for the P2P operations.

      In order to improve perf we should give a try to pseudo-tcp and see how it goes.

      Regarding the 50% success: uhm... yes, pushing the hole in the router sometimes needs more than 1 try. One thing you can do is to specify the port in the next data transfer after a successful one.

      The way to really improve it would be to use a central server as rendezvous point, but then part of the beauty of the idea is gone.

      Onto the crashes: not aware of any. Do you mean the connection is lost when sending big files? Because certainly implementing retries would be great.

      About the project activity: I'm open to receiving contributions! What about adding retries?? :-) Feel like doing it?

      Thanks!

      pablo

  4. Interesting read! Have you tried your code with either or both of the peers behind a symmetric NAT? It seems that it won't work, given the whole port address translation issue.

    Replies
    1. Well, since you exchange external ports, it should actually work. I use it often to exchange big files with teammates and it seems to work fine most of the time :-)
