Who we are

We are the developers of Plastic SCM, a full version control stack (not a Git variant). We work on the strongest branching and merging you can find, and on a core that doesn't cringe at huge binaries and repos. We also develop the GUIs, mergetools and everything else needed to give you the full version control stack.


We also code SemanticMerge and the gmaster Git client.

Troubles in .NET remoting when IP changes

Sunday, October 26, 2008 | Pablo Santos | 3 Comments

I’m going to talk about an issue I had with remoting a few days ago when handling a specific scenario with a server and a client.

You know remoting is no longer the newest and coolest technology out there, since WCF has been around for a while, but AFAIK we don't have WCF on Mono yet, which means it simply doesn't exist for me and for all you multi-platform people. That's why I'm still diving deep into the remoting implementation looking for answers.

Here’s the problem: I have a simple scenario like the one in the following picture:



A client and a server working together, nothing special. The server publishes an IRemoteCall interface through remoting with the following simple code (note that I'm marshalling an existing object with an infinite lifetime, which is not the usual way of doing things unless you're an old COM cowboy... :-P):


using System;
using System.Threading;
using System.Runtime.Remoting;

using RemotingTest;

// Server-side implementation of the shared IRemoteCall interface.
public class Remote: MarshalByRefObject, RemotingTest.IRemoteCall
{
    // Returning null gives the object an infinite lease, so the
    // published instance is never collected while the server runs.
    public override object InitializeLifetimeService()
    {
        return null;
    }

    // Simulates a slow call: roughly ten seconds of work on the server.
    public void Send(string data)
    {
        for (int i = 0; i < 10; ++i)
        {
            Thread.Sleep(1000);
            Console.WriteLine("server is waiting...");
        }
        Console.WriteLine("server is done");
    }
}

public class Server
{
    public static void Main(string[] args)
    {
        // Registers the TCP channel (port 8080) declared in remoting.conf.
        RemotingConfiguration.Configure("remoting.conf");

        // Publish an existing instance under the "remote" URI.
        Remote rem = new Remote();
        ObjRef objservice = RemotingServices.Marshal(rem, "remote");

        Console.WriteLine("Hit Enter to exit...");
        Console.ReadLine();
        RemotingServices.Disconnect(rem);
    }
}



OK, there's nothing special so far: just a few lines of code to implement the shared interface, plus the server code that starts up the service and publishes it.
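The listings reference RemotingTest.IRemoteCall but never show it. A minimal definition, inferred from how Remote implements it and how the client calls it, could look like the following; it is an assumed reconstruction, and it would presumably live in a small assembly referenced by both the server and the client:

```csharp
// Assumed reconstruction of the shared contract, inferred from usage:
// Remote implements Send(string) with no return value, and the client
// calls remoteCall.Send("hello") through a proxy for this interface.
namespace RemotingTest
{
    public interface IRemoteCall
    {
        void Send(string data);
    }
}
```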
The client is even simpler:


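using System;
using System.Runtime.Remoting;
using RemotingTest;

public class Client
{
    public static void Main(string[] args)
    {
        // Registers the client-side TCP channel declared in remoting.conf.
        RemotingConfiguration.Configure("remoting.conf");

        Client me = new Client();

        // URL of the remote object, e.g. tcp://server:port/remote
        string server = args[0];

        // First call: the channel opens a TCP connection to the server.
        me.Run(server);

        // The server's IP address is changed while the client waits here...
        Console.WriteLine("Waiting for next call");
        Console.ReadLine();

        // ...and then a second call is made.
        Console.WriteLine("Running again");
        me.Run(server);
        Console.ReadLine();
    }

    private void Run(string server)
    {
        // Get a proxy for the well-known object published by the server.
        IRemoteCall remoteCall =
            (IRemoteCall)Activator.GetObject(typeof(IRemoteCall), server);

        try
        {
            remoteCall.Send("hello");
        }
        catch (Exception ex)
        {
            Console.WriteLine("Exception: " + ex.Message);
        }
    }
}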


Simple, right?

It simply accesses the remote object (whose URL is given on the command line, in my case something like tcp://beardtongue:8080/remote), makes a call, waits for the user to hit Enter, and makes a second call.
The next listing is the remoting.conf file used on the client:

<configuration>
  <system.runtime.remoting>
    <application>
      <channels>
        <channel ref="tcp" />
      </channels>
    </application>
    <customErrors mode="Off" />
  </system.runtime.remoting>
</configuration>


And the one for the server:


<configuration>
  <system.runtime.remoting>
    <application>
      <channels>
        <channel ref="tcp" port="8080">
          <serverProviders>
            <formatter ref="binary" typeFilterLevel="Full" />
          </serverProviders>
        </channel>
      </channels>
    </application>
    <customErrors mode="Off" />
  </system.runtime.remoting>
</configuration>


I’m using TCP remoting channels in binary mode.

Needless to say, this wasn't my original scenario: it's a simple example I wrote to study in detail a problem I discovered in a real deployment. During normal operation the client and the server work smoothly, but I ran into problems when the machine running the server got a new IP address. How did that happen? I was traveling back and forth between the office and home, with both the client and the server running on the same laptop. They worked at the office, but I had to restart the client to keep working at home. What was going on? The issue seemed to be related to the IP change the laptop went through each time it connected to a different network and got its address from a different DHCP server. The scenario is better depicted in the next figure:



Let's see: I have a server up and running, and my client makes a first, successful call. The laptop is then suspended and, when it wakes up, it has a new IP address. When the client tries a new call, it fails.

To reproduce it without having to reconfigure my DHCP server or travel 20 km each time, I used the following command on my Windows XP laptop:


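>netsh interface ip set address name="Conexión de área local" static 192.168.1.253 255.255.255.0 192.168.1.1 1

("Conexión de área local" is the Spanish name of the "Local Area Connection" adapter; the remaining arguments are the static IP address, the subnet mask, the default gateway and the gateway metric.)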

Each time I wanted to simulate a network change, I switched the last octet of the address between 253 and 245.

Why doesn't it work, given that I'm using the server name instead of its IP address? Shouldn't the name simply resolve to the new address?

Then I ran the same test on the Mono runtime and... it worked!! In fact, I had also been playing with a custom TCP channel I wrote by deriving from the Mono TCP channel (adding SSL and some other tuning), and it worked too. So the problem with the .NET implementation had to be somewhere in its TCP channel, not in the remoting stack itself.
After some digging I found the following: inside the Mono TCP channel implementation there's a small class named ReusableTcpClient, which implements a property named IsAlive. The IsAlive property does this:

return !Client.Poll (0, SelectMode.SelectRead);


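which checks whether the underlying socket is still usable each time it is retrieved from the TCP channel's internal socket cache. Poll(0, SelectMode.SelectRead) returns immediately and reports true when data is pending or when the peer has closed or reset the connection; an idle cached connection should have nothing to read, so a readable socket is treated as dead. When the server's IP changes and the client tries to make the next call, the channel detects that the cached socket is no longer usable and creates a new one.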
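To see what that check reports, here is a small self-contained C# sketch. SocketLiveness and Demo are hypothetical names, not part of Mono or .NET. It simulates the simplest stale-connection case (the server side closing the connection over loopback) rather than an actual IP change, and it assumes a 500 ms pause is enough for the close to reach the client:

```csharp
using System.Net;
using System.Net.Sockets;
using System.Threading;

public static class SocketLiveness
{
    // Same test as the IsAlive property described above: an idle pooled
    // connection should have nothing to read. Poll(0, SelectRead) returns
    // true if data is pending or the peer closed/reset the connection, and
    // in either case the socket should not be reused for a new call.
    public static bool IsAlive(Socket socket)
    {
        return !socket.Poll(0, SelectMode.SelectRead);
    }

    // Hypothetical demo over loopback.
    // Returns { aliveWhileConnected, aliveAfterPeerClosed }.
    public static bool[] Demo()
    {
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        using (Socket client = new Socket(AddressFamily.InterNetwork,
                                          SocketType.Stream, ProtocolType.Tcp))
        {
            client.Connect(IPAddress.Loopback, port);
            Socket serverSide = listener.AcceptSocket();

            bool before = IsAlive(client);   // connected and idle

            serverSide.Close();              // the "server" drops the connection
            Thread.Sleep(500);               // assumed long enough for the FIN
            bool after = IsAlive(client);    // readable at end-of-stream

            listener.Stop();
            return new[] { before, after };
        }
    }
}
```

With the connection idle, Demo reports the socket as alive; once the peer has closed it, the socket becomes readable and the check reports it as dead.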

This doesn't happen on the .NET stack (I'm talking about .NET 1.1; I didn't check whether it is fixed in .NET 2.0 or 3.5): the initial socket is reused to issue the next call, which simply raises an exception after a few seconds because the server can no longer be reached. The problem could probably be fixed somewhere inside the remoting library, maybe in the SocketCache.GetSocket method, where the RemoteConnection is retrieved from a Hashtable and its underlying socket is used without ever being checked.
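A minimal sketch of the kind of fix suggested above, assuming a client-side cache keyed by host and port. CheckedConnectionCache and GetConnection are hypothetical names; this is not the .NET SocketCache code, only an illustration of validating a cached connection with the same Poll test before reusing it, and of resolving the host name again whenever a new connection is needed:

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Hypothetical sketch, not the .NET Framework's internal SocketCache:
// a client-side cache that validates a connection before reusing it.
public class CheckedConnectionCache
{
    private readonly Dictionary<string, TcpClient> connections =
        new Dictionary<string, TcpClient>();

    // Number of TCP connections opened so far (used by the demo).
    public int ConnectionsOpened { get; private set; }

    public TcpClient GetConnection(string host, int port)
    {
        string key = host + ":" + port;
        TcpClient cached;
        if (connections.TryGetValue(key, out cached))
        {
            // Same Poll test as Mono's IsAlive: an idle connection that is
            // readable was closed/reset by the peer (or has stale data).
            if (!cached.Client.Poll(0, SelectMode.SelectRead))
                return cached;

            // Stale: drop it instead of sending the next call through it.
            cached.Close();
            connections.Remove(key);
        }

        // TcpClient(host, port) resolves the host name when it connects,
        // so a new connection targets the address the name maps to now.
        TcpClient fresh = new TcpClient(host, port);
        connections[key] = fresh;
        ConnectionsOpened++;
        return fresh;
    }

    // Hypothetical demo over loopback. Returns { reusedWhileAlive,
    // reusedAfterClose, connectionsOpened }, using 1/0 for true/false.
    public static int[] Demo()
    {
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
        CheckedConnectionCache cache = new CheckedConnectionCache();

        TcpClient first = cache.GetConnection("127.0.0.1", port);
        Socket serverSide = listener.AcceptSocket();
        TcpClient second = cache.GetConnection("127.0.0.1", port);

        serverSide.Close();   // the server drops the connection
        Thread.Sleep(500);    // assumed long enough for the FIN to arrive
        TcpClient third = cache.GetConnection("127.0.0.1", port);
        listener.AcceptSocket().Close();

        int[] result = {
            ReferenceEquals(first, second) ? 1 : 0,
            ReferenceEquals(first, third) ? 1 : 0,
            cache.ConnectionsOpened
        };
        third.Close();
        listener.Stop();
        return result;
    }
}
```

A zero-timeout check is cheap, but it only narrows the window: a connection that dies between the check and the send will still fail, so a caller may still want to retry once on a fresh connection.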

Fortunately, this is not a very common scenario for server applications, but if it happens to you, try switching to a different TCP channel (for example, the one from Mono) to get it fixed.
Pablo Santos
I'm the CTO and Founder at Códice.
I've been leading Plastic SCM since 2005. My passion is helping teams work better through version control.
I had the opportunity to see teams from many different industries at work while I helped them improve their version control practices.
I really enjoy teaching (I've been a University professor for 6+ years) and sharing my experience in talks and articles.
And I love simple code. You can reach me at @psluaces.

3 comments:

  1. Interesting article! I'm currently developing a "quick" client/server solution to administer and run some nightly batch jobs, using Remoting for the communication.

    I'll run the tests tomorrow to see if the problem is still present in framework 2.0 and let you know! :P

  2. Yes, the problem is still in .NET 2.0 as I have run into it also, in a situation where the server has a dynamic address. I am planning on handling this scenario by detecting the failure to connect, and re-establishing a new connection, if necessary by doing the DNS lookup myself and connecting by IP address.

  3. We have a slightly different setup but the same issue. We're also using .NET 2.0. Our client devices have a docking station they can use for hard-wired network access but they also have a wireless module they use when lifted from the docking station. When switching from hard-wired to wireless or vice versa the NIC and IP address they use to connect to the server changes. The client device still gets its message sent to the server but the server seems incapable of figuring out where to send the return message. It seems it's still trying to send the return message to the previously used (now invalid) network card of the client and doesn't notice the message it just received didn't originate from there... Hopefully we can fix this by reinitializing the objects on the client side when the connection status of its network adapters changes.
