How to deal with References that point to "dead" servants

  #1  
12-19-2005, 09:08 AM
maxpower24@gmx.net

Hi friends of Corba,

I have another beginner question:

I have implemented a distributed application using CORBA (for those
who know, that's not the one with MFC...). Everything works reasonably
well, but of course a servant that has been bound in the name service
can crash for some reason.

In this case, of course, the reference to the "dead" servant still
exists in the nameservice and other clients may attempt to invoke a
request on it.

At the moment, when a client makes a request on a dead servant, it
takes a certain amount of time (let's say 40 sec) before it gets a
CORBA exception. This behaviour is not acceptable, which is why I ask:

1.) Is there a possibility that a servant is automatically unbound
when it crashes (no exception, but e.g. a power failure)?

2.) If not (which I expect), how can I decrease the time before the
CORBA exception is thrown? For instance, it would be OK if this
exception came within 100 ms or so.

3.) I think the best solution would be something like this:

    if (servant exists) {
        make the request
    } else {
        unbind the corpse ;-)
    }

I know there is a CORBA function like non_existent(), but
unfortunately this function is not part of the CORBA Minimal
Standard. Does anybody have an idea how I can check whether the
reference is still valid before making the actual request?
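
A minimal sketch of the idea in point 3, written in plain Python rather than against any CORBA API. `NameRegistry`, `Unreachable` and `invoke_or_unbind` are hypothetical names invented for this illustration. Because a server can die between a separate existence check and the request itself, the sketch simply makes the request and treats a failure as the signal to unbind. It does not shorten the wait for the failure.

```python
class Unreachable(Exception):
    """Stand-in for a failure to reach the target (hypothetical, not a CORBA type)."""


class NameRegistry:
    """In-memory stand-in for a name service (hypothetical, not the CosNaming API)."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, ref):
        self._bindings[name] = ref

    def resolve(self, name):
        return self._bindings[name]

    def unbind(self, name):
        self._bindings.pop(name, None)

    def names(self):
        return sorted(self._bindings)


def invoke_or_unbind(registry, name, request):
    """Make the request; if the target cannot be reached, unbind the stale entry."""
    ref = registry.resolve(name)
    try:
        return request(ref)
    except Unreachable:
        registry.unbind(name)  # remove the "corpse" so later clients do not find it
        raise


# Usage: references are modelled as plain callables here.
registry = NameRegistry()
registry.bind("printer", lambda: "pong")
print(invoke_or_unbind(registry, "printer", lambda ref: ref()))
```

The first client to hit a dead entry still pays the full delay; only later clients benefit from the cleanup.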

I hope somebody out there can help me or at least give me a hint.
Thanks in advance and merry Christmas!

#2
12-20-2005, 05:55 AM
GrahamJWalsh@gmail.com

Hi Max,

Assuming timeouts are not possible, the approach I mention below might
well work for you. If you have timeouts available, then on the client
side you could simply reduce the timeout to, say, 100 ms. If no
response is received within that interval, the client can catch the
exception and take the appropriate action.

If timeouts are not an option for you...

Others will have different views on this, but your problem is not new.
Essentially, you need to ensure the integrity of the objects bound in
the Name Service, i.e. you need some means of checking the status of
each object and unbinding it if need be. Ideally, you'd like to be
able to unregister your objects upon deletion, i.e. you can put code
in the dtor of your servants to unregister them. That deals with
normal operation.

However, you're more concerned with an ungraceful servant crash. In
that case no dtor code gets called and stale/bad refs are knocking
around the NS. You could avail of callbacks to achieve the desired
effect. I mean you could have a method in your servants which "calls
back" on an object every N seconds. This is effectively pinging, with
all the baggage that comes with it: network traffic etc. It's an
option for you all the same.
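
The callback idea can be sketched as a lease table: each server reports in every N seconds, and any entry that stays silent for longer than the lease is treated as dead. This is plain Python with hypothetical names (`LeaseTable`, `heartbeat`, `expire`), not part of any ORB; the injectable `clock` parameter is an assumption added so the behaviour can be checked without waiting.

```python
import time


class LeaseTable:
    """Hypothetical liveness table: servers call heartbeat(); silent entries expire."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock
        self._last_seen = {}

    def heartbeat(self, name):
        """Called by a server every N seconds for each object it hosts."""
        self._last_seen[name] = self._clock()

    def expire(self):
        """Drop and return the names whose last heartbeat is older than the lease."""
        now = self._clock()
        stale = sorted(n for n, t in self._last_seen.items() if now - t > self._lease)
        for name in stale:
            del self._last_seen[name]
        return stale

    def alive(self):
        """Names that are currently considered live."""
        return sorted(self._last_seen)


# Usage: a 10-second lease, one heartbeat recorded.
table = LeaseTable(lease_seconds=10)
table.heartbeat("printer")
print(table.alive())
```

The lease has to be longer than the heartbeat interval, so a dead entry can linger for up to one lease period before it is dropped.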

Another approach would be for you to have some sort of monitor process.
This process iterates over the entries in the name service every N
seconds and issues some sort of "ping" method on the object in
question. When the ping method receives no response or throws an
exception, the monitor can deduce the object is no longer valid and
needs to be removed/unbound from the NS. This will reduce, but not
eradicate, the possibility of your clients getting bogus references
from the Name Service.
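
One pass of such a monitor could look roughly like this. The sketch is plain Python: `bindings` is a dict standing in for the name service contents and `ping` is a hypothetical callable standing in for a cheap remote call, so none of this is a real ORB API.

```python
def sweep(bindings, ping):
    """Ping every bound reference once and unbind those that fail.

    bindings: dict mapping name -> reference (stand-in for the name service).
    ping:     callable that raises an exception if the reference is dead.
    Returns the list of names that were removed.
    """
    removed = []
    for name, ref in list(bindings.items()):  # copy: the dict is modified below
        try:
            ping(ref)
        except Exception:
            del bindings[name]
            removed.append(name)
    return removed


# Usage: references are modelled as strings, "down" marks a dead one.
def fake_ping(ref):
    if ref == "down":
        raise ConnectionError("no reply")


entries = {"printer": "up", "scanner": "down"}
print(sweep(entries, fake_ping))
```

A monitor would call `sweep` every N seconds. A server that dies just after a pass stays bound until the next one, which is why this reduces the problem rather than removing it.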

Both of these involve code. There's no CORBA silver bullet here that
addresses your problem. One of these solutions might be feasible for
you. Alternatively you could take the advice of Ciaran McHale, who
will probably blow holes in my posting, ha ha.


Aside: I wonder if the latest version of the CORBA spec addresses this
in any shape or form. It's an old problem.

Cheers

Graham

#3
12-21-2005, 03:18 AM
Ke Jin

Hi,

see inline,

Ke

maxpower24@gmx.net wrote:
> Hi friends of Corba,
>
> I have another beginner question:
>
> I have implemented a distributed application using CORBA (for those
> who know, that's not the one with MFC...). Everything works reasonably
> well, but of course a servant that has been bound in the name service
> can crash for some reason.
>
> In this case, of course, the reference to the "dead" servant still
> exists in the nameservice and other clients may attempt to invoke a
> request on it.
>
> At the moment, when a client makes a request on a dead servant, it
> takes a certain amount of time (let's say 40 sec) before it gets a
> CORBA exception. This behaviour is not acceptable, which is why I ask:
>


If it took 40 secs to report this problem, then it is not that the
"servant" was unavailable, but that the "link-level connection" was
unavailable (e.g. you unplugged the cable or the server host was
powered off).

> 1.) Is there a possibility that a servant is automatically unbound
> when it crashes (no exception, but e.g. a power failure)?
>


A servant can't crash. I believe you mean that a "server process"
crashes or the "server host" crashes (or is powered down), rather than
that a "servant" crashes.

If the "server process" crashes, you should get a CORBA
OBJECT_NOT_EXIST exception immediately, because the transport level
can report the failure when it is unable to initiate the connection to
the target server.

If the "server host" crashes (or is powered off), it is the same as a
link-layer failure (the same as unplugging the cable). How long the
error takes depends on the TCP connection timeout of your client-side
TCP stack. On Unix, it takes a few tens of seconds (40, as you said
above) to time out. On Windows, it takes a few seconds (I remember it
being only 2 seconds) before you see the exception.

Also, some ORBs may have other rebind mechanisms (for example,
VisiBroker will try to query osagent for an alternative server), which
may increase the timeout in the link-layer failure case.
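
The two failure cases can be told apart at the plain TCP level, without any ORB involved. The sketch below uses Python's standard `socket` module; the function name `probe` and its three result strings are made up for this illustration. A host that is up with nothing listening on the port refuses the connection at once, while a host that is powered off or unplugged gives no answer until the timeout expires.

```python
import socket


def probe(host, port, timeout):
    """Classify one TCP connect attempt as 'connected', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"   # host is up but nothing listens: reported immediately
    except socket.timeout:
        return "timeout"   # no answer at all, e.g. the host is powered off
    except OSError:
        return "error"     # anything else, e.g. no route to the host


# Usage: the address and port are placeholders for a real server endpoint.
print(probe("127.0.0.1", 9, timeout=0.5))
```

Passing an explicit `timeout` bounds the wait in the application instead of relying on the operating system default for connection attempts.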

> 2.) If not (which I expect), how can I decrease the time before the
> CORBA exception is thrown? For instance, it would be OK if this
> exception came within 100 ms or so.
>
>


a. Reconfigure your link layer (reconfigure the Unix kernel or the
Windows registry).

or

b. Set a request timeout. However, the side effect is that you may get
a TIMEOUT exception even when the connection, the server and the
object implementation are perfectly OK (but take longer to process the
request).
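
The side effect of option b can be reproduced with a small stand-in. This is plain Python using `concurrent.futures`, not an ORB timeout policy, and `call_with_deadline` and `slow_but_healthy` are hypothetical names. The "server" here is perfectly healthy, it just needs longer than the deadline, and the caller still sees a timeout.

```python
import concurrent.futures
import time


def call_with_deadline(func, timeout):
    """Run func in a worker thread; raise TimeoutError if no result arrives in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(func).result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise TimeoutError("no reply within %.3f s" % timeout) from None
    finally:
        pool.shutdown(wait=False)  # do not block the caller on the slow request


def slow_but_healthy():
    """Stand-in for a request that succeeds but takes a while to process."""
    time.sleep(0.3)
    return "done"


# Usage: the same call succeeds or "fails" depending only on the deadline.
print(call_with_deadline(slow_but_healthy, timeout=1.0))
try:
    call_with_deadline(slow_but_healthy, timeout=0.05)
except TimeoutError as exc:
    print("timed out although the server was fine:", exc)
```

A timeout on its own therefore says "no answer yet", not "the object is gone", so unbinding on a single timeout would be too aggressive.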

Regards,
Ke

> 3.) I think the best solution would be something like this:
>
>     if (servant exists) {
>         make the request
>     } else {
>         unbind the corpse ;-)
>     }
>
> I know there is a CORBA function like non_existent(), but
> unfortunately this function is not part of the CORBA Minimal
> Standard. Does anybody have an idea how I can check whether the
> reference is still valid before making the actual request?
>
> I hope somebody out there can help me or at least give me a hint.
> Thanks in advance and merry Christmas!


#4
12-21-2005, 08:12 PM
Michi Henning

Ke Jin wrote:

> If the "server process" crashes, you should get a CORBA
> OBJECT_NOT_EXIST exception immediately, because the transport level
> can report the failure when it is unable to initiate the connection to
> the target server.


That's not right. OBJECT_NOT_EXIST is authoritative, and can be returned
only after consultation with the server. If the server cannot be reached,
you should get COMM_FAILURE or some such.

Cheers,

Michi.

#5
12-21-2005, 10:01 PM
Ke Jin

Michi Henning wrote:
> Ke Jin wrote:
>
> > If the "server process" crashes, you should get a CORBA
> > OBJECT_NOT_EXIST exception immediately, because the transport level
> > can report the failure when it is unable to initiate the connection
> > to the target server.

>
> That's not right. OBJECT_NOT_EXIST is authoritative, and can be returned
> only after consultation with the server. If the server cannot be reached,
> you should get COMM_FAILURE or some such.
>


Whatever, the point is not the type of exception but the latency of
such an exception. As I said, by default, if the server crashed before
the request was sent, the client should get an exception
*immediately*, instead of after 40 seconds.

Regards,
Ke

> Cheers,
>
> Michi.


#6
12-22-2005, 12:36 PM
Markus Elfring

> Both of these involve code. There's no CORBA silver bullet here that
> addresses your problem. One of these solutions might be feasible for you.


What kind of bullets are the exceptions "COMM_FAILURE" and "OBJECT_NOT_EXIST"?


> Aside: I wonder if the latest version of the CORBA spec addresses this
> in any shape or form. It's an old problem.


Are there any well-known solutions from fault tolerance technology available?
- http://www.ociweb.com/cnb/CORBANewsBrief-200301.html
- http://citeseer.ist.psu.edu/368321.html
- http://www.cs.wustl.edu/~schmidt/cor...-reliable.html
- http://en.wikipedia.org/wiki/Crash-only_software

Regards,
Markus


#7
12-22-2005, 02:39 PM
Ke Jin

A questionable assumption behind some fault-tolerance technologies
that address this issue (namely the large timeout under link-layer
disconnection) is that a broken link-layer connection is not only
detectable in real time, but is also a fault that must be handled by
an upper-layer "fault tolerance technology". This is not necessarily
true for a packet-switched network (such as Ethernet). A transient
link-layer disconnection in a packet-switched network is neither
necessarily detectable in real time nor necessarily a fault for the
transport or application layer on top of it. For instance, if the
unplugged cable is plugged back in (or a temporarily off-line router
is brought back on line) before the timeout (the 40 seconds observed
by the original poster), the transport and application layers should
continue to function without noticing the transient link-layer
problem.

If an application really needs real-time-scale reporting and handling
of link-layer disconnection, it should consider using a circuit-switched
link instead of setting a real-time-scale timeout (e.g. a few hundred
milliseconds) on a packet-switched link. Setting such a timeout in a
packet-switched network would likely introduce more problems than it
solves (such as a premature and very vulnerable transport connection).

Regards,
Ke

Markus Elfring wrote:
> > Both of these involve code. There's no CORBA silver bullet here that
> > addresses your problem. One of these solutions might be feasible for you.

>
> What kind of bullets are the exceptions "COMM_FAILURE" and "OBJECT_NOT_EXIST"?
>
>
> > Aside: I wonder if the latest version of the CORBA spec addresses this
> > in any shape or form. It's an old problem.

>
> Are there any well-known solutions from fault tolerance technology available?
> - http://www.ociweb.com/cnb/CORBANewsBrief-200301.html
> - http://citeseer.ist.psu.edu/368321.html
> - http://www.cs.wustl.edu/~schmidt/cor...-reliable.html
> - http://en.wikipedia.org/wiki/Crash-only_software
>
> Regards,
> Markus


#8
12-22-2005, 06:26 PM
Michi Henning

Ke Jin wrote:
>
> If an application really needs real-time-scale reporting and handling
> of link-layer disconnection, it should consider using a circuit-switched
> link instead of setting a real-time-scale timeout (e.g. a few hundred
> milliseconds) on a packet-switched link. Setting such a timeout in a
> packet-switched network would likely introduce more problems than it
> solves (such as a premature and very vulnerable transport connection).


In part, the problem isn't just caused by the difficulty of detecting
network failure in a timely manner, but also by use of the naming
service in the first place. What we have here is a stateful server
that maintains a bunch of objects, and a stateful naming service
that maintains a bunch of IORs to these objects. So, the server
and the naming service maintain redundant state, namely the notion
of which objects exist at any given time.

Of course, the server and the naming service can fail independently,
which leaves us with the problem that their respective state can go
out of sync, and how to recover if it does.

So, this is a design problem, as much as anything else. Any CORBA
system that dynamically updates the naming service in this fashion
is vulnerable to the problem and should probably be redesigned.
Instead of putting every IOR there is into the naming service, the
naming service should contain only a few key IORs that are needed to
get off the ground (and that denote essentially singleton objects).
Then, instead of the naming service, add a lookup interface to the
actual server. Problem solved: no state can go out of sync, and nothing
ever needs cleaning up.
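
A rough sketch of that design, in plain Python with hypothetical names (`Server`, `create`, `destroy`, `lookup`, `bootstrap`). The point it illustrates is that the table used for lookups is the same table that tracks object lifetimes, so there is no second copy that could go stale.

```python
class Server:
    """Hypothetical server that owns its objects and answers lookups itself."""

    def __init__(self):
        self._objects = {}

    def create(self, key, obj):
        """Create (register) an object; it becomes visible to lookup at once."""
        self._objects[key] = obj

    def destroy(self, key):
        """Destroy an object; it disappears from lookup in the same step."""
        del self._objects[key]

    def lookup(self, key):
        """Return the object for key, or raise KeyError if it does not exist."""
        return self._objects[key]


# The only entry a naming service would still hold is the server itself,
# modelled here as a one-entry dict.
bootstrap = {"Server": Server()}

server = bootstrap["Server"]
server.create("account-42", {"balance": 10})
print(server.lookup("account-42"))
```

If the server process dies, the lookup call itself fails, so a client never receives a reference that the server would not recognise.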

IMO, overall, the naming service is a pretty bad idea. Apart from the
quite horribly botched IDL design, the service is pragmatically not
very useful. At most, I'd use it to locate a handful of key IORs that
clients need to get off the ground. For everything else, it's better
to build the functionality into the server itself, especially when
the set of IORs that clients need to look up is not stable and changes
all the time.

Cheers,

Michi.

#9
12-22-2005, 08:47 PM
Ke Jin

Michi Henning wrote:
> Ke Jin wrote:
> >
> > If an application really needs real-time-scale reporting and handling
> > of link-layer disconnection, it should consider using a circuit-switched
> > link instead of setting a real-time-scale timeout (e.g. a few hundred
> > milliseconds) on a packet-switched link. Setting such a timeout in a
> > packet-switched network would likely introduce more problems than it
> > solves (such as a premature and very vulnerable transport connection).

>
> In part, the problem isn't just caused by the difficulty of detecting
> network failure in a timely manner, but also by use of the naming
> service in the first place.


The long latency observed by the original poster is not related to the
use of the naming service. It is purely a property of packet-switched
networks. You would see exactly the same error-reporting latency if
you tried to telnet to a host that is powered off (or disconnected
from the network).

> What we have here is a stateful server
> that maintains a bunch of objects, and a stateful naming service
> that maintains a bunch of IORs to these objects. So, the server
> and the naming service maintain redundant state, namely the notion
> of which objects exist at any given time.
>
> Of course, the server and the naming service can fail independently,
> which leaves us with the problem that their respective state can go
> out of sync, and how to recover if it does.
>


I am confused. What state is to be synchronized between the naming
service and a stateful application implementation object? And why
wouldn't you need this kind of synchronization if the application
object were stateless? Do you see other, similar directory services
needing such state synchronization? For instance, does a DNS server
sync its state (namely the domain-to-IP mapping) with the state of a
host OS (such as the number of processes and their state)?

> So, this is a design problem, as much as anything else. Any CORBA
> system that dynamically updates the naming service in this fashion
> is vulnerable to the problem and should probably be redesigned.


I don't see how using the naming service could cause the large
error-reporting latency (40 secs, as questioned in the initial post),
nor how not using the naming service would avoid it. This latency is
irrelevant to whether and how the naming service is used and
irrelevant to whether the object is stateful or stateless; it is
purely a property of the packet-switched network and of the
link-failure detection algorithm/settings of the transport layer on
top of it.

> Instead of putting every IOR there is into the naming service, the
> naming service should contain only a few key IORs that are needed to
> get off the ground (and that denote essentially singleton objects).
> Then, instead of the naming service, add a lookup interface to the
> actual server. Problem solved: no state can go out of sync, and nothing
> ever needs cleaning up.
>
> IMO, overall, the naming service is a pretty bad idea.


I won't comment on whether the naming service is a good or a bad idea,
but I would like to point out that the naming service is just an OMG
abstraction of various old, long-established directory services, such
as the DCE naming service, the ONC naming service (Yellow Pages) and
even IETF DNS. They share the same behaviour under the same
circumstances: for example, you would get the same error-reporting
latency when you try to telnet to a host that is temporarily off-line.
Also, I still don't see the relevance of the naming service to the
problem under discussion, namely OBJECT_NOT_EXIST or COMM_FAILURE
being reported 40 seconds after sending a request to an object on an
off-line (or powered-off) host.

Regards,
Ke

> Apart from the
> quite horribly botched IDL design, the service is pragmatically not
> very useful. At most, I'd use it to locate a handful of key IORs that
> clients need to get off the ground. For everything else, it's better
> to build the functionality into the server itself, especially when
> the set of IORs that clients need to look up is not stable and changes
> all the time.
>
> Cheers,
>
> Michi.


#10
12-23-2005, 12:23 AM
Jonathan Biggar

Michi Henning wrote:
> IMO, overall, the naming service is a pretty bad idea. Apart from the
> quite horribly botched IDL design, the service is pragmatically not
> very useful. At most, I'd use it to locate a handful of key IORs that
> clients need to get off the ground. For everything else, it's better
> to build the functionality into the server itself, especially when
> the set of IORs that clients need to look up is not stable and changes
> all the time.


One approach that has the advantage of interoperating with clients
expecting to use the naming service is to embed an implementation of the
naming service in your server. Then you can guarantee that the object
references exported by the naming service don't get out of sync with the
actual object lifetimes.

Federate your custom naming service implementation with the generic one
and voila!
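
As a toy model of this arrangement, in plain Python rather than the CosNaming interfaces: `NamingContext` and `Server` are hypothetical classes, and `root` stands in for the generic, site-wide naming service. The server keeps its own context up to date as objects come and go, and the root holds a single stable entry pointing at that context.

```python
class NamingContext:
    """Hypothetical naming context: names map to objects or to other contexts."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, target):
        self._bindings[name] = target

    def unbind(self, name):
        del self._bindings[name]

    def resolve(self, path):
        """Resolve a path such as ['app', 'printer'], descending into contexts."""
        target = self
        for component in path:
            target = target._bindings[component]  # KeyError if not bound
        return target


class Server:
    """Server with an embedded naming context tied to its object lifetimes."""

    def __init__(self):
        self.names = NamingContext()

    def create(self, name, obj):
        self.names.bind(name, obj)

    def destroy(self, name):
        self.names.unbind(name)


# Federation: the generic root holds one stable entry, the server's context.
root = NamingContext()
server = Server()
root.bind("app", server.names)

server.create("printer", "printer-object")
print(root.resolve(["app", "printer"]))
```

Clients keep resolving names through the root as before, while every name under "app" is answered from the server's own table.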

--
Jonathan Biggar
jon@floorboard.com
jon@biggar.org

All times are GMT -5.