Provide failover for scheduled agent execution on a Domino cluster

In every current release of Domino server (9.0.x/10.0.x/11.0.x), when failover occurs in a Domino cluster, a scheduled agent cannot be switched automatically to run on the failover target (secondary) server; it can only run on the original primary server.

The customer wishes HCL would provide the following new functions to support failover for scheduled agent execution in a Domino cluster:

1) Provide an extra setting to select the secondary server on which a scheduled agent can run after failover.

2) Also provide an API to select the secondary server on which a scheduled agent can run after failover.

Admin

Thomas Hampel

Jan 18, 2021

Admins do not always know why an agent needs to run, or where it is supposed to run.
Even today, developers who want high availability for their agents can implement it by defining for themselves how it is supposed to work.
For example, the agent scheduled on server #2 can periodically check whether it can open the application on server #1. Or it can call a low-level OS method to 'ping' the other server to check whether it is still alive.
However, all business logic and replication-conflict prevention must be managed by the developer.
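
The check described above could be sketched roughly as follows. This is a minimal illustration in Python rather than agent code: the function names `is_reachable` and `standby_should_run` are hypothetical, and probing TCP port 1352 (the registered Notes RPC port) is only an assumption about how "still alive" might be tested. A successful connection shows only that something is listening on that port, not that the agent manager on the other server is healthy.

```python
import socket


def is_reachable(host: str, port: int = 1352, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def standby_should_run(primary_reachable: bool) -> bool:
    """The standby copy of the agent does its work only if the primary looks down."""
    return not primary_reachable
```

The standby agent would call something like `standby_should_run(is_reachable("server1.example.com"))` at the start of each scheduled run, where the host name is a placeholder. A failed probe cannot distinguish a crashed server from a broken network path, so this sketch does nothing to prevent both servers from running the agent at once.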

Guest

Dec 24, 2020

The idea of a second server is bad - both servers may crash. Which server in the cluster should the agents run on then?
But there is a way to avoid conflict.
1. In the configuration document, change the processing server to a working server manually (administrator).
2. Change the server startup after a crash (HCL): start the server without running Amgr and RunJava, and start them automatically only after the configuration database has been replicated.

Guest

Dec 13, 2020

@ThomasHampel: this is absolutely right... but there is no way to find a solution for clustered agents that does NOT fail in a scenario where the network connection between the servers is lost. How should the failover server determine whether the other server was completely shut down via the power switch or whether only the network connection is down... all solutions, even if included in the server core, need to check whether the other server is able to run the agent... best regards, Torsten

Admin

Thomas Hampel

Dec 13, 2020

In this example, disconnecting the network cable, or a simple routing issue between the cluster members, will lead to agents running on both servers at the same time, because server2 will assume that server1 is down while server1 is still running but has no network connection to its cluster partner(s).

Guest

Dec 8, 2020

I agree that this would be very useful. I work with a configuration document and a special check for this purpose: the agent is scheduled to run on "all servers". The configuration document names the server on which the agent is currently meant to run. As soon as the agent starts, it checks whether the current server is the configured server. If it is not, it tries to reach the configured server and open the database there. If the database opens, the agent stops. If it does NOT open, the target server must be down or the database corrupt, so the agent writes its own server name into the configuration document and is the "configured server" from that moment on, until it goes down and the other agent takes over (or an admin manually changes the server back in the config document).
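
The takeover rule described in this comment can be sketched as a small decision function. This is an illustration in Python under stated assumptions, not the commenter's actual agent code: a plain dictionary stands in for the configuration document, the field name `active_server` is hypothetical, and `can_open_db_on` is a placeholder for whatever call the agent would use to try opening the database on another server.

```python
from typing import Callable, Dict


def decide(this_server: str,
           config_doc: Dict[str, str],
           can_open_db_on: Callable[[str], bool]) -> bool:
    """Return True if the agent on this_server should do its work in this run.

    config_doc stands in for the configuration document; "active_server"
    (a hypothetical field name) holds the server currently meant to run the agent.
    """
    active = config_doc["active_server"]
    if active == this_server:
        return True   # this server is the configured server
    if can_open_db_on(active):
        return False  # configured server is alive, so this agent stops
    # Configured server unreachable or database unusable: take over.
    config_doc["active_server"] = this_server
    return True


if __name__ == "__main__":
    config = {"active_server": "server1"}
    print(decide("server2", config, lambda s: True))   # False
    print(decide("server2", config, lambda s: False))  # True, config now names server2
    print(decide("server1", config, lambda s: True))   # False, server1 stands down
```

In a real cluster each server reads its own replica of the configuration document, so the takeover only becomes visible to the other server after replication. If the link between the servers is down, the check fails on both sides and each replica can end up naming its own server, which is the network-partition case raised elsewhere in this thread.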
