Add possibility to run agent on cluster server(s)

A scheduled agent has to be set to run on cluster server S1 or S2. But if S1 is not available, the agent won't run. Cluster server S2 should automatically run the agents set to run on S1 if S1 is not available.

Note: After a restart, a cluster server has to revalidate with all its cluster mates before it runs any agents set to run on itself.
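
A minimal sketch of the rule requested above, in Python. Everything here is hypothetical: `ClusterServer`, `pick_runner` and the `revalidated` flag are illustrative names, not Domino APIs, and it assumes that a cluster mate keeps running the agents until the restarted home server has revalidated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterServer:
    """Hypothetical view of one cluster member, not a Domino API."""
    name: str
    available: bool = True
    # Assumed to be False right after a restart, until the server has
    # checked in with all of its cluster mates.
    revalidated: bool = True


def pick_runner(home: ClusterServer, mates: List[ClusterServer]) -> Optional[str]:
    """Return the name of the server that should run an agent homed on `home`."""
    # The home server runs its own agents only when it is up and revalidated.
    if home.available and home.revalidated:
        return home.name
    # Otherwise the first available, revalidated cluster mate takes over.
    for mate in mates:
        if mate.available and mate.revalidated:
            return mate.name
    # Nobody is in a state to run the agent.
    return None
```

With S1 down, `pick_runner(ClusterServer("S1", available=False), [ClusterServer("S2")])` picks S2; once S1 is back and revalidated, the agent returns to S1.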


Guest

Feb 8, 2019

How it would ideally work:

We have several clusters. Each cluster includes 4 servers.
Cluster-1: hub-1, hub-2, app1, app2.

In the Domino Directory there are 2 specific scheduled-agent configuration documents, which contain the following for Cluster-1:

Fields of doc-1:
Name: MAIN_AGENTS;
Server: hub-1.

Fields of doc-2:
Name: OTHER_AGENTS;
Server: hub-2.

In all scheduled agents, it is not a server that is selected but a specific configuration for launching the agent, i.e. MAIN_AGENTS or OTHER_AGENTS.

Server hub-1 is down.
We are convinced that it will not come back up quickly.
In the document MAIN_AGENTS we change the server from hub-1 to app2 (the load balancer gives app1 priority for users).

Benefits:
1. Easy to manage: just one change.
2. There is no need to change and re-sign design elements (agents).
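
The indirection described above could be sketched like this in Python. The two dictionaries are only stand-ins for the configuration documents and the agent design; the agent names (`NightlyArchive`, `MailCleanup`, `Reports`) are made up for illustration.

```python
# Stand-in for the two configuration documents in the Domino Directory.
agent_configs = {
    "MAIN_AGENTS": "hub-1",
    "OTHER_AGENTS": "hub-2",
}

# Hypothetical agents: each one references a configuration name, not a server.
agents = {
    "NightlyArchive": "MAIN_AGENTS",
    "MailCleanup": "MAIN_AGENTS",
    "Reports": "OTHER_AGENTS",
}


def server_for(agent: str) -> str:
    """Resolve the server an agent runs on via its configuration name."""
    return agent_configs[agents[agent]]


def agents_on(server: str) -> list:
    """List, in alphabetical order, the agents that currently resolve to `server`."""
    return sorted(name for name in agents if server_for(name) == server)


# hub-1 is down: one change in one document moves every MAIN_AGENTS agent.
agent_configs["MAIN_AGENTS"] = "app2"
print(agents_on("app2"))
```

The agents themselves are never touched, which is why nothing has to be re-signed.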


Guest

Feb 8, 2019

Agents can record their successful runs in special logs on the administrative server. But I am in favor of manual switching.


Guest

Jan 28, 2019

I thought of this as an enhancement to existing mechanisms: if you mark an agent as run on <cluster name> in the agent design, AMgr could look up the cldbdir entry for this NSF to find out the failover rules.

The cldbdir could provide an easy-to-use admin interface to manage this feature, similar to enabling/disabling cluster replication. Of course there are issues to address, like the timing on server startup between cldbdir replication and AMgr initialization for run-on-cluster agents.
Still far from simple, but it would let Domino Clustering shine even brighter.


Guest

Jan 22, 2019

That's not what I meant. When ping (or ARP) DOES work, but NRPC does NOT -> Domino server down. If ping (ARP) does NOT work -> network issue, state of server unclear. Yes, this would prevent failover in cases where the server has a hardware fault or power failure. But these issues almost never happen. It would work well in case of a Domino crash, and that's by far the most frequent cause of server unavailability.
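
The decision rule above, written out as a small Python sketch. The two booleans are assumed to come from whatever ping/ARP and NRPC probes the server would use; `PeerState`, `classify_peer` and `should_fail_over` are hypothetical names, not existing Domino functions.

```python
from enum import Enum


class PeerState(Enum):
    UP = "up"                    # NRPC answers: the Domino server is running
    DOMINO_DOWN = "domino down"  # host answers ping/ARP, but NRPC does not
    UNCLEAR = "unclear"          # host not reachable: network issue or host down


def classify_peer(ping_ok: bool, nrpc_ok: bool) -> PeerState:
    """Classify the other server from two probe results."""
    if nrpc_ok:
        return PeerState.UP
    if ping_ok:
        return PeerState.DOMINO_DOWN
    return PeerState.UNCLEAR


def should_fail_over(ping_ok: bool, nrpc_ok: bool) -> bool:
    """Take over the agents only when the host is alive but Domino is not."""
    return classify_peer(ping_ok, nrpc_ok) is PeerState.DOMINO_DOWN
```

The `UNCLEAR` case deliberately does not trigger a takeover, so a hardware fault or power failure on the other server would not be covered.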


Admin

Thomas Hampel

Jan 22, 2019

If ping (or name resolution) doesn't work, it doesn't mean the other server is down. It also doesn't mean that the other server is not running agents or has not already executed these agents. So it's rather tricky to solve this request.


Guest

Jan 22, 2019

@Thomas Hampel
The server could detect whether the other server is still reachable via ICMP. This of course does not give 100% certainty, but it would be sufficient from my point of view. And: network outages are very rare, Domino crashes are not (unfortunately).
This failover should of course be configurable per agent!


Guest

Jan 22, 2019

To prevent a split-brain condition you could:

#1 Define a master server to run the agent if there is no connection to the other hosts.

#2 Use at least 3 nodes (servers or dedicated arbiter software), so that a quorum is possible. (See: https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/#client-quorum )
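
A rough Python sketch of how the two options could be combined. It assumes a strict-majority quorum and uses the master from #1 only as a tie-breaker when exactly half of the nodes are visible; that combination and the name `may_run_agent` are illustrative assumptions, not something Domino provides.

```python
def may_run_agent(visible_nodes: int, cluster_size: int, is_master: bool) -> bool:
    """Decide whether this node may run the agent during a partition.

    visible_nodes counts this node itself plus every node (server or
    arbiter) it can currently reach.
    """
    if visible_nodes * 2 > cluster_size:
        # Strict majority: at most one partition can ever satisfy this.
        return True
    if visible_nodes * 2 == cluster_size:
        # Even split (e.g. 2 nodes that cannot see each other):
        # only the designated master side runs the agent.
        return is_master
    # Minority partition: never run, to avoid split-brain.
    return False
```

In a 3-node cluster, a node that still sees one other node may run the agent, while an isolated node may not, even if it is the master.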


Guest

Jan 22, 2019
@Thomas, good point. Maybe we should not make it automatic, but rather give the admin an option to fail over all agents to the cluster mate, using one command.

So:
- agents can be marked to run on 1 particular server, but with the possibility to activate failover, through a check-mark in the agent properties, for example. Possibly also add a second server field in the agent properties to choose one particular failover server.
- when server A is down, the administrator can run a console command on the failover server so that it takes over all agents which have the failover property set (and possibly this failover server in the second server field)
With this, you should be able to tackle the potential network issue of 2 servers running the same agent, modifying the same documents and creating replication conflicts after the network is restored.

It still is manual, but only 1 command for all (failover activated) agents.

I'm not sure about the second server field in the agent properties, but I think it might be handy in cases where you have clusters of 3 or more servers and you want 1 particular server to be the failover for a particular agent, but potentially another server for another agent...

And this doesn't necessarily need to be limited to clusters. Let's say you have servers in different regions, with some applications replicating on schedule; it might be handy to use this feature also in case one of these servers is down for a longer period...
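
To make the manual takeover concrete, here is a minimal Python sketch of what such a console command could select. The `Agent` fields (`failover_enabled`, `failover_server`) stand for the proposed check-mark and second server field; they are hypothetical and do not exist in the agent properties today.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """Hypothetical agent properties, including the two proposed fields."""
    name: str
    run_on: str                            # the one server the agent is set to run on
    failover_enabled: bool = False         # proposed check-mark
    failover_server: Optional[str] = None  # proposed second server field; None = any server


def agents_to_take_over(agents: List[Agent], down_server: str, this_server: str) -> List[str]:
    """Names of the agents `this_server` takes over when the admin issues the command."""
    return [
        agent.name
        for agent in agents
        if agent.run_on == down_server
        and agent.failover_enabled
        and agent.failover_server in (None, this_server)
    ]
```

Agents without the check-mark, or with a different server in the second field, stay where they are, so the admin decides per agent what a single command moves.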

Thibaud

Admin

Thomas Hampel

Jan 22, 2019

In case of a network hiccup, both agents would run. Or how would the servers detect that it's not a server outage but a connectivity issue?


Guest

Aug 29, 2018

This would solve many problems I encounter regularly. It needs some thought about making it optional or the default, maybe by introducing the possibility to run on <ClusterName> instead of run on <ServerName>?


Guest

Jul 20, 2018

This is a nice suggestion and feature that will make the cluster servers a real cluster even for agents.


Guest

Jul 18, 2018

Yes, this definitely needs to be addressed.


19 MERGED

Failover for AGENTS

In a cluster, an agent should have additional settings to work properly. The admin/developer could then select Active/Active: agent randomly executed on one of the servers; Active/Passive: agent primarily runs on one server, in case of failure if agent i...

over 6 years ago in Domino / LotusScript 0 Assessment

5 MERGED

Provide failover for schedule agent execution on Domino Cluster

In every current release of Domino server (9.0.x/10.0.x/11.0.x), when failover occurs on a Domino cluster, a scheduled agent cannot be automatically switched to run on the failover target server (secondary server); it can only be executed on the ...

over 4 years ago in Domino / Administration 5 Assessment
