HCL Domino Ideas Portal

Welcome to the #dominoforever Product Ideas Forum! The place where you can submit product ideas and enhancement requests. We encourage you to participate by voting on, commenting on, and creating new ideas. All new ideas will be evaluated by HCL Product Management & Engineering teams, and the next steps will be communicated. While not all submitted ideas will be executed upon, community feedback will play a key role in influencing which ideas are implemented and when.

For more information and upcoming events around #dominoforever, please visit our Destination Domino Page

132 VOTE
Status Assessment
Workspace Domino
Created by Guest
Created on Jul 18, 2018

Add possibility to run agent on cluster server(s)

A scheduled agent has to be set to run on cluster server S1 or S2. But if it is set to S1 and S1 is not available, the agent won't run. Proposal: automatically let cluster server S2 run agents set to run on S1 if S1 is not available.

Note: after a restart, a cluster server would have to revalidate with all its cluster mates before running any agents set to run on itself.

  • Guest
    Feb 8, 2019

    How it would ideally work:

    We have several clusters. Each cluster includes 4 servers.
    Cluster-1: hub-1, hub-2, app1, app2.

    In the Domino Directory there are two dedicated scheduled-agent configuration documents, which for Cluster-1 contain:

    Fields of doc-1:
    Name: MAIN_AGENTS;
    Server: hub-1.

    Fields of doc-2:
    Name: OTHER_AGENTS;
    Server: hub-2.

    In every scheduled agent, you select not a server but one of these launch configurations, i.e. MAIN_AGENTS or OTHER_AGENTS.

    Server hub-1 goes down.
    We are convinced that it will not come back up quickly.
    In the MAIN_AGENTS document we change the server from hub-1 to app2 (the load balancer gives app1 priority for users).

    Benefits:
    1. Easy to manage - just one change.
    2. There is no need to change and re-sign design elements (agents).
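
A minimal sketch of the indirection described in this comment, assuming the configuration documents can be modelled as a simple name-to-server mapping. The agent names and the helper functions `server_for` and `reassign` are hypothetical and are not part of any Domino API; only MAIN_AGENTS, OTHER_AGENTS and the server names come from the post.

```python
# Hypothetical model of the indirection described above; none of this is a Domino API.

def server_for(agent, agent_configs, launch_configs):
    """Resolve agent -> configuration name -> server."""
    return launch_configs[agent_configs[agent]]


def reassign(launch_configs, config_name, new_server):
    """Return a copy of the configuration documents with one server changed."""
    if config_name not in launch_configs:
        raise KeyError(config_name)
    updated = dict(launch_configs)
    updated[config_name] = new_server
    return updated


# Configuration documents for Cluster-1, as in the post.
LAUNCH_CONFIGS = {"MAIN_AGENTS": "hub-1", "OTHER_AGENTS": "hub-2"}

# Made-up agents; each stores a configuration name, not a server name.
AGENT_CONFIGS = {
    "NightlyArchive": "MAIN_AGENTS",
    "SendReminders": "MAIN_AGENTS",
    "RebuildViews": "OTHER_AGENTS",
}
```

Calling `reassign(LAUNCH_CONFIGS, "MAIN_AGENTS", "app2")` moves every agent that references MAIN_AGENTS in one step, while the agents themselves stay untouched and need no re-signing.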

  • Guest
    Feb 8, 2019

    Agents could record their successful runs in special logs on the administration server. But I am in favor of manual switching.

  • Guest
    Jan 28, 2019

    I thought of this as an enhancement to existing mechanisms: if you mark an agent as "run on <cluster name>" in the agent design, the AMgr could look up the cldbdir entry for this NSF to find out the failover rules.

    The cldbdir could provide an easy to use admin interface to manage this feature similar to enabling/disabling cluster replication.  Of course there are issues to address - like timing on server startup between cldbdir replication and amgr initialization for run-on-cluster agents.
    Still far from simple, but it would let Domino clustering shine even brighter.

  • Guest
    Jan 22, 2019

    That's not what I meant. When ping (or ARP) DOES work, but NRPC does NOT -> Domino server down. If ping (ARP) does NOT work -> network issue, state of server unclear. Yes, this would prevent failover in cases where the server has a hardware fault or power failure. But these issues almost never happen. It would work well in case of a Domino crash - and that's by far the most frequent cause of server unavailability.
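
The decision rule in this comment can be sketched as a small truth table. This is only an illustration: the two booleans are assumed to come from probes that are not shown (an ICMP/ARP check and an NRPC connection attempt), and `PeerState`, `classify_peer` and `should_take_over` are hypothetical names. The `failover_enabled` flag reflects the per-agent switch requested elsewhere in this thread.

```python
from enum import Enum


class PeerState(Enum):
    UP = "up"                    # NRPC answers: Domino is running
    DOMINO_DOWN = "domino down"  # host answers ping/ARP, NRPC does not
    UNKNOWN = "unknown"          # no ping/ARP answer: network issue or host down


def classify_peer(ping_ok: bool, nrpc_ok: bool) -> PeerState:
    """Apply the rule from the comment above to two probe results."""
    if nrpc_ok:
        return PeerState.UP
    if ping_ok:
        return PeerState.DOMINO_DOWN
    return PeerState.UNKNOWN


def should_take_over(ping_ok: bool, nrpc_ok: bool, failover_enabled: bool = True) -> bool:
    """Take over only when the peer's Domino server is positively known to be down."""
    return failover_enabled and classify_peer(ping_ok, nrpc_ok) is PeerState.DOMINO_DOWN
```

Under this rule a hardware fault or power failure lands in `UNKNOWN` and does not trigger a takeover, which is the trade-off the comment accepts.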

  • Admin
    Thomas Hampel
    Jan 22, 2019

    If ping (or name resolution) doesn't work, it doesn't mean the other server is down. It also doesn't mean that the other server is not running agents or has not already executed these agents. So it's rather tricky to solve this request.

  • Guest
    Jan 22, 2019

    @Thomas Hampel
    The server could detect whether the other server is still reachable via ICMP. This of course does not give 100% certainty, but it would be sufficient from my point of view. And: network outages are very rare, Domino crashes are not (unfortunately).
    This failover should of course be configurable per agent!

  • Guest
    Jan 22, 2019

    To prevent a split-brain condition you could:

    #1 Define a master server to run the agent if there is no connection to the other hosts.

    #2 Use at least 3 nodes (servers or dedicated arbiter software), so that a quorum is possible. (See: https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/#client-quorum )
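
A minimal sketch of the two options, assuming each server can count how many cluster mates it currently reaches. The function names are hypothetical, and combining the options (quorum for three or more nodes, the master rule for a two-node cluster) is an assumption made for illustration, not something the comment specifies.

```python
def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """True if this node plus the peers it can reach form a strict majority."""
    if cluster_size < 2 or not 0 <= reachable_peers < cluster_size:
        raise ValueError("inconsistent cluster view")
    return (reachable_peers + 1) * 2 > cluster_size


def may_run_agents(reachable_peers: int, cluster_size: int, is_master: bool = False) -> bool:
    """Decide whether this node may run agents from its side of a possible partition."""
    if cluster_size >= 3:
        # Option #2: only the majority side of a partition keeps running agents.
        return has_quorum(reachable_peers, cluster_size)
    # Option #1 (two nodes): an isolated node runs agents only if it is the master.
    return reachable_peers > 0 or is_master
```

With four nodes split two against two, neither side has a strict majority, so no side runs the agents; an odd node count or a dedicated arbiter avoids that tie.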

  • Guest
    Jan 22, 2019

    @Thomas, good point. Maybe we should not make it automatic, but rather give the admin an option to fail over all agents to the cluster mate, using one command.

    So:

    • agents can be marked to run on 1 particular server, but with the possibility to activate failover, through a check-mark in the agent properties, for example. Possibly also add a second server field in the agent properties to choose one particular failover server.
    • when server A is down, the administrator can run a console command on the failover server so it takes over all agents which have the failover property set (and possibly this failover server in a second server field)

    With this, you should be able to avoid the potential network issue of 2 servers running the same agent, modifying the same documents and creating replication conflicts once the network is restored.

    It still is manual, but only 1 command for all (failover activated) agents.

    I'm not sure about the second server field in the agent properties, but I think that might be handy in cases you have clusters of 3 or more servers and you want 1 particular server to be the failover for a particular agent, but potentially another server for another agent...
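
A rough sketch of what such a console command would have to select, assuming the agent properties can be modelled as below. The `Agent` class, its field names and `agents_to_take_over` are hypothetical; treating an empty second server field as "any failover server may take over" is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    run_on: str                             # server the agent is scheduled on
    failover_enabled: bool = False          # the proposed check-mark
    failover_server: Optional[str] = None   # the proposed second server field


def agents_to_take_over(agents: List[Agent], down_server: str, this_server: str) -> List[str]:
    """Names of agents this server would take over when the admin triggers failover."""
    return [
        a.name
        for a in agents
        if a.run_on == down_server
        and a.failover_enabled
        and a.failover_server in (None, this_server)
    ]
```

Because the command is only run by an administrator who knows that server A is really down, this selection never fires on its own during a network outage.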

     

    And this doesn't necessarily need to be limited to clusters. Let's say you have servers in different regions, with some applications replicating on schedule; it might be handy to use this feature also in case one of these servers is down for a longer period...

     

    Thibaud

  • Admin
    Thomas Hampel
    Jan 22, 2019

    In case of a network hiccup, both agents would run. Or how would servers detect that it's not a server outage but a connectivity issue?

  • Guest
    Aug 29, 2018

    This would solve many problems I encounter regularly. Needs some thought about making it optional or default - maybe by introducing the possibility to run on <ClusterName> instead of run on <ServerName>?

  • Guest
    Jul 20, 2018

    This is a nice suggestion and feature that will make the cluster servers a real cluster even for agents. 

  • Guest
    Jul 18, 2018

    Yes, this definitely needs to be addressed.

19 MERGED

Failover for AGENTS

Merged
In a cluster, an agent should have additional settings to work properly, so that the admin/developer can select Active/Active - agent executed on a randomly chosen server, or Active/Passive - agent primarily runs on one server, in case of failure if agent i...
almost 6 years ago in Domino / LotusScript 0 Assessment
5 MERGED

Provide failover for schedule agent execution on Domino Cluster

Merged
In every current release of Domino server (9.0.x/10.0.x/11.0.x), when failover occurs on a Domino cluster, a scheduled agent cannot be automatically switched to run on the failover target server (secondary server); it can only be executed on the ...
almost 4 years ago in Domino / Administration 5 Assessment