#dominoforever | Product Ideas Portal

 

Welcome to the #dominoforever Product Ideas Forum! This is the place where you can submit product ideas and enhancement requests. We encourage you to participate by voting on, commenting on, and creating new ideas. All new ideas will be evaluated by the HCL Product Management & Engineering teams, and the next steps will be communicated. While not all submitted ideas will be implemented, community feedback will play a key role in influencing which ideas are implemented and when.

For more information and upcoming events around #dominoforever, please visit our Destination Domino Page.

Add possibility to run agent on cluster server(s)

A scheduled agent has to be set to run on cluster server S1 or S2. But if S1 is not available, the agent will not run. Cluster server S2 should automatically run agents set to run on S1 if S1 is not available.

Note: After a restart, a cluster server has to revalidate with all its cluster mates before it runs any agents set to run on itself.

  • Guest
  • Jul 18 2018
  • Likely to implement
  • Guest commented
    8 Feb, 2019 11:58pm

    Here is how it would ideally work.

    We have several clusters. Each cluster includes 4 servers.
    Cluster-1: hub-1, hub-2, app1, app2.

    In the Domino Directory there are 2 specific scheduled-agent configuration documents. For Cluster-1 they contain:

    Fields of doc-1:
    Name: MAIN_AGENTS;
    Server: hub-1.

    Fields of doc-2:
    Name: OTHER_AGENTS;
    Server: hub-2.

    In every scheduled agent, it is not the server that is selected but a specific configuration for launching the agent, i.e. MAIN_AGENTS or OTHER_AGENTS.

    Server hub-1 is down.
    We are convinced that it will not come back up quickly.
    In the MAIN_AGENTS document we change the server from hub-1 to app2 (the load balancer gives app1 priority for users).

    Benefits:
    1. Easy to manage - just one change.
    2. There is no need to change and resign design elements (agents).
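The indirection described in this comment can be sketched as follows. This is only an illustration of the idea, not an existing Domino feature: the dictionaries stand in for the two configuration documents, and the agent names and the helpers `agents_for_server` and `repoint` are hypothetical.

```python
# Hypothetical stand-in for the two configuration documents (doc-1, doc-2).
agent_configs = {"MAIN_AGENTS": "hub-1", "OTHER_AGENTS": "hub-2"}

# Each scheduled agent names a configuration instead of a server.
# The agent names are invented for illustration.
agent_to_config = {
    "NightlyArchive": "MAIN_AGENTS",
    "SendReminders": "MAIN_AGENTS",
    "RebuildIndex": "OTHER_AGENTS",
}

def agents_for_server(server, configs, assignments):
    """Return the agents whose configuration currently points at `server`."""
    return sorted(a for a, cfg in assignments.items() if configs[cfg] == server)

def repoint(configs, config_name, new_server):
    """Return a copy of `configs` with one configuration moved to `new_server`."""
    updated = dict(configs)
    updated[config_name] = new_server
    return updated

# hub-1 is down: a single change moves every MAIN_AGENTS agent to app2.
after_failover = repoint(agent_configs, "MAIN_AGENTS", "app2")
```

The agents themselves are never touched, which is why no design element has to be changed or re-signed.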

  • Guest commented
    8 Feb, 2019 11:22pm

    Agents can record their successful runs in special logs on the administrative server. But I am in favor of manual switching.

  • Guest commented
    28 Jan, 2019 04:44pm

    I thought of this as an enhancement to existing mechanisms: if you mark an agent as run on <cluster name> in the agent design, the AMgr could look up the cldbdir entry for this NSF to find out the failover rules.

    The cldbdir could provide an easy-to-use admin interface to manage this feature, similar to enabling/disabling cluster replication. Of course there are issues to address, like timing on server startup between cldbdir replication and AMgr initialization for run-on-cluster agents.
    Still far from simple, but it would let Domino Clustering shine even brighter.
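A rough sketch of the lookup proposed here, under loud assumptions: the cldbdir does not hold failover rules today, so the `rules` dictionary, its `enabled` and `servers` keys, the database path, and the function `resolve_run_server` are all hypothetical names chosen for illustration.

```python
def resolve_run_server(db_path, rules, available_servers):
    """Pick the server that should run run-on-cluster agents for one database.

    `rules` maps a database path to a hypothetical failover rule:
    {"enabled": bool, "servers": [preferred, fallback, ...]}.
    Returns None when failover is disabled or no listed server is available.
    """
    rule = rules.get(db_path)
    if rule is None or not rule["enabled"]:
        return None
    for server in rule["servers"]:
        if server in available_servers:
            return server
    return None

# Hypothetical rule for one NSF in Cluster-1.
rules = {"apps/orders.nsf": {"enabled": True, "servers": ["hub-1", "app2"]}}
```

With `hub-1` unavailable, `resolve_run_server("apps/orders.nsf", rules, {"app2", "hub-2"})` falls through to `app2`. The startup timing issue mentioned above is not modeled.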

  • Guest commented
    22 Jan, 2019 11:33am

    That's not what I meant. When ping (or ARP) DOES work, but NRPC does NOT -> Domino server down. If ping (ARP) does NOT work -> network issue, state of server unclear. Yes, this would prevent failover in cases where the server has a hardware fault or power failure. But these issues almost never happen. It would work well in case of a Domino crash, and that's by far the most frequent cause of server unavailability.
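The decision rule in this comment can be written down as a small truth table. This is a minimal sketch: the two probe results are passed in as booleans, and how they would actually be obtained (an ICMP/ARP probe and an NRPC connection attempt) is left out. The names `PeerState`, `classify_peer`, and `should_fail_over` are hypothetical.

```python
from enum import Enum

class PeerState(Enum):
    UP = "up"                    # NRPC answers: Domino is running
    DOMINO_DOWN = "domino down"  # host answers ping, Domino does not answer
    UNKNOWN = "unknown"          # host unreachable: network issue or host down

def classify_peer(ping_ok: bool, nrpc_ok: bool) -> PeerState:
    """Classify the cluster mate from the two probe results."""
    if nrpc_ok:
        return PeerState.UP
    if ping_ok:
        return PeerState.DOMINO_DOWN
    return PeerState.UNKNOWN

def should_fail_over(ping_ok: bool, nrpc_ok: bool) -> bool:
    """Take over agents only when the peer is known to be down, not merely unreachable."""
    return classify_peer(ping_ok, nrpc_ok) is PeerState.DOMINO_DOWN
```

As the comment concedes, a hardware fault or power failure lands in `UNKNOWN`, so no failover happens in that case.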

  • Admin
    Thomas Hampel commented
    22 Jan, 2019 11:15am

    If ping (or name resolution) doesn't work, it doesn't mean the other server is down. It also doesn't mean that the other server does not run agents or has not already executed these agents. So it's rather tricky to solve this request.

  • Guest commented
    22 Jan, 2019 08:51am

    @Thomas Hampel
    The server could detect whether the other server is still reachable via ICMP. Of course, this does not give 100% certainty. But it would be sufficient from my point of view. And: network outages are very rare, Domino crashes are not (unfortunately).
    This failover should of course be configurable per agent!

  • Guest commented
    22 Jan, 2019 08:50am

    To prevent a split-brain condition you could:

    #1 Define a master server to run the agent if there is no connection to the other hosts.

    #2 Use at least 3 nodes (servers or dedicated arbiter software), so that a quorum is possible. (See: https://docs.gluster.org/en/v3/Administrator%20Guide/arbiter-volumes-and-quorum/#client-quorum )
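The two options can be combined into one check, sketched below. Treating the master from #1 as a tie-breaker for an even split is an assumption made here for illustration, not something the comment states; the function name `may_take_over` and its parameters are hypothetical.

```python
def may_take_over(reachable_nodes: int, cluster_size: int, is_master: bool = False) -> bool:
    """Decide whether this node may run agents of an unreachable cluster mate.

    `reachable_nodes` counts this node itself plus every node it can contact
    (servers or arbiters).
    """
    if 2 * reachable_nodes > cluster_size:
        # Option #2: this side of the partition holds a strict majority.
        return True
    if 2 * reachable_nodes == cluster_size:
        # Even split (for example 1 of 2): only the designated master proceeds (option #1).
        return is_master
    return False
```

In a 3-node cluster, the side that still sees 2 nodes proceeds and the isolated node does not, so at most one side runs the agent.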

  • Guest commented
    22 Jan, 2019 08:46am

    @Thomas, good point. Maybe we should not make it automatic, but rather give the admin an option to fail over all agents to the cluster mate, using one command.

    So:

    • agents can be marked to run on 1 particular server, but with the possibility to activate failover, through a check-mark in the agent properties, for example. Possibly also add a second server field in the agent properties to choose one particular failover server.
    • when server A is down, the administrator can run a console command on the failover server so it takes over all agents which have the failover property set (and possibly this failover server in a second server field)

    With this, you should be able to tackle the potential network issue where 2 servers run the same agent, modify the same documents and create replication conflicts after the network is restored.

    It still is manual, but only 1 command for all (failover activated) agents.

    I'm not sure about the second server field in the agent properties, but I think that might be handy in cases you have clusters of 3 or more servers and you want 1 particular server to be the failover for a particular agent, but potentially another server for another agent...
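The selection that such a console command would perform can be sketched as below. No such command or agent property exists in the source; the `Agent` fields `failover_enabled` and `failover_server` and the function `take_over` are hypothetical names for the check-mark and the optional second server field.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Agent:
    name: str
    run_on: str                            # the server the agent is set to run on
    failover_enabled: bool = False         # the proposed check-mark
    failover_server: Optional[str] = None  # proposed second field; None means any server

def take_over(agents, down_server, this_server):
    """Return the names of agents `this_server` should run while `down_server` is out."""
    return [
        a.name
        for a in agents
        if a.run_on == down_server
        and a.failover_enabled
        and a.failover_server in (None, this_server)
    ]

# Invented agents for illustration.
agents = [
    Agent("Archive", "A", failover_enabled=True),
    Agent("Billing", "A", failover_enabled=True, failover_server="C"),
    Agent("Cleanup", "A"),
    Agent("Digest", "B", failover_enabled=True),
]
```

Running the takeover on server B for a failed server A would pick up only `Archive`; `Billing` is reserved for server C and `Cleanup` has no failover enabled.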

     

    And this doesn't necessarily need to be limited to clusters. Let's say you have servers in different regions, with some applications replicating on schedule; it might be handy to use this feature also in case one of these servers is down for a longer period...

     

    Thibaud

  • Admin
    Thomas Hampel commented
    22 Jan, 2019 08:30am

    In case of a network hiccup both agents would run. Or how would servers detect that it's not a server outage but a connectivity issue?

  • Guest commented
    29 Aug, 2018 07:34am

    This would solve many problems I encounter regularly. It needs some thought about whether to make it optional or the default, maybe by introducing the possibility to run on <ClusterName> instead of run on <ServerName>?

  • Guest commented
    20 Jul, 2018 11:41am

    This is a nice suggestion and feature that will make the cluster servers a real cluster even for agents. 

  • Guest commented
    18 Jul, 2018 04:32pm

    Yes, this definitely needs to be addressed.