Always on solution for Server Automation


Brief Description
Always on solution for Server Automation
Benefits / Value
A large, shared automation environment must be always on, both during normal operations and during planned upgrades.
In normal operations, SA has a good shared-nothing architecture that allows each Core to operate independently. Currently, however, an in-place upgrade to a newer software version causes significant downtime.
This is a major blocker to using SA for large automation projects that must be continuously available.
Once a company has achieved a high level of automation in all areas (Incident, Change, Request Fulfillment), it can no longer bridge a longer period in which the automation platform is unavailable, if only because there are no longer any operations staff who could do the work manually.
This implies that the automation platform must be continuously available.
Design details
Currently, Cores cannot be upgraded without impacting running Jobs. First, SA needs a mechanism to dry out (drain) a Core. While draining, the Core stops taking new Jobs. Existing Jobs are then either rerouted to another Core, or SA gives a clear indication that an active Job is still using a route to this Core (simply rerouting the Satellites would break such Jobs).
A graceful routing switch in the Satellites is therefore probably needed.
A function built into the product for putting a Core into maintenance would also be needed (currently this can only be done manually, by taking the Core out of balancing on the load balancers).
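The drain behavior described above can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions, not SA's actual design. `CoreState`, `Core`, `SatelliteRouter` and their methods are hypothetical names. Jobs are identified by string IDs, and new Jobs go to the least-loaded active Core. The key idea is that a Satellite pins each running Job to the Core it started on, so draining a Core never reroutes an in-flight Job. The Core only reaches maintenance once its last Job has finished.

```python
from dataclasses import dataclass, field
from enum import Enum


class CoreState(Enum):
    ACTIVE = "active"            # accepts new Jobs
    DRAINING = "draining"        # finishes running Jobs, accepts no new ones
    MAINTENANCE = "maintenance"  # no Jobs left; safe to upgrade


@dataclass
class Core:
    # Hypothetical model of an SA Core; not an SA API.
    name: str
    state: CoreState = CoreState.ACTIVE
    active_jobs: set = field(default_factory=set)

    def drain(self) -> None:
        """Stop taking new Jobs; enter maintenance once the last Job ends."""
        self.state = CoreState.DRAINING
        self._maybe_enter_maintenance()

    def finish_job(self, job_id: str) -> None:
        self.active_jobs.discard(job_id)
        self._maybe_enter_maintenance()

    def _maybe_enter_maintenance(self) -> None:
        if self.state is CoreState.DRAINING and not self.active_jobs:
            self.state = CoreState.MAINTENANCE


class SatelliteRouter:
    """Hypothetical Satellite-side routing table with a graceful switch."""

    def __init__(self, cores):
        self.cores = {c.name: c for c in cores}
        self.job_routes = {}  # job_id -> Core name, pinned for the Job's lifetime

    def route(self, job_id: str) -> str:
        # A running Job keeps its original route, even if that Core is draining.
        if job_id in self.job_routes:
            return self.job_routes[job_id]
        # Assumption: new Jobs go only to ACTIVE Cores, least loaded first.
        candidates = [c for c in self.cores.values() if c.state is CoreState.ACTIVE]
        if not candidates:
            raise RuntimeError("no active Core available")
        core = min(candidates, key=lambda c: len(c.active_jobs))
        core.active_jobs.add(job_id)
        self.job_routes[job_id] = core.name
        return core.name

    def complete(self, job_id: str) -> None:
        # The Job is done: release its route and let its Core finish draining.
        core_name = self.job_routes.pop(job_id)
        self.cores[core_name].finish_job(job_id)
```

Consider two Cores, `a` (`core-a`) and `b` (`core-b`), where a Job has been routed to `core-a`. After `a.drain()`, that Job keeps its route while new Jobs go to `core-b`. `a` moves to `MAINTENANCE` only when `complete()` is called for its last Job. At that point the Core can be upgraded without any Job being broken.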
Software upgrades must consistently allow different versions to coexist in a mesh.
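One way to make the mixed-version requirement checkable is a rolling-upgrade plan. The plan upgrades one Core at a time and verifies after each step that the versions present in the mesh are still compatible. Purely for illustration, the sketch assumes that releases are numbered with integers. It also assumes that Cores can interoperate when their versions differ by at most `max_skew` releases. SA does not necessarily define compatibility this way, and `is_compatible` and `rolling_upgrade_plan` are hypothetical helpers.

```python
def is_compatible(versions, max_skew=1):
    """Assumption: the mesh works if all Core versions lie within max_skew releases."""
    versions = list(versions)
    return max(versions) - min(versions) <= max_skew


def rolling_upgrade_plan(core_versions, target, max_skew=1):
    """Upgrade one Core at a time, checking mesh compatibility after each step.

    core_versions: dict of Core name -> integer release (hypothetical numbering).
    Returns a list of (upgraded Core, mesh versions after that step).
    """
    versions = dict(core_versions)
    steps = []
    for name in sorted(versions):
        if versions[name] == target:
            continue
        # This Core is drained, upgraded and re-activated; the others keep running.
        versions[name] = target
        if not is_compatible(versions.values(), max_skew):
            raise ValueError(f"mesh incompatible after upgrading {name}: {versions}")
        steps.append((name, dict(versions)))
    return steps
```

Take `max_skew=1` and three Cores being upgraded from release 10 to 11. The plan has three steps, with mixed versions in the mesh in between. A jump from 10 to 12 is rejected after the first Core, because the mesh would then contain versions outside the assumed compatibility window.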