I hear this question a lot - both from non-technical folk and from agencies who know they are 'missing something' in their approach to deploying, securing and scaling applications, but aren't sure if a sysadmin will solve it: 'What is it that you (a sysadmin) actually do (e.g. day-to-day, or in general)?'
Sysadmin work is, in my opinion, an unusual role because it traverses a huge range of technologies, problems, risks, and there are often many solutions to the one task. In addition, the role can be both innovative as well as reactive. Things need to be built, and those things break. The sysadmin usually deals with both.
The sysadmin often has to work wonders and, despite years of working with agencies and their developers to help design tools that developers want to use, is frequently a 'lone ranger' whose complex peer group is sometimes reached simply by 'Googling it'.
I thought I'd describe some of the things I've been doing, especially in the last couple of years. There seems no better place to start than with a whopper, and what allowed me to take the leap to full-time consulting: continuous deployment.
As many in the Drupal community know, around 2009/2010 I helped pioneer solutions to the famous 'dev/stage/live' workflow problem when deploying Drupal apps across such chained environments - particularly with a focus on Aegir, but the paradigm was more general than that - more general, even, than Drupal. My interest has always been in automating the procedure of delivering change (that's how I found Aegir). This began because I'm simply lazy. It ended up being more altruistic, after seeing firsthand that once that process is easier or automatic, the 'change' becomes not only easier, but enjoyable.
Change is risky - not just technical change. Things might go wrong. No-one likes to break things that weren't meant to be broken. People are scared of getting in trouble or causing problems. We often would rather go out of our way to avoid upsetting the 'norm' or what we think could wind up being a worse experience than we expected it to be. The inevitable side-effect of this is that necessary changes get put off, procrastinated, and build up in a queue. It's ironic that sometimes things break only because of the sheer amount of 'change' that's built up in this procrastination - that the very accumulation itself sometimes introduces new ways to fail that weren't apparent as those little bits and pieces were made one at a time. Ever put off going to the dentist?
The crucial aim of automating change through effective change management is to reduce risk. But looking beyond trying to stop things 'going wrong', the aim should also be to make dealing with things going wrong an easier process. Things will always go wrong. What happens next is what matters.
Much of my work building (usually bespoke these days, not Aegir*) deployment solutions is automating the deployment, but also handling rollback, notification and management of the situation gone wrong. A simple example: a bad hook_update_N() implementation can break your site, and reverting your git commit isn't going to change your database. What then? Build systems based on a sensible methodology: back up first, attempt the upgrade, roll back if it fails, alert people. Try to be quiet about success, so that it becomes 'the norm'. As you can see, this goes beyond 'what my awesome shell script can do'. This is about changing attitudes, and trying to make people feel less ill about what they have to do. Deployment anxiety (or any anxiety, for that matter) sucks, and it doesn't have to exist.
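To make that methodology concrete, here is a minimal sketch in Python. It is not the tooling I actually run: the function name deploy and the four callables it takes (backup, upgrade, restore, alert) are hypothetical placeholders for whatever your own backup, update and notification steps happen to be. It assumes a failed upgrade raises an exception.

```python
from typing import Callable


def deploy(
    backup: Callable[[], str],
    upgrade: Callable[[], None],
    restore: Callable[[str], None],
    alert: Callable[[str], None],
) -> bool:
    """Back up, attempt the upgrade, roll back and alert if it fails.

    All four callables are hypothetical hooks supplied by the caller.
    Returns True on success and False if the upgrade was rolled back.
    """
    # 1. Back up first (code AND database), and keep a handle to the snapshot.
    snapshot = backup()
    try:
        # 2. Attempt the upgrade (e.g. pull code, run database updates).
        upgrade()
    except Exception as exc:
        # 3. Roll back to the snapshot taken before the upgrade started.
        restore(snapshot)
        # 4. Alert people, but only when something went wrong.
        alert(f"Deployment failed and was rolled back: {exc}")
        return False
    # Success is deliberately quiet, so that it becomes 'the norm'.
    return True


if __name__ == "__main__":
    # In-memory stand-ins for a real site, purely for illustration.
    site = {"db": "schema v1"}
    alerts = []

    def bad_upgrade() -> None:
        site["db"] = "schema v2 (half applied)"
        raise RuntimeError("update hook failed")

    ok = deploy(
        backup=lambda: site["db"],
        upgrade=bad_upgrade,
        restore=lambda snap: site.update(db=snap),
        alert=alerts.append,
    )
    print(ok, site["db"], alerts)
```

The steps are passed in as functions so that the same sequence can be reused across projects while each project supplies its own backup and restore logic. A real system would also need to decide what to do when the restore itself fails, which this sketch does not handle.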
A fascinating phenomenon that I've observed in agencies is that once developers recognise the convenience of the deployment process ('what? I just push to Git and that's it?'), they seem to become naturally compelled to practice the famous mantra of 'commit early, commit often'. A pain point of development (deploying) is suddenly gone, so it actually becomes enjoyable to practice it - and you practice it by pushing your bite-size chunks of change more frequently. Developers are smart - they figure this out for themselves. No need for a training session on 'how often we're gonna deploy now' - just show them the new deployment system and they will do this themselves.
As a result, the changes start to become simpler and more frequent. This has a multitude of effects: the first is that quality improves, because it's harder to make big mistakes in small amounts. The 'bank-up queue' of change is gone, so many unexpected side-effects simply don't exist. Another effect is that developers become more confident in the process. Morale improves. On-boarding of new developers, or re-arranging of roles, becomes easier (because git history becomes easier to understand). And when the occasional mistake is made and the system detects and backs it out or alerts about it, much time and frustration is saved that would normally be consumed trying to find out exactly what's gone wrong.
Multiply these effects N times if you use a deployment methodology (not just tool!) that is abstracted and consistent across all N projects.
The work of a sysadmin is not just a reactive or innovative role for solving technical problems. The sysadmin doesn't have to live in a basement and hate everybody. Designing and implementing automated deployment systems can also have a sizeable effect on the culture of an organisation. It can help people enjoy coming to work. It can result in higher quality of work. And that can result in a better reputation for a company, in the eyes of its customers. And that can result in more customers - and therefore, more money. All from the desire to make a pain point less painful.
What other benefits has automating continuous deployment offered your business? I am also happy to answer questions (or offer engagement) if you are trying to work out a way to do what I've described. Why not hire a consultant who's in the business of helping make that happen?
In Part Two of what I do as a sysadmin, I'll talk some more about automation and enforcing state as part of change management (think: Puppet).
Part Three will talk about security,
Part Four: monitoring,
Part Five: troubleshooting or 'ghosts in the machine',
Part Six: high availability and disasters (sometimes the same thing :) ),
Part Seven: communication.
* Lots of people are surprised to discover I don't do a lot of Aegir work now. Most of my customers now have more bespoke needs (such as Drupal/Magento hybrids, 'unusual' Drupal deployments, complex integrations/inter-communications, high-availability clusters, etc. - stuff that Aegir can be a bit inflexible about). Inevitably I am in business, and it is often more viable time-wise (and therefore commercially) to build bespoke tools to solve such problems. Fabric-driven Jenkins deployment is my main solution these days. Feel free to get in touch if you think it would be something you might need.