This article is the third in a series of long-winded answers to the inevitable 'but what exactly do you do as a sysadmin consultant?' question. I started writing this because it's hard to give a short answer that does the question justice.
Back in part two I talked about the time I spend with organisations helping them manage their configuration and infrastructure 'state' as declarative code (e.g. Puppet). Prior to that, in part one, I talked about continuous deployment of applications (particularly Drupal and Magento). In both cases I tried to transcend discussion of the technical details and focus more on how a sysadmin helps organisations mitigate risk, and even improves the experience of colleagues who use the company's systems.
Defining security in terms of risk
Security is no exception to this - and increasingly it seems that dealing with matters of information security is a bigger and bigger part of my role. Most of us are aware of the 'arms race' that is defending public (and even private) facing systems from attackers. Spin up a cloud server somewhere and in just a couple of minutes (or less), it will start receiving intrusion attempts or be scanned for the purposes of information gathering. The internet is not a friendly place for the things that run on it or for those who use it. But attack and defense is only one area of security.
A number of my customers have been going through the rigorous process of achieving ISO27001 compliance. This is a security standard that is supposed to demonstrate that a company has a reliable, well-maintained and effective suite of information security processes which are appropriate for their business.
This is a much bigger (or)deal than PCI-DSS (compliance typically related to ensuring that an organisation and its infrastructure are appropriately built to perform payment transactions). ISO27001 has done a good job of educating people that security is not just about building safe systems or adequately monitoring and responding to compromises. Instead, security is again about risk. A system that is meant to be up and running, but is down (not due to being hacked), is still a security issue, because it has detrimental effects on the business in a myriad of ways (e.g. money is being lost, reputations are being damaged, or the nature of the downtime might also have resulted in integrity issues such as data corruption).
As a result, my work as a consultant is not just about the typical technical details of security: ensuring there are adequate firewalls (inbound and outbound), virus scans, intrusion detection systems, access control, password policies, version header stripping, isolation of systems and so on. It is also about designing broader procedures and systems that conform to, or make up, a customer's information security management system (ISMS).
This can include automating entries into risk registers, introducing threat monitoring systems, and attempting to automate disaster recovery tests. It can also include implementing rolling patch cycles, along with routine automated checks that services depending on patched libraries have been restarted, or that the necessary reboots have taken place.
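As a rough illustration of the "has this service been restarted since its libraries were patched?" check, here is a minimal Python sketch. It assumes a Linux host, where the kernel marks a mapped file that has been replaced on disk with a trailing "(deleted)" in /proc/<pid>/maps. The function names are made up for this example, and it is one possible approach rather than the check I necessarily deploy.

```python
import re
from pathlib import Path

# A maps line ending in "<path>.so[.N...] (deleted)" means the process still
# has an old copy of a shared library mapped after the file was replaced.
# Assumption: library paths contain no spaces and use numeric version suffixes.
_DELETED_LIB = re.compile(r"(/\S+\.so(?:\.\d+)*) \(deleted\)$")


def stale_libraries(maps_text):
    """Return the sorted, de-duplicated deleted libraries in one maps listing."""
    found = set()
    for line in maps_text.splitlines():
        match = _DELETED_LIB.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


def processes_needing_restart(proc_root="/proc"):
    """Map each PID to the stale libraries it still has loaded."""
    results = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            text = (entry / "maps").read_text(errors="replace")
        except OSError:
            continue  # process exited, or we lack permission to read it
        libs = stale_libraries(text)
        if libs:
            results[int(entry.name)] = libs
    return results
```

Run as root from cron or a monitoring agent, processes_needing_restart() gives a list of PIDs that are candidates for a restart after a patch run. Unprivileged users will only see their own processes.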
Things I like
Having said all that, there are of course some specific technical tools that particularly interest me at the moment in the area of security, some of which I have been deploying for organisations. These include:
- Threat intelligence (monitoring internet chatter with tools like Scumblr)
- Hardware 2-factor auth with Yubikeys
- Security by isolation models (such as the Qubes Linux distribution)
- Intrusion detection and integrity checking (last but not least, OSSEC)
OSSEC has been a particular asset to me since 2012, after a spate of Drupal vulnerabilities related to CKEditor. I have given talks about its use, and about how important rigorous review of its alerts is for pattern detection and integrity checking, not just intrusion detection.
A race against time
A major mitigating factor in the issue of risk and security is timely detection. This means not just monitoring the systems you are running for signs of intrusion or human error (chmod -R 777 anyone?), but also keeping tabs on the outside world. Most of us run software developed by third parties, and security vulnerabilities are constantly being discovered in it. The time between the announcement of a security vulnerability (and its fix) and the mitigation of that threat is crucial.
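To make the human-error side of detection concrete, here is a small Python sketch that walks a directory tree and reports anything left world-writable, which is the usual aftermath of a hasty chmod -R 777. The function name is made up for this example. A dedicated integrity checker does far more than this, so treat it as an illustration of the idea only.

```python
import os
import stat


def world_writable(root):
    """Yield paths under root that any user on the system can write to."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # vanished or unreadable, skip it
            if stat.S_ISLNK(mode):
                continue  # symlink permission bits are not meaningful
            if mode & stat.S_IWOTH:
                yield path
```

Pointing this at a web root (for example list(world_writable("/var/www")), where the path is only an example) and alerting on any non-empty result is a cheap check to schedule alongside proper intrusion detection.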
In October 2014, a critical SQL injection vulnerability in the Drupal CMS was being exploited in the wild just 7 hours after its announcement (I observed this first-hand). Many people didn't patch for some time, mostly for one or more of these reasons:
- They weren't subscribed to announcements about security vulnerabilities in Drupal (the dedicated mailing lists provided to the Drupal community for this purpose, Twitter, RSS feeds etc)
- They didn't understand the severity or nature of the threat and so under-reacted
- They didn't have effective means of patching swiftly or in an automated fashion
Addressing all three of these is where I consider a Drupal-savvy systems administrator to be valuable. Stay notified, carefully interpret what you are being notified about, and have a plan for what must be done. I observed similar mistakes by others in the wake of the Heartbleed vulnerability.
Other common pitfalls
Besides running vulnerable versions of software or failing to monitor for intrusions and mistakes, other areas where I regularly see agencies fail at security include:
- Poor access control (developers have access to production systems) - this often goes hand-in-hand with the absence of a decent deployment/workflow/'backward sync from prod to stage' system (with one in place, there is little need to log in to production systems)
- Poor authentication control (no 2-factor auth, no SSH PKI, no screen locks on desktop environments, shared accounts/passwords etc)
- Poor operational security (particularly increasing the risk of client-side attacks) - e.g. no laptop encryption, leaving sensitive data on laptops which are left in public places or could be susceptible to 'Evil Maid' attacks, private key misuse
- Poor encryption (no use of VPNs, PGP, use of plaintext FTP systems, transmitting of passwords or other sensitive data in cleartext e-mails etc)
- Poor firewalls (frequently I see cases of no/slack outbound firewall rules, or default policies of ACCEPT instead of DROP)
- Poor implementations of backups, cron jobs (e.g. database backup jobs that chew up disk space gradually and eventually crash and corrupt servers or data), or reckless tuning of memory settings etc which can lead to similar crashes
- 'Legacy' problems (services running that no longer need to be, data that should have been deleted but wasn't, or ex-employee accounts/access remaining active)
- Information leakages (no stripping of metadata, headers etc)
- Poor definition of roles, responsibilities, change management, or chain of command (e.g. no establishment of a chief security officer, lack of incident reporting or risk registers). Often a development agency mistakenly assumes that a hosting company is taking care of problems that are not part of the hosting contract at all
- This one I have experienced myself - the paralysing effect of fear (e.g. the customer knows something is wrong but is worried about acting, in case the act of change breaks something else or makes it worse. Inevitably, it means nothing is done)
None of these things are devoid of solutions. The good news is that things can only get better from there!
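Taking the firewall item above as one example of a solvable problem, here is a minimal Python sketch that flags chains whose default policy is ACCEPT. It assumes an iptables-based host and parses the "-P CHAIN POLICY" lines printed by iptables -S. The function names are made up for this example. A default ACCEPT policy is not automatically unsafe if the chain ends in a catch-all drop rule, so treat a hit as a prompt to review rather than a verdict.

```python
import subprocess


def permissive_chains(rules_text):
    """Return the chains whose default policy is ACCEPT, in listing order."""
    chains = []
    for line in rules_text.splitlines():
        parts = line.split()
        # Policy lines look like: -P INPUT ACCEPT
        if len(parts) == 3 and parts[0] == "-P" and parts[2] == "ACCEPT":
            chains.append(parts[1])
    return chains


def current_rules():
    """Fetch the live filter-table rules (needs root and iptables installed)."""
    result = subprocess.run(
        ["iptables", "-S"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a host with a default-deny firewall, permissive_chains(current_rules()) should come back empty, or list only the chains that have deliberately been left open.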
Security is about a lot more than staying on top of software vulnerabilities. It is a constant arms race whereby the 'good guys' are usually on the back foot. It requires constant re-evaluation of procedures, policies, changes to systems, and evaluation of the 'current climate'. Most importantly, security requires business leaders to make sensible judgment calls as to which risks matter the most. No-one can defend against everything or win all the time. Excessive security usually comes at the cost of usability, and can make your staff either unhappy or subversive (they will simply find ways around the rules you've enforced, or sacrifice quality and/or speed to conform to the rigidity).
Fortunately, in the wake of major world events such as the 'Snowden revelations', security issues are receiving a lot more scrutiny from businesses of all sizes than they perhaps used to. Increasingly, areas such as encryption and data privacy are being re-evaluated, and major organisations like Google are starting to play hard-ball by penalising organisations that deploy poorly secured systems (bad SSL certs/algorithms, preference for plaintext traffic etc).
I think the current trend is a good one and may help build a safer internet. In the meantime I hope to continue to help agencies improve the security of their systems, both for the sake of their businesses and the customers/users they serve. Get in touch if you are a Drupal shop seeking to improve on any of the issues I've mentioned above.
Coming up next, Part Four - where I'll be talking about effective monitoring solutions for server administration (and, yes, security). Then: