The original post can be found on the Electric Cloud blog.
In a recent Continuous Discussions (#c9d9) video podcast, expert panelists discussed ChatOps and DevOps.
Our expert panel included: Michael Chletsos, founder at InfoTech MegaCorp; Rad Dougall, geek translator; Jonathan Rudenberg, co-founder at Flynn; Marcus Young, web operations engineer at Stratasan; and Sam Fell.
During the episode, the panelists discussed key benefits, use cases, and best practices for ChatOps. Continue reading for their full insights. (Also, be sure to check out the best practices shared by Daniel Perez from HPE on ChatOps in the enterprise, which also include the code for their Hubot integrations on GitHub!)
What is ChatOps?
ChatOps brings more clarity into what everyone on the team is doing, per Chletsos: “When somebody says, ‘The world is on fire,’ and then somebody runs off into the wherever they run and they start doing stuff, everyone is like, ‘What are they doing?’ and nobody knows. ChatOps is bringing that to the foreground so that everybody knows what’s going on. No more hidden secrets. No more, ‘It’s fixed’ – you see why it was fixed.”
Dougall emphasizes the benefit of being able to work in different time zones and still being clued in to what’s going on: “What I like about ChatOps is I can dive in and out. Even if my guy in New Zealand is asleep, I can see what he did to fix the problem. And, if I’m communicating with the customer in my time zone, I don’t need him to be there. I can see everything in one place, like a single pane of glass on all of the systems.”
ChatOps is much more current and useful for knowledge sharing than Wikis, says Wallgren: “I think of ChatOps as being like a Wiki, only it’s not out of date. It’s the constantly self-updating Wiki. ‘How do I do this?’ Well, go search in the chat, and chances are it’s going to work. On the other hand, the Wiki page was edited 18 months ago by somebody who doesn’t work there anymore. I kind of crap on Wikis all the time because I find them terribly useful for spelunking and architectural digging around into what we did five years ago. But it’s never where I go first. So I think of ChatOps as more like what Wikis should be – always up-to-date, always current. The right people are available, you can ask questions and get answers in real time, as opposed to having to drop out of the Wiki and go talk on email or that kind of stuff.”
ChatOps moves work from dashboards to chat rooms, allowing for more discussion, explains Rudenberg: “I like to think of ChatOps as basically moving a lot of the work that you normally do in a command line or in dashboards, into whatever chat room you’re using for your organization. We use it a lot for incoming alerts, and it allows for a discussion of what’s going on in production without having to think about linking people off to other places and so on. You can also bring in active commands to do things against your infrastructure from the chat room as well.”
ChatOps allows another layer of visibility into daily tasks, says Young: “I’ve been doing ChatOps for about the last year, because I feel like it’s the next stage of getting people to be able to do things in a visible way, and getting rid of hidden messages and trying to do certain things on certain servers.”
Fell gives his definition of ChatOps: “It’s a very, very digestible way for people to communicate, and if you have it plugged into your infrastructure, then it’s a pretty convenient way to communicate with that infrastructure as well.”
Key Use Cases and Integrations
ChatOps can be beneficial for larger companies and teams as well, says Fell: “You might be using lots of different software from lots of different people who could potentially have access to that infrastructure. For example, HPE is using ElectricFlow plus Hubot, which is a SaaS solution. They were incorporating the security measures, the access control list. They’re obfuscating credentials and using that to actually deploy those credentials out, to help with the failover when there was a problem to help make that stuff happen. So, there are large companies that have figured this out and that are using it.”
Wallgren on the security issues of ChatOps: “The self-service aspects of it are pretty cool. It’s always a very powerful thing to put people in charge of helping themselves. The whole problem with security and the signal-to-noise ratio are not unique to ChatOps, and it’s kind of interesting that it brings them out in a more visceral way. If we’re going to use this tool and be present in it and participate, you have to solve the problem or people are going to get pissed off and abandon the tool. So I think the security thing is really interesting, you know, who typed that and how do we know who they are, and are they who they said they are, and can we go ahead and do the thing that they asked to be done? That’s a tricky problem to solve outside of the ChatOps environment, so then having to do that full, end-to-end integration sounds like a pretty juicy problem.”
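The question Wallgren raises ("can we go ahead and do the thing that they asked to be done?") is roughly an authorization check in front of every chat command. The sketch below is only an illustration: the command names, group names, and the `authorize` and `run_command` helpers are hypothetical and do not come from any panelist's setup. It also assumes the chat platform has already established who the user is, which is the harder half of the problem he describes.

```python
from typing import Dict, Set

# Hypothetical mapping of chat commands to the groups allowed to run them.
PERMISSIONS: Dict[str, Set[str]] = {
    "deploy": {"release-managers"},
    "status": {"release-managers", "developers"},
}

# Hypothetical group membership, e.g. synced from a directory service.
GROUPS: Dict[str, Set[str]] = {
    "alice": {"release-managers"},
    "bob": {"developers"},
}


def authorize(user: str, command: str) -> bool:
    """Return True only if the user shares a group with the command's allowlist.

    Unknown users and unknown commands are denied by default.
    """
    allowed = PERMISSIONS.get(command, set())
    return bool(GROUPS.get(user, set()) & allowed)


def run_command(user: str, command: str) -> str:
    """Reply in the channel so that refusals are as visible as successes."""
    if not authorize(user, command):
        return f"@{user}: not allowed to run '{command}'"
    return f"@{user}: running '{command}'"
```

Denying by default keeps a new command from being usable by everyone until someone has decided who should run it.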
Slack may not be totally secure, but it is a great tool for monitoring and feedback, explains Chletsos: “I’m kind of a Slack minimalist. I know a lot of ChatOps is you go into Slack and you’re like, ‘Deploy this here now.’ Well, maybe my security spider-sense goes off, in that I don’t think Slack is that secure and if you’re able to deploy from Slack, so are the employees of Slack. That’s my personal view. So, I get a little leery on that, but I love to use it for monitoring and logging. The feedback from a deploy is in there, actually kicking off builds. I like to see the whole pipeline just automate itself, so as soon as you’re pushing code and if everything’s green – or blue, depending on your version of Jenkins or whatever – it just goes right on forward to production after all the checks and balances are done. I love to feed back into a Slack channel that people can join at will and see what’s going on.”
ChatOps allows for tasks and requests to be handled much more quickly than over email, per Dougall: “For the most part, it’s getting everybody in the same place to talk about the same thing in real time. My pet peeve is emails where everybody is CC’d just in case. So, for us, ChatOps speeds things up. For customers, it speeds things up. There’s no waiting around. There’s always somebody online in the channel from our team, and it’s easy to give that history and transparency to the customers as well. Here was an alert, this is what happened. Our guys will jump on it, even if everybody from the customer’s team was asleep at the time that it happened. They can wake up in the morning, have a look, and they can see exactly what happened.”
Rudenberg recommends using Securitybot to solve most security issues with ChatOps: “Dropbox, actually, just open-sourced a tool called Securitybot. The idea is that if you have a security team, one of the things that they like to do is to watch for suspicious behavior on servers, typically engineers running sudo, for example, on servers. In the past, how you track that stuff down is you track down the engineer that ran the command and ask, ‘Did you run this? Why did you run this?’ What Securitybot does is it integrates into Slack, and you hook up your monitoring alerts for suspicious activity on servers to it, and it will send a message to the user that performs the weird command, and say, ‘Hey, did you do this? If so, can you just explain why you did this?’ Then, it sends a Duo two-factor-auth push to their phone to verify that they are the person who they say they are. After doing that, it logs it, and then the security team can review a rollup of all of the events and the explanations for why that happened. It actually can add security as opposed to removing security, if you’re running commands through chat and so on.”
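The flow Rudenberg describes (alert, ask the user, verify with a second factor, log for review) can be sketched in a few lines. This is not Securitybot's actual code or API; `ConfirmationBot`, `ask_user`, and `verify_2fa` are hypothetical stand-ins for the chat integration and the two-factor provider, used here only to show the shape of the workflow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Alert:
    user: str
    command: str


@dataclass
class AuditEntry:
    user: str
    command: str
    acknowledged: bool
    explanation: str
    verified: bool


@dataclass
class ConfirmationBot:
    # Hypothetical hooks: send a chat message and return the reply text,
    # and push a two-factor prompt and return whether it was approved.
    ask_user: Callable[[str, str], str]
    verify_2fa: Callable[[str], bool]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def handle(self, alert: Alert) -> AuditEntry:
        """Ask the user about the command, verify identity, and log the result."""
        question = f"Did you run `{alert.command}`? If so, why?"
        reply = self.ask_user(alert.user, question).strip()
        acknowledged = bool(reply)
        # Only bother with the second factor if the user claimed the action.
        verified = self.verify_2fa(alert.user) if acknowledged else False
        entry = AuditEntry(alert.user, alert.command, acknowledged, reply, verified)
        self.audit_log.append(entry)
        return entry

    def needs_review(self) -> List[AuditEntry]:
        """Entries the security team should look at first."""
        return [e for e in self.audit_log if not (e.acknowledged and e.verified)]
```

Every alert ends up in the log either way; the confirmation step only decides which entries the security team needs to chase.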
Young was able to empower his developers to deploy independently using ChatOps: “As far as integrations go, I actually like to email a lot because I don’t like a lot of noise at all. What happens is you end up with a notifications channel where GitHub sends information, New Relic sends information. It basically just becomes the garbage bin. Everybody needs it and nobody looks at it, because there’s so much that nobody wants to read through and figure out what’s relevant to them. I found that everybody wants those integrations, but nobody ever wants to read them. So, my advice is to allow it to empower people, but not a lot of parsing unless you want it. I usually add commands for getting health of an environment, and then that command knows how to go and scrape Amazon for all the saving stuff, whether it’s database metrics, app metrics, how many are in the load balancer, that kind of stuff, and aggregating that back in a readable way that you actually would care about. It’s been working out pretty well, to the point where I was actually on vacation for about a week and I didn’t get any calls. Developers were able to deploy every day, they were able to push their code out and not really need me unless it’s an emergency. My end goal is to be the silent guy that you don’t realize is actually keeping everything running.”
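A "health of an environment" command like the one Young mentions might look something like the sketch below. It is an assumption-laden illustration, not his bot: the `health <env>` syntax, `environment_health`, and the collector functions are hypothetical, and the real lookups against Amazon are replaced by plain callables so the aggregation logic stays visible.

```python
from typing import Callable, Dict, Optional

# A collector takes an environment name and returns a short, readable summary.
Collector = Callable[[str], str]


def environment_health(env: str, collectors: Dict[str, Collector]) -> str:
    """Run each collector and aggregate the results into one chat message."""
    lines = [f"Health for {env}:"]
    for name, collect in collectors.items():
        try:
            lines.append(f"  {name}: {collect(env)}")
        except Exception as exc:
            # One failing data source should not hide the others.
            lines.append(f"  {name}: unavailable ({exc})")
    return "\n".join(lines)


def handle_message(text: str, collectors: Dict[str, Collector]) -> Optional[str]:
    """Respond only to 'health <env>'; stay silent for everything else."""
    parts = text.split()
    if len(parts) == 2 and parts[0] == "health":
        return environment_health(parts[1], collectors)
    return None
```

Because the summary is produced on request, it adds nothing to the channel until someone actually wants it, which fits the low-noise preference in the quote.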
Best Practices for ChatOps
Make ChatOps fun to use and easy to change, advises Young: “Allow people to change the bot. Not only use it, but allow them to have a say in what it can do. What I found is that I would add commands, and then those would either be too complex or just didn’t make sense. It’s hard for me to assume that I know what all the developers want to see, so I let it have a pipeline and a lifecycle like any real application. Another one that I found that I haven’t been able to implement is to get rid of the ambiguity of it, and make it more of a living entity instead of just at-deploy staging. Whatever you do, you want it to provide feedback to you, maybe be sassy at times, be a little fun, make it more human, because then it makes it more interesting to interact with, but it also lets you lower the barrier of usage.”
Don’t rely too much on one technology; keep it flexible, says Wallgren: “One thing that you want to be careful of is you don’t want to code yourself into a corner, where you’re completely reliant on one particular technology. Then, when the next new thing comes along, the new coolness, the amount of heavy lifting that you’re going to have to do to get on to that new platform, or that new tool, is going to be pretty big. So you don’t want to get yourself too locked into any one tool, and you want to be able to lift and shift to some other new, cool platform that you might want to use.”
Fell emphasizes the importance of keeping the noise level down: “The signal-to-noise ratio is pretty important because at a certain point people just shut down and they say, ‘Well, I’m just not going to read it anymore.’ They’ll make their own thing off on the side. They’re like, ‘Well, this is the one that I actually have to pay attention to,’ which is not useful because then you’re actually taking things away from the public collaboration forum that you’ve built in order to eliminate some of that noise, which may not be so helpful for people.”
Make your chat bot personal and useful, advises Chletsos: “Create a persona for your bot. There should not be a single monolithic bot that does everything. You have a lot of things that are ‘the bot,’ and to brand this bot and to make it your friend and coworker is very important, because all of a sudden you start to have an interaction with this coworker of yours. I called mine ‘Andy’ at one point because I worked with a lot of Andys. Andy would comment on our GitHub pull requests. He would do code review for us, he would comment on tickets. Andy was everywhere, the 24-hour worker. It was great. People started to write emails to Andy, and they would do @Andy, they would mention him in places, and I would laugh because I think some people really thought this was a person after a while. And it was pretty cool, so I think that’s very key, to start to treat it not just as fun and games but as a tool, but then put that personality on top of it which makes it fun again, but in a productive way.”
Dougall had some advice on what not to do: “Don’t integrate everything that you can. My guys used to say I have ‘Shiny Thing Syndrome’ because I’d go down the list and be like, ‘Oh, I can integrate this with that and then plug it in. Yay!’ But why? Just why? And that only came from having that problem of that channel that nobody reads because everything is in there. Nobody cares. None of it’s interesting. None of it’s useful for me to act on.”
Rudenberg emphasizes keeping things simple: “I really want to strongly emphasize that keeping things simple is good. I know it’s been mentioned before, but you can’t reiterate this enough. If there is noise and it is not useful, just get rid of it, and try to winnow your incoming alerts down to just the stuff that’s useful. And then, if you want to be able to access other things, just have query commands that will pull in a chart or a status on demand, as opposed to proactively pushing those in.”
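Rudenberg's split between pushed alerts and on-demand queries can be illustrated with a small router. This is a sketch under assumptions: the severity names, the `AlertRouter` class, and its methods are hypothetical, and a real bot would post to a chat API rather than append to a list.

```python
from typing import Dict, List

# Hypothetical severities considered worth interrupting people for.
ACTIONABLE = {"critical", "error"}


class AlertRouter:
    def __init__(self) -> None:
        self.channel: List[str] = []      # what actually gets posted to chat
        self.latest: Dict[str, str] = {}  # most recent status per source

    def ingest(self, source: str, severity: str, message: str) -> None:
        """Remember every event, but push only the actionable ones."""
        self.latest[source] = f"[{severity}] {message}"
        if severity in ACTIONABLE:
            self.channel.append(f"{source}: {message}")

    def query(self, source: str) -> str:
        """Answer an on-demand status request without having pushed anything."""
        return self.latest.get(source, f"no data for {source}")
```

For example, after ingesting an informational build event and a critical database alert, only the database alert appears in the channel, while `query("ci")` still returns the build status to whoever asks for it.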