From Inventing Google Docs to Fixing Monitoring: Startup Tech Stack Interviews Steve Newman of Scalyr

Scalyr CEO and founder Steve Newman talks performance trends, fixing monitoring, and Scalyr specifics.

See Gartner’s latest research on the application performance monitoring landscape and how APM suites are becoming more and more critical to the business, brought to you in partnership with AppDynamics.

Startup Tech Stack is a regular segment where we sit down to chat with an up-and-coming startup to find out about their project, what technologies they're using, and what they're excited about in technology at the moment.

This week we’re talking to Steve Newman, CEO and founder of Scalyr, a log aggregation and monitoring startup that closed a $2.1m seed funding round earlier in 2015.

Thanks for taking the time to talk to us, Steve. Could you give us a bit more information about Scalyr, what it does, and who your customers are?

We're operations visibility as a service. We gather server logs and other operational data, and give engineering teams the ability to search, analyze, monitor, and generally understand what's going on in their servers.

Scalyr is an outgrowth of our experience at Google, trying to keep up with the chaos of modern, large-scale, cloud-based deployments. The cloud makes it easy to build a system with lots of moving parts. Amazon or whoever runs the pieces for you; your job becomes keeping track of the big picture. With so many services to manage, that alone becomes a full-time job.

Our customers come in all shapes and sizes. The common thread is that they're outgrowing their existing visibility tools; there are too many issues to investigate, and each investigation is a pain, so some issues get dropped and it's hard to nip problems in the bud. 

The monitoring space is very crowded at the moment, with the likes of Splunk and Geneos, to name just a couple. How is Scalyr different? What is the problem you’re trying to solve?

We have a single goal: reducing friction. When you're juggling five complicated tools, and searching your server logs takes long enough that you go check email and lose your train of thought, it's hard to get anything done. As one of our customers put it, what makes Scalyr unique is:

1. Speed

2. Speed

3. Speed

One of the early discoveries at Google, one that people don't talk about so much any more but it still holds, is that making something faster makes it feel different – lighter weight, more accessible. We've engineered our system in a novel way that lets our users run ad-hoc searches in (usually) less than 1 second. The design is simultaneously cheaper to operate than traditional index-based solutions, so you don't have to pick and choose which logs you can afford to store.

Everything we do is about speed in some fashion. We've worked hard to keep the UI simple as we add power, so that you don't have to puzzle through a complex query language to get answers. And we integrate traditionally separate functions like log management, system metrics, and application monitoring, so you don't have to stop and switch tools in the middle of an investigation.

A lot of our users come to us from Splunk; they appreciate the power Splunk offers, and they're looking for a simple, fast solution. We've put together a "Splunk alternative" page that discusses Scalyr for people used to Splunk.

What technologies are you using to build Scalyr? I presume there’s a certain amount of proprietary technology, but what tools and libraries have you been using to build and run the system? (I’m personally very interested in the UI tech you’ve chosen, as this seems to be the riskiest area at the moment for new tools.)

It's a relatively simple stack. We've built a lot in-house to keep it simple and streamlined. The frontend is based on Angular. The backend is fully custom, sitting directly on Linux. We've been writing in Java, which has given us a good balance of productivity, safety, and performance. (I keep expecting to have to rewrite bits in C++ for raw speed, but so far we've always found a way to make Java work, and I like the safety guarantees it provides. The Rust team is doing some very interesting things to balance that magic triangle of safety, performance, and productivity, and I could see us exploring that someday.)

We're hosted in EC2. We don't use a lot of other AWS services, but we're in the process of moving data storage to S3. We're all about burstable bandwidth, and S3 is a great platform for that if you use it carefully.

Do you use Scalyr to monitor Scalyr?

Of course! This is a big deal for us. Obviously it's important to experience your own product. But I can very honestly say that I don't know how we'd manage operations with any other tool. Scalyr lets us be very proactive about investigating small issues before they become user-facing problems. One result is that – knock on wood – we almost never get paged for a middle-of-the-night issue. I literally can't remember the last time it happened.

What technologies and changes in the industry are you excited about at the moment?

"The cloud" is a cliche at this point, but the truth is that we've barely scratched the surface of that trend. EC2 made it easier to spin up a server, that was nice. Now Amazon and others will run just about any piece of software so you don't have to run it yourself, that's cool too. But when we as an industry are pooling all of our physical resources together into these gigantic cloud data centers, some really interesting things become possible. We're one example of that. We provide radically better performance than you'd get running a little on-premise installation of some traditional solution. We get this huge unfair advantage from pooling all the hardware in one place. I think we'll start to see this in more and more areas, but it requires completely rethinking the system architecture. Most of what people do in "the cloud" today is not a new architecture, it's just on-demand provisioning of traditional packaged software.

You have a long history in startups, having founded Writely which went on to become Google Docs along with a number of other successful companies. How do you think the world of startups has changed over your career?

I'm hardly the first to say this, but: it's just so easy to launch things now, and so fast to iterate. I remember when getting a bugfix to your customers involved a FedEx shipment to the disk duplication factory. No matter how agile your startup was, there were severe limits on how quickly you could interact with the outside world. Now everything is online and you can get anything as a service. It took us 100 days to launch Writely – that's from my partner Sam saying "hey listen to this idea I just had" to a live app. Today it would be even quicker, not to mention more polished and scalable.

So I think there's more opportunity than there ever was. The world is moving faster, and speed is the fundamental advantage of a startup.

What’s next for Scalyr?

We have two big initiatives right now. Claudia Carpenter, the UI guru for Writely, came on board this summer and the team is working on a major redesign that we're really excited about. It's just going to blow away anything people have seen in this space for both simplicity and power. It's hard to combine those attributes, but we've taken things down to first principles and come up with a design that's extremely accessible while actually adding more features.

On the backend, we're constantly scaling up and finding ways to operate more efficiently. A while back we ran a blog post about our backend architecture, titled "Searching 20 GB / second". I'm looking forward to the day when we can write a followup, "Searching 1000 GB / second." It's not too far off.

Anything else you'd like to tell our readers?

Just to come check us out! One nice thing about this space is that, because you can run two log aggregators side-by-side, it's easy to experiment with new solutions. If your team is avoiding production issues because they're a pain to investigate... that doesn't have to be a fact of life.

splunk,logging,s3,amazon aws,google docs,monitoring,monitoring and performance,devops
