Logs. We love them. We hate them. We can’t collect enough of them. We collect too many. We save them forever. We never look at them. We pore over them. They incriminate. They repudiate. Logs.
Ask just about anyone in IT, DevOps, or an audit department how to log, what logs are for, and when to use them, and you will get a wide range of answers. Why would such a boring topic stir up so much opinion and emotion? Two reasons come to mind.
First, the scenarios in which you use logs are as numerous as the scenarios you have for programming in the first place. It’s understandable that opinions are so diverse because logs are telemetry for your application and your runtime environment. This means that each environment will have a unique perspective and thus unique logging requirements.
Second, logs evoke a lot of opinion and emotion because of the history of system administration and the newfound empowerment of DevOps. Logging in the wrong format, missing data, and failure to centralize are all problems that we faced as an industry, and they left calluses and scars on our psyche. If you ever want to hear a good rant, just ask your operations person to tell you a story about an issue they faced with logging. It will go something like this…
Logs are a problem all unto themselves, but our system telemetry, business metrics, and performance promises hinge on them. We push forward with logging efforts because of the great insight they provide to our business and technology environment. However, for security, and application security specifically, logs aren’t enough.
Logs Miss the Mark for Application Security
Logs and security go hand in hand. All compliance frameworks require them to some degree. We write pages and pages on retention periods, encryption of logs, storage, etc. Security uses logs for forensics and for determining what actually happened in the system. This works for post-breach analysis, such as discerning how access was gained or privileges were escalated.
When you have a hammer, every problem looks like a nail. This is the approach the security industry took when applying logging techniques to application security. Logging worked for network devices and firewalls, so why not application security as well? In answer to this, Phillip Maddux (Px Mx) recently presented a list of seven challenges faced when using logs for application security purposes.
All seven points are on target, but I specifically want to call attention to three of the challenges in the list:
- Delayed response to identifying issues.
- Limited data (no POST body, no header data).
- Limited context.
Let's take each one in turn.
Delayed Response Means Delayed Decisions
Centralized logging can be done any number of ways: the application can emit directly to the log aggregation stack, or it can hand logs to a store-and-forward agent. Depending on your architecture, this means a log gets indexed anywhere from seconds to minutes (or maybe even hours) after the event. This is generally fine for use cases like alerting on an error or updating sales figures, but it doesn’t work for preventing command execution or XSS.
Being out of band of the application means that any application security problems and the necessary defensive response will inherently have a delay.
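To make the delay concrete, here is a rough sketch in Python. It assumes a hypothetical store-and-forward agent that ships buffered log lines once every `flush_interval` seconds. The class name, the 30-second interval, and the timestamps are all invented for illustration and are not taken from any real logging stack.

```python
from dataclasses import dataclass, field


@dataclass
class StoreAndForwardAgent:
    """Hypothetical agent that buffers log lines and ships them in batches."""

    flush_interval: float  # seconds between shipments (assumed value)
    buffer: list = field(default_factory=list)
    last_flush: float = 0.0

    def emit(self, now: float, line: str) -> None:
        # The application writes the line and moves on; nothing inspects it yet.
        self.buffer.append((now, line))

    def flush_if_due(self, now: float) -> list:
        # Return (line, delay_in_seconds) pairs for lines shipped at `now`,
        # or an empty list if the flush interval has not elapsed.
        if now - self.last_flush < self.flush_interval:
            return []
        shipped = [(line, now - written_at) for written_at, line in self.buffer]
        self.buffer.clear()
        self.last_flush = now
        return shipped


agent = StoreAndForwardAgent(flush_interval=30.0)

# The request is served at t=1s; the attack payload is already in the response.
agent.emit(now=1.0, line="GET /search?q=<script>alert(1)</script> 200")

early = agent.flush_if_due(now=10.0)  # interval not reached, nothing ships
late = agent.flush_if_due(now=30.0)   # first batch leaves the host

for line, delay in late:
    print(f"visible to the aggregator after {delay:.0f}s: {line}")
```

In this toy timeline the request was answered at second 1, yet nothing downstream could even see it until second 30, and indexing and alerting time would come on top of that.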
Limited Data Means Limited Available Insight
You log your HTTP traffic, of course you do, because you are a responsible member of society. Source, destination, response codes. All very standard. What you don’t log are POST bodies, header data, and, in some cases, even query strings. We don’t log these because they often contain passwords, tokens, and other sensitive data. Remember all those compliance frameworks? Well, they dictate that we don’t log this type of data because of how sensitive it can be. In the past, attackers often sought out log data for exactly this reason.
Since we aren’t logging this extra HTTP data, we aren’t inspecting it for attacks. This is a substantial gap in our coverage and means that we are missing a piece (or in this case multiple pieces) of the puzzle. To really understand how we are being attacked and whether attackers are succeeding, we need to see the full picture: the full HTTP traffic flow.
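One way to picture the gap is to compare what a typical access-log entry holds with what the application itself sees while handling the request. The Python sketch below is only an illustration: the request fields, the `SUSPICIOUS` pattern, and the `inspect_body` helper are hypothetical, and a single regular expression is nowhere near a real detection engine.

```python
import re
from urllib.parse import parse_qs

# Deliberately naive pattern, for illustration only.
SUSPICIOUS = re.compile(r"('\s*or\s+1\s*=\s*1|<script\b)", re.IGNORECASE)


def access_log_entry(request: dict) -> dict:
    # What usually ends up in the log: no body, no headers.
    return {key: request[key] for key in ("source", "method", "path", "status")}


def inspect_body(request: dict) -> list:
    # Look at the form body while the request is in hand, and report only
    # the names of the fields that matched, never their values.
    fields = parse_qs(request["body"])
    return sorted(
        name
        for name, values in fields.items()
        if any(SUSPICIOUS.search(value) for value in values)
    )


request = {
    "source": "203.0.113.9",
    "method": "POST",
    "path": "/login",
    "status": 500,
    "body": "username=admin%27+OR+1%3D1--&password=hunter2",
}

entry = access_log_entry(request)
flagged = inspect_body(request)

print(entry)    # source, method, path, status: nothing to inspect for SQLi
print(flagged)  # field names only; the password never leaves the process
```

From the log entry alone, this request is just a POST to `/login` that returned a 500. The injection attempt lives entirely in the part we chose, for good reasons, not to log.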
Limited Context Means Limited Understanding
Lastly, it is difficult to tie disparate events together through logs. Was the HTTP 500 error tied to a SQL injection attempt? Does the rise in failed logins correlate with the spike in XSS attacks? Correlating across business logic (login failures), anomalous responses (error codes), and attack traffic (SQLi, XSS) is difficult to accomplish via logging.
This gap in context forces us to treat all attacks equally, from a casual internet-wide bot to a persistent attacker trying to dump our database. Without context, it is difficult to tell these two apart.
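As a sketch of what that context would buy us, assume every event, whatever its type, carried the client address. Grouping by address is then enough to separate a one-off scanner from someone working the login form and the database at the same time. The event kinds, the addresses (taken from the documentation ranges), and the "two or more kinds" threshold in this Python example are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical, already-normalized events from three different places:
# business logic, response codes, and attack detection.
events = [
    {"source": "198.51.100.7", "kind": "xss_attempt"},
    {"source": "203.0.113.9", "kind": "login_failure"},
    {"source": "203.0.113.9", "kind": "sqli_attempt"},
    {"source": "203.0.113.9", "kind": "http_500"},
    {"source": "203.0.113.9", "kind": "login_failure"},
]


def kinds_by_source(events: list) -> dict:
    # Collect the distinct kinds of event seen for each client address.
    grouped = defaultdict(set)
    for event in events:
        grouped[event["source"]].add(event["kind"])
    return dict(grouped)


def worth_a_closer_look(events: list, min_kinds: int = 2) -> list:
    # Flag sources that show up across several kinds of signal.
    # The threshold is an arbitrary assumption, not a recommendation.
    return sorted(
        source
        for source, kinds in kinds_by_source(events).items()
        if len(kinds) >= min_kinds
    )


print(worth_a_closer_look(events))
```

Here the address with a single XSS probe stays in the background, while the one combining failed logins, a SQLi attempt, and a 500 stands out. Getting those three signals into one place, keyed the same way, is exactly the part that is hard to do with logs alone.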