Maintaining a Database of Reputation Data With Splunk

We developed an open-source plugin to maintain a database of reputation data from various open resources with a response rate of tens of thousands of requests per second.

Splunk's application database offers several solutions for enrichment, that is, for reporting that a given IP address has been added to a reputation database and therefore looks suspicious.

However, these solutions are either commercial (e.g., Recorded Future App or Kaspersky Threat Feed App) or tremendously slow (e.g., IP Reputation App, which, at the time of writing, hasn’t been fully available since February due to maintenance). That led us to develop our own open-source plugin, RST Cloud Threat Database Add-On for Splunk. It maintains a single database of reputation data gathered from various open resources and handles tens of thousands of requests per second.

The plugin is a small set of Python scripts that enrich Splunk search results on the fly by requesting additional details from an external database. We used the Redis key-value store as that database: Redis keeps everything in RAM, so lookups don’t depend on disk I/O latency. We published the source code on GitHub, and suggestions for improvement are welcome.

We ran our performance tests on a dual-core Intel® Xeon® E5-2630 virtual machine with 4 GB of RAM. Even with the overhead of Python 2.7, virtualization on standard hardware, and Splunk itself, the plugin’s throughput averaged about 25K requests per second (RPS) with 300K records in Redis, which is enough for a fair number of tasks. Note that these figures were obtained with “off the shelf” Redis, with no additional optimization or clustering. We also haven’t yet used Redis pipelining for database access in the search script.
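
Pipelining batches several commands so that they are sent to Redis together instead of one request at a time. The following is a minimal sketch of what that could look like, assuming the redis-py client and the 'ip:' key prefix used by the search script; the function name lookup_many is hypothetical and is not part of the add-on.

```python
def lookup_many(red, ips):
    """Fetch the reputation sets for many IPs with one pipelined batch.

    `red` is assumed to be a redis-py client, for example
    redis.Redis(host='127.0.0.1', port=6379).
    """
    pipe = red.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    for ip in ips:
        pipe.smembers('ip:' + ip)           # queued on the client, not sent yet
    results = pipe.execute()                # send the whole batch, read all replies
    return dict(zip(ips, results))          # results come back in request order
```

Whether this helps in practice depends on how many addresses the search script receives per invocation; we have not measured it.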

For example, we can use the plugin to detect web form spammers or connections to the website from infected IP addresses.

RST Cloud IP Reputation

In the console, we can easily print out a list of suspicious clients.

sourcetype=Web:*:access_log host=www.demo.demo | fields clientip | dedup clientip |
lookup local=true lookupthreat clientip OUTPUT threatscore threatsource threatcategory |
where threatscore > 0

The results show the source, the category under which the IP address is listed in each reputation database, and the cumulative Threat Score.

Threat Score in Splunk search
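
The lookup command above relies on Splunk's external lookup mechanism: Splunk passes the rows to a script as CSV on standard input and reads the enriched CSV back from standard output. The sketch below illustrates that exchange only. It is written for Python 3, the DEMO_DB dictionary is a hypothetical stand-in for Redis, and the function name enrich is an assumption rather than the add-on's actual code.

```python
import csv

# Hypothetical stand-in for the Redis database: IP -> (score, source, category).
DEMO_DB = {
    '203.0.113.7': ('5', 'blocklist.de', 'ssh'),
}

FIELDS = ['clientip', 'threatscore', 'threatsource', 'threatcategory']

def enrich(infile, outfile, db=DEMO_DB):
    """Copy CSV rows from infile to outfile, filling in the threat fields."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=FIELDS,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        # Unknown addresses get a score of 0 and empty source/category.
        score, source, category = db.get(row.get('clientip', ''), ('0', '', ''))
        row.update(threatscore=score, threatsource=source, threatcategory=category)
        writer.writerow(row)
```

A real lookup script would call enrich(sys.stdin, sys.stdout) and query Redis instead of the dictionary.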

For your convenience, we also have a short-hand macro:

`threatDB(clientip)`

For simplicity, Redis can be installed directly on the Splunk search head; it can also run on another server or a server cluster. In addition, the RST Cloud Threat Database Add-On contains a number of scripts that automatically download data from various sources and import it into Redis.

Many reliable open resources now publish suspicious and dangerous IP addresses. Our plugin supports more than 15 of them, including:

  • Sblam! A blog, forum, and comment web-spammer database.
  • StopForumSpam. A blog, forum, and wiki web-spammer database.
  • CINS Score. A database from Sentinel IPS, shared by the company.
  • Blocklist.de. A database of addresses that attack Postfix, SSH, and Apache; spambots, IRC bots, reg-bots, DDoS, etc.
  • Ransomware Tracker. C&C server addresses for ransomware.
  • AlienVault OTX. Open data feeds from the famous SIEM.
  • Binary Defense. A specialized threat intelligence provider.
  • EmergingThreats. Proofpoint feeds intended for blocking on firewalls.
  • Arbor ATLAS. DDoS-attacking addresses from a very famous company.
  • Botvrij. Malware-compromised addresses.
  • Tor Project. Tor-network addresses.

The database can hold both individual addresses, which are looked up directly by the IP key (red.smembers('ip:'+clientip)), and subnets, which the script checks in a loop.
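
The difference between the two lookups can be sketched as follows. This is an illustration under stated assumptions, not the add-on's code: plain dictionaries stand in for Redis, the standard-library ipaddress module (Python 3) replaces netaddr, and the layout of subnet_sets and the name find_threats are hypothetical.

```python
import ipaddress

def find_threats(clientip, ip_sets, subnet_sets):
    """Return the set of reputation entries that match clientip.

    ip_sets     maps 'ip:<address>' keys to sets of entries (mirrors the Redis sets).
    subnet_sets maps CIDR strings to sets of entries (hypothetical layout).
    """
    # Exact match: a single key lookup.
    hits = set(ip_sets.get('ip:' + clientip, set()))
    # Subnets: every stored network has to be tested in turn.
    addr = ipaddress.ip_address(clientip)
    for cidr, entries in subnet_sets.items():
        if addr in ipaddress.ip_network(cidr):
            hits |= entries
    return hits
```

The exact-match path costs one lookup regardless of database size, while the subnet path grows linearly with the number of stored networks.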

Now, let’s take a look at the installation process, step by step:

  1. Install Redis.
  2. Install Python dependency libraries.
  3. Modify connection strings in the scripts.
  4. Set up a CRON-task for keeping the IP Reputation DB up to date.

We’ll skip the first step, as it is explained in detail on the Redis website. On Debian, a single command installs a Redis server with default settings: apt-get install -y redis-server

To resolve Python library dependencies, it’s sufficient to run the following:

wget bootstrap.pypa.io/get-pip.py
python get-pip.py
sudo pip install redis
sudo pip install netaddr

It’s worth noting that Splunk ships with its own Python, and modifying it is not recommended. It’s therefore better to install all extras into the operating system’s Python.

Depending on Redis installation options, you might need to change addresses and ports in the following scripts:

Main search script redisworker.py:

sys.path.append("/usr/local/lib/python2.7/dist-packages") # Path to redis-py module
redis_server = '127.0.0.1'
redis_port = 6379

Database cleanup script threat_flushdb.py and IoC update script threatuploader.py:

redis_server = '127.0.0.1'
redis_port = 6379

Script that downloads IoCs from the various resources, start_threatupload.sh:

base_dir=/opt/splunk/bin/scripts/threatDB
python_bindir=/usr/bin

Then, create a temporary folder:

mkdir -p /tmp/threatsupload

And set up a cron job, e.g., in /etc/crontab:

2 0 * * * root $SPLUNK_HOME/etc/apps/threatDB/bin/start_threatupload.sh /tmp/threatsupload

We recommend using a separate account instead of root in a production environment. The default once-a-day update frequency is more than enough, because the TTL of the database records is 48 hours.
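
A 48-hour TTL of this kind could be applied at upload time roughly as shown below. This is a sketch assuming the redis-py client and the 'ip:' key prefix used by the search script; the function name store_indicator and the shape of the stored entry are hypothetical and not taken from the add-on.

```python
TTL_SECONDS = 48 * 60 * 60  # 48 hours, as stated above

def store_indicator(red, ip, entry, ttl=TTL_SECONDS):
    """Add one reputation entry for ip and (re)start its expiry timer.

    `red` is assumed to be a redis-py client.
    """
    key = 'ip:' + ip
    red.sadd(key, entry)   # add the entry to the set stored at this key
    red.expire(key, ttl)   # the key is dropped if nothing refreshes it within ttl
    return key
```

With a daily upload, each key is refreshed well before it expires, so a record survives one missed run and disappears only after the source stops listing the address for about two days.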

Commercial solutions provide many kinds of indicators of compromise, including IP addresses, domain names, hashes, file paths, mutex names, etc., which help identify malicious software activity inside an organization. We concentrate on web security, so our plugin works only with IP addresses. In the future, we plan to improve the plugin’s performance and extend its functionality. Among other things, we plan to add DNS reputation database feeds, making the plugin applicable to additional tasks.

Published at DZone with permission of Yury Sergeev. See the original article here.
