
How To Check Linux Process Deeply With Common Sense

Process checking is critical. We already have plenty of Linux tools and tips available, and getting familiar with your weapons is the first step and the easiest part.

More important is which questions you ask, and why, when you approach a critical process. Fortunately, with common sense we can dig out lots of valuable information.


Assumptions Before Deep Dive

Here we assume you are familiar with:

  • FD (file descriptor): Everything in Linux is a file.
  • /proc pseudo filesystem: How the Linux kernel exposes in-depth information about processes.
  • lsof, top, ps, grep: First time hearing of them? Then start there first.

Basic Check For Linux Process

When the Process Is Started and How Long It Runs

This helps us detect whether an unexpected or suspicious service restart has happened. In addition, a decent service will always do proper logging, which can confirm our observation.

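# Get elapsed running time by pid
# (-p selects exactly that pid, so grep cannot
# match the number in another column)
ps -o pid,comm,etime,user -p $pid

# Sample output (using the equivalent grep form):
root@s1:~# ps -eo pid,comm,etime,user \
                  | grep 20513
20513 dockerd          8-00:58:30 root
# etime is [[DD-]hh:]mm:ss, so this means
# 8 days, 0 hours, 58 min and 30 sec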
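
In a minimal container ps may not be installed. As a rough sketch, the same elapsed time can be derived from /proc alone. The helper name `proc_elapsed_seconds` is hypothetical, and the fallback of 100 clock ticks per second is an assumption used only when `getconf` is missing.

```shell
# Hypothetical helper: seconds since the given pid started, using /proc only.
proc_elapsed_seconds() {
    local pid="$1" stat_line rest ticks hz uptime
    stat_line=$(cat "/proc/$pid/stat") || return 1
    # comm (field 2) may contain spaces, so drop everything up to the last ") "
    rest=${stat_line##*) }
    # rest now begins at field 3; starttime is field 22, i.e. the 20th word
    set -- $rest
    ticks=${20}
    # Assumption: fall back to 100 ticks/sec if getconf is unavailable
    hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
    read -r uptime _ < /proc/uptime
    echo $(( ${uptime%.*} - ticks / hz ))
}
```

Calling `proc_elapsed_seconds $pid` prints a whole number of seconds, which is easier to compare in a script than the etime string.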

Where Is the Log File?

A very common question, especially from Dev or QA. Usually a process logs continuously, so it holds fds for its log files. lsof can list all fds opened by the process, so you can find the answer without asking anyone!

# Find out log files by pid
lsof -P -n -p $pid | grep ".*log$"

# Sample output:
# root@s1/# lsof -p 40 | grep ".*log$"
# daemon .. /var/log/jenkins/jenkins.log
# daemon .. /var/log/jenkins/jenkins.log

# Check log files for error/exceptions
grep -C 3 -iE "exception|error" $logfile
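
If lsof is not available, a minimal sketch of the same idea is to walk /proc/$pid/fd directly. The helper name `proc_log_files` is hypothetical, and the "name ends in log" rule simply mirrors the grep pattern above.

```shell
# Hypothetical helper: list files ending in "log" that a pid holds open.
proc_log_files() {
    local pid="$1" fd target
    for fd in /proc/"$pid"/fd/*; do
        # each entry is a symlink to the file (or socket/pipe) behind the fd
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *log) echo "$target" ;;
        esac
    done | sort -u   # the same file is often open on several fds
}
```

The `sort -u` collapses repeats such as the two jenkins.log rows in the sample output, which most likely come from two fds pointing at the same file.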

How Much CPU and Memory the Process Takes

We certainly need to be on top of any abnormal resource utilization [1]. Fortunately, almost all modern monitoring systems let us see the history, which is a big plus for troubleshooting.

# Check process resource utilization
top -p $pid
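
top is interactive, which is awkward inside a script or a cron job. As a small sketch, a one-shot memory snapshot can be read from /proc/$pid/status instead. The helper name `proc_mem_snapshot` is hypothetical, and it covers memory and thread count only, not CPU.

```shell
# Hypothetical helper: one-shot memory and thread snapshot for a pid.
proc_mem_snapshot() {
    local pid="$1"
    # VmRSS = resident memory, VmSize = virtual size, Threads = thread count
    grep -E '^(Name|VmRSS|VmSize|Threads):' "/proc/$pid/status"
}
```

Appending the output of `proc_mem_snapshot $pid` to a file every few minutes gives a crude history when no monitoring system is in place.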

What's the Command Line Starting the Process?

People ask this question when they're required to manage unfamiliar or uncomfortable services. A more urgent case: the service just mysteriously refuses to start. Wrong Java opts? A file permission issue? The process command line can give us some insight or hints.

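# Find out process start command line
# (arguments are NUL-separated, so turn the
# NULs into spaces to make them readable)
tr '\0' ' ' < /proc/$pid/cmdline; echo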
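
The command line is only part of how a process was launched. A minimal sketch, assuming you have permission to read the target's /proc entries, is to print the binary, working directory and environment alongside it. The helper name `proc_launch_info` is hypothetical.

```shell
# Hypothetical helper: show how a pid was launched (binary, cwd, args, env).
proc_launch_info() {
    local pid="$1"
    echo "exe: $(readlink "/proc/$pid/exe")"
    echo "cwd: $(readlink "/proc/$pid/cwd")"
    # cmdline and environ are both NUL-separated
    echo "cmd: $(tr '\0' ' ' < "/proc/$pid/cmdline")"
    echo "env:"
    tr '\0' '\n' < "/proc/$pid/environ" | sed 's/^/  /'
}
```

Be careful where you paste the output: the environment often contains passwords or tokens.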

What TCP Ports Is the Process Listening On?

Nowadays the majority of services are web-based or microservices. It helps to know which TCP ports the process is listening on.

# Check what ports are serving
lsof -P -n -p $pid | grep -i listen
# Check whether given port is listening
# (-s TCP:LISTEN hides established connections)
lsof -i tcp:$tcp_port -s TCP:LISTEN
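
For hosts without lsof, here is a rough sketch of the same check built on /proc: collect the socket inodes the process holds, then look them up in its TCP tables. The helper name `proc_listen_ports` is hypothetical, and it only covers TCP (IPv4 and IPv6).

```shell
# Hypothetical helper: print the TCP ports a pid is listening on.
proc_listen_ports() {
    local pid="$1" fd target inodes=" " f hex
    # 1. Collect socket inodes held by the process, e.g. "socket:[12345]"
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            socket:*) target=${target#socket:\[}; inodes="$inodes${target%\]} " ;;
        esac
    done
    # 2. In the TCP tables, state 0A means LISTEN and column 10 is the inode
    for f in "/proc/$pid/net/tcp" "/proc/$pid/net/tcp6"; do
        [ -r "$f" ] || continue
        awk -v inodes="$inodes" \
            '$4 == "0A" && index(inodes, " " $10 " ") { n = split($2, a, ":"); print a[n] }' "$f"
    done | sort -u | while read -r hex; do
        echo $(( 0x$hex ))   # ports are stored in hex
    done
}
```

`proc_listen_ports $pid` prints one decimal port per line and nothing at all if the process is not listening.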

How Many fds the Process Has Open

Too many open fds is usually a bad sign, say over 3000. Typical causes: a poor design that makes the application inefficient at handling requests, an fd leak, or more requests than we expected.

# Get total fd count opened by pid
# (lsof -p $pid | wc -l also counts the header
# and non-fd rows such as cwd, txt and mem)
ls /proc/$pid/fd | wc -l
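
A raw count means more next to the limit the process is allowed. As a small sketch, the soft limit can be read from /proc/$pid/limits and printed beside the count. The helper name `proc_fd_usage` is hypothetical.

```shell
# Hypothetical helper: open fd count next to the soft "Max open files" limit.
proc_fd_usage() {
    local pid="$1" open limit
    open=$(ls "/proc/$pid/fd" | wc -l)
    # 4th column of the "Max open files" row is the soft limit
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "open=$open soft_limit=$limit"
}
```

A count that sits close to the soft limit deserves attention even when it is well under 3000.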

Advanced Check For Linux Process

Check How Resident Memory is Used by the Process

This is especially important when the process is taking way too much memory. pmap reports the memory map of a process [2].

# Display detail memory usage
pmap -x $pid
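
pmap output can run to hundreds of lines. One possible way to get a quick ranking is to sum the Rss figures per mapping from /proc/$pid/smaps. This is a sketch only: the helper name `proc_top_mappings` is hypothetical, and reading smaps of another user's process normally requires root.

```shell
# Hypothetical helper: mappings of a pid ranked by resident size (kB).
proc_top_mappings() {
    local pid="$1" count="${2:-10}"
    awk '
        # Mapping header: "start-end perms offset dev inode [path]"
        /^[0-9a-f]+-[0-9a-f]+ / { name = (NF >= 6 ? $6 : "[anon]") }
        /^Rss:/                 { rss[name] += $2 }
        END { for (n in rss) print rss[n], n }
    ' "/proc/$pid/smaps" | sort -rn | head -n "$count"
}
```

`proc_top_mappings $pid 5` prints the five largest entries, with all anonymous mappings grouped under `[anon]`.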

Find Out the Process Tree

For a multi-threaded process, displaying all threads and their starting commands can be helpful. It gives us very good insight.

# Get all threads for a given process
pstree -A -a -p $pid

# keep checking process tree
watch "pstree -A -a -p $pid"
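
Threads are also visible under /proc/$pid/task, one directory per thread. A minimal sketch, with the hypothetical helper name `proc_threads`, lists each thread id with its name:

```shell
# Hypothetical helper: list thread ids and thread names of a pid.
proc_threads() {
    local pid="$1" task
    for task in /proc/"$pid"/task/*; do
        # each task directory is one thread; comm holds its name
        echo "${task##*/} $(cat "$task/comm" 2>/dev/null)"
    done
}
```

The thread whose id equals the pid is the main thread. Applications that name their threads make this listing much easier to read.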

Detect Long TCP Connections and How Long They Have Been Running

Watch out for long-lived TCP connections. A daemon service might not only take requests but also initiate connections. Developers may keep long-lived TCP connections from applications to DB services. When app nodes and DB nodes are disconnected, or DB instances are restarted, will your process survive the chaos and keep functioning?

# List established TCP connections of the process
lsof -p $pid | grep ESTABLISHED

# Check timestamps for given fd (a rough
# indication of how long it has been open)
stat /proc/$pid/fd/$fd_num

# Sample:
# root@s1:~# date
# Fri Sep 23 23:22:22 EDT 2016
# root@s1:~# lsof -p 265 | grep ESTABLISHED
# 134u . 47..33 s1:59427->s2:9300 (EST..
# 140u . 47..10 s1:38078->s2:9300 (EST..
# 142u . 47..11 s1:38079->s2:9300 (EST..
# 143u . 47..81 s1:51033->s2:9300 (EST..
# root@s1:~# stat /proc/265/fd/134
#  File: /proc/265/fd/134->socket:[47..
#  Size: 64       Blocks: 0       ..
# Device: 3h/3d Inode: 463..8  Links:..
# Access: (0700/lrwx------)  Uid: (0/..
# Access: 2016-09-23 19:50:12... -0400
# Modify: 2016-09-05 19:48:05... -0400
# Change: 2016-09-05 19:48:05... -0400
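
In the sample above, date reports 2016-09-23 while the Modify time of fd 134 is 2016-09-05, so the connection looks roughly 18 days old. A small sketch of that subtraction follows. The helper name `fd_age_seconds` is hypothetical, and the result is an approximation based on the timestamp of the /proc entry, not an exact connection time.

```shell
# Hypothetical helper: approximate age in seconds of an open fd.
fd_age_seconds() {
    local pid="$1" fd="$2" opened now
    # %Y = mtime of the /proc symlink itself (stat does not follow it by default)
    opened=$(stat -c %Y "/proc/$pid/fd/$fd") || return 1
    now=$(date +%s)
    echo $(( now - opened ))
}
```

For the sample, `fd_age_seconds 265 134` would be the call to make.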

How to Detect an FD Leak

If an application keeps opening files or sockets without gracefully closing them, it has an FD leak. Its fd count will keep rising, and eventually the process will hit its open-file limit and start failing, typically with "Too many open files". Usually this happens in problematic error-handling logic.

# Get total fd count by pid; rerun it over
# time and watch whether the number keeps rising
ls /proc/$pid/fd | wc -l
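
A single count cannot show a leak, only the trend can. As a rough sketch, the count can be sampled at a fixed interval. The helper name `watch_fd_count` and its defaults (60 seconds, 10 samples) are assumptions made for illustration.

```shell
# Hypothetical helper: sample the fd count of a pid at a fixed interval.
# usage: watch_fd_count PID [INTERVAL_SECONDS] [SAMPLES]
watch_fd_count() {
    local pid="$1" interval="${2:-60}" samples="${3:-10}" i=0
    # stop early if the process exits
    while [ "$i" -lt "$samples" ] && [ -d "/proc/$pid/fd" ]; do
        echo "$(date +%T) $(ls "/proc/$pid/fd" | wc -l)"
        i=$(( i + 1 ))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

A count that climbs steadily under flat traffic and never comes back down is the pattern to look for.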

What Files Are Being Downloaded and What Is the Progress Status?

The application might be stuck on a heavy network request, e.g. downloading huge files. To dig out the detailed status, find the fd, which should be a regular file opened in write mode. Then keep polling the file size to see how far along we are.

# Get REG(regular) fd with write mode
lsof -p $pid | grep REG | grep "w "

# Check file size (-L follows the /proc symlink;
# without it ls reports the link, not the file)
watch "ls -lLh /proc/$pid/fd/$fd_num"
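
Besides the file size, the kernel exposes the current write offset of each descriptor in /proc/$pid/fdinfo. A minimal sketch, with the hypothetical helper name `fd_progress`, prints both numbers in bytes:

```shell
# Hypothetical helper: size of the file behind an fd and the fd's offset.
fd_progress() {
    local pid="$1" fd="$2" size pos
    # -L follows the /proc symlink so we get the real file size
    size=$(stat -L -c %s "/proc/$pid/fd/$fd") || return 1
    # fdinfo exposes the current file offset of the descriptor
    pos=$(awk '/^pos:/ { print $2 }' "/proc/$pid/fdinfo/$fd")
    echo "size=$size pos=$pos"
}
```

If you know the expected total size, for example from the download source, dividing pos by that total gives a rough percentage.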

Check for Files Deleted but Not Gracefully Closed

When files are removed somehow, your process might still hold the stale fd, or even keep reading or writing the file. The disk space is not released until the fd is closed. This should definitely be avoided, and developers should be alerted.

# List unexpected file deletion
lsof -p $pid | grep deleted
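
The kernel marks such entries in /proc as well: the symlink target ends with " (deleted)". A small sketch using the hypothetical helper name `proc_deleted_files` prints the fd number and the stale path:

```shell
# Hypothetical helper: list fds of a pid that point at deleted files.
proc_deleted_files() {
    local pid="$1" fd target
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") echo "${fd##*/} $target" ;;
        esac
    done
}
```

While the process keeps the fd open, the content can usually still be copied out with `cp /proc/$pid/fd/$fd_num /some/path`, where the fd number comes from the first column.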

More Reading

9 Key Feedbacks For Prod Envs Maintenance.


[1] www.tecmint.com/12-top-command-examples-in-linux/

[2] www.cyberciti.biz/tips/howto-find-memory-used-by-program.html

Published at DZone with permission of Denny Zhang, DZone MVB. See the original article here.
