This is a bit of a tricky one … I am writing a test suite for a Python project, running on Linux. As part of the tests, I have to fire up various commands (shell commands, small CLI tools, Perl scripts, etc.). Some of those commands can hang, crash, or loop forever and must be killed after a timeout. A subset of those commands can also start an arbitrary number of sub-processes of their own, which must also be killed after the timeout (keywords: process groups and setsid). While the test suite runs with normal user privileges, some of the commands require super user privileges, i.e. sudo. If I have to use sudo, I sometimes need to preserve the original user’s virtual environment, i.e. make it accessible for the super user.
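To make the virtual environment requirement concrete, here is a minimal sketch of how such a sudo prefix could be assembled. The helper name build_sudo_prefix and its parameters are hypothetical and only for illustration. It assumes a POSIX layout where the environment's executables live in a bin sub-directory, and that the local sudo policy permits running env.

```python
import os


def build_sudo_prefix(environ, preserve_venv=False):
    """Return the argument list to put in front of a command run via sudo.

    `environ` is a mapping such as os.environ. The function name and its
    parameters are hypothetical, chosen for this sketch only.
    """
    prefix = ['sudo']
    if preserve_venv:
        # Raises KeyError if no virtual environment is active.
        venv = environ['VIRTUAL_ENV']
        venv_bin = os.path.join(venv, 'bin')
        # "env NAME=VALUE ... command" sets the variables for the command only.
        prefix += [
            'env',
            'VIRTUAL_ENV=%s' % venv,
            'PATH=%s:%s' % (venv_bin, environ['PATH']),
        ]
    return prefix
```

A call would look like build_sudo_prefix(os.environ, preserve_venv=True) + ['some_tool', '--flag'], where some_tool is a placeholder.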
Sometimes, I push my tests to a CI server, where sudo is not restricted / protected, but sometimes, I also want to test it on my local OS where sudo is password-protected. My test suite must therefore allow me to enter my super user password manually if required in a local test.
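One way to tell the two situations apart is to probe sudo in non-interactive mode before the tests start. The following is only a sketch under assumptions: the function names are hypothetical, and it relies on sudo's -n (non-interactive) and -v (validate credentials) options. A non-zero exit from the probe can also mean that the user is not allowed to use sudo at all, so the result is a hint rather than a guarantee.

```python
import subprocess


def sudo_needs_password(probe_cmd=('sudo', '-n', 'true')):
    """Return True if the probe command fails, False if it succeeds.

    With the default probe, "sudo -n true" fails instead of prompting when
    a password would be required. `probe_cmd` is a parameter only so that
    the check can be exercised without sudo (hypothetical design choice).
    """
    result = subprocess.run(
        list(probe_cmd),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode != 0


def ensure_sudo_credentials():
    """Prompt for the password once, on the terminal, if it is required."""
    if sudo_needs_password():
        # "sudo -v" refreshes the cached credentials and prompts if needed.
        # stdin/stdout are inherited so the prompt reaches the terminal.
        subprocess.run(['sudo', '-v'], check=True)
```

Calling ensure_sudo_credentials() once at the start of a local run would let later sudo commands reuse the cached credentials, as long as the cache has not expired in the meantime.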
My (mostly working) solution is a routine named run_command (plus two helper routines named __kill_proc__ and __get_pid__). So far, it looks like this:
    import os
    import signal
    import subprocess

    import psutil  # get it with "pip install psutil"


    def run_command(cmd_list, return_output = False, sudo = False, sudo_env = False, timeout = None, setsid = False):

        cmd_prefix = []
        if sudo:
            cmd_prefix.append('sudo')
            if setsid:
                cmd_prefix.append('-b')  # equivalent to "setsid", will spawn new process group including sudo
            if sudo_env:  # preserve the user's virtual env for super user
                cmd_prefix.append('env')
                cmd_prefix.append('%s=%s' % ('VIRTUAL_ENV', os.environ['VIRTUAL_ENV']))
                cmd_prefix.append('%s=%s:%s' % ('PATH', os.path.join(os.environ['VIRTUAL_ENV'], 'bin'), os.environ['PATH']))
        elif setsid:
            cmd_prefix.append('setsid')  # TODO untested codepath

        full_cmd = cmd_prefix + cmd_list

        proc = subprocess.Popen(
            full_cmd, stdout = subprocess.PIPE, stderr = subprocess.PIPE
            )

        timeout_alert = ''
        if timeout is not None:
            try:
                outs, errs = proc.communicate(timeout = timeout)  # If command does not time out, this works just fine
            except subprocess.TimeoutExpired:
                timeout_alert = '\n\nCOMMAND TIMED OUT AND WAS KILLED!'
                if setsid:
                    kill_pid = __get_pid__(full_cmd)  # proc.pid will deliver wrong pid!
                else:
                    kill_pid = proc.pid
                __kill_proc__(kill_pid, k_signal = signal.SIGINT, entire_group = setsid, sudo = sudo)
                outs, errs = (proc.stdout.read(), proc.stderr.read())  # PROBLEM: outs and errs are *always* empty! Buffer issue?
                # outs, errs = proc.communicate()  # Makes no difference here, outs and errs are always empty this way, too.
        else:
            outs, errs = proc.communicate()

        if return_output:
            return (not bool(proc.returncode), outs.decode('utf-8'), errs.decode('utf-8') + timeout_alert)
        return not bool(proc.returncode)


    def __kill_proc__(pid, k_signal = signal.SIGINT, entire_group = False, sudo = False):

        if not sudo:
            if entire_group:
                os.killpg(os.getpgid(pid), k_signal)
            else:
                os.kill(pid, k_signal)
        else:
            if entire_group:
                run_command(['kill', '-%d' % k_signal, '--', '-%d' % os.getpgid(pid)], sudo = sudo)
            else:
                run_command(['kill', '-%d' % k_signal, '%d' % pid], sudo = sudo)


    def __get_pid__(cmd_line_list):

        for pid in psutil.pids():
            try:
                proc = psutil.Process(pid)
                if cmd_line_list == proc.cmdline():
                    return proc.pid
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue  # process vanished or is not readable, skip it
        # A bare "raise" has no active exception here, so raise something explicit.
        raise RuntimeError('no process found for command line: %r' % (cmd_line_list,))  # TODO some more specific error ...
The most troubling bit for me is when I have to start a new process group with sudo and it times out (I have discussed the current design of how I start a process group with sudo here on SE Unix & Linux). It works, but it has a fundamental problem: if the sub-process (group) times out and I kill it, I lose ALL output from this process group and cannot figure out what is going on. I commented the lines in question.
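For comparison, the simpler non-sudo case can be sketched with the standard library alone. This is a minimal sketch, not the code above: the name run_with_timeout is hypothetical, it does not cover the sudo -b situation, and it assumes that no descendant moves itself into yet another session. It relies on two documented behaviours of subprocess: start_new_session=True makes the child call setsid(), so its process group ID equals proc.pid, and calling communicate() again after a TimeoutExpired does not lose the output read so far.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd_list, timeout):
    """Run cmd_list in its own session and kill its whole group on timeout.

    Returns (timed_out, stdout_text, stderr_text).
    The function name is hypothetical and used for this sketch only.
    """
    proc = subprocess.Popen(
        cmd_list,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,  # child becomes session and group leader
    )
    try:
        outs, errs = proc.communicate(timeout=timeout)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
        try:
            # The group ID equals proc.pid because of start_new_session.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        # Second communicate() reaps the child and returns everything that
        # was written to the pipes before the kill.
        outs, errs = proc.communicate()
    return (
        timed_out,
        outs.decode('utf-8', errors='replace'),
        errs.decode('utf-8', errors='replace'),
    )
```

Output that a killed program still held in its own user-space buffer is lost in any case, because SIGKILL gives it no chance to flush. Whether that explains the empty output in my sudo case is an open question.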
I understand that the above code is a really ugly collection of bad hacks … I am seriously interested whether this is a good approach or not, how I could possibly improve it or how I should re-implement it.
(I am wondering whether it is better to run the test suite altogether with super user privileges and reduce the privilege level for individual routines or sub-processes, which could cause a whole new set of problems.)