Shell Functions with Python

Exploring Pipes#

There is an elegance to Unix-style pipes. Designing single-purpose programs that can be chained together enables rapid discovery and fast problem solving. For example, I use the ZSH shell on my machine. If I want to see which processes are currently running ZSH, I can combine the commands ps, grep, and less.

ps | grep zsh | less

While becoming familiar with pipx I stumbled upon shell-functools. The shell-functools project provides a series of Python programs that are designed to be used as a chain of piped applications.

For example, the filter command reduces a piped list of items to those that pass a registered function.

# Only display files.
ls | filter is_file
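
To make the idea concrete, here is a minimal sketch of what a filter stage in a pipeline does: it reads lines, applies a predicate, and passes along only the lines that match. This is not how shell-functools implements it. The names filter_lines and has_extension are hypothetical, and an in-memory list stands in for the output of ls.

```python
def filter_lines(lines, predicate):
    """Yield each line (without its newline) for which predicate is truthy."""
    for line in lines:
        item = line.rstrip("\n")
        if predicate(item):
            yield item


def has_extension(name):
    """Hypothetical predicate: True when the name contains a dot."""
    return "." in name


# Stand-in for the lines that `ls` would write to the pipe.
piped_input = ["README.md\n", "src\n", "setup.py\n"]

result = list(filter_lines(piped_input, has_extension))
print(result)
# Prints: ['README.md', 'setup.py']
```

In a real tool the lines would come from sys.stdin rather than a list.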

I was intrigued by how this module was designed so I decided to reverse engineer it to learn how this was accomplished. This post documents what I learned from that process.

shell-functools Approach#

The overall approach shell-functools takes is:

1. Define a class to encapsulate the command line tool’s functionality.
2. If the tool has subcommands:
   a. Declare the allowed types for the subcommand’s inputs and outputs.
   b. Register the subcommand globally.
3. Expose the tool for use by creating an application file.

To understand how these three activities are done, let’s see how the filter command is implemented.

1. Encapsulating Functionality#

Each shell tool implements a single command using the command pattern. This is done with inheritance. The Command class is essentially an abstract class that defines the structure of a pipe-able command that can be run on its own.

# Abbreviated and Formatted for Clarity
class Command:
  def run(self):
    # Handle command line arguments
    self.parse_args()
    self.partial_application()
    self.initialize()

    # Read from standard input
    for line in self.input_lines():
        value = add_dynamic_type(line)
        self.handle_input(value)
        if self.exit_early:
            break
    self.finalize()

  # Programs must implement.
  def handle_input(self, value):
    raise NotImplementedError

  # Optional
  def add_command_arguments(self, parser):
    return parser

  # Optional
  def parse_additional_command_arguments(self, args):
    pass

  # Optional
  def initialize(self):
    pass

  # Optional
  def finalize(self):
    pass

A program such as Filter just needs to inherit from the Command abstract class and implement the details for that specific program.

class Filter(Command):
  def __init__(self, name="filter"):
    super().__init__(name)
    # Additional setup

  def add_command_arguments(self, parser):
    # Add additional command arguments.
    return parser

  def parse_additional_command_arguments(self, args):
    # Handle any additional parsing needs.
    pass

  def handle_input(self, value):
    # Add the command logic here.
    ...

The entire source code for the Filter class is available in the shell-functools repository.
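
To see the pattern run end to end, here is a self-contained sketch under a few assumptions. The base class below is a simplified stand-in for the project's Command class: it skips argument parsing and typing, and its run method takes a stream argument so the example does not depend on real standard input. The Count subclass is hypothetical and is not part of shell-functools.

```python
import io


class Command:
    """Simplified base class: owns the read loop, defers work to subclasses."""

    def __init__(self, name):
        self.name = name
        self.exit_early = False

    def run(self, stream):
        # Assumption: the stream is passed in; the real tool reads stdin.
        self.initialize()
        for line in stream:
            self.handle_input(line.rstrip("\n"))
            if self.exit_early:
                break
        self.finalize()

    def handle_input(self, value):
        raise NotImplementedError

    def initialize(self):
        pass

    def finalize(self):
        pass


class Count(Command):
    """Hypothetical command that counts non-empty input lines."""

    def __init__(self, name="count"):
        super().__init__(name)
        self.total = 0

    def handle_input(self, value):
        if value:
            self.total += 1

    def finalize(self):
        print(self.total)


command = Count()
command.run(io.StringIO("alpha\n\nbeta\n"))  # stands in for sys.stdin
# Prints: 2
```

The subclass only supplies the per-line logic and the final output. The loop, early exit, and lifecycle hooks all live in the base class.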

2. Implementing Function Arguments#

The real power of shell-functools is in its ability to define function arguments. This enables calling a registered Python function during the command’s processing.

# Here the filter command is running the is_file function on
# each item provided by the ls program.
ls | filter is_file
# Outputs only the items that are files.

# Calling is_file with the map command shows the result of each evaluation.
ls | map is_file
# Outputs True or False for each item.

A function argument is just a Python function that has been flagged as available for use by the various commands. The is_file function is just:

def is_file(inp):
  return os.path.isfile(inp)

However, in order for a function to be available to commands, three things have to happen.

1. The function must have its input and output types specified.
2. The function must be registered.
3. The command must call the function during its run.

Let’s look at each of these.

Defining the Function’s I/O Types#

shell-functools uses a custom decorator to specify the input and output data types of a function argument.

@typed(T_PATH, T_BOOL)
def is_file(inp):
  return os.path.isfile(inp)

The decorator takes advantage of the fact that functions in Python are actually objects. Like any object, they can have additional fields and methods associated with them. The typed decorator adds the additional properties type_in, type_out, and inner_argspec. These are used by the framework to enforce the specified types.

To dive deeper into the custom decorator and custom type system, see the project’s code.
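
As a rough illustration of attaching metadata to a function object, here is a minimal sketch. It is not the project's implementation: the type tags are placeholder strings, the decorator only records the attributes without enforcing anything, and storing the result of inspect.getfullargspec in inner_argspec is an assumption made for this example.

```python
import inspect
import os

# Placeholder type tags; the real project defines its own type objects.
T_PATH = "path"
T_BOOL = "bool"


def typed(type_in, type_out):
    """Attach type metadata to the decorated function as attributes."""
    def wrap(fn):
        fn.type_in = type_in
        fn.type_out = type_out
        # Assumption: keep the original signature for later inspection.
        fn.inner_argspec = inspect.getfullargspec(fn)
        return fn
    return wrap


@typed(T_PATH, T_BOOL)
def is_file(inp):
    return os.path.isfile(inp)


print(is_file.type_in, is_file.type_out)
# Prints: path bool
print(is_file.inner_argspec.args)
# Prints: ['inp']
```

A framework can later read these attributes to decide whether an incoming value needs to be converted before the function is called.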

Registering a Function#

The Command base class dynamically finds a function based on an associated name. Valid functions are registered in a dictionary instance. When a function is declared it is annotated using the custom register decorator.

@register("is_file")
def is_file(inp):
    return os.path.isfile(inp)

The register decorator is used to register the function in question with a module level dictionary of functions. This enables easy lookup of all available functions at run time. Here is the basic pattern.

# Declare the dictionary to hold the functions.
function_list = {}

# Define the decorator. Abbreviated and formatted for clarity.
def register(*names):
  def wrap(fn):
    global function_list
    for n in names:
      function_list[n] = fn
    # Return the function so the decorated name still refers to it.
    return fn
  return wrap

For example, creating and registering a function that prints hello world is as simple as:

@register("say_hi")
def hello_world():
  print("Hello World")

Inspecting the function list shows something like:

print(function_list)
# Outputs: {'say_hi': <function hello_world at 0x109a18790>}

Using the function_list dictionary, a registered function can be looked up by its registered name.

# Get a reference to the registered function.
fn = function_list.get("say_hi")

# Invoke the function.
fn()
# Prints: Hello World

Using a decorator to register available functions simplifies the overall code base. The alternative is to build up the dictionary manually in the module, like so:

def hello_world():
  print("Hello World")

function_list = {}
function_list["say_hi"] = hello_world

For a small number of functions, this isn’t a big deal. However, as the number of functions grows, having to explicitly manage the function_list becomes tedious. Using a decorator keeps the registration logic with the function’s actual declaration.
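
Because the decorator accepts a variable number of names, one function can be registered under several aliases at once. The following self-contained sketch shows this. The alias greet is hypothetical, and the function returns its message instead of printing it so the result is easy to check.

```python
function_list = {}


def register(*names):
    """Register the decorated function under every name given."""
    def wrap(fn):
        for name in names:
            function_list[name] = fn
        return fn
    return wrap


@register("say_hi", "greet")  # "greet" is a hypothetical alias
def hello_world():
    return "Hello World"


# Both names resolve to the same function object.
print(sorted(function_list))
# Prints: ['greet', 'say_hi']
print(function_list["greet"]())
# Prints: Hello World
```

Returning fn from the inner wrapper matters here: without it, the name hello_world would be rebound to None after decoration.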

Using the Function Argument#

A dictionary of valid functions is built in memory. When the command runs, the function argument is used to look up a registered function. If one is found, it is run against the input provided via STDIN.

# Abbreviated to just focus on handling the function selection.
import argparse
from ft.functions import function_list

class Command:
  def parse_args(self):
    parser = self.get_argument_parser()
    args = parser.parse_args()
    function_name = args.function
    try:
      # Note: the function name specified on the command line is used
      # to dynamically look up the function to invoke.
      # A reference to the function is saved on the command instance.
      self.function = function_list[function_name]
    except KeyError:
      panic("Function not found: '{}'".format(function_name))

The identified function is then invoked by the child command instance in the handle_input method.

# Abbreviated
class Filter(Command):
  def handle_input(self, value):
    val_to_test = value
    # other stuff...
    # Here the function is actually invoked.
    result = self.function(val_to_test)
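
The lookup and the invocation can be combined into one small, runnable sketch. Everything here is an assumption made for illustration: the registry is built by hand instead of with the register decorator, the helper names lookup and filter_values are hypothetical, and a missing function raises SystemExit in place of the project's panic helper.

```python
import os
import tempfile

# Hypothetical registry mapping command-line names to functions.
function_list = {"is_file": os.path.isfile, "is_dir": os.path.isdir}


def lookup(function_name):
    """Return the registered function, or exit with an error message."""
    try:
        return function_list[function_name]
    except KeyError:
        raise SystemExit("Function not found: '{}'".format(function_name))


def filter_values(function_name, values):
    """Keep only the values for which the named function is truthy."""
    function = lookup(function_name)
    return [value for value in values if function(value)]


with tempfile.TemporaryDirectory() as directory:
    file_path = os.path.join(directory, "notes.txt")
    with open(file_path, "w") as handle:
        handle.write("example")
    # The directory is dropped; only the regular file is kept.
    kept = filter_values("is_file", [directory, file_path])
    print(kept == [file_path])
    # Prints: True
```

The name arrives as a plain string, so the dictionary lookup is the only place where the command line and the Python functions meet.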

3. Making the Tool Available#

Each Python program has its own dedicated entry file.
Each file:

Does not have an extension. Example: filter. This is in keeping with the tradition of Unix tools like ls and mkdir.
Has a shebang line of #!/usr/bin/env python3 that enables it to be run as a standalone Python file using Python 3.

Looking at the filter command for example:

#!/usr/bin/env python3
from ft.commands.filter import Filter
Filter().run()

Building My Own Commands#

That is the gist of how shell-functools works. For me, the next step is to use these patterns to create my own command-line tools. Perhaps that will be a later post.