Creating an Interactive Terminal

A Year’s Work#

Every year I have a hobby project that consumes my spare time. This year I’ve spent my extra hours on building the Agent’s Playground. The Playground is a lightweight, 2D simulation engine. You can call it a game engine but it’s not intended for building video games. Rather, it’s a sandbox designed to enable rapidly prototyping autonomous agent simulations. It’s a kit that could enable you to build a watered down Sims or Dwarf Fortress.

Over the course of the year the Playground has slowly taken shape as I’ve added:

2D graphics
A* Path Finding Navigation
TOML based scene files
Fixed Frame Rate Animation
Clickable Agent Interaction
UI Inspection Tools
Real-time Simulations

For my last post of the year, I want to share how I built the latest Playground features; an interactive terminal and embedded scripting language.

Why a Terminal?#

In game development it is common for developers to build themselves a terminal inside game engines to enable hacking on the game while it’s running. It is useful to be able to inspect state or make changes on the fly.

In my regular development I tend to do everything via a terminal. I want to be able to do the same thing when I’m developing a simulation.

What does Success Look Like?#

Before I set out to code my embedded terminal I identified the characteristics I want in the terminal.

Simulation Context: It’s not an OS level terminal. The context for the terminal is a running simulation. So I don’t need to worry about integrating with the file system or enable running OS commands.
REPL: I want the terminal to behave like a REPL.
Multiline: The REPL behavior should support lightweight scripting. These scripts can span multiple lines.
Text Editing: I want basic text editing capabilities. The arrow keys should move the cursor around.
C-Style Scripting: For scripting I want a dynamically typed lightweight language that enables branching, looping, and declaring functions.
Some Convenience: Clicking the up arrow should let me select and run previous commands.

Terminal Challenges#

A challenge of writing good software is determining how good is good enough. Creating the terminal could easily become a rabbit hole that surpasses the rest of the project. To avoid this, I made the decision early on that a naive implementation would be the way to go. I’m not going to try to create a highly performant terminal that can handle huge files or adhere to a standard like POSIX.

Terminal Class Diagram — **Naive implementation of an Interactive Terminal** *(Click to Enlarge)*

The class structure represented in the image above is a straightforward way to create a bare bones terminal. The primary responsibilities of handling user input, in-memory storage, rendering, processing text and interpreting the user prompts are all separated into their own respective classes.

The AgentTerminal class is the top level class. It enables embedding a terminal in a simulation. It’s role in life is to glue the other pieces together.

The TerminalDisplay class handles all rendering. Rendering is done by making multiple graphical calls with the DearPyGUI API to the graphics pipeline.

CommandLinePrompt and TerminalBuffer both handle aspects of the user input. The CommandLinePrompt is narrowly focused on translating user activity to meaningful action. When a user types a key on their keyboard the operating system fires an event with a keycode. CommandLinePrompt itself delegates the responsibility of interpreting how to handle the various key combinations to the KeyInterpreter class.

The KeyInterpreter class is the only class related to the Terminal that I’ve refactored beyond its initial naive implementation. This is due to the variety of possible key combinations and keyboard configurations. For example, just looking at Macs, a computer keyboard can have between 78 and 109 keys. Users may issue composite keyboard commands by combining a key with shift, control, alt, or by having the caps lock on.

My first attempt at coding a the KeyInterpreter was just a series of Python match statements. This works but is a sprawling mess. I refactored it to leverage the Chain of Responsibility pattern.

This pattern is a bit of mouthful but it’s really simple to build. It starts by
creating an abstract class to define a generic handler for dealing with keyboard events.

My handler has a match function that determines if the handler is up for processing a specific key code and a handle function that does the actual work.

from abc import ABC, abstractmethod

# Create a type alias for KeyCode for readability.
KeyCode = int

class KeyHandler(ABC):
  @abstractmethod
  def match(self, key_code: KeyCode) -> bool:
    """Determine if the handler is a match for the key code."""

  @abstractmethod
  def handle(self, key_code: KeyCode) -> str | None:
    """Processes the key code and returns the corresponding string value."""

Following the pattern, we then deconstruct all the different typing use cases into their own discrete handler class. Here is the handler class for dealing with the space bar.

from agents_playground.terminal.keyboard.key_handler import KeyHandler
from agents_playground.terminal.keyboard.types import KeyCode

class SpaceKeyHandler(KeyHandler):
  def __init__(self) -> None:
    super().__init__()

  def match(self, key_code: KeyCode) -> bool:
    """Determine if the handler is a match for the key code."""
    return key_code == 32

  def handle(self, key_code: KeyCode) -> str | None:
    """Processes the key code and returns the corresponding string value."""
    return ' '

When a keyboard event is fired by the OS, the KeyInterpreter just needs to loop through each of the handlers and run their individual match function. The first handler that evaluates to true is used to process the event.

from more_itertools import first_true

class KeyInterpreter:
  def __init__(self) -> None:
    # Initialize all the key handlers.
    self.key_handlers = [
      SpaceKeyHandler(),
      ArrowKeyHandler(),
      ControlFlowKeyHandler(),
      AlphaKeyHandler(),
      NumericKeyHandler(),
      SymbolKeyHandler()
    ]
    # Reserve a special handler for when none of the other handlers are a match.
    self._unknown_key_handler = UnknownKeyHandler()

  def key_to_char(self, key_code: KeyCode) -> str | None:
    """Find the appropriate action to take based on the provided key code."""
    handler = first_true(self.key_handlers,
      default=self._unknown_key_handler,
      pred=lambda h: h.match(key_code))
    return handler.handle(key_code)

This approach is a tad slower than just having a giant match statement but it’s not noticeable when typing and is easier to extend and maintain. When I eventually port the Agent Playground to Windows I can imagine organizing handlers into OS specific groups. Something like the below snippet.

self.key_handlers = [
  WindowsKeyHandler(),
  MacOSKeyHandlers()
]

The KeyInterpreter informs the CommandLinePrompt what it thinks the user intended with the keyboard input signals. The CommandLinePrompt then determines what action the terminal should take. Most actions result in a manipulation of the in-memory representation of what is active in the terminal. This is managed by the TerminalBuffer.

The TerminalBuffer is the heart of the terminal. It’s the data structure that is responsible for maintaining both the active set of text the user is working on and a history of all the commands that have been run.

# Methods not shown for brevity.
class TerminalBuffer():
  """Displays the previous commands and their output."""
  def __init__(self) -> None:
    self._scroll_back_buffer: Deque[TerminalBufferContent]  = deque([], maxlen=SCROLL_BACK_BUFFER_MAX_LENGTH)
    self._history_buffer: Deque[TerminalBufferContent]      = deque([], maxlen=HISTORY_BUFFER_MAX_LENGTH)
    """
    The active prompt is the text the user is actively working with. To support
    multiple lines, a list of strings is use. In which each item in the list
    represents a line of text in the terminal.
    """
    self._active_prompt: List[str]    = ['']
    self._cursor_horizontal_position  = Counter(start = 0, min_value = 0, increment_step=1,decrement_step=1)
    self._cursor_vertical_position    = Counter(start = 0, min_value = 0, increment_step=1,decrement_step=1)

The way the buffer works is simple. When a user is working on a script it is stored in the active_prompt field. Each line of text becomes a string in the list. Two counters are used to track where in the list of text the active prompt is.

When scripts are run their output is written to the scroll_back_buffer field. All commands that are executed are written to the history_buffer field to enable easily re-running previous commands. The Python deque data structure is used for both buffers to enable rolling off old commands.

The simplistic design of the TerminalBuffer powered the rapid creation of an interactive terminal but also severally limits what can be done with it. If I decide to expand the terminal I will need to migrate to a more robust data structure.

Scripting Pipeline#

Once I had a minimally viable terminal working I switched gears and started building a simplistic scripting language. My goals for this language is that it should have a similar syntax to C, enable defining reusable functions, provide lexical scoping, and have native functions.

I considered just embedding an existing language like Lua. But that seemed a bit excessive. I want to be able to rapidly interact with a running simulation. Not script out complex behavior. Also, I’m trying to keep the 3rd party dependencies to an absolute minimum.

There are different ways to build a language. After a round of research I decided to forego using any kind of language building kit such as Antlr or TextX and write an interpreter by hand.

Now, that sounds rather intimidating, but it’s really not. I think folks tend to get rather mystical when they talk about writing compilers, interpreters, and virtual machines but the truth is code is code. It doesn’t matter if you’re writing a React component, a compiler, or an operating system. It’s all just code.

The same rules apply. You get familiar with your problem domain. You research the existing solutions. You select a strategy, build, and then evaluate.

While I was doing my initial research Martin Fowler’s Domain-specific Languages book was really helpful. It provides a good overview of inherent challenges and the existing solutions.

The Challenge of creating an embedded language can be simplified by breaking it down into a series of smaller problems.

Raw text needs to be converted into meaningful metadata. This called scanning or lexing. A lexer is responsible for transforming a string of characters and converting it into a list of tokens.
Once we have a list of tokens, the common strategy is to organize the tokens into a tree data structure called an abstract syntax tree (AST). The AST can then be traversed to analyze, optimize, and run the code. A parser is responsible for building the AST structure in memory. There are a variety of parsing strategies out there. For my needs I choose to leverage the recursive descent pattern.
Once the AST is in memory an interpreter can walk the tree and evaluate the code. The classic visitor pattern makes this rather straightforward to implement.

Putting the 3 major steps together results in a scripting pipeline.

Scripting Pipeline Diagram — **Scripting Pipeline** *(Click to Enlarge)*

Once I landed on the strategy of using a recursive descent parser I stumbled upon Robert Nystrom’s excellent Crafting Interpreters book. Nystrom does an excellent job walking through the ins and outs of creating an interpreter using this technique. The book is great and saved me a lot of time.

Roadmap#

And just like that, I’ve got an embedded terminal and scripting language in the Agent’s Playground. If you’re interested in more of the details of how I built these features you can checkout the source code.

Going forward I intend to expand the scripting capabilities and the terminal as I need to. I’ll probably add syntactic sugar and more editing capabilities. Structurally, I’m interested in replacing the Terminal Buffer with a piece table and replacing the interpreter with a PEG parser. But those are all problems for future Sam.

Until next time…