How to write a Terminal Multiplexer with Rust, Async, and Actors - Part 2

Over the past year, I've been working on a new terminal multiplexer called tab. Tab is a navigator, designed to assist in rapid context switches between persistent, configurable sessions. In Part 1, I explained how I developed the product design and direction for the project. In this post, I'll talk about the technology that made tab possible.

What should it solve?

When I started working on tab, I wanted to solve a problem that software engineers deal with every day. We use the terminal for high-speed knowledge recall (git/grep/fzf/etc), and as a launchpad for other tools (vim/nano/git/github/atom/vscode). We run lots of local servers - likely a JS app and some backend apps and databases. We need to quickly move between repositories, finding information, writing code, watching logs, and pushing PRs.

Even in smaller companies, we need to work on 5-10 repositories. Each repository might need env, PATH entries, a specific shell, etc.

Tab is a terminal multiplexer that helps us deal with this complexity. It organizes a library of persistent sessions, and lets you switch to any session in just a few keystrokes. (Today, tab provides minimalist YAML configs, a built-in fuzzy finder for quick switches, and dynamic command-line autocomplete.)

Getting started.

I had a sense of this goal (though I couldn't explain it as well as I can now). And I realized that there was an opportunity to build a tool that I would use every day, dozens or hundreds of times. And if it was good, hopefully other people would use it too.

But there was a real obstacle. Terminal multiplexers are large projects. GNU Screen is 39k lines of C. Tmux is about 72k lines of C.

Writing that much code on a side project is not possible. I had to find a way to get creative, and become ultra-productive during my limited spare time. I decided on a strategy: rethink the design, use modern tools, and find the right software architecture.

Rethinking the design

Tmux works by creating a virtual grid. Tmux implements its own scrollback buffer, scroll mode, select mode, etc. because the grid is implemented by the multiplexer. Tmux can translate this virtual grid into screen coordinates, which allows split panes and windows, and lots of cool features.

One of the things I wanted to do differently was support the native scrolling provided by your terminal emulator. Alacritty scrolling is so smooth, and it provides search as well. So I re-thought the basic function of the multiplexer. Take a very light approach to filtering data. Simply forward stdin, and reply with stdout. Try not to rewrite stdout.

The second feature I wanted was performance. To be successful as a tool that helps you frequently context switch, it needs to be fast when doing so. I had to find an architecture that naturally produces very high performance and low latency code.

Rust & Async

Rust is a language that I've been using for a while. I've used it for a few side projects, and I've used it for graphics programming and machine learning.

Rust is quite good at low-level, performance-sensitive tasks, and it's good at high-level tasks as well (composition, abstraction, etc). I knew I'd need great performance and likely C interop, and I needed to be very productive as an engineer. So I started with Rust.

Rust has an interesting feature called async. Async programs in Rust look a lot like Javascript:

async fn database_size(db: &Database) -> usize {
    let table_1_size = db.table("Table1").await.count().await;
    let table_2_size = db.table("Table2").await.count().await;
    table_1_size + table_2_size
}

When this function is called, it can be suspended at any await point. Contacting the database to get a handle to Table1 might require network activity, and Rust can suspend and resume the task with almost no overhead and a fairly minimal memory cost (roughly the cost of the variables that span the await point).
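To make that memory cost concrete, here is a minimal sketch (not taken from tab) that compares the size of two futures. The names small_state, large_state, and future_sizes are hypothetical. The only difference between the two async functions is how much data stays alive across the await point.

```rust
use std::future::ready;
use std::mem::size_of_val;

// Keeps a single byte alive across the await point.
async fn small_state(seed: u8) -> u8 {
    let value = seed;
    ready(()).await;
    value
}

// Keeps a 1 KiB buffer alive across the await point,
// so the buffer has to be stored inside the future itself.
async fn large_state(seed: u8) -> u8 {
    let buffer = [seed; 1024];
    ready(()).await;
    buffer[0]
}

// Futures are lazy: calling an async fn only builds the state machine,
// so its size can be measured without running it.
fn future_sizes() -> (usize, usize) {
    (
        size_of_val(&small_state(1)),
        size_of_val(&large_state(1)),
    )
}
```

The second future should come out at least 1024 bytes large, because the buffer must be saved while the task is suspended. The first only needs room for a few bytes of state.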

What I learned is that it can be used to write extremely low latency services. The TLDR reason is that async Rust programs allow very fine-grained concurrency. You can spawn a large number of tasks that are idle, and execute small chunks of work when awoken by user activity. I'll explain more later on.

Here is a bit of what that async code initially looked like. It transmitted and received over a websocket. And it had some very odd behavior with \r\n that was necessary at the time.

async fn send_loop(
    mut tx: impl Sink<Request, Error = anyhow::Error> + Unpin,
) -> anyhow::Result<()> {
    tx.send(Request::CreateTab(CreateTabMetadata {
        name: "foo".to_string(),
        dimensions: size()?,
    }))
    .await?;

    forward_stdin(tx).await?;

    Ok(())
}

async fn forward_stdin(
    mut tx: impl Sink<Request, Error = anyhow::Error> + Unpin,
) -> anyhow::Result<()> {
    let mut stdin = tokio::io::stdin();
    let mut buffer = vec![0u8; 512];

    while let Ok(read) = stdin.read(buffer.as_mut_slice()).await {
        if read == 0 {
            // A zero-length read means stdin reached EOF; stop instead of spinning.
            break;
        }

        let mut buf = vec![0; read];
        buf.copy_from_slice(&buffer[0..read]);

        let chunk = StdinChunk { data: buf };
        // TODO: use selected tab
        tx.send(Request::Stdin(TabId(0), chunk)).await?;
    }

    Ok(())
}

async fn recv_loop(
    mut rx: impl Stream<Item = impl Future<Output = anyhow::Result<Response>>> + Unpin,
) -> anyhow::Result<()> {
    info!("Waiting on messages...");

    let mut stdout = std::io::stdout();
    let mut stderr = std::io::stderr();
    enable_raw_mode().expect("failed to enable raw mode");

    while let Some(message) = rx.next().await {
        let message = message.await?;

        match message {
            Response::Chunk(tab_id, chunk) => match chunk.channel {
                ChunkType::Stdout => {
                    let mut index = 0;
                    for line in chunk.data.split(|e| *e == b'\n') {
                        // write_all retries partial writes; write may stop short.
                        stdout.write_all(line)?;

                        index += line.len();
                        if index < chunk.data.len() {
                            let next = chunk.data[index];

                            if next == b'\n' {
                                stdout.write_all(b"\r\n")?;
                                index += 1;
                            }
                        }
                    }

                    stdout.flush()?;
                }
                ChunkType::Stderr => {
                    stderr.write_all(chunk.data.as_slice())?;
                }
            },
            // Not handled yet; the underscore prefix silences unused-variable warnings.
            Response::TabUpdate(_tab) => {}
            Response::TabList(_tabs) => {}
        }
    }

    Ok(())
}
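
The newline handling in recv_loop can be isolated into a small helper. This is a sketch rather than tab's code, and lf_to_crlf is a hypothetical name. A likely reason the translation was needed: raw mode usually turns off the terminal's output post-processing, so a bare \n moves the cursor down a line without returning it to the first column.

```rust
// Replace every line feed with a carriage return + line feed pair.
// Like the loop above, this does not check for an existing \r,
// so an input of "\r\n" becomes "\r\r\n".
fn lf_to_crlf(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len() + 8);
    for &byte in data {
        if byte == b'\n' {
            out.extend_from_slice(b"\r\n");
        } else {
            out.push(byte);
        }
    }
    out
}
```

With a helper like this, the Stdout branch would reduce to a single write_all of the translated chunk followed by a flush.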

Actors & Messages

Async allows a lot of concurrency. But not everything can be concurrent. Object-oriented concurrent applications are full of locks and transactions for a reason: concurrency bugs are brutal. They take a lot of time to fix, and they are hard to test/verify. So how can you write an extremely concurrent application with as few of these concurrency bugs as possible?

One of the best approaches is the Actor model. The idea is to represent state in an Actor type, which sequentially processes a series of messages. For each message, the actor can mutate its own state, spawn other actors, or send messages to other actors.

When you use this pattern, you can express lock-free synchronization using your actor implementations. You decide what state is held in each actor, and what message types it can process.

Usually, actor frameworks come with some boilerplate. You need to implement Actor, Handler, System, etc. I found a way to remove a lot of this, and solve some tough dependency injection problems as well (see lifeline-rs if you are curious). The trick is that you can 'connect' actors when the app initializes, using async channels. Then you spawn async tasks which send and receive messages. Actor state can be kept as a local variable within the async task.
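
Here is a minimal sketch of that pattern. To keep it dependency-free it swaps in OS threads and std::sync::mpsc channels for async tasks and async channels, so it is not how tab or lifeline-rs implement it. CounterMsg, spawn_counter, and run_counter are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// The message types this actor can process (hypothetical).
enum CounterMsg {
    Add(u64),
    Report,
}

// Spawn the actor. Its state is a local variable, and messages are
// handled one at a time, so no lock is needed.
fn spawn_counter(rx: Receiver<CounterMsg>, tx_total: Sender<u64>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut total = 0u64;
        // recv() returns Err once every sender is dropped, which ends the actor.
        while let Ok(msg) = rx.recv() {
            match msg {
                CounterMsg::Add(n) => total += n,
                CounterMsg::Report => {
                    if tx_total.send(total).is_err() {
                        break;
                    }
                }
            }
        }
    })
}

// 'Connect' the actor at startup, send it some messages, and read the reply.
fn run_counter(values: &[u64]) -> u64 {
    let (tx_msg, rx_msg) = channel();
    let (tx_total, rx_total) = channel();
    let actor = spawn_counter(rx_msg, tx_total);

    for &value in values {
        tx_msg.send(CounterMsg::Add(value)).expect("actor stopped early");
    }
    tx_msg.send(CounterMsg::Report).expect("actor stopped early");
    let total = rx_total.recv().expect("actor did not report");

    drop(tx_msg); // closing the channel lets the actor's loop finish
    actor.join().expect("actor panicked");
    total
}
```

The same shape carries over to async: the thread becomes a spawned task, recv() becomes recv().await, and the state stays a local variable inside the task.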

Here is an example from more recent tab-rs code. This actor monitors the current 'selected tab' state, as the tab client can connect and disconnect from sessions during its lifetime. It uses the current websocket connection with the daemon to subscribe and unsubscribe from stdout packets. bus is a lifeline bus, which connects this task to other ones.

let mut rx = bus.rx::<TabState>()?;
let mut tx_websocket = bus.tx::<Request>()?;

Self::try_task("watch_tab", async move {
    let mut last_state = TabState::None;

    while let Some(state) = rx.recv().await {
        if let TabState::Selected(id) = state {
            tx_websocket.send(Request::Subscribe(id)).await?;

            let terminal_size = terminal_size()?;
            tx_websocket
                .send(Request::ResizeTab(id, terminal_size))
                .await?;
        } else if let (TabState::Selected(prev_id), &TabState::None) =
            (last_state, &state)
        {
            debug!(
                "new state is none, unsubscribing from previous tab {}",
                prev_id
            );
            tx_websocket.send(Request::Unsubscribe(prev_id)).await?;
        }

        last_state = state;
    }

    Ok(())
})

Refactoring Actors

I was a bit surprised by how nice it is to refactor actor implementations. First, you can try to reduce the dependence on local state in the message handlers. If you can remove the dependence entirely, that handler can be moved out to its own actor implementation.

Second, you can try to break up the local state into chunks. Then, split the actor into pieces which maintain each chunk. Hopefully the complexity of the messages will be reduced as well.

You can do this for actors which are large, or you can do it for actors that are slow. It's the same process. (Here is an example of this kind of refactoring: before and after)
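
As a sketch of the second kind of split, imagine a single actor that held two unrelated pieces of state: a count of forwarded bytes and the last known terminal size. The code below shows one possible 'after' shape, where each chunk of state gets its own actor and its own, simpler message type. It uses threads and std::sync::mpsc to stay dependency-free, and all names are hypothetical rather than taken from tab.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread::{self, JoinHandle};

// Chunk 1 of the original state: total bytes forwarded.
fn spawn_byte_counter(rx: Receiver<Vec<u8>>) -> JoinHandle<usize> {
    thread::spawn(move || {
        let mut bytes = 0;
        while let Ok(chunk) = rx.recv() {
            bytes += chunk.len();
        }
        bytes
    })
}

// Chunk 2 of the original state: last known terminal size (cols, rows).
fn spawn_size_tracker(rx: Receiver<(u16, u16)>) -> JoinHandle<Option<(u16, u16)>> {
    thread::spawn(move || {
        let mut size = None;
        while let Ok(new_size) = rx.recv() {
            size = Some(new_size);
        }
        size
    })
}

// Wire up both actors, send each its own messages, and collect the final state.
fn run_split_actors() -> (usize, Option<(u16, u16)>) {
    let (tx_bytes, rx_bytes) = channel();
    let (tx_size, rx_size) = channel();
    let counter = spawn_byte_counter(rx_bytes);
    let tracker = spawn_size_tracker(rx_size);

    tx_bytes.send(b"ls\n".to_vec()).expect("counter stopped");
    tx_size.send((80, 24)).expect("tracker stopped");
    tx_bytes.send(b"pwd\n".to_vec()).expect("counter stopped");
    tx_size.send((120, 40)).expect("tracker stopped");

    // Dropping the senders closes the channels, which ends both actors.
    drop(tx_bytes);
    drop(tx_size);
    (
        counter.join().expect("counter panicked"),
        tracker.join().expect("tracker panicked"),
    )
}
```

Neither actor needs an enum anymore, and a slow handler in one no longer delays messages for the other.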

This really works in practice. The slowest action in tab is opening a fresh session - launching the command-line client, the daemon, and the backend PTY process. It takes about 125ms. This didn't require a massive effort, just a few hours of refactoring actors that were on the hot path.

GitHub Actions & Testing

Early on, I set up GitHub Actions to automatically run the tests, format checks, lints, etc. It was worth it.

When you work on a side project, you often have non-contiguous time to spend on the project. A few minutes when people reach out on GitHub, and maybe an hour or two when you are writing code on the weekend.

Automated tests and CI checks allow you to step away from the project and go to work, or eat dinner. You can come back later, self-review your code, see that the checks are good, and merge your changes.

The config is here if you are looking at setting up Rust CI checks.

Actors, Async, Tokio, and Rust

Tab is based on these technologies. Actors for concurrency and synchronization, Tokio & async for performance, and Rust as the great language that powers it all.

Usually, applications that have to maintain lots of local state have many issues (I like to joke that state is the natural predator of the application developer :D). But I don't see these kinds of bugs in tab. All the bugs have been environmental. Installation issues (so many), ANSI escape sequence issues, shell autocomplete issues, etc. It seems that the actor model allows simple state transitions that are very reliable when put into production.

It also is effective for a broad range of tasks. It works for terminal UI code in the fuzzy finder, and it works for state management code in the daemon.

As with everything in software, there is a trade-off. Concurrency bugs are less common, but when they occur they are still brutal. You can be tempted to use all that concurrency power, but your application really does need synchronization. It requires thought, and balance.

What's next?

My current focus is on community support and bug-fixes. I added a lot of features in v0.5.0, and I'm working on making sure the current functionality is stable and works on a wide range of operating systems, shells, TUI applications, etc before moving to more feature work.

I also want to send a big thank you to everyone who took the time to install tab, and helped make it better with code contributions, bug reports, and comments on new features! All these contributions really have helped to push the project forward.