
VikingVision

VikingVision is a high-performance vision processing library for FRC teams, built in Rust for speed and reliability.

Why VikingVision?

FRC vision processing needs to be fast, reliable, and run on resource-constrained hardware. VikingVision provides:

  • Performance: Parallel pipeline processing with minimal overhead
  • Ease of use: Configure pipelines in TOML (and soon a GUI!) without writing code
  • AprilTag support: Built-in detection of the AprilTags used on FRC fields
  • NetworkTables integration: Easy communication with robot code

Also, this is done in Rust, so it’s 🔥blazingly fast🔥 or whatever.

Why not OpenCV?

Using OpenCV from Python leads to performance bottlenecks, and unless you’re working with the latest Python version, there’s almost no capability for parallel processing. In any language, writing and maintaining complex programs for requirements that change every year is a significant overhead, especially for teams without many programmers. With VikingVision, that work goes away: only a small configuration file needs to be maintained.

OpenCV’s spotty documentation and lack of safety (it caused a memory error in a Java program once) were enough for me (the person writing this documentation, hi!!!) to want to avoid it in favor of lower-level system bindings and reimplementations of the needed algorithms.

Why not Limelight?

It’s proprietary and expensive.

Why not PhotonVision?

PhotonVision has way more features, but VikingVision’s smaller feature set is suitable for a lot of use cases, and it should run faster for those. The total binary size of all of the artifacts (x64 Linux, stripped, release build) comes in at about 42 MB, compared to PhotonVision’s 102 MB JAR. Also, having our own vision system makes us look cool to the judges.

What can you do with it?

  • Detect AprilTags for autonomous alignment
  • Build vision pipelines that can track game pieces by color and estimate their positions
  • Run multiple camera feeds simultaneously

Example

Want to detect all of the AprilTags that a camera can see and publish them to NetworkTables? A basic config looks like this:

[ntable]
team = 4121
identity = "vv-client"

[camera.tag-cam]
type = "v4l"
path = "/dev/video0"
width = 640
height = 480
fourcc = "YUYV"
fps = 30
outputs = ["fps", "tags"]

[component.fps]
type = "fps"

[component.tags]
type = "apriltag"
family = "tag36h11"

[component.tags-unpack]
type = "unpack"
input = "tags"

[component.nt-fps]
type = "ntable"
prefix = "%N/fps"
input.pretty = "fps.pretty"
input.fps = "fps.fps"
input.min = "fps.min"
input.max = "fps.max"

[component.nt-tags]
type = "ntable"
prefix = "%N/tags"
input.found = "tags.found"
input.ids = "tags-unpack.id"

Then running it is as simple as vv-cli config.toml.

Installation

Right now, the only option for installation is to build from source. This requires Rust to be installed (which is typically done using rustup), along with a C compiler (GCC or Clang on Linux, one of which should already be installed by default) to build with apriltag support.

Quick installation

With all of the prerequisites installed, you can run cargo install --git https://github.com/FRC-4121/VikingVision <bins>, where <bins> is the list of binary targets to install, passed as separate arguments. For example, to install just the CLI and playground, you’d run cargo install --git https://github.com/FRC-4121/VikingVision vv-cli vv-pg. The available targets are:

  • vv-cli - The command-line interface for the library, which handles loading and running pipelines. It doesn’t have any fancy features, but it can run without a desktop environment and uses minimal resources.
  • vv-gui - Currently a stub that just prints “Hello, World!”. Development is ongoing, and hopefully it’ll be usable by March 2026.
  • vv-pg - A “playground” environment, made for testing some basic image processing tools. It’s not super fancy, but it aims to be an alternative to messing around with OpenCV in Python, built on our libraries instead.

Building from source

If you want more control over the build, or want to more easily maintain an up-to-date nightly version, you can clone the repository with git clone --recursive https://github.com/FRC-4121/VikingVision. The source for the apriltag code is in a Git submodule, so you have to do a recursive clone! From there, you can use Cargo to build and run. It’s strongly recommended that you use release builds for production; they run about ten times faster!

Features

Various parts of this project can be conditionally enabled or disabled. The default features can be disabled by passing the --no-default-features flag to the Cargo commands, and then re-enabled using --features followed by a comma-separated list of features. For example, to build on Windows without V4L, you could append --no-default-features --features apriltag,ntable to your commands.

  • apriltag (enabled by default) - Allow detecting AprilTags. This requires a C compiler. See apriltag-sys’s README for more information on how to build.
  • v4l (enabled by default) - Allow video capture through V4L2 APIs. This is the only kind of camera that’s supported, and only works on Linux (V4L stands for “video for Linux”, after all). To build on other systems, this feature must be disabled.
  • ntable (enabled by default) - Enable an NT 4.1 client. This is implemented in Rust, but pulls in enough dependencies through its use of async that it makes sense to have it conditionally enabled.
  • debug-gui - Enable window creation for debugging images. This pulls in winit, which is a pretty large dependency for window creation.
  • debug-tools - Right now, only deadlock detection. This severely impacts performance, and is only meant to be enabled for debugging deadlocks in the code.

Running Pipelines

Pipelines are the most powerful feature of VikingVision, and they enable easily composable and parallel processing through something similar to the actor model. To run a pipeline, create a pipeline config file (more information about that can be found in the configuration section) and run it with vv-cli path/to/pipeline/config.toml.

Filtering cameras

Whether for testing, different setups, or even sharing the same configuration across multiple processes, it can be useful to define cameras without intending to use them all. They can all be defined in one file, and the --filter flag selects only the cameras matching a given regular expression.

Logging

VikingVision uses tracing to emit structured logs. Events happen within spans, which give additional context about the state of the program when an event happened. All of this information is provided in the log files, which can be opened as plain text.

Logging to a file

By default, logs are sent to the standard error stream. In addition to this, they can be saved to an output file, passed as a second argument to the vv-cli command. This argument supports the percent-escape sequences that strftime uses, so you can pass logs/%Y%m%d_%H%M%S.log as the second parameter to have a log file created with the current time and date.

Filtering logs

Logs can be filtered with the VV_LOG environment variable. The variable is parsed as a comma-separated sequence of directives, with each directive either being of the form pattern=level, which matches target locations against a regular expression, or just a level to set a default (for example, VV_LOG=debug enables debug-level logging everywhere). When unset, the logs default to only allowing info-level and above logs through. When reporting bugs, please upload a log with debug level so we can see all of the information!

Using the Playground

The playground serves as a “test area” for image processing tools. It’s not as powerful as the full Rust API or even the pipeline system, but it should expose all of the basic image processing steps that can be done through an easy-to-use UI.

the playground in action

Available Cameras

This shows a live-updating view of the available V4L cameras. Adding a camera will create a new floating window for it.

Utilities

In addition to showing actual cameras, some additional utilities are available:

Text Buffers

Text buffers act as small scratchpads for taking notes directly in the app. They’re autosaved, too!

Monochrome Cameras

Probably the fastest way to test the performance of various processing steps is a still image of a single color. You can select a color and resize the image freely. This frame is treated like a camera.

Static Images

While monochrome cameras are good for simple performance testing, a static image can show the actual effects of a process on an image. It also doesn’t require a physical camera to be plugged in and gives easily reproducible results. Currently, only JPEG and PNG images are supported, although more could be supported in the future if the need arises.

Camera Controls

Each camera runs on its own thread, which allows them to run more-or-less independently of each other and show the performance of each process (although many image processing steps use a shared thread pool, so performance will fall somewhat if multiple cameras are used). This thread can be independently paused and resumed, and the camera can be closed altogether.

In addition, a framerate counter is shown. This shows the minimum, maximum, and average framerate over the last ten seconds.

Derived Frames

A derived frame takes a previous frame and applies some basic transformation to it. Multiple steps can be performed, and the time required to do all of them is shown in the framerate counter. Note that this is different from the behavior of the pipeline runner, in which processing is done asynchronously to further improve performance.

Configuration Files

Configuration files describe the overall structure of your program—what cameras should be loaded, what components to use, and how they connect to each other. The rest of the documentation details everything that can go in the file, along with what fields it’s looking for.

In order for a configuration to be useful, there should be at least one camera and at least one component. The cameras are the inputs to the program, and components handle processing and presentation.

In addition to cameras and components, configuration files can have run parameters, NetworkTables, and vision debugging configured. More information about each of these is available in their respective sections.

Run Configuration

Run configuration is optional, and goes under the [run] table in the config. It controls thread counts and run-related parameters.

run.max_running

This is the maximum number of concurrently running pipelines. If a new frame comes in while this many frames are already being processed, the new frame will be dropped.

run.num_threads

This controls the number of threads to be used in the thread pool. It can be overridden by passing --threads N to the CLI. If neither of these is set, rayon falls back to the RAYON_NUM_THREADS environment variable and then to the number of logical CPUs.
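
As a sketch, a [run] table using both of these keys might look like this (the numbers are placeholders, not recommendations):

[run]
max_running = 4   # drop incoming frames if four pipeline runs are already in flight
num_threads = 8   # size of the shared thread pool; overridden by --threads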

Vision Debugging

It’s fairly common to just want to see the result of some vision processing, but normally, actually seeing the results live would be a lot of work. You’d need to set up an event loop, create windows, and handle marshalling frames to your main thread (because windowing needs to run on the main thread), and that’s a lot of effort just to show a simple window with an intermediate result, isn’t it?

To solve this problem, there’s a vision debugging tool, which has thread-safe, cheaply cloneable senders, and a receiver that sits on the main thread and blocks it for the rest of the program. From the dataflow side, it’s even simpler: you just dump the frames as input to a component with the vision-debug type. Depending on configuration, this can display the windows or save them to a file.

Configuration file

To configure this through the file, the [debug] table can be used. It has the following keys:

debug.mode

This sets the global default debug mode, and it’s optional. Individual components can override it, and the default can also be set through the environment variables; if no mode is present anywhere, debugging is ignored.

debug.default_path

The default path to save videos to. This supports all of the strftime escapes, along with %i for a unique ID (32 hex characters) and %N for a pretty, human-readable name. The video will be saved as an MP4 video.

debug.default_title

The default window title for use when showing windows. This supports %i and %N like the default_path does.
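
Putting these together, a [debug] table might look like the sketch below. It assumes the mode names match the ones accepted by the vision-debug component (none, save, show); the path and title are placeholders.

[debug]
mode = "save"                                  # assumed to accept the vision-debug mode names
default_path = "debug/%N_%Y%m%d_%H%M%S.mp4"    # pipeline name plus a timestamp
default_title = "VikingVision - %N"            # used when showing windows instead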

Environment variables

The configuration file takes precedence over the environment variables, but the variables can be more convenient.

VV_DEBUG_MODE

Equivalent to the debug.mode configuration value, but also accepts uppercase values.

VV_DEBUG_SAVE_PATH

Equivalent to the debug.default_path configuration value.

VV_DEBUG_WINDOW_TITLE

Equivalent to the debug.default_title configuration value.

NetworkTables Configuration

In order to connect to NetworkTables, we need an address and an identity. The identity can be any URL-safe string, and the address tells the client where to find the server. Configuration goes under the [ntable] table, which must be present in order to publish values from components.

ntable.identity

This should be unique per client connecting to the server, and a URL-safe string.

ntable.host

The host can be explicitly specified through this field, in which case the client will try to connect to a server at this address. It’s an error to have both this field and ntable.team present.

ntable.team

Alternatively, a team number can be set, in which case the client will try to connect to 10.TE.AM.1, where TE.AM is derived from the team number (for example, team 4121 maps to 10.41.21.1). It’s an error to have both this field and ntable.host present.
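
For example (placeholder values), either of these is valid, but not both at once:

[ntable]
identity = "vv-client"
team = 4121            # connects to 10.41.21.1

# ...or, with an explicit server address instead of a team number:
# [ntable]
# identity = "vv-client"
# host = "10.41.21.2"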

Camera Overview

Cameras are the entry point for a program. When a configuration is run with the CLI, each camera gets its own thread to read from, which is crucial for performance since reading from a camera is a blocking operation. For each camera, a new pipeline run is created from its frame for each of the components specified in the camera’s outputs. Note that this is not ideal behavior and is subject to change: it’s prone to issues like more expensive pipelines being starved and components unexpectedly not running, so it will be fixed soon.

In the config file

Cameras are defined as named tables under the [camera] table. Each camera has a type field that specifies which kind of camera it is, and an outputs field that should be an array of strings specifying which components should receive its frames.

A camera config might look like this:

[camera.front] # we call this our front camera
type = "v4l" # we want to use the V4L2 backend
outputs = ["detect-tags"] # send out the frame to a component called detect-tags
width = 640 # V4L2-specific options
height = 320
fourcc = "YUYV"
path = "/dev/video0"

Additional camera features

In addition to the basic raw frame we can get from the backend, the cameras support some additional quality-of-life features.

Reloading and Retrying

If reading a frame fails, the camera is reloaded and the read is retried. This is done with exponential backoff, so if the camera has genuinely been lost, it doesn’t waste time retrying the connection. By reloading the camera, we can recover from a loose USB connection or dropped packets instead of losing the camera altogether.

FPS throttling

Especially with static cameras, frames can be produced significantly faster than they can be processed. FPS throttling sleeps if the real framerate exceeds the configured one, which frees up CPU time for other, more important things.

A maximum framerate can be set with the max_fps field for a camera.

Resizing

If the frame size isn’t desirable, it can be resized through the camera config itself. The resize.width and resize.height keys allow a new size to be set for the camera. Resizing uses nearest-neighbor scaling for simplicity.
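
A camera using both of these features might look like the sketch below (the camera mirrors the front camera from the earlier example; the throttling and resize values are placeholders):

[camera.front]
type = "v4l"
path = "/dev/video0"
width = 1280
height = 720
fourcc = "YUYV"
outputs = ["detect-tags"]
max_fps = 15          # sleep if frames come in faster than 15 FPS
resize.width = 640    # downscale with nearest-neighbor before sending frames out
resize.height = 360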

V4L2 Cameras

Requires the v4l feature to be enabled

V4L2 is the primary way that we read from physical cameras. It allows for cameras to be read by a path or index, where an index N corresponds to /dev/videoN. Note that due to how V4L2 works, typically, only even-numbered indices are actually readable as cameras, and odd-numbered ones should not be used.

Configuration

A V4L camera must have its source, frame shape, and FourCC set. All other configuration is optional.

Source

The source specifies where to find the capture device. This can be a path, passed as the path field, or an ordinal index, under the index field. As a placeholder, unknown = {} can be used to make the config file parse, but it will fail to load a camera.

Width and Height

The camera’s dimensions are specified in the width and height fields. If these don’t correspond to an actual resolution that the camera is capable of, there may be unexpected results.

FourCC

FourCC is the format in which data is sent over the camera. It should be a four-character string, typically uppercase letters and numbers. For most cameras, either YUYV or MJPG should be used. The following codes are recognized by VikingVision:

  • YUYV
  • RGB8
  • RGBA
  • MJPG

Overriding Pixel Formats

The FourCC codes we recognize cover the most common uses, but to support additional formats, VikingVision supports overriding the pixel format with the pixel_format field. This doesn’t need to be set if the code is already correctly recognized.

In addition, the decode_jpeg value can be used to specify that the input is JPEG data, like with the MJPG FourCC. If this is set, the output frames will always have the RGB format.

Exposure and Intervals

The camera’s exposure can be set with the exposure field. The values for this don’t seem to have any predefined meaning, but setting it to around 300 was good for Logitech cameras.

The camera’s framerate can also be set. This can either be set as an integer framerate with the fps field, or an interval fraction, which can be set with the interval.top and interval.bottom fields.
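
As a sketch, the extra V4L2 options might be set like this (300 is the value mentioned above for Logitech cameras; the rest are placeholders, and the interval fields are left as a comment since the exact fraction semantics aren’t spelled out here):

[camera.tag-cam]
type = "v4l"
path = "/dev/video0"
width = 640
height = 480
fourcc = "YUYV"
outputs = ["tags"]
exposure = 300        # no predefined meaning; ~300 worked well for Logitech cameras
fps = 30              # integer framerate...
# interval.top = ...  # ...or express the frame interval as a fraction instead
# interval.bottom = ...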

Frame Cameras

For testing purposes, rather than dealing with physical cameras, it can be more useful to show a single, static frame. These can easily run at hundreds or even thousands of frames per second because no actual work is done reading the frame.

Configuration

Frame cameras can be loaded either from a file or generated from a single, static color.

Loading from a Path

A frame camera can load an image from a path by using the path field. This is incompatible with the color field.

Single-color Images

A single-color camera can be specified with the color field. This can be done explicitly by passing color.format, a string containing the desired pixel format, and color.bytes, an array of integers to use as the bytes. Parsing a string given directly under the color field is not currently supported, so loading a camera with a color specified that way will fail.

The shape of the frame must be set with width and height fields.
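
A sketch of a single-color frame camera is below. Note that the type string for frame cameras isn’t stated on this page, so "frame" here is an assumption, and the values are placeholders; the color uses the explicit form described above.

[camera.test-frame]
type = "frame"              # assumed type name for a frame camera
width = 640
height = 480
color.format = "rgb"        # a recognized pixel format
color.bytes = [0, 255, 0]   # one byte per channel: solid green
outputs = ["filter"]

# ...or load a static image instead of a color (incompatible with color):
# path = "images/test.png"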

Components Overview

Components handle all of the actual processing in the pipelines. Each one handles a single step, and they all combine to form a pipeline that performs the computation and presents the results. These components lend themselves well to dataflow programming, which allows for parallelization with far more safety and clarity than traditional, imperative code.

Pipeline Rules

Not every representable graph is a valid pipeline. In order for a graph to successfully be compiled into a runner, all components in the graph need to have their inputs satisfied, and their inputs must be unambiguously broadcast. If these requirements aren’t met, the pipeline will fail to start.

Inputs and Outputs

Components communicate through channels, somewhat like how functions have parameters and return values. A component defines what inputs it requires and which channels it’s capable of outputting on. It’s an error to try to connect a component’s input to a channel that the upstream component doesn’t output on.

A component can either take a single, primary input, or multiple (including zero or one) named inputs. If a component takes named inputs, whether it can also accept additional inputs depends on the component type. The inputs each component takes, along with their uses, are documented on each component’s page. A component is guaranteed to only be run if all of its inputs are available, including optional ones (the distinction between required and optional is only present in the graph, not the compiled runner).

Broadcasting

Part of the added flexibility of channels is that multiple outputs can be sent on a single channel. Any components that depend on this channel will be run multiple times, once with every value sent. For components with multiple inputs, inputs that haven’t branched will be copied across multiple runs. This functionality is called broadcasting, and it allows components to operate on individual elements in a collection. Broadcasting must be unambiguous; if a component’s inputs could come from two different multi-output components, the pipeline will be rejected.

Aggregation

Broadcasting is a powerful feature, and its dual is aggregation. Aggregating components get access to all of the inputs (either relative to their least split input or through the whole pipeline run). Because they take all of the inputs and only run once for the group, aggregating components are considered to be at their least split input when checking the pipeline graph for multi-output components.

Configuration

All components in the configuration must have a type field, with the value determining the type of component. The rest of the configuration for the components can be found on their respective pages.

In addition, components need to have their inputs set. For components that take a primary input, there should be a single input field containing the name of the component to take input from (if receiving from its default output channel), or the component name and channel name separated with a ., like component-name.channel-name, to use a non-default output. For components that take named inputs, the configuration should have an input table with similar strings as the values.

Each component can have at most one input fed from multiple sources. For this, rather than a single string, an array of strings should be used; the outputs of each of the specified components will be sent on that channel. For broadcasting analysis, it must be valid for a component at that position to take all of the specified inputs separately (there must be no ambiguous broadcasting), and a branch point is inserted before the input.
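
Putting these forms together, here’s a sketch using the components from the example at the start of the book (the last two consumers are hypothetical additions for illustration):

# A primary input from another component's default channel
[component.tags-unpack]
type = "unpack"
input = "tags"

# A primary input from a non-default channel, written as component.channel
[component.tag-count]
type = "debug"
input = "tags.found"

# Named inputs go in the input table, one entry per channel
[component.nt-tags]
type = "ntable"
prefix = "%N/tags"
input.found = "tags.found"
input.ids = "tags-unpack.id"

# An input fed from multiple sources uses an array of strings
[component.log-both]
type = "debug"
input = ["tags.found", "tags-unpack.id"]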

Available Components

The following components are available for use in pipelines:

Vision

These all focus on manipulating an image or detecting features in it.

Aggregation

These combine repeated inputs into one, undoing some form of broadcasting.

Presentation

These focus on presenting results.

Utility

Miscellaneous components that don’t have a better categorization.

AprilTagComponent

requires the apriltag feature

Detects AprilTags on its input channel.

Inputs

Primary input (Buffer): a frame to process.

Outputs

  • Default channel (multiple, Detection): the tags detected
  • vec (single, Vec<Detection>): the tags detected, collected into a vector
  • found (single, usize): the number of tags detected

Configuration

Appears in configuration files with type = "apriltag".

Additional fields:

  • family (string): the tag family to use, conflicts with families
  • families (string array): an array of tag families to use, conflicts with family
  • max_threads (integer, optional): the maximum number of threads to use for detection
  • sigma (float, optional): the quad_sigma parameter for the detector
  • decimate (float, optional): the quad_decimate parameter for the detector

For FRC, the tag36h11 family should be used.
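
A slightly fuller version of the component from the opening example, with the optional tuning fields shown (the numbers are illustrative, not recommendations):

[component.tags]
type = "apriltag"
family = "tag36h11"   # the family used by FRC
max_threads = 2       # cap the detector's worker threads
decimate = 2.0        # quad_decimate: trades accuracy for speed
sigma = 0.0           # quad_sigma: blur applied before detection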

BlobsComponent

Detects the blobs in an image. A blob is an 8-connected component of non-black pixels in an image, where black means 0 on every channel (this holds even in color spaces with multiple representations of black). Only the bounding rectangles and the number of pixels in each blob are detected.

Inputs

Primary (Buffer): the image to find blobs in. This should usually be a black/white image, like the results of a filter.

Outputs

  • Default channel (multiple, Blob): the blobs found
  • vec (single, Vec<Blob>): the blobs found, collected into a vector

Configuration

Appears in configuration files with type = "blobs".

Additional fields:

  • min-w: the minimum width of detected blobs
  • max-w: the maximum width of detected blobs
  • min-h: the minimum height of detected blobs
  • max-h: the maximum height of detected blobs
  • min-px: the minimum pixel count of detected blobs
  • max-px: the maximum pixel count of detected blobs
  • min-fill: the minimum fill ratio (pixels / (width × height)) of detected blobs
  • max-fill: the maximum fill ratio of detected blobs
  • min-aspect: the minimum aspect ratio (height / width) of detected blobs
  • max-aspect: the maximum aspect ratio of detected blobs

All fields are optional, and if unset, default to the most permissive values.
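
For example, a blob detector that only keeps reasonably large, mostly-filled blobs might look like this (the thresholds are placeholders, and the filter component it reads from is a hypothetical earlier step):

[component.blobs]
type = "blobs"
input = "filter"   # a black/white image, e.g. the output of a color filter
min-px = 100       # ignore tiny specks
min-fill = 0.5     # at least half of the bounding box must be filled
min-w = 10
min-h = 10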

BoxBlurComponent

Applies a box blur to an image.

Inputs

Primary input (Buffer): the image to blur.

Outputs

  • Primary channel (single, Buffer): the image, blurred.

Configuration

Appears in configuration files with type = "box-blur".

Additional fields:

  • width (odd, positive integer): the width of the blur window
  • height (odd, positive integer): the height of the blur window

CloneComponent

Clones the behavior of another component.

Inputs

Same as the cloned component.

Outputs

Same as the cloned component.

Configuration

Appears in configuration with type = "clone".

Additional fields:

  • name (string): the name of the component to clone

CollectVecComponent<T>

this component aggregates its inputs

Collects the results of its inputs (of a known type) into a vector.

Inputs

  • elem (T): the element to collect.
  • ref (any): a reference point. The value sent on this channel is ignored, but if it’s tied to a single-output channel of a component (like $finish), this will collect all of the values that came from the result of that component’s execution.

Outputs

  • Default channel (single, Vec<T>): the values submitted to elem, in an unspecified order.
  • sorted (single, Vec<T>): the values submitted to elem, in the order that they would’ve come in if we used single-threaded, depth-first execution.

Configuration

Appears in configuration with type = "collect-vec".

Additional fields:

  • inner (Type): the inner element type.

ColorSpaceComponent

Converts an image to a given color space.

Inputs

Primary input (Buffer): the image to transform.

Outputs

  • Primary channel (single, Buffer): the image, in the new color space.

Configuration

Appears in configuration files with type = "color-space".

Additional fields:

DebugComponent

Emits an info-level span with a debug representation of the data received.

Inputs

Primary input (any): the data to debug.

Outputs

None.

Configuration

Appears in configuration files with type = "debug".

Additional fields:

  • noisy (optional, boolean): if this is false, duplicate events from this component will be suppressed. Defaults to true.

DetectPoseComponent

requires the apriltag feature

Converts a Detection into a PoseEstimation.

Inputs

Primary input (Detection): a detected tag to process.

Outputs

  • Default channel (single, PoseEstimation): the estimated pose, along with its error
  • pose (single, Pose): the estimated pose
  • error (single, f64): the estimation’s error

Configuration

Appears in configuration files with type = "detect-pose".

Additional fields:

  • spec ("fixed" | "infer"): the specification of detection parameters
  • center (2-element float array): the coordinates of the center of the image
  • fov (2-element float array): the FOV of the camera, in pixels
  • tag_size (float | "FRC_INCHES" | "FRC_CM" | "FRC_METERS"): the size of the tag, in whatever units the measurements should be in

If spec = "fixed", all fields are required. If spec = "infer", the center and fov are determined by camera parameters, and only the tag_size is accepted.
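
Both forms might look like the sketch below; the fixed-spec numbers are placeholders, and the inferred form assumes the camera provides the needed parameters.

# Infer center and FOV from the camera, measure in meters
[component.pose]
type = "detect-pose"
input = "tags"            # Detection values from an apriltag component
spec = "infer"
tag_size = "FRC_METERS"

# ...or with fixed parameters (placeholder values):
# [component.pose]
# type = "detect-pose"
# input = "tags"
# spec = "fixed"
# center = [320.0, 240.0]
# fov = [600.0, 600.0]
# tag_size = 0.1651       # tag size in meters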

DrawComponent<T>

Draws data on a canvas.

Inputs

  • canvas (Mutex<Buffer>): a mutable canvas to draw on
  • elem (T): the element to draw on the canvas

Outputs

None.

Configuration

Appears in configuration files with type = "draw".

Additional fields:

  • draw (Type): the type of elements to draw. Only blob, apriltag, line, and their bracketed variants are recognized.
  • space (luma | rgb | hsv | yuyv | ycc): the color space to draw in.
  • One field per color channel, giving the color to draw with. The channel names vary by color space; for example, with yuyv the channels are y, u, and v, and with rgb they are r, g, and b.

ColorFilterComponent

Filters an image based on pixel colors.

Inputs

Primary input (Buffer): the image to filter.

Outputs

  • Primary channel (single, Buffer): a new image, in the LUMA color space, with white where pixels were in range and black where they weren’t.

Configuration

Appears in configuration files with type = "filter".

The color space to filter in is determined with the space field. Recognized values are luma, rgb, hsv, yuyv, and ycc. Based on this, min- and max- fields should be present for every channel. For example, with rgb, the fields min-r, max-r, min-g, max-g, min-b, and max-b should be present. Note that the YUYV filter uses channels y, u, and v, while the YCbCr filter uses channels y, b, and r.
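
For example, an HSV filter that keeps green-ish pixels might look like this (the h, s, and v channel names are assumed by analogy with the other spaces, and the thresholds are placeholders):

[component.filter]
type = "filter"
space = "hsv"      # all channels use the full 0-255 range
min-h = 40
max-h = 80
min-s = 100
max-s = 255
min-v = 100
max-v = 255
# receives frames from a camera that lists "filter" in its outputs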

FfmpegComponent

Saves a video by piping it into ffmpeg.

An example of the arguments to pass here is ["-c:v", "libx264", "-crf", "23", "vv_%N_%Y%m%d_%H%M%S.mp4"], which saves an MP4 video with the date and time in its name.

Inputs

Primary input (Buffer): the frames to save

Outputs

None.

Configuration

Appears in configuration files with type = "ffmpeg".

Additional fields:

  • fps (number): the framerate that the video should be saved with. This should match the camera framerate.
  • args (array of strings): arguments for the output format of ffmpeg (everything after the -). strftime escapes, along with %i and %N for the pipeline ID and pipeline name, are supported.
  • ffmpeg (string, optional): an override for the ffmpeg command
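
A full component using the arguments from the example above might look like this (the framerate is a placeholder and should match your camera):

[component.record]
type = "ffmpeg"
fps = 30
args = ["-c:v", "libx264", "-crf", "23", "vv_%N_%Y%m%d_%H%M%S.mp4"]
# receives frames from a camera that lists "record" in its outputs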

FpsComponent

Tracks the framerate of its invocations.

Inputs

Primary channel (any): the input, ignored.

Outputs

  • min (single, f64): the minimum framerate in the period.
  • max (single, f64): the maximum framerate in the period.
  • avg (single, f64): the average framerate in the period.
  • pretty (single, String): a pretty, human-readable string, formatted as min/max/avg FPS.

Configuration

Appears in configuration files with type = "fps".

Additional fields:

  • duration (string): a duration string, like 1min, defaulting to 10s

GaussianBlurComponent

Applies a Gaussian blur to an image.

Inputs

Primary input (Buffer): the image to blur.

Outputs

  • Primary channel (single, Buffer): the image, blurred.

Configuration

Appears in configuration files with type = "gaussian-blur".

Additional fields:

  • sigma (positive float): the standard deviation of the blur
  • width (odd, positive integer): the width of the blur window
  • height (odd, positive integer): the height of the blur window

width and height can typically be roughly sigma * 3.
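
As a sketch, a mild blur might look like this; following the rule of thumb above, a sigma of 2 pairs with a window of roughly 7x7 (the nearest odd number to sigma * 3):

[component.blur]
type = "gaussian-blur"
sigma = 2.0
width = 7     # roughly sigma * 3, rounded up to an odd number
height = 7
# receives frames from a camera that lists "blur" in its outputs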

NtPrimitiveComponent

requires the ntable feature

this component aggregates its inputs

Publishes data to NetworkTables. This requires NetworkTables to be configured in the file. Topic names recognize the %N and %i escapes, which are replaced with the camera name and a unique pipeline ID, respectively.

Inputs

Any named input: data to send over NetworkTables. By default, the topic is the channel name.

Outputs

None.

Configuration

Appears in configuration with type = "ntable".

Additional fields:

  • prefix (optional, string): a prefix for all topics to be published. A / will be automatically inserted between it and the topic name.
  • remap (optional, table of strings): a map from input channels to topics. This may be more convenient than writing quoted names for input channels.
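
For example, the nt-tags component from the opening example could publish its found channel under a different topic name using remap (the remapped name here is just illustrative):

[component.nt-tags]
type = "ntable"
prefix = "%N/tags"
input.found = "tags.found"
input.ids = "tags-unpack.id"
remap.found = "tag_count"   # publish the found channel as %N/tags/tag_count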

PercentileFilterComponent

Applies a percentile filter to an image, replacing each pixel with the value at a given rank within a window around it.

Inputs

Primary input (Buffer): the image to transform.

Outputs

  • Primary channel (single, Buffer): the resulting image.

Configuration

Appears in configuration files with type = "percent-filter".

Additional fields:

  • width (odd, positive integer): the width of the filter window
  • height (odd, positive integer): the height of the filter window
  • index (nonnegative integer less than width * height): the index of the pixel within the window. 0 is an erosion, width * height - 1 is a dilation, and width * height / 2 is a median filter.

Additional Constructors

Components with a type of erode, dilate, and median-filter perform erosions, dilations, and median filters, respectively. For these, the index field is not accepted.
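
For example, a 5x5 median filter can be written either way (the component it reads from is hypothetical):

# Explicit form: index 12 is width * height / 2 for a 5x5 window
[component.median]
type = "percent-filter"
input = "filter"
width = 5
height = 5
index = 12

# Equivalent shortcut using the dedicated constructor
[component.median-short]
type = "median-filter"
input = "filter"
width = 5
height = 5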

ResizeComponent

Resizes an image to a given size.

Inputs

Primary input (Buffer): the image to resize.

Outputs

  • Primary channel (single, Buffer): the image, resized.

Configuration

Appears in configuration files with type = "resize".

Additional fields:

  • width (nonnegative integer): the width of the resulting image
  • height (nonnegative integer): the height of the resulting image

SelectLastComponent

this component aggregates its inputs

Selects the last value submitted.

In addition to the typical usage, selecting the last result of a component’s execution, this can be combined with mutable data to continue after all operations have finished. In that case, the elem channel should be connected to the earlier, mutable output, and the ref channel should be connected to the $finish channels of later components.

Inputs

  • elem (any): the elements to select from.
  • ref (any): a reference point. The value sent on this channel is ignored, but if it’s tied to a single-output channel of a component (like $finish), this will collect all of the values that came from the result of that component’s execution.

Outputs

  • Default channel (single, any): the last value submitted to elem.

Configuration

Appears in configuration with type = "select-last".
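
As a sketch of the mutable-data pattern described above: wrap a frame in a mutex, draw every detected tag onto it, wait for drawing to finish with select-last, then unpack the buffer and show it. The wiring below assumes a camera with outputs = ["canvas", "tags"], an apriltag component named tags, that the draw color is given with per-channel fields, and that $finish is referenced like an ordinary channel name; treat it as an illustration rather than a known-good config.

[component.canvas]
type = "canvas"                 # wraps the incoming frame in a Mutex<Buffer>

[component.draw-tags]
type = "draw"
draw = "apriltag"
input.canvas = "canvas"
input.elem = "tags"             # one draw call per detected tag
space = "rgb"                   # color space for the drawn color
r = 255                         # assumed per-channel fields giving the draw color
g = 0
b = 0

[component.done]
type = "select-last"
input.elem = "canvas"           # the mutable canvas
input.ref = "draw-tags.$finish" # wait for all drawing to finish

[component.unwrap]
type = "unpack"
input = "done"                  # extract the inner Buffer from the Mutex

[component.show]
type = "vision-debug"
mode = "show"
input = "unwrap.inner"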

UnpackComponent

Unpacks fields from the input.

Inputs

Primary input (any): the data to get fields of.

Outputs

Any output (single, any): a field extracted from the input. The channel’s name is the extracted field’s name.

Configuration

Appears in configuration files with type = "unpack".

Additional fields:

  • allow_missing (boolean): if this is false (the default), a warning is emitted when a requested field isn’t present

VisionDebugComponent

Shows the frame sent to it. This is primarily for debugging purposes; see the vision debugging docs for more general configuration.

Inputs

Primary input (Buffer): the image to show.

Outputs

None.

Configuration

Appears in configuration files with type = "vision-debug".

Additional fields:

  • mode (auto | none | save | show): what to do with images we receive (defaults to auto):
    • auto: use the configured default, or none
    • none: ignore this image
    • save: save this image to a given path
    • show: create a new window showing this image
  • path (string, requires mode = "save"): see the debug.default_path documentation
  • show (string, requires mode = "show"): see the debug.default_title documentation

WrapMutexComponent<T>

Wraps an input in a Mutex, making it mutable.

This is necessary for drawing on a buffer with the draw component, and its contents can be later accessed by unpacking the inner field.

Inputs

Primary channel (T): the value to wrap.

Outputs

Default channel (single, Mutex<T>): the value, wrapped in a mutex.

Configuration

Appears in configuration files with type = "wrap-mutex".

Additional fields:

  • inner (Type): the type of the value to wrap. apriltag and blob are not recognized.

Additional Constructors

Components with a type of canvas also construct this component, with an inner type of Buffer.

Additional Configuration Types

In addition to the standard data types, some configuration parameters take only certain allowed strings.

PixelFormat

A pixel format is a string, with the following known recognized values:

  • ?n where n is a number from 1 to 200 (inclusive): an anonymous format, with n channels. For example, ?3 is a format with three channels.
  • luma, Luma, LUMA: a single, luma channel.
  • rgb, RGB: three channels: red, green, and blue.
  • hsv, HSV: three channels: hue, saturation, and value. Note that all three are in the full 0-255 range.
  • ycc, YCC, ycbcr, YCbCr: three channels: luma, blue chrominance, red chrominance. All channels are in the 0-255 range.
  • rgba, RGBA: four channels: red, green, blue, alpha. Because VikingVision doesn’t typically care about the alpha, the alpha is unmultiplied.
  • yuyv, YUYV: four channels every two pixels, YUYV 4:2:2.

Types

A type of a generic argument can be specified as a generic string. The following values are recognized:

  • i8, i16, i32, i64, isize, u8, u16, u32, u64, usize, f32, f64: all the same as their Rust equivalents
  • buffer: a Rust Buffer
  • string: a Rust String
  • blob: a Rust Blob
  • apriltag: a Rust Detection (requires the apriltag feature)
  • any of the previous, wrapped in brackets, like [usize]: a Vec of the contained type