
The Question

A user connected to our DGX server was running an NVIDIA Holoscan application that received high-resolution video from an Altera FPGA over the network. The question was deceptively simple: what is the actual image data format being transmitted and processed?

This isn't something you can find with a web search. The answer is buried in the application's source code — scattered across configuration files, operator definitions, and memory pool setups. This is exactly the kind of task where an AI agent shines.

The Mission

QevosAgent was tasked with:

  1. SSH into the DGX server (172.24.217.40) remotely
  2. Locate the Holoscan project codebase in the workspace
  3. Trace the image data format through every stage of the pipeline
  4. Produce a structured technical report

No human intervention was needed between the initial question and the final report.

The Pipeline Discovered

After analyzing the source code, QevosAgent mapped out the complete data flow:

Stage 1: Sensor Output

The sensor outputs raw Bayer data: each pixel captures only one color channel through a color filter array. In the RGGB pattern, every repeating 2×2 tile holds a red and a green filter on its first row and a green and a blue filter on its second row.
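To make the tile layout concrete, here is a minimal sketch that maps a pixel coordinate to the color channel it samples. The helper name `bayer_channel` is illustrative and does not come from the Holoscan codebase.

```python
# Color filter layout of one 2x2 RGGB tile: row 0 is "R G", row 1 is "G B".
RGGB_TILE = (("R", "G"), ("G", "B"))


def bayer_channel(row: int, col: int) -> str:
    """Return the color channel sampled at pixel (row, col) of an RGGB sensor."""
    return RGGB_TILE[row % 2][col % 2]


if __name__ == "__main__":
    # Print the filter layout for the top-left 4x4 pixels.
    for r in range(4):
        print(" ".join(bayer_channel(r, c) for c in range(4)))
```

Half of all pixels sample green, which is why the later demosaic stage has to interpolate the two missing channels at every position.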

Stage 2: Network Transmission

The raw sensor data is packaged into CSI-2 (Camera Serial Interface 2) data packets and transmitted over the network using Hololink (based on InfiniBand/RDMA):

Frame Structure:
├── Frame Start (8 bytes)
├── Embedded Data Line (1 line)
│   ├── RAW12 Header (4 bytes)
│   ├── Embedded Data
│   └── Footer (4 bytes)
├── Active Pixel Data (per line)
│   ├── RAW12 Header (4 bytes)
│   ├── Pixel Data (RAW_10 format)
│   └── Footer (4 bytes)
└── Frame End (8 bytes)
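As a rough size estimate for the frame layout above, the sketch below adds up the bytes of one frame. It assumes the pixel payload is packed RAW10 (4 pixels in 5 bytes), that the embedded data line is as long as a pixel line, and that no extra padding or alignment is inserted; none of these are confirmed by the report. The resolution in the usage example is hypothetical.

```python
FRAME_START_BYTES = 8
FRAME_END_BYTES = 8
LINE_HEADER_BYTES = 4
LINE_FOOTER_BYTES = 4


def packed_raw10_bytes(width: int) -> int:
    """Bytes needed for one line of packed 10-bit pixels, rounded up."""
    return (width * 10 + 7) // 8


def line_bytes(width: int) -> int:
    """One line on the wire: header + packed pixel payload + footer."""
    return LINE_HEADER_BYTES + packed_raw10_bytes(width) + LINE_FOOTER_BYTES


def frame_bytes(width: int, height: int, embedded_lines: int = 1) -> int:
    """Whole frame; assumes embedded lines are as long as pixel lines."""
    lines = embedded_lines + height
    return FRAME_START_BYTES + lines * line_bytes(width) + FRAME_END_BYTES


if __name__ == "__main__":
    # Hypothetical 3840x2160 mode, used only for illustration.
    print(frame_bytes(3840, 2160))
```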

Stage 3: CSI-to-Bayer Conversion

On the DGX receiver side, the CsiToBayerOp operator:

  1. Parses the CSI-2 data packets
  2. Extracts valid pixel data
  3. Converts to 16-bit Bayer RAW format
  4. Stores in GPU memory (BlockMemoryPool with double buffering)

The 10-bit data is expanded to 16-bit to preserve precision for subsequent processing.
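The expansion step can be sketched as follows, assuming the standard MIPI CSI-2 RAW10 packing in which every 5 bytes carry 4 pixels: four bytes with the upper 8 bits of each pixel, then one byte holding the four 2-bit remainders. This is an illustration in plain Python, not the CsiToBayerOp implementation (which runs on the GPU), and it leaves each value in the low 10 bits of the 16-bit word; whether the real operator left-aligns the value is not confirmed here.

```python
from typing import List


def unpack_raw10(packed: bytes) -> List[int]:
    """Unpack MIPI-style RAW10 data into one integer (0..1023) per pixel."""
    if len(packed) % 5 != 0:
        raise ValueError("RAW10 payload length must be a multiple of 5 bytes")
    pixels: List[int] = []
    for i in range(0, len(packed), 5):
        b0, b1, b2, b3, lsbs = packed[i:i + 5]
        for k, msb in enumerate((b0, b1, b2, b3)):
            # Upper 8 bits come from the pixel's own byte,
            # lower 2 bits from bits [2k+1:2k] of the fifth byte.
            pixels.append((msb << 2) | ((lsbs >> (2 * k)) & 0x3))
    return pixels


if __name__ == "__main__":
    print(unpack_raw10(bytes([0xFF, 0x00, 0x80, 0x01, 0xE4])))
```

Each result fits comfortably in a 16-bit word, leaving 6 bits of headroom for later arithmetic.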

Stage 4: Image Processing

The ImageProcessorOp applies black level correction to the Bayer data. The output stays in 16-bit Bayer RAW (RGGB) format.
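Black level correction, which the summary table lists as the result of this stage, amounts to subtracting a fixed sensor offset from every pixel and clamping at zero. The sketch below shows the idea in plain Python; the black level of 50 is a hypothetical value, and the actual operator may do more than this.

```python
from typing import List


def subtract_black_level(pixels: List[int], black_level: int) -> List[int]:
    """Subtract the sensor's black level from each pixel, clamping at 0."""
    return [max(p - black_level, 0) for p in pixels]


if __name__ == "__main__":
    # Hypothetical black level of 50 on 10-bit data.
    print(subtract_black_level([0, 50, 64, 1023], 50))
```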

Stage 5: Demosaic

The BayerDemosaicOp converts the single-channel Bayer RAW into full-color RGBA: 4 channels at 16 bits per channel.
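To illustrate what demosaicing produces, here is a deliberately simplified sketch that collapses each 2×2 RGGB tile into one RGBA pixel at half resolution. This is not the algorithm BayerDemosaicOp uses (a real demosaic interpolates to keep full resolution); it only shows how one channel per pixel becomes four channels per pixel.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int, int]


def demosaic_rggb_half(bayer: List[List[int]], alpha: int = 0xFFFF) -> List[List[Pixel]]:
    """Turn each 2x2 RGGB tile into one (R, G, B, A) pixel."""
    height, width = len(bayer), len(bayer[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    out: List[List[Pixel]] = []
    for r in range(0, height, 2):
        row: List[Pixel] = []
        for c in range(0, width, 2):
            red = bayer[r][c]
            # Average the two green samples of the tile.
            green = (bayer[r][c + 1] + bayer[r + 1][c]) // 2
            blue = bayer[r + 1][c + 1]
            row.append((red, green, blue, alpha))
        out.append(row)
    return out


if __name__ == "__main__":
    print(demosaic_rggb_half([[100, 200], [300, 400]]))
```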

Stage 6: Visualization

The HolovizOp performs sRGB color space conversion and outputs to the display frame buffer.
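For reference, the standard sRGB transfer function that maps a linear intensity to a display-encoded value can be sketched as below. Whether HolovizOp applies exactly this curve, and the 8-bit output depth used in `encode_16bit_to_8bit`, are assumptions made for illustration.

```python
def linear_to_srgb(c: float) -> float:
    """Apply the sRGB transfer function to a linear value in [0, 1]."""
    c = min(max(c, 0.0), 1.0)
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055


def encode_16bit_to_8bit(value: int) -> int:
    """Map a linear 16-bit sample to an sRGB-encoded 8-bit sample."""
    return round(linear_to_srgb(value / 65535) * 255)


if __name__ == "__main__":
    for sample in (0, 4096, 32768, 65535):
        print(sample, encode_16bit_to_8bit(sample))
```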

Data Format Summary

Stage                  | Data Format      | Bit Depth      | Description
-----------------------|------------------|----------------|----------------------
Sensor Output          | Bayer RAW (RGGB) | 10-bit         | RAW_10 format
CSI-2 Transmission     | CSI-2 Packet     | 10-bit         | With Header/Footer
After CSI-to-Bayer     | Bayer RAW (RGGB) | 16-bit         | Expanded to 16-bit
After Image Processing | Bayer RAW (RGGB) | 16-bit         | Black level corrected
After Demosaic         | RGBA             | 16-bit/channel | 4 channels
Display                | RGBA             | 16-bit/channel | sRGB color space
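The bit depths in the summary translate directly into per-frame memory footprints. The sketch below computes them for a given resolution; the 3840×2160 resolution in the usage example is hypothetical, and the packed RAW10 figure counts pixel payload only (no headers or footers).

```python
from typing import Dict


def frame_bytes_by_stage(width: int, height: int) -> Dict[str, int]:
    """Approximate bytes per frame at each stage of the pipeline."""
    pixels = width * height
    return {
        "packed RAW10 payload": pixels * 10 // 8,   # 10 bits per pixel
        "Bayer RAW 16-bit": pixels * 2,             # 1 channel, 2 bytes
        "RGBA 16-bit/channel": pixels * 4 * 2,      # 4 channels, 2 bytes each
    }


if __name__ == "__main__":
    # Hypothetical resolution, used only for illustration.
    for stage, size in frame_bytes_by_stage(3840, 2160).items():
        print(f"{stage}: {size} bytes")
```

Under these assumptions, a double-buffered memory pool for the 16-bit Bayer stage would need two blocks of the "Bayer RAW 16-bit" size.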

Why This Matters

This analysis demonstrates several capabilities of QevosAgent:

  1. Remote Codebase Exploration: The agent SSH'd into a remote server, navigated the directory structure, and identified relevant files without any prior knowledge of the project layout.

  2. Cross-File Dependency Tracing: The image format information was scattered across multiple files — sensor configuration (agx5_imx678_mode.py), converter setup (agx5_imx678.py), and operator definitions. The agent connected these dots automatically.

  3. Structured Technical Output: Instead of a wall of text, the agent produced a well-organized report with clear stages, code references, and a summary table — ready for engineering review.

  4. Zero Human Intervention: From the initial question to the final report, the entire process was autonomous. The agent handled SSH authentication, file navigation, code analysis, and report generation in a single continuous run.
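The cross-file tracing step can be approximated by a keyword scan over the project tree. The sketch below is not QevosAgent's actual method; the function name, the search pattern, and the `.py` filter are illustrative assumptions.

```python
import os
import re
from typing import List, Pattern, Tuple

# Hypothetical keywords that hint at an image format definition.
FORMAT_PATTERN = re.compile(r"RAW_?\d+|bayer|pixel_format", re.IGNORECASE)


def find_format_references(
    root: str,
    pattern: Pattern[str] = FORMAT_PATTERN,
    suffixes: Tuple[str, ...] = (".py",),
) -> List[Tuple[str, int, str]]:
    """Return (path, line number, line text) for every matching source line."""
    hits: List[Tuple[str, int, str]] = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for path, lineno, text in find_format_references("."):
        print(f"{path}:{lineno}: {text}")
```

A scan like this only finds candidate lines; connecting them into a pipeline still requires reading how the operators are wired together.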

The Bigger Picture

Tasks like this — understanding how data flows through a complex system by reading source code — are common in engineering but tedious for humans. An AI agent can perform this analysis in minutes, cross-referencing dozens of files and producing a coherent narrative that a human engineer would need hours to assemble.

This is not about replacing engineers. It's about giving them a powerful assistant that handles the "reading and connecting" work, so they can focus on the decisions that actually matter.