Real-time graph neural network for Belle II combines FPGA and AI Engines

June 1st, 2026

A demonstrator developed jointly by a KIT student team from particle physics (ETP) and electrical engineering (ITIV) has been selected as a finalist in the Reconfigurable Computing Challenge (RCC 2026), held at the 34th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM) in Atlanta, US, from 13-16 May 2026. Team lead Marc Neu presented the work during a Demo Night as part of the AMD-sponsored competition. In total, 7 out of 30 teams had been selected as finalists to present their work in person

There is growing interest in using graph neural networks in trigger systems for collider experiments, but the latency and throughput constraints of these systems make deployment on embedded platforms challenging. As detectors move toward higher granularity, the number of inputs per inference grows, and FPGA-only implementations run into resource bottlenecks. The team built an end-to-end demonstrator for real-time deployment of a dynamic graph neural network for the Belle II electromagnetic calorimeter hardware trigger. The design targets the AMD Versal VCK190 and uses both the FPGA fabric and the AI Engine tiles. This is the first graph neural network accelerator for particle physics applications, that explores the design space of field programmable gate arrays (FPGAs) and corse-grained reconfigurable gate arrays (CGRAs) on a single system on chip. The design space exploration is enabled by a Python-based, semi-automated design flow covering operator fusion, partitioning, mapping, spatial parallelization, and kernel-level optimization. On the VCK190, a commercially available system on chip platform, the design reaches a throughput of 2.94 million events per second at an end-to-end latency of 7.15 microseconds. Against the FPGA-only baseline, that is a 53% throughput improvement while DSP utilization drops from 99% to 19%, at 29% AI Engine tile utilization. Shifting compute intensive operations onto the AI Engine partition enables higher parallelization and thus higher throughput of the complete hardware accelerator design. To validate the deployment, the team built an interactive visualization pipeline that monitors inference results on the 3D display in real time. The RCC 2026 challenge invited self-defined projects on FPGA, AI Engine, and NPU architectures, judged on technical merit, innovation, practical impact, and presentation, with in-person demonstration required of all finalists.

The work has been accepted as a short paper in the IEEE FCCM 2026 proceedings and is also available as a preprint on arXiv: https://arxiv.org/abs/2605.10612 

Contact: Prof. Torben Ferber

Pictures of the 34th IEEE International Symposium on FCCMs