W03 Reconciling Implementation Performance and Confidence in Machine Learning
This workshop addresses the challenge of reconciling performance and confidence in the implementation of machine learning algorithms for safety-critical domains. While optimized hardware platforms (GPUs, FPGAs, accelerators) and advanced graph transformations enable high performance, they raise concerns regarding traceability, verification, and certification. The focus will be on bridging the gap between training and implementation models, specifying design models, handling numerical accuracy, and mastering optimization techniques. Presentations will highlight industrial and academic perspectives, with emphasis on certification constraints (e.g., DO-178) and assurance arguments. The format combines a keynote with a series of short technical talks (15 minutes each), each followed by a 15-minute discussion.
The ultimate goal is to foster a common understanding of how to implement ML efficiently while ensuring reliability and safety.
W03.1 Context and Challenges
W03.1.1 Workshop welcome
W03.1.2 Assurance of Machine Learning in Aviation: Challenges, Solutions, and Emerging Guidance
Suppose we want to use Machine Learning (ML) technologies onboard aircraft. The reasons for doing so range from increasing autonomy and easing pilot workload to simply reducing the computing resources required by avionics. Can these benefits be realized while maintaining or even improving aircraft safety? What are the barriers for ML assurance and certification, and how might they be overcome? Researchers, regulators, and the aviation industry have been working to answer these questions, developing new guidance for certification of ML-based airborne systems, such as the ED-324 / ARP6983 standard that is currently being developed by EUROCAE WG-114 / SAE G-34 joint working group. This keynote will discuss several key challenges in the assurance of safety-critical ML-enabled components that are addressed by the working group. We will then highlight new technologies and processes that are being created to meet corresponding certification objectives. Specifically, formal methods will play a large role in the safety assurance and deployment of ML onboard aircraft.
W03.2 Advanced vs. embedded ML - bridging the gap
Many industrial actors today have not one but two distinct ML departments - one mostly dedicated to R&D into advanced ML, and the other to the actual embedded implementation of ML algorithms. The embedded ML department usually considers rather simple (e.g. feedforward) networks in inference mode and focuses on providing real-time and numerical precision guarantees using compilers and run-times with limited control over optimization and resource allocation. But advanced ML algorithms require complex control involving back-propagation training (in on-device training and reinforcement learning contexts), or more generally stateful (e.g. recurrent) and conditional (e.g. gated mixture of experts) behaviors. Such advanced control is not readily covered by existing embedded back-ends, and is also often hidden in layers of Python code. We propose an approach to address and bridge this gap.
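To make the difficulty concrete, here is a minimal illustrative sketch (not from the talk; the expert functions and gating threshold are hypothetical) of the kind of data-dependent control flow a gated mixture-of-experts step introduces. A back-end that assumes a fixed feedforward dataflow cannot statically resolve which path executes:

```python
# Illustrative sketch: a gated mixture-of-experts step whose control flow
# depends on runtime data. Static embedded back-ends built around a fixed
# feedforward dataflow cannot compile this routing directly; worst-case
# timing and memory must cover both branches.

def expert_a(x):
    # Hypothetical expert: a cheap linear map.
    return [2.0 * v for v in x]

def expert_b(x):
    # Hypothetical expert: a saturating nonlinearity.
    return [min(max(v, -1.0), 1.0) for v in x]

def gated_step(x, gate_score):
    # Data-dependent branch: which expert runs is only known at run time.
    if gate_score > 0.5:
        return expert_a(x)
    return expert_b(x)

print(gated_step([0.3, -2.0], gate_score=0.9))  # routes to expert_a
print(gated_step([0.3, -2.0], gate_score=0.1))  # routes to expert_b
```

The point of the sketch is that the branch condition is a tensor value, not a compile-time constant, which is precisely what pushes such models outside the scope of simple inference-only back-ends.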
W03.3 SONNX - Towards a standardized ML model format for safe systems
Transforming a trained model into an executable implementation must preserve the model’s intended semantics. Focusing on models specified using the ONNX standard, this presentation examines the requirements for precise and unambiguous syntax and semantics when models are interpreted or translated into lower-level representations. It highlights the limitations of ONNX in certification-oriented contexts and discusses the need for a safety-related ONNX profile. The presentation also introduces the objectives, methodology, and initial results of the SONNX Working Group, which aims to align the use of ONNX with industrial and safety-critical requirements without diverging from the standard.
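As a toy illustration of what "precise and unambiguous semantics" can mean in practice, the sketch below gives plain-Python reference semantics for two ONNX-style elementwise operators. It is purely illustrative and not the actual SONNX definition; the restriction on implicit broadcasting is a hypothetical example of the kind of rule a safety-related profile might impose:

```python
# Illustrative "reference semantics" for two ONNX-style operators, written in
# plain Python so every case is explicit. Not the SONNX specification; a
# sketch of the idea behind an unambiguous, certification-oriented profile.

def relu(x):
    # Elementwise max(0, x), with no hidden dtype or shape rules.
    return [max(0.0, v) for v in x]

def add(a, b):
    # Elementwise Add restricted to equal-length inputs: a safety profile
    # might forbid implicit broadcasting precisely because it is easy to
    # misread and hard to trace through lowered code.
    if len(a) != len(b):
        raise ValueError("shape mismatch: implicit broadcasting not allowed")
    return [x + y for x, y in zip(a, b)]

print(relu([-1.5, 0.0, 2.0]))        # [0.0, 0.0, 2.0]
print(add([1.0, 2.0], [0.5, 0.5]))   # [1.5, 2.5]
```

A reference of this kind gives a lower-level implementation something concrete to be checked against, which is the conformance question the abstract raises.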
W03.4 From models to implementations
W03.4.1 Optimizing tensor operations for performance
This presentation focuses on compiling tensor operations described, for instance, in an ONNX model, through transformation passes such as loop fusion, tiling, interchange, and related optimizations.
In the context of critical embedded systems, several objectives must be met:
- Minimizing latency and memory usage to satisfy performance requirements and platform constraints, by exploiting the capabilities of the target architecture and exploring the available optimization space;
- Ensuring compliance with timing requirements by performing WCET analyses;
- Ensuring conformance to the original model, by relying on mathematical formalization and proofs.
The compilation techniques needed to achieve these objectives differ significantly, and bridging this gap is essential to deliver both safety and performance.
In this talk, we focus on the “compiling for performance” side and review the usual program transformations and compilation passes used to extract the best performance from tensor operators. The goal is to provide enough background to support discussions on how to connect these two worlds.
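The transformations listed above can be illustrated with a minimal sketch of loop tiling on a matrix multiply (illustrative code, not from the talk). The tiled variant reorders the iteration space for cache locality; because the reduction order per output element is preserved here, a conformance check against the naive reference passes exactly, which is the kind of equivalence the "conformance to the original model" objective asks for:

```python
# Sketch of loop tiling on a square matrix multiply. The tiled version walks
# the iteration space in blocks (better locality on real hardware), but keeps
# the k-reduction order of each output element, so results match the naive
# reference exactly, even in floating point.

def matmul_naive(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, tile=2):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):          # tiles of rows
        for jj in range(0, n, tile):      # tiles of columns
            for kk in range(0, n, tile):  # tiles of the reduction axis
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C

n = 4
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i * j % 5) for j in range(n)] for i in range(n)]
assert matmul_naive(A, B, n) == matmul_tiled(A, B, n)  # conformance check
```

Transformations that do change the reduction order (e.g. vectorized or parallel reductions) are exactly where the conformance argument becomes harder, which motivates the mathematical formalization mentioned above.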
W03.4.2 Determinism Is Optional, Predictability Is Not: Numerical Approximation as a First-Class Citizen in Modern Machine Learning
Modern deep learning systems rely heavily on numerical approximations, including reduced precision, quantization, non-deterministic execution orderings, and parallel computation. These choices do not merely introduce “negligible noise”; they can fundamentally alter optimization dynamics, learned representations, training behavior, stability, and, in some cases, the functional behavior of neural networks. This presentation sheds light on the tensions between determinism, reproducibility, and predictability, and questions the actual role of numerical precision in the design, analysis, and certification of modern machine learning models.
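A two-line example makes the "non-deterministic execution orderings" point concrete: floating-point addition is not associative, so the same mathematical sum computed in a different order (as a parallel reduction might) can differ in its last bits:

```python
# Floating-point addition is not associative: regrouping the "same" sum, as a
# parallel or reordered reduction would, changes the result in the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)   # False
print(left, right)     # 0.6000000000000001 0.6
```

Whether such last-bit differences matter is exactly the question the talk raises: they are negligible for accuracy metrics but can break bitwise reproducibility arguments in a certification context.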
W03.4.3 Impact of Optimizations on the Reliability of DNNs on GPUs and FPGAs
W03.5 Performance first, trust later? Rethinking edge AI deployment with Eclipse Aidge
The deployment of AI at the edge is often driven by performance constraints, with safety considerations addressed only at later stages. Aidge challenges this paradigm by providing an open-source framework in which confidence and traceability are first-class design objectives alongside performance optimization.
It offers a transparent and auditable toolchain for deploying inference on edge platforms, relying on explicit intermediate representations and controlled graph transformations to ensure traceability from training models to implementation artifacts. Through the integration of the ACETONE approach, enabling traceability and worst-case execution time analysis, Aidge is well suited to aeronautical certification standards such as DO-178C and the forthcoming ML-specific standard ARP6983. In addition, Aidge supports inference testing under hardware fault conditions and pioneers the adoption of the Safety ONNX (SONNX) standard, contributing to robustness assessment, standardized model exchange, and strengthened assurance arguments for safety-critical AI systems.
