W03 Reconciling Implementation Performance and Confidence in Machine Learning

Start
End
Organiser
Eric Jenn, IRT Saint Exupéry, France
Organiser
Pierre Gaillard, CEA, France
Organiser
Filipo Perotto, ONERA, France

This workshop addresses the challenge of reconciling performance and confidence in the implementation of machine learning algorithms for safety-critical domains. While optimized hardware platforms (GPUs, FPGAs, accelerators) and advanced graph transformations enable high performance, they raise concerns regarding traceability, verification, and certification. The focus will be on bridging the gap between training and implementation models, specifying design models, handling numerical accuracy, and mastering optimization techniques. Presentations will highlight industrial and academic perspectives, with emphasis on certification constraints (e.g., DO-178) and assurance arguments. The format combines a keynote with a series of short technical talks (15 min each), each followed by a discussion (15 min).

The ultimate goal is to foster a common understanding of how to implement ML efficiently while ensuring reliability and safety.

W03.1 Context and Challenges

Session Start
Session End
Session chair
Eric Jenn, IRT Saint Exupéry, France
Presentations

Workshop welcome

Start
End
Speaker
Eric Jenn, IRT Saint Exupéry, France

Challenges for ML in critical systems [PRELIM]

Start
End
Keynote Speaker
Dmitri Kirov, Collins Aerospace, Italy

(To be completed)

W03.2 Advanced vs. embedded ML - bridging the gap

Session Start
Session End
Session chair
Eric Jenn, IRT Saint Exupéry, France
Speaker
Dumitru Potop-Butucaru, INRIA, France

Many industrial actors today have not one but two distinct ML departments: one dedicated mostly to R&D on advanced ML, the other to the actual embedded implementation of ML algorithms. The embedded ML department usually considers rather simple (e.g. feedforward) networks in inference mode and focuses on providing real-time and numerical-precision guarantees using compilers and run-times with limited control over optimization and resource allocation. But advanced ML algorithms require complex control involving back-propagation training (in on-device training and reinforcement learning contexts) or, more generally, stateful (e.g. recurrent) and conditional (e.g. gated mixture-of-experts) behaviors. Such advanced control is not readily covered by existing embedded back-ends and is often hidden in layers of Python code. We propose an approach to address this gap.

W03.3 SONNX - Towards a standardized ML format model for safe systems

Session Start
Session End
Session chair
Pierre Gaillard, CEA, France
Speaker
Eric Jenn, IRT Saint Exupéry, France

Transforming a trained model into an executable implementation must preserve the model’s intended semantics. Focusing on models specified using the ONNX standard, this presentation examines the requirements for precise and unambiguous syntax and semantics when models are interpreted or translated into lower-level representations. It highlights the limitations of ONNX in certification-oriented contexts and discusses the need for a safety-related ONNX profile. The presentation also introduces the objectives, methodology, and initial results of the SONNX Working Group, which aims to align the use of ONNX with industrial and safety-critical requirements without diverging from the standard.

W03.4 From models to implementations

Session Start
Session End
Session chair
Eric Jenn, IRT Saint Exupéry, France
Presentations

Under the Hood: model transformation and optimizations

Start
End
Speaker
John Doe, nowhere, France

Abstract to be completed.

Topics include:

- How do we get from the model to its implementation?
- What kinds of optimizations are performed on DNNs?
- How can we gain confidence in automatic transformations for use in certified systems?
- How do these optimizations compare, from a certification perspective, to optimizations performed by compilers?
- Issues raised by ML acceleration hardware

Impact of Optimizations on the Reliability of DNNs on GPUs and FPGAs

Start
End
Speaker
Fernando Fernandes Dos Santos, INRIA, France

DNN optimizations, such as compiler passes and reduced precision, are typically chosen to improve throughput, latency, or resource usage. However, they also change how software is mapped onto hardware, which can affect how radiation-induced faults in hardware propagate and how often the application fails. As a result, performance and reliability are linked through toolchain choices, and different optimization settings can lead to significant differences in failure rate. This presentation explores the effect of common optimizations on DNNs running on GPUs and FPGAs in the presence of radiation-induced faults. On both platforms, changing compiler optimization settings, numerical precision, and reuse parameters can change the failure rate and the criticality of fault outcomes for a given DNN configuration. Even when an optimization increases the raw failure rate, performance gains can still yield configurations that produce more correct results over time. Overall, assessing DNN-based systems requires considering both reliability and performance.

W03.5 Determinism Is Optional, Predictability Is Not: Numerical Approximation as a First-Class Citizen in Modern Machine Learning

Session Start
Session End
Session chair
Eric Jenn, IRT Saint Exupéry, France
Speaker
David Defour, Université de Perpignan, France

Modern deep learning systems rely heavily on numerical approximations, including reduced precision, quantization, non-deterministic execution orderings, and parallel computation. These choices do not merely introduce “negligible noise”; they can fundamentally alter optimization dynamics, learned representations, training behavior, stability, and, in some cases, the functional behavior of neural networks. This presentation sheds light on the tensions between determinism, reproducibility, and predictability, and questions the actual role of numerical precision in the design, analysis, and certification of modern machine learning models.

W03.6 Performance first, trust later? Rethinking edge AI deployment with Eclipse Aidge

Session Start
Session End
Session chair
Eric Jenn, IRT Saint Exupéry, France
Speaker
Pierre Gaillard, CEA, France
Speaker
Filipo Perotto, ONERA, France

The deployment of AI at the edge is often driven by performance constraints, with safety considerations addressed only at later stages. Aidge challenges this paradigm by providing an open-source framework in which confidence and traceability are first-class design objectives alongside performance optimization.
It offers a transparent and auditable toolchain for deploying inference on edge platforms, relying on explicit intermediate representations and controlled graph transformations to ensure traceability from training models to implementation artifacts. Through the integration of the ACETONE approach, enabling traceability and worst-case execution time analysis, Aidge is well suited to aeronautical certification standards such as DO-178C and the forthcoming ML-specific standard ARP6983. In addition, Aidge supports inference testing under hardware fault conditions and pioneers the adoption of the Safety ONNX standard, contributing to robustness assessment, standardized model exchange, and strengthened assurance arguments for safety-critical AI systems.
 

Closure

Session Start
Session End
Session chair
Eric Jenn, IRT Saint Exupéry, France
Chair
Pierre Gaillard, CEA, France
Chair
Filipo Perotto, ONERA, France