
All times on the DATE website are given in CET – Central European Time (UTC+1).

ET02 On-Device Continual Learning Meets Ultra-Low Power Processing

Start: Mon, 25 Mar 2024 11:00
End: Mon, 25 Mar 2024 12:30
Organisers: Manuele Rusci, KU Leuven, Belgium; Cristian Cioflan, ETH Zurich, Switzerland; Davide Nadalini, Università di Bologna, Italy

This tutorial provides an overview of the emerging topic of On-Device Learning (ODL) for ultra-low-power extreme-edge devices, such as microcontroller units (MCUs).

Nowadays, these devices are capable of running Deep Neural Network (DNN) inference tasks to extract high-level information from data captured by on-board sensors. The DNN models are typically trained off-device on high-performance servers and then frozen and deployed on resource-constrained MCU-powered platforms. However, the data used for DNN training may not be representative of the deployment environment, causing mispredictions and misclassifications that eventually result in (i) expensive model redesigns and (ii) re-deployments at scale. Recently proposed Continual Learning methods stand out as potential solutions to this fundamental issue: they enable DNN model personalization by incorporating new knowledge (e.g., new classes, new domains, or both) from data retrieved in the target environment. However, the DNN learning task has normally been considered out of scope for highly resource-constrained devices because of its high memory and computation requirements. Its application has therefore been limited to server machines, where, thanks to virtually unlimited resources, custom DNN models can be retrained from scratch as soon as new data becomes available.

This tutorial focuses on applying the Continual Learning (CL) paradigm on MCU devices, enabling small sensor nodes to adapt their DNN models in the field without relying on external computing resources. After providing a brief taxonomy of the main CL algorithms and scenarios, we will review the fundamental ODL operations, with reference to the backpropagation learning algorithm. We will analyze the memory and computational costs of the learning process when targeting a multi-core RISC-V-based MCU derived from the PULP-project template, and we will use a case study to show how these costs constrain an on-device learning application. Next, a hands-on session will let the audience familiarize themselves with software optimizations for DNN learning primitives using PULP-TrainLib (https://github.com/pulp-platform/pulp-trainlib), the first software library for DNN training on RISC-V multi-core MCU-class devices. Finally, we will conclude by reviewing the main ODL challenges and limitations and describing the latest major results in this field.
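
As a concrete reference for these fundamental operations, the sketch below spells out the backward pass of a fully-connected layer, i.e., the two gradient computations at the heart of backpropagation. It is a minimal plain-C illustration under assumptions of ours (fp32, row-major weight layout, hypothetical names), not the PULP-TrainLib API.

```c
#include <stddef.h>

/* Minimal sketch of one backpropagation step for a fully-connected
 * layer y = W x (W is OUT x IN, stored row-major). Illustrative only:
 * names and data layout are ours, not the PULP-TrainLib API. */
void fc_backward(const float *x,   /* input activations, length IN        */
                 const float *dy,  /* gradient w.r.t. output, length OUT  */
                 const float *W,   /* weights, OUT x IN                   */
                 float *dW,        /* weight gradient, OUT x IN           */
                 float *dx,        /* gradient w.r.t. input, length IN    */
                 size_t in, size_t out)
{
    /* dW[o][i] = dy[o] * x[i]  (outer product) */
    for (size_t o = 0; o < out; o++)
        for (size_t i = 0; i < in; i++)
            dW[o * in + i] = dy[o] * x[i];

    /* dx[i] = sum_o W[o][i] * dy[o]  (transposed matrix-vector product) */
    for (size_t i = 0; i < in; i++) {
        float acc = 0.0f;
        for (size_t o = 0; o < out; o++)
            acc += W[o * in + i] * dy[o];
        dx[i] = acc;
    }
}
```

The outer product producing dW and the transposed matrix-vector product producing dx are exactly the operations whose memory and compute costs the tutorial quantifies for MCU-class targets.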

Speakers:

  • Dr. Manuele Rusci, post-doc, KU Leuven
  • Cristian Cioflan, PhD student, ETH Zurich
  • Davide Nadalini, PhD student, UNIBO and POLITO

Tutorial Learning Objectives:

  • A brief taxonomy of Continual Learning scenarios, metrics, and benchmarks.
  • Basic operations and building blocks of On-Device Learning.
  • Analysis of memory and computation costs for ODL on MCU-class devices (a back-of-the-envelope sketch follows this list).
  • Hands-on software optimization for ODL on a multi-core RISC-V platform.
  • Main research directions and challenges for ODL on ultra-low power MCUs.
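
To preview the cost analysis, the toy program below estimates the fp32 memory footprint of training a single fully-connected layer versus running inference with it. The layer size is a hypothetical example of ours, not a figure from the tutorial, and optimizer state (e.g., momentum buffers) is deliberately omitted.

```c
#include <stdio.h>

/* Back-of-the-envelope memory estimate for one fully-connected layer
 * in fp32 on an MCU. The layer size is a made-up example. */
int main(void)
{
    const unsigned in = 128, out = 64;        /* hypothetical layer        */
    const unsigned w = in * out;              /* 8192 weights              */
    const unsigned inference = w + in + out;  /* weights + I/O buffers     */
    /* Training additionally stores the weight gradient, the output
     * gradient, and the input gradient (optimizer state omitted). */
    const unsigned training = inference + w + in + out;

    printf("inference: %u floats (~%u KiB)\n", inference, inference * 4 / 1024);
    printf("training : %u floats (~%u KiB)\n", training,  training  * 4 / 1024);
    return 0;
}
```

Even in this toy case, training roughly doubles the memory footprint, mostly because the weight gradient is as large as the weight tensor itself; on devices with only a few hundred KiB of on-chip memory, this quickly becomes the binding constraint.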

Target Audience

This tutorial targets researchers and practitioners interested in new hardware and software solutions for On-Device Learning on low-end devices, such as MCUs. Participants should be familiar with Deep Neural Network concepts (main building blocks, inference vs. training) and with basic C programming for microcontrollers.

Hands-on & Material

The hands-on will demonstrate the open-source PULP-TrainLib software library (https://github.com/pulp-platform/pulp-trainlib), the state-of-the-art software package for training on MCU-class devices, to provide a concrete embodiment of the ODL concepts and to explain the application of software optimizations to the learning routines, i.e., parallelization, low precision (half-precision floating point), loop unrolling, and vectorization. The speaker will show the main library features using reference code examples. To this aim, we will adopt the open-source PULP platform simulator (https://github.com/pulp-platform/gvsoc) and encourage the audience to practice with the PULP-TrainLib ODL framework during the session. We will collect all materials and installation instructions in a dedicated GitHub repository, which will be made available to the audience both in advance of and after the tutorial.
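
As a taste of these optimizations, the fragment below manually unrolls the dot product that sits at the innermost level of the training kernels. It is a plain-C sketch with hypothetical names, not a PULP-TrainLib routine; in the library, such loops are additionally parallelized across the PULP cluster cores and, in the half-precision variants, vectorized over packed fp16 operands.

```c
/* Illustrative loop unrolling of a dot product, the innermost kernel of
 * the matrix multiplications in both the forward and backward passes.
 * Plain-C sketch with our own names, not a PULP-TrainLib routine. */
float dot_unrolled(const float *a, const float *b, unsigned n)
{
    float acc0 = 0.0f, acc1 = 0.0f;  /* two accumulators hide MAC latency */
    unsigned i = 0;
    for (; i + 1 < n; i += 2) {      /* unroll factor 2                   */
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
    }
    if (i < n)                       /* leftover element when n is odd    */
        acc0 += a[i] * b[i];
    return acc0 + acc1;
}
```

Using two independent accumulators lets consecutive multiply-accumulates proceed without waiting on each other; the hands-on applies the same idea at larger scale, combined with multi-core work sharing.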

Detailed Program

Part I: M. Rusci (KUL). On-Device Continual Learning: Motivation and Intro (10')

  • Tutorial intro
  • Limitations of DNN inference on MCUs
  • Continual Learning scenarios and metrics
  • Dealing with catastrophic forgetting

Part II: C. Cioflan (ETHZ). On-Device Adaptation on a Multi-Core MCU (25' + 5' Q&A)

  • Revisiting ODL basic operations
  • Reviewing DNN learning from an embedded system perspective: computation and memory
  • Learning on the PULP platform
  • Case study: On-Device noise adaptation for keyword spotting

Part III: D. Nadalini (UNIBO). Hands-on On-Device Learning on MCUs: PULP-TrainLib (25' + 5' Q&A)

  • PULP-TrainLib overview, operator definitions, and learning deployment
  • Code optimizations: parallelization, low precision, and vectorization

Part IV: M. Rusci (KUL). Challenges and Research Directions for On-Device Continual Learning (15' + 5' Q&A)

  • ODL software frameworks for MCUs
  • Memory-efficient learning and architectures
  • Data-efficient learning
  • Open Problems

Final Q&A