DATE 2005 DESIGNERS' FORUM, ABSTRACTS



1D: Media and Signal Processing

Moderators: W. Luk, Imperial College, UK; M. Lindwer, Philips Silicon Hive, NL

A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation [p. 2]
S. López, G. Callicó, J. López, and R. Sarmiento

Motion estimation is the most critical process in video coding systems. First of all, it has a definitive impact on the rate-distortion performance given by the video encoder. Secondly, it is the most computationally intensive process within the encoding loop. For these reasons, the design of high-performance low-cost motion estimators is a crucial task in the video compression field. An adaptive cost block matching (ACBM) motion estimation technique is presented in this paper, featuring an excellent tradeoff between the quality of the reconstructed video sequences and the computational effort. Simulation results demonstrate that the ACBM algorithm achieves a slightly better rate-distortion performance than the one given by the well-known full search block matching algorithm, with reductions of up to 95% in the computational load.
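
For orientation, the full search baseline that the abstract compares against can be sketched as follows. This is only an illustration of exhaustive block matching with a sum-of-absolute-differences (SAD) cost, not the ACBM technique itself; the function names, the frame layout (lists of rows of integers) and the choice of SAD as the cost are assumptions made for this sketch.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Test every in-bounds displacement within +/- search_range and return
    (dx, dy, cost) for the first displacement with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that would read outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The cost of this baseline grows with the square of the search range, which is the computational load that fast or adaptive estimators aim to reduce.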

Hardware Acceleration of Hidden Markov Model Decoding for Person Detection [p. 8]
S. Fahmy, P. Cheung, and W. Luk

This paper explores methods for hardware acceleration of Hidden Markov Model (HMM) decoding for the detection of persons in still images. Our architecture exploits the inherent structure of the HMM trellis to optimise a Viterbi decoder for extracting the state sequence from observation features. Further performance enhancement is obtained by computing the HMM trellis states in parallel. The resulting hardware decoder architecture is mapped onto a field programmable gate array (FPGA). The performance and resource usage of our design are investigated for different levels of parallelism. Performance advantages over software are evaluated. We show how this work contributes to a real-time system for person-tracking in video sequences.
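
As a software reference point, Viterbi decoding over an HMM trellis can be sketched as below. This is a generic textbook formulation in the log domain, not the authors' hardware architecture; the function name and the list-based model layout (start_p[s], trans_p[r][s], emit_p[s][o]) are assumptions for illustration.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observation indices `obs`."""
    n_states = len(start_p)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Trellis column for the first observation.
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1

    for o in obs[1:]:
        prev = score
        score = []
        pointers = []
        # Each state in a column depends only on the previous column, so this
        # loop over s is the part that can be evaluated in parallel.
        for s in range(n_states):
            best_prev = max(
                range(n_states), key=lambda r: prev[r] + log(trans_p[r][s])
            )
            pointers.append(best_prev)
            score.append(
                prev[best_prev] + log(trans_p[best_prev][s]) + log(emit_p[s][o])
            )
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```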

A Hardware-Friendly Wavelet Entropy Codec for Scalable Video [p. 14]
H. Eeckhaut, H. Devos, B. Schrauwen, M. Christiaens, and D. Stroobandt

In the RESUME project we explore the use of reconfigurable hardware for the design of portable multimedia systems by developing a scalable wavelet-based video codec. A scalable video codec provides the ability to produce a smaller video stream with reduced frame rate, resolution or image quality starting from the original encoded video stream with almost no additional computation. This is important for portable devices that have different Quality of Service (QoS) requirements and power restrictions. Conventional video codecs do not possess this property; reduced quality is obtained through the arduous process of decoding the encoded video stream and recoding it at a lower quality. Producing such a smaller stream has therefore a very high computational cost. In this article we present the results of our investigation into the hardware implementation of such a scalable video codec. In particular we found that the implementation of the entropy codec is a significant bottleneck. We present an alternative, hardware-friendly algorithm for entropy coding with superior data locality (both temporal and spatial), with a smaller memory footprint and superior compression while maintaining all required scalability properties.

A Real-Time Streaming Memory Controller [p. 20]
A. Burchard, E. Hekstra-Nowacka, and A. Chauhan

With ever more complex multimedia applications used in mobile devices, the realization of high performance, flexibility and programmability requirements depends largely on the design of a system communication infrastructure. This infrastructure, often a network, should provide a large variety of services for the transportation of streamed data. When an external memory is also used for streaming communication purposes and, together with the communication infrastructure, forms a part of the streaming, additional support is needed for the memory in order to guarantee the integrity of the communication services provided by the network when data is accessing the memory. This led us to a design of a streaming memory controller (SMC) for off-chip (DDR-)SDRAM memories that enables a shared memory implementation of the streaming based on an off-chip network (PCI Express). In this paper, we present the ideas that gave rise to the SMC, the actual design of the SMC, as well as the evaluation of the design.

A Coprocessor for Accelerating Visual Information Processing [p. 26]
W. Stechele, S. Herrmann, L. Alvado Cárcel, and J. Lidón Simón

Visual information processing will play an increasingly important role in future electronics systems. In many applications, e.g. video surveillance cameras, data throughput of microprocessors is not sufficient and power consumption is too high. Instruction profiling on a typical test algorithm has shown that pixel address calculations are the dominant operations to be optimized. Therefore AddressLib, a structured scheme for pixel addressing, was developed; it can be accelerated by AddressEngine, a coprocessor for visual information processing. In this paper, the architectural design of AddressEngine is described, which in the first step supports a subset of the AddressLib. Dataflow and memory organization are optimized during architectural design. AddressEngine was implemented in an FPGA and was tested with the MPEG-7 Global Motion Estimation algorithm. Results on processing speed and circuit complexity are given and compared to a pure software implementation. The next step will be the support for the full AddressLib, including segment addressing. An outlook on further investigations on dynamic reconfiguration capabilities is given.

Area and Throughput Trade-Offs in the Design of Pipelined Discrete Wavelet Transform Architectures [p. 32]
S. Silva and S. Bampi

The JPEG2000 standard defines the discrete wavelet transform (DWT) as a linear space-to-frequency transform of the image domain in an irreversible compression. This irreversible discrete wavelet transform is implemented by an FIR filter using 9/7 Daubechies coefficients or by a lifting scheme with coefficients factorized from the 9/7 Daubechies coefficients. This work investigates the tradeoffs between area, power and data throughput (or operating frequency) of several implementations of the Discrete Wavelet Transform using the lifting scheme in various pipeline designs. This paper shows the results of five different architectures synthesized and simulated in FPGAs. It concludes that the descriptions with pipelined operators provide the best area-power-operating frequency trade-off over descriptions with non-pipelined operators. Those descriptions require around 40% more hardware to increase the maximum operating frequency up to 100% and reduce power consumption to less than 50%. Starting from behavioral HDL descriptions provides the best area-power-operating frequency trade-off, improving hardware cost and maximum operating frequency by around 30% in comparison to structural descriptions for the same power requirement.
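
The structure of a lifting scheme (split into even and odd samples, predict, update) can be sketched in a few lines. Note the substitution: the abstract concerns the irreversible 9/7 filter, whereas this sketch uses the reversible 5/3 integer filter that JPEG2000 also defines, because its two lifting steps are exact in integer arithmetic and easy to check. The function names and the mirrored boundary handling for even-length inputs are assumptions for illustration.

```python
def lift_53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length
    list of integers. Returns (low-pass s, high-pass d)."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length signal")
    half = n // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: odd samples minus the floor average of their even neighbours
    # (the right neighbour is mirrored at the end of the signal).
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # Update step: even samples plus a rounded quarter of the neighbouring details
    # (the left detail is mirrored at the start of the signal).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def lift_53_inverse(s, d):
    """Undo lift_53_forward by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x
```

Each lifting step is a short add-and-shift operation on neighbouring samples, which is why the steps are natural candidates for pipeline stages in hardware.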


2D: Secure and Embedded Security Systems

Moderators: A. Raghunathan, NEC Laboratories, US; L. Torres, LIRMM, FR

Hardware Engines for Bus Encryption: A Survey of Existing Techniques [p. 40]
R. Elbaz, L. Torres, G. Sassatelli, P. Guillemin, C. Anguille, M. Bardouillet, C. Buatois, and J. Rigaud

The widening spectrum of applications and services provided by portable and embedded devices brings a new dimension of concerns in security. Most of those embedded systems (pay-TV, PDAs, mobile phones, etc.) make use of external memory. As a result, the main problem is that data and instructions are constantly exchanged between memory (RAM) and CPU in clear form on the bus. This memory may contain confidential data like commercial software or private contents, which either the end-user or the content provider is willing to protect. The goal of this paper is to clearly describe the problem of processor-memory bus communications in this regard and the existing techniques applied to secure the communication channel through encryption. Performance overheads implied by those solutions will be extensively discussed in this paper.

Performance Considerations for an Embedded Implementation of OMA DRM 2 [p. 46]
D. Thull and R. Sannino

As digital content services gain importance in the mobile world, Digital Rights Management (DRM) applications will become a key component of mobile terminals. This paper examines the effect dedicated hardware macros for specific cryptographic functions have on the performance of a mobile terminal that supports version 2 of the open standard for Digital Rights Management defined by the Open Mobile Alliance (OMA). Following a general description of the standard, the paper contains a detailed analysis of the cryptographic operations that have to be carried out before protected content can be accessed. The combination of this analysis with data on execution times for specific algorithms realized in hardware and software has made it possible to build a model which has allowed us to assert that hardware acceleration for specific cryptographic algorithms can significantly reduce the impact DRM has on a mobile terminal's processing performance and battery life.
Keywords: DRM, Security, Mobile Terminal, Cryptography

A Novel Unified Architecture for Public-Key Cryptography [p. 52]
A. Cilardo, A. Mazzeo, N. Mazzocca, and L. Romano

In this paper we propose a fully-parallel, bit-sliced unified architecture designed to perform modular multiplication/exponentiation and GF(2^M) multiplication as the core operations of RSA and EC cryptography. The architecture uses the radix-2 Montgomery technique for modular arithmetic, and a radix-4 MSD-first approach for GF(2^M) multiplication. To the best of our knowledge, it is the first unified proposal based on such a hybrid approach. The architecture structure is bit-sliced and is highly regular, modular, and scalable, as virtually any datapath length can be obtained at a linear cost in terms of hardware resources and no costs in terms of critical path. Our proposal outperforms all similar unified architectures found in the technical literature in terms of clock count and critical path. The architecture has been implemented on a Field-Programmable Gate Array (FPGA) device. A highly compact and efficient design was obtained taking advantage of the architectural characteristics.
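
The radix-2 Montgomery technique mentioned above can be illustrated with a bit-serial software model. This is a generic textbook formulation, not the bit-sliced architecture of the paper; the function name and parameter names are assumptions for this sketch.

```python
def montgomery_multiply(a, b, n, k):
    """Return a * b * 2**(-k) mod n for an odd modulus n < 2**k and
    operands 0 <= a, b < n, processing one bit of `a` per iteration."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    acc = 0
    for i in range(k):
        # Add b when bit i of a is set.
        acc += ((a >> i) & 1) * b
        # Add the modulus when needed so that the halving below is exact.
        if acc & 1:
            acc += n
        acc >>= 1
    # acc stays below 2n throughout, so one conditional subtraction suffices.
    if acc >= n:
        acc -= n
    return acc
```

The result carries an extra factor of 2^(-k); in practice operands are first converted to Montgomery form (multiplied by 2^k mod n) so that a chain of multiplications, as in modular exponentiation, needs no division by n.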

A VLSI Design Flow for Secure Side-Channel Attack Resistant ICs [p. 58]
K. Tiri and I. Verbauwhede

This paper presents a digital VLSI design flow to create secure, side-channel attack (SCA) resistant integrated circuits. The design flow starts from a normal design in a hardware description language such as VHDL or Verilog and provides a direct path to an SCA-resistant layout. Instead of a full custom layout or an iterative design process with extensive simulations, a few key modifications are incorporated in a regular synchronous CMOS standard cell design flow. We discuss the basis for side-channel attack resistance and adjust the library databases and constraints files of the synthesis and place & route procedures accordingly. Experimental results show that a DPA attack on a regular single-ended CMOS standard cell implementation of a module of the DES algorithm discloses the secret key after 200 measurements. The same attack on a secure version still does not disclose the secret key after more than 2000 measurements.

Power Attack Resistant Cryptosystem Design: A Dynamic Voltage and Frequency Switching Approach [p. 64]
S. Yang, W. Wolf, N. Vijaykrishnan, D. Serpanos, and Y. Xie

A novel power attack resistant cryptosystem is presented in this paper. Security in digital computing and communication is becoming increasingly important. Design techniques that can protect cryptosystems from leaking information have been studied by several groups. Power attacks, which infer program behavior from observing power supply current into a processor core, are important forms of attacks. Various methods have been proposed to counter the popular and efficient power attacks. However, these methods do not adequately protect against power attacks and may introduce new vulnerabilities. In this work, we propose a novel approach against power attacks, namely Dynamic Voltage and Frequency Switching (DVFS). Three designs (naive, improved and advanced implementations) have been studied to test the efficiency of DVFS against power attacks. A final advanced realization of our novel cryptosystem is presented, which achieves sufficiently high power trace entropy and time trace entropy to block all kinds of power attacks, with 27% energy reduction and 16% time overhead for DES encryption and decryption algorithms.

Area Efficient Hardware Implementation of Elliptic Curve Cryptography by Iteratively Applying Karatsuba's Method [p. 70]
P. Langendörfer and Z. Dyka

Securing communication channels is especially needed in wireless environments. But applying cipher mechanisms in software is limited by the calculation and energy resources of the mobile devices. If hardware is applied to realize cryptographic operations, cost becomes an issue. In this paper we describe an approach which tackles all three of these points. We implemented a hardware accelerator for polynomial multiplication in extended Galois fields (GF) applying Karatsuba's method iteratively. With this approach the area consumption is reduced to 2.1 mm² in comparison to 6.2 mm² for the standard application of Karatsuba's method, i.e. for recursive application. Our approach also reduces the energy consumption to 60 per cent of the original approach. The price we have to pay for these achievements is the increased execution time. In our implementation a polynomial multiplication takes 3 clock cycles whereas the recursive Karatsuba approach needs only one clock cycle. But considering area, energy and calculation speed we are convinced that the benefits of our approach outweigh its drawback. Key words: Extended Galois fields, polynomial multiplication, Elliptic Curve Cryptography, Karatsuba's formula.
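
Karatsuba's formula for polynomials over GF(2) can be sketched as follows. This models the recursive form that the abstract calls the standard application; the iterative scheduling onto a smaller hardware multiplier described in the paper is not reproduced here. Polynomials are stored as Python integers (bit i is the coefficient of x^i), and the function names and the 8-bit base-case threshold are assumptions for illustration.

```python
def clmul(a, b):
    """Schoolbook carry-less product of two GF(2) polynomials stored as ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a, b, bits):
    """Carry-less product of two polynomials with at most `bits` coefficients,
    using three half-size products per level instead of four."""
    if bits <= 8:
        return clmul(a, b)
    half = (bits + 1) // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    low = karatsuba_gf2(a0, b0, half)
    high = karatsuba_gf2(a1, b1, half)
    # In GF(2) addition and subtraction are both XOR, so the middle term is
    # (a0 + a1)(b0 + b1) + low + high.
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, half) ^ low ^ high
    return (high << (2 * half)) ^ (mid << half) ^ low
```

Reduction modulo the field polynomial, which a complete GF(2^m) multiplier also needs, is left out of this sketch.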


Interactive Presentations

An Improved FPGA Implementation of the Modified Hybrid Hiding Encryption Algorithm (MHHEA) for Data Communication Security [p. 76]
H. Farouk and M. Saeb

The hybrid hiding encryption algorithm, as its name implies, embraces concepts from both steganography and cryptography. In this work, an improved micro-architecture Field Programmable Gate Array (FPGA) implementation of this algorithm is presented. This design overcomes the observed limitations of a previously-designed micro-architecture. These observed limitations are: no exploitation of the possibility of parallel bit replacement, and the fact that the input plaintext was encrypted serially, which caused a dependency between the throughput and the nature of the secret key used. This dependency can be viewed by some as a vulnerability in the security of the implemented micro-architecture. The proposed modified micro-architecture is constructed using five basic modules. These modules are: the message cache, the message alignment module, the key cache, the comparator, and finally the encryption module. In this work, we provide comprehensive simulation and implementation results. These are: the timing diagrams, the post-implementation timing and routing reports, and finally the floor plan. Moreover, a detailed comparison with other FPGA implementations is made available and discussed.
Keywords: FPGA, micro-architecture, data communication security, encryption, steganography, cryptography, algorithm.

FPGA Based Agile Algorithm-on-Demand Co-Processor [p. 82]
R. Pradeep, S. Vinay, S. Burman, and V. Kamakoti

With growing computational needs of many real-world applications, frequently changing specifications of standards, and the high design and NRE costs of ASICs, an algorithm-agile FPGA based co-processor has become a viable alternative. In this article, we report on the general design of an algorithm-agile co-processor and the proof-of-concept implementation.


3D: Hot Topic - MPSoC Platforms for Mobile Multimedia

Organisers: W. Wolf, Princeton U, US; A. Jerraya, TIMA Laboratory, FR
Moderator: W. Wolf, Princeton U, US
Speakers: W. Wolf, Princeton U, US; R. Chesson, STMicroelectronics, FR; E. Flamand, STMicroelectronics, FR

Multimedia Applications of Multiprocessor Systems-on-Chips [p. 86]
W. Wolf

This paper surveys the characteristics of multimedia systems. Multimedia applications today are dominated by compression and decompression, but multimedia devices must also implement many other functions such as security and file management. We introduce some basic concepts of multimedia algorithms and the larger set of functions that multimedia systems-on-chips must implement.


4D: Hot Topic - Low-Power Wireless LANs: Past, Present and Future

Organiser: T. Simunic, UC San Diego, US
Moderator: M. Renaudin, TIMA Laboratory, FR
Speakers: K. Holt, Intel Corp, US; A. Chandrakasan, Massachusetts Institute of Technology, US; T. Simunic, UC San Diego, US

Wireless LAN: Past, Present, and Future [p. 92]
K. Holt

This paper retraces the historical development of wireless LAN technology in the context of the pursuit of ever higher data rate, describes the significant technical breakthroughs that are now occurring, and speculates on future directions that the technology may take over the remainder of the decade. The challenges that these developments have created for low power operation are considered, as well as some of the opportunities that are presented to mitigate them. The importance of MIMO as an emerging technology for 802.11 is specifically highlighted, both in terms of the significant increase in data rate and range that it enables as well as the considerable challenge that it presents for the development of low power wireless LAN products.

Direct Conversion Pulsed UWB Transceiver Architecture [p. 94]
R. Blázquez, F. Lee, D. Wentzloff, B. Ginsburg, J. Powell, and A. Chandrakasan

Ultra-wideband (UWB) communication is an emerging wireless technology that promises high data rates over short distances and precise locationing. The large available bandwidth and the constraint of a maximum power spectral density drive a unique set of system challenges. This paper addresses these challenges using two UWB transceivers and a discrete prototype platform.

Power Saving Techniques for Wireless LANs [p. 96]
T. Simunic

Fast wireless access has rapidly become commonplace. Wireless access points and hotspot servers are sprouting everywhere. Battery lifetime continues to be a critical issue in mobile computing. This paper first gives an overview of WLAN energy saving strategies, followed by an illustration of a system-level methodology for saving power in heterogeneous wireless environments.


5D: Wireless Communication and Networking

Moderators: K. Torki, CMP, FR; C. Das, IMEC, BE

A Synthesizable IP Core for DVB-S2 LDPC Code Decoding [p. 100]
F. Kienle, T. Brack, and N. Wehn

The new standard for digital video broadcast, DVB-S2, features Low-Density Parity-Check (LDPC) codes as its channel coding scheme. The codes are defined for various code rates with a block size of 64800, which allows a transmission close to the theoretical limits. The decoding of LDPC is an iterative process. For DVB-S2 about 300000 messages are processed and reordered in each of the 30 iterations. These huge data processing and storage requirements are a real challenge for the decoder hardware realization, which has to fulfill the specified throughput of 255 Mbit/s for base station applications. In this paper we will show, to the best of our knowledge, the first published IP LDPC decoder core for the DVB-S2 standard. We present a synthesizable IP block based on ST Microelectronics 0.13 μm CMOS technology.

picoArray Technology: The Tool's Story [p. 106]
A. Duller, D. Towner, G. Panesar, A. Gray, and W. Robbins

This paper briefly describes the picoArray™ architecture, and in particular the deterministic internal communication fabric. The methods that have been developed for debugging and verifying systems using devices from the picoArray family are explained. In order to maximize the computational ability of these devices, hardware debugging support has been kept to a minimum, and the methods and tools have been developed to take this into account.

Queue Management in Network Processors [p. 112]
I. Papaefstathiou, G. Kornaros, T. Orphanoudakis, C. Kachris, I. Mavroidis, and A. Nikologiannis

One of the main bottlenecks when designing a network processing system is very often its memory subsystem. This is mainly due to the state-of-the-art network links operating at very high speeds and to the fact that in order to support advanced Quality of Service (QoS), a large number of independent queues is desirable. In this paper we analyze the performance bottlenecks of various data memory managers integrated in typical Network Processing Units (NPUs). We expose the performance limitations of software implementations utilizing the RISC processing cores typically found in most NPU architectures and we identify the requirements for hardware assisted memory management in order to achieve wire-speed operation at gigabit per second rates. Furthermore, we describe the architecture and performance of a hardware memory manager that fulfills those requirements. This memory manager, although it is implemented in a reconfigurable technology, can provide up to 6.2 Gbps of aggregate throughput, while handling 32K independent queues.
Keywords: Network processor, memory management, queue management

System Level Analysis of the Bluetooth Standard [p. 118]
M. Conti and D. Moretti

The SystemC modules of the Link Manager Layer and Baseband Layer have been designed in this work at behavioral level to analyze the performances of the Bluetooth standard. In particular the probability of the creation of a piconet in presence of noise in the channel and the power reduction using the sniff and hold mode have been investigated.

C Based Hardware Design for Wireless Applications [p. 124]
A. Takach, B. Bowyer, and T. Bollaert

The algorithms used in wireless applications are increasingly more sophisticated and consequently more challenging to implement in hardware. Traditional design flows require developing the micro architecture, coding the RTL, and verifying the generated RTL against the original functional C or MATLAB specification. This paper describes a C-based design flow that is well suited for the hardware implementation of DSP algorithms commonly found in wireless applications. The C design flow relies on guided synthesis to generate the RTL directly from the untimed C algorithm. The specifics of the C-based design flow are described using a simple DSP filtering algorithm consisting of a forward adaptive equalizer, a 64-QAM slicer and an adaptive decision feedback equalizer. The example illustrates some of the capabilities and advantages offered by this flow.

Hardware Accelerated Collision Detection - An Architecture and Simulation Results [p. 130]
A. Raabe, B. Bartyzel, J. Anlauf, and G. Zachmann

We present a hardware architecture for a single-chip acceleration of an efficient hierarchical collision detection algorithm as well as simulation results for collision queries using this architecture. The architecture consists of two main stages, one for traversing simultaneously a hierarchy of discretely oriented polytopes, and one for intersecting triangles. Within each stage, the architecture is deeply pipelined and parallelized. For the first stage, we compare and evaluate different traversal schemes for bounding volume hierarchies. A simulation in VHDL shows that a hardware implementation can offer a speed-up over a software implementation by orders of magnitude. Thus, real-time collision detection of complex objects at rates required by force-feedback and physically-based simulations can be achieved.


Interactive Presentations

Modeling of a Reconfigurable OFDM IP Block Family for an RF System Simulator [p. 136]
J. Liedes and H. Heusala

The idea of a design-domain-specific Mother Model of an IP block family as a basis for modeling system integration is presented here. A common reconfigurable Mother Model for ten different standardized digital OFDM transmitters has been developed. By means of a set of parameters, the Mother Model can be reconfigured to any of the ten selected standards. So far, the applicability of the proposed reconfiguration and analog-digital co-modeling methods has been proved by modeling the function of the digital parts of three transmitters (802.11a, ADSL and DRM) in an RF system simulator. The model is intended to be used as a signal source template in RF system simulations. The concept is not restricted to signal sources; it can be applied to any IP block development. The idea of the Mother Model will be applied in other design domains to prove that in certain application areas, OFDM transceivers in this case, the design process can progress simultaneously in different design domains (mixed signal, system and RTL-architectural) without the need for high-level synthesis. Only the Mother Models of the three design domains need to be formally proved to function as specified.

Fast and Accurate Transaction Level Modeling of an Extended AMBA2.0 Bus Architecture [p. 138]
Y.-T. Kim, T. Kim, Y. Kim, C. Shin, E.-Y. Chung, K.-M. Choi, J.-T. Kong, S.-K. Eo

The Transaction Level Modeling (TLM) approach is used to meet both the simulation speed and the cycle accuracy required for large scale SoC performance analysis. We implemented a transaction-level model of a proprietary bus called AHB+ which supports an extended AMBA2.0 protocol. The AHB+ transaction-level model runs 353 times faster than the pin-accurate RTL model while maintaining 97% accuracy on average. We also present the development procedure for the TLM of a bus architecture.


6D: Automotive

Moderators: A. Kirschbaum, Continental Teves AG & Co, DE; J. Gerlach, Robert Bosch GmbH, DE

Meeting the Embedded Design Needs of Automotive Applications [p. 142]
W. Lyons

The importance of embedded systems in driving innovation in automotive applications continues to grow. Understanding the specific needs of developers targeting this market is also helping to drive innovation in RISC core design. This paper describes how a RISC instruction set architecture has evolved to better meet those needs, and the key implementation features in two very different RISC cores are used to demonstrate the challenges of designing for real-time automotive systems.

Debug Support, Calibration and Emulation for Multiple Processor and Powertrain Control SoCs [p. 148]
A. Mayer, H. Siebert, and K. McDonald-Maier

The introduction of complex SoCs with multiple processor cores presents new development challenges, such that development support is now a decisive factor when choosing a System-on-Chip (SoC). The presented development support strategy addresses the challenges using both architecture and technology approaches. The Multi-Core Debug Support (MCDS) architecture provides flexible triggering using cross triggers and a multiple core break and suspend switch. Temporal trace ordering is guaranteed down to cycle level by on-chip time stamping. The Package Sized-ICE (PSI) approach is a novel method of including trace buffers, overlay memories, processing resources and communication interfaces without changing device behavior. PSI requires no external emulation box, as the debug host interfaces directly with the SoC using a standard interface.

The Integration of On-Line Monitoring and Reconfiguration Functions Using IEEE1149.4 into a Safety Critical Automotive Electronic Control Unit [p. 153]
C. Jeffrey, R. Cutajar, A. Richardson, S. Prosser, M. Lickess, and S. Riches

This paper presents an innovative application of IEEE 1149.4 and the Integrated Diagnostic Reconfiguration (IDR) as tools for the implementation of an embedded test solution for an Automotive Electronic Control Unit implemented as a fully integrated mixed signal system. The paper describes how the test architecture can be used for fault avoidance, with results from a hardware prototype presented. The paper concludes that fault avoidance can be integrated into mixed signal electronic systems to handle key failure modes.

LC Oscillator Driver for Safety Critical Applications [p. 159]
P. Horsky

A CMOS harmonic signal LC oscillator driver for automotive applications working in a harsh environment with high safety critical requirements is described. The driver can be used with a wide range of external component parameters (LC resonance network of a sensor). The quality factor of the external LC network can vary over two decades. Amplitude regulation of the driver is digitally controlled and the DAC is constructed as exponential with piece-wise-linear (PWL) approximation. Low current consumption for high quality resonance networks is achieved. The realized oscillator is robust, is used in a safety critical application and has low EMC emissions.

Context Sensitive Performance Analysis of Automotive Applications [p. 165]
J. Staschulat, R. Ernst, A. Schulze, and F. Wolf

Accurate timing analysis is key to efficient embedded system synthesis and integration. While industrial control software systems are developed using graphical models, such as Matlab/Simulink or ASCET/SD, exhaustive simulation is not suitable for verifying functional and timing behavior. Formal performance analysis is an alternative but can lead to wide timing intervals because of input data dependency and complex target architectures. Hence a designer might want to restrict the formal performance analysis to parts of the software system, called context or process modes. In this paper, we describe how to define and characterize such context information from graphical models. Further, we extend the formal performance analysis to consider contexts. Results from an automotive application demonstrate the applicability of our approach.

AutoMoDe - Model-Based Development of Automotive Software [p. 171]
D. Ziegenbein, U. Freund, P. Braun, A. Bauer, J. Romberg, and B. Schätz

This paper describes first results from the AutoMoDe (Automotive Model-Based Development) project. The overall goal of the project is to develop an integrated methodology for model-based development of automotive control software, based on problem-specific design notations with an explicit formal foundation. Based on the existing AutoFOCUS framework [1], a tool prototype is being developed in order to illustrate and validate the key elements of our approach.


Interactive Presentation

SystemC Analysis of a New Dynamic Power Management Architecture [p. 177]
M. Conti

This paper presents a new dynamic power management architecture of a System on Chip. The Power State Machine describing the status of the core follows the recommendations of the ACPI standard. The algorithm controls the power states of each block on the basis of battery status, chip temperature and a user defined task priority.


7D: Sensors

Moderators: R. Zafalon, STMicroelectronics, IT; W. Ecker, Infineon Technologies, DE

Exploiting Real-Time FPGA Based Adaptive Systems Technology for Real-Time Sensor Fusion in Next Generation Automotive Safety Systems [p. 180]
S. Chappell, A. Macarthur, D. Preston, D. Olmstead, B. Flint, and C. Sullivan

We present a system for the boresighting of sensors using inertial measurement devices as the basis for developing a range of dynamic real-time sensor fusion applications. The proof of concept utilizes a COTS FPGA platform for sensor fusion and real-time correction of a misaligned video sensor. We exploit a custom-designed 32-bit soft processor core and C-based design & synthesis for rapid, platform-neutral development. Kalman filter and sensor fusion techniques established in advanced aviation systems are applied to automotive vehicles with results exceeding typical industry requirements for sensor alignment. Results of the static and the dynamic tests demonstrate that inexpensive accelerometers mounted on (or during assembly of) a sensor and an Inertial Measurement Unit (IMU) fixed to a vehicle can be used to compute the misalignment of the sensor to the IMU and thus the vehicle. In some cases the model predictions and test results exceeded the requirements by an order of magnitude with a 3-sigma or 99% confidence.

Platform Based Design for Automotive Sensor Conditioning [p. 186]
L. Fanucci, A. Giambastiani, A. Rocchi, F. Iozzi, and C. Marino

In this paper a general architecture suitable to interface several kinds of sensors for automotive applications is presented. A platform based design approach is pursued to improve system performance while minimizing time-to-market. The platform is composed of an analog front-end and a digital section. The latter is based on a microcontroller core (8051 IP by Oregano) plus a set of dedicated hardware for the complex signal processing required for sensor conditioning. The microcontroller also handles the communication with external devices (such as a PC) for data output and fast prototyping. A case study is presented concerning the conditioning of a Gyro yaw rate sensor for automotive applications. Measured performance results outperform current state-of-the-art commercial devices.

Realization of a Virtual Lambda Sensor on a Fixed Precision System [p. 192]
P. Amato, N. Cesario, M. Di Meglio, and F. Pirozzi

The aim of this work is to study the feasibility of implementing a VLS (Virtual Lambda Sensor) with a TSK (Takagi, Sugeno, Kang) singleton FIS (Fuzzy Inference System). Such a sensor could be used in a model-based EMS (Engine Management System) for trade gasoline engines. The FIS design target is to obtain a system with a fixed data representation (i.e. 10 bit) and a limited number of inputs, outputs, rules and membership functions.
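
To make the fixed-precision constraint concrete, here is a minimal Python sketch of a singleton (zero-order) TSK inference step that uses only integer arithmetic on a 10-bit scale. The triangular membership functions, the min operator for rule strength and all numeric values are assumptions made for this example; the abstract does not describe the actual rule base or inputs.

```python
FULL_SCALE = 1023  # largest value of an unsigned 10-bit word


def triangle(x, left, peak, right):
    """Integer triangular membership degree in 0..FULL_SCALE.

    Requires left < peak < right.
    """
    if x <= left or x >= right:
        return 0
    if x <= peak:
        return (x - left) * FULL_SCALE // (peak - left)
    return (right - x) * FULL_SCALE // (right - peak)


def tsk_singleton(inputs, rules):
    """Weighted average of rule singletons, computed with integers only.

    Each rule is (membership_functions, singleton), with one membership
    function per input. The accumulators need more than 10 bits.
    """
    numerator = 0
    denominator = 0
    for membership_functions, singleton in rules:
        strength = min(
            triangle(x, *mf) for x, mf in zip(inputs, membership_functions)
        )
        numerator += strength * singleton
        denominator += strength
    return numerator // denominator if denominator else 0


# Hypothetical one-input rule base: "low" maps to 200, "high" maps to 800.
LOW = (-1, 0, 1023)
HIGH = (0, 1023, 1024)
RULES = [((LOW,), 200), ((HIGH,), 800)]

print(tsk_singleton((512,), RULES))
```

Because the output is a weighted average of the singletons, it stays within the 10-bit range whenever the singletons do.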

Hardware-Software Design of a Smart Sensor for Fully-Electronic DNA Hybridization Detection [p. 198]
C. Stagni, C. Guiducci, M. Lanzoni, L. Benini, and B. Riccò

This paper describes the design of a smart sensor for label-free detection of DNA hybridization. The sensor is based on a direct electrical transduction principle: it measures impedance variation at the interface between a biofunctionalized electrode and a solution containing the analyte. The smart sensor includes a complete signal conditioning and processing subsystem based on an embedded μ-controller. We outline the sensor architecture, and we describe in detail board-level integration as well as hardware and software implementation and design choices. The accuracy of our embedded solution has been evaluated by comparing it with a high-cost laboratory setup. Moreover, we provide measurements of real sensing structures which demonstrate the functionality of our system in the field.

A Tool and Methodology for AC-Stability Analysis of Continuous-Time Closed-Loop Systems [p. 204]
M. Milev and R. Burt

Presented are a methodology and a DFII-based tool for AC-stability analysis of a wide variety of closed-loop continuous-time systems (operational amplifiers and other linear circuits). The methodology used allows for easy identification and diagnostics of AC-stability problems including not only main-loop effects but also local-instability loops in current mirrors, bias circuits and emitter or source followers without breaking the loop. The results of the analysis are easy to interpret. Estimated phase margin is readily available. Instability nodes and loops along with their respective oscillation frequencies are immediately identified and mapped to the existing circuit nodes, thus offering significant advantages compared to traditional "black-box" methods of stability analysis (Transient Overshoot, Bode and Phase margin plots etc.). The tool for AC-Stability analysis is written in SKILL™ and is fully integrated in the DFII™ environment. Its "push-button" graphical user interface (GUI) is easy to use and understand. The tool can be invoked directly from a Composer™ schematic and does not require an active Analog Artist™ session. The tool is not dependent on the use of a specific fabrication technology or Process Design Kit customization. It requires OCEAN™, Spectre™ and Waveform calculator capabilities to run.
Index Terms -
AC stability, small-signal circuit stability, frequency instability, closed loop system stability.


8D: Best of ESSCIRC 2004

Moderators: B. Courtois, TIMA Laboratory, FR; G. Gielen, KU Leuven, BE

A CMOS-Based Tactile Sensor for Continuous Blood Pressure Monitoring [p. 210]
K.-U. Kirstein, J. Sedivy, T. Salo, C. Hagleitner, T. Vancura, and A. Hierlemann

A monolithic integrated tactile sensor array is presented, which is used to perform non-invasive blood pressure monitoring of a patient. The advantage of this device compared to a hand cuff based approach is the capability of recording continuous blood pressure data. The capacitive, membrane-based sensor device is fabricated in an industrial CMOS-technology combined with post-CMOS micromachining. The capacitance change is detected by a ΣΔ-modulator. The modulator is operated at a sampling rate of 128kS/s and achieves a resolution of 12bit with an external decimation filter and an OSR of 128.
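
With the figures quoted above, the decimated output rate is 128 kS/s / 128 = 1 kS/s. The Python sketch below illustrates the general oversample-then-decimate principle with a first-order sigma-delta loop and a plain averaging filter. Both choices are assumptions made for brevity: the abstract does not state the modulator order or the filter type, and this simple combination would not reach 12-bit resolution.

```python
def sigma_delta_first_order(samples):
    """Turn inputs in the range -1..1 into a 1-bit stream (first-order loop)."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback           # accumulate the error
        bit = 1 if integrator >= 0 else 0    # 1-bit quantizer
        feedback = 1.0 if bit else -1.0      # 1-bit DAC in the feedback path
        bits.append(bit)
    return bits


def decimate_by_averaging(bits, osr):
    """Average each block of `osr` bits and rescale the result to -1..1."""
    out = []
    for start in range(0, len(bits) - osr + 1, osr):
        block = bits[start:start + osr]
        out.append(2.0 * sum(block) / osr - 1.0)
    return out


OSR = 128
SAMPLE_RATE_HZ = 128_000
OUTPUT_RATE_HZ = SAMPLE_RATE_HZ // OSR  # 1000 output samples per second

stream = sigma_delta_first_order([0.25] * (OSR * 8))
print(OUTPUT_RATE_HZ, decimate_by_averaging(stream, OSR))
```

For a constant input, each averaged block in this first-order model lies within 2/OSR of the input value, which is why a higher oversampling ratio buys resolution.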

Optical Receiver IC for CD/DVD/Blue-Laser Application [p. 215]
J. Sturm, M. Leifhelm, H. Schatzmayr, S. Groiss, and H. Zimmermann

In this paper an optoelectronic receiver IC for optical data storage applications is presented. The IC was developed in a 0.5 μm BiCMOS technology with integrated PIN-photodiodes. It includes a new architecture of high-speed and low-noise trans-impedance amplifiers with a gain range of 130Ω to 270kΩ, programmable via a serial interface. The bandwidth is 260MHz at the highest gain, which gives a gain-bandwidth-product of 70 THzΩ and a sensitivity improvement by a factor of 2 compared to published OEICs. The amplifiers support a special write/clip mode. The output buffers are 130Ω impedance matched for optimized data transmission over a flex cable.

A 97mW 110 MS/s 12b Pipeline ADC Implemented in 0.18μm Digital CMOS [p. 219]
T. Andersen, A. Briskemyr, F. Telstø, J. Bjørnsen, T. Bonnerud, B. Hernes, and Ø. Moldsvor

A 12 bit Pipeline ADC fabricated in a 0.18 μm pure digital CMOS technology is presented. Its nominal conversion rate is 110MS/s and the nominal supply voltage is 1.8V. The effective number of bits is 10.4 when a 10MHz input signal with 2V peak-to-peak signal swing is applied. The occupied silicon area is 0.86mm² and the power consumption equals 97mW. A switched capacitor bias current circuit scales the bias current automatically with the conversion rate, which gives scalable power consumption and full performance of the ADC from 20 to 140MS/s.

A 6bit, 1.2GSps Low-Power Flash-ADC in 0.13μm Digital CMOS [p. 223]
C. Sandner, M. Clara, A. Santner, T. Hartig, and F. Kuttner

A 6bit flash-ADC with 1.2GSps, wide analog bandwidth and low power, realized in a standard digital 0.13 μm CMOS copper technology is presented. Employing capacitive interpolation gives various advantages when designing for low power: no need for a reference resistor ladder, implicit sample-and-hold operation, no edge effects in the interpolation network (as compared to resistive interpolation), and a very low input capacitance of only 400fF, which leads to an easily drivable analog converter interface. Operating at 1.2GSps the ADC achieves an effective resolution bandwidth (ERBW) of 700MHz, while consuming 160mW of power. At 600MSps we achieve an ERBW of 600MHz with only 90mW power consumption, both from a 1.5V supply. This corresponds to outstanding Figure-of-Merit numbers (FoM) of 2.2 and 1.5pJ/convstep, respectively. The module area is 0.12mm².
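
The abstract does not define its Figure of Merit or quote an effective number of bits (ENOB). One commonly used definition is FoM = P / (2^ENOB · 2 · ERBW), and the Python sketch below assumes that definition in order to show how the quoted power, ERBW and FoM values relate. Under this assumption the two operating points imply roughly 5.7 and 5.6 effective bits; these are derived values, not figures from the paper.

```python
import math


def fom_joules_per_step(power_w, enob, erbw_hz):
    """Energy per conversion step under the assumed definition."""
    return power_w / (2.0 ** enob * 2.0 * erbw_hz)


def implied_enob(power_w, fom_j, erbw_hz):
    """Invert the assumed definition to recover the ENOB it implies."""
    return math.log2(power_w / (fom_j * 2.0 * erbw_hz))


# Operating points quoted in the abstract: (power, FoM, ERBW).
for power_w, fom_j, erbw_hz in [(0.160, 2.2e-12, 700e6), (0.090, 1.5e-12, 600e6)]:
    print(round(implied_enob(power_w, fom_j, erbw_hz), 2))
```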


9D: IP-Reuse and Reconfigurable Systems

Moderators: T. Kean, Algotronix, UK; P. Pezzati, Cadence, FR

Testing Logic Cores Using a BIST P1500 Compliant Approach: A Case of Study [p. 228]
P. Bernardi, G. Masera, F. Quaglio, and M. Sonza Reorda

In this paper we describe how we applied a BIST-based approach to the test of a logic core to be included in System-on-a-chip (SoC) environments. The advantages of the approach are the ability to protect the core IP, the simple test interface (thanks also to the adoption of the P1500 standard), the possibility to run the test at-speed, the reduced test time, and the good diagnostic capabilities. The paper reports figures about the achieved fault coverage, the required area overhead, and the performance slowdown, and compares the figures with those for alternative approaches, such as those based on full scan and sequential ATPG.

MultiNoC: A Multiprocessing System Enabled by a Network on Chip [p. 234]
A. Mello, L. Möller, N. Calazans, and F. Moraes

The MultiNoC system implements a programmable on-chip multiprocessing platform built on top of an efficient, low area overhead intra-chip interconnection scheme. The employed interconnection structure is a Network on Chip, or NoC. NoCs are emerging as a viable response to the increasing demands on interconnection architectures, due to the following characteristics: (i) energy efficiency and reliability; (ii) scalability of bandwidth, when compared to traditional bus architectures; (iii) reusability; (iv) distributed routing decisions. An external host computer feeds MultiNoC with application instructions and data. After this initialization procedure, MultiNoC executes the loaded algorithm. After the algorithm finishes, output data can be read back by the host. Sequential or parallel algorithms conveniently adapted to the MultiNoC structure can be executed. The main motivation for this design is to enable the investigation of current trends to increase the number of embedded processors in SoCs, leading to the concept of "sea of processors" systems.

Using Mobilize Power Management IP for Dynamic and Static Power Reduction in SoC at 130nm [p. 240]
D. Hillman

At 130 nm and 90 nm, power consumption (both dynamic and static) has become a barrier in the roadmap for SoC designs targeting battery powered, mobile applications. This paper presents the results of dynamic and static power reduction achieved when implementing Tensilica's 32-bit Xtensa microprocessor core using Virtual Silicon's Power Management IP. Independent voltage islands are created with Virtual Silicon's VIP PowerSaver standard cells, using voltage level shifting cells and voltage isolation cells. The VIP PowerSaver standard cells are characterized at 1.2V, 1.0V and 0.8V, to accommodate voltage scaling. Power islands can also be turned off completely. Designers can significantly lower both the dynamic power and the quiescent or leakage power of their SoC designs, with very little impact on speed or area, using Virtual Silicon's VIP Gate Bias standard cells.
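
To give a feel for why voltage scaling is attractive, the Python sketch below applies the standard first-order model in which dynamic power is proportional to C · V² · f. It is only an approximation under the assumption of unchanged switched capacitance and, by default, unchanged clock frequency. It ignores leakage and does not reproduce the measured results of the paper.

```python
def relative_dynamic_power(v_supply, v_ref=1.2, freq_ratio=1.0):
    """Dynamic power relative to operation at v_ref (first-order CV^2f model)."""
    return (v_supply / v_ref) ** 2 * freq_ratio


# The three characterization voltages named in the abstract.
for volts in (1.2, 1.0, 0.8):
    print(volts, round(relative_dynamic_power(volts), 3))
```

In this model, dropping the supply from 1.2V to 0.8V cuts dynamic power to a little under half at the same frequency, before any savings from a reduced clock rate.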

A Partitioning Methodology for Accelerating Applications in Hybrid Reconfigurable Platforms [p. 247]
M. Galanis, A. Milidonis, C. Goutis, G. Theodoridis, and D. Soudris

In this paper, we propose a methodology for partitioning and mapping computationally intensive applications in reconfigurable hardware blocks of different granularity. A generic hybrid reconfigurable architecture is considered so that the methodology is applicable to a large number of heterogeneous reconfigurable platforms. The methodology mainly consists of two stages, the analysis and the mapping of the application onto fine and coarse-grain hardware resources. A prototype framework consisting of analysis, partitioning and mapping tools has also been developed. For the coarse-grain reconfigurable hardware, we use our previously developed high-performance coarse-grain datapath. In this work, the methodology is validated using two real-world applications, an OFDM transmitter and a JPEG encoder. In the case of the OFDM transmitter, a maximum decrease in clock cycles of 82% relative to an all fine-grain mapping solution is achieved. The corresponding performance improvement for the JPEG encoder is 43%.
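
A percentage decrease in clock cycles can be restated as a speedup factor, which is sometimes easier to compare. The short Python sketch below does this conversion. It assumes that both quoted percentages are cycle-count reductions at an unchanged clock period, which the abstract states for the OFDM transmitter and only implies for the JPEG encoder.

```python
def speedup_from_cycle_decrease(decrease_fraction):
    """Convert a fractional cycle-count reduction into a speedup factor."""
    if not 0.0 <= decrease_fraction < 1.0:
        raise ValueError("decrease must be in the range [0, 1)")
    return 1.0 / (1.0 - decrease_fraction)


print(round(speedup_from_cycle_decrease(0.82), 2))  # OFDM transmitter
print(round(speedup_from_cycle_decrease(0.43), 2))  # JPEG encoder
```

Under that assumption, the two figures correspond to speedups of roughly 5.6x and 1.8x over the all fine-grain mapping.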

Evaluation of SystemC Modelling of Reconfigurable Embedded Systems [p. 253]
T. Rissa, W. Luk, and A. Donlin

This paper evaluates the use of pin and cycle accurate SystemC models for embedded system design exploration and early software development. The target system is the MicroBlaze VanillaNet Platform running the MicroBlaze uClinux operating system. The paper compares Register Transfer Level (RTL) Hardware Description Language (HDL) simulation speed to the simulation speed of several different SystemC models. It is shown that the simulation speed of pin and cycle accurate models can reach 150 kHz, compared to the 100 Hz range of HDL simulation. Furthermore, by utilising techniques that temporarily compromise cycle accuracy, an effective simulation speed of up to 500 kHz can be obtained.

Hardware Support for QoS-Based Function Allocation in Reconfigurable Systems [p. 259]
M. Ullmann, W. Jin, and J. Becker

This contribution presents a new approach for allocating suitable function-implementation variants depending on given quality-of-service function-requirements for run-time reconfigurable multi-device systems. Our approach adapts methodologies from the domain of knowledge-based systems which can be used for doing run-time hardware/software resource usage optimizations.
Keywords:
CBR, Algorithm, Resource Management
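
The keyword CBR presumably stands for case-based reasoning, in which a new request is answered by retrieving the most similar stored case. The Python sketch below shows one simple form of that retrieval step for choosing between implementation variants of a function. The attribute names, weights, value ranges and the similarity measure are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    target: str       # "hw" or "sw"
    attributes: dict  # QoS attribute name -> value offered by this variant


def similarity(request, attributes, weights, ranges):
    """Weighted similarity in 0..1 between a QoS request and one variant."""
    total_weight = sum(weights[key] for key in request)
    score = 0.0
    for key, wanted in request.items():
        distance = min(abs(attributes[key] - wanted) / ranges[key], 1.0)
        score += weights[key] * (1.0 - distance)
    return score / total_weight


def retrieve(request, case_base, weights, ranges):
    """Return the stored variant that best matches the request."""
    return max(
        case_base,
        key=lambda v: similarity(request, v.attributes, weights, ranges),
    )


# Hypothetical case base with two variants of the same filter function.
CASES = [
    Variant("fir_hw", "hw", {"throughput": 100.0, "power": 50.0}),
    Variant("fir_sw", "sw", {"throughput": 10.0, "power": 5.0}),
]
WEIGHTS = {"throughput": 1.0, "power": 1.0}
RANGES = {"throughput": 100.0, "power": 50.0}

print(retrieve({"throughput": 90.0, "power": 40.0}, CASES, WEIGHTS, RANGES).name)
```

A run-time system would additionally have to check whether the chosen variant fits the currently free resources, which this sketch leaves out.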


10D: Design Verification

Moderators: F. Fummi, Verona U, IT; W. Matzke, Cadence, DE

An Integrated Design and Verification Methodology for Reconfigurable Multimedia Systems [p. 266]
M. Borgatti, A. Capello, U. Rossi, F. Fummi, G. Pravadelli, J.-L. Lambert, and I. Moussa

Recently, many multimedia applications have been emerging on portable appliances. They require both the flexibility of upgradeable devices (traditionally software based) and a powerful computing engine (typically hardware). In this context, programmable HW and dynamic reconfiguration allow novel approaches to the migration of algorithms from SW to HW. Thus, in the frame of the Symbad project, we propose an industrial design flow for reconfigurable SoCs. The goal of Symbad is to develop a system level design platform for hardware and software SoC systems, including formal and semi-formal verification techniques.

Common Reusable Verification Environment for BCA and RTL Models [p. 272]
G. Falconeri, W. Naifer, and N. Romdhane

This paper deals with a common verification methodology and environment for SystemC BCA and RTL models. The aim is to save effort by avoiding the same work being done twice by different people and to reuse the same environment for the two design views. With this methodology, the verification task starts as soon as the functional specification is signed off and runs in parallel with the development of the models and the design. The verification environment is modeled with the aid of dedicated verification languages and is applied to both models. The test suite is exactly the same, so it is possible to verify the alignment between the two models. In fact the final step is to check the cycle-by-cycle match of the interface behavior. A regression tool and a bus analyzer have been developed to help the verification and the alignment process. The former is used to automate the testbench generation and to run the two test suites. The latter is used to verify the alignment between the two models by comparing the waveforms obtained in each run. The quality metrics used to validate the flow are full functional coverage and full alignment at each IP port.
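
The cycle-by-cycle alignment check can be pictured as a comparison of two port-level traces. The Python sketch below is a hypothetical stand-in for that idea, not the bus analyzer described in the paper: each trace is assumed to be a list with one dictionary of port values per clock cycle, and the function reports the first cycle and port at which the two models disagree.

```python
def first_mismatch(trace_a, trace_b):
    """Return (cycle, port) of the first difference, or None if aligned.

    A port of None means one trace ended before the other.
    """
    for cycle, (ports_a, ports_b) in enumerate(zip(trace_a, trace_b)):
        for port in sorted(set(ports_a) | set(ports_b)):
            if ports_a.get(port) != ports_b.get(port):
                return cycle, port
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b)), None
    return None


# Hypothetical traces: the BCA model raises "ack" one cycle before the RTL.
bca_trace = [{"req": 1, "ack": 0}, {"req": 1, "ack": 1}]
rtl_trace = [{"req": 1, "ack": 0}, {"req": 1, "ack": 0}]

print(first_mismatch(bca_trace, rtl_trace))
```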

An Assembler Driven Verification Methodology (ADVM) [p. 278]
J. MacBeth, K. Gray, and D. Heinz

This paper presents an overview of an assembler driven verification methodology (ADVM) that was created and implemented for a chip card project at Infineon Technologies AG [2]. The primary advantage of this methodology is that it enables rapid porting of directed tests to new targets and derivatives, with only a minimum amount of code refactoring. As a consequence, considerable verification development time and effort was saved.

A Formal Verification Methodology for Checking Data Integrity [p. 284]
Y. Umezawa and T. Shimizu

Formal verification techniques have been playing an important role in pre-silicon validation processes. One of the most important points in performing formal verification is to define good verification scopes: we should define clearly what is to be verified formally in the designs under test. We considered the following three practical requirements when we defined the scope of formal verification: it should be (a) hard to verify, (b) small to handle, and (c) easy to understand. Our novel approach is to break down generic system properties into stereotype properties at block level and to define requirements for Verifiable RTL. Consequently, each designer, rather than verification experts, can describe properties of the design easily, and formal model checking can be applied systematically and thoroughly to all the leaf modules. During the development of a component chip for server platforms, we focused on RAS (Reliability, Availability, and Serviceability) features and described more than 2000 properties in PSL. As a result of the formal verification, we found several critical logic bugs in a short time with limited resources, and successfully verified all of them. This paper presents a study of the functional verification methodology.

On the Design and Verification Methodology of the Look-Aside Interface [p. 290]
A. Ahmed, A. Habibi, O. Mohamed, and S. Tahar

In this paper, we present a technique to design and verify the Look-Aside (LA-1) Interface standard used in network processors. Our design flow includes several refinements, starting from an informal UML specification and ending with an RTL model in Verilog. We integrate the verification of the LA-Interface in the design flow by considering two intermediate levels: (1) Abstract State Machines (ASM); and (2) SystemC. The first one serves the verification by model checking of a set of PSL properties, while the second includes a set of assertions to be verified by simulation. To evaluate the performance of our approach, we used the Rule-Base model checker to verify the same properties, and the OVL library to verify the same assertions.