DATE 2005 DESIGNERS' FORUM, ABSTRACTS
Moderators: W. Luk, Imperial College, UK; M. Lindwer, Philips Silicon Hive, NL
-
A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation [p. 2]
-
S. López, G. Callicó, J. López, and R. Sarmiento
Motion estimation is the most critical process in video
coding systems. First of all, it has a definitive impact on
the rate-distortion performance given by the video
encoder. Secondly, it is the most computationally
intensive process within the encoding loop. For these
reasons, the design of high-performance low-cost motion
estimators is a crucial task in the video compression field.
An adaptive cost block matching (ACBM) motion
estimation technique is presented in this paper, featuring
an excellent tradeoff between the quality of the
reconstructed video sequences and the computational
effort. Simulation results demonstrate that the ACBM
algorithm achieves a slightly better rate-distortion
performance than the well-known full-search block
matching algorithm, with reductions of up to 95% in the
computational load.
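
As a point of reference for the full-search baseline mentioned above, the following is a minimal sketch of exhaustive block matching using the sum of absolute differences (SAD) as the matching cost. It is not the ACBM algorithm; the function names, the SAD cost and the frame layout (lists of rows) are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and keep the best.
    Assumes the zero displacement lies inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The cost of this baseline grows with the square of the search range, which is the computational load that reduced-complexity techniques aim to cut.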
-
Hardware Acceleration of Hidden Markov Model Decoding for Person Detection [p. 8]
-
S. Fahmy, P. Cheung, and W. Luk
This paper explores methods for hardware acceleration
of Hidden Markov Model (HMM) decoding for the detection
of persons in still images. Our architecture exploits the
inherent structure of the HMM trellis to optimise a Viterbi
decoder for extracting the state sequence from observation
features. Further performance enhancement is obtained by
computing the HMM trellis states in parallel. The resulting
hardware decoder architecture is mapped onto a field
programmable gate array (FPGA). The performance and
resource usage of our design are investigated for different
levels of parallelism. Performance advantages over software
are evaluated. We show how this work contributes to
a real-time system for person-tracking in video-sequences.
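
For readers unfamiliar with Viterbi decoding over an HMM trellis, a minimal software sketch is given below. It works in the log domain and is only an illustration: the function name, the dictionary-based tables and the toy model in the usage note are assumptions, not the architecture described in the paper.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log-probability."""
    # trellis[t][s]: best log-probability of any path ending in state s at time t
    trellis = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        trellis.append({})
        back.append({})
        for s in states:
            # Each state in a column depends only on the previous column,
            # so the states of one column could be evaluated in parallel.
            prev, score = max(
                ((p, trellis[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            trellis[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: trellis[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, trellis[-1][last]
```

With a two-state toy model whose states mostly emit different symbols, the decoded path follows the observations while the transition probabilities penalise frequent state changes.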
-
A Hardware-Friendly Wavelet Entropy Codec for Scalable Video [p. 14]
-
H. Eeckhaut, H. Devos, B. Schrauwen, M. Christiaens, and D. Stroobandt
In the RESUME project we explore the use of reconfigurable
hardware for the design of portable multimedia systems
by developing a scalable wavelet-based video codec.
A scalable video codec provides the ability to produce a
smaller video stream with reduced frame rate, resolution
or image quality starting from the original encoded video
stream with almost no additional computation. This is important
for portable devices that have different Quality of
Service (QoS) requirements and power restrictions. Conventional
video codecs do not possess this property; reduced
quality is obtained through the arduous process of decoding
the encoded video stream and recoding it at a lower
quality. Producing such a smaller stream therefore has a
very high computational cost.
In this article we present the results of our investigation
into the hardware implementation of such a scalable video
codec. In particular we found that the implementation of
the entropy codec is a significant bottleneck. We present an
alternative, hardware-friendly algorithm for entropy coding
with superior data locality (both temporal and spatial),
with a smaller memory footprint and superior compression
while maintaining all required scalability properties.
-
A Real-Time Streaming Memory Controller [p. 20]
-
A. Burchard, E. Hekstra-Nowacka, and A. Chauhan
With ever more complex multimedia applications used in mobile
devices, the realization of high performance, flexibility and
programmability requirements depends largely on the design of
a system communication infrastructure. This infrastructure, often
a network, should provide a large variety of services for the
transportation of streamed data. When an external memory is
also used for streaming communication purposes and, together with
the communication infrastructure, forms a part of the streaming,
additional support is needed for the memory in order to guarantee
the integrity of the communication services provided by the
network when data is accessing the memory. This led us to a design
of a streaming memory controller (SMC) for off-chip (DDR-)SDRAM
memories that enables a shared memory implementation of the
streaming based on an off-chip network (PCI Express). In this
paper, we present the ideas that gave rise to the SMC, the actual
design of the SMC, as well as the evaluation of the design.
-
A Coprocessor for Accelerating Visual Information Processing [p. 26]
-
W. Stechele, S. Herrmann, L. Alvado Cárcel, and J. Lidón Simón
Visual information processing will play an increasingly important role in future electronic systems. In many applications, e.g. video surveillance cameras, the data throughput of microprocessors is not sufficient and power consumption is too high. Instruction profiling on a typical test algorithm has shown that pixel address calculations are the dominant operations to be optimized. Therefore AddressLib, a structured scheme for pixel addressing, was developed, which can be accelerated by AddressEngine, a coprocessor for visual information processing. In this paper, the architectural design of AddressEngine is described, which in the first step supports a subset of the AddressLib. Dataflow and memory organization are optimized during architectural design. AddressEngine was implemented in an FPGA and was tested with the MPEG-7 Global Motion Estimation algorithm. Results on processing speed and circuit complexity are given and compared to a pure software implementation. The next step will be the support for the full AddressLib, including segment addressing. An outlook on further investigations on dynamic reconfiguration capabilities is given.
-
Area and Throughput Trade-Offs in the Design of Pipelined Discrete Wavelet Transform Architectures [p. 32]
-
S. Silva and S. Bampi
The JPEG2000 standard defines the discrete wavelet
transform (DWT) as a linear space-to-frequency transform
of the image domain in an irreversible compression. This
irreversible discrete wavelet transform is implemented by
an FIR filter using 9/7 Daubechies coefficients or by a lifting
scheme of factorized coefficients derived from the 9/7
Daubechies coefficients.
This work investigates the tradeoffs between area,
power and data throughput (or operating frequency) of
several implementations of the Discrete Wavelet
Transform using the lifting scheme in various pipeline
designs. This paper shows the results of five different
architectures synthesized and simulated in FPGAs. It
concludes that the descriptions with pipelined operators
provide the best area-power-operating frequency trade-off
over non-pipelined operator descriptions. Those
descriptions require around 40% more hardware to
increase the maximum operating frequency by up to 100%
and reduce power consumption to less than 50%. Starting
from behavioral HDL descriptions provides the best
area-power-operating frequency trade-off, improving hardware
cost and maximum operating frequency by around 30% in
comparison to structural descriptions for the same power
requirement.
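
The lifting scheme referred to above can be sketched in software as four predict/update steps followed by a scaling step. The sketch below is a floating-point illustration only, assuming an even-length 1-D signal, symmetric extension at the borders and the commonly quoted 9/7 lifting constants; the scaling convention differs between references, and none of the names come from the paper.

```python
# Commonly quoted lifting constants for the 9/7 filter bank (assumed here).
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.05298011854, 0.8829110762, 0.4435068522
K = 1.149604398  # one common scaling convention; others exist


def _lift_odd(s, d, c):
    """d[i] += c * (s[i] + s[i+1]); the last right neighbour is mirrored."""
    n = len(s)
    return [d[i] + c * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]


def _lift_even(s, d, c):
    """s[i] += c * (d[i-1] + d[i]); the first left neighbour is mirrored."""
    return [s[i] + c * (d[max(i - 1, 0)] + d[i]) for i in range(len(s))]


def dwt97_forward(x):
    """One decomposition level: returns (low-pass, high-pass) coefficients."""
    assert len(x) >= 2 and len(x) % 2 == 0
    s, d = list(x[0::2]), list(x[1::2])
    d = _lift_odd(s, d, ALPHA)
    s = _lift_even(s, d, BETA)
    d = _lift_odd(s, d, GAMMA)
    s = _lift_even(s, d, DELTA)
    return [v * K for v in s], [v / K for v in d]


def dwt97_inverse(low, high):
    """Undo the lifting steps in reverse order with negated coefficients."""
    s = [v / K for v in low]
    d = [v * K for v in high]
    s = _lift_even(s, d, -DELTA)
    d = _lift_odd(s, d, -GAMMA)
    s = _lift_even(s, d, -BETA)
    d = _lift_odd(s, d, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d
    return x
```

Each lifting step only combines a sample with its two neighbours, which is why the steps map naturally onto pipeline stages of multipliers and adders.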
Moderators: A. Raghunathan, NEC Laboratories, US; L. Torres, LIRMM, FR
-
Hardware Engines for Bus Encryption: A Survey of Existing Techniques [p. 40]
-
R. Elbaz, L. Torres, G. Sassatelli, P. Guillemin, C. Anguille,
M. Bardouillet, C. Buatois, and J. Rigaud
The widening spectrum of applications and services
provided by portable and embedded devices brings a new
dimension of concerns in security. Most of those
embedded systems (pay-TV, PDAs, mobile phones, etc.)
make use of external memory. As a result, the main
problem is that data and instructions are constantly
exchanged between memory (RAM) and CPU in clear
form on the bus. This memory may contain confidential
data like commercial software or private contents, which
either the end-user or the content provider is willing to
protect. The goal of this paper is to clearly describe the
problem of processor-memory bus communications in this
regard and the existing techniques applied to secure the
communication channel through encryption.
Performance overheads implied by those solutions will be
extensively discussed in this paper.
-
Performance Considerations for an Embedded Implementation of OMA DRM 2 [p. 46]
-
D. Thull and R. Sannino
As digital content services gain importance in the mobile world,
Digital Rights Management (DRM) applications will become a
key component of mobile terminals. This paper examines the
effect dedicated hardware macros for specific cryptographic
functions have on the performance of a mobile terminal that
supports version 2 of the open standard for Digital Rights
Management defined by the Open Mobile Alliance (OMA).
Following a general description of the standard, the paper
contains a detailed analysis of the cryptographic operations
that have to be carried out before protected content can be
accessed. The combination of this analysis with data on
execution times for specific algorithms realized in hardware
and software has made it possible to build a model which has
allowed us to assert that hardware acceleration for specific
cryptographic algorithms can significantly reduce the impact
DRM has on a mobile terminal's processing performance and
battery life.
Keywords: DRM, Security, Mobile Terminal, Cryptography
-
A Novel Unified Architecture for Public-Key Cryptography [p. 52]
-
A. Cilardo, A. Mazzeo, N. Mazzocca, and L. Romano
In this paper we propose a fully-parallel, bit-sliced unified
architecture designed to perform modular multiplication/
exponentiation and GF(2^M) multiplication as the core
operations of RSA and EC cryptography.
The architecture uses radix-2 Montgomery technique for
modular arithmetic, and a radix-4 MSD-first approach for
GF(2^M) multiplication. To the best of our knowledge, it is
the first unified proposal based on such a hybrid approach.
The architecture structure is bit-sliced and is highly regular,
modular, and scalable, as virtually any datapath length can
be obtained at a linear cost in terms of hardware resources
and no costs in terms of critical path. Our proposal outperforms
all similar unified architectures found in the technical
literature in terms of clock count and critical path.
The architecture has been implemented on a Field-Programmable Gate Array (FPGA) device. A highly
compact and efficient design was obtained taking advantage
of the architectural characteristics.
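
The two core operations named above can be illustrated with bit-serial software models. The sketch below shows radix-2 Montgomery multiplication and an MSB-first GF(2^M) multiplication; note that the GF(2^M) routine processes one bit per step (radix-2), not the radix-4 scheme of the paper, and all function names and parameters are assumptions for illustration.

```python
def mont_mul(a, b, n, k):
    """Radix-2 Montgomery product a * b * 2^(-k) mod n.
    Assumes n is odd, n < 2^k and 0 <= a, b < n."""
    acc = 0
    for i in range(k):
        acc += ((a >> i) & 1) * b   # add b if bit i of a is set
        if acc & 1:                 # make the accumulator even ...
            acc += n
        acc >>= 1                   # ... so the division by 2 is exact
    return acc - n if acc >= n else acc


def gf2m_mul(a, b, poly, m):
    """MSB-first multiplication in GF(2^m).
    `poly` is the reduction polynomial including its x^m term;
    assumes a, b < 2^m."""
    acc = 0
    for i in range(m - 1, -1, -1):
        acc <<= 1                   # multiply the partial result by x
        if (acc >> m) & 1:          # reduce if degree m was reached
            acc ^= poly
        if (b >> i) & 1:            # add a (XOR) if bit i of b is set
            acc ^= a
    return acc
```

Both loops consist of a conditional addition followed by a shift and a conditional reduction, which suggests why a single datapath can be shared between integer and binary-field arithmetic when carries are switched off for GF(2^M).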
-
A VLSI Design Flow for Secure Side-Channel Attack Resistant ICs [p. 58]
-
K. Tiri and I. Verbauwhede
This paper presents a digital VLSI design flow to create
secure, side-channel attack (SCA) resistant integrated
circuits. The design flow starts from a normal design in a
hardware description language such as VHDL or Verilog
and provides a direct path to a SCA resistant layout. Instead
of a full custom layout or an iterative design process
with extensive simulations, a few key modifications are
incorporated in a regular synchronous CMOS standard
cell design flow. We discuss the basis for side-channel
attack resistance and adjust the library databases and
constraints files of the synthesis and place & route procedures
accordingly. Experimental results show that a DPA
attack on a regular single ended CMOS standard cell
implementation of a module of the DES algorithm discloses
the secret key after 200 measurements. The same
attack on a secure version still does not disclose the secret
key after more than 2000 measurements.
-
Power Attack Resistant Cryptosystem Design: A Dynamic Voltage and Frequency Switching Approach [p. 64]
-
S. Yang, W. Wolf, N. Vijaykrishnan, D. Serpanos, and Y. Xie
A novel power attack resistant cryptosystem is presented in
this paper. Security in digital computing and communication is becoming
increasingly important. Design techniques that can protect cryptosystems
from leaking information have been studied by several groups. Power
attacks, which infer program behavior from observing power supply
current into a processor core, are an important form of attack. Various
methods have been proposed to counter the popular and efficient
power attacks. However, these methods do not adequately protect against
power attacks and may introduce new vulnerabilities. In this work, we
propose a novel approach against power attacks, i.e., Dynamic
Voltage and Frequency Switching (DVFS). Three designs, naive, improved
and advanced implementations, have been studied to test the efficiency
of DVFS against power attacks. A final advanced realization of our novel
cryptosystem is presented, which achieves sufficiently high power trace
entropy and time trace entropy to block all kinds of power attacks, with
27% energy reduction and 16% time overhead for DES encryption and
decryption algorithms.
-
Area Efficient Hardware Implementation of Elliptic Curve Cryptography by
Iteratively Applying Karatsuba's Method [p. 70]
-
P. Langendörfer and Z. Dyka
Securing communication channels is especially needed in
wireless environments. But applying cipher mechanisms in
software is limited by the calculation and energy resources
of the mobile devices. If hardware is applied to realize
cryptographic operations, cost becomes an issue. In this
paper we describe an approach which tackles all three of these
points. We implemented a hardware accelerator for
polynomial multiplication in extended Galois fields (GF)
applying Karatsuba's method iteratively. With this
approach the area consumption is reduced to 2.1 mm² in
comparison to 6.2 mm² for the standard application of
Karatsuba's method, i.e. for recursive application. Our
approach also reduces the energy consumption to 60 per
cent of the original approach. The price we have to pay for
these achievements is the increased execution time. In our
implementation a polynomial multiplication takes 3 clock
cycles whereas the recursive Karatsuba approach needs
only one clock cycle. But considering area, energy and
calculation speed we are convinced that the benefits of our
approach outweigh its drawback.
Key words: Extended Galois fields, polynomial
multiplication, Elliptic Curve Cryptography, Karatsuba's
formula.
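
To make Karatsuba's method for binary polynomials concrete, here is a minimal sketch of the recursive form, which the abstract calls the standard application. Polynomials over GF(2) are stored as Python integers, one bit per coefficient. The iterative hardware variant of the paper is not reproduced; the function names, the base-case threshold and the power-of-two width are assumptions.

```python
def clmul(a, b):
    """Schoolbook carry-less multiplication of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a, b, bits):
    """Recursive Karatsuba product of two polynomials of at most `bits`
    coefficients each; `bits` is assumed to be a power of two."""
    if bits <= 8:
        return clmul(a, b)
    half = bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba_gf2(a_lo, b_lo, half)
    hi = karatsuba_gf2(a_hi, b_hi, half)
    # One extra multiplication replaces the two cross products;
    # in GF(2) both addition and subtraction are XOR.
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half)
    return (hi << bits) ^ ((mid ^ lo ^ hi) << half) ^ lo
```

Each level needs three half-size multiplications instead of four. A hardware design can either instantiate all of them in parallel or reuse a smaller multiplier over several clock cycles, which is the area versus execution time trade-off the abstract discusses.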
-
An Improved FPGA Implementation of the Modified Hybrid Hiding Encryption Algorithm
(MHHEA) for Data Communication Security [p. 76]
-
H. Farouk and M. Saeb
The hybrid hiding encryption algorithm, as its name
implies, embraces concepts from both steganography and
cryptography. In this work, an improved micro-architecture
Field Programmable Gate Array (FPGA)
implementation of this algorithm is presented. This design
overcomes the observed limitations of a previously-designed
micro-architecture. These observed limitations
are: no exploitation of the possibility of parallel bit
replacement, and the fact that the input plaintext was
encrypted serially, which caused a dependency between
the throughput and the nature of the secret key used. This
dependency can be viewed by some as a vulnerability in the
security of the implemented micro-architecture. The
proposed modified micro-architecture is constructed using
five basic modules. These modules are: the message
cache, the message alignment module, the key cache, the
comparator, and finally the encryption module. In this
work, we provide comprehensive simulation and
implementation results. These are: the timing diagrams,
the post-implementation timing and routing reports, and
finally the floor plan. Moreover, a detailed comparison
with other FPGA implementations is made available and
discussed.
Keywords: FPGA, micro-architecture, data
communication security, encryption, steganography,
cryptography, algorithm.
-
FPGA Based Agile Algorithm-on-Demand Co-Processor [p. 82]
-
R. Pradeep, S. Vinay, S. Burman, and V. Kamakoti
With growing computational needs of many real-world
applications, frequently changing specifications of standards,
and the high design and NRE costs of ASICs, an
algorithm-agile FPGA based co-processor has become a viable
alternative. In this article, we report on the general
design of an algorithm-agile co-processor and the proof-of-concept
implementation.
Organisers: W. Wolf, Princeton U, US; A. Jerraya, TIMA Laboratory, FR
Moderator: W. Wolf, Princeton U, US
Speakers: W. Wolf, Princeton U, US; R. Chesson, STMicroelectronics, FR; E. Flamand, STMicroelectronics, FR
-
Multimedia Applications of Multiprocessor Systems-on-Chips [p. 86]
-
W. Wolf
This paper surveys the characteristics of multimedia systems.
Multimedia applications today are dominated by
compression and decompression, but multimedia devices
must also implement many other functions such as security
and file management. We introduce some basic concepts of
multimedia algorithms and the larger set of functions that
multimedia systems-on-chips must implement.
Organiser: T. Simunic, UC San Diego, US
Moderator: M. Renaudin, TIMA Laboratory, FR
Speakers: K. Holt, Intel Corp, US; A. Chandrakasan, Massachusetts Institute of Technology, US;
T. Simunic, UC San Diego, US
-
Wireless LAN: Past, Present, and Future [p. 92]
-
K. Holt
This paper retraces the historical development of wireless
LAN technology in the context of the pursuit of ever higher
data rate, describes the significant technical
breakthroughs that are now occurring, and speculates on
future directions that the technology may take over the
remainder of the decade. The challenges that these
developments have created for low power operation are
considered, as well as some of the opportunities that are
presented to mitigate them. The importance of MIMO as
an emerging technology for 802.11 is specifically
highlighted, both in terms of the significant increase in
data rate and range that it enables as well as the
considerable challenge that it presents for the
development of low power wireless LAN products.
-
Direct Conversion Pulsed UWB Transceiver Architecture [p. 94]
-
R. Blázquez, F. Lee, D. Wentzloff, B. Ginsburg, J. Powell, and A. Chandrakasan
Ultra-wideband (UWB) communication is an emerging
wireless technology that promises high data rates over
short distances and precise locationing. The large
available bandwidth and the constraint of a maximum
power spectral density drive a unique set of system
challenges. This paper addresses these challenges using
two UWB transceivers and a discrete prototype platform.
-
Power Saving Techniques for Wireless LANs [p. 96]
-
T. Simunic
Fast wireless access has rapidly become common-place.
Wireless access points and Hotspot servers are
sprouting everywhere. Battery lifetime continues to be
a critical issue in mobile computing. This paper first
gives an overview of WLAN energy saving strategies,
followed by an illustration of a system-level
methodology for saving power in heterogeneous
wireless environments.
Moderators: K. Torki, CMP, FR; C. Das, IMEC, BE
-
A Synthesizable IP Core for DVB-S2 LDPC Code Decoding [p. 100]
-
F. Kienle, T. Brack, and N. Wehn
The new standard for digital video broadcast DVB-S2
features Low-Density Parity-Check (LDPC) codes as its
channel coding scheme. The codes are defined for various
code rates with a block size of 64800 which allows a transmission
close to the theoretical limits.
The decoding of LDPC codes is an iterative process. For DVB-S2
about 300000 messages are processed and reordered
in each of the 30 iterations. These huge data processing
and storage requirements are a real challenge for the decoder
hardware realization, which has to fulfill the specified
throughput of 255 Mbit/s for base station applications.
In this paper we present, to the best of our knowledge,
the first published IP LDPC decoder core for the DVB-S2
standard. We present a synthesizable IP block based on
STMicroelectronics 0.13 μm CMOS technology.
-
picoArray Technology: The Tool's Story [p. 106]
-
A. Duller, D. Towner, G. Panesar, A. Gray, and W. Robbins
This paper briefly describes the picoArray™ architecture,
and in particular the deterministic internal communication
fabric. The methods that have been developed for
debugging and verifying systems using devices from the
picoArray family are explained. In order to maximize the
computational ability of these devices, hardware debugging
support has been kept to a minimum, and the methods and
tools have been developed to take this into account.
-
Queue Management in Network Processors [p. 112]
-
I. Papaefstathiou, G. Kornaros, T. Orphanoudakis, C. Kachris, I. Mavroidis, and A. Nikologiannis
One of the main bottlenecks when designing a network processing system is very often its memory
subsystem. This is mainly due to the state-of-the-art network links operating at very high speeds and to the fact that in
order to support advanced Quality of Service (QoS), a large number of independent queues is desirable. In this paper we
analyze the performance bottlenecks of various data memory managers integrated in typical Network Processing Units
(NPUs). We expose the performance limitations of software implementations utilizing the RISC processing cores
typically found in most NPU architectures and we identify the requirements for hardware assisted memory management
in order to achieve wire-speed operation at gigabit per second rates. Furthermore, we describe the architecture and
performance of a hardware memory manager that fulfills those requirements. Although this memory manager is
implemented in a reconfigurable technology, it can provide up to 6.2 Gbps of aggregate throughput while handling 32K
independent queues.
Keywords: network processor, memory management, queue management
-
System Level Analysis of the Bluetooth Standard [p. 118]
-
M. Conti and D. Moretti
The SystemC modules of the Link Manager Layer and
Baseband Layer have been designed in this work at
behavioral level to analyze the performance of the
Bluetooth standard. In particular, the probability of the
creation of a piconet in the presence of noise in the channel
and the power reduction using the sniff and hold modes
have been investigated.
-
C Based Hardware Design for Wireless Applications [p. 124]
-
A. Takach, B. Bowyer, and T. Bollaert
The algorithms used in wireless applications are
increasingly sophisticated and consequently more
challenging to implement in hardware. Traditional design
flows require developing the micro architecture, coding
the RTL, and verifying the generated RTL against the
original functional C or MATLAB specification. This
paper describes a C-based design flow that is well suited
for the hardware implementation of DSP algorithms
commonly found in wireless applications. The C design
flow relies on guided synthesis to generate the RTL
directly from the untimed C algorithm.
The specifics of the C-based design flow are described
using a simple DSP filtering algorithm consisting of a
forward adaptive equalizer, a 64-QAM slicer and an
adaptive decision feedback equalizer. The example
illustrates some of the capabilities and advantages offered
by this flow.
-
Hardware Accelerated Collision Detection - An Architecture and Simulation Results [p. 130]
-
A. Raabe, B. Bartyzel, J. Anlauf, and G. Zachmann
We present a hardware architecture for a single-chip acceleration
of an efficient hierarchical collision detection algorithm
as well as simulation results for collision queries
using this architecture. The architecture consists of two
main stages, one for traversing simultaneously a hierarchy
of discretely oriented polytopes, and one for intersecting
triangles. Within each stage, the architecture is deeply
pipelined and parallelized. For the first stage, we compare
and evaluate different traversal schemes for bounding volume
hierarchies.
A simulation in VHDL shows that a hardware implementation
can offer a speed-up over a software implementation
by orders of magnitude. Thus, real-time collision detection
of complex objects at rates required by force-feedback and
physically-based simulations can be achieved.
-
Modeling of a Reconfigurable OFDM IP Block Family for an RF System Simulator [p. 136]
-
J. Liedes and H. Heusala
This paper presents the idea of a design-domain-specific Mother Model of an IP block family as a basis for modeling system integration. A common reconfigurable Mother Model for ten different standardized digital OFDM transmitters has been developed. By means of a set of parameters, the Mother Model can be reconfigured to any of the ten selected standards. So far, the applicability of the proposed reconfiguration and analog-digital co-modeling methods has been demonstrated by modeling the function of the digital parts of three transmitters (802.11a, ADSL and DRM) in an RF system simulator. The model is intended to be used as a signal source template in RF system simulations. The concept is not restricted to signal sources; it can be applied to any IP block development.
The idea of the Mother Model will be applied in other design domains to show that in certain application areas, OFDM transceivers in this case, the design process can progress simultaneously in different design domains (mixed signal, system and RTL-architectural) without the need for high-level synthesis. Only the Mother Models of the three design domains need to be formally proved to function as specified.
-
Fast and Accurate Transaction Level Modeling of an Extended AMBA2.0 Bus Architecture [p. 138]
-
Y.-T. Kim, T. Kim, Y. Kim, C. Shin, E.-Y. Chung, K.-M. Choi, J.-T. Kong, S.-K. Eo
A Transaction Level Modeling (TLM) approach is used to meet
both the simulation speed and the cycle accuracy required for large scale
SoC performance analysis. We implemented a transaction-level
model of a proprietary bus called AHB+ which supports an
extended AMBA2.0 protocol. The AHB+ transaction-level
model is 353 times faster than the pin-accurate RTL model
while maintaining 97% accuracy on average. We also
present the development procedure of TLM of a bus
architecture.
Moderators: A. Kirschbaum, Continental Teves AG & Co, DE; J. Gerlach, Robert Bosch GmbH, DE
-
Meeting the Embedded Design Needs of Automotive Applications [p. 142]
-
W. Lyons
The importance of embedded systems in driving
innovation in automotive applications continues to grow.
Understanding the specific needs of developers targeting
this market is also helping to drive innovation in RISC
core design. This paper describes how a RISC instruction
set architecture has evolved to better meet those needs,
and the key implementation features in two very different
RISC cores are used to demonstrate the challenges of
designing for real-time automotive systems.
-
Debug Support, Calibration and Emulation for Multiple Processor and Powertrain Control SoCs [p. 148]
-
A. Mayer, H. Siebert, and K. McDonald-Maier
The introduction of complex SoCs with multiple
processor cores presents new development challenges,
such that development support is now a decisive factor
when choosing a System-on-Chip (SoC). The presented
development support strategy addresses the challenges
using both architecture and technology approaches. The
Multi-Core Debug Support (MCDS) architecture provides
flexible triggering using cross triggers and a multiple
core break and suspend switch. Temporal trace ordering
is guaranteed down to cycle level by on-chip time
stamping. The Package Sized-ICE (PSI) approach is a
novel method of including trace buffers, overlay
memories, processing resources and communication
interfaces without changing device behavior. PSI requires
no external emulation box, as the debug host interfaces
directly with the SoC using a standard interface.
-
The Integration of On-Line Monitoring and Reconfiguration Functions Using IEEE1149.4 into a
Safety Critical Automotive Electronic Control Unit [p. 153]
-
C. Jeffrey, R. Cutajar, A. Richardson, S. Prosser, M. Lickess, and S. Riches
This paper presents an innovative application of IEEE
1149.4 and the Integrated Diagnostic Reconfiguration
(IDR) as tools for the implementation of an embedded test
solution for an Automotive Electronic Control Unit
implemented as a fully integrated mixed signal system.
The paper describes how the test architecture can be used
for fault avoidance and presents results from a hardware
prototype. The paper concludes that fault avoidance can be
integrated into mixed signal electronic systems to handle
key failure modes.
-
LC Oscillator Driver for Safety Critical Applications [p. 159]
-
P. Horsky
A CMOS harmonic signal LC oscillator driver for
automotive applications working in a harsh environment
with high safety critical requirements is described. The
driver can be used with a wide range of external
component parameters (LC resonance network of a
sensor). The quality factor of the external LC network can vary
over two decades. Amplitude regulation of the driver is digitally
controlled and the DAC is constructed as exponential with
piece-wise-linear (PWL) approximation. Low current
consumption for high quality resonance networks is
achieved. The realized oscillator is robust, is used in a safety
critical application and has low EMC emissions.
-
Context Sensitive Performance Analysis of Automotive Applications [p. 165]
-
J. Staschulat, R. Ernst, A. Schulze, and F. Wolf
Accurate timing analysis is key to efficient embedded system
synthesis and integration. While industrial control software
systems are developed using graphical models, such
as Matlab/Simulink or ASCET/SD, exhaustive simulation is
not suitable for verifying functional and timing behavior.
Formal performance analysis is an alternative but can lead
to wide timing intervals because of input data dependency
and complex target architectures. Hence a designer might
want to restrict the formal performance analysis to parts of
the software system, called context or process modes.
In this paper, we describe how to define and characterize
such context information from graphical models. Further,
we extend the formal performance analysis to consider contexts.
Results from an automotive application demonstrate
the applicability of our approach.
-
AutoMoDe - Model-Based Development of Automotive Software [p. 171]
-
D. Ziegenbein, U. Freund, P. Braun, A. Bauer, J. Romberg, and B. Schätz
This paper describes first results from the AutoMoDe
(Automotive Model-Based Development) project. The
overall goal of the project is to develop an integrated
methodology for model-based development of automotive
control software, based on problem-specific design
notations with an explicit formal foundation. Based on the
existing AutoFOCUS framework [1], a tool prototype is
being developed in order to illustrate and validate the key
elements of our approach.
-
SystemC Analysis of a New Dynamic Power Management Architecture [p. 177]
-
M. Conti
This paper presents a new dynamic power
management architecture of a System on Chip. The Power
State Machine describing the status of the core follows the
recommendations of the ACPI standard. The algorithm
controls the power states of each block on the basis of
battery status, chip temperature and a user defined task
priority.
Moderators: R. Zafalon, STMicroelectronics, IT; W. Ecker, Infineon Technologies, DE
-
Exploiting Real-Time FPGA Based Adaptive Systems Technology for Real-Time Sensor Fusion in
Next Generation Automotive Safety Systems [p. 180]
-
S. Chappell, A. Macarthur, D. Preston, D. Olmstead, B. Flint, and C. Sullivan
We present a system for the boresighting of sensors
using inertial measurement devices as the basis for
developing a range of dynamic real-time sensor fusion
applications. The proof of concept utilizes a COTS
FPGA platform for sensor fusion and real-time
correction of a misaligned video sensor. We exploit a
custom-designed 32-bit soft processor core and C-based
design & synthesis for rapid, platform-neutral
development. Kalman filter and sensor fusion
techniques established in advanced aviation systems are
applied to automotive vehicles with results exceeding
typical industry requirements for sensor alignment.
Results of the static and dynamic tests demonstrate
that inexpensive accelerometers mounted on (or
during assembly of) a sensor and an Inertial
Measurement Unit (IMU) fixed to a vehicle can be used
to compute the misalignment of the sensor to the IMU
and thus to the vehicle. In some cases the model predictions
and test results exceeded the requirements by an order
of magnitude with a 3-sigma or 99% confidence.
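The abstract does not give the filter model. As a rough illustration of the idea, the sketch below runs a scalar Kalman filter that estimates a constant misalignment angle from noisy observations. The measurement model, the noise level and the true angle are all assumptions chosen for the example.

```python
import random


def estimate_misalignment(measurements, meas_var, init_var=1.0):
    """Scalar Kalman filter for a constant state (the misalignment angle)."""
    x, p = 0.0, init_var              # initial estimate and its variance
    for z in measurements:
        k = p / (p + meas_var)        # Kalman gain
        x = x + k * (z - x)           # correct the estimate with the innovation
        p = (1.0 - k) * p             # variance shrinks with every update
    return x, p


random.seed(0)
true_angle = 0.5                      # degrees, hypothetical
noise_std = 0.2                       # degrees, hypothetical sensor noise
zs = [true_angle + random.gauss(0.0, noise_std) for _ in range(500)]
angle, var = estimate_misalignment(zs, meas_var=noise_std ** 2)
```

Because the state is constant, no process noise is added and the estimate variance decreases monotonically as more samples arrive.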
-
Platform Based Design for Automotive Sensor Conditioning [p. 186]
-
L. Fanucci, A. Giambastiani, A. Rocchi, F. Iozzi, and C. Marino
In this paper a general architecture suitable to
interface several kinds of sensors for automotive
applications is presented. A platform based design
approach is pursued to improve system performance
while minimizing time-to-market. The platform is
composed of an analog front-end and a digital section.
The latter is based on a microcontroller core (8051 IP by
Oregano) plus a set of hardware blocks dedicated to
the complex signal processing required for sensor
conditioning. The microcontroller also handles the
communication with external devices (such as a PC) for data
output and fast prototyping. A case study is presented
concerning the conditioning of a Gyro yaw rate sensor
for automotive applications. Measured performance
results outperform current state-of-the-art commercial
devices.
-
Realization of a Virtual Lambda Sensor on a Fixed Precision System [p. 192]
-
P. Amato, N. Cesario, M. Di Meglio, and F. Pirozzi
The aim of this work is to study the implementation feasibility
of a VLS (Virtual Lambda Sensor) by a TSK (Takagi,
Sugeno, Kang) singleton FIS (Fuzzy Inference System).
Such a sensor could be used in a model based EMS (Engine
Management System) for trade gasoline engines. The FIS
design target is to obtain a system with a fixed data representation
(i.e. 10 bit) and a limited number of inputs, outputs,
rules and membership functions.
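To make the fixed-precision constraint concrete, here is a minimal sketch of a zero-order (singleton) TSK inference step using only integer arithmetic on 10-bit values. The single input, the triangular membership shapes and the rule table are hypothetical and are not taken from the paper.

```python
MAX = 1023  # full scale of a 10-bit unsigned value


def tri(x, a, b, c):
    """Triangular membership degree as an integer in 0..MAX (a < b < c)."""
    if x <= a or x >= c:
        return 0
    if x <= b:
        return (x - a) * MAX // (b - a)
    return (c - x) * MAX // (c - b)


def tsk_singleton(x, rules):
    """Weighted average of singleton consequents for one input x.

    rules is a list of ((a, b, c), singleton) pairs. The accumulators
    are wider than 10 bits; only inputs and outputs stay in 0..MAX.
    """
    num = den = 0
    for (a, b, c), singleton in rules:
        w = tri(x, a, b, c)           # rule firing strength
        num += w * singleton
        den += w
    return num // den if den else 0


# Hypothetical rule base: three overlapping triangles over the input range.
rules = [((-1, 0, 512), 100), ((0, 512, 1023), 500), ((512, 1023, 2047), 900)]
```

With this rule table the output interpolates between the singletons, for example `tsk_singleton(256, rules)` lies midway between the first two consequents.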
-
Hardware-Software Design of a Smart Sensor for Fully-Electronic DNA Hybridization Detection [p. 198]
-
C. Stagni, C. Guiducci, M. Lanzoni, L. Benini, and B. Riccò
This paper describes the design of a smart sensor for
label-free detection of DNA hybridization. The sensor is
based on a direct electrical transduction principle: it measures
impedance variation at the interface between a biofunctionalized
electrode and a solution containing the analyte.
The smart sensor includes a complete signal conditioning
and processing subsystem based on an embedded
μ-controller. We outline the sensor architecture, and we describe
in detail board-level integration as well as hardware
and software implementation and design choices. The
accuracy of our embedded solution has been evaluated by
comparing it with a high-cost laboratory setup. Moreover,
we provide measurements of real sensing structures which
demonstrate in the field the functionality of our system.
-
A Tool and Methodology for AC-Stability Analysis of Continuous-Time Closed-Loop Systems [p. 204]
-
M. Milev and R. Burt
Presented are a methodology and a DFII-based
tool for AC-stability analysis of a wide variety of closed-loop
continuous-time systems (operational amplifiers and other linear
circuits). The methodology used allows for easy identification
and diagnostics of ac-stability problems including not only
main-loop effects but also local-instability loops in current
mirrors, bias circuits and emitter or source followers without
breaking the loop. The results of the analysis are easy to
interpret. Estimated phase margin is readily available.
Instability nodes and loops along with their respective
oscillation frequencies are immediately identified and mapped
to the existing circuit nodes thus offering significant
advantages compared to traditional "black-box" methods of
stability analysis (Transient Overshoot, Bode and Phase
margin plots etc.). The tool for AC-Stability analysis is written
in SKILL™ and is fully integrated in the DFII™ environment. Its
"push-button" graphical user interface (GUI) is easy to use
and understand. The tool can be invoked directly from a
Composer™ schematic and does not require an active Analog
Artist™ session. The tool is not dependent on the use of a
specific fabrication technology or Process Design Kit
customization. It requires OCEAN™, Spectre™ and
Waveform calculator capabilities to run.
Index Terms - AC stability, small-signal circuit stability,
frequency instability, closed loop system stability.
Moderators: B. Courtois, TIMA Laboratory, FR; G. Gielen, KU Leuven, BE
-
A CMOS-Based Tactile Sensor for Continuous Blood Pressure Monitoring [p. 210]
-
K.-U. Kirstein, J. Sedivy, T. Salo, C. Hagleitner, T. Vancura, and A. Hierlemann
A monolithic integrated tactile sensor array is
presented, which is used to perform non-invasive blood
pressure monitoring of a patient. The advantage of this
device compared to a cuff-based approach is the
capability of recording continuous blood pressure data.
The capacitive, membrane-based sensor device is
fabricated in an industrial CMOS-technology combined
with post-CMOS micromachining. The capacitance change
is detected by a ΣΔ-modulator. The modulator is operated
at a sampling rate of 128 kS/s and achieves a resolution of
12 bit with an external decimation filter and an OSR of 128.
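The sampling rate and OSR quoted above fix the data rate after decimation. A short worked calculation, assuming the usual definition OSR = fs / (2 x signal bandwidth):

```python
fs_hz = 128_000   # modulator sampling rate from the abstract (128 kS/s)
osr = 128         # oversampling ratio from the abstract

output_rate_hz = fs_hz / osr       # sample rate after decimating by the OSR
signal_bw_hz = fs_hz / (2 * osr)   # Nyquist bandwidth of the decimated data

print(output_rate_hz, signal_bw_hz)   # prints 1000.0 500.0
```

Under that definition the sensor delivers 1 kS/s of 12-bit data covering a 500 Hz band.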
-
Optical Receiver IC for CD/DVD/Blue-Laser Application [p. 215]
-
J. Sturm, M. Leifhelm, H. Schatzmayr, S. Groiss, and H. Zimmermann
In this paper an optoelectronic receiver IC for optical
data storage applications is presented. The IC was developed
in a 0.5 μm BiCMOS technology with integrated
PIN-photodiodes. It includes a new architecture of high-speed
and low-noise trans-impedance amplifiers with a
gain range of 130Ω to 270kΩ, programmable with a serial
interface. The bandwidth is 260MHz for highest gain
which gives a gain-bandwidth-product of 70 THzΩ and a
sensitivity improvement by a factor of 2 compared to published
OEICs. The amplifiers support a special write/clip
mode. The output buffers are 130Ω impedance matched
for optimized data transmission over a flex cable.
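The quoted gain-bandwidth product follows directly from the highest transimpedance gain and the bandwidth at that gain, as this short check shows:

```python
bandwidth_hz = 260e6   # bandwidth at highest gain, from the abstract
gain_ohm = 270e3       # highest transimpedance gain, from the abstract

gbw_hz_ohm = bandwidth_hz * gain_ohm
gbw_thz_ohm = gbw_hz_ohm / 1e12      # about 70.2 THzΩ, quoted as 70 THzΩ
```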
-
A 97mW 110 MS/s 12b Pipeline ADC Implemented in 0.18μm Digital CMOS [p. 219]
-
T. Andersen, A. Briskemyr, F. Telstø, J. Bjørnsen, T. Bonnerud, B. Hernes, and Ø. Moldsvor
A 12 bit Pipeline ADC fabricated in a 0.18 μm
pure digital CMOS technology is presented. Its nominal
conversion rate is 110MS/s and the nominal supply
voltage is 1.8V. The effective number of bits is 10.4 when
a 10MHz input signal with 2Vp-p signal swing is applied.
The occupied silicon area is 0.86mm² and the power
consumption equals 97mW. A switched capacitor bias
current circuit scales the bias current automatically with
the conversion rate, which gives scalable power
consumption and full performance of the ADC from 20 to
140MS/s.
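For readers who want to relate the quoted effective number of bits to other common ADC metrics, the sketch below applies two textbook relations: SINAD = 6.02 x ENOB + 1.76 dB, and a Walden-style figure of merit P / (2^ENOB x fs). The figure of merit is not reported in the abstract; it is derived here under the assumption that the nominal conversion rate is the right frequency term.

```python
def sinad_db(enob):
    """Signal-to-noise-and-distortion ratio implied by an ENOB value."""
    return 6.02 * enob + 1.76


def walden_fom_joule(power_w, enob, fs_hz):
    """Energy per conversion step, P / (2**ENOB * fs)."""
    return power_w / (2 ** enob * fs_hz)


sinad = sinad_db(10.4)                        # ENOB from the abstract
fom = walden_fom_joule(97e-3, 10.4, 110e6)    # 97 mW at 110 MS/s
print(f"SINAD = {sinad:.1f} dB, FoM = {fom * 1e12:.2f} pJ/conversion-step")
```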
-
A 6bit, 1.2GSps Low-Power Flash-ADC in 0.13μm Digital CMOS [p. 223]
-
C. Sandner, M. Clara, A. Santner, T. Hartig, and F. Kuttner
A 6bit flash-ADC with 1.2GSps, wide analog
bandwidth and low power, realized in a standard digital
0.13 μm CMOS copper technology is presented.
Employing capacitive interpolation gives various
advantages when designing for low power: no need for a
reference resistor ladder, implicit sample-and-hold
operation, no edge effects in the interpolation network
(as compared to resistive interpolation), and a very low
input capacitance of only 400fF, which leads to an easily
drivable analog converter interface.
Operating at 1.2GSps the ADC achieves an effective
resolution bandwidth (ERBW) of 700MHz, while consuming
160mW of power. At 600MSps we achieve an
ERBW of 600MHz with only 90mW power consumption,
both from a 1.5V supply. This corresponds to outstanding
Figure-of-Merit numbers (FoM) of 2.2 and
1.5pJ/convstep, respectively. The module area is
0.12mm².
Moderators: T. Kean, Algotronix, UK; P. Pezzati, Cadence, FR
-
Testing Logic Cores Using a BIST P1500 Compliant Approach: A Case of Study [p. 228]
-
P. Bernardi, G. Masera, F. Quaglio, and M. Sonza Reorda
In this paper we describe how we applied a BIST-based approach
to the test of a logic core to be included in System-on-a-chip
(SoC) environments. The advantages of the approach are the
ability to protect the core IP, the simple test interface (thanks
also to the adoption of the P1500 standard), the possibility to run
the test at-speed, the reduced test time, and the good diagnostic
capabilities. The paper reports figures about the achieved fault
coverage, the required area overhead, and the performance slowdown,
and compares the figures with those for alternative approaches,
such as those based on full scan and sequential ATPG.
-
MultiNoC: A Multiprocessing System Enabled by a Network on Chip [p. 234]
-
A. Mello, L. Möller, N. Calazans, and F. Moraes
The MultiNoC system implements a programmable on-chip
multiprocessing platform built on top of an efficient,
low area overhead intra-chip interconnection scheme. The
employed interconnection structure is a Network on Chip,
or NoC. NoCs are emerging as a viable answer to
increasing demands on interconnection architectures, due
to the following characteristics: (i) energy efficiency and
reliability; (ii) scalability of bandwidth, when compared
to traditional bus architectures; (iii) reusability; (iv) distributed
routing decisions. An external host computer
feeds MultiNoC with application instructions and data.
After this initialization procedure, MultiNoC executes
some algorithm. After finishing execution of the algorithm,
output data can be read back by the host. Sequential
or parallel algorithms conveniently adapted to the
MultiNoC structure can be executed. The main motivation
to propose this design is to enable the investigation of
current trends to increase the number of embedded processors
in SoCs, leading to the concept of "sea of processors"
systems.
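The abstract lists distributed routing decisions among the NoC characteristics but does not name a routing algorithm or topology. Purely as an illustration, the sketch below shows deterministic XY routing on a 2D mesh, a common scheme in which each router decides the next hop locally from its own coordinates and the destination. The mesh topology, the port names and the coordinate convention are assumptions.

```python
def xy_next_port(current, dest):
    """Local routing decision of one router: resolve X first, then Y."""
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"                    # packet has arrived


def route(src, dest):
    """Follow the per-router decisions and return the list of visited nodes."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    path = [src]
    while path[-1] != dest:
        mx, my = step[xy_next_port(path[-1], dest)]
        x, y = path[-1]
        path.append((x + mx, y + my))
    return path
```

No router needs global state, which is what makes the decision distributed.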
-
Using Mobilize Power Management IP for Dynamic and Static Power Reduction in SoC at 130nm [p. 240]
-
D. Hillman
At 130 nm and 90 nm, power consumption (both dynamic
and static) has become a barrier in the roadmap for SoC
designs targeting battery powered, mobile applications.
This paper presents the results of dynamic and static
power reduction achieved by implementing Tensilica's 32-bit
Xtensa microprocessor core, using Virtual Silicon's
Power Management IP. Independent voltage islands are
created using Virtual Silicon's VIP PowerSaver standard
cells by using voltage level shifting cells and voltage
isolation cells to implement power islands. The VIP
PowerSaver standard cells are characterized at 1.2V,
1.0V and 0.8V, to accommodate voltage scaling. Power
islands can also be turned off completely. Designers can
significantly lower both the dynamic power and the
quiescent or leakage power of their SoC designs, with
very little impact on speed or area using Virtual Silicon's
VIP Gate Bias standard cells.
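The three characterization voltages give a feel for the potential dynamic power savings. The sketch below uses the first-order CMOS relation P_dyn ∝ C·V²·f and assumes that switched capacitance and clock frequency stay unchanged, which is a simplification: in practice a lower supply usually also forces a lower clock. The resulting ratios are estimates, not results from the paper.

```python
def dynamic_power_ratio(v_new, v_ref):
    """Dynamic power relative to v_ref at equal capacitance and frequency."""
    return (v_new / v_ref) ** 2


for v in (1.2, 1.0, 0.8):   # characterization voltages from the abstract
    print(f"{v:.1f} V -> {dynamic_power_ratio(v, 1.2):.2f} x dynamic power")
```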
-
A Partitioning Methodology for Accelerating Applications in Hybrid Reconfigurable Platforms [p. 247]
-
M. Galanis, A. Milidonis, C. Goutis, G. Theodoridis, and D. Soudris
In this paper, we propose a methodology for partitioning
and mapping computationally intensive applications in
reconfigurable hardware blocks of different granularity. A
generic hybrid reconfigurable architecture is considered so
that the methodology is applicable to a large number of
heterogeneous reconfigurable platforms. The methodology
mainly consists of two stages, the analysis and the mapping
of the application onto fine and coarse-grain hardware
resources. A prototype framework consisting of analysis,
partitioning and mapping tools has also been developed.
For the coarse-grain reconfigurable hardware, we use our
previously developed high-performance coarse-grain datapath.
In this work, the methodology is validated using two
real-world applications, an OFDM transmitter and a JPEG
encoder. In the case of the OFDM transmitter, a maximum
decrease in clock cycles of 82% relative to an all
fine-grain mapping solution is achieved. The corresponding
performance improvement for the JPEG is 43%.
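A percentage reduction in clock cycles can be restated as a speedup factor. The sketch below does this conversion, assuming the clock period is the same in both mappings and that the JPEG figure is also a reduction in cycles (the abstract does not say so explicitly).

```python
def speedup_from_cycle_reduction(reduction):
    """Convert a fractional cycle reduction (0 <= r < 1) into a speedup."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


ofdm_speedup = speedup_from_cycle_reduction(0.82)   # about 5.6x
jpeg_speedup = speedup_from_cycle_reduction(0.43)   # about 1.75x
```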
-
Evaluation of SystemC Modelling of Reconfigurable Embedded Systems [p. 253]
-
T. Rissa, W. Luk, and A. Donlin
This paper evaluates the use of pin and cycle accurate
SystemC models for embedded system design exploration
and early software development. The target system is the MicroBlaze
VanillaNet Platform running the MicroBlaze uClinux
operating system. The paper compares Register Transfer
Level (RTL) Hardware Description Language (HDL) simulation
speed to the simulation speed of several different
SystemC models. It is shown that simulation speed of pin
and cycle accurate models can go up to 150 kHz, compared
to the 100 Hz range of HDL simulation. Furthermore, utilising
techniques that temporarily compromise cycle accuracy, effective
simulation speed of up to 500 kHz can be obtained.
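To put these simulation speeds in perspective, the sketch below converts them into wall-clock time for a fixed workload. The workload of one billion target clock cycles is a hypothetical figure chosen for illustration, and the HDL speed is taken as exactly 100 Hz although the abstract only gives a range.

```python
def wall_clock_seconds(target_cycles, sim_speed_hz):
    """Time needed to simulate target_cycles at a given simulation speed."""
    return target_cycles / sim_speed_hz


speeds_hz = {
    "RTL HDL": 100.0,                       # "100 Hz range" in the abstract
    "SystemC pin/cycle accurate": 150e3,    # up to 150 kHz
    "SystemC relaxed accuracy": 500e3,      # up to 500 kHz effective
}
cycles = 1_000_000_000                      # hypothetical workload

times = {name: wall_clock_seconds(cycles, hz) for name, hz in speeds_hz.items()}
for name, seconds in times.items():
    print(f"{name}: {seconds / 3600:.1f} h")
```

At these rates the same workload drops from months of HDL simulation to a few hours or less in SystemC.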
-
Hardware Support for QoS-Based Function Allocation in Reconfigurable Systems [p. 259]
-
M. Ullmann, W. Jin, and J. Becker
This contribution presents a new approach for
allocating suitable function-implementation variants
depending on given quality-of-service function-requirements
for run-time reconfigurable multi-device
systems. Our approach adapts methodologies from the
domain of knowledge-based systems which can be used
for doing run-time hardware/software resource usage
optimizations.
Keywords: CBR, Algorithm, Resource Management
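Assuming the keyword CBR stands for case-based reasoning, the retrieval step of such an allocator can be sketched as a weighted nearest-case search over QoS attributes. The attribute names, the normalization to the range 0..1, the weights and the variant names below are all hypothetical.

```python
def similarity(case_qos, request, weights):
    """Weighted similarity in 0..1 between a stored case and a request."""
    total = sum(weights.values())
    score = sum(w * (1.0 - abs(case_qos[k] - request[k]))
                for k, w in weights.items())
    return score / total


def retrieve(case_base, request, weights):
    """Return the stored implementation variant closest to the request."""
    return max(case_base, key=lambda c: similarity(c["qos"], request, weights))


# Hypothetical case base: QoS attributes normalized to 0..1.
case_base = [
    {"variant": "fir_sw", "qos": {"throughput": 0.2, "power": 0.9}},
    {"variant": "fir_hw_small", "qos": {"throughput": 0.6, "power": 0.6}},
    {"variant": "fir_hw_fast", "qos": {"throughput": 0.95, "power": 0.2}},
]
request = {"throughput": 0.9, "power": 0.3}
weights = {"throughput": 2.0, "power": 1.0}
best = retrieve(case_base, request, weights)
```

A run-time system would additionally check that the selected variant fits the currently free hardware or software resources before allocating it.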
Moderators: F. Fummi, Verona U, IT; W. Matzke, Cadence, DE
-
An Integrated Design and Verification Methodology for Reconfigurable Multimedia Systems [p. 266]
-
M. Borgatti, A. Capello, U. Rossi, F. Fummi, G. Pravadelli, J.-L. Lambert, and I. Moussa
Recently a lot of multimedia applications are emerging on
portable appliances. They require both the flexibility of
upgradeable devices (traditionally software based) and a
powerful computing engine (typically hardware). In this
context, programmable HW and dynamic reconfiguration
allow novel approaches to the migration of algorithms
from SW to HW. Thus, in the frame of the Symbad project,
we propose an industrial design flow for reconfigurable
SoCs. The goal of Symbad consists of developing a
system level design platform for hardware and software
SoC systems including formal and semi-formal
verification techniques.
-
Common Reusable Verification Environment for BCA and RTL Models [p. 272]
-
G. Falconeri, W. Naifer, and N. Romdhane
This paper deals with a common verification
methodology and environment for SystemC BCA and RTL
models. The aim is to save effort by avoiding the same work
being done twice by different people and to reuse the same
environment for the two design views. Applying this
methodology the verification task starts as soon as the
functional specification is signed off and it runs in parallel
to the models and design development. The verification
environment is modeled with the aid of dedicated
verification languages and it is applied to both the models.
The test suite is exactly the same and thus it's possible to
verify the alignment between the two models. In fact the
final step is to check the cycle-by-cycle match of the
interface behavior. A regression tool and a bus analyzer
have been developed to help the verification and the
alignment process. The former is used to automate the
testbench generation and to run the two test suites. The
latter is used to verify the alignment between the two
models comparing the waveforms obtained in each run.
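The cycle-by-cycle alignment check can be pictured as a comparison of two sampled traces. The sketch below is not the authors' bus analyzer; it assumes each trace is a list with one dictionary of signal values per clock cycle, and the signal names are hypothetical.

```python
def first_mismatch(bca_trace, rtl_trace):
    """Return (cycle, signal, bca_value, rtl_value) for the first
    difference between the two traces, or None if they are aligned."""
    for cycle, (bca, rtl) in enumerate(zip(bca_trace, rtl_trace)):
        for signal in sorted(bca.keys() | rtl.keys()):
            if bca.get(signal) != rtl.get(signal):
                return cycle, signal, bca.get(signal), rtl.get(signal)
    if len(bca_trace) != len(rtl_trace):
        # One model produced more cycles than the other.
        return min(len(bca_trace), len(rtl_trace)), None, None, None
    return None


# Hypothetical port samples from the two models.
bca = [{"req": 0, "ack": 0}, {"req": 1, "ack": 0}, {"req": 1, "ack": 1}]
rtl = [{"req": 0, "ack": 0}, {"req": 1, "ack": 0}, {"req": 1, "ack": 0}]
mismatch = first_mismatch(bca, rtl)
```

Reporting the first divergent cycle and signal is usually enough to start debugging, since later differences are often consequences of the first one.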
The quality metrics used to validate the flow are full
functional coverage and full alignment at each IP port.
-
An Assembler Driven Verification Methodology (ADVM) [p. 278]
-
J. MacBeth, K. Gray, and D. Heinz
This paper presents an overview of an assembler driven
verification methodology (ADVM) that was created and implemented
for a chip card project at Infineon Technologies
AG [2]. The primary advantage of this methodology is that
it enables rapid porting of directed tests to new targets
and derivatives, with only a minimum amount of code refactoring.
As a consequence, considerable verification development
time and effort was saved.
-
A Formal Verification Methodology for Checking Data Integrity [p. 284]
-
Y. Umezawa and T. Shimizu
Formal verification techniques have been playing an
important role in pre-silicon validation processes. One of
the most important points in performing
formal verification is to define good verification scopes;
we should define clearly what is to be verified formally
in the designs under test. We considered the following
three practical requirements when we defined the scope
of formal verification. They are (a) hard to verify, (b)
small to handle, and (c) easy to understand.
Our novel approach is to break down generic
system-level properties into stereotyped block-level
properties and to define requirements for Verifiable RTL.
Consequently, designers, rather than verification
experts, can describe properties of the design easily, and
formal model checking can be applied systematically and
thoroughly to all the leaf modules.
During the development of a component chip for
server platforms, we focused on RAS (Reliability,
Availability, and Serviceability) features and described
more than 2000 properties in PSL. As a result of the
formal verification, we found several critical logic bugs
in a short time with limited resources, and successfully
verified all of them. This paper presents a study of the
functional verification methodology.
-
On the Design and Verification Methodology of the Look-Aside Interface [p. 290]
-
A. Ahmed, A. Habibi, O. Mohamed, and S. Tahar
In this paper, we present a technique to design and verify the
Look-Aside (LA-1) Interface standard used in network processors.
Our design flow includes several refinements, starting from an informal
UML specification and ending with an RTL model in Verilog.
We integrate the verification of the LA-Interface in the design
flow by considering two intermediate levels: (1) Abstract State
Machines (ASM); and (2) SystemC. The first one serves the verification
by model checking of a set of PSL properties, while the
second includes a set of assertions to be verified by simulation.
To evaluate the performance of our approach, we used the RuleBase
model checker to verify the same properties; and the OVL
library to verify the same assertions.