DATE 2001 Abstracts
Plenary -- Keynote Session
Moderator: A. Jerraya, TIMA, Grenoble, F
-
The Semiconductor Dynamic in the Information Age -- Driving New
Technologies, Trends and Markets
-
U. Schumacher, CEO, Infineon, Munich, D
Moderators: T. Kropf, Robert Bosch GmbH, D; H. Eveking, TU Darmstadt, D
-
Abstraction of Word-Level Linear Arithmetic Functions from Bit-Level Component
Descriptions [p. 4]
-
P. Dasgupta, P. Chakrabarti, A. Nandi, S. Krishna, and A. Chakrabarti
RTL descriptions for word-level arithmetic components
typically specify the architecture at the bit-level of the registers.
The problem studied in this paper is to abstract
the word-level functionality of a component from its bit-level
specification. This is particularly useful in simulation
since word-level descriptions can be simulated much faster
than bit-level descriptions. Word-level abstractions are also
useful for reducing the complexity of component matching,
since the number of words is significantly smaller than the
number of bits. This paper presents an algorithm for abstraction
of word-level linear functions from bit-level component
descriptions. We also present complexity results for
component matching that justify the advantage of performing
abstraction prior to component matching.
-
Biasing Symbolic Search by Means of Dynamic Activity Profiles [p. 9]
-
G. Cabodi, P. Camurati, and S. Quer
We address BDD based reachability analysis, which is the core technique
of symbolic sequential verification and Model Checking.
Within this framework, guided traversals that are not purely breadth-first
have proven valuable for improving efficiency by reducing the memory
consumed by the BDD representation.
We propose a guided search strategy exploiting performance statistics.
These activity figures are gathered through a continuous and dynamic
learning process on a variable-by-variable basis. This technique is
completely integrated with the reachability analysis routine, as it is fully
compatible with dynamic reordering and allows multiple partial traversal
phases. We thus move away from the static and manual schemes, which
are one of the main limitations of previous approaches.
Experiments are given to demonstrate the efficiency and robustness
of the approach.
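
For orientation, the plain breadth-first reachability fixed point that guided traversals depart from can be sketched as follows. This is a minimal illustration only: explicit Python sets stand in for BDDs, the names `reachable` and `counter_image` are hypothetical, and the activity-profile guidance described in the abstract is not modeled.

```python
from typing import Callable, Set


def reachable(initial: Set[int], image: Callable[[Set[int]], Set[int]]) -> Set[int]:
    """Least fixed point: add successors until no new state appears."""
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        # Only states not seen before form the next frontier.
        new_states = image(frontier) - reached
        reached |= new_states
        frontier = new_states
    return reached


def counter_image(states: Set[int]) -> Set[int]:
    """Hypothetical transition relation: a counter that wraps at 6."""
    return {(s + 1) % 6 for s in states}


REACHED = reachable({0}, counter_image)
```

A guided strategy would change which part of the frontier is expanded at each step; the fixed point reached is the same.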
Moderators: W. Rosenstiel, FZI/Tuebingen U, D; E. Villar, Cantabria U, ES
-
A Methodology for Interfacing Open Source SystemC with a Third Party Software
[p. 16]
-
L. Charest, M. Reid, E. Aboulhamid, and G. Bois
SystemC is a new open source library in C++ for
developing cycle-accurate or more abstract models of
software algorithms, hardware architecture and system-level
designs. SystemC is meant to be an interoperable
modeling platform allowing seamless tool integration.
Our objective is to evaluate the feasibility of linking
third-party software to SystemC without modifying the
SystemC source. We chose the development of a GUI as
such an application. This application illustrates a set of
applications following the observer pattern defined
recently in software engineering. This class of
applications can be loosely coupled to a platform
designed following specific rules of software reuse.
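
The loose coupling that the observer pattern provides can be sketched in a few lines. This is an illustration under stated assumptions, not the SystemC API: `Signal` and `TraceView` are hypothetical names, and the sketch is in Python rather than C++ for brevity.

```python
class Signal:
    """Hypothetical observable subject; not part of the SystemC API."""

    def __init__(self, name, value=0):
        self.name = name
        self._value = value
        self._observers = []

    def attach(self, observer):
        # The subject only stores a callable; it knows nothing about GUIs.
        self._observers.append(observer)

    def write(self, value):
        # Notify observers only when the value actually changes.
        if value != self._value:
            self._value = value
            for observer in self._observers:
                observer(self.name, value)


class TraceView:
    """Hypothetical observer standing in for a GUI waveform view."""

    def __init__(self):
        self.log = []

    def __call__(self, name, value):
        self.log.append((name, value))


clk = Signal("clk")
view = TraceView()
clk.attach(view)
clk.write(1)
clk.write(1)  # unchanged value, so no notification
clk.write(0)
```

The subject never imports or names the viewer class, which is the property that lets a tool be attached without modifying the platform source.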
-
Behavioral Synthesis with SystemC [p. 21]
-
G. Economakos, P. Oikonomakos, I. Panagopoulos, I. Poulakis, and G.
Papakonstantinou
Having to cope with the continuously increasing complexity
of modern digital systems, hardware designers
are considering more and more seriously language based
methodologies for parts of their designs. Last year, the introduction
of a new language for hardware descriptions, the
SystemC C++ class library, initiated a closer relationship
between software and hardware descriptions and development
tools. This paper presents a synthesis environment and
the corresponding synthesis methodology, based on traditional
compiler generation techniques, which incorporate
SystemC, VHDL and Verilog to transform existing algorithmic
software models into hardware system implementations.
Following this approach, reusability of software
components is introduced in the hardware world and time-to-market
is decreased, as shown by experimental results.
-
SystemCSV -- An Extension of SystemC for Mixed Multi-Level
Communication Modeling and Interface-Based System Design [p. 26]
-
R. Siegmund and D. Müller
An extension of SystemC for mixed multi-level communication
modeling and interface-based system design is proposed
in this paper. SystemCSV provides a new design unit,
the interface, which enables specification, design and verification
of system communication separately from system
functionality, thus introducing a new quality of system design
into SystemC. The concepts and computational model
of SystemCSV interfaces are presented together with a design
example, the digital part of a wireless SmartCard
transponder-reader/writer system.
Organizer: Y. Zorian, LogicVision, USA
Moderator: P. Prinetto, Politecnico di Torino, IT
Speakers: J. Teixeira, IST/INESC, PT; I. Teixeira, IST/INESC, PT; C. Pereira,
UFRGS, BR; O. Dias, IST/INESC, PT; J. Semiao, IST/INESC, PT; P. Muhmenthaler,
Infineon, D; Y. Zorian, LogicVision, USA; W. Radermacher, Agilent, USA
-
Test Resource Partitioning: A Design and Test Issue [p. 34]
Product development economics and specs drive
the need for on chip embedded test functionality.
However, optimal partitioning of test functionality
between a tester and a SOC is a non-trivial task,
which must be solved during the system analysis
phase. Hence, at system level, a trade-off analysis
must be performed, in order to evaluate the costs and
benefits of different partitioning schemes. The purpose
of this contribution is to present a methodology
and tools, using the Object Oriented (OO) Paradigm
and UML, and a set of architectural Quality Metrics
(QMs), to analyze the impact of different TRP
schemes on the system's architecture. A 4-core SOC case
study is presented to guide the discussion.
Organizer and Moderator: P. van Staa, Robert Bosch GmbH, D
Speaker: T. Beck, ETAS GmbH, D
-
Current Trends in the Design of Automotive Electronic Systems [p. 38]
Future developments in the automotive industry will
be governed by a variety of different requirements. Our
vision of a modern vehicle includes comprehensive
safety, a high degree of comfort, low energy
consumption, and minimal pollutant emission. These
demands can only be accomplished by employing
interconnected intelligent electronic devices, capable of
processing and sharing information about the car, the
driver, the environment, and other sources of data. The
implementation of such features will be critical for the
manufacturer's success and puts high pressure on the
development process itself and the hardware and
software tools used for every step in this process.
Moderators: G. Martin, Cadence, USA; R. Seepold, FZI, D
-
Component Selection and Matching for IP-Based Design [p. 40]
-
T. Zhang, L. Benini, and G. De Micheli
Intellectual Property (IP) reuse is one of the most promising
techniques addressing the design complexity problem. IP reuse assumes
that pre-designed components can be integrated into the design
under development, thereby reducing design complexity and
time. On the other hand, as the number of IP providers increases,
the selection of the best IP block for a given design becomes more
challenging and time-consuming. In this paper, we present an
IP component matching system targeting automatic component
searching and matching across the Internet. The system is based
on Extensible Markup Language (XML) specification both for IP
libraries (a repository of pre-designed IP components indexed by
their corresponding specifications) and IP user queries (specifications
with incomplete/uncertain attributes). An IP query is
parsed into a document object model (DOM) and the DOM is
transformed to an internal tree-structured model. Fuzzy logic
scoring and aggregation algorithms are applied to the internal
tree structure to provide a set of candidate approximate matches
ranked by proximity between the query and IP specification.
-
A Universal Communication Model for an Automotive System Integration Platform [p. 47]
-
T. Demmeler and P. Giusto
In this paper, we present a virtual integration platform
based design methodology for distributed automotive systems.
The platform, built within the 'Virtual Component
Co-Design' tool (VCC), provides the ability to distribute
a given system functionality over an architecture so as
to validate different solutions in terms of cost, safety requirements,
and real-time constraints. The virtual platform
constitutes the foundation for design decisions early
in the development phase, therefore enabling decisive and
competitive advantages in the development process. This
paper focuses on one of the key enablers of the methodology,
the Universal Communication Model (UCM). The
UCM is defined at a level of abstraction that allows accurate
estimates of the performance including the latencies
over the bus network, and good simulation performance.
In addition, due to the high level of reusability and parameterization
of its components, it can be used as a
framework for modeling the different communication protocols
common in the automotive domain.
-
An Efficient Architecture Model for Systematic Design of Application-Specific
Multiprocessor SoC [p. 55]
-
A. Baghdadi, D. Lyonnard, N. Zergainoh, and A. Jerraya
In this paper, we present a novel approach for the
design of application specific multiprocessor systems-on-chip.
Our approach is based on a generic architecture
model which is used as a template throughout the design
process. The key characteristics of this model are its great
modularity, flexibility and scalability which make it
reusable for a large class of applications. In addition, it
accelerates the design cycle. This paper focuses on
the definition of the architecture model and the systematic
design flow that can be automated. The feasibility and
effectiveness of this approach are illustrated by two
significant demonstration examples.
Moderators: N. Fristacky, Slovak TU, SLK; F. Rammig, C-LAB/Paderborn U, D
-
The Simulation Semantics of SystemC [p. 64]
-
J. Ruf, D. Hoffmann, J. Gerlach, T. Kropf, W. Rosenstiel, and W. Mueller
We present a rigorous but transparent semantics definition
of SystemC that covers method, thread, and clocked
thread behavior as well as their interaction with the simulation
kernel process. The semantics includes watching
statements, signal assignment, and wait statements as they
are introduced in SystemC V1.0. We present our definition
in the form of distributed Abstract State Machine (ASM) rules
reflecting the view given in the SystemC User's Manual and
the reference implementation. We mainly see our formal
semantics as a concise, unambiguous, high-level specification
for SystemC-based implementations and for standardization.
Additionally, it can be used as a sound basis to investigate
SystemC interoperability with Verilog and VHDL.
-
MetaRTL: Raising the Abstraction Level of RTL Design [p. 71]
-
J. Zhu
The register transfer abstraction (RTL) has been established
as the industrial standard for ASIC design, soft IP exchange
and the backend interface for chip design at higher
level. Unfortunately, the "synthesizable" VHDL/Verilog incarnation
of the RTL abstraction has problems which prevent
it from more productive use. Examples include the confusion
that results from using simulation semantics for synthesis
purposes, the lack of facilities for component reuse at the
"protocol" level, and the lack of memory abstraction. After
a detailed discussion of these problems, this paper proposes
a new RTL abstraction, called MetaRTL, which can
be implemented by a modest extension to the traditional imperative
programming languages. The productivity gain is
further demonstrated by the description of a synthesis tool,
called MetaSyn, which provides the "added-value". Experiments
on the benchmark set show that MetaRTL is far more
concise than the "synthesizable" HDL specification, and incurs
no overhead in the synthesis results.
-
A Model for Describing Communication between Aggregate Objects in the
Specification and Design of Embedded Systems [p. 77]
-
K. Svarstad, G. Nicolescu, and A. Jerraya
The elevation of design description abstractions is a well
accepted technique for handling the complexity and shortening
the design time of modern embedded systems. It is
shown that abstractions for communication are as important
as for behaviour for specification and system level abstractions,
and an extension on a novel higher level communication
mechanism which has features for supporting
the description of complex aggregate associations between
objects in specifications such as UML is investigated. The
communication primitives have been implemented as extensions
to SystemC, and a comprehensive example from a UML
specification through functional specification down to an
executable SystemC description is included.
Moderators: P. Harrod, ARM, UK; B. Becker, Freiburg U, D
-
Circuit Partitioning for Efficient Logic BIST Synthesis [p. 86]
-
A. Irion, G. Kiefer, H. Vranken, and H. Wunderlich
A divide-and-conquer approach using circuit
partitioning is presented, which can be used to
accelerate logic BIST synthesis procedures. Many
BIST synthesis algorithms contain steps with a time
complexity which increases more than linearly with the
circuit size. By extracting sub-circuits which are
almost constant in size, BIST synthesis for very large
designs may be possible within linear time. The
partitioning approach does not require any physical
modifications of the circuit under test. Experiments
show that significant performance improvements can
be obtained at the cost of a longer test application time
or a slight increase in silicon area for the BIST
hardware.
Keywords: circuit partitioning, deterministic BIST,
divide-and-conquer
-
Deterministic Software-Based Self-Testing of Embedded Processor Cores [p. 92]
-
A. Paschalis, D. Gizopoulos, N. Kranitis, M. Psarakis, and Y. Zorian
A deterministic software-based self-testing methodology
for processor cores is introduced that efficiently tests the
processor datapath modules without any modification of
the processor structure. It provides a guaranteed high
fault coverage without the repetitive fault simulation
experiments that are necessary in pseudorandom
software-based processor self-testing approaches. Test
generation and output analysis are performed by utilizing
the processor functional modules like accumulators
(arithmetic part of ALU) and shifters (if they exist)
through processor instructions. No extra hardware is
required and there is no performance degradation.
-
Memory Fault Diagnosis by Syndrome Compression [p. 97]
-
J. Li and C. Wu
In this paper we present a data compression technique
that can be used to speed up the transmission of diagnosis
data from the embedded RAM with built-in self-diagnosis
(BISD) support. The proposed approach compresses the
faulty-cell address and March syndrome to about 28% of
the original size under the March-17N diagnostic test algorithm.
The key component of the compressor is a novel
syndrome-accumulation circuit, which can be realized by
a content-addressable memory. Experimental results show
that the area overhead is about 0.9% for a 1Mb SRAM with
164 faults. The proposed compression technique reduces
the time for diagnostic test, as well as the tester storage capacity
requirement.
-
Diagnosis for Scan-Based BIST: Reaching Deep into the Signatures [p. 102]
-
I. Bayraktaroglu and A. Orailoglu
For partitioning-based diagnosis in a scan-based BIST
environment, an exact analysis scheme, capable of identifying
all scan cells that receive incorrect data, is proposed.
In contrast to previously suggested approaches, the scheme
we propose identifies all failing scan cells with no ambiguity
whatsoever. Not only do we resolve failing scan cells
unambiguously, but we do so at the earliest possible instance
through reexamination of already computed signatures.
Intensive utilization of this highly precise diagnostic
state information leads to prognostic information regarding
the usefulness of running upcoming tests which in turn leads
to reductions in diagnosis time in excess of 30% compared
to previous approaches.
Organizer: P. van Staa, Robert Bosch GmbH, D
Moderator: S. Reiniger, DaimlerChrysler, D
-
Vehicle Electric/Electronic Architecture -- One of the Most Important
Challenges for OEM's [p. 112]
-
G. Hettich and T. Thurner
One of the most important challenges for a vehicle
manufacturer is the management of the increasing number
of networked E/E-Systems and their complex functional
dependencies.
To master this challenge, sophisticated E/E-architecture
approaches will be presented which cover
both the vertical functional orientation and the
horizontal integration aspects of a vehicle manufacturer.
Therefore we will present architectures and methods to
support the development of future
E/E-Systems, whereby the typical requirements of a
vehicle system integrator will be considered, such as
composability, hardware and software independence,
network-wide distribution of software components, and
the ability for separation between indication, operation
and behavior.
The paper describes the motivation, the system
integration requirements, actual existing solutions, future
technical challenges, and some detailed architecture
approaches themselves. Furthermore, the impact of the
architecture on the development process and the OEM-supplier
relationship will be highlighted.
-
AIL: description of a global electronic architecture at the vehicle scale
-
Arjun Panday, Damien Couderc, Simon Marichalar
This paper introduces the Architecture
Implementation Language; a description language
that allows for an internal representation of the
architecture and acts as a connection with tools to
simplify the construction, planning, verification,
capitalisation, and documentation of an architecture.
The objective of AIL is to describe a vehicle
architecture from the level of the desired services
down to the level of physical implementation,
rendered concrete in one or more resulting
operational architectures. The proposed methodology
introduces the concepts of high level component
based architectures to the highly constrained
automotive world.
-
Methods and Tools for Systems Engineering of Automotive Electronic
Architectures
-
Jakob Axelsson
The latest generations of road vehicles have seen a tremendous
development in on-board electronic systems, which
control increasingly large parts of the functionality. In this
paper, we discuss how the vehicle manufacturers need to
adjust their methods and tools to handle the increasing
complexity. The key issue is the system integration aspect,
which calls for increasing systems engineering capabilities.
Moderators: W. Damm, Oldenburg U/OFFIS, D; C. Delgado Kloos, U Carlos III de
Madrid, ES
-
Using SAT for Combinational Equivalence Checking [p. 114]
-
E. Goldberg, M. Prasad, and R. Brayton
This paper addresses the problem of combinational
equivalence checking (CEC) which forms one of the key
components of the current verification methodology for digital
systems. A number of recently proposed BDD based
approaches have met with considerable success in this area.
However, the growing gap between the capability of current
solvers and the complexity of verification instances necessitates
the exploration of alternative, better solutions. This
paper revisits the application of Satisfiability (SAT) algorithms
to the combinational equivalence checking (CEC)
problem. We argue that SAT is a more robust and flexible
engine of Boolean reasoning for the CEC application
than BDDs, which have traditionally been the method of
choice. Preliminary results on a simple framework for SAT
based CEC show a speedup of up to two orders of magnitude
compared to state-of-the-art SAT based methods for
CEC and also demonstrate that even with this simple algorithm
and untuned prototype implementation it is only moderately
slower and sometimes faster than a state-of-the-art
BDD based mixed engine commercial CEC tool. While SAT
based CEC methods need further research and tuning before
they can surpass almost a decade of research in BDD
based CEC, the recent progress is very promising and merits
continued research.
-
Combinational Equivalence Checking Using Boolean Satisfiability and Binary
Decision Diagrams [p. 122]
-
S. Reda and A. Salem
Most recent combinational equivalence checking
techniques are based on exploiting circuit similarity. In
this paper, we focus on circuits with no internal
equivalent nodes or after internal equivalent nodes have
been identified and merged. We present a new technique
integrating Boolean Satisfiability and Binary Decision
Diagrams. The proposed approach is capable of solving
verification instances that neither technique alone was
capable of solving. The efficiency of the proposed approach
is shown through its application to hard-to-prove
industrial circuits and the ISCAS'85 benchmark circuits.
-
An Efficient Learning Procedure for Multiple Implication Checks [p. 127]
-
Y. Novikov and E. Goldberg
In the paper, we consider the problem of checking
whether cubes from a set S are implicants of a DNF
formula D, at the same time minimizing the overall time
taken by the checks. An obvious but inefficient way of
solving the problem is to perform all the checks
independently. In the paper, we consider a different
approach. The key idea is that when checking whether a
cube C from S is an implicant of D we can deduce (learn)
implicants of D that are not implicants of C. These cubes
can be used in the following checks for search pruning.
Experiments on random DNF formulas, DIMACS
benchmarks and DNF formulas describing circuits show
that the proposed learning procedure reduces the overall
time taken by checks by up to two orders of magnitude.
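
The independent check that the abstract calls obvious but inefficient can be sketched as follows: a cube C is an implicant of D exactly when D, restricted by the assignments in C, is a tautology. This is an illustrative baseline only. The representation (a cube as a dict from variable name to Boolean value) and the function names are assumptions, the tautology test is brute force, and the learning procedure itself is not shown.

```python
from itertools import product


def restrict(dnf, cube):
    """Cofactor the DNF with respect to the assignments in cube."""
    result = []
    for term in dnf:
        if any(var in cube and cube[var] != val for var, val in term.items()):
            continue  # the term conflicts with the cube and drops out
        result.append({v: b for v, b in term.items() if v not in cube})
    return result


def is_tautology(dnf):
    """Brute-force check that some term is satisfied at every point."""
    variables = sorted({v for term in dnf for v in term})
    for values in product([False, True], repeat=len(variables)):
        point = dict(zip(variables, values))
        if not any(all(point[v] == b for v, b in term.items()) for term in dnf):
            return False
    return True


def is_implicant(cube, dnf):
    return is_tautology(restrict(dnf, cube))


# D = a.b + a.(not b) + (not a).c
D = [{"a": True, "b": True}, {"a": True, "b": False}, {"a": False, "c": True}]
RESULTS = [is_implicant(c, D) for c in ({"a": True}, {"c": True}, {"b": True})]
```

Each call repeats the full search from scratch; reusing information across the checks is the part this sketch leaves out.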
Organizers: D. Gajski, UC Irvine, USA; E. Villar, Cantabria U, ES
Moderator: E. Villar, Cantabria U, ES
Panellists: W. Rosenstiel, FZI/Tuebingen U, D; V. Gerousis, Infineon, D; D. Barton, Averstar, USA; J. Plantin, Ericsson, SE; P. Cavalloro, Italtel, IT; D.
Gajski, UC Irvine, USA; G. de Jong, Telelogic, B
-
C/C++: Progress or Deadlock in System-Level Specification [p. 136]
The lack of a general methodology and notation has
been identified as one of the main obstacles bedeviling
system-on-chip designers. Nevertheless, there is a lot of
confusion about what SLD (System Level Design) means
and which SLDL (System Level Design Language) is the
most appropriate.
Driven by SOC demands, there has recently been high
interest in system-level design, particularly HW/SW co-design.
In order to accommodate SW, the system
companies as well as EDA vendors would like to use C
as the language for system-level design. Many people
are experimenting with a subset of C and others with C++ by
introducing classes that correspond to HW
(VHDL/Verilog) concepts. C/C++ syntax has become the
most popular for defining new C/C++ language
extensions for system-level specification and design. A
wide community of system designers and EDA suppliers
believe that C/C++ is the most appropriate vehicle to use
as a next-generation language. However, there are many
challenges and open problems.
Moderators: P. Muhmenthaler, Infineon Technologies, D; E.J. Marinissen, Philips Research, NL
-
An Integrated System-On-Chip Test Framework [p. 138]
-
E. Larsson and Z. Peng
In this paper we propose a framework for the testing of
system-on-chip (SOC), which includes a set of design
algorithms to deal with test scheduling, test access
mechanism design, test sets selection, test parallelization,
and test resource placement. The approach minimizes the
test application time and the cost of the test access
mechanism while considering constraints on tests, power
consumption and test resources. The main feature of our
approach is that it provides an integrated design
environment to treat several different tasks at the same time,
which were traditionally dealt with as separate problems.
Experimental results show the efficiency and the usefulness
of the proposed technique.
-
Efficient Test Data Compression and Decompression for System-on-a-Chip Using
Internal Scan Chains and Golomb Coding [p. 145]
-
A. Chandra and K. Chakrabarty
We present a data compression method and decompression
architecture for testing embedded cores in a system-on-a-chip
(SOC). The proposed approach makes effective use
of Golomb coding and the internal scan chains of the core
under test, and provides significantly better results than a
recent compression method that uses Golomb coding and a
separate cyclical scan register (CSR). The use of the internal
scan chain for decompression obviates the need for a
CSR. In addition, the novel interleaving decompression architecture
allows multiple cores in an SOC to be tested concurrently
using a single ATE I/O channel. We demonstrate
the effectiveness of the proposed approach by applying it to
the ISCAS 89 benchmark circuits.
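
As background, Golomb coding of run lengths can be sketched as below. This is a minimal illustration, assuming the group size `m` is a power of two so the tail has a fixed width; the function names are hypothetical, and the on-chip decompression architecture described in the abstract is not modeled.

```python
def golomb_encode_run(run_length: int, m: int) -> str:
    """Encode one run length as a unary prefix plus a fixed-width tail."""
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two >= 2 in this sketch")
    tail_bits = m.bit_length() - 1
    quotient, remainder = divmod(run_length, m)
    # Prefix: 'quotient' ones closed by a zero. Tail: remainder in binary.
    return "1" * quotient + "0" + format(remainder, f"0{tail_bits}b")


def golomb_decode(code: str, m: int) -> list:
    """Invert golomb_encode_run over a concatenation of codewords."""
    tail_bits = m.bit_length() - 1
    runs, i = [], 0
    while i < len(code):
        quotient = 0
        while code[i] == "1":
            quotient += 1
            i += 1
        i += 1  # skip the zero that ends the prefix
        remainder = int(code[i:i + tail_bits], 2)
        i += tail_bits
        runs.append(quotient * m + remainder)
    return runs


RUNS = [0, 5, 9]
ENCODED = "".join(golomb_encode_run(r, 4) for r in RUNS)
```

Short runs get short codewords, so test data dominated by long runs of identical bits compresses well when `m` is chosen to suit the typical run length.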
-
Testing TAPed Cores and Wrapped Cores with the Same Test Access Mechanism
[p. 150]
-
M. Benabdenbi, W. Maroufi, and M. Marzouki
This paper describes a way of testing both wrapped cores
and TAPed cores within a System On a Chip (SoC) with the
same Test Access Mechanism (TAM). The TAM's architecture,
which is dynamically reconfigurable, scalable and flexible,
is named CAS-BUS and has a central controller. All
the cores can be tested this way in the same session through
a modified Boundary Scan Test Access Port.
-
On Applying the Set Covering Model to Reseeding [p. 156]
-
S. Chiusano, S. Di Carlo, P. Prinetto, and H. Wunderlich
The Functional BIST approach is a rather new BIST
technique based on exploiting embedded system
functionality to generate deterministic test patterns during
BIST. The approach takes advantage of two well-known
testing techniques, the arithmetic BIST approach and the
reseeding method.
The main contribution of the present paper consists in
formulating the problem of an optimal reseeding
computation as an instance of the set covering problem.
The proposed approach guarantees high flexibility, is
applicable to different functional modules, and, in general,
provides a more efficient test set encoding than previous
techniques. In addition, the approach shortens the
computation time, allows better exploitation of the trade-off
between area overhead and global test length, and
can deal with larger circuits.
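
To make the set covering formulation concrete, the sketch below selects seeds so that every fault is detected by at least one chosen seed. It uses the standard greedy heuristic, which is an assumption made here for illustration and is not guaranteed to be optimal; the abstract does not state which solver the authors use. The fault and seed names are hypothetical.

```python
def greedy_set_cover(universe, candidates):
    """Repeatedly pick the candidate that covers the most uncovered items."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("some items are not covered by any candidate")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical data: which faults each candidate seed detects.
FAULTS = {"f1", "f2", "f3", "f4", "f5"}
SEEDS = {
    "s1": {"f1", "f2", "f3"},
    "s2": {"f3", "f4"},
    "s3": {"f4", "f5"},
    "s4": {"f1", "f5"},
}
CHOSEN = greedy_set_cover(FAULTS, SEEDS)
```

Fewer selected seeds mean less seed storage, which is where the trade-off between area overhead and test length enters.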
Organizer: P. van Staa, Robert Bosch GmbH, D
Moderator: H. Heidbrink, Descon GmbH, D
Panellists: B. Potock, Mentor Graphics Corp, USA; J. Mueller, Rosemann
& Lauridsen GmbH, D; U. Ahle, Siemens Business Services, D; C. Basille,
Aerospatiale Matra Missiles, F; W. Kisselmann, Infineon Technologies, D;
W. Herden, Robert Bosch GmbH, D
-
Data Management -- Limiter or Accelerator for Electronic Design Creativity
[p. 162]
Data Management is the key to introducing concurrent
engineering, configuration management and work in
progress control throughout the entire design process.
This was recognized by MCAD and ERP/MRP
software vendors years ago. Product Data Management
(PDM) solutions are used and accepted for mechanical
designs but not in electronic design departments.
The EDA industry has not been focusing on strategies
to fill the gap between business processes and design
activities. As a result, variant handling and configuration
management are today mostly handled by proprietary
processes at the directory and file level. Standard database
management solutions and Product Data Management
applications have not reached major market share so
far.
Moderators: H. Gräb, TU Munich, D; J. Eckmüller, Infineon
Technologies, D
-
Efficient Bit-Error-Rate Estimation of Multicarrier Transceivers [p. 164]
-
G. Vandersteen, P. Wambacq, Y. Rolain, J. Schoukens, S. Donnay,
M. Engels, I. Bolsens
Multicarrier modulation schemes are widely used in several
digital telecommunication systems, such as Asymmetric
Digital Subscriber Lines (ADSL) and Wireless Local Area
Networks (WLAN) based on Orthogonal Frequency Division
Multiplexing (OFDM). An estimate of the Bit-Error-Rate
(BER) degradation due to non-idealities in the transceiver
(e.g. nonlinear distortions in the analog front-ends, digital
clipping,...) is much more complicated in a multicarrier
system than in a single-carrier system due to the large
number of carriers and the huge number of possible
transmitted symbols. This paper proposes a method for
estimating the BER of such OFDM modulation schemes in
a CPU time that is two orders of magnitude smaller than a
Monte-Carlo method, as confirmed by simulations on a 5
GHz IEEE 802.11 WLAN receiver front-end.
-
Efficient Time-Domain Simulation of Telecom Frontends Using a Complex Damped
Exponential Signal Model [p. 169]
-
P. Vanassche, G. Gielen, and W. Sansen
This paper presents an efficient time-domain simulation
approach for telecommunication frontends at architectural
level. It is based upon the use of complex damped exponential
modeling functions. These allow the construction of accurate
signal models for digitally modulated telecom signals, requiring
only a few modeling functions. Since these models
are valid over a long range of time, they allow for a large
timestep, which greatly speeds up time-domain simulation
of the telecom frontends. Details of a simulation approach
based upon this signal model are discussed. The approach
is verified by experimental results.
-
Simulation Method to Extract Characteristics for Digital Wireless Communication Systems [p. 176]
-
L. Nguyen and V. Janicot
In all wireless standards involving digital
modulation, new fundamental characteristics have to be
extracted for quantifying the linearity/distortion in RF
designs. This paper describes a simulation technique,
Modulated Steady State, and its use to extract these
specifications. An example of its application to a typical
RF transmitter with a π/4-DQPSK modulator is
presented.
Moderators: G. Stamoulis, Intel, USA; K. Roy, Purdue U, USA
-
Microprocessor Power Analysis by Labeled Simulation [p. 182]
-
C. Hsieh, L. Chen, and M. Pedram
In many applications, it is important to know how power is
consumed while software is being executed on the target
processor. Instruction-level power microanalysis, which is
a cycle-accurate simulation technique based on instruction
label generation and propagation, is aimed at answering
this question for a superscalar and pipelined processor.
This technique requires the micro-architectural details of
the CPU and provides the power consumption of every
module (or gate) for each active instruction in each cycle.
To validate this approach, a Zilog digital signal processor
core was designed by using a 0.25 µm TSMC cell library, and
the power consumption per instruction was collected using
a Verilog simulator specially written for the DSP core.
-
Power Aware Microarchitecture Resource Scaling [p. 190]
-
A. Iyer and D. Marculescu
In this paper we present a strategy for run-time profiling to optimize
the configuration of a microprocessor dynamically so as to
save power with minimum performance penalty. The configuration
of the processor changes according to the parallelism in the running
program. Experiments on some benchmark programs show
good savings in total energy consumption; we have observed a decrease
of up to 23% in energy/cycle and up to 8% in energy per
instruction. Our proposed approach can be used for energy-aware
computing in either portable applications or in desktop environments
where power density is becoming a concern. This approach
can also be incorporated in larger power management strategies
like ACPI.
-
Extending Lifetime of Portable Systems by Battery Scheduling [p. 197]
-
L. Benini, G. Castelli, A. Macii, E. Macii, M. Poncino, and R. Scarsi
Multi-battery power supplies are becoming popular in electronic
appliances of the latest generations, due to economical and
manufacturing constraints. Unfortunately, a partitioned battery
subsystem is not able to deliver the same amount of charge as
a monolithic battery with the same total capacity. In this
paper, we define the concept of battery scheduling, we investigate
policies for solving the problem
of optimal charge delivery, and we study the relationship
of such policies with different configurations of the battery
subsystem. Results, obtained for different workloads,
demonstrate that the choice of the proper scheduling can
bring system lifetime, in the best case, to within 1% of
that guaranteed by a monolithic battery of equal capacity.
Moderators: R. Galivanche, Intel, USA; B. Straube, FhG IIS/EAS Dresden, D
-
Efficient Spectral Techniques for Sequential ATPG [p. 204]
-
A. Giani, S. Sheng, M. Hsiao, and V. Agrawal
We present a new test generation procedure for sequential
circuits using spectral techniques. Iterative processes of
filtering via compaction and spectral analysis of
the filtered test set are performed for each primary input,
extracting inherent spectral information embedded within
the test sequence. This information, when viewed in the
frequency domain, reveals the characteristics of the input
spectrum. The filtered and analyzed set of vectors is then
used to predict and generate future vectors. We also developed
a fault-dropping technique to speed up the process.
We show that very high fault coverages and small vector
sets are consistently obtained in short execution times for
sequential benchmark circuits.
-
On the Test of Microprocessor IP Cores [p. 209]
-
F. Corno, M. Sonza Reorda, S. Squillero, and M. Violante
Testing is a crucial issue in the SOC development and
production process. A popular solution for SOCs that
include microprocessor cores is based on making them
execute a test program, thus implementing a very
attractive BIST solution. This paper describes a method
for the generation of effective programs for the self-test of
a processor. The method can be partially automated, and
combines ideas from traditional functional approaches
and from the ATPG field. We assess the feasibility and
effectiveness of the method by applying it to an 8051 core.
-
Sequence Reordering to Improve the Levels of Compaction Achievable by Static
Compaction Procedures [p. 214]
-
I. Pomeranz and S. Reddy
We describe a reordering procedure that changes the order of test
vectors in a test sequence for a synchronous sequential circuit
without reducing the fault coverage. We use this procedure to
investigate the effects of reordering on the ability to compact the
test sequence. Reordering is shown to have two effects on compaction.
(1) The reordering process itself allows us to reduce the
test sequence length. (2) Reordering can improve the effectiveness
of an existing static compaction procedure. Reordering also
provides an insight into the detection by test generation procedures
of faults that are detected by relatively long subsequences.
-
SEU Effect Analysis in an Open-Source Router via a Distributed Fault Injection
Environment [p. 219]
-
A. Benso, S. Di Carlo, G. Di Natale, and P. Prinetto
The paper presents a detailed error analysis and
classification of the behavior of an open-source router,
when affected by Single Event Upsets (SEUs). The
experimental results have been gathered on a real
communication network, resorting to an ad-hoc Fault
Injection system. The injector has been designed to
corrupt the router during its normal service and to analyze
the SEU injection effects on the overall distributed system.
The performed experiments allowed the authors to
identify the most critical memory regions and to cluster the
router variables according to their impact on system
dependability.
Organizer: A. Lock, Synopsys, USA
Moderator: R. Camposano, Synopsys, USA
Panellists: R. Camposano, Synopsys, USA; A. Cuomo, STMicroelectronics, IT;
R. Subramanian, MorphICs, USA; H. Meyr, TU Aachen, D
-
The Programmable Platform: Does One Size Fit All? [p. 226]
This special panel session brings together several
leading technologists representing organisations within
the telecom and system-on-chip design communities.
The panel will discuss the trend in platform-based
design, where new products are increasingly based on
re-programmability or re-configuration of more
general-purpose devices. Particular emphasis will be
placed on the need to meet the requirements of the
Telecom market, where flexibility is a key concern, but
with the shift towards third-generation wireless
systems, so too is performance.
Moderators: F. Johannes, TU Munich, D; R. Otten, TU Delft, NL
-
Slicing Tree is a Complete Floorplan Representation [p. 228]
-
M. Lai and D. Wong
Slicing tree has been an effective tool for VLSI floorplan design.
Floorplanners using slicing tree representation take
full advantage of shape and orientation flexibility of circuit
modules to find highly compact slicing floorplans. However,
slicing floorplans are commonly believed to suffer from poor
utilization of space when all modules are hard. For this reason,
a large body of literature has recently been devoted to
various new representations of non-slicing floorplans to improve
space utilization. In this paper, we prove that by using
slicing tree representation and compaction, all maximally
compact placements of modules can be generated. In conclusion,
slicing tree is a complete floorplan representation
for all non-slicing floorplans as well.
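As an illustration of the slicing tree representation discussed in this abstract (not the authors' completeness proof or their compaction procedure), the following minimal sketch evaluates the bounding box of a slicing floorplan of hard modules given as a postfix expression. The operator symbols 'H' and 'V', the function name, and the module sizes are assumptions made for the example.

```python
def slicing_bbox(postfix, sizes):
    """Return (width, height) of the floorplan described by a postfix slicing expression.

    postfix: list of tokens; module names are operands, 'V' places the two
             sub-floorplans side by side, 'H' stacks them vertically.
    sizes:   dict mapping module name -> (width, height) of a hard module.
    """
    stack = []
    for tok in postfix:
        if tok in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if tok == "V":   # vertical cut: widths add, height is the max
                stack.append((w1 + w2, max(h1, h2)))
            else:            # horizontal cut: heights add, width is the max
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(sizes[tok])
    if len(stack) != 1:
        raise ValueError("malformed slicing expression")
    return stack[0]

# Hypothetical hard modules (width, height), arbitrary units
sizes = {"a": (2, 3), "b": (1, 3), "c": (3, 1)}
w, h = slicing_bbox(["a", "b", "V", "c", "H"], sizes)
print(w, h, w * h)
```

Comparing the bounding-box area with the sum of the module areas gives the dead space of a candidate floorplan, which is the quantity the space-utilization argument in the abstract is concerned with.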
-
Further Improve Circuit Partitioning Using GBAW Logic Perturbation Techniques
[p. 233]
-
C. Cheung, Y. Wu, and D. Cheng
Efficient circuit partitioning is gaining more importance
with the increasing size of modern circuits. Conventionally,
circuit partitioning is solved by modeling a circuit as a hypergraph
for the ease of applying graph algorithms. However,
there is room for further improvement on even optimum
hypergraph partitioning results, if logic information
can be applied for perturbation. In this paper, we present a
multi-way partitioning framework which can couple any excellent
hypergraph partitioner with a novel logic perturbation
based (GBAW) technique for further improvement over
already excellent partitioning results. Our approach can integrate
with any graph partitioner. We performed experiments
on 2-, 3-, 4-, and 5-way partitionings for various circuits of
different sizes from MCNC benchmarks. We have chosen
the state-of-the-art hMetis-Kway to obtain high quality initial
solutions for the experiments. Our experiments showed
that this partitioning approach can achieve a further 15%
reduction in cut size for 2-way partitioning with an area
penalty of only 0.33%. The good results demonstrated the
effectiveness of this new partitioning technique.
-
Clustering Based Fast Clock Scheduling for Light Clock-Tree [p. 240]
-
M. Saitoh, M. Azuma, and A. Takahashi
We introduce a clock schedule algorithm to obtain a
clock schedule that achieves a shorter clock period and that
can be realized by a light clock tree. A shorter clock period
can be achieved by controlling the clock input timing
of each register, but the required wire length and power
consumption of a clock tree tends to be large if clock input
timings are determined without considering the locations
of registers. To overcome the drawback, our algorithm
constructs a cluster that consists of registers with the same
clock input timing located in a close area. In our algorithm,
first registers are partitioned into clusters by their
locations, and clusters are modified to improve the clock
period while keeping the radius of each cluster small.
In our experiments on industrial data with 888 registers,
the clock period achieved is 27% shorter than that achieved
by a zero-skew clock tree, and 1% longer than the theoretical
minimum. The computational time is about 24.9 seconds,
and the wire length and power consumption of the clock tree
are comparable to those of a zero-skew tree.
Moderators: N. Wehn, Kaiserslautern U, D; M. Bolle, Systemonic, D
-
Power-Efficient Layered Turbo Decoder Processor [p. 246]
-
J. Dielissen, J. van Meerbergen, M. Bekooij, F. Harmsze,
S. Sawitzki, J. Huisken, and A. van der Werf
Turbo decoding offers outstanding error correcting
capabilities that will be used in wireless applications
like the Universal Mobile Telecom Standard[4]
(UMTS). However, the algorithm is very computationally
intensive, and therefore an implementation on
a general purpose programmable DSP results in a
power consumption which reduces the applicability
of turbo decoding in hand-held applications. In
this paper we present a solution based on a layered
processing architecture. This architecture includes
an application specific Very Long Instruction Word
(VLIW) processor, a data flow processor, and hardwired
execution units in a hierarchical way. The
power consumption of this solution is an order of
magnitude better than the implementation on a current
state of the art, power efficient general purpose
DSP.
-
Exploiting Data Forwarding to Reduce the Power Budget of VLIW Embedded Processors [p. 252]
-
M. Sami, D. Sciuto, C. Silvano, V. Zaccaria, and R. Zafalon
In this paper, a low-power approach to the design
of embedded VLIW processor architectures is proposed.
To resolve most data hazards in the pipeline,
processors use forwarding (or bypassing) hardware to
provide the required operands from the inter-stage pipeline
registers directly to the inputs of the function units. The
operands are then stored in the Register File during the
write-back pipeline stage. In this paper, we propose a power
optimization technique based on the exploitation of the
forwarding paths in the processor to avoid the power cost
of writing/reading short-lived variables to/from the Register
File. In application-specific embedded systems, experimental
evidence has shown that a significant number of variables
are short-lived, that is, their liveness (from first definition to
last use) spans only a few instructions. Values of short-lived
variables can be accessed directly through the forwarding
registers, avoiding write-back. An application example of
our solution to a VLIW embedded core, when accessing the
Register File, has shown a power saving up to 35% with
respect to the unoptimized approach on the given set of
target benchmarks. The performance overhead is equal to
one-gate delay to be added on the processor critical-path.
Keywords: Low-Power, Pipeline Processors, VLIW
Embedded Architectures, Forwarding.
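The notion of a short-lived variable used in this abstract (liveness measured from first definition to last use) can be made concrete with a minimal sketch. It is not the authors' analysis: the instruction representation, the function name, and the span threshold are assumptions, and the code handles only straight-line code in which each variable is defined once.

```python
def short_lived(instrs, max_span):
    """Return variables whose liveness spans at most `max_span` instructions.

    instrs: list of (dest, sources) tuples in program order; each variable
            is assumed to be defined once (straight-line, SSA-like code).
    """
    first_def, last_use = {}, {}
    for idx, (dest, sources) in enumerate(instrs):
        for s in sources:
            if s in first_def:          # ignore inputs defined outside the block
                last_use[s] = idx
        first_def.setdefault(dest, idx)
    return sorted(v for v, d in first_def.items()
                  if v in last_use and last_use[v] - d <= max_span)

# Hypothetical instruction sequence
program = [
    ("t1", ["a", "b"]),   # 0: t1 = a + b
    ("t2", ["t1", "c"]),  # 1: t2 = t1 * c
    ("x",  ["t2", "a"]),  # 2: x  = t2 - a
    ("y",  ["x", "t1"]),  # 3: y  = x + t1
    ("z",  ["x", "y"]),   # 4: z  = x + y
]
print(short_lived(program, max_span=1))
```

Variables reported by such an analysis are the candidates whose values could be taken from the forwarding path instead of being written back to the Register File.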
-
Design of Low-Power High-Speed Maximum a Priori Decoder Architectures [p. 258]
-
A. Worm, H. Lamm, and N. Wehn
Future applications demand high-speed maximum a posteriori
(MAP) decoders. In this paper, we present an in-depth
study of design alternatives for high-speed MAP architectures
with special emphasis on low power consumption.
We exploit the inherent parallelism of the MAP algorithm
to reduce power consumption on various abstraction
levels. A fully parameterizable architecture is introduced,
which allows the architecture to be optimally adapted to the application
requirements and the throughput. Intensive design
space exploration has been carried out on a state-of-the-art
0.2 µm technology, including efficient parallelism
techniques, a data flow transformation for reduced power
consumption, and an optimized FIFO implementation.
Moderators: E. Macii, Politecnico di Torino, IT; D. Marculescu, Carnegie
Mellon U, USA
-
Low Complexity FIR Filters Using Factorization of Perturbed Coefficients
[p. 268]
-
C. Neau, K. Muhammad, and K. Roy
This paper presents a factorization based technique to
reduce the computational complexity of implementing
Finite Impulse Response (FIR) digital filters. It is possible
to design FIR filters in which all of the filter coefficients
are products of the first seven prime numbers. For such
filters, factorization of the filter coefficients allows the
reuse of intermediate results among computations
involving common factors. Since the coefficients are
products of only small prime numbers, it is also possible to
generate each of the partial products with a single shift
and add operation. Compared to a traditional
implementation, this results in a 35-50% reduction in
computational complexity, which is shown to translate into
lower power consumption.
-
An Adaptive Algorithm for Low-Power Streaming Multimedia Processing [p. 273]
-
A. Acquaviva, L. Benini, and B. Riccó
This paper addresses the problem of power consumption
in multimedia system architectures and presents an algorithmic
optimization technique to achieve the goal of power
reduction in the context of real time processing. The technique
is based on a mixed speed-setting and shutdown policy.
We address the problem from both a theoretical and
practical point of view, by presenting a power efficient implementation
of an MPEG-layer3 real-time decoder algorithm
designed for wearable devices as a case study. The
target system is Hewlett-Packard's SmartBadgeIII prototype
wearable system, based on the StrongARM1100
processor. Theoretical analysis as well as quantitative results
of power measurements are provided to show the effectiveness
of this technique. The experimental set-up is also
described.
-
A Static Power Estimation Methodology for IP-Based Design [p. 280]
-
X. Liu and C. Papaefthymiou
This paper proposes a novel system-level power estimation
methodology for electronic designs consisting of intellectual
property (IP) components. Our methodology relies
on analytical output and power macromodels of the
IP blocks to estimate system dissipation without performing
any simulation. We derive upper bounds on the estimation
error of our methodology and demonstrate the relation
of this error to the sensitivities of the macromodeling
functions. For circuits without feedback, we give a sufficient
condition for the worst-case power estimation error
to increase only linearly with the length of the IP cascades.
We also give a tighter sufficient condition that ensures error
boundedness in IP systems of any topology. Experiments
with signal processing and data encryption systems validate
the accuracy and efficiency of our approach. For designs of
up to 576 IP blocks, power estimates are obtained within
0.2 seconds. In comparison with switch-level simulation results,
the average error of our power estimates is 7.3%.
Moderators: C. Metra, DEIS-Bologna U, IT; R. Leveugle, TIMA, Grenoble, F
-
Optimization of Error Detecting Codes for the Detection of Crosstalk Originated Errors [p. 290]
-
M. Favalli and C. Metra
This work applies weight based codes [1] to the detection
of crosstalk originated errors. Faults of this kind, whose
importance grows with device scaling, may originate errors
that are undetectable by the most commonly used error detecting
codes in VLSI ICs. Conversely, such errors can be easily
detected by weight based codes that, however, have smaller
encoding capabilities. In order to reduce the cost of these
codes, a graph theoretic optimization is used. Moreover, new
applications of these codes are explored regarding the synthesis
of self-checking FSMs, and the detection of errors related
to the clock distribution network.
-
System Safety through Automatic High-Level Code Transformations: An Experimental Evaluation [p. 297]
-
P. Cheynet, B. Nicolescu, R. Velazco, M. Rebaudengo, M. Sonza Reorda,
and M. Violante
This paper deals with a software modification strategy
allowing the on-line detection of transient errors. Being
based on a set of rules for introducing redundancy in the
high-level code, the method can be completely automated,
and is particularly suited for low-cost safety-critical
microprocessor-based applications. Experimental
results from software and hardware fault injection campaigns
are presented and discussed, demonstrating the
effectiveness of the approach in terms of fault detection
capabilities.
-
From DFT to Systems Test -- A Model Based Cost Optimization Tool [p. 302]
-
M. Wahl, T. Ambler, C. Maaß and M. Rahman
Long lasting systems like airplanes have a cost structure
where the maintenance costs are larger than the
purchasing costs. Testing is required both for preventive
maintenance and for repair, and is a major source of
cost. Previously we have analysed test and Design for
Testability for digital systems, covering ASICs, boards
and systems. Besides, the continuous development of
technology requires cost models that can grow dynamically
and, because we will never have all information,
can work with incomplete data sets. In this paper we
present a tool that is well suited for a wide range of
applications. Previously developed cost models can be
incorporated and new elements can be added to the
model as needed. Due to the generic approach the tool
allows modelling general systems. It is not bound to the
digital domain, although it has a strong background
there.
-
Efficient On-Line Testing Method for a Floating-Point Adder [p. 307]
-
A. Drozd and M. Lobachev
In this paper we present a residue method for on-line
testing of the floating-point adder. This circuit contains
an arithmetic shifter which executes an abridged operation.
The method solves the problem of checking the abridged
operation with a reduced amount of hardware.
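The abstract gives no detail of the method, so the following is only a minimal sketch of the general idea behind residue checking, shown for plain integer addition rather than for the floating-point adder and abridged shifter operation addressed in the paper. The modulus 3 and the function name are assumptions.

```python
MOD = 3  # check modulus (assumed for this example)

def residue_check_add(a, b, result):
    """Return True if `result` is consistent with a + b under the residue check."""
    expected = (a % MOD + b % MOD) % MOD   # residue predicted from the operands
    return result % MOD == expected        # compare with the residue of the output

a, b = 1234, 5678
good = a + b
faulty = good ^ (1 << 4)   # inject a single-bit error into the sum

print(residue_check_add(a, b, good))
print(residue_check_add(a, b, faulty))
```

A single-bit error changes the result by plus or minus a power of two, which is never a multiple of 3, so a modulo-3 check detects every such error in the sum.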
Organizer: J. Rabaey, UC Berkeley, USA
Moderator: M. Engels, IMEC, B
-
Design Methodology for PicoRadio Networks [p. 314]
-
J. da Silva Jr., J. Shamberger, M. Ammer, C. Guo, S. Li,
R. Shah, T. Tuan, M. Sheets, J. Rabaey,
B. Nikolic, A. Sangiovanni-Vincentelli, and P. Wright
One of the most compelling challenges of the next decade is
the "last-meter" problem, extending the expanding data network
into end-user data-collection and monitoring devices. PicoRadio
supports the assembly of an ad hoc wireless network of self-contained
mesoscale, low-cost, low-energy sensor and monitor
nodes. While technology advances have made it conceivable to
deploy wireless networks of heterogeneous nodes, the design of a
low-power, low-cost, adaptive node in a reduced time to market is
still a challenge. We present a design methodology for PicoRadio
Networks, from system conception and optimization to silicon
platform implementation. For each phase of the design, we
demonstrate the applicability of our methodology through
promising experimental results.
Moderators: W. John, Fraunhofer Institute Berlin/Paderborn, D;
F. Sabath, Armed Forces Institute for Protection Technologies, USA
-
High-Level Simulation of Substrate Noise Generation from
Large Digital Circuits with Multiple Supplies [p. 326]
-
M. Badaroglu, M. van Heijningen, V. Gravot, S. Donnay, H. De Man, G. Gielen,
M. Engels, and I. Bolsens
Substrate noise generated by large digital circuits degrades
the performance of analog circuits sharing the
same substrate. Existing approaches usually extract the
model of the substrate from the layout information and
then simulate the extracted transistor-level netlist with this
substrate model using a transistor-level simulator. For
large digital circuits, the substrate simulation is however
not feasible with a transistor-level simulator. In our previous
work, it has been demonstrated that efficient and accurate
simulation of substrate noise generation at gate-level
is feasible. In this paper several important extensions
to our previous work are introduced: modeling of IO cells,
modeling of input transition time and load dependency
and the extraction methodology of an equivalent substrate
model within multiple supply domains. Experimental results
show an improved accuracy (6.3% error on RMS
substrate voltage with respect to a full SPICE level simulation)
with these extensions, while maintaining a large
speedup with respect to SPICE simulations.
-
Crosstalk Noise in Future Digital CMOS Circuits [p. 331]
-
C. Werner, R. Göttsche, A. Wörner, and U. Ramacher
This paper presents simulation results for crosstalk noise
in future CMOS generations down to 35 nm features. The
noise voltage is calculated from circuit simulations with
lumped RLC networks and static CMOS cells. A static
noise margin is derived from inverter characteristics of
NAND and NOR gates and a critical wire length is
calculated from considering statistical variations in the
chip manufacturing process. The model agrees well with
measurements on a quarter micron testchip and predicts a
drastic drop of critical wirelengths to 50-60 µm after the
100 nm technology generation.
-
Modeling Electromagnetic Emission of Integrated Circuits for System Analysis [p. 336]
-
P. Kralicek, W. John, and H. Garbe
In this contribution a new methodology for modeling electromagnetic
emission of integrated circuits in system analysis
is shown. By using a physical model based on a multipole
expansion, the emitted fields can be well approximated
in the space outside a component. This allows a convenient
representation with a low number of model parameters
which can be determined by measurement or simulation.
To show the applicability, the developed models are
used in a system level printed circuit board simulator. The
results are compared with reference calculations.
-
Analysis of EME Produced by a Microcontroller Operation [p. 341]
-
F. Fiori and F. Musolino
This paper deals with the characterization of integrated circuit
electromagnetic emissions. The TEM cell method is employed in order
to identify primary emissions sources of complex digital devices.
An 8-bit microcontroller, realized in a 0.8 µm HCMOS process, is
considered. It is composed of several building blocks like the
central processing unit, the analog to digital converter and the
EPROM memory. Emission measurements are performed by operating a
specific program code stored in the microcontroller memory and
emissions due to each building block are identified.
Moderators: A. Kaiser, IEMN-ISEN, F; P. Wambacq, IMEC, B
-
Top-Down Design of a xDSL 14-bit 4MS/s Sigma-Delta Modulator in Digital CMOS Technology [p. 348]
-
R. del Río, J. de la Rosa, F. Medeiro, B. Pérez-Verdú, and A.
Rodríguez-Vázquez
This paper describes the design of a Sigma-Delta modulator
intended for A/D conversion in xDSL applications, featuring
14-bit@4Msample/s in a 0.35 µm mainstream digital
CMOS technology. Architecture selection, modulator sizing
and cell sizing tasks were supported by a CAD methodology,
thus allowing us to obtain a power efficient implementation
in a short design cycle.
-
Analog Design for Reuse -- Case Study: Very Low-Voltage Sigma-Delta Modulator [p. 353]
-
M. Dessouky, A. Kaiser, M. Louërat, and A. Greiner
This paper presents the complete design methodology
of a very low-voltage ΔΣ third-order modulator from high-level
specifications down to layout. Behavioral models taking
into account cell non-idealities are developed and used
to map performance specifications to lower levels. Emphasis
has been made on eventual design reuse through design
plans and layout templates in a layout-oriented circuit design
approach. The modulator has been designed for two
different technologies demonstrating the suitability of the
methodology for very high performance mixed-signal circuits.
Moreover, the same design knowledge has been successfully
reused in another fourth-order modulator.
-
A Design Strategy for Low-Voltage Low-Power Continuous-Time
Sigma-Delta A/D Converters [p. 361]
-
F. Gerfers and Y. Manoli
This paper presents a design strategy for low-voltage
low-power Sigma-Delta analog-to-digital (A/D) converters using a
continuous-time (CT) lowpass loopfilter. An improved
method is used to find the optimal Sigma-Delta modulator implementation
with respect to a minimal power consumption on
the one hand and to fulfill a rapid prototyping approach on
the other hand. The influence of the low supply voltage
as well as circuit nonidealities on the overall Sigma-Delta modulator
is determined and verified by behavioral simulations.
Transistor-level simulation results of a 1.5 V CT Sigma-Delta A/D
converter show a 75 dB dynamic range in a bandwidth of
25 kHz.
Moderators: R. Murgai, Fujitsu Labs of America, USA; S. Minato, NTT, JP
-
Minimizing Stand-By Leakage Power in Static CMOS Circuits [p. 370]
-
S. Naidu and E. Jacobs
In this paper we concern ourselves with the problem of
minimizing leakage power in CMOS circuits consisting of
AOI (and-or-invert) gates as they operate in stand-by mode
or an idle mode waiting for other circuits to complete their
operation. It is known that leakage power due to subthreshold
leakage current in transistors in the OFF state is
dependent on the input vector applied. Therefore, we try to
compute an input vector that can be applied to the circuit in
stand-by mode so that the power loss due to sub-threshold
leakage current is the minimum possible. We employ an
integer linear programming (ILP) approach to solve the
problem of minimizing leakage by first obtaining a good
lower bound (estimate) on the minimum leakage power and
then rounding the solution to actually obtain an input
vector that causes low leakage. The chief advantage of this
technique as opposed to others in the literature is that it
invariably provides us with a good idea about the quality of
the input vector found.
-
In-Place Delay Constrained Power Optimization Using Functional Symmetries
[p. 377]
-
C. Chang, B. Hu, and M. Marek-Sadowska
In-Place Optimization (IPO) has become the backend
methodology of choice to resolve the gap between logic
synthesis and physical design as the optimization can be
guided by accurate physical information. To perform optimization
without perturbing the placed netlist too much,
only buffer insertion and gate sizing are commonly used in
current design tools. In this paper, we address the problem
of delay-constrained power optimization by introducing
another degree of freedom: functional symmetry based
rewiring. Theoretical results on the effect of using functional
symmetry on transition density for power estimation
are also derived. Experimental results show that, under the
same delay constraint, our technique achieves much better
power reduction as compared to the discrete gate sizing
only technique.
-
High-Quality Sub-Function Construction in Functional Decomposition Based on
Information Relationship Measures [p. 383]
-
L. Józwiak and A. Chojnacki
Functional decomposition seems to be the most effective
circuit synthesis approach for look-up table (LUT)
FPGAs, (C)PLDs and complex gates. In the functional
decomposition that targets LUT FPGAs, the circuit is
constructed by recursively decomposing a given function
and its sub-functions until each of the resulting sub-functions
can be directly implemented with a LUT. The
choice of sub-functions constructed in this process
decides the quality of the resulting multi-level circuit
expressed in terms of the logic block count and speed. In
this paper, we propose a new effective and efficient
method for the sub-function construction, and we consider
its application in our circuit synthesis tool that targets
LUT-based FPGAs. The method is based on the
information relationship measures. The experimental
results demonstrate that the proposed approach leads to
extremely fast and very small circuits.
-
Generalized Reasoning Scheme for Redundancy Addition and Removal Logic Optimization [p. 391]
-
J. Espejo, L. Entrena, E. San Millán, and E. Olías
In this work a generalization of the structural
Redundancy Addition and Removal (RAR) logic
optimization method is presented. New concepts based on
the functional description of the nodes in the network are
introduced to support this generalization. Necessary and
sufficient conditions to identify all the possible structural
expansions are given for the general case of multiple
variable expansion. Basic nodes are no longer restricted
to simple gates and can be any function of any size. With
this generalization, an incremental mechanism to perform
structural transformations involving any number of
variables can be applied in a very efficient manner.
Experimental results are presented that illustrate the
efficiency of our scheme.
Moderators: J. Teixeira, IST/INESC, PT; M. Sonza Reorda, Politecnico di Torino, IT
-
LPSAT: A Unified Approach to RTL Satisfiability [p. 398]
-
Z. Zeng, P. Kalla, and M. Ciesielski
LPSAT is an LP-based comprehensive infrastructure designed
to solve the satisfiability (SAT) problem for complex RTL
designs containing both word-level arithmetic operators and
bit-level Boolean logic. The presented technique uses a mixed
integer linear program to model the constraints corresponding
to both domains of the design. Our technique renders the
constraint propagation between the two domains implicit to
the MILP solver, thus enhancing the overall efficiency of the
SAT framework. The experimental results are quite promising
when compared with generic CNF-based and BDD-based SAT
algorithms.
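To illustrate how bit-level Boolean logic can be expressed as linear constraints over 0/1 variables, the sketch below checks a standard textbook linearization of the AND and NOT gates by exhaustive enumeration. This is not necessarily the encoding used in LPSAT, which the abstract does not specify; the function names are assumptions.

```python
from itertools import product

def and_constraints_hold(x, y, z):
    """Linear constraints that a 0/1 assignment must satisfy for z = x AND y."""
    return z <= x and z <= y and z >= x + y - 1

def not_constraint_holds(x, z):
    """Linear equality for z = NOT x."""
    return x + z == 1

# Exhaustively confirm that the feasible 0/1 points are exactly the gate's truth table.
for x, y, z in product((0, 1), repeat=3):
    assert and_constraints_hold(x, y, z) == (z == (x & y))
for x, z in product((0, 1), repeat=2):
    assert not_constraint_holds(x, z) == (z == 1 - x)
print("encodings match the truth tables")
```

Constraints of this form can sit in the same mixed integer linear program as the linear relations describing word-level arithmetic, which is the kind of unified model the abstract describes.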
-
Functional Test Generation for Behaviorally Sequential Models [p. 403]
-
F. Ferrandi, G. Ferrara, D. Sciuto, A. Fin, and F. Fummi
Functional testing of HDL specifications is one of the
most promising approaches for the verification of the functionalities
of a design before synthesis. The contribution of
this work is the development of a test generation algorithm
targeting a new coverage metric (called bit-coverage) that
provides full statement coverage, branch coverage, condition
coverage and partial path coverage for behaviorally
sequential models.
The behavioral test sequences can also be the only way
to evaluate testability of a VHDL model for which a gate-level
representation is not available (e.g., third-party cores), since
the behavioral error model is characterized also by a high
correlation with the RT and gate-level stuck-at fault model.
Moreover, the preciseness of the proposed coverage metric
makes the identified test sequences more effective in identifying
design errors, than other test patterns developed by
following standard coverage metrics.
-
High Quality Behavioral Verification Using Statistical Stopping Criteria [p. 411]
-
A. Hajjar, T. Chen, I. Munn, A. Andrews, and M. Bjorkman
In order to improve the efficiency of behavioral model
verification, it is important to determine the points of diminishing
return for a given verification strategy. This paper
compares the existing stopping rules and presents a new
stopping rule based on a static Bayesian technique. The new
stopping rule was applied to verifying 14 complex VHDL
models. We used the figure of merit to compare the efficiency
of the stopping rules. The results in terms of coverage and
verification time were shown to consistently outperform existing
stopping rules.
Keywords: Behavioral Model Verification, VHDL, Statistical
Stopping Rules.
Organizers: P. Bromley, F. Karim, and P. Paulin, STMicroelectronics, F
Moderator: P. Paulin, STMicroelectronics, F
-
Network Processors: A Perspective on Market Requirements,
Processor Architectures and Embedded S/W Tools [p. 420]
-
P. Paulin, F. Karim, and P. Bromley
With the projected explosion of low-cost bandwidth
availability, the intensive processing tasks and service
hosting will move close to consumers on the "intelligent
edge" of the network, where a significant portion of the
future storage, processing and network management will
take place. We address the rationale for this change, the
characteristics of the network processor architecture
required to address it, and the software development tools
needed in order to improve time-to-market without
sacrificing embedded software performance.
Moderators: L. Silveira, IST/INESC, PT; H. Grabinski, Hannover U, D
-
Efficient Inductance Extraction via Windowing [p. 430]
-
M. Beattie and L. Pileggi
We propose a new, efficient and accurate localized inductance modeling
technique via windowing in a manner that is analogous to localized
capacitance extraction. The stability and accuracy of this process
are made possible by twice inverting the localized inductance models,
and in the process exploiting properties of the magnetostatic interactions
as modeled via the susceptance (inverse inductance). Application of these
localized double-inverse inductance models to actual IC bus examples
demonstrates the significant improvement in simulation efficiency and
overall accuracy as compared to alternative methods of approximation
and simplification.
-
Efficient and Passive Modeling of Transmission Lines by Using Differential Quadrature Method [p. 437]
-
Q. Xu and P. Mazumder
This paper introduces a new transmission line modeling
approach that employs an efficient numerical approximation
technique called the Differential Quadrature Method
(DQM). The transmission line has been discretized and
the approximation framework is constructed by using the
5th order differential quadrature method; consequently, an
improved discrete equivalent-circuit model is developed
in the paper. The DQM-based modeling requires far fewer
intervening grid points for building an accurate discrete
model of the transmission line than numerical methods
like FD require. It introduces far fewer state variables than
FD-based models; therefore, it has higher efficiency. The
DQM technique can be integrated in a circuit simulator
since it preserves the passivity.
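
The core idea of differential quadrature, approximating a derivative at each grid point as a weighted sum of the function values at all grid points, can be sketched as follows. This is a minimal illustration using the standard Lagrange-polynomial weighting coefficients, not the equivalent-circuit model of the paper; the function names `dq_weights` and `dq_derivative` and the choice of five equally spaced points are assumptions made for the example.

```python
def dq_weights(points):
    """Return first-derivative weights a[i][j] so that
    f'(x_i) is approximated by sum_j a[i][j] * f(x_j).
    The weights come from differentiating the Lagrange interpolant."""
    n = len(points)
    # m[i] = product over k != i of (x_i - x_k)
    m = []
    for i in range(n):
        prod = 1.0
        for k in range(n):
            if k != i:
                prod *= points[i] - points[k]
        m.append(prod)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                a[i][j] = m[i] / ((points[i] - points[j]) * m[j])
        # Each row sums to zero because the derivative of a constant is zero.
        a[i][i] = -sum(a[i][j] for j in range(n) if j != i)
    return a


def dq_derivative(points, values):
    """Approximate the derivative at every grid point."""
    a = dq_weights(points)
    n = len(points)
    return [sum(a[i][j] * values[j] for j in range(n)) for i in range(n)]


# Hypothetical usage: five points reproduce the derivative of a cubic
# to within rounding error.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
approx = dq_derivative(grid, [x ** 3 for x in grid])
```

With n grid points the weights differentiate any polynomial of degree below n exactly (up to rounding), which is why a handful of points can be enough where a low-order finite-difference stencil needs many.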
-
Explicit Formulas and Efficient Algorithm for Moment Computation of Coupled RC
Trees with Lumped and Distributed Elements [p. 445]
-
Q. Yu and E. Kuh
In today's deep submicron technology, the coupling capacitances
among individual on-chip RC trees have an essential
effect on the signal delay and crosstalk, and the interconnects
should be modeled as coupled RC trees. We provide
simple explicit formulas for the Elmore delay and higher
order voltage moments, and a linear order recursive algorithm
for the voltage moment computation for lumped and
distributed coupled RC trees. By using the formulas and
algorithms, the moment matching method can be efficiently
implemented to deal with delay and crosstalk estimation,
model order reduction and optimal design of interconnects.
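
As background for the first voltage moment, the classical Elmore delay of a single, uncoupled, lumped RC tree can be computed in linear time with two passes over the nodes. The sketch below covers only that textbook case, not the coupled or distributed formulas of the paper; the function name `elmore_delays` and the node-numbering convention are assumptions for illustration.

```python
def elmore_delays(parent, resistance, capacitance):
    """Elmore delay at every node of a lumped RC tree.

    parent[i]      -- index of the parent of node i (parent[0] is None)
    resistance[i]  -- resistance between node i and its parent
                      (resistance[0] can model the driver resistance)
    capacitance[i] -- grounded capacitance at node i
    Assumes nodes are numbered so that parent[i] < i.
    """
    n = len(parent)
    # Pass 1 (leaves to root): total capacitance at and below each node.
    downstream = list(capacitance)
    for i in range(n - 1, 0, -1):
        downstream[parent[i]] += downstream[i]
    # Pass 2 (root to leaves): accumulate R * downstream C along each path.
    delay = [0.0] * n
    delay[0] = resistance[0] * downstream[0]
    for i in range(1, n):
        delay[i] = delay[parent[i]] + resistance[i] * downstream[i]
    return delay


# Hypothetical two-stage RC ladder: source -> R=1 -> node 1 -> R=2 -> node 2,
# with a unit capacitor at nodes 1 and 2.
ladder = elmore_delays([None, 0, 1], [0.0, 1.0, 2.0], [0.0, 1.0, 1.0])
```

For the ladder, the delay at the far node is R1(C1 + C2) + R2 C2, which the two passes reproduce without forming any matrix.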
-
On the Impact of On-Chip Inductance on Signal Nets under the Influence of Power Grid Noise [p. 451]
-
T. Chen
It has been well recognized that the impact of on-chip inductance
on some critical nets, such as clock nets, is significant and cannot be
ignored in delay modeling for these nets. However, the impact of
on-chip inductance on signal nets in general is still not well understood.
We present results of analyzing inductive effects on signal nets for
ultra-deep submicron technologies. The analysis is based on an Al-based
0.18 um CMOS process and a Cu-based 0.13 um CMOS process. The impact
of on-chip inductance is shown to be insignificant if we assume a perfect
power supply network around the interconnect routes. Otherwise, the
impact of on-chip inductance can be significant. Furthermore, the
results presented in this paper illustrate the impact of on-chip
inductance one would expect from transitioning from an Al-based
interconnect technology to a Cu-based interconnect technology.
Moderators: S. Yoo, TIMA, Grenoble, F; F. Wagner, UFRGS, BRZ
-
Timing Simulation of Digital Circuits with Binary Decision Diagrams [p. 460]
-
R. Ubar, A. Jutman, and Z. Peng
Meeting timing requirements is an important
constraint imposed on highly integrated circuits, and the
verification of timing of a circuit before manufacturing is
one of the critical tasks to be solved by CAD tools. In this
paper, a new approach and the implementation of several
algorithms to speed up gate-level timing simulation are
proposed where, instead of gate delays, path delays for
tree-like subcircuits (macros) are used. Therefore timing
waveforms are calculated not for all internal nodes of the
gate-level circuit but only for outputs of macros. The
macros are represented by structurally synthesized binary
decision diagrams (SSBDD) which enable a fast
computation of delays for macros. The new approach to
speed up the timing simulation is supported by
encouraging experimental results.
-
HALOTIS: High Accuracy LOgic TIming Simulator with Inertial and Degradation
Delay Model [p. 467]
-
P. Vazquez, J. Juan-Chico, M. Bellido, A. Acosta, and M. Valencia
This communication presents HALOTIS, a novel high
accuracy logic timing simulation tool, that incorporates a
new simulation algorithm based on different concepts for
transitions and events. This new simulation algorithm is
intended for including the inertial and degradation delay
models. Simulation results are very similar to those
obtained by electrical simulators, and show a higher
accuracy compared to conventional delay models
implemented in current logic simulators.
-
dlbSIM -- A Parallel Functional Logic Simulator Allowing Dynamic Load Balancing [p. 472]
-
K. Hering, J. Löser, and J. Markwardt
To meet the demanding time-to-market requirements in
VLSI/ULSI design, the acceleration of verification processes
is inevitable. The parallelization of cycle-based simulation
at register-transfer- and gate level is one facet in
a series of efforts targeted at this objective. We introduce
dlbSIM, a parallel compiled code functional logic simulator
that has been developed to run on loosely-coupled systems.
It has the ability to balance the application-specific load of
cooperating simulator instances depending on the overall
load situation on the involved processor nodes. Thereby,
the load of a simulator instance is expressed in terms of
a set of circuit model parts which are to be simulated by
the corresponding instance. The centralized load management
runs simultaneously with a parallel simulation. Both
processes interact after a controllable number of simulated
clock-cycles to transmit load information and realize load
modifications. dlbSIM is successfully used to simulate IBM
S/390 processor models.
-
Architecture Driven Partitioning [p. 479]
-
J. Küter and E. Barke
In this paper, we present a new algorithm to partition
netlists for logic emulation under consideration of the
targeted emulator architecture. The proposed algorithm
can be used flexibly for a wide variety of applications
because the description of the architecture is part of the
input data. It combines a new approach of finding and
improving an initial solution with existing algorithms to
cluster the netlist and optimize the number of cut nets
between blocks. As a result, the algorithm ensures that the
cut nets between the created blocks can be connected
within the emulation system, even without a full interconnect
structure. Experiments on a number of designs and
architectures demonstrate that the algorithm is competitive
for architectures with full interconnect and that it is
unique for architectures with limited interconnect resources.
Moderator: C. Piguet, CSEM, Neuchatel, CH
-
Low-Power Systems on Chips (SOCs) [p. 488]
-
C. Piguet, M. Renaudin, and T. Omnès
For innovative portable products, Systems on Chips (SoCs)
containing several processors, memories and specialised
modules are obviously required. Performance as well as
low power are the main issues in the design of such SoCs.
Are these low-power SoCs only constructed with low-power
processors, memories and logic blocks? Although the latter are unavoidable,
many other issues are also important for low-power
SoCs, such as the way to synchronise the communications
between processors as well as test procedures, on-line
testing, software design and development tools. This
paper is a general framework for the design of low-power
SoCs, starting from the system level to the architecture level,
assuming that the SoC is mainly based on the re-use of low-power
processors, memories and logic peripherals.
Moderators: H. Kerkhoff, Twente U, NL; J. Pineda de Gyvez, Philips Research, NL
-
Static and Dynamic Behavior of Memory Cell Array Opens and Shorts in Embedded
DRAMs [p. 496]
-
Z. Al-Ars and A. van de Goor
Fault analysis of memory devices using defect
injection and simulation is becoming increasingly important
as the complexity of memory faulty behavior increases.
In this paper, this approach is used to study the effects
of opens and shorts on the faulty behavior of embedded
DRAM (eDRAM) devices produced by Infineon Technologies.
The analysis shows the existence of previously defined
memory fault models, and establishes new ones. The
paper also investigates the concept of dynamic faulty behavior
and establishes its importance for memory devices.
Conditions to test the newly established fault models are
also given.
Key words: Embedded DRAM, functional fault models,
fault primitives, defect simulation, opens, shorts.
-
Definitions of the Numbers of Detections of Target Faults and their
Effectiveness in Guiding Test Generation for High Defect Coverage [p. 504]
-
I. Pomeranz and S. Reddy
The number of times a fault f in a combinational circuit is
detected by a given test set T was shown earlier to affect the
defect coverage of the test set. The earlier definition counted
each test in T, that detects f, as a distinct detection of f. This
definition counts two tests as distinct detections even if they
differ only in the values of inputs that do not affect the activation
or propagation of the fault. In this work, we introduce a stricter
definition that requires that two counted tests would be different
in the way they activate and/or propagate the fault. We describe
procedures for constructing test sets based on the stricter
definition, and compare them to test sets for the earlier, less strict
definition. The results show a simple criterion to decide when it
may be necessary to combine the two definitions in order to
obtain a high quality test set.
-
CMOS Open Defect Detection by Supply Current Test [p. 509]
-
M. Hashizume, M. Ichimiya, H. Yotsuyanagi, and T. Tamesada
In this paper, a new test method is proposed for
detecting open defects in CMOS ICs. The method is based
on the supply current of ICs generated by applying a time-variable
electric field from outside the ICs. The
feasibility of the test is examined experimentally.
The empirical results indicate that, by using the
method, open defects in CMOS ICs can be detected by
measuring the supply current that flows when a time-variable
electric field is applied.
-
Full Chip False Timing Path Identification: Applications to the
PowerPC™ Microprocessors [p. 514]
-
J. Zeng, M. Abadir, J. Bhadra, and J. Abraham
Static timing analysis sets the industry standard in the
design methodology of high speed/performance microprocessors
to determine whether timing requirements have
been met. Unfortunately, not all the paths identified using
such analysis can be sensitized. This leads to a pessimistic
estimation of the processor speed. Also, no amount of engineering
effort spent on optimizing such paths can improve
the timing performance of the chip. In the past, we demonstrated
initial results of how ATPG techniques can be used
to identify false paths efficiently [1]. Due to the gap between
the physical design on which the static timing analysis of
the chip is based and the test view on which the ATPG techniques
are applied to identify false paths, in many cases
only sections of some of the paths in the full-chip were analyzed
in our initial results. In this paper, we will fully analyze
all the timing paths using the ATPG techniques, thus
overcoming the gap between the testing and timing analysis
techniques. This enables us to do false path identification
at the full-chip level of the circuit. Results of applying our
technique to the second generation G4 PowerPC™
will be presented.
Moderator: P. Wambacq, IMEC, B
-
CAD for RF Circuits [p. 520]
-
P. Wambacq, G. Vandersteen, J. Phillips, J. Roychowdhury, W. Eberle, B. Yang,
D. Long, and A. Demir
Wireless transceivers for digital telecommunications are
heterogeneous systems that combine digital hardware,
software and analog circuitry. The pressure toward miniaturization
and lower power consumption for these transceivers
imposes tight specifications on their analog RF parts.
Many aspects of RF circuits cannot be simulated accurately
and efficiently with a classical circuit-level SPICE
approach. In this paper three important simulation problems
for RF circuits are addressed:
1. high-level simulation of analog and RF blocks for the
determination of the specifications of the circuits,
2. accurate circuit-level simulation of nonlinear circuits
with time constants that differ widely,
3. efficient and accurate computation of phase noise in RF
oscillators.
For each of these problems, solutions are proposed. These
solutions illustrate that accurate and efficient simulations
of RF communication circuits need a heterogeneous variety
of advanced algorithms.
Moderators: J. Lienig, Robert Bosch GmbH, D; A. Takahashi, Tokyo IT, JP
-
Modeling Crosstalk Noise for Deep Submicron Verification Tools [p. 530]
-
P. Bazargan-Sabet and F. Ilponse
In deep submicron technologies, the verification task has
to cover some new issues to certify the correctness of a
design. The noise produced by crosstalk couplings is one
of these emerging problems. In this paper, we propose a
model to evaluate the peak value of the noise injected on
a signal when its neighboring signals make their
transitions. This model has been used in a prototype
verification tool and has shown a satisfactory performance-accuracy
ratio.
-
A Graph Based Algorithm for Optimal Buffer Insertion under Accurate Delay Models [p. 535]
-
Y. Gao and D. Wong
Buffer insertion is an efficient technique in interconnect optimization.
This paper presents a graph based algorithm for
optimal buffer insertion under accurate delay models. In our
algorithm, a signal is accurately represented by a finite ramp
which is characterized by two parameters, shift time and transition
time. Any accurate delay model, such as delay models based on
the transmission line model and SPICE simulations, can be incorporated
into our algorithm. The algorithm
determines the optimal number of buffers and their locations
on a wire such that some optimization objective is satisfied.
Two typical examples of such optimization objectives are minimizing
the 50% threshold delay and minimizing the transition time. Both
can be easily determined in our algorithm.
We show that the buffer insertion problem can be reduced to
a shortest path problem. The algorithm can be easily extended
for simultaneous buffer insertion and wire-sizing, and complexity
is still polynomial. The algorithm can also be extended to
deal with problems such as buffer insertion subject to transition
time constraints at any position along the wire.
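
The reduction to a shortest path problem can be illustrated on a single wire with a fixed set of candidate buffer positions: each position is a graph node, and an edge from position i to position j costs the delay of one buffered wire segment. The sketch below substitutes a simple Elmore segment delay for the accurate ramp-based delay models of the paper, and all names and parameters (`segment_delay`, `insert_buffers`, `r_buf`, `c_in`, `d_buf`) are assumptions chosen for the example. The driver is modeled as one more buffer and the sink as one buffer input load.

```python
def segment_delay(length, r, c, r_buf, c_in, d_buf):
    """Elmore delay of a buffer driving a wire of the given length that is
    terminated by the input capacitance of the next buffer.
    r, c are wire resistance and capacitance per unit length."""
    wire_c = c * length
    return d_buf + r_buf * (wire_c + c_in) + r * length * (wire_c / 2.0 + c_in)


def insert_buffers(n_slots, slot_len, r, c, r_buf, c_in, d_buf):
    """Shortest path over candidate positions 0..n_slots on one wire.
    Returns (minimum delay, list of slot indices that receive a buffer)."""
    best = [float("inf")] * (n_slots + 1)
    prev = [None] * (n_slots + 1)
    best[0] = 0.0
    # Positions are already in topological order, so one sweep suffices.
    for j in range(1, n_slots + 1):
        for i in range(j):
            length = (j - i) * slot_len
            cost = best[i] + segment_delay(length, r, c, r_buf, c_in, d_buf)
            if cost < best[j]:
                best[j] = cost
                prev[j] = i
    # Walk back from the sink to recover the chosen buffer positions.
    positions = []
    k = prev[n_slots]
    while k != 0:
        positions.append(k)
        k = prev[k]
    positions.reverse()
    return best[n_slots], positions
```

Because every edge points from a smaller to a larger position, the graph is acyclic and the sweep runs in time quadratic in the number of candidate positions. Swapping in a different delay model only changes `segment_delay`.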
-
Repeater Block Planning under Simultaneous Delay and Transition Time Constraints [p. 540]
-
P. Sarkar and C. Koh
We present a solution to the problem of repeater block planning
under both delay and signal transition time constraints for a given
floorplan. Previous approaches have considered only meeting the
target delay of a net. However, it has been observed that the repeater
planning for meeting the delay target can cause signals on
long interconnects to have very slow transition rates. Experimental
results show that our new approach satisfies both timing constraints
for an average of 79% of all global nets for six MCNC benchmark
floorplans studied (at 1GHz frequency), compared with an average
of 22% for the repeater block planner in [11].
Moderators: V. Meyer zu Bexten, Atmel Germany GmbH, D; E. Barke, Hannover U, D
-
On-The-Fly Layout Generation for PTL Macrocells [p. 546]
-
L. Macchiarulo, L. Benini, and E. Macii
Pass transistor logic (PTL) has been recently proposed as
an alternative to standard MOS for aggressive circuit design.
Even though PTL has been successful in a few handcrafted designs,
its acceptance into mainstream digital design critically depends
on the availability of tools for logic
and physical synthesis and optimization. The automatic
synthesis of pass transistor circuits starting from BDDs has
been intensively studied in the past with promising results,
but back-end tools for PTL cell generation are still missing.
We describe an automatic layout generator that has
been designed for seamless integration in a library-free PTL
design flow. The generator exploits the distinctive characteristics
of pass transistor networks produced by synthesis
to achieve quality of results comparable with state-of-the-art
commercial cell generation tools in a fraction of the
execution time.
-
Automatic Datapath Tile Placement and Routing [p. 552]
-
T. Serdar and C. Sechen
We report the very first fully automatic datapath tile
layout flow. We subdivided the placement process into
two steps: a global placement step using simulated annealing,
and a new detailed placement step based on extensive
modifications we made to the O-tree algorithm.
The modifications have enabled the extended O-tree algorithm
to handle the rectilinearly shaped transistor
chains and gates common in datapath tile layout. We
show that datapath tiles can be placed and routed automatically
at the transistor level or at the mixed transistor/gate
level, achieving results for the very first time that
are competitive with those obtained manually by a skilled
designer.
-
A Boolean Satisfiability-Based Incremental Rerouting Approach with Application
to FPGAs [p. 560]
-
G. Nam, K. Sakallah, and R. Rutenbar
Incremental redesign is an increasingly essential step in
any complex design. Late changes or corrections in
functional specifications (so-called "engineering change
orders" or ECOs) force us to search for a minimal
perturbation that achieves the desired repair. In
reconfigurable design scenarios, these incremental
repairs may be in response to physical faults: the goal is
to "design around" the fault. For FPGAs, incremental
rerouting is an essential component of this repair
problem. We develop a new incremental rerouting
algorithm for FPGAs using techniques from Boolean
Satisfiability (SAT). In this application, these techniques
have the twin virtues that they (1) represent all possible
routing (and rerouting) constraints simultaneously and
exactly, and (2) search for rerouting solutions by
perturbing all nets concurrently. Preliminary results are
promising. For several FPGA benchmarks, we were able
to reroute fault reconfigurations that perturb up to 5.74%
of all nets for a small number of fault sets (one to four
faults) with only 1.55 track overhead per channel on
average, with CPU time 0.76 to 4.91 seconds/fault.
Moderators: J. Plantin, Ericsson Radio Systems, SE; L. Lavagno, Udine U, IT
-
Dual Transitions Petri Net Based Modelling Technique for Embedded Systems
Specification [p. 566]
-
M. Varea and B. Al-Hashimi
This paper presents a new modelling technique capable of modelling
both control and data information using a single unified
approach. This is achieved by modifying the classical
Petri Net structure, allowing it to have two types of transitions
and arcs. As a consequence, loops and conditional operations
within complex specifications are easily identified. The system
dynamic behaviour is modelled using a new marking scheme
of the net consisting of a new element called value for data
representation in addition to classical tokens used for control
purpose. Structural definitions, behavioural rules and graphical
representation of the new modelling technique are given.
One potential application of the proposed modelling technique
is the internal representation of embedded systems specification.
Two examples are included illustrating the applicability
and efficiency of the proposed modelling technique.
-
Probabilistic Application Modeling for System-Level Performance Analysis
[p. 572]
-
R. Marculescu and A. Nandi
The objective of this paper is to introduce the Stochastic
Automata Networks (SANs) as an effective formalism
for application modeling in system-level analysis. More precisely,
we present a methodology for application modeling for
system-level power/performance analysis that can help the
designer to select the right platform and implement a set of
target multimedia applications. We also show that, under various
input traces, the steady-state behavior of the application
itself is characterized by very different 'clusterings' of the
probability distributions. Having this information available
not only helps to avoid lengthy profiling simulations for predicting
power and performance figures, but also enables efficient
mappings of the applications onto a chosen platform.
We illustrate the benefits of our methodology using the
MPEG-2 video decoder as the driver application.
Keywords: system-level design, performance analysis, application
modeling, stochastic automata networks, embedded
multimedia systems.
-
Reliable Estimation of Execution Time of Embedded Software [p. 580]
-
P. Giusto, G. Martin, and E. Harcourt
Estimates of execution time of embedded software play
an important role in function-architecture co-design. This
paper describes a technique based upon a statistical approach
that improves existing estimation techniques. Our
approach provides a degree of reliability in the error of the
estimated execution time. We illustrate the technique using
both control-oriented and computation-dominated benchmark
programs.
Moderators: M. Renovell, LIRMM, F; B. Kruseman, Philips Research, NL
-
Implementation of a Linear Histogram BIST for ADCs [p. 590]
-
F. Azaïs, S. Bernard, Y. Bertrand, and M. Renovell
This paper validates a linear histogram BIST scheme for
ADC testing. This scheme uses a time decomposition
technique in order to minimize the required hardware
circuitry. A practical implementation is described and
the structure together with the operating mode of the
different modules are detailed. Through this practical
implementation, the performances and limitations of the
proposed scheme are evaluated both in terms of
additional circuitry and test time.
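
For readers unfamiliar with the linear histogram test, the underlying calculation is short: with a slow linear ramp at the input, every code of an ideal ADC is hit equally often, so deviations in the per-code counts give the differential nonlinearity (DNL), and their running sum gives the integral nonlinearity (INL). The sketch below shows only this textbook post-processing in software, not the time-decomposed hardware scheme of the paper; the function name `histogram_linearity` is an assumption.

```python
def histogram_linearity(code_counts):
    """Compute DNL and INL (in LSB) from a linear-ramp code histogram.

    code_counts[k] is how often output code k was observed.
    The first and last codes are excluded because they also collect
    samples taken while the ramp is outside the converter's range.
    """
    inner = code_counts[1:-1]
    ideal = sum(inner) / len(inner)          # expected hits per code
    dnl = [count / ideal - 1.0 for count in inner]
    inl = []
    running = 0.0
    for d in dnl:                            # INL is the cumulative DNL
        running += d
        inl.append(running)
    return dnl, inl


# Hypothetical 6-code histogram: code 1 is too narrow, code 2 too wide.
dnl, inl = histogram_linearity([99, 8, 12, 10, 10, 99])
```

A code that is hit 20% less often than average has a DNL of about -0.2 LSB; in this example the neighbouring wide code compensates, so the INL returns to roughly zero after the second inner code.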
-
Test Generation Based Diagnosis of Device Parameters for Analog Circuits [p. 596]
-
S. Cherubal and A. Chatterjee
With the increasing complexity of manufacturing processes and
the shrinking of device geometries, the performance metrics of
integrated circuits (ICs) are becoming increasingly sensitive to
random fluctuations in the manufacturing process. We propose a
diagnosis methodology that can be used to infer the cause(s) of
variations in performance of analog ICs. The methodology consists
of (a) a device parameter computation technique which is
used to compute the device parameters of an IC from measurements
made on it and (b) a cause-effect analysis module that is
used to compute the cause of the variation in performance metrics
of a given set of ICs. Simulation results to demonstrate the effectiveness
of the technique are presented.
-
Generation of Optimum Test Stimuli for Nonlinear Analog Circuits Using
Nonlinear Programming and Time-Domain Sensitivities [p. 603]
-
B. Burdiek
In this paper a novel approach for the generation of an
optimum transient test stimulus for general analog circuits
is proposed. The test stimulus is optimal with respect to the
detection of a given fault set by means of a predefined fault
detection criterion. The problem of finding an optimum test
stimulus detecting all faults from the fault set is formulated
as a nonlinear programming problem. A functional
describing the differences between the good and all faulty
test responses of the circuit serves as a merit functional for
the programming problem. A parameter vector completely
describing the test stimulus is used as the optimization
vector. The gradient of the merit functional required for the
optimization is computed using time-domain sensitivities.
Since in this approach the evaluation of the fault detection
criterion represented by the merit functional flows directly
into the computation of the test stimulus, optimal test stimuli
for hard to detect faults can be generated. If more than one
input terminal is used for testing, several test stimuli can be
generated simultaneously.
Organizer: D. Davis, Actel, USA
Moderator: R. Wilson, EETimes, USA
Panellists: T. Kambe, Sharp, JP; B. Gupta, STMicroelectronics, USA;
C. Balough, Triscend, USA; Y. Tanurhan, Actel, USA
-
Managing the SoC Design Challenge with "Soft" Hardware [p. 610]
-
R. Wilson
Panel members will discuss, from their individual
perspectives, why embedded reconfigurability has
become critical to the future success of systems-on-a-chip
and how they are attempting to implement solutions.
The Opportunity: Implementing reconfigurable logic
within SoCs will also help to expand and differentiate
members of product families as well as extend product
lifecycles and reduce design and test cycles, thus
shortening product time to market. Having
reconfigurability in system-on-a-chip silicon will increase
design flexibility by allowing re-use of design elements to
create differentiated products. Changing or revising logic
elements on the fly via reconfigurability to meet changes
in standards or features or to fix design errors will help
avoid increasingly expensive NRE re-spins.
Moderators: J. Henkel, NEC, USA; R. Leupers, Dortmund U, D
-
Integrated Hardware-Software Co-Synthesis and High-Level Synthesis for Design
of Embedded Systems under Power and Latency Constraints [p. 612]
-
A. Doboli
This paper presents an integrated approach to hardware-software
co-synthesis and HLS for design of low-power embedded
systems. The main motivation for this work is that
fine trade-offs between latency and power can be explored at
the system level only with a detailed knowledge of used hardware
resources. The integrated method was realized as a simulated
annealing based solution-space exploration. Exploration
is guided by Performance Models that exactly capture
the relationship between performances (i.e., power consumption
and latency) and design decisions (i.e., binding and
scheduling). The proposed approach permits not only a more
accurate latency and power estimation but also the exposure
of RTL-level design decisions at the system level. As a result,
more effective power-latency trade-offs are possible during
co-synthesis as compared to traditional task-level methods.
-
Allocation and Scheduling of Conditional Task Graph in Hardware/Software
Co-Synthesis [p. 620]
-
Y. Xie and W. Wolf
This paper introduces an allocation and scheduling algorithm
that efficiently handles conditional execution in
multi-rate embedded systems. Control dependencies are introduced
into the task graph model. We propose a mutual
exclusion detection algorithm that helps the scheduling
algorithm to exploit resource sharing. Allocation
and scheduling are performed simultaneously to take advantage
of the resource sharing among those mutually exclusive
tasks. The algorithm is fast and efficient, and so is suitable
to be used in the inner loop of our hardware/software
co-synthesis framework which must call the scheduling routine
many times.
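
The role of mutual exclusion in scheduling can be sketched with a simplified representation: if each task carries a guard (the branch outcomes under which it executes), two tasks whose guards demand opposite outcomes of the same condition can never run in the same iteration and may therefore occupy the same resource at the same time. The guard dictionaries and the function names below are assumptions for illustration, not the detection algorithm of the paper.

```python
def mutually_exclusive(guard_a, guard_b):
    """Guards map a condition name to the boolean outcome a task requires.
    Two tasks are mutually exclusive if some condition must be True for
    one of them and False for the other."""
    return any(
        cond in guard_b and guard_b[cond] != value
        for cond, value in guard_a.items()
    )


def can_share_resource(task_a, task_b):
    """Each task is (start, end, guard). Sharing one resource is allowed
    when the time intervals do not overlap or the tasks never co-execute."""
    start_a, end_a, guard_a = task_a
    start_b, end_b, guard_b = task_b
    overlap = start_a < end_b and start_b < end_a
    return (not overlap) or mutually_exclusive(guard_a, guard_b)


# Hypothetical tasks on the two branches of condition "c1".
then_task = (0, 5, {"c1": True})
else_task = (2, 6, {"c1": False})
always_task = (3, 4, {})
```

Here `then_task` and `else_task` overlap in time but can still be bound to one processing element, while `always_task` conflicts with both of them.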
-
Code Placement in Hardware Software Co-Synthesis to Improve Performance and
Reduce Cost [p. 626]
-
S. Parameswaran
This paper introduces an algorithm for code placement in
cache, and maps it to memory using a second algorithm. The
target architecture is a multiprocessor system with 1st level
cache and a common main memory. These algorithms
guarantee that as many instruction codewords as possible of
the high priority tasks remain in cache all of the time so that
other tasks do not overwrite them. This method improves the
overall performance, and might result in cheaper systems if
more powerful processors are not needed. The memory
increase necessary to facilitate this scheme is on the order of
13%. The average percentage of highest priority tasks always
in memory can vary from 3% to 100% depending upon how
many tasks (and their sizes) are allocated to each processor.
-
System-On-A-Chip Processor Synchronization Support in Hardware [p. 633]
-
B. Saglam and V. Mooney III
For scalable shared-memory multiprocessor System-on-a-Chip
implementations, synchronization overhead
may cause catastrophic stalls in the system. Efficient
improvements in the synchronization overhead in terms of
latency, memory bandwidth, delay and scalability of the
system involve a solution in hardware rather than in
software. This paper presents a novel, efficient, small and
very simple hardware unit that brings significant
improvements in all of the above criteria: in an example,
we reduce the time spent on lock latency by a factor of 4.8 and
the worst-case execution of lock delay in a database
application by a factor of more than 450. Furthermore,
we developed a software architecture together with RTOS
support to leverage our hardware mechanism. The worst-case
simulation results of a client-server example on a
four-processor system showed that our mechanism
achieved an overall speedup of 27%.
Moderators: K. Buchenrieder, Infineon Technologies, D; H. Grünbaecher,
Carinthia Tech. Inst., Villach, A
-
A Decade of Reconfigurable Computing: A Visionary Retrospective [p. 642]
-
R. Hartenstein
The paper surveys a decade of R&D on coarse
grain reconfigurable hardware and related CAD, points out
why this emerging discipline is heading toward a dichotomy
of computing science, and advocates the introduction of a
new soft machine paradigm to replace CAD by compilation.
-
Hierarchical Memory Mapping during Synthesis in FPGA-Based Reconfigurable
Computers [p. 650]
-
I. Ouaiss and R. Vemuri
One step in the synthesis for FPGA-based Reconfigurable
Computers (RCs) involves mapping the design data
structures onto the physical memory banks available in the
hardware. The advent of Xilinx Virtex-style FPGAs and of
hierarchical memory schemes on reconfigurable boards introduced
an added complexity to this mapping. The new
RC boards offer a wealth of memory banks, many of them
on-chip (such as the BlockRAMs available in the Virtex architecture)
and many of them offering a variable number of
ports and several depth/width configurations. Along with
the external RAMs, a hierarchy of memories with varying
access performance is available in a reconfigurable computer.
It becomes critical to perform a good mapping to
achieve optimal design performance. This paper presents
an automatic memory mapping methodology which takes
into account: the number of words and word size of design
data segments and physical memory banks, number of
ports on the banks, access latency of the banks, proximity of
the banks to the processing unit, life cycle analysis of data
segments, and it also incorporates configuration selection
from the multiple configurations available in BlockRAMs of
Virtex series FPGAs. In the case of multiple processing elements
on board, the paper also provides a framework in
which the task of memory mapping interacts with spatial
partitioning to provide the best implementation.
-
Optimal FPGA Module Placement with Temporal Precedence Constraints [p. 658]
-
S. Fekete, E. Köhler, and J. Teich
We consider the optimal placement of hardware modules
in space and time for FPGA architectures with reconfiguration
capabilities, where modules are modeled as
three-dimensional boxes in space and time. Using a graph-theoretic
characterization of feasible packings, we are able
to solve the following problems:
(a) Find the minimal execution time of the given problem
on an FPGA of fixed size,
(b) Find the FPGA of minimal size to accomplish the tasks
within a fixed time limit.
Furthermore, our approach is perfectly suited for the treatment
of precedence constraints for the sequence of tasks,
which are present in virtually all practical instances. Additional
mathematical structures are developed that lead to a
powerful framework for computing optimal solutions. The
usefulness is illustrated by computational results.
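
The space-time box model can be made concrete with a small feasibility check: a placement is valid if every module stays inside the chip area and the time limit, no two modules overlap in x, y and time simultaneously, and every precedence constraint is respected. The sketch below only verifies a given placement; it does not perform the graph-theoretic search for optimal packings described in the abstract, and the tuple layout and function names are assumptions.

```python
from itertools import combinations


def intervals_overlap(start_a, len_a, start_b, len_b):
    """True if the half-open intervals [start, start + len) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def placement_feasible(placements, chip_w, chip_h, time_limit, precedences):
    """placements maps a module name to (x, y, t, width, height, duration).
    precedences is a list of (before, after) module-name pairs."""
    # 1. Every box must lie inside the chip and finish within the time limit.
    for x, y, t, w, h, d in placements.values():
        if x < 0 or y < 0 or t < 0:
            return False
        if x + w > chip_w or y + h > chip_h or t + d > time_limit:
            return False
    # 2. Two boxes collide only if they overlap in all three dimensions.
    for a, b in combinations(placements.values(), 2):
        if (intervals_overlap(a[0], a[3], b[0], b[3])
                and intervals_overlap(a[1], a[4], b[1], b[4])
                and intervals_overlap(a[2], a[5], b[2], b[5])):
            return False
    # 3. A predecessor must finish before its successor starts.
    for before, after in precedences:
        first, second = placements[before], placements[after]
        if first[2] + first[5] > second[2]:
            return False
    return True


# Hypothetical example: two 2x2 modules reuse the same 2x2 chip area
# one after the other.
sequential = {"A": (0, 0, 0, 2, 2, 1), "B": (0, 0, 1, 2, 2, 1)}
```

Both optimization problems in the abstract can be seen as searching over such placements: problem (a) fixes the chip size and minimizes the time limit, while problem (b) fixes the time limit and minimizes the chip size.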
Moderators: P. Marwedel, Dortmund U, D; Z. Peng, Linkoping U, SE
-
Generation of Minimal Size Code for Schedule Graphs [p. 668]
-
C. Passerone, Y. Watanabe, and L. Lavagno
This paper proposes a procedure for minimizing the code
size of sequential programs for reactive systems. It identifies
repeated code segments (a generalization of basic blocks to directed
rooted trees) and finds a minimal covering of the input
control flow graphs with code segments. The segments are disjunct,
i.e. no two segments have the same code in common.
The program is minimal in the sense that the number of code
segments is minimum under the property of disjunction for the
given control flow specification.
The procedure makes no assumption on the target processor
architecture, and is meant to be used between task synthesis
algorithms from a concurrent specification and a standard
compiler for the target architecture. It is aimed at optimizing
the size of very large, automatically generated flat code,
and extends dramatically the scope of classical common sub-expression
identification techniques.
The potential effectiveness of the proposed approach is
demonstrated through preliminary experiments.
-
Generating Production Quality Software Development Tools Using a Machine
Description Language [p. 674]
-
A. Hoffmann, A. Nohl, S. Pees, G. Braun, and H. Meyr
This paper presents a methodology to automatically generate
production quality software development tools for
programmable architectures using the machine description
language LISA. Various architectures presenting diverse
architectural originalities will be presented and the feasibility
of automatically generating simulator, assembler, linker
and graphical debugger frontend will be discussed. The
presented approach is not limited to a fixed abstraction level
-- case studies of the Texas Instruments C62x and C54x, the
Analog Devices ADSP2101 as well as the ARM7 will show
the applicability of the methodology from cycle/phase to instruction
accurate models.
-
Automatic Generation and Targeting of Application Specific Operating Systems
and Embedded Systems Software [p. 679]
-
L. Gauthier, S. Yoo, and A. Jerraya
We propose a method of automatic generation of application
specific operating systems (OS's) and automatic targeting
of application software. OS generation starts from a
very small but yet flexible OS kernel. OS services, which are
specific to the application and deduced from dependencies
between services, are added to the kernel to construct the
whole OS. Communication and synchronization functions
in the application code are adapted to the generated OS. As
a preliminary experiment, we applied the proposed method
to an example system, a token ring system.
-
Cache Conscious Data Layout Organization for Embedded Multimedia Applications
[p. 686]
-
C. Kulkarni, C. Ghez, M. Miranda, F. Catthoor, and H. De Man
Cache misses form a major bottleneck for real-time multimedia applications
due to the off-chip accesses to the main memory. This results in both a
major access bandwidth overhead (and related power consumption) as well
as performance penalties. In this paper, we propose a new technique for
organizing data in the main memory for data dominated multimedia applications
so as to reduce the majority of the conflict cache misses. The focus of this
paper is on the formal and heuristic algorithms we use to steer the data
layout decisions and the experimental results obtained using a prototype
tool. Experiments on real-life demonstrators illustrate that we are
able to reduce up to 82% of the conflict misses for applications that
are already aggressively transformed at the source-level. At the same
time, we also reduce the off-chip data accesses by up to 78% and combined
with address optimizations we are able to reduce the execution time. Thus
our approach is complementary to the more conventional way of reducing misses
by reorganizing the execution order.
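To see why data layout alone can remove conflict misses, consider a toy direct-mapped cache model. This is a minimal sketch with assumed parameters (a 1 KB cache, 16-byte lines, two 1 KB arrays read in lockstep); the function names are hypothetical and the model is not the formal or heuristic algorithm of the paper.

```python
def count_misses(addresses, cache_bytes, line_bytes):
    """Count misses of a direct-mapped cache for a sequence of byte addresses."""
    n_lines = cache_bytes // line_bytes
    resident = [None] * n_lines          # memory block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % n_lines
        if resident[index] != block:
            resident[index] = block
            misses += 1
    return misses


def interleaved(base_a, base_b, n, elem_bytes):
    """Address trace of a loop that reads a[i] and then b[i] for i = 0..n-1."""
    for i in range(n):
        yield base_a + i * elem_bytes
        yield base_b + i * elem_bytes


# Arrays placed exactly one cache size apart: a[i] and b[i] evict each other.
conflicting = count_misses(interleaved(0, 1024, 256, 4), 1024, 16)
# Same arrays, with b shifted by one cache line of padding.
padded = count_misses(interleaved(0, 1024 + 16, 256, 4), 1024, 16)
```

In this model the unpadded layout misses on every access, while the padded layout leaves only the compulsory misses (one per cache line touched).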
Organizer and Moderator: G. Gielen, KU Leuven, B
Panellists: B. Sorensen, Atrium Design Solutions; H. Casier, Alcatel Microelectronics, B;
P. Magarshack, STMicroelectronics, F; J. Rodriguez, Anacad; J. Pollet, Dolphin, F
-
Design Challenges and Emerging EDA Solutions in Mixed-Signal IC Design [p. 694]
With increasing integration levels, more and more ICs
and systems-on-chip turn into mixed-signal designs.
Typical examples are telecom (Bluetooth, WLAN,
xDSL...) and multimedia (digital video, MP3 audio...)
systems. This hot topic session will explore the
challenges that designers face with these mixed-signal
designs, covering both technical and methodological
challenges as well as engineering resource and skill
shortage problems. On the technical side, basic challenges
are in incorporating analog design in a digital-oriented
system design flow, signal integrity problems (supply and
substrate noise, crosstalk...), trailing analog design
productivity and test. In addition, the session will discuss
the emerging progress in the methodology and EDA field,
ranging from new software startups to analog and mixed-signal
IP providers.
The session will start with a brief tutorial overview
about the problems and emerging solutions in the mixed-signal
domain, for the audience to get an update of the
current state of the art in mixed-signal. This will be
followed by a panel discussion, where the goal for the
audience is to really explore where the unaddressed
problems are in mixed-signal design and which problems
are today close to being solved commercially in this
dynamically moving market. Issues addressed by the
panel members include the integration of analog and
mixed-signal IP, the emergence of mixed-signal CAD
tools including behavioral modeling and simulation as
well as analog synthesis, the challenge of rapid
technology changes and analog design retargeting, the
mixed-signal signal integrity nightmare, the rise of
specialized mixed-signal design companies, single-chip
versus single-package integration, the trimming of analog
courses in many recently restructured EE curricula and
the shortage of analog designers.
Organizers/Moderators: W. Rosenstiel, FZI/Tübingen U, D; Y. Nakamura,
Kyoto U, JP
Speakers: H. Tago, System LSI R&D Center, Toshiba Semiconductor Company;
A. Mandapati, ATI Research Inc (Subsidiary of Nintendo in the US);
S. Narita, Advanced Microcomputer Business Operation, System LSI Business Division, Hitachi Ltd.
-
CPU for PlayStation®2 [p. 696]
-
H. Tago, K. Hashimoto, N. Ikumi, M. Nagamatsu, M. Suzuoki, and Y. Yamamoto
Processors designed for computer entertainment must
perform 3D graphics calculations, especially geometry
and perspective transformations. In the PlayStation®2, we
introduced the new idea of synthesizing emotion called
Emotion Synthesis and devised a new processor
architecture to support its graphics demands. The
architecture is embodied in the PlayStation®2's "Emotion
Engine" CPU, which uses vector units (VUs) as the key
units for floating-point calculations. Emotion synthesis
means the real-time synthesis of a computer graphics
animation scene that projects a great deal of atmosphere.
For example, when a female character walks into a video
game scene, her motion must be determined by solving
physical equations in response to interactive events
instead of replaying prerecorded data. Moreover,
differential equations with a large number of variables
must be used to describe, for example,
the waving motions of her hair in a breeze. For
authenticity in emotion synthesis, the CPU must execute
these calculations in real time. "Emotion Engine" ("EE")
is a system LSI including a 300MHz 128-bit 2-way
superscalar RISC core, two Vector Units ("VU"s), Image
Processing Unit ("IPU") for MPEG-2 stream decode, a
10-channel memory access (DMA) controller, two
channel Rambus® memory controller (RAC) and other
peripheral modules. 13.5M transistors are integrated on
15.02mm x 15.04mm die with 0.25um device technology
with 0.18um gate length. Design strategy and LSI design
methodologies and CAD for "Emotion Engine" LSI are
presented with emphasis on practical aspects of
verification and timing closure. A combination of
simulation, emulation and formal verification ensured the
functional first silicon for system evaluation. In order to
control wire delay in early design stage, floor-plan based
synthesis and wire load estimation are adopted for quick
timing closure.
-
Implementation of the ATI Flipper Chip [p. 697]
-
A. Mandapati
The Nintendo GameCube(tm) video game console
system is designed to outpace all other such systems
when released. Formerly known by the codename
Dolphin, this system includes an IBM PowerPC(tm)
processor and specialized hardware from ATI. This
specialized hardware is embodied in ATI's Flipper chip,
the centerpiece in the Dolphin design. Flipper functions
as the graphics processor, audio processor, host
controller, memory controller, and I/O processor of the
Dolphin system. Such a complex chip requires a very
robust design flow to get to functioning silicon in as little
time as possible. Here we will describe that design flow,
developed by ATI engineers to implement the Flipper
design. The goal was to develop a flow to implement the
best gaming hardware on a chip that needed to be as cost-effective
as possible. There were many challenges the
design offered, requiring optimal use of a small design
team with a minimal budget to achieve aggressive
schedules. The biggest challenge the team was presented
was that of area. With high volumes, chips for consumer
devices can benefit greatly from smaller die sizes, due in
part to higher yields and also in part to lower power and
cheaper packages. Another daunting challenge the design
offered was that of the use of embedded DRAM. The
Dolphin architecture called for the use of an embedded
frame buffer and texture memory buffer for fast access.
-
SH-4 RISC Microprocessor for Multimedia, Game Machine [p. 699]
-
S. Narita
The SH-4 is a 2-issue superscalar 32-bit RISC
microprocessor for SEGA's game machine, Dreamcast.
In order to extend the floating-point performance, a
graphic FPU and graphic-oriented instructions are
provided. The performance is 360 VAX MIPS, 6.0M
Polygons/sec, 1.4G FLOPS (peak with the new
instructions) at 200MHz.
Moderators: A. Oliveira, IST/INESC, PT; E. Macii, Politecnico di Torino, IT
-
Streaming BDD Manipulation for Large-Scale Combinatorial Problems [p. 702]
-
S. Minato and S. Ishihara
We propose a new BDD manipulation method that never
causes memory overflow or swap out. In our method,
BDD data are accessed through the I/O stream ports.
We can read BDD data streams of unlimited length using
a limited amount of memory, and the resulting BDD
data streams are produced concurrently. Our streaming
method has three features: (1) it gives a continuous trade-off
between the memory usage and the streaming data length,
(2) a valid partial result can be obtained before the process
completes, and (3) it is easily accelerated by pipelined multiprocessing.
Experimental results show that our new method is
especially useful for the cases where conventional BDD
packages are ineffective. For example, we succeeded in
finding a number of solutions to a SAT problem using a
commodity PC with a 64 MB memory, where the conventional
method will require a 100 memory to compute it.
BDD manipulation has been considered as an intensively
memory-consuming procedure, but now we can also
utilize the hard disk and network resources as well. Our
method will lead to a new style of BDD applications.
-
Binary Decision Diagram with Minimum Expected Path Length [p. 708]
-
Y. Liu, K. Wang, T. Hwang, and C. Liu
We present methods to generate a Binary Decision Diagram
(BDD) with minimum expected path length. A BDD
is a generic data structure which is widely used in several
fields. One important application is the representation of
Boolean functions. A BDD representation enables us to
evaluate a Boolean function: Simply traverse the BDD from
the root node to the terminal node and retrieve the value in
the terminal node. A BDD with minimum expected path
length therefore also minimizes the evaluation time for the
corresponding Boolean function. Three efficient algorithms
for constructing BDDs with minimum expected path length
are proposed.
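The quantity being minimized can be made concrete with a short sketch. It assumes independent input variables with known probabilities of being 1, and a hypothetical tuple encoding of BDD nodes; it only evaluates the expected path length of a given diagram and is not one of the three construction algorithms of the paper.

```python
from fractions import Fraction


def expected_path_length(node, p_one):
    """Expected number of decision nodes visited on the way from `node` to a terminal.

    A node is either a terminal (bool) or a tuple (var, low_child, high_child).
    p_one maps each variable to the probability that it is 1 (assumed independent).
    """
    if isinstance(node, bool):
        return Fraction(0)
    var, low, high = node
    p = Fraction(p_one[var])
    return (1
            + (1 - p) * expected_path_length(low, p_one)
            + p * expected_path_length(high, p_one))


# f = a AND (b OR c) under two variable orders.
c_node = ("c", False, True)
order_abc = ("a", False, ("b", c_node, True))

a_node = ("a", False, True)
order_cba = ("c", ("b", False, a_node), a_node)

uniform = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(1, 2)}
epl_abc = expected_path_length(order_abc, uniform)   # 7/4
epl_cba = expected_path_length(order_cba, uniform)   # 9/4
```

Both diagrams represent the same function, yet testing `a` first gives the shorter expected evaluation under uniform inputs, which is the effect a minimum expected path length ordering exploits.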
-
Spectral Decision Diagrams Using Graph Transformations [p. 713]
-
M. Thornton and R. Drechsler
Spectral techniques are powerful methods for synthesis
and verification of digital circuits. The advances in DD representations
for discrete valued functions in terms of computational
efficiency can be exploited in the calculation of
the spectra of Boolean functions. The classical approach in
computing the spectrum of a function by taking advantage
of factored transformation matrices as used in the "Fast
Fourier Transform" may be reformulated in terms of DD
based graph algorithms resulting in a complete representation
of the spectrum. The relationship between DD based
interpretations and the linear algebra based definitions of
spectral methods is described.
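For reference, the classical factored-matrix ("fast transform") computation mentioned above can be sketched for the Walsh-Hadamard spectrum of a Boolean function. This sketch works on an explicit truth table, assumes the common encoding of logic 0 as +1 and logic 1 as -1, and uses a hypothetical function name; the DD-based graph formulation of the paper is not shown.

```python
def walsh_spectrum(truth_table):
    """Walsh-Hadamard spectrum of a Boolean function given as a truth table of 0/1 values.

    Uses the in-place butterfly that follows from factoring the 2^n x 2^n
    Hadamard matrix into n sparse stages (O(n * 2^n) additions).
    """
    n = len(truth_table)
    if n == 0 or n & (n - 1):
        raise ValueError("truth table length must be a power of two")
    s = [1 - 2 * bit for bit in truth_table]   # encode 0 -> +1, 1 -> -1
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = s[i], s[i + h]
                s[i], s[i + h] = a + b, a - b
        h *= 2
    return s


xor_spectrum = walsh_spectrum([0, 1, 1, 0])   # [0, 0, 0, 4]
and_spectrum = walsh_spectrum([0, 0, 0, 1])   # [2, 2, 2, -2]
```

The single nonzero coefficient for XOR reflects that the function coincides with one Walsh basis function.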
Moderator: A. Jerraya, TIMA, Grenoble, F
Speaker: G. Matheron, MEDEA Office Director, Paris, F
-
Electronic System Design Methodology: Europe's Positioning [p. 720]
The engine that drives all the ICT industries is
microelectronics. By 2015, according to Mark Pinto
of Bell Labs, the microelectronics industry "will be
manufacturing 10 million silicon transistors per
human being per day ... and the applications will
exist to consume them".
Microelectronics, through its dramatic increase in
performance, is the enabler of this revolution. Soon
entire products -- such as mobile telephones,
computers and camcorders -- will be based on single
silicon chips, reducing product cost and price,
opening new markets and boosting manufacturing.
Microelectronic chips, together with embedded
software, drive the entire ICT industry, by doubling
performance and halving cost every 18 months,
allowing continuous innovation in products such as
mobile phones and smart cards and in services like
the Internet and e-commerce. The chips generate new
products used by professionals and laymen: 60% of
today's electronics applications have been made
possible solely by the technical progress of
microelectronics.
Gérard Matheron will describe how the evolution of
electronic system design is changing the world.
Moderators: R. Lauwereins, KU Leuven, B; R. Hartenstein, Kaiserslautern U, D
-
Precision and Error Analysis of MATLAB Applications during Automated
Hardware Synthesis for FPGAs [p. 722]
-
A. Nayak, M. Haldar, A. Choudhary, and P. Banerjee
We present a compiler that takes high level signal and image processing
algorithms described in MATLAB and generates optimized hardware for
an FPGA with external memory. We propose a precision analysis algorithm
to determine the minimum number of bits required by an integer variable
and a combined precision and error analysis algorithm to infer the minimum
number of bits required by a floating point variable. Our results show
that on average, our algorithms generate hardware requiring a factor of
5 less FPGA resources in terms of the Configurable Logic Blocks (CLBs)
consumed as compared to the hardware generated without these optimizations.
We show that our analysis results in the reduction in the size of lookup
tables for functions like sin, cos, sqrt, exp etc. Our precision analysis also
enables us to pack various array elements into a single memory location to
reduce the number of external memory accesses. We show that such a technique
improves the performance of the generated hardware by an average
of 35%.
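The integer part of such a precision analysis can be pictured as value-range propagation followed by a bit-width computation. The sketch below is an illustration under assumptions (interval arithmetic on integer ranges, two's complement for signed values, hypothetical function names); it is not the compiler's actual algorithm and does not cover the floating-point error analysis.

```python
def min_bits(lo, hi):
    """Smallest width that holds every integer in [lo, hi].

    Unsigned when lo >= 0, two's complement otherwise.
    """
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())
    # Need -(2**(n-1)) <= lo and hi <= 2**(n-1) - 1.
    return max((-lo - 1).bit_length(), max(hi, 0).bit_length()) + 1


def add_range(a, b):
    """Value range of x + y for x in a and y in b."""
    return (a[0] + b[0], a[1] + b[1])


def mul_range(a, b):
    """Value range of x * y for x in a and y in b."""
    corners = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(corners), max(corners))


# A 9-tap filter over 8-bit pixels with coefficients in [-3, 3].
pixel, coeff = (0, 255), (-3, 3)
acc = (0, 0)
for _ in range(9):
    acc = add_range(acc, mul_range(pixel, coeff))
acc_bits = min_bits(*acc)   # 14 bits instead of a default 32-bit integer
```

Narrower variables of this kind are also what makes it possible to pack several array elements into one memory word.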
-
A HW/SW Partitioning Algorithm for Dynamically Reconfigurable Architectures [p. 729]
-
J. Noguera and R. Badia
"System-On-Chip" has become a reality, and recently
new reconfigurable devices have appeared. However, few
efforts have been carried out in order to define HW/SW
codesign methodologies and algorithms which address the
challenges presented by new reconfigurable devices.
In this paper we address this open problem and present
a novel HW/SW partitioning algorithm for dynamically
reconfigurable architectures. The algorithm is a
constructive algorithm, which obtains an initial solution
and afterwards tries to optimize it. The HW/SW
partitioning is done taking into account the features of the
dynamically reconfigurable devices, and its final goal is
to minimize the reconfiguration latency.
The partitioning algorithm has been implemented and
integrated into our developed codesign environment,
where several experiments have been carried out. The
results obtained demonstrate the benefits of the algorithm.
-
Managing Dynamic Reconfiguration Overhead in Systems-On-A-Chip Design Using
Reconfigurable Datapaths and Optimized Interconnection Networks [p. 735]
-
Z. Huang and S. Malik
This research examines the role of dynamically
reconfigurable logic in systems-on-a-chip (SOC) design.
Specifically we study the overhead of storing and
downloading the configuration code bits for different
parts of an application in a dynamically reconfigurable
coprocessor environment. For SOC designs the different
configuration bit-streams will likely need to be stored on
chip, thus it becomes crucial to reduce the storage
overhead. In addition, reducing the reconfiguration time
overhead is crucial in realizing performance benefits.
This study provides insight into the granularity of the
reconfigurable logic that is appropriate for the SOC
context. Our initial study is in the domain of multimedia
and communication systems. We first present profiling
results for these using the MESCAL compiler
infrastructure. These results are used to derive an
architecture template that consists of dynamically
reconfigurable datapaths using coarse grain logic blocks
and a reconfigurable interconnection network. We justify
this template based on the constraints of SOC design. We
then describe a design flow where we start from an
application, derive the kernel loops via profiling and then
map the application using the dynamically reconfigurable
datapath and the simplest interconnection network. As
part of this flow we have developed a mapping algorithm
that minimizes the size of the interconnection network,
and thus the overhead of reconfiguration, which is key for
systems-on-a-chip. We provide some initial results that
validate our approach.
Moderators: P. Schwarz, FhG IIS/EAS Dresden, D; M. Rencz, TU Budapest, H
-
Simulation-Guided Property Checking Based on a Multi-Valued AR-Automata [p. 742]
-
J. Ruf, D. Hoffmann, T. Kropf, and W. Rosenstiel
The verification of digital designs, i.e., hardware or embedded
hardware/software systems, is an important task in the
design process. Often more than 70% of the development
time is spent for locating and correcting errors in the design.
Therefore, many techniques have been proposed to
support the debugging process. Recently, simulation and
test methods have been accompanied by formal methods
such as equivalence checking and property checking. However,
their industrial applicability is currently restricted to
small or medium sized designs or to a specific phase in
the design cycle. In this paper, we present a method for
verifying temporal properties of systems described in an
executable description language. Our method allows the
user to specify properties about the system in finite linear
time temporal logic (FLTL). These properties are translated
to a special kind of finite state machines which are
then efficiently checked on-the-fly during each simulation
run. Properties may be placed anywhere in the system description
and violations are immediately indicated to the designer.
-
Performance Improvement of Multi-Processor Systems Cosimulation
Based on SW Analysis [p. 749]
-
J. Jung, S. Yoo and K. Choi
In this paper, we propose a method for performance improvement
of multi-processor systems cosimulation by reducing synchronization
overhead between multiple simulators. To reduce the amount
of simulator synchronization, we predict synchronization time points
based on a static analysis of application software running on each
processor. In the experiments with real embedded systems, we obtained
up to orders of magnitude higher performance in cosimulation
runtimes.
-
Mixed-Level Cosimulation for Fine Gradual Refinement of Communication in SoC
Design [p. 754]
-
G. Nicolescu, S. Yoo, and A. Jerraya
In this paper, we propose a method of mixed-level cosimulation
that enables gradual refinement of SoC communication
from protocol-neutral communication to protocol-fixed
communication. For fine granularity in refinement,
the method enables the designer to perform channel refinement
and module refinement. Thus, the designer can perform
more extensive design space exploration in communication
refinement. We show the effectiveness of the proposed
method in a case study of communication refinement
in an IS-95 CDMA cellular phone system design.
-
A Framework for Fast Hardware-Software Co-Simulation [p. 760]
-
A. Hoffmann, T. Kogel, and H. Meyr
We present a new hardware-software co-simulation
framework enabling fast prototyping in system-on-chip designs.
On the software side, the machine description
language LISA allows the generation of bit-true models
of programmable architectures on various levels -- from
instruction-set to phase accuracy. Based on these models,
a complete tool-suite consisting of fast compiled processor
simulator, assembler, linker, HLL-compiler as well as co-simulation
interface can be generated automatically. On
the hardware side, the SystemC simulation class library is
employed and enhanced with our generic co-simulation interface
that enables the coupling of hardware and software
models specified at various levels of abstraction. Besides
that, a hardware modeling strategy using abstract macro-cycle
based C++ processes to increase hardware modeling
efficiency and simulation speed is presented.
Moderators: J. Vital, IST, PT; A. Rueda, CNM, Seville U, ES; A. Vasquez, CNM,
Seville U, ES
-
Analog/Mixed-Signal IP Modeling for Design Reuse [p. 766]
-
N. Madrid, E. Peralías, A. Acosta, and A. Rueda
The application of design reuse to analog and
mixed-signal components for System-on-Chip
(SoC) is an emerging and revolutionary field.
This paper presents a methodological approach
to this area illustrated with a mixed-signal case
study.
-
A Skill(tm)-Based Library for Retargetable Embedded Analog Cores
[p. 768]
-
X. Jingnan, J. Vital, and N. Horta
This paper describes the automatic generation and re-usability
of physical layouts of analog and mixed-signal blocks
based on high-functionality pCells that are fully independent of
technologies. The high-functionality pCell library presently
contains over 42 pCells and is fully compliant with 7 different
sets of technology design rules from 5 different foundries.
Practical examples employed in industrial projects are illustrated.
-
Modelling SoC Devices for Virtual Test Using VHDL [p. 770]
-
M. Rona and G. Krampl
Virtual Test (VT) is a new technique to cut the time-to-market
especially for SoC products that inherently
contain complex mixed-signal blocks. VT allows
debugging test programs in a simulation environment if a
fast and sufficiently accurate IC model can be made
available. VHDL behavioural models turned out to be a
very promising approach to cover both the needs of
designers for the sign-off simulation on chip level and of
test engineers for VT. The trade-offs between modelling
effort, simulation performance and accuracy of results
will be discussed for VT applications based on an
industrial example.
-
Retargeting of Mixed-Signal Blocks for SoCs [p. 772]
-
R. Castro-López, F. Fernández, M. Delgado-Restituto, and A.
Rodríguez-Vázquez
This paper introduces a very efficient methodology for
retargetability of embedded mixed-signal blocks for
SoCs. The key parts of this methodology are: parameterised
layout templates at different hierarchical levels,
accurate behavioral modeling of mixed-signal blocks
and appropriate mechanisms for tuning sized circuits to
new sets of specs.
Organizer: C. Yeung, VSI Alliance, USA
Moderator: P. Clarke, Electronic Engineering Times, UK
Panellists: A. Haverinen, Nokia, FIN; USA; G. Matthews, STMicroelectronics, F;
J. Morris, ARM, UK, and J. Zaidi, Palmchip Corp., USA
-
Standard Bus vs. Bus Wrapper: What is the Best Solution for Future SoC Integration? [p. 776]
A number of companies have promoted their on-chip
busses as potential standards for the SoC industry.
VSIA's On-Chip Bus Development Working Group
chooses to develop a Standard Bus Wrapper (VCI) as
opposed to endorsing a single bus as the standard.
Standard Bus advocates claim Wrappers incur
performance and area overhead. Bus Wrapper advocates
claim no single On-Chip Bus will meet the needs of all
SoCs. Will a single bus emerge, and if not where should a
standard wrapper be used? Which is the correct approach
for future SoC Integration? This panel will include
experts from both of these perspectives, to discuss the
pros and cons of their positions.
Moderators: A. Brown, Southampton U, UK; P. Eles, Linkoping U, SE
-
Access Pattern Based Local Memory Customization for Low Power Embedded Systems
[p. 778]
-
P. Grun, N. Dutt, and A. Nicolau
Memory accesses represent a major bottleneck in embedded
systems power and performance. Traditionally, the local
memory relied on a large cache to store all the variables in
the application. However, especially in large real-life applications,
different types of data exhibit divergent types of locality
and access patterns, with diverse locality and bandwidth
needs. Traditional caches had to compromise between the different
types of locality required by the access patterns, and
trade-off performance against bandwidth requirement. Instead,
our approach customizes the local memory architecture
matching the diverse access patterns and locality types
present in the application, to reduce the main memory bandwidth
requirement, and significantly improve power consumption,
without sacrificing performance. Our approach generated
an average 30% memory power reduction without degrading
performance on a set of large multimedia/general
purpose applications and scientific kernels, over the best traditional
cache configuration of similar size, demonstrating the
utility of our algorithm.
-
Static Memory Allocation by Pointer Analysis and Coloring [p. 785]
-
J. Zhu
Modern systems-on-chips often allocate more silicon
real-estate on memory than logic. The minimization of on-chip
memory becomes increasingly important for the reduction
of manufacturing cost. In this paper, we present a new
technique that minimizes memory usage. Incorporated in a
behavioral synthesis tool that synthesizes general-purpose
C programs, this technique is fully automated and does not
rely on users to explicitly specify dataflow information. Experimental
results show that significant improvements can
be achieved for the benchmark set.
-
Heuristic Datapath Allocation for Multiple Wordlength Systems [p. 791]
-
G. Constantinides, P. Cheung, and W. Luk
This paper introduces a heuristic to solve the combined
scheduling, resource binding, and wordlength selection
problem for multiple wordlength systems. The algorithm involves
an iterative refinement of operator wordlength information,
leading to a scheduled and bound data-flow graph.
Scheduling is performed with incomplete wordlength information
during the intermediate stages of this refinement
process. Results show significant area savings over known
alternative approaches.
-
On the Verification of Synthesized Designs Using Automatically Generated
Transformational Witnesses [p. 798]
-
E. Teica, R. Radhakrishnan, and R. Vemuri
This poster presents a new methodology for verifying the
synthesized designs, and for debugging the software implementation
of high-level synthesis algorithms. The methodology
is based on a set of 7 RTL transformations which are
able to emulate the effect of many scheduling and resource
allocation algorithms.
-
Property-Specific Witness Graph Generation for Guided Simulation [p. 799]
-
A. Casavant, A. Gupta, S. Liu, A. Mukaiyama, K. Wakabayashi, and P. Ashar
A practical solution to the complexity of design validation is
semi-formal verification, where the specification of correctness
criteria is done formally, as in model checking, but checking is
done using simulation, which is guided by directed vector
sequences derived from knowledge of the design and/or the
property being checked. Simulation vectors must be effective in
targeting the types of bugs designers expect to find rather than
some generic coverage metrics. The focus of our work is to
generate property-specific testbenches for guided simulation,
that are targeted either at proving the correctness of a full CTL
property or at finding a bug. This is facilitated by generation of a
property-specific model, called a "Witness Graph", which
captures interesting paths in the design. Starting from an initial
abstract model of the design, symbolic model checking, pruning,
and refinement steps are applied in an iterative manner, until
either a conclusive result is obtained or computing resources are
exhausted. The witness graph is annotated with, e.g., state or
transition priorities before testbench generation. The overall
testbench generation flow, and the iterative flow for witness
graph generation are shown in Figures 1 and 2.
-
Two Approaches for Developing Generic Components in VHDL [p. 800]
-
V. Stuikys, G. Ziberkas, R. Damasevicius, and G. Majauskas
We consider the one- and two-language approaches (1LA &
2LA) for developing generic components (GCs) for VHDL
generators. By 1LA & 2LA we mean a generalization using
"pure" VHDL, or using the VHDL abstractions mixed with Open
PROMOL, the external scripting language we have developed
for building GCs and generators, respectively. We present the
evaluation of both approaches.
-
Annotated Data Types for Addressed Token Passing Networks [p. 801]
-
G. Cichon and W. Bunnbauer
Annotated data
types have proven to be a practical description form for interfaces
of SoC components to random addressable buses
(see [1]). The central idea behind this new approach is to
define a component's interface to a random addressable bus
in terms of a data structure it exposes to this bus. This data
structure is modeled using a type system similar to that of
computer languages, like C or VHDL. This data structure
is annotated with additional information relevant for hardware
description purposes. In [1], the underlying terminology
and framework, as well as a method for synthesizing the
functional adaptor part of hardware components, has been
described.
-
Testability Trade-Offs for BIST RTL Data Paths: The Case for Three Dimensional
Design Space [p. 802]
-
N. Nicolici and B. Al-Hashimi
Power dissipation during test application is an emerging
problem due to yield and reliability concerns. This paper
focuses on BIST for RTL data paths and discusses testability
trade-offs in terms of test application time, BIST area
overhead and power dissipation.
-
Towards a Better Understanding of Failure Modes and Test Requirements of ADCs
[p. 803]
-
A. Lechner, A. Richardson, and B. Hermes
It is now widely recognised that Built-In Self-Test (BIST)
techniques and Design-for-Testability (DfT) will be
mandatory to meet test and quality specifications in next
generation mixed signal ICs [1]. For evaluating,
verifying, and comparing testability improvements, a
more detailed understanding of circuit specific failure
modes is essential. This paper presents fault simulation
results for a 6-bit ADC and identifies typical failure
modes the converter is likely to exhibit and hence must be
tested for.
-
Exact Fault Simulation for Systems on Silicon that Protects Each Core's
Intellectual Property (IP) [p. 804]
-
M. Quasem and S. Gupta
We present a fault simulation approach for multi-core
systems on silicon (SOC) (a) that provides exact fault
coverage for the entire SOC, (b) does so without revealing
any intellectual property (IP) of core vendors, and (c)
whose run time is comparable to that required by the
existing approaches that require all IP to be revealed.
This fault simulator assumes a full scan SOC design and is
first in a suite of simulation, test generation, and DFT
tools that are currently under development. The proposed
approach allows flexibility in selection of a test
methodology for SOC, reduces test application cost and
area and performance overheads, and allows more
comprehensive testing.
-
Using Mission Logic for Embedded Testing [p. 805]
-
R. Dorsch and H. Wunderlich
Testing logic cores of a system-on-a-chip causes a high
test data volume, which has to be stored on the external automatic
test equipment (ATE), and a high bandwidth requirement
between ATE and the chip under test, implying the need
for high-speed ATE. This paper reduces these requirements
by reusing embedded cores during test mode as embedded
testers. Hard, firm, and soft cores may be reused, since only
the functionality of the core in system mode is used.
-
A Regularity-Based Hierarchical Symbolic Analysis Method for Large-Scale Analog Networks [p. 806]
-
A. Doboli and R. Vemuri
The main challenge for any symbolic analysis method is
the exponential size of the produced symbolic expressions
[2] (10^11 terms for an op amp [1]). Current research considers
two ways of handling this limitation: approximation
of symbolic expressions and hierarchical methods. Approximation
methods [2] retain only the significant terms of the
symbolic expressions and eliminate the insignificant ones.
The difficulty, however, lies in identifying what terms to
eliminate and what the resulting approximation error could
be. Hierarchical methods [1] tackle the symbolic analysis
problem in a divide-and-conquer manner. They consider
only one part of the global network at a time and then recombine
partial expressions for finding overall symbolic formulas.
Existing hierarchical methods have a main limitation in
that they are not feasible for addressing networks that are
built of tightly coupled blocks, e.g. operational amplifiers [2].
-
An Improved Hierarchical Classification Algorithm for Structural Analysis of Integrated Circuits [p. 807]
-
M. Olbrich, A. Rein, and E. Barke
A new and efficient combination of signal tracing
and block recognition techniques for circuit analysis is
proposed. It utilizes the benefits of both approaches to
solve problems such as signal flow or gate recognition.
The analysis process is easily controlled by a user definable
rule set where ports, nets and blocks are attributed
with types. After structural investigation a hierarchical
netlist is produced providing block information
as subcircuits. As an important feature, the algorithm
allows the handling of optional ports as well. Thus, this
flexible approach is applicable to various circuit types
and works on several abstraction levels.
-
Automatic Nonlinear Memory Power Modelling [p. 808]
-
E. Schmidt, G. Jochens, L. Kruse, F. Theeuwen, and W. Nebel
Power estimation and optimization is an increasingly
important issue in IC design. The memory subsystem is a
significant aspect, since memory power can dominate total
system power. Estimation and optimization hence rely
heavily on models for embedded memories. We present an
effective black box modelling methodology for generating
nonlinear memory models automatically. The resulting
models are accurate, computationally modest, and in
analytical form. They outperform linear models by far.
Average absolute relative errors are below 6%.
-
An Operation Rearrangement Technique for Power Optimization in VLIW Instruction Fetch [p. 809]
-
D. Shin, J. Kim, and N. Chang
In VLIW machines where a single instruction contains
multiple operations, the power consumption during instruction
fetches varies significantly depending on how the operations
are arranged within the instruction. In this paper, we
describe a post-pass operation rearrangement method that
reduces the power consumption from the instruction-fetch
datapath. The proposed method modifies operation placement
orders within VLIW instructions so that the switching
activity between successive instruction fetches is minimized.
Our experiment shows that the switching activity can be reduced
by 34% on average for benchmark programs.
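
The rearrangement idea can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes, hypothetically, that each operation is a small integer encoding, that any operation may occupy any slot, and that all permutations of one instruction can be enumerated. Real VLIW machines restrict which slot an operation may use. Each instruction is reordered greedily so that the Hamming distance to the previously fetched instruction is minimal.

```python
from itertools import permutations

def hamming(a, b):
    """Number of differing bits between two operation encodings."""
    return bin(a ^ b).count("1")

def fetch_cost(prev, cur):
    """Bit toggles between two successive instruction words, slot by slot."""
    return sum(hamming(p, c) for p, c in zip(prev, cur))

def rearrange(program):
    """Greedy post-pass: permute the operations inside each instruction
    to minimize toggles relative to the previously placed instruction."""
    result = [tuple(program[0])]
    for instr in program[1:]:
        best = min(permutations(instr),
                   key=lambda p: fetch_cost(result[-1], p))
        result.append(best)
    return result

def total_switching(program):
    """Sum of toggles over all successive instruction pairs."""
    return sum(fetch_cost(a, b) for a, b in zip(program, program[1:]))

# Hypothetical three-slot program with made-up 8-bit encodings.
program = [
    (0x0F, 0xF0, 0x00),
    (0xF0, 0x00, 0x0F),
    (0x00, 0x0F, 0xF0),
]
optimized = rearrange(program)
print(total_switching(program), total_switching(optimized))
```

The greedy choice only looks one instruction back, so it is a heuristic; it does not guarantee the minimum total switching activity over the whole program.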
-
A Pseudo Delay-Insensitive Timing Model to Synthesizing Low-Power Asynchronous Circuits [p. 810]
-
O. Garnica, J. Lanchares, and R. Hermida
The aim of this paper is to present a new approach to
creating high performance, low-power and low-area asynchronous
circuits using high level design tools. In order to
achieve this, we introduce the new timing model on which
this approach is based. Following this, we present the
results from comparing, for a set of benchmarks, our implementation
with other implementations.
-
A Register-Transfer-Level Fault Simulator for Permanent and Transient Faults in
Embedded Processors [p. 811]
-
C. Rousselle, M. Pflanz, A. Behling, T. Mohaupt, and H. Vierhaus
HEARTLESS (Hierarchical Register-Transfer-Level
Fault-Simulator for Permanent & Transient Faults) was
developed to simulate the behavior of complex sequential
designs like processor cores in case of transient and
permanent faults. HEARTLESS can be enhanced by
propagation over macros described in a C++ function.
A C interface is available for access to internal signals
during the simulation.
-
Efficient Finite Field Digit-Serial Multiplier Architecture for
Cryptography Applications [p. 812]
-
G. Bertoni, L. Breveglieri, and P. Fragneto
Cryptographic applications in embedded systems for
smart-cards require low-latency, low-complexity and low
power dedicated hardware. In this work the GBB
algorithm for finite field multiplication is optimised by
recoding and the related digit-serial VLSI multiplier
architecture is designed and evaluated [6].
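
For readers unfamiliar with digit-serial multiplication, the following Python sketch shows the generic most-significant-digit-first scheme in GF(2^m). It does not reproduce the GBB algorithm or its recoding, which the abstract does not detail; the function names and the choice of field (the AES polynomial for m = 8) are assumptions made only for illustration.

```python
def clmul(a, b):
    """Carry-less (polynomial) multiplication over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def reduce_poly(p, f, m):
    """Reduce polynomial p modulo the degree-m polynomial f."""
    for i in range(p.bit_length() - 1, m - 1, -1):
        if (p >> i) & 1:
            p ^= f << (i - m)
    return p

def digit_serial_mul(a, b, f, m, d):
    """Multiply a*b in GF(2^m), consuming d bits of b per step,
    most significant digit first."""
    n_digits = -(-m // d)  # ceil(m / d)
    mask = (1 << d) - 1
    c = 0
    for k in reversed(range(n_digits)):
        digit = (b >> (k * d)) & mask
        # Shift the accumulator by one digit, add the partial product, reduce.
        c = reduce_poly((c << d) ^ clmul(a, digit), f, m)
    return c

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1
print(hex(digit_serial_mul(0x57, 0x83, AES_POLY, 8, 4)))
```

A larger digit size d means fewer steps per multiplication at the price of a wider partial-product stage, which is the latency versus complexity trade-off a digit-serial architecture exposes.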
-
Task Concurrency Management Methodology Summary [p. 813]
-
C. Wong, P. Marchal, P. Yang, F. Catthoor, H. De Man,
A. Prayati, N. Cossement, R. Lauwereins, and D. Verkest
This paper summarizes a new methodology for the design
of concurrent dynamic real-time embedded systems.
The framework of our methodology is depicted in Fig. 1.
An embedded system can be specified at a grey-box abstraction
level in a combined MTG-CDFG model [6]. The
grey-box model is different from both the detailed white-box
model [1] where all the operations are considered during the
mapping and where too much information is present to allow
a system wide exploration, and the black-box model
[2, 3] where insufficient information is available to accurately
steer even the most crucial cost trade-offs. In contrast,
the grey-box specification is functional in representing
the concepts of concurrency, timing constraints and interaction
at either an abstract or a more detailed level, depending
on what is required to perform a thorough exploration
of the decisions afterwards. We believe that task concurrency
management can be implemented in four major steps
[4]. Firstly, the grey-box model is built, including the necessary
concurrency extraction. Then transformations are applied
on the specified MTG-CDFG to increase the opportunities
for concurrency exploration and cost minimization
[5]. Then static scheduling will be applied on the design-time
analyzable parts of the grey-box model, including processor
assignment in the multiple processor context. Finally,
a dynamic scheduler will schedule the dynamic and
coarse-grain constructs at run time on the given platform
while making trade-offs based on Pareto curves.
The main driver for our work is the MPEG-4 IM1 player.
Experiment results confirm the validity of our assumptions
and the usefulness of our approach [4, 5].
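
The final step can be pictured with a small Python sketch. It is not taken from the paper: it assumes that each task has a set of design-time (execution time, energy) operating points, and that the run-time scheduler picks the lowest-energy Pareto point that fits a time budget. The function names and numbers are hypothetical.

```python
def pareto_front(points):
    """Keep the (time, energy) points not dominated by any other point."""
    front = []
    best_energy = float("inf")
    for time, energy in sorted(points):
        # Scanning by increasing time, a point survives only if it
        # needs strictly less energy than every faster point.
        if energy < best_energy:
            front.append((time, energy))
            best_energy = energy
    return front

def pick_point(front, budget):
    """Lowest-energy point whose time fits the budget, or None."""
    feasible = [p for p in front if p[0] <= budget]
    return min(feasible, key=lambda p: p[1]) if feasible else None

# Hypothetical operating points for one task: (time, energy).
points = [(10, 9.0), (12, 9.5), (15, 6.0), (20, 4.0), (22, 4.5)]
front = pareto_front(points)
print(front)
print(pick_point(front, 18))
```

Only the Pareto points need to be stored for run time, so the selection is a cheap lookup while the costly exploration stays at design time.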
-
Susceptibility of Analog Cells to Substrate Interference [p. 814]
-
F. Fiori
This paper deals with the susceptibility of smart power
integrated circuits to substrate interference. In particular,
propagation of RF interference through substrate and its effects
on analog cells are investigated. A new method, developed
to identify a parasitic substrate-coupling network in
VLSI devices, has been customized for a smart power technology
process. The layout view of a specific circuit is elaborated
in order to extract a netlist composed of circuits in
the die surface and the substrate parasitic network. Predictions
are obtained by executing time-domain simulations. A
simple test circuit composed of a power transistor and an
OTA is considered. Investigations are carried out for various
layouts of the same test circuit and the effectiveness of
shielding substrate contacts is evaluated.
-
Order Determination for Frequency Compensation of Negative-Feedback Systems [p. 815]
-
A. van Staveren and C. Verhoeven
To maximize the bandwidth of dedicated negative-feedback
amplifiers by passive frequency compensation, the
order of the amplifier needs to be known. Here a method is
introduced to determine the order of a circuit with negative
feedback. It is shown that the sum of poles in the negative-feedback
loop, i.e. the loop poles, can be used to determine
the order of the amplifier. These loop poles can be found
relatively easily from the circuit diagram and thus the order
of the circuit is also relatively easily found.
-
Minimizing the Number of Floating Bias Voltage Sources with Integer Linear
Programming [p. 816]
-
E. Yildiz, A. van Staveren, and C. Verhoeven
Applying the non-heuristic biasing theory as described
in [1] results in circuits which are optimally biased. However,
the resulting circuits will contain many floating voltage
sources. This one-page paper describes the use of Integer
Linear Programming to minimize the number of these
sources.
-
CMOS Sizing Rule for High Performance Long Interconnects [p. 817]
-
G. Cappuccino and G. Cocorullo
During the past fifteen years, interconnects
have become the determining factor of the overall
performance of VLSI circuits. In this work, the authors
present a new transistor sizing rule for long interconnect
buffers. It is shown how transmission line properties of long
interconnects alter the behaviour of the CMOS buffer,
forcing transistors to work mainly in linear mode rather
than in saturation as is usually assumed. This unusual
condition leads to strong mismatching between predicted
and actual driver output impedance if conventional sizing
rules are used. The proposed sizing rule allows true line
matching to be achieved, thus either minimizing delay or
preserving signal integrity.
-
On Automatic Analysis of Geometrically Proximate Nets in VLSI Layout [p. 818]
-
S. Koranne and O. Gangwal
We address the problem of automatic analysis of geometrically proximate
nets in VLSI layout by presenting a framework (named FASCL) which supports
pairwise analysis of nets based on a geometric kernel. The exact form of
the analysis function can be specified to the kernel, which assumes a coupling
function based on pairwise interaction between geometrically proximate
nets. The user can also attach these functions to conditions and FASCL
will automatically apply the function to all pairs of nets which satisfy
a condition. Our method runs with sub-quadratic time complexity,
O(N^(1+k)), where N is the number of nets and we have proved that
k < 1. We have successfully used the program to analyze circuits for
bridging faults, coupling capacitance extraction, crosstalk analysis, signal
integrity analysis and delay fault testing.
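
One common way to avoid comparing every pair of nets is spatial bucketing, sketched below in Python. This is not the FASCL kernel, whose internals the abstract does not describe. The sketch assumes, hypothetically, that each net is reduced to one integer bounding box (x0, y0, x1, y1), that proximity means the largest axis-aligned gap between two boxes is at most dist, and that a user-supplied condition and function are applied to each proximate pair.

```python
from collections import defaultdict
from itertools import combinations

def box_gap(a, b):
    """Largest axis-aligned separation between two boxes (0 if they touch)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)

def proximate_pairs(nets, dist):
    """Net-name pairs whose boxes lie within dist, found via a uniform
    grid so that only nets sharing a grid cell are compared."""
    cell = max(dist, 1)
    grid = defaultdict(set)
    for name, (x0, y0, x1, y1) in nets.items():
        # Register the box, grown by dist, in every cell it covers.
        for gx in range((x0 - dist) // cell, (x1 + dist) // cell + 1):
            for gy in range((y0 - dist) // cell, (y1 + dist) // cell + 1):
                grid[(gx, gy)].add(name)
    pairs = set()
    for names in grid.values():
        for a, b in combinations(sorted(names), 2):
            if box_gap(nets[a], nets[b]) <= dist:
                pairs.add((a, b))
    return pairs

def analyze(nets, dist, condition, func):
    """Apply func to every proximate pair that satisfies condition."""
    return {(a, b): func(nets[a], nets[b])
            for a, b in sorted(proximate_pairs(nets, dist))
            if condition(nets[a], nets[b])}

# Hypothetical layout: two parallel wires and one distant net.
nets = {"a": (0, 0, 10, 1), "b": (0, 3, 10, 4), "c": (50, 50, 60, 51)}
overlap_x = lambda p, q: min(p[2], q[2]) - max(p[0], q[0])
print(analyze(nets, 2, lambda p, q: True, overlap_x))
```

The grid keeps the number of comparisons low only when boxes are small relative to the layout and spread evenly; long nets that span many cells would need a finer decomposition into segments.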
-
AnalogRouter: A New Approach of Current-Driven Routing for Analog Circuits
[p. 819]
-
J. Lienig, G. Jerke, and T. Adler
We present a new automatic routing tool, named AnalogRouter, specifically developed to address the problems of current densities and electromigration in routing of multi-terminal, non-planar signal nets in analog circuits.
The contributions of our work are:
(1) a new current characterization method based on current vectors attached to each terminal,
(2) current-driven Steiner tree generation which effectively determines all branch currents prior to detailed routing, and
(3) a run-time and memory efficient detailed routing strategy which addresses all
features of current-driven routing for analog circuits, particularly varying
wire widths.
-
A Hardware-Software Operating System for Heterogeneous Designs [p. 820]
-
J. Moya, F. Moya, and J. López
Current embedded systems are made of multiple interconnected
heterogeneous devices. These devices vary greatly
in functionality, performance, and interfaces. Therefore, it is
difficult to build applications for these platforms.
In this paper we present some techniques to introduce
component-based methodologies into hardware-software codesign.
We place special emphasis on the use of simple, homogeneous
interfaces to hide the inherent complexity of current
designs. A key contribution is the definition of a HW-SW Operating
System that makes system resources available to application
developers in a clean, homogeneous way. This greatly
simplifies the task of designing complex heterogeneous embedded
systems.
-
PRMDL: A Machine Description Language for Clustered VLIW Architectures [p. 821]
-
A. Terechko, E. Pol, and J. van Eijndhoven
PRMDL is a format of the central machine description
file that contains parameters of the whole retargetable
compiler-simulator framework. The format features
separate software and hardware views on the processor and
defines a wide scope of the framework retargetability,
enabling platform-based processor design and vast design
space exploration for clustered VLIW architectures.
-
Functional Units with Conditional Input/Output Behavior in VLIW Processors
[p. 822]
-
M. Bekooij, L. Engels, A. van der Werf, and N. Busá
In this paper we extend the method to deal with coarse-grain
operations in statically scheduled VLIW processors, as
introduced by Busá [1]. We allow functional units with a
controller that does not traverse its states in a predefined
way. This makes it possible to execute a function that contains
a conditional construct like an if-statement as a single
operation on a functional unit. This way the performance
penalty otherwise caused by branch instructions is reduced.
Adding valid signals to the inputs and outputs circumvents the problem
that, for this type of functional unit, it is not known during
compilation when and how many samples
will be consumed or produced. We will refer to these
units as Conditional Input/Output Units (CIUs). The operations
that are executed on CIUs are called Conditional Input/Output
Operations (CIOs). The difference from guarded
operations is that the production of a result of a CIO depends
on the state of the CIU.
-
Adaptation of an Event-Driven Simulation Environment to Sequentially Propagated
Concurrent Fault Simulation [p. 823]
-
M. Zolfy, S. Mirkhani, and Z. Navabi
A new fault simulation method is presented here. The
method relies on simulation cycle timing of event-driven
simulators (delta delays in VHDL). This timing is used for
propagation of faulty values in faulty sections of a circuit.
This method is based on concurrent fault simulation and is
implemented in VHDL. VHDL gate models that are
capable of propagating faults in fault queues perform this
fault simulation. Gate models process their fault queues
and propagate them in delta time units. In these models,
gates with faulty input values are expanded in delta time to
evaluate faulty output values and propagate them to other
sections of the circuit. Using ISCAS benchmarks, a
performance improvement of up to 500X over serial fault
simulation has been obtained. This work is useful for fault
simulation of post-synthesis VHDL outputs.
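
For orientation, the serial baseline against which the speed-up is measured can be sketched as follows. This is Python, not the authors' VHDL gate models, and the tiny netlist and its format are hypothetical. The circuit is re-simulated once per stuck-at fault and pattern, which is the repeated work that concurrent fault simulation avoids.

```python
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

PRIMARY_INPUTS = ["a", "b", "c"]

# (output net, gate type, input nets), listed in topological order.
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "XOR", ("b", "c")),
    ("y", "OR", ("n1", "n2")),
]

def simulate(inputs, fault=None):
    """Evaluate the netlist; fault is (net, stuck_value) or None."""
    def val(net, v):
        # Force the faulty net to its stuck value, otherwise pass v through.
        return fault[1] if fault and fault[0] == net else v
    values = {n: val(n, v) for n, v in inputs.items()}
    for out, kind, ins in NETLIST:
        values[out] = val(out, GATES[kind](*(values[i] for i in ins)))
    return values["y"]

def serial_fault_sim(patterns):
    """Simulate every single stuck-at fault separately; a fault is
    detected if any pattern makes the output differ from the good circuit."""
    nets = PRIMARY_INPUTS + [g[0] for g in NETLIST]
    faults = [(n, v) for n in nets for v in (0, 1)]
    detected = set()
    for fault in faults:
        for p in patterns:
            if simulate(p, fault) != simulate(p):
                detected.add(fault)
                break
    return detected, faults

patterns = [dict(a=1, b=1, c=1), dict(a=0, b=0, c=1), dict(a=0, b=0, c=0)]
detected, faults = serial_fault_sim(patterns)
print(f"{len(detected)}/{len(faults)} faults detected")
```

The cost grows with the product of faults, patterns and gates, since the whole circuit is evaluated even when a fault disturbs only a small region.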
-
Constraint Satisfaction for Storage Files with Fifos or Stacks during
Scheduling [p. 824]
-
C. Alba Pinto, B. Mesman, K. van Eijk, and J. Jess
This paper presents a method that, during scheduling of
DSP algorithms, handles constraints of storage files with fifos
or stacks together with resource- and timing constraints.
Constraint analysis techniques and the characteristics of the
exact coloring of conflict graphs are used to identify values
that are bottlenecks for storage assignment with the aim of
ordering their accesses. This is done with pairs of values
until it can be guaranteed that all constraints will be satisfied.
|