DATE 2000 Abstracts
Sessions:
[Keynote]
[1A]
[1B]
[1C]
[Embedded Tutorial]
[2A]
[2B]
[2C]
[2D]
[3A]
[3B]
[3C]
[3D]
[4A]
[4B]
[4C]
[4D]
[5A]
[5B]
[5C]
[5D]
[6A]
[6B]
[6C]
[6D]
[7A]
[7B]
[7C]
[7D]
[8A]
[8B]
[8D]
[9A]
[9B]
[9C]
[9D]
[10A]
[10B]
[10D]
[Posters]

Moderator: I. Bolsens, IMEC, B
Speakers: Jerry Fiddler, Chairman and Co-founder of Wind River Systems, USA
Wim Roelandts, CEO Xilinx, USA
Moderators:
L. Thiele, TU Zurich, CH
J.C. Lopez, Castilla-La Mancha U, ES
-
Code Selection for Media Processors with SIMD Instructions [p. 4]
-
R. Leupers
Media processors feature special instruction sets
for fast execution of signal processing algorithms on different
media data types. They provide SIMD instructions, capable
of executing one operation on multiple data in parallel within
a single instruction cycle. Unfortunately, their use in compilers
is so far very restricted and requires either assembly
libraries or compiler intrinsics. This paper presents a novel
code selection technique capable of exploiting SIMD instructions
even when compiling plain C source code. This makes it possible to
take advantage of SIMD instructions for multimedia applications
while still using portable source code.
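As a rough illustration of what covering scalar operations with a SIMD instruction means, the sketch below emulates a hypothetical 2 x 16-bit packed add and shows that a plain element-wise loop and its paired-up version give the same result. The helper names (`pack2`, `add16x2`, `simd_add`) and the lane width are assumptions made for this example; they are not taken from the paper.

```python
MASK16 = 0xFFFF  # width of one 16-bit lane (assumed for illustration)

def scalar_add(a, b):
    """Reference loop: one wrapping 16-bit add per element."""
    return [(x + y) & MASK16 for x, y in zip(a, b)]

def pack2(lo, hi):
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)

def add16x2(wa, wb):
    """Emulate a hypothetical 2 x 16-bit SIMD add; lanes wrap independently."""
    lo = (wa + wb) & MASK16
    hi = ((wa >> 16) + (wb >> 16)) & MASK16
    return (hi << 16) | lo

def simd_add(a, b):
    """Same computation with adjacent iterations paired into one packed add."""
    assert len(a) == len(b) and len(a) % 2 == 0
    out = []
    for i in range(0, len(a), 2):
        w = add16x2(pack2(a[i], a[i + 1]), pack2(b[i], b[i + 1]))
        out += [w & MASK16, w >> 16]
    return out
```

A code selector in this spirit would have to recognise that the two scalar adds are independent and operate on adjacent sub-word data before it may replace them with the packed form.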
-
Analysis of High-Level Address Code Transformations for Programmable Processors
[p. 9]
-
S. Gupta, M. Miranda, F. Catthoor, R. Gupta
Memory intensive applications require considerable
arithmetic for the computation and selection of the different
memory access pointers. These memory address calculations
often involve complex (non)linear arithmetic expressions
which have to be calculated during program execution
under tight timing constraints, thus becoming a crucial bottleneck
in the overall system performance. This paper explores
applicability and effectiveness of source-level optimisations
(as opposed to instruction-level) for address computations
in the context of multimedia. We propose and evaluate
two processor-target independent source-level optimisation
techniques, namely, global scope operation cost minimisation
complemented with loop-invariant code hoisting,
and non-linear operator strength reduction. The transformations
attempt to achieve minimal code execution within
loops and reduced operator strengths. The effectiveness of
the transformations is demonstrated with two real-life multimedia
application kernels by comparing the improvements
in the number of execution cycles, before and after applying
the systematic source-level optimisations, using state-of-the-art
C compilers on several popular RISC platforms.
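The two kinds of transformation named above can be pictured with a small sketch. It assumes a wrapped two-dimensional address expression `(base + i*stride + j) % size` with non-negative operands; this expression and the function names are chosen for illustration only and do not come from the paper's kernels. The optimised version hoists the row term out of the inner loop and replaces the multiplication and the modulo with additions and conditional wrap-arounds.

```python
def addresses_naive(rows, cols, stride, base, size):
    """Recompute the full address expression in every iteration."""
    return [(base + i * stride + j) % size
            for i in range(rows) for j in range(cols)]

def addresses_optimised(rows, cols, stride, base, size):
    """Same address sequence without multiply or modulo inside the loops."""
    out = []
    step = stride % size       # loop-invariant, computed once
    row_base = base % size     # loop-invariant start of the first row
    for _ in range(rows):
        addr = row_base        # row term hoisted out of the inner loop
        for _ in range(cols):
            out.append(addr)
            addr += 1          # strength-reduced: increment instead of i*stride + j
            if addr == size:   # modulo replaced by a conditional wrap
                addr = 0
        row_base += step
        if row_base >= size:   # one subtraction suffices because step < size
            row_base -= size
    return out
```

Both functions produce identical address streams; only the operator mix inside the loops differs.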
-
Free MDD-based Software Optimization Techniques for Embedded Systems
[p. 14]
-
C. Kim, L. Lavagno, A. Sangiovanni-Vincentelli
Embedded systems make heavy use of software to perform
Real-Time embedded control tasks. Embedded software
is characterized by a relatively long lifetime and by tight
cost, performance and safety constraints. Several super-optimization
techniques for embedded software based on
Multi-valued Decision Diagram (MDD) representations have
been described in the literature, but they all share the same
basic limitation. They are based on standard Ordered MDD
(OMDD) packages, and hence require a fixed order of evaluation
for the MDD variables on every execution path. Free
MDDs (FMDDs) lift this limitation, and hence open up more
optimization opportunities. Finding the optimal variable ordering
for FMDDs is a very difficult problem. Hence in this
paper we describe a heuristic procedure that performs well
in practice, and is based on FMDD cost estimation applied
to recursive cofactoring. Experimental results show that our
new variable ordering method often obtains smaller embedded
software than previous (sifting-based) methods.
Moderators:
A.M. Trullemans-Anckaert, UC Louvain, B
C. Guardini, PDF Solutions, USA
-
Quantitative Comparison of Power Management Algorithms [p. 20]
-
Y. Lu, E. Chung, T. Simunic, L. Benini, G. De Micheli
Dynamic power management saves power by shutting
down idle devices. Several management algorithms
have been proposed and demonstrated effective in certain
applications. We quantitatively compare the power
saving and performance impact of these algorithms on
the hard disks of a desktop and a notebook computer. This
paper has three contributions. First, we build a framework
in Windows NT to implement power managers
running realistic workloads and directly interacting with
users. Second, we define performance degradation that
reflects user perception. Finally, we compare power saving
and performance of existing algorithms and analyze
the differences.
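For readers unfamiliar with the idea, a fixed-timeout policy is one of the simplest power managers: the device is shut down once it has been idle for a given time. The sketch below estimates the energy spent over a list of idle periods under such a policy. The simplified device model (constant idle and sleep power, a lumped transition energy, transition time ignored) and all parameter names are assumptions for illustration; they are not the models or measurements used in the paper.

```python
def break_even_time(p_idle, p_sleep, e_transition):
    """Idle length beyond which sleeping saves energy (transition time ignored)."""
    return e_transition / (p_idle - p_sleep)

def idle_energy(idle_periods, timeout, p_idle, p_sleep, e_transition):
    """Energy spent over the idle periods under a fixed-timeout policy."""
    total = 0.0
    for t in idle_periods:
        if t <= timeout:
            # The device never shuts down during this idle period.
            total += p_idle * t
        else:
            # Wait for the timeout, pay the shutdown/wake-up cost, then sleep.
            total += p_idle * timeout + e_transition + p_sleep * (t - timeout)
    return total
```

Passing `float("inf")` as the timeout models an always-on device, which gives a baseline for computing the power saving.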
-
Efficient Power Co-Estimation Techniques for System-On-Chip Design [p. 27]
-
M. Lajolo, A. Raghunathan, S. Dey, L. Lavagno
We present efficient power estimation techniques for HW/SW
System-On-Chip (SOC) designs. Our techniques are based on concurrent
and synchronized execution of multiple power estimators
that analyze different parts of the SOC (we refer to this as co-estimation),
driven by a system-level simulation master. We motivate
the need for power co-estimation, and demonstrate that performing
independent power estimation for the various system components
can lead to significant errors in the power estimates, especially
for control-intensive and reactive embedded systems.
We observe that the computation time for performing power co-estimation
is dominated by: (i) the requirement to analyze/simulate
some parts of the system at lower levels of abstraction in order to
obtain accurate estimates of timing and switching activity information,
and (ii) the need to communicate between and synchronize
the various simulators. Thus, a naive implementation of power co-estimation
may be too inefficient to be used in an iterative design
exploration framework. To address this issue, we present several
acceleration (speedup) techniques for power co-estimation. The
acceleration techniques are energy caching, software power macro-modeling,
and statistical sampling. Our speedup techniques reduce
the workload of the power estimators for the individual SOC components,
as well as their communication/synchronization overhead.
Experimental results indicate that the use of the proposed acceleration
techniques results in significant (8X to 87X) speedups in SOC
power estimation time, with minimal impact on accuracy. We also
show the utility of our co-estimation tool to explore system-level
power tradeoffs for a TCP/IP Network Interface Card sub-system
and an automotive controller.
-
A Discrete-Time Battery Model for High-Level Power Estimation [p. 35]
-
L. Benini, G. Castelli, A. Macii, E. Macii, M. Poncino, R. Scarsi
In this paper, we introduce a discrete-time model for the complete
power supply sub-system that closely approximates the behavior
of its circuit-level (i.e., HSpice), continuous-time counterpart.
The model is abstract and efficient enough to enable
event-driven simulation of digital systems described at a very
high level of abstraction and that include, among their components,
also the power supply. Therefore, it can be successfully
used for the purpose of battery life-time estimation during design
optimization, as shown by the results we have collected
on a meaningful case study. Experiments prove also that the
accuracy of our model is very close to that provided by the corresponding
Spice-level model.
Moderators: G. Gielen, KU Leuven, B
U. Feldmann, Infineon Technologies, D
-
The Generalized Boundary Curve -- A Common Method for Automatic Nominal Design
and Design Centering of Analog Circuits [p. 42]
-
R. Schwencker, F. Schenkel, H. Graeb, K. Antreich
In this paper, a new method for analog circuit sizing with
respect to manufacturing and operating tolerances is presented.
Two types of robustness objectives are presented,
i.e. parameter distances for the nominal design and worst-case
distances for the design centering. Moreover, the generalized
boundary curve is presented as a method to determine
a parameter correction within an iterative trust region
algorithm. Results show that a significant reduction in computational
costs is achieved using the presented robustness
objectives and generalized boundary curve.
-
A Hierarchical Approach for the Symbolic Analysis of Large Analog Integrated Circuits [p. 48]
-
O. Guerra, E. Roca, F. Fernández, A. Rodríguez-Vázquez
This paper introduces a new hierarchical analysis methodology
which incorporates approximation strategies
during the analysis process. Consequently, the circuit
sizes that can be analyzed increase dramatically, without
suffering from the combinatorial explosion of expression
complexity. Moreover, the interpretability and usability
in practical applications is enabled by providing analytical
models that keep complexity at a minimum with the
prescribed accuracy.
-
Layout-Oriented Synthesis of High Performance Analog Circuits [p. 53]
-
M. Dessouky, M. Louërat, J. Porte
This paper presents a methodology towards synthesis of
high performance analog circuits. Layout parasitics are estimated
and compensated during circuit sizing. Physical
layout constraints are thus taken into consideration early
in the design. This approach shortens the overall design
time by avoiding laborious sizing-layout iterations. The approach
has been implemented using two knowledge-based
tools dedicated to analog circuit sizing and layout generation.
An example of a high performance OTA is presented
at the end to illustrate the effectiveness of the approach.
-
Technology Mapping and Retargeting for Field-Programmable Analog Arrays [p. 58]
-
S. Ganesan, R. Vemuri
Rapid prototyping followed by technology retargeting provides
a fast and cost-effective approach to analog system
synthesis. Field-programmable analog arrays (FPAAs) enable
rapid implementation of a function-compliant prototype,
while technology retargeting converts the functional FPAA
prototype to an ASIC. We first address the FPAA technology
mapping problem. A novel structural approach based on
hierarchical pattern matching and covering is employed to
map the analog behavior onto the FPAA. We then address issues
of technology retargeting and design reuse, and present
our FPAA-ASIC retargeting strategy. We present experiments
and a design example for FPAA technology mapping and retargeting.
Organizer: Yervant Zorian, Logic Vision, USA
Moderators: Michael Nicolaidis, TIMA, F
Peter Muhmenthaler, Infineon, D
Speakers: David Lepejian, HPL, USA
Chris Strolenberg, Sagentec, F
Kees Veelenturf, Philips, NL
Yervant Zorian, Logic Vision, USA
-
Tutorial Statement [p. 66]
-
The Road to better Reliability and Yield Embedded DfM Tools [p. 67]
-
K. Veelenturf
This paper gives an overview of the different tools needed
for accomplishing optimal IC manufacturability and rapid
technology learning during the successive phases of
process maturity. The paper then describes two specific
DfM tools that are in use within Philips Semiconductors.
Keywords: DfM, yield improvement, yield prediction, wire
spreading.
-
Yield Improvement and Repair Trade-Off for Large Embedded Memories [p. 69]
-
Y. Zorian
In this paper, we give an overview of the trade-off to
improve yield and optimize silicon manufacturing cost.
The specific technology focus is on large embedded
memories in complex ASIC or system-on-chip designs.
Embedded capabilities for test, redundancy analysis and
repair are shown as design-for-manufacturability features
needed for large embedded memories in VDSM design.
Keywords: Yield improvement, DFM, BIST, silicon repair
-
Stay Away from Minimum Design-Rule Values [p. 71]
-
C. Strolenberg
With the introduction of 0.18 micron CMOS process
technology a new phenomenon in circuit manufacturing
can be observed: design-rule values as specified in the
design-rule manual are no longer "hard" numbers. Where
designers and EDA tool manufacturers used to consider
rule-values as strict limits when creating mask layouts,
rule values have turned into gray areas around the
specified rule values. This concept is illustrated in the
following figure.
Moderator: Giovanni De Micheli, Stanford U, USA
Organizer: Luciano Lavagno, DIEGM/U Udine, IT
Speakers: Joachim Kunkel, Synopsys, USA
Diederik Verkest, IMEC, B
Frank Schirrmeister, Cadence Design Systems, USA
-
System Level Design using C++ [p. 74]
-
D. Verkest, J. Kunkel, F. Schirrmeister
This paper discusses the use of C++ for the design of
digital systems. The paper distinguishes a number of
different approaches towards the use of programming
languages for digital system design and will discuss in
more detail how C++ can be used for system modeling
and refinement, for simulation, and for architecture
design.
Moderators: Mark Genoe, Alcatel, B
Ian Phillips, ARM, UK
-
Techniques for Reducing Read Latency of Core Bus Wrappers [p. 84]
-
R. Lysecky, F. Vahid, T. Givargis
Today's system-on-a-chip designs consist of many cores. To
enable cores to be easily integrated into different systems,
many propose creating cores with their internal logic
separated from their bus wrapper. This separation may
introduce extra read latency. Pre-fetching register data into
register copies in the bus wrapper can reduce or eliminate this
extra latency. In this paper, we introduce a technique for
automatically designing a pre-fetch unit that satisfies user-imposed
register-access constraints. The technique benefits
from mapping the pre-fetching problem to the well-known real-time
process scheduling problem. We then extend the technique
to allow user-specified register interdependencies, using a
Petri Net model, resulting in even more efficient pre-fetch
schedules.
Keywords: Cores, system-on-a-chip, interfacing, on-chip bus, intellectual
property, design reuse, bus wrapper.
-
Formalized Three-Layer System-Level Reuse Model and Methodology for Embedded
Data-Dominated Applications [p. 92]
-
F. Vermeulen, F. Catthoor, D. Verkest, H. De Man
In embedded data-dominated applications a global system-level
data transfer and storage exploration phase is crucial
in obtaining an efficient solution. We have developed
a novel formalism to describe reusable blocks such that
the essential part of the design exploration freedom is retained.
This formalism is the basis for a system-level reuse
methodology which allows large parts of the design to be reused
as structural VHDL and describes the costly data access
related constructs at higher levels in the code hierarchy.
Compared to a reuse approach based on fixed blocks,
considerable power and area savings can be obtained, as
demonstrated on real-life video and modem applications.
-
Virtual Fault Simulation of Distributed IP-based Designs [p. 99]
-
M. Dalpasso, A. Bogliolo, L. Benini, M. Favalli
Fault simulation and testability analysis are major concerns
in design flows employing intellectual-property (IP)
protected virtual components. In this paper we propose a
paradigm for the fault simulation of IP-based designs that
enables testability analysis without requiring IP disclosure,
implemented within the JavaCAD framework for distributed
design [1, 2]. As a proof of concept, stuck-at fault simulation
has been performed for combinational circuits containing
virtual components.
Moderators: R. Otten, TU Delft, NL
J. Koehl, IBM, D
-
Fast Evaluation of Sequence Pair in Block Placement by Longest Common Subsequence Computation [p. 106]
-
X. Tang, R. Tian, D. Wong
In [1], Murata et al. introduced an elegant representation
of block placement called sequence pair. All block placement
algorithms which are based on sequence pairs use simulated
annealing where the generation and evaluation of a large number
of sequence pairs is required. Therefore, a fast algorithm is
needed to evaluate each generated sequence pair, i.e. to translate
the sequence pair to its corresponding block placement.
This paper presents a new approach to evaluate a sequence
pair based on computing the longest common subsequence in a pair of
weighted sequences. We present a very simple and
efficient O(n^2) algorithm to solve the sequence pair evaluation
problem. We also show that using a more sophisticated
data structure, the algorithm can be implemented to run in
O(n log n) time. Both implementations of our algorithm are
significantly faster than the previous O(n^2)
graph-based algorithm in [1]. For example, we achieve 60X speedup over the
previous algorithm when input size n = 128.
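The connection between sequence pairs and weighted longest common subsequences can be sketched as follows. This is a minimal quadratic-time illustration written from the description above, not the authors' code; the function names and the dictionary-based interface are assumptions. It uses the usual sequence-pair convention: a block that precedes another in both sequences lies to its left, and a block that precedes another in the first sequence but follows it in the second lies above it.

```python
def weighted_lcs(x_seq, y_seq, weight):
    """Return ({block: coordinate}, total length) for one axis."""
    n = len(x_seq)
    y_index = {b: i for i, b in enumerate(y_seq)}
    best = [0] * n   # best[j]: heaviest common subsequence within y_seq[:j+1]
    position = {}
    for b in x_seq:
        p = y_index[b]
        position[b] = best[p]          # everything that must precede b
        t = position[b] + weight[b]
        for j in range(p, n):          # propagate; best stays non-decreasing
            if t > best[j]:
                best[j] = t
            else:
                break
    return position, (best[-1] if n else 0)

def evaluate_sequence_pair(gamma_plus, gamma_minus, width, height):
    """Translate a sequence pair into lower-left block coordinates."""
    x, total_w = weighted_lcs(gamma_plus, gamma_minus, width)
    # For the vertical direction the first sequence is traversed in reverse.
    y, total_h = weighted_lcs(gamma_plus[::-1], gamma_minus, height)
    return {b: (x[b], y[b]) for b in gamma_plus}, total_w, total_h
```

For example, with sequences `["a", "b", "c"]` and `["b", "a", "c"]`, block `a` ends up above `b`, and both lie to the left of `c`.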
-
A New Effective and Efficient Multi-Level Partitioning Algorithm [p. 112]
-
Y. Saab
This paper describes a new multi-level partitioning algorithm (PART) that
combines a blend of iterative improvement and clustering, biasing of node
gains, and local uphill climbs. PART is competitive with recent
state-of-the-art partitioning algorithms. PART was able to find new
lower cuts for a number of benchmark circuits.
-
Faster Optimal Single-Row Placement with Fixed Ordering [p. 117]
-
U. Brenner, J. Vygen
We consider the problem of placing a set of cells in a single row with
a given horizontal ordering, minimizing the (weighted) bounding box
netlength. We analyze the running time of an algorithm of Kahng, Tucker
and Zelikovsky which solves this problem optimally. By using different
data structures we are able to improve the worst-case running time in the
unweighted case as well as in the presence of netweights.
-
Layout Compaction for Yield Optimization via Critical Area Minimization [p. 122]
-
Y. Bourai, C. Shi
This paper presents a new compaction algorithm to improve the
yield of IC layouts. The yield is improved by reducing the area
where faults are more likely to occur, known as the critical area.
Instead of assuming that the critical area could probably be
present everywhere in the layout, the algorithm first finds where
this area can actually exist, and then attempts to minimize it. The
algorithm benefits from a fast multi-layer critical area
computation to extract the rectangles that compose it. Afterwards,
the extracted rectangles are fed into the layer minimization
process, which is the second phase of the compaction procedure, to
minimize their area. A new formulation of the layer minimization
problem is used in such a way that the critical area minimization
adds neither extra variables nor extra constraints to the original
compaction algorithm. The algorithm has been tested on actual
layouts.
Moderators: D. Gizopoulos, 4Plus Technologies, GR
Y. Bertrand, LIRMM, F
-
Test Synthesis for Mixed-Signal SOC Paths [p. 128]
-
S. Ozev, I. Bayraktaroglu, A. Orailoglu
Higher levels of integration, the need for test re-use, and
the mixed-signal nature of today's SOCs necessitate hierarchical
test generation and system level test composition
to meet stringent market requirements. In this paper, a
novel methodology for testing analog and digital components
in a signal path is discussed. Consequent testability
analysis can be utilized to reduce DFT requirements, while
test translation provides highly effective low cost test. The
proposed approach seamlessly propagates test information
across the analog/digital divide. Experimental results substantiate
the effectiveness of the proposed mixed-signal test
synthesis methodology.
-
Analysis and Minimization of Test Time in a Combined BIST and External Test Approach [p. 134]
-
M. Sugihara, H. Date, H. Yasuura
In this paper, an analysis of test time under the CBET
(Combination of BIST and External Test) approach
is presented. The analysis
validates that the CBET approach can achieve shorter
testing time than either external test or BIST alone in many
situations. An efficient test time minimization algorithm
for CBET-based LSIs is also proposed. It uses
several characteristics of the CBET approach derived
from the analysis to reduce the computation time needed to find the
optimum test sets. The algorithm helps designers
save precious design time.
-
CAS-BUS: A Scalable and Reconfigurable Test Access Mechanism for Systems on a Chip [p. 141]
-
M. Benabdenebi, W. Maroufi, M. Marzouki
This paper describes CAS-BUS, a P1500 compatible Test
Access Mechanism for Systems on a Chip. The TAM architecture
is made up of a Core Access Switch (CAS) and a test
bus. The TAM characteristics are its flexibility, scalability
and reconfigurability. A CAS generator has been developed,
and some results are provided in the paper.
-
Design and Test Space Exploration of Transport-Triggered Architectures [p. 146]
-
V. Zivkovic, R. Tangelder, H. Kerkhoff
This paper describes a new approach in the high level
design and test of transport-triggered architectures
(TTA), a special type of application specific instruction
processors (ASIP). The proposed method introduces the
test as an additional constraint, besides throughput and
circuit area. The method, which calculates the testability of
the system, helps the designer to assess the obtained
architectures with respect to test, area and throughput in
the early phase of the design and to select the most suitable
one. In order to create the templated TTA, the "MOVE"
framework has been used. The approach is
validated with respect to the "Crypt" Unix application.
Moderators: W. Nebel, Oldenburg U and OFFIS, D
E. Moser, Bosch, D
-
Composite Signal Flow: A Computational Model Combining Events, Sampled Streams, and Vectors [p. 154]
-
A. Jantsch, P. Bjurèus
The composite signal flow model of computation targets
systems with significant control and data processing parts.
It builds on the data flow and synchronous data flow models
and extends them to include three signal types: non-periodic
signals, sampled signals, and vectorized sampled
signals. Vectorized sampled signals are used to represent
vectors and computations on vectors. Several conversion
processes are introduced to facilitate synchronization and
communication with these signals. We discuss the severe
implications that these processes have on the causal
behaviour of the system.
We illustrate the model and its usefulness with three
applications: a co-modelling and co-simulation
environment combining Matlab and SDL; a high level
timing analysis as a consequence of the operations on
vectors; and conditions for a parallel, distributed simulation.
-
MASCOT: A Specification and Cosimulation Method Integrating Data and Control Flow [p. 161]
-
P. Bjurèus, A. Jantsch
We integrate data and control flow at the system specification
level, using the two specialized and well established
languages Matlab and SDL. For this we provide a
modeling technique, which integrates the timing concepts
and allows synchronization of vector-based computation
with event based state transition. The technique is supported
by a library of wrappers and communication functions,
which has been implemented to make cosimulation
easy to use and almost transparent to the user. A methodology
formulates the rules to use the modeling technique,
to partition the system, and to select communication
modes. A complex industrial example illustrates the modeling
technique and the methodology, and shows the efficiency
of the Matlab-SDL cosimulation.
-
Delay-Insensitive Interface Specification and Synthesis [p. 169]
-
M. Josephs, D. Furey
Delay-insensitive interfacing was first demonstrated on
the macromodules project in the 1960's, but globally synchronous
(clocked) schemes have so far dominated the VLSI
era. In deep sub-micron technologies, problems of clock
skew, including excessive size and power consumption of
clock buffers, and heterogeneity of systems on a chip are
rekindling an interest in global asynchrony. DI-Algebra is
presented here as a language for the specification of modules
with delay-insensitive interfaces. Such modules can
be implemented either in synchronous or in asynchronous
logic. A design flow is also illustrated in which specifications
are automatically translated into Petri nets, validated,
and synthesised into asynchronous logic.
Moderators: P. Guerrier, UPMC, F
J. van Meerbergen, Philips Research, NL
-
A 50 Mbit/s Iterative Turbo-Decoder [p. 176]
-
F. Viglione, G. Masera, G. Piccinini, M. Roch, M. Zamboni
Very low bit error rate has become an important constraint
in high performance communication systems that operate
at very low signal to noise ratios: due to their impressive
coding gains, turbo codes have been proposed for several
applications, although they suffer a large decoding delay.
This paper presents the design of a turbo decoder with
high performance in terms of throughput, implemented using
the TSPC (True Single Phase Clocking) logic family. In order
to achieve the best compromise between cost (in terms of
area) and throughput, several architectural solutions have
been analyzed. The whole system and in particular its core,
the SISO module, has been verified through VHDL simulations.
HSPICE simulations show that the system can operate
with a 1 GHz clock and thus it can reach a throughput
of 50 Mbit/s.
-
Smart Antenna Receiver based on a Single Chip Solution for GSM/DCS Baseband Processing [p. 181]
-
U. Girola, A. Picciriello, D. Vincenzoni
This paper presents a single chip implementation of a
space-time algorithm for co-channel interference (CCI)
and intersymbol interference (ISI) reduction in GSM/DCS
systems. The temporal channel for the Viterbi receiver
and the beamformer weights for the CCI rejection are
estimated jointly by optimizing a suitable cost function for
separable space-time channels.
Taking into account the integration
capabilities provided by today's FPGAs (Field Programmable
Gate Arrays), the feasibility of a single-chip
JSTE solution based on a three-processor architecture
for carrier beamforming, equalization and demodulation is demonstrated.
-
Protocol Stack-based Telecom-Emulator [p. 186]
-
T. Murooka, T. Miyazaki
The paper describes the concept and implementation
of a telecom emulator that features both reconfigurability
and high-speed processing. The emulator can be easily
transmuted into any telecom system as a real node.
It has two innovative system design concepts. The first
is to divide the specification into simplified processes
based on the open system interconnection (OSI) reference
model. The second is the use of a sophisticated
hardmacro and its software-callable driver. We implemented
a prototype system called ATTRACTOR and
applied it to some telecom applications. The applications
could be implemented in a short design
time and were operated in real computer network environments.
Moderators: W. Lin, UC San Diego, USA
M. Berkelaar, TU Eindhoven, NL
-
Transformational Placement and Synthesis [p. 194]
-
W. Donath, P. Kudva, L. Stok, P. Villarrubia,
L. Reddy, A. Sullivan, K. Chakraborty
Novel methodology and algorithms to seamlessly integrate
logic synthesis and physical placement through a transformational
approach are presented.
Contrary to most placement algorithms that minimize a
global cost function based on an abstract representation of
the design, we decomposed the placement function into a set
of transforms and coupled them directly with incremental timing,
noise, and/or power analyzers. This coupling results in a
direct and more accurate feedback on optimizations for placement
actions.
These placement transforms are then integrated with traditional
logic synthesis transforms leading to a converging
set of optimizations based on the concurrent manipulation of
boolean, electrical, as well as physical data.
Experimental results indicate that the proposed approach
creates an efficient converging design flow that eliminates
placement and synthesis iteration. It results in timing improvements,
and maintains other global placement measures
such as wire congestion and wire length.
The flexibility of the transformational approach allows us
to easily add, extend and support more sophisticated algorithms
that involve critical as well as non-critical regions and
target a variety of metrics including noise, yield and manufacturability.
-
Power and Delay Reduction via Simultaneous Logic and Placement Optimization in FPGAs [p. 202]
-
B. Kumthekar, F. Somenzi
Traditional FPGA design flows have treated logic synthesis
and physical design as separate steps. With the recent
advances in technology, the lack of information on the
physical implementation during logic synthesis has caused
mismatches between the final circuit characteristics (delay,
power and area) and those predicted by logic synthesis. In
this paper, we present a technique that tightly links the logic
and physical domains -- we combine logic and placement
optimization in a single step. The combined algorithm is
based on simulated annealing and hence, very amenable
to new optimization goals or constraints. Two types of
moves, directed towards global reduction in the cost function
(linear congestion), are accepted by the simulated annealing
algorithm: (1) logic optimization steps consisting
of removing or replacing redundant wires in a circuit using
functional flexibilities derived from SPFDs [12] and (2) the
placement optimization steps consisting of swapping a pair
of blocks in the FPGA. Feedback from placement is very
valuable in making an informed choice of a target wire during
logic optimization moves. Experimental results demonstrate
the efficacy of our approach over the placement independent
approach.
-
Constructive Library-Aware Synthesis using Symmetries [p. 208]
-
V. Kravets, K. Sakallah
In this paper a constructive library-aware multilevel logic
synthesis approach using symmetries is described. It
integrates the technology-independent and technology-dependent
stages of synthesis, and is premised on the goal of
relating the functional structure of a logic specification closer
to the ultimate topological and physical structures. We show
that symmetries interpreted as structural attributes of
functions can be effectively used to induce a favorable
structural implementation. These symmetries are used in
bridging 1) the structural properties of the functions being
synthesized, 2) the structural attributes of the implementation
network, and 3) the functional content of the target library.
Experimental results show that the quality of circuits
synthesized using this approach is generally superior to that of circuits
synthesized by traditional approaches, and that the
improvement correlates with the symmetry measure in a
function.
Moderators: M. Ohletz, Alcatel, B
A. Rueda, CNM, ES
-
A BIST Scheme for On-Chip ADC and DAC Testing [p. 216]
-
J. Huang, C. Ong, K. Cheng
In this paper, we present a BIST scheme for testing on-chip
AD and DA converters. We discuss on-chip generation
of linear ramps as test stimuli, and propose techniques for
measuring the DNL and INL of the converters. We validate
the scheme with software simulation -- 5% LSB (least significant
bit) test accuracy can be achieved in the presence of
reasonable analog imperfection.
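As background, DNL and INL are commonly estimated from a code histogram collected while a linear ramp is applied to the converter: under an ideal ramp, the number of hits per code is proportional to that code's width. The sketch below implements this standard textbook estimate. It is given only as an illustration and is not claimed to be the measurement technique proposed in the paper; the function name and the choice to ignore the two end codes are assumptions.

```python
def dnl_inl_from_histogram(counts):
    """Estimate DNL and INL (in LSB) from a ramp code histogram.

    counts[k] is the number of samples that produced output code k.
    The first and last codes are ignored because the ramp usually
    overdrives both ends of the input range.
    """
    interior = counts[1:-1]
    average = sum(interior) / len(interior)   # hits for an ideal 1-LSB code
    dnl = [c / average - 1.0 for c in interior]
    inl = []
    running = 0.0
    for d in dnl:                             # INL is the running sum of DNL
        running += d
        inl.append(running)
    return dnl, inl
```

A code that collects 20% more hits than average is 0.2 LSB too wide, so its DNL is +0.2.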
-
An On Chip ADC Test Structure [p. 221]
-
Y. Wen, K. Lee
In this paper, a new built-in self-test structure to test the
static specifications of analog to digital converters (ADCs)
is presented. A ramp signal generated by an integrator
serves as a test input signal. A specific range of this signal is
divided into 2^(n+1)
segments, with each segment corresponding
to one output combination of an (n+1)-bit counter, where
n is the number of bits of the ADCs under test. The testing
process is done with digital data processing by comparing
the outputs of ADCs under test with the outputs of the (n+1)-bit
counter. Simple structure, low area overhead, and high
speed are the advantages of the proposed test structure.
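One plausible reading of the comparison step is sketched below as a behavioural simulation. The abstract does not give the exact comparison rule, so the rule used here (the ADC code sampled in the middle of each segment must equal the upper n bits of the counter) is an assumption, as are the function names and the ideal-ADC model.

```python
def ideal_adc(v, n_bits, vref=1.0):
    """Ideal n-bit quantiser over [0, vref), used as a stand-in device."""
    code = int(v / vref * (1 << n_bits))
    return min(max(code, 0), (1 << n_bits) - 1)

def ramp_counter_test(adc, n_bits, vref=1.0):
    """Step a ramp through 2^(n+1) segments and compare with the counter.

    Returns a list of (counter value, expected code, observed code)
    for every segment where the ADC disagrees with the counter.
    """
    segments = 1 << (n_bits + 1)
    failures = []
    for count in range(segments):
        v = (count + 0.5) * vref / segments   # ramp value at mid-segment
        expected = count >> 1                 # assumed rule: upper n counter bits
        observed = adc(v)
        if observed != expected:
            failures.append((count, expected, observed))
    return failures
```

With this rule a fault-free converter yields an empty failure list, while a converter whose least significant bit is stuck at 0 fails in every segment that maps to an odd code.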
-
Reuse of Existing Resources for Analog BIST of a Switch Capacitor Filter [p. 226]
-
E. Cota, L. Carro, M. Renovell, M. Lubaszewski, F. Azaïs, Y. Bertrand
The objective of this paper is to discuss the possibility of
reusing the existing hardware originally present in an
analog application to implement test functions for a
completely autonomous self-testable solution. In this first
approach, an 8th-order analog linear filter is used as an
application example. The required modifications in the
circuit are presented with the results in terms of area
overhead and fault coverage.
Moderators: P. Camurati, Politecnico di Torino, IT
N. Fristacky, Slovak TU, SLK
-
A BDD-based Satisfiability Infrastructure using the Unate Recursive Paradigm [p. 232]
-
P. Kalla, Z. Zeng, M. Ciesielski, C. Huang
Binary Decision Diagrams have been widely used
to solve the Boolean Satisfiability (SAT) problem. The individual constraints
can be represented using BDDs and the conjunction of all
constraints provides all satisfying solutions. However, BDD-related
SAT techniques suffer from size explosion problems. This paper
presents two BDD-based algorithms to solve the SAT problem that
attempt to contain the growth of BDD-size while identifying solutions
quickly. The first algorithm, called BSAT, is a recursive, backtracking
algorithm that uses an exhaustive search to find a SAT solution. The
well known unate recursive paradigm is exploited to solve the SAT
problem. The second algorithm, called INCOMPLETE-SEARCH-USAT
(abbreviated IS-USAT), incorporates an incomplete search to
find a solution. The search is incomplete inasmuch as it is restricted
to only those regions that have a high likelihood of containing the solution,
discarding the rest. Using our techniques we were able to find
SAT solutions not only for all MCNC & ISCAS benchmarks, but also
for a variety of industry standard designs.
-
Automatic Lighthouse Generation for Directed State Space Search [p. 237]
-
P. Yalagandula, V. Singhal, A. Aziz
Previous researchers have suggested the use of "lighthouses"
to act as guides in directed state space search. The
drawback of using lighthouses is that the user has to manually
derive them, through a potentially laborious examination
of the design. Additionally, specifying a large number
of lighthouses results in wasted effort during the search. We
present approaches to automatically generate high-quality
lighthouses for hard-to-cover targets.
-
Analyzing Real-Time Systems [p. 243]
-
J. Ruf, T. Kropf
Temporal logic model checking is a technique for the automatic
verification of systems against specifications. Besides
the correctness of safety and liveness properties it is often
important to determine critical answer and delay times of
systems, especially if they are embedded in a real-time environment.
In this paper we present an approach which
allows the verification as well as the timing analysis of real-time
systems. The systems are described as networks of
communicating time-extended finite state machines (I/O-interval
structures). We use a compact symbolic representation
to obtain efficient analysis algorithms.
Moderators: L. Nachtergaele, IMEC, B
M. Bolle, Bosch Telecom, B
-
A Generic Architecture for On-Chip Packet-Switched Interconnections [p. 250]
-
P. Guerrier, A. Greiner
This paper presents an architectural study of a scalable
system-level interconnection template. We explain why
the shared bus, which is today's dominant template, will
not meet the performance requirements of tomorrow's
systems. We present an alternative interconnection in the
form of switching networks. This technology originates
in parallel computing, but is also well suited for heterogeneous
communication between embedded processors and
addresses many of the deep submicron integration issues.
We discuss the necessity and the ways to provide high-level
services on top of the bare network packet protocol,
such as dataflow and address-space communication
services. Eventually we present our first results on the
cost/performance assessment of an integrated switching
network.
-
Memory Arbitration and Cache Management in Stream-based Systems [p. 257]
-
F. Harmsze, A. Timmer, J. van Meerbergen
With the ongoing advancements in VLSI technology, the
performance of an embedded system is determined to a
large extent by the communication of data and instructions.
This results in new methods for on- and off-chip communication
and caching schemes. In this paper, we use an
arbitration scheme that exploits the characteristics of continuous
"media" streams while minimizing the latency for
random (e.g. CPU) memory accesses to background memory.
We also introduce a novel caching scheme for a stream-based
multiprocessor architecture, to limit as much as possible
the amount of on-chip buffering required to guarantee
the throughput of the continuous streams. With these two
schemes we can build an architecture for media processing
with optimal flexibility at run-time while performance guarantees
can be determined at compile-time.
-
HW/SW Codesign of an Engine Management System [p. 263]
-
M. Baleani, A. Ferrari, A. Sangiovanni-Vincentelli, C. Turchetti
The design process for an engine management system
is presented. The functional specification of the system
has been captured using C and C++ as specification languages.
The validation of the specification has been carried
out using functional simulation. Then an architecture
for the implementation of the functional specification
is selected among a set of three possible alternatives, all
based on the same micro-controller, characterized by different
hardware-software trade-offs. The choice is motivated
by a fast performance estimation that can also be used to
identify the parts of the design that could be moved across
the hardware-software partition to obtain better cost or better
performance. The case study has been performed in the
Felix VCC framework.
Moderators: L. Stok, IBM, USA
J. Monteiro, INESC, PT
-
Wave Steered FSMs [p. 270]
-
L. Macchiarulo, S. Shu, M. Marek-Sadowska
In this paper we address the problem of designing very high
throughput finite state machines (FSMs). The presence of
loops in sequential circuits prevents a straightforward and
generalized application of pipelining techniques, which work
so well for combinational circuits, to increase FSM performance.
We observe that appropriate extensions of the "wave
steering" technique [17,18] are possible to partially overcome
the problem. Additionally we use FSM decomposition
theory to decouple state variable dependencies. Application
of these two techniques to MCNC benchmarks resulted in a
factor of 3 average throughput increase as compared to a
standard cell implementation, at the expense of factor 3.7
area and less than factor 2 latency penalties.
-
Delay Minimization and Technology Mapping of Two-Level Structures and Implementation using Clock-Delayed Domino Logic [p. 277]
-
J. Ciric, G. Yee, C. Sechen
This paper presents a new delay minimization and
technology mapping algorithm for two-level structures
(TLS) implemented using clock-delayed (CD) domino
logic. We take advantage of CD domino's high-speed,
large fan-in NOR and OR gates to increase the speed of the
circuit by partial collapsing. The algorithm is delay-driven
and the delays are obtained from a characterized CD
domino library. The results on eight combinational MCNC
benchmark circuits show an average speed improvement
of 89% for CD domino with TLS, compared to static
CMOS implementations generated by Synopsys. CD domino
with TLS using our tools produced on average 44%
faster circuits than CD domino benchmarks minimized and
mapped using Synopsys. Finally, the delay results for CD
domino with TLS were on average 22% better than for
standard domino.
-
Gate Sizing using a Statistical Delay Model [p. 283]
-
E. Jacobs, M. Berkelaar
This paper is about gate sizing under a statistical delay
model. It shows we can solve the gate sizing problem
exactly for a given statistical delay model. The formulation
used allows many different forms of objective functions,
which could for example directly optimize the delay
uncertainty at the circuit outputs. We formulate the gate
sizing problem as a nonlinear programming problem, and
show that if we do this carefully, we can solve these
problems exactly for circuits up to a few thousand gates
using the publicly available large scale nonlinear programming
solver LANCELOT.
Moderators: B. Becker, Freiburg U, D
K. Kinoshita, Osaka U, JP
-
Optimal Hardware Pattern Generation for Functional BIST [p. 292]
-
S. Cataldo, S. Chiusano, P. Prinetto, H. Wunderlich
Functional BIST is a promising solution for self-testing
complex digital systems at reduced costs in terms of area
and performance degradation. The present paper
addresses the computation of optimal seeds for an
arbitrary sequential module to be used as hardware test
pattern generator. Up to now, only linear feedback shift
registers and accumulator based structures have been used
for deterministic test pattern generation by reseeding.
In this paper, a method is proposed which can be
applied to general finite state machines. Although the
method is fully general, for the sake of comparison with
previous approaches an accumulator-based unit is assumed
as the pattern generator module in this paper. Experiments
prove the effectiveness of the approach which outperforms
previous results for accumulators, in terms of test size and
test time, without sacrificing the fault detection capability.
-
Built-in Generation of Weighted Test Sequences for Synchronous Sequential Circuits [p. 298]
-
I. Pomeranz, S. Reddy
We describe a method for on-chip generation of weighted test
sequences for synchronous sequential circuits. For combinational
circuits, three weights, 0, 0.5 and 1, are sufficient to achieve
coverage of stuck-at faults, since these weights are
sufficient to reproduce any specific test pattern. For sequential
circuits, the weights we use are defined based on subsequences of a deterministic
test sequence. Such weights allow us to reproduce
parts of the test sequence, and help ensure that complete fault
coverage would be obtained by the weighted test sequences generated.
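The claim that the weights 0, 0.5 and 1 can reproduce any specific test pattern can be sketched as follows. This is a software illustration only; the helper names `weighted_patterns` and `weights_for_pattern` are hypothetical, and the on-chip hardware described in the paper is not modeled.

```python
import random

def weighted_patterns(weights, count, seed=0):
    """Generate `count` patterns; input i is 1 with probability weights[i]."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(count):
        # rng.random() lies in [0, 1), so weight 0 always yields 0
        # and weight 1 always yields 1.
        patterns.append([1 if rng.random() < w else 0 for w in weights])
    return patterns

def weights_for_pattern(pattern):
    """Weights (only 0.0 or 1.0) that reproduce one specific pattern."""
    return [float(bit) for bit in pattern]
```

Setting a weight to 0.5 leaves that input random, so a weight vector such as `[1, 0.5, 0]` fixes the first and last inputs while exploring both values of the middle one.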
-
Diagnostic Testing of Embedded Memories using BIST [p. 305]
-
T. Bergfeld, E. Rudnick, D. Niggemeyer
The increasing use of large embedded memories in
Systems-on-Chips requires automatic memory reconfiguration
to avoid the need for external accessibility. In this work,
effective diagnostic memory tests of linear order O(N) are
proposed that enable memory reconfiguration, and their diagnostic
capabilities are analyzed. In particular, these tests
allow single-cell faults to be distinguished from multiple-cell
faults, such as coupling faults. In contrast to conventional
O(N) tests, all cells involved in a fault are detected
and localized, which allows complete reconfiguration using
minimal-area BIST hardware that compares favorably with
other BIST designs.
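To make the notion of a linear-order memory test concrete, here is a minimal sketch of the well-known MATS+ march test run against a simulated memory. It is a generic example chosen for illustration, not one of the diagnostic tests proposed in the paper: it localizes only single-cell stuck-at faults, and the `FaultyMemory` model is hypothetical.

```python
class FaultyMemory:
    """Bit-oriented memory model with optional stuck-at cells (hypothetical)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})  # address -> forced value

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])

def mats_plus(mem, size):
    """MATS+: {any(w0); up(r0, w1); down(r1, w0)}, 5N operations in total."""
    failing = set()
    for a in range(size):                 # M0: write 0 everywhere
        mem.write(a, 0)
    for a in range(size):                 # M1: ascending, read 0 then write 1
        if mem.read(a) != 0:
            failing.add(a)
        mem.write(a, 1)
    for a in reversed(range(size)):       # M2: descending, read 1 then write 0
        if mem.read(a) != 1:
            failing.add(a)
        mem.write(a, 0)
    return sorted(failing)
```

Each cell is accessed a constant number of times, which is what makes the test O(N). Localizing every cell involved in a coupling fault, as the paper describes, requires additional march elements that are not shown here.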
Moderators: P. Eles, Linköping U, SE
R. Hermida, U Complutense Madrid, ES
-
Resolution of Dynamic Memory Allocation and Pointers for the Behavioral Synthesis from C [p. 312]
-
L. Séméria, K. Sato, G. De Micheli
One of the greatest challenges in C/C++-based design
methodology is to efficiently map C/C++ models into hardware.
Many of the networking and multimedia applications implemented
in hardware or mixed hardware/software systems are making use of
complex data structures stored in one or multiple memories. As a
result, many of the C/C++ features which were originally designed
for software applications are now making their way into hardware.
Such features include dynamic memory allocation and pointers
used to manage data. We present a solution for efficiently mapping
arbitrary C code with pointers and malloc/free into hardware.
Our solution fits current memory management methodologies. It
consists of instantiating a hardware allocator tailored to an application
and a memory architecture. Our work also supports the resolution
of pointers without restriction on the data structures. An
implementation using the SUIF framework is presented, followed by
some case studies such as the realization of a video filter.
-
An Integrated Temporal Partitioning and Partial Reconfiguration Technique for
Design Latency Improvement [p. 320]
-
S. Ganesan, R. Vemuri
Partially reconfigurable processors provide the unique ability
by which a part of the device can be reconfigured, while
the remaining part is still operational. In this paper, we
present a novel partitioning methodology that temporally
partitions a design for such a partially reconfigurable processor
and improves design latency by minimizing reconfiguration
overhead. This is achieved by overlapping execution
of one temporal partition with the reconfiguration of another,
using the processor's partial reconfiguration capability.
We have incorporated block-processing in the partitioning
framework for reducing reconfiguration overhead of partitioned
designs. A highlight of our partitioner is its ability
to handle loops and conditional constructs in the input specification.
The proposed methodology was tested on several
examples on the Xilinx 6200 FPGA. The results show significant
reduction in the design latency, leading to a considerable
speed-up due to partial reconfiguration.
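The latency benefit of overlapping execution with reconfiguration can be sketched with a simple timing model. The model below is an assumption for illustration, not the paper's formulation: it supposes the device holds two partitions at once, has a single reconfiguration port, and runs temporal partitions strictly in order. Function names are hypothetical.

```python
def sequential_latency(reconf, execu):
    """Latency when every partition is fully reconfigured, then executed."""
    return sum(reconf) + sum(execu)

def overlapped_latency(reconf, execu):
    """Latency when partition i+1 is loaded while partition i executes.

    Assumption: two on-chip regions are used alternately, so the region
    for partition i becomes free once partition i-2 has finished.
    """
    reconf_end = []
    exec_end = []
    for i, (r, e) in enumerate(zip(reconf, execu)):
        port_free = reconf_end[i - 1] if i >= 1 else 0
        region_free = exec_end[i - 2] if i >= 2 else 0
        reconf_end.append(max(port_free, region_free) + r)
        prev_done = exec_end[i - 1] if i >= 1 else 0
        exec_end.append(max(reconf_end[i], prev_done) + e)
    return exec_end[-1] if exec_end else 0
```

For three partitions with reconfiguration time 4 and execution time 5 each, the sequential schedule takes 27 time units, while the overlapped schedule under these assumptions takes 19.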
-
Target Architecture Oriented High-Level Synthesis for Multi-FPGA based Emulation [p. 326]
-
O. Bringmann, C. Menn, W. Rosenstiel
This paper presents a new approach to combined high-level
synthesis and partitioning for FPGA-based multi-chip
emulation systems. The goal is to synthesize a prototype
with maximal performance under the given area and interconnection
constraints of the target architecture. Interconnection
resources are handled similarly to functional
resources, enabling the scheduling and the sharing of interchip
connections according to their delay. Moreover, data
transfer serialization is performed completely or partially,
depending on the mobility of the data transfers, in order to
satisfy the given interconnection constraints. In contrast to
conventional partitioning approaches, the constraints of the
target architecture are fulfilled by construction.
-
Fast Cache and Bus Power Estimation for Parameterized System-On-A-Chip Design [p. 333]
-
T. Givargis, F. Vahid, J. Henkel
We present a technique for fast estimation of the power
consumed by the cache and bus sub-system of a parameterized
system-on-a-chip design for a given application. The technique
uses a two-step approach of first collecting intermediate data
about an application using simulation, and then using
equations to rapidly predict the performance and power
consumption for each of thousands of possible configurations
of system parameters, such as cache size and associativity and
bus size and encoding. The estimations display good absolute
as well as relative accuracy for various examples, and are
obtained in dramatically less time than other techniques,
making possible the future use of powerful search heuristics.
Keywords: System-on-a-chip, low power, estimation, intellectual property,
cache, on-chip bus.
Moderators: A. Konczykowska, France Telecom CNET, F
R. Schwencker, TU Munich, D
-
Stochastic Modeling and Performance Evaluation for Digital Clock and Data Recovery Circuits [p. 340]
-
A. Demir, P. Feldmann
Clock and data recovery circuits are essential components in communication
systems. They directly influence the bit-error-rate performance
of communication links. It is desirable to predict the rate
of occasional detection errors and the loss of synchronization due
to the non-ideal operation of such circuits. In high-speed data networks,
the bit-error-rate specification on the system can be very
stringent, i.e., 10^-14. It is not feasible to predict such
error rates with straightforward, simulation-based approaches.
This work introduces a stochastic model and an efficient,
analysis-based, non-Monte-Carlo method for performance
evaluation of digital data and clock recovery circuits. The
analyzed circuit is modeled as finite state machines with
inputs described as functions on a Markov state-space. System
performance measures, such as probability of bit errors and
rate of synchronization loss, can be evaluated through the
analysis of a larger resulting Markov system. A multi-grid
method is used to solve the very large associated systems.
The method is illustrated on a real industrial recovery
circuit design.
-
A New Approach for Computation of Timing Jitter in Phase Locked Loops [p. 345]
-
M. Gourary, S. Rusakov, S. Ulyanov, M. Zharov, K. Gullapalli, B. Mulvaney
A new method for computation of timing jitter in a PLL
is proposed. The computational method is based on the
representation of the circuit as a linear time-varying system
with modulated stationary noise models, spectral
decomposition of stochastic process and decomposition of
noise into orthogonal components, i.e., phase and amplitude
noise. The method is illustrated by examples of jitter
computation in PLLs.
-
Compact Modeling of Nonlinear Distortion in Analog Communication Circuits [p. 350]
-
P. Wambacq, P. Dobrovolný, S. Donnay, M. Engels, I. Bolsens
The design of analog front-ends of digital telecommunication transceivers
requires simulations at the architectural level. The nonlinear nature of the
analog front-end blocks is a complication for their modeling at the
architectural level, especially when the nonlinear behavior is frequency
dependent. This paper describes a method to derive a bottom-up model of
nonlinear analog continuous-time circuits used in communication systems.
The models take into account frequency dependence of the nonlinear behavior,
making them suitable for wideband applications. Such a model
consists of a block diagram that corresponds to the most
important contributions to the second- and third-order
Volterra kernels of the output quantity (voltage or
current) of a circuit. The examples in the paper, a
high-level model of a CMOS low-noise amplifier and an
active lowpass filter, demonstrate that the generated
models can be efficiently evaluated in high-level
dataflow-type simulations of mixed-signal front-ends
and that they yield insight into the nonlinear behavior
of the analog front-end blocks.
Moderators: M. Berkelaar, TU Eindhoven, NL
L. Stok, IBM, USA
-
On using Satisfiability-based Pruning Techniques in Covering Algorithms [p. 356]
-
V. Manquinho, J. Marques-Silva
Covering problems are widely used as a modeling tool
in Electronic Design Automation (EDA). Recent years
have seen dramatic improvements in algorithms for the
Unate/Binate Covering Problem (UCP/BCP). Despite these
improvements, BCP is a well-known computationally hard
problem, with many existing real-world instances that currently
are hard or even impossible to solve. In this paper we
apply search pruning techniques from the Boolean Satisfiability
(SAT) domain to BCP. Furthermore, we generalize
these techniques, in particular the ability to backtrack non-chronologically,
to exploit the actual formulation of covering
problems. Experimental results, obtained on representative
instances of the Unate and Binate Covering Problems,
indicate that the proposed techniques provide significant
performance gains for different classes of instances.
-
An Efficient Heuristic Approach to Solve the Unate Covering Problem [p. 364]
-
R. Cordone, F. Ferrandi, D. Sciuto, R. Calvo
The classical solving approach for two-level logic minimisation
reduces the problem to a special case of unate covering and
attacks the latter with a (possibly limited) branch-and-bound algorithm.
We adopt this approach, but we propose a constructive
heuristic algorithm that combines the use of Binary Decision Diagrams
with Lagrangian relaxation. This technique makes it possible to
achieve an effective choice of the elements to include in the solution,
as well as cost-related reductions of the problem and a good
lower bound on the optimum. The results support the effectiveness
of this approach: on a wide set of benchmark problems, the algorithm
nearly always hits the optimum, and in most cases proves it
to be such. On the problems whose optimum is actually unknown,
the best known result is strongly improved.
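For readers unfamiliar with unate covering, the following minimal sketch shows the problem and a constructive heuristic for it. Note that it uses the classic greedy rule (most newly covered rows per unit cost), which is swapped in here for illustration; it is not the BDD and Lagrangian-relaxation heuristic of the paper. All names are hypothetical.

```python
def greedy_unate_cover(rows, columns, cost):
    """Pick columns until every row is covered.

    rows:    iterable of row identifiers that must be covered
    columns: dict mapping column name -> set of rows it covers
    cost:    dict mapping column name -> positive cost
    """
    uncovered = set(rows)
    chosen = []
    while uncovered:
        best = max(
            (c for c in columns if columns[c] & uncovered),
            # Ratio of newly covered rows to cost; name breaks ties.
            key=lambda c: (len(columns[c] & uncovered) / cost[c], c),
            default=None,
        )
        if best is None:
            raise ValueError("some rows cannot be covered")
        chosen.append(best)
        uncovered -= columns[best]
    return chosen
```

The greedy rule gives no optimality proof by itself; a lower bound, such as the one the paper obtains from Lagrangian relaxation, is what allows a heuristic solution to be certified as optimal.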
-
On the Generation of Multiplexer Circuits for Pass Transistor Logic [p. 372]
-
C. Scholl, B. Becker
Pass Transistor Logic has attracted more and more interest
in recent years, since it has proved to be an attractive
alternative to static CMOS designs with respect to area,
performance and power consumption. Existing automatic
PTL synthesis tools use a direct mapping of (decomposed)
BDDs to pass transistors. Thereby, structural properties of
BDDs like the ordering restriction and the fact that the select
signals of the multiplexers (corresponding to BDD nodes)
directly depend on input variables, are imposed on PTL circuits
although they are not necessary for PTL synthesis.
General Multiplexer Circuits can be used instead and
should provide a much higher potential for optimization
compared to a pure BDD approach. Nevertheless -- to the
best of our knowledge -- an optimization of general Multiplexer
Circuits (MCs) for PTL synthesis has not been attempted so far
due to a lack of suitable optimization approaches. In this
paper we present such an algorithm which is based on efficient
BDD optimization techniques. Our experiments prove
that there is indeed a high optimization potential by the use
of general MCs -- both concerning area and depth of the
resulting PTL networks.
Moderators: G. Kosonocky, Intel, USA
S. Pravossoudovitch, LIRMM, F
-
On Applying Incremental Satisfiability to Delay Fault Testing [p. 380]
-
J. Kim, J. Whittemore, J. Marques-Silva, K. Sakallah
The Boolean satisfiability problem (SAT) has various
applications in electronic design automation (EDA) fields
such as testing, timing analysis and logic verification. SAT
has been typically applied to EDA as follows: 1) formulation
of the given problem as a SAT instance 2) solution of
the SAT instance. In this paper, we present a method to
simultaneously solve several closely related SAT instances
using incremental satisfiability (ISAT). In ISAT, the decision
sequence made for a "prefix" function is used to solve
another set of functions which have a number of new constraints
(extensions) added to the prefix function. Our
experiments show that we can achieve significant gains in
total runtime when we use this methodology as opposed to
resetting the decision sequences and solving each instance
from scratch. Application of ISAT to delay fault testing is
presented by formulating incremental path sensitization as
an ISAT problem. Non-robust tests for the combinational
portion of ISCAS 89 circuits are generated using this
method.
-
Automatic Test Bench Generation for Validation of RT-Level Descriptions: An Industrial Experience [p. 385]
-
F. Corno, M. Sonza Reorda, G. Squillero, A. Manzone, A. Pincetti
In current microprocessors and systems, an
increasingly high silicon portion is derived through
automatic synthesis, with designers working exclusively at
the RT-level, and design productivity is greatly enhanced.
However, in the new design flow, validation still remains a
challenge: while new technologies based on formal
verification are only marginally accepted, standard
techniques based on simulation are beginning to fall
behind the increased circuit complexity. This paper
proposes a new approach to simulation-based validation,
in which a Genetic Algorithm helps the designer in
generating useful input sequences to be included in the test
bench. The technique has been applied to an industrial
circuit, showing that the quality of the validation process
is increased.
-
A VHDL Error Simulator for Functional Test Generation [p. 390]
-
A. Fin, F. Fummi
This paper describes an efficient error simulator able to
analyze functional VHDL descriptions. The proposed simulation
environment can be based on commercial VHDL simulators.
All components of the simulation environment are
automatically built starting from the VHDL specification of
the description under test. The effectiveness of the simulator
has been measured by using a random functional test generator.
Functional test patterns produce, on some benchmarks,
a higher gate-level fault coverage than the fault coverage
achieved by a very efficient gate-level test pattern generator.
Moreover, functional test generation requires a fraction
of the time necessary to generate tests at the gate level. This
is due to the possibility of effectively exploring the test patterns
space since error simulation is directly performed at
the VHDL level.
-
Functional Test Generation for Full Scan Circuits [p. 396]
-
I. Pomeranz, S. Reddy
We study the effectiveness of functional tests for full scan
circuits. Functional tests are important for design validation, and
they potentially have a high defect coverage independent of the
circuit implementation. The functional fault model we consider
consists of single state-transition faults. The test generation procedure
we describe uses one of two approaches at any given time
in order to minimize the number of tests while minimizing the
test application time. (1) It may use scan to set the state of the
circuit, and observe fault effects propagated to the next-state
variables. (2) It may use transfer sequences to set the circuit
state, or unique input-output sequences to propagate fault effects
to the primary outputs. We present experimental results to
demonstrate the effectiveness of scan-based functional tests.
Moderators: S. Huss, TU Darmstadt, D
R. Leupers, Dortmund U, D
-
Shared Memory Implementations of Synchronous Dataflow Specification [p. 404]
-
P. Murthy, S. Bhattacharyya
There has been a proliferation of block-diagram environments
for specifying and prototyping DSP systems. These
include tools from academia like Ptolemy [3], and GRAPE
[7], and commercial tools like SPW from Cadence Design
Systems, Cossap from Synopsys, and the HP ADS tool
from HP. The block diagram languages used in these environments
are usually based on dataflow semantics
because various subsets of dataflow have proven to be
good matches for expressing and modeling signal processing
systems. In particular, synchronous dataflow (SDF)[8]
has been found to be a particularly good match for
expressing multirate signal processing systems.
One of the key problems that arises during synthesis from
an SDF specification is scheduling. Past work on scheduling
[1] from SDF has focused on optimization of program
memory and buffer memory. However, in [1], no attempt
was made for overlaying or sharing buffers. In this paper,
we formally tackle the problem of generating optimally
compact schedules for SDF graphs, that also attempt to
minimize buffering memory under the assumption that
buffers will be shared. This will result in schedules whose
data memory usage is drastically lower (up to 83%) than
methods in the past have achieved.
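Any SDF schedule starts from the repetitions vector, i.e., how many times each actor must fire per schedule period so that every buffer returns to its initial state. The sketch below computes it from the standard balance equations; it covers only this preliminary step, not the buffer-sharing optimization of the paper. The function name and the edge format are assumptions, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm  # multi-argument form needs Python 3.9+

def repetitions_vector(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for a connected graph.

    edges: list of (src, dst, produced, consumed) tuples.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        a = pending.pop()
        for src, dst, prod, cons in edges:
            if src == a:
                other, value = dst, rate[a] * prod / cons
            elif dst == a:
                other, value = src, rate[a] * cons / prod
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                raise ValueError("inconsistent sample rates")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer vector.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}
```

For the chain A -> B -> C with rates (2, 3) and (1, 2), the result is A: 3, B: 2, C: 1, so A produces 6 tokens per period and B consumes 6.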
-
Constraint-Driven System Partitioning [p. 411]
-
M. López-Vallejo, J. Grajal, J. López
This paper describes how optimization techniques can be
applied to efficiently solve the constrained co-design problem.
This is performed by the formulation of different cost
functions which will drive the hardware-software partitioning
process. The use of complex cost functions allows us to
capture more aspects of the design. Besides, the appropriate
formulation of this kind of functions has a great impact
on the results that can be obtained regarding both quality
and algorithm convergence rate. A strong point of the proposed
formulation is its generality. Therefore, it does not
depend on the problem and can be easily extended for considering
new design constraints.
-
A System-Level Synthesis Algorithm with Guaranteed Solution Quality [p. 417]
-
U. Shenoy, P. Banerjee, A. Choudhary
Recently a number of heuristic based system-level synthesis
algorithms have been proposed. Though these algorithms
quickly generate good solutions, how close these
solutions are to optimal is a question that is difficult to answer.
While current exact techniques produce optimal results,
they fail to produce them in reasonable time. This paper
presents a synthesis algorithm that produces solutions
of guaranteed quality (optimal in most cases or within a
known bound) with practical synthesis times (a few seconds
to minutes). It takes a unified look (the lack of which is
one of the main sources of sub-optimality in the heuristic
techniques) at different aspects of system synthesis such
as pipelining, selection, allocation, scheduling and FPGA
reconfiguration. Our technique can handle both time constrained
as well as resource constrained synthesis problems.
We present results of our algorithm implemented as part of
the Match project [1] at Northwestern University.
Organizer: Francky Catthoor, IMEC, B
Moderator: Kees Vissers, Philips, USA
Speakers: Francky Catthoor, IMEC, B
Nikil Dutt, UC Irvine, USA
Christoforos Kozyrakis, UC Berkeley, USA
-
How to Solve the Current Memory Access and Data Transfer Bottlenecks: At the Processor
Architecture or at the Compiler Level? [p. 426]
-
F. Catthoor, N. Dutt, C. Kozyrakis
Current processor architectures, both in the programmable
and custom case, become more and more dominated
by the data access bottlenecks in the cache, system
bus and main memory subsystems. In order to provide sufficiently
high data throughput in the emerging era of highly
parallel processors where many arithmetic resources can
work concurrently, novel solutions for the memory access
and data transfer will have to be introduced.
The crucial question we want to address in this hot topic
session is what these novel solutions can be expected to
rely on: mainly innovative processor architecture
ideas, novel approaches in the application compiler/
synthesis technology, or a mix?
Moderators: F.M. Johannes, TU Munich, D
J. Koehl, IBM, D
-
Meeting Delay Constraints in DSM by Minimal Repeater Insertion [p. 436]
-
I. Liu, A. Aziz, D. Wong
We address the problem of inserting repeaters, selected
from a library, at feasible locations in a placed and routed
network to meet user-specified delay constraints. We use
minimal repeater area by taking advantage of slacks available
in the network. Specifically, we transform the problem
into an unconstrained optimization problem and solve
it by iterative local refinement. We show that the optimal
repeater locations and sizes that locally minimize the objective
function in the unconstrained problem can be efficiently
computed. We have implemented our algorithm and tested it
on a set of benchmarks; experimental results are promising.
-
A Bus Delay Reduction Technique Considering Crosstalk [p. 441]
-
K. Hirose, H. Yasuura
As CMOS technology scales down, the horizontal
coupling capacitance between adjacent wires plays a dominant
part in wire load, and crosstalk interference becomes a
serious problem for VLSI design. We focus on the delay
increase caused by crosstalk. On-chip bus delay is maximized
by crosstalk effect when adjacent wires simultaneously switch
for opposite signal transition directions.
This paper proposes a bus delay reduction technique by
intentionally skewing the signal transition timing of adjacent
wires. An approximate equation for bus delay shows that
our delay reduction technique is effective for repeater-inserted
buses. SPICE simulation results show
that a total bus delay reduction of 5% to 20%
can be achieved.
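Why opposite-direction switching maximizes delay, and why skewing helps, can be seen from the usual first-order switch-factor model of coupling capacitance. This model is an assumption for illustration and is not the approximate delay equation derived in the paper; the function name is hypothetical.

```python
def effective_capacitance(c_ground, c_couple, victim, neighbours):
    """Effective load seen by a switching wire under a switch-factor model.

    victim:     +1 (rising) or -1 (falling)
    neighbours: list of +1, -1 or 0 (quiet) for each adjacent wire
    Each coupling capacitance counts 0x if the neighbour switches the
    same way, 1x if it is quiet, and 2x if it switches the opposite way.
    """
    if victim not in (1, -1):
        raise ValueError("victim must be switching")
    return c_ground + sum((1 - victim * n) * c_couple for n in neighbours)
```

If the transitions of adjacent wires are skewed so that they do not overlap with the victim's transition, the neighbours appear quiet and the coupling factor drops from 2 to 1, which reduces the load that determines the delay.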
-
Single Step Current Driven Routing of Multiterminal Signal Nets for Analog Applications [p. 446]
-
T. Adler, E. Barke
We present the single layer router CDR (Current
Driven Router) capable of routing analog multiterminal
signal nets with current driven wire widths. The widths
used during routing are determined by current properties
per terminal gained by simulation or manually specified
by circuit designers.
The algorithm presented computes a Steiner tree
layout satisfying all specified current constraints while
obeying the maximum allowed current densities on all
connections. CDR calculates the Steiner tree topology,
computes the unknown currents of wires connecting two
Steiner points and generates the final Steiner tree layout
in a single step thus eliminating the need for a separate
layout post-processing step common to power and ground
routing algorithms.
CDR uses a connection graph for layout representation
and applies an advanced minimum detour algorithm
in combination with a modified "three-point steinerization"
heuristic for Steiner tree based layout construction.
-
Static Timing Analysis Taking Crosstalk into Account [p. 451]
-
M. Ringe, T. Lindenkreuz, E. Barke
Capacitance coupling can have a significant impact
on gate delay in today's deep submicron circuits. In this paper
we present a static timing analysis tool that calculates
the longest path of synchronous circuits taking the impact
of crosstalk on gate delays into account. We show that passive
modeling of the coupling capacitance can significantly
underestimate the delay and that an assumption of permanent
worst-case coupling unnecessarily overestimates it.
Our method is validated by comparison to Spice simulations.
Moderators: W. Daehn, Infineon Technologies, D
B. Straube, FhG IIS/EAS Dresden, D
-
A New IEEE 1149.1 Boundary Scan Design for the Detection of Delay Defects [p. 458]
-
S. Park, T. Kim
Delay defects on I/O pads, interconnections of a board, or
interconnections among embedded cores can not be tested
with the current IEEE 1149.1 boundary scan design. This
paper introduces a simple design technique which slightly
modifies the TAP controller to test delay defects by
postponing the UpdateDR with EXTEST instruction.
Furthermore 2log(N+2) interconnect test patterns are
proposed for both static and delay testing.
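For orientation, the log(N+2) term matches the well-known modified counting sequence for interconnect test, in which each of N nets receives a unique code that is neither all zeros nor all ones. The sketch below generates only that classic sequence; how the paper arrives at its factor of two is not stated in the abstract, and the function name is hypothetical.

```python
def counting_sequence_patterns(n_nets):
    """Return ceil(log2(n_nets + 2)) parallel test patterns.

    Net i is assigned the code i + 1, so the all-zero code is skipped,
    and the width is chosen so that the all-one code is never reached.
    patterns[t][i] is the value driven on net i in pattern t.
    """
    width = (n_nets + 1).bit_length()  # equals ceil(log2(n_nets + 2))
    codes = [net + 1 for net in range(n_nets)]
    return [[(code >> bit) & 1 for code in codes] for bit in range(width)]
```

Because every net sees a distinct code containing at least one 0 and one 1, any two shorted nets disagree in some pattern and every net is driven to both logic values.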
-
Alternative Test Methods using IEEE 1149.4 [p. 463]
-
U. Kac, F. Novak, S. Macek, M. Zarnik
The IEEE 1149.4 infrastructure is aimed primarily at
printed circuit board (PCB) interconnect test, parametric
test of discrete components and functional test of IC cores.
Methods to perform these tests have been published and experimental
results using evaluation samples of IEEE 1149.4 ICs have been
reported. So far, most attention has been paid to test and
measurement techniques for the first two issues. Proposed
methods typically employ IEEE 1149.4 infrastructure in the
function of a built-in test probe that enables external test
and measurement equipment to access the internal PCB points
via the analog test bus.
This paper describes an alternative approach based on functional
transformation of the tested board by means of the existing
IEEE 1149.4 resources. In this way, an efficient go/no-go
functional test can be performed. Case studies are given to
illustrate the proposed approach.
-
Test Quality and Fault Risk in Digital Filter Datapath BIST [p. 468]
-
L. Goodby, A. Orailoglu
An objective of DSP testing should be to ensure that any
errors due to missed faults are infrequent compared to a
circuit's intrinsic errors, such as overflow. A method is proposed
for quantifying test quality for digital filters by measuring
the risk associated with any untested faults. Techniques
for finding upper bounds on fault activation rates
under worst-case operating conditions are described. These
techniques enable test designers to objectively discriminate
significant missed faults from near-redundant faults, which
are unlikely to be activated in normal operation of the device.
This complements fault coverage as a measure of
test quality, providing a means of locating high-risk missed
faults even in very high coverage test regimes.
-
A Fault Simulation Methodology for MEMS [p. 476]
-
A. Rosing, A. Richardson, A. Dorey
Efficient built-in and external test strategies are becoming
essential in MicroElectroMechanical Systems (MEMS),
especially for high reliability and safety critical
applications. To be realistic however, internal and
external test must be properly validated in terms of fault
coverage. Fault simulation is hence likely to become a
critical utility within the design flow.
This paper will discuss methods for achieving test support
based on the extension of tools and techniques currently
being introduced into the mixed signal ASIC market.
Moderators: T. Kropf, Robert Bosch, D
L. Pierre, CMU/Provence, F
-
Abstraction from Counters: An Application on Real-Time Systems [p. 486]
-
G. Logothetis, K. Schneider
We present abstraction techniques for systems containing
counters, which significantly reduce the state
spaces of such systems and so make their verification efficient. In contrast to previous
approaches, our abstraction technique lifts the entire verification
problem, i.e., also the specification, to the abstract
level.
As an application, we consider the reduction of real-time
systems by replacing discrete clocks of timed automata with
abstract counters. The presented method allows the reduction
of such systems to very small state spaces. As benchmark
examples, we consider the generalized railroad crossing
and Fischer's mutual exclusion protocol.
-
Automatic Abstraction for Worst-Case Analysis of Discrete Systems [p. 494]
-
F. Balarin
Recently, a methodology for worst-case analysis of discrete
systems has been proposed [1, 2]. The methodology
relies on a user-provided abstraction of system components.
In this paper we propose a procedure to automatically
generate such abstractions for system components
with Boolean transition functions. We use a binary decision
diagram (BDD) of the transition function to generate a formula in
Presburger arithmetic representing the desired abstraction.
Our experiments indicate that the approach can
be applied to control-dominated embedded systems.
-
Iterative Abstraction-based CTL Model Checking [p. 502]
-
J. Jang, I. Moon, G. Hachtel
A paradigm for automatic approximation/refinement in
conservative CTL model checking is presented. The approximations
are used to verify a given formula conservatively
by computing upper and lower bounds to the set of
satisfying states at each sub-formula. These approximations
attempt to perform conservative verification with the
least possible number of BDD variables and BDD nodes.
We present new forms of operational graphs to avoid limitations
associated with previously used operational graphs.
Three new techniques for efficient automatic refinement of
approximate system are presented. These methods make it
easier to find the locality. We also present a new type of
don't cares (Approximate Satisfying Don't Cares) that can
make model checking more efficient in time and space. On
average, an order of magnitude speedup was achieved.
Moderator and Organizer: Joseph Borel, STMicroelectronics, F
Panelists: Jean-Jacques Bronner, Alcatel Business Systems, F
Frank Ghenassia, STMicroelectronics, F
Irmtraud Rugen-Herzig, Infineon, D
Wolfgang Rosenstiel, FZI Karlsruhe, D
Anton Sauer, MEDEA Office, F
-
Panel Statement [p. 510]
-
Design automation in Europe needs to be revitalized
through a more cooperative approach to problems and
solutions.
MEDEA has been instrumental in bringing about cooperation
between process and applications and in showing the weaknesses
of design automation solutions from present players. A new
burgeoning wave of start-ups in strategic areas shows
good promise in Europe.
The goal is to launch new design solutions based on standard
market tools, complemented by the early offerings of new European
start-ups in strategic areas. These design solutions will
be dedicated to European needs while addressing global
markets.
The aim of this workshop is to debate the MEDEA
design automation roadmap as a permanent European forum
for the exchange of ideas and new strategic developments for the
European industry.
Moderators: L. Silveira, TU Lisbon, PT
P. Feldmann, Bell Labs, USA
-
Wire-Sizing for Delay Minimization and Ringing Control using Transmission Line Model [p. 512]
-
Y. Gao, D. Wong
In this paper, we consider continuous wire-sizing optimization for
delay minimization and ringing control. The optimization is based
on a fast and accurate delay estimation method under a finite ramp
input, where an analytical expression is also derived to estimate
overshoot/undershoot voltage. We specify the wire
shape to be of the form f(x) = alpha e^{-bx}, since
previous studies under the Elmore delay model suggest that
an exponential wire shape is effective for delay minimization. The
relevant transmission line equations are solved by using the Picard-Carson
method. The transient response in the time domain is derived as
a function of alpha and b.
The coefficients alpha and b are then determined such that either
the actual delay (50% delay) is minimized, or the wiring area is
minimized subject to a delay bound. At the same time, the overshoot/undershoot
voltage is bounded to prevent false switching.
Our method for delay estimation is very efficient. In all the experiments
we performed, it is far more accurate than the Elmore delay
model and the estimated delay values are very close to SPICE's
results. We also find that in determining the optimal shape which
minimizes delay, the Elmore delay model performs as well as
our method in terms of the minimum actual delay it achieves, i.e.
the Elmore delay model has high fidelity. However, in determining
the optimal shape which minimizes area subject to a delay bound,
the Elmore delay model performs much worse than our method.
We also find that the constraint for overshoot/undershoot control
does affect optimization results for both delay and area minimization
objectives.
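To make the two shape coefficients concrete, the wiring area and the total series resistance of an exponentially tapered wire follow directly from integrating the width profile. The sketch below covers only these closed-form integrals, not the paper's transmission-line delay estimate; the sheet-resistance parameter and the function names are assumptions added for illustration.

```python
import math


def wire_area(alpha, b, length):
    """Area under the width profile f(x) = alpha * exp(-b * x) on [0, length]."""
    if b == 0:
        return alpha * length  # uniform wire of width alpha
    return alpha * (1.0 - math.exp(-b * length)) / b


def wire_resistance(r_sheet, alpha, b, length):
    """Series resistance r_sheet * integral of dx / f(x) over [0, length]."""
    if b == 0:
        return r_sheet * length / alpha
    return r_sheet * (math.exp(b * length) - 1.0) / (alpha * b)
```

For fixed alpha and length, a larger b shrinks the area but raises the resistance, which is the trade-off that an area minimization under a delay bound has to balance.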
-
Predicting Coupled Noise in RC Circuits [p. 517]
-
B. Sheehan
A novel method which can be regarded as the noise-counterpart
of the celebrated Elmore's delay formula -- both
being based on the first two moments of the network's
transfer function -- efficiently and accurately predicts
maximum noise between two capacitively coupled RC
networks, without simulation. The method applies to
general topologies (with significant simplification for
coupled trees), accurately models how coupling varies with
driver transition time, and quantifies the uncertainty in the
calculated noise values. Efficient enough for large circuits,
the new method can serve as a key ingredient in CAD
methodologies to ensure that a layout is noise-problem free.
-
Clocktree RLC Extraction with Efficient Inductance Modeling [p. 522]
-
N. Chang, S. Lin, L. He, O. Nakagawa, W. Xie
In this paper, we present an efficient yet accurate
inductance extraction methodology and also apply it to
clocktree RLC extraction. We first show that without loss
of accuracy, the inductance extraction problem of n
traces with or without ground planes can be reduced to
a number of one-trace and two-trace subproblems. We
then solve one-trace and two-trace subproblems via a
table-based approach. We finally validate the linear
cascading assumption that enables us to apply our
inductance extraction approach to clocktree RLC
extraction and optimization.
-
All Digital Built-In Delay and Crosstalk Measurement for On-Chip Buses [p. 527]
-
C. Su, Y. Chen, G. Chen, M. Huang, C. Lee
This paper proposes an all digital on-chip bus delay
and crosstalk measurement methodology. A diagnosis
procedure is derived to distinguish the delay faults in
drivers, receivers, and wires. The crosstalk profile is
plotted by monitoring the changes in delay with the
presence of the crosstalk. The distinguished features
include all digital design and low hardware overhead.
The SPICE simulation results prove the feasibility of
the methodology.
Moderators: P. Wambacq, IMEC, B
F. Fernandez, Seville U, ES
-
A VHDL-based Methodology for Design and Verification of Pipeline A/D Converters [p. 534]
-
E. Peralías, A. Acosta, A. Rueda, J. Huertas
This paper proposes a methodology for designing sampled-data
mixed-signal circuits by using VHDL-based behavioural
descriptions. The goal is to use a VHDL
description of both the analog and the digital part, to simulate
and verify the entire mixed-signal system, as well as to
facilitate the synthesis and fault simulation of the digital
part. As an example of the proposed methodology, a digitally
corrected/calibrated pipeline A/D converter (ADC) has
been designed. Among other aspects of general interest, we
will show how analog dynamic effects are incorporated in
order to obtain accurate high level simulations. Results
from simulations carried out using QuickHDL in Mentor-Graphics
prove the feasibility of the approach and are in
agreement with those obtained experimentally from a Silicon
prototype.
-
Assessing the Cost Effectiveness of Integrated Passives [p. 539]
-
M. Scheffler, G. Tröster
Passive components integrated into a high-density substrate
can be a tolerable way to overcome the size and manufacturing
limits of SMD passives mounted onto the system
board. Still, this technology is perceived as being "too
risky" and not cost effective. In this paper we propose a
"passives optimized" solution combining the advantages
from both SMD and integrated technology and avoiding the
respective drawbacks. Exemplified by a GPS receiver front
end, we present a methodology to assess the possible benefits
when using the mixed technology.
-
Non-Linear Components for Mixed Circuits Analog Front-End [p. 544]
-
L. Carro, A. Souza Jr., M. Negreiros, G. Jahn, D. Franco
This paper presents the development of some front-end
analog circuits for mixed signals systems. The paper
proposes the use of externally linear, internally non-linear
analog circuits. Using this approach, analog area
is greatly reduced, and circuits can be built on top of
completely digital technologies. Experimental results in
the analog and digital domain support the proposed
approach to mixed circuits design.
Moderators: R. Ernst, TU Braunschweig, D
L. Thiele, TU Zurich, CH
-
Static Timing Analysis of Embedded Software on Advanced Processor Architectures [p. 552]
-
A. Hergenhan, W. Rosenstiel
This paper examines several techniques for static timing
analysis. In detail, the first part of the paper analyzes
the connection of prediction accuracy (worst case execution
time) and applicability of a methodology for modeling and
analysis of instruction as well as data cache behavior. The
second part of the paper proposes a timing analysis technique
for super-scalar processors. The objects of our studies
are two processors of the PowerPC family, in particular
the PPC403 and the MPC750.
-
Efficient Resource Arbitration in Reconfigurable Computing Environments [p. 560]
-
I. Ouaiss, R. Vemuri
In a multi-FPGA synthesis system, ideally the designer
has only an abstract view of the board architecture. This abstract
modeling of the underlying reconfigurable computer
poses complex challenges to the synthesis and partitioning
tools. Since the design specification is not constrained by
the number of memory segments on the board or the number
of pins between FPGAs, it is difficult for the CAD tools
to transform the design into one that maps onto the multi-FPGA
board. This paper describes an arbitration mechanism
that bridges the abstraction between the input design
and the reconfigurable architecture. Since this mechanism
allows such architecture abstraction between the design and
the board, it becomes easier to port a design from one target
architecture to another. This arbitration mechanism introduces
very little overhead in terms of area and delay. It
has been used in data-dominated applications; in this paper,
Fast Fourier Transform (FFT) is shown as an illustrative
example.
-
Bus Access Optimization for Distributed Embedded Systems based on Schedulability Analysis [p. 567]
-
P. Pop, P. Eles, Z. Peng
We present an approach to bus access optimization and schedulability
analysis for the synthesis of hard real-time distributed embedded systems.
The communication model is based on a time-triggered protocol. We have
developed an analysis for the communication delays proposing four
different message scheduling policies over a time-triggered communication
channel. Optimization strategies for the bus access scheme are developed,
and the four approaches to message scheduling are compared using
extensive experiments.
Moderator and Organizer: Christopher Lennard, Cadence Design Systems, USA
Speakers: Patrick Schaumont, IMEC, B
Gjalt de Jong, Alcatel, B
Anssi Haverinen, Nokia
Peter Hardee, Coware, USA
-
Standards for System-Level Design: Practical Reality or Solution in Search of a Question? [p. 576]
-
C. Lennard, P. Schaumont, G. de Jong, A. Haverinen, P. Hardee
We address the issue of standards development for
the system-level design space. System-level design
IP re-use standards are key to the future of the VSIA.
However, the concept of system-level standards has
its share of sceptics: what role can standards play in
this developing market segment? In response we
present an overview of three standards in the
system-level VC integration space, and describe two
distinct industrial case studies to support their
practicality.
Moderators: D. Nikolos, Patras U, GR
R. Leveugle, TIMA, Grenoble, F
-
Evaluating System Dependability in a Co-Design Framework [p. 586]
-
M. Lajolo, M. Rebaudengo, M. Sonza Reorda, M. Violante, L. Lavagno
The widespread adoption of embedded microprocessor-based
systems for safety critical applications mandates
the use of co-design tools able to evaluate system
dependability at every step of the design cycle. In this
paper, we describe how Fault Injection techniques have
been integrated in an existing co-design tool and which
advantages come from the availability of such an enhanced
tool. The effectiveness of the proposed tool is
assessed on a simple case study.
-
Cost Reduction and Evaluation of a Temporary Faults Detecting Technique [p. 591]
-
L. Anghel, M. Nicolaidis
IC technologies are approaching the ultimate
limits of silicon in terms of channel width, power supply
and speed. By approaching these limits, circuits are
becoming increasingly sensitive to noise, which will result
in unacceptable rates of soft errors. Furthermore, defect
behavior is becoming increasingly complex, resulting in
an increasing number of timing faults that can escape
detection by fabrication testing. Thus, fault tolerant
techniques will become necessary even for commodity
applications. This work considers the implementation and
improvements of a new soft error and timing error
detecting technique based on time redundancy. Arithmetic
circuits were used as a test vehicle to validate the approach.
Simulations and performance evaluations of the proposed
detection technique were made using time and logic
simulators. The obtained results show that detection of
such temporal faults can be achieved by means of
meaningful hardware and performance cost.
-
Detection of Defective Sensor Elements using sigma-delta-Modulation and a Matched Filter [p. 599]
-
D. Weiler, O. Machul, D. Hammerschmidt, B. Hosticka
We present an integrable solution for detection of
defective sensor elements using sigma-delta-(SD)-modulation
and a matched filter. The sensor element
is stimulated using a pseudo random binary sequence
(PRBS). The sensor signal is read out and
the analog output is digitized using a SD-modulator.
The binary pulse density stream of the SD-modulator
is the output of the sensor system and thus should
ideally contain the PRBS. A matched filter has the
task of detecting the pseudo random sequence in the
pulse density stream and its sampled output is compared
to a threshold thus making it possible to judge
the functionality of the sensor element. By evaluating
the magnitude of the matched filter output it is also
possible to measure the sensor sensitivity. We present
a discrete solution of this method, but an integrated
chip using a standard 1.2 µm CMOS process
has been designed and is being fabricated.
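The signal chain described above can be imitated in a few lines. The sketch below is a behavioural toy model, not the authors' circuit: the 7-bit LFSR, the first-order modulator, the oversampling ratio of 32, the threshold of 0.25 and all function names are assumptions chosen for illustration.

```python
def prbs7(n_chips, seed=0x7F):
    """Pseudo random binary sequence from a 7-bit LFSR, as +1/-1 chips."""
    state, chips = seed, []
    for _ in range(n_chips):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        chips.append(1 if new_bit else -1)
    return chips


def sigma_delta(samples):
    """First-order sigma-delta modulator producing a +1/-1 bit stream."""
    v, bits = 0.0, []
    for x in samples:
        y = 1 if v >= 0 else -1
        v += x - y  # integrate the error between input and fed-back bit
        bits.append(y)
    return bits


def matched_filter_output(bits, chips, osr):
    """Correlate the pulse density stream with the known PRBS."""
    acc = 0
    for k, chip in enumerate(chips):
        acc += chip * sum(bits[k * osr:(k + 1) * osr])
    return acc / (len(chips) * osr)


def check_sensor(sensitivity, n_chips=127, osr=32, threshold=0.25):
    """Return (matched filter output, pass/fail) for a modelled sensor."""
    chips = prbs7(n_chips)
    analog = [sensitivity * c for c in chips for _ in range(osr)]
    score = matched_filter_output(sigma_delta(analog), chips, osr)
    return score, score > threshold
```

In this model a dead element (sensitivity 0) yields a matched filter output of zero, while a working element yields an output close to its sensitivity, which illustrates how the same output can serve both as a pass/fail decision and as a sensitivity estimate.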
Moderators: L. Benini, DEIS, Bologna U, IT
E. Macii, Politecnico di Torino, IT
-
System Level Online Power Management Algorithms [p. 606]
-
D. Ramanathan, R. Gupta
The problem of power management for an embedded system
is to reduce system level power dissipation by shutting
off parts of the system when they are not being used
and turning them back on when they are required.
Algorithms for this problem are online in nature where
the algorithm must operate without access to the
complete data set of its characteristics. In this
paper, we present online algorithms to manage power for
embedded systems and provide experimental analysis to
back up the theoretical results.
Specifically, this paper makes four contributions.
We propose an optimal online algorithm for power
management. We present an analysis of algorithmic
efficiency using a technique called competitive
analysis which is particularly suitable for online
algorithms. Using the analysis technique, we develop
a lower bound for the non-adaptive version of the
power management problem and show that our algorithms that
try to shut down the system based on historical data.
We provide a lower bound for any algorithm that uses
adaptive methods to manage power. We also propose
an algorithm that is independent of the input data
distribution, practical and usable in both hardware
and software systems with guaranteed performance.
Finally, we compare these algorithms with previously
proposed heuristics both theoretically and experimentally.
For the experiments, we model the disk drive of a laptop
computer as an embedded system. The results show that
the proposed algorithms perform well in practice
with guaranteed bounds on their performance. Further,
this paper conclusively demonstrates that to implement
aggressive power management techniques for power
critical subsystems, designers will have to commit
greater resources such as dedicated registers and
ALU units.
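Competitive analysis compares an online policy with an offline optimum that knows the idle period in advance. The sketch below uses the classic break-even timeout policy, which is a textbook example and not necessarily the algorithm proposed in the paper; the energy model (constant idle power, a single wake-up energy, zero sleep power) and the function names are assumptions.

```python
def offline_cost(idle_time, p_idle, e_wake):
    """Energy of the optimum that knows idle_time in advance:
    either stay on for the whole period or shut down immediately."""
    return min(p_idle * idle_time, e_wake)


def timeout_cost(idle_time, p_idle, e_wake):
    """Energy of the online policy that stays on until the break-even
    time e_wake / p_idle has elapsed and only then shuts down."""
    timeout = e_wake / p_idle
    if idle_time <= timeout:
        return p_idle * idle_time  # the period ended before the timeout
    return p_idle * timeout + e_wake  # waited, then paid for shutdown and wake-up
```

Under this model the timeout policy never spends more than twice the offline optimum, whatever the length of the idle period; that ratio is what a competitive bound expresses.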
-
Architectural Power Optimization by Bus Splitting [p. 612]
-
C. Hsieh, M. Pedram
A split-bus architecture is proposed to
improve the power dissipation for global data exchange
among a set of modules. The resulting bus splitting
problem is formulated and solved combinatorially.
Experimental results show that the power saving of the
split-bus architecture compared to the monolithic-bus
architecture varies from 16% to 50%, depending on the
characteristics of the data transfer among the modules and
the configuration of the split bus. The proposed split-bus
architecture can be extended to multi-way split-bus when
a large number of modules are to be connected.
-
A Power Reduction Technique with Object Code Merging for Application Specific Embedded Processors [p. 617]
-
T. Ishihara, H. Yasuura
In this paper, a power reduction technique
which merges frequently executed sequences of object codes
into a set of single instructions is proposed. The merged sequence
of object codes is restored by an instruction decompressor
before decoding the object codes. The decompressor
is implemented by a ROM. In many programs, only a few
sequences of object codes are frequently executed. Therefore,
merging these frequently executed sequences into
single instructions leads to a significant energy reduction.
Our experiments with actual read-only memory (ROM)
modules and some benchmark programs demonstrate
energy reductions of more than 65% in the best case
over an instruction memory without object code merging.
-
Automating RT-Level Operand Isolation to Minimize Power Consumption in Datapaths [p. 624]
-
M. Münch, B. Wurth, R. Mehra, J. Sproch, N. Wehn
Designs which do not fully utilize their arithmetic datapath
components typically exhibit a significant overhead
in power consumption. Whenever a module performs an
operation whose result is not used in the downstream circuit,
power is being consumed for an otherwise redundant
computation. Operand isolation [3] is a technique to minimize
the power overhead incurred by redundant operations
by selectively blocking the propagation of switching activity
through the circuit.
This paper discusses how redundant operations can be
identified concurrently with normal circuit operation, and
presents a model to estimate the power savings that can be
obtained by isolation of selected modules at the register-transfer
(RT) level. Based on this model, an algorithm is
presented to iteratively isolate modules while minimizing
the cost incurred by RTL operand isolation. Experimental
results with power reductions of up to 30% demonstrate the
effectiveness of the approach.
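The effect of operand isolation can be illustrated by counting bit toggles at a module's input with and without gating. The sketch below is a toy activity count, not the power model of the paper: AND-style gating to zero is only one possible isolation style, and the function names and example data are hypothetical.

```python
def toggles(values, start=0):
    """Count bit flips seen at a module input over a sequence of values."""
    total, prev = 0, start
    for v in values:
        total += bin(prev ^ v).count("1")
        prev = v
    return total


def isolate(operands, active):
    """AND-style isolation: force the operand to 0 whenever the
    module's result is not used in that cycle."""
    return [op if act else 0 for op, act in zip(operands, active)]


operands = [0b1010, 0b0101, 0b1111, 0b0000, 0b1100]
active = [True, False, False, False, True]
print(toggles(operands), toggles(isolate(operands, active)))  # prints "14 6"
```

Gating to zero adds its own transitions when the activation signal changes, which is one reason the cost of isolation has to be weighed against the savings for each candidate module.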
Co-Organized with IEEE Design & Test of Computers
Moderator and Organizer: Rolf Ernst, TU Braunschweig, D
Panelists: Oz Levia, Improv Systems, USA
Grant Martin, Cadence Design Systems, USA
Pierre Paulin, STMicroelectronics, F
Kees Vissers, Philips Research, NL
Vassiladis Stamatis, TU Delft, NL
-
Panel Statement: The Future of Flexible HW Platform Architectures [p. 634]
-
Flexible hardware platform architectures have emerged as
a competitor and alternative to specialized SOC designs.
Important applications are mobile systems or multimedia
devices. Design cost and design time reduction are the main
benefits of such platforms while cost and power efficiency
are the main concerns. Today, hardware platforms own a
small but growing share of the embedded systems market in
wireless and multimedia terminals, and telecommunication
and automotive devices.
With the increasing importance of embedded software and
the demand for system function updates over a product's
lifetime, supported by a ubiquitous Internet access,
programmable HW platforms might become dominant
players. The success very much depends on the efficiency
and flexibility of the programmable platform when
compared to dedicated hardware solutions.
Organizer and Speaker: Sani Nassif, IBM Austin, USA
-
Designing Closer to the Edge [p. 636]
-
S. Nassif
Modern deep submicron CMOS processes cost $2B
or more to develop, qualify and deploy. Yet the incremental
impact of each technology generation has
been steadily decreasing due to a variety of phenomena
such as increasing wire delay, power dissipation
and reliability limits, and increasing process tolerances.
This increase is portrayed in Figure 1 which
shows the SIA Roadmap[1] predictions of variability
for five technologies in the 250 to 70nm gate length
regime. These observations lead to the conclusion
that we need to make better use of existing and future
manufacturing processes in order to recoup our
investment.
Moderators: J. Pineda, Philips Research, NL
P. Teixeira, IST/INESC, PT
-
Reducing the Complexity of Defect Level Modeling using the Clustering Effect [p. 640]
-
J. de Sousa, V. Agrawal
Accounting for the clustering effect is fundamental to
increase the accuracy of Defect Level (DL) modeling. This
result has long been known in yield modeling but, as far
as known, only one DL model directly accounts for it. In this paper, we
improve this model, reducing its number of parameters from three to two
by noticing that multiple faults caused by a single defect can also be
modeled as additional clustering. Our result is supported by test
data from a real production line. Keywords:
defect clustering, defect level, fault clustering, fault coverage,
reject ratio.
-
Influence of Manufacturing Variations in IDDQ Measurements: A New Test Criterion [p. 645]
-
J. Díez, J. López
This work presents a new IDDQ-based test criterion supported
by the characteristics of a set of experimental testing
measurements realized over different samples of industrial
ICs and by the definition of the corresponding simulation
model. When the current consumptions of a specific
circuit are compared, a significant correlation between measurements can
be observed. The current behaviour can be divided into
two parts: (1) a circuit dependent one, which has a major
contribution, and affects equally all the devices in a given
die, and (2) a smaller die dependent fraction due to variations,
defective and non-defective, of each of the devices
of a specific die. In this paper, a current model is defined,
introducing the effects of manufacturing variations in the
basic equations of the sub-threshold current to explain that
double behaviour. The results show how it is possible to obtain
a lot of information from IDDQ measurements and how
other test selection criteria can be applied to increase the
IDDQ testing sensitivity and quality.
-
Parametric Fault Simulation and Test Vector Generation [p. 650]
-
K. Saab, N. Ben-Hamida, B. Kaminska
Process variation has always been the major cause of failure
in analog circuits, where small deviations in component
values cause large deviations in the measured output
parameters. This paper presents a new approach for
parametric fault simulation and test vector generation.
The proposed approach utilizes the process information
and the sensitivity of the circuit principal components in
order to generate statistical models of the fault-free and
the faulty circuit. The obtained information is then used
as a measurement to quantify the testability of the circuit.
This approach, extended by hard fault testing, has been
implemented as an automated tool set for IC testing called
FaultMaxx and TestMaxx.
Moderators: M. Pfaff, Linz U, A
H. Fleurkens, Philips Research, NL
-
Parallel and Distributed VHDL Simulation [p. 658]
-
D. Lungeanu, C. Shi
This paper presents a methodology for parallel and distributed
simulation of VHDL using the PDES (parallel
discrete-event simulation) paradigm. To achieve better features
and performance, some PDES protocols assume that
simultaneous events may be processed in arbitrary order.
We describe a solution of how to apply these algorithms
to have a correct simulation of the distributed VHDL cycle,
including the delta cycle. The solution is based on tie-breaking
the simultaneous events using Lamport's logical
clocks to causally order them according to the VHDL simulation
cycle, and defining the VHDL virtual time as a pair of
simulation physical time and cycle/phase logical time. The
paper also shows how to use this method with a PDES protocol
that relaxes the simulation of simultaneous events to
arbitrary order, allowing the LPs to self-adapt to optimistic
or conservative mode, without the lookahead requirement.
The lookahead is application-dependent and for some systems
may be zero or unknown. The parallel simulation of
VHDL designs ranging from 5531 to 14704 LPs using these
methods obtained a promising, almost linear speedup.
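The idea of a virtual time made of a physical part and a logical part can be sketched with a lexicographically ordered pair. The code below illustrates only that ordering; it is not the paper's protocol, and the field names, the reset of the logical part to zero on a time advance and the helper functions are assumptions.

```python
import heapq
from typing import NamedTuple


class VirtualTime(NamedTuple):
    physical: int  # simulation time, e.g. in femtoseconds
    logical: int   # cycle/phase counter within one physical time


def after_delta(now):
    """Time stamp of an event scheduled one delta cycle later."""
    return VirtualTime(now.physical, now.logical + 1)


def after_delay(now, delay):
    """Time stamp of an event scheduled after a non-zero physical delay."""
    return VirtualTime(now.physical + delay, 0)


def drain_in_order(events):
    """Pop (VirtualTime, name) events in virtual-time order."""
    heap = list(events)
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order
```

Because tuples compare field by field, two events with the same physical time are ordered by their logical component, so simultaneous events are no longer processed in arbitrary order.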
-
Fast Hardware-Software Coverification by Optimistic Execution of Real Processor [p. 663]
-
S. Yoo, J. Lee, J. Jung, K. Rha, Y. Cho, K. Choi
To achieve fast verification of the software part of an embedded
system, we propose to run the target processor optimistically,
which effectively reduces the synchronization overhead
with other simulators. For the optimistic processor
execution, we present a processor execution platform and
state saving/restoration methods. We performed optimistic
execution of an ARM710A processor in the coverification of an
IS-95 CDMA cellular phone system and obtained up to orders
of magnitude higher performance than when
the processor runs conservatively.
-
Retargeting of Compiled Simulators for Digital Signal Processors using a Machine Description Language [p. 669]
-
S. Pees, A. Hoffmann, H. Meyr
This paper presents a methodology to retarget the technique
of compiled simulation for Digital Signal Processors
(DSPs) using the modeling language LISA. In the
past, the principle of compiled simulation as means for
speeding up simulators has only been implemented for
specific DSP architectures. The new approach presented
here discusses methods of integrating compiled simulation
techniques to retargetable simulation tools. The
principle and the implementation are discussed in this
paper and results for the TI TMS320C6201 DSP are
presented.
-
Logic Simulation using Networks of State Machines [p. 674]
-
P. Maurer
This paper shows how to simulate a circuit as an
interlocked collection of state machines. Separate state-machines
are used to represent nets and gates. The
technique permits intermixing of logic models, direct
simulation of higher-level functions, and optimization
techniques for fanout free circuits. These techniques are
an extension of techniques that have been used to achieve
high-performance event-driven simulations. New, more
efficient state-machine implementations are presented,
and experimental data is presented that shows the
efficiency of the new techniques.
-
A New Partitioning Method for Parallel Simulation of VLSI Circuits on Transistor Level [p. 679]
-
N. Fröhlich, V. Glöckel, J. Fleischmann
Simulation is still one of the most important subtasks
when designing a VLSI circuit. However, more and more elements
on a chip increase simulation runtimes. Especially
on transistor level with highly accurate element modelling,
long simulation runtimes of typically several hours delay
the design process. One possibility to reduce these runtimes
is to divide the circuit into several partitions and to simulate
the partitions in parallel. But the success of such a parallel
simulation depends heavily on the quality of the
partitioning. This paper presents a new approach for partitioning
VLSI circuits on transistor level and gives runtimes
of parallel simulations of large industrial circuits. The
resulting runtimes show considerable improvement compared
to a known partitioning method, the Node Tearing
method [10].
Moderators: J. Madsen, TU Denmark, D
D. Verkest, IMEC, B
-
From High-Level Specifications down to Software Implementations of
Parallel Embedded Real-Time Systems [p. 686]
-
C. Rust, F. Stappert, P. Altenbernd, J. Tacken
In this paper we describe a methodology and accompanying
tool support for the development of parallel and
distributed embedded real-time system software. The presented
approach comprises the complete design flow from
the modeling of a distributed controller system by means
of a high-level graphical language down to the synthesis
of executable code for a given target hardware, whereby
the implementation is verified to meet hard real-time constraints.
The methodology is mainly based upon the tools
SEA (System Engineering and Animation) and CHaRy (The
C-LAB Hard Real-Time System).
-
An Object Oriented Design Method for Reconfigurable Computing Systems [p. 692]
-
M. Edwards, P. Green
We present a novel method for developing
reconfigurable systems targeted at embedded system
applications. We show how an existing object oriented
design method (MOOSE) has been adapted and enhanced
to include reconfigurable hardware (FPGAs). Our work
represents a significant advance over current embedded
system design methods in that it integrates the use of
reconfigurable hardware components with a systematic
design method for complete systems. The objective is to
produce an object oriented design methodology where
system objects can be seamlessly implemented in either
software or reconfigurable hardware.
-
System Synthesis for Multiprocessor Embedded Applications [p. 697]
-
L. Carro, M. Kreutz, F. Wagner, M. Oyamada
This paper presents the system synthesis techniques
available in S3E2S, a CAD environment for the
specification, simulation, and synthesis of embedded
electronic systems that can be modeled as a combination
of analog parts, digital hardware, and software. S3E2S
is based on a distributed, object-oriented system model,
where objects are initially modeled by their abstract
behavior and may be later refined into digital or analog
hardware and software. System synthesis is targeted to a
multiprocessor platform. Each processor, either a
custom-designed one or an off-the-shelf component, can
have a specialized behavior, like signal processing or
control processing. The environment selects processors
that best match the desired application by analyzing and
comparing processor and application characteristics.
The paper illustrates the architecture selection process
with concrete examples.
-
System Design based on Single Language and Single-Chip Java ASIP Microcontroller [p. 703]
-
S. Ito, L. Carro, R. Jacobi
Microcontrollers have been playing an important role in the embedded market.
However, the designer of microcontroller based systems must deal with
different languages and tools in hardware and software development,
in addition to their distinct design processes. This paper presents a new design
strategy to implement embedded applications described solely in Java,
while maintaining software compatibility throughout the design process.
Moreover, the target hardware is a single-chip FPGA, whose low cost and
easy reconfiguration are exploited to customize the microcontroller.
This paper presents the environment and some results of system synthesis.
Moderators: S. Hellebrand, Stuttgart U, D
B. Bennetts, Bennetts Associates, UK
-
Cost and Benefit Models for Logic and Memory BIST [p. 710]
-
J. Lu, C. Wu
We present cost and benefit models and analyze the economic
effects of built-in self-test (BIST) for logic and memory
cores. In our cost and benefit models for BIST, we take
into consideration the design verification time and test development
time associated with testability. Experimental results
for logic BIST and memory BIST examples show that
a threshold volume exists when BIST is profitable for the
logic core under consideration -- it is not recommended for
a higher volume. However, BIST is a good choice for memory
cores in general.
-
Scan Latch Partitioning into Multiple Scan Chains for Power Minimization in
Full Scan Sequential Circuits [p. 715]
-
N. Nicolici, B. Al-Hashimi
Power dissipated during test application is substantially
higher than power dissipated during functional operation
[22], which can decrease the reliability and lead to yield
loss. This paper presents a new technique for power
minimization during test application in full scan sequential
circuits. The technique is based on classifying scan
latches into compatible, incompatible and independent scan
latches. Based on their classification scan latches are partitioned
into multiple scan chains. A new test application
strategy which applies an extra test vector to primary inputs
while shifting out test responses for each scan chain, minimizes
power dissipation by eliminating the spurious transitions
which occur in the combinational part of the circuit.
Unlike previous approaches [9] which are test vector and
scan latch order dependent and hence are not able to handle
large circuits due to the complexity of the design space, this
paper shows that with low test area and test data overhead
substantial savings in power dissipation during test application
are achieved in very low computational time. For
example, in the case of benchmark circuit s15850 it takes
3600s in computational time and 1% in test area and
test data overhead to achieve 80% savings in power dissipation.
-
Detecting Undetectable Controller Faults using Power Analysis [p. 723]
-
J. Carletta, C. Papachristou, M. Nourani
In systems consisting of interacting datapaths and controllers, the
datapaths and controllers are traditionally tested separately by isolating
each component from the environment of the system during
test. This is not possible when the controller-datapath pair is an
embedded system designed as a hard core. This work facilitates
the testing of controller-datapath pairs in a truly integrated fashion.
The key to the approach is a careful examination of the types of gate
level stuck-at faults that can occur within the controller. A class
of faults that are undetectable in an integrated test by traditional
means is identified. These faults create faulty but functional circuits.
The effect of these faults on power consumption is explored,
and a method based on power analysis is given for detecting these
faults. Analysis is given for three example systems.
-
Multi-Node Static Logic Implications for Redundancy Identification [p. 729]
-
K. Gulrajani, M. Hsiao
This paper presents a method for redundancy
identification (RID) using multi-node logic implications.
The algorithm discovers a large
number of direct and indirect implications by
extending single node implications [7] to multiple
nodes. The large number of implications
found by multi-node implication method introduces
a new redundancy identification technique. Our approach
uses an effective node-pair
selection method which is O(n) in the number of
nodes to reduce execution time, and it can be
used as an efficient preprocessing phase for test
generation. Application of these multi-node
static logic implications uncovered more redundancies
in ISCAS85 combinational circuits than
previous single-node methods without excessive
computational effort.
-
Dynamic Power Management of Laptop Hard Disk [p. 736]
-
T. Simunic, L. Benini, P. Glynn, G. De Micheli
Optimal power management policies for laptop hard
disks are obtained with a system model that can handle
non-exponential interarrival times in the idle and the sleep
states. Measurement results on a Sony Vaio laptop show
that our policy consumes 1.7 times less power than
the default Windows timeout policy while still maintaining
high performance.
-
Lower Bounds on the Power Consumption in Scheduled Data Flow Graphs with Resource Constraints [p. 737]
-
L. Kruse, E. Schmidt, G. Jochens, A. Stammermann, W. Nebel
The problem of estimating lower bounds on the power
consumption in scheduled data flow graphs with a fixed
number of allocated resources prior to binding is
addressed. The estimated bound takes into account the
effects of resource sharing. It is shown that by introducing
Lagrangian multipliers and relaxing the low power binding
problem to the Assignment Problem, which can be
solved in polynomial time, a tight and fast computable bound is
achievable. Experimental results show the good quality of
the bound. In most cases, deviations smaller than 5% from
the optimal binding were observed. The proposed technique
can for example be applied in branch and bound
high-level synthesis algorithms for efficient pruning of the
design space.
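To make the target of the relaxation concrete, the following minimal sketch solves a tiny Assignment Problem by brute force. The cost matrix is hypothetical, and the exhaustive search is only a stand-in for a polynomial-time solver; it does not reproduce the Lagrangian bound of the paper.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (best_cost, assignment) for a square cost matrix.

    cost[i][j] is the (hypothetical) power cost of binding operation i
    to resource j. All permutations are tried, so this is only usable
    for very small instances.
    """
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm


# Hypothetical costs for three operations and three resources.
cost = [
    [4, 2, 8],
    [4, 3, 7],
    [3, 1, 6],
]
best_cost, assignment = min_cost_assignment(cost)
```

In a branch and bound setting, a value like `best_cost` would serve as the bound against which partial solutions are pruned.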
-
Area Optimization of Analog Circuits Considering Matching Constraints [p. 738]
-
C. Paulus, U. Kleine, R. Thewes
A new, fully analytical method is presented to optimize
active device area in complex, device mismatch sensitive
analog circuits. It represents an efficient alternative to time
consuming Monte-Carlo simulations and numerical iteration
procedures for design centering.
-
XFridge: A SPICE-based, Portable, User-Friendly Cell-Level Sizing Tool [p. 739]
-
F. Pérez-Montes, F. Medeiro, R. Domínguez-Castro, F. Fernández, A. Rodríguez-Vázquez
This paper presents a user-friendly tool which allows
automated sizing of IC cells. It comprises an open optimization-based
sizing program, a database which allows
knowledge re-use and also easy addition of new knowledge,
and a powerful graphical user interface.
-
Evaluation of Interconnects with TDR [p. 740]
-
U. Pillkahn
In this paper, a novel technique is presented for the
verification of board level connections on PCBs. The
time domain reflectometry (TDR) method is used to
identify whether a pin connection is faulty or not.
The test-pulse and evaluation circuitry is part of the
chip. Although the chip size increases slightly, the
method is highly efficient. No Automatic Test
Equipment (ATE) is necessary to carry out the test
and since only the physical behaviour of the
connection from the internal driver via pin to board is
examined, no test vectors are needed. The test time
and the test preparation time are lower compared
with conventional test methods.
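The principle behind TDR can be sketched with the standard reflection coefficient of a line of characteristic impedance Z0 terminated by a load ZL, namely (ZL - Z0) / (ZL + Z0). The function names, the 50 ohm default, and the tolerance below are assumptions for illustration and do not describe the on-chip circuitry of the paper.

```python
import math


def reflection_coefficient(z_load, z0=50.0):
    """Reflection coefficient of a line with impedance z0 ended by z_load."""
    if math.isinf(z_load):
        return 1.0  # ideal open circuit reflects the full pulse
    return (z_load - z0) / (z_load + z0)


def classify_connection(z_load, z0=50.0, tolerance=0.2):
    """Classify a pin connection from the reflected pulse (hypothetical thresholds)."""
    gamma = reflection_coefficient(z_load, z0)
    if gamma > 1.0 - tolerance:
        return "open"
    if gamma < -1.0 + tolerance:
        return "short"
    return "connected"
```

For example, `classify_connection(math.inf)` reports an open connection, while a load equal to the line impedance produces no reflection.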
-
Structural Testing on Real Boards [p. 741]
-
P. Bach, M. Bosch
For structural interconnect testing a graph is generated
from the physical layout of the interconnects. The vertices
are then colored. The number of colors determines
the number of different serial test patterns needed. Based
on real PCB layout data we give experimental results that
show how the choice of the graph generation method and
of the coloring algorithm influences the number of colors.
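A minimal sketch of the coloring step is given below. It uses a greedy, largest-degree-first heuristic, which is just one possible coloring algorithm and not necessarily one evaluated in the paper. The net names and the adjacency (an edge meaning two nets could be shorted) are hypothetical.

```python
def greedy_coloring(adjacency):
    """Color a graph greedily, visiting vertices by decreasing degree.

    adjacency maps each net to the set of nets it shares an edge with.
    Returns a dict mapping each net to a color index starting at 0.
    """
    colors = {}
    order = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    for node in order:
        used = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# Hypothetical interconnect graph derived from a layout.
nets = {
    "n1": {"n2", "n3"},
    "n2": {"n1", "n3"},
    "n3": {"n1", "n2", "n4"},
    "n4": {"n3"},
}
coloring = greedy_coloring(nets)
num_patterns = max(coloring.values()) + 1  # one serial test pattern per color
```

Nets that share a color are never adjacent in the graph, so they can receive the same serial test pattern.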
-
Cycle-True Simulation of the ST10 Microcontroller [p. 742]
-
L. Gauthier, A. Jerraya
With the rising complexity of electronic systems, which contain
more and more hardware and software parts, it becomes
necessary to simulate hardware and software parts
simultaneously at any abstraction level. These simulation
techniques, called co-simulation, require fast and
flexible simulators. In this paper, we introduce the elaboration
of a microcontroller simulator for an accurate hardware/
software co-simulation at the clock-cycle level. It is
our goal to have a simulator which is fast enough to simulate
a few minutes of real time execution within a reasonable
lapse of time. To be more precise, we deal here with the realization
of a simulator for the ST10 microcontroller and its
integration into a co-simulation environment.
-
Cycle-based Simulation Algorithms for Digital Systems using High-Level Decision Diagrams [p. 743]
-
A. Morawiec, R. Ubar, J. Raik
The paper addresses the problem of speeding up functional
cycle-based simulation of digital systems. The system
is represented as a network of interconnected Decision
Diagrams (DD). Three new simulation
algorithms are introduced to implement the idea of
simulation execution according to activities of the system
variables: a forward event-driven algorithm and two
versions of back-tracing algorithms. Experiments are
presented to show the simulation efficiency improvement
offered by those algorithms.
-
Mixed-Signal BIST using Correlation and Reconfigurable Hardware [p. 744]
-
J. da Silva, J. Duarte, J. Matos
Reducing the area overhead required by BIST structures
can be achieved by reconfiguring existing hardware to perform
test related control and processing functions. This
work shows how the resources required for these operations
can be implemented in-circuit, taking advantage of
programmable logic available in the system. Structural and
functional tests are performed using correlation to obtain
iDD and vOUT cross-correlation
signatures, and to measure gain, phase, and total harmonic distortion.
-
An Experimental Study of Satisfiability Search Heuristics [p. 745]
-
F. Aloul, J. Marques-Silva, K. Sakallah
Interest in propositional satisfiability (SAT) has been on
the rise lately, spurred in part by the recent availability of
powerful solvers that are sufficiently efficient and robust to
deal with the large-scale SAT problems that typically arise in
electronic design automation applications. A frequent question
that CAD tool developers and users typically ask is
which of these various solvers is "best;" the quick answer is,
of course, "it depends." In this paper we attempt to gain
some insight into, rather than definitively answer, this question.
-
A Memory Architecture with 4-Address Configurations for Video Signal Processing [p. 746]
-
S. Chang, J. Kim, L. Kim
A memory architecture with four address configurations
is proposed for video signal processing. The implemented
8-words X 64-bits 8-port SRAM has 256-bit simultaneous
data accessibility by horizontal and vertical address configurations
and has 25.6 Gbits/s of high bandwidth.
-
A Hardware Platform for VLIW based Emulation of Digital Designs [p. 747]
-
G. Haug, U. Kebschull, W. Rosenstiel
In [2] the concept of a very long instruction word
(VLIW) processor based system to emulate synthesized RT-level
descriptions has been presented. As described in [2]
the RAVE System (RT-Architecture-VLIW-Emulator) overcomes
many of the problems common to FPGA based emulation
and prototyping systems. Particularly, these are
area problems in conjunction with large data paths, long
turnaround times and low emulation clock frequencies. This
abstract briefly describes the hardware of the RAVE System.
-
Architecture Exploration of Parameterizable EPIC SOC Architectures [p. 748]
-
A. Halambi, R. Cornea, P. Grun, N. Dutt, A. Nicolau
Design Space Exploration (DSE) of programmable systems-on-chip (SOC) incorporating parameterizable processor cores
is difficult due to the complex and intrinsically non-structured
interactions between different architectural features of the processor
(such as wide parallelism, and deep pipelines), the compiler
and the application. Changing different processor features
implies generating detailed operation conflict information --
represented as Reservation Tables (RTs). If done manually,
it can be a very tedious and error prone task, especially for
deep pipelines, with complex resource sharing and large non-structured
instruction sets. In this paper we use RTGEN[2], an
approach for automatic generation of RTs, to drive rapid architectural
exploration of a large number of designs. We present
exploration experiments on a large set of VLIW-like EPIC architectures,
for varying port sharing, number of functional units,
multicycling units, and with varied latency configurations. Our
experiments uncovered several non-intuitive architecture design
points, giving the system-level designer further flexibility in exploration
of programmable SOC architectures.
-
Improving the Schedule Quality of Static-List Time-Constrained Scheduling [p. 749]
-
S. Govindarajan, R. Vemuri
The most compelling reason for High-Level Synthesis (HLS)
to be accepted in the state-of-the-art CAD flow is its ability
to perform design space exploration. Design space exploration
requires efficient scheduling techniques that have
a low complexity and yet produce good quality schedules.
The Time-Constrained Scheduling (TCS) problem minimizes
the number of functional units required to schedule a particular
Data Flow Graph (DFG) within a specified number
of time steps. Over the past few years a number of techniques
[1, 2] have been proposed to solve the TCS problem.
Heuristic list scheduling algorithms have been widely used
for their low complexity and good performance. The complexity
of a dynamic-list scheduling algorithm, such as
Force Directed Scheduling (FDS), is O(T * N^2), where
T is the time constraint and N is the number of operations.
Static-list scheduling algorithms [1, 2] are the least complex
among the known class of scheduling techniques, with
a linear time complexity of O(T * N). Typically, static-list
algorithms, in order to maintain low complexity,
do not perform any look-ahead like that of FDS. The drawback
is that static-list scheduling algorithms may not generate
high-quality schedules.
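The idea of static-list time-constrained scheduling can be sketched as follows, assuming unit-latency operations of a single type and ALAP times as the fixed priority. These choices, the function names, and the example graph are assumptions for illustration; this simple version makes no claim to the O(T * N) complexity quoted above and is not the improved method of the paper.

```python
def alap_times(succs, num_steps):
    """Latest start step of each unit-latency operation for a given deadline."""
    alap = {}

    def visit(op):
        if op not in alap:
            alap[op] = min((visit(s) - 1 for s in succs[op]), default=num_steps)
        return alap[op]

    for op in succs:
        visit(op)
    return alap


def static_list_schedule(succs, num_steps, num_units):
    """Return {op: step} if all operations fit into num_steps, else None."""
    preds = {op: set() for op in succs}
    for op, targets in succs.items():
        for target in targets:
            preds[target].add(op)
    alap = alap_times(succs, num_steps)
    # The priority list is built once and never updated (static list).
    order = sorted(succs, key=lambda op: (alap[op], op))
    schedule = {}
    for step in range(1, num_steps + 1):
        ready = [op for op in order
                 if op not in schedule
                 and all(schedule.get(p, step) < step for p in preds[op])]
        for op in ready[:num_units]:
            schedule[op] = step
    return schedule if len(schedule) == len(succs) else None


def min_units(succs, num_steps):
    """Smallest number of functional units for which the list schedule fits."""
    for units in range(1, len(succs) + 1):
        if static_list_schedule(succs, num_steps, units) is not None:
            return units
    return None


# Hypothetical data flow graph: a and b feed c; c and d feed e.
dfg = {"a": ["c"], "b": ["c"], "c": ["e"], "d": ["e"], "e": []}
```

Because the priorities are never recomputed while scheduling, no look-ahead takes place, which is the source of both the low cost and the possible loss in schedule quality.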
-
Synthesis for Mixed CMOS/PTL Logic [p. 750]
-
C. Yang, M. Ciesielski
High noise immunity and level-restoring capabilities of static
CMOS gates, combined with small area and low power of PTL
cells, make a mixed CMOS/PTL design style an ideal alternative
to the all-CMOS technology. However, the synthesis of mixed
CMOS/PTL circuits imposes a great challenge to the existing
synthesis methodology. Neither traditional techniques based on
algebraic factorization nor methods based on direct BDD mapping
[1] [2] [3] are applicable to this new circuit style.
We have recently proposed a new BDD-based logic optimization
method for static CMOS [4]. It is based on iterative BDD
decomposition using various dominators which correspond to decomposable
BDD structures leading to AND, OR, XOR and MUX
decompositions. Synthesis results show that the method is very
efficient for both AND/OR- and XOR-intensive functions. Since
PTL structures can be easily identified on a BDD, our method
can be readily extended to perform logic decomposition leading
to mixed CMOS/PTL logic implementation. In contrast to
other PTL synthesis techniques, based on direct BDD mapping,
our method is not limited to decomposition onto PTLs only; its
logic decomposition and optimization is driven by the capabilities
of both the static CMOS and PTL logic. Our BDD decomposition
method can also account for various parameters associated
with circuit performance, thus avoiding drawbacks of direct
BDD mapping-based synthesis, such as large fanouts and
long transistor chains.
The bulk of our BDD decomposition theory has been published
in [4]. Table I summarizes the different types of BDD decompositions
available; it can be seen that all types of atomic
decompositions and their corresponding BDD structures can be
easily identified.
-
TOP: An Algorithm for Three-Level Optimization of PLDs [p. 751]
-
E. Dubrova, P. Ellervee, D. Miller, J. Muzio
In this paper we present a heuristic algorithm TOP
(Three-level Optimization of PLDs), targeting a three-level
logic expression of type g1 o g2, where
g1 and g2 are sum-of-products
and "o" is a binary operation. Such an expression
can be implemented by a three-level Programmable Logic
Device (PLD) consisting of PLA1 and PLA2, implementing
the first two levels of logic, and a set of two-input logic expanders,
implementing the third level. Each logic expander
can be programmed to realize any function of two variables.
A PLD of this type seems to give a good trade-off between
the speed of a flat PLA and density of a multi-level network
of PLAs. TOP chooses the functionality of the logic
expanders so that the area of the PLAs is minimized. To
the best of our knowledge, this is the first work addressing
this problem for an arbitrary operation "o" and attempting
to choose the operation which results in the smallest total
number of product-terms. Several algorithms for the specified
cases of "o" have been presented in the past (see [2]
for overview). An algorithm constructing an expansion
of this type for an arbitrary "o", with Xg, Xh
⊂ X and Xg ∪ Xh = X,
is described in [3]. However,
this algorithm does not target the minimal number of
products, and does not consider the case when Xg or Xh
equals X, which is allowed in our case.
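Since each logic expander can realize any function of two variables, there are 16 candidate operations "o". The sketch below encodes each candidate as a 4-bit truth table and checks which of them satisfy f = g1 o g2 for given functions. The function names and the example functions are hypothetical, and the sketch only checks a decomposition; it does not perform the product-term minimization that TOP targets.

```python
from itertools import product


def apply_op(table, a, b):
    """Evaluate the two-input function with 4-bit truth table `table` on bits a, b."""
    return (table >> (2 * a + b)) & 1


def valid_ops(f, g1, g2, num_vars):
    """Return the truth tables of all operations o with f(x) == g1(x) o g2(x)."""
    ops = []
    for table in range(16):
        if all(apply_op(table, g1(*x), g2(*x)) == f(*x)
               for x in product((0, 1), repeat=num_vars)):
            ops.append(table)
    return ops


# Hypothetical example: f = (a AND b) XOR c, split into g1 = a AND b and g2 = c.
def f(a, b, c):
    return (a & b) ^ c


def g1(a, b, c):
    return a & b


def g2(a, b, c):
    return c


XOR_TABLE = 0b0110  # bit (2*a + b) holds the value of a XOR b
matching = valid_ops(f, g1, g2, 3)
```

Here only the XOR table matches, because g1 and g2 together take all four input combinations of the expander.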
-
Testing Arithmetic Coprocessor in System Environment [p. 752]
-
J. Sosnowski, T. Bech
Arithmetic coprocessors (AC) are quite complex
circuits and testing them is an important and difficult
problem (not covered in the literature). Analyzing
diagnostic software for IBM PCs we found that the testing
procedures for ACs are limited to simple basic checks.
Hence we decided to develop efficient test procedures in a
systematic way. They are executed on the main processor
and generate appropriate stimuli to AC functional blocks
(e.g. instruction sequencer, data path units) and verify test
responses. An important contribution of this paper is the
integration of various approaches to testing and increased
observability of test results assured by on-chip event
monitors and system exceptions.
-
A Flexible Specification Framework for Hardware-Software Codesign [p. 753]
-
J. Moya, S. Domínguez, F. Moya, J. López
In this poster, we present a new specification technique
for complex hardware-software systems, based on standard
high-level programming languages, such as C, C++, Java,
Scheme, or Ada, without extensions or semantic changes.
Unlike previous approaches, the designer may choose the
model of computation and the specification language that
best suits her needs, while still being able to formally verify
the correctness of the specification. The details of the
available hardware and software resources, and the implementation
of the different models of computation are encapsulated
in libraries to maximize reuse in system specifications.
-
An Integrated Design Environment for Early Stage Conceptual Design [p. 754]
-
J. Zuo, S. Director
Conceptual design, the preliminary phase of design in
which both well-defined problem specifications and high
level design solutions are developed, is becoming increasingly
important as design complexity increases. In spite of
the importance of this activity, few tools exist to support
this phase of design. In this paper we present a systematic
and flexible model of conceptual design and describe how
this model has been employed to realize a prototype conceptual
design process management environment, called
Clio II.
-
A Web-based System for Assessing and Searching for Designs [p. 755]
-
H. Kahn, A. Carpenter, N. Whitaker
Users need to access design data for a variety of
reasons. Designers may be interested in accessing
repositories of IP blocks for possible inclusion in their
own designs. Alternatively, EDA tool developers and
purchasers need a representative set of designs to evaluate
or benchmark software. This poster presents a web-based
system used both for profiling designs and for searching
for designs with specific characteristics. The STEED
system summarised here is based on external information
models that tailor it to user requirements.
-
A Versatile Built-In Self Test Scheme for Delay Fault Testing [p. 756]
-
Y. Tsiatouhas, T. Haniotakis, D. Nikolos, A. Arapoyanni
A new Built-In Self Test (BIST) scheme is presented that can be used for
both off-line production or periodic testing of delay faults as well as
for concurrent detection of faults causing signal delays in the field.
The scheme is based on the IDDT monitoring of the outputs
of the circuit under test (CUT). The proposed scheme has minimal impact
on the performance and silicon area of the design since the same response
verifier circuit is used for both off-line and concurrent detection of
errors in the field.
-
Effective Low Power BIST for Datapaths [p. 757]
-
D. Gizopoulos, N. Kranitis, A. Paschalis, M. Psarakis, Y. Zorian
Power in processing cores (microprocessors, DSPs) is primarily consumed
in the datapath part. Among the datapath functional modules, multipliers
consume the largest amount of power due to their size and complexity. We
propose a low power BIST scheme for datapaths built around
accumulator pairs. The target is low average power dissipation between
successive test vectors. This is achieved by taking advantage of the regularity
of multiplier modules and achieving very high fault coverage by a
sized test set with as small as possible input switching activity. The proposed
BIST scheme is more efficient than pseudorandom BIST for the same high
fault coverage target. Up to 77.25% power saving is achieved in the set of
experimental results provided in the paper.
-
Exploiting Hierarchy for Multiple Error Correction in Combinational Circuits [p. 758]
-
D. Hoffmann, T. Kropf
Boolean equivalence checking has turned out to be a
powerful method for verifying combinational circuits and
is already an integrated part of the design cycle. If equivalence
checking fails, Design Error Diagnosis and Correction
(DEDC) is performed. DEDC tries to locate and
correct design errors fully automatically and can therefore
considerably speed up the whole design cycle.
The methods can roughly be divided into three classes:
ATPG based approaches (e.g. [4]), structure based approaches
[3], and logic based (symbolic) approaches (e.g.
[1]). Most approaches rely on the "single error assumption"
and cannot be applied if multiple errors occur in a
circuit. This is a hard restriction for practical applications
as the average number of design errors is usually greater
than one. However, multiple error rectification is a challenging
task since the search space grows exponentially with the
number of design errors.
Our method is a symbolic method for multiple error
rectification of combinational circuits and a further development
of [1], which can correct single errors only.
-
Automatic Equivalence Check of Circuit Descriptions at Clocked Algorithmic and
Register Transfer Level [p. 759]
-
J. Schönherr, B. Straube
One of the big challenges in circuit design is the formal
verification at clocked algorithmic or register-transfer level.
To overcome the limits of BDD based approaches we apply
an abstraction of the datapath by uninterpreted functions
[2]. A function f is uninterpreted if all properties except
(∀i (si = ti)) => f(s1,...,sn) = f(t1,...,tn) are dropped.
In the past symbolic execution and theorem proving were
used to check the equivalence of two sequential circuits that
are abstracted by uninterpreted functions. Symbolic execution
is an enumeration of states reachable from the initial
state [2]. Because of the uninterpreted functions there is no
general termination condition of such procedures.
In the theorem prover based approach [4] the proof is
usually carried out using the induction principle. Often lemmas
are needed to prove the equivalence. These lemmas
are also proven by induction. These lemmas are often invariants.
The proof of the induction step is automated by
decision procedures.
In our approach symbolic execution is used to generate
potential invariants. Then the equivalence is proven by automatic
induction proofs of the lemmas. A more detailed
description of the procedure can be found in [3].
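The effect of abstracting datapath operations by uninterpreted functions can be illustrated with symbolic terms. In the minimal sketch below, an application is just a tuple of the function name and its arguments, so equal arguments always give equal results, while every other property (such as commutativity) is dropped. The helper `apply_uf` and the three example circuits are hypothetical, and comparing terms syntactically is a simplification of what a real decision procedure does.

```python
def apply_uf(name, *args):
    """Symbolic application of an uninterpreted function as a hashable term."""
    return (name, args)


# Two hypothetical descriptions of the same datapath step.
def circuit_a(x, y):
    product_term = apply_uf("mul", x, y)
    return apply_uf("add", product_term, x)


def circuit_b(x, y):
    return apply_uf("add", apply_uf("mul", x, y), x)


# Same computation only if "mul" were known to be commutative.
def circuit_c(x, y):
    return apply_uf("add", apply_uf("mul", y, x), x)


same_ab = circuit_a("x", "y") == circuit_b("x", "y")
same_ac = circuit_a("x", "y") == circuit_c("x", "y")
```

The first comparison succeeds by functional consistency alone; the second fails because nothing is known about `mul` beyond that property.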
-
A Single Phase Latch for High Speed GaAs Domino Circuits [p. 760]
-
S. Nooshabadi, J. Montiel-Nelson, A. Núñez
A Single Phase Latch (SPL) suitable for GaAs domino
logic gates and compatible with DCFL is presented. Two
versions of the SPL are reported in this work: Single Ended
SPL used in pure domino logic and Differential SPL used in
dynamic Cascode Voltage Switch Logic. SPL is compared
with other common GaAs dynamic circuits and latches. The
results demonstrate that SPL is superior in terms of device
count, area, clock rate and power consumption.
-
An Incremental Specification Flow for Real Time Embedded Systems [p. 761]
-
A. Niemegeers, G. de Jong
The fast growing complexity of today's real time
embedded systems necessitates new design methods and
tools to face the problems of integration and validation
of complex systems. We have combined a number of
different hardware and software methods into one system
level design method. The proposed flow is based on
UML concepts, executable specifications and platform
based design.
-
Improving the Error Detection Ability of Concurrent Checkers by Observation Point Insertion in the Circuit under Check [p. 762]
-
V. Vardanian, L. Mirzoyan
A heuristic design-for-checkability method based on
observation point insertion in the Circuit Under Check
(CUC) is proposed to increase the error detection ability
of Concurrent Checkers (CC). In particular, at least 99%
of error detection is obtained for parity checkers and
almost all ISCAS'85 benchmark circuits by inserting 2-5
groups of observation points compacted by parity trees.
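Compacting a group of observation points by a parity tree amounts to XOR-ing the observed signals into a single bit. The sketch below shows this under assumed signal names and groupings; it does not reproduce the heuristic that selects the observation points.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """XOR of all bits, i.e. the output of a parity tree."""
    return reduce(xor, bits, 0)


def compact_observation_points(values, groups):
    """Return one parity bit per group of observed signals.

    values maps signal names to bits; groups is a list of lists of names.
    """
    return [parity(values[name] for name in group) for group in groups]


# Hypothetical observation points, split into two groups.
groups = [["n1", "n2"], ["n3", "n4"]]
good = {"n1": 1, "n2": 0, "n3": 1, "n4": 1}
faulty = dict(good, n3=0)  # a single erroneous value at n3

good_signature = compact_observation_points(good, groups)
faulty_signature = compact_observation_points(faulty, groups)
```

A single erroneous value in a group flips that group's parity bit, whereas an even number of errors within the same group would be masked.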
-
On-Line Testing and Diagnosis of Bus Lines with Respect to Intermediate Voltage Values [p. 763]
-
C. Metra, M. Favalli, B. Riccò
This paper presents a self-checking, on-line testing and
diagnosis scheme for bus lines affected by intermediate voltage
values possibly due to bridging faults, or to different
kinds of faults affecting the bus connected units.
-
Efficient Method of Failure Detection in Iterative Array Multiplier [p. 764]
-
A. Drozd
In this paper we present a method for on-line testing of
multipliers. The method is based on natural time and
information redundancy and enables the design of
a simple self-checking checker for hard failure detection
in 8-bit and 16-bit array multipliers.
-
Incorporation of Hard-Fault-Coverage in Model-based Testing of Mixed-Signal ICs [p. 765]
-
C. Wegener, M. Kennedy
The application of the Linear Error
Mechanism Modeling Algorithm (LEMMA [1]) to various
DAC and ADC architectures has raised the issue of
including hard-fault-coverage as an integral part of the
algorithm.
In this work, we combine defect-oriented functionality
tests and specification-oriented linearity tests of a Mixed-Signal
IC to save test time. The key development is a novel
test point selection strategy which not only optimizes the
INL-prediction variance of the model, but also satisfies
hard-fault-coverage constraints.