DATE 2002 ABSTRACTS
Plenary -- Keynote Session
Moderator: J. da Franca, ChipIdea, PT
-
On Nanoscale Integration and Gigascale Complexity in the Post .Com World [p. 12]
-
Hugo De Man, Professor, KU Leuven, Senior Research Fellow, IMEC, BE
While process technologists are obsessed with following Moore's curve down to nanoscale dimensions, design
technologists are confronted with gigascale complexity. On the other hand, post-PC and post dotcom products
require zero cost, zero energy yet software programmable novel system architectures to be sold in huge volumes and
to be designed in exponentially decreasing time. How do we cope with these novel silicon architectures? What
challenges in research does this create? How to create the necessary tools and skills and how to organize research
and education in a world driven by shareholder value? Can you spare half an hour to reflect
on these challenges to the design community?
-
Global Responsibilities in SOC Design [p. 12]
-
Taylor Scanlon, President & CEO, Virtual Silicon Technology, US
The technical complexities of advanced SoC design are compounded by changes in the economic structure of the
worldwide semiconductor industry. A look at some of the organizational and personal responsibilities that will be
required to meet the challenges of SoC design in the future.
Organizer: Yervant Zorian, Virage Logic, US
Moderator: Nic Mokhoff, EE Times, US
-
How to Choose Semiconductor IP? -- Embedded Processors [p. 14]
-
I. Phillips
It is well recognised that the process of new product
development, introduction and marketing is
fraught with difficulty. Indeed the probability of
achieving plan timescale, costs and budget is so
low that some degree of failure is inevitable. So
whilst the primary role of the Manager is to identify
and minimise all major risks, and make sure the ones
remaining are adequately resourced, a secondary
role is to make sure that what failure does occur
does not damage his/her reputation!
The Virtual Component appears in the context of
risk minimisation. CPU or UART, the motive is the
same, get the right product from concept to customer
as quickly as possible. The make or buy decision is a
risk/cost trade-off, and as the cost of failure is
normally so high, risk emerges as the dominant
factor: is it riskier to design, or to buy in?
-
Make Your SOC Design a Winner: Select the Right Memory IP [p. 15]
-
V. Ratford
The 2000 SIA roadmap shows over 50 % of the
area in an SOC being occupied by embedded memory.
The selection of the memory IP and supplier is critical to
the success of the design and the ramp to volume. The
Memory IP can determine yield, reliability, cost, speed
and/or power. Mr. Ratford will help you navigate
through the evaluation process by discussing key
requirements and possible solutions when evaluating
memory for your next SOC design.
-
How to Choose Semiconductor IP: Embedded Software [p. 16]
-
G. Martin
Embedded software Intellectual Property (IP) is
becoming vital for today's complex System-on-Chips. We
first define the notion of Hardware-dependent Software,
and then review the multidimensional criteria for choosing
ESW IP, including retargetability and portability,
flexibility, optimisation, validation and certification.
-
IP Day: How to Choose Semiconductor IP? [p. 17]
-
P. Bricaud
The semiconductor industry gave the most tremendous
challenge to the electronic design community and EDA
industry by making available a silicon capacity that
exceeds by far what today's designer can utilise in a
reasonable amount of time. Reasonable timeframes for
System-on-a-Chip developments in the multimedia and
communication markets are less than eighteen months,
when not even nine! I would like to give credit to Gary
Smith, Chief Analyst at Dataquest, for raising a very
pertinent media alert in his article, `The Revolution isn't
Coming -- It's Already Here', in Virtual Chip Design, May
1997. It was clearly stated that in order to fill the design
gap between available gates on silicon and design
methodology, the solution was through system-level
integration (SLI) using what was called at that time
system-level macros (SLM). The electronic design
community and EDA companies picked up the gauntlet
and started what will be known as the Virtual Components
creation through the industry organisation called the
Virtual Socket Interface Alliance (VSIA). This was
followed by Mentor Graphics and Synopsys who signed a
Design Reuse Partnership, which led to the publishing of
the "Reuse Methodology Manual for SoC Designs". The
last stage was to create an industry accepted Virtual
Component Quality Spreadsheet by merging the two
efforts.
Moderators: L. Fix, Intel, ISR; T. Kropf, Bosch, DE
-
Formal Verification of the Pentium 4 Floating-Point Multiplier [p. 20]
-
R. Kaivola and N. Narasimhan
We present the formal verification of the floating-point
multiplier in the Intel IA-32 Pentium® 4 microprocessor. The
verification is based on a combination of theorem-proving and BDD
based model-checking tasks performed in a unified hardware verification
environment. The tasks are tightly integrated to accomplish complete
verification of the multiplier hardware coupled with the rounder logic.
The approach does not rely on specialized representations like Binary
Moment Diagrams or their variants.
-
Using Rewriting Rules and Positive Equality to Formally Verify Wide-Issue Out-of-Order
Microprocessors with a Reorder Buffer [p. 28]
-
M. Velev
Rewriting rules and Positive Equality [4] are combined in an
automatic way in order to formally verify out-of-order processors
that have a Reorder Buffer, and can issue/retire multiple
instructions per clock cycle. Only register-register instructions
are implemented, and can be executed out-of-order, as soon as
their data operands can be either read from the Register File, or
forwarded as results of instructions ahead in program order in
the Reorder Buffer. The verification is based on the Burch and
Dill correctness criterion [6]. Rewriting rules are used to prove
the correct execution of instructions that are initially in the Reorder
Buffer, and to remove them from the correctness formula.
Positive Equality is then employed to prove the correct execution
of newly fetched instructions. The rewriting rules resulted in up
to 5 orders of magnitude speedup, compared to using Positive
Equality alone. That made it possible to formally verify processors
with up to 1,500 instructions in the Reorder Buffer, and
issue/retire widths of up to 128 instructions per clock cycle.
-
Automatic Verification of In-Order Execution In Microprocessors with Fragmented Pipelines and
Multicycle Functional Units [p. 36]
-
P. Mishra, N. Dutt, A. Nicolau, and H. Tomiyama
As embedded systems continue to face increasingly higher
performance requirements, deeply pipelined processor architectures
are being employed to meet desired system performance.
System architects critically need modeling techniques
that allow exploration, evaluation, customization
and validation of different processor pipeline configurations,
tuned for a specific application domain. We propose a novel
Finite State Machine (FSM) based modeling of pipelined
processors and define a set of properties that can be used to
verify the correctness of in-order execution in the presence
of fragmented pipelines and multicycle functional units. Our
approach leverages the system architect's knowledge about
the behavior of the pipelined processor, through Architecture
Description Language (ADL) constructs, and thus allows a
powerful top-down approach to pipeline verification. We applied
this methodology to the DLX processor to demonstrate
the usefulness of our approach.
-
A Case Study for the Verification of Complex Timed Circuits: IPCMOS [p. 44]
-
M. Peña, J. Cortadella, E. Pastor, and A. Smirnov
The verification of an n-stage pulse-driven IPCMOS pipeline, for
any n > 0, is presented. The complexity of the system is 32n transistors
and delay information is provided at the transistor level.
The correctness of the circuit highly depends on the timed behavior
of its components and the environment. To verify the system,
three techniques have been combined: (1) relative-timing-based
verification from absolute timing information [13], (2) assume-guarantee
reasoning to verify untimed abstractions of timed components
and (3) mathematical induction to verify pipelines of any
length. Even though the circuit can interact with pulse-driven environments,
the internal behavior between stages commits to a handshake
protocol that enables the use of untimed abstractions. The
verification not only reports a positive answer about the correctness
of the system, but also gives a set of sufficient relative-timing
constraints that determine delay slacks under which correctness
can be maintained.
Moderators: R.H.J.M. Otten, TU Eindhoven, NL; M.D.F. Wong, Texas U, US
-
FPGA Placement by Thermodynamic Combinatorial Optimization [p. 54]
-
J. De Vicente, J. Lanchares, and R. Hermida
In this paper, the placement problem on FPGAs is addressed using Thermodynamic
Combinatorial Optimization (TCO). TCO is a new combinatorial optimization
method based on both Thermodynamics and Information Theory. In TCO two
kinds of processes are considered: microstate and macrostate transformations.
Applying Shannon's definition of Entropy to microstate reversible
transformations, a probability of acceptance based on Fermi-Dirac statistics
is derived. On the other hand, applying thermodynamic laws to reversible
macrostate transformations, an efficient annealing schedule is
provided. TCO has been compared with Simulated Annealing (SA) on a set
of benchmark circuits for the FPGA placement problem. TCO has achieved
large time reductions with respect to SA, while providing interesting
adaptive properties.
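As a rough illustration of an acceptance rule of Fermi-Dirac form, the Python sketch below accepts a move with cost change delta at temperature t with probability 1 / (1 + exp(delta / t)). The function names, the toy one-dimensional placement cost and the geometric cooling schedule are assumptions made for this example; the abstract does not give the exact TCO acceptance formula or its annealing schedule.

```python
import math
import random


def fermi_dirac_accept(delta, t):
    """Acceptance probability of Fermi-Dirac form: 1 / (1 + exp(delta / t))."""
    x = delta / t
    if x > 700:    # guard against overflow in exp for very bad moves
        return 0.0
    if x < -700:
        return 1.0
    return 1.0 / (1.0 + math.exp(x))


def wirelength(order, nets):
    """Toy cost (assumption): sum of distances between connected cells on a row."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def anneal(order, nets, t=5.0, cooling=0.95, steps_per_t=50, t_min=0.01, seed=0):
    """Swap-based annealing with a plain geometric schedule (not the TCO schedule)."""
    rng = random.Random(seed)
    order = list(order)
    cost = wirelength(order, nets)
    best, best_cost = list(order), cost
    while t > t_min:
        for _ in range(steps_per_t):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            new_cost = wirelength(order, nets)
            if rng.random() < fermi_dirac_accept(new_cost - cost, t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(order), cost
            else:
                order[i], order[j] = order[j], order[i]  # reject: undo the swap
        t *= cooling
    return best, best_cost
```

Unlike the Metropolis rule, this form accepts a zero-cost move with probability 0.5 rather than 1.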
-
An Enhanced Q-Sequence Augmented with Empty-Room-Insertion and Parenthesis Trees [p. 61]
-
C. Zhuang, Y. Kajitani, K. Sakanushi, and L. Jin
After the discussion on the difference between floorplanning
and packing in VLSI placement design, this paper
adapts the floorplanner that is based on the Q-sequence to a
packing algorithm. For this purpose, some empty room insertion
is required to guarantee that the optimum packing is not missed.
To increase the performance in packing, a new move that perturbs
the floorplan is introduced in terms of the Parenthesis-Tree Pair.
A Simulated Annealing based packing search algorithm
was implemented. Experimental results showed the
effect of empty room insertion.
-
Arbitrary Convex and Concave Rectilinear Module Packing Using TCG [p. 69]
-
J. Lin, H. Chen, and Y. Chang
In this paper, we deal with arbitrary convex and concave rectilinear
module packing using the Transitive Closure Graph (TCG) representation.
The geometric meanings of modules are transparent to TCG and its
induced operations, which makes TCG an ideal representation for floorplanning/
placement with arbitrary rectilinear modules. We first partition a
rectilinear module into a set of submodules and then derive necessary and
sufficient conditions of feasible TCG for the submodules. Unlike most
previous works that process each submodule individually and thus need
post processing to fix deformed rectilinear modules, our algorithm treats
a set of submodules as a whole and thus not only can guarantee the feasibility
of each perturbed solution but also can eliminate the need of the
post processing on deformed modules, implying better solution quality
and running time. Experimental results show that our TCG-based algorithm
is capable of handling very complex instances; further, it is very
efficient and results in better area utilization than previous work.
Moderators: J. Segura, Illes Balears U, ES; H. Manhaeve, Q-Star Test, BE
-
A Test Design Method for Floating Gate Defects (FGD) in Analog Integrated Circuits [p. 78]
-
M. Pronath, H. Graeb, and K. Antreich
A unified approach to fault simulation for FGDs is introduced.
Instead of a direct fault simulation, the proposed
approach calculates indirectly from the simulator output the
sets of undetectable values of the trapped charge on the
floating gate transistor. It covers all potential gate charges
of an FGD at one or more transistors and allows the application
of conventional circuit simulators for simulating DC,
AC and transient test.
Based on this fault simulation, a test design methodology
is presented that can determine all test sets that detect all
FGDs for all possible values of gate charge.
-
Exact Grading of Multiple Path Delay Faults [p. 84]
-
S. Padmanaban and S. Tragoudas
The problem of fault grading for multiple path delay faults is
studied and a method of obtaining the exact coverage is presented.
The faults covered are represented and manipulated
as sets by zero-suppressed binary decision diagrams (ZBDD),
which are shown to be able to store a very large number of path
delay faults. For extreme cases where memory becomes a problem, a method
to estimate the coverage of the test set is also presented. The
problem of fault grading is solved with a polynomial number of
BDD operations. Experimental results on the ISCAS'85 benchmark
include test sets from ATPG tools and specifically designed
tests in order to investigate the limitations and properties of the
proposed method.
-
Modeling Techniques and Tests for Partial Faults in Memory Devices [p. 89]
-
Z. Al-Ars and A. van de Goor
It has always been assumed that fault models
in memories are sufficiently precise for specifying the faulty
behavior. This means that, given a fault model, it should
be possible to construct a test that ensures detecting the
modeled fault. This paper shows that some faults, called
partial faults, are particularly difficult to detect. For these
faults, more operations are required to complete their fault
effect and to ensure detection. The paper also presents
fault analysis results, based on defect injection and simulation,
where partial faults have been observed. The impact
of partial faults on testing is discussed and a test to detect
these partial faults is given.
Key words: partial faults, DRAMs, fault models, defect
simulation, memory testing, completing operations.
-
A New ATPG Algorithm to Limit Test Set Size and Achieve Multiple Detections of All Faults [p. 94]
-
S. Lee, B. Cobb, J. Dworak, M. Grimaila, and M. Mercer
Deterministic observation and random excitation of fault
sites during the ATPG process dramatically reduces the
overall defective part level. However, multiple observations
of each fault site lead to increased test set size and require
more tester memory. In this paper, we propose a new ATPG
algorithm to find a near-minimal test pattern set that detects
faults multiple times and achieves excellent defective
part level. This greedy approach uses 3-value fault simulation
to estimate the potential value of each vector candidate
at each stage of ATPG. The results show that generation of a
close to minimal vector set is possible using only dynamic
compaction techniques in most cases. Finally, a systematic
method to trade off between defective part level and test size
is also presented.
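The greedy idea can be sketched in Python as a set multi-cover: repeatedly pick the vector that supplies the most still-needed detections until every fault has been detected n times. This is a simplified illustration, not the algorithm from the paper. It assumes a complete vector-to-fault detection table is available up front (the hypothetical `detects` argument), whereas the abstract describes estimating candidate value with 3-value fault simulation during ATPG.

```python
def greedy_n_detect(detects, n):
    """Pick vectors greedily until every fault is detected n times, if possible.

    detects: dict mapping vector name -> set of faults it detects
             (hypothetical fault-simulation result).
    Returns (chosen vectors in order, dict of faults still short of n detections).
    """
    faults = set().union(*detects.values())
    need = {f: n for f in faults}      # remaining detections required per fault
    remaining = dict(detects)
    chosen = []

    def gain(v):
        # Value of a vector: number of still-needed detections it provides.
        return sum(1 for f in remaining[v] if need[f] > 0)

    while remaining:
        best = max(sorted(remaining), key=gain)   # sorted() makes ties deterministic
        if gain(best) == 0:
            break                                 # no vector helps any more
        chosen.append(best)
        for f in remaining.pop(best):
            if need[f] > 0:
                need[f] -= 1
    return chosen, {f: c for f, c in need.items() if c > 0}
```

Raising n trades a larger test set for more detections per fault, which is the trade-off the abstract refers to.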
Moderators: E. Macii, Politecnico di Torino, IT; K. Roy, Purdue U, US
-
Low Power Error Resilient Encoding for On-Chip Data Buses [p. 102]
-
D. Bertozzi, L. Benini, and G. De Micheli
As technology scales toward deep submicron, on-chip interconnects
are becoming more and more sensitive to noise
sources such as power supply noise, crosstalk, radiation
induced effects, etc. Transient delay and logic faults are
likely to reduce the reliability of data transfers across datapath
bus lines. This paper investigates how to deal with
these errors in an energy efficient way. We could opt for
error correction, which exhibits larger decoding overhead,
or for the retransmission of the incorrectly received data
word. Provided the timing penalty associated with this latter
technique can be tolerated, we show that retransmission
strategies are more effective than correction ones from an
energy viewpoint, both for the larger detection capability
and for the minor decoding complexity. The analysis was
performed by implementing several variants of a Hamming
code in the VHDL model of a processor based on the Sparc
V8 architecture, and exploiting the characteristics of AMBA
bus slave response cycles to carry out retransmissions in a
way fully compliant with this standard on-chip bus specification.
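The difference between the two receiver policies can be illustrated with a small Python sketch of a Hamming(7,4) code. The code size and all function names are assumptions for illustration; the paper's experiments use Hamming variants in a VHDL processor model with AMBA retransmission, none of which is modelled here.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit positions 1..7 hold p1 p2 d1 p3 d2 d3 d4 (parity at powers of two).
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def syndrome(c):
    """Return 0 for a valid codeword, else the 1-based position a single error would have."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    return s1 + 2 * s2 + 4 * s3


def receive_correct(c):
    """Correction policy: flip the bit the syndrome points at, then extract data."""
    c = list(c)
    s = syndrome(c)
    if s:
        c[s - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def receive_detect(c):
    """Detection policy: return the data, or None to request a retransmission."""
    if syndrome(c):
        return None
    return [c[2], c[4], c[5], c[6]]
```

With a double-bit error the correcting receiver silently delivers wrong data, while the detecting receiver still flags the word and can ask for it again. This is one way to read the abstract's remark about the larger detection capability of retransmission schemes.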
-
Managing Power Consumption in Networks on Chip [p. 110]
-
T. Simunic and S. Boyd
Systems on a chip (SOCs) are rapidly evolving into larger networks on a
chip (NOCs). This work presents a new methodology for managing
power consumption for NOCs. The power management problem is
formulated using closed-loop control concepts, with the estimator tracking
changes in the system parameters and recalculating the new power
management policy accordingly. Dynamic voltage scaling and local
power management are formulated in a node-centric manner, where
each core has its local power manager that determines the unit's power states.
The local power manager's interaction with the other system cores
regarding the power and the QoS needs enables network-centric power
management. The new methodology for power management of NOCs is
tested on a system consisting of four satellite units, each with the local
power manager capable of both node and network centric power
management. The results show large savings in power with good QoS.
-
Competitive Analysis of Dynamic Power Management Strategies for Systems with
Multiple Power Savings States [p. 117]
-
S. Irani, R. Gupta, and S. Shukla
We present strategies for "online" dynamic power management (DPM)
based on the notion of the competitive ratio
that allows us to compare the effectiveness of algorithms
against an optimal strategy. This paper makes two contributions:
it provides a theoretical basis for the analysis of DPM
strategies for systems with multiple power down states; and
provides a competitive algorithm based on probabilistically
generated inputs that improves the competitive ratio over
deterministic strategies. Experimental results show that our
probability-based DPM strategy improves the efficiency of
power management over the deterministic DPM strategy by
25%, bringing the strategy to within 23% of the optimal offline DPM.
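To make the notion of competitive ratio concrete, the Python sketch below covers the classic two-state special case: a device that either idles at constant power or sleeps at zero power and pays a fixed energy cost to wake up. The deterministic online policy sleeps once the idle time reaches the break-even point, and its energy never exceeds twice that of an offline strategy that knows each idle period in advance. The model, parameter names and numbers are assumptions for illustration; the multi-state and probability-based strategies of the paper are not reproduced.

```python
def offline_cost(length, idle_power, wake_energy):
    """Optimal cost with hindsight: stay idle throughout, or sleep at once."""
    return min(idle_power * length, wake_energy)


def online_cost(length, idle_power, wake_energy):
    """Deterministic policy: idle until the break-even time, then sleep."""
    threshold = wake_energy / idle_power        # break-even idle time
    if length <= threshold:
        return idle_power * length              # request arrived before sleeping
    return idle_power * threshold + wake_energy  # idled, then slept and woke up


def competitive_ratio(idle_periods, idle_power, wake_energy):
    """Ratio of online to offline energy over a sequence of idle periods."""
    online = sum(online_cost(t, idle_power, wake_energy) for t in idle_periods)
    offline = sum(offline_cost(t, idle_power, wake_energy) for t in idle_periods)
    return online / offline
```

The worst case for this policy is an idle period just longer than the break-even time, where the ratio approaches 2.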
-
AccuPower: An Accurate Power Estimation Tool for Superscalar Microprocessors [p. 124]
-
D. Ponomarev, G. Kucuk, and K. Ghose
This paper describes the AccuPower toolset -- a set of
simulation tools accurately estimating the power
dissipation within a superscalar microprocessor.
AccuPower uses a true hardware level and cycle level
microarchitectural simulator and energy dissipation
coefficients gleaned from SPICE measurements of actual
CMOS layouts of critical datapath components. Transition
counts can be obtained at the level of bits within data and
instruction streams, at the level of registers, or at the level
of larger building blocks (such as caches, issue queue,
reorder buffer, function units). This allows for an accurate
estimation of switching activity at any desired level of
resolution.
The toolsuite implements several variants of
superscalar datapath designs in use today and permits the
exploration of design choices at the microarchitecture level
as well as the circuit level, including the use of voltage and
frequency scaling. In particular, the AccuPower toolsuite
includes detailed implementations of currently used and
proposed techniques for energy/power conservations
including techniques for data encoding and compression,
alternative circuit approaches, dynamic resource
allocation and datapath reconfiguration. The
microarchitectural simulation components of AccuPower
can be used for accurate evaluation of datapath designs in
a manner well beyond the scope of the widely-used
Simplescalar tools.
Organizer: Y. Zorian, Virage Logic, US
Moderator: K. Bartleson, Synopsys, US
Panellists: J. Tully, Gartner Dataquest, US; G. Toomajanian, Dain Rauscher Wessels, US;
E. Desai, Desaisive Technology Research, US; M. Hosseini, WIT Soundview, US; V. Essi, AH&H, US
-
IP is All About Implementation and Customer Satisfaction [p. 132]
-
V. Essi, Jr.
Intellectual property, or IP, takes on many
different meanings depending upon the context within
which it is utilized. Our IP discussion focuses on the
rapidly evolving world of technology IP and, more
specifically, semiconductor IP. Our core belief is that
in order to be successful, semiconductor IP must be
more than an idea or innovation. It must be
implemented seamlessly, with little resistance from
the customer and have compelling value add to the
customer upon implementation and thereafter.
The heart of the customer's purchase
decision is where we believe semiconductor IP
models need to be the most focused. Is there a right
model in every case? No. In fact, we would argue
that the right model is the one that makes your
customer's adoption the easiest.
In some respects, we would describe most
IP purchase decisions as fitting the classic make or
buy scenario. Customers are only willing to embrace
third party IP to save costs. Sure we can get off the
track and discuss technology leads or other forms of
"killer IP", but cost is at the root of almost every IP
decision and, more precisely, a make or buy analysis.
Moderators: T. Shiple, Synopsys, FR; R. Drechsler, Bremen U, DE
-
Using Problem Symmetry in Search Based Satisfiability Algorithms [p. 134]
-
E. Goldberg, M. Prasad, and R. Brayton
We introduce the notion of problem symmetry in search-based
SAT algorithms. We develop a theory of essential
points to formally characterize the potential search-space
pruning that can be realized by exploiting problem symmetry.
We unify several search-pruning techniques used in
modern SAT solvers under a single framework, by showing
them to be special cases of the general theory of essential
points. We also propose a new pruning rule exploiting
problem symmetry. Preliminary experimental results validate
the efficacy of this rule in providing additional search-space
pruning beyond the pruning realized by techniques
implemented in leading-edge SAT solvers.
-
BerkMin: A Fast and Robust Sat-Solver [p. 142]
-
E. Goldberg and Y. Novikov
We describe a SAT-solver, BerkMin, that inherits such
features of GRASP, SATO, and Chaff as clause recording,
fast BCP, restarts, and conflict clause "aging". At the
same time BerkMin introduces a new decision making
procedure and a new method of clause database
management. We experimentally compare BerkMin with
Chaff, the leader among SAT-solvers used in the EDA
domain. Experiments show that our solver is more robust
than Chaff. BerkMin solved all the instances we used in
experiments including very large CNFs from a
microprocessor verification benchmark suite. On the other
hand, Chaff was not able to complete some instances even
with the timeout limit of 16 hours.
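Boolean constraint propagation (BCP), one of the inherited features named above, can be sketched in a few lines of Python. This is a deliberately naive version that rescans every clause until nothing changes. It is not BerkMin's implementation, and the function name and the DIMACS-style clause encoding are assumptions made for the example.

```python
def unit_propagate(clauses, assignment=None):
    """Naive BCP over clauses given as lists of non-zero ints (DIMACS-style).

    A positive literal v means variable v is true, -v means it is false.
    Returns (assignment dict var -> bool, conflict flag).
    """
    assignment = dict(assignment or {})
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == want:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return assignment, True       # every literal is false: conflict
            if len(unassigned) == 1:          # unit clause: the literal is forced
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment, False
```

A full solver wraps this in a decision loop and, on a conflict, records a new clause before backtracking. Those parts are omitted here.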
-
Dynamic Scheduling and Clustering in Symbolic Image Computation [p. 150]
-
G. Cabodi, P. Camurati, and S. Quer
The core computation in BDD-based symbolic synthesis and verification
is forming the image and pre-image of sets of states under the transition
relation characterizing the sequential behavior of the design. Computing
an image or a pre-image consists of ordering the latch transition relations,
clustering them and eventually re-ordering the clusters. Existing
algorithms are mainly limited by memory resources. To make them as
efficient as possible, we address a set of heuristics with the main target
of minimizing the memory used during image computation. They include
a dynamic heuristic to order the latch relations, a dynamic framework to
cluster them, and the application of conjunctive partitioning during image
computation. We provide and integrate a set of algorithms and we report
references and comparisons with recent work. Experimental results are
given to demonstrate the efficiency and robustness of the approach.
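For readers unfamiliar with image computation, the Python sketch below shows what is being computed, using explicit state sets instead of BDDs. The transition relation is taken to be the conjunction of one next-state function per latch, which is the structure that conjunctive partitioning exploits. All names are illustrative assumptions, and none of the ordering or clustering heuristics from the paper is modelled.

```python
from itertools import product


def image(states, next_fns):
    """States reachable in one step: apply every latch's next-state function."""
    return {tuple(f(s) for f in next_fns) for s in states}


def preimage(targets, next_fns, n_latches):
    """States whose successor lies in targets (explicit enumeration of 2**n states)."""
    return {s for s in product((0, 1), repeat=n_latches)
            if tuple(f(s) for f in next_fns) in targets}


def reachable(init, next_fns):
    """Least fixed point of repeated image computation from the initial states."""
    reached = set(init)
    frontier = set(init)
    while frontier:
        frontier = image(frontier, next_fns) - reached
        reached |= frontier
    return reached
```

For example, a 2-bit counter with state (b1, b0) can be described by `next_fns = [lambda s: s[0] ^ s[1], lambda s: s[1] ^ 1]`. Explicit enumeration grows exponentially with the number of latches, which is why symbolic representations and memory-saving heuristics matter.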
Moderators: S. Huss, TU Darmstadt, DE; D. Auvergne, LIRMM, FR
-
Wire Placement for Crosstalk Energy Minimization in Address Buses [p. 158]
-
L. Macchiarulo, E. Macii, and M. Poncino
We propose a novel approach to bus energy minimization that targets
crosstalk effects. Unlike previous approaches, we try to reduce
energy through capacitance optimization, by adopting nonuniform
spacing between wires. This allows reduction of power,
and at the same time takes into account signal integrity. Therefore,
performance is not degraded. Results show that the method
saves up to 30% of total bus energy at no cost in performance
or complexity of the design (no encoding-decoding circuitry is
needed), and limited cost in area.
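Why non-uniform spacing can help is easy to see under a simplified model. Assume the coupling capacitance between adjacent wires is inversely proportional to their spacing, so the crosstalk energy of gap i is proportional to a_i / s_i, where a_i is the switching activity across that gap. Minimizing the sum of a_i / s_i for a fixed total spacing gives s_i proportional to sqrt(a_i), by a Lagrange multiplier argument. The model, the names and the numbers in the Python sketch below are assumptions for illustration, not the formulation used in the paper; minimum-spacing rules are ignored.

```python
import math


def crosstalk_energy(activities, spacings):
    """Relative crosstalk energy, assuming coupling capacitance ~ 1 / spacing."""
    return sum(a / s for a, s in zip(activities, spacings))


def optimal_spacings(activities, total_space):
    """Spacings minimizing sum(a_i / s_i) subject to sum(s_i) == total_space.

    Requires strictly positive activities; the optimum is s_i ~ sqrt(a_i).
    """
    roots = [math.sqrt(a) for a in activities]
    scale = total_space / sum(roots)
    return [r * scale for r in roots]
```

For activities [9, 1, 4] and a total spacing of 6, the result is spacings [3, 1, 2] with relative energy 6, against 7 for uniform spacing [2, 2, 2] in the same area.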
-
Dynamic VTH Scaling Scheme for Active Leakage Power Reduction [p. 163]
-
C. Kim and K. Roy
We present a Dynamic VTH Scaling (DVTS) scheme to
save the leakage power during active mode of the circuit.
The power saving strategy of DVTS is similar to that of the
Dynamic VDD Scaling (DVS) scheme, which adaptively
changes the supply voltage depending on the current
workload of the system. Instead of adjusting the supply
voltage, DVTS controls the threshold voltage by means of
body bias control, in order to reduce the leakage power.
The power saving potential of DVTS and its impact on
dynamic and leakage power when applied to future
technologies are discussed. Pros and cons of the DVTS
system are dealt with in detail. Finally, feedback loop
hardware for DVTS, which tracks the optimal VTH for a
given clock frequency, is proposed. Simulation results show
that 92% energy savings can be achieved with DVTS for
70nm circuits.
-
Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints [p. 168]
-
A. Azevedo, I. Issenin, R. Cornea, R. Gupta, N. Dutt, A. Veidenbaum, and A. Nicolau
Dynamic voltage scaling (DVS) is a known effective
mechanism for reducing CPU energy consumption without
significant performance degradation. While a lot of work
has been done on inter-task scheduling algorithms to implement
DVS under operating system control, new research
challenges exist in intra-task DVS techniques under software
and compiler control. In this paper we introduce a
novel intra-task DVS technique under compiler control using
program checkpoints. Checkpoints are generated at
compile time and indicate places in the code where the processor
speed and voltage should be re-calculated. Checkpoints
also carry user-defined time constraints. Our technique
handles multiple intra-task performance deadlines
and modulates power consumption according to a run-time
power budget. We experimented with two heuristics for adjusting
the clock frequency and voltage. For the particular
benchmark studied, one heuristic yielded 63% more energy
savings than the other. With the best of the heuristics we designed,
our technique resulted in 82% energy savings over
the execution of the program without employing DVS.
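The checkpoint mechanism can be pictured with a small Python sketch: at each checkpoint the remaining worst-case work and the time left to the deadline determine the lowest frequency level that still meets the deadline. This is a generic illustration under assumed names and numbers. The abstract does not describe the two heuristics evaluated in the paper, and voltage selection and power budgets are left out.

```python
def pick_frequency(remaining_cycles, time_left, levels):
    """Lowest available frequency (Hz) that finishes remaining_cycles in time_left.

    levels: available frequencies in ascending order (hypothetical hardware levels).
    """
    if time_left <= 0:
        return levels[-1]
    required = remaining_cycles / time_left
    for f in levels:
        if f >= required:
            return f
    return levels[-1]      # deadline cannot be met: run at full speed


def run_with_checkpoints(segments, deadline, levels):
    """Simulate a task with a checkpoint at the start of every segment.

    segments: list of (worst_case_cycles, actual_cycles) between checkpoints.
    Returns (frequency chosen per segment, finish time in seconds).
    """
    now = 0.0
    remaining_wc = sum(wc for wc, _ in segments)
    schedule = []
    for wc, actual in segments:
        f = pick_frequency(remaining_wc, deadline - now, levels)
        schedule.append(f)
        now += actual / f       # the segment may finish early
        remaining_wc -= wc
    return schedule, now
```

When a segment uses fewer cycles than its worst case, the slack is reclaimed at the next checkpoint by choosing a lower frequency.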
-
Sizing Power/Ground Meshes for Clocking and Computing Circuit Components [p. 176]
-
A. Mukherjee, K. Wang, L. Chen, and M. Marek-Sadowska
This paper presents a new formulation and an efficient
solution of the power and ground mesh sizing problem. We
use the key observations that (1) the drops in power and
ground node potentials are due not only to currents drawn by
the computing blocks, but also to those drawn by the clock
buffers, and (2) changes of circuit component delays are
linearly proportional to the power/ground IR-drops. This
leads to a linear quantification of the timing relations
between the clocking and computing components in terms of
the power/ground IR-drops. Our method removes all IR-drop
related timing violations that occur in about 2% of paths
when grids are sized using the existing methods that satisfy
the maximum IR-drop constraints. In addition, we achieve
supply mesh area improvements of the order of 30% while
simultaneously reducing the power dissipated in the circuits
by about 6.6% compared to traditional grid sizing methods.
Moderators: J. Huertas, CNM-IMSE, ES; B. Kaminska, Fluence Technology, US
-
A Signature Test Framework for Rapid Production Testing of RF Circuits [p. 186]
-
R. Voorakaranam, S. Cherubal, and A. Chatterjee
Production test costs for today's RF circuits are
rapidly escalating. Two factors are responsible for this
cost escalation: (a) the high cost of RF ATEs and
(b) long test times required by elaborate performance
tests. In this paper, we propose a framework for low-cost
signature test of RF circuits using modulation of
a baseband test signal and subsequent demodulation of
the DUT response. The demodulated response of the
DUT is used as a "signature" from which all the performance
specifications are predicted. The applied test
signal is optimized in such a way that the error between
the measured DUT performances and the predicted DUT
performances is minimized. The proposed
low-cost solution can be easily built into a load board
that can be interfaced to an inexpensive tester.
-
Analog IP Testing: Diagnosis and Optimization [p. 192]
-
C. Guardiani, P. McNamara, L. Daldoss, S. Saxena, S. Zanella, W. Xiang, and S. Liu
In this paper, we present an innovative methodology to
estimate and improve the quality of analog and mixed-signal
circuit testing. We first detect and reduce the redundancy
in the electrical test measurements (e-tests), then we
identify the e-test acceptability regions by considering performance
specifications as well as process parameter distributions.
Finally, we provide an effective metric for the
accurate assessment of the parametric test coverage of
embedded analog IP. Experimental results confirm the
validity of the proposed methodology and its broad applicability
to analog, mixed-signal and RF applications for
different process technologies.
-
A New Design Flow and Testability Measure for the Generation of a Structural Test and BIST for
Analogue and Mixed-Signal Circuits [p. 197]
-
C. Hoffmann
For the generation of defect-oriented tests a system is developed
that includes the synthesis of self-test structures.
With the objective to generate a highly efficient analogue
test, the fault simulation methods are greatly enhanced with (1)
a new testability measure and (2) the possibility to distinguish
between not-to-detect and hard-to-detect faults with respect
to the tolerances of the respective measurement system. By
presenting a new design flow and using the fault simulation
in a very early design stage a tool-suite is developed. It
allows control of the defect-robust layout and elimination of
those faults that limit the efficiency of a measurement system.
This allows for economic self-test applications! It is
demonstrated that the system finds the most efficient and
least expensive test for a given fault set. With the presented
results it is possible to include the defect-oriented approach
from the fault simulation to the automatic generation of
layout rules and the test synthesis in an industrial design
flow.
-
Built-In Dynamic Current Sensor for Hard-to-Detect Faults in Mixed-Signal ICs [p. 205]
-
Y. Lechuga, R. Mozuelos, M. Martínez, and S. Bracho
There are some types of faults in analogue and mixed
signal circuits which are very difficult to detect using
either voltage or current based test methods. However, it
is possible to detect these faults if we add to the
conventional dynamic power supply current (IDDT) test methods
the analysis of the changes in the slope of this
dynamic power supply current. In this work, we present a
Built-In Current Sensor (BICS) which is able to process
the highest frequency components in the dynamic power
supply current of the circuit under test (CUT). The BICS
adds to the resistive sensor an inductance made from a
gyrator and a capacitor to carry out the current to voltage
conversion. Moreover, the proposed test method improves
the fault coverage in continuous circuits and switched
current circuits as well.
Moderators: A. Sauer, FhG EAS/IIS, DE; A. Pawlak, ITE Warsaw, PL
-
E-Design Based on the Reuse Paradigm [p. 214]
-
L. Ghanmi, A. Ghrab, M. Hamdoun, B. Missaoui, K. Skiba, and G. Saucier
This paper gives an overview of a virtual
electronic component or IP (Intellectual Property)
exchange infrastructure whose main components are
an XML "well structured IP e-catalog Builder" (TM)
and an "XML IP profiler" (TM). While the first module is an
e-publishing and exchange management module,
the role of the second is to extract the IP files from the
design directories and to trigger their transfer
to the user site, possibly via an IP distribution server,
under the catalog control. Direct design file
extraction from commercial configuration systems
such as CVS and Clearcase is supported. Note also
that the architecture supports, if required, a network
of IP distribution servers, which prevents a
performance bottleneck when exchanging IPs. The two
modules have been implemented as a Java
Servlet and as a Java client/server application, respectively.
-
Internet-Based Collaborative Test Generation with MOSCITO [p. 221]
-
A. Schneider, K. Diener, E. Ivask, J. Raik, R. Ubar, P. Miklos, T. Cibáková, and E. Gramatová
This paper offers an Internet-based environment for
enhancing problem-specific design flows with test pattern
generation and fault simulation capabilities. Automatic
Test Pattern Generation (ATPG) and fault simulation tools
at structural and hierarchical levels available at geographically
different places running under the virtual environment
using the MOSCITO system are presented. These
tools can be used separately, or in multiple applications,
for test pattern generation of digital circuits. In order to
link the different tools with each other and with commercial
design systems, a set of translators was developed.
The functionality of the integrated design and test system
was verified by several benchmark circuits.
-
A Two-Tier Distributed Electronic Design Framework [p. 227]
-
T. Kazmierski and N. Clayton
We present the concept of a distributed, web-based
electronic design framework. The salient feature of our
system is the extension of the client-server architecture to
two tiers, with the web server serving client requests
whilst acting as a client to the tool servers. In the sample
application of the framework, developed in Java, any of
the servers can run on Linux, MS Windows or a Sun SPARC
server. The web server that has been used to
demonstrate the framework for on-line access to VAMS (a
VHDL-AMS parser) and Avant! HSPICE is currently
available for Linux but has been developed with a truly
platform independent implementation in mind.
-
Embedded System Design Based On Webservices [p. 232]
-
A. Rettberg and W. Thronicke
The structure of Internet applications and
scenarios is changing rapidly today. This offers
new potential for established technologies and
methods to expand their area of application. New
technologies encourage new methodologies to
design processes and business-to-business
applications. The application of such new
advancements should be extended into the
domain of the electronic design automation
(EDA) industry. In this paper we present an
approach to use webservices in the field of
embedded system design.
Organizer/Moderator: W. Wolf, Princeton U, US
Panellists: M. Pinto, Agere, US; P. Paulin, STMicroelectronics, CA; C. Rowen, Tensilica, US;
O. Levia, Improv Systems, US; G. Saucier, Design-Reuse, FR; V. Gerousis, Infineon, DE
-
Who Owns the Platform? [p. 238]
-
As VLSI technology advances, it
forces changes in the business organization
of the industry. Traditional vertically
integrated semiconductor manufacturers are
concentrating less on manufacturing as
foundries such as TSMC, UMC, and
Chartered grow. These foundries supply
capacity not only to fabless houses but
even to large semiconductor manufacturers.
As a result, these semiconductor
houses are spending more time creating novel
platforms for important applications. This
puts them in competition with the systems
houses that traditionally were their
customers.
In the middle, fabless semiconductor
companies try to create new and improved
platforms as well, generally with fewer
resources than are available to established
semiconductor houses.
At the other end, IP companies
provide platforms without themselves
designing chips. They must rely on
persuading customers to license IP rather
than designing it internally.
Organizer: D. Gizopoulos, Piraeus U, GR
Moderator: G. Smith, Gartner Dataquest, US
Speakers: M. Milligan, HPL Technologies, US; Y. Zorian, Virage Logic, US; S. Pateras, LogicVision, US;
M. Nicolaidis, iRoC Technologies, FR
-
IP for Embedded Robustness [p. 240]
-
M. Nicolaidis
Drastic device shrinking, power supply reduction,
and increasing operating speeds, which accompany the
technological evolution to very deep submicron (VDSM),
significantly reduce the noise margins and affect the reliability
of very deep submicron ICs. Timing faults escaping
timing closure analysis and/or manufacturing testing, as
well as soft-errors, are creating reliability issues in the
field.
Soft Errors: In this context, single event upsets
(SEUs) are becoming one of the major signal integrity
problems. Atmospheric neutrons have become a major
source of SEUs in modern VDSM technologies. An
SEU is the consequence of a single event transient
(SET) created on a sensitive node by a particle striking
an integrated circuit. When an SET occurs on a
memory-cell node and flips the state of the cell it is
transformed to a single event upset (SEU). An
additional problem is that in today's technologies, soft
errors concern not only memories (which has been the
case so far) but also logic. An SET, occurring on a node
of a logic network, is transformed to an SEU when a
latch captures it.
-
Embedded Diagnosis IP [p. 242]
-
S. Pateras
Today's market conditions are driving increasingly shorter time to market requirements for semiconductor devices. Effective techniques for achieving quick and accurate debug and fault diagnosis of increasingly complex SOC devices are therefore becoming indispensable. This presentation covers new embedded test based IP and related software tools that provide the desired level of debug and diagnosis.
-
Embedded Robustness IPs [p. 244]
-
E. Dupont, M. Nicolaidis, and P. Rohr
Due to the VDSM evolution and an electronic
systems market starving for performance, the
semiconductor industry is used to hitting big technology
walls. Challenge after challenge, brand new domains of
competencies are popping up followed by fast and
accurate tools. Synthesis, routers, verification, DFT,
embedded systems, SoC, ... are well established as
standard competencies to achieve high quality, high
performance and high yield chip production.
In recent roadmaps (ITRS, Medea, D&T), signal integrity
has been pointed out as a major challenge. More and
more causes can affect signal integrity as geometries are
shrinking. One of the growing effects is the so-called
"transient errors", which are due to temporary conditions
of use and environment. Cross-coupling, ground bounce,
and external terrestrial radiation create more and more
unpredictable transient and soft errors which affect
system reliability in unacceptable ways.
In addition, reliability in devices like memories becomes a
critical issue: the MTBF (mean time before failure) level is
decreasing and the global system FIT (Failure in Time) rate is
approaching the critical border line for the end user.
Hence, for memories and for logic blocks as well using
high-end process technologies, self-correcting
intelligence embedded in SoC is needed to enable
electronic systems to react against unpredictable and
insidious errors.
Moderators: M. Berkelaar, Magma Design Automation, NL; W. Kunz, Kaiserslautern U, DE
-
CHESMIN: A Heuristic for State Reduction in Incompletely Specified Finite State Machines [p. 248]
-
S. Gören and F. Ferguson
A heuristic is proposed for state reduction in incompletely
specified finite state machines (ISFSMs). The algorithm is
based on checking sequence generation and identification
of sets of compatible states. We have obtained results as
good as the best exact method in the literature but with
significantly better run-times. In addition to finding a
reduced FSM, our algorithm also generates an I/O
sequence that can be used as test vectors to verify the
FSM's implementation.
-
Generalized Early Evaluation in Self-Timed Circuits [p. 255]
-
M. Thornton, K. Fazel, R. Reese, and C. Traver
Phased logic has been proposed as a technique for realizing
self-timed circuitry that is delay-insensitive and requires
no global clock signals. Early evaluation techniques
have been applied to asynchronous circuits in the past in
order to achieve throughput increases. A general method
for computing early evaluation functions is presented for
this design style. Experimental results are given that show
the increase in throughput of various benchmark circuits.
The results show that as much as a 30% speedup can be
achieved in some cases.
-
Dual Threshold Voltage Domino Logic Synthesis for High Performance with Noise and Power Constraint [p. 260]
-
S. Jung, K. Kim, and S. Kang
We introduce a new dual threshold voltage technique for
domino logic. Since domino logic is much more sensitive
to noise, noise margins have to be taken into account when
applying dual threshold voltages to domino logic. To guarantee
the signal integrity in domino logic, we carefully consider
the effect of transistor sizing and threshold voltage
selection. For optimal design, tradeoffs need to be made
among noise margin, power, and performance. Based on
the characteristics of each logic gate, we propose noise and
power constrained domino logic synthesis for high performance.
ISCAS85 benchmark results show that performance
can be improved by up to 18.62% with a 2% active power increase,
while maintaining noise margin.
Moderators: F. Fernández, IMSE-CNM, ES; A. Konczykowska, Alcatel R&I, FR
-
A Fitting Approach to Generate Symbolic Expressions for Linear and Nonlinear Analog Circuit
Performance Characteristics [p. 268]
-
W. Daems, G. Gielen, and W. Sansen
This paper presents a novel method to automatically generate symbolic
expressions for both linear and nonlinear circuit characteristics using a
template-based fitting of numerical, simulated data. The aim of the method
is to generate convex, interpretable expressions. The posynomiality of the
generated expressions enables the use of efficient geometric programming
techniques when using these expressions for circuit sizing and optimization.
Attention is paid to estimating the relative 'goodness-of-fit' of the
generated expressions. Experimental results illustrate the capabilities of
the approach.
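For readers unfamiliar with the term, the general form of a posynomial is sketched below. This is the standard textbook definition, not the specific template used in the paper; the symbols $c_k$, $\alpha_{ik}$, $K$ and $n$ are generic names introduced here for illustration.

```latex
% A posynomial in the positive variables x_1, ..., x_n:
% a sum of K monomial terms with strictly positive coefficients.
f(x_1, \dots, x_n) \;=\; \sum_{k=1}^{K} c_k \prod_{i=1}^{n} x_i^{\alpha_{ik}},
\qquad c_k > 0,\quad \alpha_{ik} \in \mathbb{R},\quad x_i > 0 .

% With the change of variables y_i = \log x_i, the function
% \log f(e^{y_1}, \dots, e^{y_n}) is convex in y, which is what makes
% geometric programming applicable.
```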
-
Parameter Controlled Automatic Symbolic Analysis of Nonlinear Analog Circuits [p. 274]
-
R. Popp, J. Oehmen, L. Hedrich, and E. Barke
In this paper we introduce an approach for parameter controlled symbolic
analysis of nonlinear analog circuits. Based on a state-of-the-art algorithm,
it enables the removal of specific circuit parameters from a symbolic
circuit description, given as a set of nonlinear differential algebraic
equations (DAEs). During the removal, singularities are considered, which
includes structural changes of the set of DAEs. The feasibility of our
approach is shown by several circuit examples.
-
Constructing Symbolic Models for the Input/Output Behavior of Periodically Time-Varying Systems
Using Harmonic Transfer Matrices [p. 279]
-
P. Vanassche, G. Gielen, and W. Sansen
A new technique is presented for generating symbolic expressions
for the harmonic transfer functions of linear periodically
time-varying (LPTV) systems, like mixers and PLLs. The algorithm,
which we call Symbolic HTM, is based on the organisation
of the harmonic transfer functions into a harmonic transfer matrix.
This representation allows LPTV systems to be manipulated in a
way that is similar to linear time-invariant (LTI) systems, making it
possible to generate symbolic expressions which relate the overall
harmonic transfer functions to the characteristics of the building
blocks. These expressions can be used as design equations or as
parametrized models for use in simulations. The algorithm is illustrated
for a downconversion mixer.
-
Taylor Expansion Diagrams: A Compact, Canonical Representation with Applications to
Symbolic Verification [p. 285]
-
M. Ciesielski, P. Kalla, Z. Zeng, and B. Rouzeyre
This paper presents a new, compact, canonical
graph-based representation, called Taylor Expansion Diagrams
(TEDs). It is based on a general non-binary decomposition
principle using Taylor series expansion. It can be exploited
to facilitate the verification of high-level (RTL) design
descriptions. We present the theory behind TEDs, comment
upon its canonicity property and demonstrate that the representation
has linear space complexity. Its application to equivalence
checking of high-level design descriptions is discussed.
Organizers: L. Guarnirei, Barcelona Design, US; E. Chen, Celestry Design Technologies, US
Moderator: C. Ajluni, Wireless Systems Design, US
Presenters: S. Savage, Cypress Semiconductors, US; M. Hershenson, Barcelona Design, US;
X. Zhang, Celestry Design Technologies, US
-
EDA Tools for RF: Myth or Reality? [p. 292]
-
Designing circuits that operate at radio
frequencies (above 1 GHz) is a challenge for many
reasons. Nearly every aspect of producing chips is
stressed at high frequency, including technology
development, modeling, CAD, design, integration,
and packaging. From a device modeling perspective,
devices have shrunk to extreme dimensions to
achieve the required high frequency performance
metrics, while exotic materials are being added to the
process. This is straining the limits of industry
standard models, as newer, more capable device
models struggle to reach the level of generic support
necessary to achieve widespread adoption. Substrate
currents and losses, device and substrate noise, and
device mismatch all need to be accurately modeled
as well in RF design.
Electromagnetic effects (both desirable and
parasitic) are also much more significant as
operating frequencies rise. Lumped RC networks
are no longer sufficient to represent interconnect
parasitics. Inductive coupling is now significant on
chip, while packages and boards are larger today
(relative to the wavelength of operation) than ever
before, requiring fullwave electromagnetic
simulation. Integrated passives (on chips and
packages) have significantly reduced integration
costs, but require accurate high frequency models
that can be incorporated into analog simulators.
Finally, hierarchical, block based, mixed
signal design methodologies are very complicated
and not currently well integrated into EDA tools.
The models for interaction between blocks are often
too simplistic and the coupling between analog and
digital components on a chip is often ignored. The
result can be resignation to designing in silicon,
which keeps design cycle time and the cost of
advanced RF chips high.
This presentation will present details of the
issues mentioned above to help the audience
understand the complexity and depth of the
problems, and serve as an invitation to the EDA
industry to present solutions to the issues.
Moderators: W. Wolf, Princeton U, US; N. Martínez Madrid, FZI Karlsruhe, DE
-
Dynamic Runtime Re-Scheduling Allowing Multiple Implementations of a Task for
Platform-Based Designs [p. 296]
-
T. Lee, W. Wolf, and J. Henkel
This paper introduces an extension to the RMS scheduling technique that we call "Hot Swapping". Hot Swapping enables a system to choose between various selected implementations of one task on-the-fly and thus to optimize the system's cost (e.g. power savings). The on-the-fly swapping between those implementations requires extra time to save and/or transform states of a certain task implementation. Even if the two steady-state schedules before and after the swapping are feasible, the transient schedule with the additional swapping computation time may exceed the system's capacity. Our technique is an extension to Rate Monotonic Scheduling (RMS). While maintaining and meeting performance requirements, our technique shows an average reduction of 31% in power consumption compared to systems using a pure static scheduling approach (RMS) that cannot make use of task swapping. We have evaluated our algorithm through simulation of five real-world task sets and in addition by use of a large number of generated task sets.
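The point that a transient schedule can exceed capacity even when both steady-state schedules are feasible can be illustrated with a minimal sketch. It assumes the classical Liu-Layland utilization bound as a sufficient RMS test and models the swap cost as extra execution time charged to one task; the function names and the task set are hypothetical, and this is not the schedulability analysis used in the paper.

```python
def rms_utilization_bound(n):
    """Liu-Layland sufficient bound n * (2**(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def passes_rms_bound(tasks):
    """tasks: list of (execution_time, period). Sufficient test only:
    failing it does not prove the task set is unschedulable."""
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rms_utilization_bound(len(tasks))


def with_swap_overhead(tasks, index, overhead):
    """Hypothetical transient model: charge the swap time to one task."""
    c, t = tasks[index]
    return tasks[:index] + [(c + overhead, t)] + tasks[index + 1:]


# Illustrative task set (execution time, period); values are made up.
steady = [(1, 4), (2, 8), (1, 10)]
transient = with_swap_overhead(steady, 1, 2)

print(passes_rms_bound(steady), passes_rms_bound(transient))
```

In this made-up example the steady-state set passes the bound while the transient set, with the swap overhead added, does not, so a more careful check would be needed before swapping.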
-
Techniques to Evolve a C++ Based System Design Language [p. 302]
-
R. Pasko, S. Vernalde, and P. Schaumont
Complex systems-on-chip present one of the most challenging
design problems of today. To meet this challenge,
new design languages capable of modelling such heterogeneous,
dynamic systems are needed. For implementation
of such a language, the use of an object oriented C++ class
library has proven to be a promising approach, since new
classes dealing with design- and platform-specific problems
can be added in a conceptual and seamlessly reusable way.
This paper shows the development of such an extension
aimed to provide a platform-independent high-level structured
storage object through hiding of the low-level implementation
details. It results in a completely virtualised,
user-extendible component, suitable for use in heterogeneous
systems.
-
A Mixed-Signal Design Reuse Methodology Based on Parametric Behavioural Models with
Non-Ideal Effects [p. 310]
-
A. Ginés, E. Peralías, A. Rueda, N. Madrid, and R. Seepold
Current System-on-Chip (SoC) designs incorporate an
increasing number of mixed-signal components. Design
reuse techniques have proved successful for digital design
but these rules are difficult to transfer to mixed-signal design.
A top-down methodology is missing, and the low level
of abstraction in designs makes system integration and verification
a very difficult, tedious and complex task. This paper
presents a contribution to mixed-signal design reuse
where a design methodology is proposed based on modular
and parametric behavioural components. They support a
design process where non-ideal effects can be incorporated
in an incremental way, allowing easy architectural selection
and accurate simulations. A working example is used
throughout the paper to highlight and validate the applicability
of the methodology.
Moderators: A. Rodríguez-Vázquez, IMSE-CNM, ES; D. Leenaerts, Philips, NL
-
Test Structure for IC(VBE) Parameter Determination of Low Voltage Applications [p. 316]
-
W. Rahajandraibe, C. Dufaza, D. Auvergne, B. Cialdella, B. Majoux, and V. Chowdhury
The temperature dependence of the IC(VBE) relationship
can be characterised by two parameters: EG and XTI. The
classical method to extract these parameters consists of a
"best fit" to measured VBE(T) values, using a least-squares
algorithm at constant collector current. This
method requires an accurate measurement of the VBE voltage
and an accurate value of the operating temperature. In this
paper we propose a configurable test structure
dedicated to extracting the temperature dependence of the
IC(VBE) characteristic for BJTs designed in bipolar or
BiCMOS processes. This allows a direct measurement of
die temperature and consequently an accurate
measurement of VBE(T). First, the classical extraction
method is explained. Then, the implementation techniques
of the new method are discussed and the improvement of the
design is presented.
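For orientation, the way EG and XTI enter the temperature dependence can be written in the usual SPICE Gummel-Poon form. This is the textbook model, assumed here rather than taken from the paper; $T_0$ is a reference temperature, $k$ is Boltzmann's constant, $q$ is the electron charge, and $E_G$ is expressed in volts.

```latex
% Saturation current versus temperature (standard SPICE form, assumed):
I_S(T) \;=\; I_S(T_0)\,\left(\frac{T}{T_0}\right)^{X_{TI}}
\exp\!\left[\frac{q\,E_G}{k}\left(\frac{1}{T_0} - \frac{1}{T}\right)\right]

% Forward-active collector current, and V_BE at constant I_C:
I_C \;\approx\; I_S(T)\,\exp\!\left(\frac{q\,V_{BE}}{k\,T}\right)
\quad\Longrightarrow\quad
V_{BE}(T) \;=\; \frac{k\,T}{q}\,\ln\!\frac{I_C}{I_S(T)}
```

Fitting measured VBE(T) at constant collector current to the last expression is one way to read the classical "best fit" extraction, which is why errors in the temperature value feed directly into EG and XTI.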
-
Global Optimization Applied to the Oscillator Problem [p. 322]
-
S. Lampe and S. Laur
The oscillator problem consists of determining good initial
values for the node voltages and the frequency of oscillation
and the avoidance of the DC solution. Standard approaches
for limit cycle calculations of autonomous circuits
exhibit poor convergence behavior in practice. By introducing
an additional periodic probe voltage source to the oscillator
circuit, the system of autonomous differential algebraic
equations (DAEs) can be reformulated as a system of
non-autonomous DAEs with the constraint that the current
through the source has to be zero for the limit cycle. Using a
two stage approach leads to a greater range of convergence
than the standard approach, but the success of the algorithm
is heavily dependent on the initial amplitude of the probe
source and the frequency of oscillation. This paper presents
a fast and reliable optimization based initialization procedure
which overcomes the initialization problem of the two
stage algorithm.
Organizer: W. Rosenstiel, FZI/Tuebingen U, DE
Moderator: G. Mathéron, Director of MEDEA+ Office, FR
Panellists: J. Borel, STMicroelectronics, US; G. Matheron, MEDEA+ Office; A. Jerraya, TIMA, Grenoble, FR;
S. Resve, UC Berkeley, US; M. Rogers, Intel, US; W. Rosenstiel, FZI/Tuebingen U, DE;
I. Rugen-Herzig, Infineon Technologies, DE; F. Theeuwen, Philips Research, NL
-
MEDEA+ and ITRS Roadmaps [p. 328]
-
The recent revision of the ITRS Technology Roadmap has
again shown an acceleration of Very Deep
Submicron process availability, with design capabilities
forecast at hundreds of millions of gates per square
centimeter in 2010.
This will again raise the question of how to cope
with such complexities and functionalities (A-D, HWSW,
MEMS ...) in EDA solutions.
This panel will discuss the main
priorities in EDA as seen through the application
specificities in the USA (ITRS-2001 DESIGN ITWG) and
in Europe (The MEDEA EDA Roadmap).
The panelists will present the strategies in their
respective fields of interest, resulting from their
working groups' conclusions. They will underline the
breakthroughs and potential developments of solutions
and the milestones to reduce design times and increase
design quality.
The focus will be on application driven solutions,
mostly in the SoC domains (covering both hardware,
embedded and application software).
Moderators: M. Renaudin, TIMA, Grenoble, FR; L. Lavagno, Politecnico di Torino, IT
-
A Burst-Mode Oriented Back-End for the Balsa Synthesis System [p. 330]
-
T. Chelcea, S. Nowick, A. Bardsley, and D. Edwards
This paper introduces several new component clustering techniques
for the optimization of asynchronous systems. In particular, novel
"Burst-Mode aware" restrictions are imposed to limit the cluster sizes
and to ensure synthesizability. A new control specification language,
CH, is also introduced which facilitates the manipulation and optimization
of handshake control components. The new method has been
fully integrated into a comprehensive asynchronous synthesis package,
Balsa. Experimental results on several substantial design examples,
including a 32-bit microprocessor core, indicate significant
performance improvements for the optimized circuits.
-
Detecting State Coding Conflicts in STGs Using Integer Programming [p. 338]
-
V. Khomenko, M. Koutny, and A. Yakovlev
The paper presents a new method for checking Unique
and Complete State Coding, the crucial conditions in the
synthesis of asynchronous control circuits from Signal Transition
Graphs (STGs). The method detects state coding conflicts
in an STG using its partial order semantics (unfolding
prefix) and an integer programming technique. This leads to
huge memory savings compared to methods based on reachability
graphs, and also to significant speedups in many
cases. In addition, the method produces execution paths
leading to an encoding conflict. Finally, the approach is extended
to checking the normalcy property of STGs, which is
a necessary condition for their implementability using gates
whose characteristic functions are monotonic.
-
Verifying Clock Schedules in the Presence of Cross Talk [p. 346]
-
S. Hassoun, E. Calvillo-Gámez, and C. Cromer
This paper addresses verifying the timing of circuits containing
level-sensitive latches in the presence of cross talk.
We show that three consecutive periodic occurrences of the
aggressor's input switching window must be compared with
the victim's input switching window. We propose a new
phase shift operator to allow aligning the aggressor's three
relevant switching windows with the victim's input signals.
We solve the problem iteratively in polynomial time, and
show an upper bound on the number of iterations equal to
the number of capacitors in the circuit. Our experiments
demonstrate that eliminating false coupling results in finding
a smaller clock period at which a circuit will run.
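The idea of comparing the victim's window against three consecutive periodic occurrences of the aggressor's window can be illustrated with a small sketch. It assumes switching windows are closed time intervals `(start, end)` and that the three occurrences are the window shifted by one period earlier, zero, and one period later; the function names are hypothetical, and the paper's phase shift operator for level-sensitive latches is not reproduced here.

```python
def windows_overlap(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def aggressor_can_couple(aggressor, victim, period):
    """Compare the victim window with three consecutive aggressor occurrences.

    The aggressor window is shifted by -period, 0 and +period (an assumed,
    simplified alignment); coupling is possible if any occurrence overlaps."""
    for k in (-1, 0, 1):
        shifted = (aggressor[0] + k * period, aggressor[1] + k * period)
        if windows_overlap(shifted, victim):
            return True
    return False


# Illustrative values: an aggressor window that straddles the period boundary
# only meets the victim window through its previous occurrence.
print(aggressor_can_couple((9.0, 11.0), (0.0, 0.5), period=10.0))
print(aggressor_can_couple((4.0, 5.0), (0.0, 1.0), period=10.0))
```

A pair that never overlaps under this check would be a candidate for false coupling, which is the kind of pessimism the abstract says can be eliminated.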
Moderators: A. Kaiser, ISEN, FR; P. Wambacq, IMEC, BE
-
Analysis of Nonlinearities in RF Front-End Architectures Using a Modified Volterra Series Approach [p. 352]
-
M. Goffioul, P. Wambacq, G. Vandersteen, and S. Donnay
RF front-end architectures of today's wireless applications
need to meet tough requirements on nonlinear distortion
to minimize unwanted effects such as crosstalk. An analysis
of the nonlinear behavior of analog communication
circuits or architectures is not straightforward. This paper
presents a modified Volterra series approach to the simulation
of nonlinear systems described at the architectural
level. The total computed response is decomposed into its
nonlinear contributions and the main nonlinearities can be
identified. This yields a better insight into the system's nonlinear
behavior and allows simplifications. The simplified
system can then be simulated more efficiently. The implementation
is based only on vector calculation to minimize
the computation time, and has been applied to a complete
5 GHz WLAN receiver front-end.
-
Systematic Design of a 200 MS/s 8-bit Interpolating A/D Converter [p. 357]
-
J. Vandenbussche, E. Lauwers, K. Uyttenhove, M. Steyaert, and G. Gielen
The systematic design of a high-speed, high-accuracy
Nyquist A/D converter is proposed. The presented design
methodology covers the complete flow and is supported by
software tools. A generic behavioral model is used to
explore the A/D converter's specifications during high level
design and exploration. The inputs are the
specifications of the A/D converter and the technology
process. The result is a generated layout and the
corresponding extracted behavioral model. The approach
has been applied to a real-life test case, where a Nyquist-rate
8-bit 200 MS/s 4-2 interpolating A/D converter was
developed for a WLAN application.
-
Bio-Inspired Analog VLSI Design Realizes Programmable Complex Spatio-Temporal
Dynamics on a Single Chip [p. 362]
-
R. Carmona, F. Jiménez-Garrido, R. Domínguez-Castro, S. Espejo, and A. Rodríguez-Vázquez
A bio-inspired model for an analog parallel array processor
(APAP), based on studies on the vertebrate retina,
permits the realization of complex spatio-temporal dynamics
in VLSI. This model mimics the way in which images
are processed in the visual pathway, which makes it a feasible
alternative for the implementation of early vision tasks in
standard technologies. A prototype chip has been designed
in CMOS. Design challenges, trade-offs and the
building blocks of such a high-complexity system
( transistors, most of them operating in analog
mode) are presented in this paper.
Moderators: M. Flottes, LIRMM, FR; A. Benso, Politecnico di Torino, IT
-
An Incremental Algorithm for Test Generation in Illinois Scan Architecture Based Designs [p. 368]
-
A. Pandey and J. Patel
As the complexity of VLSI circuits is increasing due to the
exponential rise in transistor count per chip, testing cost is
becoming an important factor in the overall integrated circuit
(IC) manufacturing cost. This paper addresses the issue
of decreasing test cost by reducing the number of test data bits and
the number of clock cycles required to test a chip. We propose
a new incremental algorithm for generating tests for
Illinois Scan Architecture (ILS) based designs and provide
analysis of test data and test time reduction. This algorithm
is very efficient in generating tests for a number of ILS designs
in order to find the best configuration.
-
Gate Level Fault Diagnosis in Scan-Based BIST [p. 376]
-
I. Bayraktaroglu and A. Orailoglu
A gate level, automated fault diagnosis scheme is proposed
for scan-based BIST designs. The proposed scheme
utilizes both fault capturing scan chain information and failing
test vector information and enables location identification
of single stuck-at faults to a neighborhood of a few gates
through set operations on small pass/fail dictionaries. The
proposed scheme is applicable to multiple stuck-at faults and
bridging faults as well. The practical applicability of the
suggested ideas is confirmed through numerous experimental
runs on all three fault models.
-
An Interval-Based Diagnosis Scheme for Identifying Failing Vectors in a Scan-BIST Environment [p. 382]
-
C. Liu, K. Chakrabarty, and M. Goessel
We present a new scan-BIST approach for determining
failing vectors for fault diagnosis. This approach is based on
the application of overlapping intervals of test vectors to the
circuit under test. Two MISRs are used in an interleaved
fashion to generate intermediate signatures, thereby obviating
the need for multiple test sessions. The knowledge of failing
and non-failing intervals is used to obtain a set S of candidate
failing vectors that includes all the actual (true) failing vectors.
We present analytical results to determine an appropriate
interval length and the degree of overlap, an upper bound on
the size of S, and a lower bound on the number of true failing
vectors; the latter depends only on the knowledge of failing
and non-failing intervals. Finally, we describe two pruning
procedures that allow us to reduce the size of S, while
retaining most true failing vectors in S. We present
experimental results for the ISCAS 89 benchmark circuits to
demonstrate the effectiveness of the proposed scan-BIST
diagnosis approach.
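How knowledge of failing and non-failing intervals narrows down a candidate set S can be sketched as follows. The sketch assumes that a passing interval contains no failing vector (signature aliasing is ignored) and uses hypothetical parameter names `length` and `overlap`; it does not implement the paper's MISR interleaving, bounds, or pruning procedures.

```python
def make_intervals(num_vectors, length, overlap):
    """Overlapping intervals of test-vector indices (requires length > overlap)."""
    step = length - overlap
    intervals = []
    start = 0
    while start < num_vectors:
        intervals.append(range(start, min(start + length, num_vectors)))
        if start + length >= num_vectors:
            break
        start += step
    return intervals


def candidate_failing_vectors(intervals, interval_failed):
    """Vectors that lie in some failing interval and in no passing interval."""
    in_failing, in_passing = set(), set()
    for interval, failed in zip(intervals, interval_failed):
        (in_failing if failed else in_passing).update(interval)
    return in_failing - in_passing


# Illustrative run: 10 vectors, intervals of 4 overlapping by 2,
# with vector 5 as the only (made-up) true failing vector.
intervals = make_intervals(10, length=4, overlap=2)
true_failing = {5}
flags = [any(v in true_failing for v in iv) for iv in intervals]
print(sorted(candidate_failing_vectors(intervals, flags)))
```

Under the no-aliasing assumption the candidate set always contains every true failing vector, and a larger overlap tends to shrink it at the cost of more intermediate signatures.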
-
Reducing Test Application Time Through Test Data Mutation Encoding [p. 387]
-
S. Reda and A. Orailoglu
In this paper we propose a new compression algorithm
geared to reduce the time needed to test scan-based designs.
Our scheme compresses the test vector set by encoding the
bits that need to be flipped in the current test data slice in
order to obtain the mutated subsequent test data slice. Exploitation
of the overlap in the encoded data by effective
traversal search algorithms results in drastic overall compression.
The technique we propose can be utilized not
only as a stand-alone technique but also on
test data already compressed, extracting even further compression.
The performance of the algorithm is mathematically
analyzed and its merits experimentally confirmed on
the larger examples of the ISCAS'89 benchmark circuits.
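The core idea of encoding only the bits that flip between consecutive test data slices can be shown with a minimal sketch. It assumes slices are equal-length bit strings and that the first slice is encoded relative to an all-zero slice; the function names are hypothetical, and neither the paper's actual code format nor its traversal search for exploiting overlap is reproduced.

```python
def encode_mutations(slices):
    """For each slice, list the bit positions that differ from the previous one.

    The first slice is compared against an all-zero slice (an assumption)."""
    previous = "0" * len(slices[0])
    encoded = []
    for current in slices:
        flips = [i for i, (p, c) in enumerate(zip(previous, current)) if p != c]
        encoded.append(flips)
        previous = current
    return encoded


def decode_mutations(encoded, width):
    """Rebuild the slices by flipping the listed positions in turn."""
    bits = ["0"] * width
    slices = []
    for flips in encoded:
        for i in flips:
            bits[i] = "1" if bits[i] == "0" else "0"
        slices.append("".join(bits))
    return slices


# Illustrative slices (made up): consecutive slices differ in few bits.
slices = ["1010", "1011", "0011"]
encoded = encode_mutations(slices)
print(encoded)
print(decode_mutations(encoded, 4) == slices)
```

When consecutive slices differ in only a few positions, the flip lists are short, which is where the compression in such a scheme would come from.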
Moderators: R. Leupers, TU Aachen, DE; R. Ernst, TU Braunschweig, DE
-
Hardware/Software Trade-Offs for Advanced 3G Channel Coding [p. 396]
-
H. Michel, A. Worm, N. Wehn, and M. Münch
Third-generation wireless communication systems
comprise advanced signal processing algorithms that increase
the computational requirements more than ten-fold
over 2G systems. Numerous existing and emerging standards
require flexible implementations ("software radio").
Thus efficient implementations of the performance-critical
parts, such as Turbo decoding, on programmable architectures
are of great interest. Besides high-performance DSPs,
application-customized RISC cores offer the required performance
while still maintaining the desired flexibility. This
paper presents for the first time Turbo decoder implementations
on customized RISC cores and compares the results
with implementations on state-of-the-art VLIW DSPs. The
results of our studies show that the Log-MAP performance
is about 50% higher than on an ST120, a current VLIW
architecture.
-
An Efficient Compiler Technique for Code Size Reduction Using Reduced Bit-Width ISAs [p. 402]
-
A. Halambi, A. Shrivastava, P. Biswas, N. Dutt, and A. Nicolau
For many embedded applications, program code size is a critical design
factor. One promising approach for reducing code size is to employ a
"dual instruction set", where processor architectures support a normal
(usually 32 bit) Instruction Set, and a narrow, space-efficient (usually
16 bit) Instruction Set with a limited set of op-codes and access to a
limited set of registers. This feature, however, requires compilers that
can reduce code size by compiling for both Instruction Sets. Existing
compiler techniques operate at the function-level granularity and are unable
to make the trade-off between the increased register pressure (resulting
in more spills) and decreased code size. We present a profitability based
compiler heuristic that operates at the instruction-level granularity and
is able to effectively take advantage of both Instruction Sets. We also
demonstrate improved code size reduction, for the MIPS 32/16 bit ISA, using
our technique. Our approach more than doubles the code size reduction
achieved by existing compilers.
-
Assigning Program and Data Objects to Scratchpad for Energy Reduction [p. 409]
-
S. Steinke, L. Wehmeyer, B. Lee, and P. Marwedel
The number of embedded systems is increasing and a remarkable
percentage is designed as mobile applications.
For the latter, the energy consumption is a limiting factor
because of today's battery capacities. Besides the processor,
memory accesses consume a high amount of energy.
The use of additional less power hungry memories
like caches or scratchpads is thus common.
Caches incorporate the hardware control logic for moving
data in and out automatically. On the other hand, this
logic requires chip area and energy. A scratchpad memory
is much more energy efficient, but there is a need for software
control of its content.
In this paper, an algorithm integrated into a compiler is
presented which analyses the application and selects program
and data parts which are placed into the scratchpad.
Comparisons against a cache solution show remarkable advantages
between 12% and 43% in energy consumption for
designs of the same memory size.
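Selecting which program and data parts go into a fixed-size scratchpad can be viewed as a 0/1 knapsack problem. The sketch below is an assumption about how such a selection could be formulated, not the compiler algorithm of the paper; the object names, sizes and energy savings are hypothetical.

```python
def select_for_scratchpad(objects, capacity):
    """Choose objects maximizing total energy saving within the capacity.

    objects: list of (name, size_in_bytes, energy_saving) tuples.
    Returns (total_saving, list_of_selected_names).
    """
    # best[c] holds the best (saving, names) using at most c bytes.
    best = [(0, [])] * (capacity + 1)
    for name, size, saving in objects:
        # Iterate capacities downwards so each object is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + saving
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]


# Hypothetical memory objects: (name, size in bytes, energy saved if placed).
objects = [
    ("main_loop", 96, 50),
    ("lookup_table", 64, 40),
    ("buffer", 64, 35),
    ("init_code", 32, 5),
]
total_saving, chosen = select_for_scratchpad(objects, capacity=128)
```

In this example the two 64-byte objects together save more energy than the single largest object, so a greedy choice by saving alone would be suboptimal.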
Moderator/Organizer: G. De Micheli, Stanford U, US
-
Networks on Chip: A New Paradigm for Systems on Chip Design [p. 418]
-
G. De Micheli and L. Benini
This paper is meant to be a short introduction to a new
paradigm for systems on chip (SoC) design. We refer the interested
reader to an extended overview of this problem [1] and to
some recent results in this area in industry [21, 10] and academia
[4, 5]. The premises are that a component-based design methodology
will prevail in the future, to support component re-use in
a plug-and-play fashion. At the same time, SoCs will have to
provide a functionally-correct, reliable operation of the interacting
components. The physical interconnections on chip will be a
limiting factor for performance and energy consumption.
The international technology roadmap for semiconductors
(ITRS) [23] projects that we will be designing multi-billion transistor
chips by the end of this decade, with feature sizes around
50nm and clock frequencies around 10GHz. Delays on wires
will dominate: global wires spanning a significant fraction of
the chip size will carry signals whose propagation delay will
exceed the clock period. Whereas relatively large delays can
be managed with wire pipelining techniques, timing uncertainty
will be more problematic for designers. Moreover, synchronization
of chips with a single clock source and negligible skew will
be extremely hard or impossible. The most likely synchronization
paradigm for future chips is globally-asynchronous locally synchronous
(GALS), with many different clocks. Global wires
will span multiple clock domains, and synchronization failures in
communicating between different clock domains will be rare but
unavoidable events [7].
-
Communication Mechanisms for Parallel DSP Systems on a Chip [p. 420]
-
J. Williams, N. Heintze, and B. Ackland
We consider the implication of deep sub-micron VLSI
technology on the design of communication frameworks
for parallel DSP systems-on-chip. We assert that
distributed data transfer and control mechanisms are
necessary to manage many independent processing
subsystems and software tasks. An example of a parallel
DSP architecture is given and used to demonstrate these
mechanisms at work. We show the similarity of these
mechanisms and those used in large scale computing
networks.
-
Networks on Silicon: Combining Best-Effort and Guaranteed Services [p. 423]
-
K. Goossens, P. Wielage, A. Peeters, and J. van Meerbergen
We advocate a network on silicon (NOS) as a hardware architecture
to implement communication between IP cores in
future technologies, and as a software model in the form of
a protocol stack to structure the programming of NOSs. We
claim guaranteed services are essential. In the ÆTHEREAL
NOS they pervade the NOS as a requirement for hardware design,
and as foundation for software programming.
Moderators: W. Nebel, OFFIS, DE; M. Miranda, IMEC, BE
-
Data Reuse Exploration Techniques for Loop-Dominated Applications [p. 428]
-
T. Van Achteren, G. Deconinck, F. Catthoor, and R. Lauwereins
Efficient exploitation of temporal locality in the memory
accesses on array signals can have a very large impact on
the power consumption in embedded data dominated applications.
The effective use of an optimized custom memory
hierarchy or a customized software controlled mapping on a
predefined hierarchy, is crucial for this. Only recently effective
systematic techniques to deal with this specific design
step have begun to appear. They were still limited in their
exploration scope. In this paper we introduce an extended
formalized methodology based on an analytical model of
the data reuse of a signal. The cost parameters derived from
this model define the search space to explore and allow us
to exploit the maximum data reuse possible. The result is an
automated design technique to find power efficient memory
hierarchies and generate the corresponding optimized code.
-
EAC: A Compiler Framework for High-Level Energy Estimation and Optimization [p. 436]
-
I. Kadayif, M. Kandemir, N. Vijaykrishnan, M. Irwin, and A. Sivasubramaniam
This paper presents a novel Energy-Aware Compilation
(EAC) framework that can estimate and optimize energy
consumption of a given code taking as input the architectural
and technological parameters, energy models, and
energy/performance constraints. The framework has been
validated using a cycle-accurate architectural-level energy
simulator and found to be within 6% error margin while
providing significant estimation speedup. The estimation
speed of EAC is the key to the number of optimization alternatives
that can be explored within a reasonable compilation
time.
-
Power Savings in Embedded Processors through Decode Filter Cache [p. 443]
-
W. Tang, R. Gupta, and A. Nicolau
In embedded processors, instruction fetch and decode
can consume more than 40% of processor power. An instruction
filter cache can be placed between the CPU core
and the instruction cache to service the instruction stream.
Power savings in instruction fetch result from accesses to a
small cache. In this paper, we introduce a decode filter cache
to provide a decoded instruction stream. On a hit in the decode
filter cache, fetching from the instruction cache and the
subsequent decoding is eliminated, which results in power
savings in both instruction fetch and instruction decode.
We propose to classify instructions into cacheable or
uncacheable depending on the decoded width. Then a sectored
cache design is used in the decode filter cache so that
cacheable and uncacheable instructions can coexist in a decode
filter cache sector. Finally, a prediction mechanism
is presented to reduce the decode filter cache miss penalty.
Experimental results show an average 34% processor power
reduction and less than 1% performance degradation.
-
Hardware-Assisted Data Compression for Energy Minimization in Systems with Embedded Processors [p. 449]
-
L. Benini, D. Bruni, A. Macii, and E. Macii
In this paper, we suggest hardware-assisted data compression
as a tool for reducing energy consumption of core-based embedded
systems. We propose a novel and efficient architecture
for on-the-fly data compression and decompression whose field
of operation is the cache-to-memory path. Uncompressed cache
lines are compressed before they are written back to main memory,
and decompressed when cache refills take place.
We explore two classes of compression methods, profile-driven
and differential, since they are characterized by compact HW
implementations, and we compare their performance to those
provided by some state-of-the-art compression methods (e.g.,
we have considered a few variants of the Lempel-Ziv encoder).
We present experimental results about memory traffic and energy
consumption in the cache-to-memory path of a core-based
system running standard benchmark programs. The achieved
average energy savings range from 4.2% to 35.2%, depending
on the selected compression algorithm.
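As a rough illustration of the differential class of methods, a cache line can be stored as one base word plus small deltas when its words are close in value. The encoding below is a generic sketch under assumed parameters (32-bit words, 8-bit signed deltas) and is not the architecture described in the paper.

```python
def diff_encode(words):
    """Split a cache line into a base word and deltas relative to it."""
    base = words[0]
    return base, [w - base for w in words[1:]]


def diff_decode(base, deltas):
    """Rebuild the original cache line from the base word and deltas."""
    return [base] + [base + d for d in deltas]


def compressed_bits(words, word_bits=32, delta_bits=8):
    """Bits needed to write the line back; falls back to raw storage."""
    _, deltas = diff_encode(words)
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    if all(lo <= d <= hi for d in deltas):
        return word_bits + delta_bits * len(deltas)
    return word_bits * len(words)  # deltas too large: store uncompressed


line = [0x1000, 0x1004, 0x1008, 0x100C]  # hypothetical cache line contents
```

Fewer bits written to main memory means less traffic on the cache-to-memory path, which is the source of the energy savings the abstract reports.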
Moderators: E. Barke, Hannover U, DE; P. Groeneveld, Magma Design Automation, NL
-
Analysis of Noise Avoidance Techniques in DSM Interconnects Using a Complete Crosstalk Noise Model [p. 456]
-
M. Becer, V. Zolotov, D. Blaauw, R. Panda, and I. Hajj
Noise estimation and avoidance are becoming critical,
"must have" capabilities in today's high performance IC design.
An accurate yet efficient crosstalk noise model which
contains as many driver/interconnect parameters as possible,
is necessary for any sensitivity based noise avoidance
approach. In this paper, we present a complete analytical
crosstalk noise model which incorporates all physical properties
including victim and aggressor drivers, distributed RC
characteristics of interconnects and coupling locations in
both victim and aggressor lines. We present closed-form analytical
expressions for peak noise and noise width as well
as sensitivities to all model parameters. We then use these
model parameter sensitivities to analyze and evaluate various
noise avoidance techniques such as driver sizing, wire
sizing, wire spacing and layer assignment. Both our model
and noise avoidance evaluations are verified using realistic
circuits in 0.13µ technology. We also present the effectiveness of
the discussed noise avoidance techniques on a high performance
microprocessor core.
-
Hierarchical Current Density Verification for Electromigration Analysis in Arbitrary Shaped
Metallization Patterns of Analog Circuits [p. 464]
-
G. Jerke and J. Lienig
Electromigration is caused by high current density stress
in metallization patterns and is a major source of breakdown
in electronic devices. It is therefore an important
reliability issue to verify current densities within all
stressed metallization patterns. In this paper we propose a
new methodology for hierarchical verification of current
densities in arbitrarily shaped analog circuit layouts,
including a quasi-3D model to verify irregularities such
as vias. Our approach incorporates thermal simulation
data to account for the temperature dependency of electromigration.
The described methodology, which can be
integrated into any IC design flow as a design rule check
(DRC), has been successfully tested and verified in commercial
design flows.
-
A Polynomial Time Optimal Diode Insertion/Routing Algorithm for Fixing Antenna Problem [p. 470]
-
L. Huang, X. Tang, H. Xiang, D. Wong, and I. Liu
The antenna problem is a phenomenon of plasma induced gate
oxide degradation. It directly affects manufacturability of VLSI
circuits, especially in deep-submicron technology using high density
plasma. Diode insertion is a very effective way to solve this
problem. Ideally diodes are inserted directly under the wires that
violate antenna rules. But in today's high-density VLSI layouts,
there is simply not enough room for "under-the-wire" diode insertion
for all wires. Thus it is necessary to insert many diodes
at legal "off-wire" locations and extend the antenna-rule violating
wires to connect to their respective diodes. Previously only
simple heuristic algorithms were available for this diode insertion
and routing problem. In this paper, we show that the diode insertion
and routing problem for an arbitrary given number of routing
layers can be optimally solved in polynomial time. Our algorithm
guarantees to find a feasible diode insertion and routing solution
whenever one exists. Moreover, we can guarantee to find a feasible
solution to minimize a cost function of the form α·L + β·N,
where L is the total length of extension wires and N is the total
number of vias on the extension wires. Experimental results show
that our algorithm is very efficient.
Moderators: Y. Zorian, LogicVision, US; D. Gizopoulos, Piraeus U, GR
-
Test Planning and Design Space Exploration in a Core-Based Environment [p. 478]
-
E. Cota, L. Carro, M. Lubaszewski, and A. Orailoglu
This paper proposes a comprehensive model for test
planning in a core-based environment. The main contribution
of this work is the use of several types of TAMs and the
consideration of different optimization factors (area, pins
and test time) during the global TAM and test schedule definition.
This expansion of concerns makes possible an efficient yet fine-grained
search in the huge design space of
a reuse-based environment. Experimental results clearly
show the variety of trade-offs that can be explored using
the proposed model, and its effectiveness on optimizing the
system test design.
-
A Hierarchical Test Scheme for System-On-Chip Designs [p. 486]
-
J. Li, H. Huang, J. Chen, C. Su, C. Wu, C. Cheng, S. Chen, C. Hwang, and H. Lin
System-on-chip (SOC) design methodology is becoming
the trend in the IC industry. Integrating reusable cores
from multiple sources is essential in SOC design, and different
design-for-testability methodologies are usually required
for testing different cores. Another issue is test integration.
The purpose of this paper is to present a hierarchical
test scheme for SOC with heterogeneous core test and
test access methods. A hierarchical test manager (HTM)
is proposed to generate the control signals for these cores,
taking into account the IEEE P1500 Standard proposal. A
standard memory BIST interface is also presented, linking
the HTM and the memory BIST circuit. It can control the
BIST circuit with the serial or parallel test access mechanism.
The hierarchical test control scheme has low area and
pin overhead, and high flexibility. An industrial case using
this scheme has been designed, showing an area overhead
of only about 0.63%.
-
Efficient Wrapper/TAM Co-Optimization for Large SOCs [p. 491]
-
V. Iyengar, K. Chakrabarty, and E. Marinissen
Core test wrappers and test access mechanisms (TAMs) are important
components of a system-on-chip (SOC) test architecture.
Wrapper/TAM co-optimization is necessary to minimize the SOC testing
time. Most prior research in wrapper/TAM design has addressed
wrapper design and TAM optimization as separate problems, thereby
leading to results that are sub-optimal. We present a fast heuristic
technique for wrapper/TAM co-optimization, and demonstrate its
scalability for several industrial SOCs. This extends recent work on
exact methods for wrapper/TAM co-optimization based on integer linear
programming and exhaustive enumeration. We show that the SOC
testing times obtained using the new heuristic algorithm are comparable
to the testing times obtained using exact methods. Moreover,
more than two orders of magnitude reduction can be obtained in the
CPU time compared to exact methods. Furthermore, we are now
able to design efficient test access architectures with a larger number
of TAMs.
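To give a feel for why a fast heuristic can approach exact results, the sketch below assigns cores to TAMs with a generic longest-test-first greedy rule. This is a standard load-balancing heuristic used here as a stand-in, not the authors' co-optimization algorithm; it ignores wrapper design and TAM width, and the core names and test times are hypothetical.

```python
import heapq


def assign_cores_to_tams(test_times, num_tams):
    """Greedily assign each core to the currently least-loaded TAM.

    test_times: dict mapping core name to its test time (in cycles).
    Returns (assignment, soc_test_time).
    """
    heap = [(0, tam) for tam in range(num_tams)]  # (load, TAM index)
    heapq.heapify(heap)
    assignment = {tam: [] for tam in range(num_tams)}
    # Longest tests first; ties broken by name for determinism.
    ordered = sorted(test_times.items(), key=lambda kv: (-kv[1], kv[0]))
    for core, time in ordered:
        load, tam = heapq.heappop(heap)
        assignment[tam].append(core)
        heapq.heappush(heap, (load + time, tam))
    # TAMs are used in parallel, so the busiest TAM sets the SOC test time.
    soc_test_time = max(load for load, _ in heap)
    return assignment, soc_test_time


times = {"cpu": 90, "dsp": 70, "mem": 40, "usb": 30, "uart": 10}
assignment, soc_time = assign_cores_to_tams(times, num_tams=2)
```

Here the total test time of 240 cycles is split evenly across two TAMs, which matches the lower bound for this instance.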
-
Beyond UML to an End-of-Line Functional Test Engine [p. 499]
-
A. Baldini, A. Benso, P. Prinetto, S. Mo, and A. Taddei
In this paper, we analyze the use of UML as a starting
point to go from design issues to end of production testing
of complex embedded systems. The first point is the
analysis of the big gap between system signals and UML
messages; then the paper focuses on the additional
information necessary to fill this gap; different test types
are considered, focusing on the application software test;
finally the actuation and observation are both analyzed
inside the test environment, with particular care to the
black-box requirement for behavioral testing. The
emphasis of the work is on the resulting test engine
definition, verified on a complex case study of a top-of-the-line
automotive application; this application is a
modern car console, grouping many controls of car-related
devices, such as phone, navigation, radio, CD. The
testing of GSM capabilities of such a device is studied in
particular.
Moderators: J. López, Castilla-La Mancha U, ES; F. Rousseau, TIMA, Grenoble, FR
-
Event Model Interfaces for Heterogeneous System Analysis [p. 506]
-
K. Richter and R. Ernst
Complex embedded systems consist of hardware and software
components from different domains, such as control and signal
processing, many of them supplied by different IP vendors. The
embedded system designer faces the challenge to integrate, optimize
and verify the resulting heterogeneous systems. While formal
verification is available for some subproblems, the analysis
of the whole system is currently limited to simulation or emulation.
In this paper, we tackle the analysis of global resource sharing,
scheduling, and buffer sizing in heterogeneous embedded systems.
For many practically used preemptive and non-preemptive
hardware and software scheduling algorithms of processors and
busses, semi-formal analysis techniques are known. However, they
cannot be used in system level analysis due to incompatibilities of
their underlying event models. This paper presents a technique to
couple the analysis of local scheduling strategies via an event interface
model. We derive transformation rules between the most
important event models and provide proofs where necessary. We
use expressive examples to illustrate their application.
-
Energy-Efficient Mapping and Scheduling for DVS Enabled Distributed Embedded Systems [p. 514]
-
M. Schmitz, B. Al-Hashimi, and P. Eles
In this paper, we present an efficient two-step iterative synthesis
approach for distributed embedded systems containing dynamic
voltage scalable processing elements (DVS-PEs), based
on genetic algorithms. The approach partitions, schedules, and
voltage scales multi-rate specifications given as task graphs
with multiple deadlines. A distinguishing feature of the proposed
synthesis is the utilisation of a generalised DVS method.
In contrast to previous techniques, which "simply" exploit
available slack time, this generalised technique additionally
considers the PE power profile during a refined voltage selection
to further increase the energy savings. Extensive experiments
are conducted to demonstrate the efficiency of the proposed
approach. We report up to 43.2% higher energy reductions
compared to previous DVS scheduling approaches based
on constructive techniques and total energy savings of up to
82.9% for mapping and scheduling optimised DVS systems.
-
A Layered, Codesign Virtual Machine Approach to Modeling Computer Systems [p. 522]
-
J. Paul and D. Thomas
By using a macro/micro state model we show how
assumptions on the resolution of logical and physical timing
of computation in computer systems have resulted in design
methodologies such as component-based decomposition,
where they are completely coupled, and function/architecture
separation, where they are completely independent. We
discuss why these are inappropriate for emerging
programmable, concurrent system design. By contrast,
schedulers layered on hardware in concurrent systems
already couple logical correctness with physical
performance when they make effective resource sharing
decisions. This paper lays a foundation for understanding
how layered logical and physical sequencing will impact the
design process, and provides insight into the problems that
must be solved in such a design environment. Our layered
approach is that of a virtual machine. We discuss our MESH
research project in this context.
-
Automatic Evaluation of the Accuracy of Fixed-Point Algorithms [p. 529]
-
D. Menard and O. Sentieys
The minimization of cost, power consumption and time-to-market
of DSP applications requires the development
of methodologies for the automatic implementation of
floating-point algorithms in fixed-point architectures. In
this paper, a new methodology for evaluating the quality
of an implementation through the automatic determination
of the Signal to Quantization Noise Ratio (SQNR) is under
consideration. The theoretical concepts and the different
phases of the methodology are explained. Then, the ability
of our approach for computing the SQNR efficiently and its
beneficial contribution in the process of data word-length
minimization are shown through some examples.
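For reference, the SQNR of a fixed-point representation can also be measured by direct simulation, as in the sketch below. This is a baseline measurement under assumed conditions (rounding quantization with a chosen number of fractional bits, a sine test signal), not the automatic determination method proposed in the paper.

```python
import math


def quantize(x, frac_bits):
    """Round x to the nearest value representable with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def sqnr_db(signal, frac_bits):
    """Signal to Quantization Noise Ratio in dB for the given samples."""
    noise = [x - quantize(x, frac_bits) for x in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(e * e for e in noise) / len(noise)
    if p_noise == 0:
        return math.inf  # every sample was exactly representable
    return 10 * math.log10(p_signal / p_noise)


signal = [math.sin(0.1 * n) for n in range(1000)]  # hypothetical test signal
sqnr_8 = sqnr_db(signal, 8)
sqnr_12 = sqnr_db(signal, 12)
```

Each additional fractional bit is expected to add roughly 6 dB of SQNR, which is the kind of relationship a word-length minimization process can exploit.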
Organizer: K. Brock, Virtual Silicon Technology, US
Moderator: C. Edwards, Electronic Times, UK
Panellists: R. Lannoo, Alcatel, BE; U. Schlichtmann, Infineon Technologies, DE; A. Domic, Synopsys, US;
J. Benkoski, Monterey, US; D. Overhauser, Simplex, US;
M. Kliment, Virtual Silicon, US
-
Power Crisis in SoC Design: Strategies for Constructing Low-Power, High-Performance SoC Designs [p. 538]
-
This special panel session brings together
several leading technologists to discuss the
challenges and solutions in constructing SoC
designs that achieve their performance goals
within a very tight power budget. These
challenges are addressed from the often
conflicting perspectives of semiconductor design
teams and commercial solutions providers of
EDA construction tools, EDA analysis tools and
semiconductor IP (SIP).
Moderators: R. Hartenstein, Kaiserslautern U, DE; U. Kebschull, Leipzig U, DE
-
A Video Compression Case Study on a Reconfigurable VLIW Architecture [p. 540]
-
D. Rizzo and O. Colavin
In this paper, we investigate the benefits of a flexible,
application-specific instruction set by adding a run-time
Reconfigurable Functional Unit (RFU) to a VLIW
processor. Preliminary results on the motion estimation
stage in an MPEG4 video encoder are presented. With
the RFU modeled at functional level and under realistic
assumptions on execution latency, technology scaling and
reconfiguration penalty, we explore different RFU
instructions at fine-grain (instruction-level) and coarse-grain
(loop-level) granularity to speed up the application
execution. The memory bandwidth bottleneck, typical for
streaming applications, is alleviated through the
combined adoption of custom prefetch pattern
instructions and an extent of local memory. Performance
evaluations indicate that up to 8x improvement is achieved
with loop-level optimizations under various
architectural assumptions.
-
A Complete Data Scheduler for Multi-Context Reconfigurable Architectures [p. 547]
-
M. Sánchez-Élez, M. Férnandez, R. Maestre, R. Hermida, N. Bagherzadeh, and F. Kurdahi
A new technique is presented in this paper to improve the
efficiency of data scheduling for multi-context
reconfigurable architectures targeting multimedia and DSP
applications. The main goal is to reduce application
execution time by minimizing external memory transfers.
Some amount of on-chip data storage is assumed to be
available in the reconfigurable architecture. Therefore the
Complete Data Scheduler tries to optimally exploit this
storage, saving data and result transfers between on-chip
and external memories. In order to do this, specific
algorithms for data placement and replacement have been
designed. We also show that a suitable data scheduling
could decrease the number of transfers required to
implement the dynamic reconfiguration of the system.
-
Highly Scalable Dynamically Reconfigurable Systolic Ring-Architecture for DSP Applications [p. 553]
-
G. Sassatelli, L. Torres, P. Benoit, T. Gil, C. Diou, G. Cambon, and J. Galy
Microprocessors are today getting more and more
inefficient for a growing range of applications. Their
principle, the Von Neumann paradigm [3], is based on the
sequential execution of algorithms and will no longer be able
to cope with the highly computing-intensive
applications of the multimedia world.
Current approaches to deal with these limitations
consist of the following:
- The first, and most natural, way to increase
computing power is to decrease the cycle
execution time, thanks to new silicon technology: the
operating frequencies of new CPUs are now
approaching 2 GHz.
- The second approach is co-design. The general
purpose CPU delegates the computation of the most
time-demanding applications to a dedicated core. The
most famous example is PC graphics cards, which
manage all the 2D and 3D display operations that even
high-end CPUs are not able to handle efficiently.
Neither method is satisfactory. The first one quickly
reaches its limits in terms of achievable operating
frequencies and power consumption, whereas the
second requires the design of a new core for each
intended algorithm. New machine paradigms based on
parallel execution must be considered. Thanks to their
high level of flexibility, structurally programmable
architectures are potentially interesting candidates to
overcome the limitations of classical CPUs.
Based on a parallel execution model, we present in this
paper a new dynamically reconfigurable architecture,
dedicated to accelerating data-oriented applications.
Principles, realizations and comparative results will be
presented for some classical applications, targeted on
different architectures.
-
(Self-)reconfigurable Finite State Machines: Theory and Implementation [p. 559]
-
J. Teich and M. Köster
In this paper, we introduce the concept of (self-)reconfigurable
finite state machines as a formal model to describe
state-machines implemented in hardware that may
be reconfigured during operation. By the advent of reconfigurable
logic devices such as FPGAs, this model may become
important to characterize and implement (self-)reconfigurable
hardware. An FSM is called (self-)reconfigurable
if reconfiguration of either output function or transition
function is initiated by the FSM itself and not based on external
reconfiguration events. We propose an efficient hardware
realisation and give algorithmic solutions and bounds
for the reconfiguration overhead of migrating a given FSM
specification into a new target FSM.
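The defining property, that reconfiguration is initiated by the FSM itself rather than by an external event, can be sketched in software as follows. The class, its fields and the example machine are hypothetical illustrations and do not reflect the hardware realisation proposed in the paper.

```python
class ReconfigurableFSM:
    """FSM whose transition function is rewritten when it enters certain states."""

    def __init__(self, state, transitions, reconfigs):
        self.state = state
        # (state, input symbol) -> next state
        self.transitions = dict(transitions)
        # state -> {(state, input symbol): new next state}
        self.reconfigs = reconfigs

    def step(self, symbol):
        self.state = self.transitions[(self.state, symbol)]
        # Self-reconfiguration: triggered by the state just reached,
        # not by any external reconfiguration event.
        for key, target in self.reconfigs.get(self.state, {}).items():
            self.transitions[key] = target
        return self.state


fsm = ReconfigurableFSM(
    state="A",
    transitions={("A", 0): "A", ("A", 1): "B", ("B", 0): "A", ("B", 1): "B"},
    reconfigs={"B": {("A", 1): "A"}},  # reaching B disables the A -> B edge
)
trace = [fsm.step(1), fsm.step(0), fsm.step(1)]
```

After the first visit to state B, input 1 in state A no longer leads to B, so the same input sequence produces a different behaviour than in the original machine.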
Moderators: H. Graeb, TU Munich, DE; G. Gielen, KU Leuven, BE
-
A Linear-Centric Simulation Framework for Parametric Fluctuations [p. 568]
-
E. Acar, S. Nassif, and L. Pileggi
The relative tolerances for interconnect and device parameter
variations have not scaled with feature sizes, which has brought
about significant performance variability. As we scale toward
10nm technologies, this problem will only worsen. New circuit
families and design methodologies will emerge to facilitate construction
of reliable systems from unreliable nanometer scale
components. Such methodologies require new models of performance which
accurately capture the manufacturing realities.
Recently, one step toward this goal was made via a new variational reduced
order interconnect model that efficiently captures
large scale fluctuations in global parameter values. Using
variational calculus the linear interconnect systems are represented
by analytical models that include the global variational
parameters explicitly. In this work we present a framework which
extends the previous work to a linear-centric simulation methodology
with accurate nonlinear device models and their fluctuations. The
framework is applied to generate path delay
distributions under nonlinear and linear parameter fluctuations.
-
Automatic Generation of Common-Centroid Capacitor Arrays with Arbitrary Capacitor Ratio [p. 576]
-
M. Dessouky and D. Sayed
The key performance of many analog circuits is
directly related to accurate capacitor ratios. It is well
known that capacitor ratio precision is greatly enhanced
by paralleling identical size unit capacitors in a common-centroid
geometry. In this paper, a general algorithm for
fitting arbitrary capacitor ratios in a common-centroid
unit-capacitor array is presented. The algorithm gives
special care to both non-integer and identical ratios in
order to minimize mismatch. A method for capacitance
mismatch estimation based upon an oxide gradient model
is also introduced. It enables the comparison of different
unit-capacitor array assignments. Layout issues are
discussed with emphasis on a generic routing model.
Both the algorithm and the mismatch estimation method
are implemented in an automatic capacitor array
generation tool.
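The common-centroid property itself is easy to check for a given unit-capacitor assignment: the centroids of all capacitors should coincide. The sketch below only verifies this property for a hypothetical 4x4 array; it is not the fitting algorithm or the oxide gradient mismatch model of the paper.

```python
def centroids(grid):
    """Return the (row, column) centroid of each capacitor's unit cells.

    grid: list of strings, one character per unit capacitor,
    where the character names the capacitor the unit belongs to.
    """
    sums = {}
    for r, row in enumerate(grid):
        for c, name in enumerate(row):
            sum_r, sum_c, count = sums.get(name, (0, 0, 0))
            sums[name] = (sum_r + r, sum_c + c, count + 1)
    return {name: (sr / n, sc / n) for name, (sr, sc, n) in sums.items()}


# Hypothetical assignment of two capacitors A and B with ratio 1:1.
grid = [
    "ABBA",
    "BAAB",
    "BAAB",
    "ABBA",
]
result = centroids(grid)
```

Both capacitors share the array center as their centroid, so a linear gradient across the array affects them equally to first order.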
-
Analog Circuit Sizing Using Adaptive Worst-Case Parameter Sets [p. 581]
-
R. Schwencker, F. Schenkel, M. Pronath, and H. Graeb
In this paper, a method for nominal design of analog integrated
circuits is presented that includes process variations
and operating ranges by worst-case parameter sets.
These sets are calculated adaptively during the sizing process
based on sensitivity analyses. The method leads to robust
designs with high parametric yield, while being much
more efficient than design centering methods.
-
High-Frequency Nonlinear Amplifier Model for the Efficient Evaluation of Inband Distortion Under
Nonlinear Load-Pull Conditions [p. 586]
-
G. Vandersteen, P. Wambacq, S. Donnay, and F. Verbeyst
Designing complex analog systems needs different
abstraction levels to reduce the overall complexity. The
required level of abstraction depends on the accuracy and
the purpose of the model. High-frequency amplifier models
can vary from simple transfer functions for efficient bit-error-rate
analysis up to detailed transistor level
descriptions for accurate load-pull prediction. This paper
introduces a nonlinear black-box model for high-frequency
amplifiers. It extends the linear S-parameter representation
to enable both efficient system-level simulations and load-pull
prediction. Both are demonstrated on the
measurements of a high-frequency amplifier excited using
WLAN-OFDM modulation.
Moderators: Z. Peng, Linköping U, SE; B. Rouzeyre, LIRMM, FR
-
Effective Software Self-Test Methodology for Processor Cores [p. 592]
-
N. Kranitis, A. Paschalis, D. Gizopoulos, and Y. Zorian
Software self-testing for embedded processor cores
based on their instruction set, is a topic of increasing
interest since it provides an excellent test resource
partitioning technique for sharing the testing task of
complex Systems-on-Chip (SoC) between slow,
inexpensive testers and embedded code stored in memory
cores of the SoC. We introduce an efficient methodology
for processor cores self-testing which requires knowledge
of their instruction set and Register Transfer (RT) level
description. Compared with functional testing
methodologies proposed in the past, our methodology is
more efficient in terms of fault coverage, test code size
and test application time. Compared with recent software
based structural testing methodologies for processor
cores, our methodology is superior in terms of test
development effort and has significantly smaller code size
and memory requirements, while virtually the same fault
coverage is achieved with an order of magnitude smaller
test application time.
-
Test Resource Partitioning and Reduced Pin-Count Testing Based on Test Data Compression [p. 598]
-
A. Chandra and K. Chakrabarty
We present a new test resource partitioning (TRP) technique
for reduced pin-count testing of system-on-a-chip
(SOC). The proposed technique is based on test data
compression and on-chip decompression. It makes effective
use of frequency-directed run-length codes, internal
scan chains, and boundary scan chains. The compression/
decompression scheme decreases test data volume and
the amount of data that has to be transported from the tester
to the SOC. We show via analysis as well as through experiments
that the proposed TRP scheme reduces testing time
and allows the use of a slower tester with fewer I/O channels.
Finally, we show that an uncompacted test set applied
to an embedded core after on-chip decompression is likely
to increase defect coverage.
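
As an illustration of the run-length coding that the abstract above builds on, the following Python sketch encodes runs of 0s with a frequency-directed run-length (FDR) code. The abstract does not define the code. The group structure used here (group k covers run lengths 2^k - 2 through 2^(k+1) - 3, with a k-bit prefix and a k-bit tail) follows the construction as it is usually described and should be read as an assumption, as should the function names.

```python
def fdr_encode_run(run_length: int) -> str:
    """Encode one run of 0s (terminated by a single 1) as an FDR codeword.

    Assumed group structure: group k covers run lengths
    2**k - 2 .. 2**(k + 1) - 3 and uses a 2k-bit codeword.
    """
    if run_length < 0:
        raise ValueError("run length must be non-negative")
    k = 1
    while run_length > 2 ** (k + 1) - 3:
        k += 1
    prefix = "1" * (k - 1) + "0"                      # k-bit group prefix
    tail = format(run_length - (2 ** k - 2), f"0{k}b")  # k-bit offset in group
    return prefix + tail


def fdr_encode(bits: str) -> str:
    """Encode a 0/1 string as a sequence of FDR codewords."""
    out = []
    run = 0
    for b in bits:
        if b == "0":
            run += 1
        elif b == "1":
            out.append(fdr_encode_run(run))
            run = 0
        else:
            raise ValueError("input must contain only '0' and '1'")
    if run > 0:
        # Trailing 0s without a terminating 1: encoded as a normal run, so a
        # decoder would have to truncate its output to the original length.
        out.append(fdr_encode_run(run))
    return "".join(out)


if __name__ == "__main__":
    data = "0001" + "1" + "00000001"   # runs of length 3, 0 and 7
    print(len(data), "bits ->", len(fdr_encode(data)), "bits")
```

Short runs receive short codewords, so the scheme pays off when test vectors are dominated by 0s with occasional long runs; for data with many isolated 1s the encoded stream can be longer than the input.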
-
Improving Compression Ratio, Area Overhead, and Test Application Time for System-on-a-Chip
Test Data Compression/Decompression [p. 604]
-
P. Gonciari, B. Al-Hashimi, and N. Nicolici
This paper proposes a new test data compression/
decompression method for systems-on-a-chip. The
method is based on analyzing the factors that influence
test parameters: compression ratio, area overhead and test
application time. To improve compression ratio, the new
method is based on a Variable-length Input Huffman Coding
(VIHC), which fully exploits the type and length of the patterns,
as well as a novel mapping and reordering algorithm
proposed in a pre-processing step. The new VIHC algorithm
is combined with a novel parallel on-chip decoder that simultaneously
leads to low test application time and low area
overhead. It is shown that, unlike three previous approaches
[2, 3, 10] which reduce some test parameters at the expense
of the others, the proposed method is capable of improving
all the three parameters simultaneously. For example, the
proposed method leads to similar or better compression ratio
when compared to frequency directed run-length coding
[2], however with lower area overhead and test application
time. Similarly, there is comparable or lower area overhead
and test application time with respect to Golomb coding [3],
with improvements in compression ratio. Finally, there is
similar or improved test application time when compared
to selective coding [10], with reductions in compression ratio
and significantly lower area overhead. An experimental
comparison on benchmark circuits validates the proposed
method.
-
Problems Due to Open Faults in the Interconnections of Self-Checking Data-Paths [p. 612]
-
M. Favalli and C. Metra
In this work, the problem of open faults affecting the interconnections
of self-checking (SC) circuits composed of data-path and control
is analyzed. In particular, it is shown that, in case opens
affect control signals, some problems may arise even if both
control and data-path signals are concurrently checked. In
particular, wrong codewords may be generated at the outputs
of multiplexers and registers. To address this problem, new
registers and multiplexers are proposed which allow the design
of data-paths that are TSC with respect to opens (and resistive
opens). These components are also TSC with respect
to stuck-at, transistor and gross delay faults. They present a
good testability with respect to resistive bridgings.
Moderators: B. Al-Hashimi, Southampton U, UK; P. Schwarz, FhG IIS/EAS Dresden, DE
-
Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design [p. 620]
-
S. Yoo, G. Nicolescu, L. Gauthier, and A. Jerraya
To enable fast and accurate evaluation of HW/SW implementation
choices of on-chip communication, we present
a method to automatically generate timed OS simulation
models. The method generates the OS simulation models
with the simulation environment as a virtual processor.
Since the generated OS simulation models use final
OS code, the presented method can mitigate the OS code
equivalence problem. The generated model also simulates
different types of processor exceptions. This approach provides
two orders of magnitude higher simulation speedup
compared to the simulation using instruction set simulators
for SW simulation.
-
Window-Based Susceptance Models for Large-Scale RLC Circuit Analyses [p. 628]
-
Z. Zheng, L. Pileggi, M. Beattie, and B. Krauter
Due to the increasing operating frequencies and the manner
in which the corresponding integrated circuits and systems
must be designed, the extraction, modeling and
simulation of the magnetic couplings for final design verification
can be a daunting task. In general, when modeling inductance
and the associated return paths, one must consider
the on-chip conductors as well as the system packaging. This
can result in an RLC circuit size that is impractical for traditional
simulators. In this paper we demonstrate a localized,
window-based extraction and simulation methodology
that employs the recently proposed susceptance (the inverse
of inductance matrix) concept. We provide a qualitative explanation
for the efficacy of this approach, and demonstrate
how it facilitates pre-manufacturing simulations that would
otherwise be intractable. A critical aspect of this simulation
efficiency is owed to a susceptance-based circuit formulation
that we prove to be symmetric positive definite. This
property, along with the sparsity of the susceptance matrix,
enables the use of some advanced sparse matrix solvers. We
demonstrate this extraction and simulation methodology on
some industrial examples.
-
A Linear-Centric Modeling Approach to Harmonic Balance Analysis [p. 634]
-
P. Li and L. Pileggi
In this paper we propose a new harmonic balance simulation
methodology based on a linear-centric modeling approach.
A linear circuit representation of the nonlinear
devices and associated parasitics is used along with corresponding
time and frequency domain inputs to solve for the
nonlinear steady-state response via successive chord (SC) iterations.
For our circuit examples this approach is shown to
be up to 60x more run-time efficient than traditional Newton-Raphson
(N-R) based iterative methods, while providing the
same level of accuracy. This SC-based approach converges
as reliably as the N-R approaches, including for circuit
problems which cause alternative relaxation-based harmonic
balance approaches to fail [1][2]. The efficacy of this linear-centric
methodology further improves with increasing
model complexity, the inclusion of interconnect parasitics
and other analyses that are otherwise difficult with traditional
nonlinear models.
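
To make the contrast between successive chord and Newton-Raphson iterations concrete, here is a minimal Python sketch on a scalar nonlinear equation. It is not harmonic balance and not the authors' implementation: the equation (a hypothetical nonlinear conductance i = v + v^3 driven by a 2 A source), the starting point, and the function names are all assumptions chosen for illustration. The chord iteration reuses one fixed slope, which is what allows a single linear model to be kept across iterations, at the price of more iterations.

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    """Newton-Raphson: the slope df(x) is re-evaluated at every iteration."""
    x = x0
    for i in range(1, max_iter + 1):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x, i
    raise RuntimeError("Newton-Raphson did not converge")


def chord(f, slope, x0, tol=1e-10, max_iter=1000):
    """Successive chord: one fixed slope is reused for every iteration."""
    x = x0
    for i in range(1, max_iter + 1):
        step = f(x) / slope
        x -= step
        if abs(step) < tol:
            return x, i
    raise RuntimeError("chord iteration did not converge")


def f(v):
    """Residual current of the hypothetical circuit; the root is v = 1."""
    return v ** 3 + v - 2.0


def df(v):
    """Derivative of f, i.e. the small-signal conductance."""
    return 3.0 * v ** 2 + 1.0


if __name__ == "__main__":
    v0 = 1.5
    print("Newton-Raphson:", newton(f, df, v0))
    print("Chord, slope fixed at df(v0):", chord(f, df(v0), v0))
```

In this toy case the chord iteration needs more iterations than Newton-Raphson, but each one avoids recomputing (and, in a circuit simulator, refactoring) the linearized system.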
-
An Energy Estimation Method for Asynchronous Circuits with Application to an
Asynchronous Microprocessor [p. 640]
-
P. Pénzes and A. Martin
This paper presents a simulator operating on a logical representation of
an asynchronous circuit that gives energy estimates within 10% of electrical
(hspice) simulation. Our simulator is the first such tool in the literature
specifically targeted to efficient energy estimation of QDI asynchronous
circuits.
As an application, we show how the simulator has been used to accurately
estimate the energy consumption in different parts of an asynchronous MIPS
R3000 microprocessor. This is the first energy breakdown of an asynchronous
microprocessor in the literature.
Moderator/Organizer: R. Otten, TU Eindhoven, NL
Speakers: R. Camposano, Synopsys, US; P. Groeneveld, Magma Design Automation, US;
R. Otten, TU Eindhoven, NL
-
Design Automation for Deepsubmicron: Present and Future [p. 650]
-
Advancing technology drives design technology and
thus design automation (EDA). How to model interconnect,
how to handle degradation of signal integrity and
increasing power density are changing now, and have led
to integrating logic and layout synthesis. Aggressive gate
sizing to control timing has become part of any modern
back-end. From 0.13µ and down, chips will be more susceptible
to breakdown during fabrication (antenna effect)
or to wear out over time (electromigration) and dealing
with these issues will require careful planning.
More integration of fast and accurate analysis with a
complete design flow (chip planning, synthesis, placement
and routing) will be needed, and still, advancing complexity
will affect design and verification. Using hundreds
of millions of devices effectively will be possible only by
reusing pre-designed intellectual property (IP) effectively
and by addressing system-level issues in EDA.
In the long term only more radical changes will keep
us on Moore's track, changes that ultimately will have
us depart from the two+-dimensional confinement and
lead to multiple active layers, and changes that will affect
deeply the face of EDA altogether.
Organizer: D. Davis, Actel, US
Moderator: B. Lewis, Gartner/Dataquest, US
Panellists: I. Bolsens, Xilinx, US; B. Gupta, STMicroelectronics, US; R. Lauwereins, IMEC, BE;
Y. Tanurhan, Actel Corporation, US; C. Wheddon, Quicksilver Technology, US
-
Reconfigurable SoC -- What Will it Look Like? [p. 660]
-
The argument against ASIC SoCs is that they have
always taken too long and cost too much to design. As
new process technologies come on line, the issue of
inflexible, unyielding designs fixed in silicon becomes a
serious concern. Without the flexibility of reconfigurable
logic, will standard cell ASICs disappear and go the way
of gate arrays? Will ASIC manufacturers lose their edge
in providing intellectual value and become mere
purveyors of square die area?
The argument in favor of FPGAs is that they have
always provided great design flexibility because they
were configurable. The argument against FPGAs is that
compared to ASICs they have always been larger, slower
and more expensive. Will FPGAs ever become efficient
enough to replace ASICs in volume production
applications? ASSPs can be designed with partial
reconfigurability. Will they become the norm? Or, will
new reconfigurable logic cores change the SoC game
completely?
The answers to these questions will clearly impact
system designers throughout the world and shape the
future of the electronics industry. A panel of key industry
executives each coming from a different area of the
market with unique views will debate these highly
controversial topics.
-
Congestion-Aware Logic Synthesis [p. 664]
-
D. Pandini, L. Pileggi, and A. Strojwas
In this era of Deep Sub-Micron (DSM) technologies, the impact of
interconnects is becoming increasingly important as it relates to
integrated circuit (IC) functionality and performance. In the
traditional top-down IC design flow, interconnect effects are first
taken into account during logic synthesis by way of wireload
models. However, for technologies of 0.25µm and below, the
wiring capacitance dominates the gate capacitance and the delay
estimation based on fanout and design legacy statistics can be
highly inaccurate. In addition, logic block size is no longer
dictated solely by total cell area, and is often limited by wiring
area resources. For these reasons, wiring congestion is an
extremely important design factor, and should be taken into
consideration at the earliest possible stages of the design flow. In
this paper we propose a novel methodology to incorporate
congestion minimization within logic synthesis, and present results
for industrial circuits that validate our approach.
-
Layout Driven Decomposition with Congestion Consideration [p. 672]
-
T. Kutzschebauch and L. Stok
We present a novel algorithm that applies physical layout
information during common subexpression extraction to improve
wiring congestion and delay, resulting in improved design
closure.
As feature sizes decrease and chip sizes increase, the
traditional separation of physical design and logic synthesis
proves to be increasingly detrimental. Interconnect delay and
wiring congestion, among the most critical objective functions to
meet design closure, are not considered during logic synthesis.
On the other hand, physical design is too deep in the design
process to be able to significantly restructure the already
technology mapped netlist. While this problem has been
addressed previously, the existing solutions only apply simple
synthesis transforms during physical design. Hence they are
generally unable to reverse decisions made during logic
restructuring which have a major negative impact on the circuit
structure.
In our novel approach, we propose a layout driven algorithm
for the concurrent extraction of common subexpressions, one of
the most important steps that affect the overall circuit structure,
and consequently congestion and wire length during logic
synthesis. In addition, we consider dependency relations between
cube divisors to improve the extraction process. As a result, our
layout driven decomposition algorithm combines logic synthesis
and physical layout information to effectively decrease wire
length and improve congestion for improved design closure.
-
Improving Placement under the Constant Delay Model [p. 677]
-
K. Sulimma, W. Kunz, I. Neumann, and L. van Ginneken
In this paper, we show that under the constant delay
model the placement problem is equivalent to minimizing
a weighted sum of wire lengths. The weights can be
efficiently computed once in advance and still accurately
reflect the circuit area throughout the placement process.
The existence of an efficient and accurate cost function
allows us to directly optimize circuit area. This leads to
better results compared to heuristic edge weight estimates
or optimization for secondary criteria such as wire length.
We leverage this property to improve a recursive
partitioning based tool flow. We achieve area savings of
27% for some circuits and 15% on average. The use of the
constant delay model additionally enables timing closure
without iterations.
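
The cost function described above, a weighted sum of wire lengths, can be sketched in a few lines of Python. The abstract does not say how wire length is measured or how the weights are derived; the half-perimeter wire length (HPWL) estimate, the example weights, and all names below are assumptions made for illustration only.

```python
def hpwl(pins):
    """Half-perimeter wire length of a net given its (x, y) pin positions."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def weighted_wirelength(nets, weights, placement):
    """Sum over all nets of weight * estimated wire length.

    nets:      dict net name -> list of cell names on that net
    weights:   dict net name -> weight (assumed to be computed once, up front)
    placement: dict cell name -> (x, y) position
    """
    return sum(
        weights[name] * hpwl([placement[cell] for cell in cells])
        for name, cells in nets.items()
    )


if __name__ == "__main__":
    placement = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
    nets = {"n1": ["a", "b"], "n2": ["b", "c"]}
    weights = {"n1": 2.0, "n2": 0.5}   # hypothetical per-net weights
    print(weighted_wirelength(nets, weights, placement))
```

Because the weights do not change during placement, a placer can re-evaluate this cost incrementally after every move, touching only the nets attached to the moved cell.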
-
Crosstalk Alleviation for Dynamic PLAs [p. 683]
-
T. Tien, T. Tsai, and S. Chang
The dynamic PLA style has become popular in designing high performance
microprocessors because of its high speed and predictable routing delay.
However, like all other dynamic circuits, dynamic PLAs have suffered
from the crosstalk noise problem. In this paper, we propose two techniques
to alleviate crosstalk noise for dynamic PLAs. The first technique makes use
of the fact that depending on the ordering of product lines, some crosstalk
does not cause errors in outputs. A proper ordering can greatly reduce the
number of lines affected by crosstalk noise. For those product lines which
can be affected by crosstalk, we attempt to reduce the parallel length by
re-ordering the input and output lines. We have performed experiments on a
large set of MCNC benchmark circuits. The results show that after re-ordering,
86.7% of product lines become crosstalk immune and need not be considered for
crosstalk prevention.
Moderators: J. Lienig, Bosch, DE; F. Johannes, TU Munich, DE
-
Flip-Flop and Repeater Insertion for Early Interconnect Planning [p. 690]
-
R. Lu, G. Zhong, C. Koh, and K. Chao
We present a unified framework that considers flip-flop and
repeater insertion and the placement of flip-flop/repeater
blocks during RT or higher level design. We
introduce the concept of independent feasible regions in
which flip-flops and repeaters can be inserted in an interconnect
to satisfy both delay and cycle time constraints.
Experimental results show that, with flip-flop insertion, we
greatly increase the ability of interconnects to meet timing
constraints. Our results also show that it is necessary to
perform interconnect optimization at early design steps as
the optimization will have even greater impact on the chip
layout as feature size continually scales down.
-
Congestion Estimation with Buffer Planning in Floorplan Design [p. 696]
-
W. Wong, C. Sham, and F. Young
In this paper, we study and implement a routability-driven
floorplanner with buffer block planning. It evaluates
the routability of a floorplan by computing the probability
that a net will pass through each particular location
of a floorplan, taking into account buffer locations and routing
blockages. Experimental results show that our congestion
model can better optimize the congestion and delay (by successful
buffer insertions) of a circuit with only a slight
penalty in area.
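
A simple way to picture a probabilistic congestion estimate of this kind is sketched below in Python. The abstract does not specify the model; this sketch assumes two-pin nets on a grid, treats every monotone shortest route as equally likely, and ignores the buffer locations and routing blockages that the paper accounts for. All names are hypothetical.

```python
from math import comb


def pass_probability(w, h, x, y):
    """Probability that a uniformly chosen monotone shortest path from
    (0, 0) to (w, h) visits grid point (x, y)."""
    if not (0 <= x <= w and 0 <= y <= h):
        return 0.0
    # Paths reaching (x, y) times paths continuing from (x, y) to (w, h).
    through = comb(x + y, x) * comb((w - x) + (h - y), w - x)
    return through / comb(w + h, w)


def congestion_map(nets, cols, rows):
    """Accumulate per-cell pass probabilities over all two-pin nets.

    nets: list of ((x1, y1), (x2, y2)) pin pairs inside a cols x rows grid.
    Returns grid[y][x] = expected number of nets passing through the cell.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for (x1, y1), (x2, y2) in nets:
        w, h = abs(x2 - x1), abs(y2 - y1)
        sx = 1 if x2 >= x1 else -1
        sy = 1 if y2 >= y1 else -1
        for dx in range(w + 1):
            for dy in range(h + 1):
                grid[y1 + sy * dy][x1 + sx * dx] += pass_probability(w, h, dx, dy)
    return grid


if __name__ == "__main__":
    for row in congestion_map([((0, 0), (2, 1)), ((2, 0), (0, 1))], 3, 2):
        print(row)
```

Cells whose accumulated value exceeds the available routing capacity are the likely congestion hot spots; a floorplanner can fold that excess into its cost function.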
-
Maze Routing with Buffer Insertion under Transition Time Constraints [p. 702]
-
L. Huang, M. Lai, D. Wong, and Y. Gao
In this paper, we address the problem of simultaneous routing
and buffer insertion. Recently in [12, 22], the authors considered
simultaneous maze routing and buffer insertion under the
Elmore delay model. Their algorithms can take into account both
routing obstacles and restrictions on buffer locations. It is well
known that Elmore delay is only a first-order approximation of
signal delay and hence could be very inaccurate. Moreover, we
cannot impose constraints on the transition times of the output
signal waveform at the sink or at the buffers on the route. In
this paper we extend the algorithm in [12] so that accurate delay
models (e.g., transmission line model, delay look-up table from
SPICE, etc.) can be used. We show that the problem of finding
a minimum-delay buffered routing path can be formulated as a
shortest path problem in a specially constructed weighted graph.
By including only the vertices with qualifying transition times
in the graph, we guarantee that all transition time constraints are
satisfied. Our algorithm can be easily extended to handle buffer
sizing and wire sizing. It can be applied iteratively to improve
any given routing tree solution. Experimental results show that
our algorithm performs well.
-
Optimal Transistor Tapering for High-Speed CMOS Circuits [p. 708]
-
L. Ding and P. Mazumder
Transistor tapering is a widely used technique applied
to optimize the geometries of CMOS transistors in high-performance
circuit design with a view to minimizing the
delay of a FET network. Currently, in a long series-connected
FET chain, the dimensions of the transistors are
decreased from the bottom transistor to the top transistor in a
manner where the width of transistors is tapered linearly
or exponentially. However, it has not been mathematically
proved whether either of these tapering schemes yields optimal
results in terms of minimization of switching delays of
the network. In this paper, we rigorously analyze MOS circuits
consisting of long FET chains under the widely used
Elmore delay model and derive the optimality of transistor
tapering by employing variational calculus. Specifically,
we demonstrate that neither linear nor exponential tapering
alone minimizes the discharge time of the FET chain.
Instead, a composition of exponential and constant tapering
actually optimizes the delay of the network. We have
also corroborated our analytical results by performing extensive
simulation of FET networks and showing that both
analytical and simulation results are always consistent.
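
For readers who want to experiment with tapering schemes, the Elmore delay of a series chain can be evaluated with the short Python sketch below. The device model is a deliberately crude assumption: each transistor of width w is replaced by a resistance r_unit / w and contributes a capacitance c_unit * w to the node above it, and the top node also carries a load c_load. The names and the model are illustrative and are not taken from the paper.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    resistances[i] connects node i-1 (or the driving point for i = 0)
    to node i; capacitances[i] is the capacitance to ground at node i.
    """
    delay = 0.0
    upstream = 0.0
    for r, c in zip(resistances, capacitances):
        upstream += r            # total resistance between source and node i
        delay += upstream * c    # each capacitor sees all upstream resistance
    return delay


def chain_delay(widths, r_unit=1.0, c_unit=1.0, c_load=0.0):
    """Elmore discharge delay of a series FET chain.

    widths are listed from the bottom (ground side) transistor to the top
    (output side) transistor.
    """
    resistances = [r_unit / w for w in widths]
    capacitances = [c_unit * w for w in widths]
    capacitances[-1] += c_load
    return elmore_delay(resistances, capacitances)


if __name__ == "__main__":
    uniform = [1.0] * 4
    linear_taper = [2.5, 2.0, 1.5, 1.0]   # hypothetical widths, bottom to top
    print("uniform:", chain_delay(uniform, c_load=5.0))
    print("linear taper:", chain_delay(linear_taper, c_load=5.0))
```

Which width profile gives the smallest delay depends on the chain length, the load, and the device model, so the sketch is a tool for comparison rather than a statement of the optimum derived in the paper.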
Moderators: P. Teixeira, INESC-IST, PT; B. Straube, FhG IIS/EAS Dresden, DE
-
Incremental Diagnosis and Correction of Multiple Faults and Errors [p. 716]
-
A. Veneris, J. Liu, M. Amiri, and M. Abadir
An incremental simulation-based approach to fault diagnosis
and logic debugging is presented. During each iteration
of the algorithm, a single suspicious location is identified
and fault modeled such that the functionality of the new
design becomes "closer" to its specification. The method
is based on a simple and, at first glance, counter-intuitive
theoretical result along with a number of heuristics which
help avoid the exponential complexity inherent to the problems.
Experiments on multiple design errors and multiple
stuck-at faults confirm its effectiveness and accuracy, which
scales well with increasing number of errors.
-
Test Enrichment for Path Delay Faults Using Multiple Sets of Target Faults [p. 722]
-
I. Pomeranz and S. Reddy
Test sets for path delay faults in circuits with large
numbers of paths are typically generated for path delay
faults associated with the longest circuit paths. We show
that such test sets may not detect faults associated with the
next-to-longest paths. This may lead to undetected failures
since shorter paths may fail without any of the longest
paths failing. In addition, paths that appear to be shorter
may actually be longer than the longest paths if the procedure
used for estimating path length is inaccurate. We
propose a test enrichment procedure that increases
significantly the number of faults associated with the
next-to-longest paths that are detected by a (compact) test
set. This is achieved by allowing the underlying test generation
procedure the flexibility of detecting or not detecting
the faults associated with the next-to-longest paths.
Faults associated with next-to-longest paths are detected
without increasing the number of tests beyond that
required to detect the faults associated with the longest
paths. The proposed procedure thus improves the quality
of the test set without increasing its size.
-
FACTOR: A Hierarchical Methodology for Functional Test Generation and Testability Analysis [p. 730]
-
V. Vedula and J. Abraham
This paper develops an improved approach for hierarchical
functional test generation for complex chips. In order
to deal with the increasing complexity of functional test
generation, hierarchical approaches have been suggested
wherein functional constraints are extracted for each module
under test (MUT) within a design. These constraints
describe a simplified ATPG view for the MUT and thereby
speed up the test generation process. This paper develops
an improved approach which applies this technique at
deeper levels of hierarchy, so that effective tests can be
developed for large designs with complex submodules. A
tool called FACTOR (FunctionAl ConsTraint extractOR),
which implements this methodology is described in this
work. Results on the ARM design prove the effectiveness of
FACTOR-ising large designs for test generation and testability
analysis.
Moderators: W. Grass, Passau U, DE; E. Villar, Cantabria U, ES
-
An Environment for Dynamic Component Composition for Efficient Co-Design [p. 736]
-
F. Doucet, S. Shukla, R. Gupta, and M. Otsuka
This article describes the Balboa component integration
environment that is composed of three parts: a script language
interpreter, compiled C++ components, and a set of
Split-Level Interfaces to link the interpreted domain to the
compiled domain. The environment applies the notion of
split-level programming to relieve system engineers of software
engineering concerns and to let them focus on system
architecture. The script language is a Component Integration
Language because it implements a component model
with introspection and loose typing capabilities. Component
wrappers use split-level interfaces that implement the
composition rules, dynamic type determination and type
inference algorithms. An interface description language
compiler automatically generates the split-level interfaces.
The contribution of this work is twofold: an active
code generation technique, and a three-layer environment
that keeps the C++ components intact for reuse. We
present an overview of the environment, demonstrate our
approach by building three simulation models for an adaptive
memory controller, and comment on code generation
ratios.
-
Functional Verification for SystemC Descriptions Using Constraint Solving [p. 744]
-
F. Ferrandi, M. Rendine, and D. Sciuto
This paper addresses the problem of test vector generation
starting from a high-level description of the system
under test, specified in SystemC. The verification method
considered is based upon the simulation of input sequences.
The system model adopted is the classical Finite State Machine
model. Then, according to different strategies, a set
of sequences can be obtained, where a sequence is an ordered
set of transitions. For each of these sequences, a set
of constraints is extracted. Test sequences can be obtained
by generating and solving the constraints, by using a constraint
solver (GProlog). A solution of the constraint solver
yields the values of the input signals for which a sequence
of transitions in the FSM is executed. If the constraints cannot
be solved, it implies that the corresponding sequence
cannot be executed by any test. The presented algorithm is
not based on a specific fault model, but aims at reaching the
highest possible path coverage.
-
The Modelling of Embedded Systems Using HASoC [p. 752]
-
M. Edwards and P. Green
We present a design method (HASoC) for the lifecycle
modelling of embedded systems that are targeted
primarily, but not necessarily, at SoC implementations.
The object-oriented development technique is based on
our experiences of using an existing modelling technique
(MOOSE) and supports a lifecycle that explicitly
separates the behaviour of a system from its hardware
and software implementation technologies. The design
process, which uses a UML-RT-based notation, begins
with the incremental development and validation of an
executable model of a system. This model is then
partitioned into hardware and software to create a
committed model, which is mapped onto a system
platform. The methodology emphasises the reuse of preexisting
hardware and software platforms to ease the
development process. An example application is
presented in order to illustrate the main concepts in
HASoC.
-
A Functional Specification Notation for Co-Design of Mixed Analog-Digital Systems [p. 760]
-
A. Doboli and R. Vemuri
This paper discusses aBlox -- a specification notation for
high-level synthesis of mixed-signal systems. aBlox addresses
three important aspects of mixed-signal system specification:
(1) description of functionality, (2) performance
issues, and (3) expression of analog-digital interactions.
The semantics of aBlox embeds concepts and rules
of a functional computational model, and uses a declarative
style to denote performance elements. The paper
shows some mixed-signal specifications that we developed
in aBlox. Finally, we describe a high-level analog synthesis
experiment that used aBlox specifications as inputs.
Moderator/Organizer: L. Lavagno, Politecnico di Torino, IT
-
The Real-Time UML Standard: Definition and Application [p. 770]
-
B. Selic
This very short paper describes the objectives,
content, and usage of a real-time UML profile that has
been standardized by the Object Management Group.
This profile defines a common framework for describing
the quantitative aspects of software systems. In addition,
it provides specific facilities for analysing real-time
systems for schedulability or performance.
-
UML for Embedded Systems Specification and Design: Motivation and Overview [p. 773]
-
G. Martin
The specification, design and implementation of
embedded systems demands new approaches which go
beyond traditional hardware-based notations such as
HDLs. The growing dominance of software in embedded
systems design requires a careful look at the latest
methods for software specification and analysis. The
development of the Unified Modeling Language (UML)
and a number of extension proposals in the real-time
domain hold promise for the development of new design
flows which move beyond static and traditional partitions
of hardware and software. However, UML as currently
defined lacks several key capabilities. In this paper, we
will survey the requirements for system-level design of
embedded systems, and give an overview of the extensions
required to UML that will be dealt with in more detail in
the related papers. In particular, we will discuss how the
notions of platform-based design intersect with a UML
based development approach.
-
A UML-Based Design Methodology for Real-Time and Embedded Systems [p. 776]
-
G. de Jong
The fast growing complexity of today's real time
embedded systems necessitates new design methods and
tools to face the problems of design, analysis,
integration and validation of complex systems. We
present a system level design method for embedded real-time
systems combining the informal strengths of UML
with the formal strengths of SDL. We demonstrate our
flow by the design example of a telecommunications
application from the wireless or access domain, showing
the applicability of the flow to control- and data-dominated
types of systems. Finally we will show how
the application results and other end-user needs and
requirements influenced the current UML 2.0 proposal
with support for real-time and embedded systems.
Moderators: Z. Peng, Linköping U, SE; J. Sifakis, VERIMAG, FR
-
Minimum Energy Fixed-Priority Scheduling for Variable Voltage Processor [p. 782]
-
G. Quan and X. Hu
To fully exploit the benefit of variable voltage processors,
voltage schedules must be designed in the context of workload
requirements. In this paper, we present an approach to
finding the least-energy voltage schedule for executing real-time
jobs on such a processor according to a fixed-priority,
preemptive policy. The significance of our approach is that
the theoretical limit in terms of energy saving for such systems
is established, which can thus serve as the standard to
evaluate the performance of various heuristic approaches.
Two algorithms for deriving the optimal voltage schedule
are provided. The first one explores fundamental properties
of voltage schedules while the second one builds on the first
one to further reduce the computational cost. Experimental
results are shown to compare the results of this paper with
previous ones.
-
A Dynamic Voltage Scaling Algorithm for Dynamic-Priority Hard Real-Time Systems
Using Slack Time Analysis [p. 788]
-
W. Kim, J. Kim, and S. Min
Dynamic voltage scaling (DVS), which adjusts the clock
speed and supply voltage dynamically, is an effective technique
for reducing the energy consumption of embedded real-time
systems. The energy efficiency of a DVS algorithm largely
depends on the performance of the slack estimation method
used in it. In this paper, we propose a novel DVS algorithm
for periodic hard real-time tasks based on an improved slack
estimation algorithm. Unlike the existing techniques, the proposed
method takes full advantage of the periodic characteristics
of the real-time tasks under priority-driven scheduling
such as EDF. Experimental results show that the proposed algorithm
reduces the energy consumption by 20-40% over the
existing DVS algorithm. The experimental results also show that
our algorithm based on the improved slack estimation method
gives comparable energy savings to the DVS algorithm based
on the theoretically optimal (but impractical) slack estimation
method.
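
As background for the slack-based scaling discussed above, the Python sketch below computes the static baseline that such algorithms improve on: the lowest constant speed at which a periodic task set remains schedulable under EDF. It relies on the standard result that EDF meets all deadlines when total utilization does not exceed 1 (deadlines equal to periods). The energy model, in which voltage scales linearly with speed and energy per cycle scales with voltage squared, is a simplifying assumption, and the names are hypothetical. This is not the algorithm proposed in the paper.

```python
def min_edf_speed(tasks):
    """Lowest constant normalized speed (0 < s <= 1) that keeps a periodic
    task set schedulable under EDF, assuming deadlines equal periods.

    tasks: list of (wcet_at_full_speed, period) pairs.
    Running at speed s stretches each WCET to wcet / s, so the condition
    sum(wcet / (s * period)) <= 1 gives s >= total utilization.
    """
    utilization = sum(wcet / period for wcet, period in tasks)
    if utilization > 1.0:
        raise ValueError("task set is not schedulable even at full speed")
    return utilization


def relative_energy(speed):
    """Energy relative to full speed for a fixed amount of work.

    Assumption: supply voltage scales linearly with speed and energy per
    cycle scales with voltage squared, so the ratio is speed ** 2.
    """
    return speed ** 2


if __name__ == "__main__":
    tasks = [(1.0, 4.0), (1.0, 2.0)]   # hypothetical (WCET, period) pairs
    s = min_edf_speed(tasks)
    print("speed:", s, "relative energy:", relative_energy(s))
```

Dynamic schemes go further than this baseline by reclaiming the slack that appears at run time when jobs finish before their worst-case execution time.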
-
Extending Synchronous Languages for Generating Abstract Real-Time Models [p. 795]
-
G. Logothetis and K. Schneider
We present an extension of synchronous programming languages
that can be used to declare program locations irrelevant
for verification. An efficient algorithm is proposed
to generate from the output of the usual compilation an
abstract real-time model by ignoring the irrelevant states,
while retaining the quantitative information. Our technique
directly generates a single real-time transition system,
thus overcoming the known problem of composing several
real-time models. A major application of this approach
is the verification of real-time properties by symbolic model
checking.
Moderators: J. Phillips, Cadence Berkeley Labs, US; L. Silveira, IST/INESC, PT
-
An Interconnect-Aware Methodology for Analog and Mixed Signal Design, Based on
High Bandwidth (Over 40 GHz) On-Chip Transmission Line Approach [p. 804]
-
D. Goren, M. Zelikson, T. Galambos, R. Gordin, B. Livshitz, A. Amir, A. Sherman, and I. Wagner
This paper presents an on-chip, interconnect-aware
methodology for high-speed analog and mixed signal
(AMS) design which enables early incorporation of on-chip
transmission line (T-line) components into AMS design flow.
The proposed solution is based on a set of parameterized
T-line structures, which include single and two coupled microstrip
lines with optional side shielding, accompanied by
compact true transient models. The models account for frequency
dependent skin and proximity effects, while maintaining
passivity requirements due to their pure RLC nature.
The signal bandwidth supported by the models covers
a range from DC to 100 GHz. The models are currently
verified in terms of S-parameter data against hardware (up
to 40 GHz) and against EM solver (up to 100 GHz). This
methodology has already been used for several designs implemented
in SiGe (Silicon-Germanium) BiCMOS technology.
-
Closed-Form Crosstalk Noise Metrics for Physical Design Applications [p. 812]
-
L. Chen and M. Marek-Sadowska
In this paper we present efficient closed-form formulas to estimate
capacitive coupling-induced crosstalk noise for distributed
RC coupling trees. The efficiency of our approach stems
from the fact that only the five basic operations are used in the
expressions: addition (x + y), subtraction (x − y), multiplication
(x × y), division (x/y) and square root (√x). The formulas
do not require exponent computation or numerical
iterations. We have developed closed-form expressions for the
peak crosstalk noise amplitude, the peak noise occurring time
and the width of the noise waveform. Our approximations are
conservative and yet achieve acceptable accuracy. The formulas
are simple enough to be used in the inner loops of performance
optimization algorithms or as cost functions to guide routers.
They capture the influence of coupling direction (near-end and
far-end coupling) and coupling location (near-driver and
near-receiver).
-
Formulation of Low-Order Dominant Poles for Y-Matrix of Interconnects [p. 820]
-
Q. Xu and P. Mazumder
This paper presents an efficient approach to compute the dominant
poles for the reduced-order admittance (Y parameter)
matrix of lossy interconnects. Using the global approximation
technique, efficient frameworks are constructed to
transform the frequency-domain Telegrapher's equations into
compact linear algebraic equations. The dominant poles and
residues can be extracted by directly solving the linear equations.
Closed-form formulas are derived to compute the
low-order dominant poles. Due to the high accuracy of the global
approximation, the extracted poles can accurately represent the
exact admittance matrices in a wide frequency range. By using
the recursive convolution technique, the pole-residue models
can be represented by companion models, which have linear
complexity with respect to the computational time. The
presented modeling approaches are shown to preserve passivity.
Numerical experiments of transient simulation show that
the presented modeling approaches lead to higher efficiency,
while maintaining comparable accuracy.
-
Library Compatible Ceff for Gate-Level Timing [p. 826]
-
B. Sheehan
Accurate gate-level static timing analysis in the presence
of RC loads has become an important problem for
modern deep-submicron designs. Non-capacitive loads
are usually analyzed using the concept of an effective
capacitance, Ceff. Most published algorithms for Ceff,
however, require special cell characterization or
supplemental information that is not part of standard
timing libraries. In this paper we present a novel Ceff
algorithm that is strictly compatible with existing timing
libraries. It is also fast, easily implemented, and quite
accurate--within 3% of transistor-level simulation in our
tests. The method is based on approximating a gate by a
current source, estimating the delay difference when the
gate drives the actual RC load and a reference
capacitor, and then converting the delay discrepancy
into a Ceff value. Central to carrying out this program is
the innovative concept of a delay correction transfer
function.
Moderators: L. Bouzaida, STMicroelectronics, FR; A. Singh, Auburn U, US
-
Self-Checking Scheme for the On-Line Testing of Power Supply Noise [p. 832]
-
C. Metra, L. Schiano, B. Riccò, and M. Favalli
We propose a self-checking scheme for the on-line testing
of power supply noise exceeding a tolerance bound
chosen according to the system's constraints. Upon the occurrence
of such noise, our scheme provides an output error
message, which can be exploited for diagnosis purposes
or to recover from the detected noise (thus guaranteeing the
system's correct operation). To the best of our knowledge, no
on-line testing scheme for power supply noise has been proposed
up to now. Our scheme has a negligible impact on the system's
performance, features self-checking ability with respect to
a wide set of possible internal faults, and keeps revealing
on-line the occurrence of power supply noise, despite the
possible presence of noise also affecting ground.
-
Automatic Modifications of High Level VHDL Descriptions for Fault Detection or Tolerance [p. 837]
-
R. Leveugle
The need for integrated mechanisms providing on-line
error detection or fault tolerance is becoming a major
concern due to the increasing sensitivity of the circuits to
their environment. This paper reports on a tool
automating the implementation of such mechanisms by
modifying high-level VHDL descriptions. The
modifications are compatible with industrial design flows
based on commercial synthesis and simulation tools. The
results demonstrate the feasibility and the efficiency of the
approach.
-
Exploiting Idle Cycles for Algorithm Level Re-Computing [p. 842]
-
K. Wu and R. Karri
Although algorithm level re-computing techniques can
trade off the detection capability of Concurrent Error
Detection (CED) against time overhead, they incur 100%
time overhead when the strongest CED capability is
achieved. Using the idle cycles in the data path to do the
re-computation can reduce this time overhead. However,
dependencies between operations prevent the re-computation
from fully utilizing the idle cycles.
Deliberately breaking some of these data dependencies
can further reduce the time overhead associated with
algorithm level re-computing.
-
New Techniques for Speeding-Up Fault-Injection Campaigns [p. 847]
-
L. Berrojo, I. González, F. Corno, M. Sonza Reorda, G. Squillero, L. Entrena, and C. López
Fault-tolerant circuits are currently required in several
major application sectors, and a new generation of CAD
tools is required to automate the insertion and validation
of fault-tolerant mechanisms. This paper outlines the
characteristics of a new fault-injection platform and its
evaluation in a real industrial environment. It also details
techniques devised and implemented within the platform to
speed up fault-injection campaigns. Experimental results
are provided, showing the effects of the different
techniques, and demonstrating that they are able to reduce
the total time required by fault-injection campaigns by at
least one order of magnitude.
Moderators: J. Teich, Paderborn U, DE; W. Kruijtzer, Philips Research, NL
-
System Design for Flexibility [p. 854]
-
C. Haubelt, J. Teich, K. Richter, and R. Ernst
With the term flexibility, we introduce a new design dimension
of an embedded system that quantitatively characterizes
its feasibility in implementing not only one, but possibly
several alternative behaviors. This is important when
designing systems that may adapt their behavior during operation,
e.g., due to new environmental conditions, or when
dimensioning a platform-based system that must implement
a set of different behaviors. A hierarchical graph model is
introduced that allows the flexibility and cost of a system
to be modeled formally. Based on this model, we propose an efficient
exploration algorithm to find the optimal flexibility/cost
tradeoff curve of a system, using the design of a family of
set-top boxes as an example.
-
Accurate Area and Delay Estimators for FPGAs [p. 862]
-
A. Nayak, M. Haldar, A. Choudhary, and P. Banerjee
We present an area and delay estimator in the context of a compiler
that takes in high level signal and image processing applications
described in MATLAB and performs automatic design space
exploration to synthesize hardware for a Field Programmable Gate
Array (FPGA) which meets the user area and frequency specifications.
We present an area estimator which is used to estimate
the maximum number of Configurable Logic Blocks (CLBs) consumed
by the hardware synthesized for the Xilinx XC4010 from
the input MATLAB algorithm. We also present a delay estimator
which finds out the delay in the logic elements in the critical
path and the delay in the interconnects. The total number of CLBs
predicted by us is within 16% of the actual CLB consumption and
the synthesized frequency estimated by us is within an error of
13% of the actual frequency after synthesis through Synplify logic
synthesis tools and after placement and routing through the XACT
tools from Xilinx. Since the estimators proposed by us are fast
and accurate enough, they can be used in a high level synthesis
framework like ours to perform rapid design space exploration.
-
A Powerful System Design Methodology Combining OCAPI and Handel-C for Concept Engineering [p. 870]
-
K. Buchenrieder, A. Pyttel, and A. Sedlmeier
In this paper, we present an efficient methodology to
validate high performance algorithms and prototype them
using reconfigurable hardware. We follow a strict top-down
Hardware/Software Codesign paradigm using stepwise
refinement techniques. Starting from a performance
evaluation on the data-flow level using the OCAPI system,
we partition the simulated high-level data-flow description
into hardware and software modules. The hardware parts,
described in Handel-C, are compiled and mapped to Xilinx
Virtex 2000E FPGAs, and the software is executed on a PC
processor that hosts the Virtex boards. Hardware/software
interfacing and communication between processor and
FPGA is established via the PCI bus by shared memory
DMA transfers.
This paper presents the methodology and illustrates
the method with an example of a channel coder.
-
Automated Concurrency Re-Assignment in High Level System Models for
Efficient System-Level Simulation [p. 875]
-
N. Savoiu, S. Shukla, and R. Gupta
Simple and powerful modeling of concurrency and reactivity
along with their efficient implementation in the simulation
kernel are crucial to the overall usefulness of system
level models using the C++-based modeling frameworks.
However, the concurrency alignment in most modeling
frameworks is naturally expressed along hardware
units, being supported by the various language constructs,
and the system designers express concurrency in their system
models by providing threads for some modules/units
of the model. Our experimental analysis shows that this
concurrency model leads to inefficient simulation performance,
and a concurrency alignment along dataflow gives
much better simulation performance, but changes the conceptual
model of hardware structures. As a result, we propose
an algorithmic transformation of designs written in
these C++-based environments with concurrency alignment
along units/modules. This transformation, provided as a
compiler front-end, will re-assign the concurrency along
the dataflow, as opposed to threading along concurrent
hardware/software modules, keeping the functionality of the
model unchanged. Such a front-end transformation strategy
will relieve hardware system designers from concerns about
software engineering issues such as threading architecture
and simulation performance, while allowing them to design
in the most natural manner. Simulation performance
can be improved by up to almost a factor of two, as shown in
our experiments.
Moderators/Organizers: I. Rugen-Herzig, Infineon Technologies, DE; R. Sommer, Infineon Technologies, DE
-
From System Specification To Layout: Seamless Top-Down Design Methods for Analog and
Mixed-Signal Applications [p. 884]
-
R. Sommer, I. Rugen-Herzig, E. Hennig, U. Gatti, P. Malcovati, F. Maloberti,
K. Einwich, C. Clauss, P. Schwarz, and G. Noessing
Design automation for analog/mixed-signal (A/MS) circuits
and systems is still lagging behind compared to what
has been reached in the digital area. As System-on-Chip
(SoC) designs include analog components in most cases,
these analog parts become even more a bottleneck in the
overall design process. The paper is dedicated to the latest
R&D activities within the MEDEA+ project ANASTASIA+.
The main focus is the development of seamless top-down
design methods for integrated analog and mixed-signal systems
and the achievement of a high level of automation and reuse in
the A/MS design process. These efforts are motivated by the
urgent need to close the current gap in the industrial design
flow between system specification and design on the one
hand and block-level circuit design on the other hand. The
paper focuses on three subtopics, starting with the top-down
design flow with applications from circuit sizing, design
centering, and automated behavioral modeling. The
next part focuses on modeling and simulation of specific
functionalities in sigma-delta design while the last section
is dedicated to a mixed-signal System-on-Chip design environment.
Moderators: P. Eles, Linköping U, SE; B. Mesman, Philips/TU Eindhoven, NL
-
Memory System Connectivity Exploration [p. 894]
-
P. Grun, N. Dutt, and A. Nicolau
In programmable embedded systems, the memory subsystem
represents a major cost, performance and power bottleneck.
To optimize the system for such different goals, the
designer would like to perform Design Space Exploration,
evaluating different memory modules from a memory IP library,
and selecting the most promising designs. However,
while the memory modules are important, the rate at which
the memory system can produce the data for the CPU is significantly
impacted by the connectivity architecture between
the memory subsystem and the CPU. Thus, it is critical to
consider the connectivity architecture early in the design flow,
in conjunction with the memory architecture. We present a
connectivity architecture exploration approach, evaluating a
wide range of cost, performance, and energy connectivity architectures.
When coupled with our memory modules exploration
approach, we can significantly improve the system behavior.
We present experiments on a set of large real-life
benchmarks, showing significant performance improvements
for varied cost and power characteristics, allowing the designer
to tailor the performance, cost and power of the programmable
embedded system.
-
Performance-Area Trade-Off of Address Generators for Address Decoder-Decoupled Memory [p. 902]
-
S. Hettiaratchi, P. Cheung, and T. Clarke
Multimedia applications are characterized by a large
number of data accesses and complex array index manipulations.
The built-in address decoder in the RAM memory
model commonly used by most memory synthesis tools unnecessarily
restricts the freedom of address generator synthesis.
Therefore, a memory model in which the address decoder
is decoupled from the memory cell array is proposed.
In order to demonstrate the benefits and limitations of this
alternative memory model, synthesis results for a Shift Register
based Address Generator that does not require address
decoding are compared to those for a counter-based
address generator that requires address decoding. Results
show that delay can be nearly halved at the expense of increased
area.
-
Multiple-Precision Circuits Allocation Independent of Data-Objects Length [p. 909]
-
M. Molina, J. Mendias, and R. Hermida
This paper presents a heuristic method to solve the
combined resource selection and binding problems for the
high-level synthesis of multiple-precision specifications.
Traditionally, the number of functional (and storage)
units in a datapath is determined by the maximum number
of operations scheduled in the same cycle, with their
respective widths depending on the number of bits of the
wider operations. When these wider operations are not
scheduled in such a "busy" cycle, this approach can
waste considerable area.
To overcome this problem, we propose the selection of
the set of resources taking into account the only truly
relevant aspect: the maximum number of bits calculated
and stored simultaneously in a cycle. The implementation
obtained is a multiple-precision datapath, where the
number and widths of the resources are independent of
the specification operations and data objects.
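The sizing argument in this abstract can be illustrated with a small sketch. The schedule below is hypothetical, and the "conventional" estimate (number of units times the widest operation) is a simplifying assumption made for illustration; this is not the paper's heuristic, only the arithmetic behind its motivation.

```python
# Hypothetical schedule: each cycle lists the bit widths of the
# additions scheduled in that cycle (values are illustrative only).
schedule = [
    [16, 4, 4],  # cycle 0: the "busy" cycle, three fairly narrow operations
    [32],        # cycle 1: a single wide operation
    [8, 8],      # cycle 2
]

def conventional_adder_bits(schedule):
    """Units = max operations in any cycle; each unit as wide as the widest op.

    Simplifying assumption: every unit is sized for the widest operation.
    """
    units = max(len(ops) for ops in schedule)
    width = max(w for ops in schedule for w in ops)
    return units * width

def bit_level_adder_bits(schedule):
    """Size by the maximum number of bits computed simultaneously in a cycle."""
    return max(sum(ops) for ops in schedule)

print(conventional_adder_bits(schedule))  # 3 units x 32 bits
print(bit_level_adder_bits(schedule))     # widest single-cycle bit total
```

In this toy case the wide 32-bit operation does not fall in the busy cycle, so sizing three units for 32 bits each provisions far more adder bits than any single cycle ever uses.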
Moderators: P. Feldmann, Celight Inc, US; G. Vandersteen, IMEC, BE
-
Efficient Model Reduction of Linear Time-Varying Systems via Compressed Transient System Function [p. 916]
-
E. Gad and M. Nakhla
This paper presents a new approach for model-order reduction
of linear time-varying systems based on expanding
the time-varying system in the right half plane of the s-domain.
The proposed algorithm is developed by introducing Krylov
subspace-based reduction to time-varying transfer functions. The
proposed algorithm does not require the solution of large systems of
equations to construct a basis for the time-varying moments. Instead,
it computes such a basis through time-domain integration of the
corresponding linear time-varying differential algebraic equations.
Numerical experiments show that expanding in the right-half plane
compresses the transient phase of the response of these equations
by several orders of magnitude.
-
Passive Constrained Rational Approximation Algorithm Using Nevanlinna-Pick Interpolation [p. 923]
-
C. Coelho, L. Silveira, and J. Phillips
As system integration evolves and tighter design constraints
must be met, it becomes necessary to account for
the non-ideal behavior of all the elements in a system. For
high-speed digital and microwave systems, it is increasingly
important to model previously neglected frequency domain
effects.
In this paper, results from Nevanlinna-Pick interpolation
theory are used to develop a bounded real matrix rational
approximation algorithm. A method is presented that allows
for the generation of guaranteed passive rational function
models of passive systems by approximating their scattering
parameter matrices. Since the order of the models may in
some cases be high, an incremental fitting strategy is also
proposed that allows for the generation of smaller models
while still meeting the passivity and accuracy
requirements. Results of the application of the proposed
method to several real-world examples are also shown.
-
Model Reduction in the Time-Domain Using Laguerre Polynomials and Krylov Methods [p. 931]
-
Y. Chen, V. Balakrishnan, C. Koh, and K. Roy
We present a new passive model reduction algorithm based
on the Laguerre expansion of the time response of interconnect
networks. We derive expressions for the Laguerre
coefficient matrices that minimize a weighted square of the
approximation error, and show how these matrices can be
computed efficiently using Krylov subspace methods. We
discuss the connections between our method and other methods
such as PRIMA [4]. Numerical simulations show that
our method can better approximate the original model as
compared to PRIMA.
Moderators: H. Obermeir, Infineon Technologies, DE; M. Sonza Reorda, Politecnico di Torino, IT
-
An Optimal Algorithm for the Automatic Generation of March Tests [p. 938]
-
A. Benso, S. Di Carlo, G. Di Natale, and P. Prinetto
This paper presents an innovative algorithm for the
automatic generation of March Tests. The proposed
approach is able to generate an optimal March Test for
an unconstrained set of memory faults in very low
computation time.
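For readers unfamiliar with the notation, the sketch below runs one well-known March Test, March C-, against a toy memory. It is not the generation algorithm from the paper and is not its output; the `Memory` class, its single stuck-at fault model, and the encoding of march elements as tuples are all assumptions made for illustration.

```python
# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
# "r0"/"r1" read and expect 0/1; "w0"/"w1" write 0/1.
# Elements whose address order is arbitrary are run in "up" order here.
MARCH_C_MINUS = [
    ("up",   ["w0"]),
    ("up",   ["r0", "w1"]),
    ("up",   ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up",   ["r0"]),
]

class Memory:
    """Toy bit-oriented memory with an optional stuck-at fault (hypothetical)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at  # (address, value) or None

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        # A stuck-at cell always returns its stuck value, whatever was written.
        if self.stuck_at is not None and self.stuck_at[0] == addr:
            return self.stuck_at[1]
        return self.cells[addr]

def run_march(test, memory):
    """Apply the march elements in order; return True if all reads match."""
    size = len(memory.cells)
    for order, ops in test:
        addresses = range(size) if order == "up" else range(size - 1, -1, -1)
        for addr in addresses:
            for op in ops:
                value = int(op[1])
                if op[0] == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:
                    return False
    return True

print(run_march(MARCH_C_MINUS, Memory(8)))                   # fault-free memory
print(run_march(MARCH_C_MINUS, Memory(8, stuck_at=(3, 1))))  # cell 3 stuck at 1
```

The test length is the total number of read and write operations per cell, which is 10 for March C-; shorter tests covering the same fault list are what an optimal generator would look for.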
-
Minimal Test for Coupling Faults in Word-Oriented Memories [p. 944]
-
A. van de Goor, M. Abadir, and A. Carlin
Most industrial memories have an external word-width of more than one bit.
However, most published memory test algorithms assume 1-bit memories;
they will not detect coupling faults between the cells of a word. This
paper improves upon the state of the art in testing word-oriented memories
by presenting a new method for detecting state coupling faults between
cells of the same word, based on the use of m-out-of-n codes. The result
is a reduction in test time, which varies between 20% and 30%.
Keywords: State coupling faults, word-oriented memories, tests, data
backgrounds, m-out-of-n codes.
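As background on the codes named above, an m-out-of-n code consists of all n-bit words containing exactly m ones. The sketch below enumerates such words and checks one property that plausibly makes them useful as data backgrounds: whether every pair of bit positions takes all four states. How the paper actually builds its tests from these codes is not stated in the abstract, so the pairwise check and both function names are assumptions for illustration.

```python
from itertools import combinations

def m_out_of_n_codewords(m, n):
    """Return all n-bit words with exactly m ones, as strings."""
    words = []
    for ones in combinations(range(n), m):
        bits = ["0"] * n
        for pos in ones:
            bits[pos] = "1"
        words.append("".join(bits))
    return words

def every_pair_sees_all_states(words):
    """True if each pair of bit positions takes all of 00, 01, 10 and 11."""
    n = len(words[0])
    for i, j in combinations(range(n), 2):
        seen = {w[i] + w[j] for w in words}
        if seen != {"00", "01", "10", "11"}:
            return False
    return True

backgrounds = m_out_of_n_codewords(2, 4)
print(backgrounds)
print(every_pair_sees_all_states(backgrounds))
```

A 1-out-of-4 code, by contrast, never places two ones in the same word, so no pair of cells ever reaches the 11 state.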
-
Maximizing Impossibilities for Untestable Fault Identification [p. 949]
-
M. Hsiao
This paper presents a new fault-independent method for maximizing local
conflicting value assignments for the purpose of untestable fault
identification. The technique first computes a large number of logic
implications across multiple time-frames and stores them in an implication
graph. Then, by maximizing conflicting scenarios in the circuit, the
algorithm identifies a large number of untestable faults that require such
impossibilities. The proposed approach identifies impossible combinations
locally around each Boolean gate in the circuit, and its complexity is thus
linear in the number of nodes, resulting in short execution times. Experimental
results for both combinational and sequential benchmark circuits showed that
many more untestable faults can be identified with this approach efficiently.
-
Automated Modeling of Custom Digital Circuits for Test [p. 954]
-
S. Bose
Models meant for logic verification and simulation are
often used for ATPG. For custom digital circuits, these models
contain many tristate devices, which leads to lower fault
coverage. Unlike other research in the literature, the modeling
algorithms presented in this paper analyze each channel-connected
component in the context of its environment, thereby
capturing the relationship among its input signals. This reduces
the number of tristates and increases the modeling efficiency, as
measured by fault coverage. Experimental results demonstrate the
superiority of this approach.
Moderators: H. Hsieh, UC Riverside, US; R. Lauwereins, IMEC, BE
-
False Path Elimination in Quasi-Static Scheduling [p. 964]
-
G. Arrigoni, L. Duchini, C. Passerone, L. Lavagno, and Y. Watanabe
We have developed a technique to compute a Quasi Static
Schedule of a concurrent specification for the software partition
of an embedded system. Previous work did not take
into account correlations among run-time values of variables,
and therefore tried to find a schedule for all possible
outcomes of conditional expressions. This is advantageous
on one hand, because by abstracting data values one can
find schedules in many cases for an originally undecidable
problem. On the other hand it may lead to exploring false
paths, i.e., paths that can never happen at run-time due to
constraints on how the variables are updated. This affects
the applicability of the approach, because it leads to an explosion
in the running time and the memory requirements of
the compile-time scheduler itself. Even worse, it also leads
to an increase in the final code size of the generated software.
In this paper, we propose a semi-automatic algorithm to
solve the problem of false paths: the designer identifies and
tags critical expressions, and synchronization channels are
automatically added to the specification to drive the search
of a schedule. As a proof of concept, the proposed technique
has been applied to a subsystem of an MPEG-2 decoder, and
allowed us to find a schedule that previous techniques could
not identify.
-
A Data Analysis Method for Software Performance Prediction [p. 971]
-
G. Bontempi and W. Kruijtzer
This paper explores the role of data analysis methods
to support system-level designers in characterising the
performance of embedded applications. In particular, we
address the performance modelling of software applications
running on an embedded microprocessor. We
propose a data analysis method, which, on the basis of a
parameterisation of the software functionality and the
hardware architecture, is able to predict the number of
execution cycles on an embedded processor. Experiments
with standard computational code (sorting, mathematical
computation) and with MPEG variable length decoding
are presented to support this claim.
-
A Code Transformation-Based Methodology for Improving I-Cache Performance of DSP Applications [p. 977]
-
N. Liveris, N. Zervas, D. Soudris, and C. Goutis
This paper focuses on I-cache behaviour
enhancement through the application of high-level
code transformations. Specifically, a flow for the
iterative application of the I-Cache performance
optimizing transformations is proposed. The
procedure of applying transformation is driven by a
set of analytical equations, which receive parameters
related to code and I-cache structure and predict the
number of I-cache misses. Experimental results from
a real-life demonstration application show that
order-of-magnitude reductions in the number of I-cache
misses can be achieved by applying
the proposed methodology.
-
A Compiler-Based Approach for Improving Intra-Iteration Data Reuse [p. 984]
-
M. Kandemir
Intra-iteration data reuse occurs when multiple array
references exhibit data reuse in a single loop iteration. An
optimizing compiler can exploit this reuse by clustering (in
the loop body) array references with data reuse as much
as possible. This reduces the number of intervening references
between references to the same array and improves
overall execution time and energy consumption. In this paper,
we present a strategy where inter-statement and intra-statement
optimizations are used in concert for optimizing
intra-iteration data reuse. The objective is to cluster (within
the loop body) the array references with spatial or temporal
reuse. Using four array-intensive applications from the image
processing domain, we show that our approach improves
the cache behavior of programs by 13.8% on average.
Moderator: A. Jerraya, TIMA, Grenoble, FR
-
European CAD from the 60's to the New Millennium [p. 992]
-
Joseph Borel, J.B.-R&D Consulting, FR
CAD has always been poorly understood by company CEOs because it obeys rules (if any) very different
from the process. A rich variety of CAD and TCAD solutions have been developed in Europe in the early days of
the CAD industry. These solutions have come to introduce real innovations in the field, but because they were
mostly internal to the companies they have never reached the proper engineering level that would have enabled their
introduction in the market. A review of the history of CAD activity in Europe will be presented in this Plenary Session,
together with some prospects on how it could evolve in the coming years and change from its lackluster industrial
visibility.
Organizer/Moderator: I. Bolsens, Xilinx, US
Speakers: D. Verkest, IMEC, BE; S. Guccione, Xilinx, US; S. Singh, Xilinx, US
-
Design Technology for Networked Reconfigurable FPGA Platforms [p. 994]
-
S. Guccione, D. Verkest, and I. Bolsens
Future networked appliances should be able to
download new services or upgrades from the network
and execute them locally. This flexibility is typically
achieved by processors that can download new software
over the network, using JAVA technology. This paper
demonstrates that FPGAs are a realistic implementation
platform for thin server or client applications. FPGAs
can offer the same end-user experience as software
based systems, combined with more computational
power and lower cost.
Moderators: N. Dutt, UC Irvine, US; M. Renaudin, TIMA, Grenoble, FR
-
High-Speed Non-Linear Asynchronous Pipelines [p. 1000]
-
R. Ozdag, P. Beerel, M. Singh, and S. Nowick
Many approaches recently proposed for high-speed asynchronous
pipelines are applicable only to linear datapaths. However,
real systems typically have non-linearities in their datapaths,
i.e. stages may have multiple inputs ("joins") or multiple outputs
("forks"). This paper presents several new pipeline templates that
extend existing high-speed approaches for linear dynamic logic
pipelines, by providing efficient control structures that can
accommodate forks and joins. In addition, constructs for conditional
computation are also introduced. Timing analysis and SPICE simulations
show that the performance overhead of these extensions is fairly low (5% to 20%).
-
Single-Track Asynchronous Pipeline Templates Using 1-of-N Encoding [p. 1008]
-
M. Ferretti and P. Beerel
This paper presents a new fast and templatized
family of fine-grain asynchronous pipeline stages based
on the single-track protocol. No explicit control wires
are required outside of the datapath and the data is 1-
of-N encoded. With a forward latency of 2 transitions
and a cycle time of 6 for most configurations, the new
family can run at 1.6 GHz using MOSIS TSMC 0.25 µm
process. This is significantly faster than all known
quasi-delay-insensitive templates and has less timing
assumptions than the recently proposed ultra-highspeed
GasP bundled-data circuits.
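The figures quoted in this abstract imply a rough per-transition delay. The sketch below works through that arithmetic under the assumption, not stated in the abstract, that all transitions in a cycle take equal time; the function name is hypothetical.

```python
def per_transition_delay_ps(freq_ghz, transitions_per_cycle):
    """Average delay per transition, assuming equal-delay transitions."""
    cycle_ps = 1000.0 / freq_ghz  # one period in picoseconds
    return cycle_ps / transitions_per_cycle

# Figures from the abstract: 1.6 GHz, cycle time of 6 transitions,
# forward latency of 2 transitions.
delay_ps = per_transition_delay_ps(1.6, 6)
forward_latency_ps = 2 * delay_ps
print(round(delay_ps, 1), round(forward_latency_ps, 1))
```

Under that assumption a 625 ps cycle corresponds to a little over 100 ps per transition, and a forward latency of roughly 208 ps per stage.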
-
Power-Manageable Scheduling Technique for Control Dominated High-Level Synthesis [p. 1016]
-
C. Chen and M. Sarrafzadeh
Optimizing power consumption at a high level is a critical
step towards power-efficient digital system designs. This
paper addresses the power management problem by
scheduling a given control-dominated data flow graph. We
discuss delay and power issues with scheduling, and
propose an improvement algorithm for insertion of so-called
soft edges which enable power optimization under
timing constraints. Power savings obtained by our approach
on tested circuits range between 15% and 30% of the initial
power dissipation.
-
Practical Instruction Set Design and Compiler Retargetability Using Static Resource Models [p. 1021]
-
Q. Zhao, B. Mesman, and T. Basten
The design of application (-domain) specific instruction-set
processors (ASIPs), optimized for code size, has traditionally
been accompanied by the necessity to program assembly,
at least for the performance critical parts of the application.
The highly encoded instruction sets simply lack
the orthogonal structure present in e.g. VLIW processors,
that allows efficient compilation. This lack of efficient compilation
tools has also severely hampered the design space
exploration of code-size efficient instruction sets, and correspondingly,
their tuning to the application domain. In [13]
a practical method is demonstrated to model a broad class
of highly encoded instruction sets in terms of virtual resources
easily interpreted by classic resource constrained
schedulers (such as the popular list-scheduling algorithm),
thereby allowing efficient compilation with well understood
compilation tools. In this paper we will demonstrate the
suitability of this model to also enable instruction set design
(-space exploration) with a simple, well-understood
and proven method long used in the High-Level Synthesis
(HLS) of ASICs. A small case study proves the practical
applicability of the method.
Moderators: E. Sicard, INSA, FR; G. Vandenbosch, KU Leuven, BE
-
Hierarchical Simulation of Substrate Coupling in Mixed-Signal ICs Considering the
Power Supply Network [p. 1028]
-
T. Brandtner and R. Weigel
This paper presents a novel substrate coupling simulation
tool that is well suited to floorplanning of large mixed signal
IC designs. The IC layout may consist of several
subcircuits, hence a hierarchical design flow, which is usually
used for IC circuit design and layout, is supported.
Coupling data modelling the substrate inside subcircuits
are precalculated and subsequently used during floorplanning
leading to shorter simulation time. In addition, the
impedance model of the power grid is considered as well
making it possible to provide estimation results of substrate
coupling quickly after only one simulation step. The approach
is verified by experimental results in 0.13µm CMOS
and 0.25µm BiCMOS technologies.
-
Fast Method to Include Parasitic Coupling in Circuit Simulations [p. 1033]
-
B. Van Thielen and G. Vandenbosch
S-parameter based circuit simulators are widely used for the
design of microwave circuits. The accuracy of these
simulators is limited by the fact that they do not take into
account the electromagnetic coupling between the components and
transmission lines that compose a circuit. In
this article we present a technique that enables us to take
this coupling into account without a large increase in
calculation time.
-
Accurate Estimating Simultaneous Switching Noises by Using Application Specific Device Modeling [p. 1038]
-
L. Ding and P. Mazumder
In this paper, we study the simultaneous switching noise
problem by using an application-specific modeling method.
A simple yet accurate MOSFET model is proposed in order
to derive closed-form formulas for simultaneous switching
noise voltage waveforms. We first derive a simple formula
assuming that the inductances are the only parasitics.
Through HSPICE simulation, we show that the new formula
is more accurate than previous results based on the same
assumption. We then study the effect of the parasitic capacitances
of ground bonding wires and pads. We show
that the maximum simultaneous switching noise should be
calculated using four different formulas depending on the
value of the parasitic capacitances and the slope of the input
signal. The proposed formulas, modeling both parasitic
inductances and capacitances, are within 3% of HSPICE
simulation results.
-
Macromodeling of Digital I/O Ports for System EMC Assessment [p. 1044]
-
I. Stievano, F. Canavero, I. Maio, Z. Chen, D. Becker, and G. Katopis
This paper addresses the development of accurate and efficient
behavioral models of digital integrated circuit input
and output ports for EMC and signal integrity simulations.
A practical modeling process is proposed and applied to
some example devices. The modeling process is simple and
efficient, and it yields models performing at a very high accuracy
level.
Organizer: I. Moussa, TNI-Valiosys, FR
Moderator: R. Pacalet, ENST Paris, FR
Panellists: J. Blasquez, Texas Instruments, Villeneuve-Loubet, FR; M. van Hulst, Philips, Eindhoven, NL;
A. Fedeli, STMicroelectronics, Agrate, IT; J. Lambert, TNI-Valiosys, FR; D. Borrione, TIMA-UJF, FR;
C. Hanoch, Verisity, FR; P. Bricaud, Mentor Graphics, FR
-
Formal Verification Techniques: Industrial Status and Perspectives [p. 1050]
-
Research in applied formal verification has become a hot topic in circuit and system design due
to rising circuit complexity. Design verification presents the biggest bottleneck in digital hardware
design. Major hardware bugs found in ASIC design may cause expensive project delays when they are
discovered during system test on the real silicon chip. The consequences are severe, from cost overruns to
lost market opportunity. Simulation and emulation tools, which are traditionally used to find bugs in a
design, often cannot find the corner cases or hard-to-find bugs that may occur only after hundreds of
thousands of cycles, and are well beyond the reach of conventional simulation and emulation
technologies. Formal methods have emerged as an alternative approach to ensure the quality and
correctness of hardware designs, overcoming some of the limitations of traditional validation techniques
such as simulation and testing.
However, the use of formal methods in industry is still quite limited, because many of the
formal methods available today are difficult to use and poorly integrated with one another. In order to provide
insight into the scope and limitations of currently available formal verification techniques, this panel will
address questions such as the following:
ASIC's have been designed for more than twenty years without formal methods. Are formal methods
really necessary? How can the research community convince designers to use formal methods? Is it easy
to integrate them into traditional design flow?
Not all domains seem suitable for formal methods. Is it possible to isolate those application domains that
are best suited for formal methods?
Formal verification requires specially trained people who understand how to apply the mathematical
techniques to verify the design. Is there a re-education requirement for the design community in order to
benefit from these tools?
The panel will also examine the use of formal verification in the design of SOC's. The questions
here are: Can formal methods be very effective for finding errors at high levels of abstraction before a
large design time is invested in implementing a flawed system architecture? Are verification tools ready
for System-on-Chip design verification? Are they mature enough to give IP credibility and robustness?
Moderators: W. Fornaciari, Politecnico di Milano, IT; L. Lavagno, Politecnico di Torino, IT
-
Low Power Embedded Software Optimization Using Symbolic Algebra [p. 1052]
-
A. Peymandoust, T. Simunic, and G. De Micheli
The market demand for portable multimedia
applications has exploded in recent years.
Unfortunately, for such applications current compilers and
software optimization methods often require designers to
do part of the optimization manually. Specifically, the
high-level arithmetic optimizations and the use of complex
instructions are left to the designers' ingenuity. In this
paper, we present a tool flow, SymSoft, that automates the
optimization of power-intensive algorithmic constructs
using symbolic algebra techniques combined with energy
profiling. SymSoft is used to optimize and tune the
algorithmic level description of an MPEG Layer III (MP3)
audio decoder for the SmartBadge [2] portable embedded
system. We show that our tool lowers the number of
instructions and memory accesses and thus lowers the
system power consumption. The optimized MP3 audio
decoder software meets real-time constraints on the
SmartBadge system with low energy consumption.
Furthermore, the performance improves by a factor of 7.27
and the energy consumption decreases by a factor of 4.45
over the original executable specification.
-
An Adaptive Dictionary Encoding Scheme for SOC Data Buses [p. 1059]
-
T. Lv, W. Wolf, J. Henkel, and H. Lekatsas
As bus lengths on multi-hundred-million transistor SOCs (Systems-On-a-Chip)
grow and as inter-wire capacitances of sub-0.10u technologies increase,
the resulting high switching capacitances of buses (and interconnects in
general) have a non-negligible impact on the power consumption of a whole
SOC. In this paper, we address this problem by introducing our bus encoding
technique 'ADES' that minimizes the power consumption of data buses through
a dictionary-based encoding technique. We show that our technique saves
between 18% and 40% of bus energy compared to the non-encoded cases using
a large set of (freely-accessible) real-world applications. Furthermore, we
compare our technique to the best-known data bus encoding techniques to date
and it exceeds all of them in energy savings for the same set of applications.
The additional hardware effort for our bus en/decoder is thereby very small.
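As a rough illustration of why dictionary-based encoding can reduce bus switching, the sketch below counts bit transitions for a raw data stream and for the same stream sent as indices into a small fixed dictionary. It is not the ADES scheme itself: the function names, the static dictionary, and the sample data are all assumptions made for illustration.

```python
def transitions(words):
    """Count bit flips between consecutive words driven on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def encode_with_dictionary(words, dictionary):
    """Replace each word by its index in a fixed dictionary (hypothetical scheme).

    Assumes every word is present in the dictionary; a real encoder would
    need an escape path for misses and a policy for updating entries.
    """
    index = {w: i for i, w in enumerate(dictionary)}
    return [index[w] for w in words]


# Hypothetical data stream that alternates between two frequent values.
stream = [0xFFFF0000, 0x0000FFFF, 0xFFFF0000, 0x0000FFFF]
dictionary = [0xFFFF0000, 0x0000FFFF]

raw_flips = transitions(stream)
encoded_flips = transitions(encode_with_dictionary(stream, dictionary))
```

Switching energy on a bus grows with the number of transitions, so fewer flips on the encoded stream suggest lower bus energy, at the cost of the encoder and decoder hardware.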
-
Power Efficient Embedded Processor IP's through Application-Specific Tag Compression in Data Caches [p. 1065]
-
P. Petrov and A. Orailoglu
In this paper, we present a methodology for power minimization
by data cache tag compression. The set of tags being accessed by
the major application loops is analyzed statically during compile
time and an efficient and optimal compression scheme is proposed.
Only a very limited number of tag bits are stored in the tag array for
cache conflict identification, thus achieving a significant reduction
in the number of active bitlines, sense amps, and comparator cells.
The underlying hardware support for dynamically compressing the
tags consists of a highly cost and power efficient programmable
encoder, which lies outside the cache access path, thus not affecting
the processor cycle time. A detailed VLSI implementation has
been performed and a number of experimental results on a set of
embedded applications and numerical kernels is reported. Energy
dissipation decreases of up to 95% can be observed for the tag arrays,
while significant energy reductions in the range of 10%-50%
are observed when amortized across the overall cache subsystem.
-
Systematic Power-Performance Trade-Off in MPEG-4 by Means of Selective Function
Inlining Steered by Address Optimization Opportunities [p. 1072]
-
M. Palkovic, M. Miranda, and F. Catthoor
The hierarchical structure of real-life data dominated
applications limits the exploration space for high level optimisations.
This limitation is often overcome by function
inlining. However, it increases the basic block code
size, which causes a significant growth of instruction cache
misses and thus performance slow-down. This effect has
been confirmed on experiments with our applications.
We have developed a novel methodology for selective
function inlining steered by cost/gain balance to trade-off
power and performance. Although this results in a speed
up, the increase of the instruction cache misses is still
present, i.e. the memory power consumption is higher. This
implies the possibility of the Pareto-optimal trade-offs between
memory power and performance. Our methodology
is demonstrated on an MPEG-4 video decoder.
-
An Approach to Model Checking for Nonlinear Analog Systems [p. 1080]
-
W. Hartong, L. Hedrich, and E. Barke
We present the first approach to model checking for nonlinear
analog systems. Based on digital CTL model checking
ideas, results in hybrid model checking and special
needs in analog verification, a new model checking tool has
been implemented.
Published model checking tools for hybrid systems require
discrete or partly linear system descriptions. Our focus
is on nonlinear analog behavior, therefore a new approach
is necessary. There are mainly two aspects to be
considered. Firstly, a discrete model retaining the essential
nonlinear analog behavior has to be developed. Secondly,
model checking for analog systems requires extensions of
the language to define analog system properties in a reasonable
way.
-
Speeding up SAT for EDA [p. 1081]
-
S. Pilarski and G. Hu
This paper presents performance results for a new
SAT solver designed specifically for EDA applications.
The new solver significantly outperforms
the most efficient SAT solvers -- Chaff[2], SATO[3],
and GRASP[1] -- on a large set of benchmarks.
Performance improvements for standard benchmark
groups vary from 1.5x to 60x. They were achieved
through a new decision-making strategy and more
efficient boolean constraint propagation (BCP).
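For readers unfamiliar with boolean constraint propagation, the following is a minimal sketch of plain unit propagation over a clause list. It is a textbook version and not the optimized BCP of the solver described above; the function name and the DIMACS-style integer literals are assumptions for illustration.

```python
def unit_propagate(clauses, assignment):
    """Repeatedly assign literals forced by unit clauses.

    clauses: list of lists of non-zero ints (positive = variable true,
             negative = variable false).
    assignment: dict mapping variable -> bool, extended in place.
    Returns False if some clause has all literals false, else True.
    """
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var in assignment:
                    if assignment[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return False  # conflict: every literal is false
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0  # forced assignment
                changed = True
    return True
```

This version rescans every clause after each round of assignments, which is simple but slow; production solvers avoid the rescans with dedicated data structures.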
-
Search-Based SAT Using Zero-Suppressed BDDs [p. 1082]
-
F. Aloul, M. Mneimneh, and K. Sakallah
We introduce a new approach to Boolean satisfiability
(SAT) that combines backtrack search techniques and
zero-suppressed binary decision diagrams (ZBDDs). This
approach implicitly represents SAT instances using
ZBDDs, and performs search using an efficient implementation
of unit propagation on the ZBDD structure. The
adaptation of backtrack search algorithms to such an
implicit representation allows for a potential exponential
increase in the size of problems that can be handled.
-
An Encoding Technique for Low Power CMOS Implementations of Controllers [p. 1083]
-
M. Martínez, M. Avedillo, J. Quintana, M. Koegst, S. Rülke, and H. Süße
Power consumption is becoming one of the most
critical parameters in VLSI design. In this paper we
describe a novel state assignment algorithm targeting
low-power CMOS realizations of controllers. The
main features of the new approach can be summarized as
follows: 1) flexible column encoding strategy which
allows handling the area and the register activity cost
functions separately and 2) preliminary analysis of the
FSM to control relative weight of each cost function.
Experimental results show that on average there is a 25%
reduction in power consumption compared to a standard
tool, without area penalty.
-
Composition Trees in Finding Best Variable Orderings for ROBDDs [p. 1084]
-
E. Dubrova
The algorithms for static reordering of Reduced Ordered
Binary Decision Diagrams (ROBDDs) rely on dependable
properties for grouping of variables. Two such properties
have been studied so far: keeping symmetric variables adjacent
[1] and minimizing the ROBDD's width [2]. However,
counterexamples have been found for both cases [1], [3].
In this paper, we introduce a new condition for grouping
of variables, suggesting to keep adjacent the variables from
all bound sets of the function which are explicitly given by
its composition tree. A bound set is a proper subset Y of
the variables X of a function f : {0,1}^|X| -> {0,1} resulting
in a decomposition of the type f(X) = g(h(Y), Z),
where Z = X − Y. The composition tree T(f) of f is a
structure reflecting all its non-overlapping bound sets [4]-[6].
A bound-set-preserving ordering of the variables
of a ROBDD for f(X) is a vector listing the variables
of X in order from top to bottom of the ROBDD,
in which the variables of any node of T(f) are adjacent.
For example, if a function f(x1, x2, x3) has
a single non-trivial bound set {x1, x2}, then the
orderings (x1, x2, x3), (x3, x1, x2), and (x3, x2, x1)
are bound-set-preserving, while the orderings (x1, x3, x2)
and (x2, x3, x1) are not. A composition tree T(f) is unique
for f (up to isotopy) and therefore any Boolean function
has a unique set of bound-set-preserving orderings. We prove that
the intersection of the set of bound-set-preserving orderings
and the set of best orderings is non-empty for any Boolean
function.
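The adjacency condition in the example above can be checked mechanically. The sketch below tests whether every bound set occupies contiguous positions in a given ordering; the function name and the representation of bound sets as Python sets are assumptions for illustration, and the bound sets are taken as given rather than computed from a composition tree.

```python
def is_bound_set_preserving(ordering, bound_sets):
    """Return True if every bound set occupies contiguous positions in ordering."""
    position = {var: i for i, var in enumerate(ordering)}
    for bound_set in bound_sets:
        positions = sorted(position[v] for v in bound_set)
        # Contiguous means the span of positions equals the set size.
        if positions[-1] - positions[0] != len(positions) - 1:
            return False
    return True


# The single non-trivial bound set from the example in the abstract.
bound_sets = [{"x1", "x2"}]
```

Applied to the five orderings listed in the abstract, the check accepts the first three and rejects the last two.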
-
A Direct Mapping System for Datapath Module and FSM Implementation into LUT-Based FPGAs [p. 1085]
-
J. Abke and E. Barke
Today's high capacity Field-Programmable Gate Arrays
(FPGAs) and the upcoming trend to System-On-Programmable-Chip
(SOPC) require novel implementation
strategies. These have to overcome long implementation
times of traditional synthesis approaches. In this poster,
a unique approach for technology mapping of both datapath
modules and controller descriptions into Look-Up Table
(LUT)-based FPGAs is presented. The proposed method
starts at Register-Transfer-Level (RTL) and follows the Library
of Parameterized Modules (LPM) standard. The mapping
environment includes an implicit state minimization algorithm
for FSMs.
-
Concurrent and Selective Logic Extraction with Timing Consideration [p. 1086]
-
P. Rezvani and M. Pedram
We study the problem of concurrent and selective logic extraction in a Boolean circuit. We first model the
problem using graph theory, prove it to be NP-hard, and subsequently formulate it as a Maximum-Weight Independent
Set problem in a graph. We then use efficient heuristics for solving the MWIS problem. Concurrent logic extraction
not only allows us to achieve larger literal saving and smaller area due to a more global view of the extraction
space, but also provides us with a framework for reducing the circuit delay.
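The abstract does not say which heuristics are used for the MWIS problem. As a generic illustration only, the sketch below implements a common greedy heuristic that repeatedly picks the node with the best weight-to-degree ratio. Treating each node as a candidate extraction and each edge as a conflict between two candidates is an assumption, as are all names in the code.

```python
def greedy_mwis(weights, edges):
    """Greedy heuristic for maximum-weight independent set.

    weights: dict node -> positive weight.
    edges: iterable of (u, v) pairs marking conflicting nodes.
    Picks nodes by decreasing weight / (remaining degree + 1) and
    removes each picked node together with its neighbours.
    """
    neighbours = {n: set() for n in weights}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    remaining = set(weights)
    chosen = []
    while remaining:
        best = max(
            sorted(remaining),  # sorted for deterministic tie-breaking
            key=lambda n: weights[n] / (len(neighbours[n] & remaining) + 1),
        )
        chosen.append(best)
        remaining -= neighbours[best] | {best}
    return chosen


# Hypothetical conflict graph: a path a - b - c.
weights = {"a": 4, "b": 5, "c": 4}
edges = [("a", "b"), ("b", "c")]
selection = greedy_mwis(weights, edges)
```

The heuristic gives no optimality guarantee in general; it only ensures that the returned nodes are pairwise non-adjacent.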
-
Improved Technology Mapping for PAL-Based Devices Using a New Approach to
Multi-Output Boolean Functions [p. 1087]
-
D. Kania
The effective technology mapping for PAL-based
devices is presented in this paper. The aim of this method
is to cover a multiple-output function by a minimal number
of PAL-based logic blocks. The product terms included in
a logic block can be shared by several functions.
Experimental results are compared to the classical
technology mapping method.
-
Efficient and Effective Redundancy Removal for Million-Gate Circuits [p. 1088]
-
M. Berkelaar and K. van Eijk
Redundancy removal of combinational circuits has been
the subject of many papers over the last decades. Most of
these papers work with the relatively small circuits available
as benchmarks in the logic synthesis community. In
Magma's BlastFusion and BlastChip software, very large
blocks of logic (millions of gates) are handled flat (BlastFusion
and BlastChip are registered trademarks of Magma
Design Automation). We implemented redundancy removal
in a way that will allow it to run efficiently (fast, low memory
usage) and robustly (no run time or memory explosion
on any netlist) on industrial designs of up to several million
gates. We achieve this without resorting to partitioning.
Unlike most published approaches, we do not try to
identify all redundancies in a circuit, as an exact solution
to this NP-hard problem is infeasible for the large circuits
we face. Instead we try to identify as many as possible in a
reasonable run time.
We use a carefully engineered combination of Fault Collapsing,
Random Test Generation (RTG) and the good old
D-algorithm. As the goal is finding redundancies, and not
sets of test vectors, these algorithms need changes and adaptations
for optimal efficiency and robustness. Fault Collapsing
can be more aggressive than for test generation. RTG
was implemented with a novel dynamic control of the bit-parallelism
employed. The D-algorithm's effort control was
not implemented with a traditional backtrack limit, but on
a more fine-grain level, to increase robustness. For details,
please refer to [1].
Results on 11 industrial netlists are shown in table 1. All
tests were run on a Sun Ultra-80 workstation. A comparison
is shown to a state-of-the-art SAT-based approach. Our
approach is clearly faster while identifying more redundancies.
-
Visualization of Partial Order Models in VLSI Design Flow [p. 1089]
-
A. Bystrov, M. Koutny, and A. Yakovlev
A new method, algorithms and tool for the
visualisation of a finite complete prefix (FCP) of a Petri net
(PN) or a signal transition graph are presented. A transformation
is defined that converts such a prefix into a two-level
model. At the top level, it has a finite state machine
(FSM), describing modes of operation and transitions between
them. At the low level, there are marked graphs,
which can be drawn as waveforms, embedded into the top
level nodes. The models of both levels are abstractions
traditionally used by electronics engineers. The resultant
model is completed trace equivalent to the original prefix.
Moreover, the branching structure of the latter is preserved
as much as possible.
-
High-Level Modeling and Design of Asynchronous Arbiters for On-Chip Communication Systems [p. 1090]
-
J. Rigaud, L. Fesquet, M. Renaudin, and J. Quartana
This poster presents the design of complex
arbitration modules, like those required in SoC
communication systems. Clock-less, delay-insensitive
arbiters are studied with a view to making
the design of future GALS or GALA
SoCs easier and more practical. This work focuses on high-level modeling and
delay-insensitive implementations of low-power and reliable
fixed and dynamic priority arbiters.
-
Power-Efficient Trace Caches [p. 1091]
-
J. Hu, N. Vijaykrishnan, M. Kandemir, and M. Irwin
This paper addresses the power wasted when accessing
an instruction cache that stores only static sequences of instructions.
Although the trace cache was introduced to capture the dynamic characteristics
of instructions in execution, a conventional trace cache (CTC) increases
the power consumption of the fetch unit. A Sequential Trace Cache (STC) is
investigated for its power efficiency in this paper.
-
Reducing Cache Access Energy in Array-Intensive Applications [p. 1092]
-
M. Kandemir and I. Kolcu
Cache memories are known to consume a large percentage
of on-chip energy in current microprocessors. For example,
[1] reports that the on-chip cache in DEC Alpha
21264 consumes approximately 25% of the on-chip energy.
Both sizes and complexities of state-of-the-art caches play
a major role in their energy consumption. Direct-mapped
caches are, in general, more energy efficient (from a per access
energy consumption viewpoint) as they are simpler as
compared to set-associative caches, and require no complex
line replacement mechanisms (i.e., there is no decision concerning
which line has to be evicted when a new line is to
be loaded).
While there exists a large body of compiler-based techniques
to manipulate access pattern of a given code to improve
its cache utilization, there are not many compiler
techniques that try to improve cache energy consumption
of a given code. Rather, in many cases, a reliance is placed
upon the observation that optimizing cache locality also optimizes
cache energy. This is true to some extent as optimizing
locality (performance) of memory accesses reduces
the activity between cache and off-chip memory, and consequently,
decreases the number of writes into cache. Recent
work (e.g., [2]) also shows that the classical performance-oriented
compiler optimizations (e.g., loop-level transformations)
can be very effective in reducing overall memory
system energy.
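The remark that direct-mapped caches need no line replacement mechanism can be seen in a toy model: the address alone determines which line is checked and, on a miss, which line is overwritten. The class below is a minimal sketch with assumed names and parameters; it counts hits and misses only and does not model energy.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one line."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up a byte address; return True on a hit, False on a miss."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # The victim is fixed by the index, so no replacement policy is needed.
        self.tags[index] = tag
        self.misses += 1
        return False


# Hypothetical configuration: 4 lines of 16 bytes each.
cache = DirectMappedCache(num_lines=4, line_size=16)
results = [cache.access(addr) for addr in (0, 4, 64, 0)]
```

Addresses 0 and 64 map to the same line here, so they evict each other; a set-associative cache would have to decide which way to replace in that situation.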
-
The Use of Runtime Configuration Capabilities for Networked Embedded Systems [p. 1093]
-
C. Nitsch and U. Kebschull
Reconfiguration is a very helpful feature that can improve
the design life cycle of an embedded system and its
quality. Reconfiguration means that software AND hardware
parts may be updated in the field. The update of system
hardware implies the use of FPGAs in a shipped
system. Normally, the update is done server controlled,
which means that the active role comes from an external instance.
We present a new automatic reconfiguration approach
that stores all system configuration data in XML
format. The system itself searches for the related components
through a component broker and sets itself up during start-up. A
case study shows that especially when dealing with permanently
connected devices, we achieve promising results
while spending a reasonable price.
-
A SAT Solver Using Software and Reconfigurable Hardware [p. 1094]
-
I. Skliarova and A. Ferrari
In this paper we propose a novel approach for
solving the Boolean satisfiability problem by combining
software and reconfigurable hardware. The suggested
technique avoids instance-specific hardware
compilation and, as a result, achieves a higher
performance than pure software approaches. Moreover,
it permits problems that exceed the resources of the
available reconfigurable hardware to be solved.
-
A New Time Model for the Specification, Design, Validation and Synthesis of
Embedded Real-Time Systems [p. 1095]
-
R. Münzenberger, M. Dörfel, F. Slomka, and R. Hofmann
An essential characteristic of embedded systems is realtime,
but the commonly used specification techniques do not
consider temporal aspects in general like fulfillment of high
level timing requirements or dynamic reactions on timing
violations. We show a new formal time model that fills this
gap: Timing requirements specify the timing behaviour of
real-time systems. Different models allow the specification
of clock properties and the relations between clocks. With
this time model, timing requirements as well as the desired
properties of the involved clocks can be specified within a
formal description technique.
-
Improved Constraints for Multiprocessor System Scheduling [p. 1096]
-
M. Grajcar and W. Grass
MILP-based models are useful for finding optimal schedules
and for proving their optimality. Because of the problem
complexity, model improvements have to be investigated.
We analyze the constraints necessary for precluding resource
conflicts, present novel formulations, and evaluate them.
The efficiency of the solution process can be improved significantly
by selecting the proper formulation.
-
A Fast Johnson-Mobius Encoding Scheme for Fault Secure Binary Counters
-
K.S. Papadomanolakis, A.P. Kakarountas, N. Sklavos and C.E. Goutis
The major characteristic of a counting unit is its
performance. The basic properties that a fast counter must
have are: i) high counting rate, preferably independent of
the counter size, ii) a binary output; read on-the-fly, iii)
sampling rate equal to the counting rate, and iv) a regular
implementation suitable for VLSI.
For safety critical applications, the synchronous
operation of a fault-secure binary counter makes reading
the counter's value difficult and reduces the counting rate
proportionally to counter's size. In this paper an
implementation of a fault-secure binary counter using the
Johnson-Mobius encoding scheme is presented.
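The abstract does not detail the encoding, so the sketch below shows only the standard Johnson (twisted-ring, or Möbius) code sequence, in which each state differs from the previous one in a single bit. It is not the fault-secure counter of the paper, and the function name is an assumption.

```python
def johnson_sequence(n_bits):
    """Return the 2*n_bits states of an n-bit Johnson (twisted-ring) counter."""
    mask = (1 << n_bits) - 1
    state = 0
    states = []
    for _ in range(2 * n_bits):
        states.append(state)
        # Shift left and feed back the inverted most significant bit.
        inverted_msb = ((state >> (n_bits - 1)) & 1) ^ 1
        state = ((state << 1) & mask) | inverted_msb
    return states


sequence = johnson_sequence(4)
```

Because only one bit changes per step, the next state does not depend on a carry chain, which is one reason such codes are attractive for fast counters.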
-
Maximizing Conditional Reuse by Pre-Synthesis Transformations [p. 1097]
-
O. Penalba, J. Mendias, and R. Hermida
The property called mutual exclusiveness, responsible for
the degree of conditional reuse achievable after a high-level
synthesis (HLS) process, is intrinsic to the system's
behavior. But sometimes it is only partially reflected in the
actual description written by a designer. Our algorithm
performs a transformation of the input description that
exploits the maximum conditional reuse of the behavior,
independently of description style, allowing the HLS tools
to obtain circuits with less area.
-
Control Circuit Templates for Asynchronous Bundled-Data Pipelines [p. 1098]
-
S. Tugsinavisut and P. Beerel
This paper proposes the use of templatized
asynchronous control circuits with single-rail datapaths to
create low-power bundled-data non-linear pipelines. First,
we adapt an existing templatized control style for 1-of-N
rail pipelines, the Pre-Charged Full Buffer PCFB [1], to
bundled-data pipelines. Then, we present a novel true 4-phase
template (T4PFB) that has lower control overhead.
Simulation results indicate 12%-44% higher throughput
for the pipeline stage equivalent to 8 to 40 gates.
-
Transforming Arbitrary Structures into Topologically Equivalent Slicing Structures [p. 1099]
-
O. Peyran and W. Zhuang
Floorplanning is an important step of IC design.
Traditionally, floorplan representation has been
segregated between slicing and non-slicing structures. We
present a heuristic that translates any arbitrary structure
into a slicing one, topologically equivalent to the initial
one after a 1-D compaction.
-
A New Formulation for SOC Floorplan Area Minimization Problem [p. 1100]
-
C. Lee, Y. Lin, W. Fu, C. Chang, and T. Hsieh
In this poster, we present a new formulation by introducing the concept
of block partition such that the shape of modules can be automatically
determined based on the goal of optimization. Experimental results from
MCNC benchmarks indicate that the zero dead space solutions can be obtained
for most test cases under our formulation.
-
Non-Rectangular Shaping and Sizing of Soft Modules in Floorplan Design [p. 1101]
-
C. Chu and F. Young
In this paper, we study the problem of changing the
shapes and dimensions of the flexible modules to fill up the
unused area of a preliminary floorplan, while keeping the
relative positions between the modules unchanged. The selection
of modules and empty spaces is made by the users
interactively. We formulate the problem as a mathematical
program. We use the Lagrangian relaxation technique [1, 2]
to solve the problem. The formulation is such
that the dimensions of all the rectangular and non-rectangular
modules can be computed efficiently by closed-form
equations.
-
EZ Encoding: A Class of Irredundant Low Power Codes for Data Address and
Multiplexed Address Buses [p. 1102]
-
Y. Aghaghiri, M. Pedram, and F. Fallah
In this paper, we introduce a class of irredundant
low power encoding techniques for memory address buses. For
a data address bus, the proposed encoding techniques make
use of two working zones in the memory address space,
whereas for a multiplexed data and instruction address bus, up
to four working zones can be supported. The zones are
dynamically updated to increase the saving in switching
activity. Our techniques decrease the switching activity of data
address and multiplexed address buses by an average of 55%
and 77%, respectively, up from 25% and 64% achieved by
previous methods.
-
Estimation of Power Consumption in Encoded Data Buses [p. 1103]
-
A. Garcia, L. Kabulepa, and M. Glesner
Because of the increasing importance of cross coupled
capacitances in deep submicron technologies [1], it is of
great interest to extend the existing high-level power estimation
techniques by considering the spatial correlation
between adjacent lines. This work addresses the modeling
and estimation of power dissipation in on-chip buses based
on the statistical properties of data sequences. Using the
derived models, a power estimation technique is proposed
and evaluated for various coding schemes. For different
DSP applications, our results depict less than 5 % discrepancy
with precise bit level estimations.
-
Optimization Techniques for Design of General and Feedback Linear Analog
Amplifier with Symbolic Analysis [p. 1104]
-
T. Hieu
The analysis of linear analog amplifiers at the beginning
of the design process shows in some cases an unwanted
resonance in the amplitude response or an unwanted overshooting
in the time domain. It is important for the designer
to know design methods for compensating this effect.
A symbolic analysis approach that supports the
representation of a signal-flow graph with feedback for an
amplifier circuit is introduced. The method is based
on the node analysis and mathematical handling of symbolic
expressions. Using the proposed approach the feedback,
the open-loop gain and the loop gain can be analyzed
and calculated. With the analysis of pole-zero of the symbolic
loop gain, parameters of the amplifier can be determined
for the compensation of the amplitude response.
-
Critical Comparison among Some Analog Fault Diagnosis Procedures Based on Symbolic Techniques [p. 1105]
-
A. Luchetta, S. Manetti, and M. Piccirilli
Parametric fault diagnosis techniques play an
important part in the field of analog fault diagnosis.
These techniques, starting from a series of measurements
carried out on a previously selected test point set, given
the circuit topology and the nominal values of the
components, are aimed at determining the effective
values of the circuit parameters by solving a set of
equations nonlinear with respect to the component values.
Here the role of symbolic techniques in the automation of
parametric fault diagnosis of analog circuits is
investigated. Because the actual component values are
the unknown quantities, a symbolic approach is
particularly suitable for the automation of parametric
fault diagnosis techniques, as shown, for example, in [1].
All this applies to linear analog circuits or
to suitably linearized nonlinear circuits. On the other
hand, the present trend is towards
design techniques that lead to linear analog circuits, so
this is not a serious restriction [2].
-
The Selective Pull-Up (SP) Noise Immunity Scheme for Dynamic Circuits [p. 1106]
-
M. Stan and A. Panigrahi
Noise is an important consideration in the design of integrated
circuits. Increased immunity to noise, however, typically
comes at the expense of increased delay. So, it is
very important to have an adequate noise immunity with a
minimum penalty in performance. "Global" noise immunity
schemes can be used when the noise is approximately
the same on all nodes in the circuit; but when a few nodes
are noisier than others, much better results can be obtained
by selective noise immunity schemes.
The Selective Pull-up (SP) technique for dynamic circuits
is a method for improving the noise immunity of inputs
selectively, so that the least penalty in delay is paid for
inputs that intrinsically have higher noise immunity.
-
Substrate Parasitic Extraction for RF Integrated Circuits [p. 1107]
-
A. Cathelin, D. Saias, D. Belot, Y. Leclercq, and F. Clement
Accurately predicting the impact of substrate parasitics in
Radio Frequency design with simulations is one of the
major concerns to ensure first silicon success in a System on
Chip approach.
The practical design experience of a 2GHz RF front-end
circuit (designed in a 0.35 µm SiGe BiCMOS technology),
presented here, illustrates how measurement results can
be accurately predicted using a substrate parasitic
extractor.
-
A Complete Phase-Locked Loop Power Consumption Model [p. 1108]
-
D. Duarte, N. Vijaykrishnan, and M. Irwin
A PLL power model that accurately estimates the power
consumption during both lock and acquisition states is
presented. The model is within 5% of circuit level
simulation (SPICE) values. No significant power
overhead (+/- 5% of the power consumed at the final
frequency) is incurred during the acquisition process.
-
Statistical Timing Driven Partitioning for VLSI Circuits [p. 1109]
-
C. Ababei and K. Bazargan
In this poster we present statistical-timing
driven partitioning for performance optimization. We
show that by using the concept of node criticality we can
enhance the Fiduccia-Mattheyses (FM) partitioning
algorithm to achieve, on average, around 20%
improvements in terms of timing, among partitions with
the same cut size. By incorporating mechanisms for
timing optimization at the partitioning level, we facilitate
wire-planning at high levels of the design process.
-
DAISY-CT: A High-Level Simulation Tool for Continuous-Time DeltaSigma Modulators [p. 1110]
-
K. Francken, M. Vogels, E. Martens, and G. Gielen
To reduce the long circuit-level simulation time of ΔΣ
modulators, a variety of techniques and tools exist that use
high-level models for discrete-time (DT) ΔΣ modulators.
There is, however, no rigorous methodology implemented in
a tool for the continuous-time (CT) counterpart. Therefore,
we have developed a methodology for the high-level simulation
of CT ΔΣ modulators and implemented this method
in a user-friendly tool. Key features are the simulation
speed, accuracy and extensibility. Nonidealities such as finite
gain, finite GBW, output impedance and also the important
effect of jitter are modelled. Finally, experiments
were carried out using the tool, exploring important design
trade-offs.
-
Automated Optimal Design of Switched-Capacitor Filters [p. 1111]
-
A. Hassibi and M. Hershenson
We present a method for automated design of
CMOS switched-capacitor filters (SCFs) from user-defined
top-level specifications to component sizes and
physical layout. In other words, we present a complete
top-down design flow for SCFs.
The method is based on careful analysis and modeling
of the SCF using analog circuit design and system
engineering expertise, formulating design constraints
in a special convex form, and numerical optimization
(geometric programming).
-
On-Chip Inductance Models: 3D or Not 3D? [p. 1112]
-
T. Lin, M. Beattie, and L. Pileggi
Full 3D lumped partial inductance models usually contain
a tremendous amount of forward coupling terms. To reduce
the complexity of simulation and analysis, a simplified
model that excludes the forward coupling terms is often
adopted in practice [3][4]. This paper addresses the question
whether ignoring forward couplings is always an acceptable
choice or if full 3D models are necessary in certain
cases. We show that the significance of the forward coupling
inductance depends on various aspects of the design.
-
Simple and Efficient Approach for Shunt Admittance Parameters Calculations of
VLSI On-Chip Interconnects on Semiconducting Substrate [p. 1113]
-
H. Ymeri, B. Nauwelaers, K. Maex, D. De Roest, M. Stucchi, and S. Vandenberghe
This paper presents a slight modification of a recently
proposed series expansion method [1, 2], developed for the
electrical modeling of lossy-coupled multilayer interconnection
lines, that does not involve iterations and yields solutions of
sufficient accuracy for most practical interconnections as used in
common VLSI chips. We use here a Fourier series restricted to
cosine functions. The solution for the layered medium is found by
matching the potential expressions in the different homogeneous
layers with the help of boundary conditions. In the plane of
conductors, the boundary conditions are satisfied only at a finite,
discrete set of points (point matching procedure).
-
Compact Macromodel for Lossy Coupled Transmission Lines [p. 1114]
-
R. Khazaka and M. Nakhla
This paper describes a systematic algorithm for obtaining passive
time domain reduced order transmission line macromodels.
The proposed algorithm makes use of a new order reduction
technique that removes the redundant poles obtained using conventional
order reduction methods. The reduced macromodel is
passive by construction.
-
An EMC-Compliant Design Method of High-Density Integrated Circuits [p. 1115]
-
J. Levant and M. Ramdani
This paper deals with an innovative method of EMC-compliant
design. This technique helps to optimize the
emission level as early as the design phase, and
provides noise-related solutions which will be
evaluated and integrated into the silicon.
This method models the activity of
thousand-gate circuits using only two current
generators which represent supply current
consumption in the VDD and the VSS rails.
This allows EMC evaluation and optimization
(conducted noise) for a packaged integrated circuit
within its electrical environment.
-
Finding a Common Fault Response for Diagnosis during Silicon Debug [p. 1116]
-
I. Pomeranz, J. Rajski, and S. Reddy
When a design is manufactured for the first time, it may suffer
from timing-related errors that result from inaccuracies in the
timing analysis tool used during the design process. Such errors
will appear as delay faults in all (or many) of the manufactured
chips. In addition, variations that occur during the manufacturing
process may cause delay defects that vary across chips. It is
necessary to diagnose and correct failures of the first type (in the
presence of failures of the second type) before the chip can be
manufactured again. This may have to be repeated until design
errors are eliminated.
-
IDDT Testing of Embedded CMOS SRAMs [p. 1117]
-
S. Kumar, R. Makki, and D. Binkley
This paper presents an iDDT test method for embedded
CMOS SRAMs. A total of 192 faults were inserted
and simulated using parameters from a 0.35 um
process. The SRAM model includes realistic effects
such as wire bonding inductance and resistance
parameters as well as bypass capacitance. A sensor is
introduced and incorporated into the SRAM cell
array to detect abnormal iDDT switching. Figure 1
shows a 1-Mbit SRAM organized into 64 cell blocks of
128 x 128 cells, with an iDDT sensor monitoring each
cell block. The
SRAM model includes the following parameters:
• On-chip wire bonding inductance of 2 nH
• On-chip wire bond resistance of 0.01 Ohms
• On-chip bypass capacitance of 1 pF
• Bitline capacitance of 3 pF
• Power line capacitance of 40 pF
The results of the fault simulations comparing voltage,
IDDQ and iDDT test methods are given in Table 1.
-
Fault Detection and Diagnosis Using Wavelet Based Transient Current Analysis
[p. 1118]
-
S. Bhunia and K. Roy
We present a novel integrated method for fault detection
and localization using wavelet transform of transient
current (IDD) waveform. The time-frequency resolution
property of wavelet helps us detect as well as
localize faults in digital CMOS circuits. Experiments
performed on an 8-bit ALU show promising results for
both detection and localization.
-
An Efficient Test and Diagnosis Scheme for the Feedback Type of Analog Circuits
with Minimal Added Circuits [p. 1119]
-
J. Lin, C. Lee, and J. Chen
This paper presents a test and diagnosis scheme for feedback type of linear analog circuits with minimal added
circuits. For testing, the scheme transforms the circuit-under-test (CUT) into an oscillation circuit by (1) increasing
the loop gain of the circuit, and/or (2) reconfiguring the circuit through selectively powering-off operational
amplifiers (OP) of the circuit. This eliminates the need of added global paths as in the conventional oscillation test
scheme. For diagnosis, the scheme transforms the circuit into a Schmitt trigger type of circuit with a positive feedback.
The output of the circuit under an applied triangular input gives signatures which are used to identify
faults. Benchmark circuits have been applied with this scheme and results show that it is very effective for testing
and diagnosing the feedback type of linear analog circuit.
-
On the Use of an Oscillation-Based Test Methodology for CMOS
Micro-Electro-Mechanical Systems [p. 1120]
-
V. Beroulle, Y. Bertrand, L. Latorre, and P. Nouet
This paper introduces the use of the oscillation test
technique for MEMS testing. This well-known test
technique is here adapted to MEMS. Its efficiency is
evaluated based on a case study: A CMOS electromechanical
magnetometer.
-
Directed-Binary Search in Logic BIST Diagnostics [p. 1121]
-
R. Kapur, T. Williams, and M. Mercer
Logic BIST is about to become a more mainstream
test method for IC testing. In some flows
when a failure is encountered the IC is diagnosed
to determine the cause of the failure. Diagnosing
fails in Logic BIST is significantly different from
that in a stored pattern test methodology. The first
step is to determine the failing pattern or interval
among the many patterns that were applied. Today
this involves a binary search of the tests that were
applied with Logic BIST. In this paper we improve
on this binary search strategy to reduce the time
taken to isolate the failing patterns by orders of
magnitude.
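For reference, the baseline binary search mentioned above can be
sketched as follows. This is only a minimal illustration of the
conventional strategy, not the directed method of the paper. The
oracle signature_ok is a hypothetical stand-in for re-running Logic
BIST over the first n patterns and comparing the compacted signature
with the fault-free one; the sketch assumes no aliasing, so that once
a failing pattern has been applied the signature stays wrong.

```python
from typing import Callable


def first_failing_pattern(num_patterns: int,
                          signature_ok: Callable[[int], bool]) -> int:
    """Return the 1-based index of the first failing pattern, or 0 if none.

    signature_ok(n) is a hypothetical oracle: it reruns the first n patterns
    and reports whether the signature matches the fault-free one.
    Assumption: no aliasing, so the oracle is monotone in n.
    """
    if signature_ok(num_patterns):
        return 0  # the whole test passes, nothing to isolate
    # Invariant: the first lo patterns pass, the first hi patterns fail.
    lo, hi = 0, num_patterns
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if signature_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical device whose first failing pattern is number 37 out of 1000.
print(first_failing_pattern(1000, lambda n: n < 37))
```

Each oracle call corresponds to one BIST re-run, so this baseline
needs on the order of log2(num_patterns) runs per failing pattern.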
-
An Evolutionary Approach to the Design of On-Chip Pseudorandom Test Pattern Generators [p. 1122]
-
M. Favalli and M. Dalpasso
Weighted pseudorandom test generation (WPRTG) uses test sequences
characterized by non-uniform distributions of test vectors
in order to increase the detection probability of random resistant
faults. Such non-uniform distributions are characterized by the values
of signal probability of the CUT inputs (weights). Since different
faults may require different distributions, a (small) number of
distributions is typically used [1]. The weights of such distributions
are identified by analyzing the CUT. The corresponding pseudorandom
sequences are typically obtained by inserting a combinational
network between the TPG and the CUT.
Several different methodologies have been proposed in order to
calculate the weights. Some approaches make use of deterministic
test sequences [2]. Another class of heuristics, instead, makes
use of numerical optimization strategies to determine the set(s) of
weights [1]. More recently, genetic algorithms have been identified
to provide a good solution to weights selection [3]. All such methods
evaluate only the first order coefficients of the distribution(s)
and may suffer from a few problems. In particular, the detection of
some random resistant faults may strongly depend on signal correlations.
Even if the effects of signal correlations can be reduced,
some problems remain. Consider, for instance, a fault that
can be detected by a test vector and its complement. Any WPRTG
method using signal probability evaluation would provide (when
targeting such a fault) the same coefficients of a uniform distribution.
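The last observation can be checked with a small calculation. The
sketch below is illustrative only: the 4-input vector is a
hypothetical example, and signal_probabilities is a helper introduced
here, not part of any method cited in the abstract. It computes the
per-input signal probability over the set of vectors that detect the
fault.

```python
def signal_probabilities(vectors):
    """Per-input probability of a 1 over a set of equal-length bit vectors."""
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]


def complement(vector):
    """Bitwise complement of a bit vector."""
    return [1 - b for b in vector]


# Hypothetical fault detected only by v and by its complement.
v = [1, 0, 1, 1]
weights = signal_probabilities([v, complement(v)])
print(weights)  # every weight is 0.5, i.e. the uniform distribution
```

Because each input is 1 in exactly one of the two detecting vectors,
first-order weights carry no information about such a fault.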
-
Fault Isolation Using Tests for Non-Isolated Blocks [p. 1123]
-
I. Pomeranz and Y. Zorian
Design methodologies for large designs produce circuits that
consist of interconnections of functional blocks. If the blocks
are large, as in core-based designs, they may be isolated for testing
purposes (e.g., by test wrappers) such that different blocks
can be tested independently. However, even if a test wrapper
exists, it is advantageous to test functional paths that go through
two or more blocks by using test vectors that propagate fault
effects through several blocks. This contributes to testing of
defects that cannot be detected if each block is tested separately.
One of the issues that arises when several blocks are tested by
the same test is that of fault isolation. If a test that propagates
fault effects through blocks C1 and C2 produces a faulty
response on the outputs of C2, the goal of fault isolation is to
identify which one of C1 and C2 is faulty. Fault isolation is perfect
if every faulty response on the outputs of the circuit can be
uniquely attributed to a single block. This happens when every
pair of faults belonging to different blocks is distinguishable. If
faults of different blocks remain undistinguished, fault isolation
is not possible when responses equal to the responses produced
by these faults are produced by the circuit-under-test.
It may appear that tests for several non-isolated blocks
will not be able to isolate faults. In this work, we study this issue
and demonstrate that perfect or close-to-perfect fault isolation is
possible with tests that propagate fault effects through several
blocks.
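The isolation criterion stated above can be sketched as a simple set
check. This is a minimal illustration under assumed data structures,
not the procedure of the paper: responses_by_block is a hypothetical
mapping from each block to the set of faulty output responses that
its faults produce under the applied tests.

```python
from itertools import combinations


def ambiguous_responses(responses_by_block):
    """Faulty responses that faults of two different blocks both produce."""
    shared = set()
    for (_, r1), (_, r2) in combinations(responses_by_block.items(), 2):
        shared |= r1 & r2
    return shared


def isolation_is_perfect(responses_by_block):
    """Perfect isolation: every faulty response belongs to exactly one block."""
    return not ambiguous_responses(responses_by_block)


# Hypothetical responses, written as output bit strings.
example = {
    "C1": {"0110", "1010"},
    "C2": {"0011", "1010"},
}
print(isolation_is_perfect(example))  # False: both blocks can produce "1010"
print(ambiguous_responses(example))
```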
-
A Heuristic for Test Scheduling at System Level [p. 1124]
-
M. Flottes, J. Pouget, and B. Rouzeyre
This paper considers the test-scheduling problem of a
SoC. The proposed approach is based on a "sessionless"
test scheme. It minimizes the system test time while
respecting a power dissipation limit and test resource
sharing constraints. Experimental results show that our
approach outperforms other related test scheduling
solutions.
-
Formulation of SOC Test Scheduling as a Network Transportation Problem [p. 1125]
-
S. Koranne and V. Choudhary
Reusability of tests is crucial for reducing total design time.
This raises the problem of test knowledge transfer, physical
test application and test scheduling. We present a formulation
of the embedded core-based system-on-chip (SOC)
test scheduling problem (ECTSP) as a network transportation
problem. The problem is NP-hard and we present an
O(mn(m+2n)) 2-approximation algorithm using the result
of the single source unsplittable flow problem. We describe
the single source unsplittable flow problem (UFP) as given
in [1]; let G = (V,E) be a capacitated directed graph with
edge capacities c : E -> R+, a source s and k commodities
with terminals t_i and demands d_i in R+,
1 <= i <= k. A
vertex may contain a number of terminals. For each i, we
would like to route d_i units of commodity i along a single
path from s to the corresponding terminal so that the total
flow through an edge e is at most its capacity c(e).
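The definition above can be made concrete with a small feasibility
check. This is only a sketch of the problem statement, not the
2-approximation algorithm of the paper; the function name, the
dictionary-based graph encoding and the example graph are assumptions
made for illustration.

```python
from collections import defaultdict


def is_feasible_routing(capacity, source, terminals, demands, paths):
    """Check a candidate solution of the single source unsplittable flow problem.

    capacity: dict mapping a directed edge (u, v) to its capacity c(e)
    terminals[i], demands[i]: terminal t_i and demand d_i of commodity i
    paths[i]: list of vertices, the single path chosen for commodity i
    """
    flow = defaultdict(float)
    for t, d, path in zip(terminals, demands, paths):
        # Each commodity must use one path from the source to its terminal.
        if not path or path[0] != source or path[-1] != t:
            return False
        for edge in zip(path, path[1:]):
            if edge not in capacity:
                return False
            flow[edge] += d
    # Total flow through every used edge must not exceed its capacity.
    return all(flow[e] <= capacity[e] for e in flow)


# Hypothetical graph with source "s" and two commodities.
capacity = {("s", "a"): 3, ("a", "t1"): 2, ("a", "t2"): 1, ("s", "t2"): 2}
print(is_feasible_routing(capacity, "s", ["t1", "t2"], [2, 1],
                          [["s", "a", "t1"], ["s", "a", "t2"]]))
```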
-
A Novel Methodology for the Concurrent Test of Partial and Dynamically
Reconfigurable SRAM-Based FPGAs [p. 1126]
-
M. Gericota, G. Alves, M. Silva, and J. Ferreira
This poster presents the first truly non-intrusive
structural concurrent test approach, aimed at testing partial
and dynamically reconfigurable SRAM-based FPGAs
without disturbing their operation. This is accomplished by
using a new methodology to carry out the replication of
active Configurable Logic Blocks (CLBs), i.e. CLBs that
are part of an implemented function that is actually being
used by the system, releasing them to be tested in a way that is
completely transparent to the system.
-
Efficient On-Line Testing Method for a Floating-Point Iterative Array
Divider [p. 1127]
-
A. Drozd, M. Lobachev, and J. Drozd
This work is part of research on the development of
checking methods for approximate calculations performed
on the mantissa in floating-point circuits. The
problem of residue checking for truncated non-restoring
division is solved. The solution provides an efficient
implementation of truncated division that reduces hardware
and time almost by half in an iterative array divider.
-
An Instruction-Level Methodology for Power Estimation and Optimization of
Embedded VLIW Cores [p. 1128]
-
A. Bona, M. Sami, D. Sciuto, V. Zaccaria, C. Silvano, and R. Zafalon
The overall goal of this work is to define an instruction-level
power macro-modeling and characterization methodology
for VLIW embedded processor cores. The approach
presented in this paper is a major extension of the work
previously proposed in [1-3], targeting an instruction-level
energy model to evaluate the energy consumption associated
with a program execution on a pipelined VLIW core.
Our first goal is the reduction of the complexity of the
processor's energy model, without reducing the accuracy of
the results. The second goal is to show how the energy
model can be further simplified by introducing a methodology
to automatically cluster the whole Instruction Set with
respect to their average energy cost, in order to converge
to a highly effective design of experiments for the actual
characterization task. The paper also describes the application
of the proposed model to a real industrial VLIW core
(the Lx Architecture developed by HP Labs and STMicroelectronics),
to validate the effectiveness and accuracy of the
proposed methodology.
-
The Fraunhofer Knowledge Network (FKN) for Training in Critical Design
Disciplines [p. 1129]
-
A. Sauer, G. Elst, L. Krahn, and W. John
For the application of new technologies with ever
shorter lifecycles, the availability of the most recent
knowledge is mandatory. The intervals within which
acquired knowledge bases have to be updated therefore
become shorter and shorter. It is well known that software
development tools and systems are getting more and more
sophisticated, and the learning expenditure for the
personnel is growing accordingly. This tendency affects
major parts of the electrical and electronics industry where
demand for qualified workforce already manifests itself in
the `designer crisis'. The combined effects of the
increased functionality of new tool generations, the
change of application areas of relevant methods due to
technological progress and the improvement of the
information exchange facilities lead to increased
requirements with respect to further professional training.
The microelectronic industry and related business
sectors are extremely innovative and knowledge based.
Students, engineers, scientists and others need to develop,
transfer and share knowledge. The above mentioned
knowledge processes and knowledge flow from
researchers and universities to industry and vice versa
need to be strengthened to ensure a leading edge position
for European companies and institutes in this market.
-
Comparative Analysis and Application of Data Repository Infrastructure for
Collaboration-Enabled Distributed Design Environments [p. 1130]
-
L. Indrusiak, M. Glesner, and R. Reis
A collaborative design system depends heavily on the chosen
collaboration methodology, as well as on its technological infrastructure.
This paper presents three data repository technologies
and discusses their pros and cons in the role of supporting a
collaborative design system.
-
FlexBench: Reuse of Verification IP to Increase Productivity [p. 1131]
-
S. Stöhr, M. Simmons, and J. Geishauser
This paper presents FlexBench, which is a complete
framework for SoC verification at the Module
and SoC level, both with and without embedded
processors. The focus is to increase the productivity
of the verification engineer by providing a framework
to reuse verification IP, which includes parts
of the testbench and the test stimulus.
-
Mappability Estimation of Architecture and Algorithm [p. 1132]
-
J. Soininen, J. Kreku, and Y. Qu
A method for the selection of processor core and
algorithm combinations for system on chip designs is
presented. The method uses a mappability concept that is
an addition to performance and cost metrics used in
codesign. The mappability estimation is based on the
analysis of the correlations of algorithm and core
characteristics. The method is demonstrated with an
analysis tool and the experimental results with DSP cores
and algorithms are similar to expectations.
-
Behavioural Modelling of Operational Amplifier Faults Using VHDL-AMS [p. 1133]
-
P. Wilson, J. Ross, M. Zwolinski, A. Brown, and Y. Kiliç
The use of behavioural modelling for operational
amplifiers has been well known for many years and
previous work has included modelling of specific fault
conditions using a macro-model. In this paper, the models
are implemented in a more abstract form using an
Analogue Hardware Description Language (AHDL),
VHDL-AMS, taking advantage of the ability to control the
behaviour of the model using high-level fault condition
states. The implementation method allows a range of fault
conditions to be integrated without switching to a
completely new model. The various transistor faults are
categorised, and used to characterise the behaviour of the
HDL models. Simulations compare the accuracy and
speed of the transistor and behavioural level models
under a set of representative fault conditions.
-
A Parallel LCC Simulation System [p. 1134]
-
K. Hering
Cycle-based simulation at RT- and gate level realized by
a Levelized Compiled Code (LCC) technique represents a
well established method for functional verification in processor
design. We present a parallel LCC simulation system
developed to run on loosely-coupled processor systems
allowing significant simulation acceleration. It comprises
three parallel simulators and a complex model partitioning
environment. A key idea of our approach is to evaluate
circuit model partitions with respect to the expected parallel
simulation run-time and to integrate corresponding cost
functions into partitioning algorithms. Experimental results
are given with respect to IBM processor models of different
size.
-
Error Simulation Based on the SystemC Design Description Language [p. 1135]
-
F. Bruschi, M. Chiamenti, F. Ferrandi, and D. Sciuto
The combined effects of increased device complexity
and reduced design cycle time create a testing
problem: an increasingly large portion of the design
time is devoted to testing and verification. Today's EDA
tools, moving towards higher levels of abstraction,
promise greater designer productivity, resulting in
increased design complexity and size.
In order to reduce the testing and verification time,
different high-level approaches have been proposed in
literature [2]. Most of these approaches are based on
the definition of an error or fault model, applicable at a
higher level of abstraction of the description of the
system to be implemented.
In this paper we concentrate our attention on the
evaluation of error models, used in test generation and
in functional verification. Evaluation of error models
is also an important aspect when fault injection
methodologies are used to evaluate the dependability
of complex systems.
The ideas proposed by this work try to solve this
evaluation and analysis problem starting from the
following requirements:
• the error simulation task should be based only
on the original hardware description language
primitives;
• the flow from the given specification to the
fault simulation should be as automatic as
possible;
-
Towards a Kernel Language for Heterogeneous Computing [p. 1136]
-
D. Björklund and J. Lilius
What is characteristic of modern embedded systems like
mobile phones, multimedia terminals, etc. is that their design
requires several different description techniques: The
radio-frequency part of a mobile phone is designed using
analog techniques, the signal processing part can be described
using synchronous data-flow, while the protocol
stack uses an extended finite state machine based description
model. This heterogeneity poses a challenge to embedded
system design methodologies, and has resulted in
a search for a System Level Design Language (SLDL) for
describing both software and hardware.
We believe that to obtain a good SLDL one needs to first
understand what the combination of models of computation
means. To this end we are developing a kernel language in
which it is possible to use different models of computation.
The main contributions of this work are: (1) a common set
of concepts that form the basis of the kernel language, (2)
a formally defined operational semantics, which also makes
it possible to verify designs using e.g. model-checking, (3)
the explicit use of atomicity and, (4) the introduction of the
notion of execution policy.
-
Top-Down System Level Design Methodology Using SpecC, VCC and SystemC [p. 1137]
-
L. Cai, D. Gajski, P. Kritzinger, and M. Olivares
There appears to be an increasing trend towards the use
of the C/C++ language as a basis for the next generation
modeling tools and platform methodology to encompass
design reuse. However, even with this convergence,
industry is suffering the pain that there is no one tool or a
complete tool flow methodology that can implement a top-down
design methodology from C to silicon.
In this paper we suggest a top-down methodology from
C to silicon. In our methodology, we focus on methods to
make the design flow smooth, efficient, and easy. The
proposed methodology is a pure top-down methodology.
We developed our design methodology by using SpecC
[1], VCC [2], and SystemC [3]. We chose SpecC, VCC
and SystemC because they are all C-related and each have
strong support in at least one field of design. Our proposal
for a methodology is based on our experiences of
attempting to model the JPEG encoder with SpecC,
SystemC and VCC, and one internal project, attempting to
implement architecture exploration for MPEG encoding
and decoding using VCC.
-
Automatic Topology-Based Identification of Instruction-Set Extensions for
Embedded Processors [p. 1138]
-
L. Pozzi, M. Vuletic, and P. Ienne
The need for high performance in ASIC embedded processors,
coupled with aggressive energy and area goals, is
pushing researchers and designers toward processor specialisation
for a given application-domain. In this paper,
specialisation is addressed through introduction of Ad-hoc
Functional Units--special arithmetic/logic units added to a
traditional architecture to perform domain-specific complex
operations.
-
Steady State Calculation of Oscillators Using Continuation Methods [p. 1139]
-
H. Brachtendorf, S. Lampe, R. Laur, R. Melville, and P. Feldmann
Shooting, finite difference or Harmonic Balance techniques
in conjunction with Newton's method are widely employed
for the numerical calculation of limit cycles of oscillators.
The resulting set of nonlinear equations is normally
solved by damped Newton's method. In some cases
however, divergence occurs when the initial estimate of
the solution is not close enough to the exact one. A two-dimensional
homotopy method is presented in this paper
which overcomes this problem. The resulting linear set of
equations employing Newton's method is under-determined
and is solved in a least squares sense for which a rigorous
mathematical basis can be derived.
|