Moderators: Y. Zorian, LogicVision, USA,
P. Plaza, Telefónica I+D, Spain
The most common practice to model the transistor chain, as it appears in CMOS gates, is to collapse it to a single equivalent transistor. This method is analyzed and improvements are presented in this paper. Inherent shortcomings are removed and an effective transistor width is calculated taking into account the operating conditions of the structure, resulting in very good agreement with SPICE simulations. The actual time point at which the chain starts conducting, which significantly influences the accuracy of the model, is also extracted. Finally, an algorithm to collapse every possible input pattern to a single input is presented.
The basic drawback of parity prediction arithmetic operators is that they may not be fault secure for single faults. In recent work we have proposed a theory for achieving fault-secure design of parity prediction multipliers and dividers. That work did not consider the case of Booth multipliers using operand recoding. This case is analyzed here, and the parity prediction logic and a fault-secure implementation for this scheme are derived.
Keywords: Self-checking circuits, Booth multipliers
PASTEL is a parameterized memory characterization system which extracts the characteristics of ASIC on-chip memories, such as delay, timing and power consumption, which are important in LSI logic design. PASTEL is a fully automated process, from exact wire-RC extraction through circuit reduction, input vector generation and waveform measurement to data-sheet and library creation. The circuit reduction scheme can reduce circuit simulation time by two orders of magnitude while keeping the delay error within 100 ps of exact simulation.
Moderators: K. Buchenrieder, Siemens AG, Germany,
A. Jerraya, TIMA, Grenoble, France
This paper presents a novel hardware resource allocation technique for hardware/software partitioning. It allocates hardware resources to the hardware data-path using information such as data dependencies between operations in the application, and profiling information. The algorithm is useful as a designer's or design tool's aid to generate good hardware allocations for use in hardware/software partitioning. The algorithm has been implemented in a tool under the LYCOS system [9]. The results show that the allocations produced by the algorithm come close to the best allocations obtained by exhaustive search.
This paper presents an integrated approach to hardware software partitioning and hardware design space exploration. We propose a genetic algorithm which performs hardware software partitioning on a task graph while simultaneously contemplating various design alternatives for tasks mapped to hardware. We primarily deal with data dominated designs typically found in digital signal processing and image processing applications. A detailed description of various genetic operators is presented. We provide results to illustrate the effectiveness of our integrated methodology.
One of the key problems in hardware/software co-design is communication synthesis, which determines the amount and type of interconnect between the hardware components of a digital system. To do so, communication synthesis derives a communication topology to determine which components are to be connected to a common communication channel in the final hardware implementation. In this paper, we present a novel approach to clustering processes that share a communication channel. An iterative graph-based clustering algorithm is driven by a heterogeneous cost function which takes into account bit widths, the probability of access collisions on the channels, the cost of arbitration logic, and the availability of interface resources on the hardware components, in order to trade off cost against performance in an optimal fashion. The key aspects of the approach are demonstrated on a small example.
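To make the flavour of such a heterogeneous cost function concrete, here is a toy Python version; the weights, names and cost terms are purely illustrative and not taken from the paper. A shared channel pays for its widest member port, for expected access collisions, and for arbitration logic per additional master:

```python
def channel_cost(cluster, bitwidth, access_prob,
                 arb_cost_per_master=2.0, collision_weight=10.0):
    """Toy cost of mapping all processes in `cluster` onto one shared channel."""
    width = max(bitwidth[p] for p in cluster)      # channel must fit the widest port
    load = sum(access_prob[p] for p in cluster)    # expected channel utilisation
    collision = max(0.0, load - 1.0)               # crude over-subscription penalty
    arbitration = arb_cost_per_master * (len(cluster) - 1)
    return width + collision_weight * collision + arbitration

bw = {'p': 8, 'q': 16}
prob = {'p': 0.4, 'q': 0.3}
cost = channel_cost({'p', 'q'}, bw, prob)   # 16 + 0 + 2.0 = 18.0
```

A clustering loop would then merge the pair of clusters that reduces the total cost most and stop when no merge improves it, which matches the iterative scheme the abstract describes.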
Moderators: A. Vachoux, Ecole Polytechnique Federale de Lausanne, Switzerland,
T. Kazmierski, University of Southampton, UK
This paper presents a straightforward approach for synthesizing a standard VHDL description of an asynchronous circuit from a behavioural VHDL description. The asynchronous circuit style is based on `micropipelines', a style currently used to develop asynchronous microprocessors at Manchester University. The rules of partition and conversion which are used to implement the synthesizer are also described. The synthesizer greatly reduces the design time of a complex micropipeline circuit.
The systematic top-down design of mixed-signal systems requires an abstract specification of the intended functions. However, hybrid systems are systems whose parts are specified using different time models. Specifications of hybrid systems are not purely functional, as they also contain structural information. The structural information is introduced by partitioning the specification into blocks with a homogeneous time model. This often leads to inefficient implementations. In order to overcome this problem, a homogeneous representation for the behaviour of hybrid systems -- KIR -- is introduced. This representation makes it possible to represent behaviour in all time models in a common way, so that the separation into different modeling styles is no longer necessary. Rules for re-writing the KIR graph are given which permit the description of the same behaviour in another time model.
After the IEEE ballot accepted the first draft language reference manual for VHDL-AMS (IEEE PAR 1076.1) in October 1997, we can now spend time and effort on applying the newly arising methodology to real-world problems outside the electronic domain. In automotive engineering we have system design problems dealing with hydraulic or mechanical components and their controlling units, for which we expect a major advantage from introducing unified modelling to all domains. With the Brite/EuRam project TOOLSYS (a joint effort of the automotive industry and tool makers to apply VHDL-AMS as a unified modelling language to mixed-domain applications) we prove its suitability as a unified modelling and interchange language for real-world systems and components. First experiments with hydraulic components reveal numerical problems on analog circuit simulators. None of the strategies available for these particularly hard problems has been included by the makers of electronic simulators. With VHDL-AMS, multi-domain modelling seems possible; now we need multi-domain simulation environments.
Moderators: H.-J. Wunderlich, University of Stuttgart, Germany,
M. Nicolaidis, TIMA, Grenoble, France
Built-in self-test (BIST) techniques modify functional hardware to give a data path the capability to test itself. The modification of data path registers into registers (BIST resources) that can generate pseudo-random test patterns and/or compress test responses, incurs an area overhead penalty. We show how scheduling and module assignment in high-level synthesis affect BIST resource requirements of a data path. A scheduling and module assignment procedure is presented that produces schedules which, when used to synthesize data paths, result in a significant reduction in BIST area overhead and hence total area.
This paper presents a high-level test synthesis algorithm for operation scheduling and data path allocation. Contrary to other works in which scheduling and allocation are performed independently, our approach integrates these two tasks by performing them simultaneously so that the effects of scheduling and allocation on testability are exploited more effectively. The approach is based on an algorithm which applies a sequence of semantics-preserving transformations to a design to generate an efficient RT level implementation from a VHDL behavioral specification. Experimental results show the advantages of the proposed algorithm.
This paper proposes a methodology for testing the configurable logic of RAM-based FPGAs taking into account the configurability of such flexible devices. The methodology is illustrated using the XILINX 4000 family. On this example of FPGA, we obtain only 8 basic Test Configurations to fully test the whole matrix of CLBs. In the proposed Test Configurations, all the CLBs have exactly the same configuration forming a set of one-dimensional iterative arrays. The iterative arrays present a C-testability property in such a way that the number of Test Configurations 8 is fixed and independent of the FPGA size.
This paper presents a novel technique for testing Field Programmable Gate Arrays (FPGAs), suitable for use in cases of frequent FPGA reuse and rapid dynamic modification of the implemented function.
Moderators: Y. Torroja, Polytechnical University of Madrid, Spain,
R. Sarmiento, University of Las Palmas de Gran Canaria, Spain
The design and implementation of an ATM Traffic Shaper (ATS) is described here. This IC was realised in a 0.35 µm CMOS technology. The main function of the ATS is the collection of low-bit-rate traffic streams to fill a higher-bit-rate pipe, in order to reduce the cost of ATM-based services, nowadays mainly driven by transmission cost. The circuit fits several ATM system configurations but will mainly be used at User-Network or Network-Network interfaces. The IC was designed with a top-down methodology using Verilog as the HDL. The chip is pad limited and is packaged in a 208-pin PQFP. The circuit complexity is 38 kgates and its working frequency is 32 MHz. A circuit prototype was built with FPGAs in order to validate the RTL description.
A tool for the synthesis of fuzzy controllers is presented in this paper. This tool takes as input the behavioral specification of a controller and generates its VHDL description according to a target architecture. The VHDL code can be synthesized by means of two implementation methodologies, ASIC and FPGA. The main advantages of using this approach are rapid prototyping, and the use of well-known commercial design environments like Synopsys, Mentor Graphics, or Cadence.
A novel neural chip, SAND (Simple Applicable Neural Device), is described. It is highly suitable for hardware triggers in particle physics. The chip is optimized for a high input data rate (50 MHz, 16-bit data) at very low cost. The performance of a single SAND chip is 200 MOPS, owing to four parallel 16-bit multipliers and 40-bit adders working in one clock cycle. The chip is able to implement feedforward neural networks with a maximum of 512 input neurons and three hidden layers. Kohonen feature maps and radial basis function networks may also be calculated. Four chips will be implemented on a PCI board for simulation and on a VME board for trigger and on- and off-line analysis.
Moderators: S.A. Huss, Darmstadt University of Technology, Germany,
H.-P. Amann, University of Neuchatel, Switzerland
Hardware-software co-synthesis of an embedded system requires mapping of its specifications into hardware and software modules such that its real-time and other constraints are met. Embedded system specifications are generally represented by acyclic task graphs. Many embedded system applications are characterized by aperiodic as well as periodic task graphs. Aperiodic task graphs can arrive for execution at any time and their resource requirements vary depending on how their constituent tasks and edges are allocated. Traditional approaches based on a fixed architecture coupled with slack stealing and/or on-line determination of how to serve aperiodic task graphs are not suitable for embedded systems with hard real-time constraints, since they cannot guarantee that such constraints would always be met. In this paper, we address the problem of concurrent co-synthesis of aperiodic and periodic specifications of embedded systems. We estimate the resource requirements of aperiodic task graphs and allocate execution slots on processing elements and communication links for executing them. Our approach guarantees that the deadlines of both aperiodic and periodic task graphs are always met. We have observed that simultaneous consideration of aperiodic task graphs while performing co-synthesis of periodic task graphs is vital for achieving superior results compared to the traditional slack stealing and dynamic scheduling approaches. To the best of our knowledge, this is the first co-synthesis algorithm which provides simultaneous support of periodic and aperiodic task graphs with hard real-time constraints. Application of the proposed algorithm to several examples from real-life telecom transport systems shows that up to 28% and 34% system cost savings are possible over co-synthesis algorithms which employ slack stealing and rate-monotonic scheduling, respectively.
The demands in terms of processing performance, communication bandwidth and real-time throughput of many multimedia applications are much higher than today's processing architectures can deliver. The PROPHID heterogeneous multiprocessor architecture template aims to bridge this gap. The template contains a general purpose processor connected to a central bus, as well as several high-performance application domain specific processors. A high-throughput communication network is used to meet the high bandwidth requirements between these processors. In this network multiple time-division-multiplexed data streams are transferred over several parallel physical channels. This paper presents a method for guaranteeing the throughput for hard-real-time streams in such a network. At compile time sufficient bandwidth is assigned to these streams. The assignment can be determined in polynomial time. Remaining bandwidth is assigned to soft-real-time streams at run time. We thus achieve efficient stream communication with guaranteed performance.
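The compile-time bandwidth reservation described above can be illustrated with a toy slot allocator. This is a sketch under our own simplifying assumptions (one channel, slot counts rounded up per stream), not the paper's polynomial-time assignment; all names are ours:

```python
import math

def assign_slots(requests, num_slots):
    """Reserve TDM slots on one physical channel so that every hard-real-time
    stream gets at least its requested bandwidth fraction; leftover slots are
    handed to soft-real-time traffic, mirroring the run-time assignment."""
    alloc, used = {}, 0
    for stream, fraction in requests.items():
        need = math.ceil(fraction * num_slots)   # guaranteed slots for this stream
        if used + need > num_slots:
            return None                          # infeasible on this channel
        alloc[stream] = need
        used += need
    alloc['soft-real-time'] = num_slots - used   # assigned dynamically at run time
    return alloc

# 10-slot TDM wheel: 'video' needs half the bandwidth, 'audio' a tenth.
print(assign_slots({'video': 0.5, 'audio': 0.1}, 10))
```

Because reserved slots are never taken back, the hard-real-time guarantee holds regardless of how the remaining slots are redistributed at run time.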
We present an approach to process scheduling based on an abstract graph representation which captures both dataflow and the flow of control. Target architectures consist of several processors, ASICs and shared buses. We have developed a heuristic which generates a schedule table such that the worst-case delay is minimized. Several experiments demonstrate the efficiency of the approach.
Moderators: E. Villar, University of Cantabria, Spain,
D. Sciuto, Politecnico di Milano, Italy
As the complexity of circuit designs grows, designers look toward formal verification to achieve better test coverage for validating complex designs. However, this approach is inherently computationally intensive, and hence, only small designs can be verified using this method. To achieve better performance, model abstraction is necessary. Model abstraction reduces the number of states necessary to perform formal verification while maintaining the functionality of the original model with respect to the specifications to be verified. As a result, model abstraction enables large designs to be formally verified. In this paper, we describe three methods for model abstraction based on semantics extraction from user models to improve the performance of formal verification tools.
This paper presents an analysis process targeted for the verification of fault secure systems during their design phase. This process deals with a realistic set of microdefects at the device level which are mapped into mutant and saboteur based VHDL fault models in the form of logical and/or performance degradation faults. Automatic defect injection and simulation are performed through a VHDL test bench. Extensive post processing analysis is performed to determine defect coverage, figure of merit for fault secureness, and MTTF.
Several hardware compilers on the market convert from so-called RT level VHDL subsets to logic level descriptions. Such models still need clock signals and the notion of physical time in order to be executable. In a stage of a top-down design starting from the algorithmic level, register transfers are considered, where the timing is not controlled by clock signals and where physical time is not yet relevant. We propose an executable VHDL subset for such register transfer models.
In this paper we evaluate parallel VHDL simulation based on conservative parallel discrete event simulation (conservative PDES) algorithms. We focus on a conservative simulation algorithm based on critical and external distances. This algorithm exploits the interconnection structure within the simulation model to increase parallelism. Further, a general method is introduced to automatically transform a VHDL model into a PDES model. Additionally, we suggest a method to further optimize parallel simulation performance. Finally, our first simulation results on an IBM parallel computer are presented. While these results are not sufficient for a general evaluation, they show that a good speedup can be obtained.
Moderators: E. Aas, Norwegian University of Science and Technology, Norway,
Z. Peng, Linköping University, Sweden
This paper presents a new method for testing the datapath of DSP cores based on self-test programs. During the test, random patterns are loaded into the core, exercise its different components, and are then read out of the core for observation, all under the control of the self-test programs. We propose a systematic approach to generating the self-test program based on two metrics: structured coverage and a testability metric. Experimental results show that the self-test program obtained by this approach can reach very high fault coverage in programmable core testing.
After write operations, BIST schemes for RAMs relying on signature analysis must compress the entire memory contents to update the reference signature. This paper introduces a new scheme for output data compression which avoids this overhead while retaining the benefits of signature analysis. The proposed technique is based on a new memory characteristic, derived as the modulo-2 sum of all addresses pointing to non-zero cells. This characteristic can be adjusted concurrently with write operations by simple EXOR operations on the initial characteristic and on the addresses affected by the change.
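The update rule described above is compact enough to sketch directly. The following Python model is our illustration only (class and field names are not from the paper): the characteristic is the XOR of all addresses holding non-zero data, and a single XOR keeps it current on every write that flips a cell's zero/non-zero status.

```python
class RamWithCharacteristic:
    """Toy RAM model carrying the address-based characteristic."""

    def __init__(self, size):
        self.mem = [0] * size
        self.characteristic = 0  # XOR of addresses of non-zero cells (all-zero RAM -> 0)

    def write(self, addr, value):
        old = self.mem[addr]
        # The address contributes to the characteristic iff its cell is non-zero,
        # so the characteristic changes only when that status flips; one XOR
        # updates it concurrently with the write, with no memory sweep.
        if (old != 0) != (value != 0):
            self.characteristic ^= addr
        self.mem[addr] = value

ram = RamWithCharacteristic(16)
ram.write(3, 0xAB)   # cell 3 becomes non-zero: characteristic ^= 3 -> 3
ram.write(5, 0x01)   # characteristic ^= 5 -> 6
ram.write(3, 0x00)   # cell 3 back to zero: characteristic ^= 3 -> 5
```

This is exactly the overhead saving claimed in the abstract: the reference value tracks writes incrementally instead of requiring re-compression of the whole memory.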
In this paper, a new compaction technique based on signature analysis is presented. Rather than comparing the final signature with the expected one after the test is completed, the binary output of the MISA is converted into an alternating binary signal by two simple cover circuits. An error is indicated whenever the alternation of the output signal is disturbed. This technique results in a higher fault coverage, improved fault diagnosis capability, a greater test autonomy in core-based designs, and early fault notification.
Moderators: I. Bolsens, IMEC, Belgium,
A. Nunez, University of Las Palmas de Gran Canaria, Spain
The inverse discrete cosine transformation (IDCT) is used in a variety of decoders (e.g. MPEG). On one hand, highly optimized algorithms that are characterized by an irregular structure and a minimum number of operations are known from software implementations. On the other hand, regular structured architectures are often used in hardware realizations. In this paper a comparison of regular and irregular structured IDCT algorithms for efficient hardware realization is presented. The irregular structured algorithms are discussed with main emphasis on assessment criteria for algorithm selection and high-level synthesis for hardware cost estimation.
A novel Smart Pixel Opto-VLSI architecture to implement a complete 2-D wavelet transform of real-time captured images is presented. The Smart Pixel architecture enables the realisation of a highly parallel, compact, low power device capable of real-time capture, compression, decompression and display of images suitable for Mobile Multimedia Communication applications.
This paper presents a VLSI architecture that implements the forward and inverse 2-D Discrete Wavelet Transform (FDWT/IDWT) to compress medical images for storage and retrieval. Lossless compression is usually required in the medical image field, but the word length it demands makes the area cost of the architectures reported in the literature prohibitive. Thus, there is a clear need for an architecture implementing lossless DWT-based compression of medical images. The datapath word length has been selected to satisfy the lossless accuracy criteria, leading to a high-speed implementation with small chip area. The result is a pipelined architecture that supports single-chip implementation in VLSI technology. The architecture has been simulated in VHDL and has a hardware utilization efficiency greater than 99%. It can compute the FDWT/IDWT at a rate of 3.5 512×512 12-bit images/s, corresponding to a clock speed of 33 MHz.
Moderators: R. Ernst, Technical University of Braunschweig, Germany,
P. van der Wolf, Philips Research Laboratories, The Netherlands
Fast evaluation of functional and timing properties is becoming a key factor in enabling cost-effective exploration of mixed hw/sw design alternatives for embedded applications. The goal of this paper is to present a modeling strategy to specify the functionality and timing properties of uncommitted mixed hw/sw systems. In addition, the paper proposes a simulation algorithm able to perform fast high-level simulation of the system by taking into account the initial hw vs. sw allocation of system modules. The related CAD simulation environment allows the designer to access profiling information which can be useful for remodeling the system to meet the functional/timing goals as well as for driving the subsequent hw vs. sw partitioning activity. Experimental data obtained by reengineering an industrial design are also included in the paper.
Currently, run-time operating systems are widely used to implement concurrent embedded applications. This run-time approach to multi-tasking and inter-process communication can introduce significant overhead to execution times and memory requirements -- prohibitive in many cases for embedded applications where processor and memory resources are scarce. In this paper, we present a static compilation approach that generates ordinary C programs at compile-time that can be readily retargeted to different processors, without including or generating a run-time scheduler. Our method is based on a novel Petri net theoretic approach.
This paper describes a method to estimate the implementation cost of the hardware part in a mixed hardware/software system, as well as the related performance. These estimations try to avoid the use of many implementation details in order to keep the complexity order of the process under control. The concepts of hardware sharing and parallelism are exploited to form a picture of the whole hardware cost associated with a given partition.
The objective of the methodology presented in this paper is to perform design space exploration at a high level of abstraction by applying high-level transformations. To realize a design loop which is closed and settled at the upper design levels, a high-level estimation step is integrated. In this paper, several estimation methodologies attached to different stages of the high-level synthesis process are examined with respect to their aptitude for controlling the transformational design space exploration process. Estimation heuristics for several design characteristics are derived and experimentally validated.
Moderators: S. Maginot, LEDA, France, W. Ecker, Siemens AG, Germany
Object-oriented techniques like inheritance promise great benefits for the specification and design of parallel hardware systems. The difficulties which arise from the use of inheritance in parallel hardware systems are analysed in this article. Similar difficulties are well known in concurrent object-oriented programming as the inheritance anomaly, but have not yet been investigated in object-oriented hardware design. A solution for dealing successfully with the anomaly is presented for a type-based object-oriented extension to VHDL. Its basic idea is to separate the synchronisation code (protocol specification) from the actual behaviour of a method. Method guards, which allow a method to execute if a guard expression evaluates to true, are proposed to model synchronisation constraints. It is shown how to implement a suitable re-schedule mechanism for methods as part of the synchronisation code to handle the case that a guard expression evaluates to false.
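The guard-and-reschedule mechanism described above can be mimicked in a few lines of ordinary sequential code. The sketch below is our illustration in Python; the paper targets an object-oriented VHDL extension, so every name here is hypothetical. A call whose guard fails is parked, and all parked calls are retried after every state-changing method:

```python
class GuardedObject:
    """Toy model of method guards with a re-schedule queue."""

    def __init__(self):
        self._pending = []

    def call(self, guard, body, *args):
        if guard():
            body(*args)
            self._retry()          # state changed: re-evaluate parked guards
        else:
            self._pending.append((guard, body, args))

    def _retry(self):
        parked, self._pending = self._pending, []
        for guard, body, args in parked:
            self.call(guard, body, *args)


class Buffer(GuardedObject):
    def __init__(self, capacity):
        super().__init__()
        self.items, self.capacity = [], capacity

    def put(self, x):              # guard: buffer not full
        self.call(lambda: len(self.items) < self.capacity,
                  self.items.append, x)

    def get(self, out):            # guard: buffer not empty
        self.call(lambda: len(self.items) > 0,
                  lambda: out.append(self.items.pop(0)))


buf, out = Buffer(1), []
buf.get(out)   # guard false (buffer empty): the call is parked
buf.put('a')   # executes; the parked get is then retried and succeeds
```

Note how the synchronisation code (the guard lambdas and the retry loop) is kept apart from the method bodies, which is the separation the abstract argues avoids the inheritance anomaly.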
When defining an object-oriented extension to VHDL, the necessary message passing is one of the most complex issues and has a large impact on the whole language. This paper identifies the requirements for message passing suited to modelling hardware and classifies different approaches. To allow abstract communication and reuse of protocols at the system level, a new, flexible message passing mechanism proposed for Objective VHDL will be introduced.
This paper presents a proposal for enabling VHDL to better support reuse and collaboration. The basic idea is to pass the adequate information on to partners working in an object-oriented hardware design environment. Appropriate subgoals for achieving this are:
- an optimal mix of necessary abstraction and sufficient precision,
- a formal description consisting of implementation constraints and knowledge requirements, and
- the non-formal concept of mutual consideration.
Several ideas are borrowed from
- the software domain: Java interfaces, type models, and the request for habitability, and
- the VHDL Annotation Language.
This is not an experience report, for the idea of adapting the mentioned software concepts to hardware design is new. It is rather a guided tour to some 'panorama views'. Although they may not seem related to each other at first glance, they turn out to support a common goal altogether: understanding and communicating VHDL-based designs better.
In this paper, we enrich VHDL with new specification constructs intended for hardware verification. Using our extensions, total correctness properties may now be stated whereas only partial correctness can be expressed using the standard VHDL assert statement. All relevant properties can now be specified in such a way that the designer does not need to use formalisms like temporal logics. As the specifications are independent from a certain formalism, there is no restriction to a certain hardware verification approach.
Moderators: T. Vierhaus, Technical University of Cottbus, Germany,
R. Segers, Philips Semiconductors, The Netherlands
A high-level synthesis approach is proposed for the design of semi-concurrently self-checking devices; attention is focussed on data path design. After identifying the reference architecture against which cost and performance should be evaluated, a simultaneous scheduling-and-allocation algorithm is presented, allowing resource sharing between the nominal and checking data paths. The algorithm guarantees that the required checking periodicity is satisfied while minimizing additional costs in terms of functional units. The risk of error aliasing due to resource sharing is analysed.
Design validation for embedded arrays remains a challenging problem in today's microprocessor design environment. At Somerset, validation of array designs relies on both formal verification and vector simulation. Although several methods for array design validation have been proposed and have had great success [6], [9], [10], [13], little evidence has been reported on the effectiveness of these methods with respect to the detection of design errors. In this paper, we propose a new way of measuring the effectiveness of different validation approaches based on automatic design error injection and simulation. This technique provides a systematic way to evaluate the quality of various validation approaches. Experimental results using different validation approaches on recent PowerPC microprocessor arrays will be reported.
Functional scan chains are scan chains whose scan paths pass through a circuit's functional logic and flip-flops. Establishing functional scan paths by test point insertion (TPI) has been shown to be an effective technique to reduce scan overhead. However, once the scan chain is allowed to go through functional logic, the traditional alternating test sequence is no longer enough to ensure the correctness of the scan chain. We identify the faults that affect the functional scan chain, and show a methodology for finding tests for these faults. Our results leave the number of undetected faults at only 0.006% of the total number of faults, or 0.022% of the faults affecting the scan chain.
Co-ordinators: Carlo Guardiani, SGS-Thomson, Italy
Wolfgang Nebel, Oldenburg University and OFFIS, Germany
Moderator: Alberto Sangiovanni-Vincentelli, University of California at Berkeley, USA
Speakers: Grant Martin, Cadence, USA
Mike Muller, ARM, UK
Bart De Loore, Philips Semiconductors, The Netherlands
Panelists: Doug Fairbairn, VSI Alliance, USA
Pietro Erratico, SGS-Thomson, Italy
Faysal Soheil, Synopsys, USA
System-chip design, which today starts at the RT level, has hit a plateau of productivity and re-use which can be characterised as a 'Silicon Ceiling'. Breaking through this plateau and moving to higher and more effective re-use of IP blocks and system-chip architectures demands a move to a new methodology: one in which the best aspects of today's RTL-based methods are retained, but complemented by new levels of abstraction and the commensurate tools to allow designers to exploit the productivity inherent in these higher levels of abstraction. In addition, the need to quickly develop design derivatives, and to differentiate products based on standards, requires an increasing use of software IP. This paper will describe today's situation, the requirements to move beyond it, and sketch the outlines of near-term possible and practical solutions.
In the era of IP reuse, what is going to make the difference between system-on-a-chip providers? To answer this question, it suffices to depict the competencies required to be a successful silicon provider; among these we distinguish technical and organizational competencies. To be successful, a system-on-a-chip provider will have to be excellent in:
- selecting the right product,
- implementing the product with the right mix of design time, cost and dissipation, and
- delivery performance and customer support.
Moderators: J. Heaton, ICL, UK, R. Seepold, FZI Karlsruhe, Germany
In this paper a number of reuse approaches for circuit design are analysed. Based on this analysis an algebraic core model for discussion of a general reuse strategy is proposed. Using this model, the aim is to classify different reuse approaches for circuit design, to compare the applied terms and definitions, and to formulate classes of typical reuse tasks. In a practical application with focus on retrieval and parameterisation techniques, this model is on the way to being applied to DSP design issues.
A new set of tools for teamwork, organization units, workspace and build management of VHDL-based reusable components, organized in libraries and accessible through a heterogeneous and distributed environment, is presented. These tools support the collaborative and distributed development of systems-on-a-chip reusing VHDL components available through intranets and the Internet. They should be used as complementary support to the design tools (simulation, synthesis, etc.) already available on the market to enhance productivity, facilitate maintenance, and improve reliability, efficiency and interoperability, finally capitalizing on the investment in IP library components.
This paper presents a hierarchical, object-oriented model as a basis for reuse of components in the design process of digital systems. The model forms a uniform knowledge base which consists of formal descriptions about functional, qualitative, and quantitative properties of systems and components. It supports the synthesis of systems from the described components. Starting at a system specification different models and descriptions are generated for simulation, prototyping, analysis and high level synthesis.
Moderators: E. Barke, University of Hannover, Germany,
I. Rugen-Herzig, Temic Telefunken Microelectronic GmbH, Germany
We describe the methodology used for the design of the CMOS processor chipset used in the IBM S/390 Parallel Enterprise Server - Generation 3. The majority of the logic is implemented by standard cell elements placed and routed flat, using timing-driven techniques. The result is a globally optimized solution without artificial floorplan boundaries. We will show that the density in terms of transistors per mm² is comparable to the most advanced custom designs and that the impact of interconnect delay on the cycle time is very small. Compared to custom design, this approach offers excellent turnaround time and considerably reduces overall effort.
The state-of-the-art methods for the placement of large-scale standard cell designs work in a top-down fashion. After some iterations, where more and more detailed placement information is obtained, a final procedure for finding a legal placement is needed. This paper presents a new method for this final task, based on efficient algorithms from combinatorial optimization.
We describe the timing analysis and optimization methodology used for the chipset inside the IBM S/390 Parallel Enterprise Server - Generation 3. After an introduction to the concepts of static timing analysis, we describe the timing-modeling for the gates and interconnects, explain the optimization schemes and present obtained results.
Sequential routing algorithms using maze running are very suitable for general over-the-cell routing but often suffer from the high memory or runtime requirements of the underlying path-search routine. A new algorithm for this subproblem is presented that computes shortest paths in a rectangular grid with respect to Euclidean distance. It achieves performance and memory requirements similar to fast line-search algorithms while still being optimal. An additional application to the computation of minimal rip-up sets is also presented. Computational results are shown for a detailed router based on these algorithms that is used in the design of high-performance CMOS processors at IBM.
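As a baseline for comparison, the path-search subproblem such routers rely on can be sketched as Dijkstra's algorithm on a unit-cost rectangular grid. This is not the paper's memory-optimized algorithm (which approaches line-search costs while staying optimal); it is a minimal, generic maze-running search, with cell coordinates and a blocked-cell set as illustrative assumptions:

```python
import heapq

def grid_shortest_path(width, height, blocked, src, dst):
    # Generic maze-running path search: Dijkstra on a rectangular grid where
    # every step between adjacent free cells costs 1. 'blocked' is a set of
    # unusable (x, y) cells. Returns the path length, or None if unreachable.
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if (x, y) == dst:
            return d
        if d > dist.get((x, y), float('inf')):
            continue  # stale heap entry
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in blocked:
                nd = d + 1
                if nd < dist.get((nx, ny), float('inf')):
                    dist[(nx, ny)] = nd
                    heapq.heappush(heap, (nd, (nx, ny)))
    return None
```

A vertical wall of blocked cells forces the search to detour, exactly the situation where the memory footprint of naive maze running becomes the bottleneck the paper addresses.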
Co-ordinator: Ivo Bolsens, IMEC, Belgium
Moderator: Nadir Bagherzadeh, University of California at Irvine, USA
Speakers: W. Shields Neely, National Semiconductor, USA
Jan Rabaey, University of California at Berkeley, USA
Ian Page, University of Oxford, UK
The electronic systems of the future will be implemented as multi-million gate 'systems on a chip'. These systems will require an enormous investment in design and manufacturing; yet the pace of technological change (e.g., new algorithm development, new processor and memory designs) and ever-changing requirements put them in danger of obsolescence soon after they are created, since applications always want to take advantage of new technical advances and must meet changed requirements. What is needed are single-chip systems designed to be adaptable to a family of applications. The emerging technology of configurable logic offers the promise of large-scale silicon systems that are adaptive after manufacture, with little or no sacrifice in execution efficiency compared to hard-wired systems.
As the 'system-on-a-chip' concept is rapidly becoming a reality, time-to-market and product complexity push the reuse of complex macromodules. Circuits combining a variety of macromodules (microprocessors, DSPs, programmable logic and embedded memories) are being reported by a number of companies [2]. Most of these systems target the embedded market, where speed, area, and power requirements are paramount, and a balance between hardware and software implementation is needed. Reconfigurable computing devices have recently emerged as one of the major alternative implementation approaches, addressing most of the requirements outlined above.
This paper describes a vision in which future systems consisting of novel hardware and software components are designed and implemented by a single type of professional engineer. That professional has more in common with today's programmer than with a hardware designer, although both of these existing bodies of professionals have a strong contribution to make to understanding, defining and bringing about this transformation in product creation.
Moderators: Peter Schwarz, Fraunhofer EAS Dresden, Germany,
H. Fleurkens, Philips Research Laboratories, The Netherlands
Despite its importance, a rigorous theoretical foundation for performing timing analysis has been lacking so far. As a result, we have initiated a research project that aims to provide such a foundation for functional timing analysis. As part of this work we have developed an abstract automaton-based delay model that accounts for the various analog factors affecting delay, such as signal slopes and near-simultaneous switching, while at the same time accounting for circuit functionality. This paper presents this delay model.
Within this paper the gate-level power-simulation tool GliPS (Glitch Power Simulator) is presented, which gives excellent accuracy (in the range of transistor-level simulators) at high performance. The high accuracy is achieved by putting emphasis on delay and power modelling. The impact of these modelling factors on accuracy and performance is demonstrated by comparing GliPS to other tools at circuit level and to TPS, a simple toggle-count-based power simulator, at gate level.
This paper presents the optimistic synchronization mechanism Predictive Time Warp (PTW), based on the Time Warp implementation of the Virtual Time paradigm, for use in the simulation of electronic systems and high-level system simulation. In contrast to most existing approaches extending and improving classical Time Warp, the aim of this development was to reduce the rollback frequency of optimistic logical processes without imposing waiting periods. Part of PTW is the introduction of forecast events, which predict a certain period into the future and thus reduce the rollback probability. The benefit of the PTW synchronization approach is shown using the example of a distributed logic simulation.
Moderators: F. Kurdahi, University of California, Irvine, USA,
A. Jerraya, TIMA, Grenoble, France
We describe a Codesign approach based on a parallel and scalable ASIP architecture, which is suitable for the implementation of reactive systems. The specification language of our approach is extended statecharts. Our ASIP architecture is scalable with respect to the number of processing elements as well as parameters such as bus widths and register file sizes. Instruction sets are generated from a library of components covering a spectrum of space/time trade-off alternatives. Our approach features a heuristic static timing analysis step for statecharts. An industrial example requiring the real-time control of several stepper motors illustrates the benefits of our approach.
Code generation methods for DSP applications are hampered by the combination of tight timing constraints, imposed by the performance requirements of DSP algorithms, and resource constraints, imposed by a hardware architecture. In this paper, we present a method for register binding and instruction scheduling based on the exploitation and analysis of resource and timing constraints. The analysis identifies sequencing constraints between operations in addition to the precedence constraints. Without the explicit modeling of these sequencing constraints, a scheduler is often not capable of finding a solution that satisfies the timing, resource and register constraints. The presented approach results in an efficient method of obtaining high-quality instruction schedules with low register requirements.
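For context, the kind of baseline scheduler that such constraint analysis improves upon is resource-constrained list scheduling. The sketch below is a generic version under simplifying assumptions (single-cycle operations, one functional-unit type, an acyclic dependence graph), not the paper's method:

```python
def list_schedule(ops, deps, n_units):
    # Generic resource-constrained list scheduling: each cycle, schedule up to
    # n_units operations whose predecessors have all completed. 'deps' maps an
    # op to the set of ops it depends on. Assumes an acyclic dependence graph
    # and single-cycle latency. Returns {op: start_cycle}.
    done, schedule, remaining = set(), {}, set(ops)
    cycle = 0
    while remaining:
        ready = sorted(o for o in remaining if deps.get(o, set()) <= done)
        for o in ready[:n_units]:          # resource constraint: n_units per cycle
            schedule[o] = cycle
            remaining.discard(o)
        done |= {o for o in schedule}      # results become visible next cycle
        cycle += 1
    return schedule
```

Sequencing constraints of the kind the paper derives would show up here as extra edges in `deps`, pruning schedules that violate register or timing limits before this greedy loop ever considers them.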
In this paper, we present an approach to synthesizing multiple behavior modules. Given n DFGs to be implemented, previous methods scheduled each of them sequentially and implemented them as a single module. Though appropriate for sharing the functional units, this method ignores two aspects: 1) different interconnection patterns among the DFGs can increase the interconnection area and the delay of the critical path; 2) sequential scheduling of the DFGs makes it difficult to consider the effects on DFGs not yet scheduled. We show an efficient way to solve these problems using a selective grouping method and extensions of traditional scheduling methods. The experiments reveal that the proposed method reduces interconnection area and meets timing constraints better than previous methods.
We develop a 0-1 non-linear programming (NLP) model for combined temporal partitioning and high-level synthesis from behavioral specifications destined to be implemented on reconfigurable processors. We present tight linearizations of the NLP model. We present effective variable selection heuristics for a branch and bound solution of the derived linear programming model. We show how tight linearizations combined with good variable selection techniques during branch and bound yield optimal results in relatively short execution times.
Moderators: M.D.F. Wong, University of Texas at Austin, USA,
F.M. Johannes, Technical University of Munich, Germany
This paper shows how algorithmic techniques and parallel processing can speed up general connectivity computation. A new algorithm, called the Concurrent Group Search Algorithm (CGSA), is proposed that divides the N(N-1)/2 vertex pairs into N-1 groups. Within each group, the general connectivities of all pairs can be calculated concurrently. Our experimental results show that this technique can achieve a speedup of 12 times for one circuit. In addition, group computations are parallelized on a 16-node IBM SP2, with a speedup of 14 times over the serial counterpart observed. Combining the two approaches can result in a total speedup of up to 170 times, reducing CPU time from over 200 hours to 1.2 hours for one circuit. Our new model is better than those without clustering because it characterizes the connection graph more accurately, is faster to compute and produces better results. The best performance improvements are 43% for one circuit and 49% for another.
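Dividing all N(N-1)/2 vertex pairs into N-1 groups of mutually disjoint pairs is, for even N, a 1-factorization of the complete graph K_N; the classical "circle method" constructs exactly such a grouping. Whether CGSA uses this particular construction is an assumption; the sketch only illustrates why the grouping exists and why pairs within a group can be processed concurrently:

```python
def round_robin_groups(n):
    # Circle method: for even n, produce n-1 groups ("rounds"), each containing
    # n/2 vertex-disjoint pairs; together the groups cover all n(n-1)/2 pairs.
    # Vertex 0 stays fixed while the remaining vertices rotate.
    assert n % 2 == 0, "this construction assumes an even number of vertices"
    vertices = list(range(n))
    rounds = []
    for _ in range(n - 1):
        rounds.append([(vertices[i], vertices[n - 1 - i]) for i in range(n // 2)])
        # rotate everything except vertices[0]
        vertices = [vertices[0]] + [vertices[-1]] + vertices[1:-1]
    return rounds
```

Because the pairs inside one round share no vertices, a per-pair connectivity computation touches disjoint terminals and can run in parallel without coordination.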
Given a weighted graph and a family of k disjoint groups of nodes, the Group Steiner Problem asks for a minimum-cost routing tree that contains at least one node from each group. We give polynomial-time O(k^ε)-approximation algorithms for arbitrarily small values of ε > 0, improving on the previously known O(k^{1/2}) approximation. Our techniques also solve the graph Steiner arborescence problem with an O(k) approximation bound. These results are directly applicable to a practical problem in VLSI layout, namely the routing of nets with multi-port terminals. Our Java implementation is available on the Web.
We present an interactive two layer router integrated in an analog IC design environment used in an SDL (schematic driven layout) design flow. Special features are its customizability, the treatment of arbitrary polygons and an advanced handling of source/target polygons in order to avoid net internal design rule violations during connection phase. A global routing algorithm is used to split the route into separate parts each routable in a single layer. After via placement a specialized maze router performs the advanced single layer routes in 90 or 45 degree mode. The resulting route can be modified by interactive via movement and rerouting of obsolete partial routes.
Organizers: Wolfgang Rosenstiel, University of Tübingen, Germany, Gerry Musgrave, Brunel University, UK
Moderator: Gerry Musgrave, Brunel University, UK
Panelists: Dominique Borrione, TIMA-UJF, France
Antun Domic, Synopsys, USA
Ramayya Kumar, Verysys, Germany
Alan Page, Abstract Design Automation, UK
Michael Payer, Siemens, Germany
Formal verification has been the province of academic research for many years. More recently, tools have become available from vendors to tackle some aspects of the design verification problem. Considerable learning has been required to understand how this technique can fit into a real industrial design flow. The panel, consisting of academics, vendors and users, will endeavour to clarify what these tools can do, what their potential will be, and the experiences to date in helping validate today's complex designs.
Moderators: J. Forrest, UMIST, Manchester, UK
M. Pfaff, Johannes Kepler University Linz, Austria
Common approaches to hardware implementation of networking components start at the VHDL level and are based on the creation of regression test benches to perform simulative validation of functionality. The time needed to develop test benches has proven to be a significant bottleneck with respect to time-to-market requirements. In this paper, we describe the coupling of a telecommunication network simulator with a VHDL simulator and a hardware test board. This co-verification approach enables the designer of hardware for networking components to verify the functional correctness of a device under test against the corresponding algorithmic description and to perform functional chip verification by reusing test benches from a higher level of abstraction.
Digital systems, especially those for mobile applications, are sensitive to power consumption, chip size and cost. Therefore they are realized using fixed-point architectures, either dedicated HW or programmable DSPs. On the other hand, system design starts from a floating-point description. These requirements have been the motivation for FRIDGE, a design environment for the specification, evaluation and implementation of fixed-point systems. FRIDGE offers a seamless design flow from a floating-point description to a fixed-point implementation. Within this paper we focus on two core capabilities of FRIDGE: (1) the concept of an interactive, automated transformation of floating-point programs written in ANSI C into fixed-point specifications, based on an interpolative approach. The design time reductions that can be achieved make FRIDGE a key component for efficient HW/SW co-design. (2) a fast fixed-point simulation that performs comprehensive compile-time analyses, reducing simulation time by one order of magnitude compared to existing approaches.
One of the main tasks within the high-level synthesis (HLS) process is the verification problem: proving automatically the correctness of the synthesis results. Currently, the results are usually checked by simulation, so both the behavioral specification and the HLS results have to be simulated with the same set of test vectors. Due to HLS and the inherent changes in the cycle-by-cycle behaviour, the synthesis results require an adaptation of the initial test vector set. This reduces the advantage gained by using the automated HLS process. To decrease this simulation effort, this paper presents a new method that enables the use of the same simulation vectors at both abstraction levels together with an automated comparison of the simulation results.
Moderators: P. Marwedel, University of Dortmund, Germany,
A. Timmer, Philips Research Laboratories, The Netherlands
In this paper, we address the problem of layout-driven scheduling and binding, as these steps have a direct impact on the final performance of the design. The importance of effective and efficient accounting of layout effects is well established in High-Level Synthesis (HLS), since it allows more efficient exploration of the design space and the generation of solutions with predictable metrics. This feature is highly desirable in order to avoid unnecessary iterations through the design process. By producing not only an RTL netlist but also an approximate physical topology of the implementation at the chip level, we ensure that the solution will perform at the predicted metric once implemented, thus avoiding unnecessary delays in the design process.
This paper presents a new approach to cross-level hierarchical high-level synthesis. A methodology is presented that supports the efficient synthesis of hierarchically specified systems while preserving the hierarchical structure. After synthesis of each subsystem, the determined component schedule and the synthesized RT-structure are added to its algorithmic specification. This provides an automatic selection of optimized complex components. Furthermore, the component schedule enables the sharing of unused subcomponents across different hierarchical levels of the design.
Scheduling and binding are two major tasks in architectural synthesis from behavioral descriptions. The information about the mutually exclusive pairs of operations is very useful in reducing both the total delay of the schedule and the resource usage in the final circuit implementation. In this paper, we present an algorithm to identify the largest set of mutually exclusive operation pairs in behavioral descriptions. Our algorithm uses dataflow analysis on a tabular model of system functionality, and is shown to work better than the existing methods for identifying mutually exclusive operations.
Moderators: R. Peset Llopis, Philips Research Laboratories, The Netherlands,
B. Schürmann, University of Kaiserslautern, Germany
This paper presents a new performance-driven MCM router, named MRC, with special consideration of crosstalk reduction. MRC completes an initial routing with an adequate performance trade-off including wire length, vias, number of layers, timing and crosstalk. A crosstalk reduction algorithm then makes the routing solution crosstalk-free without significantly degrading the other routing metrics. Efficient handling of both timing and crosstalk problems is thus the distinguishing feature of MRC. MRC has been implemented and tested on MCM benchmarks, and the experimental results are very promising.
Interconnect tuning is an increasingly critical degree of freedom in the physical design of high-performance VLSI systems. By interconnect tuning, we refer to the selection of line thicknesses, widths and spacings in multi-layer interconnect to simultaneously optimize signal distribution, signal performance, signal integrity, and interconnect manufacturability and reliability. This is a key activity in most leading-edge design projects, but has received little attention in the literature. Our work provides the first technology-specific studies of interconnect tuning in the literature. We center on global wiring layers and interconnect tuning issues related to bus routing, repeater insertion, and choice of shielding/spacing rules for signal integrity and performance. We address four basic questions. (1) How should width and spacing be allocated to maximize performance for a given line pitch? (2) For a given line pitch, what criteria affect the optimal interval at which repeaters should be inserted into global interconnects? (3) Under what circumstances are shield wires the optimum technique for improving interconnect performance? (4) In global interconnect with repeaters, what other interconnect tuning is possible? Our study of question (4) demonstrates a new approach of offsetting repeater placements that can reduce worst-case cross-chip delays by over 30% in current technologies.
An interconnect joining a source and a sink is divided into fixed-length uniform-width wire segments, and some adjacent segments have buffers in between. The problem we consider is to simultaneously size the buffers and the segments so that the Elmore delay from the source to the sink is minimized. Previously, no polynomial-time algorithm for the problem had been reported in the literature. In this paper, we present a polynomial-time algorithm, SBWS, for the simultaneous buffer and wire sizing problem. SBWS is an iterative algorithm with guaranteed convergence to the optimal solution. It runs in quadratic time and uses constant memory for computation. Experimental results show that SBWS is extremely efficient in practice; for example, for an interconnect of 10000 segments and buffers, the CPU time is only 0.127 seconds.
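The Elmore delay that SBWS minimizes has a simple closed form for a source-to-sink RC ladder: each node's capacitance is weighted by the total resistance on the path from the source to that node. A minimal sketch with per-segment lumped resistance and capacitance (the segment values are illustrative, not taken from the paper):

```python
def elmore_delay(resistances, capacitances):
    # Elmore delay of an RC ladder: sum over nodes of (upstream resistance
    # from the source to the node) * (capacitance at the node).
    delay, r_path = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        r_path += r           # total resistance from the source to this node
        delay += r_path * c
    return delay
```

Sizing a segment wider lowers its resistance but raises its capacitance, which is exactly the trade-off the simultaneous buffer and wire sizing problem optimizes over.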
Co-ordinators: Wolfgang Rosenstiel, University of Tübingen, Germany, Joachim Kunkel, Synopsys, USA
Moderator: Joachim Kunkel, Synopsys, USA
Panelists: Misha Burich, Cadence/Alta, USA
Raul Camposano, Synopsys, USA
Mark Genoe, Alcatel, Belgium
Lev Markov, Mentor Graphics, USA
Steve Schulz, Texas Instruments, USA
This panel discusses the requirements for the next generation of system design tools and presents the latest developments from the industrial leaders. Attendees are representatives from system houses as well as from the electronic system design automation companies. The panel is chaired by Joachim Kunkel, Director Engineering for System Level Design Tools at Synopsys. The electronic system design companies are represented by Misha Burich, VP Engineering of the Alta Group of Cadence, Raul Camposano, Senior VP and General Manager of the Design Tools Group of Synopsys, and Lev Markov, Chief Scientist for system-level co-design at Mentor Graphics. Mark Genoe, Chairman of the System Level Design and Verification Working Group of the Virtual Socket Interface Alliance, will discuss the standardization process with respect to system-level design. Steve Schulz, Texas Instruments, the initiator of the System Level Design Language initiative, will present the status of this recent development. In addition, system house representatives will discuss future requirements for system-level design tools.
Moderators: M. Sachdev, Philips Research Laboratories, The Netherlands,
B. Straube, FhG IIS/EAS Dresden, Germany
The defective IDDQ in deep-submicron fully complementary MOS circuits with shorts is estimated. Both high-performance and low-power scenarios are considered. Technology scaling, including geometry reductions of the transistor dimensions, power supply voltage reduction, carrier mobility degradation and velocity saturation, is modeled. By means of the characterization of the saturation current of a simple MOSFET, a lower bound of defective IDDQ consumption versus Leff is found. The quiescent current consumption lower bound is evaluated for intragate shorts and for intergate shorts affecting at least one logic node. The methodology is used to estimate the IDDQ distribution, for a given input vector, of defective circuits. This IDDQ estimation allows the determination of the threshold value to be used for the faulty/fault-free circuit classification.
This paper describes a new Digitally Controlled On-Chip I_{DDQ} Measurement Unit (DOCIMU), which provides reliable precision and relatively fast measurements, even with a high capacitive load, while the Device Under Test (DUT) is unaffected. The maximal resolution is 50nA and the accurate measurement range is 1mA. Unlike other I_{DDQ} monitors, the DOCIMU copes with external interference, as it needs no analogue pin to set the I_{DDQ} limit, and the noise at V_{DD} is eliminated via a special S/H feature. The DOCIMU is also a testable I_{DDQ} monitor, which is another unique feature.
Most memory test algorithms are optimized tests for a particular memory technology and a particular set of fault models, under the assumption that the memory is bit-oriented; i.e., read and write operations affect only a single bit in the memory. Traditionally, word-oriented memories have been tested by repeated application of a test for bit-oriented memories whereby a different data background (which depends on the used intra-word fault model) is used during each iteration. This results in time inefficiencies and limited fault coverage. A new approach for testing word-oriented memories is presented, distinguishing between inter-word and intra-word faults and allowing for a systematic way of converting tests for bit-oriented memories to tests for word-oriented memories. The conversion consists of concatenating the bit-oriented test for inter-word faults with a test for intra-word faults. This approach results in more efficient tests with complete coverage of the targeted faults. Because most memories have an external data path which is wider than one bit, word-oriented memory tests are very important.
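For illustration, the data backgrounds mentioned above can be generated systematically: for a word width w that is a power of two, the classical set consists of log2(w)+1 patterns (with their complements applied during the test itself). This sketch shows only that background generation, not the paper's bit-oriented-to-word-oriented conversion procedure:

```python
import math

def data_backgrounds(width):
    # Classical data backgrounds for a word-oriented memory of power-of-two
    # word width: the all-zero word plus log2(width) alternating patterns with
    # doubling period, e.g. width 4 -> 0000, 0101, 0011.
    k = int(math.log2(width))
    backgrounds = ['0' * width]
    for i in range(k):
        period = 2 ** i
        backgrounds.append(''.join('1' if (j // period) % 2 else '0'
                                   for j in range(width)))
    return backgrounds
```

Each background (together with its complement) exercises a different subset of intra-word cell pairs with opposing values, which is what gives the set its coverage of intra-word coupling faults.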
Moderators: J. Bausells, CNM, Barcelona, Spain,
M. Glesner, Technical University of Darmstadt, Germany
For MEMS devices, modern technologies are used to integrate very complex components and subsystems closely together. Due to mixed-domain problems and the interactions between the closely coupled system components, the design is a sophisticated process. The interactions between the MEMS components have to be analysed by system simulation already at an early design stage. In this paper a modeling approach is introduced that enables the incorporation of mechanical microsystem components into system simulation using network and system simulators such as SABER. The approach is based on multi-terminal models of basic mechanical elements and their composition into more complex microsystems. First results for a micromechanical resonator are presented.
Two different field solver tools have been developed in order to facilitate fast thermal and electrostatic simulation of microsystem elements. The mS-THERMANAL program is capable of fast steady-state and dynamic simulation of suspended multilayered microsystem structures. The 2D-SUNRED program is the first version of a general field solver based on an original method, successive node reduction. SUNRED offers a very fast and accurate substitute for FEM programs for the solution of the Poisson equation. Steady-state and dynamic simulation examples demonstrate the usability of the novel tool.
In this work a Computer-Aided Testing (CAT) tool is proposed that provides a systematic way of dealing with testing problems in emerging microsystems. Experiments with case studies illustrate the techniques and tools embedded in the CAT environment. Some of the open problems that shall be addressed in the near future as an extension to this work are also discussed.
Moderators: F.M. Johannes, Technical University of Munich, Germany,
J. Koehl, IBM Deutschland Entwicklung GmbH, Germany
This paper introduces SyMPVL, an algorithm for the approximation of the symmetric multi-port transfer function of an RLC circuit. The algorithm employs a symmetric block-Lanczos algorithm to reduce the original circuit matrices to a pair of typically much smaller, banded, symmetric matrices. These matrices determine a matrix-Padé approximation of the multi-port transfer function, and can serve as a reduced-order model of the original circuit. They can be "stamped" directly into the Jacobian matrix of a SPICE-type circuit simulator, or can be used to synthesize an equivalent smaller circuit. We also prove stability and passivity of the reduced-order models in the RL, RC, and LC special cases, and report numerical results for SyMPVL applied to example circuits.
As VLSI circuit speeds have increased, the need for accurate three-dimensional interconnect models has become essential to accurate chip and system design. In this paper, we describe an integral equation approach to modeling the impedance of interconnect structures accounting for both the charge accumulation on the surface of conductors and the current traveling along conductors. Unlike previous methods, our approach is based on a modified nodal analysis formulation and can be used directly to generate guaranteed passive low order interconnect models for efficient inclusion in a standard circuit simulator.
In this paper, an optimization scheme is proposed for interconnect design with wire width and series resistance as design variables. Due to the distributed nature of interconnects, the poles of such systems are transcendental and infinite in number. First, a two-pole approximation is used to capture the system behavior: lower-order moments are employed to obtain two approximate dominant poles. Then, two parameters, the damping ratio and the natural undamped frequency, are expressed as functions of the two dominant poles. Since the output response is characterized by these two parameters, they are used to define the objective function and constraints, which form a constrained multivariable nonlinear optimization problem. The optimization problem is then solved using the gradient projection method. One advantage of our approach is the ability to explicitly control the maximum overshoot at the observation points. Two numerical examples are given.
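The mapping from the two approximate dominant poles to the damping ratio and natural undamped frequency follows from matching the standard second-order denominator s^2 + 2*zeta*wn*s + wn^2, whose roots satisfy p1*p2 = wn^2 and p1 + p2 = -2*zeta*wn. A minimal sketch for two stable real poles (the complex-conjugate case works analogously on the coefficient sums; the exact formulas used in the paper are not restated there, so this is the textbook relation):

```python
import math

def two_pole_params(p1, p2):
    # Map two stable real poles p1, p2 (< 0) of a second-order model
    # s^2 + 2*zeta*wn*s + wn^2 to (damping ratio, natural undamped frequency):
    #   wn = sqrt(p1 * p2),   zeta = -(p1 + p2) / (2 * wn)
    wn = math.sqrt(p1 * p2)
    zeta = -(p1 + p2) / (2.0 * wn)
    return zeta, wn
```

zeta > 1 corresponds to an overdamped (overshoot-free) response, which is how constraining these two parameters lets the optimizer control the maximum overshoot directly.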
Moderators: M. Servit, Czech Technical University, Czech Republic,
R. Peset Llopis, Philips Research Laboratories, The Netherlands
This paper proposes a vision for a new research domain emerging at the interface between design and manufacturing of VLSI circuits. The key objective of this domain is the minimization of the mismatch between design and manufacturing, which is growing rapidly with the increase in complexity of VLSI designs and IC technologies. This broad objective is partitioned into a number of specific tasks. One of the most important tasks is the extraction of VLSI design attributes that may be relevant from a manufacturing efficiency standpoint. The second task is yield analysis, performed to detect process and design attributes responsible for inadequate yield. This paper postulates both an overall change in the design-manufacturing interface and a methodology to address the growing design-manufacturing mismatch. Attributes of a number of tools needed for this purpose are discussed as well.
This paper illustrates, via examples, problems at the design-manufacturing interface that exist in the IC industry today, and the ability of the YAN/PODEMA framework [1] to solve these problems. The need for further development of the framework is also emphasized.
Traditional VLSI design objectives are to minimize time-to-first-silicon while maximizing performance. Such objectives lead to designs which are not optimal from a manufacturability perspective. The objective of this paper is to illustrate the above claim by performing a performance/manufacturability tradeoff analysis. The basis for such an analysis, in which the relationship between a product's clock frequency and wafer productivity is modeled, is described in detail. Newly applied yield models are discussed as well.
Moderators: C. Landrault, LIRMM, France,
D. Medina, Italtel, Italy
A new approach for sequential circuit test generation is proposed that combines software testing based techniques at the high level with test enhancement techniques at the gate level. Several sequences are derived to ensure 100% coverage of all statements in a high-level VHDL description, or to maximize coverage of paths. The sequences are then enhanced at the gate level to maximize coverage of single stuck-at faults. High fault coverages have been achieved very quickly on several benchmark circuits using this approach.
We extend the subsequence removal technique to provide significantly higher static compaction for sequential circuits. We show that state relaxation techniques can be used to identify more or larger cycles in a test set. State relaxation creates more opportunities for subsequence removal and hence, results in better compaction. Relaxation of a state is possible since not all memory elements in a finite state machine have to be specified for a state transition. The proposed technique has several advantages: (1) test sets that could not be compacted by existing subsequence removal techniques can now be compacted, (2) the size of cycles in a test set can be significantly increased by state relaxation and removal of the larger sized cycles leads to better compaction, (3) only two fault simulation passes are required as compared to trial and re-trial methods that require multiple fault simulation passes, and (4) significantly higher compaction is achieved in short execution times as compared to known subsequence removal methods. Experiments on ISCAS89 sequential benchmark circuits and several synthesized circuits show that the proposed technique consistently results in significantly higher compaction in short execution times.
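The core subsequence-removal step, dropping the vectors between two occurrences of the same state, can be sketched on fully specified states. This toy version omits both state relaxation (which unspecifies memory elements so that more states match, creating more and larger removable cycles) and the fault-simulation pass that must confirm no detections are lost:

```python
def remove_cycles(trace):
    # trace: list of (state, vector) pairs, where 'vector' is the test vector
    # applied while the machine is in 'state'. If a state reappears, the
    # vectors in between drive the machine around a cycle and are candidates
    # for removal (fault simulation must still confirm coverage afterwards).
    out, index = [], {}
    for state, vector in trace:
        if state in index:
            del out[index[state]:]                           # drop the cycle
            index = {s: i for i, (s, _) in enumerate(out)}   # rebuild indices
        index[state] = len(out)
        out.append((state, vector))
    return out
```

Relaxing states would amount to matching on a subset of the state bits here instead of full equality, which is why it exposes cycles that exact matching misses.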
We propose several compaction procedures for synchronous sequential circuits based on test vector restoration. Under a vector restoration procedure, all or most of the test vectors are first omitted from the test sequence. Test vectors are then restored one at a time or in subsequences only as necessary to restore the fault coverage of the original sequence. Techniques to speed up the restoration process are investigated. These include limiting the test vectors initially omitted from the test sequence, considering several faults in parallel during restoration, and using a parallel fault simulator.
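A heavily simplified sketch of the restoration loop, under our own assumptions: `coverage` is a caller-supplied stand-in for a fault simulator, and vectors are restored greedily one at a time only while the original coverage has not yet been reached.

```python
def restore(vectors, coverage):
    """Vector-restoration compaction sketch: start from an empty sequence
    and restore vectors only when they add detections, stopping as soon as
    the full sequence's fault coverage is reached."""
    target = coverage(vectors)          # coverage of the original sequence
    kept = []
    for v in vectors:
        if coverage(kept) == target:    # full coverage restored: stop early
            break
        if coverage(kept + [v]) != coverage(kept):
            kept.append(v)              # v contributes new detections
    return kept
```

With a toy model in which each vector detects the fault named by itself (`coverage = set`), the duplicate vectors in "aabbc" are never restored.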
Moderators: J. van Meerbergen, Philips Research Laboratories,
The Netherlands, H. Hermanani, Lebanese American University, Lebanon
This paper deals with integrating an interactive simulator within a behavioral synthesis tool, thereby allowing concurrent synthesis and simulation. The resulting environment provides cycle-based simulation of a behavioral module under synthesis. The simulator and the behavioral synthesis are based on a single model that links the behavioral description and the architecture produced by synthesis. The basic simulation-synthesis model is extended to allow for concurrent architectural simulation of several modules under synthesis. This paper also discusses an implementation of this concept, resulting in a simulator called AMIS. This tool assists the designer in understanding the results of behavioral synthesis and in exploring architectures. It may also be used to debug the behavioral specification.
We present a grammar-based specification method for hardware synthesis of data communication protocols in which the specification is independent of the port size; the port size is instead used as a constraint during the synthesis process. When the width of the output assignments exceeds the chosen output port width, the assignments are split and scheduled over the available states. We present a solution to this problem and results of applying it to some relevant problems.
The importance of fault tolerant design has been steadily increasing as reliance on error free electronics continues to rise in critical military, medical, and automated transportation applications. While rollback and checkpointing techniques facilitate area efficient fault tolerant designs, they are inapplicable to a large class of time-critical applications. We have developed a novel synthesis methodology that avoids rollback, and provides both zero reduction in throughput and near-zero error latency. In addition, our design techniques reduce power requirements associated with traditional approaches to fault tolerance.
Moderators: T. Filkorn, Siemens AG, Germany,
H. Eveking, Darmstadt University of Technology, Germany
Word-Level Decision Diagrams (WLDDs), like *BMDs and K*BMDs, have recently been introduced as a data structure for verification. The size of WLDDs largely depends on the chosen variable ordering, i.e. the ordering in which variables are encountered, and on the decompositions carried out in each node. In this paper we present a framework for dynamic minimization of WLDDs. We discuss the difficulties with previous techniques if applied to WLDDs and present a new approach that efficiently adapts both variable ordering and decomposition type choice. Experimental results demonstrate that this method outperforms "classical" reordering with respect to run-time and representation size during dynamic minimization of word-level functions.
Because general algorithms for sequential equivalence checking require a state space traversal of the product machine, they are computationally expensive. In this paper, we present a new method for sequential equivalence checking which utilizes functionally equivalent signals to prove the equivalence of both circuits, thereby avoiding the state space traversal. The effectiveness of the proposed method is confirmed by experimental results on retimed and optimized ISCAS'89 benchmarks.
Incremental methods are successfully applied to deal with successive verifications of slightly modified switch-level networks; that is, only those parts affected by the changes are symbolically traversed for verification. In this paper, we present an incremental technique for symbolic simulators which is inspired by both existing incremental techniques for non-symbolic simulators and token-passing mechanisms in Petri nets.
Organizer & Moderator: Erik Jan Marinissen, Philips Research Labs, The Netherlands co-organized in cooperation with IEEE's Design & Test of Computers
Speakers: Karel van Doorselaer, Alcatel Telecom, Belgium
Sridhar Narayanan, Sun Microsystems, USA
Gert Jan van Rootselaar, Philips Research Labs, The Netherlands
Moderators: G. Gielen, Katholieke Universiteit Leuven, Belgium,
C. Descleves, Dolphin Integration, France
This paper presents a new method for hierarchical characterization of analog integrated circuits. For each circuit class, a fundamental set of performances is defined and extracted topology-independently. A circuit being characterized is decomposed into general subcircuits. Sizing rules of these topology-independent subcircuits are included in the characterization as functional constraints. In this way, bad circuit sizing is detected and located.
The EASY analog design system includes a qualitative analysis tool for examining the fundamental suitability of a chosen circuit structure, as well as a symbolic analysis component that allows the deduction of compact but sufficiently accurate design equations. These tools support the first steps of the design process and give insight into the behavior of the analog circuit.
This contribution presents an approach to formal verification of linear analog circuits with parameter tolerances. The method proves that an actual circuit fulfills a specification in a given frequency interval for all parameter variations. It is based on a curvature driven bound computation for value sets using interval arithmetic. Some examples demonstrate the feasibility of our approach.
Moderators: A. ten Berg, Philips Research Laboratories, The Netherlands,
M. Berkelaar, Eindhoven University of Technology, The Netherlands
This paper formalizes the synthesis process of wiring signature-invariant (WSI) combinational circuit mutants. The signature σ_{o} is defined by a reference circuit η_{o}, which itself is modeled as a canonical form of a directed bipartite graph. A wiring perturbation γ induces a perturbed reference circuit η_{γ}. A number of mutant circuits η_{γi} can be resynthesized from the perturbed circuit η_{γ}. The mutants of interest are the ones that belong to the wiring-signature-invariant equivalence class N_{σo}, i.e. the mutants η_{γi} ∈ N_{σo}. Circuit mutants η_{γi} ∈ N_{σo} have a number of useful properties. For any wiring perturbation γ, the size of the wiring-signature-invariant equivalence class is huge. Notably, circuits in this class are not random, although for unbiased testing and benchmarking purposes, mutant selections from this class are typically random. For each reference circuit, we synthesized eight equivalence subclasses of circuit mutants, based on 0 to 100% perturbation. Each subclass contains 100 randomly chosen mutant circuits, each listed in a different random order. The 14,400 benchmarking experiments with 3200 mutants in 4 equivalence classes, covering 13 typical EDA algorithms, demonstrate that an unbiased random selection of such circuits can lead to statistically meaningful differentiation and improvements of existing and new algorithms.
Keywords: signature-invariance, equivalence class, circuit mutants, benchmarking.
This paper presents a technology mapping approach for the standard cell technology, which takes into account both gate area and routing area so as to minimize the total chip area after layout. The routing area is estimated using two parameters available at the mapping stage; one is the fanout count of a gate, and the other is the "overlap of fanin level intervals". To estimate the routing area in terms of accurate fanout counts, an algorithm is proposed which solves the problem of dynamic fanout changes in the mapping process. This also enables us to calculate the gate area more accurately. Experimental results show that this approach provides an average reduction of 15% in the final chip area after placement and routing.
Partial scan techniques have been widely accepted as an effective solution to improve sequential ATPG performance while keeping acceptable area and performance overheads. Several techniques for flip-flop selection based on structural analysis have been presented in the literature. In this paper, we first propose a new testability measure based on the analysis of the circuit State Transition Graph through symbolic techniques. We then describe a scan flip-flop selection algorithm exploiting this measure. We resort to the identification of several circuit macros to address large sequential circuits. When compared to other techniques, our approach shows good results, especially when it is used to optimize a set of flip-flops previously selected by means of structural analysis.
Moderators: C. Piguet, CSEM, Switzerland,
E. Macii, Politecnico di Torino, Italy
This paper presents one of the first analyses of the temperature dependence of CMOS integrated circuit delay at low voltage. Based on a low-voltage extension of Sakurai's alpha-power current law, a detailed analysis of the temperature and voltage sensitivity of CMOS structure delay is given. Coupling effects between temperature and voltage are clearly demonstrated. Specific derating factors are defined for the low-voltage range (1-3V). Experimental validations are obtained on specific ring oscillators integrated on a 0.7 µm process by comparing the temperature and voltage evolution of the measured oscillation period to the calculated ones. A low temperature sensitivity operating region has been clearly identified and appears in excellent agreement with the expected calculated values.
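The temperature/voltage coupling described above can be illustrated directly with the alpha-power law: the threshold voltage falls with temperature while carrier mobility (and hence drive current) also falls, and which effect wins depends on the supply voltage. All parameter values in this sketch are illustrative assumptions, not the paper's measured 0.7 µm data:

```python
def alpha_power_delay(vdd, temp, cl=1e-13, alpha=1.3, vt0=0.6,
                      k_vt=-1.5e-3, i0=2e-4, t0=300.0, m=1.5):
    """Gate delay under the alpha-power law, with two competing temperature
    effects: the threshold voltage drops with temperature (k_vt, in V/K)
    while the drive current drops with mobility as (t0/temp)**m."""
    vt = vt0 + k_vt * (temp - t0)                  # threshold voltage at temp
    i_sat = i0 * (t0 / temp) ** m * (vdd - vt) ** alpha
    return cl * vdd / i_sat                        # delay ~ C * V / I_sat
```

At high Vdd the mobility loss dominates (hotter is slower); near threshold the Vt drop dominates (hotter is faster); the low-sensitivity operating region the paper identifies lies between these two regimes.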
In this paper we present an efficient technique to reduce the power dissipation in a technology mapped CMOS sequential circuit based on logic and structural transformations. The power reduction is achieved by adding sequential redundancies from low switching activity gates to high switching activity gates (targets) such that the switching activities at the output of the targets are significantly reduced. We show that the power reducing transformations result in a circuit that is a valid replacement of the original. The notion of validity used here is that of a delay safe replacement [11, 12]. The potential transformations are found by direct logic implications applied to the circuit netlist. Therefore the complexity of the proposed transformation is polynomial in the size of the circuit, allowing the processing of large designs.
This paper presents a zero-skew gated clock routing technique for VLSI circuits. The gated clock tree has masking gates at the internal nodes of the clock tree, which are selectively turned on and off by the gate control signals during the active and idle times of the circuit modules to reduce switched capacitance of the clock tree. This work extends the work of [4] so as to account for the switched capacitance and the area of the gate control signal routing. Various tradeoffs between power and area for different design options and module activities are discussed and detailed experimental results are presented.
We present an integer-linear-programming-based approach for estimating the maximum instantaneous current through the power supply lines for CMOS circuits. It produces the exact solutions for the maximum instantaneous current for small circuits, and tight upper bounds for large circuits. We formulate the maximum instantaneous current estimation problem as an integer linear programming (ILP) problem, and solve the corresponding ILP formulae to obtain the exact solution. For large circuits we propose to partition the circuits, and apply our ILP-based approach for each sub-circuit. The sum of the exact solutions of all sub-circuits provides an upper bound of the exact solution for the entire circuit. Our experimental results show that the upper bounds produced by our approach combined with the lower bounds produced by a genetic-algorithm-based approach confine the exact solution to a small range.
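The partition-and-sum bound argument can be sketched without an ILP solver by enumerating small sub-circuits exactly (enumeration stands in for the ILP solve here, and modeling each sub-circuit as a function from input vector to current draw is our simplification):

```python
from itertools import product

def max_current_upper_bound(subcircuits, n_inputs):
    """Each sub-circuit's exact maximum current is found independently;
    their sum is a safe upper bound on the whole circuit's maximum, since
    no single input vector need maximize every sub-circuit at once."""
    return sum(max(f(v) for v in product((0, 1), repeat=n_inputs))
               for f in subcircuits)

def exact_max_current(subcircuits, n_inputs):
    """Exact maximum over the whole circuit (exponential: small circuits only)."""
    return max(sum(f(v) for f in subcircuits)
               for v in product((0, 1), repeat=n_inputs))
```

In the toy case below the two sub-circuits peak on conflicting vectors, so the bound (5.0) strictly exceeds the exact maximum (3.0), which is exactly the gap the paper's tightness results address.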
Co-ordinator: Ivo Bolsens, IMEC, Belgium
Moderator: Ivo Bolsens, IMEC, Belgium
Speakers: Norbert Wehn, University of Kaiserslautern, Germany
Soren Hein, Siemens, Germany
Francky Catthoor, IMEC, Belgium
Roelof Salters, Philips Research Labs, The Netherlands
In this paper we discuss system-related aspects in embedded DRAM/logic designs. We focus on large embedded memories which have to be implemented as DRAMs.
Both in custom and programmable instruction-set processors for data-dominated multi-media applications, many of the architecture components are intended to solve the data transfer and storage issues. Recent experiments at several locations have clearly demonstrated that, as a result, the main power (and largely also area) cost is situated in the memory units and the communication hardware. In this paper, the main reasons for this problem will be reviewed and a perspective will be provided on the expected near-future evolution. It will be shown that the circuit and process technology advances have been very significant in the past decade. Still, these are not sufficient to fully solve the power and area bottleneck which has been created in the same period. Therefore, several possible design methodology remedies will also be proposed for this critical design issue, with emphasis on effective system-level memory management methodologies. These promise very large savings in energy-delay and also in area for multi-media applications, while still meeting the real-time constraints.
Moderators: J. Franca, IST, Lisbon, Portugal,
H. Kerkhoff, University of Twente, The Netherlands
The complete application of a hierarchical top-down design methodology to analog sensor interface front-ends is presented: from system-level specifications down to implementation in silicon, including high-level synthesis, analog block generation and layout generation. A new approach for implementing accurate and fast power/area estimators for the different blocks in the architecture is described. These estimators provide the essential link between the high-level synthesis and the block generation in our hierarchical top-down methodology. The methodology is illustrated by means of the design of a complex and realistic example. Measurement results are included.
Analog simulation methodologies for the generation of macromodels of analog functional blocks, as reported in literature, are of limited use in practical circuit simulation due to frequent accuracy and efficiency problems. In this paper, a new approach to model the behaviour of nonlinear functional blocks is proposed. The approach is based upon the principles of systems theory. The outlined methodology supports the mapping of models from component into behavioural level. The nonlinearity of complex analog modules is reflected efficiently while the electrical signals are maintained.
In this paper an accurate, analytical model for the evaluation of CMOS inverter delay in the sub-micron regime is presented. A detailed analysis of the inverter operation is provided, which results in accurate expressions describing the output waveform. These analytical expressions are valid for all the inverter operation regions and input waveform slopes. They take into account the influence of the short-circuit current during switching and of the gate-to-drain coupling capacitance. The presented model clearly shows the influence of the inverter design characteristics, the load capacitance, and the slope of the input waveform driving the inverter on the propagation delay. The results are in excellent agreement with SPICE simulations.
Moderators: M. Berkelaar, Eindhoven University of Technology,
The Netherlands, L. Stok, IBM T.J. Watson Research Center, USA
Redundancy removal is an important step in combinational logic optimization. After a redundant wire is removed, other originally redundant wires may become irredundant, and some originally irredundant wires may become redundant. When multiple redundancies exist in a circuit, this creates a problem where we need to decide which redundancy to remove first. In this paper, we present an analysis and a very efficient heuristic to deal with multiple redundancies. We associate with each redundant wire a Boolean function that describes how the wire can remain redundant after removing other wires. When multiple redundancies exist, this set of Boolean functions characterizes the global relationship among redundancies.
Functional decomposition is an important technique in logic synthesis, especially for the design of lookup table based FPGA architectures. We present a method for functional decomposition with a novel concept for the exploitation of don't cares, thereby combining two essential goals: the minimization of the number of decomposition functions in the current decomposition step and the extraction of common subfunctions for multi-output Boolean functions. The exploitation of symmetries of Boolean functions plays an important role in our algorithm as a means to minimize the number of decomposition functions not only for the current decomposition step but also for the (recursive) decomposition algorithm as a whole. Experimental results prove the effectiveness of our approach.
In this paper we introduce the first divide and conquer algorithm that is capable of exact hazard-free logic minimization in a constructive way. We compare our algorithm with the method of Dill/Nowick, which was the only known method for exact hazard-free minimization. We show that our algorithm is much faster than the method proposed by Dill/Nowick by avoiding a significant part of the search space. We argue that the proposed algorithm is a promising framework for the development of efficient heuristic algorithms.
Simple disjunctive decomposition is a special case of logic function decomposition, where variables are divided into two disjoint sets and there is only one newly introduced variable. This paper shows that many simple disjunctive decompositions can be found easily by detecting symmetric variables or checking variable cofactors. We also propose an algorithm that constructs a new logic representation for a simple disjunctive decomposition by assigning constant values to variables in the original representation. The algorithm enables us to apply the decomposition while keeping the good structures of the original representation. We have performed experiments to restructure fanout-free cones of multi-level logic circuits, and obtained better results than when not restructuring them.
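The symmetry-detection step mentioned above can be sketched on a truth table: two variables are symmetric when exchanging them never changes the function value, a cheap necessary condition for grouping them under one new variable in a decomposition f(X) = g(h(x_i, x_j), X \ {x_i, x_j}). This is a generic illustration, not the paper's algorithm:

```python
from itertools import product

def symmetric(f, n, i, j):
    """f: Boolean function of an n-bit tuple.  Returns True iff variables
    i and j are symmetric, i.e. swapping them never changes f."""
    for v in product((0, 1), repeat=n):
        w = list(v)
        w[i], w[j] = w[j], w[i]        # exchange the two variables
        if f(v) != f(tuple(w)):
            return False
    return True
```

For example, all variable pairs of a parity function are symmetric, while x0 AND NOT x1 is not symmetric in (x0, x1).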
Moderators: W. Nebel, University of Oldenburg and OFFIS, Germany,
J. Benkoski, Synopsys, France
This paper presents a methodology for power estimation of designs described at the behavioral-level as the interconnection of functional modules. The input/output behavior of each module is implicitly stored using BDDs, and the power consumed by the network is estimated using a novel and accurate entropy-based approach. As a demonstration example, we have used the proposed power estimation technique to evaluate and compare the effects of some architectural transformations applied to a reference design specification on the power dissipation of the corresponding implementations.
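As a sketch of the entropy idea only (the paper's exact model is not reproduced here): a signal that is 1 with probability p carries H(p) bits, and entropy-based estimators relate average switching activity to this entropy. The H(p)/2 scaling below is one common approximation from the power-estimation literature, used purely for illustration:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def switching_estimate(signal_probs):
    """Crude entropy-based proxy for total switching activity: sum
    H(p)/2 over the 1-probabilities of the module's signals."""
    return sum(entropy([p, 1.0 - p]) / 2.0 for p in signal_probs)
```

A maximally random signal (p = 0.5) contributes the most activity (0.5), while a nearly constant signal contributes almost none, which is why entropy is a usable stand-in for switched capacitance at the behavioral level.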
We propose a new approach to RT-level power modeling for combinational macros, that does not require simulation-based characterization. A pattern-dependent power model for a macro is analytically constructed using only structural information about its gate-level implementation. The approach has three main advantages over traditional techniques: i) it provides models whose accuracy does not depend on input statistics, ii) it offers a wide range of trade-off between accuracy and complexity, and iii) it enables the construction of pattern-dependent conservative upper bounds.
This paper illustrates, analytically and quantitatively, the effect of high-order temporal correlations on steady-state and transition probabilities in finite state machines (FSMs). As the main theoretical contribution, we extend the previous work done on steady-state probability calculation in FSMs to account for complex spatiotemporal correlations which are present at the primary inputs when the target machine models real hardware and receives data from real applications. More precisely: 1) using the concept of constrained reachability analysis, the correct set of Chapman-Kolmogorov equations is constructed; and 2) based on stochastic complementation and iterative aggregation/disaggregation techniques, exact and approximate methods for finding the state occupancy probabilities in the target machine are presented. From a practical point of view, we show that assuming temporal independence or even using first-order temporal models is not sufficient due to the inaccuracies induced in steady-state and transition probability calculations. Experimental results show that, if the order of the source is underestimated, not only is the set of reachable states incorrectly determined, but the steady-state probability values can also be more than 100% off from the correct ones. This strongly impacts the accuracy of the total power estimates that can be obtained via probabilistic approaches.
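The quantity at the heart of the Chapman-Kolmogorov system is the stationary distribution pi = pi * P. A minimal power-iteration sketch for a small ergodic chain shows what is being computed (the paper's stochastic-complementation and aggregation/disaggregation machinery is the efficient way to obtain it for large FSMs, and is not reproduced here):

```python
def steady_state(P, iters=2000):
    """Power iteration for the stationary distribution of an ergodic
    Markov chain; P[i][j] = Pr(next state = j | current state = i),
    rows summing to 1."""
    n = len(P)
    pi = [1.0 / n] * n                  # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi
```

For the two-state chain with P = [[0.5, 0.5], [0.2, 0.8]] the fixed point is (2/7, 5/7), and mis-modeling the input source changes P and thus shifts this vector, which is exactly the error the abstract quantifies.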
Moderators: L. Claesen, IMEC, Belgium, C. Delgado Kloos,
ETSI Telecommunicacion, Spain
This paper presents a new formal method for the efficient verification of concurrent systems that are modeled using a safe Petri net representation. Our method generalizes upon partial-order methods to explore concurrently enabled conflicting paths simultaneously. We show that our method can achieve an exponential reduction in algorithmic complexity without resorting to an implicit enumeration approach.
Petri nets are a graph-based formalism appropriate to model concurrent systems such as asynchronous circuits or network protocols. Symbolic techniques based on Binary Decision Diagrams (BDDs) have emerged as one of the strategies to overcome the state explosion problem in the analysis of systems modeled by Petri nets. The existing techniques for state encoding use a variable-per-place strategy that leads to encoding schemes with very low density. This drawback has been partially mitigated by using Zero-Suppressed BDDs, which provide a typical reduction of BDD sizes by a factor of two. This work presents novel encoding schemes for Petri nets. By using algebraic techniques to analyze the topology of the net, sets of 'structurally related' places can be derived and encoded using only a logarithmic number of boolean variables. Such an approach drastically decreases the number of variables for state encoding and reduces memory and CPU requirements significantly.
Waveform narrowing is an attractive framework for circuit delay verification as it can handle different delay models and component delay correlation efficiently. The method can give false negative results because it relies on local consistency techniques. We present two methods to reduce this pessimism: 1) global timing implications and necessary assignments, and 2) a case analysis procedure that finds a test vector that violates the timing check or proves that no violation is possible. Under floating-mode, global implications eliminate timing check violation without case analysis in the c1908 benchmark, while for a tighter requirement case analysis finds a test vector after only 5 backtracks.
We present a new combinational verification technique where the functional specification of a circuit under verification is utilized to simplify the verification task. The main idea is to assign to each primary input a general function, called a coordinate function, instead of a single variable function as in most BDD-based techniques. BDDs of intermediate nodes are then constructed based on these coordinate functions in a topological order from primary inputs to primary outputs. Coordinate functions depend on primary input variables and extra variables. Therefore combinational verification is performed not over the set of primary input variables but over the extended set of variables. Coordinate functions are chosen in such a way that in the process of computing intermediate functions the dependency on the primary input variables is gradually replaced with that on the extra variables, thereby making boolean functions associated with primary outputs simple functions only in terms of the extra variables. We show that such a smart choice of coordinate functions is possible with the help of the high-level functional specification of the circuit.
Moderators: A. Richardson, University of Lancaster, UK,
M. Sachdev, Philips Research Laboratories, The Netherlands
An approach to test optimization in switched-capacitor systems based on fault simulation at switch-level is presented in this paper. The advantage of fault simulation at this granularity level is that it facilitates test integration as early as possible in the design of these systems. Due to their mixed-signal nature, both catastrophic and parametric faults must indeed be considered for test optimization. Adequate switch-level fault models are presented. Test stimuli and test measures can be selected as a function of fault coverage. The impact of design parameters such as switch resistance on fault coverage is studied and design parts of poor testability are located.
The paper describes an approach to optimize the application of the multi-configuration DFT technique for analog circuits. This technique makes it possible to emulate the circuit in a number of new test configurations targeting maximum fault coverage. The brute-force application of the multi-configuration technique is shown to produce a very significant improvement of the originally poor testability. An optimized approach is proposed to apply this DFT technique in a more refined way. The optimization problem consists of choosing, among the various permitted test configurations, a set that leads to the best testability/cost trade-off. This set is selected according to ordered requirements: (i) the fundamental requirement of maintaining the maximum fault coverage, and (ii) non-fundamental requirements of satisfying some user-defined cost functions such as test time, silicon overhead or performance degradation. Results are given that exhibit very interesting features in terms of either test procedure simplicity or DFT penalty reduction.
Earlier approaches dealt with the detection of catastrophic faults based on IDD monitoring. Consideration of the more subtle parametric faults and the ADC quantization noise, however, is essential for high-quality analog testing. The paper presents a new design method for analog test of parametric and catastrophic faults by IDD monitoring. ADC quantization noise is systematically considered throughout the method. Results prove its effectiveness.
Moderators: L. Stok, IBM T.J. Watson Research Center, USA,
A. ten Berg, Philips Research Laboratories, The Netherlands
One essential step in sequential logic synthesis consists of finding a state encoding that meets some requirements, such as optimal implementation, or correctness in the case of asynchronous FSMs. Dichotomy-based constrained encoding is more general than other constrained encoding frameworks, but it is also more difficult to solve. This paper introduces a new formalization of this problem, which leads to original exact and heuristic algorithms. Experimental results show that the resulting exact solver outperforms the previous approaches.
Traditionally, state assignment algorithms follow the two-step strategy of constraint generation followed by constraint-guided encoding. There are well-known drawbacks in both currently used models for constraint generation. Approaches following the input model generate face constraints without taking into account the sharing of logic among next-state lines. Approaches following the input-output model generate face constraints for an a priori determined set of dominance/disjunctive relations among the codes of the states, which may not hold in the final encoding. To overcome these limitations, we propose a dynamic input model which implements both of the above steps concurrently. The dynamic constraints are of the face type, but they are generated during the encoding process and thus take advantage of actual relations among partial codes. A general algorithm based on this model, which can target two-level as well as multiple-level implementations, is described. Results obtained with the algorithm on the IWLS'93 machines are shown, and they compare favorably with standard tools.
Delay-constrained area optimization is an important step in synthesis of VLSI circuits. Minimum area (minarea) retiming is a powerful technique to solve this problem. The minarea retiming problem has been formulated as a linear program; in this work we present techniques for reducing the size of this linear program and efficient techniques for generating it. This results in an efficient minarea retiming method for large level-clocked circuits (with tens of thousands of gates).
Moderators: M. Pedram, University of Southern California, USA,
M. Poncino, Politecnico di Torino, Italy
In this paper, we present a comprehensive high-level synthesis system that is geared towards reducing power consumption in control-flow intensive circuits. An iterative improvement algorithm is at the heart of the system. The algorithm searches the design space by handling scheduling, module selection, resource sharing and multiplexer network restructuring simultaneously. The scheduler performs concurrent loop optimization and implicit loop unrolling. It minimizes the expected number of cycles of the schedule without compromising on the minimum and maximum schedule lengths. A fast simulation technique based on trace manipulation aids power estimation in driving synthesis in the right direction. Experimental results demonstrate power reduction of up to 85% with minimal overhead in area over area-optimized designs operating at 5V.
This paper proposes an instruction scheduling technique to reduce the power consumed by off-chip driving. The technique minimizes the switching activity of the data bus between an on-chip cache and main memory when instruction cache misses occur. The scheduling problem is formulated, and a scheduling algorithm is presented. Experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.
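The objective can be sketched as ordering the words sent over the bus so that consecutive words differ in few bits. The greedy nearest-neighbor heuristic below is our illustration of that objective only; a real instruction scheduler must, of course, also respect data dependences and program semantics:

```python
def hamming(a, b):
    """Number of bus lines that toggle between two words."""
    return bin(a ^ b).count("1")

def schedule_for_bus(words):
    """Greedy reordering: repeatedly emit the remaining word closest in
    Hamming distance to the previously emitted one."""
    rest = list(words)
    order = [rest.pop(0)]
    while rest:
        nxt = min(rest, key=lambda w: hamming(order[-1], w))
        rest.remove(nxt)
        order.append(nxt)
    return order

def toggles(seq):
    """Total bus-line toggles when the sequence is driven off-chip."""
    return sum(hamming(a, b) for a, b in zip(seq, seq[1:]))
```

On the four words 0000, 1111, 0001, 1110 the original order costs 11 toggles while the greedy order costs 5, which is the kind of saving such scheduling targets.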
The power dissipated by system-level buses is the largest contribution to the global power of complex VLSI circuits. Therefore, the minimization of the switching activity at the I/O interfaces can provide significant savings on the overall power budget. This paper presents innovative encoding techniques suitable for minimizing the switching activity of system-level address buses. In particular, the schemes illustrated here target the reduction of the average number of bus line transitions per clock cycle. Experimental results, conducted on address streams generated by a real microprocessor, have demonstrated the effectiveness of the proposed methods.
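One classical instance of such address-bus encodings is Gray coding: since consecutive addresses dominate instruction fetch, re-encoding them so that successive values differ in a single bit cuts average transitions per cycle (the paper's specific schemes may differ; this is a generic illustration):

```python
def to_gray(n):
    """Binary-reflected Gray code: consecutive integers differ in one bit."""
    return n ^ (n >> 1)

def transitions(words):
    """Total bus-line toggles over a stream of bus words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))
```

For 16 consecutive addresses, plain binary costs 26 toggles while Gray encoding costs exactly 15, i.e. one bus-line transition per clock cycle.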
Moderators: M. Kovac, University of Zagreb, Croatia,
W. Glauert, University of Erlangen-Nurnberg, Germany
The paper presents a scalable architecture for multi-threaded Java applications. Threads model concurrent behavior in a natural way and thus provide a migration path to multi-processor machines. The proposed architecture consists of multiple application-specific processing elements, each able to execute a single thread at a time. The architecture is evaluated by implementing a portable and scalable Java machine on an FPGA board for demonstration.
In this paper, we show how hardware/software co-evaluation can be applied to instruction set definition. As a case study, we show the definition and evaluation of instruction set extensions for fuzzy processing. These instructions are based on the use of subword parallelism to fully exploit the processor's resources by processing multiple data streams in parallel. The proposed instructions are evaluated in software and hardware to gain a balanced view of the costs and benefits of each instruction. We have found that a simple instruction optimized to perform fuzzy rule evaluation offers the most benefit to improve fuzzy processing performance. The instruction set extensions are added to a RISC processor core based on the MIPS instruction set architecture. The core has been described in VHDL so that hardware implementations can be generated using logic synthesis.
This paper presents a system-level design environment for data transport processing systems. In this environment, designers can easily verify system behavior by formally defining data structures and their related actions, without considering detailed timing. In addition, the verified specification can be translated into synthesizable RTL descriptions by a dedicated RTL generator. Thus, using lower-level EDA tools, actual hardware can be obtained directly from a system-level specification.
Moderators: J.L. Huertas, Centro Nacional de Microelectronica, Spain,
J. Pikkarainen, Nokia Mobile Phones, Finland
Industry trends aimed at integrating higher levels of circuit functionality have triggered a proliferation of mixed analog-digital systems. Magnified noise coupling through the common chip substrate has made the design and verification of such systems an increasingly difficult task. In this paper we present a fast eigen-decomposition technique that accelerates operator application in BEM methods and avoids the dense-matrix storage while taking all of the substrate boundary effects into account explicitly. This technique can be used for accurate and efficient modeling of substrate coupling effects in mixed-signal integrated circuits.
This paper describes a method to improve the efficiency of nonlinear DC fault simulation. The method uses the Newton-Raphson algorithm to simulate each faulty circuit. The key idea is to order the given list of faults in such a way that the solution of the previous faulty circuit serves as a good initial point for the simulation of the next one. To build a good ordering, a one-step Newton-Raphson iteration is first performed for every faulty circuit, and the results are used to quantify how close the faulty circuits and the good circuit are in their behavior. Implemented with Householder's formula, the one-step Newton-Raphson iteration adds virtually no overhead. Experimental results on a set of 36 MCNC benchmark circuits show an average speedup of 4.4, and as high as 15, over traditional stand-alone fault simulation.
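The ordering step can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the one-step Newton-Raphson solutions are already available per fault and simply chains faults greedily so that consecutive faults are close in solution space; the fault names and voltage vectors are hypothetical.

```python
import math

def order_faults(one_step_solutions):
    """Greedy nearest-neighbour ordering of faulty circuits.

    one_step_solutions: {fault_name: node-voltage vector obtained from a
    single Newton-Raphson step off the good-circuit solution}.
    Consecutive faults in the returned list are close in behaviour, so
    each converged solution seeds the next simulation well."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    remaining = dict(one_step_solutions)
    order = [next(iter(remaining))]      # start from an arbitrary fault
    del remaining[order[0]]
    while remaining:                     # always hop to the closest fault left
        last = one_step_solutions[order[-1]]
        nxt = min(remaining, key=lambda f: dist(remaining[f], last))
        order.append(nxt)
        del remaining[nxt]
    return order

# Hypothetical one-step solutions for four faults (two node voltages each):
sols = {"f1": (0.1, 0.1), "f2": (5.0, 5.0), "f3": (0.2, 0.1), "f4": (4.9, 5.1)}
print(order_faults(sols))   # prints: ['f1', 'f3', 'f2', 'f4']
```

Faults with similar behavior end up adjacent, so each Newton-Raphson run starts near its solution.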
This paper presents an approach towards realistic fault prediction in analog circuits. It exploits the Inductive Fault Analysis (IFA) methodology to generate explicit models able to give the probability of occurrence of faults associated with devices in an analog cell. This information intends to facilitate the integration of design and test phases in the development of an IC since it provides a realistic fault list for simulation before going to the final layout, and also makes possible layout optimization towards what we can call layout level design for testability.
Two main aspects of hardware/software co-design are hardware/software partitioning and co-synthesis. Most co-design approaches address only one of these problems. In this paper, a fully automatic approach coupling hardware/software partitioning and co-synthesis is presented. The techniques have been integrated in the co-design tool Cool, supporting the complete design flow from system specification to board-level implementation for multi-processor and multi-ASIC target architectures for dataflow-dominated applications.
This paper presents SHAPES, a tool for hardware-software partitioning. It is based on two main paradigms: the implementation of the partitioning tool by means of an expert system, and the use of fuzzy logic to model the parameters involved in the process.
A formal definition of the general VHDL-AMS analogue system has been proposed to capture the way in which the language affects the specification of a non-linear discontinuous analogue system. It is suggested to model the break set as a separate system in order to facilitate the interaction between the analogue equation set and the digital abstract machine. The significance of the proposed model is that it may be used in the semantic validation of VHDL-AMS descriptions and may also facilitate mixed-signal equation formulation for an underlying VHDL-AMS simulator.
Partial scan is a commonly used DFT technique for improving the testability of sequential circuits while keeping overhead as low as possible. In this context, the selection of the partial scan chain [1] is usually performed at gate level (e.g. [2], [3]). In this paper, we present a method for quickly selecting the partial scan chain (SC) in datapath-like circuits. The resulting SC is such that the number of scan FFs is optimized and the achievable fault coverage is the same as with a full scan approach.
IDEA is a symmetric block cipher with a 128-bit key, proposed to replace DES where strong encryption is required. Many applications need the speed of a hardware encryption implementation while trying to preserve the flexibility and low cost of a software implementation. In this paper we present one solution to this problem. Our system architecture uses a single core module, named Round, to implement the IDEA algorithm. Using the core, we were able to implement and test an example application in only three days. This "off the shelf" solution for designing cryptographic applications using the IDEA algorithm significantly reduces the design cycle, thus greatly reducing the time-to-market and cost of such designs. By increasing the number of Round modules, the system designer can linearly increase the speed of the design. Unlike other known approaches, this design methodology makes it possible to achieve the necessary performance, or to preserve area (and reduce cost) when needed. We have implemented a one-round UNICORN architecture in a Xilinx FPGA. After implementation, the chip was tested using the standard test vectors and was capable of performing 2.8 Mbps encryption in both ECB and CBC modes.
We present a technique for determining the best data cache size required for a given memory-intensive application. A careful memory and cache line assignment strategy based on the analysis of the array access patterns effects a significant reduction in the required data cache size, with no negative impact on the performance, thereby freeing vital on-chip silicon area for other hardware resources. Experiments on several benchmark kernels performed on LSI Logic's CW4001 embedded processor simulator confirm the soundness of our cache sizing and memory assignment strategy and the accuracy of our analytical predictions.
This paper presents an innovative technique to efficiently develop hardware and software code generators. The specification model is first converted into its equivalent data structure. Target programs result from a set of transformation rules applied to the data structure. These rules are written in a textual form named Script. Moreover, transformations for a specific code generator are easier to describe because our solution uses a template of the required output as an additional input. The result is a meta-generator entirely written in Java. The concept and its implementation have been demonstrated by developing a C/VxWorks code generator, a behavioral VHDL generator, and a synthesizable VHDL generator.
This paper describes a new code optimization technique for digital signal processors (DSPs). One important characteristic of DSP algorithms is iterative access to data array elements within loops. DSPs support efficient address computations for such array accesses by means of dedicated address generation units (AGUs). We present a heuristic technique which, given an AGU with a fixed number of address registers, minimizes the number of instructions needed for array address computations in a program loop.
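The cost being minimized can be illustrated with a simplified model. This is an assumption-laden sketch, not the paper's heuristic: it considers a single address register on an AGU whose free post-modify range is a parameter, and counts how many accesses need an explicit address-register update.

```python
def extra_address_instructions(access_sequence, modify_range=1):
    """Count explicit address-register updates for one address register.

    Assumes an AGU that can post-modify the register by any step in
    [-modify_range, +modify_range] for free; larger jumps cost one extra
    instruction. access_sequence lists the array offsets in the order
    the loop body touches them."""
    cost = 0
    for prev, cur in zip(access_sequence, access_sequence[1:]):
        if abs(cur - prev) > modify_range:
            cost += 1            # needs an explicit "load AR" instruction
    return cost

# a[i], a[i+1], a[i-1], a[i+2] inside a loop body:
seq = [0, 1, -1, 2]
print(extra_address_instructions(seq))                  # prints: 2
print(extra_address_instructions(seq, modify_range=2))  # prints: 1
```

A heuristic like the paper's would, among other things, reorder accesses and assign them across several address registers so that such out-of-range jumps become rare.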
This paper discusses issues in the graphical modelling of Finite State Machines with Datapath (FSMDs). Tools supporting the graphical entry of state-based systems are usable by intuition, but need to be based on an exact definition of the semantics of the graphical elements. This paper proposes to define the semantics of graphical models based on the hardware description language VHDL.
Attribute grammars have been used extensively in every phase of traditional compiler construction. Recently, it has been shown that they can also be effectively adopted to handle scheduling algorithms in high-level synthesis. Their main advantages are modularity and declarative notation in the development of design automation environments. In this paper, past results are further elaborated, and more scheduling techniques are presented and implemented in a flexible environment for the design automation of digital systems. This novel approach can prove valuable for the fast evaluation of new algorithms and techniques in the field.
Three navigation tools are presented to statically and interactively analyze Soft-Cores described in VHDL [1]. These tools ease the adoption of mechanisms to perform review and audit procedures similar to those adopted in software development [2]. These navigation tools help to better understand and reuse VHDL Soft-Cores. The three navigators are integrated in a VHDL-ICE environment [3] to get design data management support.
To validate the functionality of a new, highly complex graphics processor described in VHDL, the working environment of the processor has to be modelled. In some cases appropriate models for the external components are commercially available; in other cases these models have to be created. In this paper a general memory model for SGRAMs is presented, which was implemented to provide a flexible simulation environment for a high-speed graphics processor. Key features are its generality, the support of SGRAM arrays of various shapes, and functions supporting the simulation process. This functionality goes far beyond the capabilities of currently commercially available SGRAM models.
This paper investigates design flows to obtain final designs on Xilinx XC4000 FPGAs. The examples generated by high-level synthesis were mapped, including placement and routing. This reveals that the common criteria of area-optimal or delay-optimal circuits should be extended to include routability and computing time.
A new approach to mixed-signal circuit interfacing based on fuzzy logic models is presented. Due to their continuous rather than discrete character, fuzzy logic models offer a significant improvement compared with the classical D-A interface models. Fuzzy logic D-A interfaces can represent the boundary between the digital and analogue worlds accurately without a significant loss of computational efficiency. The potential of mixed-signal interfacing based on fuzzy logic is demonstrated by an example of spike propagation from the digital to analogue world. A model of inertial propagation delay and non-linear DC gain suitable for fuzzy logic gates is also suggested.
An optimized hardware/software cosimulation method based on the backplane approach is presented in this paper. To enhance the performance of cosimulation, efforts are focused on reducing control packets between simulators as well as on the concurrent execution of simulators without roll-back.
A Gallium Arsenide automated layout generation system (OLYMPO) for SSI, MSI and LSI circuits used in GaAs VLSI design has been developed. We introduce a full-custom layout style, called the RN-based cell model, which is suited to generating low self-inductance layouts of cells and macrocells. The cell compiler can be used as a cell library builder, and it is embedded in a random-logic macrocell generator and an iterative logic array generator. Experimental results demonstrate that OLYMPO generates complex and compact layouts and that the synthesis process can be used interactively at the system design level.
Verifying an implementation produced from high-level synthesis is a challenging problem due to many complex design tasks involved in the design process. In this paper, we present an architectural rule checking approach for high-level design verification. This technique detects and locates various design errors and verifies both the consistency and correctness of an implementation. Besides describing different rule suites, we also report a working environment for the architectural rule checking. Finally, we highlight the value of the proposed approach with a real-life design.
This paper proposes a unified design technique which combines electromagnetic field analysis (the FDTD technique) with a circuit simulator (HSPICE). The proposed technique can analyze integrated circuit (IC), multi-chip module (MCM), and printed circuit board (PCB) designs with high efficiency and high accuracy, including the grounding noise through the substrate. Furthermore, the technique can analyze not only small-signal but also large-signal operation.
The SIA Roadmap [1] predicts a very aggressive path of technologies from 0.35 um technology design to 0.10 um technology design. Increasing frequencies together with decreasing geometries lead to a number of issues which need to be examined. Testing is clearly one main issue. Another area of concern is that of signal integrity of the interconnects. The interconnects must not only be analyzed with regard to opens and shorts but also with regard to the signal delays. Up to now, opens and shorts in bus systems on boards have been tested using boundary scan, mostly neglecting delay test. In addition, it has to be considered that the signal delay (i.e. the time when the signal crosses the switching threshold of the following gate) on a certain line within a bus system depends on the set of input signals of all bus lines. Furthermore, hazards can occur due to coupling between bus lines which can lead to an incorrect function of the whole circuit.
This paper presents a method to estimate the quality of a set of test vectors and validation procedures from pre-synthesised VHDL descriptions. The method is based on the definition of fault models for test feature evaluation and of error models for validation quality estimation.
The design of self-checking circuits through output encoding has its bottleneck in realizing the network so that each fault produces only errors detectable by the adopted code. An analysis of a prospective TSC network is proposed, based on the application of the weighted observability approach. The aims are the verification of the SC property of the encoded circuit (TSC fault simulation) and the identification of critical areas for subsequent manipulation to achieve complete fault coverage.
The implementation of an off-chip I_{DDQ} monitor to support the test of complex ASICs is presented in this paper. The monitor can be incorporated into standard automated test equipment (ATE). It is capable of driving a 2 uF capacitive load and can perform measurements of the I_{DDQ} of a device under test (DUT) in the 0-1 mA range. According to measurements, the monitor can operate at test rates up to 30 kHz and offers a resolution better than 0.1 uA. The on-chip integrated bypass switch is capable of handling DUT transient currents up to several amps. The IOCIMU prototype was fabricated in the 2-um Mietec BiCMOS technology and has an active chip area of 20 mm^{2}.
In this paper a new topology optimization feature of a module generator environment [5-6] is presented. The optimization is performed by removing redundant elements of objects already placed and by assessing different layout topologies of a module. This drastically reduces the length of the generator source code, because different topologies need no separate source code but are derived automatically.
This paper presents an approach to generating asynchronous schedules of various concurrency levels and describes novel net-based scheduling and allocation optimization techniques for asynchronous high-level synthesis. The asynchronous schedules are optimized through the sets of concurrent variable and statement pairs. Experimental results and a comparison of the net-based techniques with the best sequential scheduling and allocation techniques are presented.
The importance of identifying false paths in a combinational circuit cannot be overstated, since they may mask the true delay. We present a fast algorithm based on Boolean satisfiability for solving this problem. We also present extensions to this per-path approach to find the critical path of a circuit in reasonable time.
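The satisfiability question behind false-path identification can be made concrete with a toy example. This sketch is not the paper's algorithm: it replaces a SAT solver with exhaustive enumeration (equivalent for tiny input counts) and uses a hypothetical two-input circuit, checking whether any input vector satisfies all side-input conditions along a path at once.

```python
from itertools import product

def statically_sensitizable(side_conditions, n_inputs):
    """Exhaustive satisfiability check (fine for small n): does any input
    vector satisfy all side-input conditions along the path at once?
    side_conditions: one predicate over the input vector per gate on the
    path (AND gate -> side input must be 1, OR gate -> side input 0)."""
    return any(all(c(v) for c in side_conditions)
               for v in product([0, 1], repeat=n_inputs))

# Hypothetical circuit, inputs v = (s, a): y = (s AND a) OR ((NOT s) AND a).
# Path a -> first AND -> OR: needs s == 1 and the other OR input == 0.
conds_true = [lambda v: v[0] == 1,                    # AND side input s = 1
              lambda v: ((1 - v[0]) & v[1]) == 0]     # other OR input = 0
print(statically_sensitizable(conds_true, n_inputs=2))   # prints: True

# A path whose conditions contradict each other is a false path:
conds_false = [lambda v: v[0] == 1, lambda v: v[0] == 0]
print(statically_sensitizable(conds_false, n_inputs=2))  # prints: False
```

A real tool encodes these conditions as CNF clauses and hands them to a SAT solver, which scales far beyond exhaustive enumeration.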
This paper describes algebraic techniques that target low power consumption. A unique power cost function based on the decomposed factored-form representation of a Boolean expression is introduced to guide the structural transformations. Circuits synthesized by SIS [5] and POSE [1] consume 54.5% and 10.4% more power, respectively, than those obtained by our tool.
This paper presents a unified power and timing modeling approach for ASIC libraries. The ASIC library is being standardized and targeted for a design flow in which timing analysis is complemented by power analysis. We show benchmark results from new industrial gate-level power analysis tools.
An automated technique to narrow down the number of parameters in linear constraint transformation models of analog circuits is described. The sets of the more important circuit parameters and specifications are confined in an efficient constraint transformation model. The method is based on least-squares approximation and principal component analysis of the sensitivity matrix of the transformation. The resulting model encompasses the constraints identified using designers' expertise for approximate circuit calculations.
In most applications of digital logic circuits, the circuit function is either specified (0,1) or unspecified (don't-care) for every input condition. However, there are also applications where any one of a subset of functions is an acceptable solution, even though it is not possible to represent all the functions in terms of output don't-cares. In this case, we say that the function is flexible. Flexible functions were considered before in [1]. In this work, we propose a synthesis procedure for flexible functions based on functional blocks called comparison units [2]. The main differences between the proposed procedure and the procedures of [1] are the following. (1) We do not require a closed-form representation of all the flexibility that exists in specifying the function f. We only require that a procedure exist to check whether a given function belongs to the class of acceptable functions. (2) We use a specific architecture for the implementation of flexible functions. This architecture, based on comparison units [2], is particularly suitable for implementing flexible functions, since the correspondence between circuit size and certain properties of the implemented function is strong and easy to utilize for the minimization of the implementation. The proposed synthesis procedure starts from an acceptable function f' that may be used to implement f. It then modifies f' so as to change certain properties of f' that lead to smaller comparison-unit-based implementations. Before any modification of f' is accepted, a check is made to make sure that the modified function is an acceptable implementation of f. Modifications are made as long as it is possible to change the properties of f' that lead to a reduction in the implementation size. We also demonstrate that implementations using comparison units for conventional, non-flexible functions are an effective intermediate step for synthesis.
For this purpose, we apply the synthesis tool suite SIS from the University of California at Berkeley in two ways. (1) To a comparison unit implementation of a function, and (2) directly starting from the truth table of the function. In most cases, the area of the circuit derived from a comparison unit based implementation is smaller.
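The modify-check-accept loop described above can be sketched generically. This is a minimal illustration under stated assumptions, not the paper's procedure: the acceptability check, the cost function (a stand-in for comparison-unit size), and the toy truth tables are all hypothetical.

```python
def minimize_flexible(f, neighbours, is_acceptable, cost):
    """Greedy minimization of a flexible function: try local
    modifications of f', keep one only if the modified function is
    still an acceptable implementation AND cheaper; stop when no
    modification improves the cost."""
    improved = True
    while improved:
        improved = False
        for g in neighbours(f):
            if is_acceptable(g) and cost(g) < cost(f):
                f, improved = g, True
                break
    return f

# Toy instance: 2-input function as a truth-table tuple (inputs 00,01,10,11).
spec = (1, None, None, 0)             # None = any output acceptable there
def is_acceptable(g):
    return all(s is None or s == b for s, b in zip(spec, g))
def cost(g):                          # crude stand-in for implementation size
    return sum(g)
def neighbours(g):                    # flip one output bit at a time
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]

f0 = (1, 1, 1, 0)                     # an acceptable starting point f'
print(minimize_flexible(f0, neighbours, is_acceptable, cost))  # prints: (1, 0, 0, 0)
```

Each accepted move corresponds to exploiting the flexibility in the specification; the check guarantees the result still implements f.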
This paper introduces a denotational semantics of a behavioral subset of VHDL. This subset is restricted to basic data types only and does not allow for clauses in wait statements. We consider the full model of time and resolution, and we give a precise definition of the simulation mechanism. Easy translation rules from VHDL to Boyer-Moore logic can be derived from this semantics.
This paper presents a formal synthesis system which delegates design space exploration to non-formal, and potentially incorrect, high-level synthesis tools. With quadratic complexity, our system either obtains a truly correct-by-construction design, since the formal design process itself constitutes the verification process, or demonstrates that the solution found by the conventional tool was incorrect.
We present a global design-for-test methodology for testing a core-based system in its entirety. This is achieved by introducing a 'bypass' mode for each core, by which data can be transferred from a core input port to the output port without interfering with the core circuitry itself. The interconnections are thoroughly tested, since they are used to propagate test data (patterns or signatures) through the system. The system is modeled as a directed weighted graph in which core accessibility is solved as a shortest-path problem.
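The shortest-path formulation above maps directly onto a standard algorithm. The sketch below uses Dijkstra's algorithm on a hypothetical SoC graph (core names and bypass costs are illustrative, not from the paper) to find the cheapest route for test data to a buried core's output:

```python
import heapq

def cheapest_access_path(graph, src, dst):
    """Dijkstra on the weighted core-connectivity graph.

    graph: {node: [(neighbour, cost), ...]} where an edge means test data
    can be forwarded through a core's bypass mode at the given cost
    (e.g. bypass latency in clock cycles). Returns (total_cost, path)."""
    pq = [(0, src, [src])]
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

# Hypothetical SoC: chip inputs feed cores A and B; C is buried inside.
soc = {"in": [("A", 1), ("B", 2)], "A": [("C", 4)], "B": [("C", 1)],
       "C": [("out", 1)]}
print(cheapest_access_path(soc, "in", "out"))  # prints: (4, ['in', 'B', 'C', 'out'])
```

Solving this per core yields, for each core, the cheapest chain of bypass modes through which test patterns can be delivered and signatures retrieved.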
This paper presents a novel 1-out-of-n checker that, compared to the other implementations presented up to now, features the following advantages: i) it satisfies the TSC or SCD property with respect to all possible internal faults representative of realistic failures; ii) it presents a single output line; iii) it requires significantly lower area overhead.
We propose a non-scan design-for-testability (DFT) method to increase the testability of synchronous sequential circuits. Non-scan DFT allows at-speed testing, as opposed to scan or partial-scan based DFT that normally leads to low-speed testing and longer test application times due to scan operations. The proposed method is based on the identification of several types of restrictions imposed by the combinational logic of the circuit on the values that can be assigned to the next-state variables. These restrictions limit the set of states the circuit can reach, thus limiting the set of input patterns that can be applied to its combinational logic during normal operation. This in turn limits the fault coverage that can be achieved. The proposed DFT procedure is different from other non-scan based DFT procedures [1], [2] in that it relies on lines available locally to drive the inserted DFT logic, avoiding the routing of primary input lines to the flip-flops, and the routing of internal lines to the primary outputs. The proposed scheme uses the complement value of a next-state variable Y, or the value of an adjacent state variable, in order to change the value of Y and thus enrich the set of states that can be reached by the circuit. The proposed approach considers several special cases that result in unreachable states (or states that cannot be easily reached) to determine where the DFT logic will be placed. We consider cases where a next-state variable always (or almost always) carries a single value under a random sequence of input vectors, and cases where two next-state variables carry the same values, or complemented values. These cases have a drastic effect on the set of state variable patterns that can be applied to the combinational logic of the circuit in practical time, thus limiting its testability.
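The special cases named above (a next-state variable nearly stuck at one value; two variables carrying equal or complemented values) can be detected from simulation traces. The sketch below is a simplified illustration, not the paper's procedure; the variable names, traces, and bias threshold are hypothetical.

```python
from itertools import combinations

def dft_candidates(traces, bias=0.95):
    """Flag next-state variables that restrict the reachable state set.

    traces: {var: [0/1 values observed over a random input sequence]}.
    Returns (variables biased toward a single value, pairs of variables
    that are always equal or always complementary) -- the places where
    DFT logic insertion would enrich the reachable states."""
    n = len(next(iter(traces.values())))
    biased = [v for v, t in traces.items()
              if max(sum(t), n - sum(t)) / n >= bias]
    pairs = []
    for (a, ta), (b, tb) in combinations(traces.items(), 2):
        if ta == tb:
            pairs.append((a, b, "equal"))
        elif all(x != y for x, y in zip(ta, tb)):
            pairs.append((a, b, "complement"))
    return biased, pairs

traces = {"y1": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # stuck at 0 in practice
          "y2": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
          "y3": [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]}   # complement of y2
print(dft_candidates(traces))  # prints: (['y1'], [('y2', 'y3', 'complement')])
```

Variables flagged this way are the candidates where, per the abstract, locally available lines would be used to drive inserted DFT logic.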
We describe a fast (linear-time) procedure to optimally size transistors in a chain of multi-input gates/stages. The fast sizing is used in a simultaneous sizing and restructuring optimization procedure to accurately predict the relative optimal performance of alternative circuit structures for a given total area. The idea extends the concept of optimally sizing a buffer chain [5], and uses tapering constants based on the position of a stage in a circuit and the position of a transistor in a stack.
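The buffer-chain starting point [5] that the tapering constants generalize can be stated in a few lines. This sketch shows only the classic geometric tapering (the paper's extension to multi-input stages and transistor stacks is not reproduced here); the capacitance values are illustrative.

```python
def taper_sizes(c_in, c_load, n_stages):
    """Classic geometric tapering for a buffer chain: every stage is
    scaled by the same ratio r = (C_load / C_in)^(1/N), which equalizes
    the effort per stage and minimizes the total chain delay."""
    r = (c_load / c_in) ** (1.0 / n_stages)
    return [c_in * r ** i for i in range(1, n_stages + 1)]

sizes = taper_sizes(c_in=1.0, c_load=64.0, n_stages=3)
print(sizes)   # approximately [4.0, 16.0, 64.0] -- uniform stage ratio of 4
```

Position-dependent tapering constants, as in the abstract, replace the single ratio r with per-stage and per-transistor factors while keeping the same linear-time sweep over the chain.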
A new test technique for linear analog circuits which employs current injection as the input test stimulus is described. Our investigations have shown that the current transitions resulting from a current injected at internal test points differ significantly between the fault-free and faulty circuits. This can be used for fault detection. In fact, current injection as a test input stimulus represents a powerful alternative to test approaches based on conventional voltage input stimuli. The new approach improves the testability of various faults which are difficult to detect, or are untestable, when using voltage-based test stimuli. In addition, the technique has significant advantages for BIST purposes. The technique is illustrated by means of a modern opamp circuit, considering catastrophic and gate-oxide short (GOS) faults.