IP5 Interactive Presentations


Date: Thursday 17 March 2016
Time: 15:30 - 16:00
Location / Room: Conference Level, Foyer

Interactive Presentations run simultaneously during a 30-minute slot. A poster associated with each IP paper is on display throughout the afternoon. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session, prior to the actual Interactive Presentation. Moreover, one "Best Interactive Presentation Award" will be given.

Label / Presentation Title / Authors
IP5-1 RELIABILITY AND PERFORMANCE TRADE-OFFS FOR 3D NOC-ENABLED MULTICORE CHIPS
Speaker:
Partha Pande, Washington State University, US
Authors:
Sourav Das1, Janardhan Rao Doppa1, Partha Pande1 and Krishnendu Chakrabarty2
1Washington State University, US; 2Duke University, US
Abstract
Three-dimensional (3D) integration, a breakthrough technology to achieve "More Moore and More Than Moore," provides the benefits of better performance, lower power consumption, and increased bandwidth through the use of vertical interconnects and 3D stacking. The vertical interconnects enable the design of a high-bandwidth and energy-efficient small-world (SW) network-based 3D network-on-chip (3D SWNoC) for massive multicore platforms. However, the anticipated performance gain of a 3D SWNoC-enabled multicore chip may be compromised due to the potential failures of through-silicon vias (TSVs) that are predominantly used as vertical interconnects. In particular, due to the non-homogeneous traffic patterns, heavily used TSVs may wear out quickly and can also contribute to the wear-out of neighboring TSVs. As a result, the mean-time-to-failure (MTTF) of those TSVs will decrease, which will adversely affect the overall lifetime of the chip. In this paper, we address this traffic-dependent TSV wear-out problem in 3D SWNoC. We demonstrate that by employing an adaptive routing mechanism, we can improve the MTTF of 3D SWNoC significantly while still providing 21% lower energy-delay-product (EDP) compared to a conventional 3D MESH.

Download Paper (PDF; Only available from the DATE venue WiFi)
IP5-2 MEMORY-ACCESS AWARE DVFS FOR NETWORK-ON-CHIP IN CMPS
Speaker:
Yuan Yao, KTH Royal Institute of Technology, SE
Authors:
Yuan Yao and Zhonghai Lu, KTH Royal Institute of Technology, SE
Abstract
We present a new DVFS technique for network-on-chip (NoC) that adjusts the voltage/frequency scales of routers according to the memory-access characteristics of applications running on the CMP. The memory characteristics are periodically profiled, reflecting both resource-access density in the network and memory-access criticality for application performance. The network conducts per-router voltage/frequency tuning using the memory-access density information, while it performs priority-based switch allocation to speed up critical packets and avoid starvation using the memory-criticality information. Compared to a recent per-router DVFS approach, benchmark experiments demonstrate that our memory-access characteristics aware DVFS technique achieves not only better power savings and energy-delay product, but also enhanced network and application performance.

IP5-3 A DYNAMICALLY RECONFIGURABLE ECC DECODER ARCHITECTURE
Speaker:
Philippe Coussy, Universite Bretagne Sud / Lab-STICC, FR
Authors:
Awais Sani1, Philippe Coussy2 and Cyrille Chavet3
1Universite de Bretagne-Sud, FR; 2Universite de Bretagne-Sud / Lab-STICC, FR; 3Lab-STICC / Université de Bretagne Sud, FR
Abstract
Due to their impressive error correction performance, Error Correcting Codes (ECC) are now widely used in communication systems. In order to achieve high throughput requirements, ECC decoders are based on parallel architectures, which results in a major issue: memory access conflicts. In this paper, we introduce a new class of ECC decoder architectures that dynamically reconfigure by executing a memory mapping approach on-chip. For that purpose, a dedicated algorithm taking into account network constraints is presented. A smart architecture based on a butterfly network and a reconfiguration unit is also proposed. Experimental results show that real-time reconfiguration at reasonable hardware cost is possible.

IP5-4 RESISTIVE BLOOM FILTERS: FROM APPROXIMATE MEMBERSHIP TO APPROXIMATE COMPUTING WITH BOUNDED ERRORS
Speaker:
Abbas Rahimi, University of California, Berkeley, US
Authors:
Vahideh Akhlaghi1, Abbas Rahimi2 and Rajesh K. Gupta1
1University of California, San Diego, US; 2University of California, Berkeley, US
Abstract
Approximate computing provides an opportunity for exploiting application characteristics to trade accuracy for gains in energy efficiency. However, such an opportunity must be able to bound the error that the system designer provides to the application developer. Space-efficient probabilistic data structures such as the Bloom filter can provide one such means. A Bloom filter supports approximate set membership queries with a tunable rate of false positives (i.e., errors) and no false negatives. We propose a resistive Bloom filter (ReBF) to approximate a function by tightly integrating it with a functional unit (FU) implementing the function. ReBF approximately mimics partial functionality of the FU by recalling its frequent input patterns for computational reuse. The accuracy of the target FU is guaranteed by bounding the ReBF error behavior at design time. We further lower the energy consumption of an FU by designing its ReBF using low-power memristor arrays. The experimental results show that function approximation using ReBF for five image processing kernels running on the AMD Southern Islands GPU yields on average 24.1% energy saving in 45 nm technology compared to the exact computation.
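
As an illustration of the membership property the abstract relies on (possible false positives, no false negatives), here is a minimal software sketch of a conventional Bloom filter. It is not the resistive ReBF design; the class name, the SHA-256-based hashing, and the sizes used are assumptions chosen only for this example.

```python
import hashlib


class BloomFilter:
    """Conventional software Bloom filter (illustrative, not the ReBF hardware)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit position, kept simple for readability.
        self.bits = bytearray(num_bits)

    def _positions(self, item: bytes):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical input patterns, standing in for frequent FU operands.
bf = BloomFilter(num_bits=1024, num_hashes=3)
inserted = [f"pattern-{n}".encode() for n in range(50)]
for item in inserted:
    bf.add(item)

# Every inserted item is reported as possibly present (no false negatives).
no_false_negatives = all(bf.might_contain(item) for item in inserted)
```

Increasing `num_bits` relative to the number of inserted items lowers the false-positive rate, which is the tunable error the abstract refers to.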

IP5-5 REAL-TIME SYSTEM-LEVEL IMPLEMENTATION OF A TELEPRESENCE ROBOT USING AN EMBEDDED GPU PLATFORM
Speaker:
Swathi Gurumani, Advanced Digital Sciences Center, SG
Authors:
Muhammad Teguh Satria1, Swathi Gurumani1, Wang Zheng2, Keng Peng Tee2, Augustine Koh1, Pan Yu2, Kyle Rupnow1 and Deming Chen3
1Advanced Digital Sciences Center, SG; 2Institute for Infocomm Research, SG; 3UIUC, US
Abstract
Real-time applications such as telepresence systems present an opportunity to use embedded GPUs for compute acceleration to meet platform goals. In this paper, we develop a prototype of a portable, standalone telepresence robot that performs real-time attention-directed control using an NVIDIA Jetson TK1 embedded platform. We perform platform-specific optimizations to improve thread occupancy, optimize computation workload and improve accuracy of face detection on the embedded GPU and achieve real-time performance of 30 frames per second on the Jetson TK1 and an overall speedup of 10x compared to the ARM CPU version.

IP5-6 EXPLORING SPECIALIZED NEAR-MEMORY PROCESSING FOR DATA INTENSIVE OPERATIONS
Speaker:
Salessawi Ferede Yitbarek, University of Michigan, US
Authors:
Salessawi Ferede Yitbarek1, Tao Yang2, Reetuparna Das1 and Todd Austin1
1University of Michigan, US; 2University of California, San Diego, US
Abstract
Emerging 3D stacked memory systems provide significantly more bandwidth than current DDR modules. However, general purpose processors do not take full advantage of these resources offered by the memory modules. Taking advantage of the increased bandwidth requires the use of specialized processing units. In this paper, we evaluate the benefits of placing hardware accelerators at the bottom layer of a 3D stacked memory system compared to accelerators that are placed external to the memory stack. Our evaluation of the design using cycle-accurate simulation and RTL synthesis shows that, for important data intensive kernels, near-memory accelerators inside a single 3D memory package provide 3x-13x speedup over a Quad-core Xeon processor. Most of the benefits are from the application of accelerators, as the near-memory configurations provide marginal benefits compared to the same number of accelerators placed on a die external to the memory package. This comparable performance for external accelerators is due to the high bandwidth afforded by the high-speed off-chip links. On the other hand, near-memory accelerators consume 7%-39% less energy than the external accelerators.

IP5-7 MATLAB TO C COMPILATION TARGETING APPLICATION SPECIFIC INSTRUCTION SET PROCESSORS
Speaker:
Francky Catthoor, Interuniversity Microelectronics Centre (IMEC), BE
Authors:
Ioannis Latifis1, Karthick Parashar2, Grigoris Dimitroulakos1, Hans Cappelle2, Christakis Lezos1, Konstantinos Masselos1 and Francky Catthoor2
1University of Peloponnese, GR; 2Interuniversity Microelectronics Centre (IMEC), BE
Abstract
This paper discusses a MATLAB to C compiler exploiting custom instructions, such as instructions for SIMD processing and instructions for complex arithmetic, present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. By doing this, the generated code can be used as input to any C/C++ compiler. Thus the proposed compiler allows the description of the specialized instruction set of the target processor in a parameterized way, allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speedup of 2x-30x on the targeted ASIP for six DSP benchmarks compared to the code generated by the MathWorks MATLAB to C compiler. Thus the proposed compiler can be employed to reduce the development time/effort/cost and time to market by raising the abstraction of application design in an embedded systems / system-on-chip development context while still improving implementation efficiency.

IP5-8 SAMPLING-BASED BUFFER INSERTION FOR POST-SILICON YIELD IMPROVEMENT UNDER PROCESS VARIABILITY
Speaker:
Grace Li Zhang, Technische Universität München (TUM), DE
Authors:
Grace Li Zhang, Bing Li and Ulf Schlichtmann, Technische Universität München (TUM), DE
Abstract
At submicron manufacturing technology nodes process variations affect circuit performance significantly. This trend leads to a large timing margin and thus overdesign to maintain yield. To combat this pessimism, post-silicon clock tuning buffers can be inserted into circuits to balance timing budgets of critical paths with their neighbors. After manufacturing, these clock buffers can be configured for each chip individually so that chips with timing failures may be rescued to improve yield. In this paper, we propose a sampling-based method to determine the proper locations of these buffers. The goal of this buffer insertion is to reduce the number of buffers and their ranges, while still maintaining a good yield improvement. Experimental results demonstrate that our algorithm can achieve a significant yield improvement (up to 35%) with only a small number of buffers.

IP5-9 PRADA: COMBATING VOLTAGE NOISE IN THE NOC POWER SUPPLY THROUGH FLOW-CONTROL AND ROUTING ALGORITHMS
Speaker:
Prabal Basu, Utah State University, US
Authors:
Prabal Basu, Rajesh JayashankaraShridevi, Koushik Chakraborty and Sanghamitra Roy, Utah State University, US
Abstract
Network-on-Chip (NoC) has become the de-facto standard for on-chip communication in MPSoCs. The growing NoC power footprint, the increase in transistor current, and the high switching speed of the logic devices exacerbate the peak power supply noise (PSN) in the NoC power delivery network (PDN). Hence, preserving power supply integrity in the NoC PDN is critical. In this work, we propose PRADA (PSN-aware Runtime Adaptation), a collection of a novel flow-control protocol (PAF) and an adaptive routing algorithm (PAR), to mitigate PSN in NoCs. Our best scheme achieves 14% and 12% improvements in the regional peak PSN and energy efficiency, with an average of 4.6% performance overhead and marginal area and power footprints.

IP5-10 A POWER-EFFICIENT 3-D ON-CHIP INTERCONNECT FOR MULTI-CORE ACCELERATORS WITH STACKED L2 CACHE
Speaker:
Kyungsu Kang, Samsung, KR
Authors:
Kyungsu Kang1, Luca Benini2, Giovanni De Micheli3, Sangho Park1 and Jong-Bae Lee1
1Samsung, KR; 2Università di Bologna, IT; 3École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
The use of multi-core clusters is a promising option for data-intensive embedded applications such as multimodal sensor fusion, image understanding, and mobile augmented reality. In this paper, we propose a power-efficient 3-D on-chip interconnect for multi-core clusters with stacked L2 cache memory. A new switch design makes a circuit-switched Mesh-of-Tree (MoT) interconnect reconfigurable to support power-gating of processing cores, memory blocks, and unnecessary interconnect resources (routing switch, arbitration switch, inverters placed along the on-chip wires). The proposed 3-D MoT improves the power efficiency by up to 77% in terms of energy-delay product (EDP).

IP5-11 POWER-EFFICIENT LOAD-BALANCING ON HETEROGENEOUS COMPUTING PLATFORMS
Speaker:
Muhammad Shafique, Karlsruhe Institute of Technology (KIT), DE
Authors:
Muhammad Usman Karim Khan1, Muhammad Shafique1, Apratim Gupta2, Thomas Schumann2 and Jörg Henkel1
1Karlsruhe Institute of Technology (KIT), DE; 2University of Applied Sciences, Darmstadt, DE
Abstract
In order to meet the throughput constraints of the system at minimal power consumption, the workload of computing nodes should be balanced. This requires accounting for the underlying hardware characteristics (e.g., power vs. frequency profiles) and the throughput sustainable by these nodes. This work provides a workload distribution and balancing methodology for a divisible load under a throughput constraint on heterogeneous nodes. The power efficiency of each node is considered during load distribution. For load balancing, the frequency of each node is set to the value that just fulfills the job requirements of that node. We functionally verify our methodology by implementing it on an FPGA-based system, with heterogeneous multi-cores and hardware accelerators, and report results for different image processing benchmarks. Compared to a state-of-the-art approach, our approach results in up to 64% performance improvement for the benchmarks evaluated in this paper.

IP5-12 TOPAZ: MINING HIGH-LEVEL SAFETY PROPERTIES FROM LOGIC SIMULATION TRACES
Speaker:
Fadi Kurdahi, University of California, Irvine, US
Authors:
Ahmed Nassar1, Fadi Kurdahi1 and Salam Zantout2
1University of California, Irvine, US; 2American University of Beirut, LB
Abstract
Formal specifications are hard to formulate and maintain for evolving complex digital hardware designs. Specification mining offers a (partially) automated route to discovering specifications from large simulation traces. In this paper, we embark on a novel and rigorous mining methodology (data preparation, mining algorithms, selection criteria, etc.) for finite-state automata checkers using an iterative and interactive mining tool, called Topaz. Topaz is evaluated using an open-source 32-bit RISC CPU design as a case study to demonstrate extraction of complex temporal properties cross-cutting through all CPU pipeline stages, guided by the CPU instruction set specification.

IP5-13 EXPLOITING TRANSACTION LEVEL MODELS FOR OBSERVABILITY-AWARE POST-SILICON TEST GENERATION
Speaker:
Prabhat Mishra, University of Florida, US
Authors:
Farimah Farahmandi1, Prabhat Mishra1 and Sandip Ray2
1University of Florida, US; 2Intel Corporation, US
Abstract
A critical problem in post-silicon debug is to generate efficient tests that both activate requisite coverage goals on the target hardware as well as produce results that are observable through a given on-chip design-for-debug architecture. Unfortunately, such tests cannot be generated directly from RTL models, both due to design complexity and due to bugs in the design itself. In this paper, we propose an approach to address this problem by exploiting transaction-level models (TLM). Our approach involves mapping test and observability requirements between TLM and RTL, enabling TLM analysis to generate post-silicon tests. We provide case studies from a number of different design classes to demonstrate the flexibility and effectiveness of the approach.

IP5-14 SEERAD: A HIGH SPEED YET ENERGY-EFFICIENT ROUNDING-BASED APPROXIMATE DIVIDER
Speaker:
Ali Afzali-Kusha, University of Tehran, IR
Authors:
Reza Zendegani1, Mehdi Kamal1, Arash Fayyazi1, Ali Afzali-Kusha1, Saeed Safari1 and Massoud Pedram2
1University of Tehran, IR; 2University of Southern California, US
Abstract
In this paper, a high speed yet energy-efficient approximate divider for error resilient applications is proposed. For the division operation, the divisor is rounded to a value with a specific form, which transforms the division operation into a multiplication. The proposed approximate divider offers the flexibility of increasing the accuracy at the price of higher delay and hardware usage. The efficacy of the proposed approximate divider is evaluated in comparison to three different implementations of the SRT divider. The results show that the delay and energy consumption of the proposed approximate divider are, on average, 14 and 300 times smaller than those of the Radix-2 SRT with the carry-save remainder computation. Additionally, the effectiveness of the proposed approximate divider is studied in an image division operation performed in image processing applications. The results suggest the appropriateness of the proposed approximate divider for digital signal processing applications.
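
The general idea of trading exactness for a cheaper operation by rounding the divisor can be sketched as follows. The abstract does not state the specific rounded form SEERAD uses, so this example assumes the simplest stand-in, rounding the divisor to the nearest power of two so that the division becomes a right shift. The function names are hypothetical and this is not the SEERAD algorithm itself.

```python
def round_to_power_of_two_exponent(divisor: int) -> int:
    """Return k such that 2**k is the power of two nearest to divisor.

    Ties are rounded up. This rounding rule is an assumption made for
    illustration; it is not taken from the paper.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    k = divisor.bit_length() - 1          # 2**k <= divisor < 2**(k + 1)
    lower, upper = 1 << k, 1 << (k + 1)
    return k + 1 if (upper - divisor) <= (divisor - lower) else k


def approx_divide(dividend: int, divisor: int) -> int:
    """Approximate dividend // divisor by shifting instead of dividing."""
    return dividend >> round_to_power_of_two_exponent(divisor)


# Exact when the divisor is already a power of two: 100 // 8 == 12.
exact_case = approx_divide(100, 8)

# Approximate otherwise: 100 // 7 == 14, but 7 is rounded to 8.
rounded_case = approx_divide(100, 7)
```

With a single power-of-two candidate the error can be large (for example, a divisor of 5 is rounded to 4); allowing a finer set of rounded forms is the kind of accuracy versus delay and hardware trade-off the abstract mentions.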

IP5-15 IMPROVING PERFORMANCE GUARANTEES IN WORMHOLE MESH NOC DESIGNS
Speaker:
Milos Panic, Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES
Authors:
Milos Panic1, Carles Hernandez2, Jaume Abella2, Antoni Roca Perez3, Eduardo Quinones2 and Francisco Cazorla4
1Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 2Barcelona Supercomputing Center, ES; 3Universitat Politècnica de Catalunya, ES; 4Barcelona Supercomputing Center and IIIA-CSIC, ES
Abstract
Wormhole-based mesh Networks-on-Chip (wNoC) are deployed in high-performance many-core processors due to their physical scalability and low cost. Delivering tight and time-composable Worst-Case Execution Time (WCET) estimates for applications, as needed in safety-critical real-time embedded systems, is challenged by wNoCs due to their distributed nature. We propose a bandwidth control mechanism for wNoCs that enables the computation of tight time-composable WCET estimates with low average performance degradation and high scalability. Our evaluation with the EEMBC automotive suite and an industrial real-time parallel avionics application confirms this.

IP5-16 A DATA LAYOUT TRANSFORMATION (DLT) ACCELERATOR: ARCHITECTURAL SUPPORT FOR DATA MOVEMENT OPTIMIZATION IN ACCELERATED-CENTRIC HETEROGENEOUS SYSTEMS
Speaker:
Tung Hoang, University of Chicago, US
Authors:
Tung Hoang, Amirali Shambayati and Andrew A. Chien, University of Chicago, US
Abstract
Technology scaling and the growing use of accelerators make optimization of data movement increasingly important in all computing systems. Further, growing diversity in memory structures makes embedding such optimization in software non-portable. We propose a novel architectural solution called Data Layout Transformation (DLT), associated with a simple set of instructions that enable software to describe the required data movement compactly and free the implementation to optimize the movement based on knowledge of the memory hierarchy and system structure. The DLT architecture ideas are applicable to both general-purpose and accelerator-based heterogeneous systems. Experimental results show that the proposed DLT architecture can make use of the full bandwidth (>97%) of a wide range of memory systems (DDR3 and HMC), while its implementation cost in 32 nm is low (only 0.246 mm² and 75 mW at 1 GHz). Our evaluation of the DLT accelerator in an accelerator-based heterogeneous system across DDR3 and HMC memory shows that the DLT can enhance system performance by 4.6x-99x (DDR3) and 4.4x-115x (HMC), which translates to a 2.8x-48x (DDR3) and 1.4x-39x (HMC) improvement in energy efficiency.

IP5-17 OUESSANT: FLEXIBLE INTEGRATION OF DEDICATED COPROCESSORS IN SYSTEMS ON CHIP
Speaker:
Pierre-Henri Horrein, Lab-STICC/Télécom Bretagne, FR
Authors:
Pierre-Henri Horrein, Philip-Dylan Gleonec, Erwan Libessart, André Lalevée and Matthieu Arzel, Lab-STICC/Télécom Bretagne, FR
Abstract
Integration of hardware accelerators in Systems on Chip is often complex. When dealing with reconfigurable hardware, this greatly limits the attainable flexibility. In this paper, we propose an alternative approach to the Molen paradigm [1]. This approach, named Ouessant, is based on a very simple general purpose instruction set designed for close interaction with dedicated hardware accelerators. This instruction set is used to program a dedicated controller, which commands the accelerator's execution and data transfer with minimal CPU intervention. The resulting architecture is flexible, extensible, and can be easily integrated in Systems on Chip. Adding new accelerators is also made easier. Implementation of the architecture on different FPGA resources shows a very low footprint and a very small impact on attainable performance. Ouessant is freely available under an open-source license.

IP5-18 A NOVEL BACKGROUND SUBTRACTION SCHEME FOR IN-CAMERA ACCELERATION IN THERMAL IMAGERY
Speaker:
Konstantinos Makantasis, Institute of Communication and Computer Systems, GR
Authors:
Antonis Nikitakis1, Ioannis Papaefstathiou2, Konstantinos Makantasis3 and Anastasios Doulamis4
1Technical University of Crete, GR; 2Synelixis Solutions Ltd, GR; 3Institute of Communication and Computer Systems, GR; 4National Technical University of Athens, GR
Abstract
Real-time segmentation of moving regions in image sequences is a very important task in numerous surveillance and monitoring applications. A common approach for such tasks is the "background subtraction" which tries to extract regions of interest from the image background for further processing or action; as a result its accuracy as well as its real-time performance is of great significance. In this work we utilize a novel scheme, designed and optimized for FPGA-based implementations, which models the intensities of each pixel as a mixture of Gaussian components; following a Bayesian approach, our method automatically estimates the number of Gaussian components as well as their parameters. Our novel system is based on an efficient and highly accurate on-line updating mechanism, which permits our system to be automatically adapted to dynamically changing operation conditions, while it avoids over/under fitting. We also present two reference implementations of our Background Subtraction Parallel System (BSPS) in Reconfigurable Hardware achieving both high performance as well as low power consumption; the presented FPGA-based systems significantly outperform a multi-core ARM and two multi-core low power Intel CPUs in terms of energy consumed per processed pixel as well as frames per second. Moreover, our low-cost, low-power devices allow for the implementation, for the first time, of a highly distributed surveillance system which will alleviate the main problems of the existing centralized approaches.

IP5-19 RADIATION-HARDENED DSP CONFIGURATIONS FOR IMPLEMENTING ARITHMETIC FUNCTIONS ON FPGA
Speaker:
Felipe Serrano, Universidad Complutense de Madrid, ES
Authors:
Marcos Sanchez-Elez, Inmaculada Pardines, Felipe Serrano and Hortensia Mecha, Universidad Complutense de Madrid, ES
Abstract
This paper presents a study of different implementations of arithmetic operations on FPGAs. Radiation vulnerability has been analyzed for each implementation using the fault injection platform NESSY. Results in terms of area, delay and reliability are presented. Taking into account the performed tests we propose to build a library of HDL templates. This library is used during the design process with a synthesis tool that implements digital circuits as reliable as possible. Experimental results show that those implementations using DSP slices are the ones which achieve better results.

IP5-20 CONFIGURATION PREFETCHING AND REUSE FOR PREEMPTIVE HARDWARE MULTITASKING ON PARTIALLY RECONFIGURABLE FPGAS
Speaker:
Ann Gordon-Ross, University of Florida, US
Authors:
Aurelio Morales-Villanueva, Rohit Kumar and Ann Gordon-Ross, University of Florida, US
Abstract
Partially reconfigurable (PR) FPGAs enable preemptive hardware (HW) multitasking using PR regions (PRRs). To enable this multitasking, the HW task's partial bitstream is downloaded to only the task's PRR, and only that PRR is reconfigured. Since only a small portion of the FPGA fabric is reconfigured, reconfiguration time is significantly reduced compared to reconfiguring the entire fabric; however, this time is not negligible. Reconfiguration time can be reduced/hidden using two techniques: configuration prefetching and configuration reuse. Even though these techniques can effectively reduce/hide reconfiguration overhead, prior works in preemptive HW multitasking did not use these techniques. To the best of our knowledge, no prior work evaluated physical implementations of these techniques on PR FPGAs, which precludes consideration of physical-implementation-specific details, such as delays in accessing bitstreams, speed limitations during reconfiguration, etc. In this work, we present a novel implementation of configuration prefetching and reuse for preemptive HW multitasking on a Virtex-5 FPGA; however, our established fundamentals are device-family independent.

IP5-21 ANALOG CIRCUIT TOPOLOGICAL FEATURE EXTRACTION WITH UNSUPERVISED LEARNING OF NEW SUB-STRUCTURES
Speaker:
Alex Doboli, Stony Brook University, US
Authors:
Hao Li, Fanshu Jiao and Alex Doboli, Stony Brook University, US
Abstract
This paper presents novel techniques to automatically extract the topological (structural) features in analog circuits. The extracted features include basic building blocks, structural templates and hierarchical structures. Finding structural features is important for tasks like circuit synthesis and sizing, design verification, design reuse, and design knowledge description, summarization and management. The paper presents algorithms for supervised feature extraction and unsupervised learning of new block connections. Experiments discuss feature extraction for a set of 34 state-of-the-art analog circuits.

IP5-22 DESIGN AUTOMATION TASKS SCHEDULING FOR ENHANCED PARALLEL EXECUTION OF A STATE-OF-THE-ART LAYOUT-AWARE SIZING APPROACH
Speaker:
Nuno Horta, Instituto de Telecomunicações/Instituto Superior Técnico, PT
Authors:
David Neves, Ricardo Martins, Nuno Lourenço and Nuno Horta, Instituto de Telecomunicações/Instituto Superior Técnico, PT
Abstract
This paper presents an innovative methodology to efficiently schedule design automation tasks during the execution of an analog IC layout-aware sizing process. The synthesis process includes several sub-tasks such as DC simulation, floorplanning, placement, global routing, parasitic extraction, and circuit simulations in multiple worst case corners. The schedule of the design tasks is optimized taking into account standard multi-core architectures, task dependencies, accurate time estimations for each task and a limited number of licenses for using commercial tools, e.g., the number of simulator licenses. The proposed methodology first uses a directed acyclic graph to represent the design flow and task dependencies; an evolutionary kernel is then used to implement a single-objective multi-constraint optimization. The efficiency and impact of the proposed approach are validated by using a state-of-the-art analog IC design automation environment.
