UPCE Digital Library
 

Faculty of Electrical Engineering and Informatics

Permanent URI for this community: https://hdl.handle.net/10195/3847

Theses defended before 2008 are stored only in the collection Vysokoškolské kvalifikační práce (University Qualification Theses)

Search Results

Now showing 1 - 10 of 10
  • Article | peer-reviewed | postprint | Restricted access
    An Optimized Low-Power VLSI Architecture for ECG/VCG Data Compression for IoHT Wearable Device Application
    (IEEE (Institute of Electrical and Electronics Engineers), 2023) Janveja, Meenali; Sharma, Ashwani Kumar; Bhardwaj, Abhyuday; Pidanič, Jan; Trivedi, Gaurav
    Continuous monitoring of the electrical activity of heart signals using wearable Internet of Healthcare Things (IoHT) devices plays a crucial role in decreasing mortality rates. However, this continuous monitoring using an electrocardiogram (ECG) or vectorcardiogram (VCG) generates a huge amount of clinical data. Moreover, these devices are constrained in terms of on-chip storage, data transmission capacity, and power. Thus, handling a large amount of data is difficult with these devices, making it necessary to compress these data for storage and transmission. Lossless or near-lossless data compression solves this problem, ensuring that no relevant physiological/clinical information is lost in the compression process. Therefore, low-power, resource-efficient, and lossless VLSI architectures are proposed in this article to compress multichannel ECG/VCG data. The designs are tested using the PTB database for both ECG and VCG data and can achieve compression ratios (CRs) of 3.857 and 4.45 with minimal power and area requirements, making them suitable for low-power wearable healthcare devices.
  • Conference object | peer-reviewed | postprint | Restricted access
    IndiRA: Design and Implementation of a Pipelined RISC-V Processor
    (IEEE (Institute of Electrical and Electronics Engineers), 2023) Tiwari, Ankita; Guha, Prithwijit; Trivedi, Gaurav; Gupta, Nitesh; Jayaraj, Navneeth; Pidanič, Jan
    The development of Machine Learning and IoT technology requires fast processing. RISC-V is an open-source reduced instruction set-based instruction set architecture, and the processor based on this architecture can be modified accordingly. The base integer instruction extension supports the operating system environment and is also suitable for embedded systems. It is a 32-bit instruction extension and is defined as RV32I. In this paper, we propose a 32-bit integer instruction-based RISC-V processor core. The proposed core has a five-stage pipeline, including the optimized arithmetic and logic unit. The instruction fetch stage is merged with the pre-fetch stage dynamic branch prediction into a two-stage pipeline. The processor is implemented using Verilog HDL, and the resource utilization is verified for FPGA. The results show that the proposed module performs 30% better than the best-performing processor (considering operating frequency) and showed a 17.6% improvement in the proposed core.
  • Article | peer-reviewed | postprint | Restricted access
    Design of DNN-Based Low-Power VLSI Architecture to Classify Atrial Fibrillation for Wearable Devices
    (IEEE (Institute of Electrical and Electronics Engineers), 2023) Parmar, Rushik; Janveja, Meenali; Pidanič, Jan; Trivedi, Gaurav
    Atrial fibrillation (AF) is a recurrent and life-threatening disease leading to rapid growth in the mortality rate due to cardiac abnormalities. It is challenging to manually diagnose AF using electrocardiogram (ECG) signals due to complex and varied changes in its characteristics. In this article, for the first time, an end-to-end edge-enabled machine learning based VLSI architecture is proposed to classify ECG excerpts having AF from normal beats. Over the decades, researchers have found that abnormal atrial activity is confined to the low-frequency range. Therefore, in the proposed work, this frequency band is directly analyzed for AF detection, which has not previously been discussed. The proposed architecture is implemented using 180-nm bulk CMOS technology, consumes 11.098 µW at 25 kHz, and exhibits an accuracy of 92.37% for class-oriented classification and 81.60% for subject-oriented classification. The low-power realization of the proposed design, as compared to the state-of-the-art methods, makes it suitable for use in wearable devices.
  • Conference object | peer-reviewed | postprint (accepted version) | Open access
    Comparison of Floating-point Representations for the Efficient Implementation of Machine Learning Algorithms
    (IEEE, 2022) Mishra, Saras Mani; Tiwari, Ankita; Shekhawat, Hanumant Singh; Guha, Prithwijit; Trivedi, Gaurav; Pidanič, Jan; Němec, Zdeněk
    Smart systems are enabled by artificial intelligence (AI), which is realized using machine learning (ML) techniques. ML algorithms are implemented in hardware using fixed-point, integer, and floating-point representations. The performance of a hardware implementation is affected by very small or very large values because of the limited word size. To overcome this limitation, various floating-point representations are employed, such as IEEE 754, posit, and bfloat16. Moreover, for the efficient implementation of ML algorithms, one of the most intuitive solutions is to use a suitable number system. Multiply and add (MAC), divider, and square root units are the most common building blocks of various ML algorithms. Therefore, in this paper, we present a comparative study of hardware implementations of these units based on bfloat16 and posit number representations. It is observed that posit-based implementations perform 1.50x better in terms of accuracy, but consume 1.51x more hardware resources as compared to bfloat16-based realizations. Thus, as per the trade-off between accuracy and resource utilization, it can be stated that the bfloat16 number representation may be preferred over other existing number representations in the hardware implementations of ML algorithms.
  • Conference object | peer-reviewed | postprint (accepted version) | Open access
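To make the bfloat16 format discussed in this abstract concrete, here is a minimal sketch of converting a float32 value to a 16-bit bfloat16 pattern and back. It assumes the usual bfloat16 layout (1 sign bit, 8 exponent bits, 7 fraction bits, i.e. the upper half of an IEEE 754 single) and round-to-nearest-even. The function names are hypothetical and this is not the hardware design evaluated in the paper.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round to nearest even).

    Assumption: bfloat16 is the top 16 bits of the IEEE 754 float32 encoding.
    """
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    # NaN: keep it a NaN instead of letting rounding carry into the exponent.
    if (bits32 & 0x7F800000) == 0x7F800000 and (bits32 & 0x007FFFFF):
        return ((bits32 >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, then truncate: nearest-even rounding.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern to a Python float by zero-padding."""
    return struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    import math

    b = float_to_bfloat16_bits(math.pi)
    print(hex(b), bfloat16_bits_to_float(b))
```

Because only 7 fraction bits survive, the round trip keeps roughly two to three significant decimal digits while preserving the full float32 exponent range.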
    Design and Implementation of a Low Power Area Efficient Bfloat16 based CORDIC Processor
    (IEEE, 2022) Mishra, Saras Mani; Shekhawat, Hanumant Singh; Trivedi, Gaurav; Pidanič, Jan; Němec, Zdeněk
    The Coordinate Rotation Digital Computer (CORDIC) algorithm has a great advantage in hardware-based implementation because of its simple architecture. It employs a shifter and an adder for hardware implementation. The major issue with the CORDIC algorithm is the linear dependence of convergence on the number of iterations. Each iteration performs shift and addition or subtraction operations, so there is a trade-off between area and delay. The floating-point representation of angles would also increase the area and power. The main aim of this work is to implement a low-power and area-efficient bfloat16-based CORDIC algorithm. The proposed hardware module consumes 3.2x less area and 3.38x less power compared to a single-precision floating-point based CORDIC implementation. The result of the proposed module has been verified on a Zynq evaluation FPGA board.
  • Conference object | peer-reviewed | postprint (accepted version) | Open access
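The shift-and-add iteration this abstract refers to can be sketched as textbook rotation-mode CORDIC. The sketch below uses plain fixed-point integers rather than bfloat16, so it illustrates the general algorithm and not the paper's module. The function name, the 16 iterations, and the 16 fractional bits are assumptions chosen for illustration.

```python
import math


def cordic_sin_cos(angle_rad: float, iterations: int = 16, frac_bits: int = 16):
    """Rotation-mode CORDIC for |angle| <= pi/2 using integer shifts and adds.

    Returns (sin, cos) as floats. Fixed-point format is an assumption.
    """
    scale = 1 << frac_bits
    # arctan(2^-i) table in fixed point (a small ROM in hardware).
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # CORDIC gain compensation: start x at K = prod 1/sqrt(1 + 2^-2i).
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y = round(gain * scale), 0
    z = round(angle_rad * scale)  # residual angle still to be rotated
    for i in range(iterations):
        if z >= 0:
            # Rotate counter-clockwise by arctan(2^-i).
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            # Rotate clockwise by arctan(2^-i).
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return y / scale, x / scale


if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, math.sin(0.5), c, math.cos(0.5))
```

Each iteration adds roughly one bit of angular precision, which is the linear dependence of convergence on the iteration count that the abstract mentions.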
    A Scalable and Adaptive Convolutional Neural Network Accelerator
    (IEEE, 2022) Pidanič, Jan; Vyas, Arpan; Karki, Rishav; Vij, Prateek; Trivedi, Gaurav; Němec, Zdeněk
    Machine learning has become ubiquitous and has penetrated every field of technology, medicine, and finance. Convolutional Neural Networks (CNNs) are one of the most commonly used classes of machine learning algorithms, used in video and image processing, big data processing, natural language processing, robotics, and a variety of pattern matching and recognition tasks. Depending on the end application, CNNs are being employed on different scales ranging from tiny motion sensors and smartphones to automobiles and server farms. Although existing CNN accelerators are adaptive for different types of CNN models, they are generally suited for a particular scale of operation. In this paper, we describe a scalable and adaptive CNN accelerator. The same hardware-cum-software stack can be configured by a system-level parameter to be synthesized for different scales of operation. This makes the accelerator highly portable across systems of different scales. Furthermore, one single synthesized hardware can run inference for multiple CNN models because of the flexible software stack and hardware control unit, making the system highly adaptive. We demonstrate the working of the system at different scales by implementing it on the Xilinx Virtex 7 FPGA and by running multiple CNN models at each scale.
  • Conference object | peer-reviewed | postprint (accepted version) | Open access
    An Area and Power Efficient VLSI Architecture to Detect Obstructive Sleep Apnea for Wearable Devices
    (IEEE, 2022) Parmar, Rushik; Janveja, Meenali; Trivedi, Gaurav; Pidanič, Jan; Němec, Zdeněk
    Sleep disorders are a common detrimental health condition that reduces quality of life. Among them, Obstructive Sleep Apnea (OSA) is one of the most common. OSA is characterized by a reduction or cessation of airflow during sleep. However, due to the expensive and cumbersome detection process, only 10% of OSA cases are actually diagnosed in the real world. To overcome this challenge, an area and power efficient VLSI architecture for non-invasive detection of OSA, using features of the ECG signal and support vector machines (SVM), is proposed in this manuscript. The proposed classifier achieves an accuracy of 84.60% and sensitivity and specificity of 83.85% and 85.58%, respectively. The design is further synthesised using 180 nm bulk CMOS technology, consuming 0.46 µW of power at 1 kHz and occupying an area of 0.429 mm². The low-power implementation of the proposed design makes it suitable for preventive health wearable devices.
  • Conference object | peer-reviewed | postprint (accepted version) | Open access
    An Energy Efficient and Resource Optimal VLSI Architecture for ECG Feature Extraction for Wearable Healthcare Applications
    (IEEE, 2022) Janveja, Meenali; Parmar, Rushik; Trivedi, Gaurav; Pidanič, Jan; Němec, Zdeněk
    The paper presents a low complexity algorithm for extracting features of an electrocardiogram (QRS complex, P wave and T wave). A low power and resource optimal architecture is designed to implement this algorithm efficiently. The algorithm's parameters are chosen appropriately to avoid any floating-point compute-intensive arithmetic operation, enabling us to implement it using comparators, shifters and adders only, which leads to efficient hardware resource utilization. The proposed architecture employs techniques such as clock gating to optimize power consumption. The modules delineating different peaks and boundaries of the ECG signal can be turned off when they are not operational or as per the medical requirements. The algorithm and architecture proposed in this paper are validated using the MIT-BIH and QT databases from Physionet. The proposed algorithm is implemented using the Virtex-7 FPGA platform with average resource utilization of 0.42%, which is the least compared to other methods. The implementation is synthesized using 180 nm CMOS technology. The proposed design consumes 7.38 µW of power and 7.38 pJ of energy at an operating frequency of 1 MHz at 1.98 V. The energy consumption of the proposed architecture is reduced by a factor of 1.28 compared to other known methods due to minimal utilization of flip-flops and LUTs. Therefore, our architecture can be efficiently deployed in low power and resource-constrained wearable healthcare applications.
  • Conference object | peer-reviewed | postprint (accepted version) | Open access
    Design of a Low Power and Area Efficient Bfloat16 based Generalized Systolic Array for DNN Applications
    (IEEE, 2022) Tiwari, Ankita; Mishra, Saras Mani; Guha, Prithwijit; Pidanič, Jan; Němec, Zdeněk; Trivedi, Gaurav
    Nowadays, demand for artificial intelligence (AI) enabled mobile platforms is increasing. From healthcare services to defense and from remote to urban areas, there is a huge demand for secure and power-efficient devices. The performance of these platforms can be enhanced by providing an efficient compute engine. These compute engines perform a huge number of matrix operations. The most popular choice for large matrix computation is a systolic array. In general, systolic array performance degrades for large input matrices, due to the trade-off between resource utilization and computation delay. To address this issue, we need a systolic array with a control unit to reconfigure the array according to the requirements of the computation. The computation array can be further improved by handling the negative weights and reducing the MAC operations. In this paper, we propose a generalized bfloat16-based systolic array in which the sign of the partial sum (PS) is predicted before computation. The PS sign aids in network pruning, which enhances system performance. The proposed system is implemented on a Virtex-7 FPGA board and performs 2.21x and 4.19x better in terms of area and power compared to a single-precision based systolic array.
  • Article | peer-reviewed | postprint (accepted) | Open access
    Tensor Based Multivariate Polynomial Modulo Multiplier for Cryptographic Applications
    (2022) Paul, Bikram; Nath, Angana; Krishnaswamy, Srinivasan; Pidanič, Jan; Němec, Zdeněk; Trivedi, Gaurav
    Modulo polynomial multiplication is an essential mathematical operation in the area of finite field arithmetic. Polynomial functions can be represented as tensors, which can be utilized as basic building blocks for various lattice-based post-quantum cryptography schemes. This paper presents a novel tensor-based modulo multiplication method for multivariate polynomials over GF(2^m), realized on a hardware platform (FPGA). The proposed method consumes 6.5× less power and achieves more than 6× speedup compared to other contemporary single variable polynomial multiplication implementations. Our method is embarrassingly parallel and easily scalable for multivariate polynomials. Polynomial functions of nine variables, where each variable is of degree 128, are tested with the proposed multiplier, and its corresponding area, power, and power-delay-area product (PDAP) are presented. The computational complexities of single variable and multivariate polynomial multiplications are O(n) and O(n^p), respectively, where n is the maximum degree of a polynomial having p variables. Due to its high speed, low latency, and scalability, the proposed modulo multiplier can be used in a wide range of applications.
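For readers unfamiliar with the underlying operation, the following is a minimal sketch of single-variable modulo polynomial multiplication in a binary field, using the textbook shift-and-XOR method. It is not the tensor-based method of the paper. The function name is hypothetical, and the example field GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x + 1 (the one used by AES) is an assumption chosen only because its results are easy to check.

```python
def gf2m_mul(a: int, b: int, modulus: int, m: int) -> int:
    """Multiply two GF(2^m) elements given as bit vectors of coefficients.

    Bit i of an operand is the coefficient of x^i. `modulus` is the
    irreducible polynomial of degree m, including its x^m term.
    Both operands are assumed to be already reduced (less than 2^m).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= modulus         # reduce when degree reaches m
    return result


if __name__ == "__main__":
    AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1
    print(hex(gf2m_mul(0x57, 0x83, AES_POLY, 8)))
```

The loop runs once per coefficient of `b`, with one conditional XOR and one shift per step, which is why this operation maps naturally onto simple hardware.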