Fakulta elektrotechniky a informatiky / Faculty of Electrical Engineering and Informatics
Permanent URI for this community: https://hdl.handle.net/10195/3847
Theses defended before 2008 are stored only in the collection Vysokoškolské kvalifikační práce (University Theses).
10 results
Search Results
Conference object, peer-reviewed, postprint, Restricted access
Passage Detection of a Train via a Reference Point (Springer, 2023)
Rejfek, Luboš; Pidanič, Jan; Štursa, Dominik; Nguyen, Tan N.; Tran, Phuong T.; Němec, Zdeněk; Zálabský, Tomáš
A reference point detection system for position validation of a mobile object was developed for verification of experiments. Detection is based on a classic image processing algorithm and on a processing algorithm using neural networks, and the two approaches are compared. The high-precision concept of the system is based on a camera sensor and automatic processing of video frames for position evaluation. The designed system was tested in a real application, proving correct operation.

Conference object, peer-reviewed, postprint (accepted version), Open access
Comparison of Floating-point Representations for the Efficient Implementation of Machine Learning Algorithms (IEEE, 2022)
Mishra, Saras Mani; Tiwari, Ankita; Shekhawat, Hanumant Singh; Guha, Prithwijit; Trivedi, Gaurav; Pidanič, Jan; Němec, Zdeněk
Smart systems are enabled by artificial intelligence (AI), which is realized using machine learning (ML) techniques. ML algorithms are implemented in hardware using fixed-point, integer, and floating-point representations. Because of the limited word size of these representations, the performance of a hardware implementation suffers when values are very small or very large. To overcome this limitation, various floating-point representations are employed, such as IEEE 754, posit, and bfloat16. Moreover, for the efficient implementation of ML algorithms, one of the most intuitive solutions is to use a suitable number system. Multiply and add (MAC), divider, and square root units are the most common building blocks of various ML algorithms. Therefore, in this paper, we present a comparative study of hardware implementations of these units based on the bfloat16 and posit number representations.
It is observed that posit-based implementations perform 1.50x better in terms of accuracy but consume 1.51x more hardware resources than bfloat16-based realizations. Thus, given the trade-off between accuracy and resource utilization, the bfloat16 number representation may be preferred over other existing number representations in hardware implementations of ML algorithms.

Conference object, peer-reviewed, postprint (accepted version), Open access
Design and Implementation of a Low Power Area Efficient Bfloat16 based CORDIC Processor (IEEE, 2022)
Mishra, Saras Mani; Shekhawat, Hanumant Singh; Trivedi, Gaurav; Pidanič, Jan; Němec, Zdeněk
The Coordinate Rotation Digital Computer (CORDIC) algorithm has a great advantage in hardware-based implementation because of its simple architecture: it needs only shifters and adders. The major issue with the CORDIC algorithm is the linear dependence of convergence on the number of iterations. Each iteration performs shift and addition or subtraction operations, so there is a trade-off between area and delay. The floating-point representation of angles also increases area and power. The main aim of this work is to implement a low-power, area-efficient bfloat16-based CORDIC processor. The proposed hardware module consumes 3.2x less area and 3.38x less power than a single-precision floating-point based CORDIC implementation. The result of the proposed module has been verified on a Zynq evaluation FPGA board.

Conference object, peer-reviewed, postprint (accepted version), Open access
A Scalable and Adaptive Convolutional Neural Network Accelerator (IEEE, 2022)
Pidanič, Jan; Vyas, Arpan; Karki, Rishav; Vij, Prateek; Trivedi, Gaurav; Němec, Zdeněk
Machine learning has become ubiquitous and has penetrated every field of technology, medicine, and finance.
The Convolutional Neural Network (CNN) is one of the most commonly used classes of machine learning algorithms, applied in video and image processing, big data processing, natural language processing, robotics, and a variety of pattern matching and recognition tasks. Depending on the end application, CNNs are employed at different scales, ranging from tiny motion sensors and smartphones to automobiles and server farms. Although existing CNN accelerators are adaptive to different types of CNN models, they are generally suited to a particular scale of operation. In this paper, we describe a scalable and adaptive CNN accelerator. The same hardware-cum-software stack can be configured by a system-level parameter to be synthesized for different scales of operation, which makes the accelerator highly portable across systems of different scales. Furthermore, a single synthesized hardware instance can run inference for multiple CNN models because of the flexible software stack and hardware control unit, making the system highly adaptive. We demonstrate the working of the system at different scales by implementing it on the Xilinx Virtex 7 FPGA and by running multiple CNN models at each scale.

Conference object, peer-reviewed, postprint (accepted version), Open access
An Area and Power Efficient VLSI Architecture to Detect Obstructive Sleep Apnea for Wearable Devices (IEEE, 2022)
Parmar, Rushik; Janveja, Meenali; Trivedi, Gaurav; Pidanič, Jan; Němec, Zdeněk
Sleep disorders are a common detrimental health condition that reduces quality of life. Obstructive Sleep Apnea (OSA) is one of the most common sleep disorders. OSA is characterized by a reduction or cessation of airflow during sleep. However, due to the expensive and cumbersome detection process, only 10% of OSA cases are actually diagnosed in the real world.
To overcome this challenge, an area- and power-efficient VLSI architecture for non-invasive detection of OSA, using features of the ECG signal and support vector machines (SVM), is proposed in this manuscript. The proposed classifier achieves an accuracy of 84.60%, a sensitivity of 83.85%, and a specificity of 85.58%. The design is further synthesised using 180 nm bulk CMOS technology, consuming 0.46 µW of power at 1 kHz and occupying an area of 0.429 mm². The low-power implementation of the proposed design makes it suitable for preventive health wearable devices.

Conference object, peer-reviewed, postprint (accepted version), Open access
An Energy Efficient and Resource Optimal VLSI Architecture for ECG Feature Extraction for Wearable Healthcare Applications (IEEE, 2022)
Janveja, Meenali; Parmar, Rushik; Trivedi, Gaurav; Pidanič, Jan; Němec, Zdeněk
The paper presents a low-complexity algorithm for extracting features of an electrocardiogram (QRS complex, P wave, and T wave). A low-power and resource-optimal architecture is designed to implement this algorithm efficiently. The algorithm's parameters are chosen to avoid any compute-intensive floating-point arithmetic, so that it can be implemented using only comparators, shifters, and adders, which leads to efficient hardware resource utilization. The proposed architecture employs techniques such as clock gating to optimize power consumption. The modules delineating different peaks and boundaries of the ECG signal can be turned off when they are not operational or as per the medical requirements. The algorithm and architecture proposed in this paper are validated using the MIT-BIH and QT databases from Physionet. The proposed algorithm is implemented on the Virtex-7 FPGA platform with an average resource utilization of 0.42%, which is the lowest compared to other methods. The implementation is synthesized using 180 nm CMOS technology.
The proposed design consumes 7.38 µW of power and 7.38 pJ of energy at an operating frequency of 1 MHz and 1.98 V. The energy consumption of the proposed architecture is reduced by a factor of 1.28 compared to other known methods due to minimal utilization of flip-flops and LUTs. Therefore, our architecture can be efficiently deployed in low-power and resource-constrained wearable healthcare applications.

Conference object, peer-reviewed, postprint (accepted version), Open access
Design of a Low Power and Area Efficient Bfloat16 based Generalized Systolic Array for DNN Applications (IEEE, 2022)
Tiwari, Ankita; Mishra, Saras Mani; Guha, Prithwijit; Pidanič, Jan; Němec, Zdeněk; Trivedi, Gaurav
Demand for artificial intelligence (AI) enabled mobile platforms is increasing. From healthcare services to defense, and from remote to urban areas, there is a huge demand for secure and power-efficient devices. The performance of these platforms can be enhanced by providing an efficient compute engine. These compute engines perform a huge number of matrix operations. The most popular choice for large matrix computation is a systolic array. In general, systolic array performance degrades for large input matrices because of the trade-off between resource utilization and computation delay. To address this issue, we need a systolic array with a control unit that reconfigures the array according to the requirements of the computation. The computation array can be further improved by handling negative weights and reducing the number of MAC operations. In this paper, we propose a generalized bfloat16-based systolic array in which the sign of the partial sum (PS) is predicted before computation. The PS sign aids in network pruning, which enhances system performance.
The proposed system is implemented on a Virtex-7 FPGA board and performs 2.21x and 4.19x better in terms of area and power, respectively, compared to a single-precision based systolic array.

Article, peer-reviewed, postprint (accepted), Open access
Tensor Based Multivariate Polynomial Modulo Multiplier for Cryptographic Applications (2022)
Paul, Bikram; Nath, Angana; Krishnaswamy, Srinivasan; Pidanič, Jan; Němec, Zdeněk; Trivedi, Gaurav
Modulo polynomial multiplication is an essential mathematical operation in the area of finite field arithmetic. Polynomial functions can be represented as tensors, which can be utilized as basic building blocks for various lattice-based post-quantum cryptography schemes. This paper presents a novel tensor-based modulo multiplication method for multivariate polynomials over GF(2^m), realized on a hardware platform (FPGA). The proposed method consumes 6.5× less power and achieves more than 6× speedup compared to other contemporary single-variable polynomial multiplication implementations. Our method is embarrassingly parallel and easily scalable for multivariate polynomials. Polynomial functions of nine variables, where each variable is of degree 128, are tested with the proposed multiplier, and its corresponding area, power, and power-delay-area product (PDAP) are presented. The computational complexities of single-variable and multivariate polynomial multiplication are O(n) and O(n^p), respectively, where n is the maximum degree of a polynomial having p variables.
Due to its high speed, low latency, and scalability, the proposed modulo multiplier can be used in a wide range of applications.

Conference object, peer-reviewed, postprint, Restricted access
On Performance of Filter used for Interference Mitigation between LTE and DVB-T Networks in Digital Dividend Spectrum (Croatian Society Electronics in Marine - ELMAR, 2016)
Tekovic, Alberto; Bonefacic, Davor; Nad, Robert; Pidanič, Jan; Němec, Zdeněk
This paper concerns techniques for mitigating the interference that a Long Term Evolution (LTE) mobile system operating in the Digital Dividend causes to the Digital Video Broadcasting Terrestrial (DVB-T) system. The use of filtering, one of the most effective interference mitigation methods, has been investigated. Laboratory measurements have been conducted for several commercially available LTE filters in order to determine their performance when exposed to variable ambient temperature.

Conference object, peer-reviewed, postprint, Restricted access
Targets detection Analysis in the Passive Coherent Location System in Single Frequency Network (IEEE (Institute of Electrical and Electronics Engineers), 2016)
Juryca, Karel; Pidanič, Jan; Němec, Zdeněk
The paper deals with the primary analysis of target detection in a Passive Coherent Location system that exploits DVB-T transmitters in a Single Frequency Network. The analysis is based on the behavior of the Cross Ambiguity (CA) function and is carried out for one or more targets in one bistatic radar. The Single Frequency Network uses the same central carrier frequency for all transmitters. The analysis shows that the multipath effect significantly influences how accurately the maximum of the CA function can be determined.