Search Results (23,562)

Search Parameters:
Journal = Electronics

20 pages, 2353 KiB  
Article
ARLO: Augmented Reality Localization Optimization for Real-Time Pose Estimation and Human–Computer Interaction
by Meng Xu, Qiqi Shu, Zhao Huang, Guang Chen and Stefan Poslad
Electronics 2025, 14(7), 1478; https://doi.org/10.3390/electronics14071478 (registering DOI) - 7 Apr 2025
Abstract
Accurate and real-time outdoor localization and pose estimation are critical for various applications, including navigation, robotics, and augmented reality. Apple’s ARKit, a leading AR platform, employs visual–inertial odometry (VIO) and simultaneous localization and mapping (SLAM) algorithms to enable localization and pose estimation. However, ARKit-based systems face positional bias when the device’s camera is obscured, a frequent issue in dynamic or crowded environments. This paper presents a novel approach to mitigate this limitation by integrating position bias correction, context-aware localization, and human–computer interaction techniques into a cohesive interactive module group. The proposed system includes a navigation module, a positioning module, and a front-end rendering module that collaboratively optimize ARKit’s localization accuracy. Comprehensive evaluation across a variety of outdoor environments demonstrates the approach’s effectiveness in improving localization precision. This work contributes to enhancing ARKit-based systems, particularly in scenarios with limited visual input, thereby improving user experience and expanding the potential for outdoor localization applications. Experimental evaluations show that our method improves localization accuracy by up to 92.9% and reduces average positional error by more than 85% compared with baseline ARKit in occluded or crowded outdoor environments.
(This article belongs to the Section Computer Science & Engineering)
29 pages, 6481 KiB  
Article
MDFFN: Multi-Scale Dual-Aggregated Feature Fusion Network for Hyperspectral Image Classification
by Ge Song, Xiaoqi Luo, Yuqiao Deng, Fei Zhao, Xiaofei Yang, Jiaxin Chen and Jinjie Chen
Electronics 2025, 14(7), 1477; https://doi.org/10.3390/electronics14071477 (registering DOI) - 7 Apr 2025
Abstract
Employing the multi-scale strategy in hyperspectral image (HSI) classification enables the exploration of complex land-cover structures with diverse shapes. However, existing multi-scale methods still have limitations in fine feature extraction and deep feature fusion, which hinder further improvement of classification performance. In this paper, we propose a multi-scale dual-aggregated feature fusion network (MDFFN) for both balanced and imbalanced environments. The network comprises two core modules: a multi-scale convolutional information embedding (MCIE) module and a dual aggregated cross-attention (DACA) module. The proposed MCIE module introduces a multi-scale pooling operation to aggregate local features, which efficiently highlights discriminative spectral–spatial information and, in particular, learns key features of small target samples in the imbalanced environment. Furthermore, the proposed DACA module employs a cross-scale interaction strategy to realize the deep fusion of multi-scale features and uses a dual aggregation mechanism to mitigate the loss of information, which facilitates further spatial–spectral feature enhancement. The experimental results demonstrate that the proposed MDFFN outperforms state-of-the-art methods on three classical HSI datasets.
(This article belongs to the Section Artificial Intelligence)
19 pages, 2394 KiB  
Article
Quantitative Methodology for Assessing the Quality of Direct Laser Processing of 316L Steel Powder Using Type I and Type II Control Errors
by Oleksandr Vasilevskyi, Alexandra Woods, Matthew Jones and Michael Cullinan
Electronics 2025, 14(7), 1476; https://doi.org/10.3390/electronics14071476 (registering DOI) - 7 Apr 2025
Abstract
The paper proposes a methodology for assessing the quality of the direct laser melting process of 316L steel powder, which was tested when creating products in a construction furnace of the EOSINT M280 system at different laser powers. The methodology is based on measuring the melting temperature of 316L steel powder using an infrared camera, assessing the expanded uncertainty of temperature measurements, and calculating the probabilities of the temperature falling within the established confidence limits based on type I and type II control errors (risks). The experimental investigations revealed that the melting temperature of 316L steel powder was achieved at a laser power of 195 W, with an average value of 1446 °C. It was also found that the maximum expanded measurement uncertainty for the temperature was 7%. A three-level classification of quality indicators for the laser melting process is proposed: good quality (A), satisfactory quality (B), and unsatisfactory/unacceptable quality (C). The studies showed that the probability of achieving good quality (A) in the laser melting of 316L steel powder at a laser power of 195 W was 91%, while the probability of achieving satisfactory quality (B) was 0.03%. These findings contribute to enhancing in situ process monitoring in additive manufacturing, enabling the detection of deviations and adjustments to ensure consistently good quality. The proposed methodology provides a robust framework applicable across different LP-BF/M systems, improving process reliability and reproducibility in industrial and scientific applications.
(This article belongs to the Special Issue New Advance in Stretchable Electronics and Additive Manufacturing)
1 page, 128 KiB
Retraction
RETRACTED: Zhang et al. Effects of Current Filaments on IGBT Avalanche Robustness: A Simulation Study. Electronics 2024, 13, 2347
by Jingping Zhang, Houcai Luo, Huan Wu, Bofeng Zheng, Xianping Chen, Guoqi Zhang, Paddy French and Shaogang Wang
Electronics 2025, 14(7), 1475; https://doi.org/10.3390/electronics14071475 (registering DOI) - 7 Apr 2025
Abstract
The Electronics Editorial Office retracts the article “Effects of Current Filaments on IGBT Avalanche Robustness: A Simulation Study” [...]
22 pages, 2826 KiB  
Article
Research on Target Detection Algorithm for Complex Traffic Scenes Based on ADVI-CFAR
by Feng Tian, Tianyu Wei, Weibo Fu and Siyuan Wang
Electronics 2025, 14(7), 1474; https://doi.org/10.3390/electronics14071474 (registering DOI) - 6 Apr 2025
Abstract
To address the issue of reduced target detection accuracy due to interfering targets and clutter reference cells in complex traffic scenarios, we propose the ADVI-CFAR (Adaptive Discriminant Variation Index Constant False Alarm Rate) detection algorithm. Considering that the non-uniformity of the background environment leads to significant variations in signal power magnitude, we introduce a background power transition point to evaluate the uniformity of the background environment within the reference window. Moreover, in complex background environments, clutter distributions are often skewed rather than Gaussian. We incorporate the higher-order statistical skewness of the clutter to calculate the background power threshold index, thereby improving the accuracy of background power estimation. Then, based on the transition points and clutter power index, the background environment is classified, and an appropriate detection threshold calculation method is chosen for target detection. We conduct a simulation analysis in uniform, non-uniform, and clutter edge environments, and the results show that the identification accuracy exceeds 95% for all three background environments. At a detection probability of 50%, the performance loss is 0.08 dB in uniform environments and 0.36 dB in multi-target environments. When the false alarm probability is set to 10^-4, the ADVI-CFAR algorithm significantly suppresses false alarms, with the false alarm peak occurring at 10^-3.52. Real data from urban traffic scenarios validate the method, showing that it achieves high detection accuracy and effectively meets radar target detection requirements in practical traffic environments.
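For readers unfamiliar with CFAR detection, the following is a minimal sketch of the classic cell-averaging CFAR (CA-CFAR) detector that adaptive schemes such as the one above build on. It is not the ADVI-CFAR algorithm: the function name `ca_cfar`, the window sizes, and the assumption of exponentially distributed noise power (square-law detector) are illustrative choices, not details from the abstract.

```python
def ca_cfar(power, num_train=16, num_guard=2, pfa=1e-4):
    """Cell-averaging CFAR over a 1-D list of power samples.

    Assumes exponentially distributed noise power (square-law detector),
    for which the threshold multiplier below yields the requested
    false alarm probability in a uniform background.
    """
    half_train = num_train // 2
    # Threshold multiplier for CA-CFAR with num_train reference cells.
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    margin = half_train + num_guard
    detections = []
    for i in range(margin, len(power) - margin):
        # Reference cells on both sides, skipping the guard cells.
        leading = power[i - margin : i - num_guard]
        trailing = power[i + num_guard + 1 : i + margin + 1]
        noise_estimate = (sum(leading) + sum(trailing)) / num_train
        if power[i] > alpha * noise_estimate:
            detections.append(i)
    return detections


# Hypothetical demo: flat noise floor with one strong target at index 100.
samples = [1.0] * 200
samples[100] = 50.0
hits = ca_cfar(samples)
```

CA-CFAR degrades when interfering targets or clutter edges fall inside the reference window, which is the situation the abstract's background classification step is meant to handle.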
21 pages, 13198 KiB  
Article
Infrared Bionic Compound-Eye Camera: Long-Distance Measurement Simulation and Verification
by Xiaoyu Wang, Linhan Li, Jie Liu, Zhen Huang, Yuhan Li, Huicong Wang, Yimin Zhang, Yang Yu, Xiupeng Yuan, Liya Qiu and Sili Gao
Electronics 2025, 14(7), 1473; https://doi.org/10.3390/electronics14071473 (registering DOI) - 6 Apr 2025
Abstract
To achieve rapid distance estimation and tracking of moving targets in a large field of view, this paper proposes an innovative simulation method. Using a low-cost approach, the imaging and distance measurement performance of the designed cooling-type mid-wave infrared compound-eye camera (CM-CECam) is experimentally evaluated. The compound-eye camera consists of a small-lens array with a spherical shell, a relay optical system, and a cooling-type mid-wave infrared detector. Based on the spatial arrangement of the small-lens array, a precise simulation imaging model for the compound-eye camera is developed, constructing a virtual imaging space. Distance estimation and error analysis for virtual targets are performed using the principle of stereo disparity. This universal simulation method provides a foundation for spatial design and image-plane adjustments for compound-eye cameras with specialized structures. Using the raw images captured by the compound-eye camera, a scene-specific piecewise linear mapping method is applied. This method significantly reduces the brightness contrast differences between sub-images during wide-field observations, enhancing image details. For the fast detection of moving targets, ommatidia clusters are defined as the minimal spatial constraint units. Local information at the centers of these constraint units is prioritized for processing. This approach replaces traditional global detection methods, improving the efficiency of subsequent processing. Finally, the simulated distance measurement results are validated using real-world scene data.
25 pages, 12059 KiB  
Article
FasterGDSF-DETR: A Faster End-to-End Real-Time Fire Detection Model via the Gather-and-Distribute Mechanism
by Chengming Liu, Fan Wu and Lei Shi
Electronics 2025, 14(7), 1472; https://doi.org/10.3390/electronics14071472 (registering DOI) - 6 Apr 2025
Abstract
Fire detection using deep learning has become a widely adopted approach. However, YOLO-based models often face performance limitations due to NMS, while DETR-based models struggle to meet real-time processing requirements. To address these challenges, we propose FasterGDSF-DETR, a novel fire detection model built upon the RT-DETR framework, designed to enhance both detection accuracy and efficiency. Firstly, this model introduces the FasterDBBNet backbone, which efficiently captures and retains feature information, accelerating the model’s convergence. Secondly, we propose the AIFI-GDSF hybrid encoder to reduce information loss in intra-scale interactions and improve the ability to detect flames of varying morphology. Furthermore, to better adapt to complex fire scenarios, we expand the dataset based on the KMU Fire and Smoke database and incorporate WIoU as the loss function to improve model robustness. Experimental results demonstrate that our proposed model surpasses mainstream object detection models in both accuracy and computational efficiency. FasterGDSF-DETR achieves a mean Average Precision of 71.5% on the self-constructed dataset, outperforming the YOLOv9 model of the same scale by 2.4 percentage points. This study introduces a novel task-specific enhancement to the RT-DETR framework, offering valuable insights for future advancements in fire detection technology.
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
21 pages, 9797 KiB  
Article
Artificial Intelligence-Driven Optimal Charging Strategy for Electric Vehicles and Impacts on Electric Power Grid
by Umar Jamil, Raul Jose Alva, Sara Ahmed and Yu-Fang Jin
Electronics 2025, 14(7), 1471; https://doi.org/10.3390/electronics14071471 (registering DOI) - 6 Apr 2025
Viewed by 18
Abstract
Electric vehicles (EVs) play a crucial role in achieving sustainability goals, mitigating energy crises, and reducing air pollution. However, their rapid adoption poses significant challenges to the power grid, particularly during peak charging periods, necessitating advanced load management strategies. This study introduces an artificial intelligence (AI)-integrated optimal charging framework designed to facilitate fast charging and mitigate grid stress by smoothing the “duck curve”. Data from Caltech’s Adaptive Charging Network (ACN) at the National Aeronautics and Space Administration (NASA) Jet Propulsion Laboratory (JPL) site were collected and categorized into day and night patterns to predict charging duration based on key features, including start charging time and energy requested. The AI-driven charging strategy optimizes energy management, reduces peak loads, and alleviates grid strain. Additionally, the study evaluates the impact of integrating 1.5 million, 3 million, and 5 million EVs under various AI-based charging strategies, demonstrating the framework’s effectiveness in managing large-scale EV adoption. Without any charging strategy, the peak power consumption reaches around 22,000 MW without EVs, 25,000 MW for 1.5 million EVs, 28,000 MW for 3 million EVs, and 35,000 MW for 5 million EVs. By implementing an AI-driven optimal charging strategy that considers both early charging and duck curve smoothing, the peak demand is reduced by approximately 16% for 1.5 million EVs, 21.43% for 3 million EVs, and 34.29% for 5 million EVs.
(This article belongs to the Special Issue Recent Advances in Modeling and Control of Electric Energy Systems)
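The percentages quoted in the abstract above can be turned back into absolute peak loads with a short calculation. The sketch below only applies the stated reductions to the stated uncontrolled peaks; the resulting post-strategy peaks (roughly 21,000 MW, 22,000 MW, and 23,000 MW) are derived here and are not figures given in the abstract. All variable and function names are illustrative.

```python
# Uncontrolled peak demand in MW, keyed by millions of EVs (from the abstract).
baseline_peaks_mw = {1.5: 25_000, 3.0: 28_000, 5.0: 35_000}

# Reported peak reduction in percent under the AI-driven strategy (from the abstract).
reductions_pct = {1.5: 16.0, 3.0: 21.43, 5.0: 34.29}


def implied_peak_mw(baseline_mw, reduction_pct):
    """Peak demand remaining after applying a percentage reduction."""
    return baseline_mw * (1.0 - reduction_pct / 100.0)


# Derived values, not reported in the source.
implied_peaks_mw = {
    fleet: implied_peak_mw(baseline_peaks_mw[fleet], reductions_pct[fleet])
    for fleet in baseline_peaks_mw
}

for fleet, peak in implied_peaks_mw.items():
    print(f"{fleet} million EVs: about {peak:,.0f} MW")
```

Under this reading, the reductions correspond to shaving about 4,000 MW, 6,000 MW, and 12,000 MW from the three uncontrolled peaks, which brings each scenario close to the 22,000 MW peak reported without EVs.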
19 pages, 4715 KiB  
Article
Fuzzy Battery Manager: Charging and Balancing Rechargeable Battery Cells with Fuzzy Logic
by Adnan K. Shaout and Zachary Brauchler
Electronics 2025, 14(7), 1470; https://doi.org/10.3390/electronics14071470 (registering DOI) - 6 Apr 2025
Viewed by 24
Abstract
This paper presents the design, implementation, and testing of a fuzzy battery manager featuring a novel hardware design. The system uses a fuzzy inference system to charge and balance two battery cells in series, integrating a microcontroller and a battery charging IC to demonstrate battery management with real hardware. It supports two battery chemistries, showcasing how the fuzzy system can be flexibly adapted to different rechargeable battery technologies. The fuzzy battery manager successfully achieves its goal of charging and balancing cells with high adaptability by simply adjusting membership functions. Its stability and effectiveness on real hardware have been confirmed. This adaptability offers significant potential across various industries. For example, a replacement battery pack designed for longevity using LiFePO4 cells could serve as an alternative to Li-Ion cells in electric vehicles, especially since LiFePO4 cells endure many more charge cycles, albeit with lower charge densities. The required membership functions for this replacement battery could be stored in just a few bytes of ROM within the battery pack, enabling seamless integration and use with existing vehicles and charging systems.
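To illustrate why adapting such a manager can come down to swapping a few membership-function parameters, here is a minimal sketch of a fuzzy balancing rule for two cells. Everything in it is hypothetical: the triangular sets, their voltage breakpoints, the rule outputs, and the weighted-average defuzzification are stand-ins chosen for illustration and are not taken from the paper.

```python
def triangle(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical membership sets for the voltage difference between two cells (volts).
# A different chemistry would only need different breakpoints in this table.
IMBALANCE_SETS = {
    "small": (-0.05, 0.0, 0.05),
    "medium": (0.0, 0.05, 0.10),
    "large": (0.05, 0.10, 0.15),
}

# Hypothetical rule outputs: fraction of the maximum balancing current.
RULE_OUTPUT = {"small": 0.0, "medium": 0.5, "large": 1.0}


def balance_command(delta_v):
    """Map a cell voltage difference to a balancing effort in [0, 1]."""
    dv = min(abs(delta_v), 0.10)  # saturate at the peak of the "large" set
    weights = {name: triangle(dv, *abc) for name, abc in IMBALANCE_SETS.items()}
    total = sum(weights.values())
    if total == 0.0:
        return 0.0
    # Weighted average of rule outputs (a simple Sugeno-style defuzzification).
    return sum(weights[name] * RULE_OUTPUT[name] for name in weights) / total
```

In this sketch the whole behavior is defined by nine breakpoints and three output levels, which is the kind of compact parameter set that could plausibly fit in a small ROM.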
22 pages, 4198 KiB  
Article
YOLOv11-BSS: Damaged Region Recognition Based on Spatial and Channel Synergistic Attention and Bi-Deformable Convolution in Sanding Scenarios
by Yinjiang Li, Zhifeng Zhou and Ying Pan
Electronics 2025, 14(7), 1469; https://doi.org/10.3390/electronics14071469 (registering DOI) - 5 Apr 2025
Viewed by 34
Abstract
The paint surface of a damaged region of a car body has color and texture characteristics similar to those of the normal paint surface, which leads to missed or false detections. To address this problem, an algorithm for detecting damaged regions of the body based on an improved YOLOv11 is proposed. Firstly, bi-deformable convolution is proposed to optimize the offset direction of the convolution kernel shape, which effectively improves the feature representation power of the backbone network. Secondly, the C2PSA-SCSA module is designed to couple spatial attention and channel attention, which enhances the perceptual power of the backbone network and makes the model pay better attention to damaged-region features. Then, a slim-neck feature fusion network is built on the GSConv and DWConv modules, which effectively fuses local and global features to improve the saturation of semantic features. Finally, the Focaler-CIoU bounding-box loss function is designed, which uses the piecewise linear mapping of Focaler-IoU to adjust the loss function’s attention to different samples and improves the model’s convergence when learning features at various scales. The experimental results show that the enhanced YOLOv11-BSS network improves the precision by 7.9%, the recall by 1.4%, and the mAP@50 by 3.7% over the baseline network, which effectively reduces missed and false detections of damaged areas of the car body.
17 pages, 6487 KiB  
Article
A Cost-Effective System for EMG/MMG Signal Acquisition
by Jerzy S. Witkowski and Andrzej Grobelny
Electronics 2025, 14(7), 1468; https://doi.org/10.3390/electronics14071468 (registering DOI) - 5 Apr 2025
Viewed by 55
Abstract
This article presents a cost-effective, robust, and reliable system for EMG/MMG (electromyography/mechanomyography) signal acquisition. Signals indicating muscle activity have numerous applications and are the subject of many studies. However, acquiring these signals is challenging. Commercial measurement systems are often expensive, limiting their accessibility. Therefore, the primary goal of this project was to develop a simple and affordable system for simultaneous EMG and MMG data acquisition, offering efficiency comparable to commercial systems. The system consists of eight EMG/MMG probes, 16-bit analog-to-digital converters with 16 channels, and a microprocessor unit. Despite its multiple components, the system remains simple and user-friendly. This paper describes the construction of the EMG/MMG probe and analyzes the intrinsic noise of the preamplifier, as well as electromagnetic interference, particularly power line noise. The elimination of power line noise was carried out in two stages: first, using techniques known from electromagnetic compatibility (EMC), and second, by implementing a digital filter in the microprocessor system. The proposed solution enables direct data collection from eight EMG/MMG probes using any computer equipped with a USB interface. This interface provides both data transmission and power supply, making EMG/MMG data acquisition straightforward and efficient.
(This article belongs to the Section Bioelectronics)
26 pages, 6966 KiB  
Article
Applying Collaborative Co-Simulation to Railway Traction Energy Consumption
by David Golightly, Anirban Bhattacharyya, Ken Pierce, Zhongbei Tian, Zhiyuan Lin, Ronghui Liu, Xinnan Lyu, Kangrui Jiang and Xiao Liu
Electronics 2025, 14(7), 1467; https://doi.org/10.3390/electronics14071467 (registering DOI) - 5 Apr 2025
Viewed by 44
Abstract
Simulation is a vital tool for understanding rail traction energy consumption. Simulating such energy consumption requires an understanding of the interactions between timetable, infrastructure, and driver behavior to be encapsulated within a multi-train system model. This is critical to simulating systemic interactions that affect energy consumption on a rail network. However, building and executing such a system simulation is challenging because of diverse models, stakeholders, and knowledge, as well as a lack of tools to support flexible and scalable simulation. This paper presents a demonstration of co-simulation (an approach originating in the automotive industry and now being used in other sectors) that enables a system model to be assessed for different configurations of timetable, rolling stock, infrastructure, and driver behavior. This paper describes the co-simulation approach before outlining the development process that allowed three research institutes, each with diverse models, to collaborate and deliver an integrated, holistic modeling approach. The results of this work are presented and discussed, both in terms of the quantified outputs and findings for energy consumption, and the lessons learned through collaborative co-simulation. Future avenues to build on this work are identified.
(This article belongs to the Special Issue Railway Traction Power Supply, 2nd Edition)
25 pages, 4068 KiB  
Article
Integrated Sliding Mode Control for Permanent Magnet Synchronous Motor Drives Based on Second-Order Disturbance Observer and Low-Pass Filter
by Tran Thanh Tuyen, Jian Yang, Liqing Liao and Jingyang Zhou
Electronics 2025, 14(7), 1466; https://doi.org/10.3390/electronics14071466 (registering DOI) - 5 Apr 2025
Viewed by 32
Abstract
This article presents an improved control strategy based on the traditional sliding-mode controller (SMC), integrated with a generalized higher-order disturbance observer (DOB), to enhance the speed regulation of permanent magnet synchronous motors (PMSMs) during operation. The proposed method mitigates and smooths system disturbances by using the DOB in conjunction with a low-pass filter (LPF). The LPF smooths the q-axis current component and reduces speed oscillations. Initially, the paper builds upon the conventional control law and introduces a more optimized approach. The stability of the control strategy is then analyzed using Lyapunov stability theory. Different sliding surfaces are compared to develop the proposed SMC. Finally, the novel control method is introduced by integrating the DOB with the LPF. This approach results in improved speed stability and enhanced adaptability compared to traditional SMC techniques. Simulation and experimental results demonstrate that the proposed control algorithm outperforms traditional methods, particularly in terms of dynamic response and disturbance rejection.
(This article belongs to the Section Systems & Control Engineering)
23 pages, 665 KiB  
Article
TPH-Fuzz: A Two-Phase Hybrid Fuzzing Framework for Smart Contract Vulnerability Detection
by Fanglei Shi, Jinsheng Yang and Zhaohui Guo
Electronics 2025, 14(7), 1465; https://doi.org/10.3390/electronics14071465 (registering DOI) - 5 Apr 2025
Viewed by 34
Abstract
Blockchain technology is revolutionizing various industries through decentralized architecture and secure transaction mechanisms, yet its core application, smart contracts, faces increasingly sophisticated security threats. Recognizing the critical need for enhanced protection in this emerging domain, this paper introduces TPH-Fuzz, a two-phase hybrid fuzzing framework designed to overcome current limitations in vulnerability detection. TPH-Fuzz combines global exploration with local vulnerability targeting. It utilizes dynamic symbolic execution for semantics-aware path analysis and employs data-dependency-based state modeling to generate effective transaction sequences. These methods significantly improve both path exploration and vulnerability detection precision. Experiments on a coverage dataset of 9309 contracts demonstrate an 85% branch coverage on complex contracts, outperforming conventional methods; meanwhile, tests on a vulnerability dataset of 1086 labeled contracts show a detection precision of 89.24% across eight vulnerability categories. The promising results underscore the framework’s potential to transform security auditing practices in the blockchain industry, paving the way for more reliable smart contract development and deployment.
(This article belongs to the Section Computer Science & Engineering)
22 pages, 1839 KiB  
Article
A Multimodal Artificial Intelligence Model for Depression Severity Detection Based on Audio and Video Signals
by Liyuan Zhang, Shuai Zhang, Xv Zhang and Yafeng Zhao
Electronics 2025, 14(7), 1464; https://doi.org/10.3390/electronics14071464 (registering DOI) - 4 Apr 2025
Viewed by 49
Abstract
In recent years, artificial intelligence (AI) has increasingly utilized speech and video signals for emotion recognition, facial recognition, and depression detection, playing a crucial role in mental health assessment. However, AI-driven research on detecting depression severity remains limited, and existing models are often too large for lightweight deployment, restricting their real-time monitoring capabilities, especially in resource-constrained environments. To address these challenges, this study proposes a lightweight and accurate multimodal method for detecting depression severity, aiming to provide effective support for smart healthcare systems. Specifically, we design a multimodal detection network based on speech and video signals, enhancing the recognition of depression severity by optimizing the cross-modal fusion strategy. The model leverages Long Short-Term Memory (LSTM) networks to capture long-term dependencies in speech and visual sequences, effectively extracting dynamic features associated with depression. Considering the behavioral differences of respondents when interacting with human versus robotic interviewers, we train two separate sub-models and fuse their outputs using a Mixture of Experts (MOE) framework capable of modeling uncertainty, thereby suppressing the influence of low-confidence experts. In terms of the loss function, the traditional Mean Squared Error (MSE) is replaced with Negative Log-Likelihood (NLL) to better model prediction uncertainty and enhance robustness. The experimental results show that the improved AI model achieves an accuracy of 83.86% in depression severity recognition. The model requires 0.468 GFLOPs (floating-point operations) with a parameter size of only 0.52 MB, demonstrating its compact size and strong performance. These findings underscore the importance of emotion and facial recognition in AI applications for mental health, offering a promising solution for real-time depression monitoring in resource-limited environments.
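The two uncertainty-related ideas in the abstract above, an NLL loss in place of MSE and a fusion step that down-weights low-confidence experts, can be sketched in a few lines. This is an assumption-laden illustration rather than the paper's method: it assumes each expert outputs a Gaussian mean and variance, and it uses inverse-variance weighting as a stand-in for the unspecified MOE gating. The function names are hypothetical.

```python
import math


def gaussian_nll(target, mean, variance):
    """Negative log-likelihood of target under a Gaussian N(mean, variance).

    Unlike MSE, the error term is scaled by the predicted variance, so the
    model is penalized for being confidently wrong.
    """
    return 0.5 * (math.log(2.0 * math.pi * variance) + (target - mean) ** 2 / variance)


def fuse_experts(means, variances):
    """Inverse-variance fusion: experts with high variance get low weight."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused_mean = sum(w * m for w, m in zip(weights, means)) / total
    fused_variance = 1.0 / total
    return fused_mean, fused_variance


# Hypothetical example: two experts predict a severity score; the second is less certain.
fused_mean, fused_variance = fuse_experts([10.0, 20.0], [1.0, 4.0])
```

With these inputs the fused estimate sits much closer to the confident expert's prediction than a plain average would, which is the qualitative behavior the abstract describes.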