Search Results (405)

Search Parameters:
Journal = AI

21 pages, 5409 KiB  
Article
Discriminative Deformable Part Model for Pedestrian Detection with Occlusion Handling
by Shahzad Siddiqi, Muhammad Faizan Shirazi and Yawar Rehman
AI 2025, 6(4), 70; https://doi.org/10.3390/ai6040070 - 3 Apr 2025
Viewed by 114
Abstract
Efficient pedestrian detection plays an important role in many practical daily life applications, such as autonomous cars, video surveillance, and intelligent driving assistance systems. The main goal of pedestrian detection systems, especially in vehicles, is to prevent accidents. By recognizing pedestrians in real time, these systems can alert drivers or even autonomously apply brakes, minimizing the possibility of collisions. However, occlusion is a major obstacle to pedestrian detection. Pedestrians are typically occluded by trees, street poles, cars, and other pedestrians. State-of-the-art detection methods are based on fully visible or little-occluded pedestrians; hence, their performance declines with increasing occlusion level. To meet this challenge, a pedestrian detector capable of handling occlusion is preferred. To increase the detection accuracy for occluded pedestrians, we propose a new method called the Discriminative Deformable Part Model (DDPM), which uses the concept of breaking human image into deformable parts via machine learning. In existing works, human image breaking into deformable parts has been performed by human intuition. In our novel approach, machine learning is used for deformable objects such as humans, combining the benefits and removing the drawbacks of the previous works. We also propose a new pedestrian dataset based on Eastern clothes to accommodate the detector’s evaluation under different intra-class variations of pedestrians. The proposed method achieves a higher detection accuracy on Pascal VOC and VisDrone Detection datasets when compared with other popular detection methods. Full article

21 pages, 494 KiB  
Article
LineMVGNN: Anti-Money Laundering with Line-Graph-Assisted Multi-View Graph Neural Networks
by Chung-Hoo Poon, James Kwok, Calvin Chow and Jang-Hyeon Choi
AI 2025, 6(4), 69; https://doi.org/10.3390/ai6040069 - 3 Apr 2025
Viewed by 95
Abstract
Anti-money laundering (AML) systems are important for protecting the global economy. However, conventional rule-based methods rely on domain knowledge, leading to suboptimal accuracy and a lack of scalability. Graph neural networks (GNNs) for digraphs (directed graphs) can be applied to transaction graphs and capture suspicious transactions or accounts. However, most spectral GNNs do not naturally support multi-dimensional edge features, lack interpretability due to edge modifications, and have limited scalability owing to their spectral nature. Conversely, most spatial methods may not capture the money flow well. Therefore, in this work, we propose LineMVGNN (Line-Graph-Assisted Multi-View Graph Neural Network), a novel spatial method that considers payment and receipt transactions. Specifically, the LineMVGNN model extends a lightweight MVGNN module, which performs two-way message passing between nodes in a transaction graph. Additionally, LineMVGNN incorporates a line graph view of the original transaction graph to enhance the propagation of transaction information. We conduct experiments on two real-world account-based transaction datasets: the Ethereum phishing transaction network dataset and a financial payment transaction dataset from one of our industry partners. The results show that our proposed method outperforms state-of-the-art methods, reflecting the effectiveness of money laundering detection with line-graph-assisted multi-view graph learning. We also discuss scalability, adversarial robustness, and regulatory considerations of our proposed method. Full article
(This article belongs to the Special Issue AI in Finance: Leveraging AI to Transform Financial Services)

20 pages, 5474 KiB  
Article
Voice-AttentionNet: Voice-Based Multi-Disease Detection with Lightweight Attention-Based Temporal Convolutional Neural Network
by Jintao Wang, Jianhang Zhou and Bob Zhang
AI 2025, 6(4), 68; https://doi.org/10.3390/ai6040068 - 28 Mar 2025
Viewed by 177
Abstract
Voice data contain a wealth of temporal and spectral information and can be a valuable resource for disease classification. However, traditional methods are often not effective in capturing the key features required for the classification of multiple disease classes. To address this challenge, we propose a voice-based multi-disease detection approach with a lightweight attention-based temporal convolution neural network (Voice-AttentionNet) designed to analyze speech data for multi-class disease classification. Our model utilizes the temporal convolution neural network (CNN) architecture to extract high-resolution temporal features, while incorporating attention mechanisms to highlight disease-related patterns. Extensive experiments have been conducted on our dataset, including speech samples from patients with multiple illnesses. The results show that our method achieves the most advanced performance with an average classification accuracy of 91.61% on six datasets and is superior to the existing classical models. These findings highlight the potential of combining attention mechanisms with temporal CNNs in the use of speech data for disease classification. Moreover, this study provides a promising direction for deploying AI-driven diagnostic tools in clinical scenarios. Full article
(This article belongs to the Section Medical & Healthcare AI)

27 pages, 584 KiB  
Review
Survey of Architectural Floor Plan Retrieval Technology Based on 3ST Features
by Hongxing Ling, Guangsheng Luo, Nanrun Zhou and Xiaoyan Jiang
AI 2025, 6(4), 67; https://doi.org/10.3390/ai6040067 - 26 Mar 2025
Viewed by 214
Abstract
Feature retrieval technology for building floor plans has garnered significant attention in recent years due to its critical role in the efficient management and execution of construction projects. This paper presents a comprehensive exploration of four primary features essential for the retrieval of building floor plans: semantic features, spatial features, shape features, and texture features (collectively referred to as 3ST features). The extraction algorithms and underlying principles associated with these features are thoroughly analyzed, with a focus on advanced methods such as wavelet transforms and Fourier shape descriptors. Furthermore, the performance of various retrieval algorithms is evaluated through rigorous experimental analysis, offering valuable insights into optimizing the retrieval of building floor plans. Finally, this study outlines prospective directions for the advancement of feature retrieval technology in the context of floor plans. Full article

14 pages, 5649 KiB  
Article
One-Shot Autoregressive Generation of Combinatorial Optimization Solutions Based on the Large Language Model Architecture and Learning Algorithms
by Bishad Ghimire, Ausif Mahmood and Khaled Elleithy
AI 2025, 6(4), 66; https://doi.org/10.3390/ai6040066 - 26 Mar 2025
Viewed by 160
Abstract
Large Language Models (LLMs) have immensely advanced the field of Artificial Intelligence (AI), with recent models being able to perform chain-of-thought reasoning and solve complex mathematical problems, ranging from theorem proving to ones involving advanced calculus. The success of LLMs derives from a combination of the Transformer architecture with its attention mechanism, the autoregressive training methodology with masked attention, and the alignment fine-tuning via reinforcement learning algorithms. In this research, we attempt to explore a possible solution to the fundamental NP-hard problem of combinatorial optimization, in particular, the Traveling Salesman Problem (TSP), by following the LLM approach in terms of the architecture and training algorithms. Similar to the LLM design, which is trained in an autoregressive manner to predict the next token, our model is trained to predict the next node in a TSP graph. After the model is trained on random TSP graphs with known near-optimal solutions, we fine-tune the model using Direct Preference Optimization (DPO). The tour generation in a trained model is autoregressive one-step generation with no need for iterative refinement. Our results are very promising and indicate that, for TSP graphs up to 100 nodes, a relatively small amount of training data yields solutions within a few percent of the optimal. This optimization improves if more data are used to train the model. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
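As a rough illustration of the next-node decoding this abstract describes, the sketch below builds a tour one node at a time and masks nodes that were already visited. It is not the authors' model: `next_node_scores` is a hypothetical stand-in that simply prefers the nearest node, where the paper would use a trained Transformer, and all function names are assumptions made for this example.

```python
import math

def next_node_scores(coords, tour):
    """Hypothetical stand-in for a trained model: score every node by
    negative distance from the current node (higher is better)."""
    current = coords[tour[-1]]
    return [-math.dist(current, c) for c in coords]

def decode_tour(coords, start=0):
    """Build a tour autoregressively, one node per step, with no
    later refinement of earlier choices."""
    tour = [start]
    visited = {start}
    while len(tour) < len(coords):
        scores = next_node_scores(coords, tour)
        # Mask visited nodes so each node is chosen exactly once.
        candidates = [i for i in range(len(coords)) if i not in visited]
        nxt = max(candidates, key=lambda i: scores[i])
        tour.append(nxt)
        visited.add(nxt)
    return tour

def tour_length(coords, tour):
    """Length of the closed tour, including the return to the start."""
    return sum(
        math.dist(coords[a], coords[b])
        for a, b in zip(tour, tour[1:] + tour[:1])
    )
```

Swapping the stand-in scorer for a learned one leaves the decoding loop unchanged, which is the sense in which generation is a single autoregressive pass.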

18 pages, 2018 KiB  
Article
Adapting a Large-Scale Transformer Model to Decode Chicken Vocalizations: A Non-Invasive AI Approach to Poultry Welfare
by Suresh Neethirajan
AI 2025, 6(4), 65; https://doi.org/10.3390/ai6040065 - 25 Mar 2025
Viewed by 210
Abstract
Natural Language Processing (NLP) and advanced acoustic analysis have opened new avenues in animal welfare research by decoding the vocal signals of farm animals. This study explored the feasibility of adapting a large-scale Transformer-based model, OpenAI’s Whisper, originally developed for human speech recognition, to decode chicken vocalizations. Our primary objective was to determine whether Whisper could effectively identify acoustic patterns associated with emotional and physiological states in poultry, thereby enabling real-time, non-invasive welfare assessments. To achieve this, chicken vocal data were recorded under diverse experimental conditions, including healthy versus unhealthy birds, pre-stress versus post-stress scenarios, and quiet versus noisy environments. The audio recordings were processed through Whisper, producing text-like outputs. Although these outputs did not represent literal translations of chicken vocalizations into human language, they exhibited consistent patterns in token sequences and sentiment indicators strongly correlated with recognized poultry stressors and welfare conditions. Sentiment analysis using standard NLP tools (e.g., polarity scoring) identified notable shifts in “negative” and “positive” scores that corresponded closely with documented changes in vocal intensity associated with stress events and altered physiological states. Despite the inherent domain mismatch—given Whisper’s original training on human speech—the findings clearly demonstrate the model’s capability to reliably capture acoustic features significant to poultry welfare. Recognizing the limitations associated with applying English-oriented sentiment tools, this study proposes future multimodal validation frameworks incorporating physiological sensors and behavioral observations to further strengthen biological interpretability. 
To our knowledge, this work provides the first demonstration that Transformer-based architectures, even without species-specific fine-tuning, can effectively encode meaningful acoustic patterns from animal vocalizations, highlighting their transformative potential for advancing productivity, sustainability, and welfare practices in precision poultry farming. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)

17 pages, 4840 KiB  
Article
SMART Restaurant ReCommender: A Context-Aware Restaurant Recommendation Engine
by Ayesha Ubaid, Adrian Lie and Xiaojie Lin
AI 2025, 6(4), 64; https://doi.org/10.3390/ai6040064 - 25 Mar 2025
Viewed by 295
Abstract
With the rise of e-commerce and web application usage, recommendation systems have become important to our daily tasks. They provide personalized suggestions to assist with any task under consideration. While various machine learning algorithms have been developed for recommendation tasks, existing systems still face limitations. This research focuses on advancing context-aware recommendation systems by leveraging the capabilities of Large Language Models (LLMs) in conjunction with real-time data. The research exploits the integration of existing real-time data APIs with LLMs to enhance the capabilities of the recommendation systems already integrated into smart societies. The experimental results demonstrate that the hybrid approach significantly improves the user experience and recommendation quality, ensuring more relevant and dynamic suggestions. Full article
(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)

34 pages, 14344 KiB  
Article
FedBirdAg: A Low-Energy Federated Learning Platform for Bird Detection with Wireless Smart Cameras in Agriculture 4.0
by Samy Benhoussa, Gil De Sousa and Jean-Pierre Chanet
AI 2025, 6(4), 63; https://doi.org/10.3390/ai6040063 - 21 Mar 2025
Viewed by 303
Abstract
Birds can cause substantial damage to crops, directly affecting farmers’ productivity and profitability. As a result, detecting bird presence in crop fields is crucial for effective crop management. Traditional agricultural practices have used various tools and techniques to deter pest birds, while digital agriculture has advanced these efforts through Internet of Things (IoT) and artificial intelligence (AI) technologies. With recent advancements in hardware and processing chips, connected devices can now utilize deep convolutional neural networks (CNNs) for on-field image classification. However, training these models can be energy-intensive, especially when large amounts of data, such as images, need to be transmitted for centralized model training. Federated learning (FL) offers a solution by enabling local training on edge devices, reducing data transmission costs and energy demands while also preserving data privacy and achieving shared model knowledge across connected devices. This paper proposes a low-energy federated learning framework for a compact smart camera network designed to perform simple image classification for bird detection in crop fields. The results demonstrate that this decentralized approach achieves performance comparable to a centrally trained model while consuming at least 8 times less energy. Further efficiency improvements, with a minimal tradeoff in performance reduction, are explored through early stopping. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)

14 pages, 655 KiB  
Perspective
AI-Driven Telerehabilitation: Benefits and Challenges of a Transformative Healthcare Approach
by Rocco Salvatore Calabrò and Sepehr Mojdehdehbaher
AI 2025, 6(3), 62; https://doi.org/10.3390/ai6030062 - 17 Mar 2025
Viewed by 597
Abstract
Artificial intelligence (AI) has revolutionized telerehabilitation by integrating machine learning (ML), big data analytics, and real-time feedback to create adaptive, patient-centered care. AI-driven systems enhance telerehabilitation by analyzing patient data to personalize therapy, monitor progress, and suggest adjustments, eliminating the need for constant clinician oversight. The benefits of AI-powered telerehabilitation include increased accessibility, especially for remote or mobility-limited patients, and greater convenience, allowing patients to perform therapies at home. However, challenges persist, such as data privacy risks, the digital divide, and algorithmic bias. Robust encryption protocols, equitable access to technology, and diverse training datasets are critical to addressing these issues. Ethical considerations also arise, emphasizing the need for human oversight and maintaining the therapeutic relationship. AI also aids clinicians by automating administrative tasks and facilitating interdisciplinary collaboration. Innovations like 5G networks, the Internet of Medical Things (IoMT), and robotics further enhance telerehabilitation’s potential. By transforming rehabilitation into a dynamic, engaging, and personalized process, AI and telerehabilitation together represent a paradigm shift in healthcare, promising improved outcomes and broader access for patients worldwide. Full article

27 pages, 7182 KiB  
Article
Detection of Leaf Diseases in Banana Crops Using Deep Learning Techniques
by Nixon Jiménez, Stefany Orellana, Bertha Mazon-Olivo, Wilmer Rivas-Asanza and Iván Ramírez-Morales
AI 2025, 6(3), 61; https://doi.org/10.3390/ai6030061 - 17 Mar 2025
Viewed by 453
Abstract
Leaf diseases, such as Black Sigatoka and Cordana, represent a growing threat to banana crops in Ecuador. These diseases spread rapidly, impacting both leaf and fruit quality. Early detection is crucial for effective control measures. Recently, deep learning has proven to be a powerful tool in agriculture, enabling more accurate analysis and identification of crop diseases. This study applied the CRISP-DM methodology, consisting of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. A dataset of 900 banana leaf images was collected—300 of Black Sigatoka, 300 of Cordana, and 300 of healthy leaves. Three pre-trained models (EfficientNetB0, ResNet50, and VGG19) were trained on this dataset. To improve performance, data augmentation techniques were applied using TensorFlow Keras’s ImageDataGenerator class, expanding the dataset to 9000 images. Due to the high computational demands of ResNet50 and VGG19, training was performed with EfficientNetB0. The models—EfficientNetB0, ResNet50, and VGG19—demonstrated the ability to identify leaf diseases in bananas, with accuracies of 88.33%, 88.90%, and 87.22%, respectively. The data augmentation increased the performance of EfficientNetB0 to 87.83%, but did not significantly improve its accuracy. These findings highlight the value of deep learning techniques for early disease detection in banana crops, enhancing diagnostic accuracy and efficiency. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)

15 pages, 3081 KiB  
Article
Antiparasitic Pharmacology Goes to the Movies: Leveraging Generative AI to Create Educational Short Films
by Benjamin Worthley, Meize Guo, Lucas Sheneman and Tyler Bland
AI 2025, 6(3), 60; https://doi.org/10.3390/ai6030060 - 17 Mar 2025
Viewed by 317
Abstract
Medical education faces the dual challenge of addressing cognitive overload and sustaining student engagement, particularly in complex subjects such as pharmacology. This study introduces Cinematic Clinical Narratives (CCNs) as an innovative approach to teaching antiparasitic pharmacology, combining generative artificial intelligence (genAI), edutainment, and mnemonic-based learning. The intervention involved two short films, Alien: Parasites Within and Wormquest, designed to teach antiparasitic pharmacology to first-year medical students. A control group of students only received traditional text-based clinical cases, while the experimental group engaged with the CCNs in an active learning environment. Students who received the CCN material scored an average of 8% higher on exam questions related to the material covered by the CCN compared to students in the control group. Results also showed that the CCNs improved engagement and interest among students, as evidenced by significantly higher scores on the Situational Interest Survey for Multimedia (SIS-M) compared to traditional methods. Notably, students preferred CCNs for their storytelling, visuals, and interactive elements. This study underscores the potential of CCNs as a supplementary educational tool, and suggests the potential for broader applications across other medical disciplines outside of antiparasitic pharmacology. By leveraging genAI and edutainment, CCNs represent a scalable and innovative approach to enhancing the medical learning experience. Full article
(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)

25 pages, 340 KiB  
Article
Clinical Applicability of Machine Learning Models for Binary and Multi-Class Electrocardiogram Classification
by Daniel Nasef, Demarcus Nasef, Kennette James Basco, Alana Singh, Christina Hartnett, Michael Ruane, Jason Tagliarino, Michael Nizich and Milan Toma
AI 2025, 6(3), 59; https://doi.org/10.3390/ai6030059 - 14 Mar 2025
Viewed by 461
Abstract
Background: This study investigates the application of machine learning models to classify electrocardiogram signals, addressing challenges such as class imbalances and inter-class overlap. In this study, “normal” and “abnormal” refer to electrocardiogram findings that either align with or deviate from a standard electrocardiogram, warranting further evaluation. “Borderline” indicates an electrocardiogram that requires additional assessment to distinguish benign variations from pathology. Methods: A hierarchical framework reformulated the multi-class problem into two binary classification tasks—distinguishing “Abnormal” from “Non-Abnormal” and “Normal” from “Non-Normal”—to enhance performance and interpretability. Convolutional neural networks, deep neural networks, and tree-based models, including Gradient Boosting Classifier and Random Forest, were trained and evaluated using standard metrics (accuracy, precision, recall, and F1 score) and learning curve convergence analysis. Results: Results showed that convolutional neural networks achieved the best balance between generalization and performance, effectively adapting to unseen data and variations without overfitting. They exhibit strong convergence and robust feature importance rankings, with ventricular rate, QRS duration, and P-R interval identified as key predictors. Tree-based models, despite their high performance metrics, demonstrated poor convergence, raising concerns about their reliability on unseen data. Deep neural networks achieved high sensitivity but suffered from overfitting, limiting their generalizability. Conclusions: The hierarchical binary classification approach demonstrated clinical relevance, enabling nuanced diagnostic insights. Furthermore, the study emphasizes the critical role of learning curve analysis in evaluating model reliability, beyond performance metrics alone. 
Future work should focus on optimizing model convergence and exploring hybrid approaches to improve clinical applicability in electrocardiogram signal classification. Full article
(This article belongs to the Section Medical & Healthcare AI)
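To make the two-task reformulation in this abstract concrete, the sketch below combines the outputs of two binary classifiers ("Abnormal" vs. "Non-Abnormal" and "Normal" vs. "Non-Normal") into one of the three labels. The abstract does not state the combination rule; the ordering, the 0.5 threshold, and the function name here are assumptions chosen only for illustration.

```python
def combine_binary_predictions(p_abnormal, p_normal, threshold=0.5):
    """Map two binary-classifier probabilities to a three-way label.

    Assumed rule (not taken from the paper): check 'Abnormal' first,
    then 'Normal'; a recording that neither classifier claims is
    labelled 'Borderline'.
    """
    if p_abnormal >= threshold:
        return "Abnormal"
    if p_normal >= threshold:
        return "Normal"
    return "Borderline"
```

Checking "Abnormal" first is a conservative choice for a screening setting, since a recording flagged by both classifiers is then routed to further evaluation.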

25 pages, 4169 KiB  
Article
Leveraging Spectral Neighborhood Information for Corn Yield Prediction with Spatial-Lagged Machine Learning Modeling: Can Neighborhood Information Outperform Vegetation Indices?
by Efrain Noa-Yarasca, Javier M. Osorio Leyton, Chad B. Hajda, Kabindra Adhikari and Douglas R. Smith
AI 2025, 6(3), 58; https://doi.org/10.3390/ai6030058 - 13 Mar 2025
Viewed by 394
Abstract
Accurate and reliable crop yield prediction is essential for optimizing agricultural management, resource allocation, and decision-making, while also supporting farmers and stakeholders in adapting to climate change and increasing global demand. This study introduces an innovative approach to crop yield prediction by incorporating spatially lagged spectral data (SLSD) through the spatial-lagged machine learning (SLML) model, an enhanced version of the spatial lag X (SLX) model. The research aims to show that SLSD improves prediction compared to traditional vegetation index (VI)-based methods. Conducted on a 19-hectare cornfield at the ARS Grassland, Soil, and Water Research Laboratory during the 2023 growing season, this study used five-band multispectral image data and 8581 yield measurements ranging from 1.69 to 15.86 Mg/ha. Four predictor sets were evaluated: Set 1 (spectral bands), Set 2 (spectral bands + neighborhood data), Set 3 (spectral bands + VIs), and Set 4 (spectral bands + top VIs + neighborhood data). These were evaluated using the SLX model and four decision-tree-based SLML models (RF, XGB, ET, GBR), with performance assessed using R² and RMSE. Results showed that incorporating spatial neighborhood data (Set 2) outperformed VI-based approaches (Set 3), emphasizing the importance of spatial context. SLML models, particularly XGB, RF, and ET, performed best with 4–8 neighbors, while excessive neighbors slightly reduced accuracy. In Set 3, VIs improved predictions, but a smaller subset (10–15 indices) was sufficient for optimal yield prediction. Set 4 showed slight gains over Sets 2 and 3, with XGB and RF achieving the highest R² values. Key predictors included spatially lagged spectral bands (e.g., Green_lag, NIR_lag, RedEdge_lag) and VIs (e.g., CREI, GCI, NCPI, ARI, CCCI), highlighting the value of integrating neighborhood data for improved corn yield prediction. 
This study underscores the importance of spatial context in corn yield prediction and lays the foundation for future research across diverse agricultural settings, focusing on optimizing neighborhood size, integrating spatial and spectral data, and refining spatial dependencies through localized search algorithms. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
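A spatially lagged predictor such as the `Green_lag` feature named in this abstract can be sketched as the mean of a band's values over each point's nearest neighbors. The abstract does not specify how neighborhoods or weights are defined, so the choices below (k nearest neighbors by Euclidean distance, unweighted mean, and the function name `spatial_lag`) are assumptions for illustration only.

```python
import math

def spatial_lag(points, values, k=4):
    """Return, for each point, the mean of `values` over its k nearest
    neighbours, excluding the point itself.

    Assumes at least two points; if k exceeds the number of other
    points, all of them are used.
    """
    lagged = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Rank the other points by Euclidean distance and keep the k closest.
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        lagged.append(sum(values[j] for j in nearest) / len(nearest))
    return lagged
```

The lagged column would then be appended to the original band values as an extra predictor before fitting a tree-based model; the abstract reports that 4 to 8 neighbors worked best in the study.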

27 pages, 10632 KiB  
Article
Integration of YOLOv8 Small and MobileNet V3 Large for Efficient Bird Detection and Classification on Mobile Devices
by Axel Frederick Félix-Jiménez, Vania Stephany Sánchez-Lee, Héctor Alejandro Acuña-Cid, Isaul Ibarra-Belmonte, Efraín Arredondo-Morales and Eduardo Ahumada-Tello
AI 2025, 6(3), 57; https://doi.org/10.3390/ai6030057 - 13 Mar 2025
Viewed by 580
Abstract
Background: Bird species identification and classification are crucial for biodiversity research, conservation initiatives, and ecological monitoring. However, conventional identification techniques used by biologists are time-consuming and susceptible to human error. The integration of deep learning models offers a promising alternative to automate and enhance species recognition processes. Methods: This study explores the use of deep learning for bird species identification in the city of Zacatecas. Specifically, we implement YOLOv8 Small for real-time detection and MobileNet V3 for classification. The models were trained and tested on a dataset comprising five bird species: Vermilion Flycatcher, Pine Flycatcher, Mexican Chickadee, Arizona Woodpecker, and Striped Sparrow. The evaluation metrics included precision, recall, and computational efficiency. Results: The findings demonstrate that both models achieve high accuracy in species identification. YOLOv8 Small excels in real-time detection, making it suitable for dynamic monitoring scenarios, while MobileNet V3 provides a lightweight yet efficient classification solution. These results highlight the potential of artificial intelligence to enhance ornithological research by improving monitoring accuracy and reducing manual identification efforts. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

22 pages, 1390 KiB  
Article
Emotion-Aware Embedding Fusion in Large Language Models (Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4) for Intelligent Response Generation
by Abdur Rasool, Muhammad Irfan Shahzad, Hafsa Aslam, Vincent Chan and Muhammad Ali Arshad
AI 2025, 6(3), 56; https://doi.org/10.3390/ai6030056 - 13 Mar 2025
Viewed by 656
Abstract
Empathetic and coherent responses are critical in automated chatbot-facilitated psychotherapy. This study addresses the challenge of enhancing the emotional and contextual understanding of large language models (LLMs) in psychiatric applications. We introduce Emotion-Aware Embedding Fusion, a novel framework integrating hierarchical fusion and attention mechanisms to prioritize semantic and emotional features in therapy transcripts. Our approach combines multiple emotion lexicons, including NRC Emotion Lexicon, VADER, WordNet, and SentiWordNet, with state-of-the-art LLMs such as Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4. Therapy session transcripts, comprising over 2000 samples, are segmented into hierarchical levels (word, sentence, and session) using neural networks, while hierarchical fusion combines these features with pooling techniques to refine emotional representations. Attention mechanisms, including multi-head self-attention and cross-attention, further prioritize emotional and contextual features, enabling the temporal modeling of emotional shifts across sessions. The processed embeddings, computed using BERT, GPT-3, and RoBERTa, are stored in the Facebook AI similarity search vector database, which enables efficient similarity search and clustering across dense vector spaces. Upon user queries, relevant segments are retrieved and provided as context to LLMs, enhancing their ability to generate empathetic and contextually relevant responses. The proposed framework is evaluated across multiple practical use cases to demonstrate real-world applicability, including AI-driven therapy chatbots. The system can be integrated into existing mental health platforms to generate personalized responses based on retrieved therapy session data. 
The experimental results show that our framework enhances empathy, coherence, informativeness, and fluency, surpassing baseline models while improving LLMs’ emotional intelligence and contextual adaptability for psychotherapy. Full article
(This article belongs to the Special Issue Multimodal Artificial Intelligence in Healthcare)
