Hearing the Forest: Machine Learning for Biodiversity Monitoring Using Soundscapes
The United Nations 2030 Agenda highlights the urgent need to address biodiversity loss and land degradation. This work contributes to that goal by supporting Costa Rica's Payment for Environmental Services program (PSA, from the Spanish Pago por Servicios Ambientales) through large-scale Passive Acoustic Monitoring (PAM). We propose FOREST, a modular Python-based framework that integrates preprocessing, dataset curation, feature extraction, visualization, and predictive classification of ecological audio data. We constructed a 249,660 × 6,016 dataset of statistical features and Ecological Acoustic Indices (EAIs) extracted from the audio. An evaluation framework comprising 3,577 experimental runs quantifies the impact of each feature group on model performance. The results show that a small subset of EAIs (NPP, BET, HTP, AEI, HFQ) is sufficient for robust classification. Among the hybrid deep learning models we developed, ParaNet-CNN-LSTM performed most consistently, with a median accuracy above 90 percent and a maximum accuracy of 97.5 percent. The framework is implemented as an open-source web application. Despite limitations such as dataset imbalance and missing metadata, the approach demonstrates that combining EAIs with hybrid deep learning models enables scalable biodiversity monitoring across ecosystems.
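The median and maximum accuracies quoted above are summaries over many experimental runs. As a minimal sketch of how such a summary could be computed, assuming each run is recorded as a (model name, accuracy) pair, the following groups runs by model and reports both statistics. The function name `summarize_runs`, the model labels, and the accuracy values are hypothetical and chosen only for illustration; they are not part of FOREST and are not results from the study.

```python
from collections import defaultdict
from statistics import median


def summarize_runs(runs):
    """Group (model_name, accuracy) pairs by model and report the
    number of runs, the median accuracy, and the maximum accuracy."""
    by_model = defaultdict(list)
    for model_name, accuracy in runs:
        by_model[model_name].append(accuracy)
    return {
        name: {"n_runs": len(accs), "median": median(accs), "max": max(accs)}
        for name, accs in by_model.items()
    }


# Hypothetical accuracies, invented for illustration only.
example_runs = [
    ("model_a", 0.80), ("model_a", 0.90), ("model_a", 0.85),
    ("model_b", 0.70), ("model_b", 0.95),
]

summary = summarize_runs(example_runs)
for name, stats in summary.items():
    print(name, stats)
```

Reporting the median alongside the maximum helps separate a model that is consistently accurate from one that only occasionally reaches a high score.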