publications | Carlos VARGAS

2025

Diploma thesis
Hearing the FOREST: Machine Learning for Biodiversity Monitoring Using Soundscapes

Carlos Alberto Vargas Rivera

Oct 2025

Passed with distinction


The 2030 United Nations Agenda for Sustainable Development highlights the urgent need to address biodiversity loss and land degradation, which threaten ecosystems and livelihoods worldwide. This thesis contributes to these efforts by supporting the Payment for Environmental Services (PSA) conservation program in Costa Rica, which serves as a case study for large-scale biodiversity monitoring using Passive Acoustic Monitoring (PAM). We propose FOREST (FramewOrk for featuRe Extraction, viSualisation, and classificaTion of Soundscapes), a modular Python-based framework that integrates preprocessing, dataset curation, feature extraction, visualisation, and predictive classification of ecological audio recordings. First, we establish a pipeline that transforms soundscapes into PyTorch tensors, consolidating a curated dataset of shape 249,660 x 6,016 and extracting five statistical scalars and eleven Ecological Acoustic Indices (EAIs), including Number of Peaks (NPP), Bioacoustic Index (BET), Temporal Entropy (HTP), Frequency Entropy (HFQ), and Acoustic Evenness Index (AEI). Second, we develop an evaluation framework comprising 3,577 experiments to systematically analyse the impact of individual features and their combinations on model performance. The results show that a subset of five EAIs, namely NPP, BET, HTP, HFQ, and AEI, achieves robust and accurate classification. Complementary spidernet visualisations reveal distinct ecoacoustic profiles across four ecosystem regions (Reference Forest, Pasture, Natural Regeneration, and Plantation), supporting the interpretation of these indices as proxy indicators of biodiversity.
Third, we design and benchmark three hybrid Deep Learning (DL) models, namely ParaNet-CNN-LSTM (Parallel Convolutional Neural Network and Long Short-Term Memory), SeqNet-CNN-LSTM (Sequential Convolutional Neural Network and Long Short-Term Memory), and SeqNet-LSTM-CNN (Sequential Long Short-Term Memory and Convolutional Neural Network), against baseline models including Support Vector Machine (SVM), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and ResNet1D. The comparative analysis shows that ParaNet-CNN-LSTM achieves the most consistent and reliable performance, with median accuracy values above 90 percent and maximum values exceeding 96.2 percent in the optimal range of 10 to 13 input features. The FOREST framework consolidates these contributions into an open-source, web-based application available at www.soundforest.app. Despite limitations such as dataset imbalance, temporal assumptions, absence of metadata, and restriction to the PSA program, the methodology provides a rigorous foundation. This thesis demonstrates that combining Ecological Acoustic Indices with Deep Learning-based hybrid models enables accurate ecological soundscape classification and offers a scalable approach for biodiversity monitoring across diverse ecosystems.
@book{Vargas2025,
  author = {Vargas Rivera, Carlos Alberto},
  title = {Hearing the FOREST: Machine Learning for Biodiversity Monitoring Using Soundscapes},
  booktitle = {Hearing the FOREST: Machine Learning for Biodiversity Monitoring Using Soundscapes},
  year = {2025},
  month = oct,
  address = {Wien, Österreich},
  publisher = {Technische Universität Wien},
  school = {Technische Universität Wien},
  type = {Diplomarbeit},
  language = {en},
  doi = {10.34726/hss.2025.126900},
  url = {https://doi.org/10.34726/hss.2025.126900},
  urn = {http://hdl.handle.net/20.500.12708/220567},
  note = {mit Auszeichnung bestanden},
  institution = {E192 - Institute of Logic and Computation},
  advisor = {Sallinger, Emanuel},
  orcid = {0000-0002-1757-3249},
  keywords = {Machine Learning, Deep Learning, CNN, LSTM, Ecoacoustics, Bioacoustics, Soundscapes, Passive Acoustic Monitoring, Biodiversity, Audio Processing},
}
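
The thesis abstract above names Temporal Entropy (HTP) as one of the five selected indices. As a rough illustration only, the following sketch computes one common formulation of temporal entropy: the Shannon entropy of the normalised amplitude envelope, divided by the logarithm of the number of samples so that the result lies in [0, 1]. It assumes a simple absolute-value envelope in place of the Hilbert envelope that is often used, and the function name `temporal_entropy` is hypothetical. It is not taken from the FOREST code base, which may compute the index differently.

```python
import math


def temporal_entropy(samples):
    """Normalised Shannon entropy of an amplitude envelope, in [0, 1].

    Assumption: the envelope is approximated by the absolute sample value.
    """
    envelope = [abs(s) for s in samples]
    total = sum(envelope)
    n = len(envelope)
    if n < 2 or total == 0:
        return 0.0  # entropy is undefined for silence or a single sample
    # Treat the envelope as a probability mass function over time.
    probs = [e / total for e in envelope]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    # Divide by the maximum possible entropy, log2(n); clamp away -0.0.
    return max(0.0, entropy / math.log2(n))


# Toy signals (made up for illustration, not thesis data).
steady = [1.0] * 8            # energy spread evenly over time
click = [0.0] * 7 + [1.0]     # all energy in a single sample
print(temporal_entropy(steady), temporal_entropy(click))
```

Under this formulation, a signal whose energy is spread evenly over time scores close to 1, while a signal dominated by a single short event scores close to 0.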

2024

Research project
Exploring Multicultural Bridges: An Analysis of TikTok’s Content Dynamics

Nil Yagmur Ilba, Carlos Alberto Vargas Rivera, Yiwei Liu, and Irem Sekeroglu

In Research Project at Idiap Research Institute, Jun 2024


TikTok has emerged as a major platform shaping how users interact with media content through personalized recommendation systems. This work investigates multicultural and international content dynamics by analyzing a dataset of 2,580 TikTok videos collected via the Research API. Using natural language processing and Latent Dirichlet Allocation (LDA), four thematic clusters are identified: viral content, music and dance, mental health, and artists. In addition, a hashtag co-occurrence network is constructed to uncover structural relationships and influential communities within the dataset. The results reveal that multicultural dynamics on TikTok are strongly linked to emotional expression, global trends, and shared social experiences, highlighting the platform’s role as a complex and multilingual digital ecosystem.
@inproceedings{Vargas2024,
  author = {Ilba, Nil Yagmur and Vargas Rivera, Carlos Alberto and Liu, Yiwei and Sekeroglu, Irem},
  title = {Exploring Multicultural Bridges: An Analysis of TikTok's Content Dynamics},
  booktitle = {Research Project at Idiap Research Institute},
  year = {2024},
  month = jun,
  address = {Lausanne, Switzerland},
  publisher = {ACM},
  language = {en},
  doi = {10.13140/RG.2.2.17868.88965},
  url = {https://doi.org/10.13140/RG.2.2.17868.88965},
  orcid = {0000-0002-1757-3249},
  keywords = {Social Media, TikTok, Multiculturalism, Network Analysis, Topic Modeling, Hashtags, Data Science},
  school = {EPFL - École Polytechnique Fédérale de Lausanne},
  institution = {Idiap Research Institute},
  advisor = {Gatica-Perez, Daniel},
}
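
The abstract above mentions a hashtag co-occurrence network. As a minimal sketch of how such a network can be built, the code below counts how often each pair of hashtags appears on the same video and derives a weighted degree per hashtag. The function names `cooccurrence_edges` and `weighted_degree` and the toy hashtag lists are hypothetical, chosen for illustration. The project's actual data and tooling are not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(hashtag_lists):
    """Count how often each unordered hashtag pair shares a video."""
    edges = Counter()
    for tags in hashtag_lists:
        # Normalise case, drop the leading '#', and deduplicate per video.
        cleaned = {t.lower().lstrip("#") for t in tags}
        unique = sorted(t for t in cleaned if t)
        # Every pair of distinct hashtags on one video adds weight 1.
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each hashtag."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Toy input (hypothetical, not from the 2,580-video dataset).
videos = [
    ["#fyp", "#dance"],
    ["#Dance", "#music", "#fyp"],
    ["#music"],
]
edges = cooccurrence_edges(videos)
degree = weighted_degree(edges)
print(edges.most_common(3))
print(degree.most_common(3))
```

Hashtags with a high weighted degree co-occur with many others and are natural candidates for the influential nodes of the network; community detection would then run on the same edge list.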