Visualizing Audio Spectrogram

SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering

Abstract: The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature ...

Dozens of synced videos capture humanity amid horror of Bondi attack

As gunshots rang out at Bondi, dozens of eyewitnesses risked their lives to film the horror. This is what they wanted you to see.

When a horse whinnies, there's more than meets the ear

A new study finds that horse whinnies are made of both a high and a low frequency, generated by different parts of the vocal ...

IEEE

Listen With Seeing: Cross-Modal Contrastive Learning for Audio-Visual Event Localization

Abstract: In real-world physiological and psychological scenarios, there often exists a robust complementary correlation between audio and visual signals. Audio-Visual Event Localization (AVEL) aims ...

How the color of a theater affects sound perception

Live music can engage more than just one sense, despite it being an auditory medium. Lighting and visual effects can enhance the listening experience, but it is unclear if they can also affect the ...

GitHub

Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms "Typo" Correction

This repo is the implementation of a research project aimed at enhancing Acoustic Side-Channel Attacks (ASCAs) using a novel combination of Vision Transformers (VTs) and Large Language Models (LLMs).
