3D Paper Pyramid Tutorial

MFPD: Mamba-Driven Feature Pyramid Decoding for Underwater Object Detection

Abstract: Underwater object detection suffers from limited long-range dependency modeling, fine-grained feature representation, and noise suppression, resulting in blurred boundaries, frequent missed ...

GitHub

Unifying 3D Vision-Language Understanding via Promptable Queries

[2024.08] Released training and evaluation. [2024.07] Our Hugging Face demo is available; welcome to try our model! [2024.07] Released the model code. TODO: Clean up training and evaluation A ...

GitHub

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Previous research has investigated the application of Multimodal Large Language Models (MLLMs) in understanding 3D scenes by interpreting them as videos. These approaches generally depend on ...
