Abstract: With the rapid growth of internet content, multimodal long-document data has become increasingly prominent, drawing significant attention from researchers. However, most existing methods ...