
Advanced LLM

  • RSGPT: A Remote Sensing Vision Language Model and Benchmark. arXiv'2023. [Paper | Code]
  • GeoChat: Grounded Large Vision-Language Model for Remote Sensing. CVPR'2024. [Paper | Code]
  • Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs. CVPRW'2024. [Paper | Code]
  • Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data. CVPRW'2024. [Paper | Code]
  • EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering. AAAI'2024. [Paper | Code]
  • EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain. TGRS'2024. [Paper | Code]
  • EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension. TGRS'2024. [Paper | Code]
  • RingMoGPT: A Unified Remote Sensing Foundation Model for Vision, Language, and Grounded Tasks. TGRS'2024. [Paper]
  • LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model. ECCV'2024. [Paper | Code]
  • VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding. arXiv'2024. [Paper | Code]
  • Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension. arXiv'2024. [Paper]
  • RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts. arXiv'2024. [Paper | Code]
  • UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models. arXiv'2024. [Paper]
  • SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model. ISPRS P&RS'2025. [Paper | Code]
  • LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation. ISPRS P&RS'2025. [Paper | Code]
  • SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding. arXiv'2025. [Paper | Code]
  • Falcon: A Remote Sensing Vision-Language Foundation Model. arXiv'2025. [Paper | Code]
  • TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data. ICLR'2025. [Paper | Code]
  • VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis. AAAI'2025. [Paper | Code]
  • XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? CVPR'2025. [Paper | Code]
  • EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues. arXiv'2024. [Paper | Code]
  • REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation. arXiv'2024. [paper]
  • AllSpark: A Multimodal Spatiotemporal General Intelligence Model With Ten Modalities via Language as a Reference Framework. TGRS'2025. [Paper | Code]
  • Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach. CVPRW'2025. [Paper | Code]
  • UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding. ICCV'2025. [Paper | Code]
  • When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning. ICCV'2025. [Paper | Code]
  • GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks. ICCV'2025. [Paper | Code]
  • GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing. arXiv'2025. [Paper | Code]
  • Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models. arXiv'2025. [Paper]
  • EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing. arXiv'2025. [Paper | Code]
  • OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence. arXiv'2025. [Paper]
  • Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind. arXiv'2025. [Paper | Code]
  • Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling. arXiv'2025. [Paper]
  • Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models. arXiv'2025. [Paper | Code]
  • Few-Shot Vision-Language Reasoning for Satellite Imagery via Verifiable Rewards. arXiv'2025. [Paper]
  • GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution. NeurIPS'2025. [Paper | Code]
  • GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning. arXiv'2025. [Paper | Code]
  • Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications. arXiv'2025. [Paper]
  • GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding. arXiv'2025. [Paper]