
Advanced LLM

  • RSGPT: A Remote Sensing Vision Language Model and Benchmark. arXiv'2023. [Paper | Code]
  • GeoChat: Grounded Large Vision-Language Model for Remote Sensing. CVPR'2024. [Paper | Code]
  • Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs. CVPRW'2024. [Paper | Code]
  • Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data. CVPRW'2024. [Paper | Code]
  • EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering. AAAI'2024. [Paper | Code]
  • EarthGPT: A Universal Multimodal Large Language Model for Multisensor Image Comprehension in Remote Sensing Domain. TGRS'2024. [Paper | Code]
  • EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension. TGRS'2024. [Paper | Code]
  • RingMoGPT: A Unified Remote Sensing Foundation Model for Vision, Language, and Grounded Tasks. TGRS'2024. [Paper]
  • LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model. ECCV'2024. [Paper | Code]
  • VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding. arXiv'2024. [Paper | Code]
  • Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension. arXiv'2024. [Paper]
  • RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts. arXiv'2024. [Paper | Code]
  • UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models. arXiv'2024. [Paper]
  • SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model. ISPRS P&RS'2025. [Paper | Code]
  • LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation. ISPRS P&RS'2025. [Paper | Code]
  • SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding. arXiv'2025. [Paper | Code]
  • Falcon: A Remote Sensing Vision-Language Foundation Model. arXiv'2025. [Paper | Code]
  • TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data. ICLR'2025. [Paper | Code]
  • VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis. AAAI'2025. [Paper | Code]
  • XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? CVPR'2025. [Paper | Code]
  • EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues. arXiv'2024. [Paper | Code]
  • REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation. arXiv'2024. [paper]
  • AllSpark: A Multimodal Spatiotemporal General Intelligence Model With Ten Modalities via Language as a Reference Framework. TGRS'2025. [Paper | Code]
  • Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach. CVPRW'2025. [Paper | Code]
  • UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding. ICCV'2025. [Paper | Code]
  • When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning. ICCV'2025. [Paper | Code]
  • GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks. ICCV'2025. [Paper | Code]
  • GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing. arXiv'2025. [Paper | Code]
  • Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models. arXiv'2025. [Paper]
  • EagleVision: Object-level Attribute Multimodal LLM for Remote Sensing. arXiv'2025. [Paper | Code]
  • OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence. arXiv'2025. [Paper]
  • Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind. arXiv'2025. [Paper | Code]
  • Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling. arXiv'2025. [Paper]
  • Landsat-Bench: Datasets and Benchmarks for Landsat Foundation Models. arXiv'2025. [Paper | Code]
  • Few-Shot Vision-Language Reasoning for Satellite Imagery via Verifiable Rewards. arXiv'2025. [Paper]
  • GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution. NeurIPS'2025. [Paper | Code]
  • GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning. arXiv'2025. [Paper | Code]
  • Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications. arXiv'2025. [Paper]
  • GeoDiT: A Diffusion-based Vision-Language Model for Geospatial Understanding. arXiv'2025. [Paper]