MoleculeInsight is an advanced AI-powered platform designed for comprehensive molecular research. By orchestrating a team of specialized AI agents, it aggregates and analyzes data from clinical trials, patents, global trade, market intelligence, and web sources. Using Retrieval-Augmented Generation (RAG), it also leverages internal knowledge bases to provide research-grade insights for pharmaceutical and biotech decision-making.
- 🤖 Multi-Agent Orchestration – A coordinated system of specialized agents running in parallel to gather diverse data points.
- 🧠 RAG-Powered Knowledge – Integrated Retrieval-Augmented Generation (RAG) system that synthesizes internal JSON/PDF documents for context-aware answers.
- 🧬 3D Molecule Visualization – Interactive 3D molecular structure viewer powered by PubChem API and 3Dmol.js:
  - Automatic CID (Compound ID) retrieval from molecule names
  - Real-time 3D structure rendering in SDF format
  - Interactive controls (drag to rotate, scroll to zoom)
  - Auto-rotation for enhanced visualization
- 📚 Wikipedia Integration – Contextual molecule information using LangChain:
  - Comprehensive overview and background
  - Discovery history and scientific context
  - Seamlessly integrated with chemical properties
- ⚡ Real-time Analysis – Live monitoring of agent activities and progress updates via WebSockets/polling.
- 📊 Comprehensive Data Sources:
  - Clinical Trials: ClinicalTrials.gov data analysis.
  - Patents: Patent landscape from PatentsView.
  - Trade Data: Import/Export trends via UN Comtrade.
  - Market Intel: IQVIA and industry insights.
  - Web Intelligence: Real-time news and web search aggregation.
  - Wikipedia: Contextual information and molecule background.
  - PubChem: Chemical properties and 3D structures.
- 📈 Interactive Dashboard – Rich visualizations using Recharts and Shadcn UI.
- 📝 Automated Reporting – Generates detailed markdown and PDF reports of the analysis.
Built with modern web technologies for a responsive and premium experience:
- Framework: Next.js 16 (App Router)
- UI Architecture: React 19, Tailwind CSS
- Components: Shadcn UI, Radix Primitives
- Visualization: Recharts, Framer Motion
A robust agentic backend powered by:
- API: FastAPI
- AI Core: Google Gemini AI (GenAI)
- Orchestration: Python concurrent.futures and asyncio for parallel agent execution.
- RAG Engine: ChromaDB for vector storage, supporting JSON/PDF ingestion.
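As a rough illustration of the parallel agent execution mentioned above, the following minimal sketch fans a molecule name out to several agents with `concurrent.futures` and collects a per-agent status. The `run_agents` helper, the `AgentFn` type, and the result shape are assumptions made for this example; they are not taken from the project's code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict

# Hypothetical agent signature: takes a molecule name, returns a dict of findings.
AgentFn = Callable[[str], dict]


def run_agents(molecule: str, agents: Dict[str, AgentFn], max_workers: int = 8) -> Dict[str, dict]:
    """Run each agent in its own thread and collect results keyed by agent name."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the agent that produced it.
        futures = {pool.submit(fn, molecule): name for name, fn in agents.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"status": "success", "data": future.result()}
            except Exception as exc:
                # One failing agent should not abort the others.
                results[name] = {"status": "failed", "error": str(exc)}
    return results
```

Recording failures per agent, rather than letting the first exception propagate, keeps a partial analysis usable when a single data source is unavailable.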
Located in agents/Agent-workers/, these specialized agents perform distinct tasks:
- Clinical Trials Agent: Searches registry for trial phases and status.
- Patent Agent: Analyzes IP landscape and patent filings.
- EXIM Trade Agent: Tracks global trade flows (HS Codes).
- IQVIA Agent: Simulates market intelligence gathering.
- Web Intelligence Agent: Scrapes and summarizes latest web news.
- Internal Knowledge Agent: Queries the local RAG system for proprietary data.
- Wikipedia Agent: Fetches comprehensive molecule information and context from Wikipedia using LangChain.
- Innovation Strategy Agent: Synthesizes all gathered data to propose strategic opportunities.
- Node.js: v18 or higher (using pnpm)
- Python: v3.9 or higher
- API Keys: Google Gemini, UN Comtrade, NewsAPI
git clone https://github.com/BikramMondal5/MoleculeInsight.git
cd MoleculeInsight
The backend handles all AI agents and RAG logic.
# Terminal 1
cd agents
# Create a virtual environment
python -m venv venv
# Activate the virtual environment (Windows PowerShell; on macOS/Linux use: source venv/bin/activate)
.\venv\Scripts\Activate.ps1
# Install dependencies
pip install -r requirements.txt
You need to configure environment variables for both the backend and frontend.
Create a .env file in agents/RAG/ (yes, inside the RAG directory):
GEMINI_API_KEY=your_gemini_api_key_here
# Add other keys if required by specific agents (e.g., NEWS_API_KEY, COMTRADE_KEY, etc.)
Create or update the .env file in the project root:
NEXTAUTH_URL=http://localhost:3000
To populate the vector database with your internal JSON/PDF documents:
# Ensure you are still in the agents directory with venv activated
python RAG/ingest_all.py
This will process documents in agents/RAG/KnowledgeBase/ and store embeddings in agents/RAG/db/.
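For readers unfamiliar with what an ingestion step like this usually involves, here is a minimal sketch of the chunking stage that typically precedes embedding. It is not the contents of `ingest_all.py`: the `chunk_text` and `iter_json_chunks` helpers, the chunk sizes, and the ID format are all assumptions chosen for illustration, and the embedding and vector-store calls are deliberately left out.

```python
import json
from pathlib import Path
from typing import Iterator, List, Tuple


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def iter_json_chunks(kb_dir: Path, chunk_size: int = 500, overlap: int = 50) -> Iterator[Tuple[str, str]]:
    """Yield (chunk_id, chunk) pairs for every JSON file found under kb_dir."""
    for path in sorted(kb_dir.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            payload = json.load(fh)
        # Serialize the parsed JSON back to text so it can be chunked uniformly.
        text = json.dumps(payload, ensure_ascii=False)
        for index, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            # Hypothetical ID scheme: file name plus chunk index.
            yield f"{path.name}:{index}", chunk
```

Each yielded pair would then be embedded and written to the vector store; overlapping chunks reduce the chance that a relevant sentence is split across a chunk boundary.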
Open a new terminal in the project root:
# Install dependencies
pnpm install
In your backend terminal (agents/ directory):
python main.py
The API will start at http://localhost:8000
In your frontend terminal (root directory):
pnpm dev
The web app will be available at http://localhost:3000
- Dashboard: Navigate to http://localhost:3000/analysis.
- Search: Enter a molecule name (e.g., "Aspirin", "Atorvastatin", "Pembrolizumab") or a research query.
- Filters: Optionally specify geography or specific focus areas.
- Run Analysis: Click "Run Agentic Analysis".
  - You will see cards for each agent lighting up as they work.
  - Real-time logs will appear in the "Live Agent Status" panel.
- Results: Once complete, explore the tabbed reports for Clinical, Patents, Market, and more.
- Executive Summary: Click the "Summary" button to view:
  - Interactive 3D Molecule Viewer: Rotate and zoom the molecular structure
  - Wikipedia Context: Comprehensive background and discovery information
  - Chemical Properties: Molecular formula, weight, and IUPAC name from PubChem
  - Data Visualizations: Charts showing agent accuracy and data coverage
- Export: Download the comprehensive report as a PDF.
The Executive Summary modal provides a comprehensive overview of the analyzed molecule with cutting-edge visualizations:
- PubChem Integration: Automatically fetches 3D structures using the PubChem REST API
- Interactive Viewer: Built with 3Dmol.js for professional molecular rendering
- Controls:
  - Drag to rotate the molecule
  - Scroll to zoom in/out
  - Auto-rotation for dynamic presentation
- Format: Renders SDF (Structure Data File) format with stick and sphere styles
- LangChain Integration: Uses WikipediaAPIWrapper for reliable data retrieval
- Rich Content: Displays comprehensive molecule overview and background
- Markdown Rendering: Properly formatted with ReactMarkdown for professional display
- Smart Extraction: Automatically fetches relevant information based on molecule name
- Molecular Formula: Chemical composition
- Molecular Weight: Precise mass in g/mol
- IUPAC Name: Systematic chemical nomenclature
- Agent Accuracy Chart: Donut chart showing data richness by source
- Coverage Comparison: Bar chart comparing data coverage across agents
- Agent Status: Real-time success/failure indicators for each agent
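The PubChem lookups described above (molecule name to CID, 3D structure in SDF format, and basic chemical properties) can be sketched with PubChem's public PUG REST URL scheme. This is only an illustration under stated assumptions: the helper names are hypothetical, no network request is made here, and the project's actual implementation may differ.

```python
from typing import Optional
from urllib.parse import quote

PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(name: str) -> str:
    """URL that resolves a compound name to its PubChem CID(s)."""
    return f"{PUBCHEM_BASE}/compound/name/{quote(name, safe='')}/cids/JSON"


def sdf_3d_url(cid: int) -> str:
    """URL for the 3D conformer of a compound in SDF format."""
    return f"{PUBCHEM_BASE}/compound/cid/{cid}/SDF?record_type=3d"


def properties_url(cid: int) -> str:
    """URL for molecular formula, molecular weight, and IUPAC name."""
    props = "MolecularFormula,MolecularWeight,IUPACName"
    return f"{PUBCHEM_BASE}/compound/cid/{cid}/property/{props}/JSON"


def first_cid(payload: dict) -> Optional[int]:
    """Extract the first CID from a name-lookup response, or None if absent."""
    cids = payload.get("IdentifierList", {}).get("CID", [])
    return cids[0] if cids else None
```

A caller would fetch `cid_lookup_url("Aspirin")`, pass the parsed JSON to `first_cid`, and then request the SDF and property URLs for that CID. The SDF text is what a viewer such as 3Dmol.js consumes.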
Found a bug? 🐞 Have a feature request?
- Open an issue or submit a pull request — contributions are always welcome!
This project is licensed under the MIT License.
