Office365 SharePoint PowerPoint Downloader

A Python tool that automatically captures PowerPoint presentations from SharePoint sites and converts them to PDF. Only view access is needed: no SharePoint editing permissions are required, and with a saved Chrome profile you don't have to log in to Office 365 on every run.

✨ Features

🤖 Automated Capture: Navigates to SharePoint PowerPoint presentations automatically
📸 High-Quality Screenshots: Captures each slide at the configured browser window size (1920x1080 by default)
📄 PDF Generation: Combines all slides into a single, organized PDF file
🔍 Smart Detection: Detects the last slide by comparing consecutive screenshots (perceptual hashing, with pixel comparison as a fallback)
🎬 Animation Handling: Waits for animations and transitions to finish before each capture
📊 Progress Tracking: Real-time progress bars during capture and PDF generation (requires the optional tqdm package)
🛡️ Error Recovery: Error classification and automatic retries with exponential backoff
👤 Chrome Profile Support: Reuses an existing Chrome profile to keep you logged in between runs
⚙️ Highly Configurable: Settings via CLI arguments, JSON configuration files, or environment variables

📋 Prerequisites

Python 3.7 or higher
Google Chrome browser
ChromeDriver (downloaded and managed automatically by Selenium Manager, included in Selenium 4.6 and later)

🚀 Quick Start

Installation

Clone this repository:

```
git clone https://github.com/yourusername/Office365-SharePoint-Downloader.git
cd Office365-SharePoint-Downloader
```

Install required packages:
```
pip install -r requirements.txt
```

Basic Usage

```
python main.py <SHAREPOINT_URL>
```

Example:

```
python main.py "https://yourcompany.sharepoint.com/sites/team/presentation.pptx"
```

If you don't provide a URL, the script will prompt you to enter one.

📦 Dependencies

Required

selenium - Browser automation
Pillow - Image processing
reportlab - PDF generation

Optional (Recommended)

```
pip install imagehash  # Better image comparison
pip install tqdm       # Progress bars
```

🎯 Usage Examples

Basic Examples

```
# Capture presentation with default settings
python main.py "https://yourcompany.sharepoint.com/..."

# Custom output folder
python main.py "https://..." -o my_presentation

# Run in headless mode (no browser window)
python main.py "https://..." --headless
```

Advanced Examples

```
# High-resolution capture with custom window size
python main.py "https://..." --window-width 2560 --window-height 1440 --pdf-resolution 150

# Using configuration file for complex setups
python main.py "https://..." -c config.json

# Debug mode for troubleshooting
python main.py "https://..." --debug

# Without using Chrome profile (manual login required)
python main.py "https://..." --no-profile
```

Save Configuration

```
# Save current settings for reuse
python main.py "https://..." --headless --pdf-resolution 150 --save-config my_config.json

# Use saved configuration
python main.py "https://..." -c my_config.json
```

⚙️ Configuration

Command-Line Options

| Option | Description | Default |
|---|---|---|
| `-o, --output` | Output folder name | `slides` |
| `-c, --config` | Path to JSON configuration file | None |
| `--save-config` | Save current configuration to JSON | None |
| `--headless` | Run browser without GUI | False |
| `--window-width` | Browser window width (pixels) | 1920 |
| `--window-height` | Browser window height (pixels) | 1080 |
| `--wait-timeout` | Element wait timeout (seconds) | 10 |
| `--animation-wait` | Animation wait time (seconds) | 2.0 |
| `--pdf-resolution` | PDF resolution (DPI) | 100.0 |
| `--debug` | Enable debug logging | False |
| `--no-profile` | Don't use Chrome profile | False |
| `--user-data-dir` | Custom Chrome profile path | None |

Configuration File

Create a config.json file for persistent settings:

```
{
  "browser": {
    "window_width": 1920,
    "window_height": 1080,
    "headless": false,
    "use_existing_profile": true
  },
  "capture": {
    "animation_wait": 2.0,
    "consecutive_same_threshold": 10,
    "use_perceptual_hash": true,
    "remove_first_n_images": 1,
    "remove_last_n_images": 2
  },
  "pdf": {
    "resolution": 100.0
  },
  "output_folder": "slides",
  "debug": false
}
```

See config.example.json for all available options.

Environment Variables

OUTPUT_FOLDER: Default output folder
DEBUG: Enable debug mode (true/false)
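The README doesn't spell out how the JSON file and the environment variables are combined with the defaults. Below is a minimal sketch, assuming JSON values override the defaults from the options table and that `OUTPUT_FOLDER`/`DEBUG` override both. `load_config`, `deep_merge`, and `DEFAULTS` are hypothetical names, not the API of `src/config.py`:

```python
import json
import os
from copy import deepcopy

# Hypothetical defaults mirroring the values documented in this README.
DEFAULTS = {
    "browser": {"window_width": 1920, "window_height": 1080,
                "headless": False, "use_existing_profile": True},
    "capture": {"animation_wait": 2.0},
    "pdf": {"resolution": 100.0},
    "output_folder": "slides",
    "debug": False,
}


def deep_merge(base, override):
    """Return a copy of `base` with the keys of `override` applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path=None, environ=None):
    """Defaults, then the JSON file, then environment variables (assumed order)."""
    environ = os.environ if environ is None else environ
    config = deepcopy(DEFAULTS)
    if path:
        with open(path, encoding="utf-8") as f:
            config = deep_merge(config, json.load(f))
    if "OUTPUT_FOLDER" in environ:
        config["output_folder"] = environ["OUTPUT_FOLDER"]
    if "DEBUG" in environ:
        # DEBUG is documented as true/false; anything else counts as false here.
        config["debug"] = environ["DEBUG"].strip().lower() == "true"
    return config
```

Merging nested dictionaries key by key means a config file can set only `"browser": {"headless": true}` without wiping out the other browser defaults.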

🔐 Authentication

Using Chrome Profile (Recommended)

Benefits: No manual login required each time

Steps:

1. Close ALL Chrome windows (important!).
2. Make sure profile usage is enabled (it is by default):
   - set use_existing_profile: true in your config, or
   - simply don't pass the --no-profile flag.
3. Run the script. It will reuse your saved Chrome login session.

⚠️ Important: Chrome must be completely closed, or you'll get a "profile locked" error.

Manual Login (No Profile)

Use the --no-profile flag or set use_existing_profile: false in config. You'll need to log in manually each time the script runs.

Custom Profile Directory

```
python main.py "URL" --user-data-dir "/path/to/chrome/profile"
```
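Reusing a Chrome profile ultimately comes down to passing Chrome's `--user-data-dir` switch when the browser starts. A minimal sketch, assuming Selenium 4 and standard Chrome switches; `build_chrome_arguments` is a hypothetical helper, not code from `src/browser.py`:

```python
from typing import List, Optional


def build_chrome_arguments(
    window_width: int = 1920,
    window_height: int = 1080,
    headless: bool = False,
    user_data_dir: Optional[str] = None,
) -> List[str]:
    """Translate the README's browser settings into Chrome command-line switches."""
    args = [f"--window-size={window_width},{window_height}"]
    if headless:
        # "--headless=new" selects Chrome's newer headless implementation.
        args.append("--headless=new")
    if user_data_dir:
        # Pointing Chrome at an existing profile directory is what preserves
        # the SharePoint login session between runs.
        args.append(f"--user-data-dir={user_data_dir}")
    return args


# Usage with Selenium 4 (not executed here):
#   from selenium import webdriver
#   options = webdriver.ChromeOptions()
#   for arg in build_chrome_arguments(user_data_dir="/path/to/chrome/profile"):
#       options.add_argument(arg)
#   driver = webdriver.Chrome(options=options)
```

Chrome locks a user data directory while any Chrome process is using it, which is why the profile steps ask you to close every Chrome window first.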

📂 Output

After running, you'll find:

```
slides/                          # Output folder
├── slide_001.png               # Individual slides
├── slide_002.png
├── slide_003.png
├── ...
└── presentation.pdf            # Combined PDF

powerpoint_capture.log          # Detailed log file
page_source.html               # Debug info (if errors occur)
```

🏗️ Project Structure

```
Office365-SharePoint-Downloader/
├── src/
│   ├── browser.py              # Browser setup and management
│   ├── capture.py              # Slide capture logic
│   ├── config.py               # Configuration classes
│   ├── exceptions.py           # Custom exceptions
│   ├── image_cache.py          # LRU cache for images
│   ├── image_utils.py          # Image comparison utilities
│   ├── image_validator.py      # Image validation
│   ├── pdf_generator.py        # PDF generation
│   └── utils.py                # General utilities
├── main.py                     # Main entry point
├── config.example.json         # Example configuration
├── requirements.txt            # Dependencies
└── README.md                   # Documentation
```

🔧 Troubleshooting

Common Issues

"Profile will be locked" error

Solution: Close ALL Chrome windows and wait a few seconds before running the script.

"Could not find Present button" error

Possible causes:

URL is not a PowerPoint presentation
You don't have view permissions
SharePoint page hasn't loaded fully

Solutions:

Verify the URL is correct
Try opening the presentation manually first
Increase wait_timeout in the config (or pass --wait-timeout on the command line)

Network timeout errors

Solutions:

Check your internet connection
Verify the URL is accessible
Increase wait_timeout in the config (or pass --wait-timeout on the command line)
Check firewall/proxy settings

Too many consecutive failures

Solutions:

Run with --debug flag for detailed logs
Check powerpoint_capture.log for specific errors
Verify network stability
Ensure you have access to the presentation

Debug Mode

Enable detailed logging:

```
python main.py "URL" --debug
```

This will:

Show detailed error messages
Save page source HTML for inspection
Log all operations with timestamps
Display configuration being used

Log Files

powerpoint_capture.log: Detailed operation logs
page_source.html: Browser page source (saved on errors)
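A minimal sketch of how these two files could be produced, assuming Python's standard `logging` module and a Selenium WebDriver, whose `page_source` property returns the current page HTML. `setup_logging` and `save_page_source` are hypothetical helpers, not the project's code:

```python
import logging
from pathlib import Path


def setup_logging(debug: bool = False,
                  log_file: str = "powerpoint_capture.log") -> logging.Logger:
    """Log to a file and to the console; DEBUG level only when --debug is set."""
    logger = logging.getLogger("powerpoint_capture")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    # Close and drop handlers from any earlier call so messages aren't duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.FileHandler(log_file, encoding="utf-8"),
                    logging.StreamHandler()):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


def save_page_source(driver, path: str = "page_source.html") -> None:
    """Write the browser's current HTML to disk for later inspection."""
    Path(path).write_text(driver.page_source, encoding="utf-8")
```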

🧠 Technical Details

Image Comparison Methods

Perceptual Hash (Default): Uses the imagehash library for accurate similarity detection
- Better at detecting visually similar images
- Resistant to minor variations
- Requires imagehash package
Pixel Comparison (Fallback): Direct pixel-by-pixel comparison
- Works without additional dependencies
- Less accurate for complex images
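To illustrate both strategies without extra dependencies, here is a sketch on small grayscale pixel grids (assumed already downscaled). It uses an average hash, a simpler relative of the DCT-based perceptual hash that `imagehash.phash` computes; with imagehash installed, the equivalent check is roughly `imagehash.phash(img_a) - imagehash.phash(img_b) <= threshold`, because subtracting two hashes yields their Hamming distance. The function names and thresholds below are assumptions, not values from `src/image_utils.py`:

```python
from typing import List, Sequence

Grid = Sequence[Sequence[int]]  # grayscale pixels 0-255, already downscaled


def average_hash(pixels: Grid) -> List[int]:
    """One bit per pixel: 1 if the pixel is brighter than the image's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


def same_slide_by_hash(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    # Small rendering differences rarely flip many bits, so this tolerates noise.
    return hamming(average_hash(a), average_hash(b)) <= max_distance


def same_slide_by_pixels(a: Grid, b: Grid, tolerance: float = 0.01) -> bool:
    # Fallback: the fraction of differing pixels must stay within `tolerance`.
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    if len(flat_a) != len(flat_b):
        return False
    differing = sum(x != y for x, y in zip(flat_a, flat_b))
    return differing / len(flat_a) <= tolerance
```

The pixel check fails as soon as more than the tolerated share of pixels changes at all, while the hash only changes when a region moves across the mean brightness, which is why hashing is more robust to minor variations.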

Smart Waiting System

Instead of fixed delays, the tool:

Monitors DOM changes to detect when animations complete
Checks page stability before capturing
Uses explicit waits with intelligent timeout handling
Adapts to network speed and animation complexity
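The core of such a waiting strategy is polling until something stops changing. A minimal sketch, assuming the "something" is any snapshot function (for example the page HTML); `wait_until_stable` is a hypothetical helper, and the real tool may watch the DOM differently:

```python
import time
from typing import Callable, Hashable


def wait_until_stable(
    snapshot: Callable[[], Hashable],
    stable_checks: int = 3,
    interval: float = 0.25,
    timeout: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `snapshot()` until it is unchanged for `stable_checks` consecutive polls.

    Returns True once stable, or False if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    previous = snapshot()
    unchanged = 0
    while clock() < deadline:
        sleep(interval)
        current = snapshot()
        unchanged = unchanged + 1 if current == previous else 0
        if unchanged >= stable_checks:
            return True
        previous = current
    return False


# With Selenium (not executed here):
#   wait_until_stable(lambda: driver.page_source)
```

Compared with a fixed delay, this returns as soon as the page settles on fast connections and gives up after `timeout` on slow ones; injecting `clock` and `sleep` keeps it testable.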

Memory Optimization

LRU Cache: Recently used images cached in memory
Lazy Loading: Images loaded only when needed
Automatic Cleanup: Resources released after use
Efficient Processing: Minimal memory footprint even for large presentations
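An LRU cache with lazy loading fits in a few lines with `collections.OrderedDict`. A minimal sketch; `LRUCache` and its `get(key, load)` signature are hypothetical, not the interface of `src/image_cache.py`:

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class LRUCache(Generic[V]):
    """Keep at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable, load: Callable[[], V]) -> V:
        """Return the cached value, calling `load()` only on a miss (lazy loading)."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = load()
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

    def __len__(self) -> int:
        return len(self._items)


# Hypothetical usage with Pillow (not executed here):
#   cache = LRUCache(20)
#   img = cache.get(path, lambda: Image.open(path).copy())
```

For caching a pure function, `functools.lru_cache` does the same job; an explicit class makes it easy to supply a different loader per call.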

Error Handling

Custom Exception Types:

NetworkError: Connection issues
AuthenticationError: Login problems
BrowserError: WebDriver/Chrome issues
NavigationError: Page loading failures
CaptureError: Screenshot problems
PDFGenerationError: PDF creation errors

Features:

Automatic classification of error types
Retry logic with exponential backoff
Helpful error messages with suggestions
Debug information preservation
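Retry with exponential backoff can be sketched as follows, assuming only network-type errors are worth retrying while login failures should fail fast. `DownloaderError` (a common base class) and `retry` are hypothetical names; `NetworkError` and `AuthenticationError` reuse the names listed above, and the remaining exception types are omitted for brevity:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class DownloaderError(Exception):
    """Hypothetical common base class for the tool's exceptions."""


class NetworkError(DownloaderError):
    """Connection issues; usually worth retrying."""


class AuthenticationError(DownloaderError):
    """Login problems; retrying will not help."""


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (NetworkError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on `retry_on` errors after delays of base_delay * 2**n."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the defaults, a call that keeps failing with `NetworkError` waits 1 s, then 2 s, then re-raises on the third failure; any other exception propagates immediately.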

⚠️ Known Limitations

Complex animations may not be captured perfectly
Requires stable internet connection
Best results with screen resolution ≥ 1920x1080
Chrome must be closed when using existing profile
Some SharePoint configurations may require additional authentication

🤝 Contributing

Contributions are welcome! Please feel free to:

Report bugs
Suggest new features
Submit pull requests
Improve documentation

📄 License

This project is licensed under the MIT License—see the LICENSE file for details.

⚖️ Important Notice

This project is intended for educational and personal use only.

Respect Microsoft's Terms of Service
Comply with your organization's policies
Respect copyright of presentation content
Use only for presentations you have permission to access

🙏 Acknowledgments

Built with:

Selenium - Browser automation
Pillow - Image processing
ReportLab - PDF generation
imagehash - Perceptual hashing

Questions or Issues? Open an issue on GitHub or check the troubleshooting section above.

Repository: thang1834/Office365-SharePoint-PowerPoint-Downloader
