A Python tool to automatically capture PowerPoint presentations from SharePoint sites and convert them to PDF format; no SharePoint editing permissions or Office 365 login required.
- Automated Capture: Navigates to SharePoint PowerPoint presentations automatically
- High-Quality Screenshots: Captures each slide as a PNG screenshot
- PDF Generation: Combines all slides into a single, organized PDF file
- Smart Detection: Detects the last slide by comparing consecutive screenshots (perceptual hash, with pixel comparison as a fallback)
- Animation Handling: Waits for animations and transitions to complete naturally
- Progress Tracking: Real-time visual progress bars during capture and PDF generation
- Error Recovery: Robust error handling with automatic retry mechanisms
- Chrome Profile Support: Use existing Chrome profiles to maintain login sessions
- Highly Configurable: Extensive configuration via CLI arguments or JSON files
- Python 3.7 or higher
- Google Chrome browser
- ChromeDriver (managed automatically by Selenium Manager in Selenium 4.6 and later)
- Clone this repository:
  git clone https://github.com/yourusername/Office365-SharePoint-Downloader.git
  cd Office365-SharePoint-Downloader
- Install required packages:
  pip install -r requirements.txt
python main.py <SHAREPOINT_URL>

Example:

python main.py "https://yourcompany.sharepoint.com/sites/team/presentation.pptx"

If you don't provide a URL, the script will prompt you to enter one.
- selenium: Browser automation
- Pillow: Image processing
- reportlab: PDF generation
pip install imagehash # Better image comparison
pip install tqdm # Progress bars

# Capture presentation with default settings
python main.py "https://yourcompany.sharepoint.com/..."
# Custom output folder
python main.py "https://..." -o my_presentation
# Run in headless mode (no browser window)
python main.py "https://..." --headless

# High-resolution capture with custom window size
python main.py "https://..." --window-width 2560 --window-height 1440 --pdf-resolution 150
# Using configuration file for complex setups
python main.py "https://..." -c config.json
# Debug mode for troubleshooting
python main.py "https://..." --debug
# Without using Chrome profile (manual login required)
python main.py "https://..." --no-profile

# Save current settings for reuse
python main.py "https://..." --headless --pdf-resolution 150 --save-config my_config.json
# Use saved configuration
python main.py "https://..." -c my_config.json

| Option | Description | Default |
|---|---|---|
| -o, --output | Output folder name | slides |
| -c, --config | Path to JSON configuration file | None |
| --save-config | Save current configuration to JSON | None |
| --headless | Run browser without GUI | False |
| --window-width | Browser window width | 1920 |
| --window-height | Browser window height | 1080 |
| --wait-timeout | Element wait timeout (seconds) | 10 |
| --animation-wait | Animation wait time (seconds) | 2.0 |
| --pdf-resolution | PDF resolution (DPI) | 100.0 |
| --debug | Enable debug logging | False |
| --no-profile | Don't use Chrome profile | False |
| --user-data-dir | Custom Chrome profile path | None |
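
The options above map naturally onto an `argparse` parser. The following is a minimal sketch, assuming the defaults listed in the table; the function name `build_parser` and the attribute names are illustrative and are not taken from main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the option table (not the project's actual code)."""
    p = argparse.ArgumentParser(prog="main.py")
    # The URL is optional because the script prompts for it when omitted.
    p.add_argument("url", nargs="?", default=None, help="SharePoint presentation URL")
    p.add_argument("-o", "--output", default="slides", help="Output folder name")
    p.add_argument("-c", "--config", default=None, help="Path to JSON configuration file")
    p.add_argument("--save-config", default=None, help="Save current configuration to JSON")
    p.add_argument("--headless", action="store_true", help="Run browser without GUI")
    p.add_argument("--window-width", type=int, default=1920, help="Browser window width")
    p.add_argument("--window-height", type=int, default=1080, help="Browser window height")
    p.add_argument("--wait-timeout", type=int, default=10, help="Element wait timeout (seconds)")
    p.add_argument("--animation-wait", type=float, default=2.0, help="Animation wait time (seconds)")
    p.add_argument("--pdf-resolution", type=float, default=100.0, help="PDF resolution (DPI)")
    p.add_argument("--debug", action="store_true", help="Enable debug logging")
    p.add_argument("--no-profile", action="store_true", help="Don't use Chrome profile")
    p.add_argument("--user-data-dir", default=None, help="Custom Chrome profile path")
    return p


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["https://...", "--headless", "--pdf-resolution", "150"])
print(args.headless, args.pdf_resolution, args.output)  # True 150.0 slides
```

Options that are not passed keep the defaults from the table, which is why `args.output` is still `slides` here.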
Create a config.json file for persistent settings:
{
"browser": {
"window_width": 1920,
"window_height": 1080,
"headless": false,
"use_existing_profile": true
},
"capture": {
"animation_wait": 2.0,
"consecutive_same_threshold": 10,
"use_perceptual_hash": true,
"remove_first_n_images": 1,
"remove_last_n_images": 2
},
"pdf": {
"resolution": 100.0
},
"output_folder": "slides",
"debug": false
}

See config.example.json for all available options.
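
One way a partial config file could be combined with defaults is a recursive merge. The sketch below assumes that behavior for illustration; `DEFAULTS`, `merge`, and `load_config` are hypothetical names, and whether the tool itself fills in missing keys this way is not confirmed by this README.

```python
import json
from copy import deepcopy

# Defaults copied from the example config.json shown above.
DEFAULTS = {
    "browser": {
        "window_width": 1920,
        "window_height": 1080,
        "headless": False,
        "use_existing_profile": True,
    },
    "capture": {
        "animation_wait": 2.0,
        "consecutive_same_threshold": 10,
        "use_perceptual_hash": True,
        "remove_first_n_images": 1,
        "remove_last_n_images": 2,
    },
    "pdf": {"resolution": 100.0},
    "output_folder": "slides",
    "debug": False,
}


def merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)  # descend into nested sections
        else:
            result[key] = value  # scalar or new key: override wins
    return result


def load_config(text: str) -> dict:
    """Parse JSON text and fill in any missing keys from DEFAULTS."""
    return merge(DEFAULTS, json.loads(text))


cfg = load_config('{"browser": {"headless": true}, "pdf": {"resolution": 150}}')
print(cfg["browser"]["headless"], cfg["browser"]["window_width"], cfg["pdf"]["resolution"])
# True 1920 150
```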
- OUTPUT_FOLDER: Default output folder
- DEBUG: Enable debug mode (true/false)
Benefits: No manual login required each time
Steps:
- Close ALL Chrome windows (important!)
- Enable profile usage:
  - Set use_existing_profile: true in config, OR
  - Don't use the --no-profile flag
- Run the script; it will use your saved Chrome login session
Use the --no-profile flag or set use_existing_profile: false in config. You'll need to log in manually each time the script runs.
python main.py "URL" --user-data-dir "/path/to/chrome/profile"

After running, you'll find:
slides/                    # Output folder
├── slide_001.png          # Individual slides
├── slide_002.png
├── slide_003.png
├── ...
└── presentation.pdf       # Combined PDF
powerpoint_capture.log     # Detailed log file
page_source.html           # Debug info (if errors occur)
Office365-SharePoint-Downloader/
├── src/
│   ├── browser.py              # Browser setup and management
│   ├── capture.py              # Slide capture logic
│   ├── config.py               # Configuration classes
│   ├── exceptions.py           # Custom exceptions
│   ├── image_cache.py          # LRU cache for images
│   ├── image_utils.py          # Image comparison utilities
│   ├── image_validator.py      # Image validation
│   ├── pdf_generator.py        # PDF generation
│   └── utils.py                # General utilities
├── main.py                     # Main entry point
├── config.example.json         # Example configuration
├── requirements.txt            # Dependencies
└── README.md                   # Documentation
Solution: Close ALL Chrome windows and wait a few seconds before running the script.
Possible causes:
- URL is not a PowerPoint presentation
- You don't have view permissions
- SharePoint page hasn't loaded fully
Solutions:
- Verify the URL is correct
- Try opening the presentation manually first
- Increase wait_timeout in config
Solutions:
- Check your internet connection
- Verify the URL is accessible
- Increase wait_timeout in config
- Check firewall/proxy settings
Solutions:
- Run with the --debug flag for detailed logs
- Check powerpoint_capture.log for specific errors
- Verify network stability
- Ensure you have access to the presentation
Enable detailed logging:
python main.py "URL" --debug

This will:
- Show detailed error messages
- Save page source HTML for inspection
- Log all operations with timestamps
- Display configuration being used
- powerpoint_capture.log: Detailed operation logs
- page_source.html: Browser page source (saved on errors)
- Perceptual Hash (Default): Uses the imagehash library for accurate similarity detection
  - Better at detecting visually similar images
  - Resistant to minor variations
  - Requires the imagehash package
- Pixel Comparison (Fallback): Direct pixel-by-pixel comparison
  - Works without additional dependencies
  - Less accurate for complex images
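
To illustrate the difference between the two methods, here is a simplified average hash written in plain Python. It is a stand-in for the idea behind perceptual hashing, not the imagehash library's implementation, and it assumes each screenshot has already been downscaled to a small grayscale grid. The `reached_end` helper shows one plausible reading of `consecutive_same_threshold`; all function names here are hypothetical.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[int]]  # rows of grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> List[bool]:
    """One bit per pixel: True where the pixel is brighter than the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [v > mean for v in flat]


def hamming(a: List[bool], b: List[bool]) -> int:
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


def pixels_equal(a: Grid, b: Grid) -> bool:
    """Fallback method: exact pixel-by-pixel comparison."""
    return [list(row) for row in a] == [list(row) for row in b]


def reached_end(hashes: List[List[bool]], threshold: int, max_distance: int = 0) -> bool:
    """True once each of the last `threshold` captures matches the capture
    taken just before that run, i.e. advancing no longer changes the slide."""
    if len(hashes) <= threshold:
        return False
    tail = hashes[-(threshold + 1):]
    return all(hamming(tail[0], h) <= max_distance for h in tail[1:])


slide = [[10, 200], [200, 10]]
noisy = [[12, 198], [201, 9]]   # same slide with slight rendering noise
other = [[200, 10], [10, 200]]  # a visibly different slide

print(hamming(average_hash(slide), average_hash(noisy)))  # 0
print(pixels_equal(slide, noisy))                         # False
print(hamming(average_hash(slide), average_hash(other)))  # 4
```

The noisy copy hashes identically to the original while the exact pixel comparison reports a difference, which is the robustness to minor variations described above.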
Instead of fixed delays, the tool:
- Monitors DOM changes to detect when animations complete
- Checks page stability before capturing
- Uses explicit waits with intelligent timeout handling
- Adapts to network speed and animation complexity
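
The stability check can be sketched as a polling loop that waits until a page snapshot stops changing. This is a minimal sketch under stated assumptions: `wait_until_stable` and its parameters are hypothetical, and the snapshot source is left as a callable so the example runs without a browser. With Selenium, the callable could be something like `lambda: driver.page_source`.

```python
import time
from typing import Callable, Hashable


def wait_until_stable(
    snapshot: Callable[[], Hashable],
    timeout: float = 10.0,
    poll_interval: float = 0.1,
    stable_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `snapshot` until it returns the same value `stable_polls` times in a row.

    Returns True if the value settled before `timeout` seconds elapsed, else False.
    """
    deadline = clock() + timeout
    last = snapshot()
    same = 1  # how many consecutive polls have returned `last`
    while same < stable_polls:
        if clock() >= deadline:
            return False
        sleep(poll_interval)
        current = snapshot()
        if current == last:
            same += 1
        else:
            last, same = current, 1  # content changed: restart the count
    return True


# Simulated page that changes twice and then settles; sleeping is stubbed out.
frames = iter(["a", "b", "c", "c", "c", "c"])
settled = wait_until_stable(lambda: next(frames), sleep=lambda seconds: None)
print(settled)  # True
```

Compared with a fixed delay, this returns as soon as the content is stable and gives up only after the timeout.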
- LRU Cache: Recently used images cached in memory
- Lazy Loading: Images loaded only when needed
- Automatic Cleanup: Resources released after use
- Efficient Processing: Minimal memory footprint even for large presentations
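
The LRU and lazy-loading ideas above can be combined in a small cache built on `collections.OrderedDict`. This is an illustrative sketch, not the contents of image_cache.py; the class name `LRUCache` and the `loader` callback are assumptions.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LRUCache(Generic[K, V]):
    """Keeps at most `capacity` items, evicting the least recently used one."""

    def __init__(self, capacity: int, loader: Callable[[K], V]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._loader = loader
        self._items: "OrderedDict[K, V]" = OrderedDict()

    def get(self, key: K) -> V:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = self._loader(key)  # lazy load on first access
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value

    def __contains__(self, key: object) -> bool:
        return key in self._items


loads: List[str] = []


def load(path: str) -> str:
    """Stand-in for opening an image file; records each real load."""
    loads.append(path)
    return "<image " + path + ">"


cache = LRUCache(2, load)
cache.get("slide_001.png")
cache.get("slide_002.png")
cache.get("slide_001.png")  # cache hit: no reload
cache.get("slide_003.png")  # evicts slide_002.png, the least recently used
print(loads)                     # ['slide_001.png', 'slide_002.png', 'slide_003.png']
print("slide_002.png" in cache)  # False
```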
Custom Exception Types:
- NetworkError: Connection issues
- AuthenticationError: Login problems
- BrowserError: WebDriver/Chrome issues
- NavigationError: Page loading failures
- CaptureError: Screenshot problems
- PDFGenerationError: PDF creation errors
Features:
- Automatic classification of error types
- Retry logic with exponential backoff
- Helpful error messages with suggestions
- Debug information preservation
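
Retry with exponential backoff can be sketched as follows. The exception names come from the list above, but the shared base class `CaptureToolError`, the `retry` helper, and the choice of which errors are retried are assumptions made for illustration.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


class CaptureToolError(Exception):
    """Hypothetical common base class for the tool's errors."""


class NetworkError(CaptureToolError):
    """Connection issues (assumed transient, so retried)."""


class NavigationError(CaptureToolError):
    """Page loading failures (assumed transient, so retried)."""


class AuthenticationError(CaptureToolError):
    """Login problems (assumed permanent, so not retried)."""


def retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (NetworkError, NavigationError),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on `retry_on` errors with delays of
    base_delay, 2 * base_delay, 4 * base_delay, and so on."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}
delays: List[float] = []


def flaky_load() -> str:
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise NetworkError("connection reset")
    return "ok"


result = retry(flaky_load, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)  # ok [1.0, 2.0]
```

Errors outside `retry_on`, such as `AuthenticationError` in this sketch, propagate immediately because retrying would not help.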
- Complex animations may not be captured perfectly
- Requires stable internet connection
- Best results with screen resolution ≥ 1920x1080
- Chrome must be closed when using existing profile
- Some SharePoint configurations may require additional authentication
Contributions are welcome! Please feel free to:
- Report bugs
- Suggest new features
- Submit pull requests
- Improve documentation
This project is licensed under the MIT License; see the LICENSE file for details.
This project is intended for educational and personal use only.
- Respect Microsoft's Terms of Service
- Comply with your organization's policies
- Respect copyright of presentation content
- Use only for presentations you have permission to access
Built with:
- Selenium - Browser automation
- Pillow - Image processing
- ReportLab - PDF generation
- imagehash - Perceptual hashing
Questions or Issues? Open an issue on GitHub or check the troubleshooting section above.