
A Python tool to automatically capture PowerPoint presentations from SharePoint sites and convert them to PDF. This project does not require SharePoint editing permissions or logging into Microsoft Office 365.


thang1834/Office365-SharePoint-PowerPoint-Downloader


Office365 SharePoint PowerPoint Downloader

A Python tool to automatically capture PowerPoint presentations from SharePoint sites and convert them to PDF format, with no SharePoint editing permissions or Office 365 login required.

✨ Features

  • 🤖 Automated Capture: Navigates to SharePoint PowerPoint presentations automatically
  • 📸 High-Quality Screenshots: Captures each slide with excellent quality
  • 📄 PDF Generation: Combines all slides into a single, organized PDF file
  • 🔍 Smart Detection: Detects the last slide by comparing consecutive screenshots (perceptual hash or pixel comparison)
  • 🎬 Animation Handling: Waits for animations and transitions to complete naturally
  • 📊 Progress Tracking: Real-time visual progress bars during capture and PDF generation
  • 🛡️ Error Recovery: Robust error handling with automatic retry mechanisms
  • 👤 Chrome Profile Support: Use existing Chrome profiles to maintain login sessions
  • ⚙️ Highly Configurable: Extensive configuration via CLI arguments or JSON files

📋 Prerequisites

  • Python 3.7 or higher
  • Google Chrome browser
  • ChromeDriver (managed automatically by Selenium Manager in Selenium 4.6 and later)

🚀 Quick Start

Installation

  1. Clone this repository:

    git clone https://github.com/thang1834/Office365-SharePoint-PowerPoint-Downloader.git
    cd Office365-SharePoint-PowerPoint-Downloader
  2. Install required packages:

    pip install -r requirements.txt

Basic Usage

python main.py <SHAREPOINT_URL>

Example:

python main.py "https://yourcompany.sharepoint.com/sites/team/presentation.pptx"

If you don't provide a URL, the script will prompt you to enter one.

📦 Dependencies

Required

  • selenium - Browser automation
  • Pillow - Image processing
  • reportlab - PDF generation

Optional (Recommended)

pip install imagehash  # Better image comparison
pip install tqdm       # Progress bars

🎯 Usage Examples

Basic Examples

# Capture presentation with default settings
python main.py "https://yourcompany.sharepoint.com/..."

# Custom output folder
python main.py "https://..." -o my_presentation

# Run in headless mode (no browser window)
python main.py "https://..." --headless

Advanced Examples

# High-resolution capture with custom window size
python main.py "https://..." --window-width 2560 --window-height 1440 --pdf-resolution 150

# Using configuration file for complex setups
python main.py "https://..." -c config.json

# Debug mode for troubleshooting
python main.py "https://..." --debug

# Without using Chrome profile (manual login required)
python main.py "https://..." --no-profile

Save Configuration

# Save current settings for reuse
python main.py "https://..." --headless --pdf-resolution 150 --save-config my_config.json

# Use saved configuration
python main.py "https://..." -c my_config.json

⚙️ Configuration

Command-Line Options

Option               Description                          Default
-o, --output         Output folder name                   slides
-c, --config         Path to JSON configuration file      None
--save-config        Save current configuration to JSON   None
--headless           Run browser without GUI              False
--window-width       Browser window width                 1920
--window-height      Browser window height                1080
--wait-timeout       Element wait timeout (seconds)       10
--animation-wait     Animation wait time (seconds)        2.0
--pdf-resolution     PDF resolution (DPI)                 100.0
--debug              Enable debug logging                 False
--no-profile         Don't use Chrome profile             False
--user-data-dir      Custom Chrome profile path           None
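If you want to script around the tool, or just see how the flags map onto Python attribute names, the options above can be mirrored with the standard-library argparse module. This is a minimal sketch, not the project's actual parser in main.py; the type chosen for each option (int for window sizes, float for timeouts and DPI) is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Capture a SharePoint PowerPoint presentation as PDF."
    )
    # The URL is optional: when it is omitted, the script prompts for it.
    parser.add_argument("url", nargs="?", help="SharePoint presentation URL")
    parser.add_argument("-o", "--output", default="slides", help="Output folder name")
    parser.add_argument("-c", "--config", default=None, help="Path to JSON configuration file")
    parser.add_argument("--save-config", default=None, help="Save current configuration to JSON")
    parser.add_argument("--headless", action="store_true", help="Run browser without GUI")
    parser.add_argument("--window-width", type=int, default=1920, help="Browser window width")
    parser.add_argument("--window-height", type=int, default=1080, help="Browser window height")
    parser.add_argument("--wait-timeout", type=float, default=10.0, help="Element wait timeout (seconds)")
    parser.add_argument("--animation-wait", type=float, default=2.0, help="Animation wait time (seconds)")
    parser.add_argument("--pdf-resolution", type=float, default=100.0, help="PDF resolution (DPI)")
    parser.add_argument("--debug", action="store_true", help="Enable debug logging")
    parser.add_argument("--no-profile", action="store_true", help="Don't use Chrome profile")
    parser.add_argument("--user-data-dir", default=None, help="Custom Chrome profile path")
    return parser


# Dashes become underscores: --pdf-resolution is read back as args.pdf_resolution.
args = build_parser().parse_args(["https://yourcompany.sharepoint.com/...", "--headless"])
```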

Configuration File

Create a config.json file for persistent settings:

{
  "browser": {
    "window_width": 1920,
    "window_height": 1080,
    "headless": false,
    "use_existing_profile": true
  },
  "capture": {
    "animation_wait": 2.0,
    "consecutive_same_threshold": 10,
    "use_perceptual_hash": true,
    "remove_first_n_images": 1,
    "remove_last_n_images": 2
  },
  "pdf": {
    "resolution": 100.0
  },
  "output_folder": "slides",
  "debug": false
}

See config.example.json for all available options.
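The three sections of config.json map naturally onto nested dataclasses, which is one way "configuration classes" like those in src/config.py could be organised. In this sketch the class names (AppConfig, BrowserConfig, CaptureConfig, PdfConfig) are hypothetical, the defaults are simply copied from the example above, and putting wait_timeout in the "browser" section is a guess. It also shows the save/load round trip that --save-config and -c imply.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BrowserConfig:
    window_width: int = 1920
    window_height: int = 1080
    headless: bool = False
    use_existing_profile: bool = True
    wait_timeout: float = 10.0  # placement in the "browser" section is an assumption


@dataclass
class CaptureConfig:
    animation_wait: float = 2.0
    consecutive_same_threshold: int = 10
    use_perceptual_hash: bool = True
    remove_first_n_images: int = 1
    remove_last_n_images: int = 2


@dataclass
class PdfConfig:
    resolution: float = 100.0


@dataclass
class AppConfig:
    browser: BrowserConfig = field(default_factory=BrowserConfig)
    capture: CaptureConfig = field(default_factory=CaptureConfig)
    pdf: PdfConfig = field(default_factory=PdfConfig)
    output_folder: str = "slides"
    debug: bool = False

    @classmethod
    def from_dict(cls, data: dict) -> "AppConfig":
        # Missing sections or keys fall back to the defaults above;
        # unknown keys raise TypeError, which surfaces typos early.
        return cls(
            browser=BrowserConfig(**data.get("browser", {})),
            capture=CaptureConfig(**data.get("capture", {})),
            pdf=PdfConfig(**data.get("pdf", {})),
            output_folder=data.get("output_folder", "slides"),
            debug=data.get("debug", False),
        )


def load_config(path: str) -> AppConfig:
    """Read a JSON file such as config.json (the kind of file -c/--config points at)."""
    with open(path, encoding="utf-8") as f:
        return AppConfig.from_dict(json.load(f))


def save_config(config: AppConfig, path: str) -> None:
    """Write the current settings back out (the role --save-config plays)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(config), f, indent=2)
```

Because every key has a default, a config file only needs the settings you want to change, e.g. {"browser": {"headless": true}}.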

Environment Variables

  • OUTPUT_FOLDER: Default output folder
  • DEBUG: Enable debug mode (true/false)
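A small helper for reading these two variables. Which spellings count as "true" beyond true/false, and whether environment variables take precedence over the config file (or the other way round), are not documented here, so treat both as assumptions; the function names are hypothetical.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a variable such as DEBUG=true as a boolean (case-insensitive)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_overrides() -> dict:
    """Collect the documented environment variables, skipping any that are unset."""
    overrides = {}
    if "OUTPUT_FOLDER" in os.environ:
        overrides["output_folder"] = os.environ["OUTPUT_FOLDER"]
    if "DEBUG" in os.environ:
        overrides["debug"] = env_flag("DEBUG")
    return overrides
```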

🔐 Authentication

Using Chrome Profile (Recommended)

Benefits: No manual login required each time

Steps:

  1. Close ALL Chrome windows (important!)
  2. Leave profile usage enabled (it is on by default):
    • If you use a config file, set use_existing_profile: true in it
    • Don't pass the --no-profile flag
  3. Run the script; it will reuse your saved Chrome login session

⚠️ Important: Chrome must be completely closed, or you'll get a "profile locked" error.

Manual Login (No Profile)

Use the --no-profile flag or set use_existing_profile: false in config. You'll need to log in manually each time the script runs.

Custom Profile Directory

python main.py "URL" --user-data-dir "/path/to/chrome/profile"
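Both authentication modes come down to a few Chrome command-line switches. The sketch below only builds the argument list, so it runs without a browser; --window-size, --headless=new and --user-data-dir are standard Chrome switches, but whether this project passes exactly these is an assumption, and the function name is hypothetical. The commented lines show how such a list is typically handed to Selenium.

```python
from typing import List, Optional


def chrome_arguments(
    window_width: int = 1920,
    window_height: int = 1080,
    headless: bool = False,
    user_data_dir: Optional[str] = None,
) -> List[str]:
    """Build Chrome switches for the browser settings described in this README."""
    args = [f"--window-size={window_width},{window_height}"]
    if headless:
        # Chrome's newer headless mode (Chrome 109 and later).
        args.append("--headless=new")
    if user_data_dir:
        # Reusing a real profile directory carries over its saved login cookies.
        # Chrome runs only one instance per profile directory, which is why all
        # Chrome windows must be closed first.
        # Without this switch, ChromeDriver starts from a fresh temporary profile,
        # so you have to log in manually (the --no-profile case).
        args.append(f"--user-data-dir={user_data_dir}")
    return args


# Typical hand-off to Selenium (requires the selenium package; not run here):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# for arg in chrome_arguments(headless=True, user_data_dir="/path/to/chrome/profile"):
#     options.add_argument(arg)
# driver = webdriver.Chrome(options=options)
```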

📂 Output

After running, you'll find:

slides/                          # Output folder
├── slide_001.png               # Individual slides
├── slide_002.png
├── slide_003.png
├── ...
└── presentation.pdf            # Combined PDF

powerpoint_capture.log          # Detailed log file
page_source.html               # Debug info (if errors occur)
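The file names above follow a zero-padded pattern, and the remove_first_n_images / remove_last_n_images settings suggest that some captures at each end are dropped before the PDF is built; this sketch assumes that meaning for those two keys. It also shows the usual way a DPI value turns screenshot pixels into a PDF page size (72 points per inch): a 1920×1080 capture at the default 100 DPI becomes a 1382.4 × 777.6 pt page. How the project applies --pdf-resolution internally is not stated, so the helper names and the reportlab usage in the comments are illustrative only.

```python
from pathlib import Path
from typing import List, Tuple


def slide_path(folder: str, index: int) -> Path:
    """Zero-padded name as in the listing above, e.g. slides/slide_001.png."""
    return Path(folder) / f"slide_{index:03d}.png"


def trim_captures(paths: List[Path], remove_first_n: int, remove_last_n: int) -> List[Path]:
    """Drop captures from both ends before building the PDF.

    Assumed meaning of remove_first_n_images / remove_last_n_images.
    """
    end = max(0, len(paths) - remove_last_n)  # max() avoids negative slice indices
    return paths[remove_first_n:end]


def page_size_points(width_px: int, height_px: int, dpi: float) -> Tuple[float, float]:
    """Pixels to PDF points: inches = pixels / dpi, and 1 inch = 72 points."""
    return width_px * 72.0 / dpi, height_px * 72.0 / dpi


# Possible use with reportlab (requires the reportlab package; not run here):
# from reportlab.pdfgen import canvas
# pdf = canvas.Canvas("slides/presentation.pdf")
# for path in trim_captures(all_paths, 1, 2):
#     w, h = page_size_points(1920, 1080, 100.0)
#     pdf.setPageSize((w, h))
#     pdf.drawImage(str(path), 0, 0, width=w, height=h)
#     pdf.showPage()
# pdf.save()
```

Under this convention the pixel detail comes from the window size; the DPI value only sets the physical page size, so a higher DPI gives a smaller page with the same pixels.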

🏗️ Project Structure

Office365-SharePoint-PowerPoint-Downloader/
├── src/
│   ├── browser.py              # Browser setup and management
│   ├── capture.py              # Slide capture logic
│   ├── config.py               # Configuration classes
│   ├── exceptions.py           # Custom exceptions
│   ├── image_cache.py          # LRU cache for images
│   ├── image_utils.py          # Image comparison utilities
│   ├── image_validator.py      # Image validation
│   ├── pdf_generator.py        # PDF generation
│   └── utils.py                # General utilities
├── main.py                     # Main entry point
├── config.example.json         # Example configuration
├── requirements.txt            # Dependencies
└── README.md                   # Documentation

🔧 Troubleshooting

Common Issues

"Profile will be locked" error

Solution: Close ALL Chrome windows and wait a few seconds before running the script.

"Could not find Present button" error

Possible causes:

  • URL is not a PowerPoint presentation
  • You don't have view permissions
  • SharePoint page hasn't loaded fully

Solutions:

  • Verify the URL is correct
  • Try opening the presentation manually first
  • Increase wait_timeout in config (or pass --wait-timeout)

Network timeout errors

Solutions:

  • Check your internet connection
  • Verify the URL is accessible
  • Increase wait_timeout in config (or pass --wait-timeout)
  • Check firewall/proxy settings

Too many consecutive failures

Solutions:

  • Run with --debug flag for detailed logs
  • Check powerpoint_capture.log for specific errors
  • Verify network stability
  • Ensure you have access to the presentation

Debug Mode

Enable detailed logging:

python main.py "URL" --debug

This will:

  • Show detailed error messages
  • Save page source HTML for inspection
  • Log all operations with timestamps
  • Display configuration being used

Log Files

  • powerpoint_capture.log: Detailed operation logs
  • page_source.html: Browser page source (saved on errors)
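Debug mode's behaviour (timestamped logs, a log file, a saved page source) can be approximated with the standard logging module. Only the two file names come from this README; the logger name, the message format, and logging to both the console and the file are assumptions.

```python
import logging
import sys


def setup_logging(debug: bool = False, log_file: str = "powerpoint_capture.log") -> logging.Logger:
    """Send timestamped records to stderr and to log_file; DEBUG level when --debug is set."""
    logger = logging.getLogger("powerpoint_capture")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.propagate = False  # don't duplicate records through the root logger
    for old in list(logger.handlers):  # make repeated calls idempotent
        old.close()
        logger.removeHandler(old)
    formatter = logging.Formatter("%(asctime)s %(levelname)-8s %(message)s")
    for handler in (logging.StreamHandler(sys.stderr),
                    logging.FileHandler(log_file, encoding="utf-8")):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def save_page_source(html: str, path: str = "page_source.html") -> None:
    """Keep the page HTML (e.g. Selenium's driver.page_source) for inspection after an error."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
```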

🧠 Technical Details

Image Comparison Methods

  1. Perceptual Hash (Default): Uses imagehash library for accurate similarity detection

    • Better at detecting visually similar images
    • Resistant to minor variations
    • Requires imagehash package
  2. Pixel Comparison (Fallback): Direct pixel-by-pixel comparison

    • Works without additional dependencies
    • Less accurate for complex images
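To make the two strategies concrete, here is a dependency-free sketch on grayscale images represented as lists of rows. It swaps in the simple average hash, one of several perceptual hashes (the imagehash package provides average_hash and phash, and subtracting two of its hashes gives their Hamming distance), and adds a consecutive-match rule in the spirit of the consecutive_same_threshold setting. All function names and both distance thresholds are made-up illustrations, not the project's code or values.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[int]]  # rows of grayscale values, 0-255


def average_hash(pixels: Gray, hash_size: int = 8) -> int:
    """Average hash: shrink to hash_size x hash_size by block averaging,
    then emit one bit per cell: 1 if the cell is brighter than the mean."""
    height, width = len(pixels), len(pixels[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def pixel_difference(a: Gray, b: Gray) -> float:
    """Fallback: mean absolute per-pixel difference of same-sized images (0 = identical)."""
    total = count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def same_slide(a: Gray, b: Gray, use_perceptual_hash: bool = True,
               max_hash_distance: int = 5, max_pixel_diff: float = 2.0) -> bool:
    # Both thresholds are illustrative guesses, not the project's values.
    if use_perceptual_hash:
        return hamming(average_hash(a), average_hash(b)) <= max_hash_distance
    return pixel_difference(a, b) <= max_pixel_diff


def looks_like_last_slide(frames: List[Gray], consecutive_same_threshold: int = 10) -> bool:
    """True once each of the newest `consecutive_same_threshold` captures matched
    the capture before it, i.e. advancing the slideshow stopped changing anything."""
    if len(frames) <= consecutive_same_threshold:
        return False
    recent = frames[-(consecutive_same_threshold + 1):]
    return all(same_slide(x, y) for x, y in zip(recent, recent[1:]))
```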

Smart Waiting System

Instead of fixed delays, the tool:

  • Monitors DOM changes to detect when animations complete
  • Checks page stability before capturing
  • Uses explicit waits with intelligent timeout handling
  • Adapts to network speed and animation complexity
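A generic form of "wait until the page stops changing" is a polling loop that requires several identical snapshots in a row before capturing. The snapshot callable is a placeholder: with Selenium it could be something like hashing driver.get_screenshot_as_png(). The function name, polling interval and match count are assumptions; the clock and sleep parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Hashable


def wait_until_stable(
    snapshot: Callable[[], Hashable],
    timeout: float = 10.0,
    interval: float = 0.25,
    required_matches: int = 3,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll snapshot() until it returns the same value required_matches times in a row.

    Returns True if the page settled, False if the timeout elapsed first.
    """
    deadline = clock() + timeout
    previous = snapshot()
    matches = 0
    while clock() < deadline:
        sleep(interval)
        current = snapshot()
        if current == previous:
            matches += 1
            if matches >= required_matches:
                return True
        else:
            # Something changed (e.g. an animation step): start counting again.
            matches = 0
            previous = current
    return False
```

Requiring several consecutive matches, rather than one, guards against capturing during a brief pause in the middle of a multi-step transition.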

Memory Optimization

  • LRU Cache: Recently used images cached in memory
  • Lazy Loading: Images loaded only when needed
  • Automatic Cleanup: Resources released after use
  • Efficient Processing: Minimal memory footprint even for large presentations
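The LRU cache idea takes only a few lines with collections.OrderedDict, and a get_or_load method adds the lazy-loading part by calling a loader only on a cache miss. The class layout, method names and capacity are hypothetical, not taken from src/image_cache.py. When you are caching the results of a pure function, functools.lru_cache does the same job out of the box.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class LRUCache(Generic[V]):
    """Keep at most `capacity` items; evict the least recently used one first."""

    def __init__(self, capacity: int = 32) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: V) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], V]) -> V:
        """Lazy loading: call loader (e.g. a function that opens an image file) only on a miss."""
        value = self.get(key)
        if value is None:
            value = loader(key)
            self.put(key, value)
        return value

    def __len__(self) -> int:
        return len(self._items)
```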

Error Handling

Custom Exception Types:

  • NetworkError: Connection issues
  • AuthenticationError: Login problems
  • BrowserError: WebDriver/Chrome issues
  • NavigationError: Page loading failures
  • CaptureError: Screenshot problems
  • PDFGenerationError: PDF creation errors

Features:

  • Automatic classification of error types
  • Retry logic with exponential backoff
  • Helpful error messages with suggestions
  • Debug information preservation
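A sketch of how the listed exception types and retry with exponential backoff can fit together. The common base class CaptureToolError and the choice of which errors count as transient (network, navigation and capture problems, but not authentication, since retrying a rejected login rarely helps) are assumptions, not the contents of src/exceptions.py.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class CaptureToolError(Exception):
    """Hypothetical common base class for the error types listed above."""


class NetworkError(CaptureToolError):
    """Connection issues."""


class AuthenticationError(CaptureToolError):
    """Login problems."""


class BrowserError(CaptureToolError):
    """WebDriver/Chrome issues."""


class NavigationError(CaptureToolError):
    """Page loading failures."""


class CaptureError(CaptureToolError):
    """Screenshot problems."""


class PDFGenerationError(CaptureToolError):
    """PDF creation errors."""


def retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (NetworkError, NavigationError, CaptureError),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func; on a transient error wait base_delay, 2*base_delay, 4*base_delay, ...
    between attempts, and re-raise the last error once attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Errors outside retry_on, such as AuthenticationError here, propagate on the first failure so the user sees the real problem immediately.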

⚠️ Known Limitations

  • Complex animations may not be captured perfectly
  • Requires stable internet connection
  • Best results with screen resolution ≥ 1920x1080
  • Chrome must be closed when using existing profile
  • Some SharePoint configurations may require additional authentication

🀝 Contributing

Contributions are welcome! Please feel free to:

  • Report bugs
  • Suggest new features
  • Submit pull requests
  • Improve documentation

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

⚖️ Important Notice

This project is intended for educational and personal use only.

  • Respect Microsoft's Terms of Service
  • Comply with your organization's policies
  • Respect copyright of presentation content
  • Use only for presentations you have permission to access

🙏 Acknowledgments

Built with:


Questions or Issues? Open an issue on GitHub or check the troubleshooting section above.
