Contributing to OmniTranscripts

Thank you for your interest in contributing to OmniTranscripts! This guide will help you get started with contributing to this YouTube video transcription API.

Quick Start for Contributors

1. Fork and Clone

# Fork the repository on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/OmniTranscripts.git
cd OmniTranscripts

# Add upstream remote
git remote add upstream https://github.com/wilmoore/OmniTranscripts.git

2. Set Up Development Environment

# Install dependencies (see docs/development.md for details)
make setup

# Copy environment configuration
cp .env.example .env

# Install Go dependencies
go mod tidy

# Run tests to ensure everything works
make test-short

3. Create a Feature Branch

# Sync with upstream
git fetch upstream
git checkout main
git merge upstream/main

# Create feature branch
git checkout -b feature/your-feature-name

Types of Contributions

🐛 Bug Reports

Help us identify and fix issues:

Search existing issues first to avoid duplicates
Use the bug report template when creating new issues
Include detailed information:
- Go version and OS
- Steps to reproduce
- Expected vs actual behavior
- Relevant log outputs
- Minimal reproduction case

Example Bug Report:

## Bug Description
Transcription fails for videos longer than 10 minutes with "context deadline exceeded" error.

## Steps to Reproduce
1. Submit video URL: https://www.youtube.com/watch?v=LONG_VIDEO_ID
2. Wait for processing
3. Check job status

## Expected Behavior
Video should be transcribed successfully

## Actual Behavior
Job fails with timeout error after 5 minutes

## Environment
- Go version: 1.23.0
- OS: Ubuntu 22.04
- OmniTranscripts version: main branch

✨ Feature Requests

Suggest new functionality:

Check the roadmap and existing feature requests
Describe the use case and problem it solves
Provide implementation suggestions if you have ideas
Consider backwards compatibility

Example Feature Request:

## Feature Description
Add support for batch processing multiple videos in a single API call

## Use Case
Users often need to transcribe multiple related videos (e.g., a video series) and would benefit from submitting them all at once.

## Proposed API
```json
{
  "urls": [
    "https://www.youtube.com/watch?v=video1",
    "https://www.youtube.com/watch?v=video2"
  ],
  "batch_options": {
    "priority": "normal",
    "webhook_url": "https://example.com/webhook"
  }
}

Implementation Ideas

Add new /transcribe/batch endpoint
Return batch job ID for tracking all videos
Support batch-level webhooks


### 🔧 Code Contributions

#### Areas Needing Help
- **Performance optimizations**: Improve transcription speed
- **Error handling**: Better error messages and recovery
- **Testing**: Increase test coverage
- **Documentation**: API docs, guides, examples
- **Integrations**: SDK libraries for different languages
- **Monitoring**: Metrics and observability features

#### Before You Start
1. **Discuss large changes** in an issue first
2. **Follow the coding standards** (see below)
3. **Write tests** for new functionality
4. **Update documentation** as needed

## Coding Standards

### Go Code Style

#### General Guidelines
```go
// Good: Clear, descriptive names
func processTranscriptionJob(job *models.Job) error {
    return nil
}

// Bad: Unclear abbreviations
func procTxJob(j *models.Job) error {
    return nil
}

Error Handling

// Good: Descriptive error with context
if err := validateURL(req.URL); err != nil {
    return fmt.Errorf("invalid YouTube URL %q: %w", req.URL, err)
}

// Good: Structured error response
return &errs.Error{
    Code:    errs.InvalidArgument,
    Message: "URL validation failed",
    Details: map[string]interface{}{
        "url": req.URL,
        "reason": "not a valid YouTube URL",
    },
}

Function Documentation

// ProcessTranscription orchestrates the complete video transcription pipeline.
// It downloads the video, extracts audio, and generates timestamped transcripts.
//
// The function handles both short videos (processed synchronously) and long videos
// (processed asynchronously via job queue).
//
// Parameters:
//   - url: YouTube video URL to transcribe
//   - jobID: Unique identifier for tracking processing progress
//
// Returns:
//   - transcript: Complete text transcription
//   - segments: Timestamped transcript segments
//   - error: Processing error or nil on success
func ProcessTranscription(url, jobID string) (string, []models.Segment, error) {
    // Implementation...
}

Testing Standards

func TestProcessTranscription(t *testing.T) {
    tests := []struct {
        name        string
        url         string
        jobID       string
        wantErr     bool
        wantSegments int
    }{
        {
            name:         "valid short video",
            url:          "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
            jobID:        "test_job_123",
            wantErr:      false,
            wantSegments: 10, // Approximate expected segments
        },
        {
            name:    "invalid URL",
            url:     "not-a-url",
            jobID:   "test_job_456",
            wantErr: true,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            transcript, segments, err := ProcessTranscription(tt.url, tt.jobID)

            if (err != nil) != tt.wantErr {
                t.Errorf("ProcessTranscription() error = %v, wantErr %v", err, tt.wantErr)
                return
            }

            if !tt.wantErr {
                assert.NotEmpty(t, transcript)
                assert.Len(t, segments, tt.wantSegments)
            }
        })
    }
}

API Design Guidelines

Request/Response Models

// Good: Clear field names and validation tags
type TranscribeRequest struct {
    URL     string            `json:"url" validate:"required,url"`
    Options *TranscribeOptions `json:"options,omitempty"`
}

type TranscribeOptions struct {
    Model    string `json:"model,omitempty" validate:"oneof=tiny small base large"`
    Language string `json:"language,omitempty" validate:"len=2"`
}

// Good: Consistent response structure
type TranscribeResponse struct {
    JobID      string           `json:"job_id,omitempty"`
    Transcript string           `json:"transcript,omitempty"`
    Segments   []models.Segment `json:"segments,omitempty"`
    Status     string           `json:"status"`
    CreatedAt  time.Time        `json:"created_at"`
}

Error Responses

// Consistent error format
type ErrorResponse struct {
    Error   string                 `json:"error"`
    Code    string                 `json:"code,omitempty"`
    Details map[string]interface{} `json:"details,omitempty"`
}

Development Workflow

1. Making Changes

# Make your changes
vim lib/transcription.go

# Run tests frequently
make test-short

# Check code quality
make lint
make fmt

2. Testing Your Changes

# Run all tests
make test

# Run specific tests
go test -v ./lib/ -run TestProcessTranscription

# Test with coverage
make test-coverage

# Performance tests (if relevant)
make benchmark

3. Documentation Updates

Update relevant .md files in docs/
Add inline code comments for new functions
Update API documentation if endpoints change
Add examples for new features

4. Commit Guidelines

Commit Message Format

<type>(<scope>): <description>

<body>

<footer>

Types

feat: New feature
fix: Bug fix
docs: Documentation changes
style: Code style changes (formatting, etc.)
refactor: Code refactoring
test: Adding or updating tests
chore: Maintenance tasks

Examples

# Good commit messages
git commit -m "feat(api): add batch transcription endpoint

Support processing multiple videos in a single request.
Includes new /transcribe/batch endpoint with job tracking.

Closes #123"

git commit -m "fix(transcription): handle audio extraction timeout

Increase FFmpeg timeout for long videos and add retry logic.
Fixes issue where 20+ minute videos would fail consistently.

Fixes #456"

git commit -m "docs(api): update endpoint documentation

Add examples for new batch processing endpoint and
clarify error response formats."

5. Pull Request Process

Before Submitting

# Sync with upstream
git fetch upstream
git rebase upstream/main

# Final check
make check  # Runs fmt, lint, vet, and test-short

# Push to your fork
git push origin feature/your-feature-name

Pull Request Template

When creating a PR, include:

Description: Clear explanation of changes
Motivation: Why is this change needed?
Testing: How was this tested?
Breaking Changes: Any backwards compatibility issues?
Checklist: Confirm you've completed all requirements

Example PR Description:

## Description
Adds support for batch processing multiple YouTube videos in a single API request.

## Motivation
Users frequently need to transcribe multiple related videos and currently have to make separate API calls for each one. This feature allows them to submit multiple URLs at once and track progress via a single batch job ID.

## Changes
- Added `/transcribe/batch` endpoint
- New `BatchTranscribeRequest` and `BatchTranscribeResponse` models
- Batch job tracking and status aggregation
- Updated documentation and examples

## Testing
- Unit tests for batch processing logic
- Integration tests with real YouTube videos
- Load testing with 10+ videos per batch
- Manual testing via curl and Postman

## Breaking Changes
None - this is a new endpoint with no changes to existing APIs.

## Checklist
- [x] Tests pass locally
- [x] Code follows style guidelines
- [x] Documentation updated
- [x] No breaking changes
- [x] Performance impact assessed

Review Process

What to Expect

Automated checks: CI will run tests and linting
Code review: Maintainers will review your code
Feedback: You may need to make changes
Approval: Once approved, your PR will be merged

Responding to Feedback

Be responsive: Reply to comments promptly
Ask questions: If feedback isn't clear, ask for clarification
Make requested changes: Address all feedback before requesting re-review
Update tests: If functionality changes, update tests accordingly

Community Guidelines

Code of Conduct

We follow the Contributor Covenant to ensure a welcoming environment for all contributors.

Communication

Be respectful: Treat all community members with respect
Be constructive: Provide helpful feedback and suggestions
Be patient: Maintainers are volunteers with limited time
Stay on topic: Keep discussions focused on the project

Getting Help

Documentation: Check the docs first
GitHub Discussions: Ask questions and share ideas
Issues: Report bugs and request features
Discord (if available): Real-time chat with other contributors

Recognition

Contributors

All contributors are recognized in:

Repository contributors page
Release notes for significant contributions
Special mentions for major features or bug fixes

Becoming a Maintainer

Regular contributors who demonstrate:

Consistent high-quality contributions
Good understanding of the codebase
Helpful community participation
Commitment to project goals

May be invited to become maintainers with additional responsibilities.

Resources

Documentation

Development Guide - Development setup and workflows
Architecture - Technical architecture details
API Documentation - Complete API reference
Deployment Guide - Production deployment options
Troubleshooting - Common issues and solutions

Tools and Links

Thank you for contributing to OmniTranscripts! Your contributions help make video transcription more accessible and powerful for everyone.

FilesExpand file tree

contributing.md

Latest commit

History