ByteMe is a web-based platform that allows users to develop and test machine learning pipelines in a streamlined environment. It provides a code editor with ML templates, real-time execution, and model deployment capabilities.
- Web-based code editor with Monaco (VS Code's editor)
- Predefined ML functions for common tasks:
  - `data_fetch`: Load data from CSV or NPY files
  - `data_preprocessing`: Handle data preprocessing with scaling and train-test split
  - `train`: Train ML models (Linear Regression, Random Forest, Neural Network)
- Real-time code execution status updates
- Distributed execution using Kubernetes
- Task queue management with RabbitMQ
- Result storage with Redis
- Python 3.9-3.12 (PyTorch compatibility)
- Docker
- Kubernetes cluster
- RabbitMQ
- Redis
This architecture represents the core components of the ByteMe platform:
**Web Interface Layer**
- Frontend Framework: React.js with TypeScript
- Code Editor: Monaco Editor (VS Code's editor)
- UI Components: Material-UI (MUI)
- State Management: Redux Toolkit
- Build Tool: Vite
- Testing: Jest and React Testing Library
- Provides the user interface for code editing and file uploads
- Handles real-time output display and model downloads
- Communicates with the backend via WebSocket
**WebSocket Server**
- Framework: Flask-SocketIO
- Protocol: WebSocket with Socket.IO
- Authentication: JWT (JSON Web Tokens)
- Rate Limiting: Flask-Limiter
- Acts as the communication bridge between frontend and backend
- Handles real-time bidirectional communication
- Manages code execution status updates
- Streams output and error messages
**Python Backend**
- Framework: Flask
- ML Libraries: PyTorch, scikit-learn, pandas
- Code Execution: Python's subprocess with resource limits
- File Handling: Python's tempfile and shutil
- Database: SQLite (for development), PostgreSQL (for production)
- ORM: SQLAlchemy
- Executes ML code in an isolated environment
- Processes file uploads and data handling
- Manages ML pipeline execution
- Handles model training and evaluation
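The "subprocess with resource limits" approach can be sketched as below. The function name, limit values, and API are illustrative assumptions, not the platform's actual implementation:

```python
import resource
import subprocess
import sys

def run_user_code(code: str, timeout: int = 10,
                  mem_bytes: int = 1024 * 1024 * 1024):
    """Execute untrusted Python code in a child process with CPU/memory caps.

    A minimal sketch; the real backend layers sandboxing and file
    isolation on top of limits like these.
    """
    def apply_limits():
        # Runs in the child just before exec: cap CPU seconds and
        # address space so runaway user code is killed by the kernel.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout, timeout))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,          # wall-clock cap enforced by the parent
        preexec_fn=apply_limits,  # POSIX-only
    )
    return proc.returncode, proc.stdout, proc.stderr

rc, out, err = run_user_code("print(2 + 2)")
```

`preexec_fn` is POSIX-only, which matches the Docker/Kubernetes deployment described above.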
**Data Flow**
- User interactions flow from browser to WebSocket server
- Code execution requests are processed by the Python backend
- Results and status updates flow back through WebSocket
- File uploads are handled separately for better performance
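The status updates flowing back over WebSocket can be pictured as small JSON events. The event name and field layout below are illustrative assumptions, not the platform's actual schema:

```python
import json

def make_status_event(job_id: str, state: str, output: str = "") -> str:
    """Serialize a hypothetical execution-status event for the frontend."""
    return json.dumps({
        "event": "execution_status",  # assumed Socket.IO event name
        "job_id": job_id,
        "state": state,               # e.g. "queued", "running", "finished"
        "output": output,             # incremental stdout/stderr text
    })

msg = make_status_event("job-1", "running", "epoch 1/10\n")
decoded = json.loads(msg)
```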
This architecture represents the production deployment of ByteMe on AWS:
**Client Layer**
- Users access the application through their web browsers
- Static assets are served through CloudFront CDN
- WebSocket connections are established through ALB
**DNS and CDN Layer**
- Route 53 manages DNS routing and health checks
- CloudFront provides global content delivery
- ACM handles SSL/TLS certificate management
**Load Balancing Layer**
- ALB distributes traffic across EC2 instances
- Handles WebSocket connections
- Provides SSL termination
- Manages health checks
**Compute Layer**
- EC2 instances run the application in Docker containers
- Auto Scaling Group manages instance count
- ECR stores and distributes Docker images
**Monitoring Layer**
- CloudWatch monitors application metrics
- Collects logs and performance data
- Triggers alerts based on defined thresholds
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd <repository-name>
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Build Docker images:

  ```bash
  docker build -t code-execution-platform:latest .
  docker build -t code-execution-worker:latest -f Dockerfile.worker .
  ```

- Deploy to Kubernetes:

  ```bash
  kubectl apply -f k8s/deployment.yaml
  ```

Create a `.env` file in the root directory with the following variables:

```
RABBITMQ_HOST=localhost
REDIS_HOST=localhost
REDIS_PORT=6379
```

- Start the FastAPI server:

  ```bash
  uvicorn app.main:app --host 0.0.0.0 --port 8000
  ```

- In a separate terminal, start the worker process:

  ```bash
  python app/worker.py
  ```
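The `.env` variables might be read at service startup roughly like this (the defaults shown are illustrative, not necessarily what `app/main.py` does):

```python
import os

# Read connection settings from the environment, falling back to the
# local-development defaults from the .env example above.
RABBITMQ_HOST = os.environ.get("RABBITMQ_HOST", "localhost")
REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))
```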
- Open your browser and navigate to http://localhost:8000
- Select a template or write custom code:
  - Basic ML Pipeline: Predefined template for common ML tasks
  - Custom Code: Write your own code using the available ML functions
- Click "Run Code" to execute your code
Note: Both the FastAPI server and the worker process must be running at the same time for code execution to work. The worker executes submitted code and stores results in Redis, while the FastAPI server serves the web interface and handles WebSocket communication.
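That division of labor can be illustrated with in-process stand-ins: here `queue.Queue` plays the role of the RabbitMQ task queue and a plain dict plays the role of the Redis result store. This is a teaching sketch, not the actual `app/worker.py`:

```python
import queue
import threading

task_queue: "queue.Queue" = queue.Queue()  # stand-in for RabbitMQ
result_store: dict = {}                    # stand-in for Redis

def worker():
    """Consume tasks and store results, mirroring the worker's role."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel to shut the worker down
            break
        try:
            # The real worker runs user code in a sandboxed subprocess;
            # exec() here is only for demonstration.
            namespace = {}
            exec(task["code"], namespace)
            result_store[task["id"]] = {"status": "done",
                                        "value": namespace.get("result")}
        except Exception as exc:
            result_store[task["id"]] = {"status": "error", "error": str(exc)}

t = threading.Thread(target=worker)
t.start()
task_queue.put({"id": "job-1", "code": "result = sum(range(5))"})
task_queue.put(None)
t.join()
```

In production the queue and store live in separate processes, which is why both the server and the worker must be running.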
Loads data from CSV or NPY files:

```python
data = data_fetch('path/to/your/data.csv')
```

Preprocesses data with scaling and train-test split:

```python
X_train, X_test, y_train, y_test, scaler = data_preprocessing(
    data=data,
    target_column='target',
    test_size=0.2
)
```

Trains ML models with different algorithms:

```python
model = train(
    X_train=X_train,
    y_train=y_train,
    model_type='linear'  # Options: 'linear', 'random_forest', 'neural_network'
)
```
Your CSV file must contain a column named 'target' that will be used as the target variable for prediction. If your data uses a different column name, you'll need to modify the code template to use your column name.
To check the available columns in your CSV:
- Upload your CSV file
- Use the custom template
- Run this code:

```python
data = data_fetch('your_file.csv')
print("Available columns:", data.columns.tolist())
```

**Network Security**
- Use VPC with private subnets
- Implement security groups
- Enable SSL/TLS encryption
**Application Security**
- Input validation
- Code execution sandboxing
- Secure file handling
**Data Security**
- Temporary file storage
- Secure model transfer
- Regular cleanup
**Common Issues**
- WebSocket connection failures
- File upload errors
- Model serialization issues
**Logs and Monitoring**
- Check CloudWatch logs
- Monitor EC2 instance metrics
- Review application logs
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Monaco Editor for the code editor
- FastAPI for the web framework
- RabbitMQ for message queuing
- Redis for result storage
- Kubernetes for container orchestration

