ByteMe is a web-based platform that allows users to develop and test machine learning pipelines in a streamlined environment. It provides a code editor with ML templates, real-time execution, and model deployment capabilities.
- Web-based code editor with Monaco (VS Code's editor)
- Predefined ML functions for common tasks:
  - `data_fetch`: Load data from CSV or NPY files
  - `data_preprocessing`: Handle data preprocessing with scaling and train-test split
  - `train`: Train ML models (Linear Regression, Random Forest, Neural Network)
- Real-time code execution status updates
- Distributed execution using Kubernetes
- Task queue management with RabbitMQ
- Result storage with Redis
- Python 3.9-3.12 (PyTorch compatibility)
- Docker
- Kubernetes cluster
- RabbitMQ
- Redis
This architecture represents the core components of the ByteMe platform:
**Web Interface Layer**
- Frontend Framework: React.js with TypeScript
- Code Editor: Monaco Editor (VS Code's editor)
- UI Components: Material-UI (MUI)
- State Management: Redux Toolkit
- Build Tool: Vite
- Testing: Jest and React Testing Library
- Provides the user interface for code editing and file uploads
- Handles real-time output display and model downloads
- Communicates with the backend via WebSocket
**WebSocket Server**
- Framework: Flask-SocketIO
- Protocol: WebSocket with Socket.IO
- Authentication: JWT (JSON Web Tokens)
- Rate Limiting: Flask-Limiter
- Acts as the communication bridge between frontend and backend
- Handles real-time bidirectional communication
- Manages code execution status updates
- Streams output and error messages
**Python Backend**
- Framework: Flask
- ML Libraries: PyTorch, scikit-learn, pandas
- Code Execution: Python's subprocess with resource limits
- File Handling: Python's tempfile and shutil
- Database: SQLite (for development), PostgreSQL (for production)
- ORM: SQLAlchemy
- Executes ML code in an isolated environment
- Processes file uploads and data handling
- Manages ML pipeline execution
- Handles model training and evaluation
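The "subprocess with resource limits" approach can be sketched as below. The function name, limit values, and API are illustrative assumptions, not the platform's actual implementation:

```python
import resource
import subprocess
import sys

def run_user_code(code: str, timeout: int = 10,
                  mem_bytes: int = 1024 * 1024 * 1024):
    """Execute untrusted Python code in a child process with CPU/memory caps.

    A minimal sketch; the real backend layers sandboxing and file
    isolation on top of limits like these.
    """
    def apply_limits():
        # Runs in the child just before exec: cap CPU seconds and
        # address space so runaway user code is killed by the kernel.
        resource.setrlimit(resource.RLIMIT_CPU, (timeout, timeout))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,          # wall-clock cap enforced by the parent
        preexec_fn=apply_limits,  # POSIX-only
    )
    return proc.returncode, proc.stdout, proc.stderr

rc, out, err = run_user_code("print(2 + 2)")
```

`preexec_fn` is POSIX-only, which matches the Docker/Kubernetes deployment described above.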
**Data Flow**
- User interactions flow from browser to WebSocket server
- Code execution requests are processed by the Python backend
- Results and status updates flow back through WebSocket
- File uploads are handled separately for better performance
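The status updates flowing back over WebSocket can be pictured as small JSON events. The event name and field layout below are illustrative assumptions, not the platform's actual schema:

```python
import json

def make_status_event(job_id: str, state: str, output: str = "") -> str:
    """Serialize a hypothetical execution-status event for the frontend."""
    return json.dumps({
        "event": "execution_status",  # assumed Socket.IO event name
        "job_id": job_id,
        "state": state,               # e.g. "queued", "running", "finished"
        "output": output,             # incremental stdout/stderr text
    })

msg = make_status_event("job-1", "running", "epoch 1/10\n")
decoded = json.loads(msg)
```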
This architecture represents the production deployment of ByteMe on AWS:
**Client Layer**
- Users access the application through their web browsers
- Static assets are served through CloudFront CDN
- WebSocket connections are established through ALB
**DNS and CDN Layer**
- Route 53 manages DNS routing and health checks
- CloudFront provides global content delivery
- ACM handles SSL/TLS certificate management
**Load Balancing Layer**
- ALB distributes traffic across EC2 instances
- Handles WebSocket connections
- Provides SSL termination
- Manages health checks
**Compute Layer**
- EC2 instances run the application in Docker containers
- Auto Scaling Group manages instance count
- ECR stores and distributes Docker images
**Monitoring Layer**
- CloudWatch monitors application metrics
- Collects logs and performance data
- Triggers alerts based on defined thresholds
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd <repository-name>
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Build Docker images:

  ```bash
  docker build -t code-execution-platform:latest .
  docker build -t code-execution-worker:latest -f Dockerfile.worker .
  ```

- Deploy to Kubernetes:

  ```bash
  kubectl apply -f k8s/deployment.yaml
  ```

Create a `.env` file in the root directory with the following variables:

```
RABBITMQ_HOST=localhost
REDIS_HOST=localhost
REDIS_PORT=6379
```

- Start the FastAPI server:

  ```bash
  uvicorn app.main:app --host 0.0.0.0 --port 8000
  ```

- In a separate terminal, start the worker process:

  ```bash
  python app/worker.py
  ```
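The `.env` variables might be read at service startup roughly like this (the defaults shown are illustrative, not necessarily what `app/main.py` does):

```python
import os

# Read connection settings from the environment, falling back to the
# local-development defaults from the .env example above.
RABBITMQ_HOST = os.environ.get("RABBITMQ_HOST", "localhost")
REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))
```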
- Open your browser and navigate to http://localhost:8000
- Select a template or write custom code:
  - Basic ML Pipeline: Predefined template for common ML tasks
  - Custom Code: Write your own code using the available ML functions
- Click "Run Code" to execute your code
Note: Both the FastAPI server and the worker process must be running at the same time for code execution to work. The worker executes submitted code and stores results in Redis, while the FastAPI server serves the web interface and handles WebSocket communication.
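That division of labor can be illustrated with in-process stand-ins: here `queue.Queue` plays the role of the RabbitMQ task queue and a plain dict plays the role of the Redis result store. This is a teaching sketch, not the actual `app/worker.py`:

```python
import queue
import threading

task_queue: "queue.Queue" = queue.Queue()  # stand-in for RabbitMQ
result_store: dict = {}                    # stand-in for Redis

def worker():
    """Consume tasks and store results, mirroring the worker's role."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel to shut the worker down
            break
        try:
            # The real worker runs user code in a sandboxed subprocess;
            # exec() here is only for demonstration.
            namespace = {}
            exec(task["code"], namespace)
            result_store[task["id"]] = {"status": "done",
                                        "value": namespace.get("result")}
        except Exception as exc:
            result_store[task["id"]] = {"status": "error", "error": str(exc)}

t = threading.Thread(target=worker)
t.start()
task_queue.put({"id": "job-1", "code": "result = sum(range(5))"})
task_queue.put(None)
t.join()
```

In production the queue and store live in separate processes, which is why both the server and the worker must be running.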
Loads data from CSV or NPY files:

```python
data = data_fetch('path/to/your/data.csv')
```

Preprocesses data with scaling and train-test split:

```python
X_train, X_test, y_train, y_test, scaler = data_preprocessing(
    data=data,
    target_column='target',
    test_size=0.2
)
```

Trains ML models with different algorithms:

```python
model = train(
    X_train=X_train,
    y_train=y_train,
    model_type='linear'  # Options: 'linear', 'random_forest', 'neural_network'
)
```
Your CSV file must contain a column named 'target' that will be used as the target variable for prediction. If your data uses a different column name, you'll need to modify the code template to use your column name.
To check the available columns in your CSV:
- Upload your CSV file
- Use the custom template
- Run this code:

```python
data = data_fetch('your_file.csv')
print("Available columns:", data.columns.tolist())
```

**Network Security**
- Use VPC with private subnets
- Implement security groups
- Enable SSL/TLS encryption
**Application Security**
- Input validation
- Code execution sandboxing
- Secure file handling
**Data Security**
- Temporary file storage
- Secure model transfer
- Regular cleanup
**Common Issues**
- WebSocket connection failures
- File upload errors
- Model serialization issues
**Logs and Monitoring**
- Check CloudWatch logs
- Monitor EC2 instance metrics
- Review application logs
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Monaco Editor for the code editor
- FastAPI for the web framework
- RabbitMQ for message queuing
- Redis for result storage
- Kubernetes for container orchestration

