**Award-Winning Project:** Bachelor thesis grade 10/10 | 1st Place at the 2024 Scientific Student Conference | Accenture Special Award | Presented at the 2025 National Scientific Student Conference
AI the Artist (StyleApp) is a high-performance Neural Style Transfer (NST) backend that powers a cross-platform creative image stylization application. Transform everyday photos into stunning artwork by applying the style of famous paintings or custom artistic styles.
## Table of Contents

- Features
- Architecture
- Installation
- Usage
- Advanced Features
- Technical Details
- Project Structure
- API Reference
- Performance
- Contributing
- Citation
- License
## Features

- **Classic Neural Style Transfer**: Transform images using Gatys et al.'s optimization-based approach
- **Segmentation-Based Stylization**: Apply different styles to the foreground (person) and background separately
- **Mixed Style Transfer**: Blend two artistic styles into a single output with adjustable weights
- **RESTful API**: Production-ready FastAPI backend with CORS support
- **GPU Acceleration**: CUDA support for fast processing
- **Flexible Configuration**: Multiple initialization methods, customizable loss weights, and iteration counts
- **Metrics & Monitoring**: Built-in quality metrics (SSIM, FID, style loss) and Weights & Biases integration
- **Pre-trained Models**: VGG16 and VGG19 architectures for feature extraction
## Architecture

The system implements Neural Style Transfer using the following approach:
- Feature Extraction: Pre-trained VGG networks extract content and style features
- Loss Computation:
- Content Loss: MSE between content feature maps
- Style Loss: MSE between Gram matrices of style features
- Total Variation Loss: Regularization for spatial smoothness
- Optimization: Adam optimizer iteratively updates pixel values to minimize combined loss
- Segmentation (optional): DeepLabV3 for person detection and separate stylization
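The loss terms above can be sketched in NumPy. This is an illustrative stand-in for the math only, not the project's PyTorch implementation; the function names `gram_matrix` and `nst_losses` are hypothetical, and in the real backend the feature maps come from VGG activations:

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Gram matrix of a feature map with shape (channels, height, width)."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)  # normalized channel-correlation matrix

def nst_losses(gen_feats, content_feats, style_feats, gen_img):
    """Compute the three loss terms listed above.
    Feature maps are (channels, height, width) arrays; gen_img is a 2-D image."""
    # Content loss: MSE between feature maps of the generated and content images
    content_loss = np.mean((gen_feats - content_feats) ** 2)
    # Style loss: MSE between Gram matrices of generated and style features
    style_loss = np.mean((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)
    # Total variation: penalize differences between neighbouring pixels
    tv_loss = (np.abs(np.diff(gen_img, axis=0)).mean()
               + np.abs(np.diff(gen_img, axis=1)).mean())
    return content_loss, style_loss, tv_loss
```

In the actual optimization loop, a weighted sum of these three terms is minimized by updating the generated image's pixels directly.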
### Transfer Modes

- **Standard NST**: Single content image + single style image
- **Segmented NST**: Different styles for person vs. background (using semantic segmentation)
- **Mixed NST**: Blend two different artistic styles with an adjustable alpha parameter
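For Mixed NST, a natural formulation (an assumption consistent with the `alpha` parameter described later, which controls the influence of the second style) is a convex combination of the two style losses:

$$\mathcal{L}_{style}^{mix} = (1 - \alpha)\,\mathcal{L}_{style}^{(1)} + \alpha\,\mathcal{L}_{style}^{(2)}, \qquad \alpha \in [0, 1]$$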
## Installation

### Prerequisites

- Python 3.8+
- CUDA-capable GPU (recommended)
- 8GB+ RAM
### Setup

```bash
# Clone the repository
git clone https://github.com/yourusername/neural-style-transfer.git
cd neural-style-transfer/py-nst

# Install dependencies
pip install torch torchvision
pip install fastapi uvicorn
pip install opencv-python numpy
pip install piqa   # for metrics
pip install wandb  # optional, for experiment tracking

# Create data directories
mkdir -p data/content-images data/style-images data/output-images
```

## Usage

### Starting the Server

Start the FastAPI server:
```bash
# Using uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8000

# Or using the provided script
bash start_api.sh
```

The API will be available at http://localhost:8000. View the interactive API docs at http://localhost:8000/docs.
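Once the server is up, it can also be driven from Python instead of curl. A minimal client sketch using only the standard library; the payload fields mirror the `/generate` request shown in this README, while the helper names (`build_generate_payload`, `generate`) are illustrative, and the actual HTTP call requires the server to be running:

```python
import json
from urllib import request

API = "http://localhost:8000"

def build_generate_payload(doc_id, content_img, style_img, **overrides):
    """Assemble the JSON body expected by POST /generate."""
    payload = {
        "doc_id": doc_id,
        "content_img": content_img,
        "style_img": style_img,
        "init_method": "content",
        "style_weight": 30000,
        "tv_weight": 1.0,
        "iterations": 1000,
    }
    payload.update(overrides)  # e.g. iterations=500 for a quick preview
    return payload

def generate(payload):
    """POST the payload to the running API (requires the server to be up)."""
    req = request.Request(
        f"{API}/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return request.urlopen(req)  # returns the HTTP response object

payload = build_generate_payload("unique_id", "photo.jpg", "starry_night.jpg",
                                 iterations=500)
# generate(payload)  # uncomment with the server running
```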
### Uploading Images

```bash
# Upload content image
curl -X POST "http://localhost:8000/content/upload/" \
  -F "file=@your_photo.jpg"

# Upload style image
curl -X POST "http://localhost:8000/style/upload/" \
  -F "file=@vangogh_starry_night.jpg"
```

### Generating Stylized Images

**Standard Style Transfer:**
```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "doc_id": "unique_id",
    "content_img": "content_filename.jpg",
    "style_img": "style_filename.jpg",
    "init_method": "content",
    "style_weight": 30000,
    "tv_weight": 1.0,
    "iterations": 1000
  }'
```

**Segmented Style Transfer:**
```bash
curl -X POST "http://localhost:8000/generate_seg" \
  -H "Content-Type: application/json" \
  -d '{
    "doc_id": "unique_id",
    "content_img": "portrait.jpg",
    "style_person_img": "picasso.jpg",
    "style_background_img": "monet.jpg",
    "style_person_weight": 25000,
    "style_background_weight": 30000,
    "iterations": 1000
  }'
```

**Mixed Style Transfer:**
```bash
curl -X POST "http://localhost:8000/generate_mixed" \
  -H "Content-Type: application/json" \
  -d '{
    "doc_id": "unique_id",
    "content_img": "content.jpg",
    "style_img_1": "style1.jpg",
    "style_img_2": "style2.jpg",
    "style_weight": 30000,
    "alpha": 0.5,
    "iterations": 1000
  }'
```

### Downloading Results

```bash
curl "http://localhost:8000/image/generated/{image_name}" -o output.jpg
```

### Direct Python Usage

For standalone processing without the API:
```python
from nst import neural_style_transfer

config = {
    'content_img_name': 'photo.jpg',
    'style_img_name': 'style.jpg',
    'init_method': 'content',  # 'random', 'content', or 'style'
    'content_weight': 1e5,
    'style_weight': 3e4,
    'tv_weight': 1e0,
    'iterations': 1000,
    'model': 'vgg19',  # or 'vgg16'
    'content_images_dir': 'data/content-images',
    'style_images_dir': 'data/style-images',
    'output_img_dir': 'data/output-images',
    'img_format': (4, '.jpg'),
    'height': 400,
    'saving_freq': -1  # -1 saves only the final result
}

neural_style_transfer(config)
```

### Initialization Methods

- `content`: Start optimization from the content image (recommended)
- `style`: Start from the resized style image
- `random`: Start from Gaussian noise
### Key Parameters

- `content_weight`: Controls content preservation (default: 1e5)
- `style_weight`: Controls style strength (default: 3e4)
- `tv_weight`: Total variation regularization (default: 1.0)
- `iterations`: Optimization steps (500-3000, depending on quality needs)
- `height`: Output image height in pixels (width auto-scaled)
## Advanced Features

### Quality Metrics

Evaluate generated images using `metrics.py`:

```bash
# Computes SSIM (structural similarity with the content image)
# and FID (Fréchet Inception Distance for style quality)
python metrics.py
```

### Experiment Tracking

Track experiments and compare results:
```python
# In wandb_nst.py - logs losses and generated images to the W&B dashboard
wandb.init(project="neural-style-transfer")
# Run NST with logging enabled
```

## Technical Details

### Feature Extraction Layers

- VGG16: 4 layers (`relu1_2`, `relu2_2`, `relu3_3`, `relu4_3`)
- VGG19: 6 layers (`relu1_1`, `relu2_1`, `relu3_1`, `relu4_1`, `conv4_2`, `relu5_1`)
Content is typically extracted from `relu2_2` (VGG16) or `conv4_2` (VGG19), while style is extracted from multiple layers.
### Loss Function

The total loss is a weighted combination of three terms:

$$\mathcal{L}_{total} = \alpha\,\mathcal{L}_{content} + \beta\,\mathcal{L}_{style} + \gamma\,\mathcal{L}_{tv}$$

Where:

- $\mathcal{L}_{content}$ is the MSE between content feature maps
- $\mathcal{L}_{style}$ is the MSE between Gram matrices
- $\mathcal{L}_{tv}$ penalizes spatial variations
Style representation uses Gram matrices to capture texture/color correlations:
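Concretely, for layer $l$ with feature maps $F^l$ reshaped to a (channels × positions) matrix, the Gram matrix entries are inner products between channel activations (as in Gatys et al.):

$$G^l_{ij} = \sum_{k} F^l_{ik}\, F^l_{jk}$$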
### Person Segmentation

Person segmentation uses DeepLabV3 (ResNet-101 backbone) with post-processing:
- Morphological opening to remove noise
- Connected component analysis to isolate largest person region
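The two post-processing steps can be sketched in plain NumPy/Python. This is illustrative only (the project uses OpenCV, and `clean_person_mask` is a hypothetical name): a 3x3 morphological opening removes speckle noise, then a flood-fill labeling keeps the largest connected component.

```python
import numpy as np
from collections import deque

def clean_person_mask(mask: np.ndarray) -> np.ndarray:
    """Opening (erosion then dilation) on a binary mask, then keep
    only the largest 4-connected component."""
    def erode(m):
        p = np.pad(m, 1)
        out = np.ones_like(m)
        # A pixel survives erosion only if its entire 3x3 neighbourhood is set
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                out &= p[dy:dy + m.shape[0], dx:dx + m.shape[1]]
        return out

    def dilate(m):
        p = np.pad(m, 1)
        out = np.zeros_like(m)
        # A pixel is set after dilation if any 3x3 neighbour is set
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                out |= p[dy:dy + m.shape[0], dx:dx + m.shape[1]]
        return out

    opened = dilate(erode(mask.astype(np.uint8)))

    # Label connected components with a BFS flood fill (4-connectivity)
    labels = np.zeros(opened.shape, dtype=int)
    sizes, next_label = {}, 0
    h, w = opened.shape
    for y in range(h):
        for x in range(w):
            if opened[y, x] and not labels[y, x]:
                next_label += 1
                q = deque([(y, x)])
                labels[y, x] = next_label
                size = 0
                while q:
                    cy, cx = q.popleft()
                    size += 1
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and opened[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = next_label
                            q.append((ny, nx))
                sizes[next_label] = size
    if not sizes:
        return opened
    biggest = max(sizes, key=sizes.get)
    return (labels == biggest).astype(np.uint8)
```

In production, `cv2.morphologyEx` and `cv2.connectedComponentsWithStats` do the same work far faster.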
## Project Structure

```
py-nst/
├── main.py                    # FastAPI server & API endpoints
├── nst.py                     # Core NST implementation (3 modes)
├── neural_style_transfer.py   # Original NST implementation
├── segmentation.py            # Person segmentation with DeepLabV3
├── metrics.py                 # Quality metrics (SSIM, FID)
├── wandb_nst.py               # W&B experiment tracking
├── models/
│   └── definitions/
│       ├── vgg_nets.py        # VGG16/VGG19 implementations
│       └── __init__.py
├── utils/
│   ├── utils.py               # Image processing & model prep
│   ├── video_utils.py         # Video generation from frames
│   ├── db_utils.py            # Database utilities
│   └── __init__.py
└── data/
    ├── content-images/        # Input photos
    ├── style-images/          # Artistic style references
    └── output-images/         # Generated results
```
## API Reference

### POST /content/upload/

Upload a content image.

- Input: Multipart form data with an image file
- Output: `{"image_name": "uuid.jpg"}`
### POST /style/upload/

Upload a style image.

- Input: Multipart form data with an image file
- Output: `{"image_name": "uuid.jpg"}`
### POST /generate

Standard Neural Style Transfer.

- Parameters:
  - `doc_id`: Unique document identifier
  - `content_img`: Content image filename
  - `style_img`: Style image filename
  - `init_method`: `"content"`, `"style"`, or `"random"`
  - `style_weight`: Style loss weight (10000-50000)
  - `tv_weight`: Total variation weight (0.1-10)
  - `iterations`: Number of optimization steps (500-3000)
### POST /generate_seg

Segmented style transfer (different styles for the person vs. the background).

- Additional Parameters:
  - `style_person_img`: Style for the person region (optional)
  - `style_background_img`: Style for the background (optional)
  - `style_person_weight`: Style weight for the person region
  - `style_background_weight`: Style weight for the background
### POST /generate_mixed

Mixed style transfer (blend two styles).

- Additional Parameters:
  - `style_img_1`: First style image
  - `style_img_2`: Second style image
  - `alpha`: Blending factor (0.0-1.0, controls the influence of `style_img_2`)
### GET /image/generated/{image_name}

Download a generated image.
## Performance

- Processing Time: 30-60 seconds per image (GPU) / 5-15 minutes (CPU)
- Image Size: 400px height (default), auto-scaled width
- Memory: ~2-4GB GPU memory for standard images
- Iterations: 1000 iterations provide good quality; 2000+ for high quality
Optimization Tips:

- Use GPU acceleration for a 10-20x speedup
- Lower the `height` parameter for faster processing
- Reduce `iterations` for quick previews
- Use `init_method='content'` for faster convergence
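Applying those tips, a quick-preview configuration might look like the following. The key names mirror the config dict shown in the Direct Python Usage section; the specific preview values are illustrative, not recommended defaults:

```python
# Quick-preview settings: smaller image and fewer iterations trade
# quality for speed, starting from the content image for fast convergence.
preview_config = {
    'content_img_name': 'photo.jpg',
    'style_img_name': 'style.jpg',
    'init_method': 'content',  # fastest convergence
    'content_weight': 1e5,
    'style_weight': 3e4,
    'tv_weight': 1e0,
    'iterations': 300,         # rough preview instead of 1000+
    'model': 'vgg19',
    'content_images_dir': 'data/content-images',
    'style_images_dir': 'data/style-images',
    'output_img_dir': 'data/output-images',
    'img_format': (4, '.jpg'),
    'height': 256,             # smaller image = much faster optimization
    'saving_freq': -1,
}
```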
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## Citation

If you use this project in your research or application, please cite:

```bibtex
@thesis{styleapp2024,
  title={AI the Artist: Creative Image Stylization with Neural Style Transfer},
  author={Babos, D{\'a}vid},
  year={2024},
  school={Sapientia Hungarian University of Transylvania},
  note={1st Place, Scientific Student Conference 2024; Accenture Special Award}
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Original Neural Style Transfer paper: Gatys et al., 2015
- VGG networks: Simonyan & Zisserman, 2014
- DeepLabV3: Chen et al., 2017
- The PyTorch team for an excellent deep learning framework
For questions, suggestions, or collaboration opportunities, please open an issue or contact [babosdavid8@gmail.com].