AI/ML Project Computer Vision

Minecraft LoRA

Modifying a diffusion model to generate Minecraft-like images using custom LoRA training

Project Overview

I trained a custom LoRA (Low-Rank Adaptation) on 156 carefully curated Minecraft screenshots to steer a Stable Diffusion model toward Minecraft-style output. The project shows how a small adapter can transfer a visual style to a pretrained image generator, and how much the result depends on dataset curation.
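To give a sense of why a LoRA is cheap to train, the sketch below compares the parameter count of one dense weight matrix with that of the two low-rank factors that replace its update. The layer size (768 x 768) and rank (8) are hypothetical values chosen for illustration; the rank and target layers used in this project are not stated.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, for illustration only.
D_OUT, D_IN, RANK = 768, 768, 8

full = full_param_count(D_OUT, D_IN)
lora = lora_param_count(D_OUT, D_IN, RANK)
print(f"full: {full}, LoRA: {lora}, ratio: {full / lora:.0f}x")
```

Under these assumed sizes, the adapter trains 12,288 values for that layer instead of 589,824.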

Technologies Used

Python, PyTorch, Stable Diffusion, LoRA, Computer Vision

Training Dataset Examples

Screenshots from Minecraft with carefully crafted labels for optimal training results

Training Label:

"a cow in a forest, grass, oak trees"

Training Label:

"a pig in a grass field, pumpkin and mountains in the background, blue sky"

Training Label:

"underwater coral reef, yellow and purple corals, dolphin swimming"

Training Label:

"a house in a spruce village, path, torches, grass, blue sky with clouds, lake in the background"
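Many LoRA training tools read each label from a text file that sits next to the image and shares its file stem. Whether this project used that layout is not stated, so the following is only a minimal sketch under that assumption. The image file names are hypothetical; the captions are two of the labels shown above.

```python
from pathlib import Path


def caption_filename(image_name: str) -> str:
    """Map an image file name to its sidecar caption name (cow.png -> cow.txt)."""
    return Path(image_name).with_suffix(".txt").name


def write_caption_files(captions: dict[str, str], dataset_dir: Path) -> list[Path]:
    """Write one caption file per image into dataset_dir and return the paths."""
    dataset_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, caption in captions.items():
        caption_path = dataset_dir / caption_filename(image_name)
        caption_path.write_text(caption.strip() + "\n", encoding="utf-8")
        written.append(caption_path)
    return written


# Hypothetical file names paired with labels from the examples above.
EXAMPLE_CAPTIONS = {
    "cow_forest.png": "a cow in a forest, grass, oak trees",
    "coral_reef.png": "underwater coral reef, yellow and purple corals, dolphin swimming",
}
```

Calling `write_caption_files(EXAMPLE_CAPTIONS, Path("dataset"))` would create `dataset/cow_forest.txt` and `dataset/coral_reef.txt`.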

Style Transfer Demonstration

Watch how a realistic cow transforms into Minecraft style as the LoRA weight increases from 0 to 1
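The weight sweep can be understood as scaling the adapter's contribution to each adapted layer: the effective weight is the base weight plus the scale times the low-rank update. The toy example below illustrates that idea with plain nested lists. The 2x2 matrix and rank-1 factors are made-up numbers, not values from the trained model.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(base, lora_b, lora_a, scale):
    """Return base + scale * (lora_b @ lora_a) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(base_row, delta_row)]
        for base_row, delta_row in zip(base, delta)
    ]


# Toy layer with a rank-1 update; all numbers are invented for illustration.
BASE = [[1.0, 0.0], [0.0, 1.0]]
LORA_B = [[1.0], [2.0]]  # shape (2, 1)
LORA_A = [[0.5, -0.5]]   # shape (1, 2)

for step in range(5):
    scale = step / 4  # 0.0, 0.25, 0.5, 0.75, 1.0
    print(scale, apply_lora(BASE, LORA_B, LORA_A, scale))
```

At scale 0 the layer is identical to the base model, and at scale 1 the full learned update is applied, which matches the gradual transition seen in the demonstration.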

Key Learnings & Observations

Insights gained from multiple training iterations and experimentation

Dataset Diversity

A wide variety of images is crucial for the model to learn the different aspects of Minecraft (biomes, mobs, structures, lighting). Multiple examples of the same subject in different settings keep the model from overfitting to specific scenes.
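One rough way to check coverage is to count how many labels mention each subject and flag the ones that appear too rarely. This is a sketch, not the process used for this dataset: the subject list and the threshold are assumptions, and matching is a simple substring test.

```python
from collections import Counter


def subject_counts(captions, subjects):
    """Count how many captions mention each subject (case-insensitive substring)."""
    counts = Counter({subject: 0 for subject in subjects})
    for caption in captions:
        text = caption.lower()
        for subject in subjects:
            if subject in text:
                counts[subject] += 1
    return counts


def underrepresented(counts, minimum):
    """Return subjects that appear in fewer than `minimum` captions."""
    return sorted(subject for subject, n in counts.items() if n < minimum)


CAPTIONS = [
    "a cow in a forest, grass, oak trees",
    "a pig in a grass field, pumpkin and mountains in the background, blue sky",
    "underwater coral reef, yellow and purple corals, dolphin swimming",
]
SUBJECTS = ["cow", "pig", "dolphin", "grass"]  # hypothetical watch list

counts = subject_counts(CAPTIONS, SUBJECTS)
print(underrepresented(counts, minimum=2))
```

Subjects flagged this way are candidates for additional screenshots.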

Precise Labeling

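Well-crafted labels are essential because Minecraft's aesthetics differ significantly from reality. Leaving the word "Minecraft" out of the labels ties the style to the LoRA itself rather than to a trigger word, so the style is activated and controlled through the LoRA weight alone.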
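A small check can catch labels that accidentally name the style. The function below is a hypothetical helper that scans captions for banned words, with "minecraft" as the default.

```python
import re


def find_style_leaks(captions, banned=("minecraft",)):
    """Return (index, caption) pairs whose text contains a banned style word."""
    pattern = re.compile("|".join(re.escape(word) for word in banned), re.IGNORECASE)
    return [(i, caption) for i, caption in enumerate(captions) if pattern.search(caption)]


labels = [
    "a cow in a forest, grass, oak trees",
    "a Minecraft pig in a grass field",  # deliberately wrong, to show a hit
]
print(find_style_leaks(labels))
```

Running it before training makes it easy to fix such labels by hand.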

Base Model Quality

A robust foundation model with diverse generation capabilities is essential for producing varied and high-quality Minecraft-style outputs.

Training Duration

Finding the optimal training time is critical: both under-training and over-training lead to poor results. Saving multiple checkpoints during a run makes it possible to compare them and select the best one.
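To plan checkpoints, it helps to translate epochs into optimizer steps. The sketch below does that arithmetic for the 156-image dataset. The batch size, epoch count, and save interval are hypothetical, and the calculation assumes each image is seen once per epoch.

```python
import math


def checkpoint_steps(num_images, batch_size, epochs, save_every_epochs):
    """Return the global step numbers at which a checkpoint would be saved."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return [
        epoch * steps_per_epoch
        for epoch in range(save_every_epochs, epochs + 1, save_every_epochs)
    ]


# 156 images comes from the project; the other values are assumptions.
steps = checkpoint_steps(num_images=156, batch_size=4, epochs=10, save_every_epochs=2)
print(steps)
```

Generating the same prompts with each saved checkpoint then shows where the style is learned and where overfitting begins.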

Results & Future Improvements

Current Results

This LoRA (version 5) was trained on 512x512 images for 1 hour on my GPU using 156 carefully selected Minecraft screenshots. While it successfully generates Minecraft-style images, some limitations exist due to the relatively small dataset size.
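Screenshots are usually widescreen, so they have to be cropped to a square before being resized to 512x512. How the images for this project were prepared is not stated. The sketch below shows one common approach, a center crop, and computes the crop box in the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
def center_crop_box(width, height):
    """Return the (left, upper, right, lower) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


# A 1920x1080 screenshot is a hypothetical input size.
print(center_crop_box(1920, 1080))
```

A center crop discards the left and right edges, so a label should only describe what remains visible in the square.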

Identified Limitations

Pattern repetition with extensive generation
Occasional image artifacts
Limited style variation due to the small dataset (156 images versus the millions used to train large models)

Future Improvements

Expand the training dataset with more diverse Minecraft scenes
Experiment with different training hyperparameters and epoch counts
Test shorter training durations for potentially better results
Implement advanced data augmentation techniques
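As a starting point for the augmentation item above, the simplest technique is a random horizontal flip, which effectively doubles the number of distinct views of each scene. The sketch below operates on an image stored as nested lists of pixel values. It is an illustration only, since a real pipeline would work on image tensors.

```python
import random


def hflip(pixels):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in pixels]


def augment(pixels, rng, flip_probability=0.5):
    """Return a flipped copy with the given probability, else an unchanged copy."""
    if rng.random() < flip_probability:
        return hflip(pixels)
    return [list(row) for row in pixels]


image = [[1, 2, 3], [4, 5, 6]]  # toy 2x3 "image"
rng = random.Random(0)          # seeded so runs are repeatable
print(augment(image, rng))
```

Flipping is only safe when the labels contain no left or right directions, and when mirrored text or asymmetric items in the scene do not matter.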