Completed November 2025
Text-to-Video Evaluation System
Systematic quality assessment for AI-generated video
Python, GPT-4V, Jupyter, pandas
Background
As text-to-video generation models rapidly improve, systematic evaluation frameworks are needed to measure quality across multiple dimensions, including visual fidelity, temporal coherence, prompt alignment, and aesthetic appeal.
Approach
Built an automated evaluation pipeline that uses GPT-4V to score each video on multiple quality dimensions, then compares those scores against human annotations to measure how reliably the model agrees with human raters.
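The validation step can be illustrated with a minimal sketch, assuming each record pairs one LLM score with one human score for a given dimension. The record layout, the function names, and the choice of Pearson correlation are assumptions for illustration; the project does not state which correlation measure it used, and the scores below are made-up toy values, not project data.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def correlation_by_dimension(records):
    """Group (dimension, llm_score, human_score) records, then correlate each group."""
    grouped = defaultdict(lambda: ([], []))
    for dimension, llm_score, human_score in records:
        grouped[dimension][0].append(llm_score)
        grouped[dimension][1].append(human_score)
    return {dim: pearson(llm, human) for dim, (llm, human) in grouped.items()}


# Hypothetical toy scores, chosen only to make the behavior easy to check.
toy_records = [
    ("visual_fidelity", 1, 2),
    ("visual_fidelity", 2, 4),
    ("visual_fidelity", 3, 6),
    ("temporal_coherence", 1, 3),
    ("temporal_coherence", 2, 2),
    ("temporal_coherence", 3, 1),
]
result = correlation_by_dimension(toy_records)
```

Computing the coefficient per dimension rather than over pooled scores keeps a dimension with weak agreement from being hidden by the others. A rank-based measure such as Spearman would be a reasonable alternative if the scores are ordinal.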
Key Results
- Evaluated 500+ generated videos across 5 quality dimensions
- Achieved 0.85+ correlation between LLM scores and human judgments
- Identified systematic biases in model-specific failure modes
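One simple way to look for systematic bias is to check whether LLM scores sit consistently above or below human scores for a particular video generator. The sketch below assumes records of the form (generator, LLM score, human score); the generator names, the function name, and the scores are hypothetical, and this is not necessarily the analysis the project ran.

```python
from collections import defaultdict


def mean_score_gap_by_generator(records):
    """Average (llm_score - human_score) for each video generator.

    A positive gap means the LLM rates that generator's videos higher
    than human annotators do; a negative gap means it rates them lower.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for generator, llm_score, human_score in records:
        sums[generator] += llm_score - human_score
        counts[generator] += 1
    return {generator: sums[generator] / counts[generator] for generator in sums}


# Hypothetical toy scores, not project data.
toy_scores = [
    ("model_a", 4, 3),
    ("model_a", 5, 4),
    ("model_b", 2, 3),
    ("model_b", 3, 3),
]
gaps = mean_score_gap_by_generator(toy_scores)
```

A gap that stays far from zero for one generator while the others hover near zero suggests the scorer treats that generator's outputs differently, which is worth checking against the raw videos before drawing conclusions.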