I recreated a “School of Rock” scene using AI: I split the audio into four parts, generated each segment as its own video, and stitched the results back together. The goal was lip-sync and on-screen action that match the music, combining experimental AI-generated animation with creative improvisation.
Workflow & Recipe
The audio was processed with LTX-2 in ComfyUI, and each segment was converted to video using an image-to-video (i2v) flow. The source stills were generated with Z-image and FLUX 2. The four clips were then stitched together, with the focus on lip-sync and action matching the audio. Everything was generated on a single RTX 4090 GPU.
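The splitting and stitching steps don't need anything model-specific; below is a minimal sketch of how they could be scripted around ffmpeg. The file names, the four-way split, and the clip naming scheme (clip_00.mp4 … clip_03.mp4 produced by the i2v step) are illustrative assumptions, not the exact pipeline used.

```python
import subprocess
from pathlib import Path

AUDIO = "school_of_rock_clip.wav"   # hypothetical input file
SEGMENTS = 4
SEG_DIR = Path("segments")
SEG_DIR.mkdir(exist_ok=True)

# Probe the total duration with ffprobe so the audio can be split evenly.
duration = float(subprocess.run(
    ["ffprobe", "-v", "error", "-show_entries", "format=duration",
     "-of", "default=noprint_wrappers=1:nokey=1", AUDIO],
    capture_output=True, text=True, check=True).stdout)

# Cut the audio into four equal segments, one per i2v generation pass.
seg_len = duration / SEGMENTS
for i in range(SEGMENTS):
    subprocess.run(
        ["ffmpeg", "-y", "-i", AUDIO,
         "-ss", str(i * seg_len), "-t", str(seg_len),
         str(SEG_DIR / f"audio_{i:02d}.wav")],
        check=True)

# After each segment has been rendered to video (assumed clip_00.mp4 ... clip_03.mp4),
# stitch the clips back together with ffmpeg's concat demuxer.
with open("clips.txt", "w") as f:
    for i in range(SEGMENTS):
        f.write(f"file 'clip_{i:02d}.mp4'\n")

subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "clips.txt",
     "-c", "copy", "final_cut.mp4"],
    check=True)
```

Using stream copy (`-c copy`) for the final concat keeps the stitch lossless, which matters when the per-segment generations already define the look and the lip-sync timing.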