Moving from image generation to animation is not a small step. Quite the contrary: the complexity grows with length, so even creating a short would be exponential in cost. GANs cannot do this and never will.
GAN architectures are no longer used except for small projects: they are unstable to train, they usually do not cover the whole distribution (mode collapse), and they are hard to condition. Nowadays people use diffusion or transformer models.