Video Model · April 29, 2026 · 7 min read

Happy Horse 1.0 — The New #1 AI Video Generator

A single self-attention Transformer from Alibaba that unifies text, video, and audio generation, and tops both the text-to-video and image-to-video leaderboards on Artificial Analysis. Now live on NeonLights AI.

Happy Horse 1.0 (also known as HappyHorse) has risen to the top of the AI video generation leaderboards virtually overnight, and it is now available on NeonLights AI. With an ELO of 1,336 on text-to-video and 1,393 on image-to-video according to Artificial Analysis, it ranks ahead of Seedance 2.0, SkyReels V4, Kling 3.0, PixVerse V6, and every other model on those leaderboards.

Built by Alibaba, the model replaces the multi-stream design used by most video generators with a single 40-layer self-attention Transformer that processes text, video, and audio tokens in one unified sequence. There is no cross-attention, no separate conditioning branch, and no modality-specific pipeline. Alibaba reports state-of-the-art quality at unusually fast inference speeds.

Happy Horse 1.0 is now live on NeonLights AI. Generate cinematic video from a text prompt, or animate any image as the first frame. Supports 720p and 1080p, five aspect ratios, and up to 15-second durations.

Key Features

Single-Stream Architecture

A 40-layer Transformer processes text, video, and audio via self-attention only. No cross-attention modules, no separate streams — one unified model for all modalities.

Human-Centric Quality

Expressive facial performance, natural speech coordination, realistic body motion, and accurate lip-sync — designed specifically for human subjects in video.

Blazing Fast Inference

A 5-second 256p video in about 2 seconds, and a 5-second 1080p video in about 38 seconds, on a single H100 GPU. According to the developers, that is significantly faster than comparable models.

#1 on Every Leaderboard

80% win rate vs Ovi 1.1 and 60.9% vs LTX 2.3 across 2,000 human evaluations. Ranked #1 on both text-to-video and image-to-video benchmarks.

Multilingual Audio

Native audio support for Mandarin, Cantonese, English, Japanese, Korean, German, and French, with no translation layer in between.

Fully Open Source

Base model, distilled model, super-resolution model, and inference code — all released publicly for the community.

What Makes Happy Horse 1.0 Different

Most AI video models use separate streams or cross-attention mechanisms to combine text understanding with video generation. Happy Horse 1.0 discards this entirely.

Its single-stream architecture feeds text tokens, a reference image latent, and noisy video/audio tokens into one unified token sequence. All 40 Transformer layers process this sequence using standard self-attention — no specialized conditioning branches, no modality-specific routing.
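A minimal sketch of the single-stream idea, assuming toy random vectors in place of real text embeddings, image latents, and noisy video/audio tokens. The helper names (`build_unified_sequence`, `self_attention`) are hypothetical, and the attention uses a single head with identity query/key/value projections for brevity; the real model uses learned projections across 40 layers. The point is structural: every modality lands in one list, and one self-attention call mixes all of it, with no cross-attention step.

```python
import math
import random

def build_unified_sequence(text_tokens, image_latent, video_tokens, audio_tokens):
    """Concatenate every modality into one token sequence, remembering each token's modality."""
    sequence, tags = [], []
    groups = (("text", text_tokens), ("image", image_latent),
              ("video", video_tokens), ("audio", audio_tokens))
    for name, group in groups:
        sequence.extend(group)
        tags.extend([name] * len(group))
    return sequence, tags

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over the whole sequence.

    Simplification: queries, keys, and values are the tokens themselves
    (identity projections), so every token attends to every other token,
    whatever its modality.
    """
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) / total
                        for j in range(d)])
    return outputs

# Toy usage: random 8-dimensional vectors stand in for real embeddings.
random.seed(0)
D = 8

def random_tokens(n):
    return [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(n)]

seq, tags = build_unified_sequence(random_tokens(4), random_tokens(1),
                                   random_tokens(6), random_tokens(3))
mixed = self_attention(seq)
print(len(mixed), tags.count("video"))  # 14 6
```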

The architecture uses a sandwich design: the first 4 and last 4 layers handle modality-specific projections, while the middle 32 layers share parameters across all modalities. The shared middle layers are meant to learn representations common to text, video, and audio, so that what the model learns in one modality can carry over to the others.
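The 4 + 32 + 4 layout can be made concrete with a small index map. This sketch assumes that "modality-specific" means each modality has its own parameters in the edge layers; `parameter_key` is a hypothetical helper that only names which weight set a token would use, not a real weight-loading scheme.

```python
NUM_LAYERS = 40
EDGE_LAYERS = 4  # first 4 and last 4 layers are modality-specific, per the article

def layer_role(index):
    """Classify a 0-based layer index in the 4 + 32 + 4 sandwich."""
    if not 0 <= index < NUM_LAYERS:
        raise IndexError(f"layer index {index} out of range")
    if index < EDGE_LAYERS or index >= NUM_LAYERS - EDGE_LAYERS:
        return "modality-specific"
    return "shared"

def parameter_key(index, modality):
    """Hypothetical key naming the weight set a token of `modality` uses at layer `index`."""
    if layer_role(index) == "modality-specific":
        return f"layer{index}/{modality}"  # separate weights per modality (assumption)
    return f"layer{index}/shared"         # one weight set for every modality

roles = [layer_role(i) for i in range(NUM_LAYERS)]
print(roles.count("modality-specific"), roles.count("shared"))  # 8 32
print(parameter_key(0, "audio"), parameter_key(20, "audio"), parameter_key(20, "video"))
# layer0/audio layer20/shared layer20/shared
```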

Two additional innovations stand out:

Timestep-free denoising — Unlike standard diffusion models that require explicit timestep embeddings, Happy Horse infers the denoising state directly from input latents, simplifying the architecture further.

DMD-2 distillation — Enables generation in only 8 denoising steps with no classifier-free guidance, cutting inference time dramatically without sacrificing output quality.
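The shape of the resulting sampler can be sketched in a few lines. This does not implement DMD-2 distillation itself; it only shows what inference looks like under the article's description: 8 steps, one forward pass per step (no classifier-free guidance), and a denoiser called without a timestep argument. `toy_denoiser` is a placeholder that pulls the latent toward a fixed toy target; a real network would have to infer how noisy its input is from the latent alone. The 50-step CFG baseline used with `passes_needed` is a hypothetical comparison point, not a figure from the article.

```python
CLEAN = [1.0, -2.0, 0.5]  # toy "clean" latent the stand-in network pulls toward

def toy_denoiser(latent):
    """Stand-in for the network: its only input is the noisy latent (no timestep),
    and it returns a cleaner latent by halving the distance to CLEAN."""
    return [x + 0.5 * (c - x) for x, c in zip(latent, CLEAN)]

def sample(denoiser, latent, num_steps=8):
    """Few-step sampling loop: one forward pass per step, no CFG."""
    forward_passes = 0
    for _ in range(num_steps):
        latent = denoiser(latent)  # no timestep embedding, no unconditional pass
        forward_passes += 1
    return latent, forward_passes

def passes_needed(steps, use_cfg):
    """CFG runs a conditional and an unconditional pass per step, doubling the cost."""
    return steps * (2 if use_cfg else 1)

result, passes = sample(toy_denoiser, [5.0, 3.0, -4.0])
print(passes)  # 8
print(passes_needed(8, use_cfg=False), passes_needed(50, use_cfg=True))  # 8 100
```

Because CFG evaluates the network twice per step, dropping it halves the per-step cost on top of the reduction in step count.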

Leaderboard Performance

According to Artificial Analysis, Happy Horse 1.0 leads both major video generation benchmarks as of April 2026:

Text-to-Video — ELO 1,336, ahead of Seedance 2.0 (1,273), SkyReels V4 (1,246), Kling 3.0 Pro (1,241), and PixVerse V6 (1,237).

Image-to-Video — ELO 1,393, ahead of Seedance 2.0 (1,356), PixVerse V6 (1,336), and Kling 3.0 Omni (1,298).
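To get a feel for what these gaps mean, the standard Elo formula converts a rating difference into an expected head-to-head win probability. This assumes the Artificial Analysis ratings follow the usual 400-point logistic Elo scale, which the article does not state. The inverse, `rating_gap`, maps the human-evaluation win rates against Ovi 1.1 and LTX 2.3 onto the same scale, treating win rate as expected score and ignoring ties (also an assumption); those comparisons come from a separate evaluation, so the resulting gaps are only indicative.

```python
import math

def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def rating_gap(win_rate):
    """Inverse of expected_score: the rating difference implied by a win rate."""
    return 400 * math.log10(win_rate / (1.0 - win_rate))

print(round(expected_score(1336, 1273), 2))  # text-to-video vs Seedance 2.0: 0.59
print(round(expected_score(1393, 1356), 2))  # image-to-video vs Seedance 2.0: 0.55
print(round(rating_gap(0.80)), round(rating_gap(0.609)))  # 241 77
```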

In human evaluations across 2,000 comparisons, Happy Horse 1.0 achieved an 80% win rate against Ovi 1.1 and 60.9% against LTX 2.3. On technical benchmarks, it scores highest on visual quality (4.80) and text alignment (4.18), and posts the lowest word error rate (WER) of any model tested, at 14.60%.
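The article does not say how WER was measured. A common approach, assumed here, is to transcribe the generated speech with a speech recognizer and compare the transcript word by word with the intended script. This sketch computes WER as a word-level edit distance divided by the script length; real evaluations usually also normalize case and punctuation before scoring.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = minimum edits turning the current reference prefix into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,                             # deletion
                         cur[j - 1] + 1,                          # insertion
                         prev[j - 1] + (ref_word != hyp_word))    # substitution or match
        prev = cur
    return prev[-1] / len(ref)

script = "the quick brown fox jumps"
print(word_error_rate(script, "the quick brown box jumps"))  # 0.2
print(word_error_rate(script, "the brown fox jumps over"))   # 0.4
```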

These are early numbers, and broad independent testing is still limited, but the margin over established models is significant.

Inference Speed

One of Happy Horse 1.0's most impressive claims is its generation speed on a single H100 GPU:

256p — 5-second video in 2.0 seconds (faster than real-time)

540p — 5-second video in 8.0 seconds with super-resolution

1080p — 5-second video in 38.4 seconds at full quality
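The reported timings translate directly into a real-time factor: seconds of video produced per second of generation, where anything above 1.0 is faster than real time. A short sketch using only the three figures above, all measured on the article's single-H100 setup:

```python
CLIP_SECONDS = 5.0
# Reported single-H100 generation times (seconds) for a 5-second clip.
REPORTED_TIMES = {"256p": 2.0, "540p (with super-resolution)": 8.0, "1080p": 38.4}

def realtime_factor(clip_seconds, generation_seconds):
    """Seconds of video produced per second of compute."""
    return clip_seconds / generation_seconds

for label, seconds in REPORTED_TIMES.items():
    print(f"{label}: {realtime_factor(CLIP_SECONDS, seconds):.3f}x real time")
# 256p: 2.500x, 540p: 0.625x, 1080p: 0.130x
```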

This speed comes from the DMD-2 distillation (only 8 denoising steps needed) combined with MagiCompiler — a full-graph compilation system that fuses operators across Transformer layers for approximately 1.2× end-to-end speedup.

For context, many competing models take minutes to generate comparable quality at 1080p. If these benchmarks hold in real-world API usage, Happy Horse 1.0 would be among the fastest high-quality video generators available.

How to Use Happy Horse 1.0 on NeonLights AI

1. Sign in to your NeonLights AI account (or create one for free).
2. Navigate to the Video Generator page.
3. Select Happy Horse 1.0 from the model panel.
4. Write your prompt — describe the scene, motion, style, and mood.
5. Optionally upload a start frame — any image (JPG, PNG, BMP, or WEBP, each side ≥300px, ≤10MB) to animate as the first frame. When an image is provided, the model runs in image-to-video mode and adopts the image's aspect ratio.
6. Choose your aspect ratio (9:16, 16:9, 1:1, 4:3, 3:4), resolution (720p or 1080p), and duration (5s, 10s, or 15s).
7. Click Generate. Your video will typically be ready in under a minute.
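Step 5's limits can be checked before uploading. A minimal sketch, assuming the file's dimensions and byte size are already known (for example, read with an image library); it checks the file extension only, not the actual encoded format. Reading "10MB" as 10 MiB and accepting `jpeg` alongside `jpg` are both assumptions; NeonLights AI's own validation is not documented here.

```python
from dataclasses import dataclass

ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "bmp", "webp"}
MIN_SIDE_PX = 300
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB

@dataclass
class StartFrame:
    filename: str
    width: int
    height: int
    size_bytes: int

def start_frame_problems(frame):
    """Return human-readable problems; an empty list means the image passes."""
    problems = []
    extension = frame.filename.rsplit(".", 1)[-1].lower() if "." in frame.filename else ""
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if min(frame.width, frame.height) < MIN_SIDE_PX:
        problems.append(f"each side must be at least {MIN_SIDE_PX}px")
    if frame.size_bytes > MAX_BYTES:
        problems.append("file is larger than 10MB")
    return problems

print(start_frame_problems(StartFrame("portrait.png", 1080, 1920, 2_500_000)))   # []
print(len(start_frame_problems(StartFrame("clip.gif", 240, 640, 12_000_000))))  # 3
```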

Text-to-Video vs Image-to-Video

Happy Horse 1.0 supports two generation modes:

Text-to-Video — Provide a prompt and choose your aspect ratio. The model generates a video from scratch based on your description. Supports all five aspect ratios: 16:9, 9:16, 1:1, 4:3, and 3:4.

Image-to-Video — Upload a first-frame image. The model animates it into a video. The output inherits the image's aspect ratio (the aspect ratio selector is ignored in this mode). If you also provide a prompt, it steers the motion of the animated image.

Both modes support 720p and 1080p resolution, and durations of 5, 10, or 15 seconds.
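The mode rules above can be captured in a small request builder. Everything here is hypothetical: the field names (`mode`, `image`, `aspect_ratio`, and so on) and the 16:9 default aspect ratio are illustrative choices, not NeonLights AI's or Replicate's documented input schema. The 720p and 5-second defaults match the defaults stated in the FAQ.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720p", "1080p"}
DURATIONS = {5, 10, 15}

def build_request(prompt=None, image_url=None, aspect_ratio="16:9",
                  resolution="720p", duration=5):
    """Assemble a hypothetical request dict; field names are illustrative only."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {sorted(DURATIONS)}")
    if image_url is None:
        # Text-to-video: a prompt is required and the aspect ratio is honored.
        if not prompt:
            raise ValueError("text-to-video needs a prompt")
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        return {"mode": "text-to-video", "prompt": prompt, "aspect_ratio": aspect_ratio,
                "resolution": resolution, "duration": duration}
    # Image-to-video: the output inherits the image's aspect ratio, so the selector is dropped.
    request = {"mode": "image-to-video", "image": image_url,
               "resolution": resolution, "duration": duration}
    if prompt:
        request["prompt"] = prompt  # optional: steers the motion
    return request

print(build_request(prompt="A horse galloping along a beach at sunset"))
print(build_request(image_url="https://example.com/frame.png", aspect_ratio="1:1"))
```

A dict like this could then be passed to a client such as Replicate's `replicate.run("alibaba/happyhorse-1.0", input=...)` after renaming the fields to match the model's published input schema.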

Technical Specifications

Developer: Alibaba
Model: Happy Horse 1.0
Replicate ID: alibaba/happyhorse-1.0
Release: April 2026
Architecture: Single-stream 40-layer self-attention Transformer
Modalities: Text + Video + Audio (unified)
Distillation: DMD-2, 8 denoising steps, no CFG
Languages: Mandarin · Cantonese · English · Japanese · Korean · German · French
Open Source: Yes (base, distilled, and super-res models, plus code)
ELO (Text-to-Video): 1,336 (#1)
ELO (Image-to-Video): 1,393 (#1)
Resolutions: 720p, 1080p
Aspect Ratios: 16:9 · 9:16 · 1:1 · 4:3 · 3:4
Max Duration: 15 seconds
Image-to-Video: Yes, animate any first-frame image
NeonLights AI Status: Available Now

Pricing

200 Credits

Starting at 200 credits for a 5-second 720p video. 1080p starts at 400 credits. Scale up to 15 seconds at 600 credits (720p) or 1,200 credits (1080p).
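The published prices scale linearly: 200 credits per 5 seconds at 720p, doubled at 1080p. A sketch that reproduces the six published price points; it is not NeonLights AI's billing code, and it accepts only the 5, 10, and 15-second durations because those are the only ones listed.

```python
CREDITS_PER_5_SECONDS = {"720p": 200, "1080p": 400}
ALLOWED_DURATIONS = (5, 10, 15)

def credit_cost(resolution, duration_s):
    """Credits for one Happy Horse 1.0 clip, following the published price points."""
    if duration_s not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    return CREDITS_PER_5_SECONDS[resolution] * duration_s // 5

for res in ("720p", "1080p"):
    print(res, [credit_cost(res, d) for d in ALLOWED_DURATIONS])
# 720p [200, 400, 600]
# 1080p [400, 800, 1200]
```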


Frequently Asked Questions

What is Happy Horse 1.0?

Happy Horse 1.0 (HappyHorse) is a fully open-source AI video generation model that uses a single-stream 40-layer Transformer to unify text, video, and audio generation. It currently ranks #1 on both text-to-video and image-to-video leaderboards according to Artificial Analysis.

Is Happy Horse 1.0 available on NeonLights AI?

Yes! Happy Horse 1.0 is now live on NeonLights AI. Select it from the model panel on the Video Generator page and start generating cinematic video from text or images.

How much does Happy Horse 1.0 cost on NeonLights AI?

At 720p: 200 credits for 5s, 400 for 10s, 600 for 15s. At 1080p: 400 credits for 5s, 800 for 10s, 1,200 for 15s. The default is 720p at 5s (200 credits).

How fast is Happy Horse 1.0?

On a single H100 GPU: 2 seconds for a 5-second 256p video, 8 seconds for 540p with super-resolution, and 38.4 seconds for full 1080p quality. This uses DMD-2 distillation with only 8 denoising steps.

What languages does Happy Horse 1.0 support?

Happy Horse 1.0 natively supports 7 languages: Mandarin, Cantonese, English, Japanese, Korean, German, and French.

How does Happy Horse 1.0 compare to other models?

In human evaluations, Happy Horse 1.0 achieved an 80% win rate vs Ovi 1.1 and 60.9% vs LTX 2.3. On Artificial Analysis leaderboards, it leads text-to-video (ELO 1,336) and image-to-video (ELO 1,393) — ahead of Seedance 2.0, Kling 3.0, and PixVerse V6.

Is Happy Horse 1.0 open source?

Yes. The base model, distilled model, super-resolution model, and inference code have all been released publicly.

What aspect ratios does Happy Horse 1.0 support?

Happy Horse 1.0 supports five aspect ratios for text-to-video: 16:9, 9:16, 1:1, 4:3, and 3:4. In image-to-video mode, the output inherits the uploaded image's aspect ratio.


Generate Video With Happy Horse 1.0

The #1 ranked AI video model is now live on NeonLights AI. Generate from text or animate your own image — up to 15 seconds at 1080p.
