HappyHorse 1.0 — The New #1 AI Video Generator
A single self-attention Transformer that unifies text, video, and audio generation — topping every major leaderboard. Here's what we know so far, and what you can use right now.
HappyHorse 1.0 has shot to the top of the AI video generation leaderboards virtually overnight. With an Elo score of 1,336 on text-to-video and 1,393 on image-to-video according to Artificial Analysis, it sits comfortably above Seedance 2.0, SkyReels V4, Kling 3.0, PixVerse V6, and every other model currently available.
The model replaces the multi-stream complexity that defines most video generators with a single 40-layer self-attention Transformer — processing text, video, and audio tokens in one unified sequence. No cross-attention, no separate conditioning branches, no modality-specific pipelines. The result is state-of-the-art quality at record inference speeds.
HappyHorse 1.0 is not available via API yet. Pricing, resolution options, and integration timeline remain unclear. As soon as it becomes accessible, we'll evaluate it for NeonLights AI. In the meantime, you can generate cinematic AI video right now with 10 powerful models on NeonLights AI — including Kling V3 Omni, Veo 3.1, and Seedance 1.5 Pro.
Key Features
Single-Stream Architecture
A 40-layer Transformer processes text, video, and audio via self-attention only. No cross-attention modules, no separate streams — one unified model for all modalities.
Human-Centric Quality
Expressive facial performance, natural speech coordination, realistic body motion, and accurate lip-sync — designed specifically for human subjects in video.
Blazing Fast Inference
A 5-second 256p video in 2 seconds; 5-second 1080p in 38.4 seconds on a single H100 GPU — significantly faster than comparable models.
#1 on Every Leaderboard
80% win rate vs Ovi 1.1 and 60.9% vs LTX 2.3 across 2,000 human evaluations. Ranked #1 on both text-to-video and image-to-video benchmarks.
Multilingual Audio
Native support for Mandarin, Cantonese, English, Japanese, Korean, German, and French — no separate translation layers required.
Fully Open Source
Base model, distilled model, super-resolution model, and inference code — all released publicly for the community.
What Makes HappyHorse 1.0 Different
Most AI video models use separate streams or cross-attention mechanisms to combine text understanding with video generation. HappyHorse 1.0 discards this entirely.
Its single-stream architecture feeds text tokens, a reference image latent, and noisy video/audio tokens into one unified token sequence. All 40 Transformer layers process this sequence using standard self-attention — no specialized conditioning branches, no modality-specific routing.
The architecture uses a sandwich design: the first and last 4 layers handle modality-specific projections, while the middle 32 layers share parameters across all modalities. This means the model learns universal representations that transfer between text, video, and audio naturally.
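The single-stream idea can be illustrated with a toy sketch. Nothing below reflects HappyHorse's actual code or weights — the token counts, embedding size, and random weights are invented for illustration; the point is simply that text tokens, a reference image latent, and audio/video tokens share one self-attention pass with no cross-attention:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d):
    # Single-stream: one attention pass over the whole concatenated sequence.
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # every token attends to every modality
    return attn @ v

d = 16  # toy embedding size
text  = np.random.default_rng(1).standard_normal((8, d))   # text tokens
image = np.random.default_rng(2).standard_normal((1, d))   # reference image latent
av    = np.random.default_rng(3).standard_normal((32, d))  # noisy video/audio tokens

seq = np.concatenate([text, image, av])  # one unified sequence of 41 tokens
out = self_attention(seq, d)
print(out.shape)  # (41, 16)
```

In a real model each of the 40 layers would repeat this pattern (per the sandwich design, with modality-specific projections only in the outer layers), but the key property is visible even here: there is no separate conditioning branch, just one sequence.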
Two additional innovations stand out:
Timestep-free denoising — Unlike standard diffusion models that require explicit timestep embeddings, HappyHorse infers the denoising state directly from input latents, simplifying the architecture further.
DMD-2 distillation — Enables generation in only 8 denoising steps with no classifier-free guidance, cutting inference time dramatically without sacrificing output quality.
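Both ideas can be seen in a toy sampling loop. This is not HappyHorse's actual sampler — the denoiser below is a stand-in that just pulls the latent toward a fixed target — but it shows the shape of few-step, guidance-free sampling where no timestep is passed to the model:

```python
import numpy as np

def distilled_sample(denoise_fn, shape, steps=8, seed=0):
    """Few-step sampling sketch: start from pure noise and apply the
    distilled denoiser a fixed small number of times. Note what is absent:
    no timestep embedding is passed (the model is assumed to infer the
    noise level from the latent itself), and there is no classifier-free
    guidance term combining conditional and unconditional passes."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for _ in range(steps):
        x = denoise_fn(x)  # one forward pass per step, 8 steps total
    return x

# Stand-in "denoiser": moves the latent halfway toward a fixed clean target.
target = np.ones((4, 4))
x = distilled_sample(lambda z: z + 0.5 * (target - z), (4, 4))
```

With 8 halving steps the residual noise shrinks by a factor of 2^8, which is the intuition behind distillation: a strong few-step denoiser makes a 50-step schedule unnecessary.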
Leaderboard Performance
According to Artificial Analysis, HappyHorse 1.0 leads both major video generation benchmarks as of April 2026:
Text-to-Video — Elo 1,336, ahead of Seedance 2.0 (1,273), SkyReels V4 (1,246), Kling 3.0 Pro (1,241), and PixVerse V6 (1,237).
Image-to-Video — Elo 1,393, ahead of Seedance 2.0 (1,356), PixVerse V6 (1,336), and Kling 3.0 Omni (1,298).
In human evaluations across 2,000 comparisons, HappyHorse 1.0 achieved an 80% win rate against Ovi 1.1 and 60.9% against LTX 2.3. On technical benchmarks, it leads on visual quality (4.80) and text alignment (4.18), and posts a word error rate of 14.60% — the lowest WER of any model tested.
These are early numbers and the model is still unavailable for public testing, but the margin over established models is significant.
Inference Speed
One of HappyHorse 1.0's most impressive claims is its generation speed on a single H100 GPU:
256p — 5-second video in 2.0 seconds (faster than real-time)
540p — 5-second video in 8.0 seconds with super-resolution
1080p — 5-second video in 38.4 seconds at full quality
This speed comes from the DMD-2 distillation (only 8 denoising steps needed) combined with MagiCompiler — a full-graph compilation system that fuses operators across Transformer layers for approximately 1.2× end-to-end speedup.
For context, many competing models take minutes to generate comparable quality at 1080p. If these benchmarks hold in real-world API usage, HappyHorse 1.0 would be among the fastest high-quality video generators available.
What We Don't Know Yet
HappyHorse 1.0 tops the leaderboards, but several critical details remain unclear:
API availability — Listed as "Coming soon" on Artificial Analysis. No public API endpoint, no pricing announced.
Maximum resolution — Benchmarks show 256p, 540p, and 1080p. Whether higher resolutions (2K, 4K) will be supported is unknown.
Maximum duration — The benchmarks reference 5-second clips. Longer generation capabilities haven't been confirmed.
Image-to-video specifics — The model ranks #1 on image-to-video, but details on reference image handling, start/end frame support, and multi-image input are not yet documented.
Pricing — No API pricing has been announced. Competing models range from $4.20/min (PixVerse V6) to $13.44/min (Kling 3.0 Pro).
Audio capabilities — The architecture processes audio tokens natively, but specific audio features (lip-sync quality, sound effects, background music) need real-world testing.
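HappyHorse's own pricing is unknown, but the competitor per-minute rates quoted above translate into rough per-clip costs with simple arithmetic (these are the competitors' published rates, not HappyHorse figures):

```python
# Cost of one 5-second clip at the quoted competitor per-minute rates
rates_per_min = {"PixVerse V6": 4.20, "Kling 3.0 Pro": 13.44}  # USD per minute
clip_s = 5

for model, rate in rates_per_min.items():
    print(f"{model}: ${rate * clip_s / 60:.2f} per 5-second clip")
# PixVerse V6: $0.35 per 5-second clip
# Kling 3.0 Pro: $1.12 per 5-second clip
```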
We'll update this article as more information becomes available and evaluate HappyHorse 1.0 for NeonLights AI as soon as the API launches.
What You Can Use Right Now on NeonLights AI
While HappyHorse 1.0 remains unavailable, NeonLights AI already offers 10 powerful video generation models you can use today — several of which compete directly with the models HappyHorse was benchmarked against:
Kling V3 Omni — The most versatile model on NeonLights AI. Unified text/image-to-video with up to 7 reference images, native audio, and video editing capabilities. Starts at 300 credits.
Veo 3.1 — Google's state-of-the-art video model with superior prompt comprehension, synchronized native audio, and start/end frame control. 450 credits.
PixVerse V5.6 — Full sound field (BGM + SFX + dialogue) with multi-shot camera control and cinematic cuts. Starts at 120 credits.
Seedance 1.5 Pro — ByteDance's audio-synced video model with dual-branch architecture and lip-sync across 8+ languages. Starts at 60 credits.
LTX 2.3 Fast — Faster-than-real-time generation with synchronized audio and 9 camera motion controls. Starts at 100 credits.
All models are available instantly — no waitlists, no API setup. Buy credits and start generating.
Technical Specifications
Architecture — Single-stream 40-layer self-attention Transformer (sandwich design: 4 modality-specific layers at each end, 32 shared layers in the middle)
Modalities — Text, video, and audio processed as one unified token sequence
Resolutions — 256p, 540p (with super-resolution), and 1080p
Clip length — 5 seconds (benchmarked)
Denoising steps — 8, via DMD-2 distillation with no classifier-free guidance
Languages — Mandarin, Cantonese, English, Japanese, Korean, German, French
Reference hardware — Single NVIDIA H100 GPU
Release — Base model, distilled model, super-resolution model, and inference code, all open source
Frequently Asked Questions
What is HappyHorse 1.0?
HappyHorse 1.0 is a fully open-source AI video generation model that uses a single-stream 40-layer Transformer to unify text, video, and audio generation. It currently ranks #1 on both text-to-video and image-to-video leaderboards according to Artificial Analysis.
Is HappyHorse 1.0 available on NeonLights AI?
Not yet. HappyHorse 1.0 has no public API as of April 2026. We're monitoring the release closely and will evaluate it for NeonLights AI as soon as the API becomes available.
What can I use instead of HappyHorse 1.0 right now?
NeonLights AI offers 10 video generation models available today, including Kling V3 Omni (multi-reference, native audio), Veo 3.1 (Google's SOTA model), PixVerse V5.6 (full sound field), and Seedance 1.5 Pro (lip-synced audio). All are available instantly with credits.
How fast is HappyHorse 1.0?
On a single H100 GPU: 2 seconds for a 5-second 256p video, 8 seconds for 540p with super-resolution, and 38.4 seconds for full 1080p quality. This uses DMD-2 distillation with only 8 denoising steps.
What languages does HappyHorse 1.0 support?
HappyHorse 1.0 natively supports 7 languages: Mandarin, Cantonese, English, Japanese, Korean, German, and French.
How does HappyHorse 1.0 compare to other models?
In human evaluations, HappyHorse 1.0 achieved an 80% win rate vs Ovi 1.1 and 60.9% vs LTX 2.3. On Artificial Analysis leaderboards, it leads text-to-video (Elo 1,336) and image-to-video (Elo 1,393) — ahead of Seedance 2.0, Kling 3.0, and PixVerse V6.
Is HappyHorse 1.0 open source?
Yes. The base model, distilled model, super-resolution model, and inference code have all been released publicly.
How much will HappyHorse 1.0 cost?
No API pricing has been announced. Competing models range from $4.20/min to $13.44/min. NeonLights AI credit pricing will be determined once the API becomes accessible.
Generate AI Video Now — Don't Wait
HappyHorse 1.0 is coming soon. Create cinematic video today with Kling V3 Omni, Veo 3.1, PixVerse V5.6, and 7 more models on NeonLights AI.
Generate Video on NeonLights AI