Building a custom octocopter from scratch with no prior hardware experience
Mirrored from Hacker News — AI on Front Page for archival readability. Support the source by reading on the original site.
The drone flies through all single, dual, and SOME TRIPLE motor failures!! But it's sim-only, and the path to get here had significant learning curves.
Before training the huge policy with domain randomization and all of the bells and whistles, I decided to train a sim-only policy, which would be ready in about an hour and a half on CPU. I'm really glad I did this, because it definitely didn't work out the first time. Here's a (maybe too transparent) timeline of the process:
| # | What I tried | Result |
|---|---|---|
| 1 | Baseline PPO, high exploration, always 2 faults. Ran overnight. | Failed. Entropy climbed 11→22 the whole time, and it crashed at 20M when trying to save (it was fine, I had checkpoints, but still) |
| 2 | Lowered exploration, kept always-2-faults and no curriculum (straight-to-dual). | This one seemed to be working. Looked broken at 2M (crashing at step 7, very negative entropy — Gaussian differential entropy can go negative when variance collapses), but trained through it. I killed it deliberately because I wanted to add single-motor failures first. |
| 3 | Added a hover->single->dual curriculum | Broke everything. Turned out the curriculum's 4M steps of pure hover was exposing two latent bugs (failures from step 0 gave enough training signal to bulldoze past them). |
| 4 | (not a config -- an operator error) | A zombie training process from the night before was running alongside the new one, both writing the same checkpoint filenames. |
| 5 | Residual actions (u = hover + action). | Commands got driven to ±3, saturated and lopsided -- tip-over in 7 steps. |
| 6 | Stripped/re-added the curriculum, low exploration. | 0% at every checkpoint -- and it crashed at step 7 every single time, even during the pure-hover phase. That's what finally told me that this bug had to be systemic. |
| 7 | The two real fixes (below). | Worked. Hover learned by 0.5M steps, 100% survival on hover/single/dual by ~9.5M. |
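The entropy numbers in rows 1 and 2 are easier to read with the closed form for a diagonal Gaussian in hand. Here is a minimal sketch, assuming the logged entropy is the sum over 8 independent action dimensions (one per motor) measured in nats; the post doesn't say how it was logged, so treat both as assumptions. `diag_gaussian_entropy` and `N_MOTORS` are illustrative names, not from the training code.

```python
import math

N_MOTORS = 8  # assumption: one action dimension per motor


def diag_gaussian_entropy(stds):
    """Differential entropy, in nats, of a Gaussian with independent dimensions.

    Per dimension: 0.5 * ln(2 * pi * e * sigma^2). The total is the sum.
    """
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in stds)


# Standard deviation at which a single dimension's entropy crosses zero.
ZERO_ENTROPY_STD = 1.0 / math.sqrt(2.0 * math.pi * math.e)

if __name__ == "__main__":
    for std in (4.0, 1.0, 0.5, 0.2, 0.05):
        h = diag_gaussian_entropy([std] * N_MOTORS)
        print(f"std={std:<5} total entropy={h:+.2f} nats")
    print(f"per-dimension entropy is zero at std={ZERO_ENTROPY_STD:.3f}")
```

Under these assumptions a unit standard deviation on all 8 dimensions gives about 11.35 nats, which is at least consistent with run #1 starting near 11, and anything below a standard deviation of roughly 0.24 per dimension pushes the total negative. So a "very negative" entropy just means the policy has become very confident, not that the logging is broken.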
Everything from #3 onward -- the entropy panic, the residual detour, the curriculum back-and-forth -- was me poking at symptoms of two underlying bugs.
1. The actions were getting stuck. The Gaussian policy outputs unbounded means, but the env hard-clips commands to [0,1] and PPO computes gradients on the unclipped value -- so once a motor drifts past the clip edge there's no corrective gradient pulling it back inside, and it stays there. With 8 motors this produces a lopsided tip-over, and no hyperparameter fixes it because it's not a hyperparameter problem. Fix: squash through tanh as a residual around hover throttle, so commands can't saturate and an untrained net already hovers. This alone bumped untrained survival from 7 steps to 205.
2. Staying alive paid nothing. An open-loop hover test printed `r = 0.00` every step: at the ~1.9m where the drone actually settles, the +0.1 survival bonus was exactly cancelled by the -0.1 altitude penalty. Since the drone is marginally stable, every episode eventually crashes (-10), so "hover 200 steps then crash" and "crash immediately" had the same return. Fix: bump `r_survive` 0.1 → 1.0, so hovering pays +0.9/step and PPO finally has a reason to stay up.
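Both fixes can be sketched in a few lines. This is an illustration, not the actual environment code: `HOVER_THROTTLE` and `RESIDUAL_SCALE` are made-up values (the post doesn't give them), the function names are hypothetical, and the return is computed undiscounted with the altitude penalty held at the 0.1 it has at the settling height.

```python
import math

HOVER_THROTTLE = 0.5  # assumed hover throttle; not stated in the post
RESIDUAL_SCALE = 0.4  # assumed max deviation from hover, keeps commands inside [0, 1]


def clipped_command(raw):
    """Original mapping: hard clip to [0, 1]. Flat outside that range."""
    return min(1.0, max(0.0, raw))


def squashed_command(raw):
    """Fixed mapping: tanh residual around hover, so raw = 0 means hover."""
    return HOVER_THROTTLE + RESIDUAL_SCALE * math.tanh(raw)


def slope(fn, x, eps=1e-4):
    """Central finite difference: how much the command moves when raw moves."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


def episode_return(steps_alive, r_survive, altitude_penalty=0.1, crash_penalty=10.0):
    """Undiscounted return for hovering `steps_alive` steps, then crashing."""
    return steps_alive * (r_survive - altitude_penalty) - crash_penalty


if __name__ == "__main__":
    # Bug 1: past the clip edge the command no longer responds to the raw action.
    print("clip slope at raw=3:", slope(clipped_command, 3.0))
    print("tanh slope at raw=3:", slope(squashed_command, 3.0))

    # Bug 2: with r_survive = 0.1, surviving 200 steps is worth nothing extra.
    for r_survive in (0.1, 1.0):
        long_ep = episode_return(200, r_survive)
        short_ep = episode_return(0, r_survive)
        print(f"r_survive={r_survive}: hover-then-crash={long_ep:.1f}, crash-now={short_ep:.1f}")
```

With the clip, a raw action of 3 sits on a flat region, so nudging it changes nothing the environment can reward. The tanh version stays strictly inside the valid range and still responds, if weakly, to changes in the raw action. On the reward side, the old `r_survive` makes a 200-step hover and an immediate crash tie at -10, while the new one separates them by +0.9 per surviving step.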
Results
Final policy is a 43.4k-parameter MLP.
It even generalizes to 3-motor failures it was never trained on, as long as recovery is physically possible. Even when I killed 3 adjacent motors (physically unrecoverable), it fought for 7.2 seconds and sank instead of tumbling.
One nice surprise: the "uncompensatable-yaw" cases (two same-spin motors 90° apart, where in theory the drone should just spin freely) aren't actually uncompensatable. The policy keeps the yaw rate within ~13°/s in all of them -- a slow heading drift, not a free spin. My heuristic for flagging those was too pessimistic.
Next step: the real, sim-to-real-able policy!