Three-Agent Evaluation
100,000 hands · same seeded sequence · single deck
| Agent | Win% | Push% | Loss% | EV/hand |
|---|---|---|---|---|
| Basic strategy | 43.09% | 9.05% | 47.86% | −0.0306 |
| Random forest | 42.50% | 8.61% | 48.89% | −0.0543 |
| RL (MaskablePPO) | 45.16% | 9.02% | 45.82% | −0.0065 |

EV/hand is the expected value per $1 bet. The RL agent learned from game outcomes alone, with no labels and no strategy chart. It outperforms basic strategy (−0.0065 vs. −0.0306 EV/hand) by exploiting deck composition in single-deck play. See rl/RESULTS.md for the full analysis.
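
As a rough illustration of how the table's columns relate to each other, the sketch below derives Win%, Push%, Loss%, and EV/hand from a list of per-hand net payouts. The `summarize` function and the sample payouts are hypothetical and are not taken from this repository. The sketch assumes payouts are expressed in units of a $1 base bet and that some hands pay more or less than ±1 (for example a natural or a doubled hand), which is why EV/hand need not equal Win% minus Loss%.

```python
from typing import Dict, Iterable


def summarize(payouts: Iterable[float]) -> Dict[str, float]:
    """Summarize per-hand net payouts (in units of the base bet).

    Hypothetical helper: a positive payout counts as a win, zero as a
    push, and a negative payout as a loss.
    """
    values = list(payouts)
    n = len(values)
    if n == 0:
        raise ValueError("no hands to summarize")

    wins = sum(1 for p in values if p > 0)
    pushes = sum(1 for p in values if p == 0)
    losses = n - wins - pushes

    return {
        "win_pct": 100.0 * wins / n,
        "push_pct": 100.0 * pushes / n,
        "loss_pct": 100.0 * losses / n,
        # Mean net payout per hand, i.e. EV per unit bet.
        "ev_per_hand": sum(values) / n,
    }


# Made-up payouts for eight hands; 1.5 and -2.0 stand in for hands
# that pay other than even money (assumed rules, for illustration only).
sample = [1.0, -1.0, 0.0, 1.5, -2.0, 1.0, -1.0, -1.0]
stats = summarize(sample)
print(stats)
```

Running every agent through the same payout summary, on the same seeded sequence of hands, keeps the comparison between rows like-for-like.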