Adversarial Policies in Go --- Adversarial Policies Beat Superhuman Go AIs



Source: Adversarial Policies in Go - Game Viewer (far.ai)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Adversarial Policies Beat Superhuman Go AIs

Center for Human-Compatible Artificial Intelligence, MIT, FAR

Tony Wang*, Adam Gleave*, Tom Tseng, Nora Belrose, Kellin Pelrine, Joseph Miller, Michael D Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell


Contents

· Summary

· KataGo without search (top-100 European player level)

· KataGo with 4096 visits (superhuman)

· KataGo with 10,000,000 visits

Summary

We attack KataGo, a state-of-the-art Go AI system, by training adversarial policies that play against frozen KataGo victims. Our attack achieves a 100% win rate over 1000 games when KataGo uses no tree-search, and a 97% win rate when KataGo uses enough search to be superhuman. Notably, our adversaries do not win by learning to play Go better than KataGo — in fact, our adversaries are easily beaten by a human amateur. Instead, our adversaries win by tricking KataGo into making serious blunders. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes.

All games are randomly selected unless otherwise specified. We primarily attack KataGo network checkpoint b40c256-s11840935168-d2898845681, which we dub Latest since it is the latest confidently rated KataGo network at the time of writing. For more information, see our paper and GitHub.
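As a rough aid for reading the headline figure, the sketch below computes a Wilson score interval for a win rate measured over a finite number of games. The choice of the Wilson method and of a 95% confidence level (z = 1.96) are assumptions made here for illustration; they are not taken from the paper, and the function name is hypothetical. Only the 1000-out-of-1000 result is used, because the page does not state game counts for the other win rates.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial win rate.

    z = 1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    if games <= 0:
        raise ValueError("games must be positive")
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and games")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# The one fully specified result on this page: 1000 wins in 1000 games.
low, high = wilson_interval(1000, 1000)
print(f"win rate interval: {low:.4f} to {high:.4f}")
```

Under these assumptions, a perfect record over 1000 games still leaves a lower bound slightly below 100% (about 99.6%), which is one way to read "100% win rate over 1000 games" as a statistical statement rather than a guarantee.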

KataGo without search (top-100 European player level)

Without tree-search, KataGo's Latest network plays at the strength of a top-100 European professional. We trained an adversary that wins 100% of the time over 1000 games against this victim [1]. Our adversary gets the victim to form a large circular structure, and then tricks the victim into allowing the circular structure to be killed. See the "Game Analysis" tab for a more in-depth analysis of this adversarial strategy.

[1] The games below are actually against a version of Latest that was patched to be immune to a simpler pass-based attack. We applied this patch to the victim to force our adversary to learn a more interesting attack. The patch is a hardcoded defense that forbids the victim from passing until it has no more legal moves outside its territory. We call the patched victim Latestdef. Because we limit the victim's passing, games are usually played out to the end, terminating automatically once all points belong to a pass-alive-group or pass-alive-territory.
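The pass restriction described in the footnote can be sketched as a small move filter. This is a minimal illustration under stated assumptions, not KataGo's or the authors' actual implementation: the function name, the representation of moves as (row, column) tuples, the use of None for a pass, and the idea that the victim supplies a best-first candidate list are all hypothetical. Computing which points count as the victim's territory is outside the scope of the sketch and is taken as an input.

```python
from typing import Optional, Sequence, Set, Tuple

Point = Tuple[int, int]
Move = Optional[Point]  # None stands for a pass (assumption of this sketch)
PASS: Move = None


def choose_move_with_pass_defense(
    ranked_moves: Sequence[Move],
    legal_moves: Set[Point],
    own_territory: Set[Point],
) -> Move:
    """Pick the victim's best-ranked move, refusing to pass too early.

    ranked_moves: the victim's candidates, best first; may include PASS.
    legal_moves: board points where the victim may legally play.
    own_territory: points treated as the victim's territory (supplied by caller).
    """
    # Passing is only allowed once no legal move remains outside own territory.
    legal_outside = legal_moves - own_territory
    for move in ranked_moves:
        if move is PASS:
            if not legal_outside:
                return PASS
            continue  # pass is still forbidden, try the next candidate
        if move in legal_moves:
            return move
    # No usable candidate: play any legal outside move, or pass if none exist.
    return min(legal_outside) if legal_outside else PASS


# The victim would like to pass, but a legal move outside its territory remains.
print(choose_move_with_pass_defense([PASS, (3, 3)], {(3, 3)}, set()))
```

The sketch only blocks the pass itself; which non-pass move the patched victim prefers instead is left to its own ranking, since the footnote does not specify that.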


Victim color   Win color   Adversary win   Score difference   Game length
w              b           True            106.5              385
w              b           True            110.5              347
w              b           True            128.5              345
b              w           True            121.5              376
b              w           True            137.5              378
b              w           True            137.5              352


Victim: Latestdef, no search

Adversary: 545 million training steps, 600 visits

KataGo with 4096 visits (superhuman)

With 4096 visits, KataGo's Latest network plays at a superhuman level. Nonetheless, our adversary still achieves a 97.3% win rate against Latest and a 95.7% win rate against the defended victim Latestdef. Games against Latestdef are shown below.

Victim color   Win color   Adversary win   Score difference   Game length
w              b           True            108.5              373
w              b           True            116.5              383
w              b           True            124.5              347
b              w           True            47.5               492
b              w           True            123.5              356
b              w           True            123.5              352
b              w           True            129.5              372
b              w           True            133.5              500
w              w           False           -231.5             396
w              w           False           -155.5             467


Victim: Latestdef, 4096 visits

Adversary: 545 million training steps, 600 visits

KataGo with 10,000,000 visits

Our adversary with 600 visits still achieves a 72% win rate against Latest with 10,000,000 visits, demonstrating that large amounts of search are not a practical defense against the adversary.

Victim color   Win color   Adversary win   Score difference   Game length
w              b           True            114.5              306
w              b           True            122.5              334
w              b           True            136.5              355
b              w           True            89.5               335
b              w           True            125.5              384
b              w           True            167.5              374
w              w           False           -127.5             440
b              b           False           -282.5             553


Victim: Latest, 10,000,000 visits, 1,024 search threads

Adversary: 545 million training steps, 600 visits



