Created by: sbl1996
Why is "v1" used as the default policy instead of "v0", which is used in the original paper? Is there any experimental comparison? Thanks.