Created by: sbl1996
Why is "v1" used as the default policy instead of "v0", which is used in the original paper? Are there any experiments comparing the two? Thanks.