* added sigmoid BF16 FWD/BWD and gelu BF16 BWD
* added newline at EOF
* switched from lambdas to local functions
* changed function names