Some legacy code still uses xpu_wait() for stream synchronization, but that call only syncs the default stream. This PR replaces those calls with dev_ctx.Wait() so that the correct stream is always used.
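To illustrate the difference, here is a minimal self-contained sketch. It does not use the real XPU runtime or device context: `Stream`, `mock_xpu_wait`, and `MockDeviceContext` are hypothetical stand-ins, and the mock simply assumes the behavior described above (an argument-less wait drains only the default stream, while the context's `Wait()` targets the stream the context owns).

```cpp
#include <cassert>

// Hypothetical stand-in for a device stream: counts work not yet completed.
struct Stream {
  int pending = 0;
};

// Hypothetical global default stream.
static Stream g_default_stream;

// Mock of a legacy wait: with no argument it drains only the default stream.
void mock_xpu_wait(Stream* stream = nullptr) {
  Stream* target = stream ? stream : &g_default_stream;
  target->pending = 0;
}

// Hypothetical device context that owns a (possibly non-default) stream.
class MockDeviceContext {
 public:
  explicit MockDeviceContext(Stream* stream) : stream_(stream) {}

  // Enqueue one unit of work on the context's own stream.
  void Launch() { ++stream_->pending; }

  // Wait on the stream this context actually uses.
  void Wait() const { mock_xpu_wait(stream_); }

  int Pending() const { return stream_->pending; }

 private:
  Stream* stream_;
};

// Shows that the legacy call misses work queued on a non-default stream.
void Demo() {
  Stream compute_stream;
  MockDeviceContext dev_ctx(&compute_stream);

  dev_ctx.Launch();
  mock_xpu_wait();                 // legacy pattern: default stream only
  assert(dev_ctx.Pending() == 1);  // work on compute_stream is still pending

  dev_ctx.Wait();                  // pattern adopted by this PR
  assert(dev_ctx.Pending() == 0);  // the context's stream is drained
}
```

In this mock, the legacy call leaves the work on the non-default stream unfinished, which is the kind of missed synchronization the replacement is meant to avoid.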