Committed by Roger He
This can improve performance for some cases.

v2 (chk): handle all sizes, simplify the patch quite a bit
v3 (chk): adjust dw estimation as well
v4 (chk): use single loop, make end mask 64bit

Signed-off-by: Roger He <Hongbo.He@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Roger He <Hongbo.He@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
6849d47c