Stop aggressive inlining.

It's not clear what exactly is happening here, but the Read implementation for text decoding appears to be a bit sensitive. Small perturbations in the code appear to have a nearly 100% impact on the overall speed of ripgrep when searching UTF-16 files. I haven't had the time to examine the generated code in detail, but `perf stat` seems to think that the instruction cache performs a lot worse when the code slows down. This might mean that excessive inlining produces a different code layout that leads to less-than-optimal icache usage, but that is at best a guess. Explicitly disabling inlining for the cold path seems to help the optimizer figure out the right thing.
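For illustration, here is a minimal sketch of the general pattern: a `Read` wrapper whose one-time setup is kept out of line so that the hot `read` body stays small. This is not the code in `src/decoder.rs`. `BomStripper` and its fields are hypothetical names made up for this example, and it only handles a UTF-8 BOM. Note that `#[inline(never)]` is a hint to the compiler, so whether it helps in a given case still has to be measured.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper (not ripgrep's DecodeReader) that strips a
/// leading UTF-8 BOM from the underlying reader.
struct BomStripper<R> {
    rdr: R,
    detected: bool,
    // Bytes consumed during detection that turned out not to be a BOM.
    pending: [u8; 3],
    pending_len: usize,
    pending_pos: usize,
}

impl<R: Read> BomStripper<R> {
    fn new(rdr: R) -> Self {
        BomStripper {
            rdr,
            detected: false,
            pending: [0; 3],
            pending_len: 0,
            pending_pos: 0,
        }
    }

    // Cold path: runs once per reader. Asking the compiler not to
    // inline it keeps this loop out of the body of `read`.
    #[inline(never)]
    fn detect(&mut self) -> io::Result<()> {
        let mut n = 0;
        while n < self.pending.len() {
            match self.rdr.read(&mut self.pending[n..])? {
                0 => break, // EOF before three bytes were available
                k => n += k,
            }
        }
        // Drop the bytes if they are a BOM, otherwise replay them later.
        let is_bom = n == 3 && self.pending == [0xEF, 0xBB, 0xBF];
        self.pending_len = if is_bom { 0 } else { n };
        self.detected = true;
        Ok(())
    }
}

impl<R: Read> Read for BomStripper<R> {
    // Hot path: called for every chunk of input.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.detected {
            self.detect()?;
        }
        if self.pending_pos < self.pending_len {
            // Replay bytes that were consumed while looking for a BOM.
            let rest = &self.pending[self.pending_pos..self.pending_len];
            let n = rest.len().min(buf.len());
            buf[..n].copy_from_slice(&rest[..n]);
            self.pending_pos += n;
            return Ok(n);
        }
        self.rdr.read(buf)
    }
}
```

Wrapping a byte slice, for example `BomStripper::new(&b"\xEF\xBB\xBFhello"[..])`, and calling `read_to_string` on it yields the input without the BOM. The only part relevant to this commit is the attribute on `detect`; the rest exists to make the sketch compile.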

8db24e13 · Andrew Gallant · 8bbe58d6 · 8db24e13

Showing 1 changed file with 1 addition and 0 deletions

src/decoder.rs +1 -0

--- a/src/decoder.rs
+++ b/src/decoder.rs
@@ -251,6 +251,7 @@ impl<R: io::Read, B: AsMut<[u8]>> DecodeReader<R, B> {
        Ok(nwrite)
    }

+    #[inline(never)] // impacts perf...
    fn detect(&mut self) -> io::Result<()> {
        let bom = try!(self.rdr.peek_bom());
        self.decoder = bom.decoder();