1. 18 12月, 2013 2 次提交
    • F
      net: ovs: use CRC32 accelerated flow hash if available · 500f8087
      Francesco Fusco 提交于
      Currently OVS uses jhash2() for calculating flow hashes in its
      internal flow_hash() function. The performance of the flow_hash()
      function is critical, as the input data can be hundreds of bytes
      long.
      
      OVS is largely deployed in x86_64 based datacenters.  Therefore,
      we argue that the performance critical fast path of OVS should
      exploit underlying CPU features in order to reduce the per packet
      processing costs. We replace jhash2 with the hash implementation
      provided by the kernel hash lib, which exploits the crc32l
      instruction to achieve high performance
      
      Our patch greatly reduces the hash footprint from ~200 cycles of
      jhash2() to around ~90 cycles in case of ovs_flow_hash_crc()
      (measured with rdtsc over maximum length flow keys on an i7 Intel
      CPU).
      
      Additionally, we wrote a microbenchmark to stress the flow table
      performance. The benchmark inserts random flows into the flow
      hash and then performs lookups. Our hash deployed on a CRC32
      capable CPU reduces the lookup for 1000 flows, 100 masks from
      ~10,100us to ~6,700us, for example.
      
      Thus, simply use the newly introduced arch_fast_hash2() as a
      drop-in replacement.
      Signed-off-by: NFrancesco Fusco <ffusco@redhat.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NThomas Graf <tgraf@redhat.com>
      Acked-by: NJesse Gross <jesse@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      500f8087
    • F
      lib: introduce arch optimized hash library · 71ae8aac
      Francesco Fusco 提交于
      We introduce a new hashing library that is meant to be used in
      the contexts where speed is more important than uniformity of the
      hashed values. The hash library leverages architecture specific
      implementation to achieve high performance and fall backs to
      jhash() for the generic case.
      
      On Intel-based x86 architectures, the library can exploit the crc32l
      instruction, part of the Intel SSE4.2 instruction set, if the
      instruction is supported by the processor. This implementation
      is twice as fast as the jhash() implementation on an i7 processor.
      
      Additional architectures, such as Arm64 provide instructions for
      accelerating the computation of CRC, so they could be added as well
      in follow-up work.
      Signed-off-by: NFrancesco Fusco <ffusco@redhat.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NThomas Graf <tgraf@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71ae8aac
  2. 17 12月, 2013 5 次提交
  3. 16 12月, 2013 1 次提交
  4. 15 12月, 2013 1 次提交
  5. 14 12月, 2013 26 次提交
  6. 13 12月, 2013 5 次提交