1. 23 1月, 2014 35 次提交
  2. 22 1月, 2014 5 次提交
    • D
      Merge branch 'reciprocal' · 374d1125
      David S. Miller 提交于
      Hannes Frederic Sowa says:
      
      ====================
      reciprocal_divide update
      
      This patch is on top of aee636c4 ("bpf: do not use reciprocal
      divide") from Eric that sits in net tree. It will not create a merge
      conflict, but it depends on this one, so we suggest, if possible, to
      merge net into net-next.
      
      We are proposing this change with only small modifications from the
      v2 version, namely updating the name of trim to reciprocal_scale
      (as commented on by Ben Hutchings and Eric Dumazet, thanks!).
      
      We thought about introducing the reciprocal_divide algorithm in
      parallel to the one already used by the kernel but faced organizational
      issues, leading us to the conclusion that it is best to just replace
      the old one: We could not come up with names for the different
      implementations and also with a way to describe the differences to
      guide developers which one to choose in which situation. This is
      because we cannot specify the correct semantics for the version
      which is currently used by the kernel. Altough it seems to not be
      causing problems in the kernel, we cannot surely say so in the
      case of flex_array for the future. Current usage seems ok, but
      future users could run into problems.
      
      Changelog:
      
      v1->v2:
       - changed name to prandom_u32_max in p1
       - changed name to trim in p2
       - reworked code in p3
      v2->v3:
       - p1 and p3 stays unchanged, only small update in commit
         message in p3
       - changed name to reciprocal_scale in p2
       - fixed kernel doc format
      v3->v4:
       - pseduo -> pseudo (thanks to Tilman Schmidt)
      v4->v5:
       - fix pseduo -> pseudo for real now, sorry for the noise
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      374d1125
    • H
      reciprocal_divide: update/correction of the algorithm · 809fa972
      Hannes Frederic Sowa 提交于
      Jakub Zawadzki noticed that some divisions by reciprocal_divide()
      were not correct [1][2], which he could also show with BPF code
      after divisions are transformed into reciprocal_value() for runtime
      invariance which can be passed to reciprocal_divide() later on;
      reverse in BPF dump ended up with a different, off-by-one K in
      some situations.
      
      This has been fixed by Eric Dumazet in commit aee636c4
      ("bpf: do not use reciprocal divide"). This follow-up patch
      improves reciprocal_value() and reciprocal_divide() to work in
      all cases by using Granlund and Montgomery method, so that also
      future use is safe and without any non-obvious side-effects.
      Known problems with the old implementation were that division by 1
      always returned 0 and some off-by-ones when the dividend and divisor
      where very large. This seemed to not be problematic with its
      current users, as far as we can tell. Eric Dumazet checked for
      the slab usage, we cannot surely say so in the case of flex_array.
      Still, in order to fix that, we propose an extension from the
      original implementation from commit 6a2d7a95 resp. [3][4],
      by using the algorithm proposed in "Division by Invariant Integers
      Using Multiplication" [5], Torbjörn Granlund and Peter L.
      Montgomery, that is, pseudocode for q = n/d where q, n, d is in
      u32 universe:
      
      1) Initialization:
      
        int l = ceil(log_2 d)
        uword m' = floor((1<<32)*((1<<l)-d)/d)+1
        int sh_1 = min(l,1)
        int sh_2 = max(l-1,0)
      
      2) For q = n/d, all uword:
      
        uword t = (n*m')>>32
        q = (t+((n-t)>>sh_1))>>sh_2
      
      The assembler implementation from Agner Fog [6] also helped a lot
      while implementing. We have tested the implementation on x86_64,
      ppc64, i686, s390x; on x86_64/haswell we're still half the latency
      compared to normal divide.
      
      Joint work with Daniel Borkmann.
      
        [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
        [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
        [3] https://gmplib.org/~tege/division-paper.pdf
        [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
        [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
        [6] http://www.agner.org/optimize/asmlib.zipReported-by: NJakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      809fa972
    • D
      net: introduce reciprocal_scale helper and convert users · 89770b0a
      Daniel Borkmann 提交于
      As David Laight suggests, we shouldn't necessarily call this
      reciprocal_divide() when users didn't requested a reciprocal_value();
      lets keep the basic idea and call it reciprocal_scale(). More
      background information on this topic can be found in [1].
      
      Joint work with Hannes Frederic Sowa.
      
        [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.htmlSuggested-by: NDavid Laight <david.laight@aculab.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89770b0a
    • D
      random32: add prandom_u32_max and convert open coded users · f337db64
      Daniel Borkmann 提交于
      Many functions have open coded a function that returns a random
      number in range [0,N-1]. Under the assumption that we have a PRNG
      such as taus113 with being well distributed in [0, ~0U] space,
      we can implement such a function as uword t = (n*m')>>32, where
      m' is a random number obtained from PRNG, n the right open interval
      border and t our resulting random number, with n,m',t in u32 universe.
      
      Lets go with Joe and simply call it prandom_u32_max(), although
      technically we have an right open interval endpoint, but that we
      have documented. Other users can further be migrated to the new
      prandom_u32_max() function later on; for now, we need to make sure
      to migrate reciprocal_divide() users for the reciprocal_divide()
      follow-up fixup since their function signatures are going to change.
      
      Joint work with Hannes Frederic Sowa.
      
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f337db64
    • M
      net/mlx4_core: Remove unnecessary validation for port number · 6cd28f04
      Moni Shoua 提交于
      This is a fix to a regression introduced by commit:
      "982290a7 net/mlx4_core: Check port number for validity
      before accessing data"
      
      IPoIB could not attach to multicast group and we get this in dmesg:
      [144214.145008] ib0: failed to attach to multicast group, ret = -22
      [144214.145016] ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
      [144214.145019] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
      
      The cause to the problem is because port is extracted from gid[5].
      Which is only valid for Ethernet.
      Removed this validation in mlx4_qp_attach_common(), which is accessed
      from both Ethernet and IB flows.
      Error flow for bad port value in Ethernet is already exists in that
      function.
      Signed-off-by: NMoni Shoua <monis@mellanox.co.il>
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6cd28f04