1. 15 Feb 2017, 3 commits
  2. 14 Feb 2017, 1 commit
  3. 10 Jan 2017, 10 commits
    • A
      bpf: rename ARG_PTR_TO_STACK · 39f19ebb
      Alexei Starovoitov committed
      Since ARG_PTR_TO_STACK is no longer just a pointer to the stack,
      rename it to ARG_PTR_TO_MEM and adjust the comment.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39f19ebb
    • U
      smc: establish new socket family · ac713874
      Ursula Braun committed
      * enable SMC module loading and unloading
      * register the new socket family
      * basic SMC socket creation and deletion
      * use a backing TCP socket to run the CLC (Connection Layer Control)
        handshake of the SMC protocol
      * setup for InfiniBand traffic is implemented in follow-on patches;
        for now the fallback to the TCP socket is always used
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac713874
    • J
      stmmac: move stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to platform structure · f573c0b9
      jpinto committed
      This patch moves stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to the
      plat_stmmacenet_data structure. It also moves the initialization of
      these platform variables to stmmac_platform. This was done for two
      reasons:

      a) If PCI is used, platform-related code was being executed in
      stmmac_main, resulting in warnings that make no sense and that were
      conceptually wrong.

      b) stmmac, as a Synopsys reference Ethernet driver stack, will host
      more and more drivers in its structure, such as
      synopsys/dwc_eth_qos.c. These drivers have their own DT bindings
      that are not compatible with stmmac's. Among the most important are
      the clock names, which therefore need to be parsed and initialized
      in the glue logic; that is the main reason the clocks were moved to
      the platform structure.
      Signed-off-by: Joao Pinto <jpinto@synopsys.com>
      Tested-by: Niklas Cassel <niklas.cassel@axis.com>
      Reviewed-by: Lars Persson <larper@axis.com>
      Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f573c0b9
    • J
      stmmac: adding DT parameter for LPI tx clock gating · b4b7b772
      jpinto committed
      This patch adds a new parameter to the stmmac DT: snps,en-tx-lpi-clockgating.
      It was ported from synopsys/dwc_eth_qos.c and is useful when stmmac
      users also need LPI TX clock gating.
      Signed-off-by: Joao Pinto <jpinto@synopsys.com>
      Tested-by: Niklas Cassel <niklas.cassel@axis.com>
      Reviewed-by: Lars Persson <larper@axis.com>
      Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4b7b772
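For illustration, the new property would appear in a board's device tree roughly as below. Only the snps,en-tx-lpi-clockgating property name comes from the patch; the node label and placement are hypothetical.

```dts
&ethernet0 {                          /* hypothetical stmmac node label */
	snps,en-tx-lpi-clockgating;   /* boolean: enable LPI TX clock gating */
};
```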
    • J
      siphash: implement HalfSipHash1-3 for hash tables · 1ae2324f
      Jason A. Donenfeld committed
      HalfSipHash, or hsiphash, is a shortened version of SipHash, which
      generates 32-bit outputs using a weaker 64-bit key. It has *much* lower
      security margins, and shouldn't be used for anything too sensitive, but
      it could be used as a hashtable key function replacement, if the output
      is never exposed, and if the security requirement is not too high.
      
      The goal is to make this something that performance-critical jhash users
      would be willing to use.
      
      On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias
      SipHash1-3 to HalfSipHash1-3 on those systems.
      
      64-bit x86_64:
      [    0.509409] test_siphash:     SipHash2-4 cycles: 4049181
      [    0.510650] test_siphash:     SipHash1-3 cycles: 2512884
      [    0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920
      [    0.512904] test_siphash:    JenkinsHash cycles:  978267
      So, we map hsiphash() -> SipHash1-3
      
      32-bit x86:
      [    0.509868] test_siphash:     SipHash2-4 cycles: 14812892
      [    0.513601] test_siphash:     SipHash1-3 cycles:  9510710
      [    0.515263] test_siphash: HalfSipHash1-3 cycles:  3856157
      [    0.515952] test_siphash:    JenkinsHash cycles:  1148567
      So, we map hsiphash() -> HalfSipHash1-3
      
      hsiphash() is roughly 3 times slower than jhash(), but comes with a
      considerable security improvement.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ae2324f
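To make the construction concrete, here is a userspace Python sketch of HalfSipHash1-3 following the published reference algorithm (32-bit words, 64-bit key, one compression round per word, three finalization rounds). This is an illustration of the scheme, not the kernel's implementation, and the helper names are made up.

```python
M32 = 0xFFFFFFFF

def _rotl32(x, b):
    return ((x << b) | (x >> (32 - b))) & M32

def hsiphash13(key: bytes, data: bytes) -> int:
    """HalfSipHash1-3 sketch: 64-bit key, 32-bit output."""
    assert len(key) == 8
    k0 = int.from_bytes(key[:4], "little")
    k1 = int.from_bytes(key[4:], "little")
    # initial state per the reference description
    v0, v1, v2, v3 = k0, k1, 0x6C796765 ^ k0, 0x74656462 ^ k1

    def sipround():
        nonlocal v0, v1, v2, v3
        v0 = (v0 + v1) & M32; v1 = _rotl32(v1, 5); v1 ^= v0; v0 = _rotl32(v0, 16)
        v2 = (v2 + v3) & M32; v3 = _rotl32(v3, 8); v3 ^= v2
        v0 = (v0 + v3) & M32; v3 = _rotl32(v3, 7); v3 ^= v0
        v2 = (v2 + v1) & M32; v1 = _rotl32(v1, 13); v1 ^= v2; v2 = _rotl32(v2, 16)

    n = len(data) // 4
    for i in range(n):                        # one compression round per word
        m = int.from_bytes(data[4 * i:4 * i + 4], "little")
        v3 ^= m; sipround(); v0 ^= m
    b = (len(data) & 0xFF) << 24              # length byte plus tail bytes
    for i, byte in enumerate(data[4 * n:]):
        b |= byte << (8 * i)
    v3 ^= b; sipround(); v0 ^= b
    v2 ^= 0xFF
    for _ in range(3):                        # three finalization rounds
        sipround()
    return (v1 ^ v3) & M32
```

The weaker margins come directly from the narrower state: four 32-bit words and fewer rounds, which is why the output should never be exposed.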
    • J
      siphash: add cryptographically secure PRF · 2c956a60
      Jason A. Donenfeld committed
      SipHash is a 64-bit keyed hash function that is actually a
      cryptographically secure PRF, like HMAC. Except SipHash is super fast,
      and is meant to be used as a hashtable keyed lookup function, or as a
      general PRF for short input use cases, such as sequence numbers or RNG
      chaining.
      
      For the first usage:
      
      There are a variety of attacks known as "hashtable poisoning" in which an
      attacker forms some data such that the hash of that data will be the
      same, and then proceeds to fill up all entries of a hashbucket. This is
      a realistic and well-known denial-of-service vector. Currently
      hashtables use jhash, which is fast but not secure, and some kind of
      rotating key scheme (or none at all, which isn't good). SipHash is meant
      as a replacement for jhash in these cases.
      
      There are a number of places in the kernel that are vulnerable to
      hashtable poisoning attacks, either via userspace vectors or network
      vectors, and there's not a reliable mechanism inside the kernel at the
      moment to fix it. The first step toward fixing these issues is actually
      getting a secure primitive into the kernel for developers to use. Then
      we can, bit by bit, port things over to it as deemed appropriate.
      
      While SipHash is extremely fast for a cryptographically secure function,
      it is likely a bit slower than the insecure jhash, and so replacements
      will be evaluated on a case-by-case basis based on whether or not the
      difference in speed is negligible and whether or not the current jhash usage
      poses a real security risk.
      
      For the second usage:
      
      A few places in the kernel are using MD5 or SHA1 for creating secure
      sequence numbers, syn cookies, port numbers, or fast random numbers.
      SipHash is a faster, more fitting, and more secure replacement for MD5
      in those situations. Replacing MD5 and SHA1 with SipHash for these uses
      is obvious and straightforward, and so is submitted along with this
      patch series. There shouldn't be much of a debate over its efficacy.
      
      Dozens of languages are already using this internally for their hash
      tables and PRFs. Some of the BSDs already use this in their kernels.
      SipHash is a widely known high-speed solution to a widely known set of
      problems, and it's time we catch up.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c956a60
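As a concrete illustration of the PRF (a userspace Python sketch, not the kernel code), here is SipHash-2-4 per the reference description, checked against the published test vectors for the key 00..0f:

```python
M64 = 0xFFFFFFFFFFFFFFFF

def _rotl64(x, b):
    return ((x << b) | (x >> (64 - b))) & M64

def siphash24(key: bytes, data: bytes) -> int:
    """SipHash-2-4: 128-bit key, 64-bit output."""
    assert len(key) == 16
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    # "somepseudorandomlygeneratedbytes" initialization constants
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    def sipround():
        nonlocal v0, v1, v2, v3
        v0 = (v0 + v1) & M64; v1 = _rotl64(v1, 13); v1 ^= v0; v0 = _rotl64(v0, 32)
        v2 = (v2 + v3) & M64; v3 = _rotl64(v3, 16); v3 ^= v2
        v0 = (v0 + v3) & M64; v3 = _rotl64(v3, 21); v3 ^= v0
        v2 = (v2 + v1) & M64; v1 = _rotl64(v1, 17); v1 ^= v2; v2 = _rotl64(v2, 32)

    n = len(data) // 8
    for i in range(n):                        # two compression rounds per word
        m = int.from_bytes(data[8 * i:8 * i + 8], "little")
        v3 ^= m; sipround(); sipround(); v0 ^= m
    b = (len(data) & 0xFF) << 56              # last block: tail bytes + length
    for i, byte in enumerate(data[8 * n:]):
        b |= byte << (8 * i)
    v3 ^= b; sipround(); sipround(); v0 ^= b
    v2 ^= 0xFF                                # four finalization rounds
    sipround(); sipround(); sipround(); sipround()
    return v0 ^ v1 ^ v2 ^ v3
```

A hashtable would call this with a per-boot random key, so an attacker cannot precompute colliding inputs the way they can against an unkeyed or fixed-key jhash.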
    • E
      IB/mlx5: Support 4k UAR for libmlx5 · 30aa60b3
      Eli Cohen committed
      Add fields to structs to convey to the kernel whether the library
      supports multiple UARs per page, and return to the library the size
      of a UAR based on the queried value.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      30aa60b3
    • E
      IB/mlx5: Allow future extension of libmlx5 input data · b037c29a
      Eli Cohen committed
      The current check requires that fields in struct
      mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero.
      This was introduced so new libraries passing additional information to
      the kernel through struct mlx5_ib_alloc_ucontext_req_v2 would be
      notified by old kernels that do not support their request, by failing
      the operation. This scheme is problematic since it requires libmlx5 to
      issue the requests with descending input size for struct
      mlx5_ib_alloc_ucontext_req_v2.

      To avoid this, we require that new features obey the following rules:
      if the feature requires one or more fields in the response, and at
      least one of those fields can be encoded such that a zero value means
      the kernel ignored the request, then that field provides the
      indication to the library. If no response is required, or if zero is a
      valid response, a new field should be added that indicates to the
      library whether its request was processed.
      
      Fixes: b368d7cb ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext')
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      b037c29a
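The negotiation rule above can be modeled with a small toy (field names are illustrative, not the mlx5 ABI): an old kernel simply ignores request fields it does not know, and the library detects support through a response field whose zero value means "the kernel ignored the request".

```python
def old_kernel_alloc_ucontext(req: dict) -> dict:
    """Toy 'old kernel': copies only the request fields it understands
    and never fails on unknown ones, so libraries need not probe with
    descending input sizes."""
    resp = {}
    if "num_uars" in req:                  # a field the old kernel knows
        resp["num_uars"] = req["num_uars"]
    # Unknown feature fields are silently dropped; the corresponding
    # response field stays absent (i.e. reads as zero).
    return resp

def kernel_supports_feature(resp: dict) -> bool:
    """Library-side rule: zero/missing response field means the kernel
    ignored the feature request."""
    return resp.get("feature_x_granted", 0) != 0
```

A new kernel that does support the feature would fill feature_x_granted with a nonzero value, which is why the rule requires a field where zero can safely mean "ignored".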
    • E
      IB/mlx5: Use blue flame register allocator in mlx5_ib · 5fe9dec0
      Eli Cohen committed
      Make use of the blue flame register allocator in mlx5_ib. Since blue
      flame was not really supported, we remove all code related to blue
      flame and let all consumers use the same blue flame register. Once
      blue flame is supported, the code will be added back. As part of this
      patch we also move the definition of struct mlx5_bf to mlx5_ib.h, as
      it is only used by mlx5_ib.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      5fe9dec0
    • E
      net/mlx5: Add interface to get reference to a UAR · 01187175
      Eli Cohen committed
      A reference to a UAR is required to generate CQ or EQ doorbells. Since
      CQ and EQ doorbells can all be generated using the same UAR area
      without any effect on performance, we just take a reference to any
      available UAR. If none is available we allocate one, but we do not
      waste the blue flame registers it can provide; we will use them for
      subsequent allocations.
      We take a reference to such a UAR and put it in mlx5_priv so any
      kernel consumer can make use of it.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      01187175
  4. 09 Jan 2017, 6 commits
    • W
      net-tc: convert tc_from to tc_from_ingress and tc_redirected · bc31c905
      Willem de Bruijn committed
      The tc_from field fulfills two roles. It encodes whether a packet was
      redirected by an act_mirred device and, if so, whether act_mirred was
      called on ingress or egress. Split it into separate fields.
      
      The information is needed by the special IFB loop, where packets are
      taken out of the normal path by act_mirred, forwarded to IFB, then
      reinjected at their original location (ingress or egress) by IFB.
      
      The IFB device cannot use skb->tc_at_ingress, because that may have
      been overwritten as the packet travels from act_mirred to ifb_xmit,
      when it passes through tc_classify on the IFB egress path. Cache this
      value in skb->tc_from_ingress.
      
      That field is valid only if a packet arriving at ifb_xmit came from
      act_mirred. Other packets can be crafted to reach ifb_xmit. These
      must be dropped. Set tc_redirected on redirection and drop all packets
      that do not have this bit set.
      
      Both fields are set only on cloned skbs in tc actions, so original
      packet sources do not have to clear the bit when reusing packets
      (notably, pktgen and octeon).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc31c905
    • W
      net-tc: convert tc_at to tc_at_ingress · 8dc07fdb
      Willem de Bruijn committed
      Field tc_at is used only within tc actions to distinguish ingress from
      egress processing. A single bit is sufficient for this purpose.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8dc07fdb
    • W
      net-tc: convert tc_verd to integer bitfields · a5135bcf
      Willem de Bruijn committed
      Extract the remaining two fields from tc_verd and remove the __u16
      completely. TC_AT and TC_FROM are converted to equivalent two-bit
      integer fields tc_at and tc_from. Where possible, use existing
      helper skb_at_tc_ingress when reading tc_at. Introduce helper
      skb_reset_tc to clear fields.
      
      Not documenting tc_from and tc_at, because they will be replaced
      with single bit fields in follow-on patches.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5135bcf
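The conversion replaces the packed __u16 tc_verd with small integer bitfields. As a language-neutral sketch, two-bit fields like tc_at and tc_from can be read and written with masks and shifts (the field names come from the commit; the bit positions here are made up for illustration):

```python
TC_AT_SHIFT, TC_FROM_SHIFT = 0, 2     # illustrative positions, not the kernel layout
FIELD_MASK = 0x3                      # two bits per field

def get_field(word: int, shift: int) -> int:
    """Extract a two-bit field from a packed word."""
    return (word >> shift) & FIELD_MASK

def set_field(word: int, shift: int, val: int) -> int:
    """Clear the two-bit field, then write the new value."""
    assert 0 <= val <= FIELD_MASK
    return (word & ~(FIELD_MASK << shift)) | (val << shift)

w = 0
w = set_field(w, TC_AT_SHIFT, 2)      # e.g. "at ingress"
w = set_field(w, TC_FROM_SHIFT, 1)    # e.g. "redirected"
```

In C the compiler generates exactly this mask/shift code for declared bitfields, which is why dedicated two-bit fields are as cheap as the old hand-packed __u16.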
    • W
      net-tc: extract skip classify bit from tc_verd · e7246e12
      Willem de Bruijn committed
      Packets sent by the IFB device skip subsequent tc classification.
      A single bit governs this state. Move it out of tc_verd in
      anticipation of removing that __u16 completely.
      
      The new bitfield tc_skip_classify temporarily uses one bit of a
      hole, until tc_verd is removed completely in a follow-up patch.
      
      Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
      With that many options, little value in documenting it.
      
      Introduce a helper function to deduplicate the logic in the two
      sites that check this bit.
      
      The field tc_skip_classify is set only in IFB on skbs cloned in
      act_mirred, so original packet sources do not have to clear the
      bit when reusing packets (notably, pktgen and octeon).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e7246e12
    • S
      net: make ndo_get_stats64 a void function · bc1f4470
      stephen hemminger committed
      The network device operation for reading statistics is only called
      in one place, and it ignores the return value. Having a structure
      return value is potentially confusing because some future driver could
      incorrectly assume that the return value was used.
      
      Fix all drivers with ndo_get_stats64 to have a void function.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc1f4470
    • D
      net: ipmr: Remove nowait arg to ipmr_get_route · 9f09eaea
      David Ahern committed
      ipmr_get_route has one caller, and the nowait arg is 0. Remove the arg
      and simplify ipmr_get_route accordingly.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f09eaea
  5. 08 Jan 2017, 4 commits
    • E
      net/mlx5: Introduce blue flame register allocator · a6d51b68
      Eli Cohen committed
      Add an allocator for blue flame registers. A blue flame register is
      used for generating send doorbells. It can generate either a regular
      doorbell or a blue flame doorbell, where the data to be sent is
      written to the device's I/O memory, saving the need for the device to
      read the data from memory. For blue flame doorbells to work, the blue
      flame register needs to be mapped as write combining. The user can
      specify which kind of send doorbells she wishes to use. If she
      requested a write combining mapping but that failed, the allocator
      will fall back to a non write combining mapping and will indicate
      that to the user.
      Subsequent patches in this series will make use of this allocator.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      a6d51b68
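The request-then-fall-back behavior described above can be sketched in a few lines (names are illustrative, not the mlx5 API): the caller asks for write combining, and if that mapping fails the allocator returns a normal mapping plus a flag so the caller knows blue flame doorbells are unavailable.

```python
def alloc_bf_reg(want_write_combining: bool, wc_available: bool):
    """Toy allocator: return (mapping_kind, is_write_combining)."""
    if want_write_combining and wc_available:
        return ("wc-mapping", True)
    # Fall back to a normal mapping and tell the caller, so it can
    # choose regular doorbells instead of blue flame doorbells.
    return ("uc-mapping", False)
```

The returned flag is the key design point: the caller must be told about the fallback, because writing blue flame data through a non write combining mapping would lose the performance benefit.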
    • E
      mlx5: Fix naming convention with respect to UARs · 2f5ff264
      Eli Cohen committed
      This establishes a solid naming convention for UARs. A UAR (User
      Access Region) can have a size identical to a system page, or can be
      a fixed 4KB, depending on a value queried from firmware. Each UAR
      always has 4 blue flame registers, which are used to post doorbells
      to send queues. In addition, a UAR has a section used for posting
      doorbells to CQs or EQs. In this patch we change names to reflect
      this convention.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      2f5ff264
    • J
      mm: workingset: fix use-after-free in shadow node shrinker · ea07b862
      Johannes Weiner committed
      Several people reported seeing warnings about inconsistent radix tree
      nodes followed by crashes in the workingset code, all of which looked
      like use-after-free access from the shadow node shrinker.
      
      Dave Jones managed to reproduce the issue with a debug patch applied,
      which confirmed that the radix tree shrinking indeed frees shadow nodes
      while they are still linked to the shadow LRU:
      
        WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
        CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
        Call Trace:
           delete_node+0x1e4/0x200
           __radix_tree_delete_node+0xd/0x10
           shadow_lru_isolate+0xe6/0x220
           __list_lru_walk_one.isra.4+0x9b/0x190
           list_lru_walk_one+0x23/0x30
           scan_shadow_nodes+0x2e/0x40
           shrink_slab.part.44+0x23d/0x5d0
           shrink_node+0x22c/0x330
           kswapd+0x392/0x8f0
      
      This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
      inlined radix_tree_shrink().
      
      The problem is with 14b46879 ("mm: workingset: move shadow entry
      tracking to radix tree exceptional tracking"), which passes an update
      callback into the radix tree to link and unlink shadow leaf nodes when
      tree entries change, but forgot to pass the callback when reclaiming a
      shadow node.
      
      While the reclaimed shadow node itself is unlinked by the shrinker, its
      deletion from the tree can cause the left-most leaf node in the tree to
      be shrunk.  If that happens to be a shadow node as well, we don't unlink
      it from the LRU as we should.
      
      Consider this tree, where the s are shadow entries:
      
             root->rnode
                  |
             [0       n]
              |       |
           [s    ] [sssss]
      
      Now the shadow node shrinker reclaims the rightmost leaf node through
      the shadow node LRU:
      
             root->rnode
                  |
             [0        ]
              |
          [s     ]
      
      Because the parent of the deleted node is the first level below the
      root and has only one child in the left-most slot, the intermediate
      level is shrunk and the node containing the single shadow is put in
      its place:
      
             root->rnode
                  |
             [s        ]
      
      The shrinker again sees a single left-most slot in a first level node
      and thus decides to store the shadow in root->rnode directly and free
      the node - which is a leaf node on the shadow node LRU.
      
        root->rnode
             |
             s
      
      Without the update callback, the freed node remains on the shadow LRU,
      where it causes later shrinker runs to crash.
      
      Pass the node updater callback into __radix_tree_delete_node() in case
      the deletion causes the left-most branch in the tree to collapse too.
      
      Also add warnings when linked nodes are freed right away, rather than
      wait for the use-after-free when the list is scanned much later.
      
      Fixes: 14b46879 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
      Reported-by: Dave Chinner <david@fromorbit.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Leech <cleech@redhat.com>
      Cc: Lee Duncan <lduncan@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea07b862
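The invariant behind the fix can be shown with a toy model (names are illustrative, not the kernel API): every path that frees a tracked node must run the same unlink callback, or the LRU keeps a dangling reference that a later scan will chase.

```python
lru = []                      # stand-in for the shadow-node LRU

def link(node):
    """Update callback used when a node becomes tracked."""
    lru.append(node)

def unlink(node):
    """Update callback that must run before a tracked node is freed."""
    lru.remove(node)

def free_node(node, update=None):
    """Free a node; the bug was a freeing path that forgot to pass the
    update callback, leaving the node linked on the LRU."""
    if update is not None:
        update(node)
    # (node memory would be released here)

buggy = object(); link(buggy); free_node(buggy)            # callback forgotten
fixed = object(); link(fixed); free_node(fixed, unlink)    # callback passed
```

After the buggy path, `buggy` is still on the LRU even though it is "freed"; a subsequent LRU walk touching it is exactly the use-after-free the warnings in the patch now catch early.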
    • K
      net: netcp: extract eflag from desc for rx_hook handling · 69d707d0
      Karicheri, Muralidharan committed
      Extract the eflag bits from the received desc and pass them down the
      rx_hook chain to be available to netcp modules. The psdata and epib
      data also have to be inspected by the netcp modules, so the desc can
      be freed only after the rx_hook returns; move knav_pool_desc_put()
      after the rx_hook processing.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      69d707d0
  6. 07 Jan 2017, 1 commit
  7. 05 Jan 2017, 3 commits
    • A
      dsa: mv88e6xxx: Optimise atu_get · 59527581
      Andrew Lunn committed
      Lookup in the ATU can be performed starting from a given MAC
      address. This is faster than starting with the first possible MAC
      address and iterating all entries.
      
      Entries are returned in numeric order. So if the MAC address returned
      is bigger than what we are searching for, we know it is not in the
      ATU.
      
      Using the benchmark provided by Volodymyr Bendiuga
      <volodymyr.bendiuga@gmail.com>,
      
      https://www.spinics.net/lists/netdev/msg411550.html
      
      on a Marvell Armada 370 RD, the test to add a number of static fdb
      entries went from 1.616531 seconds to 0.312052 seconds.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      59527581
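The speed-up comes from two properties the commit describes: the walk can start at the target address instead of the first possible one, and entries come back in numeric order so the search can stop as soon as a larger entry appears. The same idea over a sorted list in Python (illustrative, not the switch's register interface):

```python
import bisect

def atu_style_get(sorted_macs, target):
    """Look up target in a sorted table, starting near the target and
    stopping early once an entry greater than the target is seen."""
    i = bisect.bisect_left(sorted_macs, target)   # start the walk near target
    for entry in sorted_macs[i:]:
        if entry > target:        # walked past where it would be: absent
            return False
        if entry == target:
            return True
    return False                  # ran off the end of the table
```

On hardware the "start position" is the MAC address programmed before the ATU GetNext operation, so the early-exit saves iterating the bulk of the table on every miss.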
    • P
      vfio-mdev: fix non-standard ioctl return val causing i386 build fail · c6ef7fd4
      Paul Gortmaker committed
      What appears to be a copy-and-paste error from the line above gave
      the ioctl a ssize_t return value instead of the traditional "int".
      
      The associated sample code used "long" which meant it would compile
      for x86-64 but not i386, with the latter failing as follows:
      
        CC [M]  samples/vfio-mdev/mtty.o
      samples/vfio-mdev/mtty.c:1418:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
        .ioctl          = mtty_ioctl,
                          ^
      samples/vfio-mdev/mtty.c:1418:20: note: (near initialization for ‘mdev_fops.ioctl’)
      cc1: some warnings being treated as errors
      
      In this case vfio is working with struct file_operations, which declares:
      
          long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
          long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
      
      ...and so here we just standardize on long, rather than the normal int
      that user space typically sees and that "man ioctl" and similar
      document.
      
      Fixes: 9d1a546c ("docs: Sample driver to demonstrate how to use Mediated device framework.")
      Cc: Kirti Wankhede <kwankhede@nvidia.com>
      Cc: Neo Jia <cjia@nvidia.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      c6ef7fd4
    • Y
      scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr)) · 1ff8cebf
      yuan linyu committed
      sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) are already
      aligned. Remove the uses of CMSG_ALIGN(sizeof(struct cmsghdr)) and
      CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) to keep the code
      consistent.
      Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ff8cebf
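CMSG_ALIGN rounds a length up to the platform's cmsghdr alignment, so applying it to a size that is already a multiple of the alignment is a no-op, which is why the macro can be dropped there. A sketch of the arithmetic, assuming the typical 64-bit Linux values (8-byte alignment, 16-byte struct cmsghdr); the constants are illustrative assumptions, not taken from the patch:

```python
ALIGN = 8                    # assumed: sizeof(long) alignment on 64-bit Linux
SIZEOF_CMSGHDR = 16          # assumed: size_t len + int level + int type, padded

def cmsg_align(n: int) -> int:
    """CMSG_ALIGN-style round-up to the next multiple of ALIGN."""
    return (n + ALIGN - 1) & ~(ALIGN - 1)
```

Since cmsg_align(SIZEOF_CMSGHDR) == SIZEOF_CMSGHDR under these assumptions, wrapping the sizeof in the macro added nothing, matching the commit's reasoning.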
  8. 03 Jan 2017, 10 commits
  9. 02 Jan 2017, 2 commits