1. 03 Nov, 2015 31 commits
    • net/mlx5e: Avoid NULL pointer access in case of configuration failure · a1985740
      Achiad Shochat committed
      If a configuration operation that involves closing and re-opening
      resources (e.g. an RX/TX queue size change) fails at the re-opening
      stage, these resources will remain closed.
      So when executing subsequent configuration operations (e.g. ifconfig
      down) we cannot assume that these resources are available.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/core: generic support for disabling netdev features down stack · fd867d51
      Jarod Wilson committed
      There are some netdev features, which when disabled on an upper device,
      such as a bonding master or a bridge, must be disabled and cannot be
      re-enabled on underlying devices.
      
      This is a rework of an earlier, more heavy-handed approach, which simply
      disables and prevents re-enabling of netdev features listed in a new
      define in include/linux/netdev_features.h, NETIF_F_UPPER_DISABLES. When
      any upper device disables a flag in that feature mask, the disabling will
      propagate down the stack, and any lower device that has any upper device
      with one of those flags disabled should not be able to enable said flag.
      
      Initially, only LRO is included for proof of concept, and because this
      code effectively does the same thing as dev_disable_lro(), though it will
      also activate from the ethtool path, which was one of the goals here.
      
      [root@dell-per730-01 ~]# ethtool -k bond0 |grep large
      large-receive-offload: on
      [root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
      large-receive-offload: on
      [root@dell-per730-01 ~]# ethtool -K bond0 lro off
      [root@dell-per730-01 ~]# ethtool -k bond0 |grep large
      large-receive-offload: off
      [root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
      large-receive-offload: off
      
      dmesg dump:
      
      [ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
      [ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X  IRQs: sp 74  fp[0] 76 ... fp[7] 83
      [ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
      [ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[7] 71
      
      This has been successfully tested with bnx2x, qlcnic and netxen network
      cards as slaves in a bond interface. Turning LRO on or off on the master
      also turns it on or off on each of the slaves; new slaves are added with
      LRO in the same state as the master, and LRO can't be toggled on the
      slaves.
      
      Also, this should largely remove the need for dev_disable_lro(), and most,
      if not all, of its call sites can be replaced by simply making sure
      NETIF_F_LRO isn't included in the relevant device's feature flags.
      
      Note that this patch is driven by bug reports from users saying it was
      confusing that bonds and slaves had different settings for the same
      features, and while it won't be 100% in sync if a lower device doesn't
      support a feature like LRO, I think this is a good step in the right
      direction.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <gospo@cumulusnetworks.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Nikolay Aleksandrov <razor@blackwall.org>
      CC: Michal Kubecek <mkubecek@suse.cz>
      CC: Alexander Duyck <alexander.duyck@gmail.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
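The propagation rule described in the commit message above can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel implementation: `struct toy_dev`, `toy_fix_features()`, `toy_set_features()` and the `FEAT_*` bit values are hypothetical names invented here, with `FEAT_UPPER_DISABLES` standing in for NETIF_F_UPPER_DISABLES. It models only the disable path: a flag in the mask that is off on any upper device is stripped from the lower device and cannot be re-enabled there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values, chosen for illustration only. */
#define FEAT_LRO            (1ULL << 0)
#define FEAT_TSO            (1ULL << 1)
/* Stand-in for NETIF_F_UPPER_DISABLES: only LRO, as in the commit message. */
#define FEAT_UPPER_DISABLES (FEAT_LRO)

#define TOY_MAX_LOWER 4

/* Hypothetical, heavily simplified device model. */
struct toy_dev {
    uint64_t features;
    struct toy_dev *upper;                 /* NULL if there is no upper device */
    struct toy_dev *lower[TOY_MAX_LOWER];
    size_t n_lower;
};

/* Strip every masked flag that is disabled on some upper device. */
uint64_t toy_fix_features(const struct toy_dev *dev, uint64_t wanted)
{
    assert(dev != NULL);
    for (const struct toy_dev *up = dev->upper; up != NULL; up = up->upper) {
        uint64_t off_in_upper = FEAT_UPPER_DISABLES & ~up->features;
        wanted &= ~off_in_upper;
    }
    return wanted;
}

/* Apply a feature request, then re-evaluate all lower devices. */
void toy_set_features(struct toy_dev *dev, uint64_t wanted)
{
    dev->features = toy_fix_features(dev, wanted);
    for (size_t i = 0; i < dev->n_lower; i++)
        toy_set_features(dev->lower[i], dev->lower[i]->features);
}
```

With a bond as the upper device of one port, clearing `FEAT_LRO` on the bond also clears it on the port, and a later attempt to set it on the port alone leaves it off.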
    • sh_eth: fix typo in RX descriptor bit name · c238041f
      Sergei Shtylyov committed
      The correct name of the RX descriptor 0 bit 30 is RDLE (receive descriptor
      list end), not RDEL.
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bonding-actor-updates' · a176ded3
      David S. Miller committed
      Mahesh Bandewar says:
      
      ====================
      re-org actor admin/oper key updates
      
      I was observing machines entering a weird LACP state when the
      partner is in passive mode. This issue is not caused by the partner
      being in passive state, but probably by some operational key update
      that pushes the state machine into that weird state. This was
      happening randomly on about 1% of the machines (when the sample size
      is a large set of machines with a variety of NICs/ports bonded).
      
      In this patch-series I'm attempting to unify the logic of actor-key
      / operational-key changes to one place to avoid possible errors in
      update. Also this eliminates the need for the event-handler to decide
      if the key needs update.
      
      After this patch-set none of the machines (from same sample set) were
      exhibiting LACP-weirdness that was observed earlier.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: simplify / unify event handling code for 3ad mode. · 52bc6716
      Mahesh Bandewar committed
      Old logic of updating state-machine is not required since
      ad_update_actor_keys() does it implicitly. The only loss is
      the notification differentiation between speed vs. duplex
      change. Now only one unified notification is printed.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: unify all places where actor-oper key needs to be updated. · 7bb11dc9
      Mahesh Bandewar committed
      The actor_admin and actor_oper keys are changed at multiple locations in
      the code. This patch brings all those updates into one location in
      an attempt to avoid possible inconsistent updates causing the LACP state
      machine to go into a weird state.

      The unified place is ad_update_actor_keys() with simple state-machine
      logic:
        (a) Only a port that is "duplex" can participate in LACP.
        (b) A speed change reinitializes the LACP state-machine.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: Simplify __get_duplex function. · b25c2e7d
      Mahesh Bandewar committed
      Eliminate 'else' clause by simply initializing variable
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
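The refactoring pattern named in the commit message above, dropping an 'else' branch by initializing the result variable, looks roughly like this. The function and enum names are hypothetical stand-ins for this example, not the driver's `__get_duplex()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical duplex values for illustration. */
enum toy_duplex { TOY_DUPLEX_HALF = 0, TOY_DUPLEX_FULL = 1 };

/*
 * Returns 1 for a full-duplex link that is up, 0 otherwise.
 * Initializing retval to the default covers the "link down" and
 * "half duplex" cases, so no else branch is needed.
 */
unsigned char toy_get_duplex(bool link_up, enum toy_duplex duplex)
{
    unsigned char retval = 0x0;

    assert(duplex == TOY_DUPLEX_HALF || duplex == TOY_DUPLEX_FULL);
    if (link_up && duplex == TOY_DUPLEX_FULL)
        retval = 0x1;

    return retval;
}
```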
    • Merge branch 'bpf-persistent' · 12d43096
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      BPF updates
      
      This set adds support for persistent maps/progs. Please see
      individual patches for further details. A man-page update
      to bpf(2) will be sent later on, along with an iproute2 patch
      for support in tc.
      
      v1 -> v2:
        - Reworked most of patch 4 and 5
        - Rebased to latest net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add sample usages for persistent maps/progs · 42984d7c
      Daniel Borkmann committed
      This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN
      and BPF_OBJ_GET commands can be used.
      
      Example with maps:
      
        # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
        bpf: map fd:3 (Success)
        bpf: pin ret:(0,Success)
        bpf: fd:3 u->(1:42) ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):42 ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
        bpf: get fd:3 (Success)
        bpf: fd:3 u->(1:24) ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):24 ret:(0,Success)
      
        # ./fds_example -F /sys/fs/bpf/m2 -P -m
        bpf: map fd:3 (Success)
        bpf: pin ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):0 ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m2 -G -m
        bpf: get fd:3 (Success)
      
      Example with progs:
      
        # ./fds_example -F /sys/fs/bpf/p -P -p
        bpf: prog fd:3 (Success)
        bpf: pin ret:(0,Success)
        bpf: sock:4 <- fd:3 attached ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/p -G -p
        bpf: get fd:3 (Success)
        bpf: sock:4 <- fd:3 attached ret:(0,Success)
      
        # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o
        bpf: prog fd:5 (Success)
        bpf: pin ret:(0,Success)
        bpf: sock:3 <- fd:5 attached ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/p2 -G -p
        bpf: get fd:3 (Success)
        bpf: sock:4 <- fd:3 attached ret:(0,Success)
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add support for persistent maps/progs · b2197755
      Daniel Borkmann committed
      This work adds support for "persistent" eBPF maps/programs. The term
      "persistent" is to be understood as meaning that maps/programs have a
      facility that lets them survive process termination. This is desired by
      various eBPF subsystem users.
      
      Just to name one example: tc classifier/action. Whenever tc parses
      the ELF object, extracts and loads maps/progs into the kernel, these
      file descriptors will be out of reach after the tc instance exits.
      So a subsequent tc invocation won't be able to access/relocate on this
      resource, and therefore maps cannot easily be shared, f.e. between the
      ingress and egress networking data path.
      
      The current workaround is that Unix domain sockets (UDS) need to be
      instrumented in order to pass the created eBPF map/program file
      descriptors to a third party management daemon through UDS' socket
      passing facility. This makes it a bit complicated to deploy shared
      eBPF maps or programs (programs f.e. for tail calls) among various
      processes.
      
      We've been brainstorming on how we could tackle this issue and various
      approaches have been tried out so far, which are discussed further in
      the reference below.
      
      The architecture we eventually ended up with is a minimal file system
      that can hold map/prog objects. The file system is a per mount namespace
      singleton, and the default mount point is /sys/fs/bpf/. Any subsequent
      mounts within a given namespace will point to the same instance. The
      file system allows for creating a user-defined directory structure.
      The objects for maps/progs are created/fetched through bpf(2) with
      two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor
      along with a pathname is being passed to bpf(2) that in turn creates
      (we call it eBPF object pinning) the file system nodes. Only the pathname
      is being passed to bpf(2) for getting a new BPF file descriptor to an
      existing node. The user can use that to access maps and progs later on,
      through bpf(2). Removal of file system nodes is being managed through
      normal VFS functions such as unlink(2), etc. The file system code is
      kept to a bare minimum and can be further extended later on.
      
      The next step I'm working on is to add dump eBPF map/prog commands
      to bpf(2), so that a specification from a given file descriptor can
      be retrieved. This can be used by things like CRIU but also applications
      can inspect the meta data after calling BPF_OBJ_GET.
      
      Big thanks also to Alexei and Hannes who significantly contributed
      in the design discussion that eventually let us end up with this
      architecture here.
      
      Reference: https://lkml.org/lkml/2015/10/15/925
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
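The two commands described in the commit message above can be invoked from user space roughly as follows. This is a minimal sketch, assuming a Linux system whose `<linux/bpf.h>` already defines `BPF_OBJ_PIN` and `BPF_OBJ_GET`; the wrapper names `toy_bpf_obj_pin()` and `toy_bpf_obj_get()` are hypothetical helpers written for this example. Pinning takes a file descriptor plus a pathname, while getting takes only the pathname, as the message explains.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Pin an existing BPF map/prog fd at pathname; returns 0 or -1 with errno set. */
int toy_bpf_obj_pin(int fd, const char *pathname)
{
    union bpf_attr attr;

    assert(pathname != NULL);
    memset(&attr, 0, sizeof(attr));      /* unused fields must be zero */
    attr.pathname = (uint64_t)(uintptr_t)pathname;
    attr.bpf_fd = (uint32_t)fd;

    return (int)syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}

/* Obtain a new fd for the object pinned at pathname; returns fd or -1. */
int toy_bpf_obj_get(const char *pathname)
{
    union bpf_attr attr;

    assert(pathname != NULL);
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)pathname;

    return (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

A caller would typically pin under the mount point named in the message (for example a path below /sys/fs/bpf/), and both calls fail with -1 when the path does not exist or the process lacks the required privileges.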
    • bpf: consolidate bpf_prog_put{, _rcu} dismantle paths · e9d8afa9
      Daniel Borkmann committed
      We currently have duplicated cleanup code in bpf_prog_put() and
      bpf_prog_put_rcu() cleanup paths. Back then we decided that it was
      not worth it to make it a common helper called by both, but with
      the recent addition of resource charging, we could have avoided
      the fix in commit ac00737f ("bpf: Need to call bpf_prog_uncharge_memlock
      from bpf_prog_put") if we had had only a single, common path.
      We can simplify it further by assigning aux->prog only once during
      allocation time.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: align and clean bpf_{map,prog}_get helpers · c2101297
      Daniel Borkmann committed
      Add a bpf_map_get() function that we're going to use later on and
      align/clean the remaining helpers a bit so that they are a bit
      more consistent:
      
        - __bpf_map_get() and __bpf_prog_get() that both work on the fd
          struct, check whether the descriptor is eBPF and return the
          pointer to the map/prog stored in the private data.
      
          Also, we can return f.file->private_data directly, the function
          signature is enough of a documentation already.
      
        - bpf_map_get() and bpf_prog_get() that both work on u32 user fd,
          call their respective __bpf_map_get()/__bpf_prog_get() variants,
          and take a reference.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
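The split between the two helper layers described above can be modelled in plain C. This is a user-space analogy under stated assumptions: `struct toy_file`, `toy_map_fops`, `toy_map_get_unref()` and `toy_map_get()` are hypothetical names, the descriptor table is a simple array, and a plain integer replaces the kernel's atomic reference count. The lower helper only validates the descriptor type and returns the private data; the upper helper resolves the user fd and takes a reference.

```c
#include <assert.h>
#include <stddef.h>

struct toy_map { int refcnt; };

/* Stand-in for an open file: `ops` identifies what kind of file it is. */
struct toy_file {
    const void *ops;
    void *private_data;
};

/* Its address serves purely as an identity tag for "this is a map file". */
const char toy_map_fops = 0;

/* Lower layer: check the descriptor type, return the map, take no reference. */
struct toy_map *toy_map_get_unref(const struct toy_file *f)
{
    if (f == NULL || f->ops != &toy_map_fops)
        return NULL;
    return f->private_data;
}

/* Upper layer: resolve a user fd, then take a reference on success. */
struct toy_map *toy_map_get(const struct toy_file *files, size_t nfiles,
                            unsigned int ufd)
{
    struct toy_map *map;

    assert(files != NULL);
    if (ufd >= nfiles)
        return NULL;

    map = toy_map_get_unref(&files[ufd]);
    if (map != NULL)
        map->refcnt++;
    return map;
}
```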
    • bpf: abstract anon_inode_getfd invocations · aa79781b
      Daniel Borkmann committed
      Since we're going to use anon_inode_getfd() invocations in more than just
      the current places, make a helper function for both, so that we only need
      to pass a map/prog pointer to the helper itself in order to get a fd. The
      new helpers are called bpf_map_new_fd() and bpf_prog_new_fd().
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix percpu memory leaks · 1d6119ba
      Eric Dumazet committed
      This patch fixes the following problems:

      1) percpu_counter_init() can return an error, therefore
         init_frag_mem_limit() must propagate this error so that
         inet_frags_init_net() can do the same up to its callers.

      2) If ip[46]_frags_ns_ctl_register() fails, we must unwind
         properly and free the percpu_counter.

      Without this fix, we leave a freed object in the percpu_counters
      global list (if CONFIG_HOTPLUG_CPU) leading to crashes.

      This bug was detected by KASAN and the syzkaller tool
      (http://github.com/google/syzkaller)
      
      Fixes: 6d7b857d ("net: use lib/percpu_counter API for fragmentation mem accounting")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
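The two points of the fix above (propagate the init error, and unwind when a later step fails) follow a common C pattern, sketched here in user space. All names (`struct toy_counter`, `toy_counter_init()`, `toy_sysctl_register()`, `toy_frags_init_net()`) are hypothetical, and the `simulate_failure` parameter exists only so the failure path can be exercised in this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-CPU counter. */
struct toy_counter { long *slots; };

struct toy_netns_frags {
    struct toy_counter mem;
    int sysctl_registered;
};

/* Allocation can fail, so the error must be reported to the caller. */
int toy_counter_init(struct toy_counter *c, size_t ncpus)
{
    c->slots = calloc(ncpus, sizeof(*c->slots));
    return c->slots ? 0 : -ENOMEM;
}

void toy_counter_destroy(struct toy_counter *c)
{
    free(c->slots);
    c->slots = NULL;
}

/* Hypothetical registration step; can be made to fail for the demo. */
int toy_sysctl_register(struct toy_netns_frags *nf, int simulate_failure)
{
    if (simulate_failure)
        return -ENOMEM;
    nf->sysctl_registered = 1;
    return 0;
}

int toy_frags_init_net(struct toy_netns_frags *nf, size_t ncpus,
                       int simulate_failure)
{
    int err;

    assert(nf != NULL && ncpus > 0);

    err = toy_counter_init(&nf->mem, ncpus);   /* point 1: propagate */
    if (err)
        return err;

    err = toy_sysctl_register(nf, simulate_failure);
    if (err)
        toy_counter_destroy(&nf->mem);         /* point 2: unwind */

    return err;
}
```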
    • ravb: use pdev rather than ndev for error messages · c4511132
      Simon Horman committed
      This corrects what appear to be typos, making the code consistent with
      itself, and allowing meaningful prefixes to be displayed with the errors in
      question.
      
      Before:
       (null): failed to initialize MDIO
       (null): Cannot allocate desc base address table (size 176 bytes)
      
      After:
      ravb e6800000.ethernet: failed to initialize MDIO
      ravb e6800000.ethernet: Cannot allocate desc base address table (size 176 bytes)
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: make skb_set_owner_w() more robust · 9e17f8a4
      Eric Dumazet committed
      skb_set_owner_w() is called from various places that assume
      skb->sk always points to a full-blown socket (as it changes
      sk->sk_wmem_alloc).

      We'd like to attach skb to request sockets, and in the future
      to timewait sockets as well. For these kinds of pseudo sockets,
      we need to take a traditional refcount and use sock_edemux()
      as the destructor.

      It is now time to un-inline skb_set_owner_w(), as it has become too big.
      
      Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: vlan: Use rcu_dereference instead of rtnl_dereference · eca1e006
      Ido Schimmel committed
      br_should_learn() is protected by RCU and not by RTNL, so use the correct
      flavor of nbp_vlan_group().
      
      Fixes: 907b1e6e ("bridge: vlan: use proper rcu for the vlgrp member")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: lookup switch name · b9b37713
      Vivien Didelot committed
      All the mv88e6xxx drivers use the exact same code in their probe
      function to look up the switch name given its ID. Thus introduce a
      mv88e6xxx_switch_id structure and a mv88e6xxx_lookup_name function in
      the common mv88e6xxx code.
      
      In the meantime make __mv88e6xxx_reg_{read,write} static since we do not
      need to expose these low-level r/w routines anymore.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
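The idea of replacing per-driver lookup code with one shared table walk can be sketched as below. This is an illustration only: `struct toy_switch_id` and `toy_lookup_name()` are hypothetical simplifications, and the real mv88e6xxx_switch_id structure and mv88e6xxx_lookup_name function may take different fields and parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a product ID with a human-readable name. */
struct toy_switch_id {
    uint16_t id;
    const char *name;
};

/* Shared helper: each driver passes its own table instead of open-coding this. */
const char *toy_lookup_name(uint16_t id, const struct toy_switch_id *table,
                            size_t num)
{
    assert(table != NULL || num == 0);

    for (size_t i = 0; i < num; i++) {
        if (table[i].id == id)
            return table[i].name;
    }
    return NULL;   /* unknown ID: the probe function can bail out */
}
```

A driver would then define a small constant table of the IDs it supports and call the helper from its probe function.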
    • net: dsa: mv88e6xxx: assert SMI lock · 3996a4ff
      Vivien Didelot committed
      It's easy to forget to lock the smi_mutex before calling the low-level
      _mv88e6xxx_reg_{read,write}, so add an assert_smi_lock function in them.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: convert hashtab lock to raw lock · ac00881f
      Yang Shi committed
      When running the bpf samples on an RT kernel, the kernel reports the warning below:
      
      BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
      in_atomic(): 1, irqs_disabled(): 128, pid: 477, name: ping
      Preemption disabled at:[<ffff80000017db58>] kprobe_perf_func+0x30/0x228
      
      CPU: 3 PID: 477 Comm: ping Not tainted 4.1.10-rt8 #4
      Hardware name: Freescale Layerscape 2085a RDB Board (DT)
      Call trace:
      [<ffff80000008a5b0>] dump_backtrace+0x0/0x128
      [<ffff80000008a6f8>] show_stack+0x20/0x30
      [<ffff8000007da90c>] dump_stack+0x7c/0xa0
      [<ffff8000000e4830>] ___might_sleep+0x188/0x1a0
      [<ffff8000007e2200>] rt_spin_lock+0x28/0x40
      [<ffff80000018bf9c>] htab_map_update_elem+0x124/0x320
      [<ffff80000018c718>] bpf_map_update_elem+0x40/0x58
      [<ffff800000187658>] __bpf_prog_run+0xd48/0x1640
      [<ffff80000017ca6c>] trace_call_bpf+0x8c/0x100
      [<ffff80000017db58>] kprobe_perf_func+0x30/0x228
      [<ffff80000017dd84>] kprobe_dispatcher+0x34/0x58
      [<ffff8000007e399c>] kprobe_handler+0x114/0x250
      [<ffff8000007e3bf4>] kprobe_breakpoint_handler+0x1c/0x30
      [<ffff800000085b80>] brk_handler+0x88/0x98
      [<ffff8000000822f0>] do_debug_exception+0x50/0xb8
      Exception stack(0xffff808349687460 to 0xffff808349687580)
      7460: 4ca2b600 ffff8083 4a3a7000 ffff8083 49687620 ffff8083 0069c5f8 ffff8000
      7480: 00000001 00000000 007e0628 ffff8000 496874b0 ffff8083 007e1de8 ffff8000
      74a0: 496874d0 ffff8083 0008e04c ffff8000 00000001 00000000 4ca2b600 ffff8083
      74c0: 00ba2e80 ffff8000 49687528 ffff8083 49687510 ffff8083 000e5c70 ffff8000
      74e0: 00c22348 ffff8000 00000000 ffff8083 49687510 ffff8083 000e5c74 ffff8000
      7500: 4ca2b600 ffff8083 49401800 ffff8083 00000001 00000000 00000000 00000000
      7520: 496874d0 ffff8083 00000000 00000000 00000000 00000000 00000000 00000000
      7540: 2f2e2d2c 33323130 00000000 00000000 4c944500 ffff8083 00000000 00000000
      7560: 00000000 00000000 008751e0 ffff8000 00000001 00000000 124e2d1d 00107b77
      
      Convert the hashtab lock to a raw lock to avoid this warning.
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge_vlan_fixes' · 21086b99
      David S. Miller committed
      Nikolay Aleksandrov says:
      
      ====================
      bridge: vlan: failure path and comment fixes
      
      This is a set from Ido which takes care of one failure path error in
      nbp_vlan_init (patch 1) and a few comment errors (patch 2).
      I must admit I didn't expect the port init to continue after a vlan init
      failure, but I should've checked to make sure. Thanks to Ido for catching
      these!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: vlan: Use correct flag name in comment · ddd611d3
      Ido Schimmel committed
      The flag used to indicate if a VLAN should be used for filtering - as
      opposed to context only - on the bridge itself (e.g. br0) is called
      'brentry' and not 'brvlan'.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: vlan: Prevent possible use-after-free · 07bc588f
      Ido Schimmel committed
      When adding a port to a bridge we initialize VLAN filtering on it. We do
      not bail out in case an error occurred in nbp_vlan_init, as it can be
      used as a non VLAN filtering bridge.
      
      However, if VLAN filtering is required and an error occurred in
      nbp_vlan_init, we should set vlgrp to NULL, so that VLAN filtering
      functions (e.g. br_vlan_find, br_get_pvid) will know the struct is
      invalid and will not try to access it.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp/dccp: fix ireq->pktopts race · ce105008
      Eric Dumazet committed
      IPv6 request sockets store a pointer to the skb containing the SYN packet
      to be able to transfer it to the full-blown socket when 3WHS is done
      (ireq->pktopts -> np->pktoptions).
      
      As explained in commit 5e0724d0 ("tcp/dccp: fix hashdance race for
      passive sessions"), we must transfer the skb only if we won the
      hashdance race, if multiple cpus receive the 'ack' packet completing
      3WHS at the same time.
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: convert bind hash table to re-sizable hashtable · 7b565434
      santosh.shilimkar@oracle.com committed
      To further improve RDS connection scalability on massive systems
      where the number of sockets grows into the tens of thousands, there
      is a need for a larger bind hashtable. A pre-allocated 8K or 16K table
      is not very flexible in terms of memory utilisation. The rhashtable
      infrastructure gives us the flexibility to grow the hashtable based
      on use and also comes with built-in efficient bucket (chain) handling.
      Reviewed-by: David Miller <davem@davemloft.net>
      Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rds: changing the return type from int to void · d3ffaefa
      Saurabh Sengar committed
      As the result of function rds_iw_flush_mr_pool is checked nowhere,
      change its return type from int to void.
      Also remove the unused variable rc, as there is nothing to return.
      Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'encx24j600-fixes' · 64cf3708
      David S. Miller committed
      Javier Martinez Canillas says:
      
      ====================
      net: encx24j600: Fix SPI driver module autoload
      
      Recently I've been trying to fix module autoloading for all SPI drivers and
      found that the encx24j600 driver does not fill in module alias information
      because it is missing a MODULE_DEVICE_TABLE(), so module autoload won't
      work, even though the driver's Kconfig symbol is tristate, which means the
      driver can be built as a module.

      The SPI id table is also not correctly defined, so this series fixes both
      issues.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: encx24j600: Export missing SPI module alias information · 07f56c61
      Javier Martinez Canillas committed
      The driver Kconfig symbol is tristate which means that it can be built as
      a module but the module alias information is not added to the module info
      so module autoload won't work since user-space won't have the information.
      Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: encx24j600: Fix SPI id table definition · d0cb48cd
      Javier Martinez Canillas committed
      A driver's SPI id table is expected to be an array of struct spi_device_id
      that ends with a zero-initialized sentinel entry. But this driver defines
      the table as a single struct spi_device_id and sets .id_table to a pointer
      to this struct.
      
      But spi_match_id() has a loop that iterates while the struct spi_device_id
      .name[0] is not NULL, so not having a sentinel can cause a NULL pointer
      dereference error.
      
      This patch defines the SPI id table correctly as all other SPI drivers do.
      Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
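Why the sentinel entry matters can be shown with a user-space model of a sentinel-terminated id table. This is a sketch under stated assumptions: `struct toy_spi_device_id`, `toy_spi_ids` and `toy_match_id()` are hypothetical stand-ins, written to mirror the loop shape the commit message describes (iterate while `.name[0]` is non-zero). Without the final empty entry, the loop would read past the end of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct spi_device_id. */
struct toy_spi_device_id {
    char name[32];
    unsigned long driver_data;
};

/* An id table is an array that ends with a zero-initialized sentinel. */
const struct toy_spi_device_id toy_spi_ids[] = {
    { .name = "encx24j600" },
    { .name = "" }              /* sentinel: name[0] == '\0' stops the loop */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
const struct toy_spi_device_id *
toy_match_id(const struct toy_spi_device_id *id, const char *modalias)
{
    assert(id != NULL && modalias != NULL);

    while (id->name[0]) {
        if (strcmp(modalias, id->name) == 0)
            return id;
        id++;
    }
    return NULL;
}
```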
    • enic: assign affinity hint to interrupts · 322cf7e3
      Govindarajulu Varadarajan committed
      The affinity hint is used by the user space daemon, irqbalancer, to
      indicate a preferred CPU mask for irqs. This patch sets the irq affinity
      hint to local numa cores first; when those are exhausted, we try non-local
      numa cores.

      Also set the tx xps cpus mask based on the affinity hint.
      
      v2: remove the global affinity policy.
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
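The "local NUMA cores first, then non-local ones" policy from the commit message above can be sketched as a selection function. This is an illustration, not the enic driver's code: `toy_local_spread()` and the `cpu_node` array (mapping each CPU index to its NUMA node) are hypothetical constructs for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the CPU for the i-th interrupt: CPUs on `node` are handed out
 * first, and once those are exhausted the remaining CPUs are used.
 * The index wraps around when there are more interrupts than CPUs.
 * Returns -1 only if there are no CPUs at all.
 */
int toy_local_spread(unsigned int i, int node, const int *cpu_node,
                     unsigned int ncpus)
{
    assert(cpu_node != NULL || ncpus == 0);
    if (ncpus == 0)
        return -1;

    i %= ncpus;

    /* First pass: CPUs local to the device's NUMA node. */
    for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu_node[cpu] == node) {
            if (i == 0)
                return (int)cpu;
            i--;
        }
    }

    /* Second pass: non-local CPUs. */
    for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu_node[cpu] != node) {
            if (i == 0)
                return (int)cpu;
            i--;
        }
    }

    return -1;   /* not reached: i < ncpus and every CPU is visited once */
}
```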
    • ipv4: use l4 hash for locally generated multipath flows · 9920e48b
      Paolo Abeni committed
      This patch changes how the multipath hash is computed for locally
      generated flows: now the hash comprises l4 information.
      
      This allows better utilization of the available paths when the existing
      flows have the same source IP and the same destination IP: with l3 hash,
      even when multiple connections are in place simultaneously, a single path
      will be used, while with l4 hash we can use all the available paths.
      
      v2 changes:
      - use get_hash_from_flowi4() instead of implementing just another l4 hash
        function
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
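The difference between an l3 and an l4 multipath hash can be made concrete with a toy example. This is a sketch under stated assumptions: `struct toy_flow`, `toy_mix()`, `toy_hash_l3()`, `toy_hash_l4()` and `toy_select_path()` are hypothetical, and the mixing function is an arbitrary invertible mixer chosen for illustration, not the hash that get_hash_from_flowi4() computes. Two connections between the same pair of addresses always share an l3 hash, whereas including the ports gives them distinct l4 hashes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key. */
struct toy_flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t proto;
};

/* Arbitrary invertible mixing step (xor, odd multiply, xorshift). */
uint32_t toy_mix(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9E3779B1u;
    return h ^ (h >> 15);
}

/* l3 hash: addresses only. */
uint32_t toy_hash_l3(const struct toy_flow *f)
{
    return toy_mix(toy_mix(0, f->saddr), f->daddr);
}

/* l4 hash: addresses plus ports and protocol. */
uint32_t toy_hash_l4(const struct toy_flow *f)
{
    uint32_t h = toy_hash_l3(f);

    h = toy_mix(h, ((uint32_t)f->sport << 16) | f->dport);
    return toy_mix(h, f->proto);
}

/* Map a hash onto one of the available next hops. */
unsigned int toy_select_path(uint32_t hash, unsigned int num_paths)
{
    assert(num_paths > 0);
    return hash % num_paths;
}
```

With the l3 hash, both connections always land on the same path; with the l4 hash their hash values differ, so they can be spread across paths (two distinct hashes may still map to the same path after the modulo).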
  2. 02 Nov, 2015 9 commits