1. 04 12月, 2015 18 次提交
    • J
      i40e/i40evf: avoid mutex re-init · 8ddb3326
      Jesse Brandeburg 提交于
      If the driver were to happen to have a mutex held while
      the i40e_init_adminq call was called, the init_adminq might
      inadvertently call mutex_init on a lock that was held
      which is a violation of the calling semantics.
      
      Fix this by avoiding adminq.c code allocating/freeing this memory, and
      then do the same work only once in probe/remove.
      
      Testing Hints (Required if no HSD): for VF, load i40evf in bare metal
      and echo 32 > sriov_numvfs; echo 0 > sriov_numvfs in a loop.  Yes this
      is a horrible thing to do.
      
      Change-ID: Ida263c51b34e195252179e7e5e400d73a99be7a2
      Reported-by: NStefan Assmann <sassmann@redhat.com>
      Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      8ddb3326
    • J
      MAINTAINERS: Update Intel Wired LAN reviewers · 6e80a18c
      Jeff Kirsher 提交于
      Since Matthew has moved on to other pastures and no longer works
      for Intel, remove him from the list of reviewers and add Bruce
      Allan as his replacement.
      
      CC: Bruce Allan <bruce.w.allan@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      6e80a18c
    • J
      e100.txt: Cleanup license info in kernel doc · a3fb6568
      Jeff Kirsher 提交于
      Apparently the e100.txt document contained a "License" section left
      over from days of old, which does not need to be in the kernel
      documentation.  So clean it up..
      
      CC: John Ronciak <john.ronciak@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Tested-by: NAaron Brown <aaron.f.brown@intel.com>
      a3fb6568
    • A
      ixgbe: Reset interface after enabling SR-IOV · bf4d67d9
      Alexander Duyck 提交于
      Enabling SR-IOV and then bringing the interface up was resulting in the PF
      MAC addresses getting into a bad state.  Specifically the MAC address was
      enabled for both VF 0 and the PF.  This resulted in some odd behaviors such
      as VF 0 receiving a copy of the PFs traffic, which in turn enables the
      ability for VF 0 to spoof the PF.
      
      A workaround for this issue appears to be to bring up the interface first
      and then enable SR-IOV as this way the reset is then triggered in the
      existing code.
      
      In order to correct this I have added a change to ixgbe_setup_tc where if
      the interface is down we still will at least call ixgbe_reset so that the
      MAC addresses for the device are reset to the correct pools.
      
      Steps to reproduce issue:
      modprobe ixgbe
      echo 7 > /sys/bus/pci/devices/0000\:01\:00.1/sriov_numvfs
      ifconfig enp1s0f1 up
      ethregs -s 1:00.1 | grep MPSAR | grep -v 00000000
      
      Result:
      	MPSAR[0]               00000081
      	MPSAR[254]             00000001
      
      Expected Result, behavior after patch:
      	MPSAR[0]               00000080
      	MPSAR[254]             00000080
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Tested-by: NDarin Miller <darin.j.miller@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      bf4d67d9
    • J
      net: phy: reset only targeted phy · cf18b778
      Jérôme Pouiller 提交于
      It is possible to address another chip on same MDIO bus. The case is
      correctly handled for media advertising. It is taken into account
      only if mii_data->phy_id == phydev->addr. However, this condition
      was missing for reset case.
      Signed-off-by: NJérôme Pouiller <jezz@sysmic.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf18b778
    • D
      Merge branch 'bnxt_en-fixes' · c5ba5c8a
      David S. Miller 提交于
      Michael Chan says:
      
      ====================
      bnxt_en: set mac address and uc_list bug fixes.
      
      Fix ndo_set_mac_address() for PF and VF.
      Re-apply uc_list after chip reset.
      
      v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5ba5c8a
    • M
      bnxt_en: Setup uc_list mac filters after resetting the chip. · b664f008
      Michael Chan 提交于
      Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
      mc_list mac address filters.  Before the patch, uc_list is not
      setup again after chip reset (such as ethtool ring size change)
      and macvlans don't work any more after that.
      
      Modify bnxt_cfg_rx_mode() to return error codes appropriately so
      that the init chip sequence can detect any failures.
      Signed-off-by: NMichael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b664f008
    • J
      bnxt_en: enforce proper storing of MAC address · bdd4347b
      Jeffrey Huang 提交于
      For PF, the bp->pf.mac_addr always holds the permanent MAC
      addr assigned by the HW.  For VF, the bp->vf.mac_addr always
      holds the administrator assigned VF MAC addr. The random
      generated VF MAC addr should never get stored to bp->vf.mac_addr.
      This way, when the VF wants to change the MAC address, we can tell
      if the adminstrator has already set it and disallow the VF from
      changing it.
      
      v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
      Signed-off-by: NJeffrey Huang <huangjw@broadcom.com>
      Signed-off-by: NMichael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdd4347b
    • J
      bnxt_en: Fixed incorrect implementation of ndo_set_mac_address · 1fc2cfd0
      Jeffrey Huang 提交于
      The existing ndo_set_mac_address only copies the new MAC addr
      and didn't set the new MAC addr to the HW. The correct way is
      to delete the existing default MAC filter from HW and add
      the new one. Because of RFS filters are also dependent on the
      default mac filter l2 context, the driver must go thru
      close_nic() to delete the default MAC and RFS filters, then
      open_nic() to set the default MAC address to HW.
      Signed-off-by: NJeffrey Huang <huangjw@broadcom.com>
      Signed-off-by: NMichael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1fc2cfd0
    • V
      net: lpc_eth: remove irq > NR_IRQS check from probe() · 39198ec9
      Vladimir Zapolskiy 提交于
      If the driver is used on an ARM platform with SPARSE_IRQ defined,
      semantics of NR_IRQS is different (minimal value of virtual irqs) and
      by default it is set to 16, see arch/arm/include/asm/irq.h.
      
      This value may be less than the actual number of virtual irqs, which
      may break the driver initialization. The check removal allows to use
      the driver on such a platform, and, if irq controller driver works
      correctly, the check is not needed on legacy platforms.
      
      Fixes a runtime problem:
      
          lpc-eth 31060000.ethernet: error getting resources.
          lpc_eth: lpc-eth: not found (-6).
      Signed-off-by: NVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39198ec9
    • E
      net_sched: fix qdisc_tree_decrease_qlen() races · 4eaf3b84
      Eric Dumazet 提交于
      qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
      devices.
      
      One problem is that it updates sch->q.qlen and sch->qstats.drops
      on the mq/mqprio root qdisc, while it should not : Daniele
      reported underflows errors :
      [  681.774821] PAX: sch->q.qlen: 0 n: 1
      [  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
      [  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
      [  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
      [  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
      [  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
      [  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
      [  681.774962] Call Trace:
      [  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
      [  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
      [  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
      [  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
      [  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
      [  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
      [  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
      [  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
      [  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
      [  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
      [  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
      [  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
      [  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
      [  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
      
      mq/mqprio have their own ways to report qlen/drops by folding stats on
      all their queues, with appropriate locking.
      
      A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
      without proper locking : concurrent qdisc updates could corrupt the list
      that qdisc_match_from_root() parses to find a qdisc given its handle.
      
      Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
      qdisc_tree_decrease_qlen() can use to abort its tree traversal,
      as soon as it meets a mq/mqprio qdisc children.
      
      Second problem can be fixed by RCU protection.
      Qdisc are already freed after RCU grace period, so qdisc_list_add() and
      qdisc_list_del() simply have to use appropriate rcu list variants.
      
      A future patch will add a per struct netdev_queue list anchor, so that
      qdisc_tree_decrease_qlen() can have more efficient lookups.
      Reported-by: NDaniele Fucini <dfucini@gmail.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4eaf3b84
    • P
      openvswitch: fix hangup on vxlan/gre/geneve device deletion · 13175303
      Paolo Abeni 提交于
      Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
      to the underlying tunnel device, but never released it when such
      device is deleted.
      Deleting the underlying device via the ip tool cause the kernel to
      hangup in the netdev_wait_allrefs() loop.
      This commit ensure that on device unregistration dp_detach_port_notify()
      is called for all vports that hold the device reference, properly
      releasing it.
      
      Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device")
      Fixes: b2acd1dc ("openvswitch: Use regular GRE net_device instead of vport")
      Fixes: 6b001e68 ("openvswitch: Use Geneve device.")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NFlavio Leitner <fbl@sysclose.org>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13175303
    • A
      ipv4: igmp: Allow removing groups from a removed interface · 4eba7bb1
      Andrew Lunn 提交于
      When a multicast group is joined on a socket, a struct ip_mc_socklist
      is appended to the sockets mc_list containing information about the
      joined group.
      
      If the interface is hot unplugged, this entry becomes stale. Prior to
      commit 52ad353a ("igmp: fix the problem when mc leave group") it
      was possible to remove the stale entry by performing a
      IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
      the interface. However, this fix enforces that the interface must
      still exist. Thus with time, the number of stale entries grows, until
      sysctl_igmp_max_memberships is reached and then it is not possible to
      join and more groups.
      
      The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
      performed without specifying the interface, either by ifindex or ip
      address. However here we do supply one of these. So loosen the
      restriction on device existence to only apply when the interface has
      not been specified. This then restores the ability to clean up the
      stale entries.
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Fixes: 52ad353a "(igmp: fix the problem when mc leave group")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4eba7bb1
    • E
      ipv6: sctp: implement sctp_v6_destroy_sock() · 602dd62d
      Eric Dumazet 提交于
      Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
      
      We need to call inet6_destroy_sock() to properly release
      inet6 specific fields.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      602dd62d
    • D
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 79aecc72
      David S. Miller 提交于
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth 2015-12-01
      
      Here's a Bluetooth fix for the 4.4-rc series that fixes a memory leak of
      the Security Manager L2CAP channel that'll happen for every LE
      connection.
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79aecc72
    • Y
      arm64: bpf: add 'store immediate' instruction · df849ba3
      Yang Shi 提交于
      aarch64 doesn't have native store immediate instruction, such operation
      has to be implemented by the below instruction sequence:
      
      Load immediate to register
      Store register
      Signed-off-by: NYang Shi <yang.shi@linaro.org>
      CC: Zi Shen Lim <zlim.lnx@gmail.com>
      CC: Xi Wang <xi.wang@gmail.com>
      Reviewed-by: NZi Shen Lim <zlim.lnx@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df849ba3
    • E
      ipv6: kill sk_dst_lock · 6bd4f355
      Eric Dumazet 提交于
      While testing the np->opt RCU conversion, I found that UDP/IPv6 was
      using a mixture of xchg() and sk_dst_lock to protect concurrent changes
      to sk->sk_dst_cache, leading to possible corruptions and crashes.
      
      ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
      way to fix the mess is to remove sk_dst_lock completely, as we did for
      IPv4.
      
      __ip6_dst_store() and ip6_dst_store() share same implementation.
      
      sk_setup_caps() being called with socket lock being held or not,
      we have to use sk_dst_set() instead of __sk_dst_set()
      
      Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
      in ip6_dst_store() before the sk_setup_caps(sk, dst) call.
      
      This is because ip6_dst_store() can be called from process context,
      without any lock held.
      
      As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
      from another cpu doing a concurrent ip6_dst_store()
      
      Doing the dst dereference before doing the install is needed to make
      sure no use after free would trigger.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bd4f355
    • E
      ipv6: sctp: add rcu protection around np->opt · c836a8ba
      Eric Dumazet 提交于
      This patch completes the work I did in commit 45f6fad8
      ("ipv6: add complete rcu protection around np->opt"), as I missed
      sctp part.
      
      This simply makes sure np->opt is used with proper RCU locking
      and accessors.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c836a8ba
  2. 03 12月, 2015 19 次提交
  3. 02 12月, 2015 3 次提交
    • D
      bpf, array: fix heap out-of-bounds access when updating elements · fbca9d2d
      Daniel Borkmann 提交于
      During own review but also reported by Dmitry's syzkaller [1] it has been
      noticed that we trigger a heap out-of-bounds access on eBPF array maps
      when updating elements. This happens with each map whose map->value_size
      (specified during map creation time) is not multiple of 8 bytes.
      
      In array_map_alloc(), elem_size is round_up(attr->value_size, 8) and
      used to align array map slots for faster access. However, in function
      array_map_update_elem(), we update the element as ...
      
      memcpy(array->value + array->elem_size * index, value, array->elem_size);
      
      ... where we access 'value' out-of-bounds, since it was allocated from
      map_update_elem() from syscall side as kmalloc(map->value_size, GFP_USER)
      and later on copied through copy_from_user(value, uvalue, map->value_size).
      Thus, up to 7 bytes, we can access out-of-bounds.
      
      Same could happen from within an eBPF program, where in worst case we
      access beyond an eBPF program's designated stack.
      
      Since 1be7f75d ("bpf: enable non-root eBPF programs") didn't hit an
      official release yet, it only affects priviledged users.
      
      In case of array_map_lookup_elem(), the verifier prevents eBPF programs
      from accessing beyond map->value_size through check_map_access(). Also
      from syscall side map_lookup_elem() only copies map->value_size back to
      user, so nothing could leak.
      
        [1] http://github.com/google/syzkaller
      
      Fixes: 28fbcfa0 ("bpf: add array type of eBPF maps")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fbca9d2d
    • E
      net: fix sock_wake_async() rcu protection · ceb5d58b
      Eric Dumazet 提交于
      Dmitry provided a syzkaller (http://github.com/google/syzkaller)
      triggering a fault in sock_wake_async() when async IO is requested.
      
      Said program stressed af_unix sockets, but the issue is generic
      and should be addressed in core networking stack.
      
      The problem is that by the time sock_wake_async() is called,
      we should not access the @flags field of 'struct socket',
      as the inode containing this socket might be freed without
      further notice, and without RCU grace period.
      
      We already maintain an RCU protected structure, "struct socket_wq"
      so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
      is the safe route.
      
      It also reduces number of cache lines needing dirtying, so might
      provide a performance improvement anyway.
      
      In followup patches, we might move remaining flags (SOCK_NOSPACE,
      SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
      being mostly read and let it being shared between cpus.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ceb5d58b
    • E
      net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072
      Eric Dumazet 提交于
      This patch is a cleanup to make following patch easier to
      review.
      
      Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
      from (struct socket)->flags to a (struct socket_wq)->flags
      to benefit from RCU protection in sock_wake_async()
      
      To ease backports, we rename both constants.
      
      Two new helpers, sk_set_bit(int nr, struct sock *sk)
      and sk_clear_bit(int net, struct sock *sk) are added so that
      following patch can change their implementation.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9cd3e072