1. 10 9月, 2015 26 次提交
    • D
      netlink, mmap: fix edge-case leakages in nf queue zero-copy · 6bb0fef4
      Daniel Borkmann 提交于
      When netlink mmap on receive side is the consumer of nf queue data,
      it can happen that in some edge cases, we write skb shared info into
      the user space mmap buffer:
      
      Assume a possible rx ring frame size of only 4096, and the network skb,
      which is being zero-copied into the netlink skb, contains page frags
      with an overall skb->len larger than the linear part of the netlink
      skb.
      
      skb_zerocopy(), which is generic and thus not aware of the fact that
      shared info cannot be accessed for such skbs then tries to write and
      fill frags, thus leaking kernel data/pointers and in some corner cases
      possibly writing out of bounds of the mmap area (when filling the
      last slot in the ring buffer this way).
      
      I.e. the ring buffer slot is then of status NL_MMAP_STATUS_VALID, has
      an advertised length larger than 4096, where the linear part is visible
      at the slot beginning, and the leaked sizeof(struct skb_shared_info)
      has been written to the beginning of the next slot (also corrupting
      the struct nl_mmap_hdr slot header incl. status etc), since skb->end
      points to skb->data + ring->frame_size - NL_MMAP_HDRLEN.
      
      The fix adds and lets __netlink_alloc_skb() take the actual needed
      linear room for the network skb + meta data into account. It's completely
      irrelevant for non-mmaped netlink sockets, but in case mmap sockets
      are used, it can be decided whether the available skb_tailroom() is
      really large enough for the buffer, or whether it needs to internally
      fallback to a normal alloc_skb().
      
      >From nf queue side, the information whether the destination port is
      an mmap RX ring is not really available without extra port-to-socket
      lookup, thus it can only be determined in lower layers i.e. when
      __netlink_alloc_skb() is called that checks internally for this. I
      chose to add the extra ldiff parameter as mmap will then still work:
      We have data_len and hlen in nfqnl_build_packet_message(), data_len
      is the full length (capped at queue->copy_range) for skb_zerocopy()
      and hlen some possible part of data_len that needs to be copied; the
      rem_len variable indicates the needed remaining linear mmap space.
      
      The only other workaround in nf queue internally would be after
      allocation time by f.e. cap'ing the data_len to the skb_tailroom()
      iff we deal with an mmap skb, but that would 1) expose the fact that
      we use a mmap skb to upper layers, and 2) trim the skb where we
      otherwise could just have moved the full skb into the normal receive
      queue.
      
      After the patch, in my test case the ring slot doesn't fit and therefore
      shows NL_MMAP_STATUS_COPY, where a full skb carries all the data and
      thus needs to be picked up via recv().
      
      Fixes: 3ab1f683 ("nfnetlink: add support for memory mapped netlink")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bb0fef4
    • D
      netlink, mmap: don't walk rx ring on poll if receive queue non-empty · a66e3656
      Daniel Borkmann 提交于
      In case of netlink mmap, there can be situations where received frames
      have to be placed into the normal receive queue. The ring buffer indicates
      this through NL_MMAP_STATUS_COPY, so the user is asked to pick them up
      via recvmsg(2) syscall, and to put the slot back to NL_MMAP_STATUS_UNUSED.
      
      Commit 0ef70770 ("netlink: rx mmap: fix POLLIN condition") changed
      polling, so that we walk in the worst case the whole ring through the
      new netlink_has_valid_frame(), for example, when the ring would have no
      NL_MMAP_STATUS_VALID, but at least one NL_MMAP_STATUS_COPY frame.
      
      Since we do a datagram_poll() already earlier to pick up a mask that could
      possibly contain POLLIN | POLLRDNORM already (due to NL_MMAP_STATUS_COPY),
      we can skip checking the rx ring entirely.
      
      In case the kernel is compiled with !CONFIG_NETLINK_MMAP, then all this is
      irrelevant anyway as netlink_poll() is just defined as datagram_poll().
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a66e3656
    • H
      cxgb4: changes for new firmware 1.14.4.0 · f2be053c
      Hariprasad Shenai 提交于
      Incorporate fw_ldst_cmd structure change for new firmware and also
      update version string for the same
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2be053c
    • N
      net: fec: add netif status check before set mac address · 9638d19e
      Nimrod Andy 提交于
      There exist one issue by below case that case system hang:
      ifconfig eth0 down
      ifconfig eth0 hw ether 00:10:19:19:81:19
      
      After eth0 down, all fec clocks are gated off. In the .fec_set_mac_address()
      function, it will set new MAC address to registers, which causes system hang.
      
      So it needs to add netif status check to avoid registers access when clocks are
      gated off. Until eth0 up the new MAC address are wrote into related registers.
      
      V2:
      As Lucas Stach's suggestion, add a comment in the code to explain why it needed.
      
      CC: Lucas Stach <l.stach@pengutronix.de>
      CC: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NFugang Duan <B38611@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9638d19e
    • D
      Merge branch 'r8152-autoresume' · 1d68c286
      David S. Miller 提交于
      Hayes Wang says:
      
      ====================
      r8152: fix the autoresume may fail
      
      Fix the autosuspend issues which occur about linking change.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d68c286
    • H
      r8152: fix the runtime suspend issues · 2dd49e0f
      hayeswang 提交于
      Fix the runtime suspend issues result from the linking change.
      
      Case 1:
      a) link down occurs.
      b) driver disable tx/rx.
      c) autosuspend occurs.
      d) hw linking up.
      e) device suspends without enabling tx/rx.
      f) couldn't wake up when receiving packets.
      
      Case 2:
      a) Nway results in linking down.
      b) autosuspend occurs.
      c) device suspends.
      d) device may not wake up when linking up.
      Signed-off-by: NHayes Wang <hayeswang@realtek.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2dd49e0f
    • H
      r8152: split DRIVER_VERSION · d0942473
      hayeswang 提交于
      Split DRIVER_VERSION into NETNEXT_VERSION and NET_VERSION. Then,
      according to the value of DRIVER_VERSION, we could know which
      patches are used generally without comparing the source code.
      Signed-off-by: NHayes Wang <hayeswang@realtek.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0942473
    • W
      ipv6: fix ifnullfree.cocci warnings · 52fe51f8
      Wu Fengguang 提交于
      net/ipv6/route.c:2946:3-8: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.
      
       NULL check before some freeing functions is not needed.
      
       Based on checkpatch warning
       "kfree(NULL) is safe this check is probably not required"
       and kfreeaddr.cocci by Julia Lawall.
      
      Generated by: scripts/coccinelle/free/ifnullfree.cocci
      
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52fe51f8
    • W
      add microchip LAN88xx phy driver · 792aec47
      Woojung.Huh@microchip.com 提交于
      Add Microchip LAN88XX phy driver for phylib.
      Signed-off-by: NWoojung Huh <woojung.huh@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      792aec47
    • A
      stmmac: fix check for phydev being open · dfc50fca
      Alexey Brodkin 提交于
      Current check of phydev with IS_ERR(phydev) may make not much sense
      because of_phy_connect() returns NULL on failure instead of error value.
      
      Still for checking result of phy_connect() IS_ERR() makes perfect sense.
      
      So let's use combined check IS_ERR_OR_NULL() that covers both cases.
      
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfc50fca
    • R
      net: qlcnic: delete redundant memsets · 1f0ca208
      Rasmus Villemoes 提交于
      In all cases, mbx->req.arg and mbx->rsp.arg have just been allocated
      using kcalloc(), so these six memsets are redundant.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f0ca208
    • R
      net: mv643xx_eth: use kzalloc · b66a6085
      Rasmus Villemoes 提交于
      The double memset is a little ugly; using kzalloc avoids it altogether.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b66a6085
    • R
      net: jme: use kzalloc() instead of kmalloc+memset · e9b5ac27
      Rasmus Villemoes 提交于
      Using kzalloc saves a tiny bit on .text.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9b5ac27
    • R
      net: cavium: liquidio: use kzalloc in setup_glist() · ce8e5c70
      Rasmus Villemoes 提交于
      We save a little .text and get rid of the sizeof(...) style
      inconsistency.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce8e5c70
    • P
      net: ipv6: use common fib_default_rule_pref · f53de1e9
      Phil Sutter 提交于
      This switches IPv6 policy routing to use the shared
      fib_default_rule_pref() function of IPv4 and DECnet. It is also used in
      multicast routing for IPv4 as well as IPv6.
      
      The motivation for this patch is a complaint about iproute2 behaving
      inconsistent between IPv4 and IPv6 when adding policy rules: Formerly,
      IPv6 rules were assigned a fixed priority of 0x3FFF whereas for IPv4 the
      assigned priority value was decreased with each rule added.
      
      Since then all users of the default_pref field have been converted to
      assign the generic function fib_default_rule_pref(), fib_nl_newrule()
      may just use it directly instead. Therefore get rid of the function
      pointer altogether and make fib_default_rule_pref() static, as it's not
      used outside fib_rules.c anymore.
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f53de1e9
    • T
      net: ethoc: Remove unnecessary #ifdef CONFIG_OF · 444c5f92
      Tobias Klauser 提交于
      For !CONFIG_OF of_get_property() is defined to always return NULL. Thus
      there's no need to protect the call to of_get_property() with #ifdef
      CONFIG_OF.
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      444c5f92
    • F
      net: dsa: bcm_sf2: Fix 64-bits register writes · 03679a14
      Florian Fainelli 提交于
      The macro to write 64-bits quantities to the 32-bits register swapped
      the value and offsets arguments, we want to preserve the ordering of the
      arguments with respect to how writel() is implemented for instance:
      value first, offset/base second.
      
      Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03679a14
    • A
      bpf: fix out of bounds access in verifier log · 687f0715
      Alexei Starovoitov 提交于
      when the verifier log is enabled the print_bpf_insn() is doing
      bpf_alu_string[BPF_OP(insn->code) >> 4]
      and
      bpf_jmp_string[BPF_OP(insn->code) >> 4]
      where BPF_OP is a 4-bit instruction opcode.
      Malformed insns can cause out of bounds access.
      Fix it by sizing arrays appropriately.
      
      The bug was found by clang address sanitizer with libfuzzer.
      Reported-by: NYonghong Song <yhs@plumgrid.com>
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      687f0715
    • R
      ipv6: fix multipath route replace error recovery · 6b9ea5a6
      Roopa Prabhu 提交于
      Problem:
      The ecmp route replace support for ipv6 in the kernel, deletes the
      existing ecmp route too early, ie when it installs the first nexthop.
      If there is an error in installing the subsequent nexthops, its too late
      to recover the already deleted existing route leaving the fib
      in an inconsistent state.
      
      This patch reduces the possibility of this by doing the following:
      a) Changes the existing multipath route add code to a two stage process:
        build rt6_infos + insert them
      	ip6_route_add rt6_info creation code is moved into
      	ip6_route_info_create.
      b) This ensures that most errors are caught during building rt6_infos
        and we fail early
      c) Separates multipath add and del code. Because add needs the special
        two stage mode in a) and delete essentially does not care.
      d) In any event if the code fails during inserting a route again, a
        warning is printed (This should be unlikely)
      
      Before the patch:
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      /* Try replacing the route with a duplicate nexthop */
      $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
      fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
      swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
      RTNETLINK answers: File exists
      
      $ip -6 route show
      /* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
       * kernel */
      
      After the patch:
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      /* Try replacing the route with a duplicate nexthop */
      $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
      fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
      swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
      RTNETLINK answers: File exists
      
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b9ea5a6
    • D
      ebpf: fix fd refcount leaks related to maps in bpf syscall · 592867bf
      Daniel Borkmann 提交于
      We may already have gotten a proper fd struct through fdget(), so
      whenever we return at the end of an map operation, we need to call
      fdput(). However, each map operation from syscall side first probes
      CHECK_ATTR() to verify that unused fields in the bpf_attr union are
      zero.
      
      In case of malformed input, we return with error, but the lookup to
      the map_fd was already performed at that time, so that we return
      without an corresponding fdput(). Fix it by performing an fdget()
      only right before bpf_map_get(). The fdget() invocation on maps in
      the verifier is not affected.
      
      Fixes: db20fd2b ("bpf: add lookup/update/delete/iterate methods to BPF maps")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      592867bf
    • S
      RDS: verify the underlying transport exists before creating a connection · 74e98eb0
      Sasha Levin 提交于
      There was no verification that an underlying transport exists when creating
      a connection, this would cause dereferencing a NULL ptr.
      
      It might happen on sockets that weren't properly bound before attempting to
      send a message, which will cause a NULL ptr deref:
      
      [135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
      [135546.051270] Modules linked in:
      [135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
      [135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
      [135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
      [135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
      [135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
      [135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
      [135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
      [135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
      [135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
      [135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
      [135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
      [135546.064723] Stack:
      [135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
      [135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
      [135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
      [135546.068629] Call Trace:
      [135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
      [135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
      [135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
      [135546.071981] rds_sendmsg (net/rds/send.c:1058)
      [135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
      [135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
      [135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
      [135546.076349] ? __might_fault (mm/memory.c:3795)
      [135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
      [135546.078856] SYSC_sendto (net/socket.c:1657)
      [135546.079596] ? SYSC_connect (net/socket.c:1628)
      [135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
      [135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
      [135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
      [135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
      [135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74e98eb0
    • D
      xen-netback: require fewer guest Rx slots when not using GSO · 1d5d4852
      David Vrabel 提交于
      Commit f48da8b1 (xen-netback: fix
      unlimited guest Rx internal queue and carrier flapping) introduced a
      regression.
      
      The PV frontend in IPXE only places 4 requests on the guest Rx ring.
      Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
      not receive any packets.
      
      a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
         for the largest possible packet.  Calculate the required slots
         based on the maximum GSO size or the MTU.
      
         This calculation of the number of required slots relies on
         1650d545 (xen-netback: always fully coalesce guest Rx packets)
         which present in 4.0-rc1 and later.
      
      b) Reduce the Rx stall detection to checking for at least one
         available Rx request.  This is fine since we're predominately
         concerned with detecting interfaces which are down and thus have
         zero available Rx requests.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: NWei Liu <wei.liu2@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d5d4852
    • D
      Merge branch 'cxgb4-fixes' · 9b57ab8b
      David S. Miller 提交于
      Hariprasad Shenai says:
      
      ====================
      cxgb4: Fix tx flit calculation and wc stat configuration
      
      This patch series fixes the following:
      Patch 1/2 fixes tx flit calculation, which if wrong can lead to
      stall, hang, data corrpution, write combining failure. Patch 2/2 fixes
      PCI-E write combining stats configuration.
      
      This patch series has been created against net tree and includes
      patches on cxgb4 driver.
      
      We have included all the maintainers of respective drivers. Kindly review
      the change and let us know in case of any review comments.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b57ab8b
    • H
      cxgb4: Fix for write-combining stats configuration · 2a485cf7
      Hariprasad Shenai 提交于
      The write-combining configuration register SGE_STAT_CFG_A needs to
      be configured after FW initializes the adapter, else FW will reset
      the configuration
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a485cf7
    • H
      cxgb4: Fix tx flit calculation · fd1754fb
      Hariprasad Shenai 提交于
      In commit 0aac3f56 ("cxgb4: Add comment for calculate tx flits
      and sge length code") introduced a regression where tx flit calculation
      is going wrong, which can lead to data corruption, hang, stall and
      write-combining failure. Fixing it.
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd1754fb
    • A
      net: eth: altera: Fix the initial device operstate · d43cefcd
      Atsushi Nemoto 提交于
      Call netif_carrier_off() prior to register_netdev(), otherwise
      userspace can see incorrect link state.
      Signed-off-by: NAtsushi Nemoto <nemoto@toshiba-tops.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d43cefcd
  2. 09 9月, 2015 7 次提交
    • K
      net: tipc: fix stall during bclink wakeup procedure · 7845989c
      Kolmakov Dmitriy 提交于
      If an attempt to wake up users of broadcast link is made when there is
      no enough place in send queue than it may hang up inside the
      tipc_sk_rcv() function since the loop breaks only after the wake up
      queue becomes empty. This can lead to complete CPU stall with the
      following message generated by RCU:
      
      INFO: rcu_sched self-detected stall on CPU { 0}  (t=2101 jiffies
      					g=54225 c=54224 q=11465)
      Task dump for CPU 0:
      tpch            R  running task        0 39949  39948 0x0000000a
       ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000
       ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800
       0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680
      Call Trace:
       <IRQ>  [<ffffffff8106a4be>] sched_show_task+0xae/0x120
       [<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40
       [<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0
       [<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0
       [<ffffffff8106e53f>] ? account_system_time+0x7f/0x170
       [<ffffffff81099e64>] update_process_times+0x34/0x60
       [<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40
       [<ffffffff810a851c>] tick_sched_timer+0x3c/0x70
       [<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0
       [<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0
       [<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60
       [<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60
       [<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60
       [<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70
       [<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10
       [<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60
       [<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc]
       [<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc]
       [<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc]
       [<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30
       [<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
       [<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
       [<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc]
       [<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40
       [<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc]
       [<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc]
       [<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950
       [<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60
       [<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90
       [<ffffffff815aa138>] napi_gro_receive+0x78/0xa0
       [<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3]
       [<ffffffff81597c8c>] ? consume_skb+0x2c/0x40
       [<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3]
       [<ffffffff815ab0f2>] net_rx_action+0xe2/0x290
       [<ffffffff8104b92a>] __do_softirq+0xda/0x1f0
       [<ffffffff8104bc26>] irq_exit+0x76/0xa0
       [<ffffffff81004355>] do_IRQ+0x55/0xf0
       [<ffffffff8165a12b>] common_interrupt+0x6b/0x6b
       <EOI>
      
      The issue occurs only when tipc_sk_rcv() is used to wake up postponed
      senders:
      
      	tipc_bclink_wakeup_users()
      		// wakeupq - is a queue which consists of special
      		// 		 messages with SOCK_WAKEUP type.
      		tipc_sk_rcv(wakeupq)
      			...
      			while (skb_queue_len(inputq)) {
      				filter_rcv(skb)
      					// Here the type of message is checked
      					// and if it is SOCK_WAKEUP then
      					// it tries to wake up a sender.
      					tipc_write_space(sk)
      						wake_up_interruptible_sync_poll()
      			}
      
      After the sender thread is woke up it can gather control and perform
      an attempt to send a message. But if there is no enough place in send
      queue it will call link_schedule_user() function which puts a message
      of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep.
      Thus the size of the queue actually is not changed and the while()
      loop never exits.
      
      The approach I proposed is to wake up only senders for which there is
      enough place in send queue so the described issue can't occur.
      Moreover the same approach is already used to wake up senders on
      unicast links.
      
      I have got into the issue on our product code but to reproduce the
      issue I changed a benchmark test application (from
      tipcutils/demos/benchmark) to perform the following scenario:
      	1. Run 64 instances of test application (nodes). It can be done
      	   on the one physical machine.
      	2. Each application connects to all other using TIPC sockets in
      	   RDM mode.
      	3. When setup is done all nodes start simultaneously send
      	   broadcast messages.
      	4. Everything hangs up.
      
      The issue is reproducible only when a congestion on broadcast link
      occurs. For example, when there are only 8 nodes it works fine since
      congestion doesn't occur. Send queue limit is 40 in my case (I use a
      critical importance level) and when 64 nodes send a message at the
      same moment a congestion occurs every time.
      Signed-off-by: NDmitry S Kolmakov <kolmakov.dmitriy@huawei.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7845989c
    • B
      dm9000: fix a typo · 7b901873
      Barry Song 提交于
      Signed-off-by: NBarry Song <Baohua.Song@csr.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b901873
    • V
      net: bridge: remove unnecessary switchdev include · 7a577f01
      Vivien Didelot 提交于
      Remove the unnecessary switchdev.h include from br_netlink.c.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a577f01
    • V
      net: bridge: check __vlan_vid_del for error · bf361ad3
      Vivien Didelot 提交于
      Since __vlan_del can return an error code, change its inner function
      __vlan_vid_del to return an eventual error from switchdev_port_obj_del.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf361ad3
    • F
      net: dsa: bcm_sf2: Fix ageing conditions and operation · 39797a27
      Florian Fainelli 提交于
      The comparison check between cur_hw_state and hw_state is currently
      invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while
      hw_state is not, so we end-up comparing bits 2:0 with bits 7:5, which is
      going to cause an additional aging to occur. Fix this by not shifting
      cur_hw_state while reading it, but instead, mask the value with the
      appropriately shitfted bitmask.
      
      The other problem with the fast-ageing process is that we did not set
      the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically
      learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL
      register to avoid leaving spurious bits sets from one operation to the
      other.
      
      Fixes: 12f460f2 ("net: dsa: bcm_sf2: add HW bridging support")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39797a27
    • J
      device property: Don't overwrite addr when failing in device_get_mac_address · 5b902d6f
      Julien Grall 提交于
      The function device_get_mac_address is trying different property names
      in order to get the mac address. To check the return value, the variable
      addr (which contain the buffer pass by the caller) will be re-used. This
      means that if the previous property is not found, the next property will
      be read using a NULL buffer.
      
      Therefore it's only possible to retrieve the mac if node contains a
      property "mac-address". Fix it by using a temporary buffer for the
      return value.
      
      This has been introduced by commit 4c96b7dc
      "Add a matching set of device_ functions for determining mac/phy"
      Signed-off-by: NJulien Grall <julien.grall@citrix.com>
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Reviewed-by: NJeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b902d6f
    • E
      usbnet: Fix a race between usbnet_stop() and the BH · fcb0bb6a
      Eugene Shatokhin 提交于
      The race may happen when a device (e.g. YOTA 4G LTE Modem) is
      unplugged while the system is downloading a large file from the Net.
      
      Hardware breakpoints and Kprobes with delays were used to confirm that
      the race does actually happen.
      
      The race is on skb_queue ('next' pointer) between usbnet_stop()
      and rx_complete(), which, in turn, calls usbnet_bh().
      
      Here is a part of the call stack with the code where the changes to the
      queue happen. The line numbers are for the kernel 4.1.0:
      
      *0 __skb_unlink (skbuff.h:1517)
          prev->next = next;
      *1 defer_bh (usbnet.c:430)
          spin_lock_irqsave(&list->lock, flags);
          old_state = entry->state;
          entry->state = state;
          __skb_unlink(skb, list);
          spin_unlock(&list->lock);
          spin_lock(&dev->done.lock);
          __skb_queue_tail(&dev->done, skb);
          if (dev->done.qlen == 1)
              tasklet_schedule(&dev->bh);
          spin_unlock_irqrestore(&dev->done.lock, flags);
      *2 rx_complete (usbnet.c:640)
          state = defer_bh(dev, skb, &dev->rxq, state);
      
      At the same time, the following code repeatedly checks if the queue is
      empty and reads these values concurrently with the above changes:
      
      *0  usbnet_terminate_urbs (usbnet.c:765)
          /* maybe wait for deletions to finish. */
          while (!skb_queue_empty(&dev->rxq)
              && !skb_queue_empty(&dev->txq)
              && !skb_queue_empty(&dev->done)) {
                  schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
                  set_current_state(TASK_UNINTERRUPTIBLE);
                  netif_dbg(dev, ifdown, dev->net,
                        "waited for %d urb completions\n", temp);
          }
      *1  usbnet_stop (usbnet.c:806)
          if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
              usbnet_terminate_urbs(dev);
      
      As a result, it is possible, for example, that the skb is removed from
      dev->rxq by __skb_unlink() before the check
      "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
      also possible in this case that the skb is added to dev->done queue
      after "!skb_queue_empty(&dev->done)" is checked. So
      usbnet_terminate_urbs() may stop waiting and return while dev->done
      queue still has an item.
      
      Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
      this race.
      Signed-off-by: NEugene Shatokhin <eugene.shatokhin@rosalab.ru>
      Reviewed-by: NBjørn Mork <bjorn@mork.no>
      Acked-by: NOliver Neukum <oneukum@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcb0bb6a
  3. 07 9月, 2015 7 次提交