1. 10 Sep, 2015 2 commits
  2. 09 Sep, 2015 7 commits
    • net: tipc: fix stall during bclink wakeup procedure · 7845989c
      Kolmakov Dmitriy committed
      If an attempt to wake up users of the broadcast link is made when there
      is not enough space in the send queue, it may hang inside the
      tipc_sk_rcv() function, since the loop breaks only after the wakeup
      queue becomes empty. This can lead to a complete CPU stall with the
      following message generated by RCU:
      
      INFO: rcu_sched self-detected stall on CPU { 0}  (t=2101 jiffies
      					g=54225 c=54224 q=11465)
      Task dump for CPU 0:
      tpch            R  running task        0 39949  39948 0x0000000a
       ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000
       ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800
       0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680
      Call Trace:
       <IRQ>  [<ffffffff8106a4be>] sched_show_task+0xae/0x120
       [<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40
       [<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0
       [<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0
       [<ffffffff8106e53f>] ? account_system_time+0x7f/0x170
       [<ffffffff81099e64>] update_process_times+0x34/0x60
       [<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40
       [<ffffffff810a851c>] tick_sched_timer+0x3c/0x70
       [<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0
       [<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0
       [<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60
       [<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60
       [<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60
       [<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70
       [<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10
       [<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60
       [<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc]
       [<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc]
       [<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc]
       [<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30
       [<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
       [<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
       [<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc]
       [<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40
       [<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc]
       [<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc]
       [<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950
       [<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60
       [<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90
       [<ffffffff815aa138>] napi_gro_receive+0x78/0xa0
       [<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3]
       [<ffffffff81597c8c>] ? consume_skb+0x2c/0x40
       [<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3]
       [<ffffffff815ab0f2>] net_rx_action+0xe2/0x290
       [<ffffffff8104b92a>] __do_softirq+0xda/0x1f0
       [<ffffffff8104bc26>] irq_exit+0x76/0xa0
       [<ffffffff81004355>] do_IRQ+0x55/0xf0
       [<ffffffff8165a12b>] common_interrupt+0x6b/0x6b
       <EOI>
      
      The issue occurs only when tipc_sk_rcv() is used to wake up postponed
      senders:
      
      	tipc_bclink_wakeup_users()
      		// wakeupq - is a queue which consists of special
      		// 		 messages with SOCK_WAKEUP type.
      		tipc_sk_rcv(wakeupq)
      			...
      			while (skb_queue_len(inputq)) {
      				filter_rcv(skb)
      					// Here the type of message is checked
      					// and if it is SOCK_WAKEUP then
      					// it tries to wake up a sender.
      					tipc_write_space(sk)
      						wake_up_interruptible_sync_poll()
      			}
      
      After the sender thread is woken up it can gain control and attempt
      to send a message. But if there is not enough space in the send
      queue, it will call the link_schedule_user() function, which puts a
      message of type SOCK_WAKEUP on the wakeup queue and puts the sender to
      sleep. Thus the size of the queue does not actually change and the
      while() loop never exits.
      
      The approach I proposed is to wake up only senders for which there is
      enough place in send queue so the described issue can't occur.
      Moreover the same approach is already used to wake up senders on
      unicast links.
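
The idea of waking only the senders that fit can be sketched in user space as follows. This is a minimal model, not the kernel patch: `struct waiter`, `wake_fitting_senders()` and the integer queue accounting are hypothetical names introduced for illustration. The point it shows is that the walk over the waiters is a single bounded pass that stops at the first sender whose message would not fit, so a woken sender never has to re-queue itself immediately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one postponed sender on the wakeup queue. */
struct waiter {
    int chain_sz;   /* number of buffers the sender wants to enqueue */
    int woken;      /* set to 1 once the sender has been woken */
};

/*
 * Wake waiters in FIFO order, stopping at the first one whose pending
 * message would not fit in the send queue. Returns the number woken.
 * queue_len is the current send queue length, queue_limit its capacity.
 */
size_t wake_fitting_senders(struct waiter *w, size_t n,
                            int queue_len, int queue_limit)
{
    size_t woken = 0;
    int pending = 0;

    assert(queue_len >= 0 && queue_limit >= 0);
    for (size_t i = 0; i < n; i++) {
        if (w[i].woken)
            continue;
        pending += w[i].chain_sz;
        if (queue_len + pending > queue_limit)
            break;              /* no room: leave this sender asleep */
        w[i].woken = 1;
        woken++;
    }
    return woken;
}
```

With a limit of 40, a current length of 15 and waiters needing 10, 20 and 30 buffers, only the first waiter is woken; with a full queue none are, and the function still returns after one pass.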
      
      I have got into the issue on our product code but to reproduce the
      issue I changed a benchmark test application (from
      tipcutils/demos/benchmark) to perform the following scenario:
      	1. Run 64 instances of test application (nodes). It can be done
      	   on the one physical machine.
      	2. Each application connects to all other using TIPC sockets in
      	   RDM mode.
      	3. When setup is done all nodes start simultaneously send
      	   broadcast messages.
      	4. Everything hangs up.
      
      The issue is reproducible only when a congestion on broadcast link
      occurs. For example, when there are only 8 nodes it works fine since
      congestion doesn't occur. Send queue limit is 40 in my case (I use a
      critical importance level) and when 64 nodes send a message at the
      same moment a congestion occurs every time.
      Signed-off-by: Dmitry S Kolmakov <kolmakov.dmitriy@huawei.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dm9000: fix a typo · 7b901873
      Barry Song committed
      Signed-off-by: Barry Song <Baohua.Song@csr.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: remove unnecessary switchdev include · 7a577f01
      Vivien Didelot committed
      Remove the unnecessary switchdev.h include from br_netlink.c.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: check __vlan_vid_del for error · bf361ad3
      Vivien Didelot committed
      Since __vlan_del can return an error code, change its inner function
      __vlan_vid_del to return an eventual error from switchdev_port_obj_del.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Fix ageing conditions and operation · 39797a27
      Florian Fainelli committed
      The comparison check between cur_hw_state and hw_state is currently
      invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while
      hw_state is not, so we end-up comparing bits 2:0 with bits 7:5, which is
      going to cause an additional aging to occur. Fix this by not shifting
      cur_hw_state while reading it, but instead, mask the value with the
      appropriately shifted bitmask.
      
      The other problem with the fast-ageing process is that we did not set
      the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically
      learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL
      register to avoid leaving spurious bits set from one operation to the
      other.
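
The shifted-versus-unshifted comparison described above can be illustrated with a small sketch. The field position (bits 7:5) comes from the commit message; the macro names `STATE_SHIFT` and `STATE_MASK` and both helper functions are hypothetical stand-ins, not the driver's actual identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the state field occupies bits 7:5 of the register. */
#define STATE_SHIFT 5u
#define STATE_MASK  0x7u

/* Buggy: one side is shifted down to bits 2:0, the other stays in 7:5. */
int state_changed_buggy(uint32_t reg, uint32_t new_state_in_place)
{
    uint32_t cur = (reg >> STATE_SHIFT) & STATE_MASK;

    return cur != new_state_in_place;
}

/* Fixed: mask with the shifted bitmask so both sides live in bits 7:5. */
int state_changed_fixed(uint32_t reg, uint32_t new_state_in_place)
{
    uint32_t cur = reg & (STATE_MASK << STATE_SHIFT);

    /* The new state must already be positioned inside the field. */
    assert((new_state_in_place & ~(STATE_MASK << STATE_SHIFT)) == 0);
    return cur != new_state_in_place;
}
```

For a register value of 0xA3 and a requested state of 0xA0 (the same state, already in place), the buggy variant compares 5 against 0xA0 and reports a change, while the fixed variant compares 0xA0 against 0xA0 and reports none.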
      
      Fixes: 12f460f2 ("net: dsa: bcm_sf2: add HW bridging support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • device property: Don't overwrite addr when failing in device_get_mac_address · 5b902d6f
      Julien Grall committed
      The function device_get_mac_address tries different property names
      in order to get the MAC address. To check the return value, the variable
      addr (which contains the buffer passed by the caller) is re-used. This
      means that if the previous property is not found, the next property will
      be read using a NULL buffer.
      
      Therefore it's only possible to retrieve the mac if node contains a
      property "mac-address". Fix it by using a temporary buffer for the
      return value.
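
A user-space sketch of the temporary-variable pattern follows. Everything here is hypothetical scaffolding: `struct prop`, `read_prop()` and `get_mac()` stand in for the kernel helpers, and the fallback names after "mac-address" are assumptions chosen for illustration. The caller's buffer `addr` is passed unchanged to every attempt, and the lookup result goes into a separate variable `res`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical property store: a single named property holding a MAC. */
struct prop {
    const char *name;
    unsigned char mac[MAC_LEN];
};

/* Copy the property into buf if the name matches; NULL when not found. */
static unsigned char *read_prop(const struct prop *p, const char *name,
                                unsigned char *buf)
{
    assert(buf != NULL);    /* a NULL buffer here is the original bug */
    if (strcmp(p->name, name) != 0)
        return NULL;
    memcpy(buf, p->mac, MAC_LEN);
    return buf;
}

/* Try each name in turn; res holds the result so addr is never lost. */
unsigned char *get_mac(const struct prop *p, unsigned char *addr)
{
    /* Assumed fallback order, for illustration only. */
    static const char *const names[] = {
        "mac-address", "local-mac-address", "address"
    };
    unsigned char *res;

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        res = read_prop(p, names[i], addr);
        if (res)
            return res;
    }
    return NULL;
}
```

Had the loop assigned the result back to `addr`, the first miss would set `addr` to NULL and every later lookup would receive a NULL buffer, which is the failure the commit describes.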
      
      This has been introduced by commit 4c96b7dc
      "Add a matching set of device_ functions for determining mac/phy"
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usbnet: Fix a race between usbnet_stop() and the BH · fcb0bb6a
      Eugene Shatokhin committed
      The race may happen when a device (e.g. YOTA 4G LTE Modem) is
      unplugged while the system is downloading a large file from the Net.
      
      Hardware breakpoints and Kprobes with delays were used to confirm that
      the race does actually happen.
      
      The race is on skb_queue ('next' pointer) between usbnet_stop()
      and rx_complete(), which, in turn, calls usbnet_bh().
      
      Here is a part of the call stack with the code where the changes to the
      queue happen. The line numbers are for the kernel 4.1.0:
      
      *0 __skb_unlink (skbuff.h:1517)
          prev->next = next;
      *1 defer_bh (usbnet.c:430)
          spin_lock_irqsave(&list->lock, flags);
          old_state = entry->state;
          entry->state = state;
          __skb_unlink(skb, list);
          spin_unlock(&list->lock);
          spin_lock(&dev->done.lock);
          __skb_queue_tail(&dev->done, skb);
          if (dev->done.qlen == 1)
              tasklet_schedule(&dev->bh);
          spin_unlock_irqrestore(&dev->done.lock, flags);
      *2 rx_complete (usbnet.c:640)
          state = defer_bh(dev, skb, &dev->rxq, state);
      
      At the same time, the following code repeatedly checks if the queue is
      empty and reads these values concurrently with the above changes:
      
      *0  usbnet_terminate_urbs (usbnet.c:765)
          /* maybe wait for deletions to finish. */
          while (!skb_queue_empty(&dev->rxq)
              && !skb_queue_empty(&dev->txq)
              && !skb_queue_empty(&dev->done)) {
                  schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
                  set_current_state(TASK_UNINTERRUPTIBLE);
                  netif_dbg(dev, ifdown, dev->net,
                        "waited for %d urb completions\n", temp);
          }
      *1  usbnet_stop (usbnet.c:806)
          if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
              usbnet_terminate_urbs(dev);
      
      As a result, it is possible, for example, that the skb is removed from
      dev->rxq by __skb_unlink() before the check
      "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
      also possible in this case that the skb is added to dev->done queue
      after "!skb_queue_empty(&dev->done)" is checked. So
      usbnet_terminate_urbs() may stop waiting and return while dev->done
      queue still has an item.
      
      Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
      this race.
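
The commit message does not spell out the new locking, so the following is only one plausible way to close such a window, sketched in user space with pthread mutexes: take the destination queue's lock before releasing the source queue's lock, and have the waiter check each queue under that queue's own lock. `struct queue`, `move_item()` and `queue_empty_locked()` are hypothetical names, and the sketch assumes items only ever move in one direction (toward the destination), so the lock order is consistent.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for a locked skb queue. */
struct queue {
    pthread_mutex_t lock;
    int qlen;
};

/* Move one item from `from` to `to` with no moment where it is in neither
 * queue and no lock is held. */
void move_item(struct queue *from, struct queue *to)
{
    assert(from != to);
    pthread_mutex_lock(&from->lock);
    from->qlen--;                       /* unlink from the source queue */
    pthread_mutex_lock(&to->lock);      /* take the destination lock first */
    pthread_mutex_unlock(&from->lock);  /* then release the source lock */
    to->qlen++;                         /* append to the destination queue */
    pthread_mutex_unlock(&to->lock);
}

/* Check emptiness under the queue's own lock. */
int queue_empty_locked(struct queue *q)
{
    int empty;

    pthread_mutex_lock(&q->lock);
    empty = (q->qlen == 0);
    pthread_mutex_unlock(&q->lock);
    return empty;
}
```

A waiter that sees the source queue empty and then checks the destination queue has to acquire the destination lock, which the mover still holds until the item has been appended, so the in-flight item cannot be missed.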
      Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
      Reviewed-by: Bjørn Mork <bjorn@mork.no>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 07 Sep, 2015 8 commits
  4. 06 Sep, 2015 7 commits
  5. 04 Sep, 2015 10 commits
  6. 03 Sep, 2015 6 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · dd5cdb48
      Linus Torvalds committed
      Pull networking updates from David Miller:
       "Another merge window, another set of networking changes.  I've heard
        rumblings that the lightweight tunnels infrastructure has been voted
        networking change of the year.  But what do I know?
      
         1) Add conntrack support to openvswitch, from Joe Stringer.
      
         2) Initial support for VRF (Virtual Routing and Forwarding), which
            allows the segmentation of routing paths without using multiple
            devices.  There are some semantic kinks to work out still, but
            this is a reasonably strong foundation.  From David Ahern.
      
         3) Remove spinlock from act_bpf fast path, from Alexei Starovoitov.
      
         4) Ignore route nexthops with a link down state in ipv6, just like
            ipv4.  From Andy Gospodarek.
      
         5) Remove spinlock from fast path of act_gact and act_mirred, from
            Eric Dumazet.
      
         6) Document the DSA layer, from Florian Fainelli.
      
         7) Add netconsole support to bcmgenet, systemport, and DSA.  Also
            from Florian Fainelli.
      
         8) Add Mellanox Switch Driver and core infrastructure, from Jiri
            Pirko.
      
         9) Add support for "light weight tunnels", which allow for
            encapsulation and decapsulation without bearing the overhead of a
            full blown netdevice.  From Thomas Graf, Jiri Benc, and a cast of
            others.
      
        10) Add Identifier Locator Addressing support for ipv6, from Tom
            Herbert.
      
        11) Support fragmented SKBs in iwlwifi, from Johannes Berg.
      
        12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.
      
        13) Add BQL support to 3c59x driver, from Loganaden Velvindron.
      
        14) Stop using a zero TX queue length to mean that a device shouldn't
            have a qdisc attached, use an explicit flag instead.  From Phil
            Sutter.
      
        15) Use generic geneve netdevice infrastructure in openvswitch, from
            Pravin B Shelar.
      
        16) Add infrastructure to avoid re-forwarding a packet in software
            that was already forwarded by a hardware switch.  From Scott
            Feldman.
      
        17) Allow AF_PACKET fanout function to be implemented in a bpf
            program, from Willem de Bruijn"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
        netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
        netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
        net: fec: clear receive interrupts before processing a packet
        ipv6: fix exthdrs offload registration in out_rt path
        xen-netback: add support for multicast control
        bgmac: Update fixed_phy_register()
        sock, diag: fix panic in sock_diag_put_filterinfo
        flow_dissector: Use 'const' where possible.
        flow_dissector: Fix function argument ordering dependency
        ixgbe: Resolve "initialized field overwritten" warnings
        ixgbe: Remove bimodal SR-IOV disabling
        ixgbe: Add support for reporting 2.5G link speed
        ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
        ixgbe: support for ethtool set_rxfh
        ixgbe: Avoid needless PHY access on copper phys
        ixgbe: cleanup to use cached mask value
        ixgbe: Remove second instance of lan_id variable
        ixgbe: use kzalloc for allocating one thing
        flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
        ixgbe: Remove unused PCI bus types
        ...
    • Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 1e1a4e8f
      Linus Torvalds committed
      Pull device mapper update from Mike Snitzer:
      
       - a couple small cleanups in dm-cache, dm-verity, persistent-data's
         dm-btree, and DM core.
      
       - a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio
         prison cells
      
       - a 4.2-stable fix that adds feature reporting for the dm-stats
         features added in 4.2
      
       - improve DM-snapshot to not invalidate the on-disk snapshot if
         snapshot device write overflow occurs; but a write overflow triggered
         through the origin device will still invalidate the snapshot.
      
       - optimize DM-thinp's async discard submission a bit now that late bio
         splitting has been included in block core.
      
       - switch DM-cache's SMQ policy lock from using a mutex to a spinlock;
         improves performance on very low latency devices (eg. NVMe SSD).
      
       - document DM RAID 4/5/6's discard support
      
      [ I did not pull the slab changes, which weren't appropriate for this
        tree, and weren't obviously the right thing to do anyway.  At the very
        least they need some discussion and explanation before getting merged.
      
        Because not pulling the actual tagged commit but doing a partial pull
        instead, this merge commit thus also obviously is missing the git
        signature from the original tag ]
      
      * tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm cache: fix use after freeing migrations
        dm cache: small cleanups related to deferred prison cell cleanup
        dm cache: fix leaking of deferred bio prison cells
        dm raid: document RAID 4/5/6 discard support
        dm stats: report precise_timestamps and histogram in @stats_list output
        dm thin: optimize async discard submission
        dm snapshot: don't invalidate on-disk image on snapshot write overflow
        dm: remove unlikely() before IS_ERR()
        dm: do not override error code returned from dm_get_device()
        dm: test return value for DM_MAPIO_SUBMITTED
        dm verity: remove unused mempool
        dm cache: move wake_waker() from free_migrations() to where it is needed
        dm btree remove: remove unused function get_nr_entries()
        dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()
        dm cache policy smq: change the mutex to a spinlock
    • netfilter: nf_conntrack: make nf_ct_zone_dflt built-in · 62da9865
      Daniel Borkmann committed
      Fengguang reported, that some randconfig generated the following linker
      issue with nf_ct_zone_dflt object involved:
      
        [...]
        CC      init/version.o
        LD      init/built-in.o
        net/built-in.o: In function `ipv4_conntrack_defrag':
        nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt'
        net/built-in.o: In function `ipv6_defrag':
        nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt'
        make: *** [vmlinux] Error 1
      
      Given that configurations exist where we have a built-in part, which is
      accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user()
      and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a
      module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in
      area when netfilter is configured in general.
      
      Therefore, split the more generic parts into a common header under
      include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in
      section that already holds parts related to CONFIG_NF_CONNTRACK in the
      netfilter core. This fixes the issue on my side.
      
      Fixes: 308ac914 ("netfilter: nf_conntrack: push zone object into functions")
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled · a82b0e63
      Daniel Borkmann committed
      While testing various Kconfig options on another issue, I found that
      the following one triggers as well on allmodconfig and nf_conntrack
      disabled:
      
        net/ipv4/netfilter/nf_dup_ipv4.c: In function ‘nf_dup_ipv4’:
        net/ipv4/netfilter/nf_dup_ipv4.c:72:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
          if (this_cpu_read(nf_skb_duplicated))
        [...]
        net/ipv6/netfilter/nf_dup_ipv6.c: In function ‘nf_dup_ipv6’:
        net/ipv6/netfilter/nf_dup_ipv6.c:66:20: error: ‘nf_skb_duplicated’ undeclared (first use in this function)
          if (this_cpu_read(nf_skb_duplicated))
      
      Fix it by including directly the header where it is defined.
      
      Fixes: bbde9fc1 ("netfilter: factor out packet duplication for IPv4/IPv6")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: clear receive interrupts before processing a packet · ed63f1dc
      Russell King committed
      This patch re-submits the patch "db3421c1", because the patch
      "4d494cdc" removed the change.
      
      Clear any pending receive interrupt before we process a pending packet.
      This helps to avoid any spurious interrupts being raised after we have
      fully cleaned the receive ring, while still allowing an interrupt to be
      raised if we receive another packet.
      
      The position of this is critical: we must do this prior to reading the
      next packet status to avoid potentially dropping an interrupt when a
      packet is still pending.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix exthdrs offload registration in out_rt path · e41b0bed
      Daniel Borkmann committed
      We previously registered IPPROTO_ROUTING offload under inet6_add_offload(),
      but in error path, we try to unregister it with inet_del_offload(). This
      doesn't seem correct, it should actually be inet6_del_offload(), also
      ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
      also uses rthdr_offload twice), but it got removed entirely later on.
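
The pairing rule behind this fix (undo a registration through the same table it went into) can be sketched in user space as below. The table `v6_offloads`, the helpers `v6_add_offload()` / `v6_del_offload()`, and the `PROTO_*` constants are hypothetical stand-ins for the kernel's offload infrastructure, introduced only to make the error path runnable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IPv6 offload table, indexed by protocol. */
#define NPROTO        256
#define PROTO_ROUTING 43
#define PROTO_DSTOPTS 60

static const void *v6_offloads[NPROTO];

int v6_add_offload(const void *ops, unsigned char proto)
{
    assert(ops != NULL);
    if (v6_offloads[proto])
        return -1;              /* slot already taken */
    v6_offloads[proto] = ops;
    return 0;
}

int v6_del_offload(const void *ops, unsigned char proto)
{
    if (v6_offloads[proto] != ops)
        return -1;              /* not the registered owner */
    v6_offloads[proto] = NULL;
    return 0;
}

int v6_offload_registered(unsigned char proto)
{
    return v6_offloads[proto] != NULL;
}

/* Register both handlers; on failure of the second, unwind the first. */
int exthdrs_offload_init(const void *rthdr_ops, const void *dstopt_ops)
{
    int ret;

    ret = v6_add_offload(rthdr_ops, PROTO_ROUTING);
    if (ret)
        goto out;
    ret = v6_add_offload(dstopt_ops, PROTO_DSTOPTS);
    if (ret)
        goto out_rt;
out:
    return ret;
out_rt:
    /* Undo the first registration via the same (v6) table it went into. */
    v6_del_offload(rthdr_ops, PROTO_ROUTING);
    goto out;
}
```

If the unwind step called a delete helper for a different table, the routing entry would stay registered after a failed init, which is the kind of leftover state the commit removes.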
      
      Fixes: 3336288a ("ipv6: Switch to using new offload infrastructure.")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>