1. 08 Mar, 2013: 12 commits
    • bonding: fire NETDEV_RELEASE event only on 0 slaves · 80028ea1
      Veaceslav Falico committed
      Currently, if we set up netconsole over bonding and release a slave,
      netconsole will stop logging on the whole bonding device. Change the
      behavior to stop the netconsole only when the last slave is released.
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
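
      A minimal C sketch of the new rule under a simplified model. The names
      `struct bond_model` and `bond_release_slave()` are hypothetical and do
      not reflect the bonding driver's real data structures. Releasing a slave
      raises the release event (and so stops netconsole) only when no slaves
      remain.

      ```c
      #include <stdbool.h>

      /* Hypothetical model of a bonding master; not the driver's struct bonding. */
      struct bond_model {
          int slave_cnt;       /* number of enslaved devices */
          int release_events;  /* NETDEV_RELEASE-style events fired so far */
      };

      /* Release one slave. Returns true only when the release event fires,
       * i.e. when the bond has just lost its last slave. */
      static bool bond_release_slave(struct bond_model *bond)
      {
          if (bond->slave_cnt == 0)
              return false;            /* nothing to release */

          bond->slave_cnt--;
          if (bond->slave_cnt == 0) {
              bond->release_events++;  /* netconsole stops only here */
              return true;
          }
          return false;                /* other slaves still carry traffic */
      }
      ```

      With two slaves, the first release leaves netconsole running and only
      the second one fires the event.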
    • vxlan: fix oops when delete netns containing vxlan · 9cb6cb7e
      Zang MingJie committed
      The following script will produce a kernel oops:
      
          sudo ip netns add v
          sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
          sudo ip netns exec v ip link set lo up
          sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
          sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
          sudo ip netns exec v ip link set vxlan0 up
          sudo ip netns del v
      
      Inspecting it with gdb shows:
      
          Program received signal SIGSEGV, Segmentation fault.
          [Switching to Thread 107]
          0xffffffffa0289e33 in ?? ()
          (gdb) bt
          #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
          #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
          #2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
          #3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
          #4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
          #5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
          #6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
          #7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
          #8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
          #9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
          #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
          #11 <signal handler called>
          #12 0x0000000000000000 in ?? ()
          #13 0x0000000000000000 in ?? ()
          (gdb) fr 0
          #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
          533		struct sock *sk = vn->sock->sk;
          (gdb) l
          528	static int vxlan_leave_group(struct net_device *dev)
          529	{
          530		struct vxlan_dev *vxlan = netdev_priv(dev);
          531		struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
          532		int err = 0;
          533		struct sock *sk = vn->sock->sk;
          534		struct ip_mreqn mreq = {
          535			.imr_multiaddr.s_addr	= vxlan->gaddr,
          536			.imr_ifindex		= vxlan->link,
          537		};
          (gdb) p vn->sock
          $4 = (struct socket *) 0x0
      
      When a netns is deleted, the kernel calls `vxlan_exit_net` before shutting
      down the vxlan interfaces. The later removal of those interfaces then runs
      after `vn->sock` is already gone, which causes the oops. So we should
      manually shut down all interfaces before deleting `vn->sock`, as this
      patch does.
      Signed-off-by: Zang MingJie <zealot0630@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
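
      The ordering problem can be modelled in a few lines of C. This is a
      sketch only: `struct vxlan_net_model`, `vxlan_stop_model()` and
      `vxlan_exit_net_model()` are hypothetical stand-ins for the per-netns
      state, `vxlan_stop()` and `vxlan_exit_net`. Stopping a device needs the
      shared socket, so every device must be stopped before the socket is
      dropped.

      ```c
      #include <stddef.h>

      /* Hypothetical model: one shared socket per netns, several vxlan devices. */
      struct sock_model {
          int sk;                    /* stands in for vn->sock->sk */
      };

      struct vxlan_net_model {
          struct sock_model *sock;   /* shared per-netns socket (vn->sock) */
          int open_devs;             /* vxlan devices that are still UP */
      };

      /* Mirrors vxlan_stop() -> vxlan_leave_group(), which needs vn->sock.
       * Returns -1 where the real code would dereference a NULL socket. */
      static int vxlan_stop_model(struct vxlan_net_model *vn)
      {
          if (vn->sock == NULL)
              return -1;
          vn->open_devs--;
          return 0;
      }

      /* Fixed teardown order: stop every interface first, then drop the socket. */
      static int vxlan_exit_net_model(struct vxlan_net_model *vn)
      {
          while (vn->open_devs > 0) {
              if (vxlan_stop_model(vn) != 0)
                  return -1;
          }
          vn->sock = NULL;
          return 0;
      }
      ```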
    • vmxnet3: prevent div-by-zero panic when ring resizing uninitialized dev · e4fabf2b
      Bhavesh Davda committed
      Linux is free to call ethtool ops as soon as probe finishes and the
      netdev exists. However, we only allocate vmxnet3 tx/rx queues and
      initialize the rx_buf_per_pkt field in struct vmxnet3_adapter when the
      interface is opened (UP). Resizing the rings with ethtool before that
      point can therefore divide by the still-zero rx_buf_per_pkt.
      Signed-off-by: Bhavesh Davda <bhavesh@vmware.com>
      Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
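
      The hazard, and one way to guard against it, in a short C sketch. It
      does not reproduce the actual patch: `struct adapter_model` and
      `ring_pkt_capacity()` are hypothetical, and returning -EINVAL is an
      assumed policy chosen only to show where the zero divisor must be
      caught.

      ```c
      #include <errno.h>

      /* Hypothetical, simplified adapter state; not struct vmxnet3_adapter. */
      struct adapter_model {
          unsigned int rx_buf_per_pkt;   /* stays 0 until the interface is opened */
      };

      /* Hypothetical helper used while resizing rings: how many packets fit in
       * ring_size buffers. Refuses to divide before the field is initialized. */
      static int ring_pkt_capacity(const struct adapter_model *a,
                                   unsigned int ring_size, unsigned int *out)
      {
          if (a->rx_buf_per_pkt == 0)
              return -EINVAL;            /* device never opened: avoid div-by-zero */
          *out = ring_size / a->rx_buf_per_pkt;
          return 0;
      }
      ```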
    • Merge branch 'mlx4' · c5b15679
      David S. Miller committed
      Or Gerlitz says:
      
      ====================
      Here's a batch of fixes to the mlx4 core and ethernet drivers for 3.9
      
      The commit that disables RFS when running in SRIOV mode fixes a regression
      that was introduced in 3.9-rc1 but is also present in the 3.8 -stable series.
      It turns out that a slightly different fix is needed there, which we will
      prepare and submit separately.
      
      Patches done against net commit 66d29cbc "benet: Wait f/w POST until timeout"
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Disable RFS when running in SRIOV mode · a229e488
      Amir Vadai committed
      Commit 37706996 "mlx4_en: fix allocation of CPU affinity reverse-map" fixed
      a bug when mlx4_dev->caps.comp_pool is larger than the number of device rx
      rings, but introduced a regression.
      
      When mlx4_core activates its "legacy mode" for EQ/IRQ usage (e.g. when
      running in SRIOV mode), comp_pool becomes zero and we crash on a divide by
      zero in alloc_cpu_rmap.
      
      Fix that by enabling RFS only when running in non-legacy mode.
      Reported-by: Yan Burman <yanb@mellanox.com>
      Cc: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Cleanup MAC resources on module unload or port stop · 83a5a6ce
      Yan Burman committed
      Make sure we clean up all MAC-related resources (entries in the port MAC
      table and steering rules) when stopping a port or when the driver is unloaded.
      
      The leak was introduced by commit 07cb4b0a "net/mlx4_en: Manage hash of MAC
      addresses per port".
      Signed-off-by: Yan Burman <yanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Fix race when setting the device MAC address · bfa8ab47
      Yan Burman committed
      Remove an unnecessary use of a workqueue in the device MAC address setting
      flow, and fix a race when setting the MAC address, which was introduced by
      commit c07cb4b0 "net/mlx4_en: Manage hash of MAC addresses per port".
      
      The race happened when mlx4_en_replace_mac was executing in parallel
      with a subsequent call to ndo_set_mac_address; e.g. with an A/B/A MAC
      setting test, the third set failed.
      
      With this change we also properly report an error if setting the MAC fails.
      Signed-off-by: Yan Burman <yanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Fix endianness bug in set_param_l · e7dbeba8
      Jack Morgenstein committed
      The set_param_l function assumes that casting a u64 pointer to a u32
      pointer gives access to the lower 32 bits, but on big-endian systems it
      results in writing the upper 32 bits.
      
      The fixed function reads the upper 32 bits of the 64-bit argument and ORs
      them with the 32-bit value passed to the function.
      
      Since this is now a "read-modify-write" operation, we got many
      "uninitialized variable" warnings, which needed to be fixed as well.
      
      Reported-by: Alexander Schmidt <alexschm@de.ibm.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
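
      The difference between the old cast and the read-modify-write fix can be
      shown in portable C. This is a userspace sketch with `uint64_t` in place
      of the kernel's `u64`; `set_param_l_old()` models the old cast with
      `memcpy()` to avoid aliasing issues. Note that the fixed version reads
      `*arg`, which is why callers must now initialize it.

      ```c
      #include <stdint.h>
      #include <string.h>

      /* Old behaviour: write 32 bits at the start of the u64 object, equivalent
       * to *(u32 *)arg = val. That is the low half only on little-endian hosts;
       * on big-endian hosts it overwrites the high half. */
      static void set_param_l_old(uint64_t *arg, uint32_t val)
      {
          memcpy(arg, &val, sizeof(val));
      }

      /* Fixed behaviour: keep the upper 32 bits of *arg and replace the lower
       * 32 bits, independent of host byte order. */
      static void set_param_l(uint64_t *arg, uint32_t val)
      {
          *arg = (*arg & 0xffffffff00000000ULL) | (uint64_t)val;
      }
      ```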
    • net/mlx4_core: Turn off device-managed FS bit in dev-cap wrapper if DMFS is not enabled · 0081c8f3
      Jack Morgenstein committed
      Older kernels detect DMFS (device-managed flow steering) from the HCA
      device capability directly, regardless of whether the capability was
      enabled in INIT_HCA; this is fixed by commit 7b8157be "mlx4_core:
      Adjustments to Flow Steering activation logic for SR-IOV".
      
      To protect against guests running kernels without this fix, the host driver
      should turn off the DMFS capability bit in mlx4_QUERY_DEV_CAP_wrapper.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
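
      Hiding a capability bit in a query wrapper is a simple mask operation.
      In this C sketch the bit position `DEV_CAP_FLAG2_DMFS` and the function
      `query_dev_cap_for_guest()` are hypothetical and are not the real mlx4
      capability layout or wrapper signature.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical bit position; not the real mlx4 capability layout. */
      #define DEV_CAP_FLAG2_DMFS (1ULL << 7)

      /* Hypothetical wrapper: pass the hardware capability word through to a
       * guest, but hide DMFS when the host did not enable it in INIT_HCA. */
      static uint64_t query_dev_cap_for_guest(uint64_t hw_flags2, bool dmfs_enabled)
      {
          if (!dmfs_enabled)
              hw_flags2 &= ~DEV_CAP_FLAG2_DMFS;
          return hw_flags2;
      }
      ```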
    • net/mlx4_core: Disable mlx4_QP_ATTACH calls from guests if the host uses flow steering · 3fb817f1
      Jack Morgenstein committed
      Guest kernels may not correctly detect whether DMFS (device-managed flow
      steering) is activated by the host. If DMFS is activated, the master should
      return an error to guests that try to use the B0-steering flow calls
      (mlx4_QP_ATTACH).
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://1984.lsi.us.es/nf · 43b18db8
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      The following patchset contains Netfilter fixes for your net tree,
      they are:
      
      * Don't generate audit log message if audit is not enabled, from Gao Feng.
      
      * Fix logging formatting for packets dropped by helpers, by Joe Perches.
      
      * Fix a compilation warning in nfnetlink if CONFIG_PROVE_RCU is not set,
        from Paul Bolle.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'intel' · 8b4cd8a0
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      This series contains updates to e1000e only.
      
      All three patches come from Konstantin Khlebnikov to resolve power
      management issues.  The first patch removes redundant and unbalanced
      pci_disable_device() from the shutdown function.  The second patch
      removes redundant actions from the driver and fixes the interaction
      with actions in pci-bus runtime power management code.  The third
      and last patch fixes some messages like 'Error reading PHY register'
      and 'Hardware Erorr' and saves several seconds on reboot.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Mar, 2013: 10 commits
    • e1000e: fix accessing to suspended device · e60b22c5
      Konstantin Khlebnikov committed
      This patch fixes some annoying messages like 'Error reading PHY register' and
      'Hardware Erorr' and saves several seconds on reboot.
      
      Cc: Bruce Allan <bruce.w.allan@intel.com>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000e: fix runtime power management transitions · 66148bab
      Konstantin Khlebnikov committed
      This patch removes redundant actions from the driver and fixes its
      interaction with actions in the pci-bus runtime power management code.
      
      It removes pci_save_state() from __e1000_shutdown() for normal adapters,
      because the PCI bus callbacks pci_pm_*() do all of this for us. Now
      __e1000_shutdown() switches only quad-port adapters to D3 state, because
      they need a quirk for clearing a false-positive error from the downstream
      pci-e port.
      
      pci_save_state() is now called after clearing the bus-master bit, so
      __e1000_resume() and e1000_io_slot_reset() must set it back after
      restoring the configuration space.
      
      This patch sets get_link_status before calling pm_runtime_put() in
      e1000_open() to allow e1000_idle() to get the real link status and
      schedule the first runtime suspend.
      
      This patch also enables wakeup for the device if management mode is
      enabled (as for WoL); as a result, pci_prepare_to_sleep() sets up wakeup
      without special actions such as a custom 'enable_wakeup' flag.
      
      Cc: Bruce Allan <bruce.w.allan@intel.com>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000e: fix pci-device enable-counter balance · 4e0855df
      Konstantin Khlebnikov committed
      This patch removes a redundant and unbalanced pci_disable_device() from
      __e1000_shutdown(). pci_clear_master() is enough; the device can go into
      the suspended state with an elevated enable_cnt.
      
      The bug was introduced in commit 23606cf5
      ("e1000e / PCI / PM: Add basic runtime PM support (rev. 4)") in v2.6.35.
      
      Cc: Bruce Allan <bruce.w.allan@intel.com>
      CC: Stable <stable@kernel.org>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • tun: add a missing nf_reset() in tun_net_xmit() · f8af75f3
      Eric Dumazet committed
      Dave reported the following crash:
      
      general protection fault: 0000 [#1] SMP
      CPU 2
      Pid: 25407, comm: qemu-kvm Not tainted 3.7.9-205.fc18.x86_64 #1 Hewlett-Packard HP Z400 Workstation/0B4Ch
      RIP: 0010:[<ffffffffa0399bd5>]  [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
      RSP: 0018:ffff880276913d78  EFLAGS: 00010206
      RAX: 50626b6b7876376c RBX: ffff88026e530d68 RCX: ffff88028d158e00
      RDX: ffff88026d0d5470 RSI: 0000000000000011 RDI: 0000000000000002
      RBP: ffff880276913d88 R08: 0000000000000000 R09: ffff880295002900
      R10: 0000000000000000 R11: 0000000000000003 R12: ffffffff81ca3b40
      R13: ffffffff8151a8e0 R14: ffff880270875000 R15: 0000000000000002
      FS:  00007ff3bce38a00(0000) GS:ffff88029fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007fd1430bd000 CR3: 000000027042b000 CR4: 00000000000027e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process qemu-kvm (pid: 25407, threadinfo ffff880276912000, task ffff88028c369720)
      Stack:
       ffff880156f59100 ffff880156f59100 ffff880276913d98 ffffffff815534f7
       ffff880276913db8 ffffffff8151a74b ffff880270875000 ffff880156f59100
       ffff880276913dd8 ffffffff8151a5a6 ffff880276913dd8 ffff88026d0d5470
      Call Trace:
       [<ffffffff815534f7>] nf_conntrack_destroy+0x17/0x20
       [<ffffffff8151a74b>] skb_release_head_state+0x7b/0x100
       [<ffffffff8151a5a6>] __kfree_skb+0x16/0xa0
       [<ffffffff8151a666>] kfree_skb+0x36/0xa0
       [<ffffffff8151a8e0>] skb_queue_purge+0x20/0x40
       [<ffffffffa02205f7>] __tun_detach+0x117/0x140 [tun]
       [<ffffffffa022184c>] tun_chr_close+0x3c/0xd0 [tun]
       [<ffffffff8119669c>] __fput+0xec/0x240
       [<ffffffff811967fe>] ____fput+0xe/0x10
       [<ffffffff8107eb27>] task_work_run+0xa7/0xe0
       [<ffffffff810149e1>] do_notify_resume+0x71/0xb0
       [<ffffffff81640152>] int_signal+0x12/0x17
      Code: 00 00 04 48 89 e5 41 54 53 48 89 fb 4c 8b a7 e8 00 00 00 0f 85 de 00 00 00 0f b6 73 3e 0f b7 7b 2a e8 10 40 00 00 48 85 c0 74 0e <48> 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 c7 c7 08 6a 3a a0
      RIP  [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
       RSP <ffff880276913d78>
      
      This is because tun_net_xmit() needs to call nf_reset()
      before queuing the skb into receive_queue.
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 930df2df
      David S. Miller committed
      John W. Linville says:
      
      ====================
      This time just passing along a big batch of fixes from Johannes...
      
      For the mac80211 bits:
      
      "Here I have fixes from Ben Greear for stray work items when deleting
      interfaces, another idle handling fix from Felix, a fix from Marco to a
      mesh PS buffering crash and I have a fix for the VHT MCS calculation in
      association request frames and more nl80211 feature advertising removal
      as well as a workaround to increase the dump size if the SKB overhead is
      too large. For 3.10 I already have a complete fix queued, but that also
      requires (simple) userspace changes."
      
      And for the iwlwifi bits:
      
      "The patches from Dor fix a bunch of calibration issues in the new MVM
      driver, and Emmanuel has a number of fixes there as well. Also, we
      decided to disable 8k A-MSDU by default, so that's in there. My own
      patches are addressing an issue we found with the new devices but that
      seems to also exist on older ones, the DMA writeback the devices do can
      be delayed and cause issues. The fix is unfortunately relatively large
      and depends on two other changes (to not be hugely conflicting), but I
      think it's still worth it at this point."
      
      As Johannes says, it is a bit large.  But I hope it is still early
      enough in the cycle to make that worthwhile.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: use CSR-BAR SEMAPHORE reg for BE2/BE3 · c5b3ad4c
      Sathya Perla committed
      The SLIPORT_SEMAPHORE register shadowed in the
      config-space may not reflect the correct POST stage after
      an EEH reset in BE2/3; it may return FW_READY state even though
      FW is not ready. This causes the driver to prematurely
      poll the FW mailbox and fail.
      
      For BE2/3 use the CSR-BAR/0xac instead.
      Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: docs: document multiqueue tuntap API · f422d2a0
      Jason Wang committed
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sfc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc · 70e21fe4
      David S. Miller committed
      Ben Hutchings says:
      
      ====================
      Fix regressions introduced by the last set of fixes (sorry):
      
      1. Potential deadlock when disabling TX queues.
      2. RX was broken on architectures other than x86 and powerpc.
      
      I still expect to send one more bug fix for 3.9, but as it sometimes
      takes days to reproduce the bug it's going to take a couple of weeks of
      testing to be confident that it's really fixed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: Correct efx_rx_buffer::page_offset when EFX_PAGE_IP_ALIGN != 0 · c73e787a
      Ben Hutchings committed
      RX DMA buffers start at an offset of EFX_PAGE_IP_ALIGN bytes from the
      start of a cache line.  This offset obviously needs to be included in
      the virtual address, but this was missed in commit b590ace0
      ('sfc: Fix efx_rx_buf_offset() in the presence of swiotlb') since
      EFX_PAGE_IP_ALIGN is equal to 0 on both x86 and powerpc.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
    • sfc: Disable soft interrupt handling during efx_device_detach_sync() · 35205b21
      Ben Hutchings committed
      efx_device_detach_sync() locks all TX queues before marking the device
      detached and thus disabling further TX scheduling.  But it can still
      be interrupted by TX completions which then result in TX scheduling in
      soft interrupt context.  This will deadlock when it tries to acquire
      a TX queue lock that efx_device_detach_sync() already acquired.
      
      To avoid deadlock, we must use netif_tx_{,un}lock_bh().
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
  3. 06 Mar, 2013: 18 commits
    • Merge branch 'master' of... · 32cdd592
      John W. Linville committed
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
    • benet: Wait f/w POST until timeout · 66d29cbc
      Gavin Shan committed
      When a PCI card encounters EEH errors, a reset (usually a hot reset) is
      expected to recover from them. After the EEH core finishes the reset,
      the driver callback (be_eeh_reset) is called and waits for the firmware
      to complete POST successfully. The original code returned an error as
      soon as it detected a failure during the POST stage, which does not seem
      to be enough.
      
      The patch forces the driver (be_eeh_reset) to wait for the firmware to
      complete POST until a timeout, instead of returning an error immediately
      upon detecting a POST failure. This also improves the reliability of the
      driver's EEH functionality.
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Acked-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
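
      The "wait until timeout" policy can be sketched as a polling loop. The
      stage values, `wait_fw_ready()` and the fake firmware `fake_fw()` are
      hypothetical and are not be2net's real register encoding or helpers; the
      point is that an error reading does not end the wait early.

      ```c
      #include <errno.h>

      /* Hypothetical POST stages; not be2net's real register encoding. */
      enum post_stage { POST_IN_PROGRESS, POST_ERROR, POST_READY };

      /* Poll the firmware until it reports READY or max_polls is used up.
       * An error seen along the way is not fatal: the firmware may still
       * finish POST after the reset, so keep waiting until the timeout. */
      static int wait_fw_ready(enum post_stage (*read_stage)(void *ctx), void *ctx,
                               int max_polls)
      {
          for (int i = 0; i < max_polls; i++) {
              if (read_stage(ctx) == POST_READY)
                  return 0;
              /* A driver would sleep here between polls (e.g. msleep()). */
          }
          return -ETIMEDOUT;
      }

      /* Usage helper: a fake firmware that reports errors for the first
       * *ctx polls and READY afterwards. */
      static enum post_stage fake_fw(void *ctx)
      {
          int *remaining_errors = ctx;

          if (*remaining_errors > 0) {
              (*remaining_errors)--;
              return POST_ERROR;
          }
          return POST_READY;
      }
      ```

      With two transient errors and five polls allowed, the wait succeeds on
      the third poll; the old behaviour would have failed on the first.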
    • net/ipv4: Timestamp option cannot overflow with prespecified addresses · fa2b04f4
      David Ward committed
      When a router forwards a packet that contains the IPv4 timestamp option,
      if there is no space left in the option for the router to add its own
      timestamp, then the router increments the Overflow value in the option.
      
      However, if the addresses of the routers are prespecified in the option,
      then the overflow condition cannot happen: the option is structured so
      that each prespecified router has a place to write its timestamp. Other
      routers do not add a timestamp, so there will never be a lack of space.
      
      This fix ensures that the Overflow value in the IPv4 timestamp option is
      not incremented when the addresses of the routers are prespecified, even
      if the Pointer value is greater than the Length value.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
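
      A C sketch of the corrected rule. It assumes the RFC 791 layout of the
      timestamp option (byte 0 type 68, byte 1 length, byte 2 pointer, byte 3
      holding the 4-bit Overflow counter in the high nibble and the Flags in
      the low nibble, with flag 3 meaning prespecified addresses). The
      function `ts_no_room()` is hypothetical and is not the kernel's
      ip_options_compile() code.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      #define IPOPT_TS_PRESPEC 3   /* RFC 791 flag: addresses are prespecified */

      /* Called when a forwarding router finds no room left in the option for
       * its own entry. Returns false if the Overflow counter itself would
       * overflow, which RFC 791 treats as an error. */
      static bool ts_no_room(uint8_t *opt)
      {
          uint8_t flags = opt[3] & 0x0F;

          if (flags == IPOPT_TS_PRESPEC)
              return true;   /* every prespecified router has a slot: no overflow */
          if ((opt[3] & 0xF0) == 0xF0)
              return false;  /* 4-bit Overflow counter is saturated */
          opt[3] += 0x10;    /* Overflow++ */
          return true;
      }
      ```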
    • net: reduce net_rx_action() latency to 2 HZ · d1f41b67
      Eric Dumazet committed
      We should use time_after_eq() to get a maximum latency of two ticks
      instead of three.
      
      Bug added in commit 24f8b238 (net: increase receive packet quantum)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
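
      The off-by-one tick can be seen in a small model. This sketch assumes a
      loop whose limit is start + 2 ticks; the two macros mirror the kernel's
      wraparound-safe jiffies comparisons but omit their typecheck() guards,
      and `first_stop_tick()` is a hypothetical helper.

      ```c
      /* Simplified versions of the kernel's jiffies comparisons (no typecheck()). */
      #define time_after(a, b)    ((long)((b) - (a)) < 0)
      #define time_after_eq(a, b) ((long)((a) - (b)) >= 0)

      /* Return the first tick at which a loop that started at `start`, with a
       * limit of start + 2 ticks, decides to stop. */
      static unsigned long first_stop_tick(unsigned long start, int use_eq)
      {
          unsigned long time_limit = start + 2;
          unsigned long j = start;

          for (;;) {
              int stop = use_eq ? time_after_eq(j, time_limit)
                                : time_after(j, time_limit);
              if (stop)
                  return j;
              j++;
          }
      }
      ```

      Starting at tick 100, time_after() stops at 103 (three ticks) while
      time_after_eq() stops at 102 (two ticks), and the comparison still works
      when the counter wraps past zero.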
    • net: fix new kernel-doc warnings in net core · 691b3b7e
      Randy Dunlap committed
      Fix new kernel-doc warnings in net/core/dev.c:
      
      Warning(net/core/dev.c:4788): No description found for parameter 'new_carrier'
      Warning(net/core/dev.c:4788): Excess function parameter 'new_carries' description in 'dev_change_carrier'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • reset nf before xmit vxlan encapsulated packet · 88c4c066
      Zang MingJie committed
      We should reset the nf state bound to the skb, as ipip/ipgre do.
      
      Otherwise, the conntrack/NAT info bound to the original packet may keep
      redirecting the packet to the vxlan interface, causing a routing loop.
      
      This is the scenario:
      
           VETP     VXLAN Gateway
          /----\  /---------------\
          |    |  |               |
          |  vx+--+vx --NAT-> eth0+--> Internet
          |    |  |               |
          \----/  \---------------/
      
      When any packets arrive from the Internet for the VETP, lots of garbage
      packets come out of the gateway's vxlan interface, but none are actually
      sent to the physical interface, because they are redirected back to the
      vxlan interface in the POSTROUTING chain by the NAT rule, and dmesg complains:
      
          Mar  1 21:52:53 debian kernel: [ 8802.997699] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:54 debian kernel: [ 8804.004907] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:55 debian kernel: [ 8805.012189] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:56 debian kernel: [ 8806.020593] Dead loop on virtual device vxlan0, fix it urgently!
      
      This patch should fix the problem.
      Signed-off-by: Zang MingJie <zealot0630@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: sch_qfq: remove a useless invocation of qfq_update_eligible · 76e4cb0d
      Paolo Valente committed
      QFQ+ can select for service only 'eligible' aggregates, i.e.,
      aggregates that would have started to be served also in the emulated
      ideal system.  As a consequence, for QFQ+ to be work conserving, at
      least one of the active aggregates must be eligible when it is time to
      choose the next aggregate to serve.
      
      The set of eligible aggregates is updated through the function
      qfq_update_eligible(), which does guarantee that, after its
      invocation, at least one of the active aggregates is eligible.
      Because of this property, this function is invoked in
      qfq_deactivate_agg() to guarantee that at least one of the active
      aggregates is still eligible after an aggregate has been deactivated.
      In particular, the critical case is when there are other active
      aggregates, but the aggregate being deactivated happens to be the only
      one eligible.
      
      However, this precaution is not needed for QFQ+ to be work conserving,
      because update_eligible() is also always invoked at the beginning of
      qfq_choose_next_agg(). This patch removes the additional invocation of
      update_eligible() in qfq_deactivate_agg().
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: sch_qfq: do not allow virtual time to jump if an aggregate is in service · 40dd2d54
      Paolo Valente committed
      By definition of (the algorithm of) QFQ+, the system virtual time must
      be pushed up only if there is no 'eligible' aggregate, i.e. no
      aggregate that would have started to be served also in the ideal
      system emulated by QFQ+.  QFQ+ serves only eligible aggregates, hence
      the aggregate currently in service is eligible.  As a consequence, to
      decide whether there is no eligible aggregate, QFQ+ must also check
      whether there is no aggregate in service.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
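
      The corrected condition fits in a few lines. This is a simplified C
      sketch, not the sch_qfq code: `struct qfq_model`, `maybe_jump_vtime()`
      and the `min_start` argument (assumed here to be the smallest start
      timestamp among backlogged aggregates) are hypothetical. Since the
      aggregate in service is eligible by definition, its presence alone must
      block the jump.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical, simplified scheduler state; not struct qfq_sched. */
      struct qfq_model {
          uint64_t V;               /* system virtual time */
          bool has_in_service;      /* an aggregate is currently being served */
          unsigned long eligible;   /* bitmap of groups with eligible aggregates */
      };

      /* Push V up to min_start only when no aggregate is eligible, counting
       * the aggregate in service as eligible. */
      static void maybe_jump_vtime(struct qfq_model *q, uint64_t min_start)
      {
          if (!q->has_in_service && q->eligible == 0 && min_start > q->V)
              q->V = min_start;
      }
      ```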
    • pkt_sched: sch_qfq: prevent budget from wrapping around after a dequeue · a0143efa
      Paolo Valente committed
      Aggregate budgets are computed so as to guarantee that, after an
      aggregate has been selected for service, that aggregate has enough
      budget to serve at least one maximum-size packet for the classes it
      contains. For this reason, after a new aggregate has been selected
      for service, its next packet is immediately dequeued, without any
      further control.
      
      The maximum packet size for a class, lmax, can be changed through
      qfq_change_class(). In case the user sets lmax to a lower value than
      the size of some of the still-to-arrive packets, QFQ+ will
      automatically push up lmax as it enqueues these packets.  This
      automatic push up is likely to happen with TSO/GSO.
      
      In any case, if lmax is assigned a lower value than the size of some
      of the packets already enqueued for the class, then the following
      problem may occur: the size of the next packet to dequeue for the
      class may happen to be larger than lmax, after the aggregate to which
      the class belongs has been just selected for service. In this case,
      even the budget of the aggregate, which is an unsigned value, may be
      lower than the size of the next packet to dequeue. After dequeueing
      this packet and subtracting its size from the budget, the latter would
      wrap around.
      
      This fix prevents the budget from wrapping around after any packet
      dequeue.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
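
      Why an unsigned budget wraps, and the usual way to clamp it, in a short
      C sketch. The helper `charge_budget()` is hypothetical and illustrates
      the idea only; it is not the sch_qfq code.

      ```c
      #include <stdint.h>

      /* Charge a dequeued packet of len bytes to an unsigned budget. A plain
       * budget - len would wrap to a huge value when len > budget; clamp at 0. */
      static uint32_t charge_budget(uint32_t budget, uint32_t len)
      {
          return len > budget ? 0 : budget - len;
      }
      ```

      For example, charging a 1500-byte packet to a 1000-byte budget yields 0
      here, whereas the unclamped subtraction would leave a budget of
      4294966796.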
    • pkt_sched: sch_qfq: serve activated aggregates immediately if the scheduler is empty · 2f3b89a1
      Paolo Valente committed
      If no aggregate is in service, then the function qfq_dequeue() does
      not dequeue any packet. For this reason, to guarantee that QFQ+ is work
      conserving, a just-activated aggregate must be set as in service
      immediately if it happens to be the only active aggregate.
      This is done by the function qfq_enqueue().
      
      Unfortunately, the function qfq_add_to_agg(), used to add a class to
      an aggregate, does not perform this important additional operation.
      In particular, if 1) qfq_add_to_agg() is invoked to complete the move
      of a class from a source aggregate, which becomes inactive as a result
      of the move, to a destination aggregate, which instead becomes active,
      and 2) the destination aggregate becomes the only active aggregate, then
      this aggregate is nevertheless not set as in service. QFQ+ then remains
      in a non-work-conserving state until a new invocation of qfq_enqueue()
      recovers the situation.
      
      This fix solves the problem by moving the logic for setting an
      aggregate as in service directly into the function qfq_activate_agg().
      Hence, from whatever point qfq_activate_aggregate() is invoked, QFQ+
      remains work conserving. Since the more complex logic of this new
      version of activate_aggregate() is not needed in qfq_dequeue() to
      reschedule an aggregate that has exhausted its budget, such an aggregate
      is now rescheduled by directly invoking the needed functions.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: sch_qfq: fix the update of eligible-group sets · 624b85fb
      Paolo Valente committed
      Between two invocations of make_eligible(), the system virtual time
      may grow enough that, in its binary representation, a bit of order
      higher than 31 flips. This happens especially with TSO/GSO. Before this
      fix, the mask used in make_eligible() was computed as
      (1UL<<index_of_last_flipped_bit)-1. This value is well defined on a
      64-bit architecture, because index_of_last_flipped_bit <= 63, but is in
      general undefined on a 32-bit architecture, where unsigned long is 32
      bits wide, if index_of_last_flipped_bit > 31.
      The fix just replaces 1UL with 1ULL.
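      The mask computation can be illustrated with a small standalone sketch
      that works on a 64-bit virtual time. The helper names
      last_flipped_bit() and flip_mask() are hypothetical, and the code does
      not reproduce make_eligible() itself. Shifting 1ULL, which is at least
      64 bits wide, keeps every index in [0, 63] well defined. Shifting 1UL
      does not do this on a 32-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* 0-based index of the highest bit that differs between two virtual
 * times; -1 if they are equal. Portable floor(log2(x)) loop. */
static int last_flipped_bit(uint64_t old_v, uint64_t new_v)
{
	uint64_t x = old_v ^ new_v;
	int idx = -1;

	while (x) {
		x >>= 1;
		idx++;
	}
	return idx;
}

/* Mask of all bits below the highest flipped bit. With 1UL, a shift by
 * idx > 31 is undefined where unsigned long has 32 bits; 1ULL makes any
 * idx in [0, 63] well defined. */
static uint64_t flip_mask(uint64_t old_v, uint64_t new_v)
{
	int idx = last_flipped_bit(old_v, new_v);

	if (idx < 0)
		return 0; /* nothing flipped */
	assert(idx <= 63);
	return (1ULL << idx) - 1;
}
```

      For example, a jump from 0 to 1ULL << 40 flips bit 40, so the mask
      covers the 40 bits below it. The 1UL form cannot represent that mask
      on a 32-bit build.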
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      624b85fb
    • P
      pkt_sched: sch_qfq: properly cap timestamps in charge_actual_service · 9b99b7e9
      Paolo Valente committed
      QFQ+ schedules the active aggregates in a group using a bucket list
      (one list per group). The bucket in which each aggregate is inserted
      depends on the aggregate's timestamps, and the number of buckets in a
      group is enough to accommodate the possible range of timestamp values
      of all the aggregates in the group. For this property to hold,
      however, timestamps must be computed correctly. One necessary
      condition for computing timestamps correctly is that the number of
      bits dequeued for each aggregate, while the aggregate is in service,
      does not exceed the maximum budget, budgetmax, assigned to the
      aggregate.
      
      For each aggregate, budgetmax is proportional to the number of classes
      in the aggregate. If the number of classes of the aggregate is
      decreased through qfq_change_class(), then budgetmax is decreased
      automatically as well.  Problems may occur if the aggregate is in
      service when budgetmax is decreased, because the current remaining
      budget of the aggregate and/or the service already received by the
      aggregate may happen to be larger than the new value of budgetmax.  In
      this case, when the aggregate is eventually deselected and its
      timestamps are updated, the aggregate may happen to have received an
      amount of service larger than budgetmax.  This may cause the aggregate
      to be assigned a higher virtual finish time than the maximum
      acceptable value for the last bucket in the bucket list of the group.
      
      This fix addresses the issue by capping, in charge_actual_service(),
      the service charged to the aggregate at budgetmax.
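      A minimal sketch of such a cap follows. It assumes a simplified,
      hypothetical aggregate with fields budgetmax, initial_budget, budget,
      S, F and inv_w, and is only named after charge_actual_service(); it is
      not the kernel function. The service charged when computing the
      virtual finish time F is the smaller of the service actually received
      and the current budgetmax.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical aggregate state. */
struct agg_ts {
	uint32_t budgetmax;      /* max service; shrinks with the class count */
	uint32_t initial_budget; /* budget when service started */
	uint32_t budget;         /* remaining budget */
	uint64_t S, F;           /* virtual start and finish times */
	uint64_t inv_w;          /* inverse weight (fixed point) */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* If budgetmax was lowered while the aggregate was in service,
 * initial_budget - budget can exceed it; charging the uncapped amount
 * could push F beyond the last bucket of the group. */
static void charge_actual_service(struct agg_ts *a)
{
	assert(a->budget <= a->initial_budget);
	uint32_t received = min_u32(a->budgetmax,
				    a->initial_budget - a->budget);

	a->F = a->S + (uint64_t)received * a->inv_w;
}
```

      With budgetmax lowered to 1000 while 1300 units were already served,
      only 1000 units are charged. This keeps F within the range the bucket
      list was sized for.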
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b99b7e9
    • P
      net/irda: Raise dtr in non-blocking open · f74861ca
      Peter Hurley committed
      DTR/RTS need to be raised regardless of the open() mode, but not
      if the port has already been shut down.
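      The ordering this implies can be sketched with a hypothetical port
      model. dtr_port and open_raise_lines() are illustrative names, not
      irda or tty_port APIs. The lines are raised before any non-blocking
      early return, and are skipped on a port that has been shut down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port model. */
struct dtr_port {
	bool initialized; /* false once the port has been shut down */
	bool dtr, rts;
};

/* Returns true if the caller should go on to block waiting for carrier.
 * DTR/RTS are raised before the non-blocking return, so both open modes
 * assert the lines, but never on a port that is already shut down. */
static bool open_raise_lines(struct dtr_port *p, bool nonblock)
{
	assert(p != NULL);
	if (p->initialized) {
		p->dtr = true;
		p->rts = true;
	}
	return !nonblock && p->initialized;
}
```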
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f74861ca
    • P
      net/irda: Use barrier to set task state · 0b176ce3
      Peter Hurley committed
      Without a memory and compiler barrier, the task state change can be
      reordered relative to the condition tests in a blocking loop.
      However, the task state change must be visible to all CPUs before
      those conditions are tested. Failing to do this can result in the
      familiar 'lost wakeup', and the task will hang until killed.
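      The pattern can be illustrated with a userspace analogue built on C11
      atomics and POSIX threads. This is an assumption made for
      illustration, not kernel code. The sleeping flag stands in for the
      task state, and the spin loop stands in for schedule(). Because both
      sides use sequentially consistent accesses, a waiter that sees the
      condition false is guaranteed to be seen as sleeping by the waker.
      That guarantee is what the barrier in set_current_state() provides.
      On older glibc versions the code may need -pthread to link.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int sleeping; /* "task state": 1 = going to sleep */
static atomic_bool cond;    /* condition the waiter waits for */

static void *waiter(void *arg)
{
	(void)arg;
	/* Publish the state change BEFORE testing the condition. If these
	 * two accesses could be reordered, waiter and waker could both see
	 * the old values and the wakeup would be lost. */
	atomic_store(&sleeping, 1);
	if (!atomic_load(&cond)) {
		while (atomic_load(&sleeping)) /* "schedule()" */
			;
	}
	atomic_store(&sleeping, 0);
	return NULL;
}

static void *waker(void *arg)
{
	(void)arg;
	atomic_store(&cond, true);  /* make the condition true */
	if (atomic_load(&sleeping)) /* "wake_up()" only if asleep */
		atomic_store(&sleeping, 0);
	return NULL;
}

/* Runs one waiter/waker pair; returns 0 once both have finished. */
static int run_once(void)
{
	pthread_t w, k;

	atomic_store(&sleeping, 0);
	atomic_store(&cond, false);
	if (pthread_create(&w, NULL, waiter, NULL) != 0)
		return -1;
	if (pthread_create(&k, NULL, waker, NULL) != 0)
		waker(NULL); /* fall back: wake from this thread */
	else
		pthread_join(k, NULL);
	pthread_join(w, NULL);
	assert(atomic_load(&sleeping) == 0);
	return 0;
}
```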
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b176ce3
    • P
      net/irda: Hold port lock while bumping blocked_open · 2f7c069b
      Peter Hurley committed
      Although tty_lock() already protects concurrent updates to
      blocked_open, relying on it violates the separation of concerns
      between tty_port and tty, so hold the port lock while bumping
      blocked_open.
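      A minimal sketch of the idea follows. It uses a hypothetical port
      structure whose counter is guarded by the port's own lock, so it
      never depends on a tty-level lock. A pthread mutex stands in for the
      kernel's port spinlock, which is an assumption made so the sketch
      runs in userspace.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical port model: blocked_open belongs to the port and is
 * protected by the port's own lock. */
struct lock_port {
	pthread_mutex_t lock;
	int blocked_open;
};

static void port_block_begin(struct lock_port *p)
{
	pthread_mutex_lock(&p->lock);
	p->blocked_open++;
	pthread_mutex_unlock(&p->lock);
}

static void port_block_end(struct lock_port *p)
{
	pthread_mutex_lock(&p->lock);
	assert(p->blocked_open > 0);
	p->blocked_open--;
	pthread_mutex_unlock(&p->lock);
}
```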
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f7c069b
    • P
      net/irda: Fix port open counts · a4ed2e73
      Peter Hurley committed
      Saving the port count bump and restoring it later is unsafe: if the
      tty is hung up while this open is blocking, the port count is zeroed.

      Explicitly check whether the tty was hung up while blocking, and
      correct the port count only if it was not.
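      The check can be sketched with a hypothetical counter model.
      cnt_port, block_begin(), block_end() and port_hangup() are
      illustrative names, not the irda driver's code. The count dropped
      before blocking is restored only if no hangup happened in the
      meantime. This matters because a hangup has already zeroed the count,
      and blindly restoring a saved bump would leave a stale reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port model. */
struct cnt_port {
	int count;    /* opens currently holding the port */
	bool hung_up; /* set by a hangup, which also zeroes count */
};

static void port_hangup(struct cnt_port *p)
{
	p->hung_up = true;
	p->count = 0;
}

/* Before blocking: drop this open's reference while waiting. */
static void block_begin(struct cnt_port *p)
{
	if (!p->hung_up)
		p->count--;
	assert(p->count >= 0);
}

/* After blocking: restore the reference only if there was no hangup. */
static void block_end(struct cnt_port *p)
{
	if (!p->hung_up)
		p->count++;
}
```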
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4ed2e73
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net into intel · 0305d068
      David S. Miller committed
      Jeff Kirsher says:
      
      ===================
      This series contains fixes to e1000e and igb.
      
      The e1000e fix resolves an issue at 1000Mbps link speed, where one of the
      MAC's internal clocks can be stopped for up to 4us when entering K1 (a
      power mode of the MAC-PHY interconnect).  If the MAC is waiting for
      completion indications for 2 DMA write requests into Host memory
      (e.g. descriptor writeback or Rx packet writing) and the
      indications occur while the clock is stopped, both indications will be
      missed by the MAC causing the MAC to wait for the completion indications
      and be unable to generate further DMA write requests.  This results in an
      apparent hardware hang.  The patch works around the issue by disabling
      the de-assertion of the clock request when a 1000Mbps link is acquired
      (K1 must be disabled while doing this).
      
      The igb fix drops the BUILD_BUG_ON check from igb_build_rx_buffer,
      resolving a build error on s390.  The check failed there because a
      frame built using build_skb would be larger than 2K.  Since this is not
      likely to change at any point in the future, we are better off dropping
      the check, as igb_set_rx_buffer_len already contains a check that
      disables the use of build_skb anyway.

      The igb fix for i210 link setup changes the copper link setup function
      to use a switch statement, so that the appropriate link setup function
      is called for each PHY type.
      
      Lastly, the igb fix for a lockdep issue in igb_get_i2c_client resolves
      the issue by re-factoring the initialization and usage of the i2c_client.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0305d068
    • L
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 9f225788
      Linus Torvalds committed
      Pull powerpc fixes from Ben Herrenschmidt:
       "Here are a few powerpc bits & fixes for rc1.  A couple of str*cpy
        fixes, some fixes in handling the FSCR register on Power8 (controls
        the enabling of processor features), a 32-bit build fix and a couple
        more nits."
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Set DSCR bit in FSCR setup
        powerpc: Add DSCR FSCR register bit definition
        powerpc: Fix setting FSCR for HV=0 and on secondary CPUs
        powerpc: Wireup the kcmp syscall to sys_ni
        powerpc: Remove unused BITOP_LE_SWIZZLE macro
        powerpc: Avoid link stack corruption in MMU on syscall entry path
        drivers/tty/hvc: Use strlcpy instead of strncpy
        powerpc/pseries/hvcserver: Fix strncpy buffer limit in location code
        powerpc: Fix compile of sha1-powerpc-asm.S on 32-bit
      9f225788