1. 27 Dec, 2019 5 commits
  2. 26 Dec, 2019 8 commits
    • Merge branch 'hsr-fix-several-bugs-in-hsr-module' · 095e90e0
      Committed by David S. Miller
      Taehee Yoo says:
      
      ====================
      hsr: fix several bugs in hsr module
      
      1. The first patch fixes a debugfs warning that appears when a debugfs
      file is opened while the hsr module is being removed. When a debugfs file
      is opened, debugfs tries to hold the .owner module and prints a warning
      if it cannot. To avoid the warning, this patch makes the hsr module not
      set .owner. Leaving .owner unset is safe because these files are
      protected by inode_lock().
      
      2. The second patch fixes wrong error handling in hsr_dev_finalize().
      a) hsr_dev_finalize() calls debugfs_create_{dir/file} to create debugfs
      entries. It checks for a NULL pointer, but debugfs doesn't return NULL,
      so the check is wrong.
      b) hsr_dev_finalize() calls register_netdevice(), so if it fails after
      register_netdevice(), it should call unregister_netdevice().
      But it doesn't.
      c) debugfs doesn't affect any actual logic of the hsr module,
      so a failure to create debugfs entries can be ignored.
      
      3. The third patch adds an hsr root debugfs directory.
      When an hsr interface is created, it creates a debugfs directory at
      /sys/kernel/debug/<interface name>.
      This path is fragile: if the interface name matches another entry
      in the same directory, creation fails. With a dedicated hsr root
      directory, creating the debugfs file is less likely to fail.
      
      4. The fourth patch adds a debugfs rename routine.
      The debugfs directory name is the same as the hsr interface name,
      so when the hsr interface name changes, the debugfs directory name
      should change too.
      
      5. The fifth patch fixes a race condition in node list add and del.
      hsr nodes are protected by RCU and there is no write-side lock.
      But node insertions and deletions can run concurrently,
      so write-side locking is needed.
      
      6. The sixth patch resets the network header.
      If the tap routine is enabled, the message below is printed.
      
      [  175.852292][    C3] protocol 88fb is buggy, dev veth0
      
      The hsr module doesn't set the network header for supervision frames,
      but the tap routine validates the network header.
      If the network header wasn't set, it resets it and warns about it.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hsr: reset network header when supervision frame is created · 3ed0a1d5
      Committed by Taehee Yoo
      The supervision frame is an L2 frame.
      When a supervision frame is created, the hsr module doesn't set the
      network header. If the tap routine is enabled, dev_queue_xmit_nit() is
      called and checks network_header. If the network_header pointer wasn't
      set (or is invalid), it resets network_header and warns.
      To avoid this unnecessary warning, the hsr module needs to reset
      network_header itself.
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
          tcpdump -nei veth0
      
      Splat looks like:
      [  175.852292][    C3] protocol 88fb is buggy, dev veth0
      
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
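The bounds check that triggers the "protocol 88fb is buggy" message can be modelled in user space. This is a minimal C sketch, not kernel code: `struct fake_skb` and the `fake_*` helpers are hypothetical stand-ins for `struct sk_buff`, `skb_reset_network_header()` and the check on the tap path, and the offsets are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct sk_buff: offsets into one buffer. */
struct fake_skb {
    size_t data;            /* offset of the first valid byte */
    size_t tail;            /* offset one past the last valid byte */
    size_t network_header;  /* offset of the L3 header; may be stale */
};

/* Models skb_reset_network_header(): point the header at the current data. */
static void fake_reset_network_header(struct fake_skb *skb)
{
    skb->network_header = skb->data;
}

/*
 * Models the sanity check on the tap path. Returns true when the warning
 * would be printed; in that case the offset is repaired, as the commit
 * message describes.
 */
static bool fake_tap_check(struct fake_skb *skb)
{
    if (skb->network_header < skb->data || skb->network_header > skb->tail) {
        fake_reset_network_header(skb);
        return true;
    }
    return false;
}
```

Calling `fake_reset_network_header()` when the frame is built, before it reaches `fake_tap_check()`, is the user-space analogue of what the patch does for supervision frames.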
    • hsr: fix a race condition in node list insertion and deletion · 92a35678
      Committed by Taehee Yoo
      hsr nodes are protected by RCU and there is no write-side lock.
      But node insertions and deletions can run concurrently,
      so write-side locking is needed.
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
      
          for i in {0..9}
          do
              for j in {0..9}
              do
                  for k in {0..9}
                  do
                      for l in {0..9}
                      do
                          arping 192.168.100.2 -I hsr0 -s 00:01:3$i:4$j:5$k:6$l -c1 &
                      done
                  done
              done
          done
      
      Splat looks like:
      [  236.066091][ T3286] list_add corruption. next->prev should be prev (ffff8880a5940300), but was ffff8880a5940d0.
      [  236.069617][ T3286] ------------[ cut here ]------------
      [  236.070545][ T3286] kernel BUG at lib/list_debug.c:25!
      [  236.071391][ T3286] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  236.072343][ T3286] CPU: 0 PID: 3286 Comm: arping Tainted: G        W         5.5.0-rc1+ #209
      [  236.073463][ T3286] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  236.074695][ T3286] RIP: 0010:__list_add_valid+0x74/0xd0
      [  236.075499][ T3286] Code: 48 39 da 75 27 48 39 f5 74 36 48 39 dd 74 31 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 b
      [  236.078277][ T3286] RSP: 0018:ffff8880aaa97648 EFLAGS: 00010286
      [  236.086991][ T3286] RAX: 0000000000000075 RBX: ffff8880d4624c20 RCX: 0000000000000000
      [  236.088000][ T3286] RDX: 0000000000000075 RSI: 0000000000000008 RDI: ffffed1015552ebf
      [  236.098897][ T3286] RBP: ffff88809b53d200 R08: ffffed101b3c04f9 R09: ffffed101b3c04f9
      [  236.099960][ T3286] R10: 00000000308769a1 R11: ffffed101b3c04f8 R12: ffff8880d4624c28
      [  236.100974][ T3286] R13: ffff8880d4624c20 R14: 0000000040310100 R15: ffff8880ce17ee02
      [  236.138967][ T3286] FS:  00007f23479fa680(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
      [  236.144852][ T3286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  236.145720][ T3286] CR2: 00007f4a14bab210 CR3: 00000000a61c6001 CR4: 00000000000606f0
      [  236.146776][ T3286] Call Trace:
      [  236.147222][ T3286]  hsr_add_node+0x314/0x490 [hsr]
      [  236.153633][ T3286]  hsr_forward_skb+0x2b6/0x1bc0 [hsr]
      [  236.154362][ T3286]  ? rcu_read_lock_sched_held+0x90/0xc0
      [  236.155091][ T3286]  ? rcu_read_lock_bh_held+0xa0/0xa0
      [  236.156607][ T3286]  hsr_dev_xmit+0x70/0xd0 [hsr]
      [  236.157254][ T3286]  dev_hard_start_xmit+0x160/0x740
      [  236.157941][ T3286]  __dev_queue_xmit+0x1961/0x2e10
      [  236.158565][ T3286]  ? netdev_core_pick_tx+0x2e0/0x2e0
      [ ... ]
      
      Reported-by: syzbot+3924327f9ad5f4d2b343@syzkaller.appspotmail.com
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
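The idea of a write-side lock can be sketched in user space. The following C example is only an illustration under stated assumptions: a pthread mutex stands in for the kernel lock, the RCU read side is not modelled at all, and `struct node`, `struct node_db`, `node_db_init()` and `add_node()` are hypothetical names, not the hsr module's API.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node record keyed by a 6-byte MAC address. */
struct node {
    unsigned char addr[6];
    struct node *next;
};

struct node_db {
    pthread_mutex_t list_lock;  /* write-side lock: serializes add/del */
    struct node *head;
    size_t count;
};

static void node_db_init(struct node_db *db)
{
    pthread_mutex_init(&db->list_lock, NULL);
    db->head = NULL;
    db->count = 0;
}

static struct node *find_node(struct node *head, const unsigned char *addr)
{
    for (; head; head = head->next)
        if (memcmp(head->addr, addr, 6) == 0)
            return head;
    return NULL;
}

/*
 * Find-or-add. The lookup happens under the lock, so two writers racing
 * on the same address cannot both insert, and the list pointers are only
 * ever modified by one writer at a time.
 */
static struct node *add_node(struct node_db *db, const unsigned char *addr)
{
    struct node *existing;
    struct node *new_node = calloc(1, sizeof(*new_node));

    if (!new_node)
        return NULL;
    memcpy(new_node->addr, addr, 6);

    pthread_mutex_lock(&db->list_lock);
    existing = find_node(db->head, addr);
    if (existing) {
        pthread_mutex_unlock(&db->list_lock);
        free(new_node);          /* lost the race or already present */
        return existing;
    }
    new_node->next = db->head;
    db->head = new_node;
    db->count++;
    pthread_mutex_unlock(&db->list_lock);
    return new_node;
}
```

Allocating before taking the lock keeps the critical section short; the price is an occasional wasted allocation when the node already exists.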
    • hsr: rename debugfs file when interface name is changed · 4c2d5e33
      Committed by Taehee Yoo
      Each hsr interface has its own debugfs file, whose name is the same as
      the interface name. So when the interface name changes, the debugfs file
      name should change too.
      
      Fixes: fc4ecaee ("net: hsr: add debugfs support for display node list")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hsr: add hsr root debugfs directory · c6c4ccd7
      Committed by Taehee Yoo
      In the current hsr code, when an hsr interface is created, it creates the
      debugfs directory /sys/kernel/debug/<interface name>.
      If a directory or file with the same name already exists there, creation
      fails. To reduce the chance of debugfs creation failing,
      this patch adds a root directory.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
      
      Before this patch:
          /sys/kernel/debug/hsr0/node_table
      
      After this patch:
          /sys/kernel/debug/hsr/hsr0/node_table
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hsr: fix error handling routine in hsr_dev_finalize() · 1d19e2d5
      Committed by Taehee Yoo
      hsr_dev_finalize() is called to create a new hsr interface.
      Some of its error handling is wrong.

      1. Wrong check of the return value of debugfs_create_{dir/file}.
      These functions don't return NULL. If an error occurs,
      they return an error pointer.
      So the code should check for an error pointer instead of NULL.

      2. It doesn't unregister the interface if it fails to set up the hsr
      interface. If it fails to initialize the hsr interface after
      register_netdevice(), it should call unregister_netdevice().

      3. Ignore failure to create debugfs entries.
      If creating the debugfs dir and file fails, creating the hsr interface
      fails too. But debugfs doesn't affect the actual logic of the hsr module.
      So ignoring this failure is more correct, and this behavior is more common.
      
      Fixes: c5a75911 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
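The three points above can be illustrated with a user-space sketch. The `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` helpers below are re-created for illustration in the style of the kernel's error-pointer helpers; `fake_create_dir()`, `fake_dev_finalize()` and the `registered` counter are hypothetical and only model the control flow, not the real hsr code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of error-pointer helpers (illustrative only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a debugfs create call: never returns NULL. */
static void *fake_create_dir(bool fail)
{
    static int dir;
    return fail ? ERR_PTR(-ENOMEM) : (void *)&dir;
}

static int registered;  /* number of "registered" devices in this model */

/* Hypothetical finalize routine following the three rules listed above. */
static int fake_dev_finalize(bool setup_fails, bool debugfs_fails)
{
    void *dir;

    registered++;                 /* models register_netdevice() */
    if (setup_fails)
        goto err_unregister;      /* a later step failed: undo registration */

    dir = fake_create_dir(debugfs_fails);
    if (IS_ERR(dir))
        dir = NULL;               /* debugfs is optional: carry on without it */
    (void)dir;
    return 0;

err_unregister:
    registered--;                 /* models unregister_netdevice() */
    return -EINVAL;
}
```

A plain `if (!dir)` check would let the error pointer through, which is the first bug the commit describes.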
    • hsr: avoid debugfs warning message when module is remove · 84bb59d7
      Committed by Taehee Yoo
      When the hsr module is being removed, debugfs_remove() is called to remove
      both the debugfs directory and file.

      When a module is being removed, its state is changed to
      MODULE_STATE_GOING and then exit() is called.
      At this moment the module can't be held, so try_module_get()
      fails.

      debugfs's open() callback tries to hold the module if .owner is set.
      If that fails, a warning message is printed.
      
      CPU0				CPU1
      delete_module()
          try_stop_module()
          hsr_exit()			open() <-- WARNING
              debugfs_remove()
      
      To avoid the warning message, this patch makes the hsr module not
      set .owner. Leaving .owner unset is safe because these files are
      protected by inode_lock().
      
      Test commands:
          #SHELL1
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          while :
          do
              ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
              modprobe -rv hsr
          done
      
          #SHELL2
          while :
          do
              cat /sys/kernel/debug/hsr0/node_table
          done
      
      Splat looks like:
      [  101.223783][ T1271] ------------[ cut here ]------------
      [  101.230309][ T1271] debugfs file owner did not clean up at exit: node_table
      [  101.230380][ T1271] WARNING: CPU: 3 PID: 1271 at fs/debugfs/file.c:309 full_proxy_open+0x10f/0x650
      [  101.233153][ T1271] Modules linked in: hsr(-) dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_d]
      [  101.237112][ T1271] CPU: 3 PID: 1271 Comm: cat Tainted: G        W         5.5.0-rc1+ #204
      [  101.238270][ T1271] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  101.240379][ T1271] RIP: 0010:full_proxy_open+0x10f/0x650
      [  101.241166][ T1271] Code: 48 c1 ea 03 80 3c 02 00 0f 85 c1 04 00 00 49 8b 3c 24 e8 04 86 7e ff 84 c0 75 2d 4c 8
      [  101.251985][ T1271] RSP: 0018:ffff8880ca22fa38 EFLAGS: 00010286
      [  101.273355][ T1271] RAX: dffffc0000000008 RBX: ffff8880cc6e6200 RCX: 0000000000000000
      [  101.274466][ T1271] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8880c4dd5c14
      [  101.275581][ T1271] RBP: 0000000000000000 R08: fffffbfff2922f5d R09: 0000000000000000
      [  101.276733][ T1271] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffc0551bc0
      [  101.277853][ T1271] R13: ffff8880c4059a48 R14: ffff8880be50a5e0 R15: ffffffff941adaa0
      [  101.278956][ T1271] FS:  00007f8871cda540(0000) GS:ffff8880da800000(0000) knlGS:0000000000000000
      [  101.280216][ T1271] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  101.282832][ T1271] CR2: 00007f88717cfd10 CR3: 00000000b9440005 CR4: 00000000000606e0
      [  101.283974][ T1271] Call Trace:
      [  101.285328][ T1271]  do_dentry_open+0x63c/0xf50
      [  101.286077][ T1271]  ? open_proxy_open+0x270/0x270
      [  101.288271][ T1271]  ? __x64_sys_fchdir+0x180/0x180
      [  101.288987][ T1271]  ? inode_permission+0x65/0x390
      [  101.289682][ T1271]  path_openat+0x701/0x2810
      [  101.290294][ T1271]  ? path_lookupat+0x880/0x880
      [  101.290957][ T1271]  ? check_chain_key+0x236/0x5d0
      [  101.291676][ T1271]  ? __lock_acquire+0xdfe/0x3de0
      [  101.292358][ T1271]  ? sched_clock+0x5/0x10
      [  101.292962][ T1271]  ? sched_clock_cpu+0x18/0x170
      [  101.293644][ T1271]  ? find_held_lock+0x39/0x1d0
      [  101.305616][ T1271]  do_filp_open+0x17a/0x270
      [  101.306061][ T1271]  ? may_open_dev+0xc0/0xc0
      [ ... ]
      
      Fixes: fc4ecaee ("net: hsr: add debugfs support for display node list")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
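The interaction between the module state and the `.owner` field can be modelled as follows. This is a user-space C sketch under stated assumptions: every `fake_*` name is hypothetical, and the model only captures the rule that a module in the GOING state cannot be pinned while a file with no owner needs no pinning.

```c
#include <stdbool.h>
#include <stddef.h>

enum fake_module_state { FAKE_MODULE_LIVE, FAKE_MODULE_GOING };

struct fake_module {
    enum fake_module_state state;
    int refcnt;
};

struct fake_fops {
    struct fake_module *owner;  /* NULL models "does not set .owner" */
};

/* Models try_module_get(): no owner means nothing to pin. */
static bool fake_try_module_get(struct fake_module *mod)
{
    if (!mod)
        return true;
    if (mod->state == FAKE_MODULE_GOING)
        return false;           /* module is on its way out */
    mod->refcnt++;
    return true;
}

/* Returns true when the open path would print the warning. */
static bool fake_open_warns(const struct fake_fops *fops)
{
    return !fake_try_module_get(fops->owner);
}
```

With `owner` left as NULL the open path never tries to pin the module, so the race window shown in the CPU0/CPU1 diagram no longer produces a warning in this model.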
  3. 25 Dec, 2019 20 commits
    • Merge branch 's390-qeth-fixes' · 7f936f2a
      Committed by David S. Miller
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2019-12-23
      
      Please apply the following patch series for qeth to your net tree.
      
      This brings two fixes for errors during device initialization, deals with
      several issues in the vnicc control code, and adds a missing lock.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix initialization on old HW · 0b698c83
      Committed by Julian Wiedmann
      I stumbled over an old OSA model that claims to support DIAG_ASSIST,
      but then rejects the cmd to query its DIAG capabilities.
      
      In the old code this was ok, as the returned raw error code was > 0.
      Now that we translate the raw codes to errnos, the "rc < 0" causes us
      to fail the initialization of the device.
      
      The fix is trivial: don't bail out when the DIAG query fails. Such an
      error is not critical, we can still use the device (with a slightly
      reduced set of features).
      
      Fixes: 742d4d40 ("s390/qeth: convert remaining legacy cmd callbacks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
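The pattern of tolerating a failed optional capability query can be sketched in C. This is a hypothetical user-space model, not the qeth driver: `struct fake_card`, `fake_query_diag()`, `fake_hardsetup()` and the capability value `0x3` are all invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>

struct fake_card {
    unsigned int diag_caps;  /* 0 means "no DIAG features available" */
};

/* Hypothetical query that an old adapter rejects with a negative errno. */
static int fake_query_diag(bool hw_rejects, unsigned int *caps)
{
    if (hw_rejects)
        return -EIO;
    *caps = 0x3;
    return 0;
}

/* Initialization continues even when the optional query fails. */
static int fake_hardsetup(struct fake_card *card, bool claims_diag,
                          bool hw_rejects)
{
    card->diag_caps = 0;
    if (claims_diag) {
        int rc = fake_query_diag(hw_rejects, &card->diag_caps);

        if (rc)
            card->diag_caps = 0;  /* non-critical: run with fewer features */
    }
    return 0;
}
```

The key point is that the negative return code of the optional step is handled locally and never propagated as a fatal initialization error.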
    • s390/qeth: vnicc Fix init to default · d1b9ae18
      Committed by Alexandra Winter
      During vnicc_init, wanted_char should be compared to cur_char and not
      to QETH_VNICC_DEFAULT. Without this patch there is no way to enforce
      the default values as desired values.

      Note that a card is expected to come online with default values.
      This patch was tested with private card firmware.
      
      Fixes: caa1f0b1 ("s390/qeth: add VNICC enable/disable support")
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
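Why the comparison target matters can be shown with a small bitmask sketch. It assumes the set of characteristics to reprogram is computed as an XOR of two masks; the constant `FAKE_VNICC_DEFAULT` and both function names are hypothetical and only illustrate the difference between diffing against the default and diffing against the current state.

```c
#include <stdint.h>

#define FAKE_VNICC_DEFAULT 0x3u  /* hypothetical default characteristics */

/* Old behaviour in this model: diff the wanted mask against the default. */
static uint32_t chars_to_change_old(uint32_t wanted, uint32_t cur)
{
    (void)cur;  /* the current state is ignored, which is the bug */
    return wanted ^ FAKE_VNICC_DEFAULT;
}

/* Fixed behaviour in this model: diff the wanted mask against the card. */
static uint32_t chars_to_change_new(uint32_t wanted, uint32_t cur)
{
    return wanted ^ cur;
}
```

If a card came online with `cur = 0x1` and the user wants the default `0x3`, the old computation yields `0` (nothing to change), while the new one yields `0x2` and so reprograms the missing bit.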
    • s390/qeth: Fix vnicc_is_in_use if rx_bcast not set · e8a66d80
      Committed by Alexandra Winter
      Symptom: After vnicc/rx_bcast has been manually set to 0,
               bridge_* sysfs parameters can still be set or written.
      Only occurs on HiperSockets, as OSA doesn't support changing rx_bcast.
      
      Vnic characteristics and bridgeport settings are mutually exclusive.
      rx_bcast defaults to 1, so manually setting it to 0 should disable
      bridge_* parameters.
      
      Instead it makes sense here to check the supported mask. If the card
      does not support vnicc at all, bridge commands are always allowed.
      
      Fixes: caa1f0b1 ("s390/qeth: add VNICC enable/disable support")
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix false reporting of VNIC CHAR config failure · 68c57bfd
      Committed by Alexandra Winter
      Symptom: Error message "Configuring the VNIC characteristics failed"
      in dmesg whenever an OSA interface on z15 is set online.
      
      The VNIC characteristics get re-programmed when setting a L2 device
      online. This follows the selected 'wanted' characteristics - with the
      exception that the INVISIBLE characteristic unconditionally gets
      switched off.
      
      For devices that don't support INVISIBLE (i.e. OSA), the resulting
      IO failure raises a noisy error message
      ("Configuring the VNIC characteristics failed").
      For IQD, INVISIBLE is off by default anyway.
      
      So don't unnecessarily special-case the INVISIBLE characteristic, and
      thereby suppress the misleading error message on OSA devices.
      
      Fixes: caa1f0b1 ("s390/qeth: add VNICC enable/disable support")
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: lock the card while changing its hsuid · 5b6c7b55
      Committed by Julian Wiedmann
      qeth_l3_dev_hsuid_store() initially checks the card state, but doesn't
      take the conf_mutex to ensure that the card stays in this state while
      being reconfigured.
      
      Rework the code to take this lock, and drop a redundant state check in a
      helper function.
      
      Fixes: b3332930 ("qeth: add support for af_iucv HiperSockets transport")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: fix qdio teardown after early init error · 8b5026bc
      Committed by Julian Wiedmann
      qeth_l?_set_online() goes through a number of initialization steps, and
      on any error uses qeth_l?_stop_card() to tear down the residual state.
      
      The first initialization step is qeth_core_hardsetup_card(). When this
      fails after having established a QDIO context on the device
      (i.e. somewhere after qeth_mpc_initialize()), qeth_l?_stop_card() doesn't
      shut down this QDIO context again (since the card state hasn't
      progressed from DOWN at this stage).
      
      Even worse, we then call qdio_free() as final teardown step to free the
      QDIO data structures - while some of them are still hooked into wider
      QDIO infrastructure such as the IRQ list. This is inevitably followed by
      use-after-frees and other nastiness.
      
      Fix this by unconditionally calling qeth_qdio_clear_card() to shut down
      the QDIO context, and also to halt/clear any pending activity on the
      various IO channels.
      Remove the naive attempt at handling the teardown in
      qeth_mpc_initialize(), it clearly doesn't suffice and we're handling it
      properly now in the wider teardown code.
      
      Fixes: 4a71df50 ("qeth: new qeth device driver")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'disable-neigh-update-for-tunnels-during-pmtu-update' · 47d0b2fe
      Committed by David S. Miller
      Hangbin Liu says:
      
      ====================
      disable neigh update for tunnels during pmtu update
      
      When we set up a pair of gretap devices, ping between them so that a
      neighbour cache entry is created, and then delete and recreate one side,
      we are never able to ping6 the newly created gretap.

      The reason is that when we ping6 the remote via gretap, the call chain is
      
      gre_tap_xmit()
       - ip_tunnel_xmit()
         - tnl_update_pmtu()
           - skb_dst_update_pmtu()
             - ip6_rt_update_pmtu()
               - __ip6_rt_update_pmtu()
                 - dst_confirm_neigh()
                   - ip6_confirm_neigh()
                     - __ipv6_confirm_neigh()
                       - n->confirmed = now
      
      Because the confirmed time is updated, the NUD_DELAY confirm-time check in
      neigh_timer_handler() passes and the neigh state goes back to
      NUD_REACHABLE. So the old/wrong MAC address is used again.

      If we do not update the confirmed time, the neigh state goes to
      neigh->nud_state = NUD_PROBE, then to NUD_FAILED, and the neigh is
      re-created later, which is what IPv4 does.

      We can't remove the ip6_confirm_neigh() call outright, as we still need it
      for TCP flows. To fix this, we have to pass a bool parameter to
      dst_ops.update_pmtu() and only disable neighbor update for tunnels.
      
      v5: No code change, update some commit descriptions
      v4: No code change, update some commit descriptions
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
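The effect of refreshing the confirmed time can be modelled with a tiny state function. This is a user-space C sketch under stated assumptions: it only models the NUD_DELAY decision described above, ignores jiffies wrap-around, and uses hypothetical names and tick values (`fake_delay_timer()`, `fake_state_after_xmit()`, a probe delay of 5 ticks).

```c
#include <stdbool.h>

enum fake_nud_state { FAKE_NUD_REACHABLE, FAKE_NUD_PROBE };

/*
 * Models the NUD_DELAY branch of the neighbour timer: a recent enough
 * confirmation sends the entry back to REACHABLE, otherwise it is probed.
 */
static enum fake_nud_state fake_delay_timer(unsigned long now,
                                            unsigned long confirmed,
                                            unsigned long delay_probe_time)
{
    if (now <= confirmed + delay_probe_time)
        return FAKE_NUD_REACHABLE;
    return FAKE_NUD_PROBE;
}

/* One transmit over a stale entry, with or without neighbour confirmation. */
static enum fake_nud_state fake_state_after_xmit(bool confirm_neigh)
{
    const unsigned long delay_probe_time = 5;
    unsigned long confirmed = 0;  /* last genuine confirmation, long ago */
    unsigned long now = 100;

    if (confirm_neigh)
        confirmed = now;          /* PMTU update on the tx path refreshes it */
    return fake_delay_timer(now + 1, confirmed, delay_probe_time);
}
```

With confirmation on the transmit path the stale entry keeps returning to REACHABLE; without it the entry is probed, which is what lets the changed MAC address be discovered.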
    • net/dst: do not confirm neighbor for vxlan and geneve pmtu update · f081042d
      Committed by Hangbin Liu
      When an IPv6 tunnel PMTU update ends up calling __ip6_rt_update_pmtu(),
      we should not call dst_confirm_neigh() as there is no two-way communication.

      So disable the neigh confirm for vxlan and geneve PMTU updates.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Fixes: a93bf0ff ("vxlan: update skb dst pmtu on tx path")
      Fixes: 52a589d5 ("geneve: update skb dst pmtu on tx path")
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Tested-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: do not confirm neighbor when do pmtu update · 4d42df46
      Committed by Hangbin Liu
      When an IPv6 tunnel PMTU update ends up calling __ip6_rt_update_pmtu(),
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vti: do not confirm neighbor when do pmtu update · 8247a79e
      Committed by Hangbin Liu
      When an IPv6 tunnel PMTU update ends up calling __ip6_rt_update_pmtu(),
      we should not call dst_confirm_neigh() as there is no two-way communication.

      Although vti and vti6 are immune to this problem because they are IFF_NOARP
      interfaces, as Guillaume pointed out, it still makes no sense to confirm the
      neighbour here.
      
      v5: Update commit description.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tunnel: do not confirm neighbor when do pmtu update · 7a1592bc
      Committed by Hangbin Liu
      When a tunnel PMTU update ends up calling __ip6_rt_update_pmtu(),
      we should not call dst_confirm_neigh() as there is no two-way communication.
      
      v5: No Change.
      v4: Update commit description
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      
      Fixes: 0dec879f ("net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP")
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Tested-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/dst: add new function skb_dst_update_pmtu_no_confirm · 07dc35c6
      Committed by Hangbin Liu
      Add a new function skb_dst_update_pmtu_no_confirm() for callers that need
      to update the PMTU but should not confirm the neighbor.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
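The relationship between the callback's bool parameter and the two wrapper functions can be sketched in user space. The types and functions below are deliberately simplified, hypothetical stand-ins (the real callback takes more arguments, such as the socket and skb); only the idea of one callback with a `confirm_neigh` flag and two thin wrappers is taken from the commit messages.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_dst;

struct fake_dst_ops {
    void (*update_pmtu)(struct fake_dst *dst, unsigned int mtu,
                        bool confirm_neigh);
};

struct fake_dst {
    const struct fake_dst_ops *ops;
    unsigned int mtu;
    unsigned int neigh_confirms;  /* counts modelled neighbour confirmations */
};

/* Hypothetical IPv6-style callback: confirms the neighbour only on request. */
static void fake_ip6_update_pmtu(struct fake_dst *dst, unsigned int mtu,
                                 bool confirm_neigh)
{
    if (confirm_neigh)
        dst->neigh_confirms++;
    dst->mtu = mtu;
}

static const struct fake_dst_ops fake_ip6_dst_ops = {
    .update_pmtu = fake_ip6_update_pmtu,
};

/* Wrapper for paths that have evidence of two-way communication. */
static void fake_update_pmtu(struct fake_dst *dst, unsigned int mtu)
{
    if (dst && dst->ops->update_pmtu)
        dst->ops->update_pmtu(dst, mtu, true);
}

/* Wrapper for tunnel transmit paths: update the MTU, leave the neighbour. */
static void fake_update_pmtu_no_confirm(struct fake_dst *dst, unsigned int mtu)
{
    if (dst && dst->ops->update_pmtu)
        dst->ops->update_pmtu(dst, mtu, false);
}
```

Keeping the flag in the callback signature and hiding it behind two named wrappers means existing callers keep their behaviour, while tunnel code opts out explicitly.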
    • gtp: do not confirm neighbor when do pmtu update · 6e9105c7
      Committed by Hangbin Liu
      When an IPv6 tunnel PMTU update ends up calling __ip6_rt_update_pmtu(),
      we should not call dst_confirm_neigh() as there is no two-way communication.

      Although GTP only supports IPv4 right now, and __ip_rt_update_pmtu() does not
      call dst_confirm_neigh(), we still set it to false to stay consistent with
      the IPv6 code.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: do not confirm neighbor when do pmtu update · 675d76ad
      Committed by Hangbin Liu
      When we do an IPv6 GRE PMTU update, we currently also confirm the neigh.
      This causes the neigh cache to be refreshed and set to REACHABLE before
      xmit.

      But if the remote MAC address has changed, e.g. the device was deleted and
      recreated, we will not be able to notice this and will still use the old MAC
      address, as the neigh cache is REACHABLE.

      Fix this by disabling neigh confirm when doing a PMTU update.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add bool confirm_neigh parameter for dst_ops.update_pmtu · bd085ef6
      Committed by Hangbin Liu
      The MTU update code is supposed to be invoked in response to real
      networking events that update the PMTU. In IPv6 PMTU update function
      __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
      confirmed time.
      
      But the tunnel code updates the PMTU before xmit, like:
        - tnl_update_pmtu()
          - skb_dst_update_pmtu()
            - ip6_rt_update_pmtu()
              - __ip6_rt_update_pmtu()
                - dst_confirm_neigh()
      
      If the tunnel remote dst MAC address has changed and we still do the neigh
      confirm, we will not be able to update the neigh cache, and ping6 to the
      remote will fail.
      
      So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
      should not be invoking dst_confirm_neigh() as we have no evidence
      of successful two-way communication at this point.
      
      On the other hand it is also important to keep the neigh reachability fresh
      for TCP flows, so we cannot remove this dst_confirm_neigh() call.
      
      To fix the issue, we have to add a new bool parameter to dst_ops.update_pmtu
      to choose whether to do the neigh update. This patch adds the parameter
      and sets it to true in all callers to keep the previous behavior; later
      patches fix the tunnel code one by one.
      
      v5: No change.
      v4: No change.
      v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
          dst_ops.update_pmtu to control whether we should do neighbor confirm.
          Also split the big patch to small ones for each area.
      v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
      Suggested-by: David Miller <davem@davemloft.net>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-fixes-20191220' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · ff43ae4b
      Committed by David S. Miller
      David Howells says:
      
      ====================
      rxrpc: Fixes
      
      Here are a couple of bugfixes plus a patch that makes one of the bugfixes
      easier:
      
       (1) Move the ping and mutex unlock on a new call from rxrpc_input_packet()
           into rxrpc_new_incoming_call(), which it calls.  This means the
           lock-unlock section is entirely within the latter function.  This
           simplifies patch (2).
      
       (2) Don't take the call->user_mutex at all in the softirq path.  Mutexes
           aren't allowed to be taken or released there and a patch was merged
           that caused a warning to be emitted every time this happened.  Looking
           at the code again, it looks like taking the mutex isn't actually
           necessary, as the value of call->state will block access to the call.
      
       (3) Fix the incoming call path to check incoming calls earlier to reject
           calls to RPC services for which we don't have a security key of the
           appropriate class.  This avoids an assertion failure if YFS tries
           making a secure call to the kafs cache manager RPC service.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ff43ae4b
    • F
      net: dsa: bcm_sf2: Fix IP fragment location and behavior · 7c3125f0
      Florian Fainelli committed
      The IP fragment is specified through user-defined field as the first
      bit of the first user-defined word. We were previously trying to extract
      it from the user-defined mask which could not possibly work. The ip_frag
      is also supposed to be a boolean; if we do not cast it as such, we risk
      overwriting the next fields in CFP_DATA(6) which would render the rule
      inoperative.
      
      Fixes: 7318166c ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c3125f0
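The boolean-cast point in this commit can be illustrated with a minimal sketch. The register layout below is an assumption made up for the example (flag in bit 0, a neighbouring field starting at bit 1); it is not the real CFP_DATA(6) layout, and `pack_unsafe` / `pack_safe` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 is the fragment flag, the next field starts
 * at bit 1. This is not the real CFP_DATA(6) layout. */
#define TOY_IP_FRAG_SHIFT 0
#define TOY_NEXT_SHIFT    1

static uint32_t pack_unsafe(uint32_t user_word, uint32_t next_field)
{
    /* Bug pattern: the whole user word is shifted in, so any bit above
     * bit 0 spills into the neighbouring field. */
    return (user_word << TOY_IP_FRAG_SHIFT) | (next_field << TOY_NEXT_SHIFT);
}

static uint32_t pack_safe(uint32_t user_word, uint32_t next_field)
{
    /* Keep only the first bit and normalise it to 0 or 1. */
    uint32_t ip_frag = !!(user_word & 1);

    return (ip_frag << TOY_IP_FRAG_SHIFT) | (next_field << TOY_NEXT_SHIFT);
}
```

With a user word of `0x7` and an empty neighbouring field, `pack_unsafe` yields `0x7` (the neighbouring field is corrupted) while `pack_safe` yields `0x1`.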
    • M
      sctp: fix err handling of stream initialization · 61d5d406
      Marcelo Ricardo Leitner committed
      The fix in 951c6db9 fixed the issue reported there but introduced
      another. When the allocation fails within sctp_stream_init() it is
      okay/necessary to free the genradix. But it is also called when adding
      new streams, from sctp_send_add_streams() and
      sctp_process_strreset_addstrm_in() and in those situations it cannot
      just free the genradix because by then it is a fully operational
      association.
      
      The fix here then is to only free the genradix in sctp_stream_init()
      and, at those other call sites, to move on with what it already had and
      let the subsequent error handling deal with it.
      
      Tested with the reproducers from this report and the previous one,
      with lksctp-tools and sctp-tests.
      
      Reported-by: syzbot+9a1bc632e78a1a98488b@syzkaller.appspotmail.com
      Fixes: 951c6db9 ("sctp: fix memleak on err handling of stream initialization")
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      61d5d406
    • A
      udp: fix integer overflow while computing available space in sk_rcvbuf · feed8a4f
      Antonio Messina committed
      When the size of a socket's receive buffer is close to 2^31, computing
      whether the buffer has enough space to copy a packet from the queue
      can hit an integer overflow.
      
      When a user sets net.core.rmem_default to a value close to 2^31, UDP
      packets are dropped because of this overflow. This can be visible, for
      instance, as a failure to resolve hostnames.
      
      This can be fixed by casting sk_rcvbuf (which is an int) to unsigned
      int, similarly to how it is done in TCP.
      Signed-off-by: Antonio Messina <amessina@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      feed8a4f
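The overflow described in this commit can be reproduced in user space with a small sketch. The helper names are hypothetical, a 32-bit `int` is assumed, and the signed wraparound is emulated explicitly (signed overflow is undefined in standard C), so this is an illustration of the arithmetic rather than the kernel's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Emulate two's-complement wraparound of a 32-bit int addition without
 * relying on undefined behaviour. Assumes int is 32 bits wide. */
static int wrapping_add(int a, int b)
{
    int64_t wide = (int64_t)a + (int64_t)b;

    if (wide > INT_MAX)
        wide -= (int64_t)1 << 32;
    else if (wide < INT_MIN)
        wide += (int64_t)1 << 32;
    return (int)wide;
}

/* Signed check: size + rcvbuf can wrap negative, so the packet is refused. */
static bool has_space_signed(int rmem, int size, int rcvbuf)
{
    return rmem <= wrapping_add(size, rcvbuf);
}

/* Unsigned check: with non-negative inputs the sum stays below 2^32. */
static bool has_space_unsigned(int rmem, int size, int rcvbuf)
{
    return (unsigned int)rmem <= (unsigned int)size + (unsigned int)rcvbuf;
}
```

With `rcvbuf = INT_MAX - 100` and a 1000-byte packet, the signed check reports no space even though the buffer is almost empty, while the unsigned check accepts the packet.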
  4. 23 Dec, 2019 7 commits
    • L
      Merge tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · c6017471
      Linus Torvalds committed
      Pull xfs fixes from Darrick Wong:
       "Fix a few bugs that could lead to corrupt files, fsck complaints, and
        filesystem crashes:
      
         - Minor documentation fixes
      
         - Fix a file corruption due to read racing with an insert range
           operation.
      
         - Fix log reservation overflows when allocating large rt extents
      
         - Fix a buffer log item flags check
      
         - Don't allow administrators to mount with sunit= options that will
           cause later xfs_repair complaints about the root directory being
           suspicious because the fs geometry appeared inconsistent
      
         - Fix a non-static helper that should have been static"
      
      * tag 'xfs-5.5-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Make the symbol 'xfs_rtalloc_log_count' static
        xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
        xfs: split the sunit parameter update into two parts
        xfs: refactor agfl length computation function
        libxfs: resync with the userspace libxfs
        xfs: use bitops interface for buf log item AIL flag check
        xfs: fix log reservation overflows when allocating large rt extents
        xfs: stabilize insert range start boundary to avoid COW writeback race
        xfs: fix Sphinx documentation warning
      c6017471
    • L
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · a3965607
      Linus Torvalds committed
      Pull ext4 bug fixes from Ted Ts'o:
       "Ext4 bug fixes, including a regression fix"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: clarify impact of 'commit' mount option
        ext4: fix unused-but-set-variable warning in ext4_add_entry()
        jbd2: fix kernel-doc notation warning
        ext4: use RCU API in debug_print_tree
        ext4: validate the debug_want_extra_isize mount option at parse time
        ext4: reserve revoke credits in __ext4_new_inode
        ext4: unlock on error in ext4_expand_extra_isize()
        ext4: optimize __ext4_check_dir_entry()
        ext4: check for directory entries too close to block end
        ext4: fix ext4_empty_dir() for directories with holes
      a3965607
    • L
      Merge tag 'block-5.5-20191221' of git://git.kernel.dk/linux-block · 44579f35
      Linus Torvalds committed
      Pull block fixes from Jens Axboe:
       "Let's try this one again, this time without the compat_ioctl changes.
        We've got those fixed up, but that can go out next week.
      
        This contains:
      
         - block queue flush lockdep annotation (Bart)
      
         - Type fix for bsg_queue_rq() (Bart)
      
         - Three dasd fixes (Stefan, Jan)
      
         - nbd deadlock fix (Mike)
      
         - Error handling bio user map fix (Yang)
      
         - iocost fix (Tejun)
      
         - sbitmap waitqueue addition fix that affects the kyber IO scheduler
           (David)"
      
      * tag 'block-5.5-20191221' of git://git.kernel.dk/linux-block:
        sbitmap: only queue kyber's wait callback if not already active
        block: fix memleak when __blk_rq_map_user_iov() is failed
        s390/dasd: fix typo in copyright statement
        s390/dasd: fix memleak in path handling error case
        s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly
        block: Fix a lockdep complaint triggered by request queue flushing
        block: Fix the type of 'sts' in bsg_queue_rq()
        block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT
        nbd: fix shutdown and recv work deadlock v2
        iocost: over-budget forced IOs should schedule async delay
      44579f35
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a313c8e0
      Linus Torvalds committed
      Pull KVM fixes from Paolo Bonzini:
       "PPC:
         - Fix a bug where we try to do an ultracall on a system without an
           ultravisor
      
        KVM:
         - Fix uninitialised sysreg accessor
         - Fix handling of demand-paged device mappings
         - Stop spamming the console on IMPDEF sysregs
         - Relax mappings of writable memslots
         - Assorted cleanups
      
        MIPS:
         - Now orphan, James Hogan is stepping down
      
        x86:
         - MAINTAINERS change, so long Radim and thanks for all the fish
         - supported CPUID fixes for AMD machines without SPEC_CTRL"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        MAINTAINERS: remove Radim from KVM maintainers
        MAINTAINERS: Orphan KVM for MIPS
        kvm: x86: Host feature SSBD doesn't imply guest feature AMD_SSBD
        kvm: x86: Host feature SSBD doesn't imply guest feature SPEC_CTRL_SSBD
        KVM: PPC: Book3S HV: Don't do ultravisor calls on systems without ultravisor
        KVM: arm/arm64: Properly handle faulting of device mappings
        KVM: arm64: Ensure 'params' is initialised when looking up sys register
        KVM: arm/arm64: Remove excessive permission check in kvm_arch_prepare_memory_region
        KVM: arm64: Don't log IMP DEF sysreg traps
        KVM: arm64: Sanely ratelimit sysreg messages
        KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in kvm_vgic_create()
        KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()
        KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()
      a313c8e0
    • L
      Merge tag 'riscv/for-v5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 7214618c
      Linus Torvalds committed
      Pull RISC-V fixes from Paul Walmsley:
       "Several fixes, and one cleanup, for RISC-V.
      
        Fixes:
      
         - Fix an error in a Kconfig file that resulted in an undefined
           Kconfig option "CONFIG_CONFIG_MMU"
      
         - Fix undefined Kconfig option "CONFIG_CONFIG_MMU"
      
         - Fix scratch register clearing in M-mode (affects nommu users)
      
         - Fix a mismerge on my part that broke the build for
           CONFIG_SPARSEMEM_VMEMMAP users
      
        Cleanup:
      
         - Move SiFive L2 cache-related code to drivers/soc, per request"
      
      * tag 'riscv/for-v5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: move sifive_l2_cache.c to drivers/soc
        riscv: define vmemmap before pfn_to_page calls
        riscv: fix scratch register clearing in M-mode.
        riscv: Fix use of undefined config option CONFIG_CONFIG_MMU
      7214618c
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 78bac77b
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso,
          including adding a missing ipv6 match description.
      
       2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi
          Bhat.
      
       3) Fix uninit value in bond_neigh_init(), from Eric Dumazet.
      
       4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold.
      
       5) Fix use after free in tipc_disc_rcv(), from Tuong Lien.
      
       6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul
          Chaignon.
      
       7) Multicast MAC limit test is off by one in qede, from Manish Chopra.
      
       8) Fix established socket lookup race when socket goes from
          TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening
          RCU grace period. From Eric Dumazet.
      
       9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet.
      
      10) Fix active backup transition after link failure in bonding, from
          Mahesh Bandewar.
      
      11) Avoid zero sized hash table in gtp driver, from Taehee Yoo.
      
      12) Fix wrong interface passed to ->mac_link_up(), from Russell King.
      
      13) Fix DSA egress flooding settings in b53, from Florian Fainelli.
      
      14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost.
      
      15) Fix double free in dpaa2-ptp code, from Ioana Ciornei.
      
      16) Reject invalid MTU values in stmmac, from Jose Abreu.
      
      17) Fix refcount leak in error path of u32 classifier, from Davide
          Caratti.
      
      18) Fix regression causing iwlwifi firmware crashes on boot, from Anders
          Kaseorg.
      
      19) Fix inverted return value logic in llc2 code, from Chan Shu Tak.
      
      20) Disable hardware GRO when XDP is attached to qede, from Manish
          Chopra.
      
      21) Since we encode state in the low pointer bits, dst metrics must be
          at least 4 byte aligned, which is not necessarily true on m68k. Add
          annotations to fix this, from Geert Uytterhoeven.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits)
        sfc: Include XDP packet headroom in buffer step size.
        sfc: fix channel allocation with brute force
        net: dst: Force 4-byte alignment of dst_metrics
        selftests: pmtu: fix init mtu value in description
        hv_netvsc: Fix unwanted rx_table reset
        net: phy: ensure that phy IDs are correctly typed
        mod_devicetable: fix PHY module format
        qede: Disable hardware gro when xdp prog is installed
        net: ena: fix issues in setting interrupt moderation params in ethtool
        net: ena: fix default tx interrupt moderation interval
        net/smc: unregister ib devices in reboot_event
        net: stmmac: platform: Fix MDIO init for platforms without PHY
        llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c)
        net: hisilicon: Fix a BUG trigered by wrong bytes_compl
        net: dsa: ksz: use common define for tag len
        s390/qeth: don't return -ENOTSUPP to userspace
        s390/qeth: fix promiscuous mode after reset
        s390/qeth: handle error due to unsupported transport mode
        cxgb4: fix refcount init for TC-MQPRIO offload
        tc-testing: initial tdc selftests for cls_u32
        ...
      78bac77b
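Item 21 above relies on pointer tagging: if an object is at least 4-byte aligned, the two low bits of its address are always zero and can carry flags. A minimal sketch, assuming a hypothetical `toy_metrics` type and a two-bit flag mask (neither is the kernel's definition):

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define TOY_FLAGS_MASK ((uintptr_t)0x3)

/* Hypothetical metrics block. Its members alone would only need 2-byte
 * alignment; the alignas annotation forces 4-byte alignment so the two
 * low address bits are guaranteed to be zero. */
struct toy_metrics {
    alignas(4) uint16_t values[3];
};

/* Store up to two flag bits in the low bits of the pointer. */
static uintptr_t tag_pointer(const struct toy_metrics *m, unsigned int flags)
{
    return (uintptr_t)m | ((uintptr_t)flags & TOY_FLAGS_MASK);
}

/* Recover the original pointer by clearing the flag bits. */
static struct toy_metrics *untag_pointer(uintptr_t tagged)
{
    return (struct toy_metrics *)(tagged & ~TOY_FLAGS_MASK);
}

/* Recover the flag bits. */
static unsigned int pointer_flags(uintptr_t tagged)
{
    return (unsigned int)(tagged & TOY_FLAGS_MASK);
}
```

Without the alignment annotation, an object placed at an address ending in binary `10` would lose that bit when untagged, which is the class of bug the alignment requirement guards against.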
    • J
      pipe: fix empty pipe check in pipe_write() · 0dd1e377
      Jan Stancek committed
      LTP pipeio_1 test is hanging with v5.5-rc2-385-gb8e382a1,
      with read side observing empty pipe and sleeping and write
      side running out of space and then sleeping as well. In this
      scenario there are 5 writers and 1 reader.
      
      Problem is that after pipe_write() reacquires pipe lock, it
      re-checks for empty pipe with potentially stale 'head' and
      doesn't wake up read side anymore. pipe->tail can advance
      beyond 'head', because there are multiple writers.
      
      Use pipe->head for empty pipe check after reacquiring lock
      to observe current state.
      
      Testing: With the patch, LTP pipeio_1 ran successfully in a loop for 1 hour.
               Without the patch it hung within a minute.
      
      Fixes: 1b6b26ae ("pipe: fix and clarify pipe write wakeup logic")
      Reported-by: Rachel Sibley <rasibley@redhat.com>
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0dd1e377
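The stale-head problem in this commit can be sketched in user space. The `toy_pipe` type and helpers below are hypothetical; they only model the ring indices and assume, as the message implies, that a pipe is empty when head equals tail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pipe ring indices. */
struct toy_pipe {
    unsigned int head; /* advanced by writers */
    unsigned int tail; /* advanced by the reader */
};

static bool toy_pipe_empty(unsigned int head, unsigned int tail)
{
    return head == tail;
}

/* Pre-fix pattern: compares a head value cached before the lock was
 * dropped against the current tail. */
static bool was_empty_stale(const struct toy_pipe *p, unsigned int cached_head)
{
    return toy_pipe_empty(cached_head, p->tail);
}

/* Post-fix pattern: re-reads the current head after reacquiring the lock. */
static bool was_empty_fresh(const struct toy_pipe *p)
{
    return toy_pipe_empty(p->head, p->tail);
}
```

Suppose one writer cached `head = 5` and slept, other writers advanced the head to 8, and the reader then drained everything so the tail is also 8. The stale check compares 5 with 8 and concludes the pipe is not empty, so the reader is never woken; the fresh check sees 8 and 8 and reports empty.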