1. 04 5月, 2019 18 次提交
  2. 02 5月, 2019 22 次提交
    • G
      Linux 4.19.38 · a03957ab
      Greg Kroah-Hartman 提交于
      a03957ab
    • D
      powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg · 5cb299c8
      Diana Craciun 提交于
      commit e59f5bd759b7dee57593c5b6c0441609bda5d530 upstream.
      Signed-off-by: NDiana Craciun <diana.craciun@nxp.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5cb299c8
    • J
      net/tls: don't leak IV and record seq when offload fails · 53db6523
      Jakub Kicinski 提交于
      [ Upstream commit 12c7686111326148b4b5db189130522a4ad1be4a ]
      
      When device refuses the offload in tls_set_device_offload_rx()
      it calls tls_sw_free_resources_rx() to clean up software context
      state.
      
      Unfortunately, tls_sw_free_resources_rx() does not free all
      the state tls_set_sw_offload() allocated - it leaks IV and
      sequence number buffers.  All other code paths which lead to
      tls_sw_release_resources_rx() (which tls_sw_free_resources_rx()
      calls) free those right before the call.
      
      Avoid the leak by moving freeing of iv and rec_seq into
      tls_sw_release_resources_rx().
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53db6523
    • J
      net/tls: avoid potential deadlock in tls_set_device_offload_rx() · d3bdd359
      Jakub Kicinski 提交于
      [ Upstream commit 62ef81d5632634d5e310ed25b9b940b2b6612b46 ]
      
      If device supports offload, but offload fails tls_set_device_offload_rx()
      will call tls_sw_free_resources_rx() which (unhelpfully) releases
      and reacquires the socket lock.
      
      For a small fix release and reacquire the device_offload_lock.
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3bdd359
    • M
      net/mlx5e: Fix use-after-free after xdp_return_frame · 041b3224
      Maxim Mikityanskiy 提交于
      [ Upstream commit 12fc512f5741443a03adde2ead20724da8ad550a ]
      
      xdp_return_frame releases the frame. It leads to releasing the page, so
      it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf
      is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves
      the memory access to precede the return of the frame.
      
      Fixes: 58b99ee3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      041b3224
    • M
      net/mlx5e: Fix the max MTU check in case of XDP · ae6b0710
      Maxim Mikityanskiy 提交于
      [ Upstream commit d460c2718906252a2a69bc6f89b537071f792e6e ]
      
      MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for
      NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ.
      This commit fixes the calculations and adds a brief explanation for the
      formula used.
      
      Fixes: a26a5bdf ("net/mlx5e: Restrict the combination of large MTU and XDP")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae6b0710
    • P
      mlxsw: spectrum: Put MC TCs into DWRR mode · b08774d3
      Petr Machata 提交于
      [ Upstream commit f476b3f809fa02f47af6333ed63715058c3fc348 ]
      
      Both Spectrum-1 and Spectrum-2 chips are currently configured such that
      pairs of TC n (which is used for UC traffic) and TC n+8 (which is used
      for MC traffic) are feeding into the same subgroup. Strict
      prioritization is configured between the two TCs, and by enabling
      MC-aware mode on the switch, the lower-numbered (UC) TCs are favored
      over the higher-numbered (MC) TCs.
      
      On Spectrum-2 however, there is an issue in configuration of the
      MC-aware mode. As a result, MC traffic is prioritized over UC traffic.
      To work around the issue, configure the MC TCs with DWRR mode (while
      keeping the UC TCs in strict mode).
      
      With this patch, the multicast-unicast arbitration results in the same
      behavior on both Spectrum-1 and Spectrum-2 chips.
      
      Fixes: 7b819530 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b08774d3
    • I
      mlxsw: pci: Reincrease PCI reset timeout · 21e47998
      Ido Schimmel 提交于
      [ Upstream commit 1ab3030193d25878b3b1409060e1e0a879800c95 ]
      
      During driver initialization the driver sends a reset to the device and
      waits for the firmware to signal that it is ready to continue.
      
      Commit d2f372ba0914 ("mlxsw: pci: Increase PCI SW reset timeout")
      increased the timeout to 13 seconds due to longer PHY calibration in
      Spectrum-2 compared to Spectrum-1.
      
      Recently it became apparent that this timeout is too short and therefore
      this patch increases it again to a safer limit that will be reduced in
      the future.
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Fixes: d2f372ba0914 ("mlxsw: pci: Increase PCI SW reset timeout")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21e47998
    • J
      net: hns: Fix WARNING when hns modules installed · e875a409
      Jun Xiao 提交于
      Commit dfdf26babc98 upstream
      
      this patch need merge to 4.19.y stable kernel
      
      Fix Conflict:already fixed the confilct dfdf26babc98 with Yonglong Liu
      
      stable candidate:user cannot connect to the internet via hns dev
      by default setting without this patch
      
      we have already verified this patch on kunpeng916 platform,
      and it works well.
      Signed-off-by: NYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: NJun Xiao <xiaojun2@hisilicon.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e875a409
    • H
      team: fix possible recursive locking when add slaves · 7ce836e8
      Hangbin Liu 提交于
      [ Upstream commit 925b0c841e066b488cc3a60272472b2c56300704 ]
      
      If we add a bond device which is already the master of the team interface,
      we will hold the team->lock in team_add_slave() first and then request the
      lock in team_set_mac_address() again. The functions are called like:
      
      - team_add_slave()
       - team_port_add()
         - team_port_enter()
           - team_modeop_port_enter()
             - __set_port_dev_addr()
               - dev_set_mac_address()
                 - bond_set_mac_address()
                   - dev_set_mac_address()
        	       - team_set_mac_address
      
      Although team_upper_dev_link() would check the upper devices but it is
      called too late. Fix it by adding a checking before processing the slave.
      
      v2: Do not split the string in netdev_err()
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ce836e8
    • S
      stmmac: pci: Adjust IOT2000 matching · 1f78e75e
      Su Bao Cheng 提交于
      [ Upstream commit e0c1d14a1a3211dccf0540a6703ffbd5d2a75bdb ]
      
      Since there are more IOT2040 variants with identical hardware but
      different asset tags, the asset tag matching should be adjusted to
      support them.
      
      For the board name "SIMATIC IOT2000", currently there are 2 types of
      hardware, IOT2020 and IOT2040. The IOT2020 is identified by its unique
      asset tag. Match on it first. If we then match on the board name only,
      we will catch all IOT2040 variants. In the future there will be no other
      devices with the "SIMATIC IOT2000" DMI board name but different
      hardware.
      Signed-off-by: NSu Bao Cheng <baocheng.su@siemens.com>
      Reviewed-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f78e75e
    • J
      net/tls: fix refcount adjustment in fallback · e97f0bc7
      Jakub Kicinski 提交于
      [ Upstream commit 9188d5ca454fd665145904267e726e9e8d122f5c ]
      
      Unlike atomic_add(), refcount_add() does not deal well
      with a negative argument.  TLS fallback code reallocates
      the skb and is very likely to shrink the truesize, leading to:
      
      [  189.513254] WARNING: CPU: 5 PID: 0 at lib/refcount.c:81 refcount_add_not_zero_checked+0x15c/0x180
       Call Trace:
        refcount_add_checked+0x6/0x40
        tls_enc_skb+0xb93/0x13e0 [tls]
      
      Once wmem_allocated count saturates the application can no longer
      send data on the socket.  This is similar to Eric's fixes for GSO,
      TCP:
      commit 7ec318fe ("tcp: gso: avoid refcount_t warning from tcp_gso_segment()")
      and UDP:
      commit 575b65bc ("udp: avoid refcount_t saturation in __udp_gso_segment()").
      
      Unlike the GSO case, for TLS fallback it's likely that the skb has
      shrunk, so the "likely" annotation is the other way around (likely
      branch being "sub").
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NJohn Hurley <john.hurley@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e97f0bc7
    • V
      net: stmmac: move stmmac_check_ether_addr() to driver probe · b02f8aa8
      Vinod Koul 提交于
      [ Upstream commit b561af36b1841088552464cdc3f6371d92f17710 ]
      
      stmmac_check_ether_addr() checks the MAC address and assigns one in
      driver open(). In many cases when we create slave netdevice, the dev
      addr is inherited from master but the master dev addr maybe NULL at
      that time, so move this call to driver probe so that address is
      always valid.
      Signed-off-by: NXiaofei Shen <xiaofeis@codeaurora.org>
      Tested-by: NXiaofei Shen <xiaofeis@codeaurora.org>
      Signed-off-by: NSneh Shah <snehshah@codeaurora.org>
      Signed-off-by: NVinod Koul <vkoul@kernel.org>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b02f8aa8
    • E
      net/rose: fix unbound loop in rose_loopback_timer() · d7b10dfe
      Eric Dumazet 提交于
      [ Upstream commit 0453c682459583910d611a96de928f4442205493 ]
      
      This patch adds a limit on the number of skbs that fuzzers can queue
      into loopback_queue. 1000 packets for rose loopback seems more than enough.
      
      Then, since we now have multiple cpus in most linux hosts,
      we also need to limit the number of skbs rose_loopback_timer()
      can dequeue at each round.
      
      rose_loopback_queue() can be drop-monitor friendly, calling
      consume_skb() or kfree_skb() appropriately.
      
      Finally, use mod_timer() instead of del_timer() + add_timer()
      
      syzbot report was :
      
      rcu: INFO: rcu_preempt self-detected stall on CPU
      rcu:    0-...!: (10499 ticks this GP) idle=536/1/0x4000000000000002 softirq=103291/103291 fqs=34
      rcu:     (t=10500 jiffies g=140321 q=323)
      rcu: rcu_preempt kthread starved for 10426 jiffies! g140321 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
      rcu: RCU grace-period kthread stack dump:
      rcu_preempt     I29168    10      2 0x80000000
      Call Trace:
       context_switch kernel/sched/core.c:2877 [inline]
       __schedule+0x813/0x1cc0 kernel/sched/core.c:3518
       schedule+0x92/0x180 kernel/sched/core.c:3562
       schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
       rcu_gp_fqs_loop kernel/rcu/tree.c:1971 [inline]
       rcu_gp_kthread+0x962/0x17b0 kernel/rcu/tree.c:2128
       kthread+0x357/0x430 kernel/kthread.c:253
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      NMI backtrace for cpu 0
      CPU: 0 PID: 7632 Comm: kworker/0:4 Not tainted 5.1.0-rc5+ #172
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events iterate_cleanup_work
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
       nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
       arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
       trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
       rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1223
       print_cpu_stall kernel/rcu/tree.c:1360 [inline]
       check_cpu_stall kernel/rcu/tree.c:1434 [inline]
       rcu_pending kernel/rcu/tree.c:3103 [inline]
       rcu_sched_clock_irq.cold+0x500/0xa4a kernel/rcu/tree.c:2544
       update_process_times+0x32/0x80 kernel/time/timer.c:1635
       tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
       tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
       __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
       __hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
       hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
       local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
       smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
      RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50 kernel/kcov.c:95
      Code: 89 25 b4 6e ec 08 41 bc f4 ff ff ff e8 cd 5d ea ff 48 c7 05 9e 6e ec 08 00 00 00 00 e9 a4 e9 ff ff 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 48 8b 75 08 65 48 8b 04 25 00 ee 01 00 65 8b 15 c8 60
      RSP: 0018:ffff8880ae807ce0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
      RAX: ffff88806fd40640 RBX: dffffc0000000000 RCX: ffffffff863fbc56
      RDX: 0000000000000100 RSI: ffffffff863fbc1d RDI: ffff88808cf94228
      RBP: ffff8880ae807d10 R08: ffff88806fd40640 R09: ffffed1015d00f8b
      R10: ffffed1015d00f8a R11: 0000000000000003 R12: ffff88808cf941c0
      R13: 00000000fffff034 R14: ffff8882166cd840 R15: 0000000000000000
       rose_loopback_timer+0x30d/0x3f0 net/rose/rose_loopback.c:91
       call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
       expire_timers kernel/time/timer.c:1362 [inline]
       __run_timers kernel/time/timer.c:1681 [inline]
       __run_timers kernel/time/timer.c:1649 [inline]
       run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
       __do_softirq+0x266/0x95a kernel/softirq.c:293
       do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7b10dfe
    • Z
      net: rds: exchange of 8K and 1M pool · ed1866aa
      Zhu Yanjun 提交于
      [ Upstream commit 4b9fc7146249a6e0e3175d0acc033fdcd2bfcb17 ]
      
      Before the commit 490ea596 ("RDS: IB: move FMR code to its own file"),
      when the dirty_count is greater than 9/10 of max_items of 8K pool,
      1M pool is used, Vice versa. After the commit 490ea596 ("RDS: IB: move
      FMR code to its own file"), the above is removed. When we make the
      following tests.
      
      Server:
        rds-stress -r 1.1.1.16 -D 1M
      
      Client:
        rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M
      
      The following will appear.
      "
      connecting to 1.1.1.16:4000
      negotiated options, tasks will start in 2 seconds
      Starting up..header from 1.1.1.166:4001 to id 4001 bogus
      ..
      tsks  tx/s  rx/s tx+rx K/s  mbi K/s  mbo K/s tx us/c  rtt us
      cpu %
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
      ...
      "
      So this exchange between 8K and 1M pool is added back.
      
      Fixes: commit 490ea596 ("RDS: IB: move FMR code to its own file")
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed1866aa
    • E
      net/mlx5e: ethtool, Remove unsupported SFP EEPROM high pages query · 7da11d6a
      Erez Alfasi 提交于
      [ Upstream commit ace329f4ab3ba434be2adf618073c752d083b524 ]
      
      Querying EEPROM high pages data for SFP module is currently
      not supported by our driver and yet queried, resulting in
      invalid FW queries.
      
      Set the EEPROM ethtool data length to 256 for SFP module will
      limit the reading for page 0 only and prevent invalid FW queries.
      
      Fixes: bb64143e ("net/mlx5e: Add ethtool support for dump module EEPROM")
      Signed-off-by: NErez Alfasi <ereza@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7da11d6a
    • A
      mlxsw: spectrum: Fix autoneg status in ethtool · 829fd984
      Amit Cohen 提交于
      [ Upstream commit 151f0dddbbfe4c35c9c5b64873115aafd436af9d ]
      
      If link is down and autoneg is set to on/off, the status in ethtool does
      not change.
      
      The reason is when the link is down the function returns with zero
      before changing autoneg value.
      
      Move the checking of link state (up/down) to be performed after setting
      autoneg value, in order to be sure that autoneg will change in any case.
      
      Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
      Signed-off-by: NAmit Cohen <amitc@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      829fd984
    • Z
      ipv4: set the tcp_min_rtt_wlen range from 0 to one day · 250e51f8
      ZhangXiaoxu 提交于
      [ Upstream commit 19fad20d15a6494f47f85d869f00b11343ee5c78 ]
      
      There is a UBSAN report as below:
      UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56
      signed integer overflow:
      2147483647 * 1000 cannot be represented in type 'int'
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1
      Call Trace:
       <IRQ>
       dump_stack+0x8c/0xba
       ubsan_epilogue+0x11/0x60
       handle_overflow+0x12d/0x170
       ? ttwu_do_wakeup+0x21/0x320
       __ubsan_handle_mul_overflow+0x12/0x20
       tcp_ack_update_rtt+0x76c/0x780
       tcp_clean_rtx_queue+0x499/0x14d0
       tcp_ack+0x69e/0x1240
       ? __wake_up_sync_key+0x2c/0x50
       ? update_group_capacity+0x50/0x680
       tcp_rcv_established+0x4e2/0xe10
       tcp_v4_do_rcv+0x22b/0x420
       tcp_v4_rcv+0xfe8/0x1190
       ip_protocol_deliver_rcu+0x36/0x180
       ip_local_deliver+0x15b/0x1a0
       ip_rcv+0xac/0xd0
       __netif_receive_skb_one_core+0x7f/0xb0
       __netif_receive_skb+0x33/0xc0
       netif_receive_skb_internal+0x84/0x1c0
       napi_gro_receive+0x2a0/0x300
       receive_buf+0x3d4/0x2350
       ? detach_buf_split+0x159/0x390
       virtnet_poll+0x198/0x840
       ? reweight_entity+0x243/0x4b0
       net_rx_action+0x25c/0x770
       __do_softirq+0x19b/0x66d
       irq_exit+0x1eb/0x230
       do_IRQ+0x7a/0x150
       common_interrupt+0xf/0xf
       </IRQ>
      
      It can be reproduced by:
        echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen
      
      Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
      Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      250e51f8
    • E
      ipv4: add sanity checks in ipv4_link_failure() · 07445fea
      Eric Dumazet 提交于
      [ Upstream commit 20ff83f10f113c88d0bb74589389b05250994c16 ]
      
      Before calling __ip_options_compile(), we need to ensure the network
      header is a an IPv4 one, and that it is already pulled in skb->head.
      
      RAW sockets going through a tunnel can end up calling ipv4_link_failure()
      with total garbage in the skb, or arbitrary lengthes.
      
      syzbot report :
      
      BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline]
      BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
      Write of size 69 at addr ffff888096abf068 by task syz-executor.4/9204
      
      CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       check_memory_region_inline mm/kasan/generic.c:185 [inline]
       check_memory_region+0x123/0x190 mm/kasan/generic.c:191
       memcpy+0x38/0x50 mm/kasan/common.c:133
       memcpy include/linux/string.h:355 [inline]
       __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
       __icmp_send+0x725/0x1400 net/ipv4/icmp.c:695
       ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204
       dst_link_failure include/net/dst.h:427 [inline]
       vti6_xmit net/ipv6/ip6_vti.c:514 [inline]
       vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553
       __netdev_start_xmit include/linux/netdevice.h:4414 [inline]
       netdev_start_xmit include/linux/netdevice.h:4423 [inline]
       xmit_one net/core/dev.c:3292 [inline]
       dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308
       __dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878
       dev_queue_xmit+0x18/0x20 net/core/dev.c:3911
       neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527
       neigh_output include/net/neighbour.h:508 [inline]
       ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229
       ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317
       NF_HOOK_COND include/linux/netfilter.h:278 [inline]
       ip_output+0x21f/0x670 net/ipv4/ip_output.c:405
       dst_output include/net/dst.h:444 [inline]
       NF_HOOK include/linux/netfilter.h:289 [inline]
       raw_send_hdrinc net/ipv4/raw.c:432 [inline]
       raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663
       inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg+0xdd/0x130 net/socket.c:661
       sock_write_iter+0x27c/0x3e0 net/socket.c:988
       call_write_iter include/linux/fs.h:1866 [inline]
       new_sync_write+0x4c7/0x760 fs/read_write.c:474
       __vfs_write+0xe4/0x110 fs/read_write.c:487
       vfs_write+0x20c/0x580 fs/read_write.c:549
       ksys_write+0x14f/0x2d0 fs/read_write.c:599
       __do_sys_write fs/read_write.c:611 [inline]
       __se_sys_write fs/read_write.c:608 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:608
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458c29
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f293b44bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458c29
      RDX: 0000000000000014 RSI: 00000000200002c0 RDI: 0000000000000003
      RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f293b44c6d4
      R13: 00000000004c8623 R14: 00000000004ded68 R15: 00000000ffffffff
      
      The buggy address belongs to the page:
      page:ffffea00025aafc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      flags: 0x1fffc0000000000()
      raw: 01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2
       ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
                               ^
       ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00
       ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07445fea
    • S
      x86/fpu: Don't export __kernel_fpu_{begin,end}() · d4ff57d0
      Sebastian Andrzej Siewior 提交于
      commit 12209993e98c5fa1855c467f22a24e3d5b8be205 upstream.
      
      There is one user of __kernel_fpu_begin() and before invoking it,
      it invokes preempt_disable(). So it could invoke kernel_fpu_begin()
      right away. The 32bit version of arch_efi_call_virt_setup() and
      arch_efi_call_virt_teardown() does this already.
      
      The comment above *kernel_fpu*() claims that before invoking
      __kernel_fpu_begin() preemption should be disabled and that KVM is a
      good example of doing it. Well, KVM doesn't do that since commit
      
        f775b13e ("x86,kvm: move qemu/guest FPU switching out to vcpu_run")
      
      so it is not an example anymore.
      
      With EFI gone as the last user of __kernel_fpu_{begin|end}(), both can
      be made static and not exported anymore.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NRik van Riel <riel@surriel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: linux-efi <linux-efi@vger.kernel.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181129150210.2k4mawt37ow6c2vq@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4ff57d0
    • J
      mm: Fix warning in insert_pfn() · 423497a9
      Jan Kara 提交于
      commit f2c57d91b0d96aa13ccff4e3b178038f17b00658 upstream.
      
      In DAX mode a write pagefault can race with write(2) in the following
      way:
      
      CPU0                            CPU1
                                      write fault for mapped zero page (hole)
      dax_iomap_rw()
        iomap_apply()
          xfs_file_iomap_begin()
            - allocates blocks
          dax_iomap_actor()
            invalidate_inode_pages2_range()
              - invalidates radix tree entries in given range
                                      dax_iomap_pte_fault()
                                        grab_mapping_entry()
                                          - no entry found, creates empty
                                        ...
                                        xfs_file_iomap_begin()
                                          - finds already allocated block
                                        ...
                                        vmf_insert_mixed_mkwrite()
                                          - WARNs and does nothing because there
                                            is still zero page mapped in PTE
              unmap_mapping_pages()
      
      This race results in WARN_ON from insert_pfn() and is occasionally
      triggered by fstest generic/344. Note that the race is otherwise
      harmless as before write(2) on CPU0 is finished, we will invalidate page
      tables properly and thus user of mmap will see modified data from
      write(2) from that point on. So just restrict the warning only to the
      case when the PFN in PTE is not zero page.
      
      Link: http://lkml.kernel.org/r/20180824154542.26872-1-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      423497a9
    • D
      x86/retpolines: Disable switch jump tables when retpolines are enabled · e923c6b7
      Daniel Borkmann 提交于
      commit a9d57ef15cbe327fe54416dd194ee0ea66ae53a4 upstream.
      
      Commit ce02ef06fcf7 ("x86, retpolines: Raise limit for generating indirect
      calls from switch-case") raised the limit under retpolines to 20 switch
      cases where gcc would only then start to emit jump tables, and therefore
      effectively disabling the emission of slow indirect calls in this area.
      
      After this has been brought to attention to gcc folks [0], Martin Liska
      has then fixed gcc to align with clang by avoiding to generate switch jump
      tables entirely under retpolines. This is taking effect in gcc starting
      from stable version 8.4.0. Given kernel supports compilation with older
      versions of gcc where the fix is not being available or backported anymore,
      we need to keep the extra KBUILD_CFLAGS around for some time and generally
      set the -fno-jump-tables to align with what more recent gcc is doing
      automatically today.
      
      More than 20 switch cases are not expected to be fast-path critical, but
      it would still be good to align with gcc behavior for versions < 8.4.0 in
      order to have consistency across supported gcc versions. vmlinux size is
      slightly growing by 0.27% for older gcc. This flag is only set to work
      around affected gcc, no change for clang.
      
        [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952Suggested-by: NMartin Liska <mliska@suse.cz>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Björn Töpel<bjorn.topel@intel.com>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e923c6b7