1. 02 6月, 2018 4 次提交
  2. 01 6月, 2018 9 次提交
  3. 31 5月, 2018 2 次提交
  4. 29 5月, 2018 13 次提交
    • T
      tun: Fix NULL pointer dereference in XDP redirect · 6547e387
      Toshiaki Makita 提交于
      Calling XDP redirection requires bh disabled. Softirq can call another
      XDP function and redirection functions, then the percpu static variable
      ri->map can be overwritten to NULL.
      
      This is a generic XDP case called from tun.
      
      [ 3535.736058] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [ 3535.743974] PGD 0 P4D 0
      [ 3535.746530] Oops: 0000 [#1] SMP PTI
      [ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel scsi_transport_sas tg3 i2c_core ptp pps_core
      [ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 4.17.0-rc4 #2
      [ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
      [ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
      [ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246
      [ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: ffff9fdf27fe3100
      [ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 0000000000000000
      [ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 0000000000000ec0
      [ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: ffff9fdf9e72c000
      [ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: ffff9fdfc82cdd00
      [ 3535.874990] FS:  0000000000000000(0000) GS:ffff9fdfcfe80000(0000) knlGS:0000000000000000
      [ 3535.883152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 00000000007626e0
      [ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 3535.910538] PKRU: 55555554
      [ 3535.913267] Call Trace:
      [ 3535.915736]  xdp_do_generic_redirect+0x7a/0x310
      [ 3535.920310]  do_xdp_generic.part.117+0x285/0x370
      [ 3535.924970]  tun_get_user+0x5b9/0x1260 [tun]
      [ 3535.929279]  tun_sendmsg+0x52/0x70 [tun]
      [ 3535.933237]  handle_tx+0x2ad/0x5f0 [vhost_net]
      [ 3535.937721]  vhost_worker+0xa5/0x100 [vhost]
      [ 3535.942030]  kthread+0xf5/0x130
      [ 3535.945198]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
      [ 3535.950031]  ? kthread_bind+0x10/0x10
      [ 3535.953727]  ret_from_fork+0x35/0x40
      [ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 c0 c3
      [ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58
      [ 3535.982883] CR2: 0000000000000018
      [ 3535.987096] ---[ end trace 383b299dd1430240 ]---
      [ 3536.131325] Kernel panic - not syncing: Fatal exception
      [ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      And a kernel with generic case fixed still panics in tun driver XDP
      redirect, because it disabled only preemption, but not bh.
      
      [ 2055.128746] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [ 2055.136662] PGD 0 P4D 0
      [ 2055.139219] Oops: 0000 [#1] SMP PTI
      [ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas crc32c_intel i2c_core ptp pps_core
      [ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G        W         4.17.0-rc5-fix-tun+ #1
      [ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
      [ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
      [ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246
      [ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 0000000000000004
      [ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 0000000000000000
      [ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 0000000000005d80
      [ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: ffff998b097a9000
      [ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: ffff998b07607d78
      [ 2055.269879] FS:  0000000000000000(0000) GS:ffff8dbd8ff00000(0000) knlGS:0000000000000000
      [ 2055.278039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 00000000007626e0
      [ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 2055.305424] PKRU: 55555554
      [ 2055.308153] Call Trace:
      [ 2055.310624]  xdp_do_redirect+0x7b/0x380
      [ 2055.314499]  tun_get_user+0x10fe/0x12a0 [tun]
      [ 2055.318895]  tun_sendmsg+0x52/0x70 [tun]
      [ 2055.322852]  handle_tx+0x2ad/0x5f0 [vhost_net]
      [ 2055.327337]  vhost_worker+0xa5/0x100 [vhost]
      [ 2055.331646]  kthread+0xf5/0x130
      [ 2055.334813]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
      [ 2055.339646]  ? kthread_bind+0x10/0x10
      [ 2055.343343]  ret_from_fork+0x35/0x40
      [ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 c0 c3
      [ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0
      [ 2055.372500] CR2: 0000000000000018
      [ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]---
      [ 2055.523626] Kernel panic - not syncing: Fatal exception
      [ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      v2:
       - Removed preempt_disable/enable since local_bh_disable will prevent
         preemption as well, feedback from Jason Wang.
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6547e387
    • S
      be2net: Fix error detection logic for BE3 · d2c2725c
      Suresh Reddy 提交于
      Check for 0xE00 (RECOVERABLE_ERR) along with ARMFW UE (0x0)
      in be_detect_error() to know whether the error is valid error or not
      
      Fixes: 673c96e5 ("be2net: Fix UE detection logic for BE3")
      Signed-off-by: NSuresh Reddy <suresh.reddy@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2c2725c
    • J
      net: qmi_wwan: Add Netgear Aircard 779S · 2415f3bd
      Josh Hill 提交于
      Add support for Netgear Aircard 779S
      Signed-off-by: NJosh Hill <josh@joshuajhill.com>
      Acked-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2415f3bd
    • P
      mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG · 47bf9df2
      Petr Machata 提交于
      VLAN 1 is internally used for untagged traffic. Prevent creation of
      explicit netdevice for that VLAN, because that currently isn't supported
      and leads to the NULL pointer dereference cited below.
      
      Fix by preventing creation of VLAN devices with VID of 1 over mlxsw
      devices or LAG devices that involve mlxsw devices.
      
      [  327.175816] ================================================================================
      [  327.184544] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c:200:12
      [  327.193667] member access within null pointer of type 'const struct mlxsw_sp_fid'
      [  327.201226] CPU: 0 PID: 8983 Comm: ip Not tainted 4.17.0-rc4-petrm_net_ip6gre_headroom-custom-140 #11
      [  327.210496] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
      [  327.219872] Call Trace:
      [  327.222384]  dump_stack+0xc3/0x12b
      [  327.234007]  ubsan_epilogue+0x9/0x49
      [  327.237638]  ubsan_type_mismatch_common+0x1f9/0x2d0
      [  327.255769]  __ubsan_handle_type_mismatch+0x90/0xa7
      [  327.264716]  mlxsw_sp_fid_type+0x35/0x50 [mlxsw_spectrum]
      [  327.270255]  mlxsw_sp_port_vlan_router_leave+0x46/0xc0 [mlxsw_spectrum]
      [  327.277019]  mlxsw_sp_inetaddr_port_vlan_event+0xe1/0x340 [mlxsw_spectrum]
      [  327.315031]  mlxsw_sp_netdevice_vrf_event+0xa8/0x100 [mlxsw_spectrum]
      [  327.321626]  mlxsw_sp_netdevice_event+0x276/0x430 [mlxsw_spectrum]
      [  327.367863]  notifier_call_chain+0x4c/0x150
      [  327.372128]  __netdev_upper_dev_link+0x1b3/0x260
      [  327.399450]  vrf_add_slave+0xce/0x170 [vrf]
      [  327.403703]  do_setlink+0x658/0x1d70
      [  327.508998]  rtnl_newlink+0x908/0xf20
      [  327.559128]  rtnetlink_rcv_msg+0x50c/0x720
      [  327.571720]  netlink_rcv_skb+0x16a/0x1f0
      [  327.583450]  netlink_unicast+0x2ca/0x3e0
      [  327.599305]  netlink_sendmsg+0x3e2/0x7f0
      [  327.616655]  sock_sendmsg+0x76/0xc0
      [  327.620207]  ___sys_sendmsg+0x494/0x5d0
      [  327.666117]  __sys_sendmsg+0xc2/0x130
      [  327.690953]  do_syscall_64+0x66/0x370
      [  327.694677]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  327.699782] RIP: 0033:0x7f4c2f3f8037
      [  327.703393] RSP: 002b:00007ffe8c389708 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  327.711035] RAX: ffffffffffffffda RBX: 000000005b03f53e RCX: 00007f4c2f3f8037
      [  327.718229] RDX: 0000000000000000 RSI: 00007ffe8c389760 RDI: 0000000000000003
      [  327.725431] RBP: 00007ffe8c389760 R08: 0000000000000000 R09: 00007f4c2f443630
      [  327.732632] R10: 00000000000005eb R11: 0000000000000246 R12: 0000000000000000
      [  327.739833] R13: 00000000006774e0 R14: 00007ffe8c3897e8 R15: 0000000000000000
      [  327.747096] ================================================================================
      
      Fixes: 9589a7b5 ("mlxsw: spectrum: Handle VLAN devices linking / unlinking")
      Suggested-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47bf9df2
    • I
      atm: zatm: fix memcmp casting · f9c6442a
      Ivan Bornyakov 提交于
      memcmp() returns int, but eprom_try_esi() cast it to unsigned char. One
      can lose significant bits and get 0 from non-0 value returned by the
      memcmp().
      Signed-off-by: NIvan Bornyakov <brnkv.i1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9c6442a
    • H
      iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs · ab1068d6
      Hao Wei Tee 提交于
      When there are 16 or more logical CPUs, we request for
      `IWL_MAX_RX_HW_QUEUES` (16) IRQs only as we limit to that number of
      IRQs, but later on we compare the number of IRQs returned to
      nr_online_cpus+2 instead of max_irqs, the latter being what we
      actually asked for. This ends up setting num_rx_queues to 17 which
      causes lots of out-of-bounds array accesses later on.
      
      Compare to max_irqs instead, and also add an assertion in case
      num_rx_queues > IWM_MAX_RX_HW_QUEUES.
      
      This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199551
      
      Fixes: 2e5d4a8f ("iwlwifi: pcie: Add new configuration to enable MSIX")
      Signed-off-by: NHao Wei Tee <angelsl@in04.sg>
      Tested-by: NSara Sharon <sara.sharon@intel.com>
      Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: NKalle Valo <kvalo@codeaurora.org>
      ab1068d6
    • S
      Revert "rt2800: use TXOP_BACKOFF for probe frames" · 52a19236
      Stanislaw Gruszka 提交于
      This reverts commit fb47ada8.
      
      In some situations when we set TXOP_BACKOFF, the probe frame is
      not sent at all. What it worse then sending probe frame as part
      of AMPDU and can degrade 11n performance to 11g rates.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: NKalle Valo <kvalo@codeaurora.org>
      52a19236
    • A
      net: netsec: reduce DMA mask to 40 bits · 31256426
      Ard Biesheuvel 提交于
      The netsec network controller IP can drive 64 address bits for DMA, and
      the DMA mask is set accordingly in the driver. However, the SynQuacer
      SoC, which is the only silicon incorporating this IP at the moment,
      integrates this IP in a manner that leaves address bits [63:40]
      unconnected.
      
      Up until now, this has not resulted in any problems, given that the DDR
      controller doesn't decode those bits to begin with. However, recent
      firmware updates for platforms incorporating this SoC allow the IOMMU
      to be enabled, which does decode address bits [47:40], and allocates
      top down from the IOVA space, producing DMA addresses that have bits
      set that have been left unconnected.
      
      Both the DT and ACPI (IORT) descriptions of the platform take this into
      account, and only describe a DMA address space of 40 bits (using either
      dma-ranges DT properties, or DMA address limits in IORT named component
      nodes). However, even though our IOMMU and bus layers may take such
      limitations into account by setting a narrower DMA mask when creating
      the platform device, the netsec probe() entrypoint follows the common
      practice of setting the DMA mask uncondionally, according to the
      capabilities of the IP block itself rather than to its integration into
      the chip.
      
      It is currently unclear what the correct fix is here. We could hack around
      it by only setting the DMA mask if it deviates from its default value of
      DMA_BIT_MASK(32). However, this makes it impossible for the bus layer to
      use DMA_BIT_MASK(32) as the bus limit, and so it appears that a more
      comprehensive approach is required to take DMA limits imposed by the
      SoC as a whole into account.
      
      In the mean time, let's limit the DMA mask to 40 bits. Given that there
      is currently only one SoC that incorporates this IP, this is a reasonable
      approach that can be backported to -stable and buys us some time to come
      up with a proper fix going forward.
      
      Fixes: 533dd11a ("net: socionext: Add Synquacer NetSec driver")
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Jassi Brar <jaswinder.singh@linaro.org>
      Cc: Masahisa Kojima <masahisa.kojima@linaro.org>
      Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: NRobin Murphy <robin.murphy@arm.com>
      Acked-by: NJassi Brar <jaswinder.singh@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31256426
    • M
      ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline · bbb40a0b
      Mathieu Xhonneux 提交于
      seg6_do_srh_encap and seg6_do_srh_inline can possibly do an
      out-of-bounds access when adding the SRH to the packet. This no longer
      happen when expanding the skb not only by the size of the SRH (+
      outer IPv6 header), but also by skb->mac_len.
      
      [   53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620
      [   53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674
      
      [   53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90
      [   53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      [   53.796673] Call Trace:
      [   53.796679]  <IRQ>
      [   53.796689]  dump_stack+0x71/0xab
      [   53.796700]  print_address_description+0x6a/0x270
      [   53.796707]  kasan_report+0x258/0x380
      [   53.796715]  ? seg6_do_srh_encap+0x284/0x620
      [   53.796722]  memmove+0x34/0x50
      [   53.796730]  seg6_do_srh_encap+0x284/0x620
      [   53.796741]  ? seg6_do_srh+0x29b/0x360
      [   53.796747]  seg6_do_srh+0x29b/0x360
      [   53.796756]  seg6_input+0x2e/0x2e0
      [   53.796765]  lwtunnel_input+0x93/0xd0
      [   53.796774]  ipv6_rcv+0x690/0x920
      [   53.796783]  ? ip6_input+0x170/0x170
      [   53.796791]  ? eth_gro_receive+0x2d0/0x2d0
      [   53.796800]  ? ip6_input+0x170/0x170
      [   53.796809]  __netif_receive_skb_core+0xcc0/0x13f0
      [   53.796820]  ? netdev_info+0x110/0x110
      [   53.796827]  ? napi_complete_done+0xb6/0x170
      [   53.796834]  ? e1000_clean+0x6da/0xf70
      [   53.796845]  ? process_backlog+0x129/0x2a0
      [   53.796853]  process_backlog+0x129/0x2a0
      [   53.796862]  net_rx_action+0x211/0x5c0
      [   53.796870]  ? napi_complete_done+0x170/0x170
      [   53.796887]  ? run_rebalance_domains+0x11f/0x150
      [   53.796891]  __do_softirq+0x10e/0x39e
      [   53.796894]  do_softirq_own_stack+0x2a/0x40
      [   53.796895]  </IRQ>
      [   53.796898]  do_softirq.part.16+0x54/0x60
      [   53.796900]  __local_bh_enable_ip+0x5b/0x60
      [   53.796903]  ip6_finish_output2+0x416/0x9f0
      [   53.796906]  ? ip6_dst_lookup_flow+0x110/0x110
      [   53.796909]  ? ip6_sk_dst_lookup_flow+0x390/0x390
      [   53.796911]  ? __rcu_read_unlock+0x66/0x80
      [   53.796913]  ? ip6_mtu+0x44/0xf0
      [   53.796916]  ? ip6_output+0xfc/0x220
      [   53.796918]  ip6_output+0xfc/0x220
      [   53.796921]  ? ip6_finish_output+0x2b0/0x2b0
      [   53.796923]  ? memcpy+0x34/0x50
      [   53.796926]  ip6_send_skb+0x43/0xc0
      [   53.796929]  rawv6_sendmsg+0x1216/0x1530
      [   53.796932]  ? __orc_find+0x6b/0xc0
      [   53.796934]  ? rawv6_rcv_skb+0x160/0x160
      [   53.796937]  ? __rcu_read_unlock+0x66/0x80
      [   53.796939]  ? __rcu_read_unlock+0x66/0x80
      [   53.796942]  ? is_bpf_text_address+0x1e/0x30
      [   53.796944]  ? kernel_text_address+0xec/0x100
      [   53.796946]  ? __kernel_text_address+0xe/0x30
      [   53.796948]  ? unwind_get_return_address+0x2f/0x50
      [   53.796950]  ? __save_stack_trace+0x92/0x100
      [   53.796954]  ? save_stack+0x89/0xb0
      [   53.796956]  ? kasan_kmalloc+0xa0/0xd0
      [   53.796958]  ? kmem_cache_alloc+0xd2/0x1f0
      [   53.796961]  ? prepare_creds+0x23/0x160
      [   53.796963]  ? __x64_sys_capset+0x252/0x3e0
      [   53.796966]  ? do_syscall_64+0x69/0x160
      [   53.796968]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   53.796971]  ? __alloc_pages_nodemask+0x170/0x380
      [   53.796973]  ? __alloc_pages_slowpath+0x12c0/0x12c0
      [   53.796977]  ? tty_vhangup+0x20/0x20
      [   53.796979]  ? policy_nodemask+0x1a/0x90
      [   53.796982]  ? __mod_node_page_state+0x8d/0xa0
      [   53.796986]  ? __check_object_size+0xe7/0x240
      [   53.796989]  ? __sys_sendto+0x229/0x290
      [   53.796991]  ? rawv6_rcv_skb+0x160/0x160
      [   53.796993]  __sys_sendto+0x229/0x290
      [   53.796996]  ? __ia32_sys_getpeername+0x50/0x50
      [   53.796999]  ? commit_creds+0x2de/0x520
      [   53.797002]  ? security_capset+0x57/0x70
      [   53.797004]  ? __x64_sys_capset+0x29f/0x3e0
      [   53.797007]  ? __x64_sys_rt_sigsuspend+0xe0/0xe0
      [   53.797011]  ? __do_page_fault+0x664/0x770
      [   53.797014]  __x64_sys_sendto+0x74/0x90
      [   53.797017]  do_syscall_64+0x69/0x160
      [   53.797019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   53.797022] RIP: 0033:0x7f43b7a6714a
      [   53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX:
      000000000000002c
      [   53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a
      [   53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004
      [   53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c
      [   53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
      [   53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661
      
      [   53.797171] Allocated by task 642:
      [   53.797460]  kasan_kmalloc+0xa0/0xd0
      [   53.797463]  kmem_cache_alloc+0xd2/0x1f0
      [   53.797465]  getname_flags+0x40/0x210
      [   53.797467]  user_path_at_empty+0x1d/0x40
      [   53.797469]  do_faccessat+0x12a/0x320
      [   53.797471]  do_syscall_64+0x69/0x160
      [   53.797473]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [   53.797607] Freed by task 642:
      [   53.797869]  __kasan_slab_free+0x130/0x180
      [   53.797871]  kmem_cache_free+0xa8/0x230
      [   53.797872]  filename_lookup+0x15b/0x230
      [   53.797874]  do_faccessat+0x12a/0x320
      [   53.797876]  do_syscall_64+0x69/0x160
      [   53.797878]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [   53.798014] The buggy address belongs to the object at ffff88011975e600
                      which belongs to the cache names_cache of size 4096
      [   53.799043] The buggy address is located 1786 bytes inside of
                      4096-byte region [ffff88011975e600, ffff88011975f600)
      [   53.800013] The buggy address belongs to the page:
      [   53.800414] page:ffffea000465d600 count:1 mapcount:0
      mapping:0000000000000000 index:0x0 compound_mapcount: 0
      [   53.801259] flags: 0x17fff0000008100(slab|head)
      [   53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000
      0000000100070007
      [   53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40
      0000000000000000
      [   53.803787] page dumped because: kasan: bad access detected
      
      [   53.804384] Memory state around the buggy address:
      [   53.804788]  ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb
      fb fb fb fb
      [   53.805384]  ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb
      fb fb fb fb
      [   53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb
      fb fb fb fb
      [   53.806577]                                                                 ^
      [   53.807165]  ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb
      fb fb fb fb
      [   53.807762]  ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb
      fb fb fb fb
      [   53.808356] ==================================================================
      [   53.808949] Disabling lock debugging due to kernel taint
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: NDavid Lebrun <dlebrun@google.com>
      Signed-off-by: NMathieu Xhonneux <m.xhonneux@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbb40a0b
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 513acc5b
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for your net tree:
      
      1) Null pointer dereference when dumping conntrack helper configuration,
         from Taehee Yoo.
      
      2) Missing sanitization in ebtables extension name through compat,
         from Paolo Abeni.
      
      3) Broken fetch of tracing value, from Taehee Yoo.
      
      4) Incorrect arithmetics in packet ratelimiting.
      
      5) Buffer overflow in IPVS sync daemon, from Julian Anastasov.
      
      6) Wrong argument to nla_strlcpy() in nfnetlink_{acct,cthelper},
         from Eric Dumazet.
      
      7) Fix splat in nft_update_chain_stats().
      
      8) Null pointer dereference from object netlink dump path, from
         Taehee Yoo.
      
      9) Missing static_branch_inc() when enabling counters in existing
         chain, from Taehee Yoo.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      513acc5b
    • T
      netfilter: nf_tables: increase nft_counters_enabled in nft_chain_stats_replace() · bbb8c61f
      Taehee Yoo 提交于
      When a chain is updated, a counter can be attached. if so,
      the nft_counters_enabled should be increased.
      
      test commands:
      
         %nft add table ip filter
         %nft add chain ip filter input { type filter hook input priority 4\; }
         %iptables-compat -Z input
         %nft delete chain ip filter input
      
      we can see below messages.
      
      [  286.443720] jump label: negative count!
      [  286.448278] WARNING: CPU: 0 PID: 1459 at kernel/jump_label.c:197 __static_key_slow_dec_cpuslocked+0x6f/0xf0
      [  286.449144] Modules linked in: nf_tables nfnetlink ip_tables x_tables
      [  286.449144] CPU: 0 PID: 1459 Comm: nft Tainted: G        W         4.17.0-rc2+ #12
      [  286.449144] RIP: 0010:__static_key_slow_dec_cpuslocked+0x6f/0xf0
      [  286.449144] RSP: 0018:ffff88010e5176f0 EFLAGS: 00010286
      [  286.449144] RAX: 000000000000001b RBX: ffffffffc0179500 RCX: ffffffffb8a82522
      [  286.449144] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88011b7e5eac
      [  286.449144] RBP: 0000000000000000 R08: ffffed00236fce5c R09: ffffed00236fce5b
      [  286.449144] R10: ffffffffc0179503 R11: ffffed00236fce5c R12: 0000000000000000
      [  286.449144] R13: ffff88011a28e448 R14: ffff88011a28e470 R15: dffffc0000000000
      [  286.449144] FS:  00007f0384328700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [  286.449144] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  286.449144] CR2: 00007f038394bf10 CR3: 0000000104a86000 CR4: 00000000001006f0
      [  286.449144] Call Trace:
      [  286.449144]  static_key_slow_dec+0x6a/0x70
      [  286.449144]  nf_tables_chain_destroy+0x19d/0x210 [nf_tables]
      [  286.449144]  nf_tables_commit+0x1891/0x1c50 [nf_tables]
      [  286.449144]  nfnetlink_rcv+0x1148/0x13d0 [nfnetlink]
      [ ... ]
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      bbb8c61f
    • T
      netfilter: nf_tables: fix NULL-ptr in nf_tables_dump_obj() · 360cc79d
      Taehee Yoo 提交于
      The table field in nft_obj_filter is not an array. In order to check
      tablename, we should check if the pointer is set.
      
      Test commands:
      
         %nft add table ip filter
         %nft add counter ip filter ct1
         %nft reset counters
      
      Splat looks like:
      
      [  306.510504] kasan: CONFIG_KASAN_INLINE enabled
      [  306.516184] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  306.524775] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  306.528284] Modules linked in: nft_objref nft_counter nf_tables nfnetlink ip_tables x_tables
      [  306.528284] CPU: 0 PID: 1488 Comm: nft Not tainted 4.17.0-rc4+ #17
      [  306.528284] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
      [  306.528284] RIP: 0010:nf_tables_dump_obj+0x52c/0xa70 [nf_tables]
      [  306.528284] RSP: 0018:ffff8800b6cb7520 EFLAGS: 00010246
      [  306.528284] RAX: 0000000000000000 RBX: ffff8800b6c49820 RCX: 0000000000000000
      [  306.528284] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffed0016d96e9a
      [  306.528284] RBP: ffff8800b6cb75c0 R08: ffffed00236fce7c R09: ffffed00236fce7b
      [  306.528284] R10: ffffffff9f6241e8 R11: ffffed00236fce7c R12: ffff880111365108
      [  306.528284] R13: 0000000000000000 R14: ffff8800b6c49860 R15: ffff8800b6c49860
      [  306.528284] FS:  00007f838b007700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [  306.528284] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  306.528284] CR2: 00007ffeafabcf78 CR3: 00000000b6cbe000 CR4: 00000000001006f0
      [  306.528284] Call Trace:
      [  306.528284]  netlink_dump+0x470/0xa20
      [  306.528284]  __netlink_dump_start+0x5ae/0x690
      [  306.528284]  ? nf_tables_getobj+0x1b3/0x740 [nf_tables]
      [  306.528284]  nf_tables_getobj+0x2f5/0x740 [nf_tables]
      [  306.528284]  ? nft_obj_notify+0x100/0x100 [nf_tables]
      [  306.528284]  ? nf_tables_getobj+0x740/0x740 [nf_tables]
      [  306.528284]  ? nf_tables_dump_flowtable_done+0x70/0x70 [nf_tables]
      [  306.528284]  ? nft_obj_notify+0x100/0x100 [nf_tables]
      [  306.528284]  nfnetlink_rcv_msg+0x8ff/0x932 [nfnetlink]
      [  306.528284]  ? nfnetlink_rcv_msg+0x216/0x932 [nfnetlink]
      [  306.528284]  netlink_rcv_skb+0x1c9/0x2f0
      [  306.528284]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
      [  306.528284]  ? debug_check_no_locks_freed+0x270/0x270
      [  306.528284]  ? netlink_ack+0x7a0/0x7a0
      [  306.528284]  ? ns_capable_common+0x6e/0x110
      [ ... ]
      
      Fixes: e46abbcc ("netfilter: nf_tables: Allow table names of up to 255 chars")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      360cc79d
    • P
      netfilter: nf_tables: disable preemption in nft_update_chain_stats() · ad9d9e85
      Pablo Neira Ayuso 提交于
      This patch fixes the following splat.
      
      [118709.054937] BUG: using smp_processor_id() in preemptible [00000000] code: test/1571
      [118709.054970] caller is nft_update_chain_stats.isra.4+0x53/0x97 [nf_tables]
      [118709.054980] CPU: 2 PID: 1571 Comm: test Not tainted 4.17.0-rc6+ #335
      [...]
      [118709.054992] Call Trace:
      [118709.055011]  dump_stack+0x5f/0x86
      [118709.055026]  check_preemption_disabled+0xd4/0xe4
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ad9d9e85
  5. 26 5月, 2018 12 次提交
    • L
      Merge branch 'akpm' (patches from Andrew) · bc2dbc54
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        kasan: fix memory hotplug during boot
        kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
        checkpatch: fix macro argument precedence test
        init/main.c: include <linux/mem_encrypt.h>
        kernel/sys.c: fix potential Spectre v1 issue
        mm/memory_hotplug: fix leftover use of struct page during hotplug
        proc: fix smaps and meminfo alignment
        mm: do not warn on offline nodes unless the specific node is explicitly requested
        mm, memory_hotplug: make has_unmovable_pages more robust
        mm/kasan: don't vfree() nonexistent vm_area
        MAINTAINERS: change hugetlbfs maintainer and update files
        ipc/shm: fix shmat() nil address after round-down when remapping
        Revert "ipc/shm: Fix shmat mmap nil-page protection"
        idr: fix invalid ptr dereference on item delete
        ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"
        mm: fix nr_rotate_swap leak in swapon() error case
      bc2dbc54
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 03250e10
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "Let's begin the holiday weekend with some networking fixes:
      
         1) Whoops need to restrict cfg80211 wiphy names even more to 64
            bytes. From Eric Biggers.
      
         2) Fix flags being ignored when using kernel_connect() with SCTP,
            from Xin Long.
      
         3) Use after free in DCCP, from Alexey Kodanev.
      
         4) Need to check rhltable_init() return value in ipmr code, from Eric
            Dumazet.
      
         5) XDP handling fixes in virtio_net from Jason Wang.
      
         6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
      
         7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
            Morgenstein.
      
         8) Prevent out-of-bounds speculation using indexes in BPF, from
            Daniel Borkmann.
      
         9) Fix regression added by AF_PACKET link layer cure, from Willem de
            Bruijn.
      
        10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
      
        11) Missing config options for PMTU tests, from Stefano Brivio"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
        ibmvnic: Fix partial success login retries
        selftests/net: Add missing config options for PMTU tests
        mlx4_core: allocate ICM memory in page size chunks
        enic: set DMA mask to 47 bit
        ppp: remove the PPPIOCDETACH ioctl
        ipv4: remove warning in ip_recv_error
        net : sched: cls_api: deal with egdev path only if needed
        vhost: synchronize IOTLB message with dev cleanup
        packet: fix reserve calculation
        net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
        net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
        bpf: properly enforce index mask to prevent out-of-bounds speculation
        net/mlx4: Fix irq-unsafe spinlock usage
        net: phy: broadcom: Fix bcm_write_exp()
        net: phy: broadcom: Fix auxiliary control register reads
        net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
        net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
        ibmvnic: Only do H_EOI for mobility events
        tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
        virtio-net: fix leaking page for gso packet during mergeable XDP
        ...
      03250e10
    • D
      kasan: fix memory hotplug during boot · 3f195972
      David Hildenbrand 提交于
      Using module_init() is wrong.  E.g.  ACPI adds and onlines memory before
      our memory notifier gets registered.
      
      This makes sure that ACPI memory detected during boot up will not result
      in a kernel crash.
      
      Easily reproducible with QEMU, just specify a DIMM when starting up.
      
      Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
      Fixes: 786a8959 ("kasan: disable memory hotplug")
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f195972
    • D
      kasan: free allocated shadow memory on MEM_CANCEL_ONLINE · ed1596f9
      David Hildenbrand 提交于
      We have to free memory again when we cancel onlining, otherwise a later
      onlining attempt will fail.
      
      Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com
      Fixes: fa69b598 ("mm/kasan: add support for memory hotplug")
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed1596f9
    • J
      checkpatch: fix macro argument precedence test · d41362ed
      Joe Perches 提交于
      checkpatch's macro argument precedence test is broken so fix it.
      
      Link: http://lkml.kernel.org/r/5dd900e9197febc1995604bb33c23c136d8b33ce.camel@perches.comSigned-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d41362ed
    • M
      init/main.c: include <linux/mem_encrypt.h> · ae67d58d
      Mathieu Malaterre 提交于
      In commit c7753208 ("x86, swiotlb: Add memory encryption support") a
      call to function `mem_encrypt_init' was added.  Include prototype
      defined in header <linux/mem_encrypt.h> to prevent a warning reported
      during compilation with W=1:
      
        init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes]
      
      Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.orgSigned-off-by: NMathieu Malaterre <malat@debian.org>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Gargi Sharma <gs051095@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ae67d58d
    • G
      kernel/sys.c: fix potential Spectre v1 issue · 23d6aef7
      Gustavo A. R. Silva 提交于
      `resource' can be controlled by user-space, hence leading to a potential
      exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
        kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
        kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
      
      Fix this by sanitizing *resource* before using it to index
      current->signal->rlim
      
      Notice that given that speculation windows are large, the policy is to
      kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
      
      Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.comSigned-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23d6aef7
    • J
      mm/memory_hotplug: fix leftover use of struct page during hotplug · a2155861
      Jonathan Cameron 提交于
      The case of a new numa node got missed in avoiding using the node info
      from page_struct during hotplug.  In this path we have a call to
      register_mem_sect_under_node (which allows us to specify it is hotplug
      so don't change the node), via link_mem_sections which unfortunately
      does not.
      
      Fix is to pass check_nid through link_mem_sections as well and disable
      it in the new numa node path.
      
      Note the bug only 'sometimes' manifests depending on what happens to be
      in the struct page structures - there are lots of them and it only needs
      to match one of them.
      
      The result of the bug is that (with a new memory only node) we never
      successfully call register_mem_sect_under_node so don't get the memory
      associated with the node in sysfs and meminfo for the node doesn't
      report it.
      
      It came up whilst testing some arm64 hotplug patches, but appears to be
      universal.  Whilst I'm triggering it by removing then reinserting memory
      to a node with no other elements (thus making the node disappear then
      appear again), it appears it would happen on hotplugging memory where
      there was none before and it doesn't seem to be related the arm64
      patches.
      
      These patches call __add_pages (where most of the issue was fixed by
      Pavel's patch).  If there is a node at the time of the __add_pages call
      then all is well as it calls register_mem_sect_under_node from there
      with check_nid set to false.  Without a node that function returns
      having not done the sysfs related stuff as there is no node to use.
      This is expected but it is the resulting path that fails...
      
      Exact path to the problem is as follows:
      
       mm/memory_hotplug.c: add_memory_resource()
      
         The node is not online so we enter the 'if (new_node)' twice, on the
         second such block there is a call to link_mem_sections which calls
         into
      
        drivers/node.c: link_mem_sections() which calls
      
        drivers/node.c: register_mem_sect_under_node() which calls
           get_nid_for_pfn and keeps trying until the output of that matches
           the expected node (passed all the way down from
           add_memory_resource)
      
      It is effectively the same fix as the one referred to in the fixes tag
      just in the code path for a new node where the comments point out we
      have to rerun the link creation because it will have failed in
      register_new_memory (as there was no node at the time).  (actually that
      comment is wrong now as we don't have register_new_memory any more it
      got renamed to hotplug_memory_register in Pavel's patch).
      
      Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
      Fixes: fc44f7f9 ("mm/memory_hotplug: don't read nid from struct page during hotplug")
      Signed-off-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a2155861
    • H
      proc: fix smaps and meminfo alignment · 6c04ab0e
      Hugh Dickins 提交于
      The 4.17-rc /proc/meminfo and /proc/<pid>/smaps look ugly: single-digit
      numbers (commonly 0) are misaligned.
      
      Remove seq_put_decimal_ull_width()'s leftover optimization for single
      digits: it's wrong now that num_to_str() takes care of the width.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805241554210.1326@eggly.anvils
      Fixes: d1be35cb ("proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6c04ab0e
    • M
      mm: do not warn on offline nodes unless the specific node is explicitly requested · 8addc2d0
      Michal Hocko 提交于
      Oscar has noticed that we splat
      
         WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9
         [...]
         CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G        W   E     4.17.0-rc5-next-20180517-1-default+ #66
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
         Workqueue: kacpi_hotplug acpi_hotplug_work_fn
         Call Trace:
          vmemmap_populate+0xf2/0x2ae
          sparse_mem_map_populate+0x28/0x35
          sparse_add_one_section+0x4c/0x187
          __add_pages+0xe7/0x1a0
          add_pages+0x16/0x70
          add_memory_resource+0xa3/0x1d0
          add_memory+0xe4/0x110
          acpi_memory_device_add+0x134/0x2e0
          acpi_bus_attach+0xd9/0x190
          acpi_bus_scan+0x37/0x70
          acpi_device_hotplug+0x389/0x4e0
          acpi_hotplug_work_fn+0x1a/0x30
          process_one_work+0x146/0x340
          worker_thread+0x47/0x3e0
          kthread+0xf5/0x130
          ret_from_fork+0x35/0x40
      
      when adding memory to a node that is currently offline.
      
      The VM_WARN_ON is just too loud without a good reason.  In this
      particular case we are doing
      
      	alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order)
      
      so we do not insist on allocating from the given node (it is more a
      hint) so we can fall back to any other populated node and moreover we
      explicitly ask to not warn for the allocation failure.
      
      Soften the warning only to cases when somebody asks for the given node
      explicitly by __GFP_THISNODE.
      
      Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NOscar Salvador <osalvador@techadventures.net>
      Tested-by: NOscar Salvador <osalvador@techadventures.net>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8addc2d0
    • M
      mm, memory_hotplug: make has_unmovable_pages more robust · 15c30bc0
      Michal Hocko 提交于
      Oscar has reported:
      : Due to an unfortunate setting with movablecore, memblocks containing bootmem
      : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
      : So while trying to remove that memory, the system failed in do_migrate_range
      : and __offline_pages never returned.
      :
      : This can be reproduced by running
      : qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M
      : and movablecore=4G kernel command line
      :
      : linux kernel: BIOS-provided physical RAM map:
      : linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
      : linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
      : linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
      : linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
      : linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
      : linux kernel: NX (Execute Disable) protection: active
      : linux kernel: SMBIOS 2.8 present.
      : linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
      : linux kernel: Hypervisor detected: KVM
      : linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
      : linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
      : linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
      :
      : linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
      : linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
      : linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
      : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
      : linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
      : linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
      : linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
      : linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
      :
      : zoneinfo shows that the zone movable is placed into both numa nodes:
      : Node 0, zone  Movable
      :   pages free     160140
      :         min      1823
      :         low      2278
      :         high     2733
      :         spanned  262144
      :         present  262144
      :         managed  245670
      : Node 1, zone  Movable
      :   pages free     448427
      :         min      3827
      :         low      4783
      :         high     5739
      :         spanned  524288
      :         present  524288
      :         managed  515766
      
      Note how only Node 0 has a hutplugable memory region which would rule it
      out from the early memblock allocations (most likely memmap).  Node1
      will surely contain memmaps on the same node and those would prevent
      offlining to succeed.  So this is arguably a configuration issue.
      Although one could argue that we should be more clever and rule early
      allocations from the zone movable.  This would be correct but probably
      not worth the effort considering what a hack movablecore is.
      
      Anyway, We could do better for those cases though.  We rely on
      start_isolate_page_range resp.  has_unmovable_pages to do their job.
      The first one isolates the whole range to be offlined so that we do not
      allocate from it anymore and the later makes sure we are not stumbling
      over non-migrateable pages.
      
      has_unmovable_pages is overly optimistic, however.  It doesn't check all
      the pages if we are withing zone_movable because we rely that those
      pages will be always migrateable.  As it turns out we are still not
      perfect there.  While bootmem pages in zonemovable sound like a clear
      bug which should be fixed let's remove the optimization for now and warn
      if we encounter unmovable pages in zone_movable in the meantime.  That
      should help for now at least.
      
      Btw.  this wasn't a real problem until commit 72b39cfc ("mm,
      memory_hotplug: do not fail offlining too early") because we used to
      have a small number of retries and then failed.  This turned out to be
      too fragile though.
      
      Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NOscar Salvador <osalvador@techadventures.net>
      Tested-by: NOscar Salvador <osalvador@techadventures.net>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15c30bc0
    • A
      mm/kasan: don't vfree() nonexistent vm_area · 0f901dcb
      Andrey Ryabinin 提交于
      KASAN uses different routines to map shadow for hot added memory and
      memory obtained in boot process.  Attempt to offline memory onlined by
      normal boot process leads to this:
      
          Trying to vfree() nonexistent vm area (000000005d3b34b9)
          WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190
      
          Call Trace:
           kasan_mem_notifier+0xad/0xb9
           notifier_call_chain+0x166/0x260
           __blocking_notifier_call_chain+0xdb/0x140
           __offline_pages+0x96a/0xb10
           memory_subsys_offline+0x76/0xc0
           device_offline+0xb8/0x120
           store_mem_state+0xfa/0x120
           kernfs_fop_write+0x1d5/0x320
           __vfs_write+0xd4/0x530
           vfs_write+0x105/0x340
           SyS_write+0xb0/0x140
      
      Obviously we can't call vfree() to free memory that wasn't allocated via
      vmalloc().  Use find_vm_area() to see if we can call vfree().
      
      Unfortunately it's a bit tricky to properly unmap and free shadow
      allocated during boot, so we'll have to keep it.  If memory will come
      online again that shadow will be reused.
      
      Matthew asked: how can you call vfree() on something that isn't a
      vmalloc address?
      
        vfree() is able to free any address returned by
        __vmalloc_node_range().  And __vmalloc_node_range() gives you any
        address you ask.  It doesn't have to be an address in [VMALLOC_START,
        VMALLOC_END] range.
      
        That's also how the module_alloc()/module_memfree() works on
        architectures that have designated area for modules.
      
      [aryabinin@virtuozzo.com: improve comments]
        Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
      [akpm@linux-foundation.org: fix typos, reflow comment]
      Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
      Fixes: fa69b598 ("mm/kasan: add support for memory hotplug")
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: NPaul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f901dcb