1. 31 Mar, 2020 3 commits
    • Y
      net: hns3: drop the WQ_MEM_RECLAIM flag when allocating WQ · 16deaef2
      Committed by Yunsheng Lin
      The WQ in the hns3 driver is allocated with the WQ_MEM_RECLAIM flag
      in order to guarantee forward progress, which may cause the
      "hns3's WQ_MEM_RECLAIM WQ flushing infiniband's !WQ_MEM_RECLAIM WQ"
      warning:
      
      [11246.200168] hns3 0000:bd:00.1: Reset done, hclge driver initialization finished.
      [11246.209979] hns3 0000:bd:00.1 eth7: net open
      [11246.227608] ------------[ cut here ]------------
      [11246.237370] workqueue: WQ_MEM_RECLAIM hclge:hclge_service_task [hclge] is flushing !WQ_MEM_RECLAIM infiniband:0x0
      [11246.237391] WARNING: CPU: 50 PID: 2279 at ./kernel/workqueue.c:2605 check_flush_dependency+0xcc/0x140
      [11246.260412] Modules linked in: hclgevf hns_roce_hw_v2 rdma_test(O) hns3 xt_CHECKSUM iptable_mangle xt_conntrack ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vfio_iommu_type1 vfio_pci vfio_virqfd vfio ib_isert iscsi_target_mod ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi aes_ce_blk crypto_simd cryptd aes_ce_cipher sunrpc nls_iso8859_1 crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce joydev input_leds hid_generic usbkbd usbmouse sbsa_gwdt usbhid usb_storage hid ses hclge hisi_zip hisi_hpre hisi_sec2 hnae3 hisi_qm ahci hisi_trng_v2 evbug uacce rng_core gpio_dwapb autofs4 hisi_sas_v3_hw megaraid_sas hisi_sas_main libsas scsi_transport_sas [last unloaded: hns_roce_hw_v2]
      [11246.325742] CPU: 50 PID: 2279 Comm: kworker/50:0 Kdump: loaded Tainted: G           O      5.4.0-rc4+ #1
      [11246.335181] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V3.B140.01 12/18/2019
      [11246.344802] Workqueue: hclge hclge_service_task [hclge]
      [11246.350007] pstate: 60c00009 (nZCv daif +PAN +UAO)
      [11246.354779] pc : check_flush_dependency+0xcc/0x140
      [11246.359549] lr : check_flush_dependency+0xcc/0x140
      [11246.364317] sp : ffff800268a73990
      [11246.367618] x29: ffff800268a73990 x28: 0000000000000001
      [11246.372907] x27: ffffcbe4f5868000 x26: ffffcbe4f5541000
      [11246.378196] x25: 00000000000000b8 x24: ffff002fdd0ff868
      [11246.383483] x23: ffff002fdd0ff800 x22: ffff2027401ba600
      [11246.388770] x21: 0000000000000000 x20: ffff002fdd0ff800
      [11246.394059] x19: ffff202719293b00 x18: ffffcbe4f5541948
      [11246.399347] x17: 000000006f8ad8dd x16: 0000000000000002
      [11246.404634] x15: ffff8002e8a734f7 x14: 6c66207369205d65
      [11246.409922] x13: 676c63685b206b73 x12: 61745f6563697672
      [11246.415208] x11: 65735f65676c6368 x10: 3a65676c6368204d
      [11246.420494] x9 : 49414c4345525f4d x8 : 6e6162696e69666e
      [11246.425782] x7 : 69204d49414c4345 x6 : ffffcbe4f5765145
      [11246.431068] x5 : 0000000000000000 x4 : 0000000000000000
      [11246.436355] x3 : 0000000000000030 x2 : 00000000ffffffff
      [11246.441642] x1 : 3349eb1ac5310100 x0 : 0000000000000000
      [11246.446928] Call trace:
      [11246.449363]  check_flush_dependency+0xcc/0x140
      [11246.453785]  flush_workqueue+0x110/0x410
      [11246.457691]  ib_cache_cleanup_one+0x54/0x468
      [11246.461943]  __ib_unregister_device+0x70/0xa8
      [11246.466279]  ib_unregister_device+0x2c/0x40
      [11246.470455]  hns_roce_exit+0x34/0x198 [hns_roce_hw_v2]
      [11246.475571]  __hns_roce_hw_v2_uninit_instance.isra.56+0x3c/0x58 [hns_roce_hw_v2]
      [11246.482934]  hns_roce_hw_v2_reset_notify+0xd8/0x210 [hns_roce_hw_v2]
      [11246.489261]  hclge_notify_roce_client+0x84/0xe0 [hclge]
      [11246.494464]  hclge_reset_rebuild+0x60/0x730 [hclge]
      [11246.499320]  hclge_reset_service_task+0x400/0x5a0 [hclge]
      [11246.504695]  hclge_service_task+0x54/0x698 [hclge]
      [11246.509464]  process_one_work+0x15c/0x458
      [11246.513454]  worker_thread+0x144/0x520
      [11246.517186]  kthread+0xfc/0x128
      [11246.520314]  ret_from_fork+0x10/0x18
      [11246.523873] ---[ end trace eb980723699c2585 ]---
      [11246.528710] hns3 0000:bd:00.2: Func clear success after reset.
      [11246.528747] hns3 0000:bd:00.0: Func clear success after reset.
      [11246.907710] hns3 0000:bd:00.1 eth7: link up
      
      According to [1] and [2]:
      
      There seems to be no specific guidance yet on how to handle the
      forward progress guarantee of a network device's WQ, and other
      network devices' WQs seem to be marked with WQ_MEM_RECLAIM without
      a clear reason.

      So this patch removes the WQ_MEM_RECLAIM flag when allocating the WQ
      to avoid the above warning.
      
      [1] https://www.spinics.net/lists/netdev/msg631646.html
      [2] https://www.spinics.net/lists/netdev/msg632097.html
      
      Fixes: 0ea68902 ("net: hns3: allocate WQ with WQ_MEM_RECLAIM flag")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      16deaef2
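The warning in the log above comes from a dependency check made at flush time. As a rough userspace sketch (the struct, the helper name, and the flag value are illustrative assumptions, not the kernel's actual definitions, and the real check has further conditions), the rule can be modelled like this:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel flag; the value is an assumption. */
#define WQ_MEM_RECLAIM (1u << 3)

/* Hypothetical, simplified model of a workqueue. */
struct wq_model {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when a flush would be flagged: a worker running on a
 * WQ_MEM_RECLAIM queue waits on a queue that lacks that guarantee.
 * current_wq is NULL when the caller is not a workqueue worker.
 */
static bool flush_would_warn(const struct wq_model *current_wq,
			     const struct wq_model *target_wq)
{
	if (current_wq == NULL)
		return false;
	return (current_wq->flags & WQ_MEM_RECLAIM) &&
	       !(target_wq->flags & WQ_MEM_RECLAIM);
}
```

Under this model, dropping WQ_MEM_RECLAIM from the flushing queue makes the check pass, which matches the approach the commit message describes.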
    • F
      net: fix fraglist segmentation reference count leak · cf673ed0
      Committed by Florian Westphal
      Xin Long says:
       On udp rx path udp_rcv_segment() may do segment where the frag skbs
       will get the header copied from the head skb in skb_segment_list()
       by calling __copy_skb_header(), which could overwrite the frag skbs'
       extensions by __skb_ext_copy() and cause a leak.
      
       This issue was found after loading esp_offload where a sec path ext
       is set in the skb.
      
      Fix this by discarding head state of the fraglist skb before replacing
      its contents.
      
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Tested-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf673ed0
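The leak described in the commit above follows a common pattern: a pointer to a refcounted object is overwritten before the old reference is dropped. A minimal userspace sketch of that pattern, with hypothetical types and helpers that only stand in for skbs and their extensions:

```c
#include <stddef.h>

/* Hypothetical refcounted object, standing in for skb extensions. */
struct ext_model {
	int refcnt;
};

/* Hypothetical packet that owns one reference to an extension. */
struct pkt_model {
	struct ext_model *ext;
};

static void ext_get(struct ext_model *e) { if (e) e->refcnt++; }
static void ext_put(struct ext_model *e) { if (e) e->refcnt--; }

/* Leaky copy: overwrites dst->ext without dropping dst's old reference. */
static void copy_header_leaky(struct pkt_model *dst, const struct pkt_model *src)
{
	dst->ext = src->ext;
	ext_get(dst->ext);
}

/* Fixed copy: discard the state dst already holds, then copy. */
static void copy_header_fixed(struct pkt_model *dst, const struct pkt_model *src)
{
	ext_put(dst->ext);
	dst->ext = NULL;
	copy_header_leaky(dst, src);
}
```

In the leaky variant the old extension keeps a nonzero count although nothing points to it any more; the fixed variant releases it first, which is the shape of the fix the commit message describes.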
    • X
      udp: initialize is_flist with 0 in udp_gro_receive · bde1b56f
      Committed by Xin Long
      Without NAPI_GRO_CB(skb)->is_flist initialized, when the dev doesn't
      support NETIF_F_GRO_FRAGLIST, is_flist can still be set and fraglist
      will be used in udp_gro_receive().
      
      So fix it by initializing is_flist with 0 in udp_gro_receive.
      
      Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bde1b56f
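The commit above is an instance of a stale-flag bug in a reused scratch area. A small sketch of the fixed pattern, using a hypothetical control-block struct and function rather than the real NAPI_GRO_CB() layout:

```c
#include <stdbool.h>

/* Hypothetical per-packet scratch area that is reused without clearing. */
struct gro_cb_model {
	int is_flist;
};

/*
 * Sketch of the fixed pattern: clear the flag first, then set it only
 * when the device advertises fraglist support.
 */
static void gro_receive_model(struct gro_cb_model *cb, bool dev_has_fraglist)
{
	cb->is_flist = 0;
	if (dev_has_fraglist)
		cb->is_flist = 1;
}
```

Without the unconditional assignment, a value left over from a previous packet would survive whenever the feature test is false.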
  2. 30 Mar, 2020 14 commits
    • W
      net, ip_tunnel: fix interface lookup with no key · 25629fda
      Committed by William Dauchy
      When creating a new ipip interface with no local/remote configuration,
      the lookup is done with the TUNNEL_NO_KEY flag, making it impossible to
      match the new interface (the only possible match being the fallback or
      metadata case interface); e.g. `ip link add tunl1 type ipip dev eth0`

      To fix this case, add a flag check before the key comparison so that
      an interface with no local/remote config can be matched; this also avoids
      breaking possible userland tools relying on the TUNNEL_NO_KEY flag and
      an uninitialised key.

      For context, I am creating an extra ipip interface attached
      to the physical one and moving it to a dedicated namespace.
      
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: William Dauchy <w.dauchy@criteo.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25629fda
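The "flag check before the key comparison" described in the commit above can be sketched as follows. The struct, the flag constant, and the function name are hypothetical simplifications for illustration; the real lookup checks more conditions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_KEY_FLAG 0x1u /* hypothetical stand-in for TUNNEL_NO_KEY */

/* Hypothetical, reduced view of a tunnel's parameters. */
struct tunnel_model {
	uint32_t key;
	uint32_t saddr;
	uint32_t daddr;
};

/*
 * Match a tunnel that has no local/remote address configured. The key is
 * compared only when the lookup actually carries a key.
 */
static bool wildcard_match(const struct tunnel_model *t,
			   unsigned int flags, uint32_t key)
{
	if (!(flags & NO_KEY_FLAG) && t->key != key)
		return false;
	return t->saddr == 0 && t->daddr == 0;
}
```

With the flag test in place, a keyless lookup no longer fails just because the tunnel's stored key field holds an arbitrary value.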
    • M
      sctp: fix possibly using a bad saddr with a given dst · 582eea23
      Committed by Marcelo Ricardo Leitner
      Under certain circumstances, depending on the order of addresses on the
      interfaces, it could be that sctp_v[46]_get_dst() would return a dst
      with a mismatched struct flowi.
      
      For example, when walking through the bind addresses, if the first
      one is not a match, it saves the dst as a fallback (added in
      410f0383), but not the flowi. Then, if the next one is also not a
      match, the previous dst is returned but with the flowi information
      for the 2nd address, which is wrong.
      
      The fix is to use a locally stored flowi that can be used for such
      attempts, and copy it to the parameter only in case it is a possible
      match, together with the corresponding dst entry.
      
      The patch updates the IPv6 code mostly just to keep it in sync. Even though the issue
      is also present there, the fallback is not expected to work with IPv6.
      
      Fixes: 410f0383 ("sctp: add routing output fallback")
      Reported-by: Jin Meng <meng.a.jin@nokia-sbell.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      582eea23
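The fix described above keeps the returned route and its flow information paired by working on a local copy. A minimal sketch of that idea, with hypothetical types and a hypothetical helper that are not the SCTP code itself:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for struct flowi and a dst entry. */
struct flow_model { int saddr; };
struct dst_model { int id; };

/* One candidate per bind address: the route found and whether it matches. */
struct candidate {
	int saddr;
	struct dst_model *dst;
	bool exact;
};

/*
 * Walk the candidates with a locally stored flow. The caller's flow is
 * written only together with the dst that will actually be returned.
 */
static struct dst_model *pick_dst(const struct candidate *c, size_t n,
				  struct flow_model *out)
{
	struct dst_model *fallback = NULL;

	for (size_t i = 0; i < n; i++) {
		struct flow_model local = { c[i].saddr };

		if (c[i].dst == NULL)
			continue;
		if (c[i].exact) {
			*out = local;
			return c[i].dst;
		}
		if (fallback == NULL) {
			fallback = c[i].dst;
			*out = local; /* keep flow and fallback dst in sync */
		}
	}
	return fallback;
}
```

If the loop wrote into the caller's flow on every iteration instead, two non-matching addresses would leave the first dst paired with the second address, which is the mismatch the commit message reports.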
    • Q
      sctp: fix refcount bug in sctp_wfree · 5c3e82fe
      Committed by Qiujun Huang
      We should iterate over the datamsgs to move
      all chunks (skbs) to newsk.

      The following case causes the bug:
      the problematic SKB was in the outq->transmitted list
      
      sctp_outq_sack
              sctp_check_transmitted
                      SKB was moved to outq->sacked list
              then throw away the sack queue
                      SKB was deleted from outq->sacked
      (but it was held by datamsg at sctp_datamsg_to_asoc
      So, sctp_wfree was not called here)
      
      then migrate happened
      
              sctp_for_each_tx_datachunk(
              sctp_clear_owner_w);
              sctp_assoc_migrate();
              sctp_for_each_tx_datachunk(
              sctp_set_owner_w);
      SKB was not in the outq, and was not changed to newsk
      
      finally
      
      __sctp_outq_teardown
              sctp_chunk_put (for another skb)
                      sctp_datamsg_put
                              __kfree_skb(msg->frag_list)
                                      sctp_wfree (for SKB)
      	SKB->sk was still oldsk (skb->sk != asoc->base.sk).
      
      Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com
      Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5c3e82fe
    • Q
      ipv4: fix a RCU-list lock in fib_triestat_seq_show · fbe4e0c1
      Committed by Qian Cai
      Calling hlist_for_each_entry_rcu(tb, head, tb_hlist) in
      fib_triestat_seq_show() without rcu_read_lock() triggers a warning:
      
       net/ipv4/fib_trie.c:2579 RCU-list traversed in non-reader section!!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by proc01/115277:
        #0: c0000014507acf00 (&p->lock){+.+.}-{3:3}, at: seq_read+0x58/0x670
      
       Call Trace:
        dump_stack+0xf4/0x164 (unreliable)
        lockdep_rcu_suspicious+0x140/0x164
        fib_triestat_seq_show+0x750/0x880
        seq_read+0x1a0/0x670
        proc_reg_read+0x10c/0x1b0
        __vfs_read+0x3c/0x70
        vfs_read+0xac/0x170
        ksys_read+0x7c/0x140
        system_call+0x5c/0x68
      
      Fix it by adding a pair of rcu_read_lock/unlock() and using
      cond_resched_rcu() to avoid the situation where walking a large
      number of items may prevent scheduling for a long time.

      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fbe4e0c1
    • J
      mac80211: fix authentication with iwlwifi/mvm · be8c827f
      Committed by Johannes Berg
      The original patch didn't copy the ieee80211_is_data() condition
      because on most drivers the management frames don't go through
      this path. However, they do on iwlwifi/mvm, so we do need to keep
      the condition here.
      
      Cc: stable@vger.kernel.org
      Fixes: ce2e1ca7 ("mac80211: Check port authorization in the ieee80211_tx_dequeue() case")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be8c827f
    • L
      Linux 5.6 · 7111951b
      Committed by Linus Torvalds
      7111951b
    • L
      Merge branch 'akpm' (patches from Andrew) · 570203ec
      Committed by Linus Torvalds
      Merge vm fixes from Andrew Morton:
       "5 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/sparse: fix kernel crash with pfn_section_valid check
        mm: fork: fix kernel_stack memcg stats for various stack implementations
        hugetlb_cgroup: fix illegal access to memory
        drivers/base/memory.c: indicate all memory blocks as removable
        mm/swapfile.c: move inode_lock out of claim_swapfile
      570203ec
    • L
      Merge tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ab93e984
      Committed by Linus Torvalds
      Pull timer fix from Thomas Gleixner:
       "A single fix for the Hyper-V clocksource driver to make sched clock
        actually return nanoseconds and not the virtual clock value which
        increments at 10e7 HZ (100ns)"
      
      * tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/hyper-v: Make sched clock return nanoseconds correctly
      ab93e984
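The pull message above concerns a clock that counts in 100 ns units while the scheduler clock is expected to report nanoseconds. A 100 ns period corresponds to 10^7 ticks per second, so the conversion is a multiplication by 100. The helper below is a hypothetical illustration of that arithmetic, not the driver's code:

```c
#include <stdint.h>

/* Convert a count of 100 ns ticks (a 10 MHz reference) to nanoseconds. */
static uint64_t ticks_100ns_to_ns(uint64_t ticks)
{
	return ticks * 100u;
}
```

For example, one second of the reference clock is 10,000,000 ticks, which this helper maps to 1,000,000,000 ns.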
    • L
      Merge tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 01af08bd
      Committed by Linus Torvalds
      Pull irq fix from Thomas Gleixner:
       "A single bugfix to prevent reference leaks in irq affinity notifiers"
      
      * tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Fix reference leaks on irq affinity notifiers
      01af08bd
    • A
      mm/sparse: fix kernel crash with pfn_section_valid check · b943f045
      Committed by Aneesh Kumar K.V
      Fix a crash like this:
      
          BUG: Kernel NULL pointer dereference on read at 0x00000000
          Faulting instruction address: 0xc000000000c3447c
          Oops: Kernel access of bad area, sig: 11 [#1]
          LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
          CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
          ...
          NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
          LR [c000000000088354] vmemmap_free+0x144/0x320
          Call Trace:
             section_deactivate+0x220/0x240
             __remove_pages+0x118/0x170
             arch_remove_memory+0x3c/0x150
             memunmap_pages+0x1cc/0x2f0
             devm_action_release+0x30/0x50
             release_nodes+0x2f8/0x3e0
             device_release_driver_internal+0x168/0x270
             unbind_store+0x130/0x170
             drv_attr_store+0x44/0x60
             sysfs_kf_write+0x68/0x80
             kernfs_fop_write+0x100/0x290
             __vfs_write+0x3c/0x70
             vfs_write+0xcc/0x240
             ksys_write+0x7c/0x140
             system_call+0x5c/0x68
      
      The crash is due to NULL dereference at
      
      	test_bit(idx, ms->usage->subsection_map);
      
      because ms->usage is NULL in pfn_section_valid().

      With commit d41e2f3b ("mm/hotplug: fix hot remove failure in
      SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
      depopulate_section_memmap().  This was done so that pfn_to_page() can work
      correctly with a kernel config that disables SPARSEMEM_VMEMMAP.  With that
      config pfn_to_page() does
      
      	__section_mem_map_addr(__sec) + __pfn;
      
      where
      
        static inline struct page *__section_mem_map_addr(struct mem_section *section)
        {
      	unsigned long map = section->section_mem_map;
      	map &= SECTION_MAP_MASK;
      	return (struct page *)map;
        }
      
      Now with SPARSEMEM_VMEMMAP enabled, mem_section->usage->subsection_map is
      used to check the pfn validity (pfn_valid()).  Since section_deactivate()
      releases mem_section->usage if a section is fully deactivated, a
      pfn_valid() check after a subsection is deactivated causes a kernel crash.
      
        static inline int pfn_valid(unsigned long pfn)
        {
        ...
      	return early_section(ms) || pfn_section_valid(ms, pfn);
        }
      
      where
      
        static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
        {
      	int idx = subsection_map_index(pfn);
      
      	return test_bit(idx, ms->usage->subsection_map);
        }
      
      Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
      freed.  For architectures like ppc64 where large pages are used for
      the vmemmap mapping (16MB), a specific vmemmap mapping can cover multiple
      sections.  Hence before a vmemmap mapping page can be freed, the kernel
      needs to make sure there are no valid sections within that mapping.
      Clearing the section valid bit before depopulate_section_memmap() enables
      this.
      
      [aneesh.kumar@linux.ibm.com: add comment]
        Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.com
      Link: http://lkml.kernel.org/r/20200325031914.107660-1-aneesh.kumar@linux.ibm.com
      Fixes: d41e2f3b ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
      Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b943f045
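The ordering argument in the commit above (clear the "valid" bit so that nothing dereferences the freed usage pointer) can be modelled in userspace. The types, the flag constant, and the function names below are hypothetical simplifications, not the kernel's definitions:

```c
#include <stdbool.h>
#include <stddef.h>

#define HAS_MEM_MAP 0x1u /* hypothetical stand-in for SECTION_HAS_MEM_MAP */

/* Hypothetical, reduced stand-ins for the section bookkeeping. */
struct usage_model {
	unsigned long subsection_map;
};

struct section_model {
	unsigned long flags;
	struct usage_model *usage;
};

/* Validity check that touches usage only when the section is marked valid. */
static bool pfn_valid_model(const struct section_model *ms, unsigned int idx)
{
	if (!(ms->flags & HAS_MEM_MAP))
		return false;
	return (ms->usage->subsection_map >> idx) & 1ul;
}

/* Deactivation as in the fix: clear the flag when usage is dropped. */
static void deactivate_model(struct section_model *ms)
{
	ms->flags &= ~(unsigned long)HAS_MEM_MAP;
	ms->usage = NULL;
}
```

After deactivate_model(), the validity check returns false before it reaches the NULL usage pointer, which is the behaviour the fix aims for.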
    • R
      mm: fork: fix kernel_stack memcg stats for various stack implementations · 8380ce47
      Committed by Roman Gushchin
      Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the
      space for task stacks can be allocated using __vmalloc_node_range(),
      alloc_pages_node() and kmem_cache_alloc_node().
      
      In the first and the second cases page->mem_cgroup pointer is set, but
      in the third it's not: memcg membership of a slab page should be
      determined using the memcg_from_slab_page() function, which looks at
      page->slab_cache->memcg_params.memcg .  In this case, using
      mod_memcg_page_state() (as in account_kernel_stack()) is incorrect:
      page->mem_cgroup pointer is NULL even for pages charged to a non-root
      memory cgroup.
      
      It can lead to kernel_stack per-memcg counters permanently showing 0 on
      some architectures (depending on the configuration).
      
      In order to fix it, let's introduce a mod_memcg_obj_state() helper,
      which takes a pointer to a kernel object as its first argument, uses
      mem_cgroup_from_obj() to get an RCU-protected memcg pointer and calls
      mod_memcg_state().  This makes it possible to handle all configurations
      (CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
      spilling any memcg/kmem specifics into fork.c.
      
      Note: This is a special version of the patch created for stable
      backports.  It contains code from the following two patches:
        - mm: memcg/slab: introduce mem_cgroup_from_obj()
        - mm: fork: fix kernel_stack memcg stats for various stack implementations
      
      [guro@fb.com: introduce mem_cgroup_from_obj()]
        Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com
      Fixes: 4d96ba35 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Bharata B Rao <bharata@linux.ibm.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8380ce47
    • M
      hugetlb_cgroup: fix illegal access to memory · 726b7bbe
      Committed by Mina Almasry
      This appears to be a mistake in commit faced7e0 ("mm: hugetlb
      controller for cgroups v2").
      
      Essentially that commit does a hugetlb_cgroup_from_counter assuming that
      page_counter_try_charge has initialized counter.
      
      But if that call has failed, it apparently does not initialize counter, so
      hugetlb_cgroup_from_counter(counter) ends up pointing to random memory,
      causing kasan to complain.
      
      The solution is to simply use 'h_cg', instead of
      hugetlb_cgroup_from_counter(counter), since that is a reference to the
      hugetlb_cgroup anyway.  After this change kasan ceases to complain.
      
      Fixes: faced7e0 ("mm: hugetlb controller for cgroups v2")
      Reported-by: syzbot+cac0c4e204952cf449b1@syzkaller.appspotmail.com
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Giuseppe Scrivano <gscrivan@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Link: http://lkml.kernel.org/r/20200313223920.124230-1-almasrymina@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      726b7bbe
    • D
      drivers/base/memory.c: indicate all memory blocks as removable · 53cdc1cb
      Committed by David Hildenbrand
      We see multiple issues with the implementation/interface to compute
      whether a memory block can be offlined (exposed via
      /sys/devices/system/memory/memoryX/removable) and would like to simplify
      it (remove the implementation).
      
      1. It runs basically lockless. While this might be good for performance,
         we see possible races with memory offlining that will require at
         least some sort of locking to fix.
      
      2. Nowadays, more false positives are possible. No arch-specific checks
         are performed that validate if memory offlining will not be denied
         right away (and such check will require locking). For example, arm64
         won't allow to offline any memory block that was added during boot -
         which will imply a very high error rate. Other archs have other
         constraints.
      
      3. The interface is inherently racy. E.g., if a memory block is detected
         to be removable (and was not a false positive at that time), there is
         still no guarantee that offlining will actually succeed. So any
         caller already has to deal with false positives.
      
      4. It is unclear which performance benefit this interface actually
         provides. The introducing commit 5c755e9f ("memory-hotplug: add
         sysfs removable attribute for hotplug memory remove") mentioned
      
      	"A user-level agent must be able to identify which sections
      	 of memory are likely to be removable before attempting the
      	 potentially expensive operation."
      
         However, no actual performance comparison was included.
      
      Known users:
      
       - lsmem: Will group memory blocks based on the "removable" property. [1]
      
       - chmem: Indirect user. It has a RANGE mode where one can specify
                removable ranges identified via lsmem to be offlined. However,
                it also has a "SIZE" mode, which allows a sysadmin to skip the
                manual "identify removable blocks" step. [2]
      
       - powerpc-utils: Uses the "removable" attribute to skip some memory
                blocks right away when trying to find some to offline+remove.
                However, with ballooning enabled, it already skips this
                information completely (because it once resulted in many false
                negatives). Therefore, the implementation can deal with false
                positives properly already. [3]
      
      According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
      driven from userspace via the drmgr command (powerpc-utils).  Nowadays
      it's managed in the kernel - including onlining/offlining of memory
      blocks - triggered by drmgr writing to /sys/kernel/dlpar.  So the
      affected legacy userspace handling is only active on old kernels.  Only
      very old versions of drmgr on a new kernel (unlikely) might execute
      slower - totally acceptable.
      
      With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
      break any user space tool.  We implement a very bad heuristic now.
      Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
      "not removable" as before.
      
      Original discussion can be found in [4] ("[PATCH RFC v1] mm:
      is_mem_section_removable() overhaul").
      
      Other users of is_mem_section_removable() will be removed next, so that
      we can remove is_mem_section_removable() completely.
      
      [1] http://man7.org/linux/man-pages/man1/lsmem.1.html
      [2] http://man7.org/linux/man-pages/man8/chmem.8.html
      [3] https://github.com/ibm-power-utilities/powerpc-utils
      [4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com
      
      Also, this patch probably fixes a crash reported by Steve.
      http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com

      Reported-by: "Scargall, Steve" <steve.scargall@intel.com>
      Suggested-by: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Nathan Fontenot <ndfont@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Karel Zak <kzak@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      53cdc1cb
    • N
      mm/swapfile.c: move inode_lock out of claim_swapfile · d795a90e
      Committed by Naohiro Aota
      claim_swapfile() currently keeps the inode locked when it is successful,
      or when the file is already a swapfile (with -EBUSY).  In the other error
      cases, it does not lock the inode.

      This inconsistency between the lock state and the return value is quite
      confusing and actually causes a bad unlock balance, as shown below, in the
      "bad_swap" section of __do_sys_swapon().

      This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
      check out of claim_swapfile().  The inode is unlocked in the
      "bad_swap_unlock_inode" section, so that the inode is guaranteed to be
      unlocked at "bad_swap".  Thus, error handling code after the locking now
      jumps to "bad_swap_unlock_inode" instead of "bad_swap".
      
          =====================================
          WARNING: bad unlock balance detected!
          5.5.0-rc7+ #176 Not tainted
          -------------------------------------
          swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550
          but there are no more locks to release!
      
          other info that might help us debug this:
          no locks held by swapon/4294.
      
          stack backtrace:
          CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
          Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
          Call Trace:
           dump_stack+0xa1/0xea
           print_unlock_imbalance_bug.cold+0x114/0x123
           lock_release+0x562/0xed0
           up_write+0x2d/0x490
           __do_sys_swapon+0x94b/0x3550
           __x64_sys_swapon+0x54/0x80
           do_syscall_64+0xa4/0x4b0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
          RIP: 0033:0x7f15da0a0dc7
      
      Fixes: 1638045c ("mm: set S_SWAPFILE on blockdev swap devices")
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Qais Yousef <qais.yousef@arm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d795a90e
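The error-path layout described in the commit above (take the lock in the caller, and unwind through a dedicated unlock label) can be sketched in userspace. The counting lock, the helper names, and the error values are hypothetical; only the two label names are taken from the commit message:

```c
#include <stdbool.h>

/* Hypothetical lock that counts depth so an imbalance is observable. */
struct lock_model {
	int depth;
};

static void lock_model_lock(struct lock_model *l) { l->depth++; }
static void lock_model_unlock(struct lock_model *l) { l->depth--; }

/* The claim step no longer touches the lock; it only reports a result. */
static int claim_model(bool ok)
{
	return ok ? 0 : -1;
}

/* Caller owns the lock, so each exit path unlocks exactly what was locked. */
static int swapon_model(struct lock_model *inode_lock, bool claim_ok,
			bool later_ok)
{
	int err;

	err = claim_model(claim_ok);
	if (err)
		goto bad_swap; /* lock not taken yet */

	lock_model_lock(inode_lock);
	if (!later_ok) {
		err = -2;
		goto bad_swap_unlock_inode;
	}
	lock_model_unlock(inode_lock);
	return 0;

bad_swap_unlock_inode:
	lock_model_unlock(inode_lock);
bad_swap:
	return err;
}
```

Whichever path is taken, the lock depth returns to its starting value, so no path releases a lock it never acquired.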
  3. 29 Mar, 2020 3 commits
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · e595dd94
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
      
       1) Fix memory leak in vti6, from Torsten Hilbrich.
      
       2) Fix double free in xfrm_policy_timer, from YueHaibing.
      
       3) NL80211_ATTR_CHANNEL_WIDTH attribute is put with wrong type, from
          Johannes Berg.
      
       4) Wrong allocation failure check in qlcnic driver, from Xu Wang.
      
       5) Get ks8851-ml IO operations right, for real this time, from Marek
          Vasut.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
        r8169: fix PHY driver check on platforms w/o module softdeps
        net: ks8851-ml: Fix IO operations, again
        mlxsw: spectrum_mr: Fix list iteration in error path
        qlcnic: Fix bad kzalloc null test
        mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX
        mac80211: mark station unauthorized before key removal
        mac80211: Check port authorization in the ieee80211_tx_dequeue() case
        cfg80211: Do not warn on same channel at the end of CSA
        mac80211: drop data frames without key on encrypted links
        ieee80211: fix HE SPR size calculation
        nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type
        xfrm: policy: Fix doulbe free in xfrm_policy_timer
        bpf: Explicitly memset some bpf info structures declared on the stack
        bpf: Explicitly memset the bpf_attr structure
        bpf: Sanitize the bpf_struct_ops tcp-cc name
        vti6: Fix memory leak of skb if input policy check fails
        esp: remove the skb from the chain when it's enqueued in cryptd_wq
        ipv6: xfrm6_tunnel.c: Use built-in RCU list checking
        xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire
        xfrm: fix uctx len check in verify_sec_ctx_len
        ...
      e595dd94
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 906c4043
      Committed by Linus Torvalds
      Pull i2c fixes from Wolfram Sang:
       "Three more driver bugfixes, and two doc improvements fixing build
        warnings while we are here"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: pca-platform: Use platform_irq_get_optional
        i2c: st: fix missing struct parameter description
        i2c: nvidia-gpu: Handle timeout correctly in gpu_i2c_check_status()
        i2c: fix a doc warning
        i2c: hix5hd2: add missed clk_disable_unprepare in remove
      906c4043
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 83fd69c9
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "Two small fixes: one in drivers (qla2xxx), and one in the core (sd) to
        try to cope with USB enclosures that silently change reported
        parameters"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: sd: Fix optimal I/O size for devices that change reported values
        scsi: qla2xxx: Fix I/Os being passed down when FC device is being deleted
      83fd69c9
  4. 28 Mar, 2020 12 commits
    • C
      i2c: pca-platform: Use platform_irq_get_optional · 14c1fe69
      Chris Packham committed
      The interrupt is not required so use platform_get_irq_optional() to
      avoid error messages like
      
        i2c-pca-platform 22080000.i2c: IRQ index 0 not found
      Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      14c1fe69
    • A
      i2c: st: fix missing struct parameter description · f491c668
      Alain Volmat committed
      Fix a missing struct parameter description to allow
      warning free W=1 compilation.
      Signed-off-by: Alain Volmat <avolmat@me.com>
      Reviewed-by: Patrice Chotard <patrice.chotard@st.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      f491c668
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · a0ba26f3
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2020-03-27
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 3 non-merge commits during the last 4 day(s) which contain
      a total of 4 files changed, 25 insertions(+), 20 deletions(-).
      
      The main changes are:
      
      1) Explicitly memset the bpf_attr structure on bpf() syscall to avoid
         having to rely on compiler to do so. Issues have been noticed on
         some compilers with padding and other oddities where the request was
         then unexpectedly rejected, from Greg Kroah-Hartman.
      
      2) Sanitize the bpf_struct_ops TCP congestion control name in order to
         avoid problematic characters such as whitespaces, from Martin KaFai Lau.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a0ba26f3
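The first change in the pull request above (explicitly zeroing `bpf_attr`) can be illustrated with a small userspace sketch. It is not the kernel patch itself: `struct demo_attr`, `demo_attr_init()` and `all_bytes_zero()` are hypothetical names, and the byte-wise check only stands in for a receiver that rejects requests whose unused bytes are nonzero. The point is that `memset()` clears padding as well as named members, whereas an initializer is not guaranteed by the C standard to do so for automatic variables.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure: most ABIs insert padding between `kind` and `value`. */
struct demo_attr {
    char kind;
    long value;
};

/* Zero every byte of the structure, padding included. */
static void demo_attr_init(struct demo_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
}

/* Return 1 if all `len` bytes at `buf` are zero, 0 otherwise.
 * Stands in for a receiver that rejects nonzero unused bytes. */
static int all_bytes_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

A caller would run `demo_attr_init(&attr)` first and then fill in only the members it needs, so the padding and any untouched members stay zero.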
    • H
      r8169: fix PHY driver check on platforms w/o module softdeps · 2e8c339b
      Heiner Kallweit committed
      On Android/x86 the module loading infrastructure can't deal with
      softdeps. Therefore the check for presence of the Realtek PHY driver
      module fails. mdiobus_register() will try to load the PHY driver
      module, therefore move the check to after this call and explicitly
      check that a dedicated PHY driver is bound to the PHY device.
      
      Fixes: f3259377 ("r8169: check that Realtek PHY driver module is loaded")
      Reported-by: Chih-Wei Huang <cwhuang@android-x86.org>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e8c339b
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · e00dd941
      David S. Miller committed
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2020-03-27
      
      1) Handle NETDEV_UNREGISTER for xfrm device to handle asynchronous
         unregister events cleanly. From Raed Salem.
      
      2) Fix vti6 tunnel inter address family TX through bpf_redirect().
         From Nicolas Dichtel.
      
      3) Fix length check in verify_sec_ctx_len() to avoid a
         slab-out-of-bounds. From Xin Long.
      
      4) Add a missing verify_sec_ctx_len check in xfrm_add_acquire
         to avoid a possible out-of-bounds access. From Xin Long.
      
      5) Use built-in RCU list checking of hlist_for_each_entry_rcu
         to silence false lockdep warning in __xfrm6_tunnel_spi_lookup
         when CONFIG_PROVE_RCU_LIST is enabled. From Madhuparna Bhowmik.
      
      6) Fix a panic on esp offload when crypto is done asynchronously.
         From Xin Long.
      
      7) Fix a skb memory leak in an error path of vti6_rcv.
         From Torsten Hilbrich.
      
      8) Fix a race that can lead to a double free in xfrm_policy_timer.
         From Xin Long.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e00dd941
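Items 3 and 4 in the list above concern validating a user-supplied length before trusting it. The following is a minimal userspace sketch of that kind of check, not the kernel's verify_sec_ctx_len(): `struct demo_sec_ctx` and `demo_verify_ctx_len()` are hypothetical, and the two-field header is an assumption made for illustration. The claimed total length has to fit inside the buffer that was actually received and has to agree with the claimed payload length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: `len` is the claimed total size in bytes,
 * `ctx_len` the claimed size of the payload that follows the header. */
struct demo_sec_ctx {
    uint16_t len;
    uint16_t ctx_len;
};

/* Return 0 if the claimed lengths are consistent with a buffer of
 * `attr_len` bytes, -1 otherwise. */
static int demo_verify_ctx_len(const void *attr, size_t attr_len)
{
    struct demo_sec_ctx hdr;

    /* The buffer must hold at least the header. */
    if (attr_len < sizeof(hdr))
        return -1;
    memcpy(&hdr, attr, sizeof(hdr));

    /* The claimed total must not run past the received buffer. */
    if (hdr.len > attr_len)
        return -1;

    /* The claimed total must equal header plus payload. */
    if (hdr.len != sizeof(hdr) + hdr.ctx_len)
        return -1;

    return 0;
}
```

Checking the total against the received size before comparing it with the payload length means that neither field is used to index memory until both are known to be in range.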
    • L
      Merge branch 'parisc-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 69c5eea3
      Linus Torvalds committed
      Pull parisc fix from Helge Deller:
       "Fix a recursive loop when running 'make ARCH=parisc defconfig'"
      
      * 'parisc-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix defconfig selection
      69c5eea3
    • L
      Merge tag 'arm-soc-fixes-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 32db9f10
      Linus Torvalds committed
      Pull ARM DT and driver fixes from Arnd Bergmann:
       "For the devicetree files, there are a total of 20 patches, almost
        entirely for 32-bit machines:
      
         - The Allwinner/sun9i r40 SoC dtsi file contains a number of issues,
           both for correctness and for style that are addressed in separate
           patches. This causes most of the changed lines of the DT updates
           this time.
      
         - More Allwinner updates fixing the identification of the security
           system on sun8i/A33, a recent regression of the A83t ethernet, and
           a few board specific issues on the TBS-A711 machine.
      
         - Several bug fixes for OMAP dts files, most notably fixing the
           timings for the NAND flash on the Nokia N900 that regressed a while
           ago after the move to configuring them from DT. Some other OMAPs
           now set the correct dma limits on the L3 bus, and a regression fix
           addresses lost Ethernet on dm814x
      
         - One incorrect setting in the newly added Raspberry Pi Zero W that
           may cause issues with the SD card controller.
      
         - A missing property on the bcm2835 firmware node caused incorrect
           DMA settings.
      
         - An old bug on the oxnas platform causing spurious interrupts is
           finally addressed.
      
         - A regression on the Exynos Midas board broke the OLED panel power
           supply.
      
         - The i.MX6 phycore SoM specified the wrong voltage for the SoC, this
           is now set to the values from the datasheet.
      
         - Some 64-bit machines use a deprecated string to identify the PSCI
           firmware.
      
        There are also several small code fixes addressing mostly serious
        issues:
      
         - Fix the sunxi rsb bus access to no longer return incorrect data
           when mixing 8 and 16 bit I/O.
      
         - Fix a suspend/resume regression on the OMAP2+ lcdc from a missing
           quirk in the ti-sysc driver
      
         - Fix a NULL pointer access from a race in the fsl dpio driver
      
         - Fix a v5.5 regression in the exynos-chipid driver that caused an
           invalid error code probing the device on non-exynos platforms
      
         - Fix an out-of-bounds access in the AMD TEE driver"
      
      * tag 'arm-soc-fixes-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits)
        soc: samsung: chipid: Fix return value on non-Exynos platforms
        arm64: dts: Fix leftover entry-methods for PSCI
        ARM: dts: exynos: Fix regulator node aliasing on Midas-based boards
        ARM: dts: oxnas: Fix clear-mask property
        ARM: dts: bcm283x: Fix vc4's firmware bus DMA limitations
        ARM: dts: omap5: Add bus_dma_limit for L3 bus
        ARM: dts: omap4-droid4: Fix lost touchscreen interrupts
        ARM: dts: dra7: Add bus_dma_limit for L3 bus
        ARM: bcm2835-rpi-zero-w: Add missing pinctrl name
        ARM: dts: sun8i: a33: add the new SS compatible
        dt-bindings: crypto: add new compatible for A33 SS
        ARM: dts: sun8i: r40: Move SPI device nodes based on address order
        ARM: dts: sun8i: r40: Fix register base address for SPI2 and SPI3
        ARM: dts: sun8i: r40: Move AHCI device node based on address order
        ARM: dts: imx6: phycore-som: fix arm and soc minimum voltage
        soc: fsl: dpio: register dpio irq handlers after dpio create
        tee: amdtee: out of bounds read in find_session()
        ARM: dts: N900: fix onenand timings
        bus: ti-sysc: Fix quirk flags for lcdc on am335x
        ARM: dts: Fix dm814x Ethernet by changing to use rgmii-id mode
        ...
      32db9f10
    • L
      Merge tag 'riscv-for-linus-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 823846c3
      Linus Torvalds committed
      Pull RISC-V fixes from Palmer Dabbelt:
       "Sorry for the last minute patches, but a few things fell through the
        cracks recently. I was on the fence about sending a late pull request
        just for the M-mode fixes, as we don't really have any users, but the
        last patch fixes the build for Fedora which I consider pretty
        important.
      
        Given that the M-mode fixes should be very low risk, I figured it's
        worth sending them along as well.
      
        This passes my standard 'boot in QEMU' test"
      
      * tag 'riscv-for-linus-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: Move all address space definition macros to one place
        RISC-V: Only select essential drivers for SOC_VIRT config
        riscv: fix the IPI missing issue in nommu mode
        riscv: uaccess should be used in nommu mode
      823846c3
    • L
      Merge tag 'devicetree-fixes-for-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · bb36d37e
      Linus Torvalds committed
      Pull Devicetree fix from Rob Herring:
       "A single fix for building dtc with GCC 10"
      
      * tag 'devicetree-fixes-for-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        scripts/dtc: Remove redundant YYLOC global declaration
      bb36d37e
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 1fa8cb0b
      Linus Torvalds committed
      Pull arm64 fix from Will Deacon:
       "Fix defconfig build when using Clang's integrated assembler"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: alternative: fix build with clang integrated assembler
      1fa8cb0b
    • L
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 527630fb
      Linus Torvalds committed
      Pull clk fixes from Stephen Boyd:
       "A handful of clk driver fixes.
      
        Mostly they're around the i.MX drivers fixing the parents of a few
        clks and making KASAN happy with how the message passing code works.
      
        Besides that we have a TI driver fix for the RTC parent and a fix for
        the basic gate type registration functions introduced this release
        where they didn't actually pass the arguments in the right places to
        the multiplexer function down below"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: imx: Align imx sc clock parent msg structs to 4
        clk: imx: Align imx sc clock msg structs to 4
        clk: Pass correct arguments to __clk_hw_register_gate()
        clk: ti: am43xx: Fix clock parent for RTC clock
        clk: imx8mp: Correct the enet_qos parent clock
        clk: imx8mp: Correct IMX8MP_CLK_HDMI_AXI clock parent
      527630fb
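The "Align imx sc clock msg structs to 4" commits listed above can be read as follows: if a transport copies messages in 32-bit words, a packed struct with an odd size is accessed past its end, which a checker such as KASAN would flag. The sketch below illustrates that reading only. The struct layouts and `demo_words_needed()` are hypothetical, and the attribute syntax assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Packed only: 5 bytes long, alignment 1. */
struct demo_msg_packed {
    uint8_t  resource;
    uint32_t rate;
} __attribute__((packed));

/* Packed and aligned to 4: same member layout, but the size is
 * rounded up to 8 bytes and the struct starts on a 4-byte boundary. */
struct demo_msg_aligned {
    uint8_t  resource;
    uint32_t rate;
} __attribute__((packed, aligned(4)));

/* Number of whole 32-bit words a word-based transport would copy
 * to cover `size` bytes. */
static size_t demo_words_needed(size_t size)
{
    return (size + 3) / 4;
}
```

With the packed-only struct, a word-based copy covers `demo_words_needed(5) * 4`, that is 8 bytes, three more than the object holds. With the aligned variant the copy length matches `sizeof` exactly.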
    • L
      Merge tag 'drm-fixes-2020-03-27' of git://anongit.freedesktop.org/drm/drm · 7bf8df68
      Linus Torvalds committed
      Pull drm fixes from Dave Airlie:
       "Pretty quiet: some minor sg mapping fixes for 3 drivers, and a single
        oops fix for the scheduler. I'm hoping nobody tries to send me a fixes
        pull today but I'll keep an eye out over the weekend.
      
        radeon/amdgpu/dma-buf:
         - sg list fixes
      
        scheduler:
         - oops fix"
      
      * tag 'drm-fixes-2020-03-27' of git://anongit.freedesktop.org/drm/drm:
        drm/scheduler: fix rare NULL ptr race
        drm/radeon: fix scatter-gather mapping with user pages
        drm/amdgpu: fix scatter-gather mapping with user pages
        drm/prime: use dma length macro when mapping sg
      7bf8df68
  5. 27 Mar, 2020 8 commits