1. 02 Jul 2021, 1 commit
2. 06 May 2021, 1 commit
• mm,memory_hotplug: allocate memmap from the added memory range · a08a2ae3
  Committed by Oscar Salvador
      Physical memory hotadd has to allocate a memmap (struct page array) for
      the newly added memory section.  Currently, alloc_pages_node() is used
      for those allocations.
      
      This has some disadvantages:
 a) existing memory is consumed for that purpose
    (eg: ~2MB per 128MB memory section on x86_64)
    This can even lead to extreme cases where the system goes OOM because
    the physically hotplugged memory depletes the available memory before
    it is onlined.
 b) if the whole node is movable then we have off-node struct pages
    which have performance drawbacks.
 c) it might be that there are no PMD-aligned chunks, so the memmap
    array gets populated with base pages.
      
      This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
      
Vmemmap page tables can map arbitrary memory.  That means that we can
      reserve a part of the physically hotadded memory to back vmemmap page
      tables.  This implementation uses the beginning of the hotplugged memory
      for that purpose.
      
There are some non-obvious things to consider, though.
      
      Vmemmap pages are allocated/freed during the memory hotplug events
      (add_memory_resource(), try_remove_memory()) when the memory is
      added/removed.  This means that the reserved physical range is not
      online although it is used.  The most obvious side effect is that
      pfn_to_online_page() returns NULL for those pfns.  The current design
expects that this should be OK, as the hotplugged memory is considered
garbage until it is onlined.  For example, hibernation wouldn't save the
content of those vmemmaps into the image, so it wouldn't be restored on
resume, but this should be OK as there is no real content to recover anyway
while the metadata is reachable from other data structures (e.g. vmemmap
page tables).
      
      The reserved space is therefore (de)initialized during the {on,off}line
      events (mhp_{de}init_memmap_on_memory).  That is done by extracting page
      allocator independent initialization from the regular onlining path.
      The primary reason to handle the reserved space outside of
      {on,off}line_pages is to make each initialization specific to the
purpose rather than special-casing them in a single function.
      
      As per above, the functions that are introduced are:
      
       - mhp_init_memmap_on_memory:
         Initializes vmemmap pages by calling move_pfn_range_to_zone(), calls
         kasan_add_zero_shadow(), and onlines as many sections as vmemmap pages
         fully span.
      
       - mhp_deinit_memmap_on_memory:
         Offlines as many sections as vmemmap pages fully span, removes the
range from the zone with remove_pfn_range_from_zone(), and calls
         kasan_remove_zero_shadow() for the range.
      
      The new function memory_block_online() calls mhp_init_memmap_on_memory()
      before doing the actual online_pages().  Should online_pages() fail, we
      clean up by calling mhp_deinit_memmap_on_memory().  Adjusting of
      present_pages is done at the end once we know that online_pages()
succeeded.
      
On offline, memory_block_offline() needs to unaccount vmemmap pages from
present_pages before calling offline_pages().  This is necessary because
offline_pages() tears down some structures based on whether the node or
the zone becomes empty.  If offline_pages() fails, we account the vmemmap
pages back.  If it succeeds, we call mhp_deinit_memmap_on_memory().
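
A sketch of the resulting online path, modeled on this commit's
drivers/base/memory.c change (a sketch only; locking and the exact
signatures in the tree may differ):

  static int memory_block_online(struct memory_block *mem)
  {
      unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
      unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
      unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
      struct zone *zone;
      int ret;

      zone = zone_for_pfn_range(mem->online_type, mem->nid,
                                start_pfn, nr_pages);

      /* Initialize the vmemmap space before onlining the remainder. */
      if (nr_vmemmap_pages) {
          ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
          if (ret)
              return ret;
      }

      ret = online_pages(start_pfn + nr_vmemmap_pages,
                         nr_pages - nr_vmemmap_pages, zone);
      if (ret) {
          /* Roll back the vmemmap initialization on failure. */
          if (nr_vmemmap_pages)
              mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
          return ret;
      }

      /* Account vmemmap pages as present once onlining has succeeded. */
      if (nr_vmemmap_pages)
          adjust_present_page_count(zone, nr_vmemmap_pages);

      return ret;
  }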
      
      Hot-remove:
      
       We need to be careful when removing memory, as adding and
       removing memory needs to be done with the same granularity.
       To check that this assumption is not violated, we check the
       memory range we want to remove and if a) any memory block has
       vmemmap pages and b) the range spans more than a single memory
       block, we scream out loud and refuse to proceed.
      
       If all is good and the range was using memmap on memory (aka vmemmap pages),
       we construct an altmap structure so free_hugepage_table does the right
       thing and calls vmem_altmap_free instead of free_pagetable.
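
A sketch of that altmap reconstruction on the remove path, modeled on this
commit's try_remove_memory() change (surrounding granularity checks and
error handling omitted):

  struct vmem_altmap mhp_altmap = {};
  struct vmem_altmap *altmap = NULL;
  unsigned long nr_vmemmap_pages;

  /*
   * Rebuild the altmap so the vmemmap pages are handed back via
   * vmem_altmap_free() rather than freed as regular page tables.
   */
  nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
                                        get_nr_vmemmap_pages_cb);
  if (nr_vmemmap_pages) {
      mhp_altmap.alloc = nr_vmemmap_pages;
      altmap = &mhp_altmap;
  }
  arch_remove_memory(nid, start, size, altmap);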
      
Link: https://lkml.kernel.org/r/20210421102701.25051-5-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a08a2ae3
3. 27 Feb 2021, 1 commit
4. 14 Oct 2020, 2 commits
• mm/memremap_pages: support multiple ranges per invocation · b7b3c01b
  Committed by Dan Williams
      In support of device-dax growing the ability to front physically
discontiguous ranges of memory, update devm_memremap_pages() to track
      multiple ranges with a single reference counter and devm instance.
      
      Convert all [devm_]memremap_pages() users to specify the number of ranges
      they are mapping in their 'struct dev_pagemap' instance.
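
A hedged sketch of what a single-range user looks like after the
conversion (the nr_range/range fields follow this commit; the device
type and helper function are illustrative):

  #include <linux/memremap.h>

  static struct dev_pagemap pgmap = {
      .type = MEMORY_DEVICE_FS_DAX,  /* illustrative type */
      .nr_range = 1,                 /* ranges backed by this pgmap */
  };

  /*
   * Map one physically contiguous span; multi-range users fill
   * .ranges[] and set .nr_range accordingly.
   */
  static void *map_one_range(struct device *dev, u64 start, u64 len)
  {
      pgmap.range.start = start;
      pgmap.range.end = start + len - 1;
      return devm_memremap_pages(dev, &pgmap);
  }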
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.co
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b7b3c01b
• mm/memremap_pages: convert to 'struct range' · a4574f63
  Committed by Dan Williams
      The 'struct resource' in 'struct dev_pagemap' is only used for holding
      resource span information.  The other fields, 'name', 'flags', 'desc',
      'parent', 'sibling', and 'child' are all unused wasted space.
      
      This is in preparation for introducing a multi-range extension of
      devm_memremap_pages().
      
      The bulk of this change is unwinding all the places internal to libnvdimm
      that used 'struct resource' unnecessarily, and replacing instances of
      'struct dev_pagemap'.res with 'struct dev_pagemap'.range.
      
      P2PDMA had a minor usage of the resource flags field, but only to report
failures with "%pR".  That is replaced with an open-coded print of the
      range.
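
A sketch of that open-coded replacement (the message text is
illustrative):

  /* before: dev_warn(dev, "... failed %pR\n", &pgmap->res); */
  dev_warn(dev, "... failed [mem %#llx-%#llx]\n",
           pgmap->range.start, pgmap->range.end);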
      
      [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
  Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	[xen]
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a4574f63
5. 04 Sep 2020, 1 commit
6. 08 Apr 2020, 1 commit
7. 27 Mar 2020, 1 commit
8. 21 Feb 2020, 1 commit
• mm/memremap_pages: Introduce memremap_compat_align() · 9ffc1d19
  Committed by Dan Williams
      The "sub-section memory hotplug" facility allows memremap_pages() users
      like libnvdimm to compensate for hardware platforms like x86 that have a
      section size larger than their hardware memory mapping granularity.  The
      compensation that sub-section support affords is being tolerant of
      physical memory resources shifting by units smaller (64MiB on x86) than
the memory-hotplug section size (128 MiB), where the platform
physical-memory mapping granularity is limited by the number and
capability of address-decode-registers in the memory controller.
      
      While the sub-section support allows memremap_pages() to operate on
      sub-section (2MiB) granularity, the Power architecture may still
      require 16MiB alignment on "!radix_enabled()" platforms.
      
      In order for libnvdimm to be able to detect and manage this per-arch
      limitation, introduce memremap_compat_align() as a common minimum
      alignment across all driver-facing memory-mapping interfaces, and let
      Power override it to 16MiB in the "!radix_enabled()" case.
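
A sketch of the shape this takes: a generic default at sub-section
granularity plus the Power override (the guard symbol and the
linear-mapping lookup are per this description; exact details may
differ):

  /* Generic fallback: sub-section (2MiB) granularity. */
  #ifndef CONFIG_ARCH_HAS_MEMREMAP_COMPAT_ALIGN
  unsigned long memremap_compat_align(void)
  {
      return SUBSECTION_SIZE;
  }
  EXPORT_SYMBOL_GPL(memremap_compat_align);
  #endif

  /* Power override: the linear-mapping page size (16MiB) when the
     hash MMU is in use, i.e. !radix_enabled(). */
  unsigned long memremap_compat_align(void)
  {
      unsigned int shift = mmu_psize_defs[mmu_linear_psize].shift;

      if (radix_enabled())
          return SUBSECTION_SIZE;
      return max(SUBSECTION_SIZE, 1UL << shift);
  }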
      
The assumption / requirement for 16MiB to be a viable
memremap_compat_align() value is that Power does not have platforms
where its equivalent of address-decode-registers remaps a
persistent memory resource on smaller than 16MiB boundaries.  Note that I
      tried my best to not add a new Kconfig symbol, but header include
      entanglements defeated the #ifndef memremap_compat_align design pattern
      and the need to export it defeats the __weak design pattern for arch
      overrides.
      
      Based on an initial patch by Aneesh.
      
Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      9ffc1d19
9. 25 Sep 2019, 1 commit
• libnvdimm/altmap: Track namespace boundaries in altmap · cf387d96
  Committed by Aneesh Kumar K.V
With a PFN_MODE_PMEM namespace, the memmap area is allocated from the
device area.  Some architectures map the memmap area with a large page
size.  On architectures like ppc64, a 16MB page for memmap mapping can map
262144 pfns, which corresponds to a namespace size of 16G.

When populating the memmap region with a 16MB page from the device area,
make sure the allocated space is not used to map resources outside this
namespace.  Such usage of the device area would prevent the namespace from
being destroyed.
      
Add the resource end pfn to the altmap and use it to check whether a
memmap area allocation would map pfns outside the namespace.  On ppc64, in
such a case we fall back to allocation from memory.
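
A sketch of the boundary check (modeled on the ppc64 helper this commit
adds; exact arithmetic in the tree may differ):

  /*
   * True if the struct pages in this mapping chunk would describe
   * pfns outside the namespace backing the altmap.
   */
  static bool altmap_cross_boundary(struct vmem_altmap *altmap,
                                    unsigned long start,
                                    unsigned long page_size)
  {
      unsigned long nr_pfn = page_size / sizeof(struct page);
      unsigned long start_pfn = page_to_pfn((struct page *)start);

      if (start_pfn + nr_pfn > altmap->end_pfn)
          return true;
      if (start_pfn < altmap->base_pfn)
          return true;
      return false;
  }

On a hit, the ppc64 vmemmap code skips the altmap and allocates the
backing space from regular memory instead.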
      
This fixes the kernel crash reported below:
      
      [  132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0
      [  133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000
      [  133.464760] Faulting instruction address: 0xc00000000007580c
      [  133.464766] Oops: Kernel access of bad area, sig: 11 [#1]
      [  133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
      .....
      [  133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0
      [  133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0
      [  133.464910] Call Trace:
      [  133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable)
      [  133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240
      [  133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590
      [  133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160
      [  133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0
      [  133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50
      [  133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400
      [  133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250
      [  133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110
      [  133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70
      [  133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0
      [  133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250
      [  133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70
      [  133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250
      
      djbw: Aneesh notes that this crash can likely be triggered in any kernel that
      supports 'papr_scm', so flagging that commit for -stable consideration.
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Cc: <stable@vger.kernel.org>
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Santosh Sivaraj <santosh@fossix.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      cf387d96
10. 20 Aug 2019, 2 commits
11. 16 Aug 2019, 1 commit
12. 03 Jul 2019, 8 commits
13. 14 Jun 2019, 2 commits
14. 29 Dec 2018, 2 commits
15. 18 Oct 2018, 1 commit
• PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset · 977196b8
  Committed by Logan Gunthorpe
      The DMA address used when mapping PCI P2P memory must be the PCI bus
      address.  Thus, introduce pci_p2pmem_map_sg() to map the correct addresses
      when using P2P memory.
      
Memory mapped in this way does not need to be unmapped, and thus if we
provided pci_p2pmem_unmap_sg() it would be empty.  This breaks the
expected balance between map/unmap, but the call was left out as an empty
function doesn't really provide any benefit.  In the future, if this call
becomes necessary, it can be added without much difficulty.

For this, we assume that an SGL passed to these functions contains either
all P2P memory or no P2P memory.
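
Conceptually, the mapping just applies the provider's bus offset instead
of going through the usual DMA mapping path; a hedged sketch (the helper
name and the sign convention of the offset are illustrative, not taken
from the patch):

  /* Fill each SG entry with the PCI bus address of its P2P pages. */
  static void p2p_map_entry(struct scatterlist *s, s64 bus_offset)
  {
      s->dma_address = sg_phys(s) + bus_offset;
      sg_dma_len(s) = s->length;
  }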
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
      977196b8
16. 11 Oct 2018, 1 commit
• PCI/P2PDMA: Support peer-to-peer memory · 52916982
  Committed by Logan Gunthorpe
      Some PCI devices may have memory mapped in a BAR space that's intended for
      use in peer-to-peer transactions.  To enable such transactions the memory
      must be registered with ZONE_DEVICE pages so it can be used by DMA
      interfaces in existing drivers.
      
      Add an interface for other subsystems to find and allocate chunks of P2P
      memory as necessary to facilitate transfers between two PCI peers:
      
        struct pci_dev *pci_p2pmem_find[_many]();
        int pci_p2pdma_distance[_many]();
        void *pci_alloc_p2pmem();
      
      The new interface requires a driver to collect a list of client devices
involved in the transaction and then call pci_p2pmem_find() to obtain any
      suitable P2P memory.  Alternatively, if the caller knows a device which
      provides P2P memory, they can use pci_p2pdma_distance() to determine if it
      is usable.  With a suitable p2pmem device, memory can then be allocated
      with pci_alloc_p2pmem() for use in DMA transactions.
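
A hedged usage sketch of that flow (the function names come from the
list above; the argument lists are assumptions for illustration):

  static int setup_p2p_buffer(struct device **clients, int num_clients,
                              size_t len, void **out)
  {
      struct pci_dev *p2p_dev;
      void *buf;

      /* Find a p2pmem provider usable by every client device. */
      p2p_dev = pci_p2pmem_find_many(clients, num_clients);
      if (!p2p_dev)
          return -ENODEV;

      /* Carve out P2P memory to use as the shared DMA target. */
      buf = pci_alloc_p2pmem(p2p_dev, len);
      if (!buf)
          return -ENOMEM;

      *out = buf;
      return 0;
  }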
      
      Depending on hardware, using peer-to-peer memory may reduce the bandwidth
      of the transfer but can significantly reduce pressure on system memory.
      This may be desirable in many cases: for example a system could be designed
      with a small CPU connected to a PCIe switch by a small number of lanes
      which would maximize the number of lanes available to connect to NVMe
      devices.
      
      The code is designed to only utilize the p2pmem device if all the devices
      involved in a transfer are behind the same PCI bridge.  This is because we
      have no way of knowing whether peer-to-peer routing between PCIe Root Ports
is supported (PCIe r4.0, sec 1.3.1).  Additionally, the benefits of P2P
transfers that go through the RC are limited to reducing DRAM usage
and, in some cases, coding convenience.  The PCI-SIG may be exploring
      adding a new capability bit to advertise whether this is possible for
      future hardware.
      
      This commit includes significant rework and feedback from Christoph
      Hellwig.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      [bhelgaas: fold in fix from Keith Busch <keith.busch@intel.com>:
      https://lore.kernel.org/linux-pci/20181012155920.15418-1-keith.busch@intel.com,
      to address comment from Dan Carpenter <dan.carpenter@oracle.com>, fold in
      https://lore.kernel.org/linux-pci/20181017160510.17926-1-logang@deltatee.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      52916982
17. 22 May 2018, 1 commit
• mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS · e7638488
  Committed by Dan Williams
      In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
      be able to rely on the fact that they will get wakeups on dev_pagemap
      page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
      generic_dax_page_free() as common indicator / infrastructure for dax
      filesytems to require. With this change there are no users of the
      MEMORY_DEVICE_HOST designation, so remove it.
      
      The HMM sub-system extended dev_pagemap to arrange a callback when a
      dev_pagemap managed page is freed. Since a dev_pagemap page is free /
idle when its reference count is 1, it requires an additional branch to
      check the page-type at put_page() time. Given put_page() is a hot-path
      we do not want to incur that check if HMM is not in use, so a static
      branch is used to avoid that overhead when not necessary.
      
      Now, the FS_DAX implementation wants to reuse this mechanism for
      receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
      static-key into a generic mechanism that either HMM or FS_DAX code paths
      can enable.
      
      For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
      care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
      However, we still need to support FS_DAX in the FS_DAX_LIMITED case
      implemented by the s390/dcssblk driver.
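
A sketch of the resulting hot-path gate (the key and helper names loosely
follow the era's include/linux/mm.h; treat the details as illustrative):

  DEFINE_STATIC_KEY_FALSE(devmap_managed_key);

  static inline bool put_devmap_managed_page(struct page *page)
  {
      /* Costs nothing unless HMM or FS_DAX flipped the key on. */
      if (!static_branch_unlikely(&devmap_managed_key))
          return false;
      if (!is_zone_device_page(page))
          return false;

      switch (page->pgmap->type) {
      case MEMORY_DEVICE_PRIVATE:
      case MEMORY_DEVICE_FS_DAX:
          __put_devmap_managed_page(page);  /* page_free() callback path */
          return true;
      default:
          return false;
      }
  }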
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Reported-by: Dave Jiang <dave.jiang@intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      e7638488
18. 17 Apr 2018, 1 commit
19. 09 Jan 2018, 5 commits
20. 02 Nov 2017, 1 commit
• License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
  Committed by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
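
For a C source file, for example, the added identifier is a single
comment line at the top of the file:

  // SPDX-License-Identifier: GPL-2.0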
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
 - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
comment types).  Finally Greg ran the script using the .csv files to
      generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
21. 09 Sep 2017, 4 commits
• mm/hmm: avoid bloating arch that do not make use of HMM · 6b368cd4
  Committed by Jérôme Glisse
This moves all new code, including the new page migration helper, behind a
kernel Kconfig option so that there is no code bloat for arches or users
that do not want to use HMM or any of its associated features.

arm allyesconfig (first without the patchset, then with the patchset and
this patch):
         text	   data	    bss	    dec	    hex	filename
      83721896	46511131	27582964	157815991	96814b7	../without/vmlinux
      83722364	46511131	27582964	157816459	968168b	vmlinux
      
[jglisse@redhat.com: struct hmm is only used by HMM mirror functionality]
        Link: http://lkml.kernel.org/r/20170825213133.27286-1-jglisse@redhat.com
      [sfr@canb.auug.org.au: fix build (arm multi_v7_defconfig)]
        Link: http://lkml.kernel.org/r/20170828181849.323ab81b@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170818032858.7447-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6b368cd4
• mm/device-public-memory: device memory cache coherent with CPU · df6ad698
  Committed by Jérôme Glisse
Platforms with an advanced system bus (like CAPI or CCIX) allow device
memory to be accessible from the CPU in a cache-coherent fashion.  Add a
new type of ZONE_DEVICE to represent such memory.  The use cases are the
same as for un-addressable device memory, but without all the corner cases.
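
A sketch of the resulting type enumeration (comments paraphrase the
commit; exact enum contents vary by kernel version):

  enum memory_type {
      MEMORY_DEVICE_HOST = 0,  /* persistent memory, existing users */
      MEMORY_DEVICE_PRIVATE,   /* un-addressable device memory */
      MEMORY_DEVICE_PUBLIC,    /* cache-coherent, CPU-addressable */
  };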
      
Link: http://lkml.kernel.org/r/20170817000548.32038-19-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df6ad698
• mm/ZONE_DEVICE: special case put_page() for device private pages · 7b2d55d2
  Committed by Jérôme Glisse
A ZONE_DEVICE page that reaches a refcount of 1 is free, i.e. it no longer
has any user.  For device private pages this is important to catch, and
thus we need to special-case put_page() for them.
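
A sketch of that special case, modeled loosely on this commit's helper
(names and details per the 4.14-era code, which may differ):

  void put_zone_device_private_page(struct page *page)
  {
      int count = page_ref_dec_return(page);

      /*
       * A ZONE_DEVICE page is free at refcount 1, not 0: hand it back
       * to the driver through its page_free() callback.
       */
      if (count == 1)
          page->pgmap->page_free(page, page->pgmap->data);
  }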
      
Link: http://lkml.kernel.org/r/20170817000548.32038-9-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b2d55d2
• mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory · 5042db43
  Committed by Jérôme Glisse
HMM (heterogeneous memory management) needs struct page to support
migration from system main memory to device memory.  The reasons for HMM
and migration to device memory are explained in the HMM core patch.

This patch deals with device memory that is un-addressable (ie the CPU
cannot access it).  Hence we do not want those struct pages to be managed
like regular memory.  That is why we extend ZONE_DEVICE to support
different types of memory.
      
A persistent memory type is defined for existing users of ZONE_DEVICE and
a new device un-addressable type is added for the un-addressable memory.
There is a clear separation between what is expected from each memory
type; existing users of ZONE_DEVICE are unaffected by the new requirements
and the new use of the un-addressable type.  All specific code paths are
protected with tests against the memory type.

Because the memory is un-addressable, we use a new special swap type for
when a page is migrated to device memory (this reduces the maximum number
of swap files).
      
The two main additions besides the memory type to ZONE_DEVICE are two
callbacks.  The first one, page_free(), is called whenever the page
refcount reaches 1 (which means the page is free, as a ZONE_DEVICE page
never reaches a refcount of 0).  This allows the device driver to manage
its memory and the associated struct page.

The second callback, page_fault(), fires when there is a CPU access to an
address that is backed by a device page (which is un-addressable by the
CPU).  This callback is responsible for migrating the page back to system
main memory.  The device driver cannot block migration back to system
memory; HMM makes sure that such pages cannot be pinned into device memory.

If the device is in some error condition and cannot migrate memory back,
then a CPU page fault to device memory should end with SIGBUS.
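
A hedged sketch of how a driver wires up the two callbacks (the callback
signatures follow the 4.14-era typedefs as best recalled and may differ;
the my_* names are illustrative):

  static void my_page_free(struct page *page, void *data)
  {
      /* Return the backing device memory to the driver's allocator. */
  }

  static int my_page_fault(struct vm_area_struct *vma, unsigned long addr,
                           const struct page *page, unsigned int flags,
                           pmd_t *pmdp)
  {
      /*
       * Migrate the device page back to system memory and retry the
       * access, or return VM_FAULT_SIGBUS if the device cannot
       * migrate it back.
       */
      return VM_FAULT_SIGBUS;
  }

  static void my_pgmap_setup(struct dev_pagemap *pgmap)
  {
      pgmap->page_free  = my_page_free;
      pgmap->page_fault = my_page_fault;
  }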
      
      [arnd@arndb.de: fix warning]
        Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5042db43
22. 29 Jul 2016, 1 commit