1. 30 November 2017 (12 commits)
    • kmemleak: add scheduling point to kmemleak_scan() · bde5f6bc
      Committed by Yisheng Xie
      kmemleak_scan() will scan the struct page array for each node, which
      can be really large, resulting in a soft lockup.  We have seen a soft
      lockup when running a scan while compiling a kernel:
      
        watchdog: BUG: soft lockup - CPU#53 stuck for 22s! [bash:10287]
       [...]
        Call Trace:
         kmemleak_scan+0x21a/0x4c0
         kmemleak_write+0x312/0x350
         full_proxy_write+0x5a/0xa0
         __vfs_write+0x33/0x150
         vfs_write+0xad/0x1a0
         SyS_write+0x52/0xc0
         do_syscall_64+0x61/0x1a0
         entry_SYSCALL64_slow_path+0x25/0x25
      
      Fix this by adding a cond_resched() call for every MAX_SCAN_SIZE of
      memory scanned.
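
      A minimal sketch of the idea (the chunk size and the chunk-scan helper
      are illustrative stand-ins, not the exact upstream diff): split a large
      block into MAX_SCAN_SIZE pieces and reschedule between them.

        #define MAX_SCAN_SIZE 4096      /* bytes scanned between rescheduling points */

        static void scan_large_block_sketch(void *start, void *end)
        {
                while (start < end) {
                        void *next = min(start + MAX_SCAN_SIZE, end);

                        scan_chunk(start, next);  /* stand-in for kmemleak's block scanner */
                        start = next;
                        cond_resched();           /* yield the CPU; prevents the soft lockup */
                }
        }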
      
      Link: http://lkml.kernel.org/r/1511439788-20099-1-git-send-email-xieyisheng1@huawei.com
      Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
      Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical" · 90daf306
      Committed by Michal Hocko
      This reverts commit 0f6d24f8 ("mm/page-writeback.c: print a warning
      if the vm dirtiness settings are illogical") because it causes false
      positive warnings during OOM situations as noticed by Tetsuo Handa:
      
        Node 0 active_anon:3525940kB inactive_anon:8372kB active_file:216kB inactive_file:1872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2504kB dirty:52kB writeback:0kB shmem:8660kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 636928kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
        Node 0 DMA free:14848kB min:284kB low:352kB high:420kB active_anon:992kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 2687 3645 3645
        Node 0 DMA32 free:53004kB min:49608kB low:62008kB high:74408kB active_anon:2712648kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2773132kB mlocked:0kB kernel_stack:96kB pagetables:5096kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 0 958 958
        Node 0 Normal free:17140kB min:17684kB low:22104kB high:26524kB active_anon:812300kB inactive_anon:8372kB active_file:1228kB inactive_file:1868kB unevictable:0kB writepending:52kB present:1048576kB managed:981224kB mlocked:0kB kernel_stack:3520kB pagetables:8552kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB
        lowmem_reserve[]: 0 0 0 0
        [...]
        Out of memory: Kill process 8459 (a.out) score 999 or sacrifice child
        Killed process 8459 (a.out) total-vm:4180kB, anon-rss:88kB, file-rss:0kB, shmem-rss:0kB
        oom_reaper: reaped process 8459 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        vm direct limit must be set greater than background limit.
      
      The problem is that both thresh and bg_thresh will be 0 if
      available_memory is less than 4 pages when evaluating
      global_dirtyable_memory.
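
      Illustrative arithmetic only (the ratios are assumed sysctl defaults,
      not the kernel's exact code path): with integer division both limits
      collapse to zero once almost no dirtyable memory is left, so the check
      behind the reverted warning fires even with perfectly sane settings.

        static bool warning_would_fire(unsigned long available_memory /* pages */)
        {
                unsigned long dirty_ratio = 20, background_ratio = 10;

                unsigned long thresh    = dirty_ratio      * available_memory / 100;
                unsigned long bg_thresh = background_ratio * available_memory / 100;

                /* for available_memory < 5 pages: thresh == bg_thresh == 0 */
                return thresh <= bg_thresh;
        }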
      
      While this might be worked around, the whole point of the warning is
      dubious at best.  We do rely on admins to do sensible things when
      changing tunable knobs.  Dirty memory writeback knobs are not special
      in that regard, so revert the warning rather than adding more hacks to
      work around it.
      
      Debugged by Yafang Shao.
      
      Link: http://lkml.kernel.org/r/20171127091939.tahb77nznytcxw55@dhcp22.suse.cz
      Fixes: 0f6d24f8 ("mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/madvise.c: fix madvise() infinite loop under special circumstances · 6ea8d958
      Committed by chenjie
      MADV_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
      Unfortunately madvise_willneed() doesn't communicate this information
      properly to the generic madvise syscall implementation.  The calling
      convention is quite subtle there: madvise_vma() is supposed to either
      return an error or update &prev; otherwise the main loop will never
      advance to the next vma and will keep looping forever with no way to
      get out of the kernel.
      
      It seems this has been broken since its introduction.  Nobody has
      noticed because nobody seems to be using MADV_WILLNEED on these DAX
      mappings.
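
      A minimal sketch of the calling convention (abridged, not the literal
      fix): the per-vma handler must update *prev even when it has nothing to
      do, otherwise the madvise() main loop never advances.

        static long madvise_willneed_sketch(struct vm_area_struct *vma,
                                            struct vm_area_struct **prev,
                                            unsigned long start, unsigned long end)
        {
                *prev = vma;            /* always advance the loop cursor */

                if (vma_is_dax(vma))
                        return 0;       /* WILLNEED stays a no-op for DAX mappings */

                /* ... regular readahead path ... */
                return 0;
        }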
      
      [mhocko@suse.com: rewrite changelog]
      Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
      Fixes: fe77ba6f ("[PATCH] xip: madvice/fadvice: execute in place")
      Signed-off-by: chenjie <chenjie6@huawei.com>
      Signed-off-by: guoxuenan <guoxuenan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: zhangyi (F) <yi.zhang@huawei.com>
      Cc: Miao Xie <miaoxie@huawei.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fail get_vaddr_frames() for filesystem-dax mappings · b7f0554a
      Committed by Dan Williams
      Until there is a solution to the dma-to-dax vs truncate problem it is
      not safe to allow V4L2, Exynos, and other frame vector users to create
      long-standing / irrevocable memory registrations against filesystem-dax
      vmas.
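
      A hedged sketch of the added guard (the exact error code is an
      assumption): reject frame-vector registrations against fs-dax vmas.

        static int frame_vector_vma_ok(struct vm_area_struct *vma)
        {
                /*
                 * fs-dax blocks can be truncated / hole-punched while still
                 * pinned, so long-standing registrations are refused until
                 * the dma-to-dax vs truncate problem is solved.
                 */
                if (vma_is_fsdax(vma))
                        return -EOPNOTSUPP;

                return 0;
        }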
      
      [dan.j.williams@intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
        Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Inki Dae <inki.dae@samsung.com>
      Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
      Cc: Joonyoung Shim <jy0922.shim@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce get_user_pages_longterm · 2bb6d283
      Committed by Dan Williams
      Patch series "introduce get_user_pages_longterm()", v2.
      
      Here is a new get_user_pages API for cases where a driver intends to
      keep an elevated page count indefinitely.  This is distinct from usages
      like iov_iter_get_pages, where the elevated page counts are transient.
      The iov_iter_get_pages cases immediately turn around and submit the
      pages to a device driver, which will call put_page() when the I/O
      operation completes (under kernel control).
      
      In the longterm case, userspace is responsible for dropping the page
      reference at some undefined point in the future.  This is untenable for
      the filesystem-dax case, where the filesystem is in control of the
      lifetime of the block / page and needs reasonable limits on how long it
      can wait for pages in a mapping to become idle.
      
      Fixing filesystems to actually wait for dax pages to be idle before
      blocks from a truncate/hole-punch operation are repurposed is saved for
      a later patch series.
      
      Also, allowing longterm registration of dax mappings is a future patch
      series that introduces a "map with lease" semantic where the kernel can
      revoke a lease and force userspace to drop its page references.
      
      I have also tagged these for -stable to purposely break cases that might
      assume that longterm memory registrations for filesystem-dax mappings
      were supported by the kernel.  The behavior regression this policy
      change implies is one of the reasons we maintain the "dax enabled.
      Warning: EXPERIMENTAL, use at your own risk" notification when mounting
      a filesystem in dax mode.
      
      It is worth noting the device-dax interface does not suffer the same
      constraints since it does not support file space management operations
      like hole-punch.
      
      This patch (of 4):
      
      Until there is a solution to the dma-to-dax vs truncate problem it is
      not safe to allow long-standing memory registrations against
      filesystem-dax vmas.  Device-dax vmas do not have this problem and are
      explicitly allowed.
      
      This is temporary until a "memory registration with layout-lease"
      mechanism can be implemented for the affected sub-systems (RDMA and
      V4L2).
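
      A usage sketch (not part of the patch itself; treat the exact signature
      as an approximation, error handling abridged): a driver that keeps
      pages pinned indefinitely calls the _longterm variant so that
      filesystem-dax mappings are rejected instead of silently pinned.

        static long pin_buffer_longterm(unsigned long start, unsigned long nr_pages,
                                        struct page **pages)
        {
                long pinned;

                down_read(&current->mm->mmap_sem);
                pinned = get_user_pages_longterm(start, nr_pages, FOLL_WRITE,
                                                 pages, NULL);
                up_read(&current->mm->mmap_sem);

                return pinned;          /* pages pinned, or a negative errno */
        }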
      
      [akpm@linux-foundation.org: use kcalloc()]
      Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: Inki Dae <inki.dae@samsung.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Joonyoung Shim <jy0922.shim@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, hugetlbfs: introduce ->split() to vm_operations_struct · 31383c68
      Committed by Dan Williams
      Patch series "device-dax: fix unaligned munmap handling"
      
      When device-dax is operating in huge-page mode we want it to behave like
      hugetlbfs and fail attempts to split vmas into unaligned ranges.  It
      would be messy to teach the munmap path about device-dax alignment
      constraints in the same (hstate) way that hugetlbfs communicates this
      constraint.  Instead, these patches introduce a new ->split() vm
      operation.
      
      This patch (of 2):
      
      The device-dax interface has constraints similar to those of hugetlbfs
      in that it requires the munmap path to unmap in huge-page-aligned
      units.  Rather than add more custom vma handling code in __split_vma(),
      introduce a new vm operation to perform this vma-specific check.
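
      A sketch of the new hook from a driver's point of view (the driver-side
      names are made up for illustration): __split_vma() consults ->split()
      and aborts the munmap if the callback refuses the requested boundary.

        static int example_dax_split(struct vm_area_struct *vma, unsigned long addr)
        {
                /* refuse to split at an address that is not huge-page aligned */
                if (!IS_ALIGNED(addr, PMD_SIZE))
                        return -EINVAL;
                return 0;
        }

        static const struct vm_operations_struct example_dax_vm_ops = {
                /* ... fault handlers ... */
                .split = example_dax_split,
        };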
      
      Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: dee41079 ("/dev/dax, core: file operations and dax-mmap")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: replace pte_write with pte_access_permitted in fault + gup paths · 5c9d2d5c
      Committed by Dan Williams
      The 'access_permitted' helper is used in the gup-fast path and goes
      beyond the simple _PAGE_RW check to also:
      
       - validate that the mapping is writable from a protection keys
         standpoint
      
       - validate that the pte has _PAGE_USER set, since all fault paths
         where pte_write() is used must be referencing user memory.
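
      A conceptual sketch of the difference (x86-flavoured and simplified;
      the real helper is arch-specific): pte_write() only looks at _PAGE_RW,
      while pte_access_permitted() also insists on a user mapping and obeys
      protection keys.

        static inline bool pte_access_permitted_sketch(pte_t pte, bool write)
        {
                if (!(pte_flags(pte) & _PAGE_USER))
                        return false;   /* kernel-only mappings are never OK here */
                if (write && !(pte_flags(pte) & _PAGE_RW))
                        return false;   /* need write permission when writing */
                /* the real x86 helper additionally consults the pkey/PKRU state */
                return true;
        }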
      
      Link: http://lkml.kernel.org/r/151043111604.2842.8051684481794973100.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: replace pmd_write with pmd_access_permitted in fault + gup paths · c7da82b8
      Committed by Dan Williams
      The 'access_permitted' helper is used in the gup-fast path and goes
      beyond the simple _PAGE_RW check to also:
      
       - validate that the mapping is writable from a protection keys
         standpoint
      
       - validate that the pte has _PAGE_USER set, since all fault paths
         where pmd_write() is used must be referencing user memory.
      
      Link: http://lkml.kernel.org/r/151043111049.2842.15241454964150083466.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: replace pud_write with pud_access_permitted in fault + gup paths · e7fe7b5c
      Committed by Dan Williams
      The 'access_permitted' helper is used in the gup-fast path and goes
      beyond the simple _PAGE_RW check to also:
      
       - validate that the mapping is writable from a protection keys
         standpoint
      
       - validate that the pte has _PAGE_USER set, since all fault paths
         where pud_write() is used must be referencing user memory.
      
      [dan.j.williams@intel.com: fix powerpc compile error]
        Link: http://lkml.kernel.org/r/151129127237.37405.16073414520854722485.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: http://lkml.kernel.org/r/151043110453.2842.2166049702068628177.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/cma: fix alloc_contig_range ret code/potential leak · 63cd4489
      Committed by Mike Kravetz
      If the call to __alloc_contig_migrate_range() in alloc_contig_range()
      returns -EBUSY, processing continues so that test_pages_isolated() is
      called, where there is a tracepoint to identify the busy pages.
      However, it is possible for busy pages to become available between the
      calls to these two routines.  In this case, the range of pages may be
      allocated.  Unfortunately, the original return code (ret == -EBUSY) is
      still set and returned to the caller.  Therefore, the caller believes
      the pages were not allocated and they are leaked.
      
      Update the comment to indicate that allocation is still possible even
      if __alloc_contig_migrate_range() returns -EBUSY.  Also, clear the
      return code in this case so that it is not accidentally used or
      returned to the caller.
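
      A control-flow sketch of the fix (migrate_range() and
      range_is_isolated() are stand-ins for __alloc_contig_migrate_range()
      and test_pages_isolated()):

        static int alloc_contig_range_sketch(unsigned long start, unsigned long end)
        {
                int ret = migrate_range(start, end);

                if (ret && ret != -EBUSY)
                        return ret;

                /*
                 * Busy pages may have been freed since the migration attempt,
                 * so clear the stale -EBUSY and let the isolation test decide
                 * whether the range really is available.
                 */
                ret = 0;

                if (!range_is_isolated(start, end))
                        return -EBUSY;

                return 0;       /* allocated; no stale -EBUSY leaks the range */
        }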
      
      Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
      Fixes: 8ef5849f ("mm/cma: always check which page caused allocation failure")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, oom_reaper: gather each vma to prevent leaking TLB entry · 687cb088
      Committed by Wang Nan
      tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual
      memory space.  In this case, tlb->fullmm is true.  Some architectures,
      such as arm64, don't flush the TLB when tlb->fullmm is true:
      
        commit 5a7862e8 ("arm64: tlbflush: avoid flushing when fullmm == 1").
      
      This causes TLB entries to be leaked.
      
      Will clarifies his patch:
       "Basically, we tag each address space with an ASID (PCID on x86) which
        is resident in the TLB. This means we can elide TLB invalidation when
        pulling down a full mm because we won't ever assign that ASID to
        another mm without doing TLB invalidation elsewhere (which actually
        just nukes the whole TLB).
      
        I think that means that we could potentially not fault on a kernel
        uaccess, because we could hit in the TLB"
      
      There could be a window between complete_signal() sending an IPI to
      other cores and all threads sharing this mm actually being kicked off
      their cores.  In this window, the oom reaper may call
      tlb_flush_mmu_tlbonly() to flush the TLB and then free pages.
      However, due to the above problem, the TLB entries are not really
      flushed on arm64, so other threads can still access these pages through
      stale TLB entries.  Moreover, a copy_to_user() can also write to these
      pages without generating a page fault, causing use-after-free bugs.
      
      This patch gathers each vma instead of gathering the full vm space, so
      tlb->fullmm is not true.  The behavior of the oom reaper becomes
      similar to munmapping before do_exit(), which should be safe for all
      architectures.
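
      A sketch of the reaper loop after the change (4.14-era mmu_gather API;
      locking and skip conditions omitted): gathering one vma at a time keeps
      tlb->fullmm false, so arm64 no longer elides the TLB invalidation.

        static void oom_reap_vmas_sketch(struct mm_struct *mm)
        {
                struct vm_area_struct *vma;

                for (vma = mm->mmap; vma; vma = vma->vm_next) {
                        struct mmu_gather tlb;

                        tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
                        unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, NULL);
                        tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
                }
        }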
      
      Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
      Fixes: aac45363 ("mm, oom: introduce oom reaper")
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Bob Liu <liubo95@huawei.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: do not back off draining pcp free pages from kworker context · 4b81cb2f
      Committed by Michal Hocko
      drain_all_pages backs off when called from a kworker context since
      commit 0ccce3b9 ("mm, page_alloc: drain per-cpu pages from workqueue
      context") because the original IPI based pcp draining has been replaced
      by a WQ based one, and the check wanted to prevent recursion and
      inter-worker dependencies.  This made some sense at the time because
      the system WQ was used, and one worker holding the lock could be
      blocked while waiting for new workers to emerge, which can be a problem
      under OOM conditions.
      
      Since then, commit ce612879 ("mm: move pcp and lru-pcp draining into
      single wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a
      rescuer, so we shouldn't depend on any other WQ activity to make
      forward progress.  Calling drain_all_pages from a worker context is
      therefore safe as long as this doesn't happen from mm_percpu_wq itself,
      which is not the case here because all of its workers are required to
      _not_ depend on any MM locks.
      
      Why is this a problem in the first place?  ACPI driven memory
      hot-remove (acpi_device_hotplug) is executed from a worker context.  We
      end up calling __offline_pages to free all the pages, and that requires
      both lru_add_drain_all_cpuslocked and drain_all_pages to do their job;
      otherwise we can have dangling pages on pcp lists and fail the offline
      operation (__test_page_isolated_in_pageblock would see a page with a
      zero ref count but without PageBuddy set).
      
      Fix the issue by removing the worker check in drain_all_pages.
      lru_add_drain_all_cpuslocked doesn't have this restriction so it works
      as expected.
      
      Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org
      Fixes: 0ccce3b9 ("mm, page_alloc: drain per-cpu pages from workqueue context")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>	[4.11+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 28 November 2017 (3 commits)
    • Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Committed by Linus Torvalds
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, thp: Do not make pmd/pud dirty without a reason · 152e93af
      Committed by Kirill A. Shutemov
      Currently we make page table entries dirty all the time regardless of
      access type and don't even consider if the mapping is write-protected.
      The reasoning is that we don't really need dirty tracking on THP and
      making the entry dirty upfront may save some time on first write to the
      page.
      
      Unfortunately, such an approach may result in a false-positive
      can_follow_write_pmd() for the huge zero page or a read-only shmem
      file.
      
      Let's make the page dirty only if we are about to write to it anyway
      (as we do for small pages).
      
      I've restructured the code to make the entry dirty inside
      maybe_p[mu]d_mkwrite().  It also takes into account whether the vma is
      write-protected.
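
      A hedged sketch of the restructuring (the parameter layout is an
      assumption, not the exact upstream signature): the entry is dirtied
      only for an actual write through a writable vma.

        static pmd_t maybe_pmd_mkwrite_sketch(pmd_t pmd, struct vm_area_struct *vma,
                                              bool write)
        {
                if (!write || !(vma->vm_flags & VM_WRITE))
                        return pmd;     /* reads leave the entry clean */

                return pmd_mkdirty(pmd_mkwrite(pmd));
        }
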
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d() · a8f97366
      Committed by Kirill A. Shutemov
      Currently, we unconditionally make the page table entry dirty in
      touch_pmd().  This may result in a false-positive
      can_follow_write_pmd().
      
      We can avoid the situation if we only make the page table entry dirty
      when the caller asks for write access -- FOLL_WRITE.
      
      The patch also changes touch_pud() in the same way.
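
      A simplified sketch of touch_pmd() after the change (arch plumbing such
      as pmdp_set_access_flags() is replaced by a plain set_pmd_at() here):
      the entry is only dirtied when the GUP caller requested write access.

        static void touch_pmd_sketch(struct vm_area_struct *vma, unsigned long addr,
                                     pmd_t *pmd, unsigned int flags)
        {
                pmd_t entry = pmd_mkyoung(*pmd);        /* mark the huge entry accessed */

                if (flags & FOLL_WRITE)
                        entry = pmd_mkdirty(entry);     /* dirty only for write access */

                set_pmd_at(vma->vm_mm, addr, pmd, entry);
        }
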
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 22 November 2017 (1 commit)
    • block/laptop_mode: Convert timers to use timer_setup() · bca237a5
      Committed by Kees Cook
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
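
      A generic sketch of the conversion pattern (the example struct and
      names are illustrative, not taken from block/laptop_mode): the callback
      now receives the timer_list pointer and recovers its container with
      from_timer().

        struct my_ctx {
                struct timer_list timer;
                int hits;
        };

        static void my_timer_fn(struct timer_list *t)
        {
                struct my_ctx *ctx = from_timer(ctx, t, timer); /* container_of() underneath */

                ctx->hits++;
        }

        static void my_ctx_init(struct my_ctx *ctx)
        {
                /* replaces setup_timer(&ctx->timer, fn, (unsigned long)ctx) */
                timer_setup(&ctx->timer, my_timer_fn, 0);
        }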
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-mm@kvack.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
  4. 18 November 2017 (8 commits)
  5. 16 November 2017 (16 commits)