1. 07 Oct, 2019: 2 commits
    • smb3: cleanup some recent endian errors spotted by updated sparse · 52870d50
      Committed by Steve French
      Now that sparse has been fixed, it spotted a couple of recent minor
      endian errors (this patch also removes one additional sparse warning).
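      The specific errors fixed here are not shown. In the kernel, sparse reports places where a __le32-annotated value is mixed with a host-order u32 without going through cpu_to_le32()/le32_to_cpu(). The sketch below is userspace code that shows what that conversion does. put_le32() and get_le32() are hypothetical helper names. They store and load a 32-bit field in little-endian order, which is the SMB2/3 wire order, whatever the host byte order is.

```c
#include <stdint.h>

/* Hypothetical helper: store v into p[0..3] in little-endian order,
 * independent of the host's byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper: load a little-endian 32-bit value from p[0..3]. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

      Suppose a value were written with a plain host-order store on a big-endian machine. It would not round-trip through get_le32(). That is the class of mistake the __le32 annotations let sparse catch at build time.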
      
      Thanks to Luc Van Oostenryck for his help fixing sparse.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
    • elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings · b212921b
      Committed by Linus Torvalds
      In commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") we
      changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
      executable mappings.
      
      Then, people reported that it broke some binaries that had overlapping
      segments from the same file, and commit ad55eac7 ("elf: enforce
      MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
      overlaying elf segment cases.  But only some - despite the summary line
      of that commit, it only did it when it also does a temporary brk vma for
      one obvious overlapping case.
      
      Now Russell King reports another overlapping case with old 32-bit x86
      binaries, which doesn't trigger that limited case.  End result: we had
      better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.
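      The difference between the two flags is visible from userspace. This minimal sketch assumes Linux 4.17 or later, where MAP_FIXED_NOREPLACE was introduced. When a request overlaps an existing mapping, MAP_FIXED_NOREPLACE fails with EEXIST. That failure is what broke binaries with overlapping segments. MAP_FIXED instead silently replaces the old mapping. The fallback value 0x100000 is the asm-generic one and is an assumption for older headers; a few architectures differ. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000 /* asm-generic value; assumption */
#endif

/* Map len bytes anywhere and fill them with 'A'. */
static char *map_page(size_t len)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 'A', len);
    return p;
}

/* Try to map over addr with MAP_FIXED_NOREPLACE.
 * Returns 1 if the existing mapping (and its 'A' bytes) survived. */
static int noreplace_preserves(char *addr, size_t len)
{
    void *q = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (q == MAP_FAILED)
        return errno == EEXIST && addr[0] == 'A';
    if (q != addr) {
        /* Pre-4.17 kernels ignore the unknown flag and treat addr as a
         * hint, so the new mapping lands elsewhere. */
        munmap(q, len);
        return addr[0] == 'A';
    }
    return 0; /* old mapping was clobbered */
}

/* MAP_FIXED replaces whatever is at addr: the fresh anonymous page reads 0. */
static int fixed_replaces(char *addr, size_t len)
{
    void *q = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return q == addr && addr[0] == 0;
}
```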
      
      Yes, it's a sign of old binaries generated with old tool-chains, but we
      do pride ourselves on not breaking existing setups.
      
      This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
      and the old load_elf_library() use-cases, because nobody has reported
      breakage for those. Yet.
      
      Note that in all the cases seen so far, the overlapping elf sections
      seem to be just re-mapping of the same executable with different section
      attributes.  We could possibly introduce a new MAP_FIXED_NOFILECHANGE
      flag or similar, which acts like NOREPLACE, but allows just remapping
      the same executable file using different protection flags.
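      No MAP_FIXED_NOFILECHANGE flag exists. To make the idea concrete, here is one hypothetical overlap rule for it. A request that does not overlap anything behaves like NOREPLACE. A request that does overlap is allowed only if it maps the same file at the same offsets, so only the protection can change. map_req and nofilechange_allows() are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a file-backed mapping. */
struct map_req {
    uint64_t start, end; /* virtual range [start, end) */
    uint64_t dev, ino;   /* identity of the backing file */
    uint64_t off;        /* file offset mapped at 'start' */
    int prot;            /* protection flags; allowed to differ */
};

/* Would the hypothetical MAP_FIXED_NOFILECHANGE accept req over old? */
static bool nofilechange_allows(const struct map_req *old,
                                const struct map_req *req)
{
    if (req->end <= old->start || old->end <= req->start)
        return true;  /* no overlap: same as NOREPLACE succeeding */
    if (req->dev != old->dev || req->ino != old->ino)
        return false; /* overlapping a different file */
    /* Same file: every shared address must map the same file offset.
     * Unsigned wrap-around keeps this equality exact. */
    return req->start - old->start == req->off - old->off;
}
```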
      
      It's not clear that would make a huge difference to anything, but if
      people really hate that "elf remaps over previous maps" behavior, maybe
      at least a more limited form of remapping would alleviate some concerns.
      
      Alternatively, we should take a look at our elf_map() logic to see if we
      end up not mapping things properly the first time.
      
      In the meantime, this is the minimal "don't do that then" patch while
      people hopefully think about it more.
      Reported-by: Russell King <linux@armlinux.org.uk>
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Fixes: ad55eac7 ("elf: enforce MAP_FIXED on overlaying elf segments")
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 06 Oct, 2019: 2 commits
    • Make filldir[64]() verify the directory entry filename is valid · 8a23eb80
      Committed by Linus Torvalds
      This has been discussed several times, and now filesystem people are
      talking about doing it individually at the filesystem layer, so head
      that off at the pass and just do it in getdents{64}().
      
      This is partially based on a patch by Jann Horn, but also checks for
      NUL bytes and is somewhat simplified.
      
      There's also commentary about how it might be better if invalid names
      due to filesystem corruption don't cause an immediate failure, but only
      an error at the end of the readdir(), so that people can still see the
      filenames that are ok.
      
      There's also been discussion about just how much POSIX strictly speaking
      requires this since it's about filesystem corruption.  It's really more
      "protect user space from bad behavior" as pointed out by Jann.  But
      since Eric Biederman looked up the POSIX wording, here it is for context:
      
       "From readdir:
      
         The readdir() function shall return a pointer to a structure
         representing the directory entry at the current position in the
         directory stream specified by the argument dirp, and position the
         directory stream at the next entry. It shall return a null pointer
         upon reaching the end of the directory stream. The structure dirent
         defined in the <dirent.h> header describes a directory entry.
      
        From definitions:
      
         3.129 Directory Entry (or Link)
      
         An object that associates a filename with a file. Several directory
         entries can associate names with the same file.
      
        ...
      
         3.169 Filename
      
         A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
         characters composing the name may be selected from the set of all
         character values excluding the slash character and the null byte. The
         filenames dot and dot-dot have special meaning. A filename is
         sometimes referred to as a 'pathname component'."
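      The POSIX definition quoted above reduces to a few byte-level rules: 1 to NAME_MAX bytes, no '/', and no NUL. Below is a userspace sketch of that rule. It assumes NAME_MAX comes from <limits.h> (255 on Linux). posix_filename_valid() is a hypothetical helper and not the kernel's check, which may be narrower (see the note on "soft errors" further down).

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255 /* Linux value; assumption if <limits.h> lacks it */
#endif

/* POSIX filename: 1..NAME_MAX bytes, excluding '/' and the null byte. */
static bool posix_filename_valid(const char *name, size_t len)
{
    if (len == 0 || len > NAME_MAX)
        return false;
    if (memchr(name, '/', len) != NULL)
        return false;
    if (memchr(name, '\0', len) != NULL)
        return false;
    return true;
}
```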
      
      Note that I didn't bother adding the checks to any legacy interfaces
      that nobody uses.
      
      Also note that if this ends up being noticeable as a performance
      regression, we can fix that to do a much more optimized model that
      checks for both NUL and '/' at the same time one word at a time.
      
      We haven't really tended to optimize 'memchr()', and it only checks for
      one pattern at a time anyway, and we really _should_ check for NUL too
      (but see the comment about "soft errors" in the code for why it
      currently only checks for '/').
      
      See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
      lookup code looks for pathname terminating characters in parallel.
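      The idea of checking for NUL and '/' one word at a time can be sketched with the well-known has-zero-byte bit trick. XOR-ing a word with a word made of '/' bytes turns each '/' into a zero byte, so the same test finds both characters in one pass. This portable userspace sketch assumes 64-bit words. It is not the kernel's hash_name() code, which uses architecture-specific word-at-a-time helpers. has_nul_or_slash() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff at least one byte of v is zero. */
static inline uint64_t has_zero_byte(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/* True if name[0..len) contains a NUL or a '/' byte. */
static bool has_nul_or_slash(const char *name, size_t len)
{
    const uint64_t slashes = ONES * (uint64_t)'/';
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, name + i, sizeof w); /* alignment-safe 8-byte load */
        /* w ^ slashes has a zero byte exactly where w had a '/'. */
        if (has_zero_byte(w) | has_zero_byte(w ^ slashes))
            return true;
    }
    for (; i < len; i++) /* remaining tail bytes */
        if (name[i] == '\0' || name[i] == '/')
            return true;
    return false;
}
```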
      
      Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jann Horn <jannh@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Convert filldir[64]() from __put_user() to unsafe_put_user() · 9f79b78e
      Committed by Linus Torvalds
      We really should avoid the "__{get,put}_user()" functions entirely,
      because they can easily be mis-used and the original intent of being
      used for simple direct user accesses no longer holds in a post-SMAP/PAN
      world.
      
      Manually optimizing away the user access range check makes no sense any
      more, when the range check is generally much cheaper than the "enable
      user accesses" code that the __{get,put}_user() functions still need.
      
      So instead of __put_user(), use the unsafe_put_user() interface with
      user_access_{begin,end}() that really does generate better code these
      days, and which is generally a nicer interface.  Under some loads, the
      multiple user writes that filldir() does are actually quite noticeable.
      
      This also makes the dirent name copy use unsafe_put_user() with a couple
      of macros.  We do not want to make function calls with SMAP/PAN
      disabled, and the code this generates is quite good when the
      architecture uses "asm goto" for unsafe_put_user() like x86 does.
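      The structure of the conversion can be modelled in userspace: one range check up front, then a run of unchecked stores that jump to an error label on a fault. Everything below is hypothetical. sim_access_begin() and sim_access_end() stand in for user_access_begin() and user_access_end(). sim_unsafe_put_u64() mimics the goto-on-fault calling convention of unsafe_put_user(). No SMAP/PAN toggling actually happens.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a user buffer. 'open' models the window in which
 * user accesses are enabled; 'fault_at' models a byte that passes the range
 * check but faults on access (SIZE_MAX means no fault). */
struct sim_user_buf {
    unsigned char *base;
    size_t size;
    size_t fault_at;
    bool open;
};

/* One range check for the whole record, in the spirit of user_access_begin(). */
static bool sim_access_begin(struct sim_user_buf *u, size_t off, size_t len)
{
    if (off > u->size || len > u->size - off)
        return false;
    u->open = true;
    return true;
}

static void sim_access_end(struct sim_user_buf *u)
{
    u->open = false;
}

/* Unchecked 8-byte store that jumps to 'label' on a simulated fault. */
#define sim_unsafe_put_u64(u, off, val, label)                         \
    do {                                                               \
        uint64_t v_ = (val);                                           \
        if (!(u)->open ||                                              \
            ((u)->fault_at >= (off) && (u)->fault_at < (off) + 8))     \
            goto label;                                                \
        memcpy((u)->base + (off), &v_, sizeof v_);                     \
    } while (0)

/* Writes a hypothetical three-field record (ino, pos, reclen) at 'off'. */
static int sim_fill_record(struct sim_user_buf *u, size_t off,
                           uint64_t ino, uint64_t pos, uint64_t reclen)
{
    if (!sim_access_begin(u, off, 3 * sizeof(uint64_t)))
        return -1; /* range check failed, like -EFAULT */
    sim_unsafe_put_u64(u, off, ino, efault);
    sim_unsafe_put_u64(u, off + 8, pos, efault);
    sim_unsafe_put_u64(u, off + 16, reclen, efault);
    sim_access_end(u);
    return 0;
efault:
    sim_access_end(u); /* the window is closed on every path */
    return -1;
}
```

      The error label keeps the stores free of per-call branches in the normal path. It also guarantees that the access window is closed on every exit.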
      
      Note that this doesn't bother with the legacy cases.  Nobody should use
      them anyway, so performance doesn't really matter there.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 04 Oct, 2019: 1 commit
    • vfs: Fix EOVERFLOW testing in put_compat_statfs64 · cc3a7bfe
      Committed by Eric Sandeen
      Today, put_compat_statfs64() disallows nearly any field value over
      2^32 if f_bsize is only 32 bits, but that makes no sense.
      compat_statfs64 is there for the explicit purpose of providing 64-bit
      fields for f_files, f_ffree, etc.  And f_bsize is always only 32 bits.
      
      As a result, 32-bit userspace gets -EOVERFLOW for e.g. large file
      counts even with -D_FILE_OFFSET_BITS=64 set.
      
      In reality, only f_bsize and f_frsize can legitimately overflow
      (fields like f_type and f_namelen should never be large), so test
      only those fields.
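      The fix reduces to testing only the two fields that are 32 bits wide in compat_statfs64. The sketch below uses a hypothetical kstatfs_sketch struct in place of the kernel's struct kstatfs. It shows the approach described above and is not copied from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of struct kstatfs. */
struct kstatfs_sketch {
    uint64_t f_bsize, f_frsize;          /* 32-bit fields in compat_statfs64 */
    uint64_t f_blocks, f_files, f_ffree; /* 64-bit fields in compat_statfs64 */
};

/* Only values that do not fit a 32-bit field can overflow. */
static int compat_statfs64_check(const struct kstatfs_sketch *k)
{
    if ((k->f_bsize | k->f_frsize) & 0xffffffff00000000ULL)
        return -EOVERFLOW;
    return 0;
}
```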
      
      This bug was discussed at length some time ago, and this is the proposal
      Al suggested at https://lkml.org/lkml/2018/8/6/640.  It seemed to get
      dropped amid the discussion of other related changes, but this
      part seems obviously correct on its own, so I've picked it up and
      sent it, for expediency.
      
      Fixes: 64d2ab32 ("vfs: fix put_compat_statfs64() does not handle errors")
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 01 Oct, 2019: 4 commits
  5. 30 Sep, 2019: 1 commit
    • Revert "Revert "ext4: make __ext4_get_inode_loc plug"" · 02f03c42
      Committed by Linus Torvalds
      This reverts commit 72dbcf72.
      
      Instead of waiting forever for entropy that may just not happen, we now
      try to actively generate entropy when required, and are thus hopefully
      avoiding the problem that caused the nice ext4 IO pattern fix to be
      reverted.
      
      So revert the revert.
      
      Cc: Ahmed S. Darwish <darwish.07@gmail.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Alexander E. Patrakov <patrakov@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 27 Sep, 2019: 8 commits
  7. 26 Sep, 2019: 16 commits
  8. 25 Sep, 2019: 6 commits
    • sched/membarrier: Fix p->mm->membarrier_state racy load · 227a4aad
      Committed by Mathieu Desnoyers
      The membarrier_state field is located within the mm_struct, which
      is not guaranteed to exist when used from runqueue-lock-free iteration
      on runqueues by the membarrier system call.
      
      Copy the membarrier_state from the mm_struct into the scheduler runqueue
      when the scheduler switches between mm.
      
      When registering membarrier for mm, after setting the registration bit
      in the mm membarrier state, issue a synchronize_rcu() to ensure the
      scheduler observes the change. In order to take care of the case
      where a runqueue keeps executing the target mm without swapping to
      other mm, iterate over each runqueue and issue an IPI to copy the
      membarrier_state from the mm_struct into each runqueue that has the
      same mm whose state has just been modified.
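      Below is a heavily simplified, single-threaded model of the two steps just described. The runqueue caches the mm's state at switch time. The IPI handler refreshes that cache only on runqueues still running the modified mm. mm_sketch, rq_sketch and the function names are hypothetical. The real code also relies on synchronize_rcu() and full memory barriers, and the relaxed atomics here do not model those.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and a runqueue. */
struct mm_sketch { atomic_int membarrier_state; };
struct rq_sketch {
    struct mm_sketch *curr_mm;
    atomic_int membarrier_state; /* cached copy, readable without rq locks */
};

/* On a switch, copy the next mm's state into the runqueue so lock-free
 * readers never need to dereference an mm that may be going away. */
static void switch_mm_sketch(struct rq_sketch *rq, struct mm_sketch *next)
{
    rq->curr_mm = next;
    atomic_store_explicit(&rq->membarrier_state,
                          atomic_load_explicit(&next->membarrier_state,
                                               memory_order_relaxed),
                          memory_order_relaxed);
}

/* Models the IPI handler: refresh only if this rq still runs 'mm'. */
static void ipi_sync_state(struct rq_sketch *rq, struct mm_sketch *mm)
{
    if (rq->curr_mm == mm)
        atomic_store_explicit(&rq->membarrier_state,
                              atomic_load_explicit(&mm->membarrier_state,
                                                   memory_order_relaxed),
                              memory_order_relaxed);
}

/* What the membarrier() iteration reads for each runqueue. */
static int rq_membarrier_state(struct rq_sketch *rq)
{
    return atomic_load_explicit(&rq->membarrier_state, memory_order_relaxed);
}
```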
      
      Move the mm membarrier_state field closer to pgd in mm_struct to use
      a cache line already touched by the scheduler switch_mm.
      
      The membarrier_execve() hook (now membarrier_exec_mmap()) now needs to
      clear the runqueue's membarrier state in addition to clearing the mm
      membarrier state, so move its implementation into the scheduler
      membarrier code so it can access the runqueue structure.
      
      Add a memory barrier in membarrier_exec_mmap() prior to clearing
      the membarrier state, ensuring memory accesses executed prior to exec
      are not reordered with the stores clearing the membarrier state.
      
      As suggested by Linus, move all membarrier.c RCU read-side locks outside
      of the for each cpu loops.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-5-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • btrfs: Fix a regression which we can't convert to SINGLE profile · fab27359
      Committed by Qu Wenruo
      [BUG]
      With the v5.3 kernel, we can't convert to the SINGLE profile:
      
        # btrfs balance start -f -dconvert=single $mnt
        ERROR: error during balancing '/mnt/btrfs': Invalid argument
        # dmesg -t | tail
        validate_convert_profile: data profile=0x1000000000000 allowed=0x20 is_valid=1 final=0x1000000000000 ret=1
        BTRFS error (device dm-3): balance: invalid convert data profile single
      
      [CAUSE]
      The extra debug output shows that @allowed is lacking the special
      in-memory-only SINGLE profile bit.
      
      Thus we fail the (profile & ~allowed) check.
      
      This regression is caused by commit 081db89b ("btrfs: use raid_attr
      to get allowed profiles for balance conversion") and the fact that we
      don't use any bit to indicate the SINGLE profile on-disk, but use a
      special in-memory-only bit to help distinguish different profiles.
      
      [FIX]
      Add BTRFS_AVAIL_ALLOC_BIT_SINGLE to @allowed, so the code behaves the
      same as it did before and the regression is fixed.
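      The failing check and the fix can be reproduced with the values from the dmesg output above. The SINGLE bit is 1ULL << 48, which matches profile=0x1000000000000. profile_is_allowed() is a hypothetical helper that isolates the (profile & ~allowed) test.

```c
#include <stdbool.h>
#include <stdint.h>

/* In-memory-only bit used for the SINGLE profile. */
#define BTRFS_AVAIL_ALLOC_BIT_SINGLE (1ULL << 48)

/* Reject any requested profile bit that is not in @allowed. */
static bool profile_is_allowed(uint64_t profile, uint64_t allowed)
{
    return (profile & ~allowed) == 0;
}
```

      With allowed=0x20 as logged, the SINGLE request fails. Once the SINGLE bit is OR-ed into @allowed, the request passes.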
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Fixes: 081db89b ("btrfs: use raid_attr to get allowed profiles for balance conversion")
      CC: stable@vger.kernel.org # 5.3+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: relocation: fix use-after-free on dead relocation roots · 1fac4a54
      Committed by Qu Wenruo
      [BUG]
      One user reported a reproducible KASAN report about use-after-free:
      
        BTRFS info (device sdi1): balance: start -dvrange=1256811659264..1256811659265
        BTRFS info (device sdi1): relocating block group 1256811659264 flags data|raid0
        ==================================================================
        BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
        Write of size 8 at addr ffff88856f671710 by task kworker/u24:10/261579
      
        CPU: 2 PID: 261579 Comm: kworker/u24:10 Tainted: P           OE     5.2.11-arch1-1-kasan #4
        Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
        Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
        Call Trace:
         dump_stack+0x7b/0xba
         print_address_description+0x6c/0x22e
         ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
         __kasan_report.cold+0x1b/0x3b
         ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
         kasan_report+0x12/0x17
         __asan_report_store8_noabort+0x17/0x20
         btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
         record_root_in_trans+0x2a0/0x370 [btrfs]
         btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
         start_transaction+0x1ab/0xe90 [btrfs]
         btrfs_join_transaction+0x1d/0x20 [btrfs]
         btrfs_finish_ordered_io+0x7bf/0x18a0 [btrfs]
         ? lock_repin_lock+0x400/0x400
         ? __kmem_cache_shutdown.cold+0x140/0x1ad
         ? btrfs_unlink_subvol+0x9b0/0x9b0 [btrfs]
         finish_ordered_fn+0x15/0x20 [btrfs]
         normal_work_helper+0x1bd/0xca0 [btrfs]
         ? process_one_work+0x819/0x1720
         ? kasan_check_read+0x11/0x20
         btrfs_endio_write_helper+0x12/0x20 [btrfs]
         process_one_work+0x8c9/0x1720
         ? pwq_dec_nr_in_flight+0x2f0/0x2f0
         ? worker_thread+0x1d9/0x1030
         worker_thread+0x98/0x1030
         kthread+0x2bb/0x3b0
         ? process_one_work+0x1720/0x1720
         ? kthread_park+0x120/0x120
         ret_from_fork+0x35/0x40
      
        Allocated by task 369692:
         __kasan_kmalloc.part.0+0x44/0xc0
         __kasan_kmalloc.constprop.0+0xba/0xc0
         kasan_kmalloc+0x9/0x10
         kmem_cache_alloc_trace+0x138/0x260
         btrfs_read_tree_root+0x92/0x360 [btrfs]
         btrfs_read_fs_root+0x10/0xb0 [btrfs]
         create_reloc_root+0x47d/0xa10 [btrfs]
         btrfs_init_reloc_root+0x1e2/0x340 [btrfs]
         record_root_in_trans+0x2a0/0x370 [btrfs]
         btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
         start_transaction+0x1ab/0xe90 [btrfs]
         btrfs_start_transaction+0x1e/0x20 [btrfs]
         __btrfs_prealloc_file_range+0x1c2/0xa00 [btrfs]
         btrfs_prealloc_file_range+0x13/0x20 [btrfs]
         prealloc_file_extent_cluster+0x29f/0x570 [btrfs]
         relocate_file_extent_cluster+0x193/0xc30 [btrfs]
         relocate_data_extent+0x1f8/0x490 [btrfs]
         relocate_block_group+0x600/0x1060 [btrfs]
         btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
         btrfs_relocate_chunk+0x9e/0x180 [btrfs]
         btrfs_balance+0x14e4/0x2fc0 [btrfs]
         btrfs_ioctl_balance+0x47f/0x640 [btrfs]
         btrfs_ioctl+0x119d/0x8380 [btrfs]
         do_vfs_ioctl+0x9f5/0x1060
         ksys_ioctl+0x67/0x90
         __x64_sys_ioctl+0x73/0xb0
         do_syscall_64+0xa5/0x370
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 369692:
         __kasan_slab_free+0x14f/0x210
         kasan_slab_free+0xe/0x10
         kfree+0xd8/0x270
         btrfs_drop_snapshot+0x154c/0x1eb0 [btrfs]
         clean_dirty_subvols+0x227/0x340 [btrfs]
         relocate_block_group+0x972/0x1060 [btrfs]
         btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
         btrfs_relocate_chunk+0x9e/0x180 [btrfs]
         btrfs_balance+0x14e4/0x2fc0 [btrfs]
         btrfs_ioctl_balance+0x47f/0x640 [btrfs]
         btrfs_ioctl+0x119d/0x8380 [btrfs]
         do_vfs_ioctl+0x9f5/0x1060
         ksys_ioctl+0x67/0x90
         __x64_sys_ioctl+0x73/0xb0
         do_syscall_64+0xa5/0x370
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        The buggy address belongs to the object at ffff88856f671100
         which belongs to the cache kmalloc-4k of size 4096
        The buggy address is located 1552 bytes inside of
         4096-byte region [ffff88856f671100, ffff88856f672100)
        The buggy address belongs to the page:
        page:ffffea0015bd9c00 refcount:1 mapcount:0 mapping:ffff88864400e600 index:0x0 compound_mapcount: 0
        flags: 0x2ffff0000010200(slab|head)
        raw: 02ffff0000010200 dead000000000100 dead000000000200 ffff88864400e600
        raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
      
        Memory state around the buggy address:
         ffff88856f671600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
         ffff88856f671680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        >ffff88856f671700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                 ^
         ffff88856f671780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
         ffff88856f671800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ==================================================================
        BTRFS info (device sdi1): 1 enospc errors during balance
        BTRFS info (device sdi1): balance: ended with status: -28
      
      [CAUSE]
      The problem happens when finish_ordered_io() gets called with balance
      still running, while the reloc root of that subvolume is already dead.
      (The tree swap is already done, but the tree is not yet deleted, for
      possible qgroup usage.)
      
      That means root->reloc_root still exists, but that reloc_root can be
      under btrfs_drop_snapshot(), thus we shouldn't access it.
      
      The following race could cause the use-after-free problem:
      
                      CPU1              |                CPU2
      --------------------------------------------------------------------------
                                        | relocate_block_group()
                                        | |- unset_reloc_control(rc)
                                        | |- btrfs_commit_transaction()
      btrfs_finish_ordered_io()         | |- clean_dirty_subvols()
      |- btrfs_join_transaction()       |    |
         |- record_root_in_trans()      |    |
            |- btrfs_init_reloc_root()  |    |
               |- if (root->reloc_root) |    |
               |                        |    |- root->reloc_root = NULL
               |                        |    |- btrfs_drop_snapshot(reloc_root);
               |- reloc_root->last_trans|
                       = trans->transid |
      	    ^^^^^^^^^^^^^^^^^^^^^^
                  Use after free
      
      [FIX]
      Fix it by the following modifications:
      
      - Test if the root has a dead reloc tree before accessing
        root->reloc_root.
        If the root has BTRFS_ROOT_DEAD_RELOC_TREE set, then we don't need to
        create or update root->reloc_root.
      
      - Clear the BTRFS_ROOT_DEAD_RELOC_TREE flag only after we have fully
        dropped the reloc tree.
        This cooperates with the above modification: as long as
        BTRFS_ROOT_DEAD_RELOC_TREE is still set, we won't try to re-create
        the reloc tree in record_root_in_trans().
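      Below is a minimal sketch of the first modification. It uses a hypothetical root_sketch struct with a plain bool in place of the BTRFS_ROOT_DEAD_RELOC_TREE state bit; the kernel uses test_bit() on root->state. The flag is checked before root->reloc_root is dereferenced. A reloc root that may be under btrfs_drop_snapshot() is therefore never touched. Creating a new reloc root is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified root. */
struct root_sketch {
    bool dead_reloc_tree;           /* models BTRFS_ROOT_DEAD_RELOC_TREE */
    struct root_sketch *reloc_root;
    uint64_t last_trans;
};

/* Returns true if the reloc root's last_trans was updated. */
static bool init_reloc_root_sketch(struct root_sketch *root, uint64_t transid)
{
    if (root->dead_reloc_tree)
        return false; /* reloc tree may be being dropped: don't touch it */
    if (root->reloc_root == NULL)
        return false; /* creation path omitted in this sketch */
    root->reloc_root->last_trans = transid;
    return true;
}
```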
      Reported-by: Cebtenzzre <cebtenzzre@gmail.com>
      Fixes: d2311e69 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
      CC: stable@vger.kernel.org # 5.1+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • smb3: Add missing reparse tags · 131ea1ed
      Committed by Steve French
      Additional reparse tags were described for WSL and file sync.
      Add missing defines for these tags. Some will be useful for
      POSIX extensions (as discussed at the Storage Developer Conference).
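      Reparse tags carry flag bits in their high bits. Microsoft's [MS-FSCC] documentation defines bit 31 as "Microsoft-owned tag" and bit 29 as "name surrogate", meaning the tag points at another file, as a symlink does. The sketch below uses WSL-related tag values as listed in [MS-FSCC]. The exact set added by this commit is not shown here, so treat the list as illustrative. The predicate names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as listed in [MS-FSCC]; verify against the spec before use. */
#define IO_REPARSE_TAG_LX_SYMLINK 0xA000001DU
#define IO_REPARSE_TAG_AF_UNIX    0x80000023U
#define IO_REPARSE_TAG_LX_FIFO    0x80000024U
#define IO_REPARSE_TAG_LX_CHR     0x80000025U
#define IO_REPARSE_TAG_LX_BLK     0x80000026U

/* Bit 31: tag is owned by Microsoft. */
static bool reparse_tag_is_microsoft(uint32_t tag)
{
    return (tag & 0x80000000U) != 0;
}

/* Bit 29: tag names another entity (e.g. a symlink target). */
static bool reparse_tag_is_name_surrogate(uint32_t tag)
{
    return (tag & 0x20000000U) != 0;
}
```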
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
    • mm, fs: move randomize_stack_top from fs to mm · 649775be
      Committed by Alexandre Ghiti
      Patch series "Provide generic top-down mmap layout functions", v6.
      
      This series introduces generic functions to make top-down mmap layout
      easily accessible to architectures, in particular riscv which was the
      initial goal of this series.  The generic implementation was taken from
      arm64 and used successively by arm, mips and finally riscv.
      
      Note that in addition the series fixes 2 issues:
      
      - stack randomization was taken into account even if not necessary.
      
      - [1] fixed an issue with the mmap base not taking randomization into
        account, but the fix was not ported to arm and mips; by moving the
        arm64 code into a generic library, this problem is now fixed for both
        architectures.
      
      This work is an effort to factorize architecture functions to avoid code
      duplication and oversights as in [1].
      
      [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html
      
      This patch (of 14):
      
      This preparatory commit moves this function so that further introduction
      of generic topdown mmap layout is contained only in mm/util.c.
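      For reference, here is a userspace model of what randomize_stack_top() computes for a downward-growing stack. It assumes 4 KiB pages and an x86-64-style STACK_RND_MASK of 0x3fffff; the real mask is per-architecture. The random value and the PF_RANDOMIZE decision are passed in as parameters so the result is deterministic. The SKETCH_* names are hypothetical.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_ALIGN(x) (((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_STACK_RND_MASK 0x3fffffUL /* assumption: x86-64 64-bit value */

/* 'randomize' stands in for PF_RANDOMIZE, 'rnd' for get_random_long(). */
static unsigned long randomize_stack_top_sketch(unsigned long stack_top,
                                                bool randomize,
                                                unsigned long rnd)
{
    unsigned long random_variable = 0;

    if (randomize) {
        random_variable = rnd & SKETCH_STACK_RND_MASK; /* bound the offset */
        random_variable <<= SKETCH_PAGE_SHIFT;         /* whole pages only */
    }
    /* Stack grows down: move the page-aligned top down by the offset. */
    return SKETCH_PAGE_ALIGN(stack_top) - random_variable;
}
```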
      
      Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm,thp: avoid writes to file with THP in pagecache · 09d91cda
      Committed by Song Liu
      In the previous patch, an application could put part of its text section
      in THP via madvise().  These THPs will be protected from writes while
      the application is still running (TXTBSY).  However, after the
      application exits, the file is available for writes.
      
      This patch avoids writes to file THP by dropping page cache for the file
      when the file is open for write.  A new counter nr_thps is added to struct
      address_space.  In do_dentry_open(), if the file is open for write and
      nr_thps is non-zero, we drop page cache for the whole file.
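      Below is a minimal model of the open-time rule. Hypothetical mapping_sketch fields stand in for the nr_thps counter in address_space and for the file's page cache. In the kernel the drop is done through a page-cache truncation; this sketch models it by zeroing the counters.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct address_space. */
struct mapping_sketch {
    int nr_thps;       /* file THPs currently in the page cache */
    long cached_pages; /* all cached pages of the file */
};

/* Open path: a writer must never see read-only file THPs, so drop the
 * whole page cache for the file if any are present. */
static void open_file_sketch(struct mapping_sketch *m, bool for_write)
{
    if (for_write && m->nr_thps > 0) {
        m->cached_pages = 0; /* models dropping the page cache */
        m->nr_thps = 0;
    }
}
```

      Read-only opens leave the cache, including the THPs, untouched. Only the first open for write pays the cost of dropping it.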
      
      Link: http://lkml.kernel.org/r/20190801184244.3169074-8-songliubraving@fb.com
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>