1. 28 Jul, 2014 3 commits
  2. 27 Jul, 2014 2 commits
    • L
      Fix gcc-4.9.0 miscompilation of load_balance() in scheduler · 2062afb4
      Linus Torvalds committed
      Michel Dänzer and a couple of other people reported inexplicable random
      oopses in the scheduler, and the cause turns out to be gcc mis-compiling
      the load_balance() function when debugging is enabled.  The gcc bug
      apparently goes back to gcc-4.5, but slight optimization changes mean
      that it now showed up as a problem in 4.9.0 and 4.9.1.
      
      The instruction scheduling problem causes gcc to schedule a spill
      operation to before the stack frame has been created, which in turn can
      corrupt the spilled value if an interrupt comes in.  There may be other
      effects of this bug too, but that's the code generation problem seen in
      Michel's case.
      
      This is fixed in current gcc HEAD, but the workaround as suggested by
      Markus Trippelsdorf is pretty simple: use -fno-var-tracking-assignments
      when compiling the kernel, which disables the gcc code that causes the
      problem.  This can result in slightly worse debug information for
      variable accesses, but that is infinitely preferable to actual code
      generation problems.
      
      Doing this unconditionally (not just for CONFIG_DEBUG_INFO) also allows
      non-debug builds to verify that the debug build would be identical: we
      can do
      
          export GCC_COMPARE_DEBUG=1
      
      to make gcc internally verify that the result of the build is
      independent of the "-g" flag (it will make the compiler build everything
      twice, toggling the debug flag, and compare the results).
      
      Without the "-fno-var-tracking-assignments" option, the build would fail
      (even with 4.8.3 that didn't show the actual stack frame bug) with a gcc
      compare failure.
      
      See also gcc bugzilla:
      
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61801

      Reported-by: Michel Dänzer <michel@daenzer.net>
      Suggested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • H
      mm: fix direct reclaim writeback regression · 8bdd6380
      Hugh Dickins committed
      Shortly before 3.16-rc1, Dave Jones reported:
      
        WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
                 xfs_vm_writepage+0x5ce/0x630 [xfs]()
        CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
        Call Trace:
          xfs_vm_writepage+0x5ce/0x630 [xfs]
          shrink_page_list+0x8f9/0xb90
          shrink_inactive_list+0x253/0x510
          shrink_lruvec+0x563/0x6c0
          shrink_zone+0x3b/0x100
          shrink_zones+0x1f1/0x3c0
          try_to_free_pages+0x164/0x380
          __alloc_pages_nodemask+0x822/0xc90
          alloc_pages_vma+0xaf/0x1c0
          handle_mm_fault+0xa31/0xc50
        etc.
      
       970   if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
       971                   PF_MEMALLOC))
      
      I did not respond at the time, because a glance at the PageDirty block
      in shrink_page_list() quickly shows that this is impossible: we don't do
      writeback on file pages (other than tmpfs) from direct reclaim nowadays.
      Dave was hallucinating, but it would have been disrespectful to say so.
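
      The check quoted at lines 970-971 fires only when PF_MEMALLOC is set and
      PF_KSWAPD is clear, that is, when writepage is reached from direct
      reclaim rather than from kswapd.  A minimal user-space sketch of that
      mask-and-compare idiom follows.  The flag values are written from memory
      of include/linux/sched.h of that era and should be treated as
      illustrative; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only (assumed, not taken from the commit). */
#define PF_MEMALLOC 0x00000800u  /* task is allocating memory for reclaim */
#define PF_KSWAPD   0x00040000u  /* task is the kswapd daemon */

/*
 * Hypothetical helper: true when the task is in direct reclaim,
 * i.e. PF_MEMALLOC is set and PF_KSWAPD is not.
 */
bool in_direct_reclaim(unsigned int task_flags)
{
    return (task_flags & (PF_MEMALLOC | PF_KSWAPD)) == PF_MEMALLOC;
}

/* Exercises the three flag combinations that matter for the warning. */
void check_reclaim_contexts(void)
{
    assert(in_direct_reclaim(PF_MEMALLOC));               /* direct reclaim: warn */
    assert(!in_direct_reclaim(PF_MEMALLOC | PF_KSWAPD));  /* kswapd: allowed */
    assert(!in_direct_reclaim(0u));                       /* ordinary context */
}
```

      Masking with both flags and comparing against one of them tests "this
      flag set and that flag clear" in a single expression.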
      
      However, my own /var/log/messages now shows similar complaints
      
        WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
        WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()
      
      from stressing some mmotm trees during July.
      
      Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
      so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
      to mapping->a_ops->writepage()?
      
      Yes, 3.16-rc1's commit 68711a74 ("mm, migration: add destination
      page freeing callback") has provided such a way to compaction: if
      migrating a SwapBacked page fails, its newpage may be put back on the
      list for later use with PageSwapBacked still set, and nothing will clear
      it.
      
      Whether that can do anything worse than issue WARN_ON_ONCEs, and get
      some statistics wrong, is unclear: easier to fix than to think through
      the consequences.
      
      Fixing it here, before the put_new_page(), addresses the bug directly,
      but is probably the worst place to fix it.  Page migration is doing too
      many parts of the job on too many levels: fixing it in
      move_to_new_page() to complement its SetPageSwapBacked would be
      preferable, except why is it (and newpage->mapping and newpage->index)
      done there, rather than down in migrate_page_move_mapping(), once we are
      sure of success? Not a cleanup to get into right now, especially not
      with memcg cleanups coming in 3.17.
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 26 Jul, 2014 13 commits
  4. 25 Jul, 2014 6 commits
    • H
      x86: Merge tag 'ras_urgent' into x86/urgent · bf72f5de
      H. Peter Anvin committed
      Promote one fix for 3.16
      
      This fix was necessary after
      
      9c15a24b ("x86/mce: Improve mcheck_init_device() error handling")
      
      went in. What this patch did was, among others, check the return value
      of misc_register and exit early if it encountered an error. Original
      code sloppily didn't do that.
      
      However,
      
              cef12ee5 ("xen/mce: Add mcelog support for Xen platform")
      
      made it so that xen's init routine xen_late_init_mcelog runs first. This
      was needed for the xen mcelog device which is supposed to be independent
      from the baremetal one.
      
      Initially it was reported that misc_register() fails often on xen and
      that's why it needed fixing. However, it is *supposed* to fail by
      design, when running in dom0 so that the xen mcelog device file gets
      registered first.
      
      And *then* you need the notifier *not* unregistered on the error path so
      that the timer does get deleted properly in the CPU hotplug notifier.
      
      Btw, this fix is needed also on baremetal in the unlikely event that
      misc_register(&mce_chrdev_device) fails there too.
      
      I was unsure whether to rush it in now and decided to delay it to 3.17.
      However, xen people wanted it promoted as it breaks xen when doing cpu
      hotplug there. So, after a bit of simmering in tip/master for initial
      smoke testing, let's move it to 3.16. It fixes a semi-regression which
      got introduced in 3.16 so no need for stable tagging.
      
      tip/x86/ras contains that exact same commit but we can't remove it
      there as it is not the last one. It won't cause any merge issues, as I
      confirmed locally but I should state here the special situation of this
      one fix explicitly anyway.
      
      Thanks.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • J
      drm/radeon: fix cut and paste issue for hawaii. · 1b2c4869
      Jerome Glisse committed
      This is a halfway fix for hawaii acceleration. More fixes to come
      but hopefully isolated to userspace.
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • D
      Merge branch 'drm-fixes-3.16' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · 97cefc3e
      Dave Airlie committed
      two more radeon fixes.
      
      * 'drm-fixes-3.16' of git://people.freedesktop.org/~agd5f/linux:
        drm/radeon: fix irq ring buffer overflow handling
        drm/radeon: fix error handling in radeon_vm_bo_set_addr
    • D
      Merge tag 'drm-intel-fixes-2014-07-24' of git://anongit.freedesktop.org/drm-intel into drm-fixes · 9d6ed3c6
      Dave Airlie committed
      This time in time! Just 32bit-pae fix from Hugh, semaphores fun from Chris
      and a fix for runtime pm cherry-picked from next.
      
      Paulo is still working on a fix for runtime pm when X does cursor fun when
      the display is off, but that one isn't ready yet.
      
      * tag 'drm-intel-fixes-2014-07-24' of git://anongit.freedesktop.org/drm-intel:
        drm/i915: Simplify i915_gem_release_all_mmaps()
        drm/i915: fix freeze with blank screen booting highmem
        drm/i915: Reorder the semaphore deadlock check, again
    • H
      parisc: Eliminate memset after alloc_bootmem_pages · 9794144d
      HIMANGI SARAOGI committed
      alloc_bootmem and related functions always return a zeroed region of
      memory. Thus a memset after calls to these functions is unnecessary.
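
      As a loose user-space analogy (not the kernel code itself), calloc()
      gives the same guarantee of zeroed memory, so a memset() right after it
      is equally redundant.  The helper names below are hypothetical and only
      illustrate the pattern the semantic patch removes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: allocates a table of n bytes.
 * calloc() already returns zero-filled memory, so no
 * memset(table, 0, n) is needed afterwards.
 */
unsigned char *alloc_zeroed_table(size_t n)
{
    assert(n > 0);  /* calloc(0, ...) is implementation-defined */
    return calloc(n, 1);
}

/* Returns 1 when every byte in p[0..n) is zero, 0 otherwise. */
int is_all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

      Dropping the redundant call saves a second pass over memory that is
      already known to be zero.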
      
      The following Coccinelle semantic patch was used for making the change:
      
      @@
      expression E,E1;
      @@
      
      E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
      ... when != E
      - memset(E,0,E1);
      Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
      Acked-by: Julia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • J
      parisc: Remove SA_RESTORER define · 20dbea49
      John David Anglin committed
      The sa_restorer field in struct sigaction is obsolete and no longer in
      the parisc implementation.  However, the core code assumes the field is
      present if SA_RESTORER is defined. So, the define needs to be removed.
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Helge Deller <deller@gmx.de>
  5. 24 Jul, 2014 16 commits
    • G
      hwmon: (smsc47m192) Fix temperature limit and vrm write operations · 043572d5
      Guenter Roeck committed
      Temperature limit clamps are applied after converting the temperature
      from milli-degrees C to degrees C, so either the clamp limit needs
      to be specified in degrees C, not milli-degrees C, or clamping must
      happen before converting to degrees C. Use the latter method to avoid
      overflows.
      
      vrm is a u8, so the written value needs to be limited to [0, 255].
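
      The clamp-before-convert order described above can be sketched in user
      space as follows.  This is an illustration under stated assumptions, not
      the driver's code: the register is assumed to be a signed 8-bit value in
      whole degrees C (so the limits are -128000 and 127000 milli-degrees), and
      all function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Clamp v into [lo, hi]. */
static long clamp_long(long v, long lo, long hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Divide by a positive d, rounding to the nearest integer. */
static long div_round_closest(long x, long d)
{
    return x >= 0 ? (x + d / 2) / d : (x - d / 2) / d;
}

/*
 * Convert milli-degrees C to a register value in degrees C.
 * Clamping first keeps the "+ d / 2" rounding step far away from
 * LONG_MAX / LONG_MIN, so it cannot overflow for any input.
 */
signed char temp_to_reg(long milli_c)
{
    long clamped = clamp_long(milli_c, -128000L, 127000L);

    return (signed char)div_round_closest(clamped, 1000L);
}
```

      With the opposite order, rounding LONG_MAX would overflow before the
      clamp ever ran; here temp_to_reg(LONG_MAX) simply saturates at 127.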
      
      Cc: Axel Lin <axel.lin@ingics.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Jean Delvare <jdelvare@suse.de>
    • V
      fs: umount on symlink leaks mnt count · 295dc39d
      Vasily Averin committed
      Currently, umount on a symlink blocks a following umount:
      
      /vz is separate mount
      
      # ls /vz/ -al | grep test
      drwxr-xr-x.  2 root root       4096 Jul 19 01:14 testdir
      lrwxrwxrwx.  1 root root         11 Jul 19 01:16 testlink -> /vz/testdir
      # umount -l /vz/testlink
      umount: /vz/testlink: not mounted (expected)
      
      # lsof /vz
      # umount /vz
      umount: /vz: device is busy. (unexpected)
      
      In this case mountpoint_last() gets an extra refcount on path->mnt.
      Signed-off-by: Vasily Averin <vvs@openvz.org>
      Acked-by: Ian Kent <raven@themaw.net>
      Acked-by: Jeff Layton <jlayton@primarydata.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • B
      direct-io: fix uninitialized warning in do_direct_IO() · 6fcc5420
      Boaz Harrosh committed
      The following warnings:
      
        fs/direct-io.c: In function ‘__blockdev_direct_IO’:
        fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        fs/direct-io.c:913:16: note: ‘to’ was declared here
        fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        fs/direct-io.c:913:10: note: ‘from’ was declared here
      
      are false positives because dio_get_page() either fails, or sets both
      'from' and 'to'.
      
      Paul Bolle said ...
      Maybe it's better to move initializing "to" and "from" out of
      dio_get_page(). That _might_ make it easier for both the reader and
      the compiler to understand what's going on. Something like this:
      
      Christoph Hellwig said ...
      The fix of moving the code definitely looks nicer, while I think
      uninitialized_var is a horrible wart that won't get anywhere near my code.
      
      Boaz Harrosh: I agree with Christoph and Paul
      Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • L
      Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux · 82e13c71
      Linus Torvalds committed
      Pull nfsd bugfix from Bruce Fields:
       "Another regression from the xdr encoding rewrite"
      
      * 'for-3.16' of git://linux-nfs.org/~bfields/linux:
        NFSD: Fix crash encoding lock reply on 32-bit
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 98de5ab7
      Linus Torvalds committed
      Pull arm64 fix from Catalin Marinas:
       "Fix arm64 regression introduced by limiting the CMA buffer to ZONE_DMA
        on platforms where RAM starts above 4GB (and ZONE_DMA becoming 0)"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: Create non-empty ZONE_DMA when DRAM starts above 4GB
    • L
      Merge tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux · 29ae8a6a
      Linus Torvalds committed
      Pull Xtensa fixes from Chris Zankel:
       - resolve FIXMEs in double exception handler for window overflow. This
         fix makes native building of linux on xtensa host possible;
       - fix sysmem region removal issue introduced in 3.15.
      
      * tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux:
        xtensa: fix sysmem reservation at the end of existing block
        xtensa: add fixup for double exception raised in window overflow
    • L
      Merge tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 02ec4747
      Linus Torvalds committed
      Pull pin control fixes from Linus Walleij:
       "Here are three pin control fixes for the v3.16 series.  Sorry that
        some of these arrive late, the summer heat in Sweden makes me slow.
      
         - an IRQ handling fix for the STi driver, also for stable
         - another IRQ fix for the RCAR GPIO driver
         - a MAINTAINERS entry"
      
      * tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        gpio: rcar: Add support for DT IRQ flags
        MAINTAINERS: Add entry for the Renesas pin controller driver
        pinctrl: st: Fix irqmux handler
    • L
      Merge branch 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · ea9339e5
      Linus Torvalds committed
      Pull libata regression fix from Tejun Heo:
       "The last libata/for-3.16-fixes pull contained a regression introduced
        by 1871ee13 ("libata: support the ata host which implements a
        queue depth less than 32") which in turn was a fix for a regression
        introduced earlier while changing queue tag order to accommodate hard
        drives which perform poorly if tags are not allocated in circular
        order (ugh...).
      
        The regression happens only for SAS controllers making use of libata
        to serve ATA devices.  They don't fill an ata_host field which is used
        by the new tag allocation function leading to NULL dereference.
      
        This patch adds a new intermediate field ata_host->n_tags which is
        initialized for both SAS and !SAS cases to fix the issue"
      
      * 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        libata: introduce ata_host->n_tags to avoid oops on SAS controllers
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · b292d6b5
      Linus Torvalds committed
      Pull input layer fixes from Dmitry Torokhov:
       "A few fixups for the input subsystem"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: document INPUT_PROP_TOPBUTTONPAD
        Input: fix defuzzing logic
        Input: sirfsoc-onkey - fix GPL v2 license string typo
        Input: st-keyscan - fix 'defined but not used' compiler warnings
        Input: synaptics - add min/max quirk for pnp-id LEN2002 (Edge E531)
        Input: i8042 - add Acer Aspire 5710 to nomux blacklist
        Input: ti_am335x_tsc - warn about incorrect spelling
        Input: wacom - cleanup multitouch code when touch_max is 2
    • L
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 7442cf9a
      Linus Torvalds committed
      Pull powerpc fixes from Ben Herrenschmidt:
       "Here is a handful of powerpc fixes for 3.16.  They are all pretty
        simple and self contained and should still make this release"
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: use _GLOBAL_TOC for memmove
        powerpc/pseries: dynamically added OF nodes need to call of_node_init
        powerpc: subpage_protect: Increase the array size to take care of 64TB
        powerpc: Fix bugs in emulate_step()
        powerpc: Disable doorbells on Power8 DD1.x
    • L
      Merge tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 355cb093
      Linus Torvalds committed
      Pull slab fix from Mike Snitzer:
       "This fixes the broken duplicate slab name check in
        kmem_cache_sanity_check() that has been repeatedly reported (as
        recently as today against Fedora rawhide).
      
        Pekka seemed to have it staged for a late 3.15-rc in his 'slab/urgent'
        branch but never sent a pull request, see:
            https://lkml.org/lkml/2014/5/23/648"
      
      * tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        slab_common: fix the check for duplicate slab names
    • L
      Merge branch 'akpm' (patches from Andrew Morton) · ed4a1084
      Linus Torvalds committed
      Merge fixes from Andrew Morton:
       "10 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: hugetlb: fix copy_hugetlb_page_range()
        simple_xattr: permit 0-size extended attributes
        mm/fs: fix pessimization in hole-punching pagecache
        shmem: fix splicing from a hole while it's punched
        shmem: fix faulting into a hole, not taking i_mutex
        mm: do not call do_fault_around for non-linear fault
        sh: also try passing -m4-nofpu for SH2A builds
        zram: avoid lockdep splat by revalidate_disk
        mm/rmap.c: fix pgoff calculation to handle hugepage correctly
        coredump: fix the setting of PF_DUMPCORE
    • N
      mm: hugetlb: fix copy_hugetlb_page_range() · 0253d634
      Naoya Horiguchi committed
      Commit 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry") changed the order of
      huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
      in some workloads like hugepage-backed heap allocation via libhugetlbfs.
      This patch fixes it.
      
      The test program for the problem is shown below:
      
        $ cat heap.c
        #include <unistd.h>
        #include <stdlib.h>
        #include <string.h>
      
        #define HPS 0x200000
      
        int main() {
        	int i;
        	char *p = malloc(HPS);
        	memset(p, '1', HPS);
        	for (i = 0; i < 5; i++) {
        		if (!fork()) {
        			memset(p, '2', HPS);
        			p = malloc(HPS);
        			memset(p, '3', HPS);
        			free(p);
        			return 0;
        		}
        	}
        	sleep(1);
        	free(p);
        	return 0;
        }
      
        $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
      
      Fixes 4a705fef ("hugetlb: fix copy_hugetlb_page_range() to handle
      migration/hwpoisoned entry"), so is applicable to -stable kernels which
      include it.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: Guillaume Morin <guillaume@morinfr.org>
      Suggested-by: Guillaume Morin <guillaume@morinfr.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • H
      simple_xattr: permit 0-size extended attributes · 4e66d445
      Hugh Dickins committed
      If a filesystem uses simple_xattr to support user extended attributes,
      LTP setxattr01 and xfstests generic/062 fail with "Cannot allocate
      memory": simple_xattr_alloc()'s wrap-around test mistakenly excludes
      values of zero size.  Fix that off-by-one (but apparently no filesystem
      needs them yet).
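
      The off-by-one can be illustrated with a user-space sketch of a
      wrap-around test on "header size + payload size".  The struct and
      function names are hypothetical stand-ins, and the comparison shown is
      my reading of the description above rather than the kernel source: with
      "<=" a zero-size value is rejected as if it had overflowed, with "<" it
      is accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an in-memory extended attribute. */
struct xattr_sketch {
    size_t size;   /* number of payload bytes */
    char value[];  /* payload, may be empty */
};

/*
 * Allocates an attribute holding a copy of value[0..size).
 * Returns NULL on arithmetic wrap-around or allocation failure.
 */
struct xattr_sketch *xattr_sketch_alloc(const void *value, size_t size)
{
    size_t len = sizeof(struct xattr_sketch) + size;

    /* Wrap-around check: "<" (not "<=") so that size == 0 is allowed. */
    if (len < sizeof(struct xattr_sketch))
        return NULL;

    assert(value != NULL || size == 0);

    struct xattr_sketch *x = malloc(len);
    if (x == NULL)
        return NULL;

    x->size = size;
    if (size > 0)
        memcpy(x->value, value, size);
    return x;
}
```

      A sum that wraps is always strictly smaller than either operand, so the
      strict comparison still catches every overflow while letting the
      zero-size case through.
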
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • H
      mm/fs: fix pessimization in hole-punching pagecache · 792ceaef
      Hugh Dickins committed
      I wanted to revert my v3.1 commit d0823576 ("mm: pincer in
      truncate_inode_pages_range"), to keep truncate_inode_pages_range() in
      synch with shmem_undo_range(); but have stepped back - a change to
      hole-punching in truncate_inode_pages_range() is a change to
      hole-punching in every filesystem (except tmpfs) that supports it.
      
      If there's a logical proof why no filesystem can depend for its own
      correctness on the pincer guarantee in truncate_inode_pages_range() - an
      instant when the entire hole is removed from pagecache - then let's
      revisit later.  But the evidence is that only tmpfs suffered from the
      livelock, and we have no intention of extending hole-punch to ramfs.  So
      for now just add a few comments (to match or differ from those in
      shmem_undo_range()), and fix one silliness noticed in d0823576...
      
      Its "index == start" addition to the hole-punch termination test was
      incomplete: it opened a way for the end condition to be missed, and the
      loop go on looking through the radix_tree, all the way to end of file.
      Fix that pessimization by resetting index when detected in inner loop.
      
      Note that it's actually hard to hit this case, without the obsessive
      concurrent faulting that trinity does: normally all pages are removed in
      the initial trylock_page() pass, and this loop finds nothing to do.  I
      had to "#if 0" out the initial pass to reproduce the bug and test the fix.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • H
      shmem: fix splicing from a hole while it's punched · b1a36650
      Hugh Dickins committed
      shmem_fault() is the actual culprit in trinity's hole-punch starvation,
      and the most significant cause of such problems: since a page faulted is
      one that then appears page_mapped(), needing unmap_mapping_range() and
      i_mmap_mutex to be unmapped again.
      
      But it is not the only way in which a page can be brought into a hole in
      the radix_tree while that hole is being punched; and Vlastimil's testing
      implies that if enough other processors are busy filling in the hole,
      then shmem_undo_range() can be kept from completing indefinitely.
      
      shmem_file_splice_read() is the main other user of SGP_CACHE, which can
      instantiate shmem pagecache pages in the read-only case (without holding
      i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
      silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
      ought to be safe, but might bring surprises - not a change to be rushed.
      
      shmem_read_mapping_page_gfp() is an internal interface used by
      drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
      shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
      called internally by the kernel (perhaps for a stacking filesystem,
      which might rely on holes to be reserved): it's unclear whether it could
      be provoked to keep hole-punch busy or not.
      
      We could apply the same umbrella as now used in shmem_fault() to
      shmem_file_splice_read() and the others; but it looks ugly, and use over
      a range raises questions - should it actually be per page? can these get
      starved themselves?
      
      The origin of this part of the problem is my v3.1 commit d0823576
      ("mm: pincer in truncate_inode_pages_range"), once it was duplicated
      into shmem.c.  It seemed like a nice idea at the time, to ensure
      (barring RCU lookup fuzziness) that there's an instant when the entire
      hole is empty; but the indefinitely repeated scans to ensure that make
      it vulnerable.
      
      Revert that "enhancement" to hole-punch from shmem_undo_range(), but
      retain the unproblematic rescanning when it's truncating; add a couple
      of comments there.
      
      Remove the "indices[0] >= end" test: that is now handled satisfactorily
      by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
      to be worth avoiding here.
      
      But if we do not always loop indefinitely, we do need to handle the case
      of swap swizzled back to page before shmem_free_swap() gets it: add a
      retry for that case, as suggested by Konstantin Khlebnikov; and for the
      case of page swizzled back to swap, as suggested by Johannes Weiner.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Suggested-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.1+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>