1. 20 5月, 2014 4 次提交
    • J
      xfs: fix infinite loop at xfs_vm_writepage on 32bit system · 8695d27e
      Jie Liu 提交于
      Write to a file with an offset greater than 16TB on 32-bit system and
      then trigger page write-back via sync(1) will cause task hang.
      
      # block_size=4096
      # offset=$(((2**32 - 1) * $block_size))
      # xfs_io -f -c "pwrite $offset $block_size" /storage/test_file
      # sync
      
      INFO: task sync:2590 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      sync            D c1064a28     0  2590   2097 0x00000000
      .....
      Call Trace:
      [<c1064a28>] ? ttwu_do_wakeup+0x18/0x130
      [<c1066d0e>] ? try_to_wake_up+0x1ce/0x220
      [<c1066dbf>] ? wake_up_process+0x1f/0x40
      [<c104fc2e>] ? wake_up_worker+0x1e/0x30
      [<c15b6083>] schedule+0x23/0x60
      [<c15b3c2d>] schedule_timeout+0x18d/0x1f0
      [<c12a143e>] ? do_raw_spin_unlock+0x4e/0x90
      [<c10515f1>] ? __queue_delayed_work+0x91/0x150
      [<c12a12ef>] ? do_raw_spin_lock+0x3f/0x100
      [<c12a143e>] ? do_raw_spin_unlock+0x4e/0x90
      [<c15b5b5d>] wait_for_completion+0x7d/0xc0
      [<c1066d60>] ? try_to_wake_up+0x220/0x220
      [<c116a4d2>] sync_inodes_sb+0x92/0x180
      [<c116fb05>] sync_inodes_one_sb+0x15/0x20
      [<c114a8f8>] iterate_supers+0xb8/0xc0
      [<c116faf0>] ? fdatawrite_one_bdev+0x20/0x20
      [<c116fc21>] sys_sync+0x31/0x80
      [<c15be18d>] sysenter_do_call+0x12/0x28
      
      This issue can be triggered via xfstests/generic/308.
      
      The reason is that the end_index is unsigned long with maximum value
      '2^32-1=4294967295' on 32-bit platform, and the given offset cause it
      wrapped to 0, so that the following codes will repeat again and again
      until the task schedule time out:
      
      end_index = offset >> PAGE_CACHE_SHIFT;
      last_index = (offset - 1) >> PAGE_CACHE_SHIFT;
      if (page->index >= end_index) {
      	unsigned offset_into_page = offset & (PAGE_CACHE_SIZE - 1);
              /*
               * Just skip the page if it is fully outside i_size, e.g. due
               * to a truncate operation that is in progress.
               */
              if (page->index >= end_index + 1 || offset_into_page == 0) {
      	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      		unlock_page(page);
      		return 0;
      	}
      
      In order to check if a page is fully outsids i_size or not, we can fix
      the code logic as below:
      	if (page->index > end_index ||
      	    (page->index == end_index && offset_into_page == 0))
      
      Secondly, there still has another similar issue when calculating the
      end offset for mapping the filesystem blocks to the file blocks for
      delalloc.  With the same tests to above, run unmount(8) will cause
      kernel panic if CONFIG_XFS_DEBUG is enabled:
      
      XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || \
      	ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 964
      
      kernel BUG at fs/xfs/xfs_message.c:108!
      invalid opcode: 0000 [#1] SMP
      task: edddc100 ti: ec6ee000 task.ti: ec6ee000
      EIP: 0060:[<f83d87cb>] EFLAGS: 00010296 CPU: 1
      EIP is at assfail+0x2b/0x30 [xfs]
      ..............
      Call Trace:
      [<f83d9cd4>] xfs_fs_destroy_inode+0x74/0x120 [xfs]
      [<c115ddf1>] destroy_inode+0x31/0x50
      [<c115deff>] evict+0xef/0x170
      [<c115dfb2>] dispose_list+0x32/0x40
      [<c115ea3a>] evict_inodes+0xca/0xe0
      [<c1149706>] generic_shutdown_super+0x46/0xd0
      [<c11497b9>] kill_block_super+0x29/0x70
      [<c1149a14>] deactivate_locked_super+0x44/0x70
      [<c114a427>] deactivate_super+0x47/0x60
      [<c1161c3d>] mntput_no_expire+0xcd/0x120
      [<c1162ae8>] SyS_umount+0xa8/0x370
      [<c1162dce>] SyS_oldumount+0x1e/0x20
      [<c15be18d>] sysenter_do_call+0x12/0x28
      
      That because the end_offset is evaluated to 0 which is the same reason
      to above, hence the mapping and covertion for dealloc file blocks to
      file system blocks did not happened.
      
      This patch just fixed both issues.
      Reported-by: NMichael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      8695d27e
    • D
      xfs: remove redundant checks from xfs_da_read_buf · 7c166350
      Dave Chinner 提交于
      All of the verification checks of magic numbers are now done by
      verifiers, so ther eis no need to check them again once the buffer
      has been successfully read. If the magic number is bad, it won't
      even get to that code to verify it so it really serves no purpose at
      all anymore. Remove it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      7c166350
    • D
      xfs: log vector rounding leaks log space · 110dc24a
      Dave Chinner 提交于
      The addition of direct formatting of log items into the CIL
      linear buffer added alignment restrictions that the start of each
      vector needed to be 64 bit aligned. Hence padding was added in
      xlog_finish_iovec() to round up the vector length to ensure the next
      vector started with the correct alignment.
      
      This adds a small number of bytes to the size of
      the linear buffer that is otherwise unused. The issue is that we
      then use the linear buffer size to determine the log space used by
      the log item, and this includes the unused space. Hence when we
      account for space used by the log item, it's more than is actually
      written into the iclogs, and hence we slowly leak this space.
      
      This results on log hangs when reserving space, with threads getting
      stuck with these stack traces:
      
      Call Trace:
      [<ffffffff81d15989>] schedule+0x29/0x70
      [<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
      [<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
      [<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
      [<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
      .....
      
      The 4 bytes is significant. Brain Foster did all the hard work in
      tracking down a reproducable leak to inode chunk allocation (it went
      away with the ikeep mount option). His rough numbers were that
      creating 50,000 inodes leaked 11 log blocks. This turns out to be
      roughly 800 inode chunks or 1600 inode cluster buffers. That
      works out at roughly 4 bytes per cluster buffer logged, and at that
      I started looking for a 4 byte leak in the buffer logging code.
      
      What I found was that a struct xfs_buf_log_format structure for an
      inode cluster buffer is 28 bytes in length. This gets rounded up to
      32 bytes, but the vector length remains 28 bytes. Hence the CIL
      ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
      for that vector rather than 28 bytes which are written into the log.
      
      The fix for this problem is to separately track the bytes used by
      the log vectors in the item and use that instead of the buffer
      length when accounting for the log space that will be used by the
      formatted log item.
      
      Again, thanks to Brian Foster for doing all the hard work and long
      hours to isolate this leak and make finding the bug relatively
      simple.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      110dc24a
    • N
      xfs: remove XFS_TRANS_RESERVE in collapse range · ce576f1c
      Namjae Jeon 提交于
      There is no need to dip into reserve pool. Reserve pool is used for much
      more important things. And xfs_trans_reserve will never return ENOSPC
      because punch hole is already done. If we get ENOSPC, collapse range
      will be simply failed.
      
      Cc: Brian Foster <bfoster@redhat.com>
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ce576f1c
  2. 10 5月, 2014 2 次提交
    • L
      Linux 3.15-rc5 · d6d211db
      Linus Torvalds 提交于
      d6d211db
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 181da3c3
      Linus Torvalds 提交于
      Pull x86 fixes from Peter Anvin:
       "A somewhat unpleasantly large collection of small fixes.  The big ones
        are the __visible tree sweep and a fix for 'earlyprintk=efi,keep'.  It
        was using __init functions with predictably suboptimal results.
      
        Another key fix is a build fix which would produce output that simply
        would not decompress correctly in some configuration, due to the
        existing Makefiles picking up an unfortunate local label and mistaking
        it for the global symbol _end.
      
        Additional fixes include the handling of 64-bit numbers when setting
        the vdso data page (a latent bug which became manifest when i386
        started exporting a vdso with time functions), a fix to the new MSR
        manipulation accessors which would cause features to not get properly
        unblocked, a build fix for 32-bit userland, and a few new platform
        quirks"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall()
        x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro
        x86: Fix typo preventing msr_set/clear_bit from having an effect
        x86/intel: Add quirk to disable HPET for the Baytrail platform
        x86/hpet: Make boot_hpet_disable extern
        x86-64, build: Fix stack protector Makefile breakage with 32-bit userland
        x86/reboot: Add reboot quirk for Certec BPC600
        asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/*
        asmlinkage, x86: Add explicit __visible to arch/x86/*
        asmlinkage: Revert "lto: Make asmlinkage __visible"
        x86, build: Don't get confused by local symbols
        x86/efi: earlyprintk=efi,keep fix
      181da3c3
  3. 09 5月, 2014 8 次提交
  4. 08 5月, 2014 10 次提交
  5. 07 5月, 2014 16 次提交
    • C
      x86/reboot: Add reboot quirk for Certec BPC600 · aadca6fa
      Christian Gmeiner 提交于
      Certec BPC600 needs reboot=pci to actually reboot.
      Signed-off-by: NChristian Gmeiner <christian.gmeiner@gmail.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Li Aubrey <aubrey.li@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1399446114-2147-1-git-send-email-christian.gmeiner@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aadca6fa
    • D
      Merge branch 'mullins' of git://people.freedesktop.org/~deathsimple/linux into drm-fixes · 995c376e
      Dave Airlie 提交于
      Add Mullins chips support.
      
      * 'mullins' of git://people.freedesktop.org/~deathsimple/linux:
        drm/radeon: add pci ids for Mullins
        drm/radeon: add Mullins VCE support
        drm/radeon: modesetting updates for Mullins.
        drm/radeon: dpm updates for KV/KB
        drm/radeon: add Mullins dpm support.
        drm/radeon: add Mullins UVD support.
        drm/radeon: update cik init for Mullins.
        drm/radeon: add Mullins chip family
      995c376e
    • D
      Merge branch 'drm-nouveau-next' of... · 2a1235e5
      Dave Airlie 提交于
      Merge branch 'drm-nouveau-next' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
      
      nouveau fixes.
      
      * 'drm-nouveau-next' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
        drm/gm107/gr: bump attrib cb size quite a bit
        drm/nouveau: fix another lock unbalance in nouveau_crtc_page_flip
        drm/nouveau/bios: fix shadowing from PROM on big-endian systems
        drm/nouveau/acpi: allow non-optimus setups to load vbios from acpi
      2a1235e5
    • D
      Merge tag 'topc/core-stuff-2014-05-05' of git://anongit.freedesktop.org/drm-intel into drm-fixes · 508200c5
      Dave Airlie 提交于
      Some more i915 fixes. There's still some DP issues we are looking into,
      but wanted to get these moving.
      
      * tag 'topc/core-stuff-2014-05-05' of git://anongit.freedesktop.org/drm-intel:
        drm/i915: don't try DP_LINK_BW_5_4 on HSW ULX
        drm/i915: Sanitize the enable_ppgtt module option once
        drm/i915: Break encoder->crtc link separately in intel_sanitize_crtc()
      508200c5
    • D
      Merge branch 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux into drm-fixes · 9eabb911
      Dave Airlie 提交于
      this is the next pull quested for stashed up radeon fixes for 3.15. As discussed support for Mullins was separated out and will get it's own pull request. Remaining highlights are:
      1. Some more patches to better handle PLL limits.
      2. Making use of the PFLIP additional to the VBLANK interrupt, otherwise we sometimes miss page flip events.
      3. Fix for the UVD command stream parser.
      4. Fix for bootup UVD clocks on RV7xx systems.
      5. Adding missing error check on dpcd reads.
      6. Fixes number of banks calculation on SI.
      
      * 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux:
        drm/radeon: lower the ref * post PLL maximum
        drm/radeon: check that we have a clock before PLL setup
        drm/radeon: drm/radeon: add missing radeon_semaphore_free to error path
        drm/radeon: Fix num_banks calculation for SI
        drm/radeon/dp: check for errors in dpcd reads
        drm/radeon: avoid high jitter with small frac divs
        drm/radeon: check buffer relocation offset
        drm/radeon: use pflip irq on R600+ v2
        drm/radeon/uvd: use lower clocks on old UVD to boot v2
      9eabb911
    • L
      Merge branch 'akpm' (incoming from Andrew) · 38583f09
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        agp: info leak in agpioc_info_wrap()
        fs/affs/super.c: bugfix / double free
        fanotify: fix -EOVERFLOW with large files on 64-bit
        slub: use sysfs'es release mechanism for kmem_cache
        revert "mm: vmscan: do not swap anon pages just because free+file is low"
        autofs: fix lockref lookup
        mm: filemap: update find_get_pages_tag() to deal with shadow entries
        mm/compaction: make isolate_freepages start at pageblock boundary
        MAINTAINERS: zswap/zbud: change maintainer email address
        mm/page-writeback.c: fix divide by zero in pos_ratio_polynom
        hugetlb: ensure hugepage access is denied if hugepages are not supported
        slub: fix memcg_propagate_slab_attrs
        drivers/rtc/rtc-pcf8523.c: fix month definition
      38583f09
    • D
      agp: info leak in agpioc_info_wrap() · 3ca9e5d3
      Dan Carpenter 提交于
      On 64 bit systems the agp_info struct has a 4 byte hole between
      ->agp_mode and ->aper_base.  We need to clear it to avoid disclosing
      stack information to userspace.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ca9e5d3
    • F
      fs/affs/super.c: bugfix / double free · d353efd0
      Fabian Frederick 提交于
      Commit 842a859d ("affs: use ->kill_sb() to simplify ->put_super()
      and failure exits of ->mount()") adds .kill_sb which frees sbi but
      doesn't remove sbi free in case of parse_options error causing double
      free+random crash.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[3.14.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d353efd0
    • W
      fanotify: fix -EOVERFLOW with large files on 64-bit · 1e2ee49f
      Will Woods 提交于
      On 64-bit systems, O_LARGEFILE is automatically added to flags inside
      the open() syscall (also openat(), blkdev_open(), etc).  Userspace
      therefore defines O_LARGEFILE to be 0 - you can use it, but it's a
      no-op.  Everything should be O_LARGEFILE by default.
      
      But: when fanotify does create_fd() it uses dentry_open(), which skips
      all that.  And userspace can't set O_LARGEFILE in fanotify_init()
      because it's defined to 0.  So if fanotify gets an event regarding a
      large file, the read() will just fail with -EOVERFLOW.
      
      This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit
      systems, using the same test as open()/openat()/etc.
      
      Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821Signed-off-by: NWill Woods <wwoods@redhat.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e2ee49f
    • C
      slub: use sysfs'es release mechanism for kmem_cache · 41a21285
      Christoph Lameter 提交于
      debugobjects warning during netfilter exit:
      
          ------------[ cut here ]------------
          WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G        W 3.11.0-next-20130906-sasha #3984
          Workqueue: netns cleanup_net
          Call Trace:
            dump_stack+0x52/0x87
            warn_slowpath_common+0x8c/0xc0
            warn_slowpath_fmt+0x46/0x50
            debug_print_object+0x8d/0xb0
            __debug_check_no_obj_freed+0xa5/0x220
            debug_check_no_obj_freed+0x15/0x20
            kmem_cache_free+0x197/0x340
            kmem_cache_destroy+0x86/0xe0
            nf_conntrack_cleanup_net_list+0x131/0x170
            nf_conntrack_pernet_exit+0x5d/0x70
            ops_exit_list+0x5e/0x70
            cleanup_net+0xfb/0x1c0
            process_one_work+0x338/0x550
            worker_thread+0x215/0x350
            kthread+0xe7/0xf0
            ret_from_fork+0x7c/0xb0
      
      Also during dcookie cleanup:
      
          WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
          Call Trace:
            dump_stack (lib/dump_stack.c:52)
            warn_slowpath_common (kernel/panic.c:430)
            warn_slowpath_fmt (kernel/panic.c:445)
            debug_print_object (lib/debugobjects.c:262)
            __debug_check_no_obj_freed (lib/debugobjects.c:697)
            debug_check_no_obj_freed (lib/debugobjects.c:726)
            kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
            kmem_cache_destroy (mm/slab_common.c:363)
            dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
            event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
            __fput (fs/file_table.c:217)
            ____fput (fs/file_table.c:253)
            task_work_run (kernel/task_work.c:125 (discriminator 1))
            do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
            int_signal (arch/x86/kernel/entry_64.S:807)
      
      Sysfs has a release mechanism.  Use that to release the kmem_cache
      structure if CONFIG_SYSFS is enabled.
      
      Only slub is changed - slab currently only supports /proc/slabinfo and
      not /sys/kernel/slab/*.  We talked about adding that and someone was
      working on it.
      
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NGreg KH <greg@kroah.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41a21285
    • J
      revert "mm: vmscan: do not swap anon pages just because free+file is low" · 62376251
      Johannes Weiner 提交于
      This reverts commit 0bf1457f ("mm: vmscan: do not swap anon pages
      just because free+file is low") because it introduced a regression in
      mostly-anonymous workloads, where reclaim would become ineffective and
      trap every allocating task in direct reclaim.
      
      The problem is that there is a runaway feedback loop in the scan balance
      between file and anon, where the balance tips heavily towards a tiny
      thrashing file LRU and anonymous pages are no longer being looked at.
      The commit in question removed the safe guard that would detect such
      situations and respond with forced anonymous reclaim.
      
      This commit was part of a series to fix premature swapping in loads with
      relatively little cache, and while it made a small difference, the cure
      is obviously worse than the disease.  Revert it.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: NRafael Aquini <aquini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@kernel.org>		[3.12+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62376251
    • I
      autofs: fix lockref lookup · 6b6751f7
      Ian Kent 提交于
      autofs needs to be able to see private data dentry flags for its dentrys
      that are being created but not yet hashed and for its dentrys that have
      been rmdir()ed but not yet freed.  It needs to do this so it can block
      processes in these states until a status has been returned to indicate
      the given operation is complete.
      
      It does this by keeping two lists, active and expring, of dentrys in
      this state and uses ->d_release() to keep them stable while it checks
      the reference count to determine if they should be used.
      
      But with the recent lockref changes dentrys being freed sometimes don't
      transition to a reference count of 0 before being freed so autofs can
      occassionally use a dentry that is invalid which can lead to a panic.
      Signed-off-by: NIan Kent <raven@themaw.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b6751f7
    • J
      mm: filemap: update find_get_pages_tag() to deal with shadow entries · 139b6a6f
      Johannes Weiner 提交于
      Dave Jones reports the following crash when find_get_pages_tag() runs
      into an exceptional entry:
      
        kernel BUG at mm/filemap.c:1347!
        RIP: find_get_pages_tag+0x1cb/0x220
        Call Trace:
          find_get_pages_tag+0x36/0x220
          pagevec_lookup_tag+0x21/0x30
          filemap_fdatawait_range+0xbe/0x1e0
          filemap_fdatawait+0x27/0x30
          sync_inodes_sb+0x204/0x2a0
          sync_inodes_one_sb+0x19/0x20
          iterate_supers+0xb2/0x110
          sys_sync+0x44/0xb0
          ia32_do_call+0x13/0x13
      
        1343                         /*
        1344                          * This function is never used on a shmem/tmpfs
        1345                          * mapping, so a swap entry won't be found here.
        1346                          */
        1347                         BUG();
      
      After commit 0cd6144a ("mm + fs: prepare for non-page entries in
      page cache radix trees") this comment and BUG() are out of date because
      exceptional entries can now appear in all mappings - as shadows of
      recently evicted pages.
      
      However, as Hugh Dickins notes,
      
        "it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably
         any other PAGECACHE_TAG_*) to appear on an exceptional entry.
      
         I expect it comes down to an occasional race in RCU lookup of the
         radix_tree: lacking absolute synchronization, we might sometimes
         catch an exceptional entry, with the tag which really belongs with
         the unexceptional entry which was there an instant before."
      
      And indeed, not only is the tree walk lockless, the tags are also read
      in chunks, one radix tree node at a time.  There is plenty of time for
      page reclaim to swoop in and replace a page that was already looked up
      as tagged with a shadow entry.
      
      Remove the BUG() and update the comment.  While reviewing all other
      lookup sites for whether they properly deal with shadow entries of
      evicted pages, update all the comments and fix memcg file charge moving
      to not miss shmem/tmpfs swapcache pages.
      
      Fixes: 0cd6144a ("mm + fs: prepare for non-page entries in page cache radix trees")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NDave Jones <davej@redhat.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      139b6a6f
    • V
      mm/compaction: make isolate_freepages start at pageblock boundary · 49e068f0
      Vlastimil Babka 提交于
      The compaction freepage scanner implementation in isolate_freepages()
      starts by taking the current cc->free_pfn value as the first pfn.  In a
      for loop, it scans from this first pfn to the end of the pageblock, and
      then subtracts pageblock_nr_pages from the first pfn to obtain the first
      pfn for the next for loop iteration.
      
      This means that when cc->free_pfn starts at offset X rather than being
      aligned on pageblock boundary, the scanner will start at offset X in all
      scanned pageblock, ignoring potentially many free pages.  Currently this
      can happen when
      
       a) zone's end pfn is not pageblock aligned, or
      
       b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
          enabled and a hole spanning the beginning of a pageblock
      
      This patch fixes the problem by aligning the initial pfn in
      isolate_freepages() to pageblock boundary.  This also permits replacing
      the end-of-pageblock alignment within the for loop with a simple
      pageblock_nr_pages increment.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reported-by: NHeesub Shin <heesub.shin@samsung.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49e068f0
    • S
      MAINTAINERS: zswap/zbud: change maintainer email address · 0e3b7e54
      Seth Jennings 提交于
      sjenning@linux.vnet.ibm.com is no longer a viable entity.
      Signed-off-by: NSeth Jennings <sjennings@variantweb.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e3b7e54
    • R
      mm/page-writeback.c: fix divide by zero in pos_ratio_polynom · d5c9fde3
      Rik van Riel 提交于
      It is possible for "limit - setpoint + 1" to equal zero, after getting
      truncated to a 32 bit variable, and resulting in a divide by zero error.
      
      Using the fully 64 bit divide functions avoids this problem.  It also
      will cause pos_ratio_polynom() to return the correct value when
      (setpoint - limit) exceeds 2^32.
      
      Also uninline pos_ratio_polynom, at Andrew's request.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5c9fde3