1. 22 Jan, 2014 40 commits
    • mm/hugetlb.c: defer PageHeadHuge() symbol export · 9b7ac260
      Andrea Arcangeli committed
      No actual need of it. So keep it internal.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/swap.c: reorganize put_compound_page() · 26296ad2
      Andrew Morton committed
      Tweak it to save a tab stop, make code layout slightly less nutty.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hugetlb.c: simplify PageHeadHuge() and PageHuge() · 758f66a2
      Andrew Morton committed
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hugetlbfs: use __compound_tail_refcounted in __get_page_tail too · 3bfcd13e
      Andrea Arcangeli committed
      Also remove hugetlb.h which isn't needed anymore as PageHeadHuge is
      handled in mm.h.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: tail page refcounting optimization for slab and hugetlbfs · 44518d2b
      Andrea Arcangeli committed
      This skips the _mapcount mangling for slab and hugetlbfs pages.
      
      The main trouble in doing this is to guarantee that PageSlab and
      PageHeadHuge remain constant for all get_page/put_page run on the tail
      of slab or hugetlbfs compound pages.  Otherwise if they're set during
      get_page but not set during put_page, the _mapcount of the tail page
      would underflow.
      
      PageHeadHuge will remain true until the compound page is released and
      enters the buddy allocator, so it cannot change even if the tail
      page is the last reference left on the page.
      
      PG_slab instead is cleared before the slab frees the head page with
      put_page, so if the tail pin is released after the slab freed the page,
      we would have a problem.  But in the slab case the tail pin cannot be
      the last reference left on the page.  This is because the slab code is
      free to reuse the compound page after a kfree/kmem_cache_free without
      having to check if there's any tail pin left.  In turn all tail pins
      must be always released while the head is still pinned by the slab code
      and so we know PG_slab will be still set too.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: optimize compound_trans_huge · ca641514
      Andrea Arcangeli committed
      Currently we don't clobber page_tail->first_page during split_huge_page,
      so compound_trans_head can be set to compound_head without adverse
      effects, and this mostly optimizes away a smp_rmb.
      
      It looks worthwhile to keep around the implementation that doesn't rely
      on page_tail->first_page not being clobbered, because it would be
      necessary if we decide to enforce page->private to zero at all times
      whenever PG_private is not set, also for anonymous pages.  For anonymous
      pages enforcing such an invariant doesn't matter as anonymous pages
      don't use page->private so we can get away with this microoptimization.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hugetlbfs: move the put/get_page slab and hugetlbfs optimization in a faster path · ebf360f9
      Andrea Arcangeli committed
      We don't actually need a reference on the head page in the slab and
      hugetlbfs paths, as long as we add a smp_rmb() which should be faster
      than get_page_unless_zero.
      
      [akpm@linux-foundation.org: fix typo in comment]
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hugetlb: use get_page_foll() in follow_hugetlb_page() · a0368d4e
      Andrea Arcangeli committed
      get_page_foll() is more optimal and is always safe to use under the PT
      lock.  More so for hugetlbfs as there's no risk of race conditions with
      split_huge_page regardless of the PT lock.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages · 0e147aed
      Dave Hansen committed
      Dave Jiang reported that he was seeing oopses when running NUMA systems
      and default_hugepagesz=1G.  I traced the issue down to
      migrate_page_copy() trying to use the same code for hugetlb pages and
      transparent hugepages.  It should not have been trying to pass thp pages
      in there.
      
      So, add some VM_BUG_ON()s for the next hapless VM developer that tries
      the same thing.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Tested-by: Dave Jiang <dave.jiang@intel.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL · f92f455f
      Geert Uytterhoeven committed
      {,set}page_address() are macros if WANT_PAGE_VIRTUAL.  If
      !WANT_PAGE_VIRTUAL, they're plain C functions.
      
      If someone calls them with a void *, this pointer is auto-converted to
      struct page * if !WANT_PAGE_VIRTUAL, but causes a build failure on
      architectures using WANT_PAGE_VIRTUAL (arc, m68k and sparc64):
      
        drivers/md/bcache/bset.c: In function `__btree_sort':
        drivers/md/bcache/bset.c:1190: warning: dereferencing `void *' pointer
        drivers/md/bcache/bset.c:1190: error: request for member `virtual' in something not a structure or union
      
      Convert them to static inline functions to fix this.  There are already
      plenty of users of struct page members inside <linux/mm.h>, so there's
      no reason to keep them as macros.
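The difference between the macro and function forms can be sketched in plain C. This is a minimal illustration, not the kernel code: the `struct page` below is a hypothetical one-member stand-in, and the `_macro` / `_inline` suffixes are names invented here to show both variants side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page with WANT_PAGE_VIRTUAL set. */
struct page {
	void *virtual;	/* kernel virtual address of the page */
};

/*
 * Macro form: the argument is used as written, so passing a void *
 * expands to a dereference of void * and fails to compile.
 */
#define page_address_macro(p)	((p)->virtual)

/*
 * Function form: a void * argument is implicitly converted to
 * struct page * at the call site, as C permits.
 */
static inline void *page_address_inline(const struct page *page)
{
	return page->virtual;
}

static inline void set_page_address_inline(struct page *page, void *address)
{
	page->virtual = address;
}
```

Calling `page_address_inline(ptr)` with `void *ptr` compiles, whereas `page_address_macro(ptr)` produces the "request for member `virtual' in something not a structure or union" error quoted above.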
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ramfs: don't use module_init for non-modular core code · af52b040
      Paul Gortmaker committed
      The ramfs is always built in.  It will never be modular, so using
      module_init as an alias for __initcall is rather misleading.
      
      Fix this up now, so that we can relocate module_init from init.h into
      module.h in the future.  If we don't do this, we'd have to add module.h
      to obviously non-modular code, and that would be a worse thing.
      
      Note that direct use of __initcall is discouraged, vs.  one of the
      priority categorized subgroups.  As __initcall gets mapped onto
      device_initcall, our use of fs_initcall (which makes sense for fs code)
      will thus change this registration from level 6-device to level 5-fs
      (i.e. slightly earlier).  However no observable impact of that small
      difference has been observed during testing, or is expected.
      
      Also note that this change uncovers a missing semicolon bug in the
      registration of the initcall.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/super.c: fix WARN on alloc_super() fail path · b5bd856a
      Vladimir Davydov committed
      On fail path alloc_super() calls destroy_super(), which issues a warning
      if the sb's s_mounts list is not empty, in particular if it has not been
      initialized.  That said s_mounts must be initialized in alloc_super()
      before any possible failure, but currently it is initialized close to
      the end of the function leading to a useless warning dumped to log if
      either percpu_counter_init() or list_lru_init() fails.  Let's fix this.
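The ordering problem can be sketched in userspace C. Everything below is a hypothetical model, not the fs/super.c code: `demo_sb`, `demo_alloc()`, `demo_destroy()` and the `warnings` counter are invented for illustration, and the `step_fails` flag stands in for a failing percpu_counter_init() or list_lru_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical, heavily reduced superblock. */
struct demo_sb {
	struct list_head s_mounts;
};

static int warnings;	/* counts would-be WARN hits */

static void demo_destroy(struct demo_sb *sb)
{
	/* A zeroed, never-initialized list head does not look empty. */
	if (!list_empty(&sb->s_mounts))
		warnings++;
	free(sb);
}

static struct demo_sb *demo_alloc(bool step_fails)
{
	struct demo_sb *sb = calloc(1, sizeof(*sb));

	if (!sb)
		return NULL;

	/* Initialize the list before any step that can fail. */
	init_list_head(&sb->s_mounts);

	if (step_fails) {
		demo_destroy(sb);	/* fail path: no warning now */
		return NULL;
	}
	return sb;
}
```

If `init_list_head()` were moved below the `step_fails` check, the fail path would bump `warnings`, which mirrors the useless warning the commit removes.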
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/read_write.c:compat_readv(): remove bogus area verify · 4e4f9e66
      Corey Minyard committed
      The compat_do_readv_writev() function was doing a verify_area on the
      incoming iov, but the nr_segs value is not checked.  If someone passes
      in a -1 for nr_segs, for instance, the function should return an EINVAL.
      However, it returns an EFAULT because the verify_area fails because it is
      checking an array of size MAX_UINT.  The check is bogus, anyway, because
      the next check, compat_rw_copy_check_uvector(), will do all the
      necessary checking, anyway.  The non-compat do_readv_writev() function
      doesn't do this check, so I think it's safe to just remove the code.
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/compat_ioctl.c: fix an underflow issue (harmless) · 38316c8a
      Dan Carpenter committed
      We cap "nmsgs" at I2C_RDRW_IOCTL_MAX_MSGS (42) but the current code
      allows negative values.  It's harmless but it makes my static checker
      upset so I've made nmsgs unsigned.
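Why an unsigned count closes the gap can be shown with a small sketch. The two helper names are hypothetical; only the cap value of 42 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define I2C_RDRW_IOCTL_MAX_MSGS	42

/* Signed count: only the upper bound is checked, so -1 passes the cap. */
static bool nmsgs_ok_signed(int nmsgs)
{
	return nmsgs <= I2C_RDRW_IOCTL_MAX_MSGS;
}

/* Unsigned count: -1 converts to UINT_MAX, which the same check rejects. */
static bool nmsgs_ok_unsigned(unsigned int nmsgs)
{
	return nmsgs <= I2C_RDRW_IOCTL_MAX_MSGS;
}
```

The comparison is textually identical in both helpers; only the type of the argument changes which values get through.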
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • posix_acl: uninlining · 0afaa120
      Andrew Morton committed
      Uninline vast tracts of nested inline functions in
      include/linux/posix_acl.h.
      
      This reduces the text+data+bss size of x86_64 allyesconfig vmlinux by
      8026 bytes.
      
      The patch also regularises the positioning of the EXPORT_SYMBOLs in
      posix_acl.c.
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Tested-by: Benny Halevy <bhalevy@primarydata.com>
      Cc: Benny Halevy <bhalevy@panasas.com>
      Cc: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch/sh/kernel/kgdb.c: add missing #include <linux/sched.h> · 53a52f17
      Wanlong Gao committed
        arch/sh/kernel/kgdb.c: In function 'sleeping_thread_to_gdb_regs':
        arch/sh/kernel/kgdb.c:225:32: error: implicit declaration of function 'task_stack_page' [-Werror=implicit-function-declaration]
        arch/sh/kernel/kgdb.c:242:23: error: dereferencing pointer to incomplete type
        arch/sh/kernel/kgdb.c:243:22: error: dereferencing pointer to incomplete type
        arch/sh/kernel/kgdb.c: In function 'singlestep_trap_handler':
        arch/sh/kernel/kgdb.c:310:27: error: 'SIGTRAP' undeclared (first use in this function)
        arch/sh/kernel/kgdb.c:310:27: note: each undeclared identifier is reported only once for each function it appears in
      
      This was introduced by commit 16559ae4 ("kgdb: remove #include
      <linux/serial_8250.h> from kgdb.h").
      
      [geert@linux-m68k.org: reworded and reformatted]
      Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
      Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously · 75f82eaa
      Yiwen Jiang committed
      In a two-node cluster, Node A and Node B mount the same ocfs2 volume and
      create a file named 1.
      
      Node A                  Node B
      open 1, get open lock
                              rm 1, and then add 1 to orphan_dir
      storage link down,
      o2hb_write_timeout
      ->o2quo_disk_timeout
      ->emergency_restart
                              at the moment, Node B dismount and do
                              ocfs2rec simultaneously
                              1) ocfs2_dismount_volume
                              ->ocfs2_recovery_exit
                              ->wait_event(osb->recovery_event)
                              ->flush_workqueue(ocfs2_wq)
                              2) ocfs2rec
                              ->queue_work(&journal->j_recovery_work)
                              ->ocfs2_recover_orphans
                              ->ocfs2_commit_truncate
                              ->queue_delayed_work(&osb->osb_truncate_log_wq)
      
      In ocfs2_recovery_exit, it flushes workqueue and then releases system
      inodes.  When doing ocfs2rec, it will call ocfs2_flush_truncate_log
      which will try to get sys_root_inode, and NULL pointer dereference
      occurs.
      Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Signed-off-by: joyce <xuejiufei@huawei.com>
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: punch hole should return EINVAL if the length argument in ioctl is negative · a2a3b398
      Tariq Saeed committed
      An unreserve space ioctl OCFS2_IOC_UNRESVSP/64 should reject a negative
      length.
      
      Orabug:14789508
      Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix sparse non static symbol warning · 16eac4be
      Wei Yongjun committed
      Fixes the following sparse warning:
      
        fs/ocfs2/stack_user.c:930:32: warning:
         symbol 'ocfs2_ls_ops' was not declared. Should it be static?
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: adjust minlen with discard_granularity in the FITRIM ioctl · 1ba2212b
      Jie Liu committed
      Adjust minlen to discard_granularity for the FITRIM ioctl(2) if the
      given minimum size in bytes is smaller, because the discard granularity
      is the minimum extent size that the storage device can discard.
      
      This is inspired by ext4 commit 5c2ed62f ("ext4: Adjust minlen with
      discard_granularity in the FITRIM ioctl") from Lukas Czerner.
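The minlen adjustment amounts to a clamp from below. A minimal sketch, assuming both values are byte counts; `adjust_minlen()` is a hypothetical helper name, not a function from the ocfs2 code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: raise the caller's minimum extent length to the
 * device's discard granularity when it is smaller. Both are in bytes.
 */
static uint64_t adjust_minlen(uint64_t minlen, uint64_t discard_granularity)
{
	return minlen < discard_granularity ? discard_granularity : minlen;
}
```

A request with a minlen of 512 bytes on a device with a 4096-byte granularity would thus be treated as a minlen of 4096, while a larger minlen is left alone.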
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: return EINVAL if the given range to discard is less than block size · aa89762c
      Jie Liu committed
      For FITRIM ioctl(2), we should not stay silent if the given range
      length is less than a block size, as no data blocks would be
      discarded.  Hence it should return EINVAL instead.  This issue can be
      verified via xfstests/generic/288, which is used for FITRIM argument
      handling tests.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: return EOPNOTSUPP if the device does not support discard · 19e8ac27
      Jie Liu committed
      For FITRIM ioctl(2), we should return EOPNOTSUPP to inform the user
      when the storage device does not support discard; otherwise returning
      success would confuse the user even though no free blocks were
      trimmed at all.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() · 0a2fcd89
      Younger Liu committed
      ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() are
      already provided in suballoc.c.  So, the same functions in
      move_extents.c are not needed any more.
      
      Declare the functions in suballoc.h and remove redundant functions in
      move_extents.c.
      Signed-off-by: Younger Liu <liuyiyang@hisense.com>
      Cc: Younger Liu <younger.liucn@gmail.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: use the new DLM operation callbacks while requesting new lockspace · c994c2eb
      Goldwyn Rodrigues committed
      Attempt to use the new DLM operations.  If it is not supported, use the
      traditional ocfs2_controld.
      
      To exchange ocfs2 versioning, we use the LVB of the version dlm lock.
      It first attempts to take the lock in EX mode (non-blocking).  If
      successful (which means it is the first mount), it writes the version
      number and downconverts to PR lock.  If it is unsuccessful, it reads the
      version from the lock.
      
      If this becomes the standard (with o2cb as well), it could simplify
      userspace tools to check if the filesystem is mounted on other nodes.
      
      Dan: Since ocfs2_protocol_version are two u8 values, the additional
      checks with LONG* don't make sense.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: framework for version LVB · 41503630
      Goldwyn Rodrigues committed
      Use the native DLM locks for version control negotiation.  Most of the
      framework is taken from gfs2/lock_dlm.c
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: pass ocfs2_cluster_connection to ocfs2_this_node · 3e834151
      Goldwyn Rodrigues committed
      This is done to differentiate between using and not using controld and
      use the connection information accordingly.
      
      We need to be backward compatible.  So, we use a new enum
      ocfs2_connection_type to identify when controld is used and when it is
      not.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: shift allocation ocfs2_live_connection to user_connect() · 24aa3386
      Goldwyn Rodrigues committed
      We perform this because the DLM recovery callbacks will require the
      ocfs2_live_connection structure to record the node information when
      dlm_new_lockspace() is updated (in the last patch of the series).
      
      Before calling dlm_new_lockspace(), we need the structure ready for the
      .recover_done() callback, which would set oc_this_node.  This is the
      reason we allocate ocfs2_live_connection beforehand in user_connect().
      
      [AKPM] rc initialization is not required because it is assigned in case
      of errors.  It will be cleared by compiler anyways.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: add DLM recovery callbacks · 66e188fc
      Goldwyn Rodrigues committed
      These are the callbacks called by the fs/dlm code in case the membership
      changes.  If there is a failure while/during calling any of these, the
      DLM creates a new membership and relays to the rest of the nodes.
      
       - recover_prep() is called when DLM understands a node is down.
       - recover_slot() is called once all nodes have acknowledged
         recover_prep and recovery can begin.
       - recover_done() is called once the recovery is complete.  It returns
         the new membership.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: add clustername to cluster connection · c74a3bdd
      Goldwyn Rodrigues committed
      This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM
      handling up to the times with respect to DLM (>=4.0.1) and corosync
      (2.3.x).  AFAIK, cman also is being phased out for a unified corosync
      cluster stack.
      
      fs/dlm performs all the functions with respect to fencing and node
      management and provides the API's to do so for ocfs2.  For all future
      references, DLM stands for fs/dlm code.
      
      The advantages are:
       + No need to run an additional userspace daemon (ocfs2_controld)
       + No controld device handling and controld protocol
       + Shifting responsibilities of node management to DLM layer
      
      For backward compatibility, we are keeping the controld handling code.
      Once enough time has passed we can remove a significant portion of the
      code.  This was tested by using the kernel with changes on older
      unmodified tools.  The kernel used ocfs2_controld as expected, and
      displayed the appropriate warning message.
      
      This feature requires modification in the userspace ocfs2-tools.  The
      changes can be found at: https://github.com/goldwynr/ocfs2-tools branch:
      nocontrold.  Currently, not many checks are present in the userspace
      code, but that would change soon.
      
      This patch (of 6):
      
      Add clustername to cluster connection.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: remove versioning information · ff8fb335
      Goldwyn Rodrigues committed
      The versioning information is confusing for end-users.  The numbers are
      stuck at 1.5.0 when the tools version has moved to 1.8.2.  Remove the
      versioning system in the OCFS2 modules and let the kernel version be the
      guide to debug issues.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • score: remove "select HAVE_GENERIC_HARDIRQS" again · 227d0066
      Geert Uytterhoeven committed
      Commit 5fbbf8a1 ("Score: The commit is for compiling successfully.")
      re-introduced "select HAVE_GENERIC_HARDIRQS" in v3.12-rc4, which had
      just been removed in v3.12-rc1 by 0244ad00 ("Remove GENERIC_HARDIRQ
      config option").
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • intel-iommu: fix off-by-one in pagetable freeing · 08336fd2
      Alex Williamson committed
      dma_pte_free_level() has an off-by-one error when checking whether a pte
      is completely covered by a range.  Take for example the case of
      attempting to free pfn 0x0 - 0x1ff, i.e. 512 entries covering the first
      2M superpage.
      
      The level_size() is 0x200 and we test:
      
        static void dma_pte_free_level(...
      	...
      
      	if (!(0 > 0 || 0x1ff < 0 + 0x200)) {
      		...
      	}
      
      Clearly the 2nd test is true, which means we fail to take the branch to
      clear and free the pagetable entry.  As a result, we're leaking
      pagetables and failing to install new pages over the range.
      
      This was found with a PCI device assigned to a QEMU guest using vfio-pci
      without a VGA device present.  The first 1M of guest address space is
      mapped with various combinations of 4K pages, but eventually the range
      is entirely freed and replaced with a 2M contiguous mapping.
      intel-iommu errors out with something like:
      
        ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083)
      
      In this case 5c2b8003 is the pointer to the previous leaf page that was
      neither freed nor cleared and 849c00083 is the superpage entry that
      we're trying to replace it with.
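The arithmetic above can be checked with a small standalone sketch. The buggy predicate follows the condition quoted in the message; the fixed predicate is an assumption about the shape of the correction (comparing against the last pfn the entry covers), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A pte at this level covers pfns [level_pfn, level_pfn + level_size - 1].
 * Check as quoted in the commit message: the upper bound compares
 * against one past the last covered pfn.
 */
static bool covered_buggy(unsigned long start_pfn, unsigned long last_pfn,
			  unsigned long level_pfn, unsigned long level_size)
{
	return !(start_pfn > level_pfn || last_pfn < level_pfn + level_size);
}

/* Assumed fix: compare against the last pfn the entry actually covers. */
static bool covered_fixed(unsigned long start_pfn, unsigned long last_pfn,
			  unsigned long level_pfn, unsigned long level_size)
{
	return !(start_pfn > level_pfn ||
		 last_pfn < level_pfn + level_size - 1);
}
```

For the range 0x0 - 0x1ff with a level size of 0x200, `covered_buggy()` reports the entry as not covered, which matches the leak described, while `covered_fixed()` reports it as covered.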
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fsnotify: remove pointless NULL initializers · 56b27cf6
      Jan Kara committed
      We usually rely on the fact that struct members not specified in the
      initializer are set to NULL.  So do that with fsnotify function pointers
      as well.
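The language rule being relied on here can be shown with a small standalone sketch. The struct and function names below are hypothetical and only loosely modelled on an ops table of callbacks; they are not the fsnotify definitions.

```c
#include <stddef.h>

/* Hypothetical ops table with several callback pointers. */
struct demo_ops {
    int  (*handle_event)(int mask);
    void (*free_group_priv)(void *priv);
    void (*freeing_mark)(void *mark);
};

static int demo_handle_event(int mask)
{
    return mask != 0;
}

/*
 * Only .handle_event is named. C guarantees that members omitted from an
 * initializer are initialized as if they had static storage duration,
 * which for pointers means null, so writing ".freeing_mark = NULL" here
 * would be redundant.
 */
static const struct demo_ops demo_group_ops = {
    .handle_event = demo_handle_event,
};
```

A caller can therefore test `demo_group_ops.freeing_mark` against `NULL` before invoking it, without the table ever mentioning that member.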
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      56b27cf6
    • J
      fsnotify: remove .should_send_event callback · 83c4c4b0
      Jan Kara committed
      After removing event structure creation from the generic layer there is
      no reason for separate .should_send_event and .handle_event callbacks.
      So just remove the first one.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83c4c4b0
    • J
      fsnotify: do not share events between notification groups · 7053aee2
      Jan Kara committed
      Currently fsnotify framework creates one event structure for each
      notification event and links this event into all interested notification
      groups.  This is done so that we save memory when several notification
      groups are interested in the event.  However the need for event
      structure shared between inotify & fanotify bloats the event structure
      so the result is often higher memory consumption.
      
      Another problem is that fsnotify framework keeps path references with
      outstanding events so that fanotify can return open file descriptors
      with its events.  This has the undesirable effect that filesystem cannot
      be unmounted while there are outstanding events - a regression for
      inotify compared to a situation before it was converted to fsnotify
      framework.  For fanotify this problem is hard to avoid and users of
      fanotify should kind of expect this behavior when they ask for file
      descriptors from notified files.
      
      This patch changes fsnotify and its users to create separate event
      structure for each group.  This allows for much simpler code (~400 lines
      removed by this patch) and also smaller event structures.  For example
      on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
      additional space for file name, additional 24 bytes for second and each
      subsequent group linking the event, and additional 32 bytes for each
      inotify group for private data.  After the conversion inotify event
      consumes 48 bytes plus space for file name which is considerably less
      memory unless file names are long and there are several groups
      interested in the events (both of which are uncommon).  Fanotify event
      fits in 56 bytes after the conversion (fanotify doesn't care about file
      names so its events don't have to have it allocated).  A win unless
      there are four or more fanotify groups interested in the event.
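The byte counts quoted above can be turned into a rough comparison. This is only a sketch of the arithmetic: the function names are hypothetical, the way the per-group costs are summed is an assumption read off the paragraph (one shared event and one name copy in the old scheme, one full event per group in the new one), and `groups` is assumed to be at least 1.

```c
#include <stddef.h>

/* Byte counts quoted in the commit message for a 64-bit system. */
enum {
    OLD_EVENT           = 120, /* shared struct fsnotify_event */
    OLD_PER_EXTRA_GROUP = 24,  /* second and each subsequent group */
    OLD_INOTIFY_PRIV    = 32,  /* private data per inotify group */
    NEW_INOTIFY_EVENT   = 48,  /* per group, plus file name */
    NEW_FANOTIFY_EVENT  = 56   /* per group, no file name */
};

/* Old scheme, inotify: one event and one name copy shared by all groups. */
static size_t old_inotify_cost(size_t groups, size_t name_len)
{
    return OLD_EVENT + name_len
         + OLD_PER_EXTRA_GROUP * (groups - 1)
         + OLD_INOTIFY_PRIV * groups;
}

/* New scheme, inotify: every group gets its own event and name copy. */
static size_t new_inotify_cost(size_t groups, size_t name_len)
{
    return groups * (NEW_INOTIFY_EVENT + name_len);
}

/* Old scheme, fanotify: shared event plus a link per additional group. */
static size_t old_fanotify_cost(size_t groups)
{
    return OLD_EVENT + OLD_PER_EXTRA_GROUP * (groups - 1);
}

/* New scheme, fanotify: one fixed-size event per group. */
static size_t new_fanotify_cost(size_t groups)
{
    return NEW_FANOTIFY_EVENT * groups;
}
```

Under these assumptions a single fanotify group drops from 120 to 56 bytes per event, three groups break even at 168 bytes, and four groups cost 224 bytes against 192 before, which matches the "four or more" threshold in the message.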
      
      The conversion also solves the problem with unmount when only inotify is
      used as we don't have to grab path references for inotify events.
      
      [hughd@google.com: fanotify: fix corruption preventing startup]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7053aee2
    • J
      inotify: provide function for name length rounding · e9fe6904
      Jan Kara committed
      Rounding of name length when passing it to userspace was done in several
      places.  Provide a function to do it and use it in all places.
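A helper of this kind might look like the sketch below. It is not the kernel function: the name `demo_round_name_len` is hypothetical, and it assumes the name (including its terminating NUL) is padded to a multiple of the event header size, taken here as 16 bytes, the size of `struct inotify_event` on Linux.

```c
#include <stddef.h>
#include <string.h>

/* Assumed padding granularity: the size of the fixed event header. */
#define DEMO_EVENT_HDR_SIZE 16u

/*
 * Return the number of bytes reserved for a name in the record passed to
 * userspace: 0 for no name, otherwise strlen + 1 rounded up to the next
 * multiple of DEMO_EVENT_HDR_SIZE.
 */
static size_t demo_round_name_len(const char *name)
{
    size_t len;

    if (name == NULL || name[0] == '\0')
        return 0;

    len = strlen(name) + 1; /* include the terminating NUL */
    return (len + DEMO_EVENT_HDR_SIZE - 1) / DEMO_EVENT_HDR_SIZE
           * DEMO_EVENT_HDR_SIZE;
}
```

Keeping the rounding in one function means every place that sizes or copies an event agrees on the padded length, which is the point of the change described above.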
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9fe6904
    • D
      dma-debug: introduce debug_dma_assert_idle() · 0abdd7a8
      Dan Williams committed
      Record actively mapped pages and provide an api for asserting a given
      page is dma inactive before execution proceeds.  Placing
      debug_dma_assert_idle() in cow_user_page() flagged the violation of the
      dma-api in the NET_DMA implementation (see commit 77873803 "net_dma:
      mark broken").
      
      The implementation includes the capability to count, in a limited way,
      repeat mappings of the same page that occur without an intervening
      unmap.  This 'overlap' counter is limited to the few bits of tag space
      in a radix tree.  This mechanism is added to mitigate false negative
      cases where, for example, a page is dma mapped twice and
      debug_dma_assert_idle() is called after the page is un-mapped once.
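The overlap counter can be illustrated with a toy model. Everything below is hypothetical: the kernel keeps this state in radix tree tags rather than a struct, and the width of three bits is an assumption standing in for "the few bits of tag space".

```c
#include <stdbool.h>

/* Assumed counter width; the real limit depends on the radix tree tags. */
#define DEMO_OVERLAP_BITS 3u
#define DEMO_OVERLAP_MAX  ((1u << DEMO_OVERLAP_BITS) - 1u)

/* Toy per-page tracking state. */
struct demo_page_state {
    bool active;          /* at least one mapping is outstanding */
    unsigned int overlap; /* extra mappings beyond the first */
};

/* Record a mapping. Returns false once the small counter is saturated. */
static bool demo_map(struct demo_page_state *p)
{
    if (!p->active) {
        p->active = true;
        return true;
    }
    if (p->overlap == DEMO_OVERLAP_MAX)
        return false; /* cannot track any more repeat mappings */
    p->overlap++;
    return true;
}

/* Record an unmap: drop one overlap first, then clear the active state. */
static void demo_unmap(struct demo_page_state *p)
{
    if (p->overlap > 0)
        p->overlap--;
    else
        p->active = false;
}

/* The assertion would fire whenever this returns false. */
static bool demo_is_idle(const struct demo_page_state *p)
{
    return !p->active;
}
```

In this model a page that is mapped twice and unmapped once still reports as busy, which is the false negative the message says the counter is meant to mitigate.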
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0abdd7a8
    • L
      Merge tag 'for-v3.14' of git://git.infradead.org/battery-2.6 · 03d11a0e
      Linus Torvalds committed
      Pull battery updates from Dmitry Eremin-Solenikov:
       "I'm picking up power supply maintainership from Anton Vorontsov.  Could
        you please pull battery-2.6 git tree changes prepared for the v3.14
        release.
      
        Highlights:
      
         - Power supply notifier
      
         - Several drivers gained DT support
      
         - Added Maxim 14577 driver
      
         - Change of maintainer"
      
      * tag 'for-v3.14' of git://git.infradead.org/battery-2.6:
        MAINTAINERS: Pick up power supply maintainership
        max17042_battery: Add IRQF_ONESHOT flag to use default irq handler
        gpio-charger: Support wakeup events
        power_supply: Add charger support for Maxim 14577
        dt: Binding documentation for isp1704 charger
        isp1704_charger: Add DT support
        charger-manager: of_cm_parse_desc() should be static
        bq2415x_charger: Add DT support
        power_supply: Add power_supply_get_by_phandle
        bq2415x_charger: Use power_supply notifier for automode
        power: reset: Add as3722 power-off driver
        mfd: AS3722: Add dt node properties for system power controller
        charger-manager: Support deivce tree in charger manager driver
        charger-manager: Modify the way of checking battery's temperature
        power_supply: Add power_supply notifier
      03d11a0e
    • L
      Merge tag 'mfd-3.14-1' of git://git.linaro.org/people/ljones/mfd · ac266635
      Linus Torvalds committed
      Pull MFD changes from Lee Jones:
       "New drivers
         - Samsung Maxim 14577; Micro USB, Regulator, IRQ Controller and
           Battery Charger
         - TI/National Semiconductor LP3943 I2C GPIO Expander and PWM
           Generator
      
        Existing driver adaptions
         - Expansion of Wolfson Arizona DSP and High-Pass filter controls
         - TI TWL6040 default Regmap support and Regcache addition/bypass
         - Some nice Smatch catch fixes
         - Conversion of TI OMAP-USB and TI TWL6030 to endian neutralness
         - ChromeOS EC timing (delay) adaptions and added dependency on OF
         - Many constifications of 'struct {mfd_cell,regmap_irq,et.al}'
         - Watchdog support added for NVIDIA AS3722
         - Convert functions to static in TI AM335x
         - Realigned previously defeated functionality in TI AM335x
         - IIO ADC-TSC concurrency dead-lock/timeout resolution
         - Addition of Power Management and Clock support for Samsung core
         - DEFINE_PCI_DEVICE_TABLE macro removal from MFD Subsystem
         - Greater use of irqdomain functionality in ST-E AB8500
         - Removal of 'include/linux/mfd/abx500/ab8500-gpio.h'
         - Wolfson WM831x PMIC Power Management changes s/poweroff/shutdown/
         - Device Tree documentation added for TI/Nat Semi LP3943
         - Version detection and voltage tables for TI TPS6586x PMIC devices
         - Simplification of Freescale MC13XXX (de-)initialisation routines
         - Clean-up and simplification of the Realtek parent driver
         - Added support for RTL8402 Realtek PCI-Express card reader
         - Resource leak fix for Maxim 77686
         - Possible suspend BUG() fix in OMAP USB TLL
         - Support for new Wolfson WM5110 Revision (D)
         - Testing of automatic assignment of of_node in mfd_add_device()
         - Reversion of the above when it started to cause issues
         - Remove legacy Platform Data from;
                    TI TWL Core, Qualcomm SSBI and ST-E ABx500 Pinctrl
         - Clean-ups; tabbing issues, function name changes, 'drvdata = NULL'
                    removal, unused uninitialised warning mitigation, error
                    message clarity, removal of redundant/duplicate checks,
                    licensing (GPL -> GPL2), coding consistency, duplicate
                    function declaration, ret checks, commit corrections,
                    redundant of_match_ptr() helper removal, spelling,
                    #if-deffery removal and header guards name changes"
      
      * tag 'mfd-3.14-1' of git://git.linaro.org/people/ljones/mfd: (78 commits)
        mfd: wm5110: Add register patch for rev D chip
        mfd: omap-usb-tll: Don't hold lock during pm_runtime_get/put_sync()
        gpio: lp3943: Remove redundant of_match_ptr helper
        mfd: sta2x11-mfd: Use named constants for pci_power_t values
        Documentation: mfd: Fix LDO index in s2mps11.txt
        mfd: Cleanup mfd-mcp-sa11x0.h header
        mfd: max8997: Use "IS_ENABLED(CONFIG_OF)" for DT code.
        mfd: twl6030: Fix endianness problem in IRQ handler
        mfd: sec-core: Add cells for S5M8767-clocks
        mfd: max14577: Remove redundant of_match_ptr helper
        mfd: twl6040: Fix sparse non static symbol warning
        mfd: Revert "mfd: Always assign of_node in mfd_add_device()"
        mfd: rtsx: Fix sparse non static symbol warning
        mfd: max77693: Set proper maximum register for MUIC regmap
        mfd: max77686: Fix regmap resource leak on driver remove
        mfd: Represent correct filenames in file headers
        mfd: rtsx: Add support for card reader rtl8402
        mfd: rtsx: Add set pull control macro and simplify rtl8411
        mfd: max8997: Enforce mfd_add_devices() return value check
        mfd: mc13xxx: Simplify probe() & remove()
        ...
      ac266635
    • L
      Merge tag 'sound-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · d4371f94
      Linus Torvalds committed
      Pull sound updates from Takashi Iwai:
       "It was holiday season, so no wonder that there are few changes at the
        framework level, although diffstat shows quite many changes spread
        over sound/* directories.  Most of changes are cleanups, code
        refactoring and fixes.
      
        Some highlights:
         - Removal of OSS sleep_on usages by Arnd
         - Simplified memalloc helper codes, drop obsoleted features; now it's
           built into PCM driver instead of an individual module
         - Warn if PCM buffer preallocation fails, which will show page
           allocation issues more clearly
         - Compress offload API updates for sample rates by Vinod
         - PCM glitch workaround on ctxfi emu20k1 by Sarah
         - Drop cs46xx DSP blobs, using firmware loader now
         - USB-audio quirks for Plantronics Gamecom 780, Creative VF0420, and
           Focusrite Saffire 6
      
        HD-audio specifics:
         - Standardize Kconfigs of HD-audio codec drivers; now "make
           localmodconfig" recognizes configs properly (finally!)
         - Parallel PM implementation by Mengdong
         - BayleyBay/ValleyView2 board fixups
         - Broadwell audio support
         - Runtime PM improvement (PantherPoint, etc)
         - Quirks: Dell subwoofer, Gigabyte mobo jack detection oddity, Dell
           AiO click noise fixes, Dell headset mic fixes, etc
         - Automatic bind with HDMI codec parser without generic parser
         - More AD codec fixes (since 3.12 regression) including the automatic
           stereo mix support
         - Common Thinkpad ACPI helper for Realtek and Conexant codecs
      
        ASoC specifics:
         - Update to the generic DMA code to support deferred probe and
           managed resources
         - New drivers for BCM2835 (used in Raspberry Pi), Tegra with MAX98090
           and Analog Devices AXI I2S and S/PDIF controller IPs
         - Device tree support for the simple card, max98090 and cs42l52
         - Conversion of the Samsung drivers to native dmaengine, making them
           multiplatform compatible and hopefully helping keep them more
           modern and up to date.
         - More regmap conversions, including a very welcome one for twl6040
           from Peter Ujfalusi
         - A big overhaul of the DaVinci drivers also from Peter Ujfalusi
         - Lots of DMA updates from Lars-Peter
         - Improvements to the constraints handling code from Lars-Peter
         - A very helpful conversion of the TWL4030 driver to regmap from Peter
         - A new driver for the Freescale ESAI controller from Nicolin Chen
         - Conversion of some of the drivers to use params_width()
         - Extensions to DPCM for use with compressed audio from Liam"
      
      * tag 'sound-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (396 commits)
        ASoC: dapm: Fix double prefix addition
        ASoC: compress: Add suport for DPCM into compressed audio
        ASoC: DPCM: make some DPCM API calls non static for compressed usage
        ASoC: core: Fix possible NULL pointer dereference of pcm->config
        ALSA: hda - add headset mic detect quirks for some Dell machines
        ASoC: tlv320aic32x4: Fix regmap range_min
        ASoC: core: Return -ENOTSUPP from set_sysclk() if no operation provided
        ASoC: dapm: Change prototype of soc_widget_read
        ASoC: samsung: Remove SND_DMAENGINE_PCM_FLAG_NO_RESIDUE flag
        ASoC: axi-{spdif,i2s}: Remove SND_DMAENGINE_PCM_FLAG_NO_RESIDUE flag
        ASoC: generic-dmaengine-pcm: Check DMA residue granularity
        ASoC: generic-dmaengine-pcm: Check NO_RESIDUE flag at runtime
        dma: pl330: Set residue_granularity
        dma: Indicate residue granularity in dma_slave_caps
        ASoC: simple-card: fix one bug to writing to the platform data
        ASoC: pcm: Use snd_pcm_rate_mask_intersect() helper
        ALSA: Add helper function for intersecting two rate masks
        ASoC: s6000: Don't mix SNDRV_PCM_RATE_CONTINUOUS with specific rates
        ASoC: fsl: Don't mix SNDRV_PCM_RATE_CONTINUOUS with specific rates
        ASoC: pcm: Properly initialize hw->rate_max
        ...
      d4371f94