1. 14 Nov, 2015 1 commit
    • A
      9p: xattr simplifications · e409de99
      Andreas Gruenbacher committed
      Now that the xattr handler is passed to the xattr handler operations, we
      can use the same get and set operations for the user, trusted, and security
      xattr namespaces.  In those namespaces, we can access the full attribute
      name by "reattaching" the name prefix the vfs has skipped for us.  Add a
      xattr_full_name helper to make this obvious in the code.
      
      For the "system.posix_acl_access" and "system.posix_acl_default"
      attributes, handler->prefix is the full attribute name; the suffix is the
      empty string.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: v9fs-developer@lists.sourceforge.net
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e409de99
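The "reattaching" step described in the 9p commit above can be sketched in userspace C. This is a minimal illustration, not the kernel implementation: `struct handler_sketch` and `full_name_sketch()` are hypothetical stand-ins for the kernel's xattr handler and the xattr_full_name helper, and the sketch assumes the suffix pointer really does point into the full attribute name, immediately after the prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's xattr handler: only the prefix. */
struct handler_sketch {
	const char *prefix;
};

/*
 * Recover the full attribute name from the suffix handed to the handler.
 * Assumption: `suffix` points into the original full name, right after
 * the handler's prefix, so stepping back strlen(prefix) bytes lands on
 * the first byte of the full name.
 */
static const char *full_name_sketch(const struct handler_sketch *h,
				    const char *suffix)
{
	assert(h != NULL && h->prefix != NULL && suffix != NULL);
	return suffix - strlen(h->prefix);
}
```

For a handler whose prefix is the whole attribute name, such as "system.posix_acl_access", the suffix is the empty string and the same arithmetic still returns the start of the full name.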
  2. 12 Nov, 2015 1 commit
  3. 11 Nov, 2015 5 commits
  4. 10 Nov, 2015 12 commits
  5. 09 Nov, 2015 1 commit
  6. 08 Nov, 2015 3 commits
  7. 07 Nov, 2015 17 commits
    • A
      include/linux/zutil.h: fix usage example of zlib_adler32() · cb7ae262
      Anish Bhatt committed
      adler32 was renamed to zlib_adler32 before 2.6.11.
      Signed-off-by: Anish Bhatt <anish@chelsio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb7ae262
    • R
      dma-mapping: tidy up dma_parms default handling · 002edb6f
      Robin Murphy committed
      Many DMA controllers and other devices set max_segment_size to
      indicate their scatter-gather capability, but have no interest in
      segment_boundary_mask. However, the existence of a dma_parms structure
      precludes the use of any default value, leaving them as zeros (assuming
      a properly kzalloc'ed structure). If a well-behaved IOMMU (or SWIOTLB)
      then tries to respect this by ensuring a mapped segment does not cross
      a zero-byte boundary, hilarity ensues.
      
      Since zero is a nonsensical value for either parameter, treat it as an
      indicator for "default", as might be expected. In the process, clean up
      a bit by replacing the bare constants with slightly more meaningful
      macros and removing the superfluous "else" statements.
      
      [akpm@linux-foundation.org: dma-mapping.h needs sizes.h for SZ_64K]
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Sakari Ailus <sakari.ailus@iki.fi>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      002edb6f
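The "zero means default" rule from the dma-mapping commit above can be sketched as two small accessors. This is a userspace illustration under stated assumptions: `struct dma_parms_sketch`, both function names, and both macros are hypothetical. The 64 KiB segment-size default follows the SZ_64K mentioned in the commit note, while the 32-bit boundary mask is only an assumed default for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative defaults; the names and the boundary value are assumptions. */
#define SKETCH_DEFAULT_MAX_SEG_SIZE (64u * 1024u)  /* 64 KiB, cf. SZ_64K */
#define SKETCH_DEFAULT_SEG_BOUNDARY 0xffffffffUL   /* assumed 32-bit mask */

/* Hypothetical stand-in for a device's DMA parameter block. */
struct dma_parms_sketch {
	unsigned int max_segment_size;
	unsigned long segment_boundary_mask;
};

/* A missing structure or a zero field both fall back to the default. */
static unsigned int max_seg_size_sketch(const struct dma_parms_sketch *p)
{
	if (p && p->max_segment_size)
		return p->max_segment_size;
	return SKETCH_DEFAULT_MAX_SEG_SIZE;
}

static unsigned long seg_boundary_sketch(const struct dma_parms_sketch *p)
{
	if (p && p->segment_boundary_mask)
		return p->segment_boundary_mask;
	return SKETCH_DEFAULT_SEG_BOUNDARY;
}
```

With this shape, a driver that only sets max_segment_size in a zero-initialized structure no longer ends up advertising a zero-byte segment boundary.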
    • O
      signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() · 9a13049e
      Oleg Nesterov committed
      jffs2_garbage_collect_thread() can race with SIGCONT and sleep in
      TASK_STOPPED state after it was already sent. Add the new helper,
      kernel_signal_stop(), which does this correctly.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a13049e
    • O
      signal: turn dequeue_signal_lock() into kernel_dequeue_signal() · be0e6f29
      Oleg Nesterov committed
      1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
         matches another "for kthreads only" kernel_sigaction() helper.
      
      2. Remove the "tsk" and "mask" arguments, they are always current
         and current->blocked. And it is simply wrong if tsk != current.
      
      3. We could also remove the 3rd "siginfo_t *info" arg but it looks
         potentially useful. However we can simplify the callers if we
         change kernel_dequeue_signal() to accept info => NULL.
      
      4. Remove _irqsave, it is never called from atomic context.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be0e6f29
    • O
      signals: kill block_all_signals() and unblock_all_signals() · 2e01fabe
      Oleg Nesterov committed
      It is hardly possible to enumerate all problems with block_all_signals()
      and unblock_all_signals().  Just for example,
      
      1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is
         multithreaded. Another thread can dequeue the signal and force the
         group stop.
      
      2. Even if the caller is single-threaded, it will "stop" anyway. It
         will not sleep, but it will spin in kernel space until SIGCONT or
         SIGKILL.
      
      And a lot more. In short, this interface doesn't work at all, and has
      not for at least the last 10+ years.
      
      Daniel said:
      
        Yeah the only times I played around with the DRM_LOCK stuff was when
        old drivers accidentally deadlocked - my impression is that the entire
        DRM_LOCK thing was never really tested properly ;-) Hence I'm all for
        purging where this leaks out of the drm subsystem.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: Dave Airlie <airlied@redhat.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e01fabe
    • H
      nilfs2: add tracepoints for analyzing reading and writing metadata files · a9cd207c
      Hitoshi Mitake committed
      This patch adds tracepoints for analyzing requests to read and write
      metadata files.  The tracepoints cover every in-place mdt file (cpfile,
      sufile, and datfile).
      
      Example of tracing mdt_insert_new_block():
                    cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
                    cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
                    cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9cd207c
    • H
      nilfs2: add tracepoints for analyzing sufile manipulation · 83eec5e6
      Hitoshi Mitake committed
      This patch adds tracepoints that are useful for analyzing segment
      usage from the perspective of high-level sufile manipulation (check,
      alloc, free).  sufile is an important in-place updated metadata file, so
      analyzing its behavior is useful for performance tuning.
      
      example of usage (a case of allocation):
      
      $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
      Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
              segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
              segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benixon Dhas <benixon.dhas@wdc.com>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83eec5e6
    • H
      nilfs2: add a tracepoint for transaction events · 44fda114
      Hitoshi Mitake committed
      This patch adds a tracepoint for transaction events of nilfs.  With the
      tracepoint, these events can be tracked: begin, abort, commit, trylock,
      lock, and unlock.  Basically, these events have corresponding functions,
      e.g. the begin event corresponds to nilfs_transaction_begin().  The
      unlock event is an exception.  It corresponds to the iteration in
      nilfs_transaction_lock().
      
      Only one tracepoint is introduced: nilfs2_transaction_transition.  The
      above events are distinguished by a newly introduced enum.  With this
      tracepoint, we can analyse a critical section of segment construction.
      
      Sample output by tpoint of perf-tools:
                    cp-4457  [000] ...1    63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
              segctord-4371  [001] ...1    68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
              segctord-4371  [001] ...1    68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
              segctord-4371  [001] ...1    68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
              segctord-4371  [001] ...1    68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
              segctord-4371  [001] ...1    68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
              segctord-4371  [001] ...1   132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
      
      This patch also does trivial cleaning of comma usage in collection stage
      transition event for consistent coding style.
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44fda114
    • H
      nilfs2: add a tracepoint for tracking stage transition of segment construction · 58497703
      Hitoshi Mitake committed
      This patch adds a tracepoint for tracking stage transition of block
      collection in segment construction.  With the tracepoint, we can analyze
      the behavior of segment construction in depth.  It would be useful for
      bottleneck detection, debugging, etc.
      
      The tracepoint is created with the standard trace API of Linux (like ext3,
      ext4, f2fs and btrfs).  So we can analyze it with existing tools easily.
      Of course, more detailed analysis will be possible if we can create
      nilfs-specific analysis tools.
      
      Below is an example of event dump with Brendan Gregg's perf-tools
      (https://github.com/brendangregg/perf-tools).  Time consumption between
      each stage can be obtained.
      
      $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
      Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
              segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
              segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
              segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
              segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
              segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
              segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
              segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE
      
      To capture transitions correctly, this patch adds wrappers for the
      member scnt of nilfs_cstage.  With this change, every transition of the
      stage produces a trace event correctly.
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      58497703
    • C
      rbtree: clarify documentation of rbtree_postorder_for_each_entry_safe() · 8de1ee7e
      Cody P Schafer committed
      I noticed that commit a20135ff ("writeback: don't drain
      bdi_writeback_congested on bdi destruction") added a usage of
      rbtree_postorder_for_each_entry_safe() in mm/backing-dev.c which appears
      to try to rb_erase() elements from an rbtree while iterating over it using
      rbtree_postorder_for_each_entry_safe().
      
      Doing this will cause random nodes to be missed by the iteration because
      rb_erase() may rebalance the tree, changing the ordering that we're trying
      to iterate over.
      
      The previous documentation for rbtree_postorder_for_each_entry_safe()
      did not make clear that this is not allowed. It was taken from the docs
      for list_for_each_entry_safe(), where erasing isn't a problem because
      list_del() does not reorder.
      
      Explicitly warn developers about this potential pitfall.
      
      Note that I haven't fixed the actual issue that (it appears) the commit
      referenced above introduced (not familiar enough with that code).
      
      In general (and in this case), the patterns to follow are:
       - switch to rb_first() + rb_erase(), don't use
         rbtree_postorder_for_each_entry_safe().
       - keep the postorder iteration and don't rb_erase() at all. Instead
         just clear the fields of rb_node & cgwb_congested_tree as required by
         other users of those structures.
      
      [akpm@linux-foundation.org: tweak comments]
      Signed-off-by: Cody P Schafer <dev@codyps.com>
      Cc: John de la Garza <john@jjdev.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8de1ee7e
    • R
      lib/kasprintf.c: introduce kvasprintf_const · 0a9df786
      Rasmus Villemoes committed
      This adds kvasprintf_const which tries to use kstrdup_const if possible:
      If the format string contains no % characters, or if the format string is
      exactly "%s", we delegate to kstrdup_const.  Otherwise, we fall back to
      kvasprintf.
      
      Just as for kstrdup_const, the main motivation is to save memory by
      reusing .rodata when possible.
      
      The return value should be freed by kfree_const, just like for
      kstrdup_const.
      
      There is deliberately no kasprintf_const: In the vast majority of cases,
      the format string argument is a literal, so one can determine statically
      whether one could instead use kstrdup_const directly (which would also
      require one to change all corresponding kfree calls to kfree_const).
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0a9df786
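The dispatch rule described in the kvasprintf_const commit above (no '%' at all, exactly "%s", or anything else) can be sketched as a classifier over the format string. This is a userspace illustration only: the enum and `classify_format_sketch()` are hypothetical names, and the sketch does no allocation or formatting itself; it only shows which of the three paths a given format would take.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the three paths described in the commit. */
enum alloc_path_sketch {
	PATH_DUP_FORMAT,   /* no '%' in fmt: the format string is the result */
	PATH_DUP_ARGUMENT, /* fmt is exactly "%s": the argument is the result */
	PATH_FORMAT        /* anything else: full formatting is required */
};

/* Decide which path a format string would take. */
static enum alloc_path_sketch classify_format_sketch(const char *fmt)
{
	assert(fmt != NULL);
	if (!strchr(fmt, '%'))
		return PATH_DUP_FORMAT;
	if (!strcmp(fmt, "%s"))
		return PATH_DUP_ARGUMENT;
	return PATH_FORMAT;
}
```

Only the first two paths can reuse an existing constant string, which is where the memory saving comes from; a format such as "eth%d" always takes the formatting path.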
    • M
      bitops.h: add sign_extend64() · 48e203e2
      Martin Kepplinger committed
      Months back, this was discussed, see https://lkml.org/lkml/2015/1/18/289
      The result was the 64-bit version being "likely fine", "valuable" and
      "correct".  The discussion fell asleep but since there are possible users,
      let's add it.
      Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: George Spelvin <linux@horizon.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Maxime Coquelin <maxime.coquelin@st.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      48e203e2
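A 64-bit sign extension of the kind the commit above adds can be sketched with the usual shift-left, shift-right idiom. This is a userspace approximation, not the kernel source: `sign_extend64_sketch()` is a hypothetical name, and the code assumes a compiler that converts out-of-range unsigned values to signed by wrapping and right-shifts negative values arithmetically, which the C standard leaves implementation-defined but mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend `value`, treating bit `index` (0 = least significant) as
 * the sign bit. Bits above `index` are discarded by the left shift, then
 * the arithmetic right shift replicates the sign bit into them.
 */
static int64_t sign_extend64_sketch(uint64_t value, int index)
{
	assert(index >= 0 && index <= 63);
	unsigned int shift = 63u - (unsigned int)index;

	return (int64_t)(value << shift) >> shift;
}
```

For example, a 12-bit field holding 0xFFF with index 11 comes back as -1, while 0x7FF stays 2047.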
    • M
      bitops.h: improve sign_extend32()'s documentation · e2eb53aa
      Martin Kepplinger committed
      It is often overlooked that sign_extend32(), despite its name, is safe to
      use for 16 and 8 bit types as well.  This should help prevent sign
      extension being done manually some other way.
      Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: George Spelvin <linux@horizon.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Maxime Coquelin <maxime.coquelin@st.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e2eb53aa
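The point of the commit above, that a 32-bit sign extension also serves 8-bit and 16-bit values, can be illustrated by passing narrow values with the matching sign-bit index. This is a userspace sketch with a hypothetical name, `sign_extend32_sketch()`, and it relies on the same implementation-defined but near-universal behavior: wrapping unsigned-to-signed conversion and arithmetic right shift of negative values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend `value`, treating bit `index` as the sign bit. A uint8_t
 * or uint16_t argument is widened to 32 bits first, so index 7 or 15
 * selects the sign bit of the narrow type.
 */
static int32_t sign_extend32_sketch(uint32_t value, int index)
{
	assert(index >= 0 && index <= 31);
	unsigned int shift = 31u - (unsigned int)index;

	return (int32_t)(value << shift) >> shift;
}
```

A raw byte 0xF0 read from a device register, for instance, becomes -16 when extended from index 7, without a hand-written cast chain.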
    • A
      include/linux/compiler-gcc.h: improve __visible documentation · 9add850c
      Andrew Morton committed
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9add850c
    • K
      mm: use 'unsigned int' for compound_dtor/compound_order on 64BIT · 1965c8b7
      Kirill A. Shutemov committed
      On 64-bit systems we have enough space in struct page to encode
      compound_dtor and compound_order with unsigned int.
      
      On x86-64 this leads to slightly smaller code size due to the use of a
      plain MOV instead of MOVZX (zero-extended move) or similar.
      
      allyesconfig:
      
         text	   data	    bss	    dec	    hex	filename
      159520446	48146736	72196096	279863278	10ae5fee	vmlinux.pre
      159520382	48146736	72196096	279863214	10ae5fae	vmlinux.post
      
      On other architectures without native support of 16-bit data types the
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1965c8b7
    • K
      mm: use 'unsigned int' for page order · d00181b9
      Kirill A. Shutemov committed
      Let's try to be consistent about data type of page order.
      
      [sfr@canb.auug.org.au: fix build (type of pageblock_order)]
      [hughd@google.com: some configs end up with MAX_ORDER and pageblock_order having different types]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d00181b9
    • K
      mm: make compound_head() robust · 1d798ca3
      Kirill A. Shutemov committed
      Hugh has pointed out that a compound_head() call can be unsafe in some
      contexts. Here is one example:
      
      	CPU0					CPU1
      
      isolate_migratepages_block()
        page_count()
          compound_head()
            !!PageTail() == true
      					put_page()
      					  tail->first_page = NULL
            head = tail->first_page
      					alloc_pages(__GFP_COMP)
      					   prep_compound_page()
      					     tail->first_page = head
      					     __SetPageTail(p);
            !!PageTail() == true
          <head == NULL dereferencing>
      
      The race is purely theoretical. I don't think it's possible to trigger
      it in practice. But who knows.
      
      We can fix the race by changing how we encode PageTail() and
      compound_head() within struct page so that both can be updated in one shot.
      
      The patch introduces page->compound_head in the third double-word block,
      in front of compound_dtor and compound_order. Bit 0 encodes PageTail(),
      and the remaining bits are a pointer to the head page if bit 0 is set.
      
      The patch moves page->pmd_huge_pte out of that word, in case an
      architecture defines pgtable_t as something that can have bit 0 set.
      
      hugetlb_cgroup uses page->lru.next in the second tail page to store a
      pointer to struct hugetlb_cgroup. The patch switches it to use
      page->private in the second tail page instead. The space is free since
      ->first_page is removed from the union.
      
      The patch also opens the possibility of removing the
      HUGETLB_CGROUP_MIN_ORDER limitation, since there is now space in the
      first tail page to store the struct hugetlb_cgroup pointer. But that's
      out of scope for this patch.
      
      That means page->compound_head shares storage space with:
      
       - page->lru.next;
       - page->next;
       - page->rcu_head.next;
      
      That's too long a list to be absolutely sure, but it looks like nobody
      uses bit 0 of the word.
      
      page->rcu_head.next is guaranteed[1] to have bit 0 clear as long as we
      use call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But a
      future call_rcu_lazy() is not allowed, as it makes use of the bit and we
      could get a false positive PageTail().
      
      [1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d798ca3
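The single-word encoding described in the compound_head() commit above (bit 0 as the tail flag, the remaining bits as the head pointer) can be sketched in userspace C. All names here are hypothetical, `struct page_sketch` carries only the one field of interest, and the plain load stands in for what would need to be a single marked or atomic read in concurrent code; the sketch shows the encoding, not the kernel's synchronization.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced page descriptor: one word holds flag and pointer. */
struct page_sketch {
	uintptr_t compound_head; /* bit 0: tail flag; other bits: head pointer */
};

/* Mark `tail` as a tail page of `head` with a single store. */
static void set_compound_head_sketch(struct page_sketch *tail,
				     struct page_sketch *head)
{
	/* Alignment of the structure keeps bit 0 of its address clear. */
	assert(((uintptr_t)head & 1u) == 0);
	tail->compound_head = (uintptr_t)head | 1u;
}

/* Clear both the flag and the pointer with a single store. */
static void clear_compound_head_sketch(struct page_sketch *page)
{
	page->compound_head = 0;
}

static int page_tail_sketch(const struct page_sketch *page)
{
	return (int)(page->compound_head & 1u);
}

/* One read yields both the flag and the pointer, so they cannot disagree. */
static struct page_sketch *compound_head_sketch(struct page_sketch *page)
{
	uintptr_t word = page->compound_head;

	if (word & 1u)
		return (struct page_sketch *)(word - 1u);
	return page;
}
```

Because the flag and the pointer live in the same word, a reader sees either "not a tail" or "tail with a valid head", never a set flag paired with a stale or NULL pointer as in the race shown in the commit message.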