1. 25 October 2017 (9 commits)
    • workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes · fd1a5b04
      Committed by Byungchul Park
      The workqueue code added manual lock acquisition annotations to catch
      deadlocks.
      
      After lockdep crossrelease was introduced, some of those became redundant,
      since wait_for_completion() already does the acquisition and tracking.
      
      Remove the duplicate annotations.
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: amir73il@gmail.com
      Cc: axboe@kernel.dk
      Cc: darrick.wong@oracle.com
      Cc: david@fromorbit.com
      Cc: hch@infradead.org
      Cc: idryomov@gmail.com
      Cc: johan@kernel.org
      Cc: johannes.berg@intel.com
      Cc: kernel-team@lge.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-xfs@vger.kernel.org
      Cc: oleg@redhat.com
      Cc: tj@kernel.org
      Link: http://lkml.kernel.org/r/1508921765-15396-9-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/completions: Add support for initializing completions with lockdep_map · a7967bc3
      Committed by Byungchul Park
      Sometimes we want to initialize completions with separate lockdep maps
      to assign lock classes as desired. For example, the workqueue code
      needs to directly manage lockdep maps, since only the code is aware of
      how to classify lockdep maps properly.
      
      Provide additional macros initializing completions in that way.
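      
      A minimal usage sketch (assuming the init_completion_map()-style
      interface this commit describes; the map source below is
      illustrative):
      
        struct lockdep_map *map = &wq_lockdep_map;  /* hypothetical caller-managed map */
        struct completion done;
      
        init_completion_map(&done, map);  /* completion gets the caller's lock class */
        /* ... hand 'done' to the completer as usual ... */
        wait_for_completion(&done);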
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: amir73il@gmail.com
      Cc: axboe@kernel.dk
      Cc: darrick.wong@oracle.com
      Cc: david@fromorbit.com
      Cc: hch@infradead.org
      Cc: idryomov@gmail.com
      Cc: johan@kernel.org
      Cc: johannes.berg@intel.com
      Cc: kernel-team@lge.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-xfs@vger.kernel.org
      Cc: oleg@redhat.com
      Cc: tj@kernel.org
      Link: http://lkml.kernel.org/r/1508921765-15396-8-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep, sched/completions: Change the prefix of lock name for completion variables · 24208435
      Committed by Byungchul Park
      CONFIG_LOCKDEP_COMPLETIONS uses "(complete)" as the prefix of the
      lock name for completion variables.
      
      However, what we should use here is a noun - so use "(completion)" instead.
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: amir73il@gmail.com
      Cc: axboe@kernel.dk
      Cc: darrick.wong@oracle.com
      Cc: david@fromorbit.com
      Cc: hch@infradead.org
      Cc: idryomov@gmail.com
      Cc: johan@kernel.org
      Cc: johannes.berg@intel.com
      Cc: kernel-team@lge.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-xfs@vger.kernel.org
      Cc: oleg@redhat.com
      Cc: tj@kernel.org
      Link: http://lkml.kernel.org/r/1508921765-15396-4-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Provide empty lockdep_map structure for !CONFIG_LOCKDEP · 6f0397d7
      Committed by Byungchul Park
      After this patch the lockdep_map structure takes no space if lockdep is
      disabled, reducing the number of #ifdefs in unrelated kernel code.
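      
      A sketch of the resulting shape (assumed from the commit
      description):
      
        #ifdef CONFIG_LOCKDEP
        struct lockdep_map { /* class key, name, ... */ };
        #else
        struct lockdep_map { };  /* empty: occupies no space, callers need no #ifdef */
        #endif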
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: amir73il@gmail.com
      Cc: axboe@kernel.dk
      Cc: darrick.wong@oracle.com
      Cc: david@fromorbit.com
      Cc: hch@infradead.org
      Cc: idryomov@gmail.com
      Cc: johan@kernel.org
      Cc: johannes.berg@intel.com
      Cc: kernel-team@lge.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-xfs@vger.kernel.org
      Cc: oleg@redhat.com
      Cc: tj@kernel.org
      Link: http://lkml.kernel.org/r/1508921765-15396-3-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Committed by Mark Rutland
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
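      
      In C terms the conversion is mechanical; an illustrative example:
      
        /* before */
        ACCESS_ONCE(x) = 1;
        val = ACCESS_ONCE(x);
      
        /* after */
        WRITE_ONCE(x, 1);
        val = READ_ONCE(x);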
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics, net/average: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE() · ef4d9af6
      Committed by Mark Rutland
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't currently harmful.
      
      However, for some features it is necessary to instrument reads and
      writes separately, which is not possible with ACCESS_ONCE(). This
      distinction is critical to correct operation.
      
      It's possible to transform the bulk of kernel code using the Coccinelle
      script below. However, this doesn't pick up some uses, including those
      in <linux/average.h>. As a preparatory step, this patch converts the
      file to use {READ,WRITE}_ONCE() consistently.
      
      At the same time, this patch adds missing includes necessary for
      {READ,WRITE}_ONCE(), *BUG_ON*(), and ilog2().
      
      ----
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-9-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics, net/netlink/netfilter: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE() · 14cd5d4a
      Committed by Mark Rutland
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't currently harmful.
      
      However, for some features it is necessary to instrument reads and
      writes separately, which is not possible with ACCESS_ONCE(). This
      distinction is critical to correct operation.
      
      It's possible to transform the bulk of kernel code using the Coccinelle
      script below. However, this doesn't handle comments, leaving references
      to ACCESS_ONCE() instances which have been removed. As a preparatory
      step, this patch converts netlink and netfilter code and comments to use
      {READ,WRITE}_ONCE() consistently.
      
      ----
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-7-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomics, fs/dcache: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE() · 66702eb5
      Committed by Mark Rutland
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't currently harmful.
      
      However, for some features it is necessary to instrument reads and
      writes separately, which is not possible with ACCESS_ONCE(). This
      distinction is critical to correct operation.
      
      It's possible to transform the bulk of kernel code using the Coccinelle
      script below. However, this doesn't handle comments, leaving references
      to ACCESS_ONCE() instances which have been removed. As a preparatory
      step, this patch converts the dcache code and comments to use
      {READ,WRITE}_ONCE() consistently.
      
      ----
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-4-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/atomic: Add atomic_cond_read_acquire() · 4df714be
      Committed by Will Deacon
      smp_cond_load_acquire() provides a way to spin on a variable with acquire
      semantics until some conditional expression involving the variable is
      satisfied. Architectures such as arm64 can potentially enter a low-power
      state, waking up only when the value of the variable changes, which
      reduces the system impact of tight polling loops.
      
      This patch makes the same interface available to users of atomic_t,
      atomic64_t and atomic_long_t, rather than requiring messy accesses to the
      structure internals.
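      
      Illustrative usage (a sketch; VAL is the conventional placeholder
      these macros use for the value just loaded):
      
        atomic_t seq = ATOMIC_INIT(0);
      
        /* spin -- or idle, on arm64 -- until 'seq' becomes non-zero,
         * with acquire semantics on the final load */
        atomic_cond_read_acquire(&seq, VAL != 0);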
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Jeremy.Linton@arm.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1507810851-306-3-git-send-email-will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 24 October 2017 (4 commits)
    • locking/barriers: Kill lockless_dereference() · 59ecbbe7
      Committed by Will Deacon
      lockless_dereference() is a nice idea, but it gained little traction in
      kernel code since its introduction three years ago. This is partly
      because it's a pain to type, but also because using READ_ONCE() instead
      has worked correctly on all architectures apart from Alpha, which is a
      fully supported but somewhat niche architecture these days.
      
      Now that READ_ONCE() has been upgraded to contain an implicit
      smp_read_barrier_depends() and the few callers of lockless_dereference()
      have been converted, we can remove lockless_dereference() altogether.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1508840570-22169-5-git-send-email-will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/barriers: Convert users of lockless_dereference() to READ_ONCE() · 506458ef
      Committed by Will Deacon
      READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
      can be used instead of lockless_dereference() without any change in
      semantics.
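      
      The conversion is a direct substitution, e.g.:
      
        - p = lockless_dereference(gp);
        + p = READ_ONCE(gp);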
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE() · 76ebbe78
      Committed by Will Deacon
      In preparation for the removal of lockless_dereference(), which is the
      same as READ_ONCE() on all architectures other than Alpha, add an
      implicit smp_read_barrier_depends() to READ_ONCE() so that it can be
      used to head dependency chains on all architectures.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1508840570-22169-3-git-send-email-will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • linux/compiler.h: Split into compiler.h and compiler_types.h · d1515582
      Committed by Will Deacon
      linux/compiler.h is included indirectly by linux/types.h via
      uapi/linux/types.h -> uapi/linux/posix_types.h -> linux/stddef.h
      -> uapi/linux/stddef.h and is needed to provide a proper definition of
      offsetof.
      
      Unfortunately, compiler.h requires a definition of
      smp_read_barrier_depends() for defining lockless_dereference() and soon
      for defining READ_ONCE(), which means that all
      users of READ_ONCE() will need to include asm/barrier.h to avoid splats
      such as:
      
         In file included from include/uapi/linux/stddef.h:1:0,
                          from include/linux/stddef.h:4,
                          from arch/h8300/kernel/asm-offsets.c:11:
         include/linux/list.h: In function 'list_empty':
      >> include/linux/compiler.h:343:2: error: implicit declaration of function 'smp_read_barrier_depends' [-Werror=implicit-function-declaration]
           smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
           ^
      
      A better alternative is to include asm/barrier.h in linux/compiler.h,
      but this requires a type definition for "bool" on some architectures
      (e.g. x86), which is defined later by linux/types.h. Type "bool" is also
      used directly in linux/compiler.h, so the whole thing is pretty fragile.
      
      This patch splits compiler.h in two: compiler_types.h contains type
      annotations, definitions and the compiler-specific parts, whereas
      compiler.h #includes compiler_types.h and additionally defines macros
      such as {READ,WRITE,ACCESS}_ONCE().
      
      uapi/linux/stddef.h and linux/linkage.h are then moved over to include
      linux/compiler_types.h, which fixes the build for h8 and blackfin.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1508840570-22169-2-git-send-email-will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 20 October 2017 (5 commits)
  4. 19 October 2017 (1 commit)
    • locking/static_keys: Improve uninitialized key warning · 5cdda511
      Committed by Borislav Petkov
      Right now it says:
      
        static_key_disable_cpuslocked used before call to jump_label_init
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:161 static_key_disable_cpuslocked+0x68/0x70
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.0-rc5+ #1
        Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
        task: ffffffff81c0e480 task.stack: ffffffff81c00000
        RIP: 0010:static_key_disable_cpuslocked+0x68/0x70
        RSP: 0000:ffffffff81c03ef0 EFLAGS: 00010096 ORIG_RAX: 0000000000000000
        RAX: 0000000000000041 RBX: ffffffff81c32680 RCX: ffffffff81c5cbf8
        RDX: 0000000000000001 RSI: 0000000000000092 RDI: 0000000000000002
        RBP: ffff88807fffd240 R08: 726f666562206465 R09: 0000000000000136
        R10: 0000000000000000 R11: 696e695f6c656261 R12: ffffffff82158900
        R13: ffffffff8215f760 R14: 0000000000000001 R15: 0000000000000008
        FS:  0000000000000000(0000) GS:ffff883f7f400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff88807ffff000 CR3: 0000000001c09000 CR4: 00000000000606b0
        Call Trace:
         static_key_disable+0x16/0x20
         start_kernel+0x15a/0x45d
         ? load_ucode_intel_bsp+0x11/0x2d
         secondary_startup_64+0xa5/0xb0
        Code: 48 c7 c7 a0 15 cf 81 e9 47 53 4b 00 48 89 df e8 5f fc ff ff eb e8 48 c7 c6 \
      	c0 97 83 81 48 c7 c7 d0 ff a2 81 31 c0 e8 c5 9d f5 ff <0f> ff eb a7 0f ff eb \
      	b0 e8 eb a2 4b 00 53 48 89 fb e8 42 0e f0
      
      but it doesn't tell me which key it is. So dump the key's name too:
      
        static_key_disable_cpuslocked(): static key 'virt_spin_lock_key' used before call to jump_label_init()
      
      And that makes pinpointing which key is causing that a lot easier.
      
       include/linux/jump_label.h           |   14 +++++++-------
       include/linux/jump_label_ratelimit.h |    6 +++---
       kernel/jump_label.c                  |   14 +++++++-------
       3 files changed, 17 insertions(+), 17 deletions(-)
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171018152428.ffjgak4o25f7ept6@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 18 October 2017 (1 commit)
    • KEYS: Fix race between updating and finding a negative key · 363b02da
      Committed by David Howells
      Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection
      error into one field such that:
      
       (1) The instantiation state can be modified/read atomically.
      
       (2) The error can be accessed atomically with the state.
      
       (3) The error isn't stored unioned with the payload pointers.
      
      This deals with the problem that the state is spread over three different
      objects (two bits and a separate variable) and reading or updating them
      atomically isn't practical, given that not only can uninstantiated keys
      change into instantiated or rejected keys, but rejected keys can also turn
      into instantiated keys - and someone accessing the key might not be using
      any locking.
      
      The main side effect of this problem is that what was held in the payload
      may change, depending on the state.  For instance, you might observe the
      key to be in the rejected state.  You then read the cached error, but if
      the key semaphore wasn't locked, the key might've become instantiated
      between the two reads - and you might now have something in hand that isn't
      actually an error code.
      
      The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error
      code if the key is negatively instantiated.  The key_is_instantiated()
      function is replaced with key_is_positive() to avoid confusion as negative
      keys are also 'instantiated'.
      
      Additionally, barriering is included:
      
       (1) Order payload-set before state-set during instantiation.
      
       (2) Order state-read before payload-read when using the key.
      
      Further separate barriering is necessary if RCU is being used to access the
      payload content after reading the payload pointers.
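      
      A sketch of that ordering with release/acquire primitives (field and
      helper names are illustrative):
      
        /* instantiation: publish the payload before the state */
        key->payload.data[0] = payload;
        smp_store_release(&key->state, KEY_IS_POSITIVE);
      
        /* use: read the state before reading the payload */
        if (smp_load_acquire(&key->state) == KEY_IS_POSITIVE)
                use(key->payload.data[0]);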
      
      Fixes: 146aa8b1 ("KEYS: Merge the type-specific data with the payload data")
      Cc: stable@vger.kernel.org # v4.4+
      Reported-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
  6. 17 October 2017 (1 commit)
    • tun: call dev_get_valid_name() before register_netdevice() · 0ad646c8
      Committed by Cong Wang
      register_netdevice() could fail early when we have an invalid
      dev name, in which case ->ndo_uninit() is not called. For the tun
      device this is a problem, because a timer and other state are
      already initialized and it expects ->ndo_uninit() to clean them up.
      
      We could move these initializations into a ->ndo_init() so
      that register_netdevice() knows better, however this is still
      complicated due to the logic in tun_detach().
      
      Therefore, I choose to just call dev_get_valid_name() before
      register_netdevice(), which is quicker and much easier to audit.
      And for this specific case, it is already enough.
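      
      A sketch of the resulting flow in the tun driver (labels and helper
      names are illustrative):
      
        err = dev_get_valid_name(net, dev, name);
        if (err < 0)
                goto err_free_dev;  /* fail before any state that needs ->ndo_uninit() */
      
        tun_net_init(dev);          /* timers etc. set up only after the name is valid */
        err = register_netdevice(dev);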
      
      Fixes: 96442e42 ("tuntap: choose the txq based on rxq")
      Reported-by: Dmitry Alexeev <avekceeb@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 14 October 2017 (4 commits)
  8. 13 October 2017 (3 commits)
    • genirq: generic chip: remove irq_gc_mask_disable_reg_and_ack() · 0d08af35
      Committed by Doug Berger
      Any usage of the irq_gc_mask_disable_reg_and_ack() function has
      been replaced with the desired functionality.
      
      The incorrect and ambiguously named function is removed here to
      prevent accidental misuse.
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • genirq: generic chip: Add irq_gc_mask_disable_and_ack_set() · 20608924
      Committed by Doug Berger
      The irq_gc_mask_disable_reg_and_ack() function name implies that it
      provides the combined functions of irq_gc_mask_disable_reg() and
      irq_gc_ack().  However, the implementation does not actually do
      that since it writes the mask instead of the disable register. It
      also does not maintain the mask cache which makes it inappropriate
      to use with other masking functions.
      
      In addition, commit 659fb32d ("genirq: replace irq_gc_ack() with
      {set,clr}_bit variants (fwd)") effectively renamed irq_gc_ack() to
      irq_gc_ack_set_bit() so this function probably should have also been
      renamed at that time.
      
      The generic chip code currently provides three functions for use
      with the irq_mask member of the irq_chip structure and two functions
      for use with the irq_ack member of the irq_chip structure. These
      functions could be combined into six functions for use with the
      irq_mask_ack member of the irq_chip structure.  However, since only
      one of the combinations is currently used, only the function
      irq_gc_mask_disable_and_ack_set() is added by this commit.
      
      The '_reg' and '_bit' portions of the base function name were left
      out of the new combined function name in an attempt to keep the
      function name length manageable with the 80 character source code
      line length while still allowing the distinct aspects of each
      combination to be captured by the name.
      
      If other combinations are desired in the future, please add them to
      the irq generic chip library at that time.
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • irqchip/gic-v3-its: Add missing changes to support 52bit physical address · 30ae9610
      Committed by Shanker Donthineni
      The current ITS driver works fine as long as normal memory and GICR
      regions are located within the lower 48-bit (>=0 && <2^48) physical
      address space. Some of the registers (GICR_PEND/PROP, GICR_VPEND/VPROP
      and GITS_CBASER) are handled properly, but not all of them, when
      configuring the hardware with 52-bit physical addresses.
      
      This patch does the following changes to support 52-bit PAs:
        -Handle 52bit PA in GITS_BASERn.
        -Fix ITT_addr width to 52bits, bits[51:8].
        -Fix RDbase width to 52bits, bits[51:16].
        -Fix VPT_addr width to 52bits, bits[51:16].
      
      Definition of the GITS_BASERn register when ITS PageSize is 64KB:
        -Bits[47:16] of the register provide bits[47:16] of the table PA.
        -Bits[15:12] of the register provide bits[51:48] of the table PA.
        -Bits[15:00] of the base physical address are 0.
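      
      Per that layout, reassembling the table PA from a 64KB-page
      GITS_BASERn value looks roughly like this (a sketch):
      
        phys_addr_t pa;
      
        pa  = val & GENMASK_ULL(47, 16);          /* bits[47:16] -> PA[47:16] */
        pa |= (val & GENMASK_ULL(15, 12)) << 36;  /* bits[15:12] -> PA[51:48] */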
      Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  9. 12 October 2017 (1 commit)
    • bus: mbus: fix window size calculation for 4GB windows · 2bbbd963
      Committed by Jan Luebbe
      At least the Armada XP SoC supports 4GB on a single DRAM window. Because
      the size register values contain the actual size - 1, the MSB is set in
      that case. For example, the SDRAM window's control register's value is
      0xffffffe1 for 4GB (bits 31 to 24 contain the size).
      
      The MBUS driver reads back each window's size from registers and
      calculates the actual size as (control_reg | ~DDR_SIZE_MASK) + 1, which
      overflows for 32 bit values, resulting in other miscalculations further
      on (a bad RAM window for the CESA crypto engine calculated by
      mvebu_mbus_setup_cpu_target_nooverlap() in my case).
      
      This patch changes the type in 'struct mbus_dram_window' from u32 to
      u64, which allows us to keep using the same register calculation code in
      most MBUS-using drivers (which calculate ->size - 1 again).
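      
      The arithmetic in short (illustrative values; the low-bit fill below
      stands in for ~DDR_SIZE_MASK):
      
        u32 ctrl = 0xffffffe1;                    /* 4GB window, size in bits 31:24 */
      
        u32 bad  = (ctrl | 0x00ffffff) + 1;       /* 0xffffffff + 1 wraps to 0 */
        u64 good = ((u64)ctrl | 0x00ffffff) + 1;  /* 0x100000000 = 4GB */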
      
      Fixes: fddddb52 ("bus: introduce an Marvell EBU MBus driver")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
  10. 10 October 2017 (4 commits)
    • locking/arch: Remove dummy arch_{read,spin,write}_lock_flags() implementations · a4c1887d
      Committed by Will Deacon
      The arch_{read,spin,write}_lock_flags() macros are simply mapped to the
      non-flags versions by the majority of architectures, so do this in core
      code and remove the dummy implementations. Also remove the implementation
      in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call
      local_irq_save(flags) anyway.
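      
      The core-code fallback amounts to (a sketch; the actual guards may
      differ):
      
        #ifndef arch_spin_lock_flags
        #define arch_spin_lock_flags(lock, flags)  arch_spin_lock(lock)
        #endif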
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/1507055129-12300-4-git-send-email-will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/core: Remove {read,spin,write}_can_lock() · a8a217c2
      Committed by Will Deacon
      Outside of the locking code itself, {read,spin,write}_can_lock() have no
      users in tree. Apparmor (the last remaining user of write_can_lock()) got
      moved over to lockdep by the previous patch.
      
      This patch removes the use of {read,spin,write}_can_lock() from the
      BUILD_LOCK_OPS macro, deferring to the trylock operation for testing the
      lock status, and subsequently removes the unused macros altogether. They
      aren't guaranteed to work in a concurrent environment and can give
      incorrect results in the case of qrwlock.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/1507055129-12300-2-git-send-email-will.deacon@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rwsem: Add down_read_killable() · 76f8507f
      Committed by Kirill Tkhai
      Similar to down_read() and down_write_killable(), add a killable
      version of down_read(), based on the __down_read_killable() function
      added in previous patches.
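      
      Usage mirrors down_write_killable() (sketch; mmap_sem is just an
      example rwsem):
      
        if (down_read_killable(&mm->mmap_sem))
                return -EINTR;
        /* ... read-side critical section ... */
        up_read(&mm->mmap_sem);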
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: avagin@virtuozzo.com
      Cc: davem@davemloft.net
      Cc: fenghua.yu@intel.com
      Cc: gorcunov@virtuozzo.com
      Cc: heiko.carstens@de.ibm.com
      Cc: hpa@zytor.com
      Cc: ink@jurassic.park.msu.ru
      Cc: mattst88@gmail.com
      Cc: rientjes@google.com
      Cc: rth@twiddle.net
      Cc: schwidefsky@de.ibm.com
      Cc: tony.luck@intel.com
      Cc: viro@zeniv.linux.org.uk
      Link: http://lkml.kernel.org/r/150670119884.23930.2585570605960763239.stgit@localhost.localdomain
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Fix wake_affine() performance regression · d153b153
      Committed by Peter Zijlstra
      Eric reported a sysbench regression against commit:
      
        3fed382b ("sched/numa: Implement NUMA node level wake_affine()")
      
      Similarly, Rik was looking at the NAS-lu.C benchmark, which regressed
      against his v3.10 enterprise kernel.
      
      PRE (current tip/master):
      
       ivb-ep sysbench:
      
         2: [30 secs]     transactions:                        64110  (2136.94 per sec.)
         5: [30 secs]     transactions:                        143644 (4787.99 per sec.)
        10: [30 secs]     transactions:                        274298 (9142.93 per sec.)
        20: [30 secs]     transactions:                        418683 (13955.45 per sec.)
        40: [30 secs]     transactions:                        320731 (10690.15 per sec.)
        80: [30 secs]     transactions:                        355096 (11834.28 per sec.)
      
       hsw-ex NAS:
      
       OMP_PROC_BIND/lu.C.x_threads_144_run_1.log: Time in seconds =                    18.01
       OMP_PROC_BIND/lu.C.x_threads_144_run_2.log: Time in seconds =                    17.89
       OMP_PROC_BIND/lu.C.x_threads_144_run_3.log: Time in seconds =                    17.93
       lu.C.x_threads_144_run_1.log: Time in seconds =                   434.68
       lu.C.x_threads_144_run_2.log: Time in seconds =                   405.36
       lu.C.x_threads_144_run_3.log: Time in seconds =                   433.83
      
      POST (+patch):
      
       ivb-ep sysbench:
      
         2: [30 secs]     transactions:                        64494  (2149.75 per sec.)
         5: [30 secs]     transactions:                        145114 (4836.99 per sec.)
        10: [30 secs]     transactions:                        278311 (9276.69 per sec.)
        20: [30 secs]     transactions:                        437169 (14571.60 per sec.)
        40: [30 secs]     transactions:                        669837 (22326.73 per sec.)
        80: [30 secs]     transactions:                        631739 (21055.88 per sec.)
      
       hsw-ex NAS:
      
       lu.C.x_threads_144_run_1.log: Time in seconds =                    23.36
       lu.C.x_threads_144_run_2.log: Time in seconds =                    22.96
       lu.C.x_threads_144_run_3.log: Time in seconds =                    22.52
      
      This patch takes out all the shiny wake_affine() stuff and goes back to
      utter basics. Between the two CPUs involved with the wakeup (the CPU
      doing the wakeup and the CPU we ran on previously) pick the CPU we can
      run on _now_.
      
      This restores much of the regressions against the older kernels,
      but leaves some ground in the overloaded case. The default-enabled
      WA_WEIGHT (which will be introduced in the next patch) is an attempt
      to address the overloaded situation.
      Reported-by: Eric Farman <farman@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jinpuwang@gmail.com
      Cc: vcaputo@pengaru.com
      Fixes: 3fed382b ("sched/numa: Implement NUMA node level wake_affine()")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 09 October 2017 (1 commit)
    • netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' · 98589a09
      Committed by Shmulik Ladkani
      Commit 2c16d603 ("netfilter: xt_bpf: support ebpf") introduced
      support for attaching an eBPF object by an fd, with the
      'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
      IPT_SO_SET_REPLACE call.
      
      However this breaks subsequent iptables calls:
      
       # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
       # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
       iptables: Invalid argument. Run `dmesg' for more information.
      
      That's because iptables works by loading existing rules using
      IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
      the replacement set.
      
      However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
      (from the initial "iptables -m bpf" invocation) - so when the 2nd
      invocation occurs, userspace passes a bogus fd number, which causes
      'bpf_mt_check_v1' to fail.
      
      One suggested solution [1] was to hack iptables userspace to perform
      an "entries fixup" immediately after IPT_SO_GET_ENTRIES, by opening
      a new, process-local fd for every 'xt_bpf_info_v1' entry seen.
      
      However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested
      deprecating the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.
      
      This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
      '.fd' and instead perform an in-kernel lookup for the bpf object given
      the provided '.path'.
      
      It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
      XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
      expected to provide the path of the pinned object.
      
      Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.
      
      References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
                  [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2
      Reported-by: Rafael Buchbinder <rafi@rbk.ms>
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  12. 04 October 2017 (6 commits)
    • Drivers: hv: vmbus: Fix bugs in rescind handling · 192b2d78
      Committed by K. Y. Srinivasan
      This patch addresses the following bugs in the current rescind handling code:
      
      1. Fixes a race condition where we may be invoking hv_process_channel_removal()
      on an already freed channel.
      
      2. Prevents indefinite wait when rescinding sub-channels by correctly setting
      the probe_complete state.
      
      I would like to thank Dexuan for patiently reviewing earlier versions of this
      patch and identifying many of the issues fixed here.
      
      Greg, please apply this to 4.14-final.
      
      Fixes: 54a66265 ("Drivers: hv: vmbus: Fix rescind handling")
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Cc: stable@vger.kernel.org # (4.13 and above)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • powerpc/watchdog: Make use of watchdog_nmi_probe() · 34ddaa3e
      Committed by Thomas Gleixner
      The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu()
      on powerpc because it is called multiple times for the boot CPU.
      
      The first call is via:
      
        start_wd_on_cpu+0x80/0x2f0
        watchdog_nmi_reconfigure+0x124/0x170
        softlockup_reconfigure_threads+0x110/0x130
        lockup_detector_init+0xbc/0xe0
        kernel_init_freeable+0x18c/0x37c
        kernel_init+0x2c/0x160
        ret_from_kernel_thread+0x5c/0xbc
      
      And then again via the CPU hotplug registration:
      
        start_wd_on_cpu+0x80/0x2f0
        cpuhp_invoke_callback+0x194/0x620
        cpuhp_thread_fun+0x7c/0x1b0
        smpboot_thread_fn+0x290/0x2a0
        kthread+0x168/0x1b0
        ret_from_kernel_thread+0x5c/0xbc
      
      This can be avoided by setting up the cpu hotplug state with nocalls
      and moving the initialization to the watchdog_nmi_probe() function.
      That initializes the hotplug callbacks without invoking them, and the
      following core initialization function then configures the watchdog
      for the online CPUs (in this case CPU0) via
      softlockup_reconfigure_threads().
      Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
    • watchdog/core, powerpc: Replace watchdog_nmi_reconfigure() · 6b9dc480
      Committed by Thomas Gleixner
      The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
      into two stages. One to stop the NMI and one to restart it after
      reconfiguration. That was done by adding a boolean 'run' argument to the
      code, which is functionally correct but not necessarily a piece of art.
      
      Replace it by two explicit functions: watchdog_nmi_stop() and
      watchdog_nmi_start().
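      
      The call sites then read (a sketch):
      
        watchdog_nmi_stop();
        /* ... reconfigure the watchdog ... */
        watchdog_nmi_start();
      
      instead of watchdog_nmi_reconfigure(false) followed by
      watchdog_nmi_reconfigure(true).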
      
      Fixes: 6592ad2f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage")
      Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org>
      Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
    • mmc: Delete bounce buffer handling · de3ee99b
      Committed by Linus Walleij
      In May, Steven sent a patch deleting the bounce buffer handling
      and the CONFIG_MMC_BLOCK_BOUNCE option.
      
      I chose the less invasive path of making it a runtime config
      option, and we merged that successfully for kernel v4.12.
      
      The code is however just standing in the way and taking up
      space for seemingly no gain on any systems in wide use today.
      
      Pierre says the code was there to improve speed on TI SDHCI
      controllers on certain HP laptops and possibly some Ricoh
      controllers as well. Early SDHCI controllers lacked the
      scatter-gather feature, which made software bounce buffers
      a significant speed boost.
      
      We are clearly talking about the list of SDHCI PCI-based
      MMC/SD card readers found in the pci_ids[] list in
      drivers/mmc/host/sdhci-pci-core.c.
      
      The TI SDHCI derivative is not supported by the upstream
      kernel. This leaves the Ricoh.
      
      What we can however notice is that the x86 defconfigs in the
      kernel did not enable CONFIG_MMC_BLOCK_BOUNCE option, which
      means that any such laptop would have to have a custom
      configured kernel to actually take advantage of this
      bounce buffer speed-up. It simply seems like there was
      a speed optimization for the Ricoh controllers that no one
      was using. (I have not checked the distro defconfigs but
      I am pretty sure the situation is the same there.)
      
      Bounce buffers increased performance on the OMAP HSMMC at one
      point, and were part of the original submission in commit
      a45c6cb8 ("[ARM] 5369/1: omap mmc: Add new omap hsmmc controller
      for 2430 and 34xx, v3").
      
      This optimization was removed in commit 0ccd76d4 ("omap_hsmmc:
      Implement scatter-gather emulation"), which found that
      scatter-gather emulation provided even better performance.
      
      The same was introduced for SDHCI in
      commit 2134a922 ("sdhci: scatter-gather (ADMA) support")
      
      I am pretty positively convinced that software
      scatter-gather emulation will do for any host controller what
      the bounce buffers were doing. Essentially, the bounce buffer
      was a reimplementation of software scatter-gather-emulation in
      the MMC subsystem, and it should be done away with.
      
      Cc: Pierre Ossman <pierre@ossman.eu>
      Cc: Juha Yrjola <juha.yrjola@solidboot.com>
      Cc: Steven J. Hill <Steven.Hill@cavium.com>
      Cc: Shawn Lin <shawn.lin@rock-chips.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Suggested-by: Steven J. Hill <Steven.Hill@cavium.com>
      Suggested-by: Shawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • include/linux/fs.h: fix comment about struct address_space · 32e57c29
      Committed by Mike Rapoport
      Before commit 9c5d760b ("mm: split gfp_mask and mapping flags into
      separate fields") the private_* fields of struct adrress_space were
      grouped together and using "ditto" in comments describing the last
      fields was correct.
      
      With the introduction of gfp_mask between private_lock and
      private_list, "ditto" now references the wrong description.
      
      Fix it by using the elaborate description.
      
      Link: http://lkml.kernel.org/r/1507009987-8746-1-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function · 1dd2bfc8
      Committed by YASUAKI ISHIMATSU
      pfn_to_section_nr() and section_nr_to_pfn() are defined as macros.
      pfn_to_section_nr() has no issue even if it is defined as a macro,
      but section_nr_to_pfn() has an overflow issue if sec is defined as
      an int.
      
      section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT. If sec is
      defined as unsigned long, section_nr_to_pfn() returns the pfn as a
      64-bit value; but if sec is defined as int, it returns the pfn as a
      32-bit value.
      
      __remove_section() calculates start_pfn using section_nr_to_pfn()
      and scn_nr defined as int. So if the hot-removed memory address is
      over 16TB, an overflow occurs and section_nr_to_pfn() does not
      calculate the correct pfn.
      
      To make callers pass a properly typed argument, the patch changes
      the macros to inline functions, as sketched below.
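      
      The difference, in short (a sketch following the description above):
      
        /* before: with an 'int sec', the shifted result is truncated to 32 bits */
        #define section_nr_to_pfn(sec) ((sec) << PFN_SECTION_SHIFT)
      
        /* after: the argument is forced to unsigned long */
        static inline unsigned long section_nr_to_pfn(unsigned long sec)
        {
                return sec << PFN_SECTION_SHIFT;
        }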
      
      Fixes: 815121d2 ("memory_hotplug: clear zone when removing the memory")
      Link: http://lkml.kernel.org/r/e643a387-e573-6bbf-d418-c60c8ee3d15e@gmail.com
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>