1. 06 Sep, 2013 1 commit
    • vfs: check submounts and drop atomically · 848ac114
      Committed by Miklos Szeredi
      We check submounts before doing d_drop() on a non-empty directory dentry in
      NFS (have_submounts()), but we do not exclude a racing mount.
      
       Process A: have_submounts() -> returns false
       Process B: mount() -> success
       Process A: d_drop()
      
      This patch prepares the ground for the fix by doing the following
      operations all under the same rename lock:
      
        have_submounts()
        shrink_dcache_parent()
        d_drop()
      
      This is actually an optimization since have_submounts() and
      shrink_dcache_parent() both traverse the same dentry tree separately.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: David Howells <dhowells@redhat.com>
      CC: Steven Whitehouse <swhiteho@redhat.com>
      CC: Trond Myklebust <Trond.Myklebust@netapp.com>
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
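The ordering described in commit 848ac114 (check for submounts and drop under one lock) can be modelled in userspace roughly as follows. This is a toy sketch, not the kernel code: `toy_dentry`, `toy_lock()`, `toy_check_submounts_and_drop()` and `toy_mount()` are hypothetical names, a simple spin flag stands in for the rename lock, and having the mount side take the same lock is an assumption of this model (the commit itself only prepares the ground for the fix).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a directory dentry; every name here is hypothetical. */
struct toy_dentry {
    atomic_flag lock;   /* stands in for the shared rename lock */
    bool has_submount;  /* what have_submounts() would report */
    bool hashed;        /* false once the entry has been dropped */
};

static void toy_lock(struct toy_dentry *d)
{
    while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
        ; /* spin until the flag is free */
}

static void toy_unlock(struct toy_dentry *d)
{
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Check and drop in ONE critical section; returns true if dropped. */
static bool toy_check_submounts_and_drop(struct toy_dentry *d)
{
    bool dropped = false;

    assert(d != NULL);
    toy_lock(d);
    if (!d->has_submount) {
        d->hashed = false;  /* the d_drop() step */
        dropped = true;
    }
    toy_unlock(d);
    return dropped;
}

/* Assumption of this model: mounting serializes on the same lock. */
static bool toy_mount(struct toy_dentry *d)
{
    bool ok = false;

    assert(d != NULL);
    toy_lock(d);
    if (d->hashed) {
        d->has_submount = true;
        ok = true;
    }
    toy_unlock(d);
    return ok;
}
```

Because both operations run under the same lock in this model, a mount can land either entirely before the check (so the drop is refused) or entirely after the drop (so the mount is refused), never in between.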
  2. 04 Sep, 2013 7 commits
  3. 03 Sep, 2013 5 commits
    • module: Fix mod->mkobj.kobj potentially freed too early · 942e4431
      Committed by Li Zhong
      DEBUG_KOBJECT_RELEASE helps to find the issue attached below.
      
      After some investigation, the cause appears to be the following:
      mod->mkobj.kobj (ffffffffa01600d0 below) is freed together with mod
      itself in free_module(). However, its children still hold references to
      it because of the delay introduced by DEBUG_KOBJECT_RELEASE. So when a
      child (holders below) tries to drop its reference to its parent
      in kobject_del(), a BUG occurs as it accesses already freed memory.
      
      This patch tries to fix it by waiting for the mod->mkobj.kobj to be
      really released in the module removing process (and some error code
      paths).
      
      [ 1844.175287] kobject: 'holders' (ffff88007c1f1600): kobject_release, parent ffffffffa01600d0 (delayed)
      [ 1844.178991] kobject: 'notes' (ffff8800370b2a00): kobject_release, parent ffffffffa01600d0 (delayed)
      [ 1845.180118] kobject: 'holders' (ffff88007c1f1600): kobject_cleanup, parent ffffffffa01600d0
      [ 1845.182130] kobject: 'holders' (ffff88007c1f1600): auto cleanup kobject_del
      [ 1845.184120] BUG: unable to handle kernel paging request at ffffffffa01601d0
      [ 1845.185026] IP: [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
      [ 1845.185026] Oops: 0000 [#1] PREEMPT
      [ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
      [ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G           O 3.11.0-rc6-next-20130819+ #1
      [ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [ 1845.185026] Workqueue: events kobject_delayed_cleanup
      [ 1845.185026] task: ffff88007ca51f00 ti: ffff88007ca5c000 task.ti: ffff88007ca5c000
      [ 1845.185026] RIP: 0010:[<ffffffff812cda81>]  [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026] RSP: 0018:ffff88007ca5dd08  EFLAGS: 00010282
      [ 1845.185026] RAX: 0000000000002000 RBX: ffffffffa01600d0 RCX: ffffffff8177d638
      [ 1845.185026] RDX: ffff88007ca5dc18 RSI: 0000000000000000 RDI: ffffffffa01600d0
      [ 1845.185026] RBP: ffff88007ca5dd18 R08: ffffffff824e9810 R09: ffffffffffffffff
      [ 1845.185026] R10: ffff8800ffffffff R11: dead4ead00000001 R12: ffffffff81a95040
      [ 1845.185026] R13: ffff88007b27a960 R14: ffff88007c1f1600 R15: 0000000000000000
      [ 1845.185026] FS:  0000000000000000(0000) GS:ffffffff81a23000(0000) knlGS:0000000000000000
      [ 1845.185026] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1845.185026] CR2: ffffffffa01601d0 CR3: 0000000037207000 CR4: 00000000000006b0
      [ 1845.185026] Stack:
      [ 1845.185026]  ffff88007c1f1600 ffff88007c1f1600 ffff88007ca5dd38 ffffffff812cdb7e
      [ 1845.185026]  0000000000000000 ffff88007c1f1640 ffff88007ca5dd68 ffffffff812cdbfe
      [ 1845.185026]  ffff88007c974800 ffff88007c1f1640 ffff88007ff61a00 0000000000000000
      [ 1845.185026] Call Trace:
      [ 1845.185026]  [<ffffffff812cdb7e>] kobject_del+0x2e/0x40
      [ 1845.185026]  [<ffffffff812cdbfe>] kobject_delayed_cleanup+0x6e/0x1d0
      [ 1845.185026]  [<ffffffff81063a45>] process_one_work+0x1e5/0x670
      [ 1845.185026]  [<ffffffff810639e3>] ? process_one_work+0x183/0x670
      [ 1845.185026]  [<ffffffff810642b3>] worker_thread+0x113/0x370
      [ 1845.185026]  [<ffffffff810641a0>] ? rescuer_thread+0x290/0x290
      [ 1845.185026]  [<ffffffff8106bfba>] kthread+0xda/0xe0
      [ 1845.185026]  [<ffffffff814ff0f0>] ? _raw_spin_unlock_irq+0x30/0x60
      [ 1845.185026]  [<ffffffff8106bee0>] ? kthread_create_on_node+0x130/0x130
      [ 1845.185026]  [<ffffffff8150751a>] ret_from_fork+0x7a/0xb0
      [ 1845.185026]  [<ffffffff8106bee0>] ? kthread_create_on_node+0x130/0x130
      [ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d <f6> 87 00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84
      [ 1845.185026] RIP  [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026]  RSP <ffff88007ca5dd08>
      [ 1845.185026] CR2: ffffffffa01601d0
      [ 1845.185026] ---[ end trace 49a70afd109f5653 ]---
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lockref: implement lockless reference count updates using cmpxchg() · bc08b449
      Committed by Linus Torvalds
      Instead of taking the spinlock, the lockless versions atomically check
      that the lock is not taken, and do the reference count update using a
      cmpxchg() loop.  This is semantically identical to doing the reference
      count update protected by the lock, but avoids the "wait for lock"
      contention that you get when accesses to the reference count are
      contended.
      
      Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
      Even when the lockref reference counts are updated atomically with
      cmpxchg, the fact that they also verify the state of the spinlock means
      that the lockless updates can never happen while somebody else holds the
      spinlock.
      
      So while "lockref_put_or_lock()" looks a lot like just another name for
      "atomic_dec_and_lock()", and both optimize to lockless updates, they are
      fundamentally different: the decrement done by atomic_dec_and_lock() is
      truly independent of any lock (as long as it doesn't decrement to zero),
      so a locked region can still see the count change.
      
      The lockref structure, in contrast, really is a *locked* reference
      count.  If you hold the spinlock, the reference count will be stable and
      you can modify the reference count without using atomics, because even
      the lockless updates will see and respect the state of the lock.
      
      In order to enable the cmpxchg lockless code, the architecture needs to
      do three things:
      
       (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
           in an aligned u64, and have a "cmpxchg()" implementation that works
           on such a u64 data type.
      
       (2) define a helper function to test for a spinlock being unlocked
           ("arch_spin_value_unlocked()")
      
       (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
           Kconfig file.
      
      This enables it for x86-64 (but not 32-bit, we'd need to make sure
      cmpxchg() turns into the proper cmpxchg8b in order to enable it for
      32-bit mode).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
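The cmpxchg() loop described in commit bc08b449 can be approximated in userspace with C11 atomics. The sketch below is only an illustration under stated assumptions: the 64-bit layout (lock word in the low 32 bits, count in the high 32 bits), the trivial spinlock, and all `toy_*` names are hypothetical and do not reproduce the kernel's `struct lockref`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Lock word and count share one 64-bit value so both can be swapped at once. */
struct toy_lockref {
    _Atomic uint64_t lock_count;
};

static inline uint32_t toy_lock_word(uint64_t v) { return (uint32_t)v; }
static inline uint32_t toy_count(uint64_t v)     { return (uint32_t)(v >> 32); }
static inline uint64_t toy_pack(uint32_t lock, uint32_t count)
{
    return ((uint64_t)count << 32) | lock;
}

/* Slow-path lock: flip the lock word from 0 to 1, leaving the count alone. */
static void toy_spin_lock(struct toy_lockref *ref)
{
    uint64_t old = atomic_load(&ref->lock_count);

    for (;;) {
        if (toy_lock_word(old) != 0) {
            old = atomic_load(&ref->lock_count);  /* held: keep polling */
            continue;
        }
        if (atomic_compare_exchange_weak(&ref->lock_count, &old,
                                         toy_pack(1, toy_count(old))))
            return;
    }
}

/* Only the holder changes the value while locked, so load+store suffices. */
static void toy_spin_unlock(struct toy_lockref *ref)
{
    uint64_t v = atomic_load(&ref->lock_count);

    atomic_store(&ref->lock_count, toy_pack(0, toy_count(v)));
}

/* Take a reference, locklessly if the lock is observed to be free. */
static void toy_lockref_get(struct toy_lockref *ref)
{
    uint64_t old;

    assert(ref != NULL);
    old = atomic_load(&ref->lock_count);

    /* Fast path: retry the compare-and-swap only while the lock is free. */
    while (toy_lock_word(old) == 0) {
        uint64_t next = toy_pack(0, toy_count(old) + 1);

        if (atomic_compare_exchange_weak(&ref->lock_count, &old, next))
            return;
        /* on failure, old has been refreshed with the current value */
    }

    /* Slow path: someone holds the lock, so update the count under it. */
    toy_spin_lock(ref);
    old = atomic_load(&ref->lock_count);
    atomic_store(&ref->lock_count, toy_pack(1, toy_count(old) + 1));
    toy_spin_unlock(ref);
}
```

The fast path compares the whole 64-bit value, so it can only succeed if the lock word was still zero at the moment of the swap. That is what distinguishes this from a plain atomic counter: a thread holding the lock sees a stable count.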
    • lockref: uninline lockref helper functions · 2f4f12e5
      Committed by Linus Torvalds
      They aren't very good to inline, since they already call external
      functions (the spinlock code), and we're going to create rather more
      complicated versions of them that can do the reference count updates
      locklessly.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() · 15570086
      Committed by Linus Torvalds
      This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
      and re-implements it using the lockref infrastructure instead.  It also
      adds a lot of comments about what is actually going on, because turning
      a dentry that was looked up using RCU into a long-lived reference
      counted entry is one of the more subtle parts of the rcu walk.
      
      We also used to be _particularly_ subtle in unlazy_walk() where we
      re-validate both the dentry and its parent using the same sequence
      count.  We used to do it by nesting the locks and then verifying the
      sequence count just once.
      
      That was silly, because nested locking is expensive, but the sequence
      count check is not.  So this just re-validates the dentry and the parent
      separately, avoiding the nested locking, and making the lockref lookup
      possible.
      Acked-by: Waiman Long <waiman.long@hp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lockref: add 'lockref_get_or_lock() helper · b3abd802
      Committed by Linus Torvalds
      This behaves like "lockref_get_not_zero()", but instead of doing nothing
      if the count was zero, it returns with the lock held.
      
      This allows callers to revalidate the lockref-protected data structure
      if required even if the count was zero to begin with, and possibly
      increment the count if it passes muster.
      
      In particular, the dentry code wants this when it wants to turn an
      RCU-protected dentry into a stable refcounted one: if the dentry count
      is zero, but the sequence number still validates the dentry, we can take
      a reference to it.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
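The contract described in commit b3abd802 (increment if non-zero, otherwise return with the lock held) might look like this in a spinlock-only userspace model. The type `toy_spinref` and the function `toy_get_or_lock()` are hypothetical names for illustration; the return convention follows the commit text, not a verified copy of the kernel source.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A count guarded by a spin flag; names are made up for this sketch. */
struct toy_spinref {
    atomic_flag lock;
    unsigned int count;
};

/*
 * Returns true if the count was non-zero: it has been incremented and the
 * lock has been released again.
 * Returns false if the count was zero: the count is untouched and the lock
 * is STILL HELD, so the caller can revalidate and must unlock afterwards.
 */
static bool toy_get_or_lock(struct toy_spinref *ref)
{
    assert(ref != NULL);
    while (atomic_flag_test_and_set_explicit(&ref->lock, memory_order_acquire))
        ; /* spin until we own the lock */

    if (ref->count == 0)
        return false;  /* hand the held lock to the caller */

    ref->count++;
    atomic_flag_clear_explicit(&ref->lock, memory_order_release);
    return true;
}
```

A caller that gets `false` back can inspect the protected structure (for a dentry, the sequence number), bump the count itself if the object is still valid, and then clear the lock flag.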
  4. 02 Sep, 2013 2 commits
  5. 01 Sep, 2013 1 commit
    • nohz_full: Add full-system-idle state machine · 0edd1b17
      Committed by Paul E. McKenney
      This commit adds the state machine that takes the per-CPU idle data
      as input and produces a full-system-idle indication as output.  This
      state machine is driven out of RCU's quiescent-state-forcing
      mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU
      idle state and then rcu_sysidle_report() to drive the state machine.
      
      The full-system-idle state is sampled using rcu_sys_is_idle(), which
      also drives the state machine if RCU is idle (and does so by forcing
      RCU to become non-idle).  This function returns true if all but the
      timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long
      enough to avoid memory contention on the full_sysidle_state state
      variable.  The rcu_sysidle_force_exit() may be called externally
      to reset the state machine back into non-idle state.
      
      For large systems the state machine is driven out of RCU's
      force-quiescent-state logic, which provides good scalability at the price
      of millisecond-scale latencies on the transition to full-system-idle
      state.  This is not so good for battery-powered systems, which are usually
      small enough that they don't need to care about scalability, but which
      do care deeply about energy efficiency.  Small systems therefore drive
      the state machine directly out of the idle-entry code.  The number of
      CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL
      Kconfig parameter, which defaults to 8.  Note that this is a build-time
      definition.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      [ paulmck: Use true and false for boolean constants per Lai Jiangshan. ]
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      [ paulmck: Simplify logic and provide better comments for memory barriers,
        based on review comments and questions by Lai Jiangshan. ]
  6. 31 Aug, 2013 1 commit
  7. 30 Aug, 2013 11 commits
  8. 29 Aug, 2013 6 commits
    • gpu/vga_switcheroo: add driver control power feature. (v3) · 0d69704a
      Committed by Dave Airlie
      For optimus and powerxpress muxless we really want the GPU
      driver deciding when to power up/down the GPU, not userspace.
      
      This adds the ability for a driver to dynamically power up/down
      the GPU and remove the switcheroo from controlling it, the
      switcheroo reports the dynamic state to userspace also.
      
      It also adds 2 power domains: one for machines where the power
      switch is controlled outside the GPU D3 state, so that the powerdown
      ordering is done correctly, and a second for the hdmi audio
      device, to make sure it can resume for PCI config space accesses.
      
      v1.1: fix build with switcheroo off
      
      v2: add power domain support for radeon and v1 nvidia dsms
      v2.1: fix typo in off case
      
      v3: add audio power domain for hdmi audio + misc audio fixes
      
      v4: use PCI_SLOT macro, drop power reference on hdmi audio resume
      failure also.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • vfs: make the dentry cache use the lockref infrastructure · 98474236
      Committed by Waiman Long
      This just replaces the dentry count/lock combination with the lockref
      structure that contains both a count and a spinlock, and does the
      mechanical conversion to use the lockref infrastructure.
      
      There are no semantic changes here, it's purely syntactic.  The
      reference lockref implementation uses the spinlock exactly the same way
      that the old dcache code did, and the bulk of this patch is just
      expanding the internal "d_count" use in the dcache code to use
      "d_lockref.count" instead.
      
      This is purely preparation for the real change to make the reference
      count updates be lockless during the 3.12 merge window.
      
      [ As with the previous commit, this is a rewritten version of a concept
        originally from Waiman, so credit goes to him, blame for any errors
        goes to me.
      
        Waiman's patch had some semantic differences for taking advantage of
        the lockless update in dget_parent(), while this patch is
        intentionally a pure search-and-replace change with no semantic
        changes.     - Linus ]
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Add new lockref infrastructure reference implementation · 0f8f2aaa
      Committed by Waiman Long
      This introduces a new "lockref" structure that supports the concept of
      lockless updates of reference counts that still honor an attached
      spinlock.
      
      NOTE! This reference implementation is not the optimized lockless
      version, rather it is the fallback implementation using standard
      spinlocks.  The actual optimized versions will be merged into 3.12, but
      I wanted to get the infrastructure in place and document the new
      interfaces.
      
      [ Also note that this particular commit is drastically cut-down minimal
        version of the original patch by Waiman.  In order to properly credit
        the original author I'm marking Waiman as the author here, but in the
        end this patch bears little resemblance to the patch by Waiman.  So
        blame any errors on me editing things down to the point where I can
        introduce the infrastructure before the merge window for 3.12 actually
        opens.     - Linus ]
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dev-core: fix build break when DEBUG is enabled · 8ef2d651
      Committed by Dmitry Kasatkin
      When DEBUG is defined, dev_dbg_ratelimited uses dynamic debug data
      structures even when CONFIG_DYNAMIC_DEBUG is not defined.
      This leads to a build break.
      For example, when I try to use dev_dbg_ratelimited in USB code and
      CONFIG_USB_DEBUG is enabled, but CONFIG_DYNAMIC_DEBUG is not, I get:
      
        CC [M]  drivers/usb/host/xhci-ring.o
        drivers/usb/host/xhci-ring.c: In function ‘xhci_queue_intr_tx’:
        drivers/usb/host/xhci-ring.c:3059:3: error: implicit declaration of function ‘DEFINE_DYNAMIC_DEBUG_METADATA’ [-Werror=implicit-function-declaration]
        drivers/usb/host/xhci-ring.c:3059:3: error: ‘descriptor’ undeclared (first use in this function)
        drivers/usb/host/xhci-ring.c:3059:3: note: each undeclared identifier is reported only once for each function it appears in
        drivers/usb/host/xhci-ring.c:3059:3: error: implicit declaration of function ‘__dynamic_pr_debug’ [-Werror=implicit-function-declaration]
        drivers/usb/host/xhci-ring.c: In function ‘xhci_queue_isoc_tx_prepare’:
        drivers/usb/host/xhci-ring.c:3847:3: error: ‘descriptor’ undeclared (first use in this function)
        cc1: some warnings being treated as errors
        make[2]: *** [drivers/usb/host/xhci-ring.o] Error 1
        make[1]: *** [drivers/usb/host] Error 2
        make: *** [drivers/usb/] Error 2
      
      This patch separates definition for CONFIG_DYNAMIC_DEBUG and DEBUG cases.
      
      [Note, Sarah moved the comment above the macro to avoid checkpatch
      warnings.]
      Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/mlx4: Add receive flow steering support · f77c0162
      Committed by Hadar Hen Zion
      Implement ib_create_flow() and ib_destroy_flow().
      
      Translate the verbs structures provided by the user to HW structures
      and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands.
      
      On the ATTACH command completion, the firmware provides a 64-bit
      registration ID, which is placed into struct mlx4_ib_flow that wraps
      the instance of struct ib_flow which is returned to the caller.  Later,
      this reg ID is used for detaching that flow from the firmware.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • sysfs: sysfs_create_groups returns a value. · 574979c6
      Committed by Greg Kroah-Hartman
      When I included the "empty" function for sysfs_create_groups() when
      CONFIG_SYSFS=n, I forgot to return a value for it, so things blew up the
      build.  This patch fixes that, stupid me.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
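The kind of mistake fixed by commit 574979c6 is easy to reproduce with any config-dependent stub. The sketch below is illustrative only: `TOY_CONFIG_SYSFS` and `toy_create_groups()` are hypothetical stand-ins for the kernel's CONFIG_SYSFS and sysfs_create_groups(), and returning 0 from the stub is an assumption about what "success with nothing to do" should look like.

```c
#include <assert.h>
#include <stddef.h>

struct kobject;          /* opaque in this sketch */
struct attribute_group;  /* opaque in this sketch */

#ifdef TOY_CONFIG_SYSFS
/* The real implementation would live elsewhere when the feature is built in. */
int toy_create_groups(struct kobject *kobj,
                      const struct attribute_group **groups);
#else
/*
 * Feature compiled out: the stub does nothing, but because it is declared
 * to return int it must still return a value.
 */
static inline int toy_create_groups(struct kobject *kobj,
                                    const struct attribute_group **groups)
{
    (void)kobj;
    (void)groups;
    return 0;
}
#endif
```

With the stub in place, callers can check the return value unconditionally, whether or not the feature is configured in.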
  9. 28 Aug, 2013 4 commits
  10. 27 Aug, 2013 2 commits
    • cgroup: implement CFTYPE_NO_PREFIX · 9fa4db33
      Committed by Tejun Heo
      When cgroup files are created, cgroup core automatically prepends the
      name of the subsystem as prefix.  This patch adds CFTYPE_NO_PREFIX, which
      disables the automatic prefix.  This is to work around historical
      baggage and shouldn't be used for new files.
      
      This will be used to move "cgroup.event_control" from cgroup core to
      memcg.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Glauber Costa <glommer@gmail.com>
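The naming rule from commit 9fa4db33 (prepend the subsystem name unless the no-prefix flag is set) can be sketched as below. This is a hedged userspace illustration: `TOY_CFTYPE_NO_PREFIX`, its bit value, and `toy_cgroup_file_name()` are hypothetical, and the "subsys.name" format is an assumption based on how the commit message describes the prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_CFTYPE_NO_PREFIX (1u << 0)  /* hypothetical flag bit */

/*
 * Build the visible file name into buf.
 * Without the flag: "<subsys>.<name>"; with it (or with no subsystem): "<name>".
 * Returns what snprintf() returns.
 */
static int toy_cgroup_file_name(char *buf, size_t len, const char *subsys,
                                const char *name, unsigned int flags)
{
    assert(buf != NULL && name != NULL);
    if (subsys != NULL && !(flags & TOY_CFTYPE_NO_PREFIX))
        return snprintf(buf, len, "%s.%s", subsys, name);
    return snprintf(buf, len, "%s", name);
}
```

For example, a memcg-owned file could keep the historical name "cgroup.event_control" by passing the flag, instead of becoming "memory.cgroup.event_control".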
    • cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax · 35cf0836
      Committed by Tejun Heo
      cgroup_css_from_dir() will grow another user.  In preparation, make
      the following changes.
      
      * All css functions are prefixed with just "css_", rename it to
        css_from_dir().
      
      * Take dentry * instead of file * as dentry is what ultimately
        identifies a cgroup and file may not always be available.  Note that
        the function now checks whether @dentry->d_inode is NULL as the
        caller now may specify a negative dentry.
      
      * Make it take cgroup_subsys * instead of integer subsys_id.  This
        simplifies the function and allows specifying no subsystem for
        cgroup->dummy_css.
      
      * Make return section a bit less verbose.
      
      This patch doesn't introduce any behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>