1. 09 9月, 2013 1 次提交
    • L
      vfs: reorganize dput() memory accesses · 8aab6a27
      Linus Torvalds 提交于
      This is me being a bit OCD after all the dentry optimization work this
      merge window: profiles end up showing 'dput()' as a rather expensive
      operation, and there were two unrelated bad reasons for that.
      
      The first reason was reading d_lockref.count for debugging purposes,
      which touches the lockref cacheline (for reads) before really need to.
      More importantly, the debugging test in question is _wrong_, and has
      hidden bugs.  It's true that we can only sleep when the count goes down
      to zero, but the test as-is hides the much more subtle bug that happens
      if we race with somebody else deleting the file.
      
      Anyway we _will_ touch that cacheline, but let's do it for a write and
      in the right routine (ie in "lockref_put_or_lock()") which annotates the
      costs better.  So remove the misleading debug code.
      
      The other was an unnecessary access to the cacheline that contains the
      d_lru list, just to check whether we already were on the LRU list or
      not.  This is exactly what we have d_flags for, so that we can avoid
      touching extra cache lines for the common case.  So just add another bit
      for "is this dentry on the LRU".
      
      Finally, mark the tests properly likely/unlikely, so that the common
      fast-paths are dense in the instruction stream.
      
      This makes the profiles look much saner.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8aab6a27
  2. 08 9月, 2013 2 次提交
    • L
      lockref: add ability to mark lockrefs "dead" · e7d33bb5
      Linus Torvalds 提交于
      The only actual current lockref user (dcache) uses zero reference counts
      even for perfectly live dentries, because it's a cache: there may not be
      any users, but that doesn't mean that we want to throw away the dentry.
      
      At the same time, the dentry cache does have a notion of a truly "dead"
      dentry that we must not even increment the reference count of, because
      we have pruned it and it is not valid.
      
      Currently that distinction is not visible in the lockref itself, and the
      dentry cache validation uses "lockref_get_or_lock()" to either get a new
      reference to a dentry that already had existing references (and thus
      cannot be dead), or get the dentry lock so that we can then verify the
      dentry and increment the reference count under the lock if that
      verification was successful.
      
      That's all somewhat complicated.
      
      This adds the concept of being "dead" to the lockref itself, by simply
      using a count that is negative.  This allows a usage scenario where we
      can increment the refcount of a dentry without having to validate it,
      and pushing the special "we killed it" case into the lockref code.
      
      The dentry code itself doesn't actually use this yet, and it's probably
      too late in the merge window to do that code (the dentry_kill() code
      with its "should I decrement the count" logic really is pretty complex
      code), but let's introduce the concept at the lockref level now.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7d33bb5
    • L
      Revert "Input: introduce BTN/ABS bits for drums and guitars" · b04c99e3
      Linus Torvalds 提交于
      This reverts commits 61e00655, 73f8645d and 8e22ecb6:
        "Input: introduce BTN/ABS bits for drums and guitars"
        "HID: wiimote: add support for Guitar-Hero drums"
        "HID: wiimote: add support for Guitar-Hero guitars"
      
      The extra new ABS_xx values resulted in ABS_MAX no longer being a
      power-of-two, which broke the comparison logic.  It also caused the
      ioctl numbers to overflow into the next byte, causing problems for that.
      
      We'll try again for 3.13.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Acked-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Benjamin Tissoires <benjamin.tissoires@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b04c99e3
  3. 07 9月, 2013 3 次提交
  4. 06 9月, 2013 5 次提交
    • M
      fscache: Netfs function for cleanup post readpages · 5a6f282a
      Milosz Tanski 提交于
      Currently the fscache code expect the netfs to call fscache_readpages_or_alloc
      inside the aops readpages callback.  It marks all the pages in the list
      provided by readahead with PG_private_2.  In the cases that the netfs fails to
      read all the pages (which is legal) it ends up returning to the readahead and
      triggering a BUG.  This happens because the page list still contains marked
      pages.
      
      This patch implements a simple fscache_readpages_cancel function that the netfs
      should call before returning from readpages.  It will revoke the pages from the
      underlying cache backend and unmark them.
      
      The problem was originally worked out in the Ceph devel tree, but it also
      occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
      will clean up the unprocessed pages in the case of an error.
      
      This can be used to address the following oops:
      
      [12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
      [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
      	(null) index:0x0
      [12410647.597298] page flags: 0x200000000001000(private_2)
      
      ...
      
      [12410647.597334] Call Trace:
      [12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
      [12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
      [12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
      [12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
      [12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
      [12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
      [12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
      [12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
      [12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
      [12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
      [12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
      [12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
      [12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
      [12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
      [12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
      [12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
      [12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
      [12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
      [12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
      [12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
      [12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
      [12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
      [12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
      [12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
      [12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30
      Signed-off-by: NMilosz Tanski <milosz@adfin.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5a6f282a
    • D
      FS-Cache: Add interface to check consistency of a cached object · da9803bc
      David Howells 提交于
      Extend the fscache netfs API so that the netfs can ask as to whether a cache
      object is up to date with respect to its corresponding netfs object:
      
      	int fscache_check_consistency(struct fscache_cookie *cookie)
      
      This will call back to the netfs to check whether the auxiliary data associated
      with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
      may also return -ENOMEM and -ERESTARTSYS.
      
      The backends now have to implement a mandatory operation pointer:
      
      	int (*check_consistency)(struct fscache_object *object)
      
      that corresponds to the above API call.  FS-Cache takes care of pinning the
      object and the cookie in memory and managing this call with respect to the
      object state.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
      da9803bc
    • A
      constify dcache.c inlined helpers where possible · f0d3b3de
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f0d3b3de
    • M
      vfs: check submounts and drop atomically · 848ac114
      Miklos Szeredi 提交于
      We check submounts before doing d_drop() on a non-empty directory dentry in
      NFS (have_submounts()), but we do not exclude a racing mount.
      
       Process A: have_submounts() -> returns false
       Process B: mount() -> success
       Process A: d_drop()
      
      This patch prepares the ground for the fix by doing the following
      operations all under the same rename lock:
      
        have_submounts()
        shrink_dcache_parent()
        d_drop()
      
      This is actually an optimization since have_submounts() and
      shrink_dcache_parent() both traverse the same dentry tree separately.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      CC: David Howells <dhowells@redhat.com>
      CC: Steven Whitehouse <swhiteho@redhat.com>
      CC: Trond Myklebust <Trond.Myklebust@netapp.com>
      CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      848ac114
    • J
      vxlan: Notify drivers for listening UDP port changes · 53cf5275
      Joseph Gasparakis 提交于
      This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
      ndo_del_rx_vxlan_port().
      
      Drivers can get notifications through the above functions about changes
      of the UDP listening port of VXLAN. Also, when physical ports come up,
      now they can call vxlan_get_rx_port() in order to obtain the port number(s)
      of the existing VXLAN interface in case they already up before them.
      
      This information about the listening UDP port would be used for VXLAN
      related offloads.
      
      A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
      input and his suggestions on this patch set.
      
      CC: John Fastabend <john.r.fastabend@intel.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NJoseph Gasparakis <joseph.gasparakis@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53cf5275
  5. 05 9月, 2013 8 次提交
  6. 04 9月, 2013 18 次提交
  7. 03 9月, 2013 3 次提交
    • L
      module: Fix mod->mkobj.kobj potentially freed too early · 942e4431
      Li Zhong 提交于
      DEBUG_KOBJECT_RELEASE helps to find the issue attached below.
      
      After some investigation, it seems the reason is:
      The mod->mkobj.kobj(ffffffffa01600d0 below) is freed together with mod
      itself in free_module(). However, its children still hold references to
      it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
      child(holders below) tries to decrease the reference count to its parent
      in kobject_del(), BUG happens as it tries to access already freed memory.
      
      This patch tries to fix it by waiting for the mod->mkobj.kobj to be
      really released in the module removing process (and some error code
      paths).
      
      [ 1844.175287] kobject: 'holders' (ffff88007c1f1600): kobject_release, parent ffffffffa01600d0 (delayed)
      [ 1844.178991] kobject: 'notes' (ffff8800370b2a00): kobject_release, parent ffffffffa01600d0 (delayed)
      [ 1845.180118] kobject: 'holders' (ffff88007c1f1600): kobject_cleanup, parent ffffffffa01600d0
      [ 1845.182130] kobject: 'holders' (ffff88007c1f1600): auto cleanup kobject_del
      [ 1845.184120] BUG: unable to handle kernel paging request at ffffffffa01601d0
      [ 1845.185026] IP: [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
      [ 1845.185026] Oops: 0000 [#1] PREEMPT
      [ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
      [ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G           O 3.11.0-rc6-next-20130819+ #1
      [ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [ 1845.185026] Workqueue: events kobject_delayed_cleanup
      [ 1845.185026] task: ffff88007ca51f00 ti: ffff88007ca5c000 task.ti: ffff88007ca5c000
      [ 1845.185026] RIP: 0010:[<ffffffff812cda81>]  [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026] RSP: 0018:ffff88007ca5dd08  EFLAGS: 00010282
      [ 1845.185026] RAX: 0000000000002000 RBX: ffffffffa01600d0 RCX: ffffffff8177d638
      [ 1845.185026] RDX: ffff88007ca5dc18 RSI: 0000000000000000 RDI: ffffffffa01600d0
      [ 1845.185026] RBP: ffff88007ca5dd18 R08: ffffffff824e9810 R09: ffffffffffffffff
      [ 1845.185026] R10: ffff8800ffffffff R11: dead4ead00000001 R12: ffffffff81a95040
      [ 1845.185026] R13: ffff88007b27a960 R14: ffff88007c1f1600 R15: 0000000000000000
      [ 1845.185026] FS:  0000000000000000(0000) GS:ffffffff81a23000(0000) knlGS:0000000000000000
      [ 1845.185026] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1845.185026] CR2: ffffffffa01601d0 CR3: 0000000037207000 CR4: 00000000000006b0
      [ 1845.185026] Stack:
      [ 1845.185026]  ffff88007c1f1600 ffff88007c1f1600 ffff88007ca5dd38 ffffffff812cdb7e
      [ 1845.185026]  0000000000000000 ffff88007c1f1640 ffff88007ca5dd68 ffffffff812cdbfe
      [ 1845.185026]  ffff88007c974800 ffff88007c1f1640 ffff88007ff61a00 0000000000000000
      [ 1845.185026] Call Trace:
      [ 1845.185026]  [<ffffffff812cdb7e>] kobject_del+0x2e/0x40
      [ 1845.185026]  [<ffffffff812cdbfe>] kobject_delayed_cleanup+0x6e/0x1d0
      [ 1845.185026]  [<ffffffff81063a45>] process_one_work+0x1e5/0x670
      [ 1845.185026]  [<ffffffff810639e3>] ? process_one_work+0x183/0x670
      [ 1845.185026]  [<ffffffff810642b3>] worker_thread+0x113/0x370
      [ 1845.185026]  [<ffffffff810641a0>] ? rescuer_thread+0x290/0x290
      [ 1845.185026]  [<ffffffff8106bfba>] kthread+0xda/0xe0
      [ 1845.185026]  [<ffffffff814ff0f0>] ? _raw_spin_unlock_irq+0x30/0x60
      [ 1845.185026]  [<ffffffff8106bee0>] ? kthread_create_on_node+0x130/0x130
      [ 1845.185026]  [<ffffffff8150751a>] ret_from_fork+0x7a/0xb0
      [ 1845.185026]  [<ffffffff8106bee0>] ? kthread_create_on_node+0x130/0x130
      [ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d <f6> 87 00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84
      [ 1845.185026] RIP  [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026]  RSP <ffff88007ca5dd08>
      [ 1845.185026] CR2: ffffffffa01601d0
      [ 1845.185026] ---[ end trace 49a70afd109f5653 ]---
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      942e4431
    • L
      lockref: implement lockless reference count updates using cmpxchg() · bc08b449
      Linus Torvalds 提交于
      Instead of taking the spinlock, the lockless versions atomically check
      that the lock is not taken, and do the reference count update using a
      cmpxchg() loop.  This is semantically identical to doing the reference
      count update protected by the lock, but avoids the "wait for lock"
      contention that you get when accesses to the reference count are
      contended.
      
      Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
      Even when the lockref reference counts are updated atomically with
      cmpxchg, the fact that they also verify the state of the spinlock means
      that the lockless updates can never happen while somebody else holds the
      spinlock.
      
      So while "lockref_put_or_lock()" looks a lot like just another name for
      "atomic_dec_and_lock()", and both optimize to lockless updates, they are
      fundamentally different: the decrement done by atomic_dec_and_lock() is
      truly independent of any lock (as long as it doesn't decrement to zero),
      so a locked region can still see the count change.
      
      The lockref structure, in contrast, really is a *locked* reference
      count.  If you hold the spinlock, the reference count will be stable and
      you can modify the reference count without using atomics, because even
      the lockless updates will see and respect the state of the lock.
      
      In order to enable the cmpxchg lockless code, the architecture needs to
      do three things:
      
       (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
           in an aligned u64, and have a "cmpxchg()" implementation that works
           on such a u64 data type.
      
       (2) define a helper function to test for a spinlock being unlocked
           ("arch_spin_value_unlocked()")
      
       (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
           Kconfig file.
      
      This enables it for x86-64 (but not 32-bit, we'd need to make sure
      cmpxchg() turns into the proper cmpxchg8b in order to enable it for
      32-bit mode).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc08b449
    • L
      lockref: uninline lockref helper functions · 2f4f12e5
      Linus Torvalds 提交于
      They aren't very good to inline, since they already call external
      functions (the spinlock code), and we're going to create rather more
      complicated versions of them that can do the reference count updates
      locklessly.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f4f12e5