1. 08 9月, 2013 1 次提交
  2. 06 9月, 2013 1 次提交
    • J
      vxlan: Notify drivers for listening UDP port changes · 53cf5275
      Joseph Gasparakis 提交于
      This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
      ndo_del_rx_vxlan_port().
      
      Drivers can get notifications through the above functions about changes
      of the UDP listening port of VXLAN. Also, when physical ports come up,
      now they can call vxlan_get_rx_port() in order to obtain the port number(s)
      of the existing VXLAN interface in case they already up before them.
      
      This information about the listening UDP port would be used for VXLAN
      related offloads.
      
      A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
      input and his suggestions on this patch set.
      
      CC: John Fastabend <john.r.fastabend@intel.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NJoseph Gasparakis <joseph.gasparakis@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53cf5275
  3. 04 9月, 2013 12 次提交
  4. 03 9月, 2013 5 次提交
    • L
      module: Fix mod->mkobj.kobj potentially freed too early · 942e4431
      Li Zhong 提交于
      DEBUG_KOBJECT_RELEASE helps to find the issue attached below.
      
      After some investigation, it seems the reason is:
      The mod->mkobj.kobj(ffffffffa01600d0 below) is freed together with mod
      itself in free_module(). However, its children still hold references to
      it, as the delay caused by DEBUG_KOBJECT_RELEASE. So when the
      child(holders below) tries to decrease the reference count to its parent
      in kobject_del(), BUG happens as it tries to access already freed memory.
      
      This patch tries to fix it by waiting for the mod->mkobj.kobj to be
      really released in the module removing process (and some error code
      paths).
      
      [ 1844.175287] kobject: 'holders' (ffff88007c1f1600): kobject_release, parent ffffffffa01600d0 (delayed)
      [ 1844.178991] kobject: 'notes' (ffff8800370b2a00): kobject_release, parent ffffffffa01600d0 (delayed)
      [ 1845.180118] kobject: 'holders' (ffff88007c1f1600): kobject_cleanup, parent ffffffffa01600d0
      [ 1845.182130] kobject: 'holders' (ffff88007c1f1600): auto cleanup kobject_del
      [ 1845.184120] BUG: unable to handle kernel paging request at ffffffffa01601d0
      [ 1845.185026] IP: [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026] PGD 1a13067 PUD 1a14063 PMD 7bd30067 PTE 0
      [ 1845.185026] Oops: 0000 [#1] PREEMPT
      [ 1845.185026] Modules linked in: xfs libcrc32c [last unloaded: kprobe_example]
      [ 1845.185026] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G           O 3.11.0-rc6-next-20130819+ #1
      [ 1845.185026] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [ 1845.185026] Workqueue: events kobject_delayed_cleanup
      [ 1845.185026] task: ffff88007ca51f00 ti: ffff88007ca5c000 task.ti: ffff88007ca5c000
      [ 1845.185026] RIP: 0010:[<ffffffff812cda81>]  [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026] RSP: 0018:ffff88007ca5dd08  EFLAGS: 00010282
      [ 1845.185026] RAX: 0000000000002000 RBX: ffffffffa01600d0 RCX: ffffffff8177d638
      [ 1845.185026] RDX: ffff88007ca5dc18 RSI: 0000000000000000 RDI: ffffffffa01600d0
      [ 1845.185026] RBP: ffff88007ca5dd18 R08: ffffffff824e9810 R09: ffffffffffffffff
      [ 1845.185026] R10: ffff8800ffffffff R11: dead4ead00000001 R12: ffffffff81a95040
      [ 1845.185026] R13: ffff88007b27a960 R14: ffff88007c1f1600 R15: 0000000000000000
      [ 1845.185026] FS:  0000000000000000(0000) GS:ffffffff81a23000(0000) knlGS:0000000000000000
      [ 1845.185026] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1845.185026] CR2: ffffffffa01601d0 CR3: 0000000037207000 CR4: 00000000000006b0
      [ 1845.185026] Stack:
      [ 1845.185026]  ffff88007c1f1600 ffff88007c1f1600 ffff88007ca5dd38 ffffffff812cdb7e
      [ 1845.185026]  0000000000000000 ffff88007c1f1640 ffff88007ca5dd68 ffffffff812cdbfe
      [ 1845.185026]  ffff88007c974800 ffff88007c1f1640 ffff88007ff61a00 0000000000000000
      [ 1845.185026] Call Trace:
      [ 1845.185026]  [<ffffffff812cdb7e>] kobject_del+0x2e/0x40
      [ 1845.185026]  [<ffffffff812cdbfe>] kobject_delayed_cleanup+0x6e/0x1d0
      [ 1845.185026]  [<ffffffff81063a45>] process_one_work+0x1e5/0x670
      [ 1845.185026]  [<ffffffff810639e3>] ? process_one_work+0x183/0x670
      [ 1845.185026]  [<ffffffff810642b3>] worker_thread+0x113/0x370
      [ 1845.185026]  [<ffffffff810641a0>] ? rescuer_thread+0x290/0x290
      [ 1845.185026]  [<ffffffff8106bfba>] kthread+0xda/0xe0
      [ 1845.185026]  [<ffffffff814ff0f0>] ? _raw_spin_unlock_irq+0x30/0x60
      [ 1845.185026]  [<ffffffff8106bee0>] ? kthread_create_on_node+0x130/0x130
      [ 1845.185026]  [<ffffffff8150751a>] ret_from_fork+0x7a/0xb0
      [ 1845.185026]  [<ffffffff8106bee0>] ? kthread_create_on_node+0x130/0x130
      [ 1845.185026] Code: 81 48 c7 c7 28 95 ad 81 31 c0 e8 9b da 01 00 e9 4f ff ff ff 66 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1d <f6> 87 00 01 00 00 01 74 1e 48 8d 7b 38 83 6b 38 01 0f 94 c0 84
      [ 1845.185026] RIP  [<ffffffff812cda81>] kobject_put+0x11/0x60
      [ 1845.185026]  RSP <ffff88007ca5dd08>
      [ 1845.185026] CR2: ffffffffa01601d0
      [ 1845.185026] ---[ end trace 49a70afd109f5653 ]---
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      942e4431
    • L
      lockref: implement lockless reference count updates using cmpxchg() · bc08b449
      Linus Torvalds 提交于
      Instead of taking the spinlock, the lockless versions atomically check
      that the lock is not taken, and do the reference count update using a
      cmpxchg() loop.  This is semantically identical to doing the reference
      count update protected by the lock, but avoids the "wait for lock"
      contention that you get when accesses to the reference count are
      contended.
      
      Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
      Even when the lockref reference counts are updated atomically with
      cmpxchg, the fact that they also verify the state of the spinlock means
      that the lockless updates can never happen while somebody else holds the
      spinlock.
      
      So while "lockref_put_or_lock()" looks a lot like just another name for
      "atomic_dec_and_lock()", and both optimize to lockless updates, they are
      fundamentally different: the decrement done by atomic_dec_and_lock() is
      truly independent of any lock (as long as it doesn't decrement to zero),
      so a locked region can still see the count change.
      
      The lockref structure, in contrast, really is a *locked* reference
      count.  If you hold the spinlock, the reference count will be stable and
      you can modify the reference count without using atomics, because even
      the lockless updates will see and respect the state of the lock.
      
      In order to enable the cmpxchg lockless code, the architecture needs to
      do three things:
      
       (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
           in an aligned u64, and have a "cmpxchg()" implementation that works
           on such a u64 data type.
      
       (2) define a helper function to test for a spinlock being unlocked
           ("arch_spin_value_unlocked()")
      
       (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
           Kconfig file.
      
      This enables it for x86-64 (but not 32-bit, we'd need to make sure
      cmpxchg() turns into the proper cmpxchg8b in order to enable it for
      32-bit mode).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc08b449
    • L
      lockref: uninline lockref helper functions · 2f4f12e5
      Linus Torvalds 提交于
      They aren't very good to inline, since they already call external
      functions (the spinlock code), and we're going to create rather more
      complicated versions of them that can do the reference count updates
      locklessly.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f4f12e5
    • L
      vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() · 15570086
      Linus Torvalds 提交于
      This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
      and re-implements it using the lockref infrastructure instead.  It also
      adds a lot of comments about what is actually going on, because turning
      a dentry that was looked up using RCU into a long-lived reference
      counted entry is one of the more subtle parts of the rcu walk.
      
      We also used to be _particularly_ subtle in unlazy_walk() where we
      re-validate both the dentry and its parent using the same sequence
      count.  We used to do it by nesting the locks and then verifying the
      sequence count just once.
      
      That was silly, because nested locking is expensive, but the sequence
      count check is not.  So this just re-validates the dentry and the parent
      separately, avoiding the nested locking, and making the lockref lookup
      possible.
      Acked-by: NWaiman Long <waiman.long@hp.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15570086
    • L
      lockref: add 'lockref_get_or_lock() helper · b3abd802
      Linus Torvalds 提交于
      This behaves like "lockref_get_not_zero()", but instead of doing nothing
      if the count was zero, it returns with the lock held.
      
      This allows callers to revalidate the lockref-protected data structure
      if required even if the count was zero to begin with, and possibly
      increment the count if it passes muster.
      
      In particular, the dentry code wants this when it wants to turn an
      RCU-protected dentry into a stable refcounted one: if the dentry count
      it zero, but the sequence number still validates the dentry, we can take
      a reference to it.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3abd802
  5. 02 9月, 2013 2 次提交
  6. 01 9月, 2013 1 次提交
    • P
      nohz_full: Add full-system-idle state machine · 0edd1b17
      Paul E. McKenney 提交于
      This commit adds the state machine that takes the per-CPU idle data
      as input and produces a full-system-idle indication as output.  This
      state machine is driven out of RCU's quiescent-state-forcing
      mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU
      idle state and then rcu_sysidle_report() to drive the state machine.
      
      The full-system-idle state is sampled using rcu_sys_is_idle(), which
      also drives the state machine if RCU is idle (and does so by forcing
      RCU to become non-idle).  This function returns true if all but the
      timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long
      enough to avoid memory contention on the full_sysidle_state state
      variable.  The rcu_sysidle_force_exit() may be called externally
      to reset the state machine back into non-idle state.
      
      For large systems the state machine is driven out of RCU's
      force-quiescent-state logic, which provides good scalability at the price
      of millisecond-scale latencies on the transition to full-system-idle
      state.  This is not so good for battery-powered systems, which are usually
      small enough that they don't need to care about scalability, but which
      do care deeply about energy efficiency.  Small systems therefore drive
      the state machine directly out of the idle-entry code.  The number of
      CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL
      Kconfig parameter, which defaults to 8.  Note that this is a build-time
      definition.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      [ paulmck: Use true and false for boolean constants per Lai Jiangshan. ]
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      [ paulmck: Simplify logic and provide better comments for memory barriers,
        based on review comments and questions by Lai Jiangshan. ]
      0edd1b17
  7. 31 8月, 2013 2 次提交
  8. 30 8月, 2013 16 次提交