1. 24 July 2014, 3 commits
    • of: Transactional DT support. · 201c910b
      Committed by Pantelis Antoniou
      Introducing DT transactional support.
      
      A DT transaction is a mechanism for applying changes to the live
      tree such that either the full set of changes takes effect, or the
      tree is rolled back to the state it was in before the changes were
      attempted. An applied transaction can be rolled back at any time.
      (A minimal usage sketch follows at the end of this entry.)
      
      Documentation is in
      	Documentation/devicetree/changesets.txt
      Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
      [glikely: Removed device notifiers and reworked to be more consistent]
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
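      Editor's sketch, not part of the patch: a minimal example of how a
      caller might batch a property update with the of_changeset_* helpers
      introduced here. The exact entry points, locking requirements (e.g.
      whether the caller must hold of_mutex around apply/revert) and the
      error-path cleanup of the property are simplified assumptions based
      on the description above.

        #include <linux/of.h>
        #include <linux/slab.h>
        #include <linux/string.h>

        static int example_update_status(struct device_node *np)
        {
                struct of_changeset ocs;
                struct property *prop;
                int ret;

                prop = kzalloc(sizeof(*prop), GFP_KERNEL);
                if (!prop)
                        return -ENOMEM;
                prop->name = "status";
                prop->value = "okay";
                prop->length = strlen("okay") + 1;

                of_changeset_init(&ocs);

                /* Record the change; the live tree is not touched yet. */
                ret = of_changeset_update_property(&ocs, np, prop);
                if (ret)
                        goto out;

                /* Apply the whole set of recorded changes at once... */
                ret = of_changeset_apply(&ocs);
                if (ret)
                        goto out;

                /* ...and it can be rolled back again at any time. */
                of_changeset_revert(&ocs);
        out:
                of_changeset_destroy(&ocs);
                return ret;
        }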
    • of: Reorder device tree changes and notifiers · 259092a3
      Committed by Grant Likely
      Currently, devicetree reconfig notifiers get emitted before the change
      is applied to the tree, but that behaviour is problematic if the
      receiver wants to determine the new state of the tree. The current
      users don't care, but the changeset code to follow will be making
      multiple changes at once. Reorder notifiers to get emitted after the
      change has been applied to the tree so that callbacks see the new tree
      state.
      
      At the same time, fix up the existing callbacks to expect the new order.
      There are a few callbacks that compare the old and new values of a
      changed property. Put both property pointers into the of_prop_reconfig
      structure (see the callback sketch below).
      
      The current notifiers also allow the notifier callback to fail and
      cancel the change to the tree, but that feature isn't actually used.
      It really isn't valid to ignore a tree modification provided by firmware
      anyway, so remove the ability to cancel a change to the tree.
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
      Cc: Nathan Fontenot <nfont@austin.ibm.com>
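      Editor's sketch, not from the patch: a reconfig notifier written for
      the new ordering, where the callback runs after the change has hit
      the tree and can compare both property pointers. The old_prop field
      name is an assumption based on the description above.

        #include <linux/of.h>
        #include <linux/notifier.h>
        #include <linux/printk.h>

        static int example_reconfig_notify(struct notifier_block *nb,
                                           unsigned long action, void *arg)
        {
                struct of_prop_reconfig *pr = arg;

                if (action != OF_RECONFIG_UPDATE_PROPERTY)
                        return NOTIFY_DONE;

                /* The live tree already holds the new value at this point. */
                pr_info("%s: property %s updated (%d -> %d bytes)\n",
                        pr->dn->full_name, pr->prop->name,
                        pr->old_prop ? pr->old_prop->length : 0,
                        pr->prop->length);
                return NOTIFY_OK;
        }

        static struct notifier_block example_of_nb = {
                .notifier_call = example_reconfig_notify,
        };

        /* Registered with of_reconfig_notifier_register(&example_of_nb). */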
    • of: Make devicetree sysfs update functions consistent. · 8a2b22a2
      Committed by Grant Likely
      All of the DT modification functions are split into two parts: the
      first part manipulates the DT data structure, and the second part
      updates sysfs. However, the code isn't very consistent about how the
      second half is called; the helpers don't all enforce the same rules
      about when it is valid to update sysfs, and there isn't any clarity
      on locking.
      
      The transactional DT modification feature that is coming also needs
      access to these functions so that it can perform all the structure
      changes together, and then all the sysfs updates as a second stage
      instead of doing each one at a time.
      
      Fix up the second half by creating a separate __of_*_sysfs() function
      for each of the helpers. The new functions have consistent naming (i.e.
      of_node_add() becomes __of_attach_node_sysfs()) and all of them now
      defer if of_init() hasn't been called yet.
      
      Callers of the new functions must hold the of_mutex to ensure there are
      no race conditions with of_init(). The mutex ensures that there will
      only ever be one writer to the tree at any given time. There can still
      be any number of readers, and the raw_spin_lock is still used to keep
      access to the data structure consistent. (A sketch of the resulting
      attach pattern follows below.)
      
      Finally, put the function prototypes into of_private.h so they are
      accessible to the transaction code.
      Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
      [grant.likely: Changed suffix from _post to _sysfs to match existing code]
      [grant.likely: Reorganized to eliminate trivial wrappers]
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
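      Editor's sketch of the attach pattern described above: of_mutex
      serializes writers and covers the sysfs half, while the structural
      half still runs under the raw spinlock. __of_attach_node() stands in
      for the structural half and devtree_lock for the raw spinlock; both
      names are assumptions rather than quotes from this patch.

        int example_attach_node(struct device_node *np)
        {
                unsigned long flags;

                mutex_lock(&of_mutex);          /* one writer at a time */

                raw_spin_lock_irqsave(&devtree_lock, flags);
                __of_attach_node(np);           /* structural change */
                raw_spin_unlock_irqrestore(&devtree_lock, flags);

                /* sysfs half; defers internally if of_init() hasn't run */
                __of_attach_node_sysfs(np);

                mutex_unlock(&of_mutex);
                return 0;
        }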
  2. 07 July 2014, 1 commit
    • of/platform: Fix of_platform_device_destroy iteration of devices · 75f353b6
      Committed by Grant Likely
      of_platform_device_destroy() did not work properly, since the tree
      population test iterated over all devices that have the given
      platform device as their parent.
      
      The check was intended to determine whether any other platform or amba
      devices created by of_platform_populate() were still populated, but it
      instead matched devices of every kind. This is wrong, since platform
      devices typically create a regular subsystem device and set themselves
      as its parent.
      
      Instead, go ahead and call the unregister functions for any devices
      created by of_platform_populate() (see the sketch below). The driver
      core will take care of unbinding drivers, and drivers are responsible
      for getting rid of any child devices that weren't created by
      of_platform_populate().
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
      Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
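      Editor's sketch of the iteration described above. The OF_POPULATED
      flag test is an assumption about how devices created by
      of_platform_populate() are recognised; the helper name and exact
      structure are reconstructed, not copied from the patch.

        #include <linux/device.h>
        #include <linux/platform_device.h>
        #include <linux/amba/bus.h>
        #include <linux/of.h>

        static int example_platform_device_destroy(struct device *dev, void *data)
        {
                /* Skip anything that was not populated from the device tree. */
                if (!dev->of_node || !of_node_check_flag(dev->of_node, OF_POPULATED))
                        return 0;

                /* Recurse into children this device may have populated. */
                device_for_each_child(dev, NULL, example_platform_device_destroy);

                if (dev->bus == &platform_bus_type)
                        platform_device_unregister(to_platform_device(dev));
        #ifdef CONFIG_ARM_AMBA
                else if (dev->bus == &amba_bustype)
                        amba_device_unregister(to_amba_device(dev));
        #endif
                return 0;
        }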
  3. 04 July 2014, 1 commit
    • ptrace,x86: force IRET path after a ptrace_stop() · b9cd18de
      Committed by Tejun Heo
      The 'sysret' fastpath does not correctly restore even all regular
      registers, much less any segment registers or rflags values.  That is
      very much part of why it's faster than 'iret'.
      
      Normally that isn't a problem, because the normal ptrace() interface
      catches the process using the signal handler infrastructure, which
      always returns with an iret.
      
      However, some paths can get caught using ptrace_event() instead of the
      signal path, and for those we need to make sure that we aren't going to
      return to user space using 'sysret'.  Otherwise the modifications that
      may have been done to the register set by the tracer wouldn't
      necessarily take effect.
      
      Fix it by forcing the IRET path: set TIF_NOTIFY_RESUME from
      arch_ptrace_stop_needed(), which is invoked from ptrace_stop() (see
      the sketch below).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
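      Editor's sketch of the x86 side of the fix: arch_ptrace_stop_needed()
      is used to set TIF_NOTIFY_RESUME, forcing the slow IRET return path,
      while still reporting that no arch_ptrace_stop() call is needed. The
      macro body is reconstructed from the description above.

        /* arch/x86/include/asm/ptrace.h (sketch) */
        #define arch_ptrace_stop_needed(code, info)                     \
        ({                                                              \
                set_thread_flag(TIF_NOTIFY_RESUME); /* force IRET */    \
                false;  /* no arch_ptrace_stop() work to do */          \
        })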
  4. 03 July 2014, 1 commit
    • kernfs: kernfs_notify() must be useable from non-sleepable contexts · ecca47ce
      Committed by Tejun Heo
      d911d987 ("kernfs: make kernfs_notify() trigger inotify events
      too") added fsnotify triggering to kernfs_notify() which requires a
      sleepable context.  There are already existing users of
      kernfs_notify() which invoke it from an atomic context and in general
      it's silly to require a sleepable context for triggering a
      notification.
      
      The following is an invalid-context bug triggered by md invoking
      sysfs_notify() from the IO completion path.
      
       BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       2 locks held by swapper/1/0:
        #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
        #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
       irq event stamp: 33518
       hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
       hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
       softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
       softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
        0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
        0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
       Call Trace:
        <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
        [<ffffffff810d4f14>] __might_sleep+0x184/0x240
        [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
        [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
        [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
        [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
        [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
        [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
        [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
        [<ffffffff813b46c4>] blk_update_request+0x94/0x420
        [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
        [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
        [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
        [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
        [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
        [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
        [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
        [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
        [<ffffffff81119436>] handle_edge_irq+0x66/0x130
        [<ffffffff8101c3e4>] handle_irq+0x84/0x150
        [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
        [<ffffffff818122f2>] common_interrupt+0x72/0x72
        <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
        [<ffffffff81025454>] default_idle+0x24/0x230
        [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
        [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
        [<ffffffff8104df1b>] start_secondary+0x25b/0x300
      
      This patch fixes it by punting the notification delivery to a work
      item (see the sketch below).  This ends up adding an extra pointer to
      kernfs_elem_attr, enlarging kernfs_node by a pointer, which is not
      ideal but not a very big deal either.  If this turns out to be an
      actual issue, we can move kernfs_elem_attr->size to kernfs_node->iattr
      later.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
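      Editor's sketch of the "punt to a work item" pattern applied here:
      the atomic-safe entry point only queues the object and schedules a
      work item, and the sleepable fsnotify/mutex work runs from the
      worker. The structures and names are illustrative; the real patch
      threads a pointer through kernfs_elem_attr instead and also guards
      against queuing the same node twice.

        #include <linux/workqueue.h>
        #include <linux/llist.h>
        #include <linux/printk.h>

        /* Hypothetical per-object state standing in for kernfs_elem_attr. */
        struct example_node {
                struct llist_node notify_entry;
                const char *name;
        };

        static LLIST_HEAD(example_notify_list);

        static void example_notify_workfn(struct work_struct *work)
        {
                struct llist_node *list = llist_del_all(&example_notify_list);
                struct example_node *n, *tmp;

                /* Process context: mutexes and fsnotify calls are fine here. */
                llist_for_each_entry_safe(n, tmp, list, notify_entry)
                        pr_info("notify %s\n", n->name);
        }
        static DECLARE_WORK(example_notify_work, example_notify_workfn);

        /* Atomic-safe entry point: queue the node and kick the worker. */
        void example_notify(struct example_node *n)
        {
                if (llist_add(&n->notify_entry, &example_notify_list))
                        schedule_work(&example_notify_work);
        }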
  5. 01 July 2014, 1 commit
  6. 28 June 2014, 1 commit
  7. 27 June 2014, 1 commit
    • Fix 32-bit regression in block device read(2) · 0b86dbf6
      Committed by Al Viro
      blkdev_read_iter() wants to cap the iov_iter by the amount of data
      remaining to the end of device.  That's what iov_iter_truncate() is for
      (trim iter->count if it's above the given limit).  So far, so good, but
      the argument of iov_iter_truncate() is size_t, so on 32bit boxen (in
      case of a large device) we end up with that upper limit truncated down
      to 32 bits *before* comparing it with iter->count.
      
      Easily fixed by making iov_iter_truncate() take a 64bit argument - it
      does the right thing after such a change (the assignment in there is
      only reached when the current value of iter->count is greater than the
      limit, so any limit big enough to have been mangled by the size_t
      conversion can never trigger it), and that argument is not the new
      value of iter->count - it's an upper limit for it.
      
      The overhead of passing u64 is not an issue - the thing is inlined, so
      callers passing size_t won't pay any penalty.
      Reported-and-tested-by: Theodore Tso <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Tested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Tested-by: Bruno Wolff III <bruno@wolff.to>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
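      Editor's sketch of the resulting helper, mirroring the description
      above (the in-tree version lives in include/linux/uio.h and may
      differ in detail):

        static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
        {
                /*
                 * count is an upper limit, not the new value of i->count.
                 * On a 32-bit box i->count is a 32-bit size_t, so a limit
                 * that doesn't fit in 32 bits can never be smaller than it
                 * and the assignment is simply skipped there.
                 */
                if (i->count > count)
                        i->count = count;
        }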
  8. 25 June 2014, 2 commits
  9. 24 June 2014, 3 commits
    • kernel/watchdog.c: print traces for all cpus on lockup detection · ed235875
      Committed by Aaron Tomlin
      A 'softlockup' is defined as a bug that causes the kernel to loop in
      kernel mode for more than a predefined period of time, without giving
      other tasks a chance to run.
      
      Currently, upon detection of this condition by the per-cpu watchdog
      task, debug information (including a stack trace) is sent to the system
      log.
      
      On some occasions, we have observed that the "victim" rather than the
      actual "culprit" (i.e.  the owner/holder of the contended resource) is
      reported to the user.  Often this information has proven to be
      insufficient to assist debugging efforts.
      
      To avoid losing useful debug information, this patch makes it possible
      to improve soft lockup reporting on architectures which support NMI.
      This is accomplished by issuing an NMI to each cpu to obtain a stack
      trace (see the sketch below).
      
      If NMI is not supported we simply fall back to the old method.  A
      sysctl and a boot-time parameter are available to toggle this feature.
      
      [dzickus@redhat.com: add CONFIG_SMP in certain areas]
      [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
      [mq@suse.cz: fix warning]
      Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
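      Editor's sketch of the reporting path described above: when the soft
      lockup fires and the toggle is set, every other CPU is asked to dump
      its stack via NMI. The sysctl variable name and the surrounding code
      are assumptions; only the overall flow comes from the commit message.

        #include <linux/nmi.h>
        #include <linux/printk.h>
        #include <linux/sched.h>
        #include <linux/smp.h>

        extern unsigned int sysctl_softlockup_all_cpu_backtrace; /* assumed knob name */

        static void example_report_softlockup(struct pt_regs *regs,
                                              unsigned int duration)
        {
                pr_emerg("BUG: soft lockup - CPU#%d stuck for %us!\n",
                         smp_processor_id(), duration);
                if (regs)
                        show_regs(regs);
                else
                        dump_stack();

                /* Current was dumped above; optionally dump everyone else. */
                if (sysctl_softlockup_all_cpu_backtrace)
                        trigger_allbutself_cpu_backtrace();
        }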
    • nmi: provide the option to issue an NMI back trace to every cpu but current · f3aca3d0
      Committed by Aaron Tomlin
      Sometimes it is preferable not to use the trigger_all_cpu_backtrace()
      routine, namely when one wants to avoid capturing a back trace for
      current - for instance, if one was already captured recently.
      
      This patch provides a new routine, trigger_allbutself_cpu_backtrace(),
      which issues an NMI to every cpu but current and captures a back trace
      accordingly (see the sketch below).
      
      x86 and sparc are patched to support the new routine.
      
      [dzickus@redhat.com: add stub in #else clause]
      [dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
      [sfr@canb.auug.org.au: undo C99ism]
      Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
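      Editor's sketch of the interface shape (including the #else stub
      mentioned in the bracketed note above); reconstructed rather than
      copied, and it assumes arch_trigger_all_cpu_backtrace() grows an
      "include self" flag as part of this series.

        /* include/linux/nmi.h (sketch) */
        #ifdef arch_trigger_all_cpu_backtrace
        static inline bool trigger_allbutself_cpu_backtrace(void)
        {
                arch_trigger_all_cpu_backtrace(false); /* skip current CPU */
                return true;
        }
        #else
        static inline bool trigger_allbutself_cpu_backtrace(void)
        {
                return false;   /* no NMI backtrace support here */
        }
        #endif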
    • kexec: save PG_head_mask in VMCOREINFO · b3acc56b
      Committed by Petr Tesarik
      To allow filtering of huge pages, makedumpfile must be able to identify
      them in the dump.  This can be done by checking the appropriate page
      flag, so communicate its value to makedumpfile through the VMCOREINFO
      interface.
      
      There's only one small catch.  Depending on how many page flags are
      available on a given architecture, this bit can be called PG_head or
      PG_compound.
      
      I sent a similar patch back in 2012, but Eric Biederman did not like
      using an #ifdef.  So, this time I'm adding a common symbol
      (PG_head_mask) instead (see the sketch below).
      
      See https://lkml.org/lkml/2012/11/28/91 for the previous version.
      Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
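      Editor's sketch of the two pieces the message describes - a common
      mask definition next to the page flags and its export via VMCOREINFO.
      File placement and exact spelling are assumptions.

        /* include/linux/page-flags.h (sketch) */
        #define PG_head_mask    ((1UL << PG_head))

        /* crash_save_vmcoreinfo_init() in kernel/kexec.c (sketch) */
        VMCOREINFO_NUMBER(PG_head_mask);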
  10. 23 June 2014, 1 commit
  11. 22 June 2014, 1 commit
  12. 18 June 2014, 2 commits
  13. 17 June 2014, 1 commit
  14. 15 June 2014, 2 commits
  15. 13 June 2014, 1 commit
    • NVMe: Fix hot cpu notification dead lock · f3db22fe
      Committed by Keith Busch
      There is a potential deadlock if a cpu event occurs during nvme probe,
      since probe registers with hot cpu notification. This fixes the race
      by having the module register for notification once, outside of probe,
      rather than having each device register (see the sketch below).
      
      The actual work is done in a scheduled work queue instead of in the
      notifier since assigning IO queues has the potential to block if the
      driver creates additional queues.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
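      Editor's sketch of the shape of the fix: a single module-level
      hot-cpu notifier registered at module init that only schedules a work
      item, with the potentially blocking IO-queue reassignment done from
      the workqueue. All names are illustrative, not the driver's own.

        #include <linux/cpu.h>
        #include <linux/module.h>
        #include <linux/notifier.h>
        #include <linux/workqueue.h>

        static void example_requeue_workfn(struct work_struct *work)
        {
                /* Safe to block here, e.g. while creating extra IO queues. */
        }
        static DECLARE_WORK(example_requeue_work, example_requeue_workfn);

        static int example_cpu_notify(struct notifier_block *self,
                                      unsigned long action, void *hcpu)
        {
                switch (action) {
                case CPU_ONLINE:
                case CPU_DEAD:
                        /* Punt; don't block in the notifier itself. */
                        schedule_work(&example_requeue_work);
                        break;
                }
                return NOTIFY_OK;
        }

        static struct notifier_block example_cpu_nb = {
                .notifier_call = example_cpu_notify,
        };

        static int __init example_init(void)
        {
                /* Once per module, not once per probed device. */
                return register_hotcpu_notifier(&example_cpu_nb);
        }
        module_init(example_init);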
  16. 12 June 2014, 10 commits
  17. 11 June 2014, 8 commits