1. 30 Jul 2014 (1 commit)
  2. 24 Jul 2014 (1 commit)
  3. 23 Jul 2014 (1 commit)
    • libata: introduce ata_host->n_tags to avoid oops on SAS controllers · 1a112d10
      Committed by Tejun Heo
      1871ee13 ("libata: support the ata host which implements a queue
      depth less than 32") directly used ata_port->scsi_host->can_queue from
      ata_qc_new() to determine the number of tags supported by the host;
      unfortunately, SAS controllers doing SATA don't initialize ->scsi_host,
      leading to the following oops.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
       IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
       PGD 0
       Oops: 0002 [#1] SMP
       Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm
       CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62
       Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
       task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000
       RIP: 0010:[<ffffffff814e0618>]  [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
       RSP: 0018:ffff88061a003ae8  EFLAGS: 00010012
       RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa
       RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298
       RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000
       R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200
       R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000
       FS:  00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0
       Stack:
        ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200
        ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68
        ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80
       Call Trace:
        [<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430
        [<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas]
        [<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300
        [<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550
        [<ffffffff81317613>] __blk_run_queue+0x33/0x40
        [<ffffffff8131781a>] queue_unplugged+0x2a/0x90
        [<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210
        [<ffffffff8131d274>] blk_finish_plug+0x14/0x50
        [<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0
        [<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50
        [<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50
        [<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0
        [<ffffffff81219897>] blkdev_read_iter+0x37/0x40
        [<ffffffff811e307e>] new_sync_read+0x7e/0xb0
        [<ffffffff811e3734>] vfs_read+0x94/0x170
        [<ffffffff811e43c6>] SyS_read+0x46/0xb0
        [<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0
        [<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b
       Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00
      
      Fix it by introducing ata_host->n_tags which is initialized to
      ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to
      scsi_host_template->can_queue in ata_host_register() for !SAS ones.
      As SAS hosts are never registered, this will give them the same
      ATA_MAX_QUEUE - 1 as before.  Note that we can't use
      scsi_host->can_queue directly for SAS hosts anyway as they can go
      higher than the libata maximum.
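
      As a hedged sketch (not the verbatim patch; signatures abbreviated),
      the fix has this shape:

       /* sketch: host->n_tags is valid even when ->scsi_host was never set */
       void ata_host_init(struct ata_host *host, struct device *dev,
                          struct ata_port_operations *ops)
       {
               host->n_tags = ATA_MAX_QUEUE - 1;       /* SAS hosts stop here */
               host->dev = dev;
               host->ops = ops;
       }

       int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
       {
               /* !SAS hosts: honor the template, capped at the libata maximum */
               host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE - 1);
               return 0;       /* remaining registration steps omitted */
       }

      ata_qc_new() then reads ap->host->n_tags rather than
      ap->scsi_host->can_queue, the dereference that oopsed above.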
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
      Reported-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
      Reported-by: Peter Hurley <peter@hurleysoftware.com>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Fixes: 1871ee13 ("libata: support the ata host which implements a queue depth less than 32")
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: stable@vger.kernel.org
  4. 18 Jul 2014 (1 commit)
    • cpufreq: make table sentinel macros unsigned to match use · 2b1987a9
      Committed by Brian W Hart
      Commit 5eeaf1f1 (cpufreq: Fix build error on some platforms that
      use cpufreq_for_each_*) moved function cpufreq_next_valid() to a public
      header.  Warnings are now generated when objects including that header
      are built with -Wsign-compare (as an out-of-tree module might be):
      
      .../include/linux/cpufreq.h: In function ‘cpufreq_next_valid’:
      .../include/linux/cpufreq.h:519:27: warning: comparison between signed
      and unsigned integer expressions [-Wsign-compare]
        while ((*pos)->frequency != CPUFREQ_TABLE_END)
                                 ^
      .../include/linux/cpufreq.h:520:25: warning: comparison between signed
      and unsigned integer expressions [-Wsign-compare]
         if ((*pos)->frequency != CPUFREQ_ENTRY_INVALID)
                               ^
      
      Constants CPUFREQ_ENTRY_INVALID and CPUFREQ_TABLE_END are signed, but
      are used with unsigned member 'frequency' of cpufreq_frequency_table.
      Update the macro definitions to be explicitly unsigned to match their
      use.
      
      This also corrects potentially wrong behavior of clk_rate_table_iter()
      if unsigned long is wider than unsigned int.
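
      A hedged sketch of the change in include/linux/cpufreq.h (the 'u'
      suffix makes the sentinels unsigned, matching the unsigned 'frequency'
      member they are compared against):

       #define CPUFREQ_ENTRY_INVALID   ~0u
       #define CPUFREQ_TABLE_END       ~1u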
      
      Fixes: 5eeaf1f1 (cpufreq: Fix build error on some platforms that use cpufreq_for_each_*)
      Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com>
      Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  5. 16 Jul 2014 (6 commits)
    • locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER · 5db6c6fe
      Committed by Davidlohr Bueso
      Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
      encapsulate the dependencies for rwsem optimistic spinning.
      No logical changes here as it continues to depend on both
      SMP and the XADD algorithm variant.
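
      A hedged sketch of the resulting Kconfig symbol (spelling per the
      commit title and the bracketed note below):

       config RWSEM_SPIN_ON_OWNER
              def_bool y
              depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW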
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Jason Low <jason.low2@hp.com>
      [ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
      Cc: aswin@hp.com
      Cc: Chris Mason <clm@fb.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rwsem: Reduce the size of struct rw_semaphore · ce069fc9
      Committed by Jason Low
      Recent optimistic spinning additions to rwsem provide significant
      performance benefits on many workloads on large machines. Their cost was
      an increase in the size of the rwsem structure of up to 128 bits.
      
      However, now that the previous patches in this series bring the overhead of
      struct optimistic_spin_queue to 32 bits, this patch reorders some fields in
      struct rw_semaphore such that we can reduce the overhead of the rwsem structure
      by 64 bits (on 64 bit systems).
      
      The extra overhead required for rwsem optimistic spinning would now be up
      to 8 additional bytes instead of up to 16 bytes. Additionally, the size of
      rwsem would now be more in line with mutexes.
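
      A hedged sketch of the reordered structure on 64-bit with spinning
      enabled (layout approximate, not the verbatim declaration):

       struct rw_semaphore {
               long                    count;
               struct list_head        wait_list;
               raw_spinlock_t          wait_lock;      /* 4 bytes...           */
       #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
               struct optimistic_spin_queue osq;       /* ...+ 4 bytes, packed */
               struct task_struct      *owner;
       #endif
       };

      Packing the 32-bit osq into what would otherwise be padding after
      wait_lock is what turns the former 16-byte overhead into 8.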
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-6-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/rwsem: Rename 'activity' to 'count' · 13b9a962
      Committed by Peter Zijlstra
      There are two definitions of struct rw_semaphore, one in linux/rwsem.h
      and one in linux/rwsem-spinlock.h.
      
      For some reason they have different names for the initial field. This
      makes it impossible to use C99 named initialization for
      __RWSEM_INITIALIZER() -- or we have to duplicate that entire thing
      along with the structure definitions.
      
      The simpler patch is renaming the rwsem-spinlock variant to match the
      regular rwsem.
      
      This allows us to switch to C99 named initialization.
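
      A hedged sketch of the initializer for the regular rwsem variant once
      both definitions agree on 'count':

       #define __RWSEM_INITIALIZER(name)                                  \
       {                                                                  \
               .count          = RWSEM_UNLOCKED_VALUE,                    \
               .wait_list      = LIST_HEAD_INIT((name).wait_list),        \
               .wait_lock      = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock) \
       }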
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/spinlocks/mcs: Introduce and use init macro and function for osq locks · 4d9d951e
      Committed by Jason Low
      Currently, we initialize the osq lock by directly setting the lock's
      values. It would be preferable to use an init macro to do the
      initialization, as we do with other locks.
      
      This patch introduces and uses a macro and function for initializing the osq lock.
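
      A hedged sketch of the pair (a macro for static initialization, a
      function for runtime initialization):

       #define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }

       static inline void osq_lock_init(struct optimistic_spin_queue *lock)
       {
               atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
       }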
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead · 90631822
      Committed by Jason Low
      The cancellable MCS spinlock is currently used to queue threads that are
      doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
      the lock would access and queue the local node corresponding to the CPU that
      it's running on. Currently, the cancellable MCS lock is implemented by using
      pointers to these nodes.
      
      In this patch, instead of operating on pointers to the per-cpu nodes, we
      store the CPU numbers that the per-cpu nodes correspond to in an atomic_t.
      A similar concept is used with the qspinlock.
      
      By operating on the CPU # of the nodes using atomic_t instead of pointers
      to those nodes, this can reduce the overhead of the cancellable MCS spinlock
      by 32 bits (on 64 bit systems).
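
      A hedged sketch of the encoding (0 is reserved for "unlocked", so the
      stored value is the CPU number plus one):

       struct optimistic_spin_queue {
               /* Stores an encoded value of the CPU # of the tail node. */
               atomic_t tail;
       };

       static inline int encode_cpu(int cpu_nr)
       {
               return cpu_nr + 1;
       }

       static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
       {
               return per_cpu_ptr(&osq_node, encoded_cpu_val - 1);
       }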
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node() · 046a619d
      Committed by Jason Low
      Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
      named "optimistic_spin_queue". However, in a follow up patch in the series
      we will be introducing a new structure that serves as the new "handle" for
      the lock. It would make more sense if that structure is named
      "optimistic_spin_queue". Additionally, since the current use of the
      "optimistic_spin_queue" structure are  "nodes", it might be better if we
      rename them to "node" anyway.
      
      This preparatory patch renames all current "optimistic_spin_queue"
      to "optimistic_spin_node".
      Signed-off-by: Jason Low <jason.low2@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Norton <scott.norton@hp.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 04 Jul 2014 (1 commit)
    • ptrace,x86: force IRET path after a ptrace_stop() · b9cd18de
      Committed by Tejun Heo
      The 'sysret' fastpath does not correctly restore even all regular
      registers, much less any segment registers or rflags values.  That is
      very much part of why it's faster than 'iret'.
      
      Normally that isn't a problem, because the normal ptrace() interface
      catches the process using the signal handler infrastructure, which
      always returns with an iret.
      
      However, some paths can get caught using ptrace_event() instead of the
      signal path, and for those we need to make sure that we aren't going to
      return to user space using 'sysret'.  Otherwise the modifications that
      may have been done to the register set by the tracer wouldn't
      necessarily take effect.
      
      Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
      arch_ptrace_stop_needed() which is invoked from ptrace_stop().
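
      A hedged sketch of the x86 hook (returning false leaves the generic
      stop logic unchanged; setting the flag is the whole point):

       #define arch_ptrace_stop_needed(code, info)                     \
       ({                                                              \
               /* force the slow, IRET return path on the way out */   \
               set_thread_flag(TIF_NOTIFY_RESUME);                     \
               false;                                                  \
       })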
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 03 Jul 2014 (2 commits)
    • net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map · 35f6f453
      Committed by Amir Vadai
      An IRQ can have only a single affinity notifier, and that slot is
      already taken by the cpu_rmap notifier, so it can't also be used to
      track changes in the IRQ affinity map. Instead, detect IRQ affinity
      changes by comparing the current CPU against the IRQ affinity map
      during the NAPI poll.
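
      A hedged sketch of the poll-side check (names approximate, not the
      verbatim mlx4_en patch):

       /* in the NAPI poll routine, once the budget is exhausted */
       cpu_curr = smp_processor_id();
       aff = irq_desc_get_irq_data(cq->irq_desc)->affinity;

       if (unlikely(!cpumask_test_cpu(cpu_curr, aff))) {
               /* Affinity changed under us: stop polling on the stale CPU
                * and re-arm the CQ so the IRQ fires on the new one. */
               napi_complete(napi);
               mlx4_en_arm_cq(priv, cq);
               return 0;
       }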
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ben Hutchings <ben@decadent.org.uk>
      Fixes: 2eacc23c ("net/mlx4_core: Enforce irq affinity changes immediatly")
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kernfs: kernfs_notify() must be useable from non-sleepable contexts · ecca47ce
      Committed by Tejun Heo
      d911d987 ("kernfs: make kernfs_notify() trigger inotify events
      too") added fsnotify triggering to kernfs_notify() which requires a
      sleepable context.  There are already existing users of
      kernfs_notify() which invoke it from an atomic context and in general
      it's silly to require a sleepable context for triggering a
      notification.
      
      The following is an invalid-context bug triggered by md invoking
      sysfs_notify() from the IO completion path.
      
       BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       2 locks held by swapper/1/0:
        #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
        #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
       irq event stamp: 33518
       hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
       hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
       softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
       softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
        0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
        0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
       Call Trace:
        <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
        [<ffffffff810d4f14>] __might_sleep+0x184/0x240
        [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
        [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
        [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
        [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
        [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
        [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
        [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
        [<ffffffff813b46c4>] blk_update_request+0x94/0x420
        [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
        [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
        [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
        [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
        [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
        [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
        [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
        [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
        [<ffffffff81119436>] handle_edge_irq+0x66/0x130
        [<ffffffff8101c3e4>] handle_irq+0x84/0x150
        [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
        [<ffffffff818122f2>] common_interrupt+0x72/0x72
        <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
        [<ffffffff81025454>] default_idle+0x24/0x230
        [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
        [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
        [<ffffffff8104df1b>] start_secondary+0x25b/0x300
      
      This patch fixes it by punting the notification delivery through a
      work item.  This ends up adding an extra pointer to kernfs_elem_attr
      enlarging kernfs_node by a pointer, which is not ideal but not a very
      big deal either.  If this turns out to be an actual issue, we can move
      kernfs_elem_attr->size to kernfs_node->iattr later.
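
      A hedged sketch of the punt (not the exact kernfs code; the real patch
      chains pending nodes through the new kernfs_elem_attr pointer):

       static void kernfs_notify_workfn(struct work_struct *work)
       {
               /* process context: safe to take mutexes and call fsnotify */
       }

       static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);

       void kernfs_notify(struct kernfs_node *kn)
       {
               /* atomic-safe half: record @kn and kick the worker */
               schedule_work(&kernfs_notify_work);
       }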
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  8. 02 Jul 2014 (3 commits)
    • net: fix circular dependency in of_mdio code · d9daa247
      Committed by Daniel Mack
      Commit 86f6cf41 (net: of_mdio: add of_mdiobus_link_phydev()) introduced a
      circular dependency between libphy and of_mdio.
      
      depmod: ERROR: <modroot>/kernel/drivers/net/phy/libphy.ko in
      dependency cycle!
      depmod: ERROR: <modroot>/kernel/drivers/of/of_mdio.ko in dependency cycle!
      
      The problem is that of_mdio.c references &mdio_bus_type and libphy now
      references of_mdiobus_link_phydev.
      
      Fix this by not exporting of_mdiobus_link_phydev() from of_mdio.ko.
      Make it a static function in mdio_bus.c instead.
      Signed-off-by: Daniel Mack <zonque@gmail.com>
      Reported-by: Jeff Mahoney <jeffm@suse.com>
      Fixes: 86f6cf41 (net: of_mdio: add of_mdiobus_link_phydev())
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sched: Fix compiler warnings · b6220ad6
      Committed by Guenter Roeck
      Commit 143e1e28 (sched: Rework sched_domain topology definition)
      introduced a number of functions with a return value of 'const int'.
      gcc doesn't know what to do with that and, if the kernel is compiled
      with W=1, complains with the following warnings whenever sched.h
      is included.
      
        include/linux/sched.h:875:25: warning: type qualifiers ignored on function return type
        include/linux/sched.h:882:25: warning: type qualifiers ignored on function return type
        include/linux/sched.h:889:25: warning: type qualifiers ignored on function return type
        include/linux/sched.h:1002:21: warning: type qualifiers ignored on function return type
      
      Commits fb2aa855 (sched, ARM: Create a dedicated scheduler topology table)
      and 607b45e9 (sched, powerpc: Create a dedicated topology table) introduce
      the same warning in the arm and powerpc code.
      
      Drop 'const' from the function declarations to fix the problem.
      
      The fix for all three patches has to be applied together to avoid
      compilation failures for the affected architectures.
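
      A minimal illustration of the warning ('const' on a by-value scalar
      return type has no effect; gcc's -Wignored-qualifiers, enabled at W=1,
      flags it):

       /* before: */ const int cpu_smt_flags(void);   /* qualifier ignored */
       /* after:  */ int cpu_smt_flags(void);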
      Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1403658329-13196-1-git-send-email-linux@roeck-us.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • core: fix typo in percpu read_mostly section · 330d2822
      Committed by Zhengyu He
      This fixes a typo that named the read_mostly section of percpu as
      readmostly. It works fine with SMP because the linker script specifies
      .data..percpu..readmostly. However, UP kernel builds don't have percpu
      sections defined, and the non-percpu version of the section is called
      .data..read_mostly, so .data..readmostly will float around and may break
      things unexpectedly.
      
      Looking at the original change that introduced .data..percpu..readmostly
      (commit c957ef2c), it looks like this
      was the original intention.
      
      Tested: Built UP kernel and confirmed the sections got merged.
      
      - Before the patch:
      $ objdump -h vmlinux.o  | grep '\.data\.\.read.*mostly'
      38 .data..read_mostly 00004418  0000000000000000  0000000000000000  00431ac0  2**6
      50 .data..readmostly 00000014  0000000000000000  0000000000000000  00444000  2**3
      
      - After the patch:
      $ objdump -h vmlinux.o  | grep '\.data\.\.read.*mostly'
      38 .data..read_mostly 00004438  0000000000000000  0000000000000000  00431ac0  2**6
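
      A hedged sketch of the rename (percpu-defs.h; the suffix must match
      the .data..read_mostly spelling so UP builds merge the sections):

       #define DEFINE_PER_CPU_READ_MOSTLY(type, name)                  \
               DEFINE_PER_CPU_SECTION(type, name, "..read_mostly")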
      Signed-off-by: Zhengyu He <hzy@google.com>
      Signed-off-by: Filipe Brandenburger <filbranden@google.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  9. 01 Jul 2014 (1 commit)
  10. 30 Jun 2014 (1 commit)
  11. 28 Jun 2014 (1 commit)
  12. 27 Jun 2014 (1 commit)
    • Fix 32-bit regression in block device read(2) · 0b86dbf6
      Committed by Al Viro
      blkdev_read_iter() wants to cap the iov_iter by the amount of data
      remaining to the end of device.  That's what iov_iter_truncate() is for
      (trim iter->count if it's above the given limit).  So far, so good, but
      the argument of iov_iter_truncate() is size_t, so on 32bit boxen (in
      case of a large device) we end up with that upper limit truncated down
      to 32 bits *before* comparing it with iter->count.
      
      Easily fixed by making iov_iter_truncate() take 64bit argument - it does
      the right thing after such change (we only reach the assignment in there
      when the current value of iter->count is greater than the limit, i.e.
      for anything that would get truncated we don't reach the assignment at
      all) and that argument is not the new value of iter->count - it's an
      upper limit for such.
      
      The overhead of passing u64 is not an issue - the thing is inlined, so
      callers passing size_t won't pay any penalty.
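
      The helper after the change, as a hedged sketch (the argument is a
      cap, not a new value, so widening it to u64 is safe):

       static inline void iov_iter_truncate(struct iov_iter *i, u64 count)
       {
               /* the assignment is only reached when actually truncating */
               if (i->count > count)
                       i->count = count;
       }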
      Reported-and-tested-by: Theodore Tso <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Tested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Tested-by: Bruno Wolff III <bruno@wolff.to>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 25 Jun 2014 (2 commits)
  14. 24 Jun 2014 (5 commits)
    • kernel/watchdog.c: print traces for all cpus on lockup detection · ed235875
      Committed by Aaron Tomlin
      A 'softlockup' is defined as a bug that causes the kernel to loop in
      kernel mode for more than a predefined period of time, without giving
      other tasks a chance to run.
      
      Currently, upon detection of this condition by the per-cpu watchdog
      task, debug information (including a stack trace) is sent to the system
      log.
      
      On some occasions, we have observed that the "victim" rather than the
      actual "culprit" (i.e.  the owner/holder of the contended resource) is
      reported to the user.  Often this information has proven to be
      insufficient to assist debugging efforts.
      
      To avoid loss of useful debug information, for architectures which
      support NMI, this patch makes it possible to improve soft lockup
      reporting.  This is accomplished by issuing an NMI to each cpu to obtain
      a stack trace.
      
      If NMI is not supported we just fall back to the old method.  A sysctl
      and a boot-time parameter are available to toggle this feature.
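
      A hedged sketch of the new reporting step in the watchdog path (sysctl
      name per the patch; the serialization against concurrent reports is
      omitted):

       if (softlockup_all_cpu_backtrace) {
               /* the current CPU's trace was already printed; NMI the rest */
               trigger_allbutself_cpu_backtrace();
       }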
      
      [dzickus@redhat.com: add CONFIG_SMP in certain areas]
      [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
      [mq@suse.cz: fix warning]
      Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nmi: provide the option to issue an NMI back trace to every cpu but current · f3aca3d0
      Committed by Aaron Tomlin
      Sometimes it is preferable not to use the trigger_all_cpu_backtrace()
      routine when one wants to avoid capturing a back trace for current,
      for instance if one was already captured recently.
      
      This patch provides a new routine, trigger_allbutself_cpu_backtrace(),
      which offers the flexibility to issue an NMI to every cpu but current
      and capture a back trace accordingly.
      
      Patch x86 and sparc to support new routine.
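
      A hedged sketch of the generic wrappers (the arch hook gains a flag
      saying whether to include the calling CPU):

       #ifdef arch_trigger_all_cpu_backtrace
       static inline bool trigger_allbutself_cpu_backtrace(void)
       {
               arch_trigger_all_cpu_backtrace(false);  /* skip current */
               return true;
       }
       #else
       static inline bool trigger_allbutself_cpu_backtrace(void)
       {
               return false;   /* stub: arch lacks NMI backtrace support */
       }
       #endif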
      
      [dzickus@redhat.com: add stub in #else clause]
      [dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
      [sfr@canb.auug.org.au: undo C99ism]
      Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: save PG_head_mask in VMCOREINFO · b3acc56b
      Committed by Petr Tesarik
      To allow filtering of huge pages, makedumpfile must be able to identify
      them in the dump.  This can be done by checking the appropriate page
      flag, so communicate its value to makedumpfile through the VMCOREINFO
      interface.
      
      There's only one small catch.  Depending on how many page flags are
      available on a given architecture, this bit can be called PG_head or
      PG_compound.
      
      I sent a similar patch back in 2012, but Eric Biederman did not like
      using an #ifdef.  So, this time I'm adding a common symbol
      (PG_head_mask) instead.
      
      See https://lkml.org/lkml/2012/11/28/91 for the previous version.
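
      A hedged sketch of the two pieces (a common mask plus its VMCOREINFO
      export):

       /* include/linux/page-flags.h */
       #define PG_head_mask    ((1UL << PG_head))

       /* in the VMCOREINFO setup */
       VMCOREINFO_NUMBER(PG_head_mask);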
      Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832
      Committed by Paul E. McKenney
      Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
      fixed a problem where a CPU looping in the kernel with but one runnable
      task would give RCU CPU stall warnings, even if the in-kernel loop
      contained cond_resched() calls.  Unfortunately, in so doing, it introduced
      performance regressions in Anton Blanchard's will-it-scale "open1" test.
      The problem appears to be not so much the increased cond_resched() path
      length as an increase in the rate at which grace periods complete, which
      increased per-update grace-period overhead.
      
      This commit takes a different approach to fixing this bug, mainly by
      moving the RCU-visible quiescent state from cond_resched() to
      rcu_note_context_switch(), and by further reducing the check to a
      simple non-zero test of a single per-CPU variable.  However, this
      approach requires that the force-quiescent-state processing send
      resched IPIs to the offending CPUs.  These will be sent only once
      the grace period has reached an age specified by the boot/sysfs
      parameter rcutree.jiffies_till_sched_qs, or once the grace period
      reaches an age halfway to the point at which RCU CPU stall warnings
      will be emitted, whichever comes first.
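
      A hedged sketch of the reduced hot-path check (per-CPU variable name
      approximate; it is set by force-quiescent-state processing as
      described above):

       /* in rcu_note_context_switch() */
       if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
               rcu_momentary_dyntick_idle();   /* report a quiescent state */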
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
        ktest build robot.  Also fixed smp_mb() comment as noted by
        Oleg Nesterov. ]
      
      Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Export debug_init_rcu_head() and debug_rcu_head_free() · 546a9d85
      Committed by Paul E. McKenney
      Currently, call_rcu() relies on implicit allocation and initialization
      for the debug-objects handling of RCU callbacks.  If you hammer the
      kernel hard enough with Sasha's modified version of trinity, you can end
      up with the sl*b allocators recursing into themselves via this implicit
      call_rcu() allocation.
      
      This commit therefore exports the debug_init_rcu_head() and
      debug_rcu_head_free() functions, which permits the allocators to
      allocate and pre-initialize the debug-objects information, so that
      there is no longer any need for call_rcu() to do that initialization,
      which in turn prevents the recursion into the memory allocators.
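
      A hedged sketch of the exported pair (thin wrappers over the
      debug-objects descriptor):

       void debug_init_rcu_head(struct rcu_head *head)
       {
               debug_object_init(head, &rcuhead_debug_descr);
       }

       void debug_rcu_head_free(struct rcu_head *head)
       {
               debug_object_free(head, &rcuhead_debug_descr);
       }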
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Looks-good-to: Christoph Lameter <cl@linux.com>
  15. 23 Jun 2014 (1 commit)
  16. 22 Jun 2014 (1 commit)
  17. 18 Jun 2014 (2 commits)
  18. 17 Jun 2014 (1 commit)
  19. 15 Jun 2014 (2 commits)
  20. 13 Jun 2014 (1 commit)
    • NVMe: Fix hot cpu notification dead lock · f3db22fe
      Committed by Keith Busch
      There is a potential deadlock if a cpu event occurs during nvme probe,
      since the driver registered for hot cpu notification. This fixes the
      race by having the module register for notification outside of probe
      rather than having each device register.
      
      The actual work is done in a scheduled work queue instead of in the
      notifier since assigning IO queues has the potential to block if the
      driver creates additional queues.
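
      A hedged sketch of the module-level notifier (names approximate; it
      only schedules work, and the queue reassignment runs later in process
      context):

       static int nvme_cpu_notify(struct notifier_block *self,
                                  unsigned long action, void *hcpu)
       {
               struct nvme_dev *dev;

               switch (action) {
               case CPU_ONLINE:
               case CPU_DEAD:
                       spin_lock(&dev_list_lock);
                       list_for_each_entry(dev, &dev_list, node)
                               schedule_work(&dev->cpu_work);
                       spin_unlock(&dev_list_lock);
                       break;
               }
               return NOTIFY_OK;
       }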
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
  21. 12 Jun 2014 (5 commits)