1. 18 7月, 2015 12 次提交
    • P
      rcu: Fix synchronize_sched_expedited() type error for "s" · 7fd0ddc5
      Paul E. McKenney 提交于
      The type of "s" has been "long" rather than the correct "unsigned long"
      for quite some time.  This commit fixes this type error.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      7fd0ddc5
    • P
      rcu: Abstract funnel locking from synchronize_sched_expedited() · b09e5f86
      Paul E. McKenney 提交于
      This commit abstracts funnel locking from synchronize_sched_expedited()
      so that it may be used by synchronize_rcu_expedited().
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b09e5f86
    • P
      rcu: Make synchronize_rcu_expedited() use sequence-counter scheme · 543c6158
      Paul E. McKenney 提交于
      Although synchronize_rcu_expedited() uses a sequence-counter scheme, it
      is based on a single increment per grace period, which means that tasks
      piggybacking off of concurrent grace periods may be forced to wait longer
      than necessary.  This commit therefore applies the new sequence-count
      functions developed for synchronize_sched_expedited() to speed things
      up a bit and to consolidate the sequence-counter implementation.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      543c6158
    • P
      rcu: Abstract sequence counting from synchronize_sched_expedited() · 28f00767
      Paul E. McKenney 提交于
      This commit creates rcu_exp_gp_seq_start() and rcu_exp_gp_seq_end() to
      bracket an expedited grace period, rcu_exp_gp_seq_snap() to snapshot the
      sequence counter, and rcu_exp_gp_seq_done() to check to see if a full
      expedited grace period has elapsed since the snapshot.  These will be
      applied to synchronize_rcu_expedited().  These are defined in terms of
      underlying rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), rcu_seq_done(),
      which will be applied to _rcu_barrier().
      
      One reason that this commit doesn't use the seqcount primitives themselves
      is that the smp_wmb() in those primitive is insufficient due to the fact
      that expedited grace periods do reads as well as writes.  In addition,
      the read-side seqcount primitives detect a potentially partial change,
      where the expedited primitives instead need a guaranteed full change.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      28f00767
    • P
      rcu: Make expedited GP CPU stoppage asynchronous · 3a6d7c64
      Peter Zijlstra 提交于
      Sequentially stopping the CPUs slows down expedited grace periods by
      at least a factor of two, based on rcutorture's grace-period-per-second
      rate.  This is a conservative measure because rcutorture uses unusually
      long RCU read-side critical sections and because rcutorture periodically
      quiesces the system in order to test RCU's ability to ramp down to and
      up from the idle state.  This commit therefore replaces the stop_one_cpu()
      with stop_one_cpu_nowait(), using an atomic-counter scheme to determine
      when all CPUs have passed through the stopped state.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3a6d7c64
    • P
      rcu: Get rid of synchronize_sched_expedited()'s polling loop · 385b73c0
      Paul E. McKenney 提交于
      This commit gets rid of synchronize_sched_expedited()'s mutex_trylock()
      polling loop in favor of a funnel-locking scheme based on the rcu_node
      tree.  The work-done check is done at each level of the tree, allowing
      high-contention situations to be resolved quickly with reasonable levels
      of mutex contention.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      385b73c0
    • P
      rcu: Rework synchronize_sched_expedited() counter handling · d6ada2cf
      Paul E. McKenney 提交于
      Now that synchronize_sched_expedited() have a mutex, it can use simpler
      work-already-done detection scheme.  This commit simplifies this scheme
      by using something similar to the sequence-locking counter scheme.
      A counter is incremented before and after each grace period, so that
      the counter is odd in the midst of the grace period and even otherwise.
      So if the counter has advanced to the second even number that is
      greater than or equal to the snapshot, the required grace period has
      already happened.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      d6ada2cf
    • P
      rcu: Switch synchronize_sched_expedited() to stop_one_cpu() · c190c3b1
      Peter Zijlstra 提交于
      The synchronize_sched_expedited() currently invokes try_stop_cpus(),
      which schedules the stopper kthreads on each online non-idle CPU,
      and waits until all those kthreads are running before letting any
      of them stop.  This is disastrous for real-time workloads, which
      get hit with a preemption that is as long as the longest scheduling
      latency on any CPU, including any non-realtime housekeeping CPUs.
      This commit therefore switches to using stop_one_cpu() on each CPU
      in turn.  This avoids inflicting the worst-case scheduling latency
      on the worst-case CPU onto all other CPUs, and also simplifies the
      code a little bit.
      
      Follow-up commits will simplify the counter-snapshotting algorithm
      and convert a number of the counters that are now protected by the
      new ->expedited_mutex to non-atomic.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      [ paulmck: Kept stop_one_cpu(), dropped disabling of "guardrails". ]
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c190c3b1
    • P
      rcu: Remove CONFIG_RCU_CPU_STALL_INFO · 75c27f11
      Paul E. McKenney 提交于
      The CONFIG_RCU_CPU_STALL_INFO has been default-y for a couple of
      releases with no complaints, so it is time to eliminate this Kconfig
      option entirely, so that the long-form RCU CPU stall warnings cannot
      be disabled.  This commit does just that.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      75c27f11
    • P
      rcu: Stop disabling CPU hotplug in synchronize_rcu_expedited() · 9b683874
      Paul E. McKenney 提交于
      The fact that tasks could be migrated from leaf to root rcu_node
      structures meant that synchronize_rcu_expedited() had to disable
      CPU hotplug.  However, tasks now stay put, so this commit removes the
      CPU-hotplug disabling from synchronize_rcu_expedited().
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      9b683874
    • P
      rcu: Reset rcu_fanout_leaf if out of bounds · 13bd6494
      Paul E. McKenney 提交于
      Currently if the rcu_fanout_leaf boot parameter is out of bounds (that
      is, less than RCU_FANOUT_LEAF or greater than the number of bits in an
      unsigned long), a warning is issued and execution continues with the
      out-of-bounds value.  This can result in all manner of failures, so this
      patch resets rcu_fanout_leaf to RCU_FANOUT_LEAF when an out-of-bounds
      condition is detected.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      13bd6494
    • A
      rcu: Shut up bogus gcc array bounds warning · 032dfc87
      Alexander Gordeev 提交于
      Because gcc does not realize a loop would not be entered ever
      (i.e. in case of rcu_num_lvls == 1):
      
        for (i = 1; i < rcu_num_lvls; i++)
      	  rsp->level[i] = rsp->level[i - 1] + levelcnt[i - 1];
      
      some compiler (pre- 5.x?) versions give a bogus warning:
      
        kernel/rcu/tree.c: In function ‘rcu_init_one.isra.55’:
        kernel/rcu/tree.c:4108:13: warning: array subscript is above array bounds [-Warray-bounds]
           rsp->level[i] = rsp->level[i - 1] + rsp->levelcnt[i - 1];
                     ^
      Fix that warning by adding an extra item to rcu_state::level[]
      array. Once the bogus warning is fixed in gcc and kernel drops
      support of older versions, the dummy item may be removed from
      the array.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Suggested-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Gordeev <agordeev@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      032dfc87
  2. 16 7月, 2015 10 次提交
  3. 06 7月, 2015 2 次提交
    • L
      Linux 4.2-rc1 · d770e558
      Linus Torvalds 提交于
      d770e558
    • L
      Merge tag 'platform-drivers-x86-v4.2-2' of... · a585d2b7
      Linus Torvalds 提交于
      Merge tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
      
      Pull late x86 platform driver updates from Darren Hart:
       "The following came in a bit later and I wanted them to bake in next a
        few more days before submitting, thus the second pull.
      
        A new intel_pmc_ipc driver, a symmetrical allocation and free fix in
        dell-laptop, a couple minor fixes, and some updated documentation in
        the dell-laptop comments.
      
        intel_pmc_ipc:
         - Add Intel Apollo Lake PMC IPC driver
      
        tc1100-wmi:
         - Delete an unnecessary check before the function call "kfree"
      
        dell-laptop:
         - Fix allocating & freeing SMI buffer page
         - Show info about WiGig and UWB in debugfs
         - Update information about wireless control"
      
      * tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
        intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver
        tc1100-wmi: Delete an unnecessary check before the function call "kfree"
        dell-laptop: Fix allocating & freeing SMI buffer page
        dell-laptop: Show info about WiGig and UWB in debugfs
        dell-laptop: Update information about wireless control
      a585d2b7
  4. 05 7月, 2015 16 次提交
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 1dc51b82
      Linus Torvalds 提交于
      Pull more vfs updates from Al Viro:
       "Assorted VFS fixes and related cleanups (IMO the most interesting in
        that part are f_path-related things and Eric's descriptor-related
        stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
        fs-cache series, DAX patches, Jan's file_remove_suid() work"
      
      [ I'd say this is much more than "fixes and related cleanups".  The
        file_table locking rule change by Eric Dumazet is a rather big and
        fundamental update even if the patch isn't huge.   - Linus ]
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
        9p: cope with bogus responses from server in p9_client_{read,write}
        p9_client_write(): avoid double p9_free_req()
        9p: forgetting to cancel request on interrupted zero-copy RPC
        dax: bdev_direct_access() may sleep
        block: Add support for DAX reads/writes to block devices
        dax: Use copy_from_iter_nocache
        dax: Add block size note to documentation
        fs/file.c: __fget() and dup2() atomicity rules
        fs/file.c: don't acquire files->file_lock in fd_install()
        fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
        vfs: avoid creation of inode number 0 in get_next_ino
        namei: make set_root_rcu() return void
        make simple_positive() public
        ufs: use dir_pages instead of ufs_dir_pages()
        pagemap.h: move dir_pages() over there
        remove the pointless include of lglock.h
        fs: cleanup slight list_entry abuse
        xfs: Correctly lock inode when removing suid and file capabilities
        fs: Call security_ops->inode_killpriv on truncate
        fs: Provide function telling whether file_remove_privs() will do anything
        ...
      1dc51b82
    • L
      bluetooth: fix list handling · 9b284cbd
      Linus Torvalds 提交于
      Commit 835a6a2f ("Bluetooth: Stop sabotaging list poisoning")
      thought that the code was sabotaging the list poisoning when NULL'ing
      out the list pointers and removed it.
      
      But what was going on was that the bluetooth code was using NULL
      pointers for the list as a way to mark it empty, and that commit just
      broke it (and replaced the test with NULL with a "list_empty()" test on
      a uninitialized list instead, breaking things even further).
      
      So fix it all up to use the regular and real list_empty() handling
      (which does not use NULL, but a pointer to itself), also making sure to
      initialize the list properly (the previous NULL case was initialized
      implicitly by the session being allocated with kzalloc())
      
      This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong
      An.
      
      [ I would normally expect to get this through the bt tree, but I'm going
        to release -rc1, so I'm just committing this directly   - Linus ]
      Reported-and-tested-by: NJörg Otte <jrg.otte@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Original-by: NTedd Ho-Jeong An <tedd.an@intel.com>
      Original-by: Marcel Holtmann <marcel@holtmann.org>:
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b284cbd
    • L
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 5c755fe1
      Linus Torvalds 提交于
      Pull SCSI target updates from Nicholas Bellinger:
       "It's been a busy development cycle for target-core in a number of
        different areas.
      
        The fabric API usage for se_node_acl allocation is now within
        target-core code, dropping the external API callers for all fabric
        drivers tree-wide.
      
        There is a new conversion to RCU hlists for se_node_acl and
        se_portal_group LUN mappings, that turns fast-past LUN lookup into a
        completely lockless code-path.  It also removes the original
        hard-coded limitation of 256 LUNs per fabric endpoint.
      
        The configfs attributes for backends can now be shared between core
        and driver code, allowing existing drivers to use common code while
        still allowing flexibility for new backend provided attributes.
      
        The highlights include:
      
         - Merge sbc_verify_dif_* into common code (sagi)
         - Remove iscsi-target support for obsolete IFMarker/OFMarker
           (Christophe Vu-Brugier)
         - Add bidi support in target/user backend (ilias + vangelis + agover)
         - Move se_node_acl allocation into target-core code (hch)
         - Add crc_t10dif_update common helper (akinobu + mkp)
         - Handle target-core odd SGL mapping for data transfer memory
           (akinobu)
         - Move transport ID handling into target-core (hch)
         - Move task tag into struct se_cmd + support 64-bit tags (bart)
         - Convert se_node_acl->device_list[] to RCU hlist (nab + hch +
           paulmck)
         - Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch +
           paulmck)
         - Simplify target backend driver registration (hch)
         - Consolidate + simplify target backend attribute implementations
           (hch + nab)
         - Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch)
         - Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab)
         - Drop unnecessary core_tpg_register TFO parameter (nab)
         - Use 64-bit LUNs tree-wide (hannes)
         - Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits)
        target: Bump core version to v5.0
        target: remove target_core_configfs.h
        target: remove unused TARGET_CORE_CONFIG_ROOT define
        target: consolidate version defines
        target: implement WRITE_SAME with UNMAP bit using ->execute_unmap
        target: simplify UNMAP handling
        target: replace se_cmd->execute_rw with a protocol_data field
        target/user: Fix inconsistent kmap_atomic/kunmap_atomic
        target: Send UA when changing LUN inventory
        target: Send UA upon LUN RESET tmr completion
        target: Send UA on ALUA target port group change
        target: Convert se_lun->lun_deve_lock to normal spinlock
        target: use 'se_dev_entry' when allocating UAs
        target: Remove 'ua_nacl' pointer from se_ua structure
        target_core_alua: Correct UA handling when switching states
        xen-scsiback: Fix compile warning for 64-bit LUN
        target: Remove TARGET_MAX_LUNS_PER_TRANSPORT
        target: use 64-bit LUNs
        target: Drop duplicate + unused se_dev_check_wce
        target: Drop unnecessary core_tpg_register TFO parameter
        ...
      5c755fe1
    • L
      Merge tag 'ntb-4.2' of git://github.com/jonmason/ntb · 6d7c8e1b
      Linus Torvalds 提交于
      Pull NTB updates from Jon Mason:
       "This includes a pretty significant reworking of the NTB core code, but
        has already produced some significant performance improvements.
      
        An abstraction layer was added to allow the hardware and clients to be
        easily added.  This required rewriting the NTB transport layer for
        this abstraction layer.  This modification will allow future "high
        performance" NTB clients.
      
        In addition to this change, a number of performance modifications were
        added.  These changes include NUMA enablement, using CPU memcpy
        instead of asyncdma, and modification of NTB layer MTU size"
      
      * tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits)
        NTB: Add split BAR output for debugfs stats
        NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe
        NTB: Print driver name and version in module init
        NTB: Increase transport MTU to 64k from 16k
        NTB: Rename Intel code names to platform names
        NTB: Default to CPU memcpy for performance
        NTB: Improve performance with write combining
        NTB: Use NUMA memory in Intel driver
        NTB: Use NUMA memory and DMA chan in transport
        NTB: Rate limit ntb_qp_link_work
        NTB: Add tool test client
        NTB: Add ping pong test client
        NTB: Add parameters for Intel SNB B2B addresses
        NTB: Reset transport QP link stats on down
        NTB: Do not advance transport RX on link down
        NTB: Differentiate transport link down messages
        NTB: Check the device ID to set errata flags
        NTB: Enable link for Intel root port mode in probe
        NTB: Read peer info from local SPAD in transport
        NTB: Split ntb_hw_intel and ntb_transport drivers
        ...
      6d7c8e1b
    • A
      9p: cope with bogus responses from server in p9_client_{read,write} · 0f1db7de
      Al Viro 提交于
      if server claims to have written/read more than we'd told it to,
      warn and cap the claimed byte count to avoid advancing more than
      we are ready to.
      0f1db7de
    • A
      p9_client_write(): avoid double p9_free_req() · 67e808fb
      Al Viro 提交于
      Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
      if response is impossible to parse and we discard the request, get the
      out of the loop right there.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      67e808fb
    • A
      9p: forgetting to cancel request on interrupted zero-copy RPC · a84b69cb
      Al Viro 提交于
      If we'd already sent a request and decide to abort it, we *must*
      issue TFLUSH properly and not just blindly reuse the tag, or
      we'll get seriously screwed when response eventually arrives
      and we confuse it for response to later request that had reused
      the same tag.
      
      Cc: stable@vger.kernel.org # v3.2 and later
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a84b69cb
    • M
      dax: bdev_direct_access() may sleep · 43c3dd08
      Matthew Wilcox 提交于
      The brd driver is the only in-tree driver that may sleep currently.
      After some discussion on linux-fsdevel, we decided that any driver
      may choose to sleep in its ->direct_access method.  To ensure that all
      callers of bdev_direct_access() are prepared for this, add a call
      to might_sleep().
      Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      43c3dd08
    • M
      block: Add support for DAX reads/writes to block devices · bbab37dd
      Matthew Wilcox 提交于
      If a block device supports the ->direct_access methods, bypass the normal
      DIO path and use DAX to go straight to memcpy() instead of allocating
      a DIO and a BIO.
      
      Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in
      do_blockdev_direct_IO().
      Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      bbab37dd
    • M
      dax: Use copy_from_iter_nocache · 872eb127
      Matthew Wilcox 提交于
      When userspace does a write, there's no need for the written data to
      pollute the CPU cache.  This matches the original XIP code.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      872eb127
    • M
      dax: Add block size note to documentation · 44f4c054
      Matthew Wilcox 提交于
      For block devices which are small enough, mkfs will default to creating
      a filesystem with block sizes smaller than page size.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      44f4c054
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 1b3618b6
      Linus Torvalds 提交于
      Pull kvm fixes from Paolo Bonzini:
       "Except for the preempt notifiers fix, these are all small bugfixes
        that could have been waited for -rc2.  Sending them now since I was
        taking care of Peter's patch anyway"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: add hyper-v crash msrs values
        KVM: x86: remove data variable from kvm_get_msr_common
        KVM: s390: virtio-ccw: don't overwrite config space values
        KVM: x86: keep track of LVT0 changes under APICv
        KVM: x86: properly restore LVT0
        KVM: x86: make vapics_in_nmi_mode atomic
        sched, preempt_notifier: separate notifier registration from static_key inc/dec
      1b3618b6
    • D
      NTB: Add split BAR output for debugfs stats · bf44fe46
      Dave Jiang 提交于
      When split BAR is enabled, the driver needs to dump out the split BAR
      registers rather than the original 64bit BAR registers.
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NJon Mason <jdmason@kudzu.us>
      bf44fe46
    • D
      NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe · fd839bf8
      Dave Jiang 提交于
      The unsafe doorbell and scratchpad access should display reason when
      WARN is called.  Otherwise we get a stack dump without any explanation.
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NJon Mason <jdmason@kudzu.us>
      fd839bf8
    • D
      NTB: Print driver name and version in module init · 7eb38781
      Dave Jiang 提交于
      Printouts driver name and version to indicate what is being loaded.
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NJon Mason <jdmason@kudzu.us>
      7eb38781
    • D
      NTB: Increase transport MTU to 64k from 16k · 9891417d
      Dave Jiang 提交于
      Benchmarking showed a significant performance increase with the MTU size
      to 64k instead of 16k.  Change the driver default to 64k.
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NJon Mason <jdmason@kudzu.us>
      9891417d