1. 13 Apr, 2016 3 commits
  2. 31 Mar, 2016 7 commits
    • Y
      sched/fair: Initiate a new task's util avg to a bounded value · 2b8c41da
      Yuyang Du committed
      A new task's util_avg is set to full utilization of a CPU (100% time
      running). This accelerates a new task's utilization ramp-up, useful to
      boost its execution in early time. However, it may result in
      (insanely) high utilization for a transient time period when a flood
      of tasks are spawned. Importantly, it violates the "fundamentally
      bounded" CPU utilization, and its side effect is negative if we don't
      take any measure to bound it.
      
      This patch proposes an algorithm to address this issue. It has
      two methods to approach a sensible initial util_avg:
      
      (1) An expected (or average) util_avg based on its cfs_rq's util_avg:
      
        util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * se.load.weight
      
      (2) A trajectory of how successive new tasks' util develops, which
      gives 1/2 of the left utilization budget to a new task such that
      the additional util is noticeably large (when overall util is low) or
      unnoticeably small (when overall util is high enough). In the meantime,
      the aggregate utilization is well bounded:
      
        util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n
      
      where n denotes the nth task.
      
      If util_avg is larger than util_avg_cap, then the effective util is
      clamped to the util_avg_cap.
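
The two methods above can be sketched in a few lines of Python. This is an illustrative model only, not the kernel code: the function names, the integer rounding, and the fallback to the cap when the runqueue has no utilization yet (where formula (1) would yield zero) are assumptions of this sketch.

```python
SCHED_CAPACITY_SCALE = 1024  # full utilization of one CPU


def initial_util_avg(cfs_util_avg, cfs_load_avg, weight):
    """Return a bounded initial util_avg for a new task (illustrative only)."""
    # Method (2): the new task may take at most half of the remaining budget.
    cap = (SCHED_CAPACITY_SCALE - cfs_util_avg) // 2
    if cap <= 0:
        return 0
    if cfs_util_avg == 0:
        # Assumption of this sketch: formula (1) yields 0 on an idle
        # runqueue, so fall back to the cap instead.
        return cap
    # Method (1): expected util_avg; multiply first to limit truncation.
    expected = cfs_util_avg * weight // (cfs_load_avg + 1)
    # Clamp the expected value to the cap.
    return min(expected, cap)


def cap_trajectory(n_tasks):
    """Caps seen by successive new tasks if each one consumes its full cap."""
    util, caps = 0, []
    for _ in range(n_tasks):
        cap = (SCHED_CAPACITY_SCALE - util) // 2
        caps.append(cap)
        util += cap
    return caps


if __name__ == "__main__":
    print(cap_trajectory(4))
    print(initial_util_avg(512, 1024, 1024))
```

In this model each successive cap is half of the previous one, so the sum of all caps stays below 1024 no matter how many tasks are spawned.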
      Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Yuyang Du <yuyang.du@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: morten.rasmussen@arm.com
      Cc: pjt@google.com
      Cc: steve.muckle@linaro.org
      Link: http://lkml.kernel.org/r/1459283456-21682-1-git-send-email-yuyang.du@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2b8c41da
    • Y
      sched/fair: Update comments after a variable rename · 1c3de5e1
      Yuyang Du committed
      The following commit:
      
        ed82b8a1 ("sched/core: Move the sched_to_prio[] arrays out of line")
      
      renamed prio_to_weight to sched_prio_to_weight, but the old name was not
      updated in comments.
      Signed-off-by: Yuyang Du <yuyang.du@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1459292871-22531-1-git-send-email-yuyang.du@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1c3de5e1
    • S
      sched/core: Add preempt checks in preempt_schedule() code · 47252cfb
      Steven Rostedt committed
      While testing the tracer preemptoff, I hit this strange trace:
      
         <...>-259     0...1    0us : schedule <-worker_thread
         <...>-259     0d..1    0us : rcu_note_context_switch <-__schedule
         <...>-259     0d..1    0us : rcu_sched_qs <-rcu_note_context_switch
         <...>-259     0d..1    0us : rcu_preempt_qs <-rcu_note_context_switch
         <...>-259     0d..1    0us : _raw_spin_lock <-__schedule
         <...>-259     0d..1    0us : preempt_count_add <-_raw_spin_lock
         <...>-259     0d..2    0us : do_raw_spin_lock <-_raw_spin_lock
         <...>-259     0d..2    1us : deactivate_task <-__schedule
         <...>-259     0d..2    1us : update_rq_clock.part.84 <-deactivate_task
         <...>-259     0d..2    1us : dequeue_task_fair <-deactivate_task
         <...>-259     0d..2    1us : dequeue_entity <-dequeue_task_fair
         <...>-259     0d..2    1us : update_curr <-dequeue_entity
         <...>-259     0d..2    1us : update_min_vruntime <-update_curr
         <...>-259     0d..2    1us : cpuacct_charge <-update_curr
         <...>-259     0d..2    1us : __rcu_read_lock <-cpuacct_charge
         <...>-259     0d..2    1us : __rcu_read_unlock <-cpuacct_charge
         <...>-259     0d..2    1us : clear_buddies <-dequeue_entity
         <...>-259     0d..2    1us : account_entity_dequeue <-dequeue_entity
         <...>-259     0d..2    2us : update_min_vruntime <-dequeue_entity
         <...>-259     0d..2    2us : update_cfs_shares <-dequeue_entity
         <...>-259     0d..2    2us : hrtick_update <-dequeue_task_fair
         <...>-259     0d..2    2us : wq_worker_sleeping <-__schedule
         <...>-259     0d..2    2us : kthread_data <-wq_worker_sleeping
         <...>-259     0d..2    2us : pick_next_task_fair <-__schedule
         <...>-259     0d..2    2us : check_cfs_rq_runtime <-pick_next_task_fair
         <...>-259     0d..2    2us : pick_next_entity <-pick_next_task_fair
         <...>-259     0d..2    2us : clear_buddies <-pick_next_entity
         <...>-259     0d..2    2us : pick_next_entity <-pick_next_task_fair
         <...>-259     0d..2    2us : clear_buddies <-pick_next_entity
         <...>-259     0d..2    2us : set_next_entity <-pick_next_task_fair
         <...>-259     0d..2    3us : put_prev_entity <-pick_next_task_fair
         <...>-259     0d..2    3us : check_cfs_rq_runtime <-put_prev_entity
         <...>-259     0d..2    3us : set_next_entity <-pick_next_task_fair
      gnome-sh-1031    0d..2    3us : finish_task_switch <-__schedule
      gnome-sh-1031    0d..2    3us : _raw_spin_unlock_irq <-finish_task_switch
      gnome-sh-1031    0d..2    3us : do_raw_spin_unlock <-_raw_spin_unlock_irq
      gnome-sh-1031    0...2    3us!: preempt_count_sub <-_raw_spin_unlock_irq
      gnome-sh-1031    0...1  582us : do_raw_spin_lock <-_raw_spin_lock
      gnome-sh-1031    0...1  583us : _raw_spin_unlock <-drm_gem_object_lookup
      gnome-sh-1031    0...1  583us : do_raw_spin_unlock <-_raw_spin_unlock
      gnome-sh-1031    0...1  583us : preempt_count_sub <-_raw_spin_unlock
      gnome-sh-1031    0...1  584us : _raw_spin_unlock <-drm_gem_object_lookup
      gnome-sh-1031    0...1  584us+: trace_preempt_on <-drm_gem_object_lookup
      gnome-sh-1031    0...1  603us : <stack trace>
       => preempt_count_sub
       => _raw_spin_unlock
       => drm_gem_object_lookup
       => i915_gem_madvise_ioctl
       => drm_ioctl
       => do_vfs_ioctl
       => SyS_ioctl
       => entry_SYSCALL_64_fastpath
      
      As I'm tracing preemption disabled, it seemed incorrect that the trace
      would go across a schedule and report not being in the scheduler.
      Looking into this I discovered the problem.
      
      schedule() calls preempt_disable() but the preempt_schedule() calls
      preempt_enable_notrace(). What happened above was that the gnome-shell
      task was preempted on another CPU, migrated over to the idle cpu. The
      tracer started with idle calling schedule(), which called
      preempt_disable(), but then gnome-shell finished, and it enabled
      preemption with preempt_enable_notrace(), which does not stop the trace, even
      though preemption was enabled.
      
      The purpose of the preempt_disable_notrace() in preempt_schedule()
      is to prevent function tracing from going into an infinite loop,
      because function tracing can trace the preempt_enable/disable()
      calls themselves. The problem with function tracing is:
      
        NEED_RESCHED set
        preempt_schedule()
          preempt_disable()
            preempt_count_inc()
              function trace (before incrementing preempt count)
                preempt_disable_notrace()
                preempt_enable_notrace()
                  sees NEED_RESCHED set
                     preempt_schedule() (repeat)
      
      Now by breaking out the preempt off/on tracing into their own code:
      preempt_disable_check() and preempt_enable_check(), we can add these to
      the preempt_schedule() code. As preemption would then be disabled, even
      if they were to be traced by the function tracer, the disabled
      preemption would prevent the recursion.
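
The recursion described above, and why disabling preemption first avoids it, can be modelled with a small Python simulation. This is a toy model under stated assumptions, not kernel code: the `Cpu` class, `MAX_DEPTH`, and the function bodies are hypothetical stand-ins that only track a preempt count and a NEED_RESCHED flag.

```python
MAX_DEPTH = 5  # stop the simulated recursion instead of overflowing the stack


class Cpu:
    def __init__(self):
        self.preempt_count = 0
        self.need_resched = True
        self.deepest = 0


def function_trace(cpu, depth, traced_disable):
    # The tracer brackets its own work with the notrace disable/enable pair.
    cpu.preempt_count += 1  # preempt_disable_notrace()
    cpu.preempt_count -= 1  # preempt_enable_notrace()
    # The enable side reschedules if the count dropped to zero.
    if cpu.preempt_count == 0 and cpu.need_resched:
        preempt_schedule(cpu, depth + 1, traced_disable)


def preempt_schedule(cpu, depth, traced_disable):
    cpu.deepest = max(cpu.deepest, depth)
    if depth >= MAX_DEPTH:
        return  # simulation guard only
    if traced_disable:
        # Traced preempt_disable(): the tracer runs before the count is raised.
        function_trace(cpu, depth, traced_disable)
        cpu.preempt_count += 1
    else:
        # notrace disable first, then the traceable check hook.
        cpu.preempt_count += 1
        function_trace(cpu, depth, traced_disable)
    cpu.need_resched = False  # stand-in for __schedule()
    cpu.preempt_count -= 1


def deepest_nesting(traced_disable):
    cpu = Cpu()
    preempt_schedule(cpu, 1, traced_disable)
    return cpu.deepest


if __name__ == "__main__":
    print("traced disable:", deepest_nesting(True))
    print("notrace disable + check:", deepest_nesting(False))
```

With the traced disable, the model nests until the artificial depth guard stops it; with the notrace disable followed by the check hook, the tracer sees a non-zero preempt count and never re-enters preempt_schedule().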
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160321112339.6dc78ad6@gandalf.local.home
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      47252cfb
    • T
      sched/numa: Remove unnecessary NUMA dequeue update from non-SMP kernels · bfdb198c
      Tim Chen committed
      In account_entity_enqueue(), we do not do account_numa_enqueue()
      as NUMA balancing is not needed for UP kernels.
      
      Hence, we should remove the account_numa_dequeue() call from
      account_entity_dequeue() for UP kernels.
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1454366879.21738.29.camel@schen9-desk2.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bfdb198c
    • S
      sched/fair: Reset nr_balance_failed after active balancing · d02c0711
      Srikar Dronamraju committed
      To force a task migration during active balancing, nr_balance_failed is set
      to cache_nice_tries + 1. However, nr_balance_failed is not reset. As a side
      effect, during the next regular load balance under the same sd, a cache-hot
      task might be migrated just because the nr_balance_failed count is high.
      
      Resetting nr_balance_failed after a successful active balance ensures
      that a hot task is not unreasonably migrated. This can be verified by
      looking at the number of hot task migrations reported by /proc/schedstat.
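
The effect of the missing reset can be illustrated with a small Python model. It is a sketch under assumptions: the `SchedDomain` class and both functions are hypothetical, and the rule that a cache-hot task may only move once nr_balance_failed exceeds cache_nice_tries is assumed here from the description above.

```python
from dataclasses import dataclass


@dataclass
class SchedDomain:
    cache_nice_tries: int = 1
    nr_balance_failed: int = 0


def may_migrate_hot_task(sd):
    # Assumed rule: a cache-hot task is only moved after enough failures.
    return sd.nr_balance_failed > sd.cache_nice_tries


def active_balance(sd, reset_after_success):
    # Force the migration by pushing the counter past the threshold.
    sd.nr_balance_failed = sd.cache_nice_tries + 1
    migrated = may_migrate_hot_task(sd)
    if migrated and reset_after_success:
        # The behaviour the patch describes: clear the counter on success.
        sd.nr_balance_failed = 0
    return migrated


if __name__ == "__main__":
    old, new = SchedDomain(), SchedDomain()
    active_balance(old, reset_after_success=False)
    active_balance(new, reset_after_success=True)
    # Would the next regular load balance still move a cache-hot task?
    print("without reset:", may_migrate_hot_task(old))
    print("with reset:", may_migrate_hot_task(new))
```

Without the reset, the stale counter keeps the threshold check true for the next regular balance; with the reset, hot tasks are protected again.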
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1458735884-30105-1-git-send-email-srikar@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d02c0711
    • D
      sched/cpuacct: Split usage accounting into user_usage and sys_usage · d740037f
      Dongsheng Yang committed
      Sometimes, cpuacct.usage is not detailed enough to see how much CPU
      usage a group had. We want to know how much time it used in user mode
      and how much in kernel mode.
      
      This patch introduces more files to give this information:
      
       # ls /sys/fs/cgroup/cpuacct/cpuacct.usage*
       /sys/fs/cgroup/cpuacct/cpuacct.usage
       /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu
       /sys/fs/cgroup/cpuacct/cpuacct.usage_user
       /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu_user
       /sys/fs/cgroup/cpuacct/cpuacct.usage_sys
       /sys/fs/cgroup/cpuacct/cpuacct.usage_percpu_sys
      
      ... while keeping the ABI with the existing counter.
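
A monitoring script could combine the new files roughly as follows. This is a minimal sketch: the helper names are hypothetical, and it assumes each of the two files holds a single integer counter, as cpuacct.usage does.

```python
from pathlib import Path


def read_counter(cgroup_dir, name):
    """Read one integer counter from a cpuacct file."""
    return int(Path(cgroup_dir, name).read_text().split()[0])


def user_sys_split(cgroup_dir):
    """Return the (user, sys) fractions of the group's CPU usage."""
    user = read_counter(cgroup_dir, "cpuacct.usage_user")
    sys_ = read_counter(cgroup_dir, "cpuacct.usage_sys")
    total = user + sys_
    if total == 0:
        return 0.0, 0.0
    return user / total, sys_ / total


if __name__ == "__main__":
    import sys

    # Pass the cgroup directory on the command line,
    # for example /sys/fs/cgroup/cpuacct.
    if len(sys.argv) > 1:
        print(user_sys_split(sys.argv[1]))
```

Because only the ratio is computed, the result does not depend on the unit of the counters.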
      Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
      [ Ported to newer kernels. ]
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/aa171da036b520b51c79549e9b3215d29473f19d.1458635566.git.zhaolei@cn.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d740037f
    • Z
      sched/cpuacct: Show all possible CPUs in cpuacct output · 5ca3726a
      Zhao Lei committed
      The current code shows stats of online CPUs in cpuacct.statcpus,
      shows stats of present CPUs in cpuacct.usage(_percpu), and uses
      present CPUs for setting cpuacct.usage.
      
      This causes inconsistent results when a CPU is brought online, taken
      offline, or hotplugged.
      
      We should always use possible CPUs to avoid the above problem.
      
      Here are the contents of a cpuacct.usage_percpu sysfs file,
      on a 4 CPU system with maxcpus=32:
      
      Before the patch:
       # cat cpuacct.usage_percpu
       2456565 411435 1052897 832584
      
      After the patch:
       # cat cpuacct.usage_percpu
       2456565 411435 1052897 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
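
For tools that parse this file, the change means one column per possible CPU rather than one per present CPU. The Python sketch below shows one way a reader could normalize both layouts; the helper names are hypothetical and the sample line is the "before" output quoted above.

```python
def parse_usage_percpu(line):
    """Map CPU index to usage from one line of cpuacct.usage_percpu."""
    return {cpu: int(value) for cpu, value in enumerate(line.split())}


def pad_to_possible(usage, nr_possible):
    """Return one entry per possible CPU, using 0 for CPUs with no column."""
    return [usage.get(cpu, 0) for cpu in range(nr_possible)]


if __name__ == "__main__":
    before = parse_usage_percpu("2456565 411435 1052897 832584")
    print(pad_to_possible(before, 32))
```

Indexing by CPU number instead of by column count keeps such a reader stable across CPU hotplug events.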
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/a11d56cef12d0b4807f8be3a46bf9798c3014d59.1458635566.git.zhaolei@cn.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5ca3726a
  3. 30 Mar, 2016 2 commits
  4. 29 Mar, 2016 6 commits
  5. 27 Mar, 2016 6 commits
    • L
      Linux 4.6-rc1 · f55532a0
      Linus Torvalds committed
      f55532a0
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · d5a38f6e
      Linus Torvalds committed
      Pull Ceph updates from Sage Weil:
       "There is quite a bit here, including some overdue refactoring and
        cleanup on the mon_client and osd_client code from Ilya, scattered
        writeback support for CephFS and a pile of bug fixes from Zheng, and a
        few random cleanups and fixes from others"
      
      [ I already decided not to pull this because of it having been rebased
        recently, but ended up changing my mind after all.  Next time I'll
        really hold people to it.  Oh well.   - Linus ]
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
        libceph: use KMEM_CACHE macro
        ceph: use kmem_cache_zalloc
        rbd: use KMEM_CACHE macro
        ceph: use lookup request to revalidate dentry
        ceph: kill ceph_get_dentry_parent_inode()
        ceph: fix security xattr deadlock
        ceph: don't request vxattrs from MDS
        ceph: fix mounting same fs multiple times
        ceph: remove unnecessary NULL check
        ceph: avoid updating directory inode's i_size accidentally
        ceph: fix race during filling readdir cache
        libceph: use sizeof_footer() more
        ceph: kill ceph_empty_snapc
        ceph: fix a wrong comparison
        ceph: replace CURRENT_TIME by current_fs_time()
        ceph: scattered page writeback
        libceph: add helper that duplicates last extent operation
        libceph: enable large, variable-sized OSD requests
        libceph: osdc->req_mempool should be backed by a slab pool
        libceph: make r_request msg_size calculation clearer
        ...
      d5a38f6e
    • L
      Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 698f415c
      Linus Torvalds committed
      Pull orangefs filesystem from Mike Marshall.
      
      This finally merges the long-pending orangefs filesystem, which has been
      much cleaned up with input from Al Viro over the last six months.  From
      the documentation file:
      
       "OrangeFS is an LGPL userspace scale-out parallel storage system.  It
        is ideal for large storage problems faced by HPC, BigData, Streaming
        Video, Genomics, Bioinformatics.
      
        Orangefs, originally called PVFS, was first developed in 1993 by Walt
        Ligon and Eric Blumer as a parallel file system for Parallel Virtual
        Machine (PVM) as part of a NASA grant to study the I/O patterns of
        parallel programs.
      
        Orangefs features include:
      
          - Distributes file data among multiple file servers
          - Supports simultaneous access by multiple clients
          - Stores file data and metadata on servers using local file system
            and access methods
          - Userspace implementation is easy to install and maintain
          - Direct MPI support
          - Stateless"
      
      see Documentation/filesystems/orangefs.txt for more in-depth details.
      
      * tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits)
        orangefs: fix orangefs_superblock locking
        orangefs: fix do_readv_writev() handling of error halfway through
        orangefs: have ->kill_sb() evict the VFS side of things first
        orangefs: sanitize ->llseek()
        orangefs-bufmap.h: trim unused junk
        orangefs: saner calling conventions for getting a slot
        orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
        orangefs: get rid of readdir_handle_s
        ornagefs: ensure that truncate has an up to date inode size
        orangefs: move code which sets i_link to orangefs_inode_getattr
        orangefs: remove needless wrapper around GFP_KERNEL
        orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
        orangefs: refactor inode type or link_target change detection
        orangefs: use new getattr for revalidate and remove old getattr
        orangefs: use new getattr in inode getattr and permission
        orangefs: use new orangefs_inode_getattr to get size in write and llseek
        orangefs: use new orangefs_inode_getattr to create new inodes
        orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
        orangefs: remove inode->i_lock wrapper
        orangefs: put register_chrdev immediately before register_filesystem
        ...
      698f415c
    • L
      Merge tag 'ntb-4.6' of git://github.com/jonmason/ntb · b4cec5f6
      Linus Torvalds committed
      Pull NTB bug fixes from Jon Mason:
       "NTB bug fixes for tasklet from spinning forever, link errors,
        translation window setup, NULL ptr dereference, and ntb-perf errors.
      
        Also, a modification to the driver API that makes _addr functions
        optional"
      
      * tag 'ntb-4.6' of git://github.com/jonmason/ntb:
        NTB: Remove _addr functions from ntb_hw_amd
        NTB: Make _addr functions optional in the API
        NTB: Fix incorrect clean up routine in ntb_perf
        NTB: Fix incorrect return check in ntb_perf
        ntb: fix possible NULL dereference
        ntb: add missing setup of translation window
        ntb: stop link work when we do not have memory
        ntb: stop tasklet from spinning forever during shutdown.
        ntb: perf test: fix address space confusion
      b4cec5f6
    • L
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 895a1067
      Linus Torvalds committed
      Pull more SCSI updates from James Bottomley:
       "The only new stuff which missed the first pull request is an update to
        the UFS driver.
      
        The rest is an assortment of bug fixes and minor tweaks which appeared
        recently (some are fixes for recent code and some are stuff spotted
        recently by the checkers or the new gcc-6 compiler [most of Arnd's
        stuff])"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
        scsi_common: do not clobber fixed sense information
        scsi: ufs: select CONFIG_NLS
        scsi: fc: use get/put_unaligned64 for wwn access
        fnic: move printk()s outside of the critical code section.
        qla2xxx: avoid maybe_uninitialized warning
        megaraid_sas: add missing curly braces in ioctl handler
        lpfc: fix misleading indentation
        scsi_transport_sas: add 'scsi_target_id' sysfs attribute
        scsi_dh_alua: uninitialized variable in alua_check_vpd()
        scsi: ufs-qcom: add printouts of testbus debug registers
        scsi: ufs-qcom: enable/disable the device ref clock
        scsi: ufs-qcom: set PA_Local_TX_LCC_Enable before link startup
        scsi: ufs: add device quirk delay before putting UFS rails in LPM
        scsi: ufs: fix leakage during link off state
        scsi: ufs: tune UniPro parameters to optimize hibern8 exit time
        scsi: ufs: handle non spec compliant bkops behaviour by device
        scsi: ufs: add retry for query descriptors
        scsi: ufs: add error recovery after DL NAC error
        scsi: ufs: make error handling bit faster
        scsi: ufs: disable vccq if it's not needed by UFS device
        ...
      895a1067
    • L
      f2fs/crypto: fix xts_tweak initialization · 02fc59a0
      Linus Torvalds committed
      Commit 0b81d077 ("fs crypto: move per-file encryption from f2fs
      tree to fs/crypto") moved the f2fs crypto files to fs/crypto/ and
      renamed the symbol prefixes from "f2fs_" to "fscrypt_" (and from "F2FS_"
      to just "FS" for preprocessor symbols).
      
      Because of the symbol renaming, it's a bit hard to see it as a file
      move: use
      
          git show -M30 0b81d077
      
      to lower the rename detection to just 30% similarity and make git show
      the files as renamed (the header file won't be shown as a rename even
      then - since all it contains is symbol definitions, it looks almost
      completely different).
      
      Even with the renames showing as renames, the diffs are not all that
      easy to read, since so much is just the renames.  But Eric Biggers
      noticed that it's not just all renames: the initialization of the
      xts_tweak had been broken too, using the inode number rather than the
      page offset.
      
      That's not right - it makes the xts_tweak the same for all pages of each
      inode.  It _might_ make sense to make the xts_tweak contain both the
      offset _and_ the inode number, but not just the inode number.
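
The difference between the two initializations can be shown with a short Python sketch. It is illustrative only: the 16-byte tweak size, the little-endian layout, and the function names are assumptions of this sketch rather than details taken from the commit.

```python
import struct

XTS_TWEAK_SIZE = 16  # assumed tweak size in bytes


def make_tweak(value):
    """Pack a 64-bit value into a zero-padded tweak buffer."""
    return struct.pack("<Q", value).ljust(XTS_TWEAK_SIZE, b"\0")


def tweaks_for_pages(inode_number, page_offsets, use_offset):
    """Build one tweak per page, from the page offset or the inode number."""
    return [
        make_tweak(offset if use_offset else inode_number)
        for offset in page_offsets
    ]


if __name__ == "__main__":
    pages = [0, 1, 2]
    broken = tweaks_for_pages(42, pages, use_offset=False)
    fixed = tweaks_for_pages(42, pages, use_offset=True)
    print("distinct tweaks from inode number:", len(set(broken)))
    print("distinct tweaks from page offset:", len(set(fixed)))
```

Deriving the tweak from the inode number alone gives every page of a file the same tweak, which is the problem the commit describes; deriving it from the page offset gives each page its own.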
      Reported-by: Eric Biggers <ebiggers3@gmail.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02fc59a0
  6. 26 Mar, 2016 16 commits