1. 24 Jun, 2020 5 commits
    • alinux: sched: Finer grain of sched latency · fa418988
      Yihao Wu committed
      to #28739709
      
      Many samples fall between 10ms and 50ms. To show a more informative
      latency distribution, split the former 10-100ms bucket into five:
      four uniform 10ms buckets covering 10-50ms, plus 50-100ms.
      
      Example:
      
        $ cat /sys/fs/cgroup/cpuacct/a/cpuacct.wait_latency
      	0-1ms: 	59726433
      	1-4ms: 	167
      	4-7ms: 	0
      	7-10ms: 	0
      	10-20ms: 	5
      	20-30ms: 	0
      	30-40ms: 	3
      	40-50ms: 	0
      	50-100ms: 	0
      	100-500ms: 	0
      	500-1000ms: 	0
      	1000-5000ms: 	0
      	5000-10000ms: 	0
      	>=10000ms: 	0
      	total(ms): 	45554
      	nr: 	59726600
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
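The bucket layout in the example output above can be sketched as a lookup from one latency sample to its bucket label. This is only an illustration in Python, not the kernel implementation: the edges are read off the example output, while the name `bucket_label` and the choice to treat each bucket as lower-inclusive and upper-exclusive are assumptions.

```python
from bisect import bisect_right

# Bucket edges in milliseconds, read off the example output.
# Assumption: each bucket is [low, high), i.e. the lower edge is inclusive.
EDGES_MS = [1, 4, 7, 10, 20, 30, 40, 50, 100, 500, 1000, 5000, 10000]


def bucket_label(latency_ms):
    """Return the histogram bucket label for one latency sample (hypothetical helper)."""
    i = bisect_right(EDGES_MS, latency_ms)  # number of edges <= latency_ms
    if i == len(EDGES_MS):
        return f">={EDGES_MS[-1]}ms"
    low = EDGES_MS[i - 1] if i > 0 else 0
    return f"{low}-{EDGES_MS[i]}ms"
```

With this layout, a 15ms sample and a 45ms sample land in different buckets ("10-20ms" and "40-50ms"), whereas the earlier single 10-100ms bucket would not distinguish them.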
    • alinux: sched: Add "nr" to sched latency histogram · 2abfd07b
      Yihao Wu committed
      to #28739709
      
      A histogram alone is sometimes not precise enough, because each sample
      is only counted into a coarse bucket. An average latency is also more
      practical for some users.
      
      This patch adds an "nr" field (the number of samples) to the 4 latency
      histogram interfaces, so that
      
      	lat(avg) = total(ms) / nr
      
      Compared to the histogram, the average latency is also simpler to use
      as an SLI.
      
      Example:
      
          $ cat /sys/fs/cgroup/cpuacct/a/cpuacct.wait_latency
            0-1ms:  4139
            1-4ms:  317
            4-7ms:  568
            7-10ms:         0
            10-100ms:       42324
            100-500ms:      9131
            500-1000ms:     95
            1000-5000ms:    134
            5000-10000ms:   0
            >=10000ms:      0
            total(ms):      4256455
            nr:      182128
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
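The lat(avg) = total(ms) / nr relation from this commit can be applied to the interface output in user space. The following Python sketch is one possible way to do that; the function names are hypothetical, and the parser assumes the "label: value" line format shown in the example.

```python
def parse_latency_histogram(text):
    """Parse 'label: value' lines into a dict, skipping lines that do not match."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.strip().rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            fields[label.strip()] = int(value)
    return fields


def average_latency_ms(fields):
    """lat(avg) = total(ms) / nr; returns 0.0 when no samples were recorded."""
    nr = fields.get("nr", 0)
    return fields["total(ms)"] / nr if nr else 0.0


# Excerpt of the example output from the commit message.
SAMPLE = """\
0-1ms:  4139
10-100ms:       42324
>=10000ms:      0
total(ms):      4256455
nr:      182128
"""

if __name__ == "__main__":
    print(f"{average_latency_ms(parse_latency_histogram(SAMPLE)):.2f} ms")
```

For the example values this gives an average of roughly 23.4 ms per sample.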
    • alinux: sched: Add cgroup's scheduling latency histograms · 6dbaddaa
      Yihao Wu committed
      to #28739709
      
      This patch adds the cpuacct.cgroup_wait_latency interface. It exports
      the histogram of the sched entity's scheduling latency. Unlike
      wait_latency, the sched entity here is a cgroup rather than a task.
      
      This is useful when tasks are not placed directly under one cgroup.
      For example:
      
      cgroup1 --- cgroupA --- task1
              --- cgroupB --- task2
      cgroup2 --- cgroupC --- task3
              --- cgroupD --- task4
      
      This is a common cgroup hierarchy used by many applications. With
      cgroup_wait_latency, we can simply read from cgroup1 to get the
      aggregated wait latency information of task1 and task2.
      
      The interface output format is identical to cpuacct.wait_latency.
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
    • alinux: sched: Add cgroup-level blocked time histograms · a055ee2c
      Yihao Wu committed
      to #28739709
      
      This patch measures the time that tasks in a cpuacct cgroup spend
      blocked. There are two types: blocked on IO, and blocked for other
      reasons such as locks. They are exported in "cpuacct.ioblock_latency"
      and "cpuacct.block_latency" respectively.
      
      The histogram shows the detailed distribution of the blocked
      durations, and total(ms) gives the fraction of time that tasks spent
      off the rq, waiting for resources:
      
      (Δioblock_latency.total(ms) + Δblock_latency.total(ms)) / Δwall_time
      
      The interface output format is identical to cpuacct.wait_latency.
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Xunlei Pang <xlpang@linux.alibaba.com>
      Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
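The blocked-time ratio from this commit message can be computed from two snapshots of the total(ms) fields taken some wall-clock interval apart. The Python sketch below is a minimal illustration; the function name and the sample numbers in the usage note are hypothetical, and it assumes the wall-clock interval is also expressed in milliseconds.

```python
def blocked_fraction(ioblock_ms_before, ioblock_ms_after,
                     block_ms_before, block_ms_after, wall_ms):
    """(delta ioblock total(ms) + delta block total(ms)) / delta wall time.

    All arguments are in milliseconds (assumption). The totals are the
    total(ms) fields of cpuacct.ioblock_latency and cpuacct.block_latency
    read at the start and end of the interval.
    """
    if wall_ms <= 0:
        raise ValueError("wall_ms must be positive")
    delta_ioblock = ioblock_ms_after - ioblock_ms_before
    delta_block = block_ms_after - block_ms_before
    return (delta_ioblock + delta_block) / wall_ms
```

For instance, if the IO total grows from 1000 to 1300 ms and the other-block total from 500 to 700 ms over a 1000 ms interval, the result is 0.5.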
    • alinux: sched: Introduce cfs scheduling latency histograms · 76d98609
      Yihao Wu committed
      to #28739709
      
      Export wait_latency in "cpuacct.wait_latency", which indicates the
      time that tasks in a cpuacct cgroup wait on a cfs_rq to be scheduled.
      
      This is similar to "perf sched" but has lower overhead, so it can be
      used for continuous monitoring.
      
      wait_latency is useful for debugging an application's high response
      time (RT). It tells whether the problem is caused by scheduling. If it
      is, loadavg can tell whether the cause is bad scheduling behaviour or
      system overload.
      
      System admins can also use wait_latency to define an SLA. There are
      various ways to decrease wait_latency so that the SLA is met.
      
      This feature is disabled by default for performance reasons. It can
      be switched on dynamically by "echo 0 > /proc/cpusli/sched_lat_enable"
      
      Example:
      
        $ cat /sys/fs/cgroup/cpuacct/a/cpuacct.wait_latency
          0-1ms:  4139
          1-4ms:  317
          4-7ms:  568
          7-10ms:         0
          10-100ms:       42324
          100-500ms:      9131
          500-1000ms:     95
          1000-5000ms:    134
          5000-10000ms:   0
          >=10000ms:      0
          total(ms):      4256455
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Xunlei Pang <xlpang@linux.alibaba.com>
      Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
  2. 23 Jun, 2020 13 commits
  3. 22 Jun, 2020 3 commits
  4. 19 Jun, 2020 1 commit
    • configs: arm64: use 48-bit virtual address · f44f084b
      Xu Yu committed
      fix #28506983
      
      Some ARM machines may have large memory capacity (e.g., more than 256G),
      or large hole(s) in memory layout among nodes.
      
      A kernel with CONFIG_ARM64_VA_BITS set to 39 has a linear region of
      256G, and any memory that the linear mapping cannot cover is removed.
      On such ARM machines this may make part of the physical memory
      unavailable, cause the system to deadlock on memory, or even cause
      boot failure.
      
      This changes CONFIG_ARM64_VA_BITS to 48, which supports a 128T linear
      mapping, in order to cover most scenarios.
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
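The 256G and 128T figures in this commit message can be reproduced with a short calculation. The Python sketch below assumes that the linear region occupies half of the kernel virtual address space, i.e. 2^(VA_BITS - 1) bytes; that assumption is not stated in the message, but it matches both numbers. The function names are hypothetical.

```python
def linear_region_bytes(va_bits):
    """Size of the linear region, assuming it is half of a 2**va_bits space."""
    return 1 << (va_bits - 1)


def format_size(n):
    """Render a byte count using binary units (K, M, G, T), rounding down."""
    for unit in ("", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n}{unit}"
        n //= 1024


if __name__ == "__main__":
    for bits in (39, 48):
        print(bits, format_size(linear_region_bytes(bits)))
```

Under that assumption, 39 bits gives 256G and 48 bits gives 128T, so a machine with more than 256G of memory, or with large holes in its physical layout, cannot be fully covered by a 39-bit configuration.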
  5. 16 Jun, 2020 18 commits