  1. 05 May, 2015: 1 commit
  2. 17 Apr, 2015: 1 commit
  3. 17 Dec, 2014: 1 commit
  4. 13 Nov, 2014: 1 commit
    • cpuidle: Invert CPUIDLE_FLAG_TIME_VALID logic · b82b6cca
      Committed by Daniel Lezcano
      The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
      method is not set. Otherwise for all the drivers, the time can be correctly
      measured.
      
      Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
      for all the states, invert the logic by replacing it with the flag
      CPUIDLE_FLAG_TIME_INVALID. This flag then only needs to be set in the ACPI
      idle driver, the former flag can be removed from all the drivers, and the
      governors test the new flag with inverted logic.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
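      A minimal sketch of the inverted check, using hypothetical names
      (`struct sketch_idle_state`, `SKETCH_FLAG_TIME_INVALID`,
      `sketch_residency_us()`) rather than the kernel's real definitions.
      Because `flags` is zero unless a driver sets it, only the one state
      that cannot measure time has to opt out. What a governor does with an
      unmeasurable residency is also an assumption here: the sketch simply
      falls back to the predicted value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CPUIDLE_FLAG_TIME_INVALID (not the kernel value). */
#define SKETCH_FLAG_TIME_INVALID (1u << 0)

struct sketch_idle_state {
	const char *name;
	unsigned int flags;	/* zero unless a driver explicitly opts out */
};

/* Inverted logic: time is valid unless the state says otherwise. */
bool sketch_time_valid(const struct sketch_idle_state *s)
{
	return !(s->flags & SKETCH_FLAG_TIME_INVALID);
}

/*
 * Residency the governor learns from after a wakeup. For a state that
 * cannot measure time, fall back to the prediction (assumption made for
 * this sketch only).
 */
unsigned int sketch_residency_us(const struct sketch_idle_state *s,
				 unsigned int measured_us,
				 unsigned int predicted_us)
{
	return sketch_time_valid(s) ? measured_us : predicted_us;
}
```

      A driver table such as `{ { "C1" }, { "C6" } }` needs no flag at all;
      only a state whose entry method is not FFH would carry
      `SKETCH_FLAG_TIME_INVALID`.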
  5. 27 Aug, 2014: 1 commit
  6. 07 Aug, 2014: 4 commits
  7. 28 Jul, 2014: 1 commit
  8. 19 Jun, 2014: 1 commit
  9. 01 May, 2014: 2 commits
  10. 06 Mar, 2014: 5 commits
  11. 23 Aug, 2013: 8 commits
  12. 29 Jul, 2013: 2 commits
    • Revert "cpuidle: Quickly notice prediction failure for repeat mode" · 14851912
      Committed by Rafael J. Wysocki
      Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
      repeat mode), because it has been identified as the source of a
      significant performance regression in v3.8 and later as explained by
      Jeremy Eder:
      
        We believe we've identified a particular commit to the cpuidle code
        that seems to be impacting performance of a variety of workloads.
        The simplest way to reproduce is using netperf TCP_RR test, so
        we're using that, on a pair of Sandy Bridge based servers.  We also
        have data from a large database setup where performance is also
        measurably/positively impacted, though that test data isn't easily
        share-able.
      
        Included below are test results from 3 test kernels:
      
        kernel       reverts
        -----------------------------------------------------------
        1) vanilla   upstream (no reverts)
      
        2) perfteam2 reverts e11538d1
      
        3) test      reverts 69a37bea
                             e11538d1
      
        In summary, netperf TCP_RR numbers improve by approximately 4%
        after reverting 69a37bea.  When 69a37bea is included, C0 residency
        never seems to get above 40%.  Taking that patch out gets C0 near
        100% quite often, and performance increases.
      
        The below data are histograms representing the %c0 residency @
        1-second sample rates (using turbostat), while under netperf test.
      
        - If you look at the first 4 histograms, you can see %c0 residency
          almost entirely in the 30,40% bin.
        - The last pair, which reverts 69a37bea,
          shows %c0 in the 80,90,100% bins.
      
        Below each kernel name are netperf TCP_RR trans/s numbers for the
        particular kernel that can be disclosed publicly, comparing the 3
        test kernels.  We ran a 4th test with the vanilla kernel where
        we've also set /dev/cpu_dma_latency=0 to show overall impact
        boosting single-threaded TCP_RR performance over 11% above
        baseline.
      
        3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
        TCP_RR trans/s 54323.78
      
        -----------------------------------------------------------
        3.10-rc2 vanilla RX (no reverts)
        TCP_RR trans/s 48192.47
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    59]: ***********************************************************
           40.0000 -    50.0000 [     1]: *
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    11]: ***********
           40.0000 -    50.0000 [    49]: *************************************************
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        -----------------------------------------------------------
        3.10-rc2 perfteam2 RX (reverts commit e11538d1)
        TCP_RR trans/s 49698.69
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     1]: *
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    59]: ***********************************************************
           40.0000 -    50.0000 [     0]:
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [     2]: **
           40.0000 -    50.0000 [    58]: **********************************************************
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [     0]:
      
        -----------------------------------------------------------
        3.10-rc2 test RX (reverts 69a37bea and e11538d1)
        TCP_RR trans/s 47766.95
      
        Receiver %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     1]: *
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    27]: ***************************
           40.0000 -    50.0000 [     2]: **
           50.0000 -    60.0000 [     0]:
           60.0000 -    70.0000 [     2]: **
           70.0000 -    80.0000 [     0]:
           80.0000 -    90.0000 [     0]:
           90.0000 -   100.0000 [    28]: ****************************
      
        Sender %c0
            0.0000 -    10.0000 [     1]: *
           10.0000 -    20.0000 [     0]:
           20.0000 -    30.0000 [     0]:
           30.0000 -    40.0000 [    11]: ***********
           40.0000 -    50.0000 [     0]:
           50.0000 -    60.0000 [     1]: *
           60.0000 -    70.0000 [     0]:
           70.0000 -    80.0000 [     3]: ***
           80.0000 -    90.0000 [     7]: *******
           90.0000 -   100.0000 [    38]: **************************************
      
        These results demonstrate gaining back the tendency of the CPU to
        stay in more responsive, performant C-states (and thus yield
        measurably better performance), by reverting commit 69a37bea.
      Requested-by: Jeremy Eder <jeder@redhat.com>
      Tested-by: Len Brown <len.brown@intel.com>
      Cc: 3.8+ <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
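      The "c0 lock" run above holds /dev/cpu_dma_latency at 0. That file is
      the kernel's PM QoS interface: a process opens it, writes the requested
      latency in microseconds as a binary 32-bit integer, and the request
      stays in force only while the file descriptor remains open. A minimal
      sketch follows; the helper name `request_cpu_dma_latency()` is
      hypothetical, and the path is a parameter so the helper can also be
      exercised against an ordinary file.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Write a latency request (in microseconds) as a raw 32-bit integer and
 * return the open descriptor. On /dev/cpu_dma_latency the request lasts
 * only while this descriptor stays open; close() withdraws it.
 * Returns -1 on failure.
 */
int request_cpu_dma_latency(const char *path, int32_t latency_us)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (write(fd, &latency_us, sizeof(latency_us)) !=
	    (ssize_t)sizeof(latency_us)) {
		close(fd);
		return -1;
	}
	return fd;	/* keep open for the whole benchmark run */
}
```

      A benchmark wrapper would call
      `request_cpu_dma_latency("/dev/cpu_dma_latency", 0)` (normally as
      root), run the netperf test, and then `close()` the descriptor.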
    • Revert "cpuidle: Quickly notice prediction failure in general case" · 228b3023
      Committed by Rafael J. Wysocki
      Revert commit e11538d1 (cpuidle: Quickly notice prediction failure in
      general case), since it depends on commit 69a37bea (cpuidle: Quickly
      notice prediction failure for repeat mode) that has been identified
      as the source of a significant performance regression in v3.8 and
      later.
      Requested-by: Jeremy Eder <jeder@redhat.com>
      Tested-by: Len Brown <len.brown@intel.com>
      Cc: 3.8+ <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  13. 15 Jul, 2013: 1 commit
    • cpuidle: Make it clear that governors cannot be modules · 137b944e
      Committed by Daniel Lezcano
      cpuidle governors are defined as modules in the code, but the Kconfig
      options do not allow them to be built as modules.  This is not really
      a problem, but the cpuidle init ordering is: the cpuidle init
      functions (framework and driver) and then the governors.  That leads
      to some weirdness in the cpuidle framework.
      
      Namely,  cpuidle_register_device() calls cpuidle_enable_device() which
      fails at the first attempt, because governors have not been registered
      yet.  When a governor is registered, the framework calls
      cpuidle_enable_device() again which runs __cpuidle_register_device()
      only then.  Of course, for that to work, the cpuidle_enable_device()
      return value has to be ignored by cpuidle_register_device().
      
      Instead of having this cyclic call graph and relying on the positive
      side effects of the hackish back-and-forth cpuidle_enable_device()
      calls, it is better to fix the cpuidle init ordering.
      
      To that end, replace the module init code with postcore_initcall()
      so we have:
      
       * cpuidle framework : core_initcall
       * cpuidle governors : postcore_initcall
       * cpuidle drivers   : device_initcall
      
      and remove the corresponding module exit code as it is dead anyway
      (governors can't be built as modules).
      
      [rjw: Changelog]
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
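      The effect of the new ordering can be modeled in user space. This is a
      hypothetical model, not kernel code: each init function carries a
      level (1, 2 and 6 mirror the kernel's numbering for core, postcore and
      device initcalls), levels run in ascending order, and calls within a
      level run in table order. With the governor at postcore level, device
      enabling never finds the governor missing, so no retry is needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Level numbers follow the kernel's core=1, postcore=2, device=6 order. */
enum sketch_level { SK_CORE = 1, SK_POSTCORE = 2, SK_DEVICE = 6 };
#define SK_LAST_LEVEL 7

struct sketch_initcall {
	enum sketch_level level;
	void (*fn)(void);
};

static bool framework_up, governor_registered;
static int enable_failures;	/* times device enabling found no governor */

static void sketch_framework_init(void) { framework_up = true; }
static void sketch_governor_init(void) { governor_registered = true; }

/* Registering a device enables it, which needs a governor. */
static void sketch_driver_init(void)
{
	if (!framework_up || !governor_registered)
		enable_failures++;
}

/* Run levels in ascending order, keeping table order within a level. */
void sketch_do_initcalls(const struct sketch_initcall *calls, size_t n)
{
	for (int level = 0; level <= SK_LAST_LEVEL; level++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				calls[i].fn();
}
```

      Listing the driver first in the table makes no difference: the level
      alone decides that the framework and the governor run before it.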
  14. 15 Jan, 2013: 1 commit
  15. 03 Jan, 2013: 1 commit
  16. 23 Nov, 2012: 1 commit
    • cpuidle: fix a suspicious RCU usage in menu governor · a093b93e
      Committed by Li Zhong
      I saw this suspicious RCU usage on the 11/15 linux-next tree:
      
      [   67.123404] ===============================
      [   67.123413] [ INFO: suspicious RCU usage. ]
      [   67.123423] 3.7.0-rc5-next-20121115-dirty #1 Not tainted
      [   67.123434] -------------------------------
      [   67.123444] include/trace/events/timer.h:186 suspicious rcu_dereference_check() usage!
      [   67.123458]
      [   67.123458] other info that might help us debug this:
      [   67.123458]
      [   67.123474]
      [   67.123474] RCU used illegally from idle CPU!
      [   67.123474] rcu_scheduler_active = 1, debug_locks = 0
      [   67.123493] RCU used illegally from extended quiescent state!
      [   67.123507] 1 lock held by swapper/1/0:
      [   67.123516]  #0:  (&cpu_base->lock){-.-...}, at: [<c0000000000979b0>] .__hrtimer_start_range_ns+0x28c/0x524
      [   67.123555]
      [   67.123555] stack backtrace:
      [   67.123566] Call Trace:
      [   67.123576] [c0000001e2ccb920] [c00000000001275c] .show_stack+0x78/0x184 (unreliable)
      [   67.123599] [c0000001e2ccb9d0] [c0000000000c15a0] .lockdep_rcu_suspicious+0x120/0x148
      [   67.123619] [c0000001e2ccba70] [c00000000009601c] .enqueue_hrtimer+0x1c0/0x1c8
      [   67.123639] [c0000001e2ccbb00] [c000000000097aa0] .__hrtimer_start_range_ns+0x37c/0x524
      [   67.123660] [c0000001e2ccbc20] [c0000000005c9698] .menu_select+0x508/0x5bc
      [   67.123678] [c0000001e2ccbd20] [c0000000005c740c] .cpuidle_idle_call+0xa8/0x6e4
      [   67.123699] [c0000001e2ccbdd0] [c0000000000459a0] .pSeries_idle+0x10/0x34
      [   67.123717] [c0000001e2ccbe40] [c000000000014dc8] .cpu_idle+0x130/0x280
      [   67.123738] [c0000001e2ccbee0] [c0000000006ffa8c] .start_secondary+0x378/0x384
      [   67.123758] [c0000001e2ccbf90] [c00000000000936c] .start_secondary_prolog+0x10/0x14
      
      hrtimer_start() was added in 198fd638 and ae515197. This patch wraps
      it in RCU_NONIDLE() to avoid the above report.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
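      Why the wrapper silences the report can be shown with a user-space
      model. All names are hypothetical: `model_rcu_idle` stands in for
      RCU's extended quiescent state, `model_hrtimer_start()` for the
      hrtimer tracepoint that reads RCU-protected data, and
      `MODEL_RCU_NONIDLE()` for a wrapper that makes RCU treat the CPU as
      non-idle for one statement, which is the purpose of RCU_NONIDLE().
      The real macro's internals may differ.

```c
#include <stdbool.h>

static bool model_rcu_idle = true;	/* CPU is in the idle loop */
static int model_splats;		/* "suspicious RCU usage" reports */

static void model_rcu_idle_exit(void)  { model_rcu_idle = false; }
static void model_rcu_idle_enter(void) { model_rcu_idle = true; }

/* Stand-in for the lockdep check that printed the report above. */
static void model_rcu_dereference_check(void)
{
	if (model_rcu_idle)
		model_splats++;
}

/* Stand-in for hrtimer_start(), whose tracepoint reads RCU data. */
static void model_hrtimer_start(void)
{
	model_rcu_dereference_check();
}

/* Tell RCU the CPU is busy for the duration of one statement. */
#define MODEL_RCU_NONIDLE(stmt)			\
	do {					\
		model_rcu_idle_exit();		\
		do { stmt; } while (0);		\
		model_rcu_idle_enter();		\
	} while (0)
```

      Calling `model_hrtimer_start()` directly from the idle path produces
      a report; `MODEL_RCU_NONIDLE(model_hrtimer_start())` does not, and the
      CPU is back in the idle state afterwards.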
  17. 15 Nov, 2012: 3 commits
    • cpuidle: Get typical recent sleep interval · c96ca4fb
      Committed by Youquan Song
      The function detect_repeating_patterns was not very useful for
      workloads with alternating long and short pauses, for example
      virtual machines handling network requests for each other (say
      a web and database server).
      
      Instead, try to find a recent sleep interval that is somewhere
      between the median and the mode sleep time, by discarding outliers
      to the up side and recalculating the average and standard deviation
      until that is no longer required.
      
      This should do something sane with a sleep interval series like:
      
      	200 180 210 10000 30 1000 170 200
      
      The current code would simply discard such a series, while the
      new code will guess a typical sleep interval just shy of 200.
      
      The original patch came from Rik van Riel <riel@redhat.com>.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
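      A sketch of the outlier-discarding idea described above. The function
      name and the acceptance thresholds (average above six standard
      deviations while at least 3/4 of the samples remain, or a standard
      deviation of at most 20 us) are assumptions for illustration; the
      patch's exact rules are not given in this message, so this sketch is
      not guaranteed to reproduce the estimate quoted above for the example
      series. Comparing squared quantities avoids a square root.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Estimate a typical sleep interval by repeatedly discarding the largest
 * remaining interval (upside outliers only) and recomputing the average
 * and variance. Thresholds are hypothetical. Intervals are assumed to be
 * below 2^29 us so the squared terms fit in 64 bits.
 */
bool typical_interval(const uint32_t *iv, size_t n, uint32_t *out)
{
	uint64_t thresh = UINT64_MAX;	/* intervals above this are ignored */

	for (;;) {
		uint64_t sum = 0, max = 0, var = 0, avg;
		size_t kept = 0;

		for (size_t i = 0; i < n; i++) {
			if (iv[i] > thresh)
				continue;
			sum += iv[i];
			kept++;
			if (iv[i] > max)
				max = iv[i];
		}
		if (kept == 0)
			return false;
		avg = sum / kept;

		for (size_t i = 0; i < n; i++) {
			int64_t d;

			if (iv[i] > thresh)
				continue;
			d = (int64_t)iv[i] - (int64_t)avg;
			var += (uint64_t)(d * d);
		}
		var /= kept;	/* population variance of the kept samples */

		/* avg > 6 * stddev  <=>  avg^2 > 36 * var;  stddev <= 20  <=>  var <= 400 */
		if ((avg * avg > 36 * var && kept * 4 >= n * 3) || var <= 400) {
			*out = (uint32_t)avg;
			return true;
		}
		if (kept * 4 <= n * 3)
			return false;	/* a quarter already discarded: give up */
		thresh = max - 1;	/* drop the largest remaining interval(s) */
	}
}
```

      For `{100, 100, 100, 100, 100, 100, 100, 5000}` the first pass is
      rejected, the 5000 us outlier is dropped, and the second pass returns
      100; an evenly spread series such as 0, 1000, ..., 7000 yields no
      typical interval.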
    • cpuidle: Quickly notice prediction failure in general case · e11538d1
      Committed by Youquan Song
      Predicting the future is difficult, and when the cpuidle governor's
      prediction fails, the governor may choose a shallower C-state than it
      should. Noticing such failures quickly is therefore important for
      power saving.

      This patch extends the mechanism to the general case, in which the
      prediction logic produces a small predicted residency and therefore
      chooses a shallow C-state even though the expected residency is large.
      If that prediction fails, the CPU stays in the shallow C-state for a
      long time, although it actually had the chance to enter a deep
      C-state. So when the expected residency is long enough but the
      governor chooses a shallow C-state, a timer is added to detect whether
      the prediction has failed.

      When the CPU wakes up from the C-state before the timer expires, the
      timer is cancelled. When the timer fires, the menu governor quickly
      notices the prediction failure and re-evaluates whether a deeper
      C-state is possible.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
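      The general-case condition can be sketched as a small decision
      function. Everything here is hypothetical: the structure, the use of
      "deepest state whose target residency fits the expected residency" as
      the comparison point, and the `long_us` threshold, which the message
      does not quantify.

```c
#include <stdbool.h>

struct sketch_choice {
	int chosen_idx;			/* picked from the (small) predicted residency */
	int deepest_ok_idx;		/* deepest state that fits expected_us */
	unsigned int expected_us;	/* e.g. time until the next timer event */
};

/*
 * Arm a watchdog timer only if the governor settled for a shallower state
 * than the expected residency would allow, and that residency is long
 * enough (long_us, a hypothetical tunable) for a missed deep state to
 * waste noticeable power.
 */
bool sketch_should_arm_timer(const struct sketch_choice *c,
			     unsigned int long_us)
{
	return c->chosen_idx < c->deepest_ok_idx && c->expected_us >= long_us;
}
```

      A short expected residency, or a choice that is already the deepest
      usable state, leaves nothing to detect, so no timer is armed.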
    • cpuidle: Quickly notice prediction failure for repeat mode · 69a37bea
      Committed by Youquan Song
      Predicting the future is difficult, and when the cpuidle governor's
      prediction fails, the governor may choose a shallower C-state than it
      should. Noticing such failures quickly is therefore important for
      power saving.

      The cpuidle menu governor has a method to predict a repeating pattern:
      if the last 8 consecutive C-state residencies are the same or very
      close, it predicts that the next C-state residency will be the same.

      There is a real case involving the turbostat utility
      (tools/power/x86/turbostat) in kernel 3.3 or earlier. On Sandybridge,
      turbostat reads 10 registers one by one, which generates 10 IPIs that
      wake up idle CPUs. The cpuidle menu governor therefore predicts a
      repeating pattern and expects another IPI to wake the idle CPU soon,
      so it keeps the idle CPU in C1 even though the CPU is totally idle.
      However, after reading the 10 registers, turbostat sleeps for 5
      seconds by default, so the idle CPU stays in C1 for a long time,
      until a break event occurs.

      On an idle Sandybridge system, run "./turbostat -v": deep C-state
      residency fluctuates between 70% and 99%. With the patched kernel,
      deep C-state residency stays above 99.98%.

      In this patch, a timer is added when the menu governor detects a
      repeating pattern and chooses a shallow C-state. The timer is set to a
      timeout value greater than the predicted time, and the repeat-mode
      prediction is considered to have failed if the timer fires. When the
      pattern repeats as expected, the CPU is woken up from the C-state
      before the timer fires and cancels the timer. When it does not, the
      timer expires, the menu governor quickly notices that the repeat-mode
      prediction failed, and it re-evaluates whether deeper C-states are
      possible.

      Below is another case that clearly shows the benefit of the patch:
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <signal.h>
      #include <pthread.h>

      #define MAX_THREADS 1024

      /* Set by the signal handler, polled by every worker thread. */
      static volatile sig_atomic_t stop;
      static int delay = 20;	/* short sleep, in microseconds */
      static int loop = 8;	/* cycle length: (loop - 1) short sleeps + 1 long */

      static void usage(void)
      {
      	fprintf(stderr,
      		"Usage: idle_predict [options]\n"
      		"  -h      Print this help\n"
      		"  -n N    Thread number\n"
      		"  -l N    Loop times in shallow C-state\n"
      		"  -t US   Sleep time (us) in shallow C-state\n");
      }

      /*
       * Sleep "delay" us (loop - 1) times in a row, then sleep 1 second, so
       * the idle pattern looks repeating right up to the long sleep.
       */
      static void *simple_loop(void *arg)
      {
      	int idle_num = 1;

      	(void)arg;
      	while (!stop) {
      		if (idle_num % loop)
      			usleep(delay);
      		else {
      			/* sleep 1 second (usleep() may reject >= 1000000) */
      			sleep(1);
      			idle_num = 0;
      		}
      		idle_num++;
      	}
      	return NULL;
      }

      static void sighand(int sig)
      {
      	(void)sig;
      	stop = 1;
      }

      int main(int argc, char *argv[])
      {
      	int i, c, thread_num = 8;
      	pthread_t pt[MAX_THREADS];

      	while ((c = getopt(argc, argv, "n:l:t:h")) != -1)
      		switch (c) {
      		case 'n':
      			thread_num = atoi(optarg);
      			break;
      		case 'l':
      			loop = atoi(optarg);
      			break;
      		case 't':
      			delay = atoi(optarg);
      			break;
      		case 'h':
      		default:
      			usage();
      			exit(1);
      		}

      	/* loop must be >= 1 (it is a divisor); thread_num must fit pt[] */
      	if (thread_num < 1 || thread_num > MAX_THREADS || loop < 1 || delay < 0) {
      		usage();
      		exit(1);
      	}

      	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);

      	signal(SIGINT, sighand);
      	signal(SIGTERM, sighand);

      	for (i = 0; i < thread_num; i++)
      		if (pthread_create(&pt[i], NULL, simple_loop, NULL) != 0) {
      			fprintf(stderr, "pthread_create failed\n");
      			exit(1);
      		}

      	for (i = 0; i < thread_num; i++)
      		pthread_join(pt[i], NULL);

      	return 0;
      }
      
      Get powertop V2 from git://github.com/fenrus75/powertop and build it.
      Then build the test application above (it needs -pthread) and run it.
      The test platform can be Intel Sandybridge or another recent platform.
      #./idle_predict -l 10 &
      #./powertop

      Deep C-state residency fluctuates between 40% and 100%, with much time
      spent in C1. This is because the menu governor wrongly predicts that
      the repeating pattern continues, so it chooses the shallow C1 state
      even though the CPU could sleep for 1 second in a deep C-state.

      With the patched kernel, deep C-state residency stays above 99.6%.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
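      The arm/cancel/expire cycle described in this message can be modeled
      as a tiny per-CPU state machine. The model is hypothetical:
      `timeout = 2 * predicted` is just one choice satisfying "a timeout
      value greater than the predicted time", and `SKETCH_FAILED` only flags
      that the governor should re-evaluate deeper C-states at its next
      decision.

```c
#include <stdbool.h>

enum sketch_timer { SKETCH_IDLE, SKETCH_ARMED, SKETCH_FAILED };

struct sketch_cpu {
	enum sketch_timer timer;
	unsigned int timeout_us;
};

/* Repeat mode detected and a shallow state chosen: arm the watchdog. */
void sketch_arm(struct sketch_cpu *cpu, unsigned int predicted_us)
{
	cpu->timeout_us = 2 * predicted_us;	/* hypothetical: any value > predicted */
	cpu->timer = SKETCH_ARMED;
}

/*
 * The CPU woke up after slept_us. An earlier wakeup means the pattern
 * held, so the timer is cancelled; a wakeup at the timeout means the
 * watchdog itself fired, i.e. the prediction failed.
 */
void sketch_wakeup(struct sketch_cpu *cpu, unsigned int slept_us)
{
	if (cpu->timer != SKETCH_ARMED)
		return;
	cpu->timer = slept_us < cpu->timeout_us ? SKETCH_IDLE : SKETCH_FAILED;
}

/* Next decision: report (and clear) a failure so deeper states are reconsidered. */
bool sketch_reconsider_deeper(struct sketch_cpu *cpu)
{
	bool failed = cpu->timer == SKETCH_FAILED;

	cpu->timer = SKETCH_IDLE;
	return failed;
}
```

      With idle_predict's defaults, a 20 us prediction arms a 40 us timer:
      the short sleeps cancel it, while the 1-second sleep lets it fire, so
      the governor gets a chance to pick a deep C-state for the rest of
      that second.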
  18. 04 Jul, 2012: 2 commits
    • PM / Domains: Add preliminary support for cpuidle, v2 · cbc9ef02
      Committed by Rafael J. Wysocki
      On some systems there are CPU cores located in the same power
      domains as I/O devices.  Then, power can only be removed from the
      domain if all I/O devices in it are not in use and the CPU core
      is idle.  Add preliminary support for that to the generic PM domains
      framework.
      
      First, the platform is expected to provide a cpuidle driver with one
      extra state designated for use with the generic PM domains code.
      This state should be initially disabled and its exit_latency value
      should be set to whatever time is needed to bring up the CPU core
      itself after restoring power to it, not including the domain's
      power on latency.  Its .enter() callback should point to a procedure
      that will remove power from the domain containing the CPU core at
      the end of the CPU power transition.
      
      The remaining characteristics of the extra cpuidle state, referred to
      as the "domain" cpuidle state below, (e.g. power usage, target
      residency) should be populated in accordance with the properties of
      the hardware.
      
      Next, the platform should execute genpd_attach_cpuidle() on the PM
      domain containing the CPU core.  That will cause the generic PM
      domains framework to treat that domain in a special way such that:
      
       * When all devices in the domain have been suspended and it is about
         to be turned off, the states of the devices will be saved, but
         power will not be removed from the domain.  Instead, the "domain"
         cpuidle state will be enabled so that power can be removed from
         the domain when the CPU core is idle and the state has been chosen
         as the target by the cpuidle governor.
      
       * When the first I/O device in the domain is resumed and
         __pm_genpd_poweron() is called for the first time after
         power has been removed from the domain, the "domain" cpuidle
         state will be disabled to avoid subsequent surprise power removals
         via cpuidle.
      
      The effective exit_latency value of the "domain" cpuidle state
      depends on the time needed to bring up the CPU core itself after
      restoring power to it as well as on the power on latency of the
      domain containing the CPU core.  Thus the "domain" cpuidle state's
      exit_latency has to be recomputed every time the domain's power on
      latency is updated, which may happen every time power is restored
      to the domain, if the measured power on latency is greater than
      the latency stored in the corresponding generic_pm_domain structure.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
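      The enable/disable rules and the latency bookkeeping described above
      can be sketched as follows. The structure and function names are
      hypothetical, not the generic PM domains API. The sketch assumes the
      effective exit latency is the plain sum of the core's own bring-up
      time and the stored domain power-on latency, and, for simplicity,
      keeps both in microseconds; the real structures may use different
      units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the "domain" cpuidle state; all latencies in us. */
struct sketch_domain_state {
	bool disabled;			/* disabled until the domain may go off */
	uint32_t cpu_latency_us;	/* bring-up time of the CPU core itself */
	uint32_t power_on_latency_us;	/* stored domain power-on latency */
	uint32_t exit_latency_us;	/* effective value seen by the governor */
};

static void sketch_recompute(struct sketch_domain_state *s)
{
	s->exit_latency_us = s->cpu_latency_us + s->power_on_latency_us;
}

void sketch_domain_init(struct sketch_domain_state *s,
			uint32_t cpu_us, uint32_t domain_us)
{
	s->disabled = true;		/* initially disabled */
	s->cpu_latency_us = cpu_us;
	s->power_on_latency_us = domain_us;
	sketch_recompute(s);
}

/* All devices suspended: keep power on, let cpuidle remove it instead. */
void sketch_domain_off(struct sketch_domain_state *s)
{
	s->disabled = false;
}

/*
 * First device resumed after power removal: disable the state again and,
 * if this power-up took longer than the stored latency, update it.
 */
void sketch_domain_on(struct sketch_domain_state *s, uint32_t measured_us)
{
	s->disabled = true;
	if (measured_us > s->power_on_latency_us) {
		s->power_on_latency_us = measured_us;
		sketch_recompute(s);
	}
}
```

      Starting from a 100 us core and a 500 us domain (600 us effective), a
      450 us power-up leaves the value unchanged, while an 800 us power-up
      raises it to 900 us.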
    • cpuidle: move field disable from per-driver to per-cpu · dc7fd275
      Committed by ShuoX Liu
      Andrew J. Schorr raised a question: when he changes the disable setting
      on a single CPU, it affects all the other CPUs.  Currently, the disable
      field is per-driver instead of per-CPU, so the C-state settings of a
      driver are shared by all the CPUs in the machine.

      This patch makes the `disable' field per-CPU, so it can be set
      separately for each CPU.
      Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
      Reported-by: Andrew J.Schorr <aschorr@telemetry-investments.com>
      Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
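      A minimal sketch of the data-layout change, with hypothetical names
      and sizes (`SKETCH_NR_CPUS`, `SKETCH_NR_STATES`): moving the flag from
      the driver's shared state table into a per-CPU array is what lets one
      CPU's setting leave the others untouched. The selection helper is a
      simplification that just returns the deepest enabled state.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS   4
#define SKETCH_NR_STATES 4

/* Before: one flag per state, shared by every CPU using the driver. */
static bool driver_disable[SKETCH_NR_STATES];

/* After: one flag per state per CPU. */
static bool cpu_disable[SKETCH_NR_CPUS][SKETCH_NR_STATES];

/* Deepest state not marked disabled in the given row; -1 if none. */
static int deepest_enabled(const bool *disable)
{
	for (int i = SKETCH_NR_STATES - 1; i >= 0; i--)
		if (!disable[i])
			return i;
	return -1;
}

int pick_state_per_driver(int cpu)
{
	(void)cpu;	/* every CPU reads the same shared row */
	return deepest_enabled(driver_disable);
}

int pick_state_per_cpu(int cpu)
{
	return deepest_enabled(cpu_disable[cpu]);
}
```

      Disabling state 3 "for CPU 0" in the shared table also takes it away
      from CPU 1; in the per-CPU table, CPU 1 keeps state 3.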
  19. 30 Mar, 2012: 2 commits
  20. 07 Nov, 2011: 1 commit