1. 15 Nov, 2012 7 commits
    • D
      cpuidle / sysfs: move structure declaration into the sysfs.c file · 349631e0
      Daniel Lezcano committed
      The structure cpuidle_state_kobj is not used anywhere except
      in the sysfs.c file. The definition of this structure is not
      needed in the cpuidle header file. This patch moves it to the
      sysfs.c file in order to encapsulate the code a bit more.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      349631e0
    • Y
      cpuidle: Get typical recent sleep interval · c96ca4fb
      Youquan Song committed
      The function detect_repeating_patterns was not very useful for
      workloads with alternating long and short pauses, for example
      virtual machines handling network requests for each other (say
      a web and database server).
      
      Instead, try to find a recent sleep interval that is somewhere
      between the median and the mode sleep time, by discarding outliers
      to the up side and recalculating the average and standard deviation
      until that is no longer required.
      
      This should do something sane with a sleep interval series like:
      
      	200 180 210 10000 30 1000 170 200
      
      The current code would simply discard such a series, while the
      new code will guess a typical sleep interval just shy of 200.
      
      The original patch comes from Rik van Riel <riel@redhat.com>.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      c96ca4fb
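The outlier-discarding idea in the "Get typical recent sleep interval" message above can be sketched in user-space C as follows. This is a minimal sketch, not the kernel code: the function name `typical_interval`, the acceptance cutoffs `STDDEV_OK_US` and `AVG_STDDEV_RATIO`, and the rule that stops discarding once only three quarters of the samples remain are all assumptions chosen for illustration, and the result for any given series depends on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cutoffs, for illustration only (not taken from the commit message). */
#define STDDEV_OK_US      20	/* accept outright if stddev is this small */
#define AVG_STDDEV_RATIO   6	/* or if the average exceeds this many stddevs */

/* Integer square root (floor), so the sketch does not need libm. */
static uint64_t isqrt_u64(uint64_t x)
{
	uint64_t r = 0, bit = (uint64_t)1 << 62;

	while (bit > x)
		bit >>= 2;
	while (bit) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}

/*
 * Hypothetical helper: returns 1 and stores the typical interval in
 * *typical if one is found, 0 if the series is too noisy.
 */
int typical_interval(const uint32_t *v, size_t n, uint32_t *typical)
{
	uint64_t thresh = UINT64_MAX;	/* samples above this are discarded */

	for (;;) {
		uint64_t sum = 0, var = 0, max = 0, avg, stddev;
		size_t kept = 0, i;

		/* Average and maximum of the samples still in play. */
		for (i = 0; i < n; i++) {
			if (v[i] > thresh)
				continue;
			sum += v[i];
			kept++;
			if (v[i] > max)
				max = v[i];
		}
		if (kept == 0)
			return 0;
		avg = sum / kept;

		/* Standard deviation of the same samples. */
		for (i = 0; i < n; i++) {
			uint64_t d;

			if (v[i] > thresh)
				continue;
			d = v[i] > avg ? v[i] - avg : avg - v[i];
			var += d * d;
		}
		stddev = isqrt_u64(var / kept);

		if (stddev <= STDDEV_OK_US ||
		    (avg > AVG_STDDEV_RATIO * stddev && kept * 4 >= n * 3)) {
			*typical = (uint32_t)avg;
			return 1;
		}

		/* Assumed stop rule: never go below 3/4 of the samples. */
		if (kept * 4 <= n * 3 || max == 0)
			return 0;

		/* Discard the largest remaining value(s) and recalculate. */
		thresh = max - 1;
	}
}
```

For example, with seven samples of 200 and one of 10000, the first pass rejects the series, 10000 is discarded as the maximum, and the second pass accepts an average of 200.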
    • Y
      cpuidle: Set residency to 0 if target Cstate not enter · d73d68dc
      Youquan Song committed
      When the cpuidle governor chooses a C-state for an idle CPU but then notices
      that tasks are waiting to run, the CPU does not actually enter the target
      C-state and goes on to run the task instead.
      
      In this situation, the governor uses the residency of the last C-state that
      was actually entered, which is clearly not reasonable.
      
      This patch fixes that by setting the target C-state residency to 0.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      d73d68dc
    • Y
      cpuidle: Quickly notice prediction failure in general case · e11538d1
      Youquan Song committed
      Predicting the future is difficult, and when the cpuidle governor's prediction
      fails, the governor may choose a shallower C-state than it should. Noticing
      the failure quickly therefore matters for power saving.
      
      This patch extends the mechanism to the general case in which the prediction
      logic yields a small predicted residency, so the governor chooses a shallow
      C-state although the expected residency is large. If the prediction fails, the
      CPU stays in the shallow C-state for a long time even though it had the chance
      to enter a deep C-state. So when the expected residency is long enough but the
      governor chooses a shallow C-state, a timer is added to detect a prediction
      failure.
      
      When the CPU is woken from the C-state before the timer fires, the timer is
      cancelled. When the timer fires, the menu governor quickly notices the
      prediction failure and re-evaluates whether a deeper C-state is possible.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e11538d1
    • Y
      cpuidle: Quickly notice prediction failure for repeat mode · 69a37bea
      Youquan Song committed
      Predicting the future is difficult, and when the cpuidle governor's prediction
      fails, the governor may choose a shallower C-state than it should. Noticing
      the failure quickly therefore matters for power saving.
      
      The cpuidle menu governor has a method to predict a repeat pattern: if the
      last 8 C-state residencies are consecutive and the same or very close, it
      predicts that the next C-state residency will be the same.
      
      A real case is the turbostat utility (tools/power/x86/turbostat) in kernel 3.3
      or earlier. On Sandybridge, turbostat reads 10 registers one by one, which
      generates 10 IPIs that wake up idle CPUs. The cpuidle menu governor then
      predicts repeat mode and expects another IPI to wake the idle CPU soon, so it
      keeps the idle CPU in the C1 state even though the CPU is completely idle.
      However, after reading the 10 registers turbostat sleeps for 5 seconds by
      default, so the idle CPU stays in C1 for a long time, until a break event
      occurs.
      On an idle Sandybridge system running "./turbostat -v", deep C-state residency
      fluctuates between 70% and 99%. With the patched kernel, deep C-state
      residency stays above 99.98%.
      
      In this patch, a timer is added when the menu governor detects repeat mode and
      chooses a shallow C-state. The timer is set to a timeout value greater than
      the predicted time, and the repeat mode prediction is considered to have
      failed if the timer fires. When repeat mode happens as expected, the CPU wakes
      up from the C-state before the timer fires and cancels the timer. When repeat
      mode does not happen, the timer expires, and the menu governor quickly notices
      that the repeat mode prediction failed and re-evaluates whether a deeper
      C-state is possible.
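
The timer behaviour described above can be modelled with a few lines of C. This is a toy user-space model under stated assumptions, not the governor code: the names `idle_split` and `idle_with_watchdog` are hypothetical, transition latencies are ignored, and the model assumes that once the timer fires the governor picks a deep C-state for the rest of the idle period.

```c
#include <stdint.h>

/* How one idle period was split between the shallow and the deep C-state. */
struct idle_split {
	uint64_t shallow_us;
	uint64_t deep_us;
};

/*
 * Hypothetical model of one idle period that starts in a shallow C-state
 * with a watchdog timer armed for timeout_us.
 *
 * actual_us:  how long the CPU really stays idle
 * timeout_us: timer value, assumed to be larger than the predicted time
 */
struct idle_split idle_with_watchdog(uint64_t actual_us, uint64_t timeout_us)
{
	struct idle_split s = { 0, 0 };

	if (actual_us <= timeout_us) {
		/* Woken before the timer fired: the timer is cancelled. */
		s.shallow_us = actual_us;
	} else {
		/*
		 * The timer fired: the prediction failed, so the governor
		 * re-evaluates and (by assumption) moves to a deep C-state.
		 */
		s.shallow_us = timeout_us;
		s.deep_us = actual_us - timeout_us;
	}
	return s;
}
```

With an assumed timeout of 60 us (the patch only requires it to exceed the predicted time), a 20 us idle period ends before the timer fires and is spent entirely in the shallow state, while a 1 second idle period spends 60 us shallow and the remaining 999940 us deep.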
      
      Below is another case that clearly shows the benefit of the patch:
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <signal.h>
      #include <sys/time.h>
      #include <time.h>
      #include <pthread.h>
      
      volatile int * shutdown;
      volatile long * count;
      int delay = 20;
      int loop = 8;
      
      void usage(void)
      {
      	fprintf(stderr,
      		"Usage: idle_predict [options]\n"
      		"  -h  Print this help\n"
      		"  -n  Number of threads\n"
      		"  -l  Loop count in shallow C-state\n"
      		"  -t  Sleep time (us) in shallow C-state\n");
      }
      
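      /* Thread body; the signature matches what pthread_create() expects. */
      void *simple_loop(void *arg) {
      	(void)arg;	/* unused */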
      	int idle_num = 1;
      	while (!(*shutdown)) {
      		*count = *count + 1;
      
      		if (idle_num % loop)
      			usleep(delay);
      		else {
      			/* sleep 1 second */
      			usleep(1000000);
      			idle_num = 0;
      		}
      		idle_num++;
      	}
      
      	return NULL;
      }
      
      static void sighand(int sig)
      {
      	*shutdown = 1;
      }
      
      int main(int argc, char *argv[])
      {
      	sigset_t sigset;
      	int signum = SIGALRM;
      	int i, c, thread_num = 8;
      	pthread_t pt[1024];
      
      	/* -h takes no argument, and getopt() returns -1 when it is done. */
      	static char optstr[] = "n:l:t:h";
      
      	while ((c = getopt(argc, argv, optstr)) != -1)
      		switch (c) {
      			case 'n':
      				thread_num = atoi(optarg);
      				break;
      			case 'l':
      				loop = atoi(optarg);
      				break;
      			case 't':
      				delay = atoi(optarg);
      				break;
      			case 'h':
      			default:
      				usage();
      				exit(1);
      		}
      
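      	/* pt[] holds at most 1024 thread handles. */
      	if (thread_num < 1 || thread_num > 1024) {
      		usage();
      		exit(1);
      	}
      
      	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);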
      	count = malloc(sizeof(long));
      	shutdown = malloc(sizeof(int));
      	*count = 0;
      	*shutdown = 0;
      
      	sigemptyset(&sigset);
      	sigaddset(&sigset, signum);
      	sigprocmask (SIG_BLOCK, &sigset, NULL);
      	signal(SIGINT, sighand);
      	signal(SIGTERM, sighand);
      
      	for(i = 0; i < thread_num ; i++)
      		pthread_create(&pt[i], NULL, simple_loop, NULL);
      
      	for (i = 0; i < thread_num; i++)
      		pthread_join(pt[i], NULL);
      
      	exit(0);
      }
      
      Get powertop V2 from git://github.com/fenrus75/powertop and build it.
      Build the test application above, then run it.
      The test platform can be Intel Sandybridge or another recent platform.
      #./idle_predict -l 10 &
      #./powertop
      
      We find that deep C-state residency fluctuates between 40% and 100%, with much
      time spent in the C1 state. This is because the menu governor wrongly predicts
      that repeat mode continues, so it chooses the shallow C1 state even though it
      has the chance to sleep for 1 second in a deep C-state.
      
      With the patched kernel, deep C-state residency stays above 99.6%.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      69a37bea
    • D
      cpuidle / sysfs: move kobj initialization in the syfs file · e45a00d6
      Daniel Lezcano committed
      Move the kobj initialization and completion into sysfs.c
      to encapsulate the code further.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e45a00d6
    • D
      cpuidle / sysfs: change function parameter · 1aef40e2
      Daniel Lezcano committed
      The function needs the cpuidle_device which is initially passed to the
      caller.
      
      The current code gets the struct device from the struct cpuidle_device
      and passes it to the cpuidle_add_sysfs function. This function then calls
      per_cpu(cpuidle_devices, cpu) to get the cpuidle_device back.
      
      This patch passes the cpuidle_device instead, which simplifies the code.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      1aef40e2
  2. 11 Nov, 2012 3 commits
    • L
      Linux 3.7-rc5 · 77b67063
      Linus Torvalds committed
      77b67063
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b251f0f3
      Linus Torvalds committed
      Pull networking fixes from David Miller:
       "Bug fixes galore, mostly in drivers as is often the case:
      
        1) USB gadget and cdc_eem drivers need adjustments to their frame size
           lengths in order to handle VLANs correctly.  From Ian Coolidge.
      
        2) TIPC and several network drivers erroneously call tasklet_disable
           before tasklet_kill, fix from Xiaotian Feng.
      
        3) r8169 driver needs to apply the WOL suspend quirk to more chipsets,
           fix from Cyril Brulebois.
      
        4) Fix multicast filters on RTL_GIGA_MAC_VER_35 r8169 chips, from
           Nathan Walp.
      
        5) FDB netlink dumps should use RTM_NEWNEIGH as the message type, not
           zero.  From John Fastabend.
      
        6) Fix smsc95xx tx checksum offload on big-endian, from Steve
           Glendinning.
      
        7) __inet_diag_dump() needs to respect and report the error value
           returned from inet_diag_lock_handler() rather than ignore it.
           Otherwise if an inet diag handler is not available for a particular
           protocol, we essentially report success instead of giving an error
           indication.  Fix from Cyrill Gorcunov.
      
        8) When the QFQ packet scheduler sees TSO/GSO packets it does not
           handle things properly, and in fact ends up corrupting its
           data structures as well as mis-scheduling packets.  Fix from Paolo
           Valente.
      
        9) Fix oopser in skb_loop_sk(), from Eric Leblond.
      
        10) CXGB4 passes partially uninitialized datastructures in to FW
            commands, fix from Vipul Pandya.
      
        11) When we send unsolicited ipv6 neighbour advertisements, we should
            send them to the link-local allnodes multicast address, as per
            RFC4861.  Fix from Hannes Frederic Sowa.
      
        12) There is some kind of bug in the usbnet's kevent deferral
            mechanism, but more immediately when it triggers an uncontrolled
            stream of kernel messages spam the log.  Rate limit the error log
            message triggered when this problem occurs, as sending thousands
            of error messages into the kernel log doesn't help matters at all,
            and in fact makes further diagnosis more difficult.
      
            From Steve Glendinning.
      
        13) Fix gianfar restore from hibernation, from Wang Dongsheng.
      
        14) The netlink message attribute sizes are wrong in the ipv6 GRE
            driver, it was using the size of ipv4 addresses instead of ipv6
            ones :-) Fix from Nicolas Dichtel."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        gre6: fix rtnl dump messages
        gianfar: ethernet vanishes after restoring from hibernation
        usbnet: ratelimit kevent may have been dropped warnings
        ipv6: send unsolicited neighbour advertisements to all-nodes
        net: usb: cdc_eem: Fix rx skb allocation for 802.1Q VLANs
        usb: gadget: g_ether: fix frame size check for 802.1Q
        cxgb4: Fix initialization of SGE_CONTROL register
        isdn: Make CONFIG_ISDN depend on CONFIG_NETDEVICES
        cxgb4: Initialize data structures before using.
        af-packet: fix oops when socket is not present
        pkt_sched: enable QFQ to support TSO/GSO
        net: inet_diag -- Return error code if protocol handler is missed
        net: bnx2x: Fix typo in bnx2x driver
        smsc95xx: fix tx checksum offload for big endian
        rtnetlink: Use nlmsg type RTM_NEWNEIGH from dflt fdb dump
        ptp: update adjfreq callback description
        r8169: allow multicast packets on sub-8168f chipset.
        r8169: Fix WoL on RTL8168d/8111d.
        drivers/net: use tasklet_kill in device remove/close process
        tipc: do not use tasklet_disable before tasklet_kill
      b251f0f3
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 2b1768f3
      Linus Torvalds committed
      Pull sparc fixes from David Miller:
       "Several build/bug fixes for sparc, including:
      
        1) Configuring a mix of static vs.  modular sparc64 crypto modules
           didn't work, remove an ill-conceived attempt to only have to build
           the device match table for these drivers once to fix the problem.
      
           Reported by Meelis Roos.
      
        2) Make the montgomery multiple/square and mpmul instructions actually
           usable in 32-bit tasks.  Essentially this involves providing 32-bit
           userspace with a way to use a 64-bit stack when it needs to.
      
        3) Our sparc64 atomic backoffs don't yield cpu strands properly on
           Niagara chips.  Use pause instruction when available to achieve
           this, otherwise use a benign instruction we know blocks the strand
           for some time.
      
        4) Wire up kcmp
      
        5) Fix the build of various drivers by removing the unnecessary
           blocking of OF_GPIO when SPARC.
      
        6) Fix unintended regression wherein of_address_to_resource stopped
           being provided.  Fix from Andreas Larsson.
      
        7) Fix NULL dereference in leon_handle_ext_irq(), also from Andreas
           Larsson."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix build with mix of modular vs. non-modular crypto drivers.
        sparc: Support atomic64_dec_if_positive properly.
        of/address: sparc: Declare of_address_to_resource() as an extern function for sparc again
        sparc32, leon: Check for existent irq_map entry in leon_handle_ext_irq
        sparc: Add sparc support for platform_get_irq()
        sparc: Allow OF_GPIO on sparc.
        qlogicpti: Fix build warning.
        sparc: Wire up sys_kcmp.
        sparc64: Improvde documentation and readability of atomic backoff code.
        sparc64: Use pause instruction when available.
        sparc64: Fix cpu strand yielding.
        sparc64: Make montmul/montsqr/mpmul usable in 32-bit threads.
      2b1768f3
  3. 10 Nov, 2012 17 commits
  4. 09 Nov, 2012 13 commits