1. 22 Oct 2011, 1 commit
    • PM / Sleep: Mark devices involved in wakeup signaling during suspend · 4ca46ff3
      Committed by Rafael J. Wysocki
      The generic PM domains code in drivers/base/power/domain.c has
      to avoid powering off domains that provide power to wakeup devices
      during system suspend.  Currently, however, this only works for
      wakeup devices directly belonging to the given domain and not for
      their children (or the children of their children and so on).
      Thus, if there's a wakeup device whose parent belongs to a power
      domain handled by the generic PM domains code, the domain will be
      powered off during system suspend, preventing the device from
      signaling wakeup.
      
      To address this problem, introduce a device flag, power.wakeup_path,
      that will be set during system suspend for all wakeup devices,
      their parents, the parents of their parents and so on.  This way,
      all wakeup paths in the device hierarchy will be marked and the
      generic PM domains code will only need to avoid powering off
      domains containing devices whose power.wakeup_path is set.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      4ca46ff3
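
      A minimal sketch of the parent-chain walk this describes, assuming a
      hypothetical helper called from the suspend path (field names follow the
      commit text, not the exact kernel code):

        #include <linux/device.h>
        #include <linux/pm_wakeup.h>

        /* Mark a wakeup-capable device and all of its ancestors as lying on a
         * wakeup path, so the generic PM domains code keeps their domains
         * powered during system suspend. */
        static void mark_wakeup_path(struct device *dev)
        {
        	if (!device_may_wakeup(dev))
        		return;

        	for (; dev; dev = dev->parent)
        		dev->power.wakeup_path = true;
        }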
  2. 17 Oct 2011, 4 commits
    • PM / Hibernate: Freeze kernel threads after preallocating memory · 2aede851
      Committed by Rafael J. Wysocki
      There is a problem with the current ordering of hibernate code which
      leads to deadlocks in some filesystems' memory shrinkers.  Namely,
      some filesystems use freezable kernel threads that are inactive when
      the hibernate memory preallocation is carried out.  Those same
      filesystems use memory shrinkers that may be triggered by the
      hibernate memory preallocation.  If those memory shrinkers wait for
      the frozen kernel threads, the hibernate process deadlocks (this
      happens with XFS, for one example).
      
      Apparently, it is not technically viable to redesign the filesystems
      in question to avoid the situation described above, so the only
      possible solution of this issue is to defer the freezing of kernel
      threads until the hibernate memory preallocation is done, which is
      implemented by this change.
      
      Unfortunately, this requires the memory preallocation to be done
      before the "prepare" stage of device freeze, so after this change the
      only way drivers can allocate additional memory for their freeze
      routines in a clean way is to use PM notifiers.
      Reported-by: Christoph <cr2005@u-club.de>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      2aede851
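
      Since freeze callbacks can no longer allocate the extra memory themselves,
      the PM-notifier alternative mentioned above looks roughly like this on the
      driver side (buffer name and size are illustrative assumptions):

        #include <linux/notifier.h>
        #include <linux/slab.h>
        #include <linux/suspend.h>

        static void *freeze_buf;	/* illustrative per-driver scratch buffer */

        static int my_pm_notifier(struct notifier_block *nb,
        			  unsigned long action, void *data)
        {
        	switch (action) {
        	case PM_HIBERNATION_PREPARE:	/* runs before tasks are frozen */
        		freeze_buf = kzalloc(4096, GFP_KERNEL);
        		return freeze_buf ? NOTIFY_OK : notifier_from_errno(-ENOMEM);
        	case PM_POST_HIBERNATION:	/* back to normal operation */
        		kfree(freeze_buf);
        		freeze_buf = NULL;
        		return NOTIFY_OK;
        	}
        	return NOTIFY_DONE;
        }

        static struct notifier_block my_pm_nb = {
        	.notifier_call = my_pm_notifier,
        };
        /* register_pm_notifier(&my_pm_nb) at probe time,
         * unregister_pm_notifier(&my_pm_nb) at remove time. */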
    • PM / VT: Cleanup #if defined ugliness and fix compile error · 37cce26b
      Committed by H Hartley Sweeten
      Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to clean up
      the #if defined ugliness for the vt suspend support functions. Note that
      CONFIG_VT_CONSOLE is already dependent on CONFIG_VT.
      
      The function pm_set_vt_switch is actually dependent on CONFIG_VT and not
      CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is
      not set:
      
      drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch'
      include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here
      
      Also, remove the incorrect path from the comment in console.c.
      
      [rjw: Replaced #if defined() with #ifdef in suspend.h.]
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      37cce26b
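
      A hedged sketch of the shape include/linux/suspend.h takes after the
      cleanup, illustrating why pm_set_vt_switch() must be guarded by CONFIG_VT
      rather than CONFIG_PM_SLEEP (prototypes may differ from the actual diff):

        #ifdef CONFIG_VT_CONSOLE_SLEEP
        extern void pm_prepare_console(void);
        extern void pm_restore_console(void);
        #else
        static inline void pm_prepare_console(void) {}
        static inline void pm_restore_console(void) {}
        #endif

        #ifdef CONFIG_VT
        extern void pm_set_vt_switch(int);
        #else
        static inline void pm_set_vt_switch(int do_switch) {}
        #endif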
    • PM / Hibernate: Include storage keys in hibernation image on s390 · 85055dd8
      Committed by Martin Schwidefsky
      For s390 there is one additional byte associated with each page,
      the storage key. This byte contains the referenced and changed
      bits and needs to be included in the hibernation image.
      If the storage keys are not restored to their previous state, all
      original pages would appear to be dirty. This can cause
      inconsistencies e.g. with read-only filesystems.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      85055dd8
    • PM / Suspend: Add statistics debugfs file for suspend to RAM · 2a77c46d
      Committed by ShuoX Liu
      Record the number of S3 failures for each failure reason, along with the
      names of the last two devices that failed during the S3 process.  The
      statistics can be checked through the 'suspend_stats' entry in debugfs.
      
      The motivation of the patch:
      
      We are enabling power features on Medfield.  Compared with a PC or
      notebook, a mobile device enters and exits suspend-to-RAM (we call it S3
      on Medfield) far more frequently.  If it can't enter suspend-to-RAM in
      time, the battery may be drained quickly.
      
      We often find that a device fails to suspend, and the system then
      retries S3 over and over again.  As the display is off, testers and
      developers don't know what is happening.
      
      Some testers and developers complain that they don't know whether the
      system has tried to enter suspend-to-RAM, or which device failed to
      suspend.  They need such information for a quick check.  The patch adds
      suspend_stats under debugfs so users can check suspend-to-RAM statistics
      quickly.
      
      Without this patch, there are other ways to find out which device fails.
      One is to turn on CONFIG_PM_DEBUG, but users would get too much
      information and testers would need to recompile the kernel.
      
      In addition, dynamic debug is another good tool for dumping debug
      information, but it still doesn't match our usage scenario closely:
      1) users need to write a user-space parser to process the syslog output;
      2) our testing scenario is to leave the device alone for at least a few
         hours and then check its status.  No serial console is available
         during the test: the console would be suspended, and a serial console
         connected through SPI or HSU devices would consume power, since those
         devices are powered off in suspend-to-RAM.
      Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      2a77c46d
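
      A self-contained user-space snippet for the quick check described above,
      assuming debugfs is mounted at the usual /sys/kernel/debug:

        #include <stdio.h>

        int main(void)
        {
        	char line[256];
        	FILE *f = fopen("/sys/kernel/debug/suspend_stats", "r");

        	if (!f) {
        		perror("suspend_stats");
        		return 1;
        	}
        	while (fgets(line, sizeof(line), f))
        		fputs(line, stdout);	/* success/fail counts, last failed devices */
        	fclose(f);
        	return 0;
        }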
  3. 05 Oct 2011, 2 commits
    • PM / QoS: Add function dev_pm_qos_read_value() (v3) · 1a9a9152
      Committed by Rafael J. Wysocki
      To read the current PM QoS value for a given device we need to
      make sure that the device's power.constraints object won't be
      removed while we're doing that.  For this reason, put the
      operation under dev->power.lock and acquire the lock
      around the initialization and removal of power.constraints.
      
      Moreover, since we're using the value of power.constraints to
      determine whether or not the object is present, the
      power.constraints_state field isn't necessary any more and may be
      removed.  However, dev_pm_qos_add_request() needs to check if the
      device is being removed from the system before allocating a new
      PM QoS constraints object for it, so make it use the
      power.power_state field of struct device for this purpose.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      1a9a9152
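
      The locking scheme described above boils down to something like the
      following sketch (close to, but not copied from, the patch; the helper
      is renamed to make clear it is not the exact implementation):

        #include <linux/device.h>
        #include <linux/pm_qos.h>
        #include <linux/spinlock.h>

        /* Read the aggregated PM QoS value for a device while holding
         * dev->power.lock, so power.constraints cannot be freed underneath us. */
        static s32 my_dev_pm_qos_read_value(struct device *dev)
        {
        	unsigned long flags;
        	s32 ret = 0;

        	spin_lock_irqsave(&dev->power.lock, flags);
        	if (dev->power.constraints)
        		ret = pm_qos_read_value(dev->power.constraints);
        	spin_unlock_irqrestore(&dev->power.lock, flags);

        	return ret;
        }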
    • PCI: Disable MPS configuration by default · 5f39e670
      Committed by Jon Mason
      Add the ability to disable PCI-E MPS tuning and use the BIOS-configured
      MPS defaults.  Due to the number of issues recently discovered on some
      x86 chipsets, make this the default behavior.
      
      Also, add the option for peer-to-peer DMA MPS configuration.  Peer-to-peer
      DMA is outside the scope of this patch, but MPS configuration could
      prevent it from working by having the MPS on one root port different
      from the MPS on another.  To work around this, simply make the
      system-wide MPS the smallest possible value (128B).
      Signed-off-by: Jon Mason <mason@myri.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5f39e670
  4. 02 Oct 2011, 2 commits
    • PM / devfreq: Add basic governors · ce26c5bb
      Committed by MyungJoo Ham
      Four cpufreq-like governors are provided as examples.
      
      powersave: use the lowest frequency possible. The user (device) should
      set polling_ms to 0 because polling is useless for this governor.
      
      performance: use the highest frequency possible. The user (device)
      should set polling_ms to 0 because polling is useless for this
      governor.
      
      userspace: use the user-specified frequency stored in
      devfreq.user_set_freq. With the sysfs support in the following patch, a
      user may set the value through the sysfs interface.
      
      simple_ondemand: simplified version of cpufreq's ondemand governor.
      
      When a user updates OPP entries (enable/disable/add), the OPP framework
      automatically notifies devfreq to update the operating frequency
      accordingly. Thus, devfreq users (device drivers) do not need to update
      devfreq manually on OPP entry updates or to set polling_ms for
      powersave, performance, userspace, or any other "static" governor.
      
      Note that these are given only as basic examples of governors; any
      device using devfreq may implement its own governor in its driver and
      use that instead.
      Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Reviewed-by: Mike Turquette <mturquette@ti.com>
      Acked-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      ce26c5bb
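
      For illustration, a "static" governor of the kind listed above reduces to
      a single callback; the struct layout is sketched from the commit
      description rather than copied from the patch:

        #include <linux/devfreq.h>

        /* Hypothetical powersave-style governor: always ask for the lowest
         * frequency and let devfreq/OPP round it up to the nearest valid
         * operating point.  No polling is needed, so polling_ms should be 0. */
        static int my_powersave_get_target(struct devfreq *df, unsigned long *freq)
        {
        	*freq = 0;	/* "as low as possible" */
        	return 0;
        }

        static struct devfreq_governor my_powersave_governor = {
        	.name			= "my_powersave",
        	.get_target_freq	= my_powersave_get_target,
        };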
    • PM: Introduce devfreq: generic DVFS framework with device-specific OPPs · a3c98b8b
      Committed by MyungJoo Ham
      With OPPs, a device may have multiple operable frequency and voltage
      sets, but the system needs to choose one of them. In order to reduce
      power consumption (by reducing frequency and voltage) without affecting
      performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
      scheme may be used.
      
      This patch introduces the DVFS capability for non-CPU devices with OPPs.
      DVFS is a technique whereby the frequency and supplied voltage of a
      device are adjusted on-the-fly. DVFS usually sets the frequency as low
      as possible under the given conditions (such as QoS assurance) and
      adjusts the voltage according to the chosen frequency in order to reduce
      power consumption and heat dissipation.
      
      The generic DVFS framework for devices, devfreq, may appear quite
      similar to drivers/cpufreq.  However, cpufreq does not allow multiple
      devices to be registered and is not suitable for multiple heterogeneous
      devices with different (but simple) governors.
      
      Normally, a DVFS mechanism controls the frequency based on the demand
      on the device and then chooses the voltage based on the chosen
      frequency.  devfreq likewise controls the frequency based on the
      governor's frequency recommendation and lets OPP pick the
      frequency/voltage pair based on the recommended frequency.  The chosen
      OPP is then passed to the device driver's "target" callback.
      
      When PM QoS is going to be used with a devfreq device, the device
      driver should enable the OPPs that are appropriate for the current PM
      QoS requests.  In order to do so, the device driver may call opp_enable
      and opp_disable from its PM QoS notifier callback so that PM QoS's
      update_target() call enables the appropriate OPPs.  Note that at least
      one OPP should be enabled at any time; be careful when there is a
      transition.
      Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Reviewed-by: Mike Turquette <mturquette@ti.com>
      Acked-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      a3c98b8b
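
      A hypothetical driver "target" callback following the flow above: the OPP
      library rounds the governor's recommendation up to a valid
      frequency/voltage pair and the driver programs both.  The clock and
      regulator handles are assumptions, and the OPP calls use the API names of
      that era (opp_*, later renamed dev_pm_opp_*):

        #include <linux/clk.h>
        #include <linux/err.h>
        #include <linux/opp.h>
        #include <linux/rcupdate.h>
        #include <linux/regulator/consumer.h>

        static struct clk *mydev_clk;		/* assumed set up at probe time */
        static struct regulator *mydev_vdd;	/* assumed set up at probe time */

        static int mydev_target(struct device *dev, unsigned long *freq)
        {
        	struct opp *opp;
        	unsigned long volt;
        	int ret;

        	rcu_read_lock();		/* OPP lookups of that era are RCU protected */
        	opp = opp_find_freq_ceil(dev, freq);
        	if (IS_ERR(opp)) {
        		rcu_read_unlock();
        		return PTR_ERR(opp);
        	}
        	volt = opp_get_voltage(opp);
        	rcu_read_unlock();

        	/* Ordering of clock vs. regulator writes depends on whether the
        	 * frequency goes up or down; omitted here for brevity. */
        	ret = regulator_set_voltage(mydev_vdd, volt, volt);
        	if (ret)
        		return ret;
        	return clk_set_rate(mydev_clk, *freq);
        }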
  5. 01 Oct 2011, 1 commit
  6. 30 Sep 2011, 1 commit
    • posix-cpu-timers: Cure SMP wobbles · d670ec13
      Committed by Peter Zijlstra
      David reported:
      
        Attached below is a watered-down version of rt/tst-cpuclock2.c from
        GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
        similar.
      
        Run it several times, and you will see cases where the main thread
        will measure a process clock difference before and after the nanosleep
        which is smaller than the cpu-burner thread's individual thread clock
        difference.  This doesn't make any sense since the cpu-burner thread
        is part of the top-level process's thread group.
      
        I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
        64-bit binaries).
      
        For example:
      
        [davem@boricha build-x86_64-linux]$ ./test
        process: before(0.001221967) after(0.498624371) diff(497402404)
        thread:  before(0.000081692) after(0.498316431) diff(498234739)
        self:    before(0.001223521) after(0.001240219) diff(16698)
        [davem@boricha build-x86_64-linux]$ 
      
        The diff of 'process' should always be >= the diff of 'thread'.
      
        I make sure to wrap the 'thread' clock measurements the most tightly
        around the nanosleep() call, and that the 'process' clock measurements
        are the outer-most ones.
      
        ---
        #include <unistd.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <fcntl.h>
        #include <string.h>
        #include <errno.h>
        #include <pthread.h>
      
        static pthread_barrier_t barrier;
      
        static void *chew_cpu(void *arg)
        {
      	  pthread_barrier_wait(&barrier);
      	  while (1)
      		  __asm__ __volatile__("" : : : "memory");
      	  return NULL;
        }
      
        int main(void)
        {
      	  clockid_t process_clock, my_thread_clock, th_clock;
      	  struct timespec process_before, process_after;
      	  struct timespec me_before, me_after;
      	  struct timespec th_before, th_after;
      	  struct timespec sleeptime;
      	  unsigned long diff;
      	  pthread_t th;
      	  int err;
      
      	  err = clock_getcpuclockid(0, &process_clock);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_init(&barrier, NULL, 2);
      	  err = pthread_create(&th, NULL, chew_cpu, NULL);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(th, &th_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_wait(&barrier);
      
      	  err = clock_gettime(process_clock, &process_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(th_clock, &th_before);
      	  if (err)
      		  return 1;
      
      	  sleeptime.tv_sec = 0;
      	  sleeptime.tv_nsec = 500000000;
      	  nanosleep(&sleeptime, NULL);
      
      	  err = clock_gettime(th_clock, &th_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(process_clock, &process_after);
      	  if (err)
      		  return 1;
      
      	  diff = process_after.tv_nsec - process_before.tv_nsec;
      	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 process_before.tv_sec, process_before.tv_nsec,
      		 process_after.tv_sec, process_after.tv_nsec, diff);
      	  diff = th_after.tv_nsec - th_before.tv_nsec;
      	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 th_before.tv_sec, th_before.tv_nsec,
      		 th_after.tv_sec, th_after.tv_nsec, diff);
      	  diff = me_after.tv_nsec - me_before.tv_nsec;
      	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 me_before.tv_sec, me_before.tv_nsec,
      		 me_after.tv_sec, me_after.tv_nsec, diff);
      
      	  return 0;
        }
      
      This is due to us using p->se.sum_exec_runtime in
      thread_group_cputime() where we iterate the thread group and sum all
      data. This does not take time since the last schedule operation (tick
      or otherwise) into account. We can cure this by using
      task_sched_runtime() at the cost of having to take locks.
      
      This also means we can (and must) do away with
      thread_group_sched_runtime() since the modified thread_group_cputime()
      is now more accurate and would deadlock when called from
      thread_group_sched_runtime().
      
      Aside from that, it makes the function safe on 32-bit systems.  The old
      code added t->se.sum_exec_runtime unprotected; sum_exec_runtime is a
      64-bit value and could be changed on another CPU at the same time.
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
      Tested-by: David Miller <davem@davemloft.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      d670ec13
  7. 29 Sep 2011, 1 commit
    • ptp: fix L2 event message recognition · f75159e9
      Committed by Richard Cochran
      The IEEE 1588 standard defines two kinds of messages, event and general
      messages. Event messages require time stamping, and general do not. When
      using UDP transport, two separate ports are used for the two message
      types.
      
      The BPF designed to recognize event messages incorrectly classifies L2
      general messages as event messages. This commit fixes the issue by
      extending the filter to check the message type field for L2 PTP packets.
      Event messages are distinguished from general messages by testing
      the "general" bit.
      Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
      Cc: <stable@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f75159e9
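
      The distinction being tested is a property of the IEEE 1588 header itself:
      the low nibble of the first header byte is the messageType, and types with
      bit 3 set (0x8-0xD) are general messages.  A small user-space illustration
      of the check the extended filter performs:

        #include <stdbool.h>
        #include <stdint.h>

        /* Event messages (Sync, Delay_Req, Pdelay_Req, Pdelay_Resp) use
         * messageType values 0x0-0x3; general messages use 0x8-0xD. */
        static bool ptp_is_event_msg(const uint8_t *ptp_hdr)
        {
        	return (ptp_hdr[0] & 0x08) == 0;
        }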
  8. 27 Sep 2011, 3 commits
    • vfs: remove LOOKUP_NO_AUTOMOUNT flag · b6c8069d
      Committed by Linus Torvalds
      That flag no longer makes sense, since we don't look up automount points
      as eagerly any more.  Additionally, it turns out that the NO_AUTOMOUNT
      handling was buggy to begin with: it would avoid automounting even for
      cases where we really *needed* to do the automount handling, and could
      return ENOENT for autofs entries that hadn't been instantiated yet.
      
      With our new non-eager automount semantics, one discussion has been
      about adding an AT_AUTOMOUNT flag to vfs_fstatat (and thus the
      newfstatat() and fstatat64() system calls), but it's probably not worth
      it: you can always force at least directory automounting by simply
      adding the final '/' to the filename, which works for *all* of the stat
      family system calls, old and new.
      
      So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
      result of our bad default behavior.
      Acked-by: Ian Kent <raven@themaw.net>
      Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b6c8069d
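
      A user-space illustration of the trailing-slash workaround mentioned
      above; the autofs path is a made-up example:

        #include <stdio.h>
        #include <sys/stat.h>

        int main(void)
        {
        	struct stat st;

        	/* stat("/net/server") would no longer trigger the automount;
        	 * the trailing '/' forces directory automounting. */
        	if (stat("/net/server/", &st) == 0)
        		printf("automounted, inode %lu\n", (unsigned long)st.st_ino);
        	else
        		perror("stat");
        	return 0;
        }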
    • vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag · d94c177b
      Committed by Linus Torvalds
      Since we've now turned around and made LOOKUP_FOLLOW *not* force an
      automount, we want to add the ability to force an automount event on
      lookup even if we don't happen to have one of the other flags that force
      it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)
      
      Most cases will never want to use this, since you'd normally want to
      delay automounting as long as possible, which usually implies
      LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
      the automount any more).
      
      But Trond argued sufficiently forcefully that at a minimum bind mounting
      a file and quotactl will want to force the automount lookup.  Some other
      cases (like nfs_follow_remote_path()) could use it too, although
      LOOKUP_DIRECTORY would work there as well.
      
      This commit just adds the flag and logic, no users yet, though.  It also
      doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
      was made irrelevant by the same change that made us not follow on
      LOOKUP_FOLLOW.
      
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg KH <gregkh@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d94c177b
    • PM / Domains: Split device PM domain data into base and need_restore · cd0ea672
      Committed by Rafael J. Wysocki
      The struct pm_domain_data data type is defined in such a way that
      adding new fields specific to the generic PM domains code will
      require include/linux/pm.h to be modified.  As a result, data types
      used only by the generic PM domains code will be defined in two
      headers, although they should all be defined in pm_domain.h, and
      pm.h will need to include more headers, which won't be very nice.
      
      For this reason change the definition of struct pm_subsys_data
      so that its domain_data member is a pointer, which will allow
      struct pm_domain_data to be subclassed by various PM domains
      implementations.  Remove the need_restore member from
      struct pm_domain_data and make the generic PM domains code
      subclass it by adding the need_restore member to the new data type.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      cd0ea672
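
      The subclassing this describes is the usual container_of() arrangement;
      roughly (type and field names follow the commit text, not the exact
      header contents):

        #include <linux/kernel.h>
        #include <linux/pm.h>

        /* genpd-specific wrapper around the generic base type; the generic PM
         * domains code recovers its data from a struct pm_domain_data pointer. */
        struct generic_pm_domain_data {
        	struct pm_domain_data base;
        	bool need_restore;
        };

        static inline struct generic_pm_domain_data *
        to_gpd_data(struct pm_domain_data *pdd)
        {
        	return container_of(pdd, struct generic_pm_domain_data, base);
        }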
  9. 26 Sep 2011, 1 commit
    • dm crypt: always disable discard_zeroes_data · 983c7db3
      Committed by Milan Broz
      If optional discard support in dm-crypt is enabled, discard requests
      bypass the crypt queue and blocks of the underlying device are discarded.
      On the read path, discarded blocks are handled the same as normal
      ciphertext blocks, i.e. they are decrypted.
      
      So if the underlying device announces that discarded regions return
      zeroes, dm-crypt must disable this flag, because after decryption there
      is just random noise instead of zeroes.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      983c7db3
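
      The effect can be expressed in a device-mapper target's I/O hints hook as
      in the sketch below; this only illustrates the intent and is not
      necessarily how the commit wires it up:

        #include <linux/blkdev.h>
        #include <linux/device-mapper.h>

        /* Whatever the underlying device claims, reads of discarded regions
         * come back as decrypted noise, so never advertise zeroed discards. */
        static void crypt_io_hints_sketch(struct dm_target *ti,
        				  struct queue_limits *limits)
        {
        	limits->discard_zeroes_data = 0;
        }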
  10. 20 Sep 2011, 2 commits
  11. 16 Sep 2011, 2 commits
    • net: copy userspace buffers on device forwarding · 48c83012
      Committed by Michael S. Tsirkin
      dev_forward_skb loops an skb back into the host networking stack,
      which might hang on to the memory indefinitely.
      In particular, this can happen in macvtap in bridged mode.
      Copy the userspace fragments to avoid blocking the sender in that case.
      
      Since this patch makes skb_copy_ubufs extern, I also added some
      documentation and made skb_copy_ubufs clear the SKBTX_DEV_ZEROCOPY flag
      automatically instead of leaving that to each caller. This can be made
      into a separate patch if people feel it's worth it.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48c83012
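
      The check added to the forwarding path looks roughly like the following
      (a sketch; the real test sits inside dev_forward_skb()):

        #include <linux/netdevice.h>
        #include <linux/skbuff.h>

        /* Detach a looped-back zero-copy skb from the sender's userspace pages
         * before handing it to the local stack; drop it if the copy fails. */
        static int detach_zerocopy_frags(struct sk_buff *skb)
        {
        	if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
        		if (skb_copy_ubufs(skb, GFP_ATOMIC)) {
        			kfree_skb(skb);
        			return NET_RX_DROP;
        		}
        	}
        	return NET_RX_SUCCESS;
        }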
    • tcp: Change possible SYN flooding messages · 946cedcc
      Committed by Eric Dumazet
      "Possible SYN flooding on port xxxx " messages can fill logs on servers.
      
      Change the logic to log the message only once per listener, and add two
      new SNMP counters to track:
      
      TCPReqQFullDoCookies: number of times a SYN cookie was sent in reply to
      a client
      
      TCPReqQFullDrop: number of times a SYN request was dropped because
      syncookies were not enabled.
      
      Based on a prior patch from Tom Herbert, and suggestions from David.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      946cedcc
  12. 15 Sep 2011, 2 commits
  13. 09 Sep 2011, 1 commit
  14. 06 Sep 2011, 1 commit
  15. 29 Aug 2011, 1 commit
    • perf events: Fix slow and broken cgroup context switch code · a8d757ef
      Committed by Stephane Eranian
      The current cgroup context switch code was incorrect leading
      to bogus counts. Furthermore, as soon as there was an active
      cgroup event on a CPU, the context switch cost on that CPU
      would increase by a significant amount as demonstrated by a
      simple ping/pong example:
      
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10684.51 ctxsw/s
      
      Now start a cgroup perf stat:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
      $ ./pong
       Both processes pinned to CPU1, running for 10s
       6674.61 ctxsw/s
      
      That's a 37% penalty.
      
      Note that pong is not even in the monitored cgroup.
      
      The results shown by perf stat are bogus:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
       Performance counter stats for 'sleep 100':
      
       CPU1 <not counted> cycles   test
       CPU1 16,984,189,138 cycles  #    0.000 GHz
      
      The second 'cycles' event should report a count @ CPU clock
      (here 2.4GHz) as it is counting across all cgroups.
      
      The patch below fixes the bogus accounting and bypasses any
      cgroup switches in case the outgoing and incoming tasks are
      in the same cgroup.
      
      With this patch the same test now yields:
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10775.30 ctxsw/s
      
      Start perf stat with cgroup:
      
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
      Run pong outside the cgroup:
       $ /pong
       Both processes pinned to CPU1, running for 10s
       10687.80 ctxsw/s
      
      The penalty is now less than 2%.
      
      And the results for perf stat are correct:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 <not counted> cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
      Now perf stat reports the correct counts for the non-cgroup event.
      
      If we run pong inside the cgroup, then we also get the
      correct counts:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 22,297,726,205 cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
            10.001457237 seconds time elapsed
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a8d757ef
  16. 27 Aug 2011, 1 commit
  17. 26 Aug 2011, 5 commits
    • backlight: add a callback 'notify_after' for backlight control · cc7993f6
      Committed by Dilan Lee
      We need a callback to do some things after pwm_enable, pwm_disable
      and pwm_config.
      Signed-off-by: Dilan Lee <dilee@nvidia.com>
      Reviewed-by: Robert Morell <rmorell@nvidia.com>
      Reviewed-by: Arun Murthy <arun.murthy@stericsson.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc7993f6
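
      An illustrative board-level use of the new hook, with a made-up board
      callback and platform data values (struct fields as of this era of
      pwm_backlight):

        #include <linux/pwm_backlight.h>

        /* Runs after pwm_config()/pwm_enable()/pwm_disable() have actually
         * taken effect, e.g. to toggle a downstream enable line. */
        static void board_bl_notify_after(struct device *dev, int brightness)
        {
        	/* board-specific work goes here */
        }

        static struct platform_pwm_backlight_data board_bl_data = {
        	.max_brightness	= 255,
        	.dft_brightness	= 128,
        	.pwm_period_ns	= 1000000,
        	.notify_after	= board_bl_notify_after,
        };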
    • rapidio: fix use of non-compatible registers · 284fb68d
      Committed by Alexandre Bounine
      Replace/remove use of RIO v.1.2 registers/bits that are not
      forward-compatible with newer versions of RapidIO specification.
      
      RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
      Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
      
      Use of the removed (since RIO v.1.3) register bits affects users of
      currently available 1.3- and 2.x-compliant devices who may be running
      not-so-recent kernel versions.
      
      Removing checks for unsupported bits makes corresponding routines
      compatible with all versions of RapidIO specification.  Therefore,
      backporting makes stable kernel versions compliant with RIO v.1.3 and
      later as well.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Thomas Moll <thomas.moll@sysgo.com>
      Cc: Chul Kim <chul.kim@idt.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      284fb68d
    • a8018766
    • lockdep: Add helper function for dir vs file i_mutex annotation · e096d0c7
      Committed by Josh Boyer
      Purely in-memory filesystems do not use the inode hash as the dcache
      tells us if an entry already exists.  As a result, they do not call
      unlock_new_inode, and thus directory inodes do not get put into a
      different lockdep class for i_sem.
      
      We need the different lockdep classes, because the locking order for
      i_mutex is different for directory inodes and regular inodes.  Directory
      inodes can do "readdir()", which takes i_mutex *before* possibly taking
      mm->mmap_sem (due to a page fault while copying the directory entry to
      user space).
      
      In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
      before accessing i_mutex.
      
      The two cases can never happen for the same inode, so no real deadlock
      can occur, but without the different lockdep classes, lockdep cannot
      understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
      can lead to false positives from lockdep like below:
      
          find/645 is trying to acquire lock:
           (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac
      
          but task is already holding lock:
           (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
          vfs_readdir+0x5b/0xb4
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
                [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
                [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
                [<ffffffff81111557>] mmap_region+0x258/0x432
                [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
                [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
                [<ffffffff8100c858>] sys_mmap+0x22/0x24
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
          -> #0 (&mm->mmap_sem){++++++}:
                [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff81109541>] might_fault+0x89/0xac
                [<ffffffff81149cff>] filldir+0x6f/0xc7
                [<ffffffff811586ea>] dcache_readdir+0x67/0x205
                [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
                [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
      This patch moves the directory vs file lockdep annotation into a helper
      function that can be called by in-memory filesystems and has hugetlbfs
      call it.
      Signed-off-by: Josh Boyer <jwboyer@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e096d0c7
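
      The usage pattern for the new helper in an in-memory filesystem that never
      calls unlock_new_inode() is roughly (sketch, with a made-up function name):

        #include <linux/fs.h>

        static struct inode *inmem_fs_get_inode(struct super_block *sb,
        					umode_t mode)
        {
        	struct inode *inode = new_inode(sb);

        	if (inode) {
        		inode->i_ino = get_next_ino();
        		inode->i_mode = mode;
        		/* put directory i_mutex in its own lockdep class */
        		lockdep_annotate_inode_mutex_key(inode);
        	}
        	return inode;
        }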
    • Add a personality to report 2.6.x version numbers · be27425d
      Committed by Andi Kleen
      I ran into a couple of programs which broke with the new Linux 3.0
      version.  Some of those were binary only.  I tried to use LD_PRELOAD to
      work around it, but it was quite difficult and in one case impossible
      because of a mix of 32bit and 64bit executables.
      
      For example, all kinds of management software from HP don't work unless
      we pretend to run a 2.6 kernel.
      
        $ uname -a
        Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux
      
        $ hpacucli ctrl all show
      
        Error: No controllers detected.
      
        $ rpm -qf /usr/sbin/hpacucli
        hpacucli-8.75-12.0
      
      Another notable case is that Python now reports "linux3" from
      sys.platform(), which in turn can break things that were checking
      sys.platform() == "linux2":
      
        https://bugzilla.mozilla.org/show_bug.cgi?id=664564
      
      Though it seems pretty clear to me that it's a bug in the apps that are
      using '==' instead of .startswith(), this allows us to unbreak broken
      programs.
      
      This patch adds a UNAME26 personality that makes the kernel report a
      2.6.40+x version number instead.  The x is the x in 3.x.
      
      I know this is somewhat ugly, but I didn't find a better workaround, and
      compatibility to existing programs is important.
      
      Some programs also read /proc/sys/kernel/osrelease.  This can be worked
      around in user space with mount --bind (and a mount namespace).
      
      To use:
      
        wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
        gcc -o uname26 uname26.c
        ./uname26 program
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be27425d
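
      A minimal stand-in for the uname26 wrapper referenced above: switch the
      calling process to the UNAME26 personality and print the reported release
      (a real wrapper would exec the legacy program afterwards):

        #include <stdio.h>
        #include <sys/personality.h>
        #include <sys/utsname.h>

        #ifndef UNAME26
        #define UNAME26 0x0020000	/* value from <linux/personality.h> */
        #endif

        int main(void)
        {
        	struct utsname u;

        	personality(PER_LINUX | UNAME26);
        	uname(&u);
        	printf("%s\n", u.release);	/* e.g. 2.6.40+x instead of 3.x */
        	return 0;
        }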
  18. 25 Aug 2011, 9 commits
    • PM / Domains: Preliminary support for devices with power.irq_safe set · 0aa2a221
      Committed by Rafael J. Wysocki
      The generic PM domains framework currently doesn't work with devices
      whose power.irq_safe flag is set, because runtime PM callbacks for
      such devices are run with interrupts disabled and the callbacks
      provided by the generic PM domains framework use domain mutexes
      and may sleep.  However, such devices very well may belong to
      power domains on some systems, so the generic PM domains framework
      should take them into account.
      
      For this reason, modify the generic PM domains framework so that the
      domain .power_off() and .power_on() callbacks are never executed for
      a domain containing devices with power.irq_safe set, although the
      .stop_device() and .start_device() callbacks are still run for them.
      
      Additionally, introduce a flag allowing the creator of a
      struct generic_pm_domain object to indicate that its .stop_device()
      and .start_device() callbacks may be run in interrupt context
      (might_sleep_if() triggers if that flag is not set and one of those
      callbacks is run in interrupt context).
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      0aa2a221
    • PM QoS: Add global notification mechanism for device constraints · b66213cd
      Committed by Jean Pihet
      Add a global notification chain that gets called upon changes to the
      aggregated constraint value for any device.
      The notification callbacks are passed the full constraint request data
      so that the callees have access to it. The current use case is for
      platform low-level code to access the target device of the constraint.
      Signed-off-by: Jean Pihet <j-pihet@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      b66213cd
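
      An illustrative platform-code consumer of the global notifier; the
      callback name and body are made up, but it shows how the full request is
      passed through so the target device can be identified:

        #include <linux/device.h>
        #include <linux/notifier.h>
        #include <linux/pm_qos.h>

        static int platform_dev_qos_notify(struct notifier_block *nb,
        				   unsigned long new_value, void *data)
        {
        	struct dev_pm_qos_request *req = data;

        	/* react in platform low-level code to the device's new
        	 * aggregated constraint value */
        	dev_dbg(req->dev, "constraint changed to %lu\n", new_value);
        	return NOTIFY_OK;
        }

        static struct notifier_block platform_dev_qos_nb = {
        	.notifier_call = platform_dev_qos_notify,
        };
        /* dev_pm_qos_add_global_notifier(&platform_dev_qos_nb); */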
    • PM QoS: Implement per-device PM QoS constraints · 91ff4cb8
      Committed by Jean Pihet
      Implement the per-device PM QoS constraints by creating a device
      PM QoS API, which calls the PM QoS constraints management core code.
      
      The per-device latency constraints data structures are stored
      in the device's dev_pm_info struct.
      
      The device PM code initializes and destroys the per-device constraints
      data struct in order to support the dynamic insertion and removal of
      devices in the system.
      
      To minimize the data usage of the per-device constraints, the data struct
      is only allocated at the first call to dev_pm_qos_add_request.
      The data is later freed when the device is removed from the system.
      A global mutex protects the constraints users from the data being
      allocated and freed.
      Signed-off-by: Jean Pihet <j-pihet@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      91ff4cb8
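
      An illustrative driver-side use of the new per-device API (original
      signature; a request-type argument was added in later kernels, and the
      value's unit here is just an assumption):

        #include <linux/pm_qos.h>

        static struct dev_pm_qos_request my_latency_req;

        static void my_driver_add_constraint(struct device *dev)
        {
        	/* keep this device's latency constraint at 100 (e.g. microseconds) */
        	dev_pm_qos_add_request(dev, &my_latency_req, 100);
        }

        /* later: dev_pm_qos_update_request(&my_latency_req, 20);
         * on removal: dev_pm_qos_remove_request(&my_latency_req); */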
    • PM QoS: Generalize and export constraints management code · abe98ec2
      Committed by Jean Pihet
      In preparation for the per-device constraints support:
       - rename update_target to pm_qos_update_target
       - generalize and export pm_qos_update_target for usage by the upcoming
         per-device latency constraints framework:
         * operate on struct pm_qos_constraints for constraints management,
         * introduce an 'action' parameter for constraints add/update/remove,
         * the return value indicates if the aggregated constraint value has
           changed,
       - update the internal code to operate on struct pm_qos_constraints
       - add a NULL pointer check in the API functions
      Signed-off-by: Jean Pihet <j-pihet@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      abe98ec2
    • PM QoS: Reorganize data structs · 4e1779ba
      Committed by Jean Pihet
      In preparation for the per-device constraints support, re-organize
      the data structures:
       - add a struct pm_qos_constraints, which contains the
       constraints-related data
       - update struct pm_qos_object contents to the PM QoS internal object
       data. Add a pointer to struct pm_qos_constraints
       - update the internal code to use the new data structs.
      Signed-off-by: Jean Pihet <j-pihet@ti.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      4e1779ba
    • PM QoS: Minor clean-ups · cc749986
      Committed by Jean Pihet
       - Misc fixes to improve code readability:
        * rename struct pm_qos_request_list to struct pm_qos_request,
        * rename the pm_qos_req parameter to req in internal code and
          consistently use req in the API parameters,
        * update the in-kernel API callers to the new parameter names,
        * rename fields (requests, list, node, constraints)
      Signed-off-by: Jean Pihet <j-pihet@ti.com>
      Acked-by: markgross <markgross@thegnar.org>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      cc749986
    • PM QoS: Move and rename the implementation files · e8db0be1
      Committed by Jean Pihet
      The PM QoS implementation files are better named
      kernel/power/qos.c and include/linux/pm_qos.h.
      
      The PM QoS support is compiled under the CONFIG_PM option.
      Signed-off-by: Jean Pihet <j-pihet@ti.com>
      Acked-by: markgross <markgross@thegnar.org>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      e8db0be1
    • PM: Move clock-related definitions and headers to separate file · b5e8d269
      Committed by Rafael J. Wysocki
      Since the PM clock management code in drivers/base/power/clock_ops.c
      is used for both runtime PM and system suspend/hibernation, the
      definitions of data structures and headers related to it should not
      be located in include/linux/pm_runtime.h.  Move them to a separate
      header file.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      b5e8d269
    • PM / Domains: Use power.subsys_data to reduce overhead · 4605ab65
      Committed by Rafael J. Wysocki
      Currently pm_genpd_runtime_resume() has to walk the list of devices
      from the device's PM domain to find the corresponding device list
      object containing the need_restore field to check if the driver's
      .runtime_resume() callback should be executed for the device.
      This is suboptimal and can be simplified by using power.subsys_data
      to store device information used by the generic PM domains code.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      4605ab65