1. 31 Oct, 2011 2 commits
  2. 26 Oct, 2011 2 commits
  3. 24 Oct, 2011 1 commit
  4. 18 Oct, 2011 1 commit
    • cputimer: Cure lock inversion · bcd5cff7
      Committed by Peter Zijlstra
      There's a lock inversion between the cputimer->lock and rq->lock;
      notably the two callchains involved are:
      
       update_rlimit_cpu()
         sighand->siglock
         set_process_cpu_timer()
           cpu_timer_sample_group()
             thread_group_cputimer()
               cputimer->lock
               thread_group_cputime()
                 task_sched_runtime()
                   ->pi_lock
                   rq->lock
      
       scheduler_tick()
         rq->lock
         task_tick_fair()
           update_curr()
             account_group_exec()
               cputimer->lock
      
      Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
      the second one is keeping it up to date.
      
      This problem was introduced by e8abccb7 ("posix-cpu-timers: Cure
      SMP accounting oddities").
      
      Cure the problem by removing the cputimer->lock and rq->lock nesting.
      This leaves concurrent enablers doing duplicate work, but the time
      wasted should be of the same order as the time otherwise wasted
      spinning on the lock, and the greater-than assignment filter should
      ensure we preserve monotonicity.

      Reported-by: Dave Jones <davej@redhat.com>
      Reported-by: Simon Kirby <sim@hostway.ca>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
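
The "greater-than assignment filter" named in the cputimer commit can be illustrated in user space. The sketch below is a minimal model under stated assumptions, not the kernel's code: `struct cputime_sample` and `sample_update_max()` are hypothetical names, and all locking is omitted. It shows why duplicate work by concurrent enablers is harmless: a field is only overwritten when the incoming value is larger, so a stale sample can never move a counter backwards.

```c
#include <stdint.h>

/* Hypothetical stand-in for the accumulated group CPU times. */
struct cputime_sample {
	uint64_t utime;
	uint64_t stime;
	uint64_t sum_exec_runtime;
};

/* Fold sample b into a, field by field, never letting a field decrease. */
static void sample_update_max(struct cputime_sample *a,
			      const struct cputime_sample *b)
{
	if (b->utime > a->utime)
		a->utime = b->utime;
	if (b->stime > a->stime)
		a->stime = b->stime;
	if (b->sum_exec_runtime > a->sum_exec_runtime)
		a->sum_exec_runtime = b->sum_exec_runtime;
}
```

If two callers each compute a sample and apply it through this function, the result is the same regardless of the order in which they arrive.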
  5. 17 Oct, 2011 14 commits
    • Avoid using variable-length arrays in kernel/sys.c · a84a79e4
      Committed by Linus Torvalds
      The size is always valid, but variable-length arrays generate worse code
      for no good reason (unless the function happens to be inlined and the
      compiler sees the length for the simple constant it is).
      
      Also, there seems to be some code generation problem on POWER, where
      Henrik Bakken reports that register r28 can get corrupted under some
      subtle circumstances (interrupt happening at the wrong time?).  That all
      indicates some seriously broken compiler issues, but since variable
      length arrays are bad regardless, there's little point in trying to
      chase it down.
      
      "Just don't do that, then".

      Reported-by: Henrik Grindal Bakken <henribak@cisco.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
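
As an illustration of the variable-length-array point, here is a minimal user-space sketch. It is not the code from kernel/sys.c: `format_release()`, `RELEASE_BUF_MAX` and the `"2.6.%u"` format are assumptions made for the example. The idea is to replace a declaration such as `char buf[len]` with a fixed-size array and an explicit bound check, which gives the compiler a constant stack frame to work with.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define RELEASE_BUF_MAX 65	/* assumed upper bound, for illustration only */

/*
 * Format a release string into out. len is the size the caller asked for.
 * Returns 0 on success, -1 if len does not fit the fixed buffer.
 */
static int format_release(char *out, size_t len, unsigned int minor)
{
	char buf[RELEASE_BUF_MAX];	/* fixed size instead of char buf[len] */

	if (len == 0 || len > sizeof(buf))
		return -1;
	/* snprintf writes at most len bytes, including the terminator */
	snprintf(buf, len, "2.6.%u", minor);
	memcpy(out, buf, strlen(buf) + 1);
	return 0;
}
```

The cost is a few unused bytes of stack when `len` is small, which is usually a good trade for simpler generated code.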
    • genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlier · 9bab0b7f
      Committed by Ian Campbell
      This adds a mechanism to resume selected IRQs during syscore_resume
      instead of dpm_resume_noirq.
      
      Under Xen we need to resume IRQs associated with IPIs early enough
      that the resched IPI is unmasked and we can therefore schedule
      ourselves out of the stop_machine where the suspend/resume takes
      place.
      
      This issue was introduced by 676dc3cf "xen: Use IRQF_FORCE_RESUME".

      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
      Cc: xen-devel <xen-devel@lists.xensource.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk
      Cc: stable@kernel.org (at least to 2.6.32.y)
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image · 081a9d04
      Committed by Bojan Smojver
      Use threads for LZO compression/decompression on hibernate/thaw.
      Improve buffering on hibernate/thaw.
      Calculate/verify CRC32 of the image pages on hibernate/thaw.

      In my testing, this improved write/read speed by a factor of about two.

      Signed-off-by: Bojan Smojver <bojan@rexursive.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Hibernate: Do not initialize static and extern variables to 0 · d231ff1a
      Committed by Barry Song
      Static and extern variables in kernel/power/hibernate.c need not be
      initialized to 0 explicitly, so remove those initializations.

      [rjw: Modified subject, added changelog.]

      Signed-off-by: Barry Song <Baohua.Song@csr.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too · 27920651
      Committed by Jeff Layton
      TASK_KILLABLE is often used to put tasks to sleep for quite some time.
      One of the most common uses is to put tasks to sleep while waiting for
      replies from a server on a networked filesystem (such as CIFS or NFS).
      
      Unfortunately, fake_signal_wake_up does not currently wake up tasks
      that are sleeping in TASK_KILLABLE state. This means that even if the
      code were in place to allow them to freeze while in this sleep, it
      wouldn't work anyway.
      
      This patch changes this function to wake tasks in this state as well.
      This should be harmless -- if the code doing the sleeping doesn't have
      handling to deal with freezer events, it should just go back to sleep.
      If it does, then this will allow that code to do the right thing.

      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Hibernate: Add resumedelay kernel param in addition to resumewait · f126f733
      Committed by Barry Song
      Patch "PM / Hibernate: Add resumewait param to support MMC-like
      devices as resume file" added the resumewait kernel command line
      option.  The present patch adds resumedelay so that resumewait and
      resumedelay are analogous to rootwait and rootdelay.

      [rjw: Modified the subject and changelog slightly.]

      Signed-off-by: Barry Song <baohua.song@csr.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Hibernate: Add resumewait param to support MMC-like devices as resume file · 6f8d7022
      Committed by Barry Song
      Some devices, such as MMC, are detected asynchronously and very
      slowly. For example, drivers/mmc/host/sdhci.c launches a 200ms
      delayed work item to detect MMC partitions and then add the disk.
      
      We have wait_for_device_probe() and scsi_complete_async_scans()
      before calling swsusp_check(), but it is not enough to wait for MMC.
      
      This patch adds the resumewait kernel parameter, analogous to
      rootwait, so that we have enough time to wait until MMC is ready. The
      difference is that resumewait waits for the resume partition, whereas
      rootwait waits for the rootfs partition (which may be on a different
      device).

      This patch lets hibernation work on many embedded products that have
      no SCSI devices but do have devices like MMC.
      
      [rjw: Modified the changelog slightly.]

      Signed-off-by: Barry Song <Baohua.Song@csr.com>
      Reviewed-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
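
The waiting behaviour that resumewait describes amounts to polling until the resume device shows up. The following is a hedged user-space sketch of that pattern, not the kernel implementation: `probe_fn`, `wait_for_device()` and `countdown_probe()` are hypothetical, and the `max_polls` bound is an assumption added so the example always terminates.

```c
#include <stdbool.h>

/* Hypothetical probe callback: returns true once the device is present. */
typedef bool (*probe_fn)(void *ctx);

/*
 * Poll until the device appears or max_polls attempts are used up.
 * Returns the number of polls performed on success, -1 on timeout.
 * A real implementation would sleep between polls.
 */
static int wait_for_device(probe_fn probe, void *ctx, int max_polls)
{
	for (int i = 1; i <= max_polls; i++) {
		if (probe(ctx))
			return i;
	}
	return -1;
}

/* Example probe: reports "present" once the counter has run down to 1. */
static bool countdown_probe(void *ctx)
{
	int *left = ctx;

	if (*left > 1) {
		(*left)--;
		return false;
	}
	return true;
}
```

With a counter of 3, `wait_for_device(countdown_probe, &counter, 10)` succeeds on the third poll. A fixed delay, by contrast, would simply sleep once before a single check.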
    • PM / Hibernate: Fix typo in a kerneldoc comment · 21e82808
      Committed by Barry Song
      Fix a typo in a function name in the kerneldoc comment next to
      resume_target_kernel().

      [rjw: Changed the subject slightly, added the changelog.]

      Signed-off-by: Barry Song <Baohua.Song@csr.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Hibernate: Freeze kernel threads after preallocating memory · 2aede851
      Committed by Rafael J. Wysocki
      There is a problem with the current ordering of hibernate code which
      leads to deadlocks in some filesystems' memory shrinkers.  Namely,
      some filesystems use freezable kernel threads that are inactive when
      the hibernate memory preallocation is carried out.  Those same
      filesystems use memory shrinkers that may be triggered by the
      hibernate memory preallocation.  If those memory shrinkers wait for
      the frozen kernel threads, the hibernate process deadlocks (this
      happens with XFS, for one example).
      
      Apparently, it is not technically viable to redesign the filesystems
      in question to avoid the situation described above, so the only
      possible solution of this issue is to defer the freezing of kernel
      threads until the hibernate memory preallocation is done, which is
      implemented by this change.
      
      Unfortunately, this requires the memory preallocation to be done
      before the "prepare" stage of device freeze, so after this change the
      only way drivers can allocate additional memory for their freeze
      routines in a clean way is to use PM notifiers.

      Reported-by: Christoph <cr2005@u-club.de>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / VT: Cleanup #if defined uglyness and fix compile error · 37cce26b
      Committed by H Hartley Sweeten
      Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to clean
      up the #if defined ugliness for the vt suspend support functions. Note
      that CONFIG_VT_CONSOLE is already dependent on CONFIG_VT.

      The function pm_set_vt_switch is actually dependent on CONFIG_VT and not
      CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is
      not set:
      
      drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch'
      include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here
      
      Also, remove the incorrect path from the comment in console.c.
      
      [rjw: Replaced #if defined() with #ifdef in suspend.h.]

      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
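
The redefinition error quoted in the PM / VT commit arises when a header stub and a real definition are guarded by different config symbols, so both can be visible at once. A minimal sketch of the usual remedy follows, using a single guard for both variants. `CONFIG_EXAMPLE_CONSOLE_SLEEP` and `example_prepare_console()` are hypothetical names chosen for the illustration, not the symbols touched by the commit.

```c
/* Hypothetical config symbol; define it to select the real implementation. */
/* #define CONFIG_EXAMPLE_CONSOLE_SLEEP */

#ifdef CONFIG_EXAMPLE_CONSOLE_SLEEP
/* Real implementation, normally in a .c file built only with the option on. */
int example_prepare_console(void)
{
	return 1;
}
#else
/* Stub used when the option is off; same guard, so the two never coexist. */
static inline int example_prepare_console(void)
{
	return 0;
}
#endif
```

Because exactly one branch of the `#ifdef` is compiled, callers can use the function unconditionally and the compiler never sees two definitions.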
    • PM / Suspend: Off by one in pm_suspend() · 528f7ce6
      Committed by Dan Carpenter
      In enter_state() we use "state" as an offset for the pm_states[]
      array.  The pm_states[] array only has PM_SUSPEND_MAX elements so
      this test is off by one.

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@kernel.org
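
The off-by-one described in the pm_suspend() commit is the classic bounds check on an array with MAX elements. The sketch below is a user-space illustration with hypothetical names (`STATE_*`, `state_names`, `state_name()`). It mirrors the reasoning in the commit message rather than reproducing the kernel source.

```c
#include <stddef.h>

/* Hypothetical state numbering; STATE_MAX is the element count. */
enum { STATE_ON, STATE_STANDBY, STATE_MEM, STATE_MAX };

static const char *const state_names[STATE_MAX] = {
	[STATE_STANDBY] = "standby",
	[STATE_MEM] = "mem",
};

/* Return the name for a state, or NULL if it is out of range or unnamed. */
static const char *state_name(int state)
{
	/* ">=" rather than ">": valid indices stop at STATE_MAX - 1 */
	if (state < 0 || state >= STATE_MAX)
		return NULL;
	return state_names[state];
}
```

With a `>` comparison, `state == STATE_MAX` would pass the check and read one element past the end of the array.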
    • PM / Hibernate: Include storage keys in hibernation image on s390 · 85055dd8
      Committed by Martin Schwidefsky
      For s390 there is one additional byte associated with each page,
      the storage key. This byte contains the referenced and changed
      bits and needs to be included in the hibernation image.
      If the storage keys are not restored to their previous state, all
      original pages would appear to be dirty. This can cause
      inconsistencies, e.g. with read-only filesystems.

      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM: Fix build issue in main.c for CONFIG_PM_SLEEP unset · ca123102
      Committed by Rafael J. Wysocki
      Suspend statistics should depend on CONFIG_PM_SLEEP, so make that
      happen.

      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Suspend: Add statistics debugfs file for suspend to RAM · 2a77c46d
      Committed by ShuoX Liu
      Record the number of S3 failures for each failure reason, and the
      names of the two most recently failed devices, during S3 entry.
      The results can be read from the 'suspend_stats' entry in debugfs.

      The motivation for the patch:

      We are enabling power features on Medfield. Compared with a PC or
      notebook, a mobile device enters and exits suspend-2-ram (we call it
      S3 on Medfield) far more frequently. If it can't enter suspend-2-ram
      in time, the battery may drain quickly.

      We often find that a device suspend fails and the system then retries
      S3 over and over again. As the display is off, testers and developers
      don't know what is happening.

      Some testers and developers complain that they don't know whether the
      system is trying suspend-2-ram or which device fails to suspend. They
      need this information for a quick check. The patch adds suspend_stats
      under debugfs so users can check suspend-to-RAM statistics quickly.

      Without this patch, there are other ways to find out which device
      fails. One is to turn on CONFIG_PM_DEBUG, but users would get too
      much information and testers would need to recompile the system.

      Dynamic debug is another good tool for dumping debug information, but
      it still doesn't match our usage scenario closely:
      1) Users need to write a user-space parser to process the syslog output.
      2) In our testing scenario we leave the mobile device alone for at
         least several hours and then check its status. No serial console
         is available during the testing, for two reasons: the console
         would be suspended, and a serial console connected through SPI or
         HSU devices would consume power. These devices are powered off at
         suspend-2-ram.

      Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
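
Keeping "the latest two failed devices' names" suggests a very small ring of name slots plus a failure counter. The sketch below shows one way such bookkeeping could look in user space. The structure, the `fail_log_record()` helper and the `NAME_LEN` limit are assumptions for illustration and are not taken from the patch.

```c
#include <stdio.h>

#define REC_FAILED_NUM 2	/* keep only the two most recent failures */
#define NAME_LEN 40		/* assumed maximum device-name length */

struct fail_log {
	char names[REC_FAILED_NUM][NAME_LEN];
	int last;		/* index of the most recent entry */
	int fail_count;		/* total number of recorded failures */
};

/* Record one failed device, overwriting the oldest slot. */
static void fail_log_record(struct fail_log *log, const char *dev)
{
	log->last = (log->last + 1) % REC_FAILED_NUM;
	/* snprintf truncates and always terminates the stored name */
	snprintf(log->names[log->last], NAME_LEN, "%s", dev);
	log->fail_count++;
}
```

Starting from a zero-initialized `struct fail_log`, the newest name is always at `names[last]` and the one before it at the other slot.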
  6. 14 Oct, 2011 2 commits
  7. 11 Oct, 2011 3 commits
    • tracing: Do not allocate buffer for trace_marker · d696b58c
      Committed by Steven Rostedt
      When doing intense tracing, the kmalloc inside trace_marker can
      introduce side effects to what is being traced.
      
      As trace_marker() is used by userspace to inject data into the
      kernel ring buffer, it needs to do so with the least amount
      of intrusion to the operations of the kernel or the user space
      application.
      
      As the ring buffer is designed to write directly into the buffer
      without the need to make a temporary buffer, and userspace already
      went through the hassle of knowing how big the write will be,
      we can simply pin the userspace pages and write the data directly
      into the buffer. This reduces the impact of tracing via trace_marker
      tremendously!
      
      Thanks to Peter Zijlstra and Thomas Gleixner for pointing out the
      use of get_user_pages_fast() and kmap_atomic().

      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
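
For readers unfamiliar with trace_marker, the user-space side is just a write to a file. The sketch below is a minimal example of that side only. `trace_mark()` is a hypothetical helper, and the path is passed in by the caller because the location depends on where the tracing filesystem is mounted (commonly `/sys/kernel/debug/tracing/trace_marker`, which is an assumption about the system).

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write one message to a trace_marker file.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t trace_mark(const char *marker_path, const char *msg)
{
	int fd = open(marker_path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	/* One write() call per message; the length is known up front. */
	n = write(fd, msg, strlen(msg));
	close(fd);
	return n;
}
```

A long-running program would normally open the file once and keep the descriptor, since the point of the commit is to keep each write as cheap as possible.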
    • tracing: Warn on output if the function tracer was found corrupted · e0a413f6
      Committed by Steven Rostedt
      As the function tracer is very intrusive, lots of self checks are
      performed on the tracer and if something is found to be strange
      it will shut itself down keeping it from corrupting the rest of the
      kernel. This shutdown may still allow functions to be traced, as the
      tracing only stops new modifications from happening. Trying to stop
      the function tracer itself can cause more harm as it requires code
      modification.
      
      Although a WARN_ON() is executed, a user may not notice it. To help
      the user see that something isn't right with the tracing of the system
      a big warning is added to the output of the tracer that lets the user
      know that their data may be incomplete.

      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace/kprobes: Fix not to delete probes if in use · 02ca1521
      Committed by Masami Hiramatsu
      Fix kprobe-tracer so that it does not delete a probe that is in use.
      In that case, the delete operation returns -EBUSY.
      
      This bug can cause a kernel panic if enabled probes are deleted
      during perf record.
      
      (Add some probes on functions)
      sh-4.2# perf probe --del probe:\*
      sh-4.2# exit
      (kernel panic)
      
      This was originally reported in the Fedora Bugzilla:
      
       https://bugzilla.redhat.com/show_bug.cgi?id=742383
      
      I've also checked that this problem doesn't happen with
      tracepoints on module removal, because the perf event
      locks the target module.
      
      $ sudo ./perf record -e xfs:\* -aR sh
      sh-4.2# rmmod xfs
      ERROR: Module xfs is in use
      sh-4.2# exit
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.203 MB perf.data (~8862 samples) ]

      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/20111004104438.14591.6553.stgit@fedora15
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
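
The behaviour described in the kprobes commit, refusing deletion while a probe is in use and returning -EBUSY, can be modelled in a few lines. This is a sketch under stated assumptions: `struct probe`, its `users` counter and `probe_delete()` are hypothetical and do not reflect the kernel's actual data structures.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical probe record; not the kernel's actual structure. */
struct probe {
	bool registered;
	int users;	/* active consumers, e.g. a running perf session */
};

/* Refuse to delete a probe that still has users. */
static int probe_delete(struct probe *p)
{
	if (p->users > 0)
		return -EBUSY;
	p->registered = false;
	return 0;
}
```

The caller sees an error instead of having the probe pulled out from under an active consumer, which is the failure mode behind the reported panic.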
  8. 06 Oct, 2011 6 commits
  9. 05 Oct, 2011 2 commits
  10. 04 Oct, 2011 5 commits
  11. 03 Oct, 2011 2 commits
    • genirq: percpu: allow interrupt type to be set at enable time · 1e7c5fd2
      Committed by Marc Zyngier
      As request_percpu_irq() doesn't allow for a percpu interrupt to have
      its type configured (it is generally impossible to configure it on all
      CPUs at once), add a 'type' argument to enable_percpu_irq().
      
      This allows some low-level, board specific init code to be switched to
      a generic API.
      
      [ tglx: Added WARN_ON argument ]

      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • genirq: Add support for per-cpu dev_id interrupts · 31d9d9b6
      Committed by Marc Zyngier
      The ARM GIC interrupt controller offers per CPU interrupts (PPIs),
      which are usually used to connect local timers to each core. Each CPU
      has its own private interface to the GIC, and only sees the PPIs that
      are directly connected to it.
      
      While these timers are separate devices and have a separate interrupt
      line to a core, they all use the same IRQ number.
      
      For these devices, request_irq() is not the right API as it assumes
      that an IRQ number is visible to a number of CPUs (through the
      affinity setting), but makes it very awkward to express that an IRQ
      number can be handled by all CPUs, and yet be a different interrupt
      line on each CPU, requiring a different dev_id cookie to be passed
      back to the handler.

      The *_percpu_irq() functions are designed to overcome these
      limitations, by providing a per-cpu dev_id vector:
      
      int request_percpu_irq(unsigned int irq, irq_handler_t handler,
      		   const char *devname, void __percpu *percpu_dev_id);
      void free_percpu_irq(unsigned int, void __percpu *);
      int setup_percpu_irq(unsigned int irq, struct irqaction *new);
      void remove_percpu_irq(unsigned int irq, struct irqaction *act);
      void enable_percpu_irq(unsigned int irq);
      void disable_percpu_irq(unsigned int irq);
      
      The API has a number of limitations:
      - no interrupt sharing
      - no threading
      - common handler across all the CPUs
      
      Once the interrupt is requested using setup_percpu_irq() or
      request_percpu_irq(), it must be enabled by each core that wishes its
      local interrupt to be delivered.
      
      Based on an initial patch by Thomas Gleixner.

      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
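
The per-cpu dev_id idea can be pictured with a small user-space model: one IRQ number, one handler shared by all CPUs, but a separate cookie and a separate enable bit per CPU. Everything in the sketch is hypothetical (`struct percpu_irq_model`, `model_enable()`, `model_deliver()`, `count_tick()`, `NR_CPUS_MODEL`). It does not call or reproduce the kernel API listed in the commit message.

```c
#include <stddef.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

typedef void (*handler_fn)(int irq, void *dev_id);

struct percpu_irq_model {
	handler_fn handler;		/* one handler shared by all CPUs */
	void *dev_id[NR_CPUS_MODEL];	/* one cookie per CPU */
	int enabled[NR_CPUS_MODEL];	/* each CPU enables its own line */
};

/* Enable the interrupt on one CPU only. */
static void model_enable(struct percpu_irq_model *m, int cpu)
{
	if (cpu >= 0 && cpu < NR_CPUS_MODEL)
		m->enabled[cpu] = 1;
}

/* Deliver the interrupt on a CPU; returns 1 if the handler ran. */
static int model_deliver(struct percpu_irq_model *m, int irq, int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS_MODEL || !m->enabled[cpu])
		return 0;
	m->handler(irq, m->dev_id[cpu]);	/* this CPU's own cookie */
	return 1;
}

/* Example handler: treats the cookie as a per-CPU tick counter. */
static void count_tick(int irq, void *dev_id)
{
	(void)irq;
	(*(int *)dev_id)++;
}
```

Delivering on an enabled CPU bumps only that CPU's counter, while delivery on a CPU that never enabled the line is dropped. This matches the requirement that each core enable its own local interrupt.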