1. 04 Aug, 2011 2 commits
  2. 26 Jul, 2011 2 commits
  3. 23 Jun, 2011 1 commit
    • R
      Fix CPU spinlock lockups on secondary CPU bringup · 1b19ca9f
      Russell King committed
      Secondary CPU bringup typically calls calibrate_delay() during its
      initialization.  However, calibrate_delay() modifies a global variable
      (loops_per_jiffy) used for udelay() and __delay().
      
      A side effect of 71c696b1 ("calibrate: extract fall-back calculation
      into own helper") introduced in the 2.6.39 merge window means that we
      end up with a substantial period where loops_per_jiffy is zero.  This
      causes the spinlock debugging code to malfunction:
      
      	u64 loops = loops_per_jiffy * HZ;
      	for (;;) {
      		for (i = 0; i < loops; i++) {
      			if (arch_spin_trylock(&lock->raw_lock))
      				return;
      			__delay(1);
      		}
      		...
      	}
      
      by never calling arch_spin_trylock() - resulting in the CPU locking
      up in an infinite loop inside __spin_lock_debug().
      
      Work around this by writing to loops_per_jiffy only once we have
      completed all the calibration decisions.
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: <stable@kernel.org> (2.6.39-stable)
      --
      Better solutions (such as omitting the calibration for secondary CPUs,
      or arranging for calibrate_delay() to return the LPJ value and leave
      it to the caller to decide where to store it) are a possibility, but
      would be much more invasive into each architecture.
      
      I think this is the best solution for -rc and stable, but it should be
      revisited for the next merge window.
      
       init/calibrate.c |   14 ++++++++------
       1 files changed, 8 insertions(+), 6 deletions(-)
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1b19ca9f
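The publish-once pattern described in 1b19ca9f can be sketched in user-space C. This is a minimal illustration, not the kernel code: `direct_estimate()`, `fallback_estimate()`, `seen_during_fallback` and the numeric values are hypothetical stand-ins, and the way the zero window arises here (a failed first attempt stored straight into the global) is an assumption made for the example.

```c
/* Stand-in for the kernel global that udelay()/__delay() and the
 * spinlock debugging loop read. */
static unsigned long loops_per_jiffy;

/* Hypothetical probe: the global's value as a concurrent reader would
 * see it while the slow fallback calibration is running. */
static unsigned long seen_during_fallback;

/* Hypothetical fast path that fails and reports 0. */
static unsigned long direct_estimate(void)
{
	return 0;
}

/* Hypothetical slow path; it samples the global to model a reader. */
static unsigned long fallback_estimate(void)
{
	seen_during_fallback = loops_per_jiffy;
	return 31666428UL;	/* made-up result for illustration */
}

/* Unsafe: intermediate results go straight into the global, so readers
 * can observe 0 for the whole duration of the fallback. */
static void calibrate_unsafe(void)
{
	loops_per_jiffy = direct_estimate();
	if (!loops_per_jiffy)
		loops_per_jiffy = fallback_estimate();
}

/* Safe: work on a local copy and publish once, after every decision
 * has been made. */
static void calibrate_publish_once(void)
{
	unsigned long lpj = direct_estimate();

	if (!lpj)
		lpj = fallback_estimate();
	loops_per_jiffy = lpj;
}
```

With the global preset to an earlier value such as 4096, the unsafe variant exposes 0 to readers during the fallback, which is the case where `loops_per_jiffy * HZ` evaluates to 0 and the inner trylock loop never runs. The publish-once variant keeps the old value visible until the final store.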
  4. 17 Jun, 2011 1 commit
  5. 16 Jun, 2011 3 commits
  6. 09 Jun, 2011 2 commits
  7. 07 Jun, 2011 1 commit
  8. 30 May, 2011 1 commit
    • L
      mm: Fix boot crash in mm_alloc() · 6345d24d
      Linus Torvalds committed
      Thomas Gleixner reports that we now have a boot crash triggered by
      CONFIG_CPUMASK_OFFSTACK=y:
      
          BUG: unable to handle kernel NULL pointer dereference at   (null)
          IP: [<c11ae035>] find_next_bit+0x55/0xb0
          Call Trace:
           [<c11addda>] cpumask_any_but+0x2a/0x70
           [<c102396b>] flush_tlb_mm+0x2b/0x80
           [<c1022705>] pud_populate+0x35/0x50
           [<c10227ba>] pgd_alloc+0x9a/0xf0
           [<c103a3fc>] mm_init+0xec/0x120
           [<c103a7a3>] mm_alloc+0x53/0xd0
      
      which was introduced by commit de03c72c ("mm: convert
      mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
      mm_init() vs mm_init_cpumask().
      
      Thomas wrote a patch to just fix the ordering of initialization, but I
      hate the new double allocation in the fork path, so I ended up instead
      doing some more radical surgery to clean it all up.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6345d24d
  9. 27 May, 2011 1 commit
  10. 25 May, 2011 3 commits
    • M
      printk: allocate kernel log buffer earlier · 162a7e75
      Mike Travis committed
      On larger systems, because of the numerous ACPI, Bootmem and EFI messages,
      the static log buffer overflows before the larger one specified by the
      log_buf_len param is allocated.  Minimize the overflow by allocating the
      new log buffer as soon as possible.
      
      On kernels without memblock, a later call to setup_log_buf from
      init/main.c is the fallback.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix CONFIG_PRINTK=n build]
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      162a7e75
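The idea in 162a7e75 of switching from a small static log buffer to a larger one as early as possible can be sketched as follows. This is a simplified user-space model under stated assumptions: the buffer here is linear and counts bytes that do not fit as lost (the real kernel log is a ring buffer), `malloc()` stands in for an early boot allocator, and every name ending in `_sketch` or starting with `log_` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

static char static_log_buf[32];		/* small buffer available from the start */
static char *log_buf = static_log_buf;
static size_t log_buf_len = sizeof(static_log_buf);
static size_t log_used;			/* bytes currently stored */
static size_t log_dropped;		/* bytes lost because the buffer was full */

/* Append a message; bytes that do not fit are counted as lost. */
static void log_write(const char *msg)
{
	size_t len = strlen(msg);
	size_t room = log_buf_len - log_used;
	size_t n = len < room ? len : room;

	memcpy(log_buf + log_used, msg, n);
	log_used += n;
	log_dropped += len - n;
}

/* Switch to a larger buffer, carrying over what was logged so far.
 * Returns 0 on success (or if no growth is needed), -1 on failure. */
static int setup_log_buf_sketch(size_t new_len)
{
	char *new_buf;

	if (new_len <= log_buf_len)
		return 0;
	new_buf = malloc(new_len);
	if (!new_buf)
		return -1;
	memcpy(new_buf, log_buf, log_used);
	log_buf = new_buf;
	log_buf_len = new_len;
	return 0;
}
```

The earlier `setup_log_buf_sketch()` runs relative to the message stream, the fewer bytes end up in `log_dropped`, which is the effect the commit message aims for.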
    • A
      init/calibrate.c: fix for critical bogoMIPS intermittent calculation failure · d2b46313
      Andrew Worsley committed
      A fix to the TSC (Time Stamp Counter) based bogoMIPS calculation used on
      secondary CPUs which has two faults:
      
      1: Not handling wrapping of the lower 32 bits of the TSC counter on
         32bit kernel - perhaps TSC is not reset by a warm reset?
      
      2: TSC and Jiffies are not incrementing together properly.  Either
         jiffies increment too quickly, or the Time Stamp Counter isn't
         incremented during an SMI but the real time clock is and jiffies are
         incremented.
      
      Case 1 can result in a value that is too large by a factor of 16, which
      makes udelay() values too small and can cause mysterious driver errors.
      Case 2 appears to give smaller 10-15% errors after averaging, but enough
      to cause occasional failures on my own board.
      
      I have tested this code on my own branch and attach a patch suitable for
      current kernel code.  See below for examples of the failures and how the
      fix handles these situations now.
      
      I reported this issue earlier here:
           Intermittent problem with BogoMIPs calculation on Intel AP CPUs -
      http://marc.info/?l=linux-kernel&m=129947246316875&w=4
      
      I suspect this issue has been seen by others but as it is intermittent and
      bogoMIPS for secondary CPUs are no longer printed out it might have been
      difficult to identify this as the cause.  Perhaps these unresolved issues,
      although quite old, might be relevant as possibly this fault has been
      around for a while.  In particular Case 1 may only be relevant to 32bit
      kernels on newer HW (most people run 64bit kernels?).  Case 2 is less
      dramatic since the earlier fix in this area and also intermittent.
      
         Re: bogomips discrepancy on Intel Core2 Quad CPU -
      http://marc.info/?l=linux-kernel&m=118929277524298&w=4
         slow system and bogus bogomips  -
      http://marc.info/?l=linux-kernel&m=116791286716107&w=4
         Re: Re: [RFC-PATCH] clocksource: update lpj if clocksource has -
      http://marc.info/?l=linux-kernel&m=128952775819467&w=4
      
      This issue is masked a little by commit feae3203 ("timers, init:
      Limit the number of per cpu calibration bootup messages") which only
      prints out the first bogoMIPS value making it much harder to notice other
      values differing.  Perhaps it should be changed to only suppress them when
      they are similar values?
      
      Here are some outputs showing faults occurring and the new code handling
      them properly.  See my earlier message for examples of the original
      failure.
      
          Case 1:   A Time Stamp Counter wrap:
      ...
      Calibrating delay loop (skipped), value calculated using timer
      frequency.. 6332.70 BogoMIPS (lpj=31663540)
      ....
      calibrate_delay_direct() timer_rate_max=31666493
      timer_rate_min=31666151 pre_start=4170369255 pre_end=4202035539
      calibrate_delay_direct() timer_rate_max=2425955274
      timer_rate_min=2425954941 pre_start=4265368533 pre_end=2396356387
      calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap
      around start=4265368581 >=post_end=2396356511
      calibrate_delay_direct() timer_rate_max=31666274
      timer_rate_min=31665942 pre_start=2440373374 pre_end=2472039515
      calibrate_delay_direct() timer_rate_max=31666492
      timer_rate_min=31666160 pre_start=2535372139 pre_end=2567038422
      calibrate_delay_direct() timer_rate_max=31666455
      timer_rate_min=31666207 pre_start=2630371084 pre_end=2662037415
      Calibrating delay using timer specific routine.. 6333.28 BogoMIPS (lpj=31666428)
      Total of 2 processors activated (12665.99 BogoMIPS).
      ....
      
          Case 2:  Something (presumably the SMM interrupt?) causing a
      very low increase in TSC counter for the DELAY_CALIBRATION_TICKS
      increase in jiffies
      ...
      Calibrating delay loop (skipped), value calculated using timer
      frequency.. 6333.25 BogoMIPS (lpj=31666270)
      ...
      calibrate_delay_direct() timer_rate_max=31666483
      timer_rate_min=31666074 pre_start=4199536526 pre_end=4231202809
      calibrate_delay_direct() timer_rate_max=864348 timer_rate_min=864016
      pre_start=2405343672 pre_end=2406207897
      calibrate_delay_direct() timer_rate_max=31666483
      timer_rate_min=31666179 pre_start=2469540464 pre_end=2501206823
      calibrate_delay_direct() timer_rate_max=31666511
      timer_rate_min=31666122 pre_start=2564539400 pre_end=2596205712
      calibrate_delay_direct() timer_rate_max=31666084
      timer_rate_min=31665685 pre_start=2659538782 pre_end=2691204657
      calibrate_delay_direct() dropping min bogoMips estimate 1 = 864348
      Calibrating delay using timer specific routine.. 6333.27 BogoMIPS (lpj=31666390)
      Total of 2 processors activated (12666.53 BogoMIPS).
      ...
      
      After 70 boots I saw 2 variations <1% slip through.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix straggly printk mess]
      Signed-off-by: Andrew Worsley <amworsley@gmail.com>
      Reviewed-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2b46313
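The two failure modes handled in d2b46313 can be illustrated with a small user-space sketch. It is not the kernel's algorithm: the wrap check mirrors the `start >= post_end` condition printed in the Case 1 log, while the outlier rule (keep only samples within 1/8 of the largest one, then average) and both function names are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* With a 32-bit view of the counter, a wrap between the two reads
 * shows up as an end value that is not greater than the start value. */
static int counter_wrapped(uint32_t start, uint32_t post_end)
{
	return start >= post_end;
}

/* Average the samples that lie within 1/8 of the largest sample.
 * Returns 0 when there are no samples. */
static unsigned long average_without_outliers(const unsigned long *rate,
					      size_t n)
{
	unsigned long max = 0;
	uint64_t sum = 0;
	size_t kept = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (rate[i] > max)
			max = rate[i];

	for (i = 0; i < n; i++) {
		if (rate[i] >= max - max / 8) {
			sum += rate[i];
			kept++;
		}
	}
	return kept ? (unsigned long)(sum / kept) : 0;
}
```

Fed the five timer_rate_max values from the Case 2 log, this sketch drops 864348 and averages the remaining four to 31666390, the lpj value reported in that log. Measurements flagged by `counter_wrapped()` are meant to be left out of the sample array before averaging.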
    • K
      mm: convert mm->cpu_vm_cpumask into cpumask_var_t · de03c72c
      KOSAKI Motohiro committed
      cpumask_t is a very big struct and cpu_vm_mask is placed in the wrong
      position.  This might reduce the cache hit ratio.
      
      This patch makes two changes.
      1) Move the cpumask to the end of mm_struct, because usually only the
         front bits of the cpumask are accessed when the system has cpu-hotplug
         capability.
      2) Convert cpu_vm_mask into cpumask_var_t.  It may help to reduce memory
         footprint if cpumask_size() uses nr_cpumask_bits properly in future.
      
      In addition, this patch renames cpu_vm_mask to cpu_vm_mask_var.
      It may help to detect out-of-tree cpu_vm_mask users.
      
      This patch has no functional change.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de03c72c
  11. 23 May, 2011 1 commit
    • L
      Give up on pushing CC_OPTIMIZE_FOR_SIZE · 281dc5c5
      Linus Torvalds committed
      I still happen to believe that I$ miss costs are a major thing, but
      sadly, -Os doesn't seem to be the solution.  With or without it, gcc
      will miss some obvious code size improvements, and with it enabled gcc
      will sometimes make choices that aren't good even with high I$ miss
      ratios.
      
      For example, with -Os, gcc on x86 will turn a 20-byte constant memcpy
      into a "rep movsl".  While I sincerely hope that x86 CPUs will some day
      do a good job at that, they certainly don't do it yet, and the cost is
      higher than an L1 I$ miss would be.
      
      Some day I hope we can re-enable this.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      281dc5c5
  12. 21 May, 2011 1 commit
  13. 20 May, 2011 1 commit
  14. 11 May, 2011 1 commit
  15. 06 May, 2011 1 commit
  16. 27 Apr, 2011 1 commit
    • R
      init/Kconfig: fix EXPERT menu list · 6befe5f6
      Randy Dunlap committed
      The EXPERT menu list was recently broken by the insertion of a
      kconfig symbol (EMBEDDED) at the beginning of the EXPERT list of
      kconfig items.  Broken by:
      
        commit 6a108a14
        Author: David Rientjes <rientjes@google.com>
        Date:   Thu Jan 20 14:44:16 2011 -0800
          kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
      
      Restore the EXPERT menu list -- don't inject a symbol (EMBEDDED)
      that does not depend on EXPERT into the list.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Peter Foley <pefoley2@verizon.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6befe5f6
  17. 23 Apr, 2011 1 commit
    • J
      [PARISC] slub: fix panic with DISCONTIGMEM · 4a5fa359
      James Bottomley committed
      Slub makes assumptions about page_to_nid() which are violated by
      DISCONTIGMEM and !NUMA.  This violation results in a panic because
      page_to_nid() can be non-zero for pages in the discontiguous ranges and
      this leads to a null return by get_node().  The assertion by the
      maintainer is that DISCONTIGMEM should only be allowed when NUMA is also
      defined.  However, at least six architectures (alpha, ia64, m32r, m68k,
      mips, parisc) violate this.  The panic is a regression against slab, so
      just mark slub broken in the problem configuration to prevent users
      reporting these panics.
      
      Cc: stable@kernel.org
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      4a5fa359
  18. 15 Apr, 2011 2 commits
    • A
      kbuild: move KALLSYMS_EXTRA_PASS from Kconfig to Makefile · 1e2795a1
      Artem Bityutskiy committed
      At the moment we have the CONFIG_KALLSYMS_EXTRA_PASS Kconfig switch,
      which users can enable or disable while configuring the kernel. This
      option is then used by 'make' to determine whether an extra kallsyms
      pass is needed or not.
      
      However, this approach is awkward and confusing, so this patch moves
      CONFIG_KALLSYMS_EXTRA_PASS from Kconfig to Makefile instead. The
      rationale is below.
      
      1. CONFIG_KALLSYMS_EXTRA_PASS is really about the build time, not
         run-time. There is no real need for it to be in Kconfig. It is
         just an additional work-around which should be used only in rare
         cases, when someone breaks kallsyms, so Kbuild/Makefile is a much
         better place for this option.
      2. Grepping CONFIG_KALLSYMS_EXTRA_PASS shows that many defconfigs have
         it enabled, probably not because they try to work around a kallsyms
         bug, but just because the Kconfig help text is confusing and does
         not really make it clear that this option should not be used
         except when kallsyms is broken.
      3. And since many people have CONFIG_KALLSYMS_EXTRA_PASS enabled in
         their Kconfig, we might fail to notice kallsyms bugs in time. E.g.,
         many testers use "make allyesconfig" to test builds, which will enable
         CONFIG_KALLSYMS_EXTRA_PASS and kallsyms breakage will not be noticed.
      
      To address that, this patch:
      
      1. Kills CONFIG_KALLSYMS_EXTRA_PASS
      2. Changes Makefile so that people can use "make KALLSYMS_EXTRA_PASS=1"
         to enable the extra pass if needed. Additionally, they may define
         KALLSYMS_EXTRA_PASS as an environment variable.
      3. By default KALLSYMS_EXTRA_PASS is disabled and if kallsyms has issues,
         "make" should print a warning and suggest using KALLSYMS_EXTRA_PASS
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      [mmarek: Removed make help text, is not necessary]
      Signed-off-by: Michal Marek <mmarek@suse.cz>
      1e2795a1
    • A
      Kconfig: improve KALLSYMS_ALL documentation · 71a83ec7
      Artem Bityutskiy committed
      Dumb users like myself are not able to grasp from the existing KALLSYMS_ALL
      documentation that this option is not what they need. Improve the help
      message and make it clearer that KALLSYMS is enough in the majority of
      use cases, and KALLSYMS_ALL should really be used very rarely.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Signed-off-by: Michal Marek <mmarek@suse.cz>
      71a83ec7
  19. 14 Apr, 2011 1 commit
  20. 31 Mar, 2011 1 commit
  21. 24 Mar, 2011 2 commits
    • S
      userns: add a user_namespace as creator/owner of uts_namespace · 59607db3
      Serge E. Hallyn committed
      The expected course of development for user namespaces targeted
      capabilities is laid out at https://wiki.ubuntu.com/UserNamespace.
      
      Goals:
      
      - Make it safe for an unprivileged user to unshare namespaces.  They
        will be privileged with respect to the new namespace, but this should
        only include resources which the unprivileged user already owns.
      
      - Provide separate limits and accounting for userids in different
        namespaces.
      
      Status:
      
        Currently (as of 2.6.38) you can clone with the CLONE_NEWUSER flag to
        get a new user namespace if you have the CAP_SYS_ADMIN, CAP_SETUID, and
        CAP_SETGID capabilities.  What this gets you is a whole new set of
        userids, meaning that user 500 will have a different 'struct user' in
        your namespace than in other namespaces.  So any accounting information
        stored in struct user will be unique to your namespace.
      
        However, throughout the kernel there are checks which
      
        - simply check for a capability.  Since root in a child namespace
          has all capabilities, this means that a child namespace is not
          constrained.
      
        - simply compare uid1 == uid2.  Since these are the integer uids,
          uid 500 in namespace 1 will be said to be equal to uid 500 in
          namespace 2.
      
        As a result, the lxc implementation at lxc.sf.net does not use user
        namespaces.  This is actually helpful because it leaves us free to
        develop user namespaces in such a way that, for some time, user
        namespaces may not be useful.
      
      Bugs aside, this patchset is supposed to not at all affect systems which
      are not actively using user namespaces, and only restrict what tasks in
      child user namespace can do.  They begin to limit privilege to a user
      namespace, so that root in a container cannot kill or ptrace tasks in the
      parent user namespace, and can only get world access rights to files.
      Since all files currently belong to the initial user namespace, that means
      that child user namespaces can only get world access rights to *all*
      files.  While this temporarily makes user namespaces bad for system
      containers, it starts to get useful for some sandboxing.
      
      I've run the 'runltplite.sh' with and without this patchset and found no
      difference.
      
      This patch:
      
      copy_process() handles CLONE_NEWUSER before the rest of the namespaces.
      So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS) the new uts namespace
      will have the new user namespace as its owner.  That is what we want,
      since we want root in that new userns to be able to have privilege over
      it.
      
      Changelog:
      	Feb 15: don't set uts_ns->user_ns if we didn't create
      		a new uts_ns.
      	Feb 23: Move extern init_user_ns declaration from
      		init/version.c to utsname.h.
      Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      59607db3
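The `uid1 == uid2` problem described in 59607db3 can be made concrete with a short sketch. The types and function names here are hypothetical and heavily simplified; the kernel's own credential and namespace structures differ. The point is only that an integer uid is meaningful together with the namespace it belongs to.

```c
/* Hypothetical, simplified types; not the kernel's structures. */
struct user_ns_sketch {
	int id;
};

struct owner_sketch {
	const struct user_ns_sketch *ns;	/* namespace the uid lives in */
	unsigned int uid;			/* integer uid inside that namespace */
};

/* Naive check: treats uid 500 in one namespace as the same user as
 * uid 500 in any other namespace. */
static int same_owner_naive(const struct owner_sketch *a,
			    const struct owner_sketch *b)
{
	return a->uid == b->uid;
}

/* Namespace-aware check: both the namespace and the uid must match. */
static int same_owner_ns_aware(const struct owner_sketch *a,
			       const struct owner_sketch *b)
{
	return a->ns == b->ns && a->uid == b->uid;
}
```

For uid 500 in two different namespaces, the naive check reports a match while the namespace-aware check does not.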
    • E
      pid: remove the child_reaper special case in init/main.c · 45a68628
      Eric W. Biederman committed
      This patchset is a cleanup and a preparation to unshare the pid namespace.
      These prerequisites prepare for Eric's patchset to give a file descriptor
      to a namespace and join an existing namespace.
      
      This patch:
      
      It turns out that the existing assignment in copy_process of the
      child_reaper can handle the initial assignment of child_reaper; we just
      need to generalize the test in kernel/fork.c.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Serge E. Hallyn <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      45a68628
  22. 23 Mar, 2011 6 commits
    • D
      init: return proper error code in do_mounts_rd() · ea611b26
      Davidlohr Bueso committed
      In do_mounts_rd() if memory cannot be allocated, return -ENOMEM.
      Signed-off-by: Davidlohr Bueso <dave@gnu.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea611b26
    • P
      calibrate: retry with wider bounds when converge seems to fail · b1b5f65e
      Phil Carmody committed
      Systems with unmaskable interrupts such as SMIs may massively
      underestimate loops_per_jiffy, and fail to converge anywhere near the real
      value.  A case seen on x86_64 was an initial estimate of 256<<12, which
      converged to 511<<12 where the real value should have been over 630<<12.
      This admittedly requires bypassing the TSC calibration (lpj_fine), and a
      failure to settle in the direct calibration too, but is physically
      possible.  This failure does not depend on my previous calibration
      optimisation, but by luck is easy to fix with the optimisation in place
      with a trivial retry loop.
      
      In the context of the optimised converging method, as we can no longer
      trust the starting estimate, enlarge the search bounds exponentially so
      that the number of retries is logarithmically bounded.
      
      [akpm@linux-foundation.org: mention x86_64 SMIs in comment]
      Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Tested-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b1b5f65e
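The retry idea in b1b5f65e (when the binary chop runs into the top of its search range, search again with exponentially wider bounds) can be sketched against a simulated delay. Everything here is a model under stated assumptions: `overruns_jiffy()` replaces the real timing test, `true_lpj` and the 8-bit precision limit are made up, and the factor of four used to widen the range is an illustrative choice.

```c
/* Simulated hardware: a delay of lpj loops overruns one jiffy exactly
 * when lpj exceeds this "real" value (630<<12 is taken from the
 * example in the commit message). */
static unsigned long true_lpj = 630UL << 12;

static int overruns_jiffy(unsigned long lpj)
{
	return lpj > true_lpj;
}

/* One binary chop over [base, base + 2*width).  Sets *hit_top when no
 * step was ever rejected, i.e. the answer may lie above the range. */
static unsigned long chop(unsigned long base, unsigned long width,
			  int *hit_top)
{
	unsigned long lpj = base;
	unsigned long add = width;
	unsigned long limit = base >> 8;	/* assumed precision: 8 bits */
	int rejected = 0;

	while (add > limit) {
		lpj += add;
		if (overruns_jiffy(lpj)) {
			lpj -= add;
			rejected = 1;
		}
		add >>= 1;
	}
	*hit_top = !rejected;
	return lpj;
}

/* Repeat the chop, quadrupling the width each time the previous
 * attempt ran into the top of its range. */
static unsigned long calibrate_with_retry(unsigned long base,
					  unsigned long width, int *retries)
{
	int hit_top;

	if (width == 0)
		width = 1;	/* a zero width could never grow */
	*retries = 0;
	for (;;) {
		unsigned long lpj = chop(base, width, &hit_top);

		if (!hit_top)
			return lpj;
		base = lpj;
		width <<= 2;
		(*retries)++;
	}
}
```

Starting from the underestimate of 256<<12 quoted in the message, the first pass tops out near 511<<12 and a retry with the wider range is needed to get close to 630<<12. Because the width grows geometrically, the number of retries grows only with the logarithm of the initial error.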
    • P
      calibrate: home in on correct lpj value more quickly · 191e5688
      Phil Carmody committed
      Binary chop with a jiffy-resync on each step to find an upper bound is
      slow, so just race in a tight-ish loop to find an underestimate.
      
      If done with lots of individual steps, sometimes several hundreds of
      iterations would be required, which would impose a significant overhead,
      and make the initial estimate very low.  By taking slowly increasing steps
      there will be less overhead.
      
      E.g.  an x86_64 2.67GHz could have fitted in 613 individual small delays,
      but in reality should have been able to fit in a single delay 644 times
      longer, so underestimated by 31 steps.  To reach the equivalent of 644
      small delays with the accelerating scheme now requires about 130
      iterations, so has <1/4th of the overhead, and can therefore be expected
      to underestimate by only 7 steps.
      
      As we now have a better initial estimate we can binary chop over a smaller
      range.  With the loop overhead in the initial estimate kept low, and the
      step sizes moderate, we won't have under-estimated by much, so choose as
      tight a range as we can.
      Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Tested-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      191e5688
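The "slowly increasing steps" arithmetic in 191e5688 can be checked with a small counting sketch. It only counts iterations and does no timing. The specific growth rule used here (a delay of `band` units per iteration, with band b kept for 2^b iterations before moving on, for b >= 1) is an assumption chosen to be consistent with the figures quoted in the message, and the function name is hypothetical.

```c
/* Count the loop iterations needed before the accumulated delay
 * reaches `target` unit delays.  The total actually reached is
 * stored in *reached. */
static unsigned int iterations_accelerating(unsigned long target,
					    unsigned long *reached)
{
	unsigned long trials = 0;	/* accumulated unit delays */
	unsigned int iterations = 0;
	int band = 0;
	int trial_in_band = 0;

	while (trials < target) {
		if (++trial_in_band == (1 << band)) {
			++band;		/* move to the next, longer step */
			trial_in_band = 0;
		}
		trials += band;		/* one delay of `band` units */
		iterations++;
	}
	*reached = trials;
	return iterations;
}
```

For a target of 644 unit delays this scheme needs on the order of 130 iterations, under a quarter of the 644 that single-unit steps would take, which is in line with the overhead reduction described in the message.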
    • P
      calibrate: extract fall-back calculation into own helper · 71c696b1
      Phil Carmody committed
      The motivation for this patch series is that currently our OMAP calibrates
      itself using the trial-and-error binary chop fallback that some other
      architectures no longer need to perform.  This is a lengthy process,
      taking 0.2s in an environment where boot time is of great interest.
      
      Patch 2/4 has two optimisations.  Firstly, it replaces the initial
      repeated-doubling to find the relevant power of 2 with a tight loop that
      just does as much as it can in a jiffy.  Secondly, it doesn't binary chop
      over an entire power of 2 range, it chooses a much smaller range based on
      how much it squeezed in, and failed to squeeze in, during the first stage.
      Both are significant optimisations, and bring our calibration down from
      23 jiffies to 5, and, in the process, often arrive at a more accurate lpj
      value.
      
      The 'bands' and 'sub-logarithmic' growth may look over-engineered, but
      they only cost a small level of inaccuracy in the initial guess (for all
      architectures) in order to avoid the very large inaccuracies that appeared
      during testing (on x86_64 architectures, and presumably others with less
      metronomic operation).  Note that due to the existence of the TSC and
      other timers, the x86_64 will not typically use this fallback routine, but
      I wanted to code defensively, able to cope with all kinds of processor
      behaviours and kernel command line options.
      
      Patch 3/4 is an additional trap for the nightmare scenario where the
      initial estimate is very inaccurate, possibly due to things like SMIs.
      It simply retries with a larger bound.
      
      Stephen said:
      
      : I tried this patch set out on an MSM7630.
      :
      : Before:
      :
      : Calibrating delay loop... 681.57 BogoMIPS (lpj=3407872)
      :
      : After:
      :
      : Calibrating delay loop... 680.75 BogoMIPS (lpj=3403776)
      :
      : But the really good news is calibration time dropped from ~247ms to ~56ms.
      :  Sadly we won't be able to benefit from this should my udelay patches make
      : it into ARM because we would be using calibrate_delay_direct() instead (at
      : least on machines who choose to).  Can we somehow reapply the logic behind
      : this to calibrate_delay_direct()?  That would be even better, but this is
      : definitely a boot time improvement.
      :
      : Or maybe we could just replace calibrate_delay_direct() with this fallback
      : calculation?  If __delay() is a thin wrapper around read_current_timer()
      : it should work just as well (plus patch 3 makes it handle SMIs).  I'll try
      : that out.
      
      This patch:
      
      ... so that it can be modified more clinically.
      
      This is almost entirely cosmetic. The only change to the operation
      is that the global variable is only set once after the estimation is
      completed, rather than taking on all the intermediate values. However,
      there are no readers of that variable, so this change is unimportant.
      Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Tested-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      71c696b1
    • A
      smp: move smp setup functions to kernel/smp.c · 34db18a0
      Amerigo Wang committed
      Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
      setup functions from init/main.c to kernel/smp.c, saving some #ifdef
      CONFIG_SMP.
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Rakib Mullick <rakib.mullick@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34db18a0
    • M
      fs: use appropriate printk priority levels · 80cdc6da
      Mandeep Singh Baines committed
      printk()s without a priority level default to KERN_WARNING.  To reduce
      noise at KERN_WARNING, this patch sets the priority level appropriately
      for unleveled printk()s.  This should be useful to folks that look at
      dmesg warnings closely.
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      80cdc6da
  23. 15 Mar, 2011 1 commit
  24. 05 Mar, 2011 1 commit
    • A
      BKL: That's all, folks · 4ba8216c
      Arnd Bergmann committed
      This removes the implementation of the big kernel lock,
      at last. A lot of people have worked on this in the
      past, so the credit for this patch should be with
      everyone who participated in the hunt.
      
      The names on the Cc list are the people that were the
      most active in this, according to the recorded git
      history, in alphabetical order.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Alan Cox <alan@linux.intel.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Blunck <jblunck@infradead.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Oliver Neukum <oliver@neukum.org>
      Cc: Paul Menage <menage@google.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      4ba8216c
  25. 04 Mar, 2011 1 commit
  26. 16 Feb, 2011 1 commit
    • S
      perf: Add cgroup support · e5d1367f
      Stephane Eranian committed
      This kernel patch adds the ability to filter monitoring based on
      container groups (cgroups). This is for use in per-cpu mode only.
      
      The cgroup to monitor is passed as a file descriptor in the pid
      argument to the syscall. The file descriptor must be opened to
      the cgroup name in the cgroup filesystem. For instance, if the
      cgroup name is foo and cgroupfs is mounted in /cgroup, then the
      file descriptor is opened to /cgroup/foo. Cgroup mode is
      activated by passing PERF_FLAG_PID_CGROUP in the flags argument
      to the syscall.
      
      For instance to measure in cgroup foo on CPU1 assuming
      cgroupfs is mounted under /cgroup:
      
      struct perf_event_attr attr;
      int cgroup_fd, fd;
      
      cgroup_fd = open("/cgroup/foo", O_RDONLY);
      fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
      close(cgroup_fd);
      Signed-off-by: Stephane Eranian <eranian@google.com>
      [ added perf_cgroup_{exit,attach} ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e5d1367f