1. 26 Mar, 2013 3 commits
  2. 13 Mar, 2013 1 commit
    • rcu: Remove restrictions on no-CBs CPUs · 34ed6246
      Paul E. McKenney authored
      Currently, CPU 0 is constrained to not be a no-CBs CPU, and furthermore
      at least one no-CBs CPU must remain online at any given time.  These
      restrictions are problematic in some situations, such as cases where
      all CPUs must run a real-time workload that needs to be insulated from
      OS jitter and latencies due to RCU callback invocation.  This commit
      therefore provides no-CBs CPUs a (very crude and energy-inefficient)
      way to start and to wait for grace periods independently of the normal
      RCU callback mechanisms.  This approach allows any or all of the CPUs to
      be designated as no-CBs CPUs, and allows any proper subset of the CPUs
      (whether no-CBs CPUs or not) to be offlined.
      
      This commit also provides a fix for a locking bug spotted by Xie
      ChanglongX <changlongx.xie@intel.com>.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  3. 16 Feb, 2013 1 commit
  4. 13 Feb, 2013 8 commits
  5. 12 Feb, 2013 2 commits
  6. 08 Feb, 2013 3 commits
  7. 07 Feb, 2013 1 commit
  8. 31 Jan, 2013 1 commit
    • efi: Make 'efi_enabled' a function to query EFI facilities · 83e68189
      Matt Fleming authored
      Originally 'efi_enabled' indicated whether a kernel was booted from
      EFI firmware. Over time its semantics have changed, and it now
      indicates whether or not we are booted on an EFI machine with
      bit-native firmware, e.g. 64-bit kernel with 64-bit firmware.
      
      The immediate motivation for this patch is the bug report at,
      
          https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
      
      which details how running a platform driver on an EFI machine that is
      designed to run under BIOS can cause the machine to become
      bricked. Also, the following report,
      
          https://bugzilla.kernel.org/show_bug.cgi?id=47121
      
      details how running said driver can also cause Machine Check
      Exceptions. Drivers need a new means of detecting whether they're
      running on an EFI machine, as sadly the expression,
      
          if (!efi_enabled)
      
      hasn't been a sufficient condition for quite some time.
      
      Users actually want to query 'efi_enabled' for different reasons -
      what they really want access to is the list of available EFI
      facilities.
      
      For instance, the x86 reboot code needs to know whether it can invoke
      the ResetSystem() function provided by the EFI runtime services, while
      the ACPI OSL code wants to know whether the EFI config tables were
      mapped successfully. There are also checks in some of the platform
      driver code to simply see if they're running on an EFI machine (which
      would make it a bad idea to do BIOS-y things).
      
      This patch is a prereq for the samsung-laptop fix patch.
      
      Cc: David Airlie <airlied@linux.ie>
      Cc: Corentin Chary <corentincj@iksaif.net>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Steve Langasek <steve.langasek@canonical.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
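
The facility query that the efi commit above describes can be pictured as a bitmask with one bit per facility, which callers test individually instead of reading a single boolean. The following is a minimal userspace sketch under that assumption, not the kernel's implementation: the enum values, the `efi_facilities` variable, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical facility identifiers; each one names a bit position. */
enum efi_facility {
	FACILITY_BOOT,             /* machine was booted via EFI firmware */
	FACILITY_CONFIG_TABLES,    /* EFI config tables were mapped successfully */
	FACILITY_RUNTIME_SERVICES, /* runtime services such as ResetSystem() are usable */
	FACILITY_COUNT
};

/* Bitmask of facilities found usable during boot (hypothetical storage). */
static unsigned long efi_facilities;

/* Record that a facility is available. */
static void efi_facility_set(enum efi_facility f)
{
	assert(f < FACILITY_COUNT);
	efi_facilities |= 1UL << f;
}

/* Query a single facility rather than a global "EFI enabled" flag. */
static bool efi_facility_enabled(enum efi_facility f)
{
	assert(f < FACILITY_COUNT);
	return (efi_facilities & (1UL << f)) != 0;
}

/* A reboot path asks only about the facility it actually needs. */
static const char *pick_reboot_method(void)
{
	return efi_facility_enabled(FACILITY_RUNTIME_SERVICES) ? "efi" : "bios";
}
```

With this shape, a platform driver that must avoid BIOS-style accesses would test only the boot facility, while the reboot code would test only the runtime-services facility, so the two checks can no longer be confused.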
  9. 29 Jan, 2013 2 commits
  10. 28 Jan, 2013 1 commit
    • cputime: Generic on-demand virtual cputime accounting · abf917cd
      Frederic Weisbecker authored
      If we want to stop the tick beyond idle, we need to be
      able to account the cputime without using the tick.
      
      Virtual based cputime accounting solves that problem by
      hooking into kernel/user boundaries.
      
      However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
      low-level hooks and involves more overhead. But we already
      have a generic context tracking subsystem that is required
      for RCU needs by archs which plan to shut down the tick
      outside idle.
      
      This patch implements a generic virtual based cputime
      accounting that relies on these generic kernel/user hooks.
      
      There are some upsides of doing this:
      
      - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
      if context tracking is already built (already necessary for RCU in full
      tickless mode).
      
      - We can rely on the generic context tracking subsystem to dynamically
      (de)activate the hooks, so that we can switch anytime between virtual
      and tick based accounting. This way we don't have the overhead
      of the virtual accounting when the tick is running periodically.
      
      And one downside:
      
      - There is probably more overhead than a native virtual based cputime
      accounting. But this relies on hooks that are already set anyway.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
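
The idea behind the cputime commit above, accounting time at kernel/user boundaries instead of on the tick, can be sketched as follows. This is a simplified model with hypothetical names (`vtime_sketch`, `vtime_user_enter`, `vtime_user_exit`) and caller-supplied timestamps; it is not the kernel's code and ignores idle, guest, and interrupt time.

```c
#include <assert.h>
#include <stdint.h>

enum ctx_state { CTX_KERNEL, CTX_USER };

struct vtime_sketch {
	enum ctx_state state; /* where the CPU is currently executing */
	uint64_t last;        /* timestamp of the most recent boundary crossing */
	uint64_t utime;       /* accumulated user time */
	uint64_t stime;       /* accumulated kernel (system) time */
};

static void vtime_init(struct vtime_sketch *v, uint64_t now)
{
	v->state = CTX_KERNEL;
	v->last = now;
	v->utime = 0;
	v->stime = 0;
}

/* Hook for the kernel -> user transition: the elapsed span was kernel time. */
static void vtime_user_enter(struct vtime_sketch *v, uint64_t now)
{
	assert(v->state == CTX_KERNEL);
	v->stime += now - v->last;
	v->last = now;
	v->state = CTX_USER;
}

/* Hook for the user -> kernel transition: the elapsed span was user time. */
static void vtime_user_exit(struct vtime_sketch *v, uint64_t now)
{
	assert(v->state == CTX_USER);
	v->utime += now - v->last;
	v->last = now;
	v->state = CTX_KERNEL;
}
```

Starting at time 0, entering user mode at 30 and leaving it at 100 yields 30 units of system time and 70 units of user time, with no periodic tick involved.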
  11. 27 Jan, 2013 1 commit
  12. 25 Jan, 2013 2 commits
  13. 20 Jan, 2013 1 commit
  14. 19 Jan, 2013 1 commit
    • init, block: try to load default elevator module early during boot · bb813f4c
      Tejun Heo authored
      This patch adds default module loading and uses it to load the default
      block elevator.  During boot, it's called right after initramfs or
      initrd is made available and right before control is passed to
      userland.  This ensures that as long as the modules are available in
      the usual places in initramfs, initrd or the root filesystem, the
      default modules are loaded as soon as possible.
      
      This will replace the on-demand elevator module loading from elevator
      init path.
      
      v2: Fixed build breakage when !CONFIG_BLOCK.  Reported by kbuild test
          robot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alex Riesen <raa.lkml@gmail.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
  15. 17 Jan, 2013 1 commit
    • Tell the world we gave up on pushing CC_OPTIMIZE_FOR_SIZE · 3a55fb0d
      Kirill Smelkov authored
      In commit 281dc5c5 ("Give up on pushing CC_OPTIMIZE_FOR_SIZE") we
      already changed the actual default value, but the help-text still
      suggested 'y'. Fix the help text too, for all the same reasons.
      
      Sadly, -Os keeps on generating some very suboptimal code for certain
      cases, to the point where any I$ miss upside is swamped by the downside.
      The main ones are:
      
       - using "rep movsb" for memcpy, even on CPUs where that is
         horrendously bad for performance.
      
       - not honoring branch prediction information, so any I$ footprint you
         win from smaller code, you lose from less code density in the I$.
      
       - using divide instructions when that is very expensive.
      Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 12 Jan, 2013 2 commits
    • init: remove depends on CONFIG_EXPERIMENTAL · 19c92399
      Kees Cook authored
      The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
      while now and is almost always enabled by default. As agreed during the
      Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
      
      CC: "Eric W. Biederman" <ebiederm@xmission.com>
      CC: Serge Hallyn <serge.hallyn@canonical.com>
      CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
    • make CONFIG_EXPERIMENTAL invisible and default · 5a958db3
      Kees Cook authored
      This config item has not carried much meaning for a while now and is
      almost always enabled by default (especially in distro builds). As agreed
      during the Linux kernel summit, it should be removed. As a first step,
      remove it from being listed, and default it to on. Once it has been
      removed from all subsystem Kconfigs, it will be dropped entirely.
      
      For items that really are experimental, maintainers should use "default
      n", optionally include "(EXPERIMENTAL)" in the title, and add language to
      the help text indicating why the item should be considered experimental.
      
      For items that are dangerously experimental, the maintainer is encouraged
      to follow the above title recommendation, add stronger language to the
      help text, and optionally use (depending on the extent of the danger,
      from least to most dangerous): printk(), add_taint(TAINT_WARN),
      add_taint(TAINT_CRAP), WARN_ON(1), and CONFIG_BROKEN.
      
      CC: Greg KH <gregkh@linuxfoundation.org>
      CC: "Eric W. Biederman" <ebiederm@xmission.com>
      CC: Serge Hallyn <serge.hallyn@canonical.com>
      CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  17. 10 Jan, 2013 1 commit
  18. 26 Dec, 2012 1 commit
    • Ensure that kernel_init_freeable() is not inlined into non __init code · f80b0c90
      Vineet Gupta authored
      Commit d6b21238 "make sure that we always have a return path from
      kernel_execve()" reshuffled kernel_init()/init_post() to ensure that
      kernel_execve() has a caller to return to.
      
      It removed the __init annotation from kernel_init() and introduced a
      new routine, kernel_init_freeable(), which it calls. The latter, however,
      is inlined by any reasonable compiler (ARC gcc 4.4 in this case), causing
      slight code bloat.
      
      This patch marks kernel_init_freeable() as noinline, reducing the .text size:
      
      bloat-o-meter vmlinux vmlinux_new
      add/remove: 1/0 grow/shrink: 0/1 up/down: 374/-334 (40)
      function                        old     new   delta
      kernel_init_freeable              -     374    +374 (.init.text)
      kernel_init                     628     294    -334 (.text)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
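
The effect that the kernel_init_freeable() commit above relies on can be illustrated with the GCC/Clang `noinline` function attribute. In this sketch, `sketch_noinline`, `setup_once`, and `long_lived_entry` are hypothetical stand-ins; the kernel's own annotations and section placement are not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-in for a "do not inline" annotation (GCC/Clang attribute). */
#define sketch_noinline __attribute__((noinline))

static int init_done;

/*
 * One-time setup code. Without the attribute, a compiler is free to fold this
 * body into its only caller, so the setup code would end up inside the
 * long-lived function instead of staying a separate, separately placeable
 * function.
 */
static sketch_noinline void setup_once(void)
{
	init_done = 1;
}

/* Long-lived entry point that calls the one-time setup exactly once. */
static int long_lived_entry(void)
{
	setup_once();
	assert(init_done == 1);
	return init_done;
}
```

Keeping the setup routine out of line is what lets it be emitted as its own symbol, which matches the bloat-o-meter output in the commit: the caller shrinks and the separate function appears in the init section.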
  19. 20 Dec, 2012 1 commit
  20. 19 Dec, 2012 2 commits
    • memcg: infrastructure to match an allocation to the right cache · d7f25f8a
      Glauber Costa authored
      The page allocator is able to bind a page to a memcg when it is
      allocated.  But for the caches, we'd like to have as many objects as
      possible in a page belonging to the same cache.
      
      This is done in this patch by calling memcg_kmem_get_cache at the
      beginning of every allocation function.  This function is patched out by
      static branches when the kernel memory controller is not being used.
      
      It assumes that the task allocating, which determines the memcg in the
      page allocator, belongs to the same cgroup throughout the whole process.
      Misaccounting can happen if the task calls memcg_kmem_get_cache() while
      belonging to a cgroup, and later on changes.  This is considered
      acceptable, and should only happen upon task migration.
      
      Before the cache is created by the memcg core, there is also a possible
      imbalance: the task belongs to a memcg, but the cache being allocated from
      is the global cache, since the child cache is not yet guaranteed to be
      ready.  This case is also fine, since in this case the GFP_KMEMCG will not
      be passed and the page allocator will not attempt any cgroup accounting.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: kmem accounting basic infrastructure · 510fc4e1
      Glauber Costa authored
      Add the basic infrastructure for the accounting of kernel memory.  To
      control that, the following files are created:
      
       * memory.kmem.usage_in_bytes
       * memory.kmem.limit_in_bytes
       * memory.kmem.failcnt
       * memory.kmem.max_usage_in_bytes
      
      They have the same meaning as their user memory counterparts.  They
      reflect the state of the "kmem" res_counter.
      
      Per cgroup kmem memory accounting is not enabled until a limit is set for
      the group.  Once the limit is set the accounting cannot be disabled for
      that group.  This means that after the patch is applied, no behavioral
      change exists for whoever is still using memcg to control their memory
      usage, until memory.kmem.limit_in_bytes is set for the first time.
      
      We always account to both user and kernel resource_counters.  This
      effectively means that an independent kernel limit is in place when the
      limit is set to a lower value than the user memory.  An equal or higher
      value means that the user limit will always hit first, meaning that kmem
      is effectively unlimited.
      
      People who want to track kernel memory but not limit it can set this
      limit to a very high number (like RESOURCE_MAX - 1 page, which no one
      will ever hit, or equal to the user memory).
      
      [akpm@linux-foundation.org: MEMCG_KMEM only works with slab and slub]
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
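
The interaction between the two limits in the kmem accounting commit above can be modeled with a pair of counters that are both charged for every kernel allocation. This is an illustrative sketch only: `counter_sketch`, `memcg_sketch`, and `charge_kmem` are hypothetical names, and the model assumes the commit's statement that kernel memory is always accounted to both the user and the kernel counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified resource counter: usage, limit, and a count of failed charges. */
struct counter_sketch {
	uint64_t usage;
	uint64_t limit;
	uint64_t failcnt;
};

/* Try to charge n bytes; record a failure if the limit would be exceeded. */
static bool counter_charge(struct counter_sketch *c, uint64_t n)
{
	assert(c->usage <= c->limit);
	if (n > c->limit - c->usage) {
		c->failcnt++;
		return false;
	}
	c->usage += n;
	return true;
}

static void counter_uncharge(struct counter_sketch *c, uint64_t n)
{
	assert(n <= c->usage);
	c->usage -= n;
}

/* A group with a user memory counter and a kernel memory counter. */
struct memcg_sketch {
	struct counter_sketch res;  /* user memory counter */
	struct counter_sketch kmem; /* kernel memory counter */
};

/* Kernel memory is charged to both counters; either limit can reject it. */
static bool charge_kmem(struct memcg_sketch *m, uint64_t n)
{
	if (!counter_charge(&m->kmem, n))
		return false;
	if (!counter_charge(&m->res, n)) {
		counter_uncharge(&m->kmem, n); /* roll back the partial charge */
		return false;
	}
	return true;
}
```

With a user limit of 100 and a kmem limit of 40, the kmem counter rejects charges first, so the kernel limit is independent. With a kmem limit of 200 against the same user limit, the user counter always rejects first, which is why such a setting leaves kernel memory effectively unlimited while still tracking it.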
  21. 16 Dec, 2012 1 commit
  22. 13 Dec, 2012 1 commit
  23. 11 Dec, 2012 2 commits
    • mm: sched: numa: Control enabling and disabling of NUMA balancing · 1a687c2e
      Mel Gorman authored
      This patch adds Kconfig options and kernel parameters to allow the
      enabling and disabling of automatic NUMA balancing. The existence
      of such a switch was and is very important when debugging problems
      related to transparent hugepages and we should have the same for
      automatic NUMA placement.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
    • mm: numa: pte_numa() and pmd_numa() · be3a7284
      Andrea Arcangeli authored
      Implement pte_numa and pmd_numa.
      
      We must atomically set the numa bit and clear the present bit to
      define a pte_numa or pmd_numa.
      
      Once a pte or pmd has been set as pte_numa or pmd_numa, the next time
      a thread touches a virtual address in the corresponding virtual range,
      a NUMA hinting page fault will trigger. The NUMA hinting page fault
      will clear the NUMA bit and set the present bit again to resolve the
      page fault.
      
      The expectation is that a NUMA hinting page fault is used as part
      of a placement policy that decides if a page should remain on the
      current node or be migrated to a different node.
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
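
The bit manipulation that the pte_numa() commit above describes can be sketched on a plain 64-bit value. The bit positions and the `sk_` names below are hypothetical and do not correspond to any architecture's real page table layout. Each helper computes the new entry in one expression, so it can be written back with a single store; real atomicity against concurrent hardware updates is outside the scope of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_sketch_t;

/* Hypothetical bit positions, chosen only for illustration. */
#define SK_PRESENT (1ULL << 0)
#define SK_NUMA    (1ULL << 1)

/* An entry is a NUMA hinting entry when the NUMA bit is set and present is clear. */
static bool sk_pte_numa(pte_sketch_t pte)
{
	return (pte & (SK_NUMA | SK_PRESENT)) == SK_NUMA;
}

/* Set the NUMA bit and clear the present bit in one step. */
static pte_sketch_t sk_pte_mknuma(pte_sketch_t pte)
{
	assert(pte & SK_PRESENT); /* sketch assumption: only present entries are marked */
	return (pte | SK_NUMA) & ~SK_PRESENT;
}

/* What the hinting fault does: clear the NUMA bit and set present again. */
static pte_sketch_t sk_pte_mknonnuma(pte_sketch_t pte)
{
	return (pte & ~SK_NUMA) | SK_PRESENT;
}
```

Because the present bit is clear while the NUMA bit is set, the next access to the mapped range faults, and resolving that fault restores the original entry unchanged.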