1. 28 9月, 2011 1 次提交
  2. 20 8月, 2011 1 次提交
    • C
      slub: per cpu cache for partial pages · 49e22585
      Christoph Lameter 提交于
      Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
      partial pages. The partial page list is used in slab_free() to avoid
      per node lock taking.
      
      In __slab_alloc() we can then take multiple partial pages off the per
      node partial list in one go reducing node lock pressure.
      
      We can also use the per cpu partial list in slab_alloc() to avoid scanning
      partial lists for pages with free objects.
      
      The main effect of a per cpu partial list is that the per node list_lock
      is taken for batches of partial pages instead of individual ones.
      
      Potential future enhancements:
      
      1. The pickup from the partial list could be perhaps be done without disabling
         interrupts with some work. The free path already puts the page into the
         per cpu partial list without disabling interrupts.
      
      2. __slab_free() may have some code paths that could use optimization.
      
      Performance:
      
      				Before		After
      ./hackbench 100 process 200000
      				Time: 1953.047	1564.614
      ./hackbench 100 process 20000
      				Time: 207.176   156.940
      ./hackbench 100 process 20000
      				Time: 204.468	156.940
      ./hackbench 100 process 20000
      				Time: 204.879	158.772
      ./hackbench 10 process 20000
      				Time: 20.153	15.853
      ./hackbench 10 process 20000
      				Time: 20.153	15.986
      ./hackbench 10 process 20000
      				Time: 19.363	16.111
      ./hackbench 1 process 20000
      				Time: 2.518	2.307
      ./hackbench 1 process 20000
      				Time: 2.258	2.339
      ./hackbench 1 process 20000
      				Time: 2.864	2.163
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      49e22585
  3. 14 8月, 2011 1 次提交
  4. 12 8月, 2011 1 次提交
    • V
      move RLIMIT_NPROC check from set_user() to do_execve_common() · 72fa5997
      Vasiliy Kulikov 提交于
      The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
      check in set_user() to check for NPROC exceeding via setuid() and
      similar functions.
      
      Before the check there was a possibility to greatly exceed the allowed
      number of processes by an unprivileged user if the program relied on
      rlimit only.  But the check created new security threat: many poorly
      written programs simply don't check setuid() return code and believe it
      cannot fail if executed with root privileges.  So, the check is removed
      in this patch because of too often privilege escalations related to
      buggy programs.
      
      The NPROC can still be enforced in the common code flow of daemons
      spawning user processes.  Most of daemons do fork()+setuid()+execve().
      The check introduced in execve() (1) enforces the same limit as in
      setuid() and (2) doesn't create similar security issues.
      
      Neil Brown suggested to track what specific process has exceeded the
      limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
      this process would fail on execve(), and other processes' execve()
      behaviour is not changed.
      
      Solar Designer suggested to re-check whether NPROC limit is still
      exceeded at the moment of execve().  If the process was sleeping for
      days between set*uid() and execve(), and the NPROC counter step down
      under the limit, the defered execve() failure because NPROC limit was
      exceeded days ago would be unexpected.  If the limit is not exceeded
      anymore, we clear the flag on successful calls to execve() and fork().
      
      The flag is also cleared on successful calls to set_user() as the limit
      was exceeded for the previous user, not the current one.
      
      Similar check was introduced in -ow patches (without the process flag).
      
      v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72fa5997
  5. 10 8月, 2011 1 次提交
  6. 09 8月, 2011 3 次提交
  7. 08 8月, 2011 2 次提交
  8. 07 8月, 2011 5 次提交
    • L
      vfs: optimize inode cache access patterns · 3ddcd056
      Linus Torvalds 提交于
      The inode structure layout is largely random, and some of the vfs paths
      really do care.  The path lookup in particular is already quite D$
      intensive, and profiles show that accessing the 'inode->i_op->xyz'
      fields is quite costly.
      
      We already optimized the dcache to not unnecessarily load the d_op
      structure for members that are often NULL using the DCACHE_OP_xyz bits
      in dentry->d_flags, and this does something very similar for the inode
      ops that are used during pathname lookup.
      
      It also re-orders the fields so that the fields accessed by 'stat' are
      together at the beginning of the inode structure, and roughly in the
      order accessed.
      
      The effect of this seems to be in the 1-2% range for an empty kernel
      "make -j" run (which is fairly kernel-intensive, mostly in filename
      lookup), so it's visible.  The numbers are fairly noisy, though, and
      likely depend a lot on exact microarchitecture.  So there's more tuning
      to be done.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ddcd056
    • L
      vfs: renumber DCACHE_xyz flags, remove some stale ones · 830c0f0e
      Linus Torvalds 提交于
      Gcc tends to generate better code with small integers, including the
      DCACHE_xyz flag tests - so move the common ones to be first in the list.
      Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
      DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
      tree.
      
      And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
      common case to be a nice straight-line fall-through.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      830c0f0e
    • D
      net: Compute protocol sequence numbers and fragment IDs using MD5. · 6e5714ea
      David S. Miller 提交于
      Computers have become a lot faster since we compromised on the
      partial MD4 hash which we use currently for performance reasons.
      
      MD5 is a much safer choice, and is inline with both RFC1948 and
      other ISS generators (OpenBSD, Solaris, etc.)
      
      Furthermore, only having 24-bits of the sequence number be truly
      unpredictable is a very serious limitation.  So the periodic
      regeneration and 8-bit counter have been removed.  We compute and
      use a full 32-bit sequence number.
      
      For ipv6, DCCP was found to use a 32-bit truncated initial sequence
      number (it needs 43-bits) and that is fixed here as well.
      Reported-by: NDan Kaminsky <dan@doxpara.com>
      Tested-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e5714ea
    • D
      crypto: Move md5_transform to lib/md5.c · bc0b96b5
      David S. Miller 提交于
      We are going to use this for TCP/IP sequence number and fragment ID
      generation.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc0b96b5
    • M
      lib/sha1: use the git implementation of SHA-1 · 1eb19a12
      Mandeep Singh Baines 提交于
      For ChromiumOS, we use SHA-1 to verify the integrity of the root
      filesystem.  The speed of the kernel sha-1 implementation has a major
      impact on our boot performance.
      
      To improve boot performance, we investigated using the heavily optimized
      sha-1 implementation used in git.  With the git sha-1 implementation, we
      see a 11.7% improvement in boot time.
      
      10 reboots, remove slowest/fastest.
      
      Before:
      
        Mean: 6.58 seconds Stdev: 0.14
      
      After (with git sha-1, this patch):
      
        Mean: 5.89 seconds Stdev: 0.07
      
      The other cool thing about the git SHA-1 implementation is that it only
      needs 64 bytes of stack for the workspace while the original kernel
      implementation needed 320 bytes.
      Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
      Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
      Cc: Nicolas Pitre <nico@cam.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: linux-crypto@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1eb19a12
  9. 06 8月, 2011 1 次提交
  10. 05 8月, 2011 1 次提交
  11. 04 8月, 2011 14 次提交
    • G
      Revert "dt: add of_alias_scan and of_alias_get_id" · fe55c184
      Grant Likely 提交于
      This reverts commit 750f463a.
      
      of_alias_* still needs work to be generalized for 'promtree' dt
      platforms, and to no implicitly create entries for available ids.
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      fe55c184
    • H
      tmpfs radix_tree: locate_item to speed up swapoff · e504f3fd
      Hugh Dickins 提交于
      We have already acknowledged that swapoff of a tmpfs file is slower than
      it was before conversion to the generic radix_tree: a little slower
      there will be acceptable, if the hotter paths are faster.
      
      But it was a shock to find swapoff of a 500MB file 20 times slower on my
      laptop, taking 10 minutes; and at that rate it significantly slows down
      my testing.
      
      Now, most of that turned out to be overhead from PROVE_LOCKING and
      PROVE_RCU: without those it was only 4 times slower than before; and
      more realistic tests on other machines don't fare as badly.
      
      I've tried a number of things to improve it, including tagging the swap
      entries, then doing lookup by tag: I'd expected that to halve the time,
      but in practice it's erratic, and often counter-productive.
      
      The only change I've so far found to make a consistent improvement, is
      to short-circuit the way we go back and forth, gang lookup packing
      entries into the array supplied, then shmem scanning that array for the
      target entry.  Scanning in place doubles the speed, so it's now only
      twice as slow as before (or three times slower when the PROVEs are on).
      
      So, add radix_tree_locate_item() as an expedient, once-off,
      single-caller hack to do the lookup directly in place.  #ifdef it on
      CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
      applicability as save space in other configurations.  And, sadly,
      #include sched.h for cond_resched().
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e504f3fd
    • H
      tmpfs: use kmemdup for short symlinks · 69f07ec9
      Hugh Dickins 提交于
      But we've not yet removed the old swp_entry_t i_direct[16] from
      shmem_inode_info.  That's because it was still being shared with the
      inline symlink.  Remove it now (saving 64 or 128 bytes from shmem inode
      size), and use kmemdup() for short symlinks, say, those up to 128 bytes.
      
      I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode()
      rather than shmem_evict_inode(), where we usually do such freeing? I
      guess it doesn't matter, and I'm not into NUMA mpol testing right now.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NPekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69f07ec9
    • H
      tmpfs: convert mem_cgroup shmem to radix-swap · aa3b1895
      Hugh Dickins 提交于
      Remove mem_cgroup_shmem_charge_fallback(): it was only required when we
      had to move swappage to filecache with GFP_NOWAIT.
      
      Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by
      moving its call out from shmem_add_to_page_cache() to two of thats three
      callers.  But leave it doing mem_cgroup_uncharge_cache_page() on error:
      although asymmetrical, it's easier for all 3 callers to handle.
      
      These two changes would also be appropriate if anyone were to start
      using shmem_read_mapping_page_gfp() with GFP_NOWAIT.
      
      Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test
      radix_tree_exceptional_entry() to get what it needs for itself.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aa3b1895
    • H
      tmpfs: miscellaneous trivial cleanups · 41ffe5d5
      Hugh Dickins 提交于
      While it's at its least, make a number of boring nitpicky cleanups to
      shmem.c, mostly for consistency of variable naming.  Things like "swap"
      instead of "entry", "pgoff_t index" instead of "unsigned long idx".
      
      And since everything else here is prefixed "shmem_", better change
      init_tmpfs() to shmem_init().
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41ffe5d5
    • H
      tmpfs: demolish old swap vector support · 285b2c4f
      Hugh Dickins 提交于
      The maximum size of a shmem/tmpfs file has been limited by the maximum
      size of its triple-indirect swap vector.  With 4kB page size, maximum
      filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
      that on a 64-bit kernel.  (With 8kB page size, maximum filesize was just
      over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel,
      MAX_LFS_FILESIZE being then more restrictive than swap vector layout.)
      
      It's a shame that tmpfs should be more restrictive than ramfs, and this
      limitation has now been noticed.  Add another level to the swap vector?
      No, it became obscure and hard to maintain, once I complicated it to
      make use of highmem pages nine years ago: better choose another way.
      
      Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
      tmpfs would never have invented its own peculiar radix tree: we would
      have fitted swap entries into the common radix tree instead, in much the
      same way as we fit swap entries into page tables.
      
      And why should each file have a separate radix tree for its pages and
      for its swap entries? The swap entries are required precisely where and
      when the pages are not.  We want to put them together in a single radix
      tree: which can then avoid much of the locking which was needed to
      prevent them from being exchanged underneath us.
      
      This also avoids the waste of memory devoted to swap vectors, first in
      the shmem_inode itself, then at least two more pages once a file grew
      beyond 16 data pages (pages accounted by df and du, but not by memcg).
      Allocated upfront, to avoid allocation when under swapping pressure, but
      pure waste when CONFIG_SWAP is not set - I have never spattered around
      the ifdefs to prevent that, preferring this move to sharing the common
      radix tree instead.
      
      There are three downsides to sharing the radix tree.  One, that it binds
      tmpfs more tightly to the rest of mm, either requiring knowledge of swap
      entries in radix tree there, or duplication of its code here in shmem.c.
      I believe that the simplications and memory savings (and probable higher
      performance, not yet measured) justify that.
      
      Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
      nodes that cannot be freed under memory pressure - whereas before it was
      the less precious highmem swap vector pages that could not be freed.
      I'm hoping that 64-bit has now been accessible for long enough, that the
      highmem argument has grown much less persuasive.
      
      Three, that swapoff is slower than it used to be on tmpfs files, since
      it's using a simple generic mechanism not tailored to it: I find this
      noticeable, and shall want to improve, but maybe nobody else will
      notice.
      
      So...  now remove most of the old swap vector code from shmem.c.  But,
      for the moment, keep the simple i_direct vector of 16 pages, with simple
      accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
      to help mark where swap needs to be handled in subsequent patches.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      285b2c4f
    • H
      mm: let swap use exceptional entries · a2c16d6c
      Hugh Dickins 提交于
      If swap entries are to be stored along with struct page pointers in a
      radix tree, they need to be distinguished as exceptional entries.
      
      Most of the handling of swap entries in radix tree will be contained in
      shmem.c, but a few functions in filemap.c's common code need to check
      for their appearance: find_get_page(), find_lock_page(),
      find_get_pages() and find_get_pages_contig().
      
      So as not to slow their fast paths, tuck those checks inside the
      existing checks for unlikely radix_tree_deref_slot(); except for
      find_lock_page(), where it is an added test.  And make it a BUG in
      find_get_pages_tag(), which is not applied to tmpfs files.
      
      A part of the reason for eliminating shmem_readpage() earlier, was to
      minimize the places where common code would need to allow for swap
      entries.
      
      The swp_entry_t known to swapfile.c must be massaged into a slightly
      different form when stored in the radix tree, just as it gets massaged
      into a pte_t when stored in page tables.
      
      In an i386 kernel this limits its information (type and page offset) to
      30 bits: given 32 "types" of swapfile and 4kB pagesize, that's a maximum
      swapfile size of 128GB.  Which is less than the 512GB we previously
      allowed with X86_PAE (where the swap entry can occupy the entire upper
      32 bits of a pte_t), but not a new limitation on 32-bit without PAE; and
      there's not a new limitation on 64-bit (where swap filesize is already
      limited to 16TB by a 32-bit page offset).  Thirty areas of 128GB is
      probably still enough swap for a 64GB 32-bit machine.
      
      Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and
      enforce filesize limit in read_swap_header(), just as for ptes.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a2c16d6c
    • H
      radix_tree: exceptional entries and indices · 6328650b
      Hugh Dickins 提交于
      A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its
      peculiar swap vector, instead keeping a file's swap entries in the same
      radix tree as its struct page pointers: thus saving memory, and
      simplifying its code and locking.
      
      This patch:
      
      The radix_tree is used by several subsystems for different purposes.  A
      major use is to store the struct page pointers of a file's pagecache for
      memory management.  But what if mm wanted to store something other than
      page pointers there too?
      
      The low bit of a radix_tree entry is already used to denote an indirect
      pointer, for internal use, and the unlikely radix_tree_deref_retry()
      case.
      
      Define the next bit as denoting an exceptional entry, and supply inline
      functions radix_tree_exception() to return non-0 in either unlikely
      case, and radix_tree_exceptional_entry() to return non-0 in the second
      case.
      
      If a subsystem already uses radix_tree with that bit set, no problem: it
      does not affect internal workings at all, but is defined for the
      convenience of those storing well-aligned pointers in the radix_tree.
      
      The radix_tree_gang_lookups have an implicit assumption that the caller
      can deduce the offset of each entry returned e.g.  by the page->index of
      a struct page.  But that may not be feasible for some kinds of item to
      be stored there.
      
      radix_tree_gang_lookup_slot() allow for an optional indices argument,
      output array in which to return those offsets.  The same could be added
      to other radix_tree_gang_lookups, but for now keep it to the only one
      for which we need it.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6328650b
    • A
      drivers/video/backlight/aat2870_bl.c: fix setting max_current · 5d6f921b
      Axel Lin 提交于
       - Current implementation tests wrong value for setting
         aat2870_bl->max_current.
      
       - In the current implementation, we cannot differentiate between 2 cases:
      
         a) if pdata->max_current is not set , or
      
         b) pdata->max_current is set to AAT2870_CURRENT_0_45 (which is also 0).
      
         Fix it by setting AAT2870_CURRENT_0_45 to be 1 and adjust the equation in
         aat2870_brightness() accordingly.
      Signed-off-by: NAxel Lin <axel.lin@gmail.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Tested-by: NJin Park <jinyoungp@nvidia.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d6f921b
    • J
      mm: page_alloc: increase __GFP_BITS_SHIFT to include __GFP_OTHER_NODE · 3dab1bce
      Johannes Weiner 提交于
      __GFP_OTHER_NODE is used for NUMA allocations on behalf of other nodes.
      It's supposed to be passed through from the page allocator to
      zone_statistics(), but it never gets there as gfp_allowed_mask is not
      wide enough and masks out the flag early in the allocation path.
      
      The result is an accounting glitch where successful NUMA allocations
      by-agent are not properly attributed as local.
      
      Increase __GFP_BITS_SHIFT so that it includes __GFP_OTHER_NODE.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3dab1bce
    • R
      ida: simplified functions for id allocation · 88eca020
      Rusty Russell 提交于
      The current hyper-optimized functions are overkill if you simply want to
      allocate an id for a device.  Create versions which use an internal
      lock.
      
      In followup patches, numerous drivers are converted to use this
      interface.
      
      Thanks to Tejun for feedback.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88eca020
    • A
      fault-injection: add ability to export fault_attr in arbitrary directory · dd48c085
      Akinobu Mita 提交于
      init_fault_attr_dentries() is used to export fault_attr via debugfs.
      But it can only export it in debugfs root directory.
      
      Per Forlin is working on mmc_fail_request which adds support to inject
      data errors after a completed host transfer in MMC subsystem.
      
      The fault_attr for mmc_fail_request should be defined per mmc host and
      export it in debugfs directory per mmc host like
      /sys/kernel/debug/mmc0/mmc_fail_request.
      
      init_fault_attr_dentries() doesn't help for mmc_fail_request.  So this
      introduces fault_create_debugfs_attr() which is able to create a
      directory in the arbitrary directory and replace
      init_fault_attr_dentries().
      
      [akpm@linux-foundation.org: extraneous semicolon, per Randy]
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Tested-by: NPer Forlin <per.forlin@linaro.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd48c085
    • L
      cpuidle: stop depending on pm_idle · a0bfa137
      Len Brown 提交于
      cpuidle users should call cpuidle_call_idle() directly
      rather than via (pm_idle)() function pointer.
      
      Architecture may choose to continue using (pm_idle)(),
      but cpuidle need not depend on it:
      
        my_arch_cpu_idle()
      	...
      	if(cpuidle_call_idle())
      		pm_idle();
      
      cc: Kevin Hilman <khilman@deeprootsystems.com>
      cc: Paul Mundt <lethal@linux-sh.org>
      cc: x86@kernel.org
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      a0bfa137
    • L
      cpuidle: replace xen access to x86 pm_idle and default_idle · d91ee586
      Len Brown 提交于
      When a Xen Dom0 kernel boots on a hypervisor, it gets access
      to the raw-hardware ACPI tables.  While it parses the idle tables
      for the hypervisor's beneift, it uses HLT for its own idle.
      
      Rather than have xen scribble on pm_idle and access default_idle,
      have it simply disable_cpuidle() so acpi_idle will not load and
      architecture default HLT will be used.
      
      cc: xen-devel@lists.xensource.com
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      d91ee586
  12. 03 8月, 2011 8 次提交
    • H
      HWPoison: add memory_failure_queue() · ea8f5fb8
      Huang Ying 提交于
      memory_failure() is the entry point for HWPoison memory error
      recovery.  It must be called in process context.  But commonly
      hardware memory errors are notified via MCE or NMI, so some delayed
      execution mechanism must be used.  In MCE handler, a work queue + ring
      buffer mechanism is used.
      
      In addition to MCE, now APEI (ACPI Platform Error Interface) GHES
      (Generic Hardware Error Source) can be used to report memory errors
      too.  To add support to APEI GHES memory recovery, a mechanism similar
      to that of MCE is implemented.  memory_failure_queue() is the new
      entry point that can be called in IRQ context.  The next step is to
      make MCE handler uses this interface too.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      ea8f5fb8
    • H
      lib, Make gen_pool memory allocator lockless · 7f184275
      Huang Ying 提交于
      This version of the gen_pool memory allocator supports lockless
      operation.
      
      This makes it safe to use in NMI handlers and other special
      unblockable contexts that could otherwise deadlock on locks.  This is
      implemented by using atomic operations and retries on any conflicts.
      The disadvantage is that there may be livelocks in extreme cases.  For
      better scalability, one gen_pool allocator can be used for each CPU.
      
      The lockless operation only works if there is enough memory available.
      If new memory is added to the pool a lock has to be still taken.  So
      any user relying on locklessness has to ensure that sufficient memory
      is preallocated.
      
      The basic atomic operation of this allocator is cmpxchg on long.  On
      architectures that don't have NMI-safe cmpxchg implementation, the
      allocator can NOT be used in NMI handler.  So code uses the allocator
      in NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      7f184275
    • H
      lib, Add lock-less NULL terminated single list · f49f23ab
      Huang Ying 提交于
      Cmpxchg is used to implement adding new entry to the list, deleting
      all entries from the list, deleting first entry of the list and some
      other operations.
      
      Because this is a single list, so the tail can not be accessed in O(1).
      
      If there are multiple producers and multiple consumers, llist_add can
      be used in producers and llist_del_all can be used in consumers.  They
      can work simultaneously without lock.  But llist_del_first can not be
      used here.  Because llist_del_first depends on list->first->next does
      not changed if list->first is not changed during its operation, but
      llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
      llist_add) sequence in another consumer may violate that.
      
      If there are multiple producers and one consumer, llist_add can be
      used in producers and llist_del_all or llist_del_first can be used in
      the consumer.
      
      This can be summarized as follow:
      
                 |   add    | del_first |  del_all
       add       |    -     |     -     |     -
       del_first |          |     L     |     L
       del_all   |          |           |     -
      
      Where "-" stands for no lock is needed, while "L" stands for lock is
      needed.
      
      The list entries deleted via llist_del_all can be traversed with
      traversing function such as llist_for_each etc.  But the list entries
      can not be traversed safely before deleted from the list.  The order
      of deleted entries is from the newest to the oldest added one.  If you
      want to traverse from the oldest to the newest, you must reverse the
      order by yourself before traversing.
      
      The basic atomic operation of this list is cmpxchg on long.  On
      architectures that don't have NMI-safe cmpxchg implementation, the
      list can NOT be used in NMI handler.  So code uses the list in NMI
      handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      f49f23ab
    • S
      dt: add of_alias_scan and of_alias_get_id · 750f463a
      Shawn Guo 提交于
      The patch adds function of_alias_scan to populate a global lookup
      table with the properties of 'aliases' node and function
      of_alias_get_id for drivers to find alias id from the lookup table.
      Signed-off-by: NShawn Guo <shawn.guo@linaro.org>
      [grant.likely: add locking and rework parse loop]
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      750f463a
    • A
    • A
      get rid of boilerplate switches in posix_acl.h · 951c0d66
      Al Viro 提交于
      the only potentially subtle thing here: get_cached_acl()
      is never called with the second argument other than
      ACL_TYPE_{ACCESS,DEFAULT}.  IOW, that return ERR_PTR(-EINVAL)
      in there might as well be BUG().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      951c0d66
    • D
      ACPI: remove NID_INVAL · f52e00c6
      David Rientjes 提交于
      b552a8c5 ("ACPI: remove NID_INVAL") removed the left over uses of
      NID_INVAL, but didn't actually remove the definition.  Remove it.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      f52e00c6
    • J
      thermal: make THERMAL_HWMON implementation fully internal · 31f5396a
      Jean Delvare 提交于
      THERMAL_HWMON is implemented inside the thermal_sys driver and has no
      effect on drivers implementing thermal zones, so they shouldn't see
      anything related to it in <linux/thermal.h>.  Making the THERMAL_HWMON
      implementation fully internal has two advantages beyond the cleaner
      design:
      
      * This avoids rebuilding all thermal drivers if the THERMAL_HWMON
        implementation changes, or if CONFIG_THERMAL_HWMON gets enabled or
        disabled.
      
      * This avoids breaking the thermal kABI in these cases too, which should
        make distributions happy.
      
      The only drawback I can see is slightly higher memory fragmentation, as
      the number of kzalloc() calls will increase by one per thermal zone.  But
      I doubt it will be a problem in practice, as I've never seen a system with
      more than two thermal zones.
      Signed-off-by: NJean Delvare <khali@linux-fr.org>
      Cc: Rene Herman <rene.herman@gmail.com>
      Acked-by: NGuenter Roeck <guenter.roeck@ericsson.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      31f5396a
  13. 02 8月, 2011 1 次提交