1. 04 8月, 2011 11 次提交
    • H
      tmpfs radix_tree: locate_item to speed up swapoff · e504f3fd
      Hugh Dickins 提交于
      We have already acknowledged that swapoff of a tmpfs file is slower than
      it was before conversion to the generic radix_tree: a little slower
      there will be acceptable, if the hotter paths are faster.
      
      But it was a shock to find swapoff of a 500MB file 20 times slower on my
      laptop, taking 10 minutes; and at that rate it significantly slows down
      my testing.
      
      Now, most of that turned out to be overhead from PROVE_LOCKING and
      PROVE_RCU: without those it was only 4 times slower than before; and
      more realistic tests on other machines don't fare as badly.
      
      I've tried a number of things to improve it, including tagging the swap
      entries, then doing lookup by tag: I'd expected that to halve the time,
      but in practice it's erratic, and often counter-productive.
      
      The only change I've so far found to make a consistent improvement, is
      to short-circuit the way we go back and forth, gang lookup packing
      entries into the array supplied, then shmem scanning that array for the
      target entry.  Scanning in place doubles the speed, so it's now only
      twice as slow as before (or three times slower when the PROVEs are on).
      
      So, add radix_tree_locate_item() as an expedient, once-off,
      single-caller hack to do the lookup directly in place.  #ifdef it on
      CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
      applicability as save space in other configurations.  And, sadly,
      #include sched.h for cond_resched().
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e504f3fd
    • H
      tmpfs: use kmemdup for short symlinks · 69f07ec9
      Hugh Dickins 提交于
      But we've not yet removed the old swp_entry_t i_direct[16] from
      shmem_inode_info.  That's because it was still being shared with the
      inline symlink.  Remove it now (saving 64 or 128 bytes from shmem inode
      size), and use kmemdup() for short symlinks, say, those up to 128 bytes.
      
      I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode()
      rather than shmem_evict_inode(), where we usually do such freeing? I
      guess it doesn't matter, and I'm not into NUMA mpol testing right now.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NPekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69f07ec9
    • H
      tmpfs: convert mem_cgroup shmem to radix-swap · aa3b1895
      Hugh Dickins 提交于
      Remove mem_cgroup_shmem_charge_fallback(): it was only required when we
      had to move swappage to filecache with GFP_NOWAIT.
      
      Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by
      moving its call out from shmem_add_to_page_cache() to two of thats three
      callers.  But leave it doing mem_cgroup_uncharge_cache_page() on error:
      although asymmetrical, it's easier for all 3 callers to handle.
      
      These two changes would also be appropriate if anyone were to start
      using shmem_read_mapping_page_gfp() with GFP_NOWAIT.
      
      Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test
      radix_tree_exceptional_entry() to get what it needs for itself.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aa3b1895
    • H
      tmpfs: miscellaneous trivial cleanups · 41ffe5d5
      Hugh Dickins 提交于
      While it's at its least, make a number of boring nitpicky cleanups to
      shmem.c, mostly for consistency of variable naming.  Things like "swap"
      instead of "entry", "pgoff_t index" instead of "unsigned long idx".
      
      And since everything else here is prefixed "shmem_", better change
      init_tmpfs() to shmem_init().
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      41ffe5d5
    • H
      tmpfs: demolish old swap vector support · 285b2c4f
      Hugh Dickins 提交于
      The maximum size of a shmem/tmpfs file has been limited by the maximum
      size of its triple-indirect swap vector.  With 4kB page size, maximum
      filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
      that on a 64-bit kernel.  (With 8kB page size, maximum filesize was just
      over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel,
      MAX_LFS_FILESIZE being then more restrictive than swap vector layout.)
      
      It's a shame that tmpfs should be more restrictive than ramfs, and this
      limitation has now been noticed.  Add another level to the swap vector?
      No, it became obscure and hard to maintain, once I complicated it to
      make use of highmem pages nine years ago: better choose another way.
      
      Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
      tmpfs would never have invented its own peculiar radix tree: we would
      have fitted swap entries into the common radix tree instead, in much the
      same way as we fit swap entries into page tables.
      
      And why should each file have a separate radix tree for its pages and
      for its swap entries? The swap entries are required precisely where and
      when the pages are not.  We want to put them together in a single radix
      tree: which can then avoid much of the locking which was needed to
      prevent them from being exchanged underneath us.
      
      This also avoids the waste of memory devoted to swap vectors, first in
      the shmem_inode itself, then at least two more pages once a file grew
      beyond 16 data pages (pages accounted by df and du, but not by memcg).
      Allocated upfront, to avoid allocation when under swapping pressure, but
      pure waste when CONFIG_SWAP is not set - I have never spattered around
      the ifdefs to prevent that, preferring this move to sharing the common
      radix tree instead.
      
      There are three downsides to sharing the radix tree.  One, that it binds
      tmpfs more tightly to the rest of mm, either requiring knowledge of swap
      entries in radix tree there, or duplication of its code here in shmem.c.
      I believe that the simplications and memory savings (and probable higher
      performance, not yet measured) justify that.
      
      Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
      nodes that cannot be freed under memory pressure - whereas before it was
      the less precious highmem swap vector pages that could not be freed.
      I'm hoping that 64-bit has now been accessible for long enough, that the
      highmem argument has grown much less persuasive.
      
      Three, that swapoff is slower than it used to be on tmpfs files, since
      it's using a simple generic mechanism not tailored to it: I find this
      noticeable, and shall want to improve, but maybe nobody else will
      notice.
      
      So...  now remove most of the old swap vector code from shmem.c.  But,
      for the moment, keep the simple i_direct vector of 16 pages, with simple
      accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
      to help mark where swap needs to be handled in subsequent patches.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      285b2c4f
    • H
      mm: let swap use exceptional entries · a2c16d6c
      Hugh Dickins 提交于
      If swap entries are to be stored along with struct page pointers in a
      radix tree, they need to be distinguished as exceptional entries.
      
      Most of the handling of swap entries in radix tree will be contained in
      shmem.c, but a few functions in filemap.c's common code need to check
      for their appearance: find_get_page(), find_lock_page(),
      find_get_pages() and find_get_pages_contig().
      
      So as not to slow their fast paths, tuck those checks inside the
      existing checks for unlikely radix_tree_deref_slot(); except for
      find_lock_page(), where it is an added test.  And make it a BUG in
      find_get_pages_tag(), which is not applied to tmpfs files.
      
      A part of the reason for eliminating shmem_readpage() earlier, was to
      minimize the places where common code would need to allow for swap
      entries.
      
      The swp_entry_t known to swapfile.c must be massaged into a slightly
      different form when stored in the radix tree, just as it gets massaged
      into a pte_t when stored in page tables.
      
      In an i386 kernel this limits its information (type and page offset) to
      30 bits: given 32 "types" of swapfile and 4kB pagesize, that's a maximum
      swapfile size of 128GB.  Which is less than the 512GB we previously
      allowed with X86_PAE (where the swap entry can occupy the entire upper
      32 bits of a pte_t), but not a new limitation on 32-bit without PAE; and
      there's not a new limitation on 64-bit (where swap filesize is already
      limited to 16TB by a 32-bit page offset).  Thirty areas of 128GB is
      probably still enough swap for a 64GB 32-bit machine.
      
      Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and
      enforce filesize limit in read_swap_header(), just as for ptes.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a2c16d6c
    • H
      radix_tree: exceptional entries and indices · 6328650b
      Hugh Dickins 提交于
      A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its
      peculiar swap vector, instead keeping a file's swap entries in the same
      radix tree as its struct page pointers: thus saving memory, and
      simplifying its code and locking.
      
      This patch:
      
      The radix_tree is used by several subsystems for different purposes.  A
      major use is to store the struct page pointers of a file's pagecache for
      memory management.  But what if mm wanted to store something other than
      page pointers there too?
      
      The low bit of a radix_tree entry is already used to denote an indirect
      pointer, for internal use, and the unlikely radix_tree_deref_retry()
      case.
      
      Define the next bit as denoting an exceptional entry, and supply inline
      functions radix_tree_exception() to return non-0 in either unlikely
      case, and radix_tree_exceptional_entry() to return non-0 in the second
      case.
      
      If a subsystem already uses radix_tree with that bit set, no problem: it
      does not affect internal workings at all, but is defined for the
      convenience of those storing well-aligned pointers in the radix_tree.
      
      The radix_tree_gang_lookups have an implicit assumption that the caller
      can deduce the offset of each entry returned e.g.  by the page->index of
      a struct page.  But that may not be feasible for some kinds of item to
      be stored there.
      
      radix_tree_gang_lookup_slot() allow for an optional indices argument,
      output array in which to return those offsets.  The same could be added
      to other radix_tree_gang_lookups, but for now keep it to the only one
      for which we need it.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6328650b
    • A
      drivers/video/backlight/aat2870_bl.c: fix setting max_current · 5d6f921b
      Axel Lin 提交于
       - Current implementation tests wrong value for setting
         aat2870_bl->max_current.
      
       - In the current implementation, we cannot differentiate between 2 cases:
      
         a) if pdata->max_current is not set , or
      
         b) pdata->max_current is set to AAT2870_CURRENT_0_45 (which is also 0).
      
         Fix it by setting AAT2870_CURRENT_0_45 to be 1 and adjust the equation in
         aat2870_brightness() accordingly.
      Signed-off-by: NAxel Lin <axel.lin@gmail.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Tested-by: NJin Park <jinyoungp@nvidia.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d6f921b
    • J
      mm: page_alloc: increase __GFP_BITS_SHIFT to include __GFP_OTHER_NODE · 3dab1bce
      Johannes Weiner 提交于
      __GFP_OTHER_NODE is used for NUMA allocations on behalf of other nodes.
      It's supposed to be passed through from the page allocator to
      zone_statistics(), but it never gets there as gfp_allowed_mask is not
      wide enough and masks out the flag early in the allocation path.
      
      The result is an accounting glitch where successful NUMA allocations
      by-agent are not properly attributed as local.
      
      Increase __GFP_BITS_SHIFT so that it includes __GFP_OTHER_NODE.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3dab1bce
    • R
      ida: simplified functions for id allocation · 88eca020
      Rusty Russell 提交于
      The current hyper-optimized functions are overkill if you simply want to
      allocate an id for a device.  Create versions which use an internal
      lock.
      
      In followup patches, numerous drivers are converted to use this
      interface.
      
      Thanks to Tejun for feedback.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88eca020
    • A
      fault-injection: add ability to export fault_attr in arbitrary directory · dd48c085
      Akinobu Mita 提交于
      init_fault_attr_dentries() is used to export fault_attr via debugfs.
      But it can only export it in debugfs root directory.
      
      Per Forlin is working on mmc_fail_request which adds support to inject
      data errors after a completed host transfer in MMC subsystem.
      
      The fault_attr for mmc_fail_request should be defined per mmc host and
      export it in debugfs directory per mmc host like
      /sys/kernel/debug/mmc0/mmc_fail_request.
      
      init_fault_attr_dentries() doesn't help for mmc_fail_request.  So this
      introduces fault_create_debugfs_attr() which is able to create a
      directory in the arbitrary directory and replace
      init_fault_attr_dentries().
      
      [akpm@linux-foundation.org: extraneous semicolon, per Randy]
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Tested-by: NPer Forlin <per.forlin@linaro.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd48c085
  2. 03 8月, 2011 3 次提交
  3. 02 8月, 2011 4 次提交
  4. 01 8月, 2011 22 次提交