1. 22 9月, 2009 5 次提交
  2. 21 9月, 2009 3 次提交
    • J
      writeback: make balance_dirty_pages() gradually back more off · 87c6a9b2
      Jens Axboe 提交于
      Currently it just sleeps for a very short time, just 1 jiffy. If
      we keep looping in there, continually delay for a little longer
      of up to 100msec in total. That was the old limit for congestion
      wait.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      87c6a9b2
    • J
      writeback: don't use schedule_timeout() without setting runstate · 3542a5c0
      Jens Axboe 提交于
      Just use schedule_timeout_interruptible(), saves a call to
      set_current_state().
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      3542a5c0
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  3. 19 9月, 2009 2 次提交
  4. 16 9月, 2009 6 次提交
    • J
      writeback: splice dirty inode entries to default bdi on bdi_destroy() · ce5f8e77
      Jens Axboe 提交于
      We cannot safely ensure that the inodes are all gone at this point
      in time, and we must not destroy this bdi with inodes having off it.
      So just splice our entries to the default bdi since that one will
      always persist.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      ce5f8e77
    • J
      writeback: separate starting of sync vs opportunistic writeback · b6e51316
      Jens Axboe 提交于
      bdi_start_writeback() is currently split into two paths, one for
      WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback()
      for WB_SYNC_ALL writeback and let bdi_start_writeback() handle
      only WB_SYNC_NONE.
      
      Push down the writeback_control allocation and only accept the
      parameters that make sense for each function. This cleans up
      the API considerably.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      b6e51316
    • J
      writeback: use RCU to protect bdi_list · cfc4ba53
      Jens Axboe 提交于
      Now that bdi_writeback_all() no longer handles integrity writeback,
      it doesn't have to block anymore. This means that we can switch
      bdi_list reader side protection to RCU.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      cfc4ba53
    • J
      writeback: get rid of wbc->for_writepages · 1fe06ad8
      Jens Axboe 提交于
      It's only set, it's never checked. Kill it.
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1fe06ad8
    • I
      slub: Fix build error in kmem_cache_open() with !CONFIG_SLUB_DEBUG · fdaa45e9
      Ingo Molnar 提交于
      This build bug:
      
       mm/slub.c: In function 'kmem_cache_open':
       mm/slub.c:2476: error: 'disable_higher_order_debug' undeclared (first use in this function)
       mm/slub.c:2476: error: (Each undeclared identifier is reported only once
       mm/slub.c:2476: error: for each function it appears in.)
      
      Triggers because there's no !CONFIG_SLUB_DEBUG definition for
      disable_higher_order_debug.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      fdaa45e9
    • K
      Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev · 2b2af54a
      Kay Sievers 提交于
      Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
      very early at kernel initialization, before any driver-core device
      is registered. Every device with a major/minor will provide a
      device node in devtmpfs.
      
      Devtmpfs can be changed and altered by userspace at any time,
      and in any way needed - just like today's udev-mounted tmpfs.
      Unmodified udev versions will run just fine on top of it, and will
      recognize an already existing kernel-created device node and use it.
      The default node permissions are root:root 0600. Proper permissions
      and user/group ownership, meaningful symlinks, all other policy still
      needs to be applied by userspace.
      
      If a node is created by devtmps, devtmpfs will remove the device node
      when the device goes away. If the device node was created by
      userspace, or the devtmpfs created node was replaced by userspace, it
      will no longer be removed by devtmpfs.
      
      If it is requested to auto-mount it, it makes init=/bin/sh work
      without any further userspace support. /dev will be fully populated
      and dynamic, and always reflect the current device state of the kernel.
      With the commonly used dynamic device numbers, it solves the problem
      where static devices nodes may point to the wrong devices.
      
      It is intended to make the initial bootup logic simpler and more robust,
      by de-coupling the creation of the inital environment, to reliably run
      userspace processes, from a complex userspace bootstrap logic to provide
      a working /dev.
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJan Blunck <jblunck@suse.de>
      Tested-By: NHarald Hoyer <harald@redhat.com>
      Tested-By: NScott James Remnant <scott@ubuntu.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      2b2af54a
  5. 14 9月, 2009 8 次提交
  6. 11 9月, 2009 7 次提交
    • C
      kmemleak: Improve the "Early log buffer exceeded" error message · addd72c1
      Catalin Marinas 提交于
      Based on a suggestion from Jaswinder, clarify what the user would need
      to do to avoid this error message from kmemleak.
      Reported-by: NJaswinder Singh Rajput <jaswinder@kernel.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      addd72c1
    • J
      writeback: check for registered bdi in flusher add and inode dirty · 500b067c
      Jens Axboe 提交于
      Also a debugging aid. We want to catch dirty inodes being added to
      backing devices that don't do writeback.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      500b067c
    • J
      writeback: add name to backing_dev_info · d993831f
      Jens Axboe 提交于
      This enables us to track who does what and print info. Its main use
      is catching dirty inodes on the default_backing_dev_info, so we can
      fix that up.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      d993831f
    • J
      writeback: add some debug inode list counters to bdi stats · f09b00d3
      Jens Axboe 提交于
      Add some debug entries to be able to inspect the internal state of
      the writeback details.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      f09b00d3
    • J
      writeback: get rid of pdflush completely · d0bceac7
      Jens Axboe 提交于
      It is now unused, so kill it off.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      d0bceac7
    • J
      writeback: switch to per-bdi threads for flushing data · 03ba3782
      Jens Axboe 提交于
      This gets rid of pdflush for bdi writeout and kupdated style cleaning.
      pdflush writeout suffers from lack of locality and also requires more
      threads to handle the same workload, since it has to work in a
      non-blocking fashion against each queue. This also introduces lumpy
      behaviour and potential request starvation, since pdflush can be starved
      for queue access if others are accessing it. A sample ffsb workload that
      does random writes to files is about 8% faster here on a simple SATA drive
      during the benchmark phase. File layout also seems a LOT more smooth in
      vmstat:
      
       r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
       0  1      0 608848   2652 375372    0    0     0 71024  604    24  1 10 48 42
       0  1      0 549644   2712 433736    0    0     0 60692  505    27  1  8 48 44
       1  0      0 476928   2784 505192    0    0     4 29540  553    24  0  9 53 37
       0  1      0 457972   2808 524008    0    0     0 54876  331    16  0  4 38 58
       0  1      0 366128   2928 614284    0    0     4 92168  710    58  0 13 53 34
       0  1      0 295092   3000 684140    0    0     0 62924  572    23  0  9 53 37
       0  1      0 236592   3064 741704    0    0     4 58256  523    17  0  8 48 44
       0  1      0 165608   3132 811464    0    0     0 57460  560    21  0  8 54 38
       0  1      0 102952   3200 873164    0    0     4 74748  540    29  1 10 48 41
       0  1      0  48604   3252 926472    0    0     0 53248  469    29  0  7 47 45
      
      where vanilla tends to fluctuate a lot in the creation phase:
      
       r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
       1  1      0 678716   5792 303380    0    0     0 74064  565    50  1 11 52 36
       1  0      0 662488   5864 319396    0    0     4   352  302   329  0  2 47 51
       0  1      0 599312   5924 381468    0    0     0 78164  516    55  0  9 51 40
       0  1      0 519952   6008 459516    0    0     4 78156  622    56  1 11 52 37
       1  1      0 436640   6092 541632    0    0     0 82244  622    54  0 11 48 41
       0  1      0 436640   6092 541660    0    0     0     8  152    39  0  0 51 49
       0  1      0 332224   6200 644252    0    0     4 102800  728    46  1 13 49 36
       1  0      0 274492   6260 701056    0    0     4 12328  459    49  0  7 50 43
       0  1      0 211220   6324 763356    0    0     0 106940  515    37  1 10 51 39
       1  0      0 160412   6376 813468    0    0     0  8224  415    43  0  6 49 45
       1  1      0  85980   6452 886556    0    0     4 113516  575    39  1 11 54 34
       0  2      0  85968   6452 886620    0    0     0  1640  158   211  0  0 46 54
      
      A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
      SSD based writeback test on XFS performs over 20% better as well, with
      the throughput being very stable around 1GB/sec, where pdflush only
      manages 750MB/sec and fluctuates wildly while doing so. Random buffered
      writes to many files behave a lot better as well, as does random mmap'ed
      writes.
      
      A separate thread is added to sync the super blocks. In the long term,
      adding sync_supers_bdi() functionality could get rid of this thread again.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      03ba3782
    • J
      writeback: move dirty inodes from super_block to backing_dev_info · 66f3b8e2
      Jens Axboe 提交于
      This is a first step at introducing per-bdi flusher threads. We should
      have no change in behaviour, although sb_has_dirty_inodes() is now
      ridiculously expensive, as there's no easy way to answer that question.
      Not a huge problem, since it'll be deleted in subsequent patches.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      66f3b8e2
  7. 10 9月, 2009 1 次提交
  8. 09 9月, 2009 4 次提交
  9. 08 9月, 2009 3 次提交
  10. 06 9月, 2009 1 次提交
    • M
      page-allocator: always change pageblock ownership when anti-fragmentation is disabled · dd5d241e
      Mel Gorman 提交于
      On low-memory systems, anti-fragmentation gets disabled as fragmentation
      cannot be avoided on a sufficiently large boundary to be worthwhile.  Once
      disabled, there is a period of time when all the pageblocks are marked
      MOVABLE and the expectation is that they get marked UNMOVABLE at each call
      to __rmqueue_fallback().
      
      However, when MAX_ORDER is large the pageblocks do not change ownership
      because the normal criteria are not met.  This has the effect of
      prematurely breaking up too many large contiguous blocks.  This is most
      serious on NOMMU systems which depend on high-order allocations to boot.
      This patch causes pageblocks to change ownership on every fallback when
      anti-fragmentation is disabled.  This prevents the large blocks being
      prematurely broken up.
      
      This is a fix to commit 49255c61 [page
      allocator: move check for disabled anti-fragmentation out of fastpath] and
      the problem affects 2.6.31-rc8.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Tested-by: NPaul Mundt <lethal@linux-sh.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: NGreg Ungerer <gerg@snapgear.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd5d241e