1. 13 8月, 2010 24 次提交
  2. 12 8月, 2010 16 次提交
    • M
      memstick: fix hangs on unexpected device removal in mspro_blk · d862b13b
      Maxim Levitsky 提交于
      mspro_block_remove() is called from detect thread that first calls the
      mspro_block_stop(), which stops the request queue.  If we call
      del_gendisk() with the queue stopped we get a deadlock.
      Signed-off-by: NMaxim Levitsky <maximlevitsky@gmail.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d862b13b
    • M
      memstick: init sysfs attributes · 21fd0495
      Maxim Levitsky 提交于
      Otherwise lockdep complains.
      Signed-off-by: NMaxim Levitsky <maximlevitsky@gmail.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      21fd0495
    • A
      mmc_test: fix large memory allocation · fec4dcce
      Adrian Hunter 提交于
      - Fix mmc_test_alloc_mem.
      
      - Use nr_free_buffer_pages() instead of sysinfo.totalram to determine
        total lowmem pages.
      
      - Change variables containing memory sizes to unsigned long.
      
      - Limit maximum test area size to 128MiB because that is the maximum MMC
        high capacity erase size (the maxmium SD allocation unit size is just
        4MiB)
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fec4dcce
    • A
      mmc_test: add performance tests · 64f7120d
      Adrian Hunter 提交于
      mmc_test provides tests aimed at testing SD/MMC hosts.  This patch adds
      performance tests.
      
      It is advantageous to have performance tests in a kernel
      module like mmc_test for the following reasons:
      	- transfer times can be measured very accurately
      	- arbitrarily large transfers are possible
      	- the effect of contiguous vs scattered pages
      	can be determined
      
      The new tests are:
      
      	23. Best-case read performance
      	24. Best-case write performance
      	25. Best-case read performance into scattered pages
      	26. Best-case write performance from scattered pages
      	27. Single read performance by transfer size
      	28. Single write performance by transfer size
      	29. Single trim performance by transfer size
      	30. Consecutive read performance by transfer size
      	31. Consecutive write performance by transfer size
      	32. Consecutive trim performance by transfer size
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64f7120d
    • A
      mmc_block: add support for secure discard · 49804548
      Adrian Hunter 提交于
      Secure discard is implemented by Secure Trim if the discard is unaligned
      or Secure Erase otherwise.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NJens Axboe <axboe@kernel.dk>
      Cc: Kyungmin Park <kmpark@infradead.org>
      Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ben Gardiner <bengardiner@nanometrics.ca>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49804548
    • A
      block: add secure discard · 8d57a98c
      Adrian Hunter 提交于
      Secure discard is the same as discard except that all copies of the
      discarded sectors (perhaps created by garbage collection) must also be
      erased.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NJens Axboe <axboe@kernel.dk>
      Cc: Kyungmin Park <kmpark@infradead.org>
      Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ben Gardiner <bengardiner@nanometrics.ca>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8d57a98c
    • A
      omap_hsmmc: add erase capability · 93caf8e6
      Adrian Hunter 提交于
      Disable the data (busy) timeout for erases and set the MMC_CAP_ERASE
      capability.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NJens Axboe <axboe@kernel.dk>
      Cc: Kyungmin Park <kmpark@infradead.org>
      Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ben Gardiner <bengardiner@nanometrics.ca>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93caf8e6
    • A
      mmc_block: add discard support · bd788c96
      Adrian Hunter 提交于
      Enable MMC to service discard requests.  In the case of SD and MMC cards
      that do not support trim, discards become erases.  In the case of cards
      (MMC) that only allow erases in multiples of erase group size, round to
      the nearest completely discarded erase group.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NJens Axboe <axboe@kernel.dk>
      Cc: Kyungmin Park <kmpark@infradead.org>
      Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ben Gardiner <bengardiner@nanometrics.ca>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd788c96
    • A
      mmc: add erase, secure erase, trim and secure trim operations · dfe86cba
      Adrian Hunter 提交于
      SD/MMC cards tend to support an erase operation.  In addition, eMMC v4.4
      cards can support secure erase, trim and secure trim operations that are
      all variants of the basic erase command.
      
      SD/MMC device attributes "erase_size" and "preferred_erase_size" have been
      added.
      
      "erase_size" is the minimum size, in bytes, of an erase operation.  For
      MMC, "erase_size" is the erase group size reported by the card.  Note that
      "erase_size" does not apply to trim or secure trim operations where the
      minimum size is always one 512 byte sector.  For SD, "erase_size" is 512
      if the card is block-addressed, 0 otherwise.
      
      SD/MMC cards can erase an arbitrarily large area up to and
      including the whole card.  When erasing a large area it may
      be desirable to do it in smaller chunks for three reasons:
      
          1. A single erase command will make all other I/O on the card
             wait.  This is not a problem if the whole card is being erased, but
             erasing one partition will make I/O for another partition on the
             same card wait for the duration of the erase - which could be a
             several minutes.
      
          2. To be able to inform the user of erase progress.
      
          3. The erase timeout becomes too large to be very useful.
             Because the erase timeout contains a margin which is multiplied by
             the size of the erase area, the value can end up being several
             minutes for large areas.
      
      "erase_size" is not the most efficient unit to erase (especially for SD
      where it is just one sector), hence "preferred_erase_size" provides a good
      chunk size for erasing large areas.
      
      For MMC, "preferred_erase_size" is the high-capacity erase size if a card
      specifies one, otherwise it is based on the capacity of the card.
      
      For SD, "preferred_erase_size" is the allocation unit size specified by
      the card.
      
      "preferred_erase_size" is in bytes.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NJens Axboe <axboe@kernel.dk>
      Cc: Kyungmin Park <kmpark@infradead.org>
      Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ben Gardiner <bengardiner@nanometrics.ca>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dfe86cba
    • J
      mm: fix writeback_in_progress() · 81d73a32
      Jan Kara 提交于
      Commit 83ba7b07 ("writeback: simplify the write back thread queue")
      broke writeback_in_progress() as in that commit we started to remove work
      items from the list at the moment we start working on them and not at the
      moment they are finished.  Thus if the flusher thread was doing some work
      but there was no other work queued, writeback_in_progress() returned
      false.  This could in particular cause unnecessary queueing of background
      writeback from balance_dirty_pages() or writeout work from
      writeback_sb_if_idle().
      
      This patch fixes the problem by introducing a bit in the bdi state which
      indicates that the flusher thread is processing some work and uses this
      bit for writeback_in_progress() test.
      
      NOTE: Both callsites of writeback_in_progress() (namely,
      writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
      need a different information than what writeback_in_progress() provides.
      They would need to know whether *the kind of writeback they are going to
      submit* is already queued.  But this information isn't that simple to
      provide so let's fix writeback_in_progress() for the time being.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Acked-by: NJens Axboe <jaxboe@fusionio.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81d73a32
    • W
      writeback: merge for_kupdate and !for_kupdate cases · a50aeb40
      Wu Fengguang 提交于
      Unify the logic for kupdate and non-kupdate cases.  There won't be
      starvation because the inodes requeued into b_more_io will later be
      spliced _after_ the remaining inodes in b_io, hence won't stand in the way
      of other inodes in the next run.
      
      It avoids unnecessary redirty_tail() calls, hence the update of
      i_dirtied_when.  The timestamp update is undesirable because it could
      later delay the inode's periodic writeback, or may exclude the inode from
      the data integrity sync operation (which checks timestamp to avoid extra
      work and livelock).
      
      ===
      How the redirty_tail() comes about:
      
      It was a long story..  This redirty_tail() was introduced with
      wbc.more_io.  The initial patch for more_io actually does not have the
      redirty_tail(), and when it's merged, several 100% iowait bug reports
      arised:
      
      reiserfs:
              http://lkml.org/lkml/2007/10/23/93
      
      jfs:
              commit 29a424f2
              JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
      
      ext2:
              http://www.spinics.net/linux/lists/linux-ext4/msg04762.html
      
      They are all old bugs hidden in various filesystems that become "visible"
      with the more_io patch.  At the time, the ext2 bug is thought to be
      "trivial", so not fixed.  Instead the following updated more_io patch with
      redirty_tail() is merged:
      
      	http://www.spinics.net/linux/lists/linux-ext4/msg04507.html
      
      This will in general prevent 100% on ext2 and possibly other unknown FS bugs.
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a50aeb40
    • W
      writeback: fix queue_io() ordering · 4ea879b9
      Wu Fengguang 提交于
      This was not a bug, since b_io is empty for kupdate writeback.  The next
      patch will do requeue_io() for non-kupdate writeback, so let's fix it.
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ea879b9
    • W
      writeback: don't redirty tail an inode with dirty pages · 23539afc
      Wu Fengguang 提交于
      Avoid delaying writeback for an expire inode with lots of dirty pages, but
      no active dirtier at the moment.  Previously we only do that for the
      kupdate case.
      
      Any filesystem that does delayed allocation or unwritten extent conversion
      after IO completion will cause this - for example, XFS.
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23539afc
    • W
      writeback: add comment to the dirty limit functions · 1babe183
      Wu Fengguang 提交于
      Document global_dirty_limits() and bdi_dirty_limit().
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1babe183
    • W
      writeback: avoid unnecessary calculation of bdi dirty thresholds · 16c4042f
      Wu Fengguang 提交于
      Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so
      that the latter can be avoided when under global dirty background
      threshold (which is the normal state for most systems).
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      16c4042f
    • W
      writeback: balance_dirty_pages(): reduce calls to global_page_state · e50e3720
      Wu Fengguang 提交于
      Reducing the number of times balance_dirty_pages calls global_page_state
      reduces the cache references and so improves write performance on a
      variety of workloads.
      
      'perf stats' of simple fio write tests shows the reduction in cache
      access.  Where the test is fio 'write,mmap,600Mb,pre_read' on AMD AthlonX2
      with 3Gb memory (dirty_threshold approx 600 Mb) running each test 10
      times, dropping the fasted & slowest values then taking the average &
      standard deviation
      
      		average (s.d.) in millions (10^6)
      2.6.31-rc8	648.6 (14.6)
      +patch		620.1 (16.5)
      
      Achieving this reduction is by dropping clip_bdi_dirty_limit as it rereads
      the counters to apply the dirty_threshold and moving this check up into
      balance_dirty_pages where it has already read the counters.
      
      Also by rearrange the for loop to only contain one copy of the limit tests
      allows the pdflush test after the loop to use the local copies of the
      counters rather than rereading them.
      
      In the common case with no throttling it now calls global_page_state 5
      fewer times and bdi_stat 2 fewer.
      
      Fengguang:
      
      This patch slightly changes behavior by replacing clip_bdi_dirty_limit()
      with the explicit check (nr_reclaimable + nr_writeback >= dirty_thresh) to
      avoid exceeding the dirty limit.  Since the bdi dirty limit is mostly
      accurate we don't need to do routinely clip.  A simple dirty limit check
      would be enough.
      
      The check is necessary because, in principle we should throttle everything
      calling balance_dirty_pages() when we're over the total limit, as said by
      Peter.
      
      We now set and clear dirty_exceeded not only based on bdi dirty limits,
      but also on the global dirty limit.  The global limit check is added in
      place of clip_bdi_dirty_limit() for safety and not intended as a behavior
      change.  The bdi limits should be tight enough to keep all dirty pages
      under the global limit at most time; occasional small exceeding should be
      OK though.  The change makes the logic more obvious: the global limit is
      the ultimate goal and shall be always imposed.
      
      We may now start background writeback work based on outdated conditions.
      That's safe because the bdi flush thread will (and have to) double check
      the states.  It reduces overall overheads because the test based on old
      states still have good chance to be right.
      
      [akpm@linux-foundation.org] fix uninitialized dirty_exceeded
      Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e50e3720