1. 08 Aug, 2010: 40 commits
    • writeback: cleanup bdi_register · c284de61
      Artem Bityutskiy committed
      This patch makes sure we first initialize everything and set the BDI_registered
      flag, and only after that add the bdi to 'bdi_list'. The current code adds the
      bdi to the list too early, and as a result I see the
      
      WARN(!test_bit(BDI_registered, &bdi->state)
      
      in the bdi forker being triggered. Also, it is generally good practice to make things
      visible only when they are fully initialized.
      
      Also, this patch does a few micro clean-ups:
      1. Removes the 'exit' label, which does nothing but return. This allows us to
         get rid of a few braces and the 'ret' variable, making the code smaller.
      2. If 'kthread_run()' fails, returns the error code it provides instead of a
         hard-coded '-ENOMEM'. Theoretically, 'kthread_run()' could some day return
         something else. Also, on failure it is not necessary to set 'bdi->wb.task'
         to NULL.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
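      The "initialize fully, set the flag, then publish" ordering can be sketched in
      userspace C11. This is a hypothetical model, not the kernel's 'bdi_register()':
      'struct model_bdi', 'model_bdi_list' and 'model_bdi_register()' are invented
      names, a release store stands in for the kernel's RCU list insertion, the list
      is pushed at the head for brevity, and writers are assumed to be serialized by
      a lock (the role 'bdi_lock' plays for 'bdi_list').

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bdi; only the fields this sketch needs. */
struct model_bdi {
    bool registered;          /* plays the role of the BDI_registered bit */
    struct model_bdi *next;   /* link in the global list */
};

/* Global list head; readers may traverse it concurrently. */
static _Atomic(struct model_bdi *) model_bdi_list = NULL;

/* Assumes callers are serialized by a lock, as writers of bdi_list are. */
static void model_bdi_register(struct model_bdi *bdi)
{
    /* 1. Finish all initialization while the object is still private. */
    bdi->registered = true;
    bdi->next = atomic_load_explicit(&model_bdi_list, memory_order_relaxed);

    /* 2. Publish last: the release store orders the writes above before the
     *    pointer becomes visible, so readers never see a half-set-up bdi. */
    atomic_store_explicit(&model_bdi_list, bdi, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static struct model_bdi *model_bdi_first(void)
{
    return atomic_load_explicit(&model_bdi_list, memory_order_acquire);
}
```

      A reader that finds a bdi through 'model_bdi_first()' is guaranteed to see
      'registered' already set, which is the property the WARN() above relies on.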
    • writeback: add new tracepoints · 60332023
      Artem Bityutskiy committed
      Add 2 new tracepoints to the periodic write-back wake-up case, just as we do
      in the 'bdi_queue_work()' function. Namely, introduce:
      
      1. trace_writeback_wake_thread(bdi)
      2. trace_writeback_wake_forker_thread(bdi)
      
      The first event is triggered every time we wake up a bdi thread to start
      periodic background write-out. The second event is triggered only when the bdi
      thread does not exist and should be created by the forker thread.
      
      This patch was suggested by Dave Chinner and Christoph Hellwig.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
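      Which of the two events fires depends only on whether the bdi already has a
      thread. A hypothetical userspace model of that branch: counters stand in for
      the two tracepoints and for 'wake_up_process()', and 'struct model_bdi_wb' and
      'model_periodic_wakeup()' are invented names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the two trace events. */
struct model_trace {
    unsigned wake_thread;          /* ~ trace_writeback_wake_thread(bdi) */
    unsigned wake_forker_thread;   /* ~ trace_writeback_wake_forker_thread(bdi) */
};

struct model_bdi_wb {
    bool has_task;                 /* does a bdi thread currently exist? */
};

/* Periodic write-back wake-up: pick the target and record which event fired. */
static void model_periodic_wakeup(const struct model_bdi_wb *wb,
                                  struct model_trace *tr)
{
    if (wb->has_task) {
        tr->wake_thread++;         /* wake the existing bdi thread */
    } else {
        tr->wake_forker_thread++;  /* no thread yet: the forker must create it */
    }
}
```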
    • writeback: remove unnecessary init_timer call · b5048a6c
      Artem Bityutskiy committed
      The 'setup_timer()' function also calls 'init_timer()', so the extra
      'init_timer()' call is not needed. Indeed, 'setup_timer()' is basically
      'init_timer()' plus initialization of the callback function and data pointer.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
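      The relationship can be modeled as below. This is a hypothetical sketch, not
      the kernel timer API: 'model_init_timer()' only records that initialization
      ran (the real 'init_timer()' sets up internal bookkeeping), and
      'model_setup_timer()' adds the callback and data on top of it, which is why a
      separate 'init_timer()' call next to 'setup_timer()' is redundant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer model using the 2.6-era callback signature. */
struct model_timer {
    bool initialized;                  /* stands in for internal bookkeeping */
    unsigned init_calls;               /* how many times init ran */
    void (*function)(unsigned long);   /* callback */
    unsigned long data;                /* argument passed to the callback */
};

static void model_init_timer(struct model_timer *t)
{
    t->initialized = true;
    t->init_calls++;
}

/* setup_timer() == init_timer() + callback/data initialization. */
static void model_setup_timer(struct model_timer *t,
                              void (*fn)(unsigned long), unsigned long data)
{
    model_init_timer(t);
    t->function = fn;
    t->data = data;
}

static void model_callback(unsigned long data) { (void)data; }
```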
    • writeback: optimize periodic bdi thread wakeups · 6467716a
      Artem Bityutskiy committed
      When the first inode for a bdi is marked dirty, we wake up the bdi thread which
      should take care of the periodic background write-out. However, the write-out
      will actually start only 'dirty_writeback_interval' centisecs later, so we can
      delay the wake-up.
      
      This change was requested by Nick Piggin who pointed out that if we delay the
      wake-up, we weed out 2 unnecessary context switches, which matters because
      '__mark_inode_dirty()' is a hot-path function.
      
      This patch introduces a new function, 'bdi_wakeup_thread_delayed()', which
      sets up a timer to wake up the bdi thread and returns. So the wake-up is
      delayed.
      
      We also delete the timer in bdi threads just before writing back, and delete
      it synchronously when unregistering the bdi. At the unregister point the bdi
      does not have any users, so no one can arm it again.
      
      Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq
      context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes
      this change as well.
      
      This patch also moves the 'bdi_wb_init()' function down in the file to avoid
      forward-declaration of 'bdi_wakeup_thread_delayed()'.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
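      'dirty_writeback_interval' is in centiseconds, so the delay is that value times
      10 ms; the default of 500 gives 5 seconds. A hypothetical userspace sketch of
      the delayed wake-up: a millisecond clock replaces jiffies, and
      'struct model_wakeup_timer' and the 'model_*' functions are invented names.
      Arming the timer instead of waking immediately is what saves the context
      switches; the bdi thread cancels it before writing back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-shot timer measured in milliseconds (the kernel uses jiffies). */
struct model_wakeup_timer {
    bool pending;
    unsigned long expires_ms;
};

/* Arm (or re-arm) the timer instead of waking the thread right away.
 * interval_cs is in centiseconds, like dirty_writeback_interval. */
static void model_wakeup_thread_delayed(struct model_wakeup_timer *t,
                                        unsigned long now_ms,
                                        unsigned long interval_cs)
{
    t->expires_ms = now_ms + interval_cs * 10UL;   /* 1 centisecond = 10 ms */
    t->pending = true;
}

/* Called by the bdi thread just before it writes back: cancel the timer. */
static void model_del_timer(struct model_wakeup_timer *t)
{
    t->pending = false;
}

/* Simulated clock tick: returns true if the timer fires and wakes the thread. */
static bool model_timer_tick(struct model_wakeup_timer *t, unsigned long now_ms)
{
    if (t->pending && now_ms >= t->expires_ms) {
        t->pending = false;
        return true;
    }
    return false;
}
```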
    • writeback: prevent unnecessary bdi threads wakeups · 253c34e9
      Artem Bityutskiy committed
      Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very
      bad for battery-driven devices.
      
      There are two types of activities bdi threads do:
      1. process bdi works from the 'bdi->work_list'
      2. periodic write-back
      
      So there are 2 sources of wake-up events for bdi threads:
      
      1. 'bdi_queue_work()' - submits bdi works
      2. '__mark_inode_dirty()' - adds dirty I/O to bdi's
      
      The former already has bdi wake-up code. The latter does not, and this patch
      adds it.
      
      '__mark_inode_dirty()' is a hot-path function, and this patch adds another
      'spin_lock(&bdi->wb_lock)' there. However, it is taken only in the rare case
      when the bdi has no dirty inodes, so adding this spinlock should be fine and
      should not affect performance.
      
      This patch makes sure bdi threads and the forker thread do not wake up if there
      is nothing to do. The forker thread will nevertheless wake up at least every
      5 min. to check whether it has to kill a bdi thread. This could also be
      optimized, but it is not worth it.
      
      This patch also tidies up the warning about an unregistered bdi, and turns it
      from an ugly crocodile into a simple 'WARN()' statement.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
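      The "rare case" condition can be sketched as follows: the extra lock and the
      wake-up are needed only on the transition from "no dirty inodes" to "some
      dirty inodes", so repeated calls on the hot path skip them. A hypothetical
      userspace model: pthread mutexes stand in for the kernel spinlocks, a counter
      stands in for the wake-up, and 'struct model_wb' and both functions are
      invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_wb {
    pthread_mutex_t list_lock;   /* stands in for the lock guarding dirty lists */
    pthread_mutex_t wb_lock;     /* stands in for bdi->wb_lock */
    unsigned nr_dirty;           /* number of dirty inodes on this bdi */
    unsigned wakeups;            /* wake-ups requested so far */
};

static void model_mark_inode_dirty(struct model_wb *wb)
{
    bool wakeup;

    pthread_mutex_lock(&wb->list_lock);
    wakeup = (wb->nr_dirty == 0);     /* first dirty inode on this bdi? */
    wb->nr_dirty++;
    pthread_mutex_unlock(&wb->list_lock);

    if (wakeup) {                     /* rare slow path: take wb_lock */
        pthread_mutex_lock(&wb->wb_lock);
        wb->wakeups++;                /* ~ wake (or schedule a wake-up of) the thread */
        pthread_mutex_unlock(&wb->wb_lock);
    }
}

/* Write-back cleaned everything: the next dirtying wakes the thread again. */
static void model_writeback_all(struct model_wb *wb)
{
    pthread_mutex_lock(&wb->list_lock);
    wb->nr_dirty = 0;
    pthread_mutex_unlock(&wb->list_lock);
}
```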
    • writeback: move bdi threads exiting logic to the forker thread · fff5b85a
      Artem Bityutskiy committed
      Currently, bdi threads can decide to exit if there were no useful activities
      for 5 minutes. However, this causes nasty races: we can easily oops in the
      'bdi_queue_work()' if the bdi thread decides to exit while we are waking it up.
      
      And even if we do not oops, if the bdi thread exits immediately after we wake
      it up, we lose the wake-up event and get an unnecessary delay (up to 5 secs)
      in the bdi work processing.
      
      This patch makes the forker thread the central place which not only
      creates bdi threads, but also kills them if they were inactive long enough.
      This is better design-wise.
      
      Another reason why this change was done is to prepare for the further changes
      which will prevent the bdi threads from waking up every 5 sec and wasting
      power. Indeed, when the task does not wake up periodically anymore, it won't be
      able to exit either.
      
      This patch also moves the 'wake_up_bit()' call from the bdi thread to the
      forker thread. So now the forker thread sets the BDI_pending bit, then
      forks the task or kills it, then clears the bit and wakes up the waiting
      process.
      
      The only process which may wait on the bit is 'bdi_wb_shutdown()'. This
      function was changed as well - now it first removes the bdi from the
      'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is
      guaranteed that the forker thread won't race with it, because the bdi is not
      visible. Note, the forker thread sets the 'BDI_pending' bit under the
      'bdi->wb_lock' which is essential for proper serialization.
      
      And additionally, when we change 'bdi->wb.task', we now take the
      'bdi->work_lock', to make sure that we do not lose wake-ups which we otherwise
      would when raced with, say, 'bdi_queue_work()'.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
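      A hypothetical sketch of the per-bdi decision the forker thread now makes. The
      'action' enum with a "switch" comes from 'adf39240' and the 'last_active'
      timestamp from 'ecd58403' in this log, and the 5-minute idle limit from the
      text above; the names, the millisecond clock and the exact conditions are
      assumptions, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IDLE_LIMIT_MS (5UL * 60UL * 1000UL)   /* 5 minutes */

enum model_action { MODEL_NO_ACTION, MODEL_FORK_THREAD, MODEL_KILL_THREAD };

/* Hypothetical per-bdi state the forker looks at. */
struct model_bdi_state {
    bool has_thread;            /* ~ bdi->wb.task != NULL */
    bool has_dirty_io_or_work;  /* something to write back or a queued work */
    unsigned long last_active;  /* last time the thread did useful work (ms) */
};

static enum model_action model_forker_decide(const struct model_bdi_state *s,
                                             unsigned long now_ms)
{
    if (!s->has_thread && s->has_dirty_io_or_work)
        return MODEL_FORK_THREAD;               /* work but nobody to do it */
    if (s->has_thread && !s->has_dirty_io_or_work &&
        now_ms - s->last_active > MODEL_IDLE_LIMIT_MS)
        return MODEL_KILL_THREAD;               /* idle for too long */
    return MODEL_NO_ACTION;
}

/* The forker is the only place where threads are created or killed. */
static void model_forker_apply(struct model_bdi_state *s, unsigned long now_ms)
{
    switch (model_forker_decide(s, now_ms)) {
    case MODEL_FORK_THREAD:
        s->has_thread = true;
        s->last_active = now_ms;
        break;
    case MODEL_KILL_THREAD:
        s->has_thread = false;
        break;
    case MODEL_NO_ACTION:
        break;
    }
}
```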
    • writeback: restructure bdi forker loop a little · adf39240
      Artem Bityutskiy committed
      This patch re-structures the bdi forker a little:
      1. Add 'bdi_cap_flush_forker(bdi)' condition check to the bdi loop. The reason
         for this is that the forker thread can start _before_ the 'BDI_registered'
         flag is set (see 'bdi_register()'), so the WARN() statement will fire for
         the default bdi. I observed this warning at boot-up.
      
      2. Introduce an enum 'action' and use a "switch" statement in the outer loop.
         This is a preparation for the further patch which will teach the forker
         thread to kill bdi threads, so we'll have another case in the "switch"
         statement. This change was suggested by Christoph Hellwig.
      
      This patch is just a small step towards the coming change where the forker
      thread will kill the bdi threads. It should simplify reviewing the following
      changes, which would otherwise be larger.
      
      This patch also amends comments a little.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: move last_active to bdi · ecd58403
      Artem Bityutskiy committed
      Currently bdi threads use a local variable 'last_active' which stores the last time
      when the bdi thread did some useful work. Move this local variable to 'struct
      bdi_writeback'. This is just a preparation for the further patches which will
      make the forker thread decide when bdi threads should be killed.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: do not remove bdi from bdi_list · 78c40cb6
      Artem Bityutskiy committed
      The forker thread removes bdis from 'bdi_list' before forking the bdi thread.
      But this is wrong for at least 2 reasons.
      
      Reason #1: if we temporarily remove a bdi from the list, we may miss works
                 which would otherwise be given to us.
      
      Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis are
                 always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when
                 it races with the forker thread, it can shut down the bdi thread
                 at the same time as the forker creates it.
      
      This patch makes sure the forker thread never removes bdis from 'bdi_list'
      (which was suggested by Christoph Hellwig).
      
      In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to
      hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending'
      flag.
      
      NOTE! The error path is interesting. Currently, when we fail to create a bdi
      thread, we move the bdi to the tail of 'bdi_list'. But if we never remove the
      bdi from the list, we cannot move it to the tail either, because then we can
      mess up the RCU readers which walk the list. And also, we'll have the race
      described above in "Reason #2".
      
      But I do not think that adding to the tail is important, so I simply do not
      do that.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: simplify bdi code a little · 080dcec4
      Artem Bityutskiy committed
      This patch simplifies bdi code a little by removing the 'pending_list' which is
      redundant. Indeed, currently the forker thread ('bdi_forker_thread()') is
      working like this:
      
      1. In a loop, fetch all bdi's which have works but have no writeback thread and
         move them to the 'pending_list'.
      2. If the list is empty, sleep for 5 sec.
      3. Otherwise, take one bdi from the list, fork the writeback thread for this
         bdi, and repeat the loop.
      
      IOW, it first moves everything to the 'pending_list', then processes only one
      element, and so on. This patch simplifies the algorithm, which is now as
      follows:
      
      1. Find the first bdi which has a work and remove it from the global list of
         bdi's (bdi_list).
      2. If there is no such bdi, sleep 5 sec.
      3. Fork the writeback thread for this bdi and repeat the loop.
      
      IOW, now we find the first bdi to process, process it, and so on. This is
      simpler and involves fewer lists.
      
      The bonus is that we can get rid of a couple of functions, as well as
      remove complications which involve 'call_rcu()' and 'bdi->rcu_head'.
      
      This patch also makes sure we use 'list_add_tail_rcu()', instead of plain
      'list_add_tail()', but this piece of code is going to be removed in the next
      patch anyway.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
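      The simplified loop reduced to its search step: scan once, return the first
      bdi that has work but no writeback thread, or nothing (in which case the
      forker sleeps). A hypothetical array-based sketch; the real code walks
      'bdi_list', and 'struct model_bdi', 'model_find_first()' and
      'model_forker_iteration()' are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bdi {
    const char *name;
    bool has_work;
    bool has_thread;
};

/* Return the first bdi that needs a thread forked, or NULL if none does. */
static struct model_bdi *model_find_first(struct model_bdi *bdis, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (bdis[i].has_work && !bdis[i].has_thread)
            return &bdis[i];
    }
    return NULL;
}

/* One loop iteration: fork for one bdi, or report that the forker would sleep. */
static bool model_forker_iteration(struct model_bdi *bdis, size_t n)
{
    struct model_bdi *bdi = model_find_first(bdis, n);

    if (bdi == NULL)
        return false;          /* nothing to do: sleep (5 sec in the text) */
    bdi->has_thread = true;    /* "fork" the writeback thread */
    return true;               /* repeat the loop */
}
```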
    • writeback: do not lose wake-ups in bdi threads · 297252c8
      Artem Bityutskiy committed
      Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups. For
      example, if 'bdi_queue_work()' is executed after the bdi thread has
      finished 'wb_do_writeback()' but before it has called
      'schedule_timeout_interruptible()'.
      
      To fix this issue, we have to check whether we have works to process after we
      have changed the task state to 'TASK_INTERRUPTIBLE'.
      
      This patch also cleans up the handling of the cases when
      'dirty_writeback_interval' is zero or non-zero.
      
      Additionally, this patch removes an unneeded 'list_empty_careful()' call.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
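      In the kernel, the fix relies on setting 'TASK_INTERRUPTIBLE' before re-checking
      for work. The closest userspace analogue is a mutex plus a condition variable:
      the "is there work?" check and the decision to sleep happen under the same
      lock the producer holds while signalling, so the wake-up cannot fall into the
      gap. This is a hypothetical pthread sketch, not kernel code;
      'struct model_work_queue', 'model_queue_work()' and 'model_worker()' are
      invented names, and for brevity works are "processed" with the lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of a bdi work list and its thread. */
struct model_work_queue {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    int  pending;   /* queued works not yet processed */
    int  done;      /* works processed so far */
    bool stop;      /* ask the worker to exit */
};

/* Producer: roughly the role of bdi_queue_work(). */
static void model_queue_work(struct model_work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->pending++;
    pthread_cond_signal(&q->wake);
    pthread_mutex_unlock(&q->lock);
}

static void model_stop_worker(struct model_work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_signal(&q->wake);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer: re-checks for work under the lock before every sleep, so a work
 * queued between "finished processing" and "go to sleep" is always seen. */
static void *model_worker(void *arg)
{
    struct model_work_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->pending > 0) {   /* process everything queued so far */
            q->pending--;
            q->done++;
        }
        if (q->stop)
            break;
        /* Atomically releases the lock and sleeps; a producer needs this lock
         * to queue work, so its signal cannot arrive before we are waiting. */
        pthread_cond_wait(&q->wake, &q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}
```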
    • writeback: do not lose wake-ups in the forker thread - 2 · c4ec7908
      Artem Bityutskiy committed
      Currently, if someone submits jobs for the default bdi, we can lose wake-up
      events. E.g., this can happen if 'bdi_queue_work()' is called when
      'bdi_forker_thread()' is executing code after 'wb_do_writeback(me, 0)', but
      before 'set_current_state(TASK_INTERRUPTIBLE)'.
      
      This situation is unlikely, and the result is not very severe - we'll just
      delay the execution of the work, but this is still not very nice.
      
      This patch fixes the issue by checking whether the default bdi has works before
      the forker thread goes to sleep.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: do not lose wake-ups in the forker thread - 1 · c5f7ad23
      Artem Bityutskiy committed
      Currently the forker thread can lose wake-ups which may lead to unnecessary
      delays in processing bdi works. E.g., consider the following scenario.
      
      1. 'bdi_forker_thread()' walks the 'bdi_list', finds out there is nothing to
         do, and is about to finish the loop.
      2. A bdi thread decides to exit because it was inactive for a long time.
      3. 'bdi_queue_work()' adds a work to the bdi which just exited, so it wakes up
         the forker thread.
      4. But 'bdi_forker_thread()' executes 'set_current_state(TASK_INTERRUPTIBLE)'
         and goes to sleep. We lose a wake-up.
      
      Losing the wake-up is not fatal, but this means that the bdi work processing
      will be delayed by up to 5 sec. This race is theoretical (I never hit it), but
      it is worth fixing.
      
      The fix is to execute 'set_current_state(TASK_INTERRUPTIBLE)' _before_ walking
      'bdi_list', not after.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
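      The ordering argument can be checked by replaying the race deterministically.
      In this hypothetical model (invented names, no real scheduler), waking a task
      just sets it back to running, and 'schedule()' only sleeps if the task is
      still interruptible. With the state change placed before the list walk, the
      racing wake-up turns the later 'schedule()' into a no-op instead of being lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task-state model; not the kernel's task_struct. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_task { enum model_state state; };

/* ~ wake_up_process(): make the task runnable again. */
static void model_wake_up(struct model_task *t)
{
    t->state = MODEL_RUNNING;
}

/* ~ schedule(): the task really sleeps only if nobody woke it since it
 * marked itself interruptible. */
static bool model_schedule_sleeps(const struct model_task *t)
{
    return t->state == MODEL_INTERRUPTIBLE;
}

/* Replay the race. state_first selects whether set_current_state(
 * TASK_INTERRUPTIBLE) runs before or after walking the list.
 * Returns true if the forker sleeps with work pending (lost wake-up). */
static bool model_wakeup_lost(bool state_first)
{
    struct model_task forker = { MODEL_RUNNING };
    bool work_pending = false;
    bool saw_work;

    if (state_first)
        forker.state = MODEL_INTERRUPTIBLE;   /* fixed order: before the walk */

    saw_work = work_pending;                  /* walk bdi_list: nothing to do */

    work_pending = true;                      /* bdi_queue_work() adds a work */
    model_wake_up(&forker);                   /* ...and wakes the forker */

    if (!state_first)
        forker.state = MODEL_INTERRUPTIBLE;   /* old order: after the walk */

    return !saw_work && work_pending && model_schedule_sleeps(&forker);
}
```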
    • writeback: fix possible race when creating bdi threads · 94eac5e6
      Artem Bityutskiy committed
      This patch fixes a very unlikely race condition on the bdi forker thread error
      path: when bdi thread creation fails, 'bdi->wb.task' may contain the error code
      for a short period of time. If at the same time someone submits a work to this
      bdi, we can end up with an oops in 'bdi_queue_work()' while executing
      'wake_up_process(wb->task)'.
      
      This patch fixes the issue by introducing a temporary variable 'task' and
      storing the possible error code there, so that 'wb->task' would never take
      erroneous values.
      
      Note, this race is very unlikely and I never hit it, so it is theoretical, but
      nevertheless worth fixing.
      
      This patch also merges 2 comments which were previously separate.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
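      A userspace sketch of the pattern: keep the result of the thread-creation call
      in a local variable and store it in the shared field only once it is known to
      be a valid pointer. The 'model_err_ptr()' helpers re-create the idea behind
      the kernel's 'ERR_PTR()', 'IS_ERR()' and 'PTR_ERR()' (errors encoded in the top
      4095 pointer values); 'model_kthread_run()' is a hypothetical stand-in for
      'kthread_run()' that fails on request. Returning the decoded error rather than
      a hard-coded '-ENOMEM' mirrors 'c284de61' earlier in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095   /* same limit as the kernel's MAX_ERRNO */

/* Encode a negative errno in a pointer, and decode/test it again. */
static void *model_err_ptr(long error) { return (void *)(intptr_t)error; }
static long model_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static bool model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_task { int pid; };
struct model_wb { struct model_task *task; };   /* shared, read by other CPUs */

static struct model_task model_the_task = { 42 };

/* Hypothetical kthread_run() stand-in: returns a task or an encoded error. */
static void *model_kthread_run(bool fail)
{
    return fail ? model_err_ptr(-ENOMEM) : (void *)&model_the_task;
}

static int model_fork_bdi_thread(struct model_wb *wb, bool fail)
{
    void *task = model_kthread_run(fail);   /* may hold an error code */

    if (model_is_err(task))
        return (int)model_ptr_err(task);    /* wb->task is never touched */

    wb->task = task;                        /* publish only a valid pointer */
    return 0;
}
```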
    • writeback: harmonize writeback threads naming · 6f904ff0
      Artem Bityutskiy committed
      The write-back code mixes words "thread" and "task" for the same things. This
      is not a big deal, but still an inconsistency.
      
      hch: a convention I tend to use and I've seen in various places
      is to always use _task for the storage of the task_struct pointer,
      and thread everywhere else.  This especially helps with having
      foo_thread for the actual thread and foo_task for a global
      variable keeping the task_struct pointer
      
      This patch renames:
      * 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
      * 'bdi_forker_task()'              -> 'bdi_forker_thread()'
      
      because bdi threads are 'bdi_writeback_thread()', so these names are more
      consistent.
      
      This patch also amends comments and makes them refer to the forker and bdi
      threads as "thread", not "task".
      
      Also, while at it, make the 'bdi_add_default_flusher_thread()' declaration use
      'static void' instead of 'void static', to make checkpatch.pl happy.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • coda: fixup clash with block layer REQ_* defines · 4aeefdc6
      Jens Axboe committed
      CODA should not be using defines of that nature in the global name
      space; prefix them with CODA_.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • bio, fs: separate out bio_types.h and define READ/WRITE constants in terms of BIO_RW_* flags · 7cc01581
      Tejun Heo committed
      linux/fs.h hard coded READ/WRITE constants which should match BIO_RW_*
      flags.  This is fragile and caused breakage during BIO_RW_* flag
      rearrangement.  The hardcoding is to avoid include dependency hell.
      
      Create linux/bio_types.h, which contains definitions for bio data
      structures and flags, include it from bio.h and fs.h, and make fs.h
      define all READ/WRITE related constants in terms of BIO_RW_* flags.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • bio, fs: update RWA_MASK, READA and SWRITE to match the corresponding BIO_RW_* bits · aca27ba9
      Tejun Heo committed
      Commit a82afdfc (block: use the same failfast bits for bio and request)
      moved BIO_RW_* bits around such that they match up with REQ_* bits.
      Unfortunately, fs.h hard coded RW_MASK, RWA_MASK, READ, WRITE, READA
      and SWRITE as 0, 1, 2 and 3, and expected them to match with BIO_RW_*
      bits.  READ/WRITE didn't change but BIO_RW_AHEAD was moved to bit 4
      instead of bit 1, breaking RWA_MASK, READA and SWRITE.
      
      This patch updates RWA_MASK, READA and SWRITE such that they match the
      BIO_RW_* bits again.  A follow up patch will update the definitions to
      directly use BIO_RW_* bits so that this kind of breakage won't happen
      again.
      
      Neil also spotted missing RWA_MASK conversion.
      
      Stable: The offending commit a82afdfc was released with v2.6.32, so
      this patch should be applied to all kernels since then but it must
      _NOT_ be applied to kernels earlier than that.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-bisected-by: Vladislav Bolkhovitin <vst@vlnb.net>
      Root-caused-by: Neil Brown <neilb@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
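      The arithmetic behind the breakage: with the write bit at bit 0 and the
      read-ahead bit moved to bit 4, the masks must be 1 and 16, and 'SWRITE' (write
      plus read-ahead, mirroring the old hard-coded 3 = 1 | 2) must be 17, while the
      old hard-coded 'READA' value of 2 no longer sets the read-ahead bit at all. A
      hypothetical sketch that derives the constants from bit numbers, the approach
      the follow-up patch ('7cc01581' above) takes; the 'MODEL_*' names are
      invented and only mirror the kernel ones.

```c
#include <assert.h>

/* Bit numbers as described above: write at bit 0, read-ahead moved to bit 4. */
enum { MODEL_BIO_RW = 0, MODEL_BIO_RW_AHEAD = 4 };

/* Constants derived from the bit numbers, so moving a bit updates them too. */
#define MODEL_RW_MASK   (1UL << MODEL_BIO_RW)
#define MODEL_RWA_MASK  (1UL << MODEL_BIO_RW_AHEAD)
#define MODEL_READ      0UL
#define MODEL_WRITE     MODEL_RW_MASK
#define MODEL_READA     MODEL_RWA_MASK
#define MODEL_SWRITE    (MODEL_WRITE | MODEL_READA)

/* The old hard-coded values, which assumed read-ahead lived at bit 1. */
#define MODEL_OLD_READA   2UL
#define MODEL_OLD_SWRITE  3UL

/* Does a rw value request read-ahead under the current bit layout? */
static int model_is_readahead(unsigned long rw)
{
    return (rw & MODEL_RWA_MASK) != 0;
}
```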
    • block: disallow FS recursion from sb_issue_discard allocation · edca4a38
      Mike Snitzer committed
      Filesystems can call sb_issue_discard on a memory reclaim path
      (e.g. ext4 calls sb_issue_discard during journal commit).
      
      Use GFP_NOFS in sb_issue_discard to avoid recursing back into the FS.
      Reported-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cpqarray: check put_user() result · f6c4c8e1
      Kulikov Vasiliy committed
      put_user() may fail; if so, return -EFAULT.
      Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
      Acked-by: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: remove wb in get_next_work_item · 08852b6d
      Minchan Kim committed
      Commit 83ba7b07 cleaned up the writeback code, so we no longer use
      wb in get_next_work_item. Let's remove the unnecessary argument.
      
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • splice: fix misuse of SPLICE_F_NONBLOCK · 6965031d
      Miklos Szeredi committed
      SPLICE_F_NONBLOCK is clearly documented to only affect blocking on the
      pipe.  In __generic_file_splice_read(), however, it causes an EAGAIN
      if the page is currently being read.
      
      This makes it impossible to write an application that only wants
      failure if the pipe is full.  For example if the same process is
      handling both ends of a pipe and isn't otherwise able to determine
      whether a splice to the pipe will fill it or not.
      
      We could make the read non-blocking on O_NONBLOCK or some other splice
      flag, but for now this is the simplest fix.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • xen/blkfront: Use QUEUE_ORDERED_DRAIN for old backends · 7901d141
      Jeremy Fitzhardinge committed
      If there's no feature-barrier key in xenstore, then it's a fairly
      old backend which does uncached in-order writes, so ORDERED_DRAIN
      is appropriate.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    • xen/blkfront: use tagged queuing for barriers · 4dab46ff
      Jeremy Fitzhardinge committed
      When barriers are supported, then use QUEUE_ORDERED_TAG to tell the block
      subsystem that it doesn't need to do anything else with the barriers.
      Previously we used ORDERED_DRAIN which caused the block subsystem to
      drain all pending IO before submitting the barrier, which would be
      very expensive.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    • scsi: use REQ_TYPE_FS for flush request · e96f6abe
      FUJITA Tomonori committed
      scsi-ml uses REQ_TYPE_BLOCK_PC for flush requests from file
      systems. The definition of REQ_TYPE_BLOCK_PC is that we don't retry
      requests even when we can (e.g. UNIT ATTENTION) and we send the
      response to the callers (then the callers can decide what they want).
      We need a workaround such as commit 77a42297 to retry BLOCK_PC flush
      requests. We will need a similar workaround for discard requests too,
      since SCSI-ml handles them as BLOCK_PC internally.
      
      This uses REQ_TYPE_FS for flush requests from file systems instead of
      REQ_TYPE_BLOCK_PC.
      
      scsi-ml retries only REQ_TYPE_FS requests that have data to
      transfer when we can retry them (e.g. UNIT_ATTENTION). However, we
      also need to retry REQ_TYPE_FS requests without data because the
      callers don't.
      
      This also changes scsi_check_sense() to retry all the REQ_TYPE_FS
      requests when appropriate. Thanks to scsi_noretry_cmd(),
      REQ_TYPE_BLOCK_PC requests are still not retried, as before.
      
      Note that this basically reverts commit 77a42297, since we now use
      REQ_TYPE_FS for flush requests.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: set up rq->rq_disk properly for flush requests · 16f2319f
      FUJITA Tomonori committed
      q->bar_rq.rq_disk is NULL. Use the rq_disk of the original request
      instead.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: set REQ_TYPE_FS on flush requests · 28e18d01
      FUJITA Tomonori committed
      The block layer doesn't set rq->cmd_type on flush requests. By
      definition, it should be REQ_TYPE_FS (the lower layers build a command
      and interpret its result; that is, the block layer doesn't know
      the details).
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • floppy: make controller const · 3b06c21e
      Stephen Hemminger committed
      The struct cont_t is just a set of virtual function pointers.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • drivers/block: use memdup_user · ad96a7a7
      Julia Lawall committed
      Use memdup_user when user data is immediately copied into the
      allocated region.  Some checkpatch cleanups in nearby code.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression from,to,size,flag;
      position p;
      identifier l1,l2;
      @@
      
      -  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
      +  to = memdup_user(from,size);
         if (
      -      to==NULL
      +      IS_ERR(to)
                       || ...) {
         <+... when != goto l1;
      -  -ENOMEM
      +  PTR_ERR(to)
         ...+>
         }
      -  if (copy_from_user(to, from, size) != 0) {
      -    <+... when != goto l2;
      -    -EFAULT
      -    ...+>
      -  }
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Chirag Kantharia <chirag.kantharia@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • scsi: convert discard to REQ_TYPE_FS from REQ_TYPE_BLOCK_PC · 6a32a8ae
      FUJITA Tomonori committed
      Jens, any reason why this isn't included in your for-2.6.36 yet?
      
      From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Subject: [PATCH resend] scsi: convert discard to REQ_TYPE_FS from REQ_TYPE_BLOCK_PC
      
      The block layer (file systems) sends discard requests as REQ_TYPE_FS
      (with REQ_TYPE_FS, the lower layer is responsible for setting up commands
      and interpreting the results). But SCSI-ml treats discard requests as
      REQ_TYPE_BLOCK_PC.
      
      scsi-ml can handle discard requests as REQ_TYPE_FS
      easily. scsi_setup_discard_cmnd() sets up struct request and the bio
      nicely. The only remaining issue is that discard requests can't be
      completed partially, so we need to modify sd_done.
      
      This conversion also fixes the problem that discard requests aren't
      retried when possible (e.g. UNIT ATTENTION).
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cciss: cleanup interrupt_not_for_us · 81125860
      Stephen M. Cameron committed
      cciss: cleanup interrupt_not_for_us
      In the case of MSI/MSIX interrupts, we don't need to check
      if the interrupt is for us, and in the case of the intx interrupt
      handler, when checking if the interrupt is for us, we don't need
      to check if we're using MSI/MSIX; we know we're not.
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cciss: change printks to dev_warn, etc. · b2a4a43d
      Stephen M. Cameron committed
      cciss: change printks to dev_warn, etc.
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cciss: separate cmd_alloc() and cmd_special_alloc() · 6b4d96b8
      Stephen M. Cameron committed
      cciss: separate cmd_alloc() and cmd_special_alloc()
      cmd_alloc() took a parameter which caused it to either allocate
      from a pre-allocated pool, or allocate using pci_alloc_consistent.
      This parameter is always known at compile time, so this would
      be better handled by breaking the function into two functions
      and differentiating the cases by function names.  Same goes
      for cmd_free().
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
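      A hypothetical sketch of the split: instead of one allocator taking a flag that
      every caller passes as a constant, two functions whose names say where the
      memory comes from. A small pool with a "used" array stands in for the
      pre-allocated command pool, and 'calloc()' stands in for
      'pci_alloc_consistent()'; the struct, the pool size and all function names
      are invented.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CMDS 4                       /* hypothetical pool size */

struct model_cmd { int index; };              /* -1 for special allocations */

struct model_hba {
    struct model_cmd pool[MODEL_NR_CMDS];
    unsigned char used[MODEL_NR_CMDS];
};

/* Common path: take a free command from the pre-allocated pool. */
static struct model_cmd *model_cmd_alloc(struct model_hba *h)
{
    for (int i = 0; i < MODEL_NR_CMDS; i++) {
        if (!h->used[i]) {
            h->used[i] = 1;
            h->pool[i].index = i;
            return &h->pool[i];
        }
    }
    return NULL;                              /* pool exhausted */
}

static void model_cmd_free(struct model_hba *h, struct model_cmd *c)
{
    h->used[c->index] = 0;
}

/* Special path: allocate outside the pool (stand-in for coherent DMA memory). */
static struct model_cmd *model_cmd_special_alloc(void)
{
    struct model_cmd *c = calloc(1, sizeof(*c));

    if (c)
        c->index = -1;
    return c;
}

static void model_cmd_special_free(struct model_cmd *c)
{
    free(c);
}
```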
    • cciss: use consistent variable names · f70dba83
      Stephen M. Cameron committed
      cciss: use consistent variable names
      "h" for the hba structure and "c" for the command structures,
      and get rid of the trivial CCISS_LOCK macro.
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cciss: forbid hard reset of 640x boards · 058a0f9f
      Stephen M. Cameron committed
      cciss: forbid hard reset of 640x boards
      The 6402/6404 are two PCI devices -- two Smart Array controllers
      -- that fit into one slot.  It is possible to reset them independently,
      however, they share a battery backed cache module.  One of the pair
      controls the cache and the 2nd one accesses the cache through the first
      one.  If you reset the one controlling the cache, the other one will
      not be a happy camper.  So we just forbid resetting this conjoined
      mess.
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cciss: sanitize max commands · adfbc1ff
      Stephen M. Cameron committed
      cciss: sanitize max commands
      Some controllers might try to tell us they support 0 commands
      in performant mode.  This is a lie told by buggy firmware.
      We have to be wary of this lest we try to allocate a negative
      number of command blocks, which will be treated as unsigned,
      and get an out of memory condition.
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
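      Why a bogus count is dangerous: if the driver derives the number of command
      blocks by subtracting some reserved commands from the reported maximum, a
      report of 0 goes negative, and converting that to an unsigned size yields an
      enormous allocation request. A hypothetical sketch of the sanitizing step;
      that commands are reserved at all, how many, the fallback value and all names
      are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_RESERVED_CMDS 4     /* hypothetical: commands kept back for other uses */
#define MODEL_MIN_CMDS      16    /* hypothetical fallback when firmware lies */

/* Unsanitized: a reported maximum of 0 goes negative, then wraps when converted. */
static size_t model_nr_cmds_unsafe(int reported_max)
{
    int nr = reported_max - MODEL_RESERVED_CMDS;

    return (size_t)nr;            /* -4 becomes SIZE_MAX - 3 */
}

/* Sanitized: distrust implausibly small values before doing any arithmetic. */
static size_t model_nr_cmds(int reported_max)
{
    if (reported_max < MODEL_MIN_CMDS)
        reported_max = MODEL_MIN_CMDS;
    return (size_t)(reported_max - MODEL_RESERVED_CMDS);
}
```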
    • cciss: fix hard reset code. · a6528d01
      Stephen M. Cameron committed
      cciss: Fix hard reset code.
      Smart Array controllers newer than the P600 do not honor the
      PCI power state method of resetting the controllers.  Instead,
      in these cases we can get them to reset via the "doorbell" register.
      
      This escaped notice until we began using "performant" mode because
      the fact that the controllers did not reset did not normally
      impede subsequent operation, and so things generally appeared to
      "work".  Once the performant mode code was added, if the controller
      does not reset, it remains in performant mode.  The code immediately
      after the reset presumes the controller is in "simple" mode
      (previously, it had remained in simple mode the whole time).
      If the controller remains in performant mode any code which presumes
      it is in simple mode will not work.  So the reset needs to be fixed.
      
      Unfortunately there are some controllers which cannot be reset by
      either method (e.g. the P800).  We detect these cases by noticing that
      the controller seems to remain in performant mode even after a
      reset has been attempted.  In those cases we ignore the controller,
      as any commands outstanding on it will result in stale completions.
      To sum up, we try to do a better job of resetting the controller if
      "reset_devices" is set, and if it doesn't work, we ignore that
      controller.
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cciss: factor out cciss_reset_devices() · 83123cb1
      Stephen M. Cameron committed
      cciss: factor out cciss_reset_devices()
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cciss: factor out cciss_find_cfg_addrs. · 8e93bf6d
      Stephen M. Cameron committed
      The rationale for this is that I will also need to use this code
      when fixing the kdump host reset code, before the hba structure exists.
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cciss: factor out cciss_enter_performant_mode · b9933135
      Stephen M. Cameron committed
      cciss: factor out cciss_enter_performant_mode
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>