1. 21 May, 2011 1 commit
  2. 18 May, 2011 2 commits
  3. 16 May, 2011 1 commit
    • blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup · 70087dc3
      Committed by Vivek Goyal
      Currently we first map the task to cgroup and then cgroup to
      blkio_cgroup. There is a more direct way to get to blkio_cgroup
      from task using task_subsys_state(). Use that.
      
      The real reason for the fix is that it also avoids a race in generic
      cgroup code. During remount/umount rebind_subsystems() is called and
      it can do the following without any rcu protection:
      
      cgrp->subsys[i] = NULL;
      
      That means if somebody got hold of cgroup under rcu and then it tried
      to do cgroup->subsys[] to get to blkio_cgroup, it would get NULL which
      is wrong. I was running into this race condition with ltp running on an
      upstream derived kernel and that led to a crash.
      
      So ideally we should also fix cgroup generic code to wait for rcu
      grace period before setting pointer to NULL. Li Zefan is not very keen
      on introducing synchronize_wait() as he thinks it will slow
      down mount/remount/umount operations.
      
      So for the time being at least fix the kernel crash by taking a more
      direct route to blkio_cgroup.
      
      One tester had reported a crash while running LTP on a derived kernel
      and with this fix the crash is no longer seen while the test has been
      running for over 6 days.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
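The two lookup routes described in this commit can be illustrated with a toy userland model. This is only a sketch under stated assumptions: the struct layouts, `lookup_via_cgroup()`, `lookup_direct()` and `rebind_clear()` are hypothetical stand-ins, not the kernel's definitions, and the per-task pointer array flattens what the kernel reaches through `task_subsys_state()`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these types match the kernel's real layouts. */
enum { BLKIO_SUBSYS_ID = 0, SUBSYS_COUNT = 1 };

struct subsys_state {
    int id;                                     /* stand-in for per-subsystem state */
};

struct cgroup {
    struct subsys_state *subsys[SUBSYS_COUNT];  /* may be reset to NULL on rebind */
};

struct task {
    struct cgroup *cgrp;                        /* the indirect route goes through here */
    struct subsys_state *subsys[SUBSYS_COUNT];  /* per-task pointers (assumed layout) */
};

/* Indirect route: task -> cgroup -> subsys[]. Yields NULL after a rebind. */
struct subsys_state *lookup_via_cgroup(const struct task *t)
{
    assert(t != NULL && t->cgrp != NULL);
    return t->cgrp->subsys[BLKIO_SUBSYS_ID];
}

/* Direct route, analogous in spirit to task_subsys_state(). */
struct subsys_state *lookup_direct(const struct task *t)
{
    assert(t != NULL);
    return t->subsys[BLKIO_SUBSYS_ID];
}

/* Models the store quoted in the commit message: cgrp->subsys[i] = NULL. */
void rebind_clear(struct cgroup *cgrp)
{
    assert(cgrp != NULL);
    cgrp->subsys[BLKIO_SUBSYS_ID] = NULL;
}
```

In this model, once `rebind_clear()` has run, `lookup_via_cgroup()` returns NULL while `lookup_direct()` still returns the task's own pointer, which is the behaviour the commit relies on to avoid the crash.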
  4. 07 May, 2011 5 commits
    • blkdev: Do not return -EOPNOTSUPP if discard is supported · 8af1954d
      Committed by Lukas Czerner
      Currently we return -EOPNOTSUPP in blkdev_issue_discard() if any of the
      bios fails because the underlying device does not support discard
      requests. However, if the device is, for example, a dm device composed
      of devices of which some support discard and some do not, it is ok for
      some bios to fail with EOPNOTSUPP, but it does not mean that discard
      is not supported at all.
      
      This commit removes the check for bios failed with EOPNOTSUPP and changes
      blkdev_issue_discard() to return operation not supported if and only if
      the device does not actually support it, not just part of the device as
      some bios might indicate.
      
      This change also fixes a problem with the BLKDISCARD ioctl(), which now
      works correctly on such dm devices.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      CC: Jens Axboe <jaxboe@fusionio.com>
      CC: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
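The return-value rule described in this commit can be sketched as a small userland model. Everything here is hypothetical: `struct toy_dev` and `toy_issue_discard()` are illustration-only names, and mapping any other per-bio failure to -EIO is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a (possibly stacked) block device. */
struct toy_dev {
    bool queue_supports_discard;  /* capability of the device as a whole */
    const int *bio_results;       /* per-bio completion status: 0 or -errno */
    size_t nr_bios;
};

/*
 * Return -EOPNOTSUPP only when the device as a whole lacks discard support.
 * A single bio failing with -EOPNOTSUPP is tolerated, because only part of
 * a composed device may lack discard.
 */
int toy_issue_discard(const struct toy_dev *dev)
{
    int ret = 0;

    assert(dev != NULL);
    if (!dev->queue_supports_discard)
        return -EOPNOTSUPP;

    for (size_t i = 0; i < dev->nr_bios; i++) {
        int status = dev->bio_results[i];

        if (status == -EOPNOTSUPP)
            continue;             /* partial lack of support is not an error */
        if (status < 0)
            ret = -EIO;           /* assumption: other failures become -EIO */
    }
    return ret;
}
```

With per-bio results of `{ 0, -EOPNOTSUPP, 0 }` on a device that advertises discard, the model returns 0, which corresponds to the dm case the commit message describes.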
    • blkdev: Simple cleanup in blkdev_issue_zeroout() · 5baebe5c
      Committed by Lukas Czerner
      In blkdev_issue_zeroout() we are submitting regular WRITE bios, so we do
      not need to check for -EOPNOTSUPP specifically in case of error. Also
      there is no need for the label submit: because there is no way to jump
      out of the while loop without an error and we really want to exit,
      rather than try again. And also remove the check for (sz == 0) since at
      that point sz can never be zero.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      CC: Dmitry Monakhov <dmonakhov@openvz.org>
      CC: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • blkdev: Submit discard bio in batches in blkdev_issue_discard() · 5dba3089
      Committed by Lukas Czerner
      Currently we wait for every submitted REQ_DISCARD bio separately, which
      can have the unwanted consequence of repeatedly flushing the queue.
      Instead, submit bios in batches and wait for the entire batch, hence
      narrowing the window of other ios going in.
      
      Use bio_batch_end_io() and struct bio_batch for that purpose, the same
      is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so we
      always set !BIO_UPTODATE in the case of error and remove the check for
      bb, since we are the only user of this function and we always set this.
      
      Remove bio_get()/bio_put() from blkdev_issue_discard() since
      bio_alloc() and bio_batch_end_io() are doing the same thing, hence it is
      not needed anymore.
      
      I have done simple dd testing with surprising results. The script I have
      used is:
      
      for i in $(seq 10); do
              echo $i
              dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
              sleep 5
      done
      /usr/bin/time -f %e ./blkdiscard /dev/sdc1
      
      Running time of BLKDISCARD on the whole device (elapsed seconds):
      with patch              without patch
      0.95                    15.58
      
      So we can see that in this artificial test the kernel with the patch
      applied is approx 16x faster in discarding the device.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      CC: Dmitry Monakhov <dmonakhov@openvz.org>
      CC: Jens Axboe <jaxboe@fusionio.com>
      CC: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
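The difference between waiting per bio and waiting once per batch can be sketched with a synchronous toy model. It is not the kernel code: `struct toy_batch`, `toy_batch_end_io()`, `toy_discard_batched()` and `toy_discard_one_by_one()` are hypothetical names, completions are simulated in a loop instead of arriving asynchronously, and the `waits` counter only stands in for how often the submitter would block.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the role struct bio_batch plays. */
struct toy_batch {
    int pending;    /* bios submitted but not yet completed */
    bool uptodate;  /* cleared as soon as any bio reports an error */
};

/* Completion callback: any error marks the whole batch as failed. */
void toy_batch_end_io(struct toy_batch *bb, int err)
{
    assert(bb != NULL && bb->pending > 0);
    if (err)
        bb->uptodate = false;
    bb->pending--;
}

/* Submit every bio first, then block once for the whole batch. */
int toy_discard_batched(const int *status, int n, int *waits)
{
    struct toy_batch bb = { 0, true };

    assert(waits != NULL && (status != NULL || n == 0));
    *waits = 0;
    for (int i = 0; i < n; i++)
        bb.pending++;                          /* submit without waiting */
    if (bb.pending > 0) {
        (*waits)++;                            /* the single blocking wait */
        for (int i = 0; i < n; i++)
            toy_batch_end_io(&bb, status[i]);  /* simulated completions */
    }
    assert(bb.pending == 0);
    return bb.uptodate ? 0 : -EIO;
}

/* Old behaviour in this model: block after every single bio. */
int toy_discard_one_by_one(const int *status, int n, int *waits)
{
    int ret = 0;

    assert(waits != NULL && (status != NULL || n == 0));
    *waits = 0;
    for (int i = 0; i < n; i++) {
        (*waits)++;
        if (status[i])
            ret = -EIO;
    }
    return ret;
}
```

For four bios the batched variant blocks once and the one-by-one variant blocks four times, while both report the same overall result.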
    • block: hold queue if flush is running for non-queueable flush drive · 3ac0cc45
      Committed by shaohua.li@intel.com
      In some drives, flush requests are non-queueable. While a flush request
      is running, normal read/write requests can't run. If the block layer
      dispatches such a request, the driver can't handle it and requeues it.
      Tejun suggested we can hold the queue while a flush is running. This
      avoids unnecessary requeues and can also improve performance. For
      example, we have requests flush1, write1, flush2. flush1 is dispatched,
      then the queue is held, so write1 isn't inserted into the queue. After
      flush1 is finished, flush2 is dispatched. Since the disk cache is
      already clean, flush2 finishes very quickly, so it looks like flush2 is
      folded into flush1.
      
      In my test, the queue holding completely solves a regression introduced by
      commit 53d63e6b:
      
          block: make the flush insertion use the tail of the dispatch list
      
          It's not a preempt type request, in fact we have to insert it
          behind requests that do specify INSERT_FRONT.
      
      which causes a regression of about 20% when running a sysbench fileio
      workload.
      
      Stable: 2.6.39 only
      
      Cc: stable@kernel.org
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: add a non-queueable flush flag · f3876930
      Committed by shaohua.li@intel.com
      Flush requests aren't queueable in some drives. Add a flag to let the
      driver notify the block layer about this. We can optimize flush
      performance with this knowledge.
      
      Stable: 2.6.39 only
      
      Cc: stable@kernel.org
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  5. 06 May, 2011 2 commits
  6. 22 April, 2011 2 commits
    • block: don't propagate unlisted DISK_EVENTs to userland · 7c88a168
      Committed by Tejun Heo
      DISK_EVENT_MEDIA_CHANGE is used both for the userland-visible event and
      for the internal event for revalidation of removable devices.  Some legacy
      drivers don't implement proper event detection and continuously
      generate events under certain circumstances.  For example, ide-cd
      generates media changed continuously if there's no media in the drive,
      which can lead to an infinite loop of events jumping back and forth
      between the driver and userland event handler.
      
      This patch updates disk event infrastructure such that it never
      propagates events not listed in disk->events to userland.  Those
      events are processed the same for internal purposes but uevent
      generation is suppressed.
      
      This also ensures that userland only gets events which are advertised
      in the @events sysfs node, lowering the risk of confusion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
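The filtering rule in this commit amounts to masking pending events with the set the disk advertises. The sketch below is a minimal model, assuming hypothetical flag values and names (`TOY_EVENT_*`, `struct toy_disk`, `toy_handle_events()`); it does not reproduce the kernel's event infrastructure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values chosen for this sketch only. */
#define TOY_EVENT_MEDIA_CHANGE  (1u << 0)
#define TOY_EVENT_EJECT_REQUEST (1u << 1)

struct toy_disk {
    unsigned int events;        /* events the disk advertises to userland */
};

struct toy_event_result {
    unsigned int internal;      /* still processed for internal purposes */
    unsigned int userland;      /* subset that would generate a uevent */
};

/* Process every pending event internally, but only report listed ones. */
struct toy_event_result toy_handle_events(const struct toy_disk *disk,
                                          unsigned int pending)
{
    struct toy_event_result r;

    assert(disk != NULL);
    r.internal = pending;
    r.userland = pending & disk->events;
    return r;
}
```

A legacy driver that advertises no events still has its media-change event handled internally in this model, but nothing reaches userland, so the driver and the userland handler can no longer bounce events back and forth.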
    • elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too · 3aa72873
      Committed by Jens Axboe
      The sort insert is the one that goes to the IO scheduler. With
      the SORT_MERGE addition, we could bypass IO scheduler setup
      but still ask the IO scheduler to insert the request. This would
      cause an oops on switching IO schedulers through the sysfs
      interface, unless the disk just happened to be idle while it
      occurred.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  7. 19 April, 2011 6 commits
  8. 18 April, 2011 5 commits
  9. 16 April, 2011 1 commit
  10. 15 April, 2011 2 commits
  11. 14 April, 2011 1 commit
  12. 12 April, 2011 6 commits
  13. 11 April, 2011 1 commit
    • block: splice plug list to local context · 109b8129
      Committed by NeilBrown
      If the request_fn ends up blocking, we could be re-entering
      the plug flush. Since the list is protected by explicitly
      not allowing schedule events, this isn't a terribly good idea.
      
      Additionally, it can cause us to recurse. As request_fn called by
      __blk_run_queue is allowed to 'schedule()' (after dropping the queue
      lock of course), it is possible to get a recursive call:
      
       schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
            -> __blk_run_queue -> request_fn -> schedule
      
      We must make sure that the second schedule does not call into
      blk_flush_plug again.  So instead of leaving the list of requests on
      blk_plug->list, move them to a separate list leaving blk_plug->list
      empty.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
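The splice-to-local idea can be shown with a toy singly linked list. This is a sketch under assumptions, not the kernel implementation: `struct toy_req`, `struct toy_plug`, `toy_request_fn()` and `toy_flush_plug()` are hypothetical, and the direct call from `toy_request_fn()` back into `toy_flush_plug()` stands in for request_fn calling schedule().

```c
#include <assert.h>
#include <stddef.h>

struct toy_req {
    struct toy_req *next;
};

struct toy_plug {
    struct toy_req *list;   /* pending requests, like blk_plug->list */
    int depth;              /* current nesting of toy_flush_plug() */
    int dispatched;         /* total requests handed to the driver */
    int nested_dispatches;  /* requests dispatched from a re-entrant flush */
};

void toy_flush_plug(struct toy_plug *plug);

/* Stand-in for request_fn: it may "schedule", which re-enters the flush. */
void toy_request_fn(struct toy_plug *plug, struct toy_req *rq)
{
    (void)rq;
    plug->dispatched++;
    toy_flush_plug(plug);   /* models schedule() -> blk_flush_plug */
}

void toy_flush_plug(struct toy_plug *plug)
{
    struct toy_req *local;

    assert(plug != NULL);
    plug->depth++;

    /* Move everything to a local list and leave the plug list empty. */
    local = plug->list;
    plug->list = NULL;

    while (local != NULL) {
        struct toy_req *rq = local;

        local = rq->next;
        rq->next = NULL;
        if (plug->depth > 1)
            plug->nested_dispatches++;
        toy_request_fn(plug, rq);
    }
    plug->depth--;
}
```

Because the plug list is already empty when the nested call arrives, the re-entrant flush finds nothing to do and every request is dispatched exactly once from the outer call.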
  14. 06 April, 2011 5 commits
    • block: fix request sorting at unplug · f83e8261
      Committed by Konstantin Khlebnikov
      The comparison function for list_sort() must be anticommutative,
      otherwise it is not sorting in the ordinary sense.
      
      But fortunately list_sort() always checks ((*cmp)(priv, a, b) <= 0);
      it does not distinguish between negative and zero, so the comparison
      function can implement only less-or-equal instead of a full three-way
      comparison.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
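The point about a two-valued comparator can be illustrated with a toy sort that, like the check quoted in the commit message, only ever tests `cmp(a, b) <= 0`. This is a hedged sketch: it uses a stable insertion sort over an array rather than the kernel's list_sort(), the comparator omits the `priv` argument, and `struct toy_rq`, `toy_cmp_greater()` and `toy_sort()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rq {
    int q;    /* stand-in for the queue a request belongs to */
    int seq;  /* original position, used to observe stability */
};

typedef int (*toy_cmp_fn)(const struct toy_rq *a, const struct toy_rq *b);

/* Less-or-equal style comparator: 0 if a may stay before b, 1 otherwise. */
int toy_cmp_greater(const struct toy_rq *a, const struct toy_rq *b)
{
    return a->q > b->q;
}

/* Stable insertion sort that only distinguishes "<= 0" from "> 0". */
void toy_sort(struct toy_rq *v, size_t n, toy_cmp_fn cmp)
{
    assert(cmp != NULL && (v != NULL || n == 0));
    for (size_t i = 1; i < n; i++) {
        struct toy_rq key = v[i];
        size_t j = i;

        /* Shift elements that must come after key one slot to the right. */
        while (j > 0 && !(cmp(&v[j - 1], &key) <= 0)) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}
```

Since the sort never inspects the sign of a negative result, a comparator that returns only 0 or 1 is enough to group requests by queue while keeping the original order within each queue.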
    • dm: improve block integrity support · a63a5cf8
      Committed by Mike Snitzer
      The current block integrity (DIF/DIX) support in DM is verifying that
      all devices' integrity profiles match during DM device resume (which
      is past the point of no return).  To some degree that is unavoidable
      (stacked DM devices force this late checking).  But for most DM
      devices (which aren't stacking on other DM devices) the ideal time to
      verify all integrity profiles match is during table load.
      
      Introduce the notion of an "initialized" integrity profile: a profile
      that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
      template.  Add blk_integrity_is_initialized() to allow checking if a
      profile was initialized.
      
      Update DM integrity support to:
      - check all devices with _initialized_ integrity profiles match
        during table load; uninitialized profiles (e.g. for underlying DM
        device(s) of a stacked DM device) are ignored.
      - disallow a table load that would result in an integrity profile that
        conflicts with a DM device's existing (in-use) integrity profile
      - avoid clearing an existing integrity profile
      - validate all integrity profiles match during resume; but if they
        don't all we can do is report the mismatch (during resume we're past
        the point of no return)
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • blk-throttle: don't call xchg on bool · 6f037937
      Committed by Andreas Schwab
      xchg does not work portably with types smaller than 32 bits.
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: make the flush insertion use the tail of the dispatch list · 53d63e6b
      Committed by Jens Axboe
      It's not a preempt type request, in fact we have to insert it
      behind requests that do specify INSERT_FRONT.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: get rid of elv_insert() interface · b710a480
      Committed by Jens Axboe
      Merge it with __elv_add_request(), it's pretty pointless to
      have a function with only two callers. The main interface
      is elv_add_request()/__elv_add_request().
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>