  1. 29 Jan, 2015: 1 commit
  2. 27 Jan, 2015: 1 commit
  3. 25 Jan, 2015: 1 commit
  4. 24 Jan, 2015: 4 commits
    • libata: use blk taging · 12cb5ce1
      Shaohua Li committed
      libata uses its own tag management, which duplicates generic code, and
      the implementation is poor. And if we switch to blk-mq, tagging is
      built in. It's time to switch to generic tagging.
      
      The SAS driver has its own tag management, and it looks like we can't
      directly map the host controller tag to the SATA tag. So I just
      bypassed the SAS case.
      
      I renamed the code and variables of libata's tag management to make
      it self-contained; only SAS will use it. Later, if libsas implements
      its own tag management, the tag management code in libata can easily
      be deleted.
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Merge branch 'for-3.20/core' into for-3.20/drivers · a4a1cc16
      Jens Axboe committed
      We need the tagging changes for the libata conversion.
    • blk-mq: add tag allocation policy · 24391c0d
      Shaohua Li committed
      This is the blk-mq part of supporting a tag allocation policy. The
      default allocation policy isn't changed (though it's not a strict
      FIFO). The new policy is round-robin for libata. But it's a
      best-effort implementation: if multiple tasks are competing, the tags
      returned will be mixed (which is unavoidable even with !mq, as
      requests from different tasks can be mixed in the queue).
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: support different tag allocation policy · ee1b6f7a
      Shaohua Li committed
      libata's tag allocation uses a round-robin policy. The next patch will
      make libata use the block layer's generic tag allocation, so let's add
      a policy to tag allocation.
      
      There are currently two policies: FIFO (default) and round-robin.
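      To illustrate the difference between the two policies, here is a
      minimal userspace sketch in C. The names (struct tag_map, tag_get(),
      tag_put(), TAG_DEPTH) are hypothetical and this is not the block
      layer's implementation: FIFO is modeled as always taking the lowest
      free tag, and round-robin as resuming the search just after the last
      tag handed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical fixed-depth tag map; not the kernel's blk_queue_tag. */
#define TAG_DEPTH 8

enum tag_policy { TAG_POLICY_FIFO, TAG_POLICY_RR };

struct tag_map {
    bool used[TAG_DEPTH];
    int next;               /* round-robin cursor: where the next search starts */
    enum tag_policy policy;
};

static void tag_map_init(struct tag_map *map, enum tag_policy policy)
{
    memset(map->used, 0, sizeof(map->used));
    map->next = 0;
    map->policy = policy;
}

/* Returns a free tag, or -1 if every tag is in use. */
static int tag_get(struct tag_map *map)
{
    /* FIFO: search from tag 0, so the lowest free tag is reused first.
       Round-robin: search from the tag after the last one handed out. */
    int start = (map->policy == TAG_POLICY_RR) ? map->next : 0;

    for (int i = 0; i < TAG_DEPTH; i++) {
        int tag = (start + i) % TAG_DEPTH;
        if (!map->used[tag]) {
            map->used[tag] = true;
            map->next = (tag + 1) % TAG_DEPTH;
            return tag;
        }
    }
    return -1;
}

/* Releases a tag previously returned by tag_get(). */
static void tag_put(struct tag_map *map, int tag)
{
    assert(tag >= 0 && tag < TAG_DEPTH && map->used[tag]);
    map->used[tag] = false;
}
```

      With FIFO, a freed tag 0 is handed out again immediately; with
      round-robin, the next request gets tag 1 even though tag 0 is free,
      spreading allocations across the whole tag range.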
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. 22 Jan, 2015: 4 commits
    • block: Remove annoying "unknown partition table" message · bb5c3cdd
      Boaz Harrosh committed
      As Christoph put it:
        Can we just get rid of the warnings?  It's fairly annoying as devices
        without partitions are perfectly fine and very useful.
      
      I too have seen this message on every VM boot, for ages, on all my
      devices, and would love to just remove it. For me, a partition table
      is only needed for a booting BIOS, GRUB, and the like.
      
      CC: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • NVMe: within nvme_free_queues(), delete RCU sychro/deferred free · 121c7ad4
      kaoudis committed
      Converting the driver to blk-queue got rid of its RCU
      locking-on-queue, so remove the now-unnecessary RCU locking-on-queue
      artefacts.
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Kelly Nicole Kaoudis <kaoudis@colorado.edu>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Add discard flag to blkdev_issue_zeroout() function · d93ba7a5
      Martin K. Petersen committed
      blkdev_issue_zeroout() will zero a given block range. This is done by
      way of explicit writing, thus provisioning or allocating the blocks on
      disk.
      
      There are use cases where the desired behavior is to zero the blocks but
      unprovision them if possible. The blocks must deterministically contain
      zeroes when they are subsequently read back.
      
      This patch adds a flag to blkdev_issue_zeroout() that provides this
      variant. If the discard flag is set and a block device guarantees
      discard_zeroes_data, we will use REQ_DISCARD to clear the block range.
      If the device does not support discard_zeroes_data, or if the discard
      request fails, we will fall back first to REQ_WRITE_SAME and then to a
      regular REQ_WRITE.
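      The fallback order described above can be sketched as a small
      decision function. This is a hypothetical userspace model, not the
      kernel code: struct dev_caps and its fields stand in for the device's
      real capabilities and for whether a request actually succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability model; the kernel works on a struct block_device
   and issues bios rather than consulting flags like these. */
struct dev_caps {
    bool discard_zeroes_data; /* discarded blocks read back as zeroes */
    bool discard_fails;       /* simulate a discard request that fails */
    bool write_same_works;    /* WRITE SAME is supported and succeeds */
};

enum zero_method { ZERO_BY_DISCARD, ZERO_BY_WRITE_SAME, ZERO_BY_WRITE };

/* Returns the method that ends up zeroing the range: discard only when the
   caller asked for it and zeroes are guaranteed, then WRITE SAME, then
   regular writes as the last resort. */
static enum zero_method pick_zeroout_method(const struct dev_caps *caps,
                                            bool discard_flag)
{
    assert(caps != NULL);
    if (discard_flag && caps->discard_zeroes_data && !caps->discard_fails)
        return ZERO_BY_DISCARD;
    if (caps->write_same_works)
        return ZERO_BY_WRITE_SAME;
    return ZERO_BY_WRITE;
}
```

      Only the discard path unprovisions blocks, which is why it is tried
      first when the caller sets the flag; the write paths always work but
      allocate the blocks.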
      
      Also update the callers of blkdev_issue_zeroout() to reflect the new
      flag, and make sb_issue_zeroout() prefer the discard approach.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • cfq-iosched: fix incorrect filing of rt async cfqq · c6ce1943
      Jeff Moyer committed
      Hi,
      
      If you can manage to submit an async write as the first async I/O from
      the context of a process with realtime scheduling priority, then a
      cfq_queue is allocated, but filed into the wrong async_cfqq bucket.  It
      ends up in the best effort array, but actually has realtime I/O
      scheduling priority set in cfqq->ioprio.
      
      The reason is that cfq_get_queue assumes the default scheduling class and
      priority when there is no information present (i.e. when the async cfqq
      is created):
      
      static struct cfq_queue *
      cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
      	      struct bio *bio, gfp_t gfp_mask)
      {
      	const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
      	const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
      
      cic->ioprio starts out as 0, which is "invalid".  So, class of 0
      (IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:
      
      		async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
      
      static struct cfq_queue **
      cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
      {
              switch (ioprio_class) {
              case IOPRIO_CLASS_RT:
                      return &cfqd->async_cfqq[0][ioprio];
              case IOPRIO_CLASS_NONE:
                      ioprio = IOPRIO_NORM;
                      /* fall through */
              case IOPRIO_CLASS_BE:
                      return &cfqd->async_cfqq[1][ioprio];
              case IOPRIO_CLASS_IDLE:
                      return &cfqd->async_idle_cfqq;
              default:
                      BUG();
              }
      }
      
      Here, instead of returning a class mapped from the process' scheduling
      priority, we get back the bucket associated with IOPRIO_CLASS_BE.
      
      Now, there is no queue allocated there yet, so we create it:
      
      		cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
      
      That function ends up doing this:
      
      			cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
      			cfq_init_prio_data(cfqq, cic);
      
      cfq_init_cfqq marks the priority as having changed.  Then,
      cfq_init_prio_data does this:
      
      	ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
      	switch (ioprio_class) {
      	default:
      		printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
      	case IOPRIO_CLASS_NONE:
      		/*
      		 * no prio set, inherit CPU scheduling settings
      		 */
      		cfqq->ioprio = task_nice_ioprio(tsk);
      		cfqq->ioprio_class = task_nice_ioclass(tsk);
      		break;
      
      So we basically have two code paths that treat IOPRIO_CLASS_NONE
      differently, which results in an RT async cfqq filed into a best effort
      bucket.
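      A minimal userspace model of the two code paths, assuming simplified
      types: bucket_for() mirrors the switch in cfq_async_queue_prio(), and
      class_for() mirrors the IOPRIO_CLASS_NONE case in
      cfq_init_prio_data(). bucket_resolved() is one hypothetical way to
      make the paths agree; it is not necessarily what the attached patch
      does.

```c
#include <assert.h>

/* I/O priority classes, numbered like the kernel's IOPRIO_CLASS_* values. */
enum ioprio_class { CLASS_NONE = 0, CLASS_RT = 1, CLASS_BE = 2, CLASS_IDLE = 3 };

/* Which async bucket a new cfqq is filed into (models cfq_async_queue_prio). */
static enum ioprio_class bucket_for(enum ioprio_class cls)
{
    switch (cls) {
    case CLASS_RT:   return CLASS_RT;
    case CLASS_NONE:            /* NONE is filed as best effort */
    case CLASS_BE:   return CLASS_BE;
    case CLASS_IDLE: return CLASS_IDLE;
    }
    assert(0 && "bad class");
    return CLASS_BE;
}

/* Which class the cfqq actually gets (models cfq_init_prio_data): NONE
   inherits the task's class. */
static enum ioprio_class class_for(enum ioprio_class cic_class,
                                   enum ioprio_class task_class)
{
    return cic_class == CLASS_NONE ? task_class : cic_class;
}

/* Hypothetical consistent variant: resolve NONE to the task's class
   before choosing the bucket, so both paths see the same class. */
static enum ioprio_class bucket_resolved(enum ioprio_class cic_class,
                                         enum ioprio_class task_class)
{
    return bucket_for(class_for(cic_class, task_class));
}
```

      For an RT task whose cic->ioprio is still 0, bucket_for() picks the
      best-effort bucket while class_for() yields RT, which is the mismatch
      described above.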
      
      Attached is a patch which fixes the problem.  I'm not sure how to make
      it cleaner.  Suggestions would be welcome.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
  6. 17 Jan, 2015: 1 commit
    • null_blk: suppress invalid partition info · 227290b4
      Jens Axboe committed
      null_blk is partitionable, but it doesn't store any of the data written
      to it, so there is never a partition table to find. When it is loaded,
      you would normally see:
      
      [1226739.343608]  nullb0: unknown partition table
      [1226739.343746]  nullb1: unknown partition table
      
      which can confuse some people. Add the appropriate gendisk flag
      to suppress this info.
      Signed-off-by: Jens Axboe <axboe@fb.com>
  7. 14 Jan, 2015: 6 commits
    • blk-mq: fix false negative out-of-tags condition · 0bf36498
      Jens Axboe committed
      The blk-mq tagging tries to maintain some locality between CPUs and
      the tags issued. The tags are split into groups of words, and the
      words may not be fully populated. When searching for a new free tag,
      blk-mq may look at partial words, hence it passes in an offset/size
      to find_next_zero_bit(). However, it does so incorrectly: the size
      must always be the full number of tags in that word, otherwise we'll
      potentially miss some near the end.
      
      Another issue is when __bt_get() goes from one word set to the next.
      It bumps the index, but not the last_tag associated with the
      previous index. Bump that to be in the range of the new word.
      
      Finally, clean up __bt_get() and __bt_get_word() a bit and get
      rid of the goto in there, and the unnecessary 'wrap' variable.
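      The first issue (passing a shortened size to find_next_zero_bit())
      can be reproduced in a small userspace sketch. next_zero_bit() is a
      hypothetical single-word stand-in with the same [offset, size)
      search range as the kernel helper, and get_tag_buggy() assumes the
      mistake was computing the size as the remaining length from
      last_tag. The real __bt_get_word() differs, and it also wraps around
      to the start of the word, which this sketch omits.

```c
#include <assert.h>

/* Single-word stand-in for find_next_zero_bit(): returns the index of the
   first zero bit in [offset, size), or size if there is none. */
static unsigned int next_zero_bit(unsigned long word, unsigned int size,
                                  unsigned int offset)
{
    assert(size <= 8 * sizeof(word));
    for (unsigned int bit = offset; bit < size; bit++)
        if (!(word & (1UL << bit)))
            return bit;
    return size;
}

/* Hypothetical buggy search: 'size' is the remaining length, so the scan
   covers [last_tag, depth - last_tag) and can stop short of the word's end. */
static int get_tag_buggy(unsigned long word, unsigned int depth,
                         unsigned int last_tag)
{
    unsigned int size = depth - last_tag;
    unsigned int bit = next_zero_bit(word, size, last_tag);
    return bit < size ? (int)bit : -1;
}

/* Fixed search: 'size' is always the full number of tags in the word. */
static int get_tag_fixed(unsigned long word, unsigned int depth,
                         unsigned int last_tag)
{
    unsigned int bit = next_zero_bit(word, depth, last_tag);
    return bit < depth ? (int)bit : -1;
}
```

      With tags 0 through 5 busy (word 0x3F) and last_tag at 4, the buggy
      search scans the empty range [4, 4) and reports no free tag, while
      the fixed search finds tag 6.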
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • brd: Request from fdisk 4k alignment · c8fa3173
      Boaz Harrosh committed
      Because the direct_access() API returns a PFN, partitions had better
      start on a 4K boundary; otherwise offset zero of a partition will not
      be aligned and blk_direct_access() will fail the call.
      
      By setting blk_queue_physical_block_size(PAGE_SIZE) we can communicate
      this to fdisk and friends.
      
      The call to blk_queue_physical_block_size() is harmless and will
      not affect kernel behavior in any way. It only serves to communicate
      with user mode.
      
      Before this patch, running fdisk on a default-size (4M) brd offers
      34 as the first sector (BAD); after this patch it offers 40, i.e.
      8-sector aligned. Also, when entering random partition sizes, the
      next partition-start sector offered is 8-sector aligned after this
      patch. (Note that with fdisk the user can still enter bad values;
      only the offered default values will be correct.)
      
      Note that with a bdev size > 4M, fdisk will try to align on a 1M
      boundary in any case (the first sector above will then be 2048).
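      The sector arithmetic behind this, as a short C sketch assuming
      512-byte sectors and 4 KiB pages. align_start() is a hypothetical
      helper that rounds up to the next page boundary, not fdisk's actual
      algorithm: sector 34 starts at byte 17408, which is not a multiple of
      4096, whereas sector 40 starts at byte 20480 = 5 × 4096.

```c
#include <assert.h>

#define SECTOR_SIZE 512u
#define PAGE_SIZE_BYTES 4096u  /* assumes 4 KiB pages */

/* Sectors per page: 4096 / 512 = 8. */
#define SECTORS_PER_PAGE (PAGE_SIZE_BYTES / SECTOR_SIZE)

/* True if a partition starting at this sector begins on a page boundary. */
static int start_is_page_aligned(unsigned long sector)
{
    return (sector * SECTOR_SIZE) % PAGE_SIZE_BYTES == 0;
}

/* Hypothetical helper: round a proposed start sector up to the next
   page boundary, i.e. to a multiple of 8 sectors. */
static unsigned long align_start(unsigned long sector)
{
    assert(SECTORS_PER_PAGE > 0);
    return (sector + SECTORS_PER_PAGE - 1) / SECTORS_PER_PAGE * SECTORS_PER_PAGE;
}
```

      The 1M-aligned default of 2048 is already a multiple of 8 sectors, so
      it is page aligned as well.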
      
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • brd: Fix all partitions BUGs · 937af5ec
      Boaz Harrosh committed
      This patch fixes up brd's partition scheme, which now gets the best of all worlds.
      
      The MAIN fix here is that currently, if one fdisks some partitions,
      a BAD bug makes all partitions point to the same start-end sectors,
      i.e. 0 - brd_size, and an mkfs of any partition would trash the
      partition table and the other partitions.
      
      The other fix: "mount -U uuid" does not work unless show_part is
      specified, because of the GENHD_FL_SUPPRESS_PARTITION_INFO flag.
      We now always load without that flag and remove the show_part parameter.
      
      [We remove Dmitry's new module param part_show; partition info is
       now always shown.]
      
      So NOW the logic goes like this:
      * max_part - Just says how many minors to reserve between ramX
        devices. In any case, there can be as many partitions as requested.
        If the minors between devices run out, dynamic 259-major IDs will
        be allocated on the fly.
        The default is now max_part=1, which means all partition devt(s)
        will come from the dynamic (259) major range.
        (If persistent partition minors are needed, use max_part=X.)
        For example, with /dev/sdX max_part is hard-coded to 16.
      
      * Creation of new devices on the fly still works, as always:
        mknod /path/devnod b 1 X
        fdisk -l /path/devnod
        will create a new device if [X / max_part] was not already
        created (just as before).
      
        Partitions on the dynamically created device work as well; the
        same logic applies to minors as with the pre-created devices.
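      The minor layout described in the list can be modeled as follows.
      This is a hypothetical sketch of the arithmetic only, not brd's code;
      it assumes device i owns minors [i * max_part, (i + 1) * max_part),
      with the first minor for the whole disk and the rest reserved for
      partitions.

```c
#include <assert.h>

/* Hypothetical model: the ramX device index that owns a given minor,
   matching the [X / max_part] rule for devices created via mknod. */
static unsigned int device_for_minor(unsigned int minor, unsigned int max_part)
{
    assert(max_part > 0);
    return minor / max_part;
}

/* Partitions that fit in the reserved minors (one minor is the whole
   disk); any further partitions would get a dynamic (major 259) devt. */
static unsigned int reserved_partition_minors(unsigned int max_part)
{
    assert(max_part > 0);
    return max_part - 1;
}
```

      With max_part=1 no partition minors are reserved, so every partition
      gets a dynamic devt, matching the new default; with 16, as for
      /dev/sdX, 15 partitions fit in the reserved range.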
      
      TODO: dynamic growth of device size, so that each device can have its
            own size.
      
      CC: Dmitry Monakhov <dmonakhov@openvz.org>
      Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Merge branch 'for-3.20/core' into for-3.20/drivers · d4119ee0
      Jens Axboe committed
    • block: Change direct_access calling convention · dd22f551
      Matthew Wilcox committed
      In order to support accesses to larger chunks of memory, pass in a
      'size' parameter (counted in bytes), and return the amount available at
      that address.
      
      Add a new helper function, bdev_direct_access(), to handle common
      functionality including partition handling, checking the length requested
      is positive, checking for the sector being page-aligned, and checking
      the length of the request does not pass the end of the partition.
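      A userspace sketch of the checks attributed to bdev_direct_access()
      above. struct part, direct_access_checks(), and the avail parameter
      (standing in for what the driver's direct_access() reports) are
      hypothetical; the errno values and the order of the checks are
      assumptions, not the kernel's exact code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BYTES 512L
#define PAGE_BYTES 4096L  /* assumes 4 KiB pages */

/* Hypothetical partition descriptor (the kernel uses struct block_device). */
struct part {
    int64_t start_sect;  /* partition offset on the whole disk, in sectors */
    int64_t nr_sects;    /* partition length, in sectors */
};

/* 'sector' is partition-relative and 'size' is in bytes. Returns the number
   of usable bytes (never more than requested) or a negative errno. */
static long direct_access_checks(const struct part *p, int64_t sector,
                                 long size, long avail)
{
    assert(p != NULL);
    if (size < 0)
        return -EINVAL;
    /* Round the length up to whole sectors; stay inside the partition. */
    if (sector + (size + SECTOR_BYTES - 1) / SECTOR_BYTES > p->nr_sects)
        return -ERANGE;
    /* Partition handling: check alignment on the whole-disk sector,
       since the partition itself may start unaligned. */
    sector += p->start_sect;
    if (sector % (PAGE_BYTES / SECTOR_BYTES) != 0)
        return -EINVAL;
    /* Report how much is available at that address, capped at 'size'. */
    return avail < size ? avail : size;
}
```

      Because the alignment check uses the whole-disk sector, a partition
      starting at sector 34 fails even at offset 0, which is the situation
      the brd 4K-alignment patch above tries to avoid.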
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • axonram: Fix bug in direct_access · 91117a20
      Matthew Wilcox committed
      The 'pfn' returned by axonram was completely bogus, and has been since
      2008.
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
  8. 03 Jan, 2015: 8 commits
  9. 01 Jan, 2015: 1 commit
  10. 29 Dec, 2014: 3 commits
  11. 28 Dec, 2014: 5 commits
  12. 27 Dec, 2014: 3 commits
  13. 26 Dec, 2014: 2 commits