1. 31 1月, 2016 1 次提交
  2. 26 11月, 2015 1 次提交
  3. 22 10月, 2015 1 次提交
    • M
      block: Inline blk_integrity in struct gendisk · 25520d55
      Martin K. Petersen 提交于
      Up until now the_integrity profile has been dynamically allocated and
      attached to struct gendisk after the disk has been made active.
      
      This causes problems because NVMe devices need to register the profile
      prior to the partition table being read due to a mandatory metadata
      buffer requirement. In addition, DM goes through hoops to deal with
      preallocating, but not initializing integrity profiles.
      
      Since the integrity profile is small (4 bytes + a pointer), Christoph
      suggested moving it to struct gendisk proper. This requires several
      changes:
      
       - Moving the blk_integrity definition to genhd.h.
      
       - Inlining blk_integrity in struct gendisk.
      
       - Removing the dynamic allocation code.
      
       - Adding helper functions which allow gendisk to set up and tear down
         the integrity sysfs dir when a disk is added/deleted.
      
       - Adding a blk_integrity_revalidate() callback for updating the stable
         pages bdi setting.
      
       - The calls that depend on whether a device has an integrity profile or
         not now key off of the bi->profile pointer.
      
       - Simplifying the integrity support routines in DM (Mike Snitzer).
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reported-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      25520d55
  4. 17 7月, 2015 2 次提交
  5. 04 9月, 2014 1 次提交
  6. 08 4月, 2013 1 次提交
    • J
      Revert "loop: cleanup partitions when detaching loop device" · c2fccc1c
      Jens Axboe 提交于
      This reverts commit 8761a3dc.
      
      There are situations where the destruction path is called
      with the bdev->bd_mutex already held, which then deadlocks in
      loop_clr_fd(). The normal partition cleanup does a trylock()
      on the mutex, but it'd be nice to have a more bullet proof
      method in loop. So punt this more involved fix to the next
      merge window, and just back out this buggy fix for now.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c2fccc1c
  7. 23 3月, 2013 1 次提交
    • P
      loop: cleanup partitions when detaching loop device · 8761a3dc
      Phillip Susi 提交于
      Any partitions added by user space to the loop device were being
      left in place after detaching the loop device.  This was because
      the detach path issued a BLKRRPART to clean up partitions if
      LO_FLAGS_PARTSCAN was set, meaning that the partitions were auto
      scanned on attach.  Replace this BLKRRPART with code that
      unconditionally cleans up partitions on detach instead.
      Signed-off-by: NPhillip Susi <psusi@ubuntu.com>
      
      Modified by Jens to export delete_partition().
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8761a3dc
  8. 28 2月, 2013 2 次提交
    • M
      block/partitions: optimize memory allocation in check_partition() · ac2e5327
      Ming Lei 提交于
      Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch, so
      it is easy to trigger page allocation failure by check_partition,
      especially in hotplug block device situation(such as, USB mass storage,
      MMC card, ...), and Felipe Balbi has observed the failure.
      
      This patch does below optimizations on the allocation of struct
      parsed_partitions to try to address the issue:
      
      - make parsed_partitions.parts as pointer so that the pointed memory can
        fit in 32KB buffer, then approximate 32KB memory can be saved
      
      - vmalloc the buffer pointed by parsed_partitions.parts because 32KB is
        still a bit big for kmalloc
      
      - given that many devices have the partition count limit, so only
        allocate disk_max_parts() partitions instead of 256 partitions always
      Signed-off-by: NMing Lei <ming.lei@canonical.com>
      Reported-by: NFelipe Balbi <balbi@ti.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac2e5327
    • T
      block: fix ext_devt_idr handling · 7b74e912
      Tomas Henzl 提交于
      While adding and removing a lot of disks disks and partitions this
      sometimes shows up:
      
        WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
        Hardware name:
        sysfs: cannot create duplicate filename '/dev/block/259:751'
        Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan]
        Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1
        Call Trace:
          warn_slowpath_common+0x87/0xc0
          warn_slowpath_fmt+0x46/0x50
          sysfs_add_one+0xc9/0x130
          sysfs_do_create_link+0x12b/0x170
          sysfs_create_link+0x13/0x20
          device_add+0x317/0x650
          idr_get_new+0x13/0x50
          add_partition+0x21c/0x390
          rescan_partitions+0x32b/0x470
          sd_open+0x81/0x1f0 [sd_mod]
          __blkdev_get+0x1b6/0x3c0
          blkdev_get+0x10/0x20
          register_disk+0x155/0x170
          add_disk+0xa6/0x160
          sd_probe_async+0x13b/0x210 [sd_mod]
          add_wait_queue+0x46/0x60
          async_thread+0x102/0x250
          default_wake_function+0x0/0x20
          async_thread+0x0/0x250
          kthread+0x96/0xa0
          child_rip+0xa/0x20
          kthread+0x0/0xa0
          child_rip+0x0/0x20
      
      This most likely happens because dev_t is freed while the number is
      still used and idr_get_new() is not protected on every use.  The fix
      adds a mutex where it wasn't before and moves the dev_t free function so
      it is called after device del.
      Signed-off-by: NTomas Henzl <thenzl@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b74e912
  9. 01 8月, 2012 1 次提交
    • V
      block: add partition resize function to blkpg ioctl · c83f6bf9
      Vivek Goyal 提交于
      Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
      allows altering the size of an existing partition, even if it is currently
      in use.
      
      This patch converts hd_struct->nr_sects into sequence counter because
      One might extend a partition while IO is happening to it and update of
      nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
      can lead to issues like reading inconsistent size of a partition. Sequence
      counter have been used so that readers don't have to take bdev mutex lock
      as we call sector_in_part() very frequently.
      
      Now all the access to hd_struct->nr_sects should happen using sequence
      counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
      There is one exception though, set_capacity()/get_capacity(). I think
      theoritically race should exist there too but this patch does not
      modify set_capacity()/get_capacity() due to sheer number of call sites
      and I am afraid that change might break something. I have left that as a
      TODO item. We can handle it later if need be. This patch does not introduce
      any new races as such w.r.t set_capacity()/get_capacity().
      
      v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NPhillip Susi <psusi@ubuntu.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c83f6bf9
  10. 02 3月, 2012 1 次提交
  11. 04 1月, 2012 2 次提交
  12. 13 6月, 2011 1 次提交
  13. 30 5月, 2011 1 次提交
    • J
      Revert "block: Remove extra discard_alignment from hd_struct." · a1706ac4
      Jens Axboe 提交于
      It was not a good idea to start dereferencing disk->queue from
      the fs sysfs strategy for displaying discard alignment. We ran
      into first a NULL pointer deref, and after fixing that we sometimes
      see unvalid disk->queue pointer values.
      
      Since discard is the only one of the bunch actually looking into
      the queue, just revert the change.
      
      This reverts commit 23ceb5b7.
      
      Conflicts:
      	fs/partitions/check.c
      a1706ac4
  14. 27 5月, 2011 1 次提交
  15. 09 5月, 2011 1 次提交
    • J
      fs: fixup warning part_discard_alignment_show() · bbdd304c
      Jens Axboe 提交于
      Stephen reports:
      
      -----
      
      After merging the block tree, today's linux-next build (x86_64
      allmodconfig) produced this warning:
      
      fs/partitions/check.c: In function 'part_discard_alignment_show':
      fs/partitions/check.c:263: warning: format '%u' expects type 'unsigned int', but argument 3 has type 'long long unsigned int'
      
      Introduced by commit  ("block: Remove extra discard_alignment from
      hd_struct")
      
      -----
      
      Fix it up by just removing the cast, we return an int already.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      bbdd304c
  16. 07 5月, 2011 1 次提交
  17. 31 3月, 2011 1 次提交
  18. 22 3月, 2011 1 次提交
  19. 07 1月, 2011 1 次提交
  20. 05 1月, 2011 1 次提交
    • J
      block: fix accounting bug on cross partition merges · 09e099d4
      Jerome Marchand 提交于
      /proc/diskstats would display a strange output as follows.
      
      $ cat /proc/diskstats |grep sda
         8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
         8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                      ~~~~~~~~~~
         8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
         8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
         8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
         8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137
      
      Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
      merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.
      
      The detailed root cause is as follows.
      
      Assuming that there are two partition, sda1 and sda2.
      
      1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
         is 0 and sda2's one is 1.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |          0
         sda2 |          1
         ---------------------------
      
      2. A bio belongs to sda1 is issued and is merged into the request mentioned on
         step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
         from sda2 region to sda1 region. However the two partition's
         hd_struct->in_flight are not changed.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |          0
         sda2 |          1
         ---------------------------
      
      3. The request is finished and blk_account_io_done() is called. In this case,
         sda2's hd_struct->in_flight, not a sda1's one, is decremented.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |         -1
         sda2 |          1
         ---------------------------
      
      The patch fixes the problem by caching the partition lookup
      inside the request structure, hence making sure that the increment
      and decrement will always happen on the same partition struct. This
      also speeds up IO with accounting enabled, since it cuts down on
      the number of lookups we have to do.
      
      Also add a refcount to struct hd_struct to keep the partition in
      memory as long as users exist. We use kref_test_and_get() to ensure
      we don't add a reference to a partition which is going away.
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      09e099d4
  21. 17 12月, 2010 1 次提交
  22. 13 11月, 2010 1 次提交
    • T
      block: make blkdev_get/put() handle exclusive access · e525fd89
      Tejun Heo 提交于
      Over time, block layer has accumulated a set of APIs dealing with bdev
      open, close, claim and release.
      
      * blkdev_get/put() are the primary open and close functions.
      
      * bd_claim/release() deal with exclusive open.
      
      * open/close_bdev_exclusive() are combination of open and claim and
        the other way around, respectively.
      
      * bd_link/unlink_disk_holder() to create and remove holder/slave
        symlinks.
      
      * open_by_devnum() wraps bdget() + blkdev_get().
      
      The interface is a bit confusing and the decoupling of open and claim
      makes it impossible to properly guarantee exclusive access as
      in-kernel open + claim sequence can disturb the existing exclusive
      open even before the block layer knows the current open if for another
      exclusive access.  Reorganize the interface such that,
      
      * blkdev_get() is extended to include exclusive access management.
        @holder argument is added and, if is @FMODE_EXCL specified, it will
        gain exclusive access atomically w.r.t. other exclusive accesses.
      
      * blkdev_put() is similarly extended.  It now takes @mode argument and
        if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
        the last exclusive claim is released, the holder/slave symlinks are
        removed automatically.
      
      * bd_claim/release() and close_bdev_exclusive() are no longer
        necessary and either made static or removed.
      
      * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
        is no longer necessary and removed.
      
      * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
        and blkdev_get().  It also has an unexpected extra bdev_read_only()
        test which probably should be moved into blkdev_get().
      
      * open_by_devnum() is modified to take @holder argument and pass it to
        blkdev_get().
      
      Most of bdev open/close operations are unified into blkdev_get/put()
      and most exclusive accesses are tested atomically at the open time (as
      it should).  This cleans up code and removes some, both valid and
      invalid, but unnecessary all the same, corner cases.
      
      open_bdev_exclusive() and open_by_devnum() can use further cleanup -
      rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
      special features.  Well, let's leave them for another day.
      
      Most conversions are straight-forward.  drbd conversion is a bit more
      involved as there was some reordering, but the logic should stay the
      same.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Brown <neilb@suse.de>
      Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NPhilipp Reisner <philipp.reisner@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: dm-devel@redhat.com
      Cc: drbd-dev@lists.linbit.com
      Cc: Leo Chen <leochen@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      e525fd89
  23. 11 11月, 2010 1 次提交
  24. 25 10月, 2010 1 次提交
  25. 23 10月, 2010 1 次提交
    • A
      SYSFS: Allow boot time switching between deprecated and modern sysfs layout · e52eec13
      Andi Kleen 提交于
      I have some systems which need legacy sysfs due to old tools that are
      making assumptions that a directory can never be a symlink to another
      directory, and it's a big hazzle to compile separate kernels for them.
      
      This patch turns CONFIG_SYSFS_DEPRECATED into a run time option
      that can be switched on/off the kernel command line. This way
      the same binary can be used in both cases with just a option
      on the command line.
      
      The old CONFIG_SYSFS_DEPRECATED_V2 option is still there to set
      the default. I kept the weird name to not break existing
      config files.
      
      Also the compat code can be still completely disabled by undefining
      CONFIG_SYSFS_DEPRECATED_SWITCH -- just the optimizer takes
      care of this now instead of lots of ifdefs. This makes the code
      look nicer.
      
      v2: This is an updated version on top of Kay's patch to only
      handle the block devices. I tested it on my old systems
      and that seems to work.
      
      Cc: axboe@kernel.dk
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      e52eec13
  26. 19 10月, 2010 1 次提交
    • Y
      block: fix accounting bug on cross partition merges · 7681bfee
      Yasuaki Ishimatsu 提交于
      /proc/diskstats would display a strange output as follows.
      
      $ cat /proc/diskstats |grep sda
         8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
         8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                      ~~~~~~~~~~
         8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
         8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
         8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
         8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137
      
      Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
      merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.
      
      The detailed root cause is as follows.
      
      Assuming that there are two partition, sda1 and sda2.
      
      1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
         is 0 and sda2's one is 1.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |          0
         sda2 |          1
         ---------------------------
      
      2. A bio belongs to sda1 is issued and is merged into the request mentioned on
         step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
         from sda2 region to sda1 region. However the two partition's
         hd_struct->in_flight are not changed.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |          0
         sda2 |          1
         ---------------------------
      
      3. The request is finished and blk_account_io_done() is called. In this case,
         sda2's hd_struct->in_flight, not a sda1's one, is decremented.
      
              | hd_struct->in_flight
         ---------------------------
         sda1 |         -1
         sda2 |          1
         ---------------------------
      
      The patch fixes the problem by caching the partition lookup
      inside the request structure, hence making sure that the increment
      and decrement will always happen on the same partition struct. This
      also speeds up IO with accounting enabled, since it cuts down on
      the number of lookups we have to do.
      
      When reloading partition tables, quiesce IO to ensure that no
      request references to the partition struct exists. When it is safe
      to free the partition table, the IO for that device is restarted
      again.
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      7681bfee
  27. 15 9月, 2010 1 次提交
    • W
      block, partition: add partition_meta_info to hd_struct · 6d1d8050
      Will Drewry 提交于
      I'm reposting this patch series as v4 since there have been no additional
      comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
      are against Linus's tree: 2bfc96a1
      (2.6.36-rc3).
      
      Would this patchset be suitable for inclusion in an mm branch?
      
      This changes adds a partition_meta_info struct which itself contains a
      union of structures that provide partition table specific metadata.
      
      This change leaves the union empty. The subsequent patch includes an
      implementation for CONFIG_EFI_PARTITION-based metadata.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      6d1d8050
  28. 11 8月, 2010 1 次提交
  29. 15 6月, 2010 1 次提交
  30. 22 5月, 2010 4 次提交
    • T
      block: improve automatic native capacity unlocking · b403a98e
      Tejun Heo 提交于
      Currently, native capacity unlocking is initiated only when a
      recognized partition extends beyond the end of the disk.  However,
      there are several other unhandled cases where truncated capacity can
      lead to misdetection of partitions.
      
      * Partition table is fully beyond EOD.
      
      * Partition table is partially beyond EOD (daisy chained ones).
      
      * Recognized partition starts beyond EOD.
      
      This patch updates generic partition check code such that all the
      above three cases are handled too.  For the first two, @state tracks
      whether low level partition check code tried to read beyond EOD during
      partition scan and triggers native capacity unlocking accordingly.
      The third is now handled similarly to the original unlocking case.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      b403a98e
    • T
      block: use struct parsed_partitions *state universally in partition check code · 1493bf21
      Tejun Heo 提交于
      Make the following changes to partition check code.
      
      * Add ->bdev to struct parsed_partitions.
      
      * Introduce read_part_sector() which is a simple wrapper around
        read_dev_sector() which takes struct parsed_partitions *state
        instead of @bdev.
      
      * For functions which used to take @state and @bdev, drop @bdev.  For
        functions which used to take @bdev, replace it with @state.
      
      * While updating, drop superflous checks on NULL state/bdev in ldm.c.
      
      This cleans up the API a bit and enables better handling of IO errors
      during partition check as the generic partition check code now has
      much better visibility into what went wrong in the low level code
      paths.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1493bf21
    • T
      block,ide: simplify bdops->set_capacity() to ->unlock_native_capacity() · c3e33e04
      Tejun Heo 提交于
      bdops->set_capacity() is unnecessarily generic.  All that's required
      is a simple one way notification to lower level driver telling it to
      try to unlock native capacity.  There's no reason to pass in target
      capacity or return the new capacity.  The former is always the
      inherent native capacity and the latter can be handled via the usual
      device resize / revalidation path.  In fact, the current API is always
      used that way.
      
      Replace ->set_capacity() with ->unlock_native_capacity() which take
      only @disk and doesn't return anything.  IDE which is the only current
      user of the API is converted accordingly.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      c3e33e04
    • T
      block: restart partition scan after resizing a device · 56bca017
      Tejun Heo 提交于
      Device resize via ->set_capacity() can reveal new partitions (e.g. in
      chained partition table formats such as dos extended parts).  Restart
      partition scan from the beginning after resizing a device.  This
      change also makes libata always revalidate the disk after resize which
      makes lower layer native capacity unlocking implementation simpler and
      more robust as resize can be handled in the usual path.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NBen Hutchings <ben@decadent.org.uk>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      56bca017
  31. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  32. 11 1月, 2010 1 次提交
  33. 10 11月, 2009 1 次提交
  34. 07 10月, 2009 1 次提交
    • N
      block: Seperate read and write statistics of in_flight requests v2 · 316d315b
      Nikanth Karthikesan 提交于
      Commit a9327cac added seperate read
      and write statistics of in_flight requests. And exported the number
      of read and write requests in progress seperately through sysfs.
      
      But  Corrado Zoccolo <czoccolo@gmail.com> reported getting strange
      output from "iostat -kx 2". Global values for service time and
      utilization were garbage. For interval values, utilization was always
      100%, and service time is higher than normal.
      
      So this was reverted by commit 0f78ab98
      
      The problem was in part_round_stats_single(), I missed the following:
              if (now == part->stamp)
                      return;
      
      -       if (part->in_flight) {
      +       if (part_in_flight(part)) {
                      __part_stat_add(cpu, part, time_in_queue,
                                      part_in_flight(part) * (now - part->stamp));
                      __part_stat_add(cpu, part, io_ticks, (now - part->stamp));
      
      With this chunk included, the reported regression gets fixed.
      Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>
      
      --
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      316d315b