提交 · 77ea887e433ad8389d416826936c110fa7910f80 · gsplhtlxg / clone-Linux

17 12月, 2010 3 次提交

implement in-kernel gendisk events handling · 77ea887e

由 Tejun Heo 提交于 12月 08, 2010

Currently, media presence polling for removeable block devices is done
from userland.  There are several issues with this.

* Polling is done by periodically opening the device.  For SCSI
  devices, the command sequence generated by such action involves a
  few different commands including TEST_UNIT_READY.  This behavior,
  while perfectly legal, is different from Windows which only issues
  single command, GET_EVENT_STATUS_NOTIFICATION.  Unfortunately, some
  ATAPI devices lock up after being periodically queried such command
  sequences.

* There is no reliable and unintrusive way for a userland program to
  tell whether the target device is safe for media presence polling.
  For example, polling for media presence during an on-going burning
  session can make it fail.  The polling program can avoid this by
  opening the device with O_EXCL but then it risks making a valid
  exclusive user of the device fail w/ -EBUSY.

* Userland polling is unnecessarily heavy and in-kernel implementation
  is lighter and better coordinated (workqueue, timer slack).

This patch implements framework for in-kernel disk event handling,
which includes media presence polling.

* bdops->check_events() is added, which supercedes ->media_changed().
  It should check whether there's any pending event and return if so.
  Currently, two events are defined - DISK_EVENT_MEDIA_CHANGE and
  DISK_EVENT_EJECT_REQUEST.  ->check_events() is guaranteed not to be
  called parallelly.

* gendisk->events and ->async_events are added.  These should be
  initialized by block driver before passing the device to add_disk().
  The former contains the mask of all supported events and the latter
  the mask of all events which the device can report without polling.
  /sys/block/*/events[_async] export these to userland.

* Kernel parameter block.events_dfl_poll_msecs controls the system
  polling interval (default is 0 which means disable) and
  /sys/block/*/events_poll_msecs control polling intervals for
  individual devices (default is -1 meaning use system setting).  Note
  that if a device can report all supported events asynchronously and
  its polling interval isn't explicitly set, the device won't be
  polled regardless of the system polling interval.

* If a device is opened exclusively with write access, event checking
  is automatically disabled until all write exclusive accesses are
  released.

* There are event 'clearing' events.  For example, both of currently
  defined events are cleared after the device has been successfully
  opened.  This information is passed to ->check_events() callback
  using @clearing argument as a hint.

* Event checking is always performed from system_nrt_wq and timer
  slack is set to 25% for polling.

* Nothing changes for drivers which implement ->media_changed() but
  not ->check_events().  Going forward, all drivers will be converted
  to ->check_events() and ->media_change() will be dropped.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

77ea887e

block: move register_disk() and del_gendisk() to block/genhd.c · d2bf1b67

由 Tejun Heo 提交于 12月 08, 2010

There's no reason for register_disk() and del_gendisk() to be in
fs/partitions/check.c.  Move both to genhd.c.  While at it, collapse
unlink_gendisk(), which was artificially in a separate function due to
genhd.c / check.c split, into del_gendisk().
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

d2bf1b67

block: kill genhd_media_change_notify() · dddd9dc3

由 Tejun Heo 提交于 12月 08, 2010

There's no user of the facility.  Kill it.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

dddd9dc3

25 10月, 2010 1 次提交

Revert "block: fix accounting bug on cross partition merges" · f253b86b

由 Jens Axboe 提交于 10月 24, 2010

This reverts commit 7681bfee.

Conflicts:

	include/linux/genhd.h

It has numerous issues with the cleanup path and non-elevator
devices. Revert it for now so we can come up with a clean
version without rushing things.
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

f253b86b

19 10月, 2010 1 次提交

block: fix accounting bug on cross partition merges · 7681bfee

由 Yasuaki Ishimatsu 提交于 10月 19, 2010

/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
   8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
   8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                ~~~~~~~~~~
   8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
   8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
   8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
   8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assuming that there are two partition, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
   is 0 and sda2's one is 1.

        | hd_struct->in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

2. A bio belongs to sda1 is issued and is merged into the request mentioned on
   step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
   from sda2 region to sda1 region. However the two partition's
   hd_struct->in_flight are not changed.

        | hd_struct->in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
   sda2's hd_struct->in_flight, not a sda1's one, is decremented.

        | hd_struct->in_flight
   ---------------------------
   sda1 |         -1
   sda2 |          1
   ---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

When reloading partition tables, quiesce IO to ensure that no
request references to the partition struct exists. When it is safe
to free the partition table, the IO for that device is restarted
again.
Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7681bfee

15 9月, 2010 1 次提交

block, partition: add partition_meta_info to hd_struct · 6d1d8050

由 Will Drewry 提交于 8月 31, 2010

I'm reposting this patch series as v4 since there have been no additional
comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
are against Linus's tree: 2bfc96a1
(2.6.36-rc3).

Would this patchset be suitable for inclusion in an mm branch?

This changes adds a partition_meta_info struct which itself contains a
union of structures that provide partition table specific metadata.

This change leaves the union empty. The subsequent patch includes an
implementation for CONFIG_EFI_PARTITION-based metadata.
Signed-off-by: NWill Drewry <wad@chromium.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

6d1d8050

20 8月, 2010 1 次提交

kernel: __rcu annotations · 4d2deb40

由 Arnd Bergmann 提交于 2月 24, 2010

This adds annotations for RCU operations in core kernel components
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

4d2deb40

16 3月, 2010 1 次提交

Remove GENHD_FL_DRIVERFS · 97fedbbe

由 NeilBrown 提交于 3月 16, 2010

This flag is not used, so best discarded.
Signed-off-by: NNeilBrown <neilb@suse.de>
--
Hi Jens,
 I came across this recently - these are the only two occurances
 of "GENHD_FL_DRIVERFS" in the kernel, so it cannot be needed.
NeilBrown
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

97fedbbe

17 2月, 2010 1 次提交

percpu: add __percpu sparse annotations to core kernel subsystems · 43cf38eb

由 Tejun Heo 提交于 2月 02, 2010

Add __percpu sparse annotations to core subsystems.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-mm@kvack.org
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Biederman <ebiederm@xmission.com>

43cf38eb

11 1月, 2010 1 次提交

genhd: overlapping variable definition · 7af92f87

由 Stephen Hemminger 提交于 1月 06, 2010

This fixes the sparse warning:
fs/ext4/super.c:2390:40: warning: symbol 'i' shadows an earlier one
fs/ext4/super.c:2368:22: originally declared here

Using 'i' in a macro is dubious practice.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

7af92f87

10 11月, 2009 1 次提交

block: Expose discard granularity · 86b37281

由 Martin K. Petersen 提交于 11月 10, 2009

While SSDs track block usage on a per-sector basis, RAID arrays often
have allocation blocks that are bigger.  Allow the discard granularity
and alignment to be set and teach the topology stacking logic how to
handle them.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

86b37281

07 10月, 2009 1 次提交

block: Seperate read and write statistics of in_flight requests v2 · 316d315b

由 Nikanth Karthikesan 提交于 10月 06, 2009

Commit a9327cac added seperate read
and write statistics of in_flight requests. And exported the number
of read and write requests in progress seperately through sysfs.

But  Corrado Zoccolo <czoccolo@gmail.com> reported getting strange
output from "iostat -kx 2". Global values for service time and
utilization were garbage. For interval values, utilization was always
100%, and service time is higher than normal.

So this was reverted by commit 0f78ab98

The problem was in part_round_stats_single(), I missed the following:
        if (now == part->stamp)
                return;

-       if (part->in_flight) {
+       if (part_in_flight(part)) {
                __part_stat_add(cpu, part, time_in_queue,
                                part_in_flight(part) * (now - part->stamp));
                __part_stat_add(cpu, part, io_ticks, (now - part->stamp));

With this chunk included, the reported regression gets fixed.
Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>

--
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

316d315b

05 10月, 2009 1 次提交

Revert "Seperate read and write statistics of in_flight requests" · 0f78ab98

由 Jens Axboe 提交于 10月 04, 2009

This reverts commit a9327cac.

Corrado Zoccolo <czoccolo@gmail.com> reports:

"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 	04/10/2009 	_i686_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10,70    0,00    3,16   15,75    0,00   70,38

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda              18,22     0,00    0,67    0,01    14,77     0,02
43,94     0,01   10,53 39043915,03 2629219,87
sdb              60,89     9,68   50,79    3,04  1724,43    50,52
65,95     0,70   13,06 488437,47 2629219,87

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2,72    0,00    0,74    0,00    0,00   96,53

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6,68    0,00    0,99    0,00    0,00   92,33

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,40    0,00    0,73    1,47    0,00   93,40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     4,00    0,00    3,00     0,00    28,00
18,67     0,06   19,50 333,33 100,00

Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.

I bisected it down to:
[a9327cac] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."

So until this is debugged, revert the bad commit.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

0f78ab98

22 9月, 2009 1 次提交

const: make block_device_operations const · 83d5cde4

由 Alexey Dobriyan 提交于 9月 21, 2009

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83d5cde4

20 9月, 2009 1 次提交

Driver-Core: extend devnode callbacks to provide permissions · e454cea2

由 Kay Sievers 提交于 9月 18, 2009

This allows subsytems to provide devtmpfs with non-default permissions
for the device node. Instead of the default mode of 0600, null, zero,
random, urandom, full, tty, ptmx now have a mode of 0666, which allows
non-privileged processes to access standard device nodes in case no
other userspace process applies the expected permissions.

This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain.
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

e454cea2

14 9月, 2009 1 次提交

Seperate read and write statistics of in_flight requests · a9327cac

由 Nikanth Karthikesan 提交于 9月 11, 2009

Currently, there is a single in_flight counter measuring the number of
requests in the request_queue. But some monitoring tools would like to
know how many read requests and write requests are in progress. Split the
current in_flight counter into two seperate counters for read and write.

This information is exported as a sysfs attribute, as changing the
currently available stat files would break the existing tools.
Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a9327cac

16 6月, 2009 1 次提交

Driver Core: block: add nodename support for block drivers. · b03f38b6

由 Kay Sievers 提交于 4月 30, 2009

This adds support for block drivers to report their requested nodename
to userspace.  It also updates a number of block drivers to provide the
needed subdirectory and device name to be used for them.
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NJan Blunck <jblunck@suse.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

b03f38b6

07 6月, 2009 1 次提交

partitions: add ->set_capacity block device method · db429e9e

由 Bartlomiej Zolnierkiewicz 提交于 6月 07, 2009

* Add ->set_capacity block device method and use it in rescan_partitions()
  to attempt enabling native capacity of the device upon detecting the
  partition which exceeds device capacity.

* Add GENHD_FL_NATIVE_CAPACITY flag to try limit attempts of enabling
  native capacity during partition scan.

Together with the consecutive patch implementing ->set_capacity method in
ide-gd device driver this allows automatic disabling of Host Protected Area
(HPA) if any partitions overlapping HPA are detected.

Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Frans Pop <elendil@planet.nl>
Cc: "Andries E. Brouwer" <Andries.Brouwer@cwi.nl>
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Emphatically-Acked-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>

db429e9e

23 5月, 2009 1 次提交

block: Export I/O topology for block devices and partitions · c72758f3

由 Martin K. Petersen 提交于 5月 22, 2009

To support devices with physical block sizes bigger than 512 bytes we
need to ensure proper alignment.  This patch adds support for exposing
I/O topology characteristics as devices are stacked.

  logical_block_size is the smallest unit the device can address.

  physical_block_size indicates the smallest I/O the device can write
  without incurring a read-modify-write penalty.

  The io_min parameter is the smallest preferred I/O size reported by
  the device.  In many cases this is the same as the physical block
  size.  However, the io_min parameter can be scaled up when stacking
  (RAID5 chunk size > physical block size).

  The io_opt characteristic indicates the optimal I/O size reported by
  the device.  This is usually the stripe width for arrays.

  The alignment_offset parameter indicates the number of bytes the start
  of the device/partition is offset from the device's natural alignment.
  Partition tools and MD/DM utilities can use this to pad their offsets
  so filesystems start on proper boundaries.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

c72758f3

22 4月, 2009 1 次提交

block: include empty disks in /proc/diskstats · 71982a40

由 Tejun Heo 提交于 4月 17, 2009

/proc/diskstats used to show stats for all disks whether they're
zero-sized or not and their non-zero partitions.  Commit
074a7aca accidentally changed the
behavior such that it doesn't print out zero sized disks.  This patch
implements DISK_PITER_INCL_EMPTY_PART0 flag to partition iterator and
uses it in diskstats_show() such that empty part0 is shown in
/proc/diskstats.

Reported and bisectd by Dianel Collins.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NDaniel Collins <solemnwarning@solemnwarning.no-ip.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

71982a40

24 3月, 2009 2 次提交

block: genhd.h cleanup patch · d3992286

由 Petros Koutoupis 提交于 3月 11, 2009

In include/linux/genhd.h: Line 335 has a comment that needs to be updated from: /* drivers/block/ll_rw_blk.c */ to /* block/blk-core.c */. Also as of kernel 2.6.16, the function definition for get_blkdev_list was removed from block/genhd.c but the function declaration is still present on line 339. This patch addresses both those fixes, by updating the comment and removing the declaration.
Signed-off-by: NPetros Koutoupis <pkoutoupis@hydrasystemsllc.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

d3992286

block: genhd.h comment needs updating · 32ca163c

由 Petros Koutoupis 提交于 3月 10, 2009

The include/linux/genhd.h file, on line 338-352 declares some function
prototypes in which the comment on line 338 states that the definition of
these prototypes are to be found at drivers/block/genhd.c. The problem is
that genhd.c has been relocated to block/genhd.c. See attached patch to
correct this minor cosmetic typo.
Signed-off-by: NPetros Koutoupis <pkoutoupis@hydrasystemsllc.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

32ca163c

29 12月, 2008 1 次提交

block: add one-hit cache for disk partition lookup · a6f23657

由 Jens Axboe 提交于 10月 24, 2008

disk_map_sector_rcu() returns a partition from a sector offset,
which we use for IO statistics on a per-partition basis. The
lookup itself is an O(N) list lookup, where N is the number of
partitions. This actually hurts performance quite a bit, even
on the lower end partitions. On higher numbered partitions,
it can get pretty bad.

Solve this by adding a one-hit cache for partition lookup.
This makes the lookup O(1) for the case where we do most IO to
one partition. Even for mixed partition workloads, amortized cost
is pretty close to O(1) since the natural IO batching makes the
one-hit cache last for lots of IOs.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a6f23657

18 11月, 2008 1 次提交

block: make add_partition() return pointer to hd_struct · ba32929a

由 Tejun Heo 提交于 11月 10, 2008

Make add_partition() return pointer to the new hd_struct on success
and ERR_PTR() value on failure.  This change will be used to fix md
autodetection bug.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

ba32929a

23 10月, 2008 2 次提交
- A
  proc: move /proc/diskstats boilerplate to block/genhd.c · 31d85ab2
  由 Alexey Dobriyan 提交于 10月 06, 2008
```
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Acked-by: NJens Axboe <jens.axboe@oracle.com>
```
  31d85ab2
- A
  proc: move rest of /proc/partitions code to block/genhd.c · f500975a
  由 Alexey Dobriyan 提交于 10月 04, 2008
```
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Acked-by: NJens Axboe <jens.axboe@oracle.com>
```
  f500975a
09 10月, 2008 14 次提交

block: allow disk to have extended device number · 3e1a7ff8

由 Tejun Heo 提交于 8月 25, 2008

Now that disk and partition handlings are mostly unified, it's easy to
allow disk to have extended device number.  This patch makes
add_disk() use extended device number if disk->minors is zero.  Both
sd and ide-disk are updated to use this.

* sd_format_disk_name() is implemented which can generically determine
  the drive name.  This removes disk number restriction stemming from
  limited device names.

* If sd index goes over SD_MAX_DISKS (which can be increased now BTW),
  sd simply doesn't initialize minors letting block layer choose
  extended device number.

* If CONFIG_DEBUG_EXT_DEVT is set, both sd and ide-disk always set
  minors to 0 and use extended device numbers.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

3e1a7ff8

block: replace @ext_minors with GENHD_FL_EXT_DEVT · 689d6fac

由 Tejun Heo 提交于 8月 25, 2008

With previous changes, it's meaningless to limit the number of
partitions.  Replace @ext_minors with GENHD_FL_EXT_DEVT such that
setting the flag allows the disk to have maximum number of allowed
partitions (only limited by the number of entries in parsed_partitions
as determined by MAX_PART constant).

This kills not-too-pretty alloc_disk_ext[_node]() functions and makes
@minors parameter to alloc_disk[_node]() unnecessary.  The parameter
is left alone to avoid disturbing the users.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

689d6fac

block: make partition array dynamic · 540eed56

由 Tejun Heo 提交于 8月 25, 2008

disk->__part used to be statically allocated to the maximum possible
number of partitions.  This patch makes partition array allocation
dynamic.  The added overhead is minimal as only real change is one
memory dereference changed to RCU one.  This saves both a bit of
memory and cpu cycles iterating through unoccupied slots and makes
increasing partition limit easier.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

540eed56

block: move stats from disk to part0 · 074a7aca

由 Tejun Heo 提交于 8月 25, 2008

Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...

* part_stat_*() now updates part0 together if the specified partition
  is not part0.  ie. part_stat_*() are now essentially all_stat_*().

* {disk|all}_stat_*() are gone.

* part_round_stats() is updated similary.  It handles part0 stats
  automatically and disk_round_stats() is killed.

* part_{inc|dec}_in_fligh() is implemented which automatically updates
  part0 stats for parts other than part0.

* disk_map_sector_rcu() is updated to return part0 if no part matches.
  Combined with the above changes, this makes NULL special case
  handling in callers unnecessary.

* Separate stats show code paths for disk are collapsed into part
  stats show code paths.

* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()

While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

074a7aca

block: kill GENHD_FL_FAIL and use part0->make_it_fail · eddb2e26

由 Tejun Heo 提交于 8月 25, 2008

GENHD_FL_FAIL for disk is what make_it_fail is for parts.  Kill it and
use part0->make_it_fail.  Sysfs node handling is unified too.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

eddb2e26

block: always set bdev->bd_part · 0762b8bd

由 Tejun Heo 提交于 8月 25, 2008

Till now, bdev->bd_part is set only if the bdev was for parts other
than part0.  This patch makes bdev->bd_part always set so that code
paths don't have to differenciate common handling.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

0762b8bd

block: move holder_dir from disk to part0 · 4c46501d

由 Tejun Heo 提交于 8月 25, 2008

Move disk->holder_dir to part0->holder_dir.  Kill now mostly
superflous bdev_get_holder().

While at it, kill superflous kobject_get/put() around holder_dir,
slave_dir and cmd_filter creation and collapse
disk_sysfs_add_subdirs() into register_disk().  These serve no purpose
but obfuscating the code.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

4c46501d

block: move policy from disk to part0 · b7db9956

由 Tejun Heo 提交于 8月 25, 2008

Move disk->policy to part0->policy.  Implement and use get_disk_ro().
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b7db9956

block: unify sysfs size node handling · e5610521

由 Tejun Heo 提交于 8月 25, 2008

Now that capacity and __dev are moved to part0, part0 and others can
share the same method.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

e5610521

block: move __dev from disk to part0 · 548b10eb

由 Tejun Heo 提交于 8月 29, 2008

Move disk->__dev to part0->__dev.  This simplifies bdget_disk() and
lookup_devt() and allows common sysfs attributes to be unified.
part_to_disk() is updated to handle part0 -> disk.

Updated to include a fix from Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>,
he writes:

"part0 is a "special" partition and doesn't need to have capacity set - this
fixes regression caused by "block: move __dev from disk to part0" commit."
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

548b10eb

block: move capacity from disk to part0 · 80795aef

由 Tejun Heo 提交于 8月 25, 2008

Move disk->capacity to part0->nr_sects and convert all users who
directly accessed the field to use {get|set}_capacity().  This is done
early to allow the __dev field to be moved.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

80795aef

block: introduce partition 0 · b5d0b9df

由 Tejun Heo 提交于 9月 03, 2008

genhd and partition code handled disk and partitions separately.  All
information about the whole disk was in struct genhd and partitions in
struct hd_struct.  However, the whole disk (part0) and other
partitions have a lot in common and the data structures end up having
good number of common fields and thus separate code paths doing the
same thing.  Also, the partition array was indexed by partno - 1 which
gets pretty confusing at times.

This patch introduces partition 0 and makes the partition array
indexed by partno.  Following patches will unify the handling of disk
and parts piece-by-piece.

This patch also implements disk_partitionable() which tests whether a
disk is partitionable.  With coming dynamic partition array change,
the most common usage of disk_max_parts() will be testing whether a
disk is partitionable and the number of max partitions will become
much less important.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b5d0b9df

block: implement and use {disk|part}_to_dev() · ed9e1982

由 Tejun Heo 提交于 8月 25, 2008

Implement {disk|part}_to_dev() and use them to access generic device
instead of directly dereferencing {disk|part}->dev.  To make sure no
user is left behind, rename generic devices fields to __dev.

This is in preparation of unifying partition 0 handling with other
partitions.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

ed9e1982

block: implement extended dev numbers · bcce3de1

由 Tejun Heo 提交于 8月 25, 2008

Implement extended device numbers.  A block driver can tell block
layer that it wants to use extended device numbers.  After the usual
minor space is used up, block layer automatically allocates devt's
from EXT_BLOCK_MAJOR.

Currently only one major number is allocated for this but as the
allocation is strictly on-demand, ~1mil minor space under it should
suffice unless the system actually has more than ~1mil partitions and
if that ever happens adding more majors to the extended devt area is
easy.

Due to internal implementation issues, the first partition can't be
allocated on the extended area.  In other words, genhd->minors should
at least be 1.  This limitation will be lifted by later changes.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

bcce3de1