提交 · ea02259fdf47ca81ff3ca0c22906d989094fb8ff · openanolis / cloud-kernel

02 4月, 2009 1 次提交

由 Bartlomiej Zolnierkiewicz 提交于 4月 01, 2009

* Use ATA_CMD_* defines instead of WIN_* ones.

* Include <linux/ata.h> directly instead of through <linux/hdreg.h>.

Cc: Ed L. Cashin <ecashin@coraid.com>
Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>

04b3ab52

25 11月, 2008 1 次提交

aoe: remove private mac address format function · 411c41ee

由 Harvey Harrison 提交于 11月 25, 2008

Add %pm to omit the colons when printing a mac address.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

411c41ee

09 10月, 2008 5 次提交

block: move stats from disk to part0 · 074a7aca

由 Tejun Heo 提交于 8月 25, 2008

Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...

* part_stat_*() now updates part0 together if the specified partition
  is not part0.  ie. part_stat_*() are now essentially all_stat_*().

* {disk|all}_stat_*() are gone.

* part_round_stats() is updated similary.  It handles part0 stats
  automatically and disk_round_stats() is killed.

* part_{inc|dec}_in_fligh() is implemented which automatically updates
  part0 stats for parts other than part0.

* disk_map_sector_rcu() is updated to return part0 if no part matches.
  Combined with the above changes, this makes NULL special case
  handling in callers unnecessary.

* Separate stats show code paths for disk are collapsed into part
  stats show code paths.

* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()

While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

074a7aca

block: move capacity from disk to part0 · 80795aef

由 Tejun Heo 提交于 8月 25, 2008

Move disk->capacity to part0->nr_sects and convert all users who
directly accessed the field to use {get|set}_capacity().  This is done
early to allow the __dev field to be moved.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

80795aef

block: fix diskstats access · c9959059

由 Tejun Heo 提交于 8月 25, 2008

There are two variants of stat functions - ones prefixed with double
underbars which don't care about preemption and ones without which
disable preemption before manipulating per-cpu counters.  It's unclear
whether the underbarred ones assume that preemtion is disabled on
entry as some callers don't do that.

This patch unifies diskstats access by implementing disk_stat_lock()
and disk_stat_unlock() which take care of both RCU (for partition
access) and preemption (for per-cpu counter access).  diskstats access
should always be enclosed between the two functions.  As such, there's
no need for the versions which disables preemption.  They're removed
and double underbars ones are renamed to drop the underbars.  As an
extra argument is added, there's no danger of using the old version
unconverted.

disk_stat_lock() uses get_cpu() and returns the cpu index and all
diskstat functions which access per-cpu counters now has @cpu
argument to help RT.

This change adds RCU or preemption operations at some places but also
collapses several preemption ops into one at others.  Overall, the
performance difference should be negligible as all involved ops are
very lightweight per-cpu ones.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

c9959059

block: fix disk->part[] dereferencing race · e71bf0d0

由 Tejun Heo 提交于 9月 03, 2008

disk->part[] is protected by its matching bdev's lock.  However,
non-critical accesses like collecting stats and printing out sysfs and
proc information used to be performed without any locking.  As
partitions can come and go dynamically, partitions can go away
underneath those non-critical accesses.  As some of those accesses are
writes, this theoretically can lead to silent corruption.

This patch fixes the race by using RCU for the partition array and dev
reference counter to hold partitions.

* Rename disk->part[] to disk->__part[] to make sure no one outside
  genhd layer proper accesses it directly.

* Use RCU for disk->__part[] dereferencing.

* Implement disk_{get|put}_part() which can be used to get and put
  partitions from gendisk respectively.

* Iterators are implemented to help iterate through all partitions
  safely.

* Functions which require RCU readlock are marked with _rcu suffix.

* Use disk_put_part() in __blkdev_put() instead of directly putting
  the contained kobject.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

e71bf0d0

block: misc updates · 310a2c10

由 Tejun Heo 提交于 8月 25, 2008

This patch makes the following misc updates in preparation for
disk->part dereference fix and extended block devt support.

* implment part_to_disk()

* fix comment about gendisk->part indexing

* rename get_part() to disk_map_sector()

* don't use n which is always zero while printing disk information in
  diskstats_show()
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

310a2c10

22 9月, 2008 1 次提交
- D
  aoe: Use SKB interfaces for list management instead of home-grown stuff. · e9bb8fb0
  由 David S. Miller 提交于 9月 21, 2008
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  e9bb8fb0
04 7月, 2008 1 次提交

block: use get_unaligned_* helpers · 823ed72e

由 Harvey Harrison 提交于 7月 04, 2008

Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

823ed72e

07 5月, 2008 1 次提交

block: avoid duplicate calls to get_part() in disk stat code · 28f13702

由 Jens Axboe 提交于 5月 07, 2008

get_part() is fairly expensive, as it O(N) loops over partitions
to find the right one. In lots of normal IO paths we end up looking
up the partition twice, to make matters even worse. Change the
stat add code to accept a passed in partition instead.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

28f13702

29 4月, 2008 1 次提交

drivers/block: use get_unaligned_* helpers · f885f8d1

由 Harvey Harrison 提交于 4月 29, 2008

Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Cc: Ed L. Cashin <ecashin@coraid.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f885f8d1

09 2月, 2008 8 次提交

aoe: update copyright date · 52e112b3

由 Ed L. Cashin 提交于 2月 08, 2008

Update the year in the copyright notices.
Signed-off-by: NEd L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

52e112b3

aoe: make error messages more specific · 578c4aa0

由 Ed L. Cashin 提交于 2月 08, 2008

Andrew Morton pointed out that the "too many targets" message in patch 2 could
be printed for failing GFP_ATOMIC allocations.  This patch makes the messages
more specific.
Signed-off-by: NEd L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

578c4aa0

aoe: the aoeminor doesn't need a long format · 1d75981a

由 Ed L. Cashin 提交于 2月 08, 2008

The aoedev aoeminor member doesn't need a long format.
Signed-off-by: NEd L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d75981a

aoe: add module parameter for users who need more outstanding I/O · 7df620d8

由 Ed L. Cashin 提交于 2月 08, 2008

An AoE target provides an estimate of the number of outstanding commands that
the AoE initiator can send before getting a response.  The aoe_maxout
parameter provides a way to set an even lower limit.  It will not allow a user
to use more outstanding commands than the target permits.  If a user discovers
a problem with a large setting, this parameter provides a way for us to work
with them to debug the problem.  We expect to improve the dynamic window
sizing algorithm and drop this parameter.  For the time being, it is a
debugging aid.
Signed-off-by: NEd L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7df620d8

aoe: only install new AoE device once · 6b9699bb

由 Ed L. Cashin 提交于 2月 08, 2008

An aoe driver user who had about 70 AoE targets found that he was hitting a
BUG in sysfs_create_file because the aoe driver was trying to tell the kernel
about an AoE device more than once.  Each AoE device was reachable by several
local network interfaces, and multiple ATA device indentify responses were
returning from that single device.

This patch eliminates a race condition so that aoe always informs the block
layer of a new AoE device once in the presence of multiple incoming ATA device
identify responses.
Signed-off-by: NEd L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6b9699bb

aoe: dynamically allocate a capped number of skbs when necessary · 9bb237b6

由 Ed L. Cashin 提交于 2月 08, 2008

What this Patch Does

  Even before this recent series of 12 patches to 2.6.22-rc4, the aoe
  driver was reusing a small set of skbs that were allocated once and
  were only used for outbound AoE commands.

  The network layer cannot be allowed to put_page on the data that is
  still associated with a bio we haven't returned to the block layer,
  so the aoe driver (even before the patch under discussion) is still
  the owner of skbs that have been handed to the network layer for
  transmission.  We need to keep track of these skbs so that we can
  free them, but by tracking them, we can also easily re-use them.

  The new patch was a response to the behavior of certain network
  drivers.  We cannot reuse an skb that the network driver still has
  in its transmit ring.  Network drivers can defer transmit ring
  cleanup and then use the state in the skb to determine how many data
  segments to clean up in its transmit ring.  The tg3 driver is one
  driver that behaves in this way.

  When the network driver defers cleanup of its transmit ring, the aoe
  driver can find itself in a situation where it would like to send an
  AoE command, and the AoE target is ready for more work, but the
  network driver still has all of the pre-allocated skbs.  In that
  case, the new patch just calls alloc_skb, as you'd expect.

  We don't want to get carried away, though.  We try not to do
  excessive allocation in the write path, so we cap the number of skbs
  we dynamically allocate.

  Probably calling it a "dynamic pool" is misleading.  We were already
  trying to use a small fixed-size set of pre-allocated skbs before
  this patch, and this patch just provides a little headroom (with a
  ceiling, though) to accomodate network drivers that hang onto skbs,
  by allocating when needed.  The d->skbpool_hd list of allocated skbs
  is necessary so that we can free them later.

  We didn't notice the need for this headroom until AoE targets got
  fast enough.

Alternatives

  If the network layer never did a put_page on the pages in the bio's
  we get from the block layer, then it would be possible for us to
  hand skbs to the network layer and forget about them, allowing the
  network layer to free skbs itself (and thereby calling our own
  skb->destructor callback function if we needed that).  In that case
  we could get rid of the pre-allocated skbs and also the
  d->skbpool_hd, instead just calling alloc_skb every time we wanted
  to transmit a packet.  The slab allocator would effectively maintain
  the list of skbs.

  Besides a loss of CPU cache locality, the main concern with that
  approach the danger that it would increase the likelihood of
  deadlock when VM is trying to free pages by writing dirty data from
  the page cache through the aoe driver out to persistent storage on
  an AoE device.  Right now we have a situation where we have
  pre-allocation that corresponds to how much we use, which seems
  ideal.

  Of course, there's still the separate issue of receiving the packets
  that tell us that a write has successfully completed on the AoE
  target.  When memory is low and VM is using AoE to flush dirty data
  to free up pages, it would be perfect if there were a way for us to
  register a fast callback that could recognize write command
  completion responses.  But I don't think the current problems with
  the receive side of the situation are a justification for
  exacerbating the problem on the transmit side.
Signed-off-by: NEd L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9bb237b6

aoe: mac_addr: avoid 64-bit arch compiler warnings · 1eb0da4c

由 Ed L. Cashin 提交于 2月 08, 2008

By returning unsigned long long, mac_addr does not generate compiler warnings
on 64-bit architectures.
Signed-off-by: NEd L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1eb0da4c

aoe: handle multiple network paths to AoE device · 68e0d42f

由 Ed L. Cashin 提交于 2月 08, 2008

A remote AoE device is something can process ATA commands and is identified by
an AoE shelf number and an AoE slot number. Such a device might have more
than one network interface, and it might be reachable by more than one local
network interface. This patch tracks the available network paths available to
each AoE device, allowing them to be used more efficiently.

Andrew Morton asked about the call to msleep_interruptible in the revalidate
function. Yes, if a signal is pending, then msleep_interruptible will not
return 0. That means we will not loop but will call aoenet_xmit with a NULL
skb, which is a noop. If the system is too low on memory or the aoe driver is
too low on frames, then the user can hit control-C to interrupt the attempt to
do a revalidate. I have added a comment to the code summarizing that.

Andrew Morton asked whether the allocation performed inside addtgt could use a
more relaxed allocation like GFP_KERNEL, but addtgt is called when the aoedev
lock has been locked with spin_lock_irqsave. It would be nice to allocate the
memory under fewer restrictions, but targets are only added when the device is
being discovered, and if the target can't be added right now, we can try again
in a minute when then next AoE config query broadcast goes out.

Andrew Morton pointed out that the "too many targets" message could be printed
for failing GFP_ATOMIC allocations. The last patch in this series makes the
messages more specific.
Signed-off-by: NEd L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

68e0d42f

08 2月, 2008 1 次提交

Enhanced partition statistics: aoe fix · a890d62b

由 Jerome Marchand 提交于 2月 08, 2008

Updates the enhanced partition statistics in ATA over Ethernet driver
(not tested).
Signed-off-by: NJerome Marchand <jmarchan@redhat.com>

a890d62b

17 10月, 2007 1 次提交

aoe: remove unecessary wrapper function · abdbf94d

由 Ed L. Cashin 提交于 10月 16, 2007

We can just use skb_mac_header now, and we don't need a wrapper function to
perform the cast. Instead of requiring the reader to check aoe.h to look
up what an aoe_hdr function does, I'd rather do without it.
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

abdbf94d

11 10月, 2007 1 次提交

[NET]: Make the device list and device lookups per namespace. · 881d966b

由 Eric W. Biederman 提交于 9月 17, 2007

This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables.  The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces.  The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace.  This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

881d966b

10 10月, 2007 1 次提交

Drop 'size' argument from bio_endio and bi_end_io · 6712ecf8

由 NeilBrown 提交于 9月 27, 2007

As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant.  Remove it.

Now there is no need for bio_endio to subtract the size completed
from bi_size.  So don't do that either.

While we are at it, change bi_end_io to return void.
Signed-off-by: NNeil Brown <neilb@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

6712ecf8

04 5月, 2007 1 次提交

[NET]: Rework dev_base via list_head (v3) · 7562f876

由 Pavel Emelianov 提交于 5月 03, 2007

Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().
Signed-off-by: NPavel Emelianov <xemul@openvz.org>
Acked-by: NKirill Korotaev <dev@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7562f876

26 4月, 2007 3 次提交

[SK_BUFF]: Introduce skb_reset_network_header(skb) · c1d2bbe1

由 Arnaldo Carvalho de Melo 提交于 4月 10, 2007

For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can
later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c1d2bbe1

[SK_BUFF]: Introduce skb_reset_mac_header(skb) · 459a98ed

由 Arnaldo Carvalho de Melo 提交于 3月 19, 2007

For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

459a98ed

[AOE]: Introduce aoe_hdr() · 029720f1

由 Arnaldo Carvalho de Melo 提交于 3月 10, 2007

For consistency with other skb->mac.raw users.
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

029720f1

03 3月, 2007 1 次提交
- D
  [AOE]: Add get_unaligned() calls where needed. · 43ecf529
  由 David S. Miller 提交于 3月 01, 2007
```
Based upon a report by Andrew Walrond.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  43ecf529
23 12月, 2006 1 次提交

[PATCH] fix aoe without scatter-gather [Bug 7662] · 19900cde

由 Ed L. Cashin 提交于 12月 22, 2006

Fix a bug that only appears when AoE goes over a network card that does not
support scatter-gather.  The headers in the linear part of the skb appeared
to be larger than they really were, resulting in data that was offset by 24
bytes.

This patch eliminates the offset data on cards that don't support
scatter-gather or have had scatter-gather turned off.  There remains an
unrelated issue that I'll address in a separate email.

Fixes bugzilla #7662
Signed-off-by: N"Ed L. Cashin" <ecashin@coraid.com>
Cc: <stable@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: <boddingt@optusnet.com.au>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

19900cde

22 11月, 2006 1 次提交
- D
  WorkStruct: make allyesconfig · c4028958
  由 David Howells 提交于 11月 22, 2006
```
Fix up for make allyesconfig.
Signed-Off-By: NDavid Howells <dhowells@redhat.com>
```
  c4028958
19 10月, 2006 10 次提交

aoe: revert printk macros · a12c93f0