- 29 Apr, 2008 4 commits
-
-
Committed by FUJITA Tomonori
blk_get_request initializes rq->cmd (rq_init does), so its users don't need to do that. The purpose of this patch is to remove sizeof(rq->cmd) and &rq->cmd, in preparation for large command support, which changes rq->cmd from a static array to a pointer. After that change, sizeof(rq->cmd) will not make sense and &rq->cmd won't work.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
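A minimal userspace sketch of why both expressions break once the member becomes a pointer. The struct names, MAX_CMD and the helper functions are hypothetical and only mimic the shape of the change; they are not the kernel's struct request.

```c
#include <stddef.h>

#define MAX_CMD 16 /* hypothetical command length, for illustration only */

/* Old layout: the command bytes live inside the request itself. */
struct req_with_array {
    unsigned char cmd[MAX_CMD];
};

/* New layout: cmd points at a buffer that may be larger than MAX_CMD. */
struct req_with_pointer {
    unsigned char *cmd;
    unsigned char inline_cmd[MAX_CMD];
};

/* sizeof on the array member yields the size of the buffer. */
size_t cmd_size_array(void)
{
    return sizeof(((struct req_with_array *)0)->cmd);
}

/* sizeof on the pointer member yields only the size of a pointer. */
size_t cmd_size_pointer(void)
{
    return sizeof(((struct req_with_pointer *)0)->cmd);
}

/* With a pointer member, &rq->cmd is the address of the pointer itself,
 * not the address of the command bytes it refers to. */
int addr_of_cmd_is_buffer(struct req_with_pointer *rq)
{
    return (void *)&rq->cmd == (void *)rq->cmd;
}
```

Code that did memset(&rq->cmd, 0, sizeof(rq->cmd)) on the array layout would, on the pointer layout, overwrite the pointer rather than clear the command bytes, which is presumably why such uses had to go first.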
-
Committed by FUJITA Tomonori
The block layer initializes rq->cmd (queue_flush calls rq_init), so prepare_flush_fn hooks don't need to do that. The purpose of this patch is to remove sizeof(rq->cmd), in preparation for large command support, which changes rq->cmd from a static array to a pointer. After that change, sizeof(rq->cmd) will not make sense.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Committed by Nick Piggin
We can save some atomic ops in the IO path if we clearly define the rules for how to modify the queue flags.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Committed by Geert Uytterhoeven
As ps3disk is a ppc64-only driver, sector_t equals unsigned long, and the cast is not needed. Reuse in another (possibly 32-bit) driver is protected by the safety net called "compiler warning" (with the cast, it may silently truncate to 32 bits). If sector_t ever changes, we will get a compiler warning as well (with the cast, we won't).

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
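The silent truncation mentioned above can be sketched in plain C. This uses fixed-width types instead of sector_t and unsigned long so the result does not depend on the platform; the function names are hypothetical and this is not the driver's code.

```c
#include <stdint.h>

/* An explicit cast compiles without any diagnostic and simply drops
 * the upper 32 bits of the value. */
uint32_t sector_low_bits(uint64_t sector)
{
    return (uint32_t)sector;
}

/* Returns 1 if the value survives a round trip through 32 bits,
 * 0 if the cast above would lose information. */
int fits_in_32_bits(uint64_t sector)
{
    return (uint64_t)(uint32_t)sector == sector;
}
```

Without the cast, a mismatch between the value's type and the type the caller expects is something the compiler has a chance to warn about; with it, the narrowing is treated as intentional.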
-
- 28 Apr, 2008 1 commit
-
-
Committed by Jared Hulbert
Alter the block device ->direct_access() API to work with the new get_xip_mem() API (which requires that both kaddr and pfn be returned).

Some architectures will not do the right thing in their virt_to_page() for use by XIP (to translate from the kernel virtual address returned by direct_access() to a user-mappable pfn in XIP's page fault handler).

However, we can't switch it to just return the pfn and not the kaddr, because we have no good way to get a kva from a pfn, and XIP requires the kva for its read(2) and write(2) handlers. So we have to return both.

Signed-off-by: Jared Hulbert <jaredeh@gmail.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 Apr, 2008 5 commits
-
-
Committed by Mark McLoughlin
Before getting merged, xen-blkfront was xenblk and xen-netfront was xennet. Temporarily adding compatibility module aliases eases upgrades from older versions by, e.g., allowing mkinitrd to find the new version of the module.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Mark McLoughlin
Add module aliases to support autoprobing modules for xen frontend devices.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Christian Limpach
When the xen block frontend driver is built as a module, the module load is only synchronous up to the point where the frontend and the backend become connected, rather than up to the point where the disk is added.

This means that there can be a race on boot between loading the module and loading the dm-* modules and doing the scan for LVM physical volumes (all in the initrd). In the failure case, the disk is not present until after the scan for physical volumes is complete.

Taken from: http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017

Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Jeremy Fitzhardinge
info->dev is never initialized to anything, so bdget(info->dev) is meaningless. Get rid of info->dev, and use bdget_disk on the gendisk.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Markus Armbruster
Frontends are expected to write their protocol ABI to xenstore. Since the protocol ABI defaults to the backend's native ABI, things work fine without that as long as the frontend's native ABI is identical to the backend's native ABI. This is not the case for xen-blkfront running 32-on-64, because its ABI differs between 32 and 64 bit, and thus it needs this fix.

Based on http://xenbits.xensource.com/xen-unstable.hg?rev/c545932a18f3 and http://xenbits.xensource.com/xen-unstable.hg?rev/ffe52263b430 by Gerd Hoffmann <kraxel@suse.de>

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 23 Apr, 2008 1 commit
-
-
Committed by Petr Tesarik
While looking at the implementation of the RAM-backed block device driver, I stumbled across a write-only local variable, which makes little sense, so I assume it should actually work like this:

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 21 Apr, 2008 4 commits
-
-
Committed by Harvey Harrison
__FUNCTION__ is gcc-specific; use __func__ instead.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
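For readers unfamiliar with the identifier: __func__ is the predefined per-function name standardized in C99, so it works on any conforming compiler. A small sketch, with hypothetical function names:

```c
#include <stdio.h>

/* __func__ behaves like a static const char array holding the name of
 * the enclosing function, so returning it is safe. */
const char *report_caller(void)
{
    return __func__;
}

/* Typical use: prefix a log line with the function name. */
void log_entry(void)
{
    printf("%s: entered\n", __func__);
}
```

Calling log_entry() prints its own name followed by ": entered".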
-
* Fix oops on cciss rmmod due to calling pci_free_consistent with irqs disabled.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Fix race condition between cciss_init_one(), cciss_update_drive_info(), and cciss_check_queues().

Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Committed by Laurent Vivier
This patch allows a loop device to be used with a partitioned disk image.

The original behavior of loop is not modified.

A new parameter, "max_part", is introduced to define how many partitions we want to be able to manage per loop device.

For instance, to manage 63 partitions per loop device, we will do:

    # modprobe loop max_part=63
    # ls -l /dev/loop?*
    brw-rw---- 1 root disk 7,   0 2008-03-05 14:55 /dev/loop0
    brw-rw---- 1 root disk 7,  64 2008-03-05 14:55 /dev/loop1
    brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2
    brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3
    brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4
    brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5
    brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6
    brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7

And to attach a raw partitioned disk image, the original losetup is used:

    # losetup -f etch.img
    # ls -l /dev/loop?*
    brw-rw---- 1 root disk 7,   0 2008-03-05 14:55 /dev/loop0
    brw-rw---- 1 root disk 7,   1 2008-03-05 14:57 /dev/loop0p1
    brw-rw---- 1 root disk 7,   2 2008-03-05 14:57 /dev/loop0p2
    brw-rw---- 1 root disk 7,   5 2008-03-05 14:57 /dev/loop0p5
    brw-rw---- 1 root disk 7,  64 2008-03-05 14:55 /dev/loop1
    brw-rw---- 1 root disk 7, 128 2008-03-05 14:55 /dev/loop2
    brw-rw---- 1 root disk 7, 192 2008-03-05 14:55 /dev/loop3
    brw-rw---- 1 root disk 7, 256 2008-03-05 14:55 /dev/loop4
    brw-rw---- 1 root disk 7, 320 2008-03-05 14:55 /dev/loop5
    brw-rw---- 1 root disk 7, 384 2008-03-05 14:55 /dev/loop6
    brw-rw---- 1 root disk 7, 448 2008-03-05 14:55 /dev/loop7
    # mount /dev/loop0p1 /mnt
    # ls /mnt
    bench  cdrom  home        lib         mnt   root     srv  usr
    bin    dev    initrd      lost+found  opt   sbin     sys  var
    boot   etc    initrd.img  media       proc  selinux  tmp  vmlinuz
    # umount /mnt
    # losetup -d /dev/loop0

Of course, the same behavior can be achieved using kpartx on a loop device, but modifying loop avoids stacking several layers of block devices (loop + device mapper). This is a very light modification (40% of the modifications are to manage the new parameter).

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
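The minor numbers in the listing above (0, 64, 128, ... for the devices, and 1, 2, 5 for the partitions of loop0) are consistent with each loop device owning a power-of-two block of minors. A sketch of that arithmetic follows; the function names are hypothetical, and the assumption that the block size is derived from the bit length of max_part is inferred from the listing rather than quoted from the patch.

```c
/* Position of the highest set bit, counting from 1 (0 for x == 0).
 * Local helper written out so the sketch is self-contained. */
static unsigned int bit_length(unsigned int x)
{
    unsigned int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Minor number of partition `partno` (0 = whole device) on loop device
 * `index`, assuming each device gets 2^bit_length(max_part) minors. */
unsigned int loop_minor(unsigned int index, unsigned int partno,
                        unsigned int max_part)
{
    unsigned int shift = bit_length(max_part);

    return (index << shift) + partno;
}
```

With max_part=63 the shift is 6, so loop1 starts at minor 64 and loop7 at 448, matching the ls output.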
-
- 19 Apr, 2008 1 commit
-
-
Committed by Matthew Wilcox
None of these files use any of the functionality promised by asm/semaphore.h. It's possible that they rely on it dragging in some unrelated header file, but I can't build all these files, so we'll have to fix any build failures as they come up.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
-
- 11 Apr, 2008 1 commit
-
-
Committed by Mike Pagano
This patch adds the missing include directive <linux/scatterlist.h> to the cciss.c source file. This was discovered by our release team when building the kernel for the Alpha architecture. The build failed because, on Alpha, the functions 'sg_init_table' and 'sg_page' are not declared without this include.

Signed-off-by: Mike Pagano <mpagano@gentoo.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 09 Apr, 2008 1 commit
-
-
Committed by Pete Zaitcev
When __blk_end_request returns nonzero, it means that the request was not completely processed and some BIOs are still attached. Since we have dequeued it by that time, it means leaking requests and hanging processes, which is why BUG() was in there.

In ub this happens if a packet request ends normally, but with residue (e.g. when scsi_id issues INQUIRY).

The fix is to make sure that the arguments passed to __blk_end_request are correct: the full request length and not just the transferred length. The transferred length is indicated to applications by adjusting rq->data_len with old, unchanged code outside of this patch.

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Cc: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Cc: Greg KH <greg@kroah.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 03 Apr, 2008 1 commit
-
-
Committed by Mike Snitzer
NBD does not protect the nbd_device's socket from becoming NULL during receives. This closes a race with the NBD_CLEAR_SOCK ioctl (nbd-client -d) setting the nbd_device's socket to NULL right before NBD calls sock_xmit.

Signed-off-by: Mike Snitzer <snitzer@gmail.com>
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 Apr, 2008 1 commit
-
-
Committed by Julia Lawall
Robert P.J. Day proposed using the macro FIELD_SIZEOF in place of code that matches its definition.

The modification was made using the following semantic patch (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @haskernel@
    @@

    #include <linux/kernel.h>

    @depends on haskernel@
    type t;
    identifier f;
    @@

    - (sizeof(((t*)0)->f))
    + FIELD_SIZEOF(t, f)

    @depends on haskernel@
    type t;
    identifier f;
    @@

    - sizeof(((t*)0)->f)
    + FIELD_SIZEOF(t, f)
    // </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
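What the rewrite does can be tried outside the kernel. In the sketch below the macro is defined locally with the same expression that the semantic patch matches; struct sample and the two wrapper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel macro: the size of member f of type t,
 * obtained without needing an instance (sizeof does not evaluate its
 * operand, so the null pointer is never dereferenced). */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

struct sample {
    uint32_t id;
    char name[8];
};

size_t id_field_size(void)
{
    return FIELD_SIZEOF(struct sample, id);
}

size_t name_field_size(void)
{
    return FIELD_SIZEOF(struct sample, name);
}
```

The macro form and the open-coded form give identical results; the change is purely about readability.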
-
- 26 Mar, 2008 1 commit
-
-
Committed by YOSHIFUJI Hideaki
Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
-
- 17 Mar, 2008 2 commits
-
-
Committed by Jeremy Katz
Fix up so that the virtio_blk devices in sysfs link correctly to their block device. This then allows them to be detected by hal, etc.

Signed-off-by: Jeremy Katz <katzj@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Committed by Adrian Bunk
no longer working for some time. A driver that has been marked as BROKEN for such a long time seems unlikely to be revived in the foreseeable future. But if anyone ever wants to revive this driver, the code is still present in the older kernel releases.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 14 Mar, 2008 1 commit
-
-
Committed by Jiri Slaby
Floppy rmmod locks up when no such hardware was initialized, since there is nobody to wake the remove code up. Remove the completion, because release is called during platform_unregister anyway.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Mar, 2008 1 commit
-
-
Committed by Benjamin Herrenschmidt
The iSeries viodasd driver does some very strange things with scatterlists. One of these causes a BUG_ON to trigger when scatterlist debugging is enabled, because the driver initializes the scatterlist with memset instead of sg_init_table().

This fixes it by using sg_init_table(). The rest of the stuff it does to that poor list is still pretty awful, but it will work. I may look into fixing things in a nicer way some other time.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
-
- 05 Mar, 2008 1 commit
-
-
Committed by Peter Osterlund
On my system, pkt_open() consumes 584 bytes because the compiler decides to inline lots of functions that would not normally be part of long call chains. The following patch fixes that problem on my system.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Cc: Nix <nix@esperi.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Mar, 2008 2 commits
-
-
Committed by Mike Miller
This patch removes the #define READ_AHEAD 1024 from the driver and uses the block layer defaults instead. We have found that under certain workloads the setting can cause a disk connected to the e200 controller to go offline. If the disk hiccups, the link may try to downshift, but the controller is never notified that the link successfully completed the renegotiation.

We've also found that performance using the block layer default of 32 pages was on par with the 1024 setting. We tried setting it to zero at one time based on info from our firmware guys, but that killed performance. It turns out we were talking about two different read-ahead settings.

Please consider this for inclusion.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Committed by Mike Miller
volumes

This patch allows us to display information about all of the logical volumes configured on a particular controller without stepping on memory, even when there are many volumes (128 or more) configured.

Please consider this for inclusion.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 24 Feb, 2008 1 commit
-
-
Committed by Paul Clements
NBD doesn't work well with CFQ (or AS) schedulers, so let's default to something else.

The two problems I have experienced with nbd and cfq are:

1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been fixed

   There's a similar Debian bug that has been filed as well: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638

   There have also been posts to the nbd-general mailing list about problems with cfq and nbd.

2) nbd performs about 10% better (the last time I tested) with deadline vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not being a real disk], and you end up going through the I/O scheduler on the nbd server anyway, so it makes sense that deadline is better with nbd)

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 Feb, 2008 1 commit
-
-
Committed by Ian Campbell
The below implements the getgeo hook for Xen block devices. Extracted from the xen-unstable tree, where it has been used for ages.

It is useful to have because it allows things like grub2 (used by the Debian installer images) to work in a guest domain without having to sprinkle Xen-specific hacks around the place.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Feb, 2008 1 commit
-
-
Committed by Tony Breeds
The current pmac32_defconfig fails to build with the following error:

    Building modules, stage 2.
    ERROR: "check_media_bay" [drivers/block/swim3.ko] undefined!
    WARNING: modpost: Found 23 section mismatch(es).
    To see full details build your kernel with:
    'make CONFIG_DEBUG_SECTION_MISMATCH=y'
    make[2]: *** [__modpost] Error 1

This patch fixes that.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Feb, 2008 1 commit
-
-
Committed by Pete Zaitcev
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Cc: "Oliver Pinter" <oliver.pntr@gmail.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 09 Feb, 2008 8 commits
-
-
Committed by Paul Clements
Remove the arbitrary 128 device limit for NBD. nbds_max can now be set to any number. In certain scenarios where devices are used sparsely, we have run into the 128 device limit.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Andrew Morton
I guess aoedev_init() can go away now.

Cc: Greg KH <greg@kroah.com>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ed L. Cashin
Update the year in the copyright notices.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ed L. Cashin
Andrew Morton pointed out that the "too many targets" message in patch 2 could be printed for failing GFP_ATOMIC allocations. This patch makes the messages more specific.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ed L. Cashin
The aoedev aoeminor member doesn't need a long format.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ed L. Cashin
An AoE target provides an estimate of the number of outstanding commands that the AoE initiator can send before getting a response. The aoe_maxout parameter provides a way to set an even lower limit. It will not allow a user to use more outstanding commands than the target permits. If a user discovers a problem with a large setting, this parameter provides a way for us to work with them to debug the problem.

We expect to improve the dynamic window sizing algorithm and drop this parameter. For the time being, it is a debugging aid.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ed L. Cashin
An aoe driver user who had about 70 AoE targets found that he was hitting a BUG in sysfs_create_file because the aoe driver was trying to tell the kernel about an AoE device more than once. Each AoE device was reachable by several local network interfaces, and multiple ATA device identify responses were returning from that single device.

This patch eliminates a race condition so that aoe always informs the block layer of a new AoE device exactly once, even in the presence of multiple incoming ATA device identify responses.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ed L. Cashin
What this Patch Does

Even before this recent series of 12 patches to 2.6.22-rc4, the aoe driver was reusing a small set of skbs that were allocated once and were only used for outbound AoE commands.

The network layer cannot be allowed to put_page on the data that is still associated with a bio we haven't returned to the block layer, so the aoe driver (even before the patch under discussion) is still the owner of skbs that have been handed to the network layer for transmission. We need to keep track of these skbs so that we can free them, but by tracking them, we can also easily re-use them.

The new patch was a response to the behavior of certain network drivers. We cannot reuse an skb that the network driver still has in its transmit ring. Network drivers can defer transmit ring cleanup and then use the state in the skb to determine how many data segments to clean up in the transmit ring. The tg3 driver is one driver that behaves in this way.

When the network driver defers cleanup of its transmit ring, the aoe driver can find itself in a situation where it would like to send an AoE command, and the AoE target is ready for more work, but the network driver still has all of the pre-allocated skbs. In that case, the new patch just calls alloc_skb, as you'd expect.

We don't want to get carried away, though. We try not to do excessive allocation in the write path, so we cap the number of skbs we dynamically allocate.

Probably calling it a "dynamic pool" is misleading. We were already trying to use a small fixed-size set of pre-allocated skbs before this patch, and this patch just provides a little headroom (with a ceiling, though) to accommodate network drivers that hang onto skbs, by allocating when needed. The d->skbpool_hd list of allocated skbs is necessary so that we can free them later.

We didn't notice the need for this headroom until AoE targets got fast enough.

Alternatives

If the network layer never did a put_page on the pages in the bios we get from the block layer, then it would be possible for us to hand skbs to the network layer and forget about them, allowing the network layer to free skbs itself (and thereby calling our own skb->destructor callback function if we needed that). In that case we could get rid of the pre-allocated skbs and also the d->skbpool_hd, instead just calling alloc_skb every time we wanted to transmit a packet. The slab allocator would effectively maintain the list of skbs.

Besides a loss of CPU cache locality, the main concern with that approach is the danger that it would increase the likelihood of deadlock when the VM is trying to free pages by writing dirty data from the page cache through the aoe driver out to persistent storage on an AoE device. Right now we have a situation where we have pre-allocation that corresponds to how much we use, which seems ideal.

Of course, there's still the separate issue of receiving the packets that tell us that a write has successfully completed on the AoE target. When memory is low and the VM is using AoE to flush dirty data to free up pages, it would be perfect if there were a way for us to register a fast callback that could recognize write command completion responses. But I don't think the current problems with the receive side of the situation are a justification for exacerbating the problem on the transmit side.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
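The "reuse if free, otherwise allocate up to a ceiling" policy described above can be sketched in userspace C. Everything here is hypothetical: struct buf stands in for an skb, the busy flag stands in for "the network driver still holds this skb", and POOL_MAX is an arbitrary cap. It is not the aoe driver's code.

```c
#include <stdbool.h>
#include <stdlib.h>

#define POOL_MAX 4 /* hypothetical ceiling on tracked buffers */

struct buf {
    bool busy;        /* still held by the (imaginary) network driver */
    struct buf *next; /* link in the pool list, kept so we can free later */
};

struct pool {
    struct buf *head;
    unsigned int count;
};

/* Prefer a tracked buffer that is no longer busy; otherwise allocate a
 * new one, but never track more than POOL_MAX buffers. */
struct buf *pool_get(struct pool *p)
{
    struct buf *b;

    for (b = p->head; b; b = b->next) {
        if (!b->busy) {
            b->busy = true;
            return b;
        }
    }
    if (p->count >= POOL_MAX)
        return NULL; /* caller has to wait and retry */
    b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->busy = true;
    b->next = p->head;
    p->head = b;
    p->count++;
    return b;
}

/* Called when the holder is done with the buffer. */
void pool_put(struct buf *b)
{
    b->busy = false;
}

/* Free every tracked buffer, which is why the list exists at all. */
void pool_free_all(struct pool *p)
{
    while (p->head) {
        struct buf *b = p->head;

        p->head = b->next;
        free(b);
    }
    p->count = 0;
}
```

Returning NULL at the ceiling models the commit's point about not allocating excessively in the write path: the sender backs off instead of growing the pool without bound.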
-