1. 30 July 2008 (19 commits)
  2. 27 July 2008 (3 commits)
    • [PATCH] f_count may wrap around · 516e0cc5
      Committed by Al Viro
      Make f_count an atomic_long_t; while we are at it, get rid of the useless
      checks in affs, hfs and hpfs: ->open() always sees f_count equal to 1, and
      ->release() sees it equal to 0.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      516e0cc5
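      For reference, the resulting helpers look roughly like this (a sketch of the
      post-patch state of include/linux/fs.h, not the literal diff):

        /* The reference count becomes a long-sized atomic, so it cannot wrap
         * on 64-bit machines even under pathological reference counts. */
        struct file {
                /* ... */
                atomic_long_t           f_count;
                /* ... */
        };

        #define get_file(x)     atomic_long_inc(&(x)->f_count)
        #define file_count(x)   atomic_long_read(&(x)->f_count)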
    • mm: speculative page references · e286781d
      Committed by Nick Piggin
      If we can be sure that elevating the page_count on a pagecache page will
      pin it, we can speculatively run this operation, and subsequently check to
      see if we hit the right page rather than relying on holding a lock or
      otherwise pinning a reference to the page.
      
      This can be done if get_page/put_page behaves consistently throughout the
      whole tree (ie.  if we "get" the page after it has been used for something
      else, we must be able to free it with a put_page).
      
      Actually, there is a period where the count behaves differently: when the
      page is free or if it is a constituent page of a compound page.  We need
      an atomic_inc_not_zero operation to ensure we don't try to grab the page
      in either case.
      
      This patch introduces the core locking protocol to the pagecache (ie.
      adds page_cache_get_speculative, and tweaks some update-side code to make
      it work).
      
      Thanks to Hugh for pointing out an improvement to the algorithm setting
      page_count to zero when we have control of all references, in order to
      hold off speculative getters.
      
      [kamezawa.hiroyu@jp.fujitsu.com: fix migration_entry_wait()]
      [hugh@veritas.com: fix add_to_page_cache]
      [akpm@linux-foundation.org: repair a comment]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e286781d
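      The core of the protocol described above boils down to something like the
      following sketch (simplified; the real page_cache_get_speculative() and its
      callers carry extra barriers and debug checks):

        /* Take a reference only if the count is already non-zero, so a free
         * page (count == 0) can never be grabbed speculatively. */
        static inline int page_cache_get_speculative(struct page *page)
        {
                if (unlikely(!atomic_inc_not_zero(&page->_count)))
                        return 0;       /* page was free or being freed */
                return 1;
        }

        /* Lookup side (simplified): speculatively get the page, then verify it
         * is still the one the mapping points at; if not, we raced - drop the
         * reference and retry. */
        page = radix_tree_lookup(&mapping->page_tree, index);
        if (!page || !page_cache_get_speculative(page))
                goto retry;
        if (unlikely(page != radix_tree_lookup(&mapping->page_tree, index))) {
                page_cache_release(page);
                goto retry;
        }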
    • dma-mapping: add the device argument to dma_mapping_error() · 8d8bb39b
      Committed by FUJITA Tomonori
      Add per-device dma_mapping_ops support for CONFIG_X86_64, as the POWER
      architecture does:
      
      This enables us to cleanly fix the Calgary IOMMU issue that some devices
      are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
      
      I think that per-device dma_mapping_ops support would also be helpful for the
      KVM people to support PCI passthrough, but Andi thinks that this makes PCI
      passthrough harder to support (see the above thread), so I CC'ed this to the
      KVM camp.  Comments are appreciated.
      
      A pointer to dma_mapping_ops is added to struct dev_archdata.  If the pointer
      is non-NULL, the DMA operations in asm/dma-mapping.h use it; if it's NULL, the
      system-wide dma_ops pointer is used as before.
      
      If it's useful for KVM people, I plan to implement a mechanism to register
      a hook called when a new pci (or dma capable) device is created (it works
      with hot plugging).  It enables IOMMUs to set up an appropriate
      dma_mapping_ops per device.
      
      The major obstacle is that dma_mapping_error doesn't take a pointer to the
      device, unlike the other DMA operations, so x86 can't have dma_mapping_ops per
      device.  Note that all the POWER IOMMUs use the same dma_mapping_error
      function, so this is not a problem for POWER, but x86 IOMMUs use different
      dma_mapping_error functions.
      
      The first patch adds the device argument to dma_mapping_error.  The patch is
      trivial but large, since it touches lots of drivers and dma-mapping.h in all
      the architectures.
      
      This patch:
      
      dma_mapping_error() doesn't take a pointer to the device, unlike the other DMA
      operations, so we can't have dma_mapping_ops per device.

      Note that POWER already has dma_mapping_ops per device, but all the POWER
      IOMMUs use the same dma_mapping_error function; x86 IOMMUs use different ones
      and therefore need the device argument.
      
      [akpm@linux-foundation.org: fix sge]
      [akpm@linux-foundation.org: fix svc_rdma]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix bnx2x]
      [akpm@linux-foundation.org: fix s2io]
      [akpm@linux-foundation.org: fix pasemi_mac]
      [akpm@linux-foundation.org: fix sdhci]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix sparc]
      [akpm@linux-foundation.org: fix ibmvscsi]
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Avi Kivity <avi@qumranet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d8bb39b
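      After this change, a typical driver call site passes the device to both the
      mapping call and the error check, which is what allows the check to dispatch
      through that device's dma_mapping_ops (illustrative usage; pdev, buf and len
      stand in for whatever the driver has at hand):

        dma_addr_t handle;

        handle = dma_map_single(&pdev->dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(&pdev->dev, handle)) {
                /* Mapping failed: don't hand the bogus handle to the hardware. */
                return -ENOMEM;
        }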
  3. 26 July 2008 (2 commits)
  4. 25 July 2008 (7 commits)
    • ibmveth: enable driver for CMO · 1096d63d
      Committed by Robert Jennings
      Enable ibmveth for Cooperative Memory Overcommitment (CMO).  For this driver
      that means calculating a desired amount of IO memory based on the current MTU
      and updating this value with the bus when the MTU changes.  Because DMA
      mappings can fail, a bounce buffer is added for the temporary cases where the
      driver cannot map IO memory for the buffer pool.

      The following changes are made to enable the driver for CMO:
       * DMA mapping errors no longer produce error messages when entitlement has
         been exceeded and resources were not available.
       * DMA mapping errors are handled gracefully: ibmveth_replenish_buffer_pool()
         now checks the return from dma_map_single and fails gracefully.
       * The driver defines a get_desired_dma function so that it can operate in a
         CMO environment.
       * When the MTU is changed, the driver updates the device IO entitlement.
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Santiago Leon <santil@us.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1096d63d
    • ibmveth: Automatically enable larger rx buffer pools for larger mtu · ea866e65
      Committed by Santiago Leon
      Activate larger rx buffer pools when the MTU is changed to a larger value, and
      de-activate them again when the MTU changes back to a smaller value.
      Signed-off-by: Santiago Leon <santil@us.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ea866e65
    • virtio: Recycle unused recv buffer pages for large skbs in net driver · fb6813f4
      Committed by Rusty Russell
      If we hack the virtio_net driver to always allocate full-sized (64k+)
      skbuffs, the driver slows down (lguest numbers):
      
        Time to receive 1GB (small buffers): 10.85 seconds
        Time to receive 1GB (64k+ buffers): 24.75 seconds
      
      Of course, large buffers use up more space in the ring, so we increase
      that from 128 to 2048:
      
        Time to receive 1GB (64k+ buffers, 2k ring): 16.61 seconds
      
      If we recycle pages rather than using alloc_page/free_page:
      
        Time to receive 1GB (64k+ buffers, 2k ring, recycle pages): 10.81 seconds
      
      This demonstrates that with efficient allocation, we don't need to
      have a separate "small buffer" queue.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      fb6813f4
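      The recycling scheme itself is small.  Roughly (a sketch of the idea, chaining
      spare pages through page->private; close to, but not necessarily identical to,
      the driver code):

        /* Keep unused receive pages on a singly linked list in the device
         * state and prefer that list over the page allocator. */
        static void give_a_page(struct virtnet_info *vi, struct page *page)
        {
                page->private = (unsigned long)vi->pages;
                vi->pages = page;
        }

        static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
        {
                struct page *p = vi->pages;

                if (p)
                        vi->pages = (struct page *)p->private;
                else
                        p = alloc_page(gfp_mask);
                return p;
        }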
    • virtio net: Allow receiving SG packets · 97402b96
      Committed by Herbert Xu
      Finally this patch lets virtio_net receive GSO packets in addition
      to sending them.  This can definitely be optimised for the non-GSO
      case.  For comparison the Xen approach stores one page in each skb
      and uses subsequent skb's pages to construct an SG skb instead of
      preallocating the maximum amount of pages per skb.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added feature bits)
      97402b96
    • virtio net: Add ethtool ops for SG/GSO · a9ea3fc6
      Committed by Herbert Xu
      This patch adds some basic ethtool operations to virtio_net so
      I could test SG without GSO (which was really useful because TSO
      turned out to be buggy :)
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (remove MTU setting)
      a9ea3fc6
    • virtio: fix virtio_net xmit of freed skb bug · 9953ca6c
      Committed by Mark McLoughlin
      On Mon, 2008-05-26 at 17:42 +1000, Rusty Russell wrote:
      > If we fail to transmit a packet, we assume the queue is full and put
      > the skb into last_xmit_skb.  However, if more space frees up before we
      > xmit it, we loop, and the result can be transmitting the same skb twice.
      >
      > Fix is simple: set skb to NULL if we've used it in some way, and check
      > before sending.
      ...
      > diff -r 564237b31993 drivers/net/virtio_net.c
      > --- a/drivers/net/virtio_net.c	Mon May 19 12:22:00 2008 +1000
      > +++ b/drivers/net/virtio_net.c	Mon May 19 12:24:58 2008 +1000
      > @@ -287,21 +287,25 @@ again:
      >  	free_old_xmit_skbs(vi);
      >
      >  	/* If we have a buffer left over from last time, send it now. */
      > -	if (vi->last_xmit_skb) {
      > +	if (unlikely(vi->last_xmit_skb)) {
      >  		if (xmit_skb(vi, vi->last_xmit_skb) != 0) {
      >  			/* Drop this skb: we only queue one. */
      >  			vi->dev->stats.tx_dropped++;
      >  			kfree_skb(skb);
      > +			skb = NULL;
      >  			goto stop_queue;
      >  		}
      >  		vi->last_xmit_skb = NULL;
      
      With this, we may drop an skb and then later in the function discover that
      we could have sent it after all.  Poor wee skb :)
      
      How about the incremental patch below?
      
      Cheers,
      Mark.
      
      Subject: [PATCH] virtio_net: Delay dropping tx skbs
      
      Currently we drop the skb in start_xmit() if we have a
      queued buffer and fail to transmit it.
      
      However, if we delay dropping it until we've stopped the
      queue and enabled the tx notification callback, then there
      is a chance space might become available for it.
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      9953ca6c
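      The end result in the transmit path looks roughly like the sketch below (the
      tail of start_xmit(), simplified and using the virtqueue callback API of that
      era; not the literal patch):

        /* Ring is full: stop the queue and re-arm the completion callback
         * before deciding anything.  If buffers were freed in the meantime,
         * retry the transmit; only if there is still no room do we finally
         * drop the skb. */
        netif_stop_queue(dev);
        if (unlikely(!vi->svq->vq_ops->enable_cb(vi->svq))) {
                vi->svq->vq_ops->disable_cb(vi->svq);
                netif_start_queue(dev);
                goto again;             /* space appeared: try again */
        }
        dev->stats.tx_dropped++;
        kfree_skb(skb);                 /* still full: drop it now */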
    • PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures · 27ac792c
      Committed by Andrea Righi
      On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
      boundary. For example:
      
      	u64 val = PAGE_ALIGN(size);
      
      always returns a value < 4GB even if size is greater than 4GB.
      
      The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
      example):
      
      #define PAGE_SHIFT      12
      #define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
      #define PAGE_MASK       (~(PAGE_SIZE-1))
      ...
      #define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)
      
      The "~" is performed on a 32-bit value, so everything in "and" with
      PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
      Using the ALIGN() macro seems to be the right way, because it uses
      typeof(addr) for the mask.
      
      Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h and into
      include/linux/mm.h.
      
      See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
      
      [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
      [akpm@linux-foundation.org: fix v850]
      [akpm@linux-foundation.org: fix powerpc]
      [akpm@linux-foundation.org: fix arm]
      [akpm@linux-foundation.org: fix mips]
      [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
      [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
      [akpm@linux-foundation.org: fix powerpc]
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27ac792c
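      The fix itself is tiny.  Roughly (simplified from the patch; ALIGN() already
      lives in include/linux/kernel.h):

        /* ALIGN() builds its mask from typeof(x), so a u64 stays 64 bits wide. */
        #define __ALIGN_MASK(x, mask)   (((x) + (mask)) & ~(mask))
        #define ALIGN(x, a)             __ALIGN_MASK(x, (typeof(x))(a) - 1)

        /* include/linux/mm.h: one shared definition instead of per-arch copies. */
        #define PAGE_ALIGN(addr)        ALIGN(addr, PAGE_SIZE)

        /* Example: PAGE_ALIGN((u64)5 << 30) now yields 5GB instead of a value
         * truncated below 4GB by the 32-bit ~PAGE_MASK. */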
  5. 24 July 2008 (1 commit)
    • e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call · e8ebe3b8
      Committed by Ingo Molnar
      Evgeniy Polyakov noticed that drivers/net/e1000e/netdev.c:e1000_netpoll()
      was calling e1000_clean_tx_irq() without taking the TX lock.
      
      David Miller suggested removing the call altogether: in this callpath there
      are periodic calls to ->poll() anyway, which will do e1000_clean_tx_irq() and
      garbage-collect any finished TX ring descriptors.

      This fix solved the e1000e+netconsole crashes I've been seeing:
      
      =============================================================================
      BUG skbuff_head_cache: Poison overwritten
      -----------------------------------------------------------------------------
      
      INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b
      INFO: Allocated in __alloc_skb+0x2c/0x110 age=0 cpu=0 pid=5098
      INFO: Freed in __kfree_skb+0x31/0x80 age=0 cpu=1 pid=4440
      INFO: Slab 0xc16cc140 objects=16 used=1 fp=0xf658ae00 flags=0x400000c3
      INFO: Object 0xf658ae00 @offset=3584 fp=0xf658af00
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e8ebe3b8
  6. 23 July 2008 (8 commits)