1. 31 Jul 2008, 1 commit
  2. 30 Jul 2008, 10 commits
  3. 27 Jul 2008, 3 commits
    • [PATCH] f_count may wrap around · 516e0cc5
      Committed by Al Viro
      Make it atomic_long_t; while we are at it, get rid of useless checks in affs,
      hfs and hpfs: ->open() always sees it equal to 1, and ->release() to 0.
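
      A minimal sketch of the resulting change (helper shapes as in fs.h of that
      era; treat the exact wrapper set as an assumption):

      /* struct file::f_count becomes atomic_long_t, so it cannot realistically
       * wrap; taking and reading a reference use the long-sized atomics. */
      #define get_file(x)    atomic_long_inc(&(x)->f_count)
      #define file_count(x)  atomic_long_read(&(x)->f_count)
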
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      516e0cc5
    • mm: speculative page references · e286781d
      Committed by Nick Piggin
      If we can be sure that elevating the page_count on a pagecache page will
      pin it, we can speculatively run this operation, and subsequently check to
      see if we hit the right page rather than relying on holding a lock or
      otherwise pinning a reference to the page.
      
      This can be done if get_page/put_page behaves consistently throughout the
      whole tree (i.e. if we "get" the page after it has been used for something
      else, we must be able to free it with a put_page).
      
      Actually, there is a period where the count behaves differently: when the
      page is free or if it is a constituent page of a compound page.  We need
      an atomic_inc_not_zero operation to ensure we don't try to grab the page
      in either case.
      
      This patch introduces the core locking protocol to the pagecache (i.e.
      adds page_cache_get_speculative, and tweaks some update-side code to make
      it work).
      
      Thanks to Hugh for pointing out an improvement to the algorithm setting
      page_count to zero when we have control of all references, in order to
      hold off speculative getters.
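
      A sketch of the lookup side of this protocol (simplified; the helper names
      and locking details are an approximation of the lockless-pagecache series,
      not a quote of it):

      static struct page *speculative_lookup(struct address_space *mapping,
                                             pgoff_t offset)
      {
              struct page *page;

              rcu_read_lock();
      repeat:
              page = radix_tree_lookup(&mapping->page_tree, offset);
              if (page) {
                      /* atomic_inc_not_zero underneath: refuse to pin a page
                       * whose count is already zero (it is being freed, or it
                       * is a constituent page of a compound page). */
                      if (!page_cache_get_speculative(page))
                              goto repeat;

                      /* The page may have been freed and reused while we took
                       * the reference; check we really pinned the right page,
                       * otherwise drop it and retry. */
                      if (unlikely(page != radix_tree_lookup(&mapping->page_tree,
                                                             offset))) {
                              page_cache_release(page);
                              goto repeat;
                      }
              }
              rcu_read_unlock();
              return page;
      }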
      
      [kamezawa.hiroyu@jp.fujitsu.com: fix migration_entry_wait()]
      [hugh@veritas.com: fix add_to_page_cache]
      [akpm@linux-foundation.org: repair a comment]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e286781d
    • dma-mapping: add the device argument to dma_mapping_error() · 8d8bb39b
      Committed by FUJITA Tomonori
      Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
      architecture does:
      
      This enables us to cleanly fix the Calgary IOMMU issue that some devices
      are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
      
      I think that per-device dma_mapping_ops support would be also helpful for
      KVM people to support PCI passthrough but Andi thinks that this makes it
      difficult to support the PCI passthrough (see the above thread).  So I
      CC'ed this to KVM camp.  Comments are appreciated.
      
      A pointer to dma_mapping_ops is added to struct dev_archdata.  If the
      pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
      NULL, the system-wide dma_ops pointer is used as before.
      
      If it's useful for KVM people, I plan to implement a mechanism to register
      a hook called when a new pci (or dma capable) device is created (it works
      with hot plugging).  It enables IOMMUs to set up an appropriate
      dma_mapping_ops per device.
      
      The major obstacle is that dma_mapping_error doesn't take a pointer to the
      device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
      device.  Note all the POWER IOMMUs use the same dma_mapping_error function
      so this is not a problem for POWER but x86 IOMMUs use different
      dma_mapping_error functions.
      
      The first patch adds the device argument to dma_mapping_error.  The patch
      is trivial but large since it touches lots of drivers and dma-mapping.h in
      all the architectures.
      
      This patch:
      
      dma_mapping_error() doesn't take a pointer to the device unlike other DMA
      operations.  So we can't have dma_mapping_ops per device.
      
      Note that POWER already has dma_mapping_ops per device but all the POWER
      IOMMUs use the same dma_mapping_error function.  x86 IOMMUs, by contrast,
      use different dma_mapping_error functions, so they need the device argument.
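
      A sketch of what the extra argument enables on x86 (field and helper names
      follow the asm/dma-mapping.h style of the time and are partly assumed):

      static inline struct dma_mapping_ops *get_dma_ops(struct device *dev)
      {
              /* Use the per-device ops if an IOMMU installed one, otherwise
               * fall back to the system-wide dma_ops as before. */
              if (dev && dev->archdata.dma_ops)
                      return dev->archdata.dma_ops;
              return dma_ops;
      }

      static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
      {
              struct dma_mapping_ops *ops = get_dma_ops(dev);

              if (ops->mapping_error)
                      return ops->mapping_error(dev, dma_addr);
              return (dma_addr == bad_dma_address);
      }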
      
      [akpm@linux-foundation.org: fix sge]
      [akpm@linux-foundation.org: fix svc_rdma]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix bnx2x]
      [akpm@linux-foundation.org: fix s2io]
      [akpm@linux-foundation.org: fix pasemi_mac]
      [akpm@linux-foundation.org: fix sdhci]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix sparc]
      [akpm@linux-foundation.org: fix ibmvscsi]
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Avi Kivity <avi@qumranet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d8bb39b
  4. 26 Jul 2008, 2 commits
  5. 25 Jul 2008, 7 commits
    • ibmveth: enable driver for CMO · 1096d63d
      Committed by Robert Jennings
      Enable ibmveth for Cooperative Memory Overcommitment (CMO).  For this driver
      it means calculating a desired amount of IO memory based on the current MTU
      and updating this value with the bus when MTU changes occur.  Because DMA
      mappings can fail, we have added a bounce buffer for temporary cases where
      the driver cannot map IO memory for the buffer pool.
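
      A sketch of the failure path described above (the bounce-buffer fields and
      surrounding code are illustrative, not the driver's exact code):

      dma_addr = dma_map_single(&adapter->vdev->dev, skb->data,
                                skb->len, DMA_TO_DEVICE);
      if (dma_mapping_error(&adapter->vdev->dev, dma_addr)) {
              /* Entitlement exhausted: copy into the preallocated,
               * already-mapped bounce buffer instead of failing the send. */
              skb_copy_from_linear_data(skb, adapter->bounce_buffer, skb->len);
              dma_addr = adapter->bounce_buffer_dma;
              used_bounce = 1;
      }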
      
      The following changes are made to enable the driver for CMO:
       * DMA mapping errors will not result in error messages if entitlement has
         been exceeded and resources were not available.
       * DMA mapping errors are handled gracefully, ibmveth_replenish_buffer_pool()
         is corrected to check the return from dma_map_single and fail gracefully.
       * The driver defines a get_desired_dma function so that it can operate
         in a CMO environment.
       * When the MTU is changed, the driver will update the device IO entitlement.
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Santiago Leon <santil@us.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1096d63d
    • ibmveth: Automatically enable larger rx buffer pools for larger mtu · ea866e65
      Committed by Santiago Leon
      This patch activates larger rx buffer pools when the MTU is changed to a
      larger value, and de-activates the large rx buffer pools when the MTU
      changes to a smaller value.
      Signed-off-by: Santiago Leon <santil@us.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ea866e65
    • virtio: Recycle unused recv buffer pages for large skbs in net driver · fb6813f4
      Committed by Rusty Russell
      If we hack the virtio_net driver to always allocate full-sized (64k+)
      skbuffs, the driver slows down (lguest numbers):
      
        Time to receive 1GB (small buffers): 10.85 seconds
        Time to receive 1GB (64k+ buffers): 24.75 seconds
      
      Of course, large buffers use up more space in the ring, so we increase
      that from 128 to 2048:
      
        Time to receive 1GB (64k+ buffers, 2k ring): 16.61 seconds
      
      If we recycle pages rather than using alloc_page/free_page:
      
        Time to receive 1GB (64k+ buffers, 2k ring, recycle pages): 10.81 seconds
      
      This demonstrates that with efficient allocation, we don't need to
      have a separate "small buffer" queue.
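
      The recycling scheme is essentially a singly linked free list of pages
      threaded through page->private; roughly (reconstructed from memory of the
      driver, not quoted):

      /* Return an unused receive page to the driver's private free list. */
      static void give_a_page(struct virtnet_info *vi, struct page *page)
      {
              page->private = (unsigned long)vi->pages;
              vi->pages = page;
      }

      /* Take a page from the free list, falling back to the page allocator. */
      static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
      {
              struct page *p = vi->pages;

              if (p)
                      vi->pages = (struct page *)p->private;
              else
                      p = alloc_page(gfp_mask);
              return p;
      }
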
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      fb6813f4
    • virtio net: Allow receiving SG packets · 97402b96
      Committed by Herbert Xu
      Finally this patch lets virtio_net receive GSO packets in addition
      to sending them.  This can definitely be optimised for the non-GSO
      case.  For comparison, the Xen approach stores one page in each skb
      and uses the pages of subsequent skbs to construct an SG skb instead of
      preallocating the maximum number of pages per skb.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added feature bits)
      97402b96
    • virtio net: Add ethtool ops for SG/GSO · a9ea3fc6
      Committed by Herbert Xu
      This patch adds some basic ethtool operations to virtio_net so
      I could test SG without GSO (which was really useful because TSO
      turned out to be buggy :)
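
      The ops table is small; something along these lines (the exact handler set
      is an assumption, using the stock ethtool_op_* helpers of that era):

      static struct ethtool_ops virtnet_ethtool_ops = {
              .set_tx_csum    = ethtool_op_set_tx_csum,
              .set_sg         = ethtool_op_set_sg,
              .set_tso        = ethtool_op_set_tso,
      };

      /* hooked up in virtnet_probe() before register_netdev():
       *     SET_ETHTOOL_OPS(dev, &virtnet_ethtool_ops); */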
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (remove MTU setting)
      a9ea3fc6
    • virtio: fix virtio_net xmit of freed skb bug · 9953ca6c
      Committed by Mark McLoughlin
      On Mon, 2008-05-26 at 17:42 +1000, Rusty Russell wrote:
      > If we fail to transmit a packet, we assume the queue is full and put
      > the skb into last_xmit_skb.  However, if more space frees up before we
      > xmit it, we loop, and the result can be transmitting the same skb twice.
      >
      > Fix is simple: set skb to NULL if we've used it in some way, and check
      > before sending.
      ...
      > diff -r 564237b31993 drivers/net/virtio_net.c
      > --- a/drivers/net/virtio_net.c	Mon May 19 12:22:00 2008 +1000
      > +++ b/drivers/net/virtio_net.c	Mon May 19 12:24:58 2008 +1000
      > @@ -287,21 +287,25 @@ again:
      >  	free_old_xmit_skbs(vi);
      >
      >  	/* If we has a buffer left over from last time, send it now. */
      > -	if (vi->last_xmit_skb) {
      > +	if (unlikely(vi->last_xmit_skb)) {
      >  		if (xmit_skb(vi, vi->last_xmit_skb) != 0) {
      >  			/* Drop this skb: we only queue one. */
      >  			vi->dev->stats.tx_dropped++;
      >  			kfree_skb(skb);
      > +			skb = NULL;
      >  			goto stop_queue;
      >  		}
      >  		vi->last_xmit_skb = NULL;
      
      With this, we may drop an skb and then later in the function discover that
      we could have sent it after all. Poor wee skb :)
      
      How about the incremental patch below?
      
      Cheers,
      Mark.
      
      Subject: [PATCH] virtio_net: Delay dropping tx skbs
      
      Currently we drop the skb in start_xmit() if we have a
      queued buffer and fail to transmit it.
      
      However, if we delay dropping it until we've stopped the
      queue and enabled the tx notification callback, then there
      is a chance space might become available for it.
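
      The reworked tail of start_xmit() then takes roughly this shape
      (reconstructed sketch, not a verbatim quote of the applied patch):

      stop_queue:
              netif_stop_queue(dev);

              /* Re-enable the tx-done callback; if buffers were consumed in
               * the meantime there may be room again, so restart the queue
               * and retry instead of dropping anything. */
              if (unlikely(!vi->svq->vq_ops->enable_cb(vi->svq))) {
                      vi->svq->vq_ops->disable_cb(vi->svq);
                      netif_start_queue(dev);
                      goto again;
              }

              /* Still full: only now give up on the skb we could not queue. */
              if (skb) {
                      dev->stats.tx_dropped++;
                      kfree_skb(skb);
              }
              goto done;
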
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      9953ca6c
    • PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures · 27ac792c
      Committed by Andrea Righi
      On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
      boundary. For example:
      
      	u64 val = PAGE_ALIGN(size);
      
      always returns a value < 4GB even if size is greater than 4GB.
      
      The problem resides in the PAGE_MASK definition (from include/asm-x86/page.h,
      for example):
      
      #define PAGE_SHIFT      12
      #define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
      #define PAGE_MASK       (~(PAGE_SIZE-1))
      ...
      #define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)
      
      The "~" is performed on a 32-bit value, so everything in "and" with
      PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
      Using the ALIGN() macro seems to be the right way, because it uses
      typeof(addr) for the mask.
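
      For reference, the ALIGN()-based definition has this shape (as in
      include/linux/kernel.h and include/linux/mm.h of that era):

      #define ALIGN(x, a)             __ALIGN_MASK(x, (typeof(x))(a) - 1)
      #define __ALIGN_MASK(x, mask)   (((x) + (mask)) & ~(mask))

      /* align the value up to the next page boundary, in the type of addr */
      #define PAGE_ALIGN(addr)        ALIGN(addr, PAGE_SIZE)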
      
      Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h and
      into include/linux/mm.h.
      
      See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
      
      [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
      [akpm@linux-foundation.org: fix v850]
      [akpm@linux-foundation.org: fix powerpc]
      [akpm@linux-foundation.org: fix arm]
      [akpm@linux-foundation.org: fix mips]
      [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
      [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
      [akpm@linux-foundation.org: fix powerpc]
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27ac792c
  6. 24 Jul 2008, 13 commits
    • mv643xx_eth: bump version to 1.2 · ac0a2d0c
      Committed by Lennert Buytenhek
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      ac0a2d0c
    • mv643xx_eth: enable hardware TX checksumming with vlan tags · e32b6617
      Committed by Lennert Buytenhek
      Although mv643xx_eth has no hardware support for inserting a vlan
      tag by twiddling some bits in the TX descriptor, it does support
      hardware TX checksumming on packets where the IP header starts {a
      limited set of values other than 14} bytes into the packet.
      
      This patch sets mv643xx_eth's ->vlan_features to NETIF_F_SG |
      NETIF_F_IP_CSUM, which prevents the stack from checksumming vlan'ed
      packets in software, and if vlan tags are present on a transmitted
      packet, notifies the hardware of this fact by toggling the right
      bits in the TX descriptor.
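
      Schematically, the two pieces are (descriptor bit names here are recalled
      from the driver and should be treated as illustrative):

      dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM;

      /* in the xmit path, when requesting hardware checksumming: */
      if (skb->ip_summed == CHECKSUM_PARTIAL) {
              int hdr_len = (void *)ip_hdr(skb) - (void *)skb->data;

              /* Tell the hardware how far beyond the plain Ethernet header
               * the IP header starts (4 bytes further with a vlan tag). */
              if (hdr_len - ETH_HLEN == 4)
                      cmd_sts |= MAC_HDR_EXTRA_4_BYTES;
      }
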
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      e32b6617
    • mv643xx_eth: print message on link status change · 2f7eb47a
      Committed by Lennert Buytenhek
      When there is a link status change (link or phy status interrupt),
      print a message notifying the user of the new link status.
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      2f7eb47a
    • mv643xx_eth: use auto phy polling for configuring (R)(G)MII interface · 81600eea
      Committed by Lennert Buytenhek
      The mv643xx_eth hardware has a provision for polling the PHY's
      MII management registers to obtain the (R)(G)MII interface speed
      (10/100/1000) and duplex (half/full) and pause (off/symmetric)
      settings to use to talk to the PHY.
      
      The driver currently does not make use of this feature.  Instead,
      whenever there is a link status change event, it reads the current
      link parameters from the PHY, and programs those parameters into
      the mv643xx_eth MAC by hand.
      
      This patch switches the mv643xx_eth driver to letting the MAC
      auto-determine the (R)(G)MII link parameters by PHY polling, if there
      is a PHY present.  For PHYless ports (when e.g. the (R)(G)MII
      interface is connected to a hardware switch), we keep hardcoding the
      MII interface parameters.
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      81600eea
    • mv643xx_eth: print driver version on init · 7dde154d
      Committed by Lennert Buytenhek
      Print the mv643xx_eth driver version on init to help debugging.
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      7dde154d
    • mv643xx_eth: use symbolic MII register addresses and values · 7f106c1d
      Committed by Lennert Buytenhek
      Instead of hardcoding MII register addresses and values, use the
      symbolic constants defined in linux/mii.h.
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      7f106c1d
    • mv643xx_eth: use longer DMA bursts · cd4ccf76
      Committed by Lennert Buytenhek
      The mv643xx_eth driver currently limits DMA bursts to 32 bytes.  Using
      the largest burst size (128 bytes) gives a couple of percentage points
      of improvement in throughput tests, and the docs say that the 128 byte
      default should not need to be changed, so use 128 byte bursts instead.
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      cd4ccf76
    • mv643xx_eth: also check TX_IN_PROGRESS when disabling transmit path · ae9ae064
      Committed by Lennert Buytenhek
      The recommended sequence for waiting for the transmit path to clear
      after disabling all of the transmit queues is to wait for the
      TX_FIFO_EMPTY bit in the Port Status register to become set as well
      as the TX_IN_PROGRESS bit to clear.
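
      In other words, the wait loop after the queues are disabled becomes
      roughly (register accessor and bit names as used elsewhere in the driver,
      reproduced from memory):

      /* Wait for the transmit path to drain: FIFO empty and nothing in flight. */
      while (!(rdl(mp, PORT_STATUS(mp->port_num)) & TX_FIFO_EMPTY) ||
             (rdl(mp, PORT_STATUS(mp->port_num)) & TX_IN_PROGRESS))
              udelay(10);
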
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      ae9ae064
    • mv643xx_eth: don't fiddle with maximum receive packet size setting · 65193a91
      Committed by Lennert Buytenhek
      The maximum receive packet size field in the Port Serial Control
      register controls at what size received packets are flagged
      overlength in the receive descriptor, but it doesn't prevent
      overlength packets from being DMAd to memory and signaled to the
      host like other received packets.
      
      mv643xx_eth does not support receiving jumbo frames in 10/100 mode,
      but setting the packet threshold to larger than 1522 bytes in 10/100
      mode won't cause breakage by itself.
      
      If we really want to enforce maximum packet size on the receiving
      end instead of on the sending end where it should be done, we can
      always just add a length check to the software receive handler
      instead of relying on the hardware to do the comparison for us.
      
      What's more, changing the maximum packet size field requires
      temporarily disabling the RX/TX paths.  So once the link comes
      up in 10/100 Mb/s mode or 1000 Mb/s mode, we'd have to disable it
      again just to set the right maximum packet size field (1522 in
      10/100 Mb/s mode or 9700 in 1000 Mb/s mode), just so that we can
      offload one comparison operation to hardware that we might as well
      do in software, assuming that we'd want to do it at all.
      
      Contrary to what the documentation suggests, there is no harm in just
      setting a 9700 byte maximum packet size in 10/100 mode, so use the
      largest maximum packet size setting (9700 bytes) for all modes.
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      65193a91
    • mv643xx_eth: fix transmit-reclaim-in-napi-poll · 4dfc1c87
      Committed by Lennert Buytenhek
      The mv643xx_eth driver allows doing transmit reclaim from within the
      napi poll routine, but after doing reclaim, it would forget to check
      the free transmit descriptor count and wake up the transmit queue if
      the reclaim caused enough descriptors for a new packet to become
      available.  This would cause the netdev watchdog to occasionally kick
      in during certain workloads with combined receive and transmit traffic.
      
      Fix this by adding a wakeup check identical to the one in the
      interrupt handler to the napi poll routine.
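
      Schematically, the poll routine gains the same post-reclaim check the
      interrupt handler already has (helper names here are illustrative):

      /* After reclaiming tx descriptors in napi context, wake the transmit
       * queue if enough descriptors became free for another packet. */
      if (netif_carrier_ok(mp->dev)) {
              spin_lock_irq(&mp->lock);
              __txq_maybe_wake(txq);
              spin_unlock_irq(&mp->lock);
      }
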
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      4dfc1c87
    • mv643xx_eth: prevent breakage when link goes down during transmit · 6b368f68
      Committed by Lennert Buytenhek
      When the ethernet link goes down while mv643xx_eth is transmitting
      data, transmit DMA can stop before all queued transmit descriptors
      have been processed.  But even the descriptors that _have_ been
      processed might not be properly marked as done before the transmit
      DMA unit shuts down.
      
      Then when the link comes up again, the hardware transmit pointer
      might have advanced while not all previous packet descriptors have
      been marked as transmitted, causing software transmit reclaim to
      hang waiting for the hardware to finish transmitting a descriptor
      that it has already skipped.
      
      This patch forcibly reclaims all packets on the transmit ring on a
      link down interrupt, and then resyncs the hardware transmit pointer to
      what the software's idea of the first free descriptor is.  Also, we
      need to prevent re-waking the transmit queue if we get a 'transmit
      done' interrupt at the same time as a 'link down' interrupt, which
      this patch does as well.
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      6b368f68
    • mv643xx_eth: fix TX hang erratum workaround · 8fa89bf5
      Committed by Lennert Buytenhek
      The previously merged TX hang erratum workaround ("mv643xx_eth:
      work around TX hang hardware issue") assumes that TX_END interrupts
      are delivered simultaneously with or after their corresponding TX
      interrupts, but this is not always true in practice.
      
      In particular, it appears that TX_END interrupts are issued as soon
      as descriptor fetch returns an invalid descriptor, which may happen
      before earlier descriptors have been fully transmitted and written
      back to memory as being done.
      
      This hardware behavior can lead to a situation where the current
      driver code mistakenly assumes that the MAC has given up transmitting
      before noticing the packets that it is in fact still currently working
      on, causing the driver to re-kick the transmit queue, which will only
      cause the MAC to re-fetch the invalid head descriptor, and generate
      another TX_END interrupt, et cetera, until the packets in the pipe
      finally finish transmitting and have their descriptors written back
      to memory, which will then finally break the loop.
      
      Fix this by having the erratum workaround not check the 'number of
      unfinished descriptors', but instead compare the software's idea
      of what the head descriptor pointer should be to the hardware's head
      descriptor pointer (which is updated on the same conditions as the
      TX_END interrupt is generated on, i.e. possibly before all previous
      descriptors have been transmitted and written back).
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      8fa89bf5
    • e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call · e8ebe3b8
      Committed by Ingo Molnar
      Evgeniy Polyakov noticed that drivers/net/e1000e/netdev.c:e1000_netpoll()
      was calling e1000_clean_tx_irq() without taking the TX lock.
      
      David Miller suggested removing the call altogether: in this call path
      there are periodic calls to ->poll() anyway, which will do
      e1000_clean_tx_irq() and garbage-collect any finished TX ring
      descriptors.
      
      This fix solved the e1000e+netconsole crashes I've been seeing:
      
      =============================================================================
      BUG skbuff_head_cache: Poison overwritten
      -----------------------------------------------------------------------------
      
      INFO: 0xf658ae9c-0xf658ae9c. First byte 0x6a instead of 0x6b
      INFO: Allocated in __alloc_skb+0x2c/0x110 age=0 cpu=0 pid=5098
      INFO: Freed in __kfree_skb+0x31/0x80 age=0 cpu=1 pid=4440
      INFO: Slab 0xc16cc140 objects=16 used=1 fp=0xf658ae00 flags=0x400000c3
      INFO: Object 0xf658ae00 @offset=3584 fp=0xf658af00
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e8ebe3b8
  7. 23 Jul 2008, 4 commits