1. 17 Oct, 2007 5 commits
  2. 27 Aug, 2007 1 commit
  3. 15 Aug, 2007 1 commit
    • S
      [IOAT]: Remove redundant struct member to avoid descriptor cache miss · 54a09feb
      Shannon Nelson committed
      The layout for struct ioat_desc_sw is non-optimal and causes an extra
      cache miss for every descriptor processed.  By tightening up the struct
      layout and removing one item, we pull in the fields that get used in
      the speedpath and get a little better performance.
      
      
      Before:
      -------
      struct ioat_desc_sw {
      	struct ioat_dma_descriptor * hw;                 /*     0     8 */
      	struct list_head           node;                 /*     8    16 */
      	int                        tx_cnt;               /*    24     4 */

      	/* XXX 4 bytes hole, try to pack */

      	dma_addr_t                 src;                  /*    32     8 */
      	__u32                      src_len;              /*    40     4 */

      	/* XXX 4 bytes hole, try to pack */

      	dma_addr_t                 dst;                  /*    48     8 */
      	__u32                      dst_len;              /*    56     4 */

      	/* XXX 4 bytes hole, try to pack */

      	/* --- cacheline 1 boundary (64 bytes) --- */
      	struct dma_async_tx_descriptor async_tx;         /*    64   144 */
      	/* --- cacheline 3 boundary (192 bytes) was 16 bytes ago --- */

      	/* size: 208, cachelines: 4 */
      	/* sum members: 196, holes: 3, sum holes: 12 */
      	/* last cacheline: 16 bytes */
      };	/* definitions: 1 */
      
      
      After:
      ------
      
      struct ioat_desc_sw {
      	struct ioat_dma_descriptor * hw;                 /*     0     8 */
      	struct list_head           node;                 /*     8    16 */
      	int                        tx_cnt;               /*    24     4 */
      	__u32                      len;                  /*    28     4 */
      	dma_addr_t                 src;                  /*    32     8 */
      	dma_addr_t                 dst;                  /*    40     8 */
      	struct dma_async_tx_descriptor async_tx;         /*    48   144 */
      	/* --- cacheline 3 boundary (192 bytes) --- */

      	/* size: 192, cachelines: 3 */
      };	/* definitions: 1 */
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
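The before/after offsets in the [IOAT] commit above can be reproduced in user space. The following is a minimal sketch, not the driver's code: all `demo_*` names are hypothetical stand-ins, fixed-width integers replace the kernel's pointer and `dma_addr_t` types, and the layout assumes a 64-bit ABI where `uint64_t` is 8-byte aligned and cache lines are 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types (assumed 64-bit ABI). */
struct demo_list_head { uint64_t next, prev; };      /* 16 bytes */
struct demo_async_tx  { uint64_t opaque[18]; };      /* 144 bytes, 8-byte aligned */

/* Layout before the change: three 4-byte fields each followed by a hole. */
struct demo_desc_before {
    uint64_t hw;                   /* stands in for the hw descriptor pointer */
    struct demo_list_head node;
    int32_t  tx_cnt;               /* 4-byte hole follows */
    uint64_t src;
    uint32_t src_len;              /* 4-byte hole follows */
    uint64_t dst;
    uint32_t dst_len;              /* 4-byte hole follows */
    struct demo_async_tx async_tx;
};

/* Layout after the change: one length field packed next to tx_cnt. */
struct demo_desc_after {
    uint64_t hw;
    struct demo_list_head node;
    int32_t  tx_cnt;
    uint32_t len;                  /* fills the former hole at offset 28 */
    uint64_t src;
    uint64_t dst;
    struct demo_async_tx async_tx;
};

/* Number of 64-byte cache lines needed to hold `bytes` bytes. */
size_t demo_cachelines(size_t bytes)
{
    return (bytes + 63) / 64;
}
```

Under these assumptions, `sizeof` and `offsetof` give the same numbers as the pahole output quoted in the commit message, and the embedded descriptor moves from offset 64 to offset 48.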
  4. 13 Jul, 2007 3 commits
    • D
      ioatdma: add the unisys "i/oat" pci vendor/device id · 3039f073
      Dan Williams committed
      Cc: John Magolan <john.magolan@unisys.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • D
      dmaengine: make clients responsible for managing channels · d379b01e
      Dan Williams committed
      The current implementation assumes that a channel will only be used by one
      client at a time.  In order to enable channel sharing the dmaengine core is
      changed to a model where clients subscribe to channel-available-events.
      Instead of tracking how many channels a client wants and how many it has
      received the core just broadcasts the available channels and lets the
      clients optionally take a reference.  The core learns about the clients'
      needs at dma_event_callback time.
      
      In support of multiple operation types, clients can specify a capability
      mask to only be notified of channels that satisfy a certain set of
      capabilities.
      
      Changelog:
      * removed DMA_TX_ARRAY_INIT, no longer needed
      * dma_client_chan_free -> dma_chan_release: switch to global reference
        counting only at device unregistration time, before it was also happening
        at client unregistration time
      * clients now return dma_state_client to dmaengine (ack, dup, nak)
      * checkpatch.pl fixes
      * fixup merge with git-ioat
      
      Cc: Chris Leech <christopher.leech@intel.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David S. Miller <davem@davemloft.net>
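The subscription model described in the "make clients responsible for managing channels" commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every `demo_*` name is hypothetical, and the meanings given to ack, dup and nak here (take a reference, already holding this channel, not interested) are an interpretation of the changelog, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits; the real core uses its own bitmap type. */
enum demo_cap { DEMO_CAP_MEMCPY = 1u << 0, DEMO_CAP_XOR = 1u << 1 };

/* Assumed meaning of the three client answers named in the changelog. */
enum demo_state_client { DEMO_ACK, DEMO_DUP, DEMO_NAK };

struct demo_chan {
    unsigned int caps;     /* operations this channel supports */
    int refcount;          /* references taken by clients */
};

struct demo_client {
    unsigned int cap_mask;     /* only channels with all these bits are offered */
    struct demo_chan *held;    /* this toy client wants exactly one channel */
};

/* Client callback: decide what to do with an offered channel. */
static enum demo_state_client demo_event(struct demo_client *client,
                                         struct demo_chan *chan)
{
    if (client->held == chan)
        return DEMO_DUP;       /* already holding this channel */
    if (client->held != NULL)
        return DEMO_NAK;       /* need satisfied, decline */
    client->held = chan;
    return DEMO_ACK;           /* take it */
}

/*
 * Core side: broadcast every channel that satisfies the client's mask and
 * take a reference only when the client acks. Returns the number of acks.
 */
int demo_broadcast(struct demo_client *client, struct demo_chan *chans, size_t n)
{
    int acks = 0;

    for (size_t i = 0; i < n; i++) {
        if ((chans[i].caps & client->cap_mask) != client->cap_mask)
            continue;          /* filtered out by the capability mask */
        if (demo_event(client, &chans[i]) == DEMO_ACK) {
            chans[i].refcount++;
            acks++;
        }
    }
    return acks;
}
```

In this model the core keeps no per-client count of wanted channels; it learns what the client needs only from the value the callback returns, which is the point the commit message makes.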
    • D
      dmaengine: refactor dmaengine around dma_async_tx_descriptor · 7405f74b
      Dan Williams committed
      The current dmaengine interface defines multiple routines per operation,
      i.e. dma_async_memcpy_buf_to_buf, dma_async_memcpy_buf_to_page etc.  Adding
      more operation types (xor, crc, etc) to this model would result in an
      unmanageable number of method permutations.
      
      	Are we really going to add a set of hooks for each DMA engine
      	whizbang feature?
      		- Jeff Garzik
      
      The descriptor creation process is refactored using the new common
      dma_async_tx_descriptor structure.  Instead of per driver
      do_<operation>_<dest>_to_<src> methods, drivers integrate
      dma_async_tx_descriptor into their private software descriptor and then
      define a 'prep' routine per operation.  The prep routine allocates a
      descriptor and ensures that the tx_set_src, tx_set_dest, tx_submit routines
      are valid.  Descriptor creation and submission becomes:
      
      struct dma_device *dev;
      struct dma_chan *chan;
      struct dma_async_tx_descriptor *tx;
      
      tx = dev->device_prep_dma_<operation>(chan, len, int_flag)
      tx->tx_set_src(dma_addr_t, tx, index /* for multi-source ops */)
      tx->tx_set_dest(dma_addr_t, tx, index)
      tx->tx_submit(tx)
      
      In addition to the refactoring, dma_async_tx_descriptor also lays the
      groundwork for defining cross-channel-operation dependencies, and a
      callback facility for asynchronous notification of operation completion.
      
      Changelog:
      * drop dma mapping methods, suggested by Chris Leech
      * fix ioat_dma_dependency_added, also caught by Andrew Morton
      * fix dma_sync_wait, change from Andrew Morton
      * uninline large functions, change from Andrew Morton
      * add tx->callback = NULL to dmaengine calls to interoperate with async_tx
        calls
      * hookup ioat_tx_submit
      * convert channel capabilities to a 'cpumask_t like' bitmap
      * removed DMA_TX_ARRAY_INIT, no longer needed
      * checkpatch.pl fixes
      * make set_src, set_dest, and tx_submit descriptor specific methods
      * fixup git-ioat merge
      * move group_list and phys to dma_async_tx_descriptor
      
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: Chris Leech <christopher.leech@intel.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David S. Miller <davem@davemloft.net>
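The prep / set_src / set_dest / submit sequence from the dma_async_tx_descriptor commit above can be modelled in plain C. This is a sketch only: the `demo_*` types and the fixed-size descriptor pool are hypothetical, and nothing here touches real hardware or the kernel's actual structures. It shows the two ideas the message describes: the common descriptor is embedded in the driver's private software descriptor, and a per-operation prep routine hands back a descriptor whose method pointers are valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_dma_addr_t;   /* stand-in for dma_addr_t */

struct demo_chan;

/* Common descriptor: carries the per-descriptor methods. */
struct demo_tx {
    demo_dma_addr_t src, dest;
    size_t len;
    int submitted;
    void (*tx_set_src)(demo_dma_addr_t addr, struct demo_tx *tx, int index);
    void (*tx_set_dest)(demo_dma_addr_t addr, struct demo_tx *tx, int index);
    int  (*tx_submit)(struct demo_tx *tx);
};

/* Driver-private software descriptor embedding the common one. */
struct demo_desc_sw {
    struct demo_chan *chan;
    struct demo_tx async_tx;
};

struct demo_chan {
    struct demo_desc_sw pool[4];    /* toy allocator: fixed pool */
    int used;
    int last_cookie;
};

/* Recover the private descriptor from the embedded common one. */
#define demo_to_desc(tx) \
    ((struct demo_desc_sw *)((char *)(tx) - offsetof(struct demo_desc_sw, async_tx)))

static void demo_set_src(demo_dma_addr_t addr, struct demo_tx *tx, int index)
{
    (void)index;                    /* single-source operation */
    tx->src = addr;
}

static void demo_set_dest(demo_dma_addr_t addr, struct demo_tx *tx, int index)
{
    (void)index;
    tx->dest = addr;
}

static int demo_submit(struct demo_tx *tx)
{
    struct demo_chan *chan = demo_to_desc(tx)->chan;

    tx->submitted = 1;
    return ++chan->last_cookie;     /* cookie identifying this transaction */
}

/* 'prep' routine for one operation type: allocate and wire up a descriptor. */
struct demo_tx *demo_prep_memcpy(struct demo_chan *chan, size_t len, int int_flag)
{
    struct demo_desc_sw *desc;

    (void)int_flag;                 /* interrupt request is ignored in this model */
    if (chan->used >= 4)
        return NULL;                /* pool exhausted */
    desc = &chan->pool[chan->used++];
    desc->chan = chan;
    desc->async_tx = (struct demo_tx){
        .len = len,
        .tx_set_src = demo_set_src,
        .tx_set_dest = demo_set_dest,
        .tx_submit = demo_submit,
    };
    return &desc->async_tx;
}

/* The four-step sequence from the commit message. Returns a cookie or -1. */
int demo_memcpy(struct demo_chan *chan, demo_dma_addr_t dest,
                demo_dma_addr_t src, size_t len)
{
    struct demo_tx *tx = demo_prep_memcpy(chan, len, 0);

    if (!tx)
        return -1;
    tx->tx_set_src(src, tx, 0);
    tx->tx_set_dest(dest, tx, 0);
    return tx->tx_submit(tx);
}
```

Adding another operation type in this model means adding one more prep routine rather than a new family of buf/page permutations, which is the motivation the commit gives.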
  5. 12 Jul, 2007 4 commits
  6. 29 Jun, 2007 1 commit
    • R
      IOATDMA: fix section mismatches · 92504f79
      Randy Dunlap committed
      Rename struct pci_driver data so that false section mismatch warnings won't
      be produced.
      
      Sam, ISTM that depending on variable names is the weakest & worst part of
      modpost section checking.  Should __init_refok work here?  I got build
      errors when I tried to use it, probably because the struct pci_driver probe
      and remove methods are not marked "__init_refok".
      
      WARNING: drivers/dma/ioatdma.o(.data+0x10): Section mismatch: reference to .init.text: (between 'ioat_pci_drv' and 'ioat_pci_tbl')
      WARNING: drivers/dma/ioatdma.o(.data+0x14): Section mismatch: reference to .exit.text: (between 'ioat_pci_drv' and 'ioat_pci_tbl')
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Acked-by: Chris Leech <christopher.leech@intel.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 08 Dec, 2006 1 commit
  8. 11 Oct, 2006 1 commit
  9. 05 Oct, 2006 1 commit
    • D
      IRQ: Maintain regs pointer globally rather than passing to IRQ handlers · 7d12e780
      David Howells committed
      Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
      of passing regs around manually through all ~1800 interrupt handlers in the
      Linux kernel.
      
      The regs pointer is used in few places, but it potentially costs both stack
      space and code to pass it around.  On the FRV arch, removing the regs parameter
      from all the genirq functions results in a 20% speed up of the IRQ exit path
      (ie: from leaving timer_interrupt() to leaving do_IRQ()).
      
      Where appropriate, an arch may override the generic storage facility and do
      something different with the variable.  On FRV, for instance, the address is
      maintained in GR28 at all times inside the kernel as part of general exception
      handling.
      
      Having looked over the code, it appears that the parameter may be handed down
      through up to twenty or so layers of functions.  Consider a USB character
      device attached to a USB hub, attached to a USB controller that posts its
      interrupts through a cascaded auxiliary interrupt controller.  A character
      device driver may want to pass regs to the sysrq handler through the input
      layer which adds another few layers of parameter passing.
      
      I've built this code with allyesconfig for x86_64 and i386.  I've runtested the
      main part of the code on FRV and i386, though I can't test most of the drivers.
      I've also done partial conversion for powerpc and MIPS - these at least compile
      with minimal configurations.
      
      This will affect all archs.  Mostly the changes should be relatively easy.
      Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
      
      	struct pt_regs *old_regs = set_irq_regs(regs);
      
      And put the old one back at the end:
      
      	set_irq_regs(old_regs);
      
      Don't pass regs through to generic_handle_irq() or __do_IRQ().
      
      In timer_interrupt(), this sort of change will be necessary:
      
      	-	update_process_times(user_mode(regs));
      	-	profile_tick(CPU_PROFILING, regs);
      	+	update_process_times(user_mode(get_irq_regs()));
      	+	profile_tick(CPU_PROFILING);
      
      I'd like to move update_process_times()'s use of get_irq_regs() into itself,
      except that i386, alone of the archs, uses something other than user_mode().
      
      Some notes on the interrupt handling in the drivers:
      
       (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
           the input_dev struct.
      
       (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
           something different depending on whether it's been supplied with a regs
           pointer or not.
      
       (*) Various IRQ handler function pointers have been moved to type
           irq_handler_t.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
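The save/restore pattern from the IRQ commit above can be sketched as a single-CPU user-space model. All `demo_*` names are hypothetical, and one plain global stands in for the per-CPU variable the commit describes; the real kernel helpers are `set_irq_regs()` and `get_irq_regs()` as quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy register frame: only records whether the interrupted code was in user mode. */
struct demo_pt_regs { int user; };

/* Single global standing in for the per-CPU "struct pt_regs *" variable. */
static struct demo_pt_regs *demo_irq_regs;

/* Result recorded by the toy timer handler, so a caller can inspect it. */
static int demo_last_user_mode = -1;

struct demo_pt_regs *demo_get_irq_regs(void)
{
    return demo_irq_regs;
}

/* Install a new regs pointer and return the previous one. */
struct demo_pt_regs *demo_set_irq_regs(struct demo_pt_regs *new_regs)
{
    struct demo_pt_regs *old_regs = demo_irq_regs;

    demo_irq_regs = new_regs;
    return old_regs;
}

/* Handlers no longer take a regs parameter; they fetch it on demand. */
typedef void (*demo_irq_handler_t)(void);

void demo_timer_handler(void)
{
    demo_last_user_mode = demo_get_irq_regs()->user;
}

/* Entry point: save the old pointer, run the handler, put the old one back. */
void demo_do_irq(struct demo_pt_regs *regs, demo_irq_handler_t handler)
{
    struct demo_pt_regs *old_regs = demo_set_irq_regs(regs);

    handler();
    demo_set_irq_regs(old_regs);
}
```

Restoring the previous pointer on exit is what keeps nested interrupts correct in this model: an inner `demo_do_irq()` leaves the outer frame's pointer in place when it returns.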
  10. 22 Jul, 2006 1 commit
  11. 04 Jul, 2006 2 commits
  12. 03 Jul, 2006 1 commit
  13. 26 Jun, 2006 1 commit
  14. 18 Jun, 2006 2 commits