1. 29 12月, 2008 17 次提交
    • J
      block: add one-hit cache for disk partition lookup · a6f23657
      Jens Axboe 提交于
      disk_map_sector_rcu() returns a partition from a sector offset,
      which we use for IO statistics on a per-partition basis. The
      lookup itself is an O(N) list lookup, where N is the number of
      partitions. This actually hurts performance quite a bit, even
      on the lower end partitions. On higher numbered partitions,
      it can get pretty bad.
      
      Solve this by adding a one-hit cache for partition lookup.
      This makes the lookup O(1) for the case where we do most IO to
      one partition. Even for mixed partition workloads, amortized cost
      is pretty close to O(1) since the natural IO batching makes the
      one-hit cache last for lots of IOs.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      a6f23657
    • J
      block: get rid of elevator_t typedef · b374d18a
      Jens Axboe 提交于
      Just use struct elevator_queue everywhere instead.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      b374d18a
    • J
      aio: make the lookup_ioctx() lockless · abf137dd
      Jens Axboe 提交于
      The mm->ioctx_list is currently protected by a reader-writer lock,
      so we always grab that lock on the read side for doing ioctx
      lookups. As the workload is extremely reader biased, turn this into
      an rcu hlist so we can make lookup_ioctx() lockless. Get rid of
      the rwlock and use a spinlock for providing update side exclusion.
      
      There's usually only 1 entry on this list, so it doesn't make sense
      to look into fancier data structures.
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      abf137dd
    • J
      bio: add support for inlining a number of bio_vecs inside the bio · 392ddc32
      Jens Axboe 提交于
      When we go and allocate a bio for IO, we actually do two allocations.
      One for the bio itself, and one for the bi_io_vec that holds the
      actual pages we are interested in.
      
      This feature inlines a definable amount of io vecs inside the bio
      itself, so we eliminate the bio_vec array allocation for IO's up
      to a certain size. It defaults to 4 vecs, which is typically 16k
      of IO.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      392ddc32
    • J
      bio: allow individual slabs in the bio_set · bb799ca0
      Jens Axboe 提交于
      Instead of having a global bio slab cache, add a reference to one
      in each bio_set that is created. This allows for personalized slabs
      in each bio_set, so that they can have bios of different sizes.
      
      This means we can personalize the bios we return. File systems may
      want to embed the bio inside another structure, to avoid allocation
      more items (and stuffing them in ->bi_private) after the get a bio.
      Or we may want to embed a number of bio_vecs directly at the end
      of a bio, to avoid doing two allocations to return a bio. This is now
      possible.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      bb799ca0
    • J
      bio: move the slab pointer inside the bio_set · 1b434498
      Jens Axboe 提交于
      In preparation for adding differently sized bios.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1b434498
    • J
      bio: only mempool back the largest bio_vec slab cache · 7ff9345f
      Jens Axboe 提交于
      We only very rarely need the mempool backing, so it makes sense to
      get rid of all but one of the mempool in a bio_set. So keep the
      largest bio_vec count mempool so we can always honor the largest
      allocation, and "upgrade" callers that fail.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      7ff9345f
    • T
      block: simplify empty barrier implementation · 58eea927
      Tejun Heo 提交于
      Empty barrier required special handling in __elv_next_request() to
      complete it without letting the low level driver see it.
      
      With previous changes, barrier code is now flexible enough to skip the
      BAR step using the same barrier sequence selection mechanism.  Drop
      the special handling and mask off q->ordered from start_ordered().
      
      Remove blk_empty_barrier() test which now has no user.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      58eea927
    • T
      block: make barrier completion more robust · 8f11b3e9
      Tejun Heo 提交于
      Barrier completion had the following assumptions.
      
      * start_ordered() couldn't finish the whole sequence properly.  If all
        actions are to be skipped, q->ordseq is set correctly but the actual
        completion was never triggered thus hanging the barrier request.
      
      * Drain completion in elv_complete_request() assumed that there's
        always at least one request in the queue when drain completes.
      
      Both assumptions are true but these assumptions need to be removed to
      improve empty barrier implementation.  This patch makes the following
      changes.
      
      * Make start_ordered() use blk_ordered_complete_seq() to mark skipped
        steps complete and notify __elv_next_request() that it should fetch
        the next request if the whole barrier has completed inside
        start_ordered().
      
      * Make drain completion path in elv_complete_request() check whether
        the queue is empty.  Empty queue also indicates drain completion.
      
      * While at it, convert 0/1 return from blk_do_ordered() to false/true.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      8f11b3e9
    • T
      block: make every barrier action optional · f671620e
      Tejun Heo 提交于
      In all barrier sequences, the barrier write itself was always assumed
      to be issued and thus didn't have corresponding control flag.  This
      patch adds QUEUE_ORDERED_DO_BAR and unify action mask handling in
      start_ordered() such that any barrier action can be skipped.
      
      This patch doesn't introduce any visible behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      f671620e
    • T
      block: reorganize QUEUE_ORDERED_* constants · 313e4299
      Tejun Heo 提交于
      Separate out ordering type (drain,) and action masks (preflush,
      postflush, fua) from visible ordering mode selectors
      (QUEUE_ORDERED_*).  Ordering types are now named QUEUE_ORDERED_BY_*
      while action masks are named QUEUE_ORDERED_DO_*.
      
      This change is necessary to add QUEUE_ORDERED_DO_BAR and make it
      optional to improve empty barrier implementation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      313e4299
    • R
      block: reorder struct bio to remove padding on 64bit · ba744d5e
      Richard Kennedy 提交于
      Remove 8 bytes of padding from struct bio which also removes 16 bytes from
      struct bio_pair to make it 248 bytes.  bio_pair then fits into one fewer
      cache lines & into a smaller slab.
      Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      ba744d5e
    • C
      block: use cancel_work_sync() instead of kblockd_flush_work() · 64d01dc9
      Cheng Renquan 提交于
      After many improvements on kblockd_flush_work, it is now identical to
      cancel_work_sync, so a direct call to cancel_work_sync is suggested.
      
      The only difference is that cancel_work_sync is a GPL symbol,
      so no non-GPL modules anymore.
      Signed-off-by: NCheng Renquan <crquan@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      64d01dc9
    • K
      block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03
      Keith Mannthey 提交于
      Allow the scsi request REQ_QUIET flag to be propagated to the buffer
      file system layer. The basic ideas is to pass the flag from the scsi
      request to the bio (block IO) and then to the buffer layer.  The buffer
      layer can then suppress needless printks.
      
      This patch declutters the kernel log by removed the 40-50 (per lun)
      buffer io error messages seen during a boot in my multipath setup . It
      is a good chance any real errors will be missed in the "noise" it the
      logs without this patch.
      
      During boot I see blocks of messages like
      "
      __ratelimit: 211 callbacks suppressed
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242847
      Buffer I/O error on device sdm, logical block 1
      Buffer I/O error on device sdm, logical block 5242878
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242872
      "
      in my logs.
      
      My disk environment is multipath fiber channel using the SCSI_DH_RDAC
      code and multipathd.  This topology includes an "active" and "ghost"
      path for each lun. IO's to the "ghost" path will never complete and the
      SCSI layer, via the scsi device handler rdac code, quick returns the IOs
      to theses paths and sets the REQ_QUIET scsi flag to suppress the scsi
      layer messages.
      
       I am wanting to extend the QUIET behavior to include the buffer file
      system layer to deal with these errors as well. I have been running this
      patch for a while now on several boxes without issue.  A few runs of
      bonnie++ show no noticeable difference in performance in my setup.
      
      Thanks for John Stultz for the quiet_error finalization.
      Submitted-by: NKeith Mannthey <kmannth@us.ibm.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      08bafc03
    • F
      block: add queue flag for paravirt frontend drivers · 88e740f1
      Fernando Luis Vázquez Cao 提交于
      As is the case with SSD devices, we do not want to idle in AS/CFQ when
      the block device is a paravirt front-end driver. This patch adds a flag
      (QUEUE_FLAG_VIRT) which should be used by front-end drivers such as
      virtio_blk and xen-blkfront to indicate a paravirtualized device.
      Signed-off-by: NFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      88e740f1
    • H
    • F
      m68k: machw.h cleanup · 429dbf53
      Finn Thain 提交于
      Remove some more cruft from machw.h and drop the #include where it isn't
      needed.
      Signed-off-by: NFinn Thain <fthain@telegraphics.com.au>
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      429dbf53
  2. 26 12月, 2008 2 次提交
  3. 25 12月, 2008 12 次提交
    • M
      [S390] __page_to_pfn warnings · 32272a26
      Martin Schwidefsky 提交于
      For CONFIG_SPARSEMEM_VMEMMAP=y on s390 I get warnings like
      
      init/main.c: In function 'start_kernel':
      init/main.c:641: warning: format '%08lx' expects type 'long unsigned int', but argument 2 has type 'int'
      
      The warning can be suppressed with a cast to unsigned long in the
      CONFIG_SPARSEMEM_VMEMMAP=y version of __page_to_pfn.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      32272a26
    • H
      [S390] iucv: Locking free version of iucv_message_(receive|send) · 91d5d45e
      Hendrik Brueckner 提交于
      Provide a locking free version of iucv_message_receive and iucv_message_send
      that do not call local_bh_enable in a spin_lock_(bh|irqsave)() context.
      Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      91d5d45e
    • P
      sched, trace: update trace_sched_wakeup() · 468a15bb
      Peter Zijlstra 提交于
      Impact: extend the wakeup tracepoint with the info whether the wakeup was real
      
      Add the information needed to distinguish 'real' wakeups from 'false'
      wakeups.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      468a15bb
    • H
      crypto: aes - Precompute tables · 0ee4a969
      Herbert Xu 提交于
      The tables used by the various AES algorithms are currently
      computed at run-time.  This has created an init ordering problem
      because some AES algorithms may be registered before the tables
      have been initialised.
      
      This patch gets around this whole thing by precomputing the tables.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      0ee4a969
    • H
      libcrc32c: Add crc32c_le macro · 0426c166
      Herbert Xu 提交于
      The bnx2x driver actually uses the crc32c_le name so this patch
      restores the crc32c_le symbol through a macro.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      0426c166
    • H
      libcrc32c: Move implementation to crypto crc32c · 69c35efc
      Herbert Xu 提交于
      This patch swaps the role of libcrc32c and crc32c.  Previously
      the implementation was in libcrc32c and crc32c was a wrapper.
      Now the code is in crc32c and libcrc32c just calls the crypto
      layer.
      
      The reason for the change is to tap into the algorithm selection
      capability of the crypto API so that optimised implementations
      such as the one utilising Intel's CRC32C instruction can be
      used where available.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      69c35efc
    • H
      crypto: hash - Export shash through hash · 5f7082ed
      Herbert Xu 提交于
      This patch allows shash algorithms to be used through the old hash
      interface.  This is a transitional measure so we can convert the
      underlying algorithms to shash before converting the users across.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      5f7082ed
    • H
      crypto: hash - Add import/export interface · dec8b786
      Herbert Xu 提交于
      It is often useful to save the partial state of a hash function
      so that it can be used as a base for two or more computations.
      
      The most prominent example is HMAC where all hashes start from
      a base determined by the key.  Having an import/export interface
      means that we only have to compute that base once rather than
      for each message.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      dec8b786
    • H
      crypto: hash - Export shash through ahash · 3b2f6df0
      Herbert Xu 提交于
      This patch allows shash algorithms to be used through the ahash
      interface.  This is required before we can convert digest algorithms
      over to shash.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      3b2f6df0
    • H
      crypto: hash - Add shash interface · 7b5a080b
      Herbert Xu 提交于
      The shash interface replaces the current synchronous hash interface.
      It improves over hash in two ways.  Firstly shash is reentrant,
      meaning that the same tfm may be used by two threads simultaneously
      as all hashing state is stored in a local descriptor.
      
      The other enhancement is that shash no longer takes scatter list
      entries.  This is because shash is specifically designed for
      synchronous algorithms and as such scatter lists are unnecessary.
      
      All existing hash users will be converted to shash once the
      algorithms have been completely converted.
      
      There is also a new finup function that combines update with final.
      This will be extended to ahash once the algorithm conversion is
      done.
      
      This is also the first time that an algorithm type has their own
      registration function.  Existing algorithm types will be converted
      to this way in due course.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      7b5a080b
    • H
      crypto: api - Rebirth of crypto_alloc_tfm · 7b0bac64
      Herbert Xu 提交于
      This patch reintroduces a completely revamped crypto_alloc_tfm.
      The biggest change is that we now take two crypto_type objects
      when allocating a tfm, a frontend and a backend.  In fact this
      simply formalises what we've been doing behind the API's back.
      
      For example, as it stands crypto_alloc_ahash may use an
      actual ahash algorithm or a crypto_hash algorithm.  Putting
      this in the API allows us to do this much more cleanly.
      
      The existing types will be converted across gradually.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      7b0bac64
    • H
      crypto: api - Move type exit function into crypto_tfm · 4a779486
      Herbert Xu 提交于
      The type exit function needs to undo any allocations done by the type
      init function.  However, the type init function may differ depending
      on the upper-level type of the transform (e.g., a crypto_blkcipher
      instantiated as a crypto_ablkcipher).
      
      So we need to move the exit function out of the lower-level
      structure and into crypto_tfm itself.
      
      As it stands this is a no-op since nobody uses exit functions at
      all.  However, all cases where a lower-level type is instantiated
      as a different upper-level type (such as blkcipher as ablkcipher)
      will be converted such that they allocate the underlying transform
      and use that instead of casting (e.g., crypto_ablkcipher casted
      into crypto_blkcipher).  That will need to use a different exit
      function depending on the upper-level type.
      
      This patch also allows the type init/exit functions to call (or not)
      cra_init/cra_exit instead of always calling them from the top level.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      4a779486
  4. 23 12月, 2008 1 次提交
  5. 22 12月, 2008 5 次提交
  6. 21 12月, 2008 2 次提交
  7. 20 12月, 2008 1 次提交
新手
引导
客服 返回
顶部