1. 15 Jun, 2009 6 commits
  2. 14 Jun, 2009 4 commits
  3. 13 Jun, 2009 10 commits
  4. 12 Jun, 2009 20 commits
    •
      slab,slub: don't enable interrupts during early boot · 7e85ee0c
      Pekka Enberg committed
      As explained by Benjamin Herrenschmidt:
      
        Oh and btw, your patch alone doesn't fix powerpc, because it's missing
        a whole bunch of GFP_KERNEL's in the arch code... You would have to
        grep the entire kernel for things that check slab_is_available() and
        even then you'll be missing some.
      
        For example, slab_is_available() didn't always exist, and so in the
        early days on powerpc, we used a mem_init_done global that is set from
        mem_init() (not perfect but works in practice). And we still have code
        using that to do the test.
      
      Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
      in early boot code to avoid enabling interrupts.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      7e85ee0c
    •
      lguest: remove obsolete LHREQ_BREAK call · 5dac051b
      Rusty Russell committed
      We no longer need an efficient mechanism to force the Guest back into
      host userspace, as each device is serviced without bothering the main
      Guest process (aka. the Launcher).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      5dac051b
    •
      lguest: use eventfds for device notification · df60aeef
      Rusty Russell committed
      Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
      an address: the main Launcher process returns with this address, and figures
      out what device to run.
      
      A far nicer model is to let processes bind an eventfd to an address: if we
      find one, we simply signal the eventfd.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      df60aeef
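The "bind an eventfd to an address" model can be sketched in user space with the Linux `eventfd(2)` interface. This is not lguest's implementation: `struct binding` and `notify()` are hypothetical names, and the lookup is a plain linear scan for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical binding of a notification address to an eventfd. */
struct binding {
	unsigned long addr;
	int efd;
};

/*
 * Look up addr; if a binding exists, signal its eventfd and return 1.
 * Return 0 when nothing is bound (or the write fails), in which case the
 * caller would fall back to handing the address to the main process.
 */
static int notify(const struct binding *map, size_t n, unsigned long addr)
{
	uint64_t one = 1;

	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (map[i].addr == addr)
			return write(map[i].efd, &one, sizeof(one)) ==
			       (ssize_t)sizeof(one);
	}
	return 0;
}
```

A device thread would block in `read()` on its eventfd; each read returns the number of notifications accumulated since the last one and resets the counter.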
    •
      lguest: improve interrupt handling, speed up stream networking · a32a8813
      Rusty Russell committed
      lguest never checked for pending interrupts when enabling interrupts, and
      things still worked.  However, it makes a significant difference to TCP
      performance, so it's time we fixed it by introducing a pending_irq flag
      and checking it on irq_restore and irq_enable.
      
      These two routines are now too big to patch into the 8/10 bytes
      patch space, so we drop that code.
      
      Note: The high latency on interrupt delivery had a very curious
      effect: once everything else was optimized, networking without GSO was
      faster than networking with GSO, since more interrupts were sent and
      hence a greater chance of one getting through to the Guest!
      
      Note2: (Almost) Closing the same loophole for iret doesn't have any
      measurable effect, so I'm leaving that patch for the moment.
      
      Before:
      	1GB tcpblast Guest->Host:		30.7 seconds
      	1GB tcpblast Guest->Host (no GSO):	76.0 seconds
      
      After:
      	1GB tcpblast Guest->Host:		6.8 seconds
      	1GB tcpblast Guest->Host (no GSO):	27.8 seconds
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      a32a8813
    •
      virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) · 9fa29b9d
      Mark McLoughlin committed
      Add a new feature flag for indirect ring entries. These are ring
      entries which point to a table of buffer descriptors.
      
      The idea here is to increase the ring capacity by allowing a larger
      effective ring size whereby the ring size dictates the number of
      requests that may be outstanding, rather than the size of those
      requests.
      
      This should be most effective in the case of block I/O where we can
      potentially benefit by concurrently dispatching a large number of
      large requests. Even in the simple case of single segment block
      requests, this results in a threefold increase in ring capacity.
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      9fa29b9d
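The capacity argument is simple arithmetic, sketched below. The helper names are hypothetical, and the figure of three descriptors for a single-segment block request (header, data, status) is an assumption used to reproduce the "threefold" number in the message.

```c
#include <assert.h>

/*
 * Without indirect entries, every descriptor of a request occupies a ring
 * slot, so the number of outstanding requests depends on request size.
 */
static unsigned int direct_capacity(unsigned int ring_size,
				    unsigned int descs_per_request)
{
	assert(descs_per_request > 0);
	return ring_size / descs_per_request;
}

/*
 * With indirect entries, a request occupies one ring slot that points to a
 * separate descriptor table, so capacity equals the ring size.
 */
static unsigned int indirect_capacity(unsigned int ring_size)
{
	return ring_size;
}
```

For a 128-entry ring and three descriptors per request, `direct_capacity(128, 3)` is 42 while `indirect_capacity(128)` is 128, roughly the threefold gain quoted above.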
    •
      virtio: teach virtio_has_feature() about transport features · ee006b35
      Mark McLoughlin committed
      Drivers don't add transport features to their table, so we
      shouldn't check these with virtio_check_driver_offered_feature().
      
      We could perhaps add an ->offered_feature() virtio_config_op,
      but perhaps that would be overkill for a consistency check
      like this.
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      ee006b35
    •
      virtio_pci: optional MSI-X support · 82af8ce8
      Michael S. Tsirkin committed
      This implements optional MSI-X support in virtio_pci.
      MSI-X is used whenever the host supports at least 2 MSI-X
      vectors: 1 for configuration changes and 1 for virtqueues.
      Per-virtqueue vectors are allocated if enough vectors are
      available.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Anthony Liguori <aliguori@us.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ whitespace, style)
      82af8ce8
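The vector policy stated in the message can be written as a small decision function. This is a sketch under assumptions: the enum and `choose_irq_mode()` are hypothetical names, `IRQ_FALLBACK` stands for whatever non-MSI-X mechanism the driver otherwise uses, and "enough vectors" is taken to mean one per virtqueue plus one for configuration changes.

```c
#include <assert.h>

enum irq_mode {
	IRQ_FALLBACK,		/* fewer than 2 MSI-X vectors: do not use MSI-X */
	IRQ_MSIX_SHARED,	/* 1 vector for config, 1 shared by all virtqueues */
	IRQ_MSIX_PER_VQ		/* 1 vector for config, 1 per virtqueue */
};

static enum irq_mode choose_irq_mode(unsigned int host_vectors,
				     unsigned int nvqs)
{
	assert(nvqs > 0);
	if (host_vectors < 2)
		return IRQ_FALLBACK;
	if (host_vectors >= nvqs + 1)
		return IRQ_MSIX_PER_VQ;
	return IRQ_MSIX_SHARED;
}
```

The decision needs the total virtqueue count before any vector is assigned, which is why the find_vqs/del_vqs change in the next entry is a prerequisite.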
    •
      virtio: find_vqs/del_vqs virtio operations · d2a7ddda
      Michael S. Tsirkin committed
      This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
      and updates all drivers. This is needed for MSI support, because MSI
      needs to know the total number of vectors upfront.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
      d2a7ddda
    •
      virtio: add names to virtqueue struct, mapping from devices to queues. · 9499f5e7
      Rusty Russell committed
      Add a linked list of all virtqueues for a virtio device: this helps for
      debugging and is also needed for upcoming interface change.
      
      Also, add a "name" field for clearer debug messages.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      9499f5e7
    •
      20f77f56
    •
      perf_counter: Add forward/backward attribute ABI compatibility · 974802ea
      Peter Zijlstra committed
      Provide for means of extending the perf_counter_attr in a 'natural' way.
      
      We allow growing the structure by appending fields at the end by specifying
      the full structure size inside it.
      
      When a new kernel sees a smaller (old) structure, it will 0 pad the tail.
      When an old kernel sees a larger (new) structure, it will verify the tail
      consists of 0s, otherwise fail.
      
      If we fail due to a size-mismatch, we return -E2BIG and write the kernel's
      native attribute size back into the provided structure.
      
      Furthermore, add some attribute verification, so that we'll fail counter
      creation when unknown bits are present (PERF_SAMPLE, PERF_FORMAT, or in
      the __reserved fields).
      
      (This ABI detail is introduced while keeping the existing syscall ABI.)
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      974802ea
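The size-based compatibility rule can be sketched in user-space C as follows. The structures are hypothetical stand-ins for three generations of an attribute layout (they are not `perf_counter_attr`), and `copy_attr()` is an illustrative helper playing the role of a kernel whose native layout is `attr_v2`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts: each version appends fields at the end. */
struct attr_v1 { uint32_t size; uint32_t type; uint64_t config; };
struct attr_v2 { uint32_t size; uint32_t type; uint64_t config;
		 uint64_t new_field; };
struct attr_v3 { uint32_t size; uint32_t type; uint64_t config;
		 uint64_t new_field; uint64_t newer_field; };

/*
 * Copy a caller-supplied attribute into the native (v2) layout.
 * Smaller (older) input: the missing tail is zero-filled.
 * Larger (newer) input: accepted only if the extra tail is all zeroes.
 * On a size mismatch, write the native size back and return -E2BIG.
 */
static int copy_attr(void *user_buf, struct attr_v2 *out)
{
	const uint32_t native = sizeof(*out);
	unsigned char *p = user_buf;
	uint32_t size;

	assert(user_buf != NULL && out != NULL);
	memcpy(&size, p, sizeof(size));
	if (size < sizeof(size))	/* too small to hold the size field */
		goto err_size;

	if (size > native) {
		for (uint32_t i = native; i < size; i++)
			if (p[i] != 0)	/* caller uses a field we do not know */
				goto err_size;
		size = native;
	}

	memset(out, 0, sizeof(*out));
	memcpy(out, p, size);
	out->size = native;
	return 0;

err_size:
	memcpy(p, &native, sizeof(native));
	return -E2BIG;
}
```

A caller built against a newer layout can therefore still talk to an older kernel as long as it leaves the fields that kernel does not know about set to zero.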
    •
      perf_counter: PERF_TYPE_HW_CACHE is a hardware counter too · f1a3c979
      Peter Zijlstra committed
      is_software_counter() was missing the new HW_CACHE category.
      
      ( This could have caused some counter scheduling artifacts
        with mixed sw and hw counters and counter groups. )
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f1a3c979
    •
      module: trim exception table on init free. · ad6561df
      Rusty Russell committed
      It's theoretically possible that there are exception table entries
      which point into the (freed) init text of modules.  These could cause
      future problems if other modules get loaded into that memory and cause
      an exception as we'd see the wrong fixup.  The only case I know of is
      kvm-intel.ko (when CONFIG_CC_OPTIMIZE_FOR_SIZE=n).
      
      Amerigo fixed this long-standing FIXME in the x86 version, but this
      patch is more general.
      
      This implements trim_init_extable(); most archs are simple since they
      use the standard lib/extable.c sort code.  Alpha and IA64 use relative
      addresses in their fixups, so their trimming is a slight variation.
      
      Sparc32 is unique; it doesn't seem to define ARCH_HAS_SORT_EXTABLE,
      yet it defines its own sort_extable() which overrides the one in lib.
      It doesn't sort, so we have to mark deleted entries instead of
      actually trimming them.
      Inspired-by: Amerigo Wang <amwang@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: linux-alpha@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      ad6561df
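The trimming idea can be sketched as a filter over an exception table held in a plain array. This is not the kernel's `trim_init_extable()`: the entry layout uses absolute addresses, and `trim_init_entries()` is a hypothetical helper that compacts the table in place rather than relying on the sort order.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified entry: address of the faulting instruction and of its fixup. */
struct extable_entry {
	unsigned long insn;
	unsigned long fixup;
};

/*
 * Drop every entry whose instruction lies in [init_start, init_start +
 * init_size), i.e. in init text that is about to be freed. Surviving entries
 * keep their relative order, so a sorted table stays sorted.
 * Returns the new number of entries.
 */
static size_t trim_init_entries(struct extable_entry *tbl, size_t n,
				unsigned long init_start,
				unsigned long init_size)
{
	size_t kept = 0;

	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned long insn = tbl[i].insn;

		if (insn >= init_start && insn - init_start < init_size)
			continue;	/* points into freed init text */
		tbl[kept++] = tbl[i];
	}
	return kept;
}
```

Once the stale entries are gone, a later fault at a reused address can no longer match a fixup that belonged to the freed init code.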
    •
      module_param: allow 'bool' module_params to be bool, not just int. · fddd5201
      Rusty Russell committed
      Impact: API cleanup
      
      For historical reasons, 'bool' parameters must be an int, not a bool.
      But there are around 600 users, so a conversion seems like useless churn.
      
      So we use __same_type() to distinguish, and handle both cases.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      fddd5201
    •
      module_param: add __same_type convenience wrapper for __builtin_types_compatible_p · d2c123c2
      Rusty Russell committed
      Impact: new API
      
      __builtin_types_compatible_p() is a little awkward to use: it takes two
      types rather than types or variables, and it's just damn long.
      
      (typeof(type) == type, so this works on types as well as vars).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      d2c123c2
    •
      module_param: split perm field into flags and perm · 45fcc70c
      Rusty Russell committed
      Impact: cleanup
      
      Rather than hack KPARAM_KMALLOCED into the perm field, separate it out.
      Since the perm field was 32 bits and only needs 16, we don't add bloat.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      45fcc70c
    •
      module_param: invbool should take a 'bool', not an 'int' · 9a71af2c
      Rusty Russell committed
      It takes an 'int' for historical reasons, and there are only two
      users: simply switch it over to bool.
      
      The other user (uvesafb.c) will get a (harmless-on-x86) warning until
      the next patch is applied.
      
      Cc: Brad Douglas <brad@neruo.com>
      Cc: Michal Januszewski <spock@gentoo.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      9a71af2c
    •
      memcg: fix page_cgroup fatal error in FLATMEM · ca371c0d
      KAMEZAWA Hiroyuki committed
      SLAB is now set up at a very early stage, so it can be used in init
      routines.
      
      However, replacing alloc_bootmem() in the FLATMEM/DISCONTIGMEM
      page_cgroup initialization breaks the allocation.
      (The SPARSEMEM case works well: it supports MEMORY_HOTPLUG, and the
       size of page_cgroup is reasonable (< 1 << MAX_ORDER).)
      
      This patch revives FLATMEM + memory cgroup by using alloc_bootmem().
      
      In the future, we will either stop supporting FLATMEM (if it has no
      users) or rewrite the FLATMEM code completely, but that would add
      messier code and more overhead.
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      Tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Tested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      ca371c0d
    •
      fs/qnx4: sanitize includes · 964f5369
      Al Viro committed
      fs-internal parts of qnx4_fs.h taken to fs/qnx4/qnx4.h, includes adjusted,
      qnx4_fs.h doesn't need unifdef anymore.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      964f5369
    •
      Sanitize qnx4 fsync handling · 79d25767
      Al Viro committed
      * have directory operations use mark_buffer_dirty_inode(),
        so that sync_mapping_buffers() would get those.
      * make qnx4_write_inode() honour its last argument.
      * get rid of insane copies of very ancient "walk the indirect blocks"
        in qnx4/fsync - they never matched the actual fs layout and, fortunately,
        never'd been called.  Again, all this junk is not needed; ->fsync()
        should just do sync_mapping_buffers + sync_inode (and if we implement
        block allocation for qnx4, we'll need to use mark_buffer_dirty_inode()
        for extent blocks)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      79d25767