1. 22 Apr, 2014 1 commit
  2. 18 Apr, 2014 2 commits
    • M
      KVM: VMX: speed up wildcard MMIO EVENTFD · 68c3b4d1
      Michael S. Tsirkin committed
      With KVM, MMIO is much slower than PIO, due to the need to
      do page walk and emulation. But with EPT, it does not have to be: we
      know the address from the VMCS so if the address is unique, we can look
      up the eventfd directly, bypassing emulation.
      
      Unfortunately, this only works if userspace does not need to match on
      access length and data.  The implementation adds a separate FAST_MMIO
      bus internally. This serves two purposes:
          - minimize overhead for old userspace that does not use eventfd with length = 0
          - minimize disruption in other code (since we don't know the length,
            devices on the MMIO bus only get a valid address in write, this
            way we don't need to touch all devices to teach them to handle
            an invalid length)
      
      At the moment, this optimization only has effect for EPT on x86.
      
      It will be possible to speed up MMIO for NPT and MMU using the same
      idea in the future.
      
      With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
      I was unable to detect any measurable slowdown to non-eventfd MMIO.
      
      Making MMIO faster is important for the upcoming virtio 1.0 which
      includes an MMIO signalling capability.
      
      The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
      pre-review and suggestions.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
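The dispatch order described in this commit can be pictured with a small model. This is a minimal sketch, not the kernel code: the class and method names (`IoBus`, `Vm`, `assign_ioeventfd`, `handle_mmio_write`) are hypothetical, the buses are plain dictionaries keyed by address, and the regular bus ignores access length for brevity. It assumes that a length-0 eventfd is registered on the regular MMIO bus and additionally on a separate fast bus, and that the exit handler consults the fast bus first.

```python
from dataclasses import dataclass, field


@dataclass
class IoBus:
    """Toy I/O bus mapping a guest physical address to an eventfd name."""
    handlers: dict = field(default_factory=dict)

    def register(self, addr, name):
        self.handlers[addr] = name

    def lookup(self, addr):
        # Returns the registered name, or None when nothing matches.
        return self.handlers.get(addr)


@dataclass
class Vm:
    mmio_bus: IoBus = field(default_factory=IoBus)
    fast_mmio_bus: IoBus = field(default_factory=IoBus)

    def assign_ioeventfd(self, addr, length, name):
        # Every eventfd lands on the regular bus (assumption of this model).
        self.mmio_bus.register(addr, name)
        # Length 0 means "match on address only", so the entry can also
        # be found without knowing the access size.
        if length == 0:
            self.fast_mmio_bus.register(addr, name)

    def handle_mmio_write(self, addr):
        # Fast path: the faulting address alone identifies the eventfd,
        # so no page walk or instruction emulation is needed.
        name = self.fast_mmio_bus.lookup(addr)
        if name is not None:
            return ("fast", name)
        # Slow path: fall back to emulation and the regular bus.
        return ("emulate", self.mmio_bus.lookup(addr))
```

In this model, old userspace that never registers a length-0 eventfd leaves the fast bus empty, so its writes take one failed dictionary lookup and then the unchanged slow path.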
    • M
      KVM: support any-length wildcard ioeventfd · f848a5a8
      Michael S. Tsirkin committed
      It is sometimes beneficial to ignore IO size, and only match on address.
      In hindsight this would have been a better default than matching length
      when KVM_IOEVENTFD_FLAG_DATAMATCH is not set. In particular, this kind
      of access can be optimized on VMX: there is no need to do page lookups.
      This can currently be done with many ioeventfds but in a suboptimal way.
      
      However we can't change kernel/userspace ABI without risk of breaking
      some applications.
      Use len = 0 to mean "ignore length for matching" in a more optimal way.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
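The matching rule that "len = 0" introduces can be sketched as a single predicate. This is an illustrative model only: the function name and parameters are hypothetical, and the data-match comparison (KVM_IOEVENTFD_FLAG_DATAMATCH) is left out on purpose.

```python
def ioeventfd_matches(entry_addr, entry_len, access_addr, access_len):
    """Decide whether a guest access hits a registered ioeventfd.

    entry_len == 0 is the wildcard: only the address is compared.
    Data matching is deliberately omitted from this sketch.
    """
    if access_addr != entry_addr:
        return False
    if entry_len == 0:
        # Any access length is accepted for a wildcard entry.
        return True
    # Pre-existing behavior: the access length must match exactly.
    return access_len == entry_len
```

Without the wildcard, covering every access size at one address takes one registration per length, which is the "many ioeventfds" approach the message calls suboptimal.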
  3. 17 Apr, 2014 4 commits
    • N
      KVM: x86: Fix page-tables reserved bits · cd9ae5fe
      Nadav Amit committed
      KVM does not handle the reserved bits of x86 page tables correctly:
      In PAE, bits 5:8 are reserved in the PDPTE.
      In IA-32e, bit 8 is not reserved.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
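A reserved-bit check of this kind boils down to building a mask from a bit range and testing an entry against it. The sketch below covers only the PAE bits named in the commit message (5 through 8); a real check also depends on other reserved bits and on the physical address width, which are not modeled here. The helper names are illustrative.

```python
def rsvd_bits(start, end):
    """Return a mask with bits start..end (inclusive) set."""
    return ((1 << (end - start + 1)) - 1) << start


# Bits 5:8 of a PAE PDPTE, as stated in the commit message.
PAE_PDPTE_RSVD_5_8 = rsvd_bits(5, 8)


def pdpte_sets_reserved_bits(pdpte):
    """True if a PAE PDPTE value sets any of bits 5:8."""
    return (pdpte & PAE_PDPTE_RSVD_5_8) != 0
```

For example, an entry with only the present bit set passes the check, while the same entry with bit 8 set does not.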
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 0f689a33
      Linus Torvalds committed
      Pull s390 patches from Martin Schwidefsky:
       "An update to the oops output with additional information about the
        crash.  The renameat2 system call is enabled.  Two patches in regard
        to the PTR_ERR_OR_ZERO cleanup.  And a bunch of bug fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
        s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
        s390/sclp_vt220: Fix kernel panic due to early terminal input
        s390/compat: fix typo
        s390/uaccess: fix possible register corruption in strnlen_user_srst()
        s390: add 31 bit warning message
        s390: wire up sys_renameat2
        s390: show_registers() should not map user space addresses to kernel symbols
        s390/mm: print control registers and page table walk on crash
        s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
        s390: fix control register update
    • L
      Merge tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · 7d38cc02
      Linus Torvalds committed
      Pull itanium erratum fix from Tony Luck:
       "Small workaround for a rare, but annoying, erratum #237"
      
      * tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237)
    • T
      [IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237) · c0b5a64d
      Tony Luck committed
      April 2014 Itanium processor specification update:
      
      http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html
      
      describes this erratum:
      
      =========================================================================
      237. Under a complex set of conditions, store to load forwarding for a
      sub 8-byte load may complete incorrectly
      
      Problem: A load instruction may complete incorrectly when a code sequence
      using 4-byte or smaller load and store operations to the same address
      is executed in combination with specific timing of all the following
      concurrent conditions: store to load forwarding, alignment checking
      enabled, a mis-predicted branch, and complex cache utilization activity.
      
      Implication: The affected sub 8-byte instruction may complete
      incorrectly resulting in unpredictable system behavior. There is an
      extremely low probability of exposure due to the significant number of
      complex microarchitectural concurrent conditions required to encounter
      the erratum.
      
      Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
      Hyper-Threading will significantly reduce exposure to the conditions
      that contribute to encountering the erratum.
      
      Status: See the Summary Table of Changes for the affected steppings.
      =========================================================================
      
      [Table of changes essentially lists all models from McKinley to Tukwila]
      
      The PSR.ac bit controls whether the processor will always generate
      an unaligned reference trap (0x5a00) for a misaligned data access
      (when PSR.ac=1) or if it will let the access succeed when running
      on a cpu that implements logic to handle some unaligned accesses.
      
      Way back in 2008 in commit b704882e
        [IA64] Rationalize kernel mode alignment checking
      we made the decision to always enable strict checking. We were
      already doing so in trap/interrupt context because the common
      preamble code set this bit - but the rest of supervisor code
      (and by inheritance user code) ran with PSR.ac=0.
      
      We now reverse that decision and set PSR.ac=0 everywhere in the
      kernel (also inherited by user processes). This will avoid the
      erratum using the method described in the Itanium specification
      update.  Net effect for users is that the processor will handle
      unaligned access when it can (typically with a tiny performance
      bubble in the pipeline ... but much less invasive than taking a
      trap and having the OS perform the access).
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  4. 16 Apr, 2014 6 commits
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 10ec34fc
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Fix BPF filter validation of netlink attribute accesses, from
          Mathias Krause.
      
       2) Netfilter conntrack generation seqcount not initialized properly,
          from Andrey Vagin.
      
       3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
          from Patrick McHardy.
      
       4) Properly limit MTU over ipv6, from Eric Dumazet.
      
       5) Fix seccomp system call argument population on 32-bit, from Daniel
          Borkmann.
      
       6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
          skb->mac_len needs to be used.  From Vlad Yasevich.
      
       7) We have several cases of using socket based communications to
          implement a tunnel.  For example, some tunnels are encapsulations
          over UDP so we use an internal kernel UDP socket to do the
          transmits.
      
          These tunnels should behave just like other software devices and
          pass the packets on down to the next layer.
      
          Most importantly we want the top-level socket (eg TCP) that created
          the traffic to be charged for the SKB memory.
      
          However, once you get into the IP output path, we have code that
          assumed that whatever was attached to skb->sk is an IP socket.
      
          To keep the top-level socket being charged for the SKB memory,
          whilst satisfying the needs of the IP output path, we now pass in an
          explicit 'sk' argument.
      
          From Eric Dumazet.
      
       8) ping_init_sock() leaks group info, from Xiaoming Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
        cxgb4: use the correct max size for firmware flash
        qlcnic: Fix MSI-X initialization code
        ip6_gre: don't allow to remove the fb_tunnel_dev
        ipv4: add a sock pointer to dst->output() path.
        ipv4: add a sock pointer to ip_queue_xmit()
        driver/net: cosa driver uses udelay incorrectly
        at86rf230: fix __at86rf230_read_subreg function
        at86rf230: remove check if AVDD settled
        net: cadence: Add architecture dependencies
        net: Start with correct mac_len in skb_network_protocol
        Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
        cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
        net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
        seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
        qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
        qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
        qlcnic: Fix PVID configuration on eSwitch port.
        qlcnic: Fix max ring count calculation
        qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
        qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
        ...
    • S
      cxgb4: use the correct max size for firmware flash · 6f1d7210
      Steve Wise committed
      The wrong max fw size was being used and causing false
      "too big" errors running ethtool -f.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      qlcnic: Fix MSI-X initialization code · 8564ae09
      Alexander Gordeev committed
      Function qlcnic_setup_tss_rss_intr() might enter an endless
      loop in case pci_enable_msix() continually returns a
      positive number of MSI-Xs that could have been allocated.
      Besides, the function contains 'err = -EIO;' assignment
      that never could be reached. This update fixes the
      aforementioned issues.
      
      Cc: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Cc: Dept-HSGLinuxNICDev@qlogic.com
      Cc: netdev@vger.kernel.org
      Cc: linux-pci@vger.kernel.org
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
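The failure mode described here is a retry loop that keeps asking for the vector count suggested by the previous attempt. One way to guarantee termination, shown below as a hypothetical model rather than the driver's actual fix, is to retry only while the suggested count strictly shrinks. `try_enable` stands in for the enable call and follows the return convention described in the commit message: 0 on success, a negative value on error, and a positive value for the number of vectors that could have been allocated.

```python
def enable_msix(try_enable, requested):
    """Return the number of vectors enabled, or 0 if MSI-X stays disabled.

    try_enable(n) models the enable call: 0 means n vectors were enabled,
    a negative value is an error, and a positive value is the number of
    vectors that could have been allocated instead.
    """
    nvec = requested
    while nvec > 0:
        rc = try_enable(nvec)
        if rc == 0:
            return nvec
        # Give up on errors, and on suggestions that do not shrink the
        # request; the latter is what would otherwise loop forever.
        if rc < 0 or rc >= nvec:
            return 0
        nvec = rc
    return 0
```

Because `nvec` decreases on every retry, the loop runs at most `requested` times even when the enable call keeps returning a positive number.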
    • N
      ip6_gre: don't allow to remove the fb_tunnel_dev · 54d63f78
      Nicolas Dichtel committed
      It's possible to remove the FB tunnel with the command 'ip link del ip6gre0', but
      this is unsafe: the module always assumes that this device exists. For example,
      ip6gre_tunnel_lookup() may use it unconditionally.
      
      Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
      let ip6gre_destroy_tunnels() do the job).
      
      Introduced by commit c12b395a ("gre: Support GRE over IPv6").
      
      CC: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
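The behavior of such a dellink handler can be modeled in a few lines. This is a toy sketch with hypothetical names (`Ip6GreNet`, `newlink`, `dellink`, `destroy_tunnels`); returning a boolean from `dellink` is a choice made for illustration and is not taken from the patch.

```python
class Ip6GreNet:
    """Toy per-namespace state: a fallback (FB) device plus user tunnels."""

    def __init__(self):
        self.fb_tunnel_dev = "ip6gre0"
        self.devices = {self.fb_tunnel_dev}

    def newlink(self, name):
        self.devices.add(name)

    def dellink(self, name):
        # The fallback device is never removed on user request, because
        # lookups may use it unconditionally.
        if name == self.fb_tunnel_dev:
            return False
        self.devices.discard(name)
        return True

    def destroy_tunnels(self):
        # Module or namespace teardown is the only path that removes
        # the fallback device.
        self.devices.clear()
```

With this guard, a request to delete `ip6gre0` leaves the device in place, while ordinary tunnels are removed as before.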
    • E
      ipv4: add a sock pointer to dst->output() path. · aad88724
      Eric Dumazet committed
      In the dst->output() path for ipv4, the code assumes the skb it has to
      transmit is attached to an inet socket, specifically via
      ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
      provider of the packet is an AF_PACKET socket.
      
      The dst->output() method gets an additional 'struct sock *sk'
      parameter. This needs a cascade of changes so that this parameter can
      be propagated from vxlan to final consumer.
      
      Fixes: 8f646c92 ("vxlan: keep original skb ownership")
      Reported-by: lucien xin <lucien.xin@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      ipv4: add a sock pointer to ip_queue_xmit() · b0270e91
      Eric Dumazet committed
      ip_queue_xmit() assumes the skb it has to transmit is attached to an
      inet socket. Commit 31c70d59 ("l2tp: keep original skb ownership")
      changed l2tp to not change skb ownership and thus broke this assumption.
      
      One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
      so that we do not assume skb->sk points to the socket used by l2tp
      tunnel.
      
      Fixes: 31c70d59 ("l2tp: keep original skb ownership")
      Reported-by: Zhan Jianyu <nasa4836@gmail.com>
      Tested-by: Zhan Jianyu <nasa4836@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
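The difference between reading the socket from the packet and passing it explicitly can be illustrated with a small model. Everything here is hypothetical (the `Sock` and `Skb` types, the function names, the family strings); it only shows why an explicit `sk` argument lets the packet keep its original owner for memory accounting while the transmit path still works with an inet socket.

```python
from dataclasses import dataclass


@dataclass
class Sock:
    family: str
    name: str


@dataclass
class Skb:
    sk: Sock  # socket charged for the packet's memory


def queue_xmit_implicit(skb):
    """Old shape: assumes skb.sk is the inet socket used for transmit."""
    if skb.sk.family != "AF_INET":
        raise TypeError("skb.sk is not an inet socket")
    return {"xmit_via": skb.sk.name, "charged_to": skb.sk.name}


def queue_xmit_explicit(sk, skb):
    """New shape: the caller passes the inet socket to transmit with."""
    if sk.family != "AF_INET":
        raise TypeError("sk is not an inet socket")
    # skb.sk is left untouched, so the original owner is still charged.
    return {"xmit_via": sk.name, "charged_to": skb.sk.name}
```

In the tunnel case, the packet's owner is the session socket and the transmit socket is the tunnel's own inet socket; only the explicit form handles both roles at once.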
  5. 15 Apr, 2014 27 commits