1. 17 Apr, 2014 10 commits
    • net: mdio-gpio: Add support for active low gpio pins · 1d251481
      Guenter Roeck committed
      Some systems using mdio-gpio may use active-low gpio pins
      (eg with inverters or FETs connected to all or some of the
      gpio pins).
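
      As a rough illustration of what "active low" means for the caller, the sketch below inverts the logical value when a pin is flagged as active-low. The struct and function names are hypothetical and are not taken from the mdio-gpio driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pin descriptor; names are illustrative, not the driver's. */
struct pin_desc {
    bool active_low;   /* true if an inverter or FET flips the line */
    int  raw_level;    /* physical level on the wire: 0 or 1 */
};

/* Drive the pin to a logical value, inverting it for active-low pins. */
static void pin_set_logical(struct pin_desc *pin, int value)
{
    assert(value == 0 || value == 1);
    pin->raw_level = pin->active_low ? !value : value;
}

/* Read the logical value back, undoing the inversion if needed. */
static int pin_get_logical(const struct pin_desc *pin)
{
    return pin->active_low ? !pin->raw_level : pin->raw_level;
}
```

      With this shape, the bit-banging code can keep working in logical levels, and only the two accessors know about the inversion.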
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio-gpio: Use devm_ functions where possible · 78cdb079
      Guenter Roeck committed
      This simplifies the error path and the deinit/removal functions.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Chris Healy <cphealy@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fib_validate_loopback' · bc383ea5
      David S. Miller committed
      Cong Wang says:
      
      ====================
      ipv4: fix flowi4_iif for input routing
      
      This patchset fixes ->flowi4_iif for input routing and rp filter,
      based on suggestion from Julian. See per patch for details.
      
      v1 -> v2:
      * merge the first two patches into one
      * fix fib_check_nh() too
      * add this cover letter
      ====================
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source() · 0d5edc68
      Cong Wang committed
      In my special case, when a packet is redirected from veth0 to lo,
      its skb->dev->ifindex would be LOOPBACK_IFINDEX. Meanwhile we
      pass the hard-coded LOOPBACK_IFINDEX to fib_validate_source()
      in ip_route_input_slow(). This would cause the following check
      in fib_validate_source() to fail:
      
                  (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))
      
      when rp_filter is disabled on loopback. As suggested by Julian,
      the caller should pass 0 here so that we do not end up
      calling __fib_validate_source().
      
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif · 6a662719
      Cong Wang committed
      As suggested by Julian:
      
      	Simply, flowi4_iif must not contain 0, it does not
      	look logical to ignore all ip rules with specified iif.
      
      because in fib_rule_match() we do:
      
              if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
                      goto out;
      
      flowi4_iif should be LOOPBACK_IFINDEX by default.
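
      The effect of that check can be sketched in a few lines of userspace C. The two structs below are simplified stand-ins (hypothetical names) for the kernel's rule and flow types; only the quoted condition is modelled.

```c
#include <stdbool.h>

#define LOOPBACK_IFINDEX 1   /* ifindex of "lo" */

/* Simplified stand-ins for the kernel structures; illustrative only. */
struct rule_sketch { int iifindex; };   /* 0 means "no iif specified" */
struct flow_sketch { int flowi_iif; };

/* Mirrors the quoted test: a rule with an iif only matches that iif. */
static bool rule_iif_matches(const struct rule_sketch *rule,
                             const struct flow_sketch *fl)
{
    if (rule->iifindex && rule->iifindex != fl->flowi_iif)
        return false;
    return true;
}
```

      A flow whose input interface is left at 0 can therefore never match a rule that names an iif, including "iif lo", which is why the default is changed to LOOPBACK_IFINDEX.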
      
      We need to move LOOPBACK_IFINDEX to include/net/flow.h:
      
      1) It is mostly used by flowi_iif
      
      2) It fixes the following compile error, which occurs when later
      patches use it in flow.h:
      
      In file included from include/linux/netfilter.h:277:0,
                       from include/net/netns/netfilter.h:5,
                       from include/net/net_namespace.h:21,
                       from include/linux/netdevice.h:43,
                       from include/linux/icmpv6.h:12,
                       from include/linux/ipv6.h:61,
                       from include/net/ipv6.h:16,
                       from include/linux/sunrpc/clnt.h:27,
                       from include/linux/nfs_fs.h:30,
                       from init/do_mounts.c:32:
      include/net/flow.h: In function ‘flowi4_init_output’:
      include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)
      
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll · c98235cb
      Chris Mason committed
      The mlx4 driver is triggering schedules while atomic inside
      mlx4_en_netpoll:
      
      	spin_lock_irqsave(&cq->lock, flags);
      	napi_synchronize(&cq->napi);
      		^^^^^ msleep here
      	mlx4_en_process_rx_cq(dev, cq, 0);
      	spin_unlock_irqrestore(&cq->lock, flags);
      
      This was part of a patch by Alexander Guller from Mellanox in 2011,
      but it still isn't upstream.
      Signed-off-by: Chris Mason <clm@fb.com>
      cc: stable@vger.kernel.org
      Acked-By: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mvneta_qsgmii' · b07afe07
      David S. Miller committed
      Thomas Petazzoni says:
      
      ====================
      net: mvneta: fix usage as a module, and support QSGMII properly
      
      This set of patches is a new attempt at fixing the operation of the
      mvneta driver when built as a module. For the record, the previous
      attempt, merged in commit e3a8786c
      ('net: mvneta: fix usage as a module on RGMII configurations') caused
      problems for all RGMII configurations.
      
      In fact, it turned out that the MAC to PHY connection on the Armada XP
      GP, which was described as using RGMII-ID according to its Device
      Tree, is in fact a QSGMII connection. And the RGMII and QSGMII
      configurations have to be handled in a different way in the driver,
      because the SERDES configuration is different in those two cases.
      
      So, this patch series fixes that by:
      
       * Adding minimal handling of a "qsgmii" connection type in the PHY
         layer. Mainly to make sure that a "qsgmii" phy-mode in the Device
         Tree is recognized, and handed over to the driver as
         PHY_INTERFACE_QSGMII.
      
       * Changing the mvneta driver to properly configure the RGMIIEn and
         PCSEn bits in the GMAC_CTRL_2 register, and configure the SERDES
         register, in the three possible cases: RGMII, SGMII and QSGMII.
      
       * Updating the Device Tree of the Armada XP GP board to reflect the
         fact that it uses a QSGMII MAC/PHY connection.
      
      PATCH 1 and 2 would be merged by David Miller, through the net tree,
      while PATCH 3 would be merged by the mach-mvebu maintainers, through
      their tree and arm-soc.
      
      This set of patches has been tested on:
      
       * Armada XP GP (four QSGMII interfaces)
       * Armada XP DB (two RGMII interfaces and two SGMII interfaces)
       * Armada 370 Mirabox (two RGMII interfaces)
      
      I've tested both the driver built-in, and compiled as a module.
      
      Since the last attempt at fixing this was quite a fiasco, I'd like
      this new attempt to be tested more widely before being applied. I'll
      try to do some testing on other Armada boards I have, but independent
      testing from other persons would also be appreciated.
      
      Note that these patches apply after reverting the previous attempt,
      obviously.
      ====================
      Tested-by: Arnaud Ebalard <arno@natisbad.org>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: properly configure the MAC <-> PHY connection in all situations · 3f1dd4bc
      Thomas Petazzoni committed
      Commit 5445eaf3 ('mvneta: Try to fix mvneta when compiled as
      module') fixed the mvneta driver to make it work properly when loaded
      as a module in SGMII configuration, which was tested successfully by the
      author on the Armada XP OpenBlocks AX3, which uses SGMII.
      
      However, some other platforms, namely the Armada XP GP don't use
      SGMII, but a QSGMII connection between the MAC and the PHY, and this
      case was not supported by the mvneta driver, which was relying on
      configuration put in place by the bootloader. While this works when
      the mvneta driver is built-in (because clocks are not gated), it
      breaks when mvneta is built as a module, because the clock is gated
      (all configuration is lost) and then re-enabled when the mvneta driver
      is loaded.
      
      In order to support all of RGMII, SGMII and QSGMII, this commit
      reworks how the PHY interface configuration is done, and simplifies
      it: it removes the mvneta_port_sgmii_config() and
      mvneta_gmac_rgmii_set() functions, which were strange because
      mvneta_gmac_rgmii_set() was called in all cases, even for SGMII
      configurations. Also, the mvneta_gmac_rgmii_set() function was taking
      a boolean as argument, which was always true.
      
      Instead, all the PHY interface configuration logic is moved into the
      mvneta_port_power_up() function, in a much simpler 'switch' construct,
      with four cases:
      
       - QSGMII: the RGMIIEn bit, the PCSEn bit in GMAC_CTRL_2 are set, and
         the SERDES is configured in QSGMII. Technically speaking,
         configuring the SERDES of the first port would be sufficient, but
         it is simpler to do it on all ports.
      
       - SGMII: the RGMIIEn bit, the PCSEn bit in GMAC_CTRL_2 are set, and
         the SERDES is configured as SGMII.
      
       - RGMII: the RGMIIEn bit in GMAC_CTRL_2 is set. The PCSEn bit is kept
         cleared, and no SERDES configuration is done, because RGMII is not
         using SERDES lanes.
      
       - other: an error is returned. For this reason, the
         mvneta_port_power_up() now returns an int instead of nothing, and
         the return value is checked by mvneta_probe().
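
      The four cases above can be sketched as a plain C switch. This is a minimal userspace model, not the driver code: the enum, struct and function names are hypothetical, and the register bits are represented as booleans.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the PHY modes and the resulting configuration. */
enum phy_mode_sketch { MODE_QSGMII, MODE_SGMII, MODE_RGMII, MODE_OTHER };
enum serdes_sketch   { SERDES_UNTOUCHED, SERDES_QSGMII, SERDES_SGMII };

struct port_cfg_sketch {
    bool rgmii_en;              /* stands in for RGMIIEn in GMAC_CTRL_2 */
    bool pcs_en;                /* stands in for PCSEn in GMAC_CTRL_2 */
    enum serdes_sketch serdes;  /* stands in for the SERDES register */
};

/* Returns 0 on success or -EINVAL for an unsupported mode. */
static int port_power_up_sketch(enum phy_mode_sketch mode,
                                struct port_cfg_sketch *cfg)
{
    cfg->rgmii_en = false;
    cfg->pcs_en = false;
    cfg->serdes = SERDES_UNTOUCHED;

    switch (mode) {
    case MODE_QSGMII:
        cfg->rgmii_en = true;
        cfg->pcs_en = true;
        cfg->serdes = SERDES_QSGMII;
        break;
    case MODE_SGMII:
        cfg->rgmii_en = true;
        cfg->pcs_en = true;
        cfg->serdes = SERDES_SGMII;
        break;
    case MODE_RGMII:
        /* RGMII does not use SERDES lanes, so PCSEn stays cleared. */
        cfg->rgmii_en = true;
        break;
    default:
        return -EINVAL;
    }
    return 0;
}
```

      Returning an int is what lets a probe routine reject an unknown PHY mode instead of silently running with a half-configured port.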
      
      This has been successfully tested on:
      
       * Armada XP DB, which has two RGMII and two SGMII connections
       * Armada XP GP, which uses QSGMII for its four interfaces
       * Armada 370 Mirabox, which has two RGMII connections
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add minimal support for QSGMII PHY · b9d12085
      Thomas Petazzoni committed
      This commit adds the necessary definitions for the PHY layer to
      recognize "qsgmii" as a valid PHY interface. A QSGMII interface, as
      defined at
      http://en.wikipedia.org/wiki/Media_Independent_Interface#Quad_Serial_Gigabit_Media_Independent_Interface,
      "is a method of combining four SGMII lines into a 5Gbit/s
      interface. QSGMII, like SGMII, uses LVDS signalling for the TX and RX
      data and a single LVDS clock signal. QSGMII uses significantly fewer
      signal lines than four SGMII busses."
      
      This type of MAC <-> PHY connection might require special handling on
      the MAC driver side, so it should be possible to express this type of
      MAC <-> PHY connection, for example in the Device Tree.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: devicetree@vger.kernel.org
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: On MCDI timeout, issue an FLR (and mark MCDI to fail-fast) · e283546c
      Edward Cree committed
      When an MCDI command times out (whether or not we find it
      completed when we poll), call efx_mcdi_abandon(), which tells
      all subsequent MCDI calls to fail-fast, and queues up an FLR.
      
      Because an FLR doesn't lead to receiving any reboot event from
      the MC (unlike most other types of reset), we have to call
      efx_ef10_reset_mc_allocations.
      In efx_start_all(), if a reset (of any kind) is pending, we
      bail out.
      Without this, attempts to reconfigure (e.g. change mtu) can
      cause driver/mc state inconsistency if the first MCDI call
      triggers an FLR.
      
      For similar reasons, on EF10, in
      efx_reset_down(method=RESET_TYPE_MCDI_TIMEOUT), set the number
      of active queues to zero before calling efx_stop_all().
      And, on farch, in efx_reset_up(method=RESET_TYPE_MCDI_TIMEOUT),
      set active_queues and flushes pending & outstanding to zero.
      
      efx_mcdi_mode_{poll,event}() should not take us out of fail-fast
      mode. Instead, this is done by efx_mcdi_reset() after the FLR
      completes.
      
      The new FLR reset_type RESET_TYPE_MCDI_TIMEOUT doesn't really
      fit into the hierarchy of reset 'scopes' whereby efx_reset()
      decides some resets subsume others.  Thus, it uses separate logic.
      
      Also, fixed up some inconsistency around RESET_TYPE_MC_BIST,
      which was in the wrong place in that hierarchy.
      Signed-off-by: Shradha Shah <sshah@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Apr, 2014 6 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 10ec34fc
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Fix BPF filter validation of netlink attribute accesses, from
          Mathias Krause.
      
       2) Netfilter conntrack generation seqcount not initialized properly,
          from Andrey Vagin.
      
       3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
          from Patrick McHardy.
      
       4) Properly limit MTU over ipv6, from Eric Dumazet.
      
       5) Fix seccomp system call argument population on 32-bit, from Daniel
          Borkmann.
      
       6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
          skb->mac_len needs to be used.  From Vlad Yasevich.
      
       7) We have several cases of using socket based communications to
          implement a tunnel.  For example, some tunnels are encapsulations
          over UDP so we use an internal kernel UDP socket to do the
          transmits.
      
          These tunnels should behave just like other software devices and
          pass the packets on down to the next layer.
      
          Most importantly we want the top-level socket (eg TCP) that created
          the traffic to be charged for the SKB memory.
      
          However, once you get into the IP output path, we have code that
          assumed that whatever was attached to skb->sk is an IP socket.
      
          To keep the top-level socket being charged for the SKB memory,
          whilst satisfying the needs of the IP output path, we now pass in an
          explicit 'sk' argument.
      
          From Eric Dumazet.
      
       8) ping_init_sock() leaks group info, from Xiaoming Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
        cxgb4: use the correct max size for firmware flash
        qlcnic: Fix MSI-X initialization code
        ip6_gre: don't allow to remove the fb_tunnel_dev
        ipv4: add a sock pointer to dst->output() path.
        ipv4: add a sock pointer to ip_queue_xmit()
        driver/net: cosa driver uses udelay incorrectly
        at86rf230: fix __at86rf230_read_subreg function
        at86rf230: remove check if AVDD settled
        net: cadence: Add architecture dependencies
        net: Start with correct mac_len in skb_network_protocol
        Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
        cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
        net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
        seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
        qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
        qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
        qlcnic: Fix PVID configuration on eSwitch port.
        qlcnic: Fix max ring count calculation
        qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
        qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
        ...
    • cxgb4: use the correct max size for firmware flash · 6f1d7210
      Steve Wise committed
      The wrong max fw size was being used and causing false
      "too big" errors running ethtool -f.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qlcnic: Fix MSI-X initialization code · 8564ae09
      Alexander Gordeev committed
      Function qlcnic_setup_tss_rss_intr() might enter endless
      loop in case pci_enable_msix() continuously returns a
      positive number of MSI-Xs that could have been allocated.
      Besides, the function contains 'err = -EIO;' assignment
      that never could be reached. This update fixes the
      aforementioned issues.
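
      The failure mode can be modelled in userspace with a stub allocator. The sketch below assumes an allocator that, like the pci_enable_msix() behaviour described above, returns 0 on success, a negative errno on failure, or a positive count of vectors that could be allocated. Bounding the number of retries is one possible way to avoid the endless loop; it is not necessarily what the actual patch does, and all names are hypothetical.

```c
#include <errno.h>

/* Hypothetical allocator type: 0 = success, <0 = error,
 * >0 = number of vectors that could have been allocated instead. */
typedef int (*enable_fn)(int nvec);

/* Retry with the offered count, but give up after max_tries attempts. */
static int enable_vectors_bounded(enable_fn enable, int want, int max_tries)
{
    int nvec = want;

    for (int i = 0; i < max_tries; i++) {
        int rc = enable(nvec);

        if (rc == 0)
            return nvec;   /* got exactly nvec vectors */
        if (rc < 0)
            return rc;     /* hard failure */
        nvec = rc;         /* retry with the count reported as available */
    }
    return -EIO;           /* allocator never settled on a count */
}

/* Stub: grants any request of up to four vectors. */
static int grants_up_to_four(int nvec)
{
    return nvec <= 4 ? 0 : 4;
}

/* Stub: pathological allocator that always offers two and never succeeds. */
static int always_offers_two(int nvec)
{
    (void)nvec;
    return 2;
}
```

      With the pathological stub, an unbounded `while (rc > 0)` retry would spin forever, whereas the bounded version reaches its error return.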
      
      Cc: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Cc: Dept-HSGLinuxNICDev@qlogic.com
      Cc: netdev@vger.kernel.org
      Cc: linux-pci@vger.kernel.org
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_gre: don't allow to remove the fb_tunnel_dev · 54d63f78
      Nicolas Dichtel committed
      It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but
      this is unsafe: the module always assumes that this device exists. For example,
      ip6gre_tunnel_lookup() may use it unconditionally.
      
      Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
      let ip6gre_destroy_tunnels() do the job).
      
      Introduced by commit c12b395a ("gre: Support GRE over IPv6").
      
      CC: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: add a sock pointer to dst->output() path. · aad88724
      Eric Dumazet committed
      In the dst->output() path for ipv4, the code assumes the skb it has to
      transmit is attached to an inet socket, specifically via
      ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
      provider of the packet is an AF_PACKET socket.
      
      The dst->output() method gets an additional 'struct sock *sk'
      parameter. This needs a cascade of changes so that this parameter can
      be propagated from vxlan to final consumer.
      
      Fixes: 8f646c92 ("vxlan: keep original skb ownership")
      Reported-by: lucien xin <lucien.xin@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: add a sock pointer to ip_queue_xmit() · b0270e91
      Eric Dumazet committed
      ip_queue_xmit() assumes the skb it has to transmit is attached to an
      inet socket. Commit 31c70d59 ("l2tp: keep original skb ownership")
      changed l2tp to not change skb ownership and thus broke this assumption.
      
      One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
      so that we do not assume skb->sk points to the socket used by l2tp
      tunnel.
      
      Fixes: 31c70d59 ("l2tp: keep original skb ownership")
      Reported-by: Zhan Jianyu <nasa4836@gmail.com>
      Tested-by: Zhan Jianyu <nasa4836@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 15 Apr, 2014 24 commits
    • driver/net: cosa driver uses udelay incorrectly · 1dd333f4
      Li, Zhen-Hua committed
      In the cosa driver, udelay with a value of more than 20000 may cause
      __bad_udelay. Use msleep instead.
      Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • at86rf230: fix __at86rf230_read_subreg function · 2168746c
      Alexander Aring committed
      The __at86rf230_read_subreg function doesn't mask and shift register
      contents which it should do. This patch adds the necessary masks and
      shift operations in this function.
      
      Now that we have CSMA support, this can cause trouble on state changes.
      Since CSMA support turned on some bits in the TRX_STATUS register that
      used to be zero, not masking broke checking of the TRX_STATUS field
      after commanding a state change.
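
      Reading a sub-register field amounts to a mask followed by a shift. The helper below is a minimal sketch with a hypothetical name; the mask and shift values in the usage note are illustrative, not taken from the driver's register table.

```c
#include <stdint.h>

/* Extract a bit field from a raw 8-bit register value.
 * mask selects the field's bits, shift moves them down to bit 0. */
static uint8_t subreg_extract(uint8_t reg_value, uint8_t mask, unsigned shift)
{
    return (uint8_t)((reg_value & mask) >> shift);
}
```

      For example, with an assumed 5-bit status field (mask 0x1f, shift 0), a raw value of 0x88 yields 0x08. Without the mask, the extra high bit would make a comparison against the expected status value fail, which matches the symptom described above.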
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Reviewed-by: Werner Almesberger <werner@almesberger.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • at86rf230: remove check if AVDD settled · bb78864a
      Alexander Aring committed
      The AVDD regulator is only enabled when the RF section is in the active
      TX_ON (PLL_ON) state. Since commit 7dcbd22a
      ("ieee802154: ensure that first RF212 state comes from TRX_OFF"),
      we are in TRX_OFF state at the time at86rf230_hw_init is run.
      
      Note that this test would only fail in case of a severe hardware
      malfunction (faulty/shorted power supply, etc.) so it wasn't all that
      useful in the first place.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Reviewed-by: Werner Almesberger <werner@almesberger.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cadence: Add architecture dependencies · ea05df4e
      Jean Delvare committed
      The Cadence ethernet chipsets are only used on specific ARM
      architectures. Add Kconfig dependencies so that drivers for these
      chipsets are only buildable on the relevant architectures.
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · 55101e2d
      Linus Torvalds committed
      Pull KVM fixes from Marcelo Tosatti:
       - Fix for guest triggerable BUG_ON (CVE-2014-0155)
       - CR4.SMAP support
       - Spurious WARN_ON() fix
      
      * git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: remove WARN_ON from get_kernel_ns()
        KVM: Rename variable smep to cr4_smep
        KVM: expose SMAP feature to guest
        KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
        KVM: Add SMAP support when setting CR4
        KVM: Remove SMAP bit from CR4_RESERVED_BITS
        KVM: ioapic: try to recover if pending_eoi goes out of range
        KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155)
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · dafe344d
      Linus Torvalds committed
      Pull bcm2835 crypto fix from Herbert Xu:
       "This fixes a potential boot crash on bcm2835 due to the recent change
        that now causes hardware RNGs to be accessed on registration"

      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        hwrng: bcm2835 - fix oops when rng h/w is accessed during registration
    • user namespace: fix incorrect memory barriers · e79323bd
      Mikulas Patocka committed
      smp_read_barrier_depends() can be used if there is data dependency between
      the readers - i.e. if the read operation after the barrier uses address
      that was obtained from the read operation before the barrier.
      
      In this file, there is only control dependency, no data dependency, so the
      use of smp_read_barrier_depends() is incorrect. The code could fail in the
      following way:
      * the cpu predicts that idx < entries is true and starts executing the
        body of the for loop
      * the cpu fetches map->extent[0].first and map->extent[0].count
      * the cpu fetches map->nr_extents
      * the cpu verifies that idx < extents is true, so it commits the
        instructions in the body of the for loop
      
      The problem is that in this scenario, the cpu read map->extent[0].first
      and map->nr_extents in the wrong order. We need a full read memory barrier
      to prevent it.
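
      The required ordering can be illustrated in portable C11, with acquire/release atomics standing in for the kernel's barrier primitives. This is an analogy under stated assumptions, not the kernel code: the struct layout, names and the lookup arithmetic are simplified and hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_EXTENTS 5

/* Simplified, hypothetical versions of the map structures. */
struct extent_sketch { uint32_t first, lower_first, count; };
struct map_sketch {
    _Atomic unsigned nr_extents;
    struct extent_sketch extent[MAX_EXTENTS];
};

/* Writer: fill in the extent first, then publish the new count.
 * The release store keeps the extent writes ahead of the count. */
static void map_publish(struct map_sketch *map, struct extent_sketch e)
{
    unsigned n = atomic_load_explicit(&map->nr_extents, memory_order_relaxed);

    assert(n < MAX_EXTENTS);
    map->extent[n] = e;
    atomic_store_explicit(&map->nr_extents, n + 1, memory_order_release);
}

/* Reader: the acquire load prevents the extent reads below from being
 * performed before the count is read, which is the reordering that a
 * control dependency alone does not rule out. */
static uint32_t map_lookup(struct map_sketch *map, uint32_t id)
{
    unsigned n = atomic_load_explicit(&map->nr_extents, memory_order_acquire);

    for (unsigned idx = 0; idx < n; idx++) {
        uint32_t first = map->extent[idx].first;
        uint32_t last = first + map->extent[idx].count - 1;

        if (id >= first && id <= last)
            return id - first + map->extent[idx].lower_first;
    }
    return UINT32_MAX;   /* no mapping found */
}
```

      The key point is on the reader side: the count must be read, with ordering, before any extent contents are read.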
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 00cbc3dc
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains three Netfilter fixes for your net tree,
      they are:
      
      * Fix missing generation sequence initialization which results in a splat
        if lockdep is enabled, it was introduced in the recent works to improve
        nf_conntrack scalability, from Andrey Vagin.
      
      * Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
        disabled otherwise this crashes due to a double release, from Andrey
        Vagin.
      
      * Fix nf_tables cmp fast in big endian, from Patrick McHardy.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Start with correct mac_len in skb_network_protocol · 1e785f48
      Vlad Yasevich committed
      Sometimes, when the packet arrives at skb_mac_gso_segment()
      its skb->mac_len already accounts for some of the mac length
      headers in the packet.  This seems to happen when forwarding
      through an OpenSSL tunnel.
      
      When we start looking for any vlan headers in skb_network_protocol()
      we seem to ignore any of the already known mac headers and start
      with an ETH_HLEN.  This results in an incorrect offset, dropped
      TSO frames and general slowness of the connection.
      
      We can start counting from the known skb->mac_len
      and return at least that much if all mac level headers
      are known and accounted for.
      
      Fixes: 53d6471c (net: Account for all vlan headers in skb_mac_gso_segment)
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Daniel Borkman <dborkman@redhat.com>
      Tested-by: Martin Filip <nexus+kernel@smoula.net>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • KVM: x86: remove WARN_ON from get_kernel_ns() · b351c39c
      Marcelo Tosatti committed
      Function and callers can be preempted.

      https://bugzilla.kernel.org/show_bug.cgi?id=73721
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Rename variable smep to cr4_smep · 66386ade
      Feng Wu committed
      Rename variable smep to cr4_smep, which can better reflect the
      meaning of the variable.
      Signed-off-by: Feng Wu <feng.wu@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: expose SMAP feature to guest · de935ae1
      Feng Wu committed
      This patch exposes the SMAP feature to the guest.
      Signed-off-by: Feng Wu <feng.wu@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode · e1e746b3
      Feng Wu committed
      SMAP is disabled if the CPU is in non-paging mode in hardware.
      However, KVM always uses paging mode to emulate guest non-paging
      mode with TDP. To emulate this behavior, SMAP needs to be
      manually disabled when the guest switches to non-paging mode.
      Signed-off-by: Feng Wu <feng.wu@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: Add SMAP support when setting CR4 · 97ec8c06
      Feng Wu committed
      This patch adds SMAP handling logic when setting CR4 for guests.

      Thanks a lot to Paolo Bonzini for his suggestion to use the branchless
      way to detect SMAP violation.
      Signed-off-by: Feng Wu <feng.wu@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: Remove SMAP bit from CR4_RESERVED_BITS · 56d6efc2
      Feng Wu committed
      This patch removes the SMAP bit from CR4_RESERVED_BITS.
      Signed-off-by: Feng Wu <feng.wu@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" · 362d5204
      Daniel Borkmann committed
      This reverts commit ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management
      to reflect real state of the receiver's buffer") as it introduced a
      serious performance regression on SCTP over IPv4 and IPv6, though not
      as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.
      
      Current state:
      
      [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
      Time: Fri, 11 Apr 2014 17:56:21 GMT
      Connecting to host 192.168.241.3, port 5201
            Cookie: Lab200slot2.1397238981.812898.548918
      [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
      [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
      [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
      [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
      [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
      [  4]   6.21-6.21   sec  0.00 Bytes    0.00 bits/sec
      [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
      [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
      [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
      [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
      [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
      [  4]  16.79-16.79  sec  0.00 Bytes    0.00 bits/sec
      [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
      (etc)
      
      [root@Lab200slot2 ~]#  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
      Time: Fri, 11 Apr 2014 19:08:41 GMT
      Connecting to host 2001:db8:0:f101::1, port 5201
            Cookie: Lab200slot2.1397243321.714295.2b3f7c
      [  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
      [  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
      [  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
      [  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
      [  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
      [  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
      [  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
      [  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
      [  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
      [  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
      [  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
      [  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
      [  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
      [  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
      (etc)
      
      After patch:
      
      [root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
      Time: Mon, 14 Apr 2014 16:40:48 GMT
      Connecting to host 192.168.240.3, port 5201
            Cookie: Lab200slot2.1397493648.413274.65e131
      [  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
      [  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
      [  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
      [  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
      [  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
      [  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
      [  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
      [  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec
      
      With the revert applied, SCTP performance is back to normal on latest
      upstream for both IPv4 and IPv6 and matches the throughput of the 3.4.2
      test kernel; throughput is steady and interval reports are smooth again.
      
      Fixes: ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
      Reported-by: Peter Butler <pbutler@sonusnet.com>
      Reported-by: Dongsheng Song <dongsheng.song@gmail.com>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Tested-by: Peter Butler <pbutler@sonusnet.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
      Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      362d5204
    • S
      cxgb4: Save the correct mac addr for hw-loopback connections in the L2T · bfae2324
      Steve Wise committed
      Hardware needs the local device MAC address to support hw loopback for
      RDMA loopback connections.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfae2324
    • D
      net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W · 8c482cdc
      Daniel Borkmann committed
      While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has
      been wrongly decoded by commit a8fc9277 ("sk-filter: Add ability to
      get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS
      although it should have been decoded as BPF_LD|BPF_W|BPF_ABS.
      
      In practice, this should not have much side-effect though, as such
      conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse
      operation PR_GET_SECCOMP will only return the current seccomp mode, but
      not the filter itself. Since the transition to the new BPF infrastructure,
      it's also not used anymore, so we can simply remove this as it's
      unreachable.
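
      To make the difference between the two decodings concrete, here is a
      minimal sketch that composes both opcodes from the classic BPF class,
      size and mode bits. The constants are redefined locally (with the values
      used by the classic BPF encoding) so the snippet builds without kernel
      headers; the helper bpf_ld_abs_width() is hypothetical and not part of
      the kernel.

      ```c
      #include <stdint.h>

      /* Classic BPF opcode fields, repeated locally so that the sketch
       * does not depend on kernel headers. */
      #define BPF_LD   0x00  /* instruction class: load into accumulator */
      #define BPF_W    0x00  /* size: 32-bit word */
      #define BPF_H    0x08  /* size: 16-bit half word */
      #define BPF_B    0x10  /* size: 8-bit byte */
      #define BPF_ABS  0x20  /* mode: absolute offset */

      #define BPF_SIZE(code) ((code) & 0x18)

      /* Hypothetical helper: how many bytes does a load opcode read? */
      static unsigned int bpf_ld_abs_width(uint16_t code)
      {
          switch (BPF_SIZE(code)) {
          case BPF_W: return 4;
          case BPF_H: return 2;
          case BPF_B: return 1;
          default:    return 0;   /* unknown size encoding */
          }
      }
      ```

      With these values, BPF_LD|BPF_B|BPF_ABS is 0x30 and describes a one-byte
      load, while BPF_LD|BPF_W|BPF_ABS is 0x20 and describes a four-byte load.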
      
      Fixes: a8fc9277 ("sk-filter: Add ability to get socket filter program (v2)")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c482cdc
    • D
      seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF · 2eac7648
      Daniel Borkmann committed
      Linus reports that on 32-bit x86 Chromium throws the following seccomp
      resp. audit log messages:
      
        audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
      gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
      pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
      syscall=172 compat=0 ip=0xb2dd9852 code=0x30000
      
        audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
      gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
      pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
      compat=0 ip=0xb2dd9852 code=0x50000
      
      These audit messages are being triggered via audit_seccomp() through
      __secure_computing() in seccomp mode (BPF) filter with seccomp return
      codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
      during filter runtime. Moreover, Linus reports that x86_64 Chromium
      seems fine.
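
      As an illustration of how the code= field in the audit lines maps to a
      seccomp action, the following sketch masks the value with the action
      mask and names the result. The constants repeat the values from the
      seccomp user ABI header so the snippet builds standalone; the function
      seccomp_action_name() is a hypothetical helper, not a kernel API.

      ```c
      #include <stdint.h>

      /* Values as in the seccomp user ABI, repeated here so the sketch
       * builds without kernel headers. */
      #define SECCOMP_RET_KILL    0x00000000U
      #define SECCOMP_RET_TRAP    0x00030000U
      #define SECCOMP_RET_ERRNO   0x00050000U
      #define SECCOMP_RET_TRACE   0x7ff00000U
      #define SECCOMP_RET_ALLOW   0x7fff0000U
      #define SECCOMP_RET_ACTION  0x7fff0000U  /* mask: action part */
      #define SECCOMP_RET_DATA    0x0000ffffU  /* mask: data part */

      /* Hypothetical helper: name the action encoded in a return code. */
      static const char *seccomp_action_name(uint32_t code)
      {
          switch (code & SECCOMP_RET_ACTION) {
          case SECCOMP_RET_KILL:  return "SECCOMP_RET_KILL";
          case SECCOMP_RET_TRAP:  return "SECCOMP_RET_TRAP";
          case SECCOMP_RET_ERRNO: return "SECCOMP_RET_ERRNO";
          case SECCOMP_RET_TRACE: return "SECCOMP_RET_TRACE";
          case SECCOMP_RET_ALLOW: return "SECCOMP_RET_ALLOW";
          default:                return "unknown";
          }
      }
      ```

      Passing 0x30000 and 0x50000 from the log lines above yields
      SECCOMP_RET_TRAP and SECCOMP_RET_ERRNO respectively; the low 16 bits
      (SECCOMP_RET_DATA) are zero in both cases.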
      
      The underlying issue that explains this is that the implementation of
      populate_seccomp_data() is wrong. Our seccomp data structure sd that
      is being shared with user ABI is:
      
        struct seccomp_data {
          int nr;
          __u32 arch;
          __u64 instruction_pointer;
          __u64 args[6];
        };
      
      Therefore, a simple cast to 'unsigned long *' for storing the value of
      the syscall argument via syscall_get_arguments() is just wrong as on
      32-bit x86 (or any other 32-bit arch), it would result in storing a0-a5
      at wrong offsets in the args[] member, and thus i) could leak stack memory
      to user space and ii) tamper with the logic of seccomp BPF programs
      that read out and check for syscall arguments:
      
        syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
      
      Tested on 32-bit x86 with Google Chrome, unfortunately only via a remote
      test machine through slow ssh X forwarding, but it fixes the issue on
      my side. So fix it up by storing the args in type-correct variables; gcc
      is clever and optimizes the copy away in other cases, e.g. x86_64.
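
      The effect can be reproduced in user space with a small sketch. It is
      not the kernel code: ulong32 stands in for 'unsigned long' on a 32-bit
      arch, fake_get_arguments() is a hypothetical stand-in for
      syscall_get_arguments(), and the cast is emulated with memcpy() to stay
      within defined C behavior. Only the call for args[0] is quoted above;
      the sketch assumes the other five arguments were stored the same way.

      ```c
      #include <stdint.h>
      #include <string.h>

      typedef uint32_t ulong32;  /* stand-in for 32-bit 'unsigned long' */

      struct seccomp_data {
          int nr;
          uint32_t arch;
          uint64_t instruction_pointer;
          uint64_t args[6];
      };

      /* Hypothetical stand-in for syscall_get_arguments(): copies n
       * register values, starting at index i, into out. */
      static void fake_get_arguments(const ulong32 *regs, unsigned int i,
                                     unsigned int n, ulong32 *out)
      {
          memcpy(out, regs + i, n * sizeof(*out));
      }

      /* Emulates storing through '(unsigned long *) &sd->args[i]': only
       * 4 of the 8 bytes of each slot are written, the rest keep
       * whatever was in memory before. */
      static void populate_cast_style(struct seccomp_data *sd,
                                      const ulong32 *regs)
      {
          for (unsigned int i = 0; i < 6; i++) {
              ulong32 val;

              fake_get_arguments(regs, i, 1, &val);
              memcpy(&sd->args[i], &val, sizeof(val));
          }
      }

      /* Type-correct variant: fetch into a properly typed temporary,
       * then assign, so every args[] slot is fully initialized. */
      static void populate_typed(struct seccomp_data *sd,
                                 const ulong32 *regs)
      {
          ulong32 tmp[6];

          fake_get_arguments(regs, 0, 6, tmp);
          for (unsigned int i = 0; i < 6; i++)
              sd->args[i] = tmp[i];
      }
      ```

      If the structure is first filled with a marker byte to mimic stale stack
      contents, the cast-style variant leaves marker bytes inside every
      args[] slot, whereas the typed variant yields exactly the argument
      values.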
      
      Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
      Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2eac7648
    • D
      Merge branch 'qlcnic' · 14ed4a5b
      David S. Miller committed
      Shahed Shaikh says:
      
      ====================
      qlcnic: Bug fixes
      
      This patch series contains the following bug fixes:

      * Send INIT_NIC_FUNC mailbox command as the first mailbox command.
      * Fix a panic because of uninitialized delayed_work.
      * Fix inconsistent calculation of max rings count.
      * Fix PVID configuration issue. Driver needs to clear older
        PVID before adding new one.
      * Fix QLogic application/driver interface by packing vNIC information
        array.
      * Fix a crash when user tries to disable SR-IOV while VFs are
        still assigned to VMs.
      
      Please apply to net.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14ed4a5b
    • M
      qlcnic: Do not disable SR-IOV when VFs are assigned to VMs · 696f1943
      Manish Chopra committed
      o Disabling SR-IOV while VFs are assigned to VMs crashes the host, so
        return -EPERM when the user requests to disable SR-IOV through PCI
        sysfs while VFs are still assigned to VMs.
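
      The guard described above can be sketched roughly as follows. This is
      not the driver's code: struct fake_adapter, its fields and
      fake_sriov_disable() are hypothetical names chosen for illustration.

      ```c
      #include <errno.h>

      /* Hypothetical adapter state, for illustration only. */
      struct fake_adapter {
          int num_vfs;       /* VFs currently enabled */
          int vfs_assigned;  /* VFs currently assigned to VMs */
      };

      /* Refuse to disable SR-IOV while any VF is still in use by a VM. */
      static int fake_sriov_disable(struct fake_adapter *ad)
      {
          if (ad->vfs_assigned > 0)
              return -EPERM;

          ad->num_vfs = 0;   /* safe: no VM depends on a VF anymore */
          return 0;
      }
      ```

      The point of returning an error early is that the request is rejected
      before any VF state is torn down.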
      Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      696f1943
    • J
      qlcnic: Fix QLogic application/driver interface for virtual NIC configuration · 4f030227
      Jitendra Kalsaria committed
      o The application expects the vNIC number as the array index, but the
      driver interface returns the configuration in array index form.

      o Pack the vNIC information array in the buffer such that the application
      can access it using the vNIC number as the array index.
      Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f030227
    • J
      qlcnic: Fix PVID configuration on eSwitch port. · a78b6da8
      Jitendra Kalsaria committed
      Clear the older PVID before adding a newer PVID to the eSwitch port.
      Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a78b6da8
    • S
      qlcnic: Fix max ring count calculation · 7b546842
      Shahed Shaikh committed
      Do not read the max rings count from qlcnic_get_nic_info(). Use
      driver-defined values for 82xx adapters. For 83xx adapters, use the
      minimum of the firmware-provided and driver-defined values.
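
      The selection rule above amounts to a small clamp. The sketch below
      only illustrates that rule; the enum, the FAKE_DRV_MAX_RINGS limit of 8
      and fake_max_rings() are hypothetical and do not reflect the driver's
      actual names or limits.

      ```c
      /* Hypothetical adapter families and driver limit. */
      enum fake_adapter_type { FAKE_82XX, FAKE_83XX };

      #define FAKE_DRV_MAX_RINGS 8U  /* assumed driver-defined maximum */

      /* 82xx: always the driver-defined value.
       * 83xx: the smaller of firmware-provided and driver-defined. */
      static unsigned int fake_max_rings(enum fake_adapter_type type,
                                         unsigned int fw_max_rings)
      {
          if (type == FAKE_83XX && fw_max_rings < FAKE_DRV_MAX_RINGS)
              return fw_max_rings;

          return FAKE_DRV_MAX_RINGS;
      }
      ```

      Under these assumptions an 83xx adapter whose firmware reports 4 rings
      gets 4, while one reporting 16 is capped at the driver limit.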
      Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7b546842