1. 08 Nov, 2013 10 commits
    • tg3: avoid double-freeing of rx data memory · 85aec73d
      Ivan Vecera committed
      If build_skb fails the memory associated with the ring buffer is freed but
      the ri->data member is not zeroed in this case. This causes a double-free
      of this memory in tg3_free_rings->... path. The patch moves this block after
      setting ri->data to NULL.
      It would be nice to fix this bug also in stable >= v3.4 trees.
      
      Cc: Nithin Nayak Sujir <nsujir@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Acked-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
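The ownership rule behind the tg3 fix can be sketched in plain C. This is a minimal userspace model, not the driver's code: `ring_info`, `consume_slot`, `free_ring`, `release_buf` and the `frees` counter are hypothetical names, and `build_ok` merely simulates whether build_skb succeeds. The point it illustrates is that the ring slot is cleared before the fallible step, so a later teardown pass cannot free the same buffer again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-slot ring bookkeeping. */
struct ring_info {
	void *data;	/* buffer owned by the ring until it is handed off */
};

static int frees;	/* counts releases so the sketch can be checked */

static void release_buf(void *p)
{
	if (p) {
		free(p);
		frees++;
	}
}

/*
 * Consume one slot. The slot pointer is cleared before the step that can
 * fail, so whichever path releases the buffer, the ring no longer refers
 * to it. build_ok simulates the outcome of build_skb().
 */
static int consume_slot(struct ring_info *ri, int build_ok)
{
	void *data = ri->data;

	assert(data != NULL);
	ri->data = NULL;		/* ownership leaves the ring here */

	if (!build_ok) {
		release_buf(data);	/* error path frees exactly once */
		return -1;
	}
	release_buf(data);		/* stands in for passing the skb upward */
	return 0;
}

/* Teardown releases whatever the ring still owns. */
static void free_ring(struct ring_info *ring, int n)
{
	for (int i = 0; i < n; i++) {
		release_buf(ring[i].data);
		ring[i].data = NULL;
	}
}
```

With two allocated slots, a failed `consume_slot()` on the first followed by `free_ring()` releases each buffer exactly once. If the slot were cleared only after the error check, teardown would see a stale pointer and free it a second time.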
    • MAINTAINERS: Update bnx2x maintainer · 28fb9655
      Eilon Greenstein committed
      Ariel Elior will take over the bnx2x maintenance.
      
      It's been a pleasure!
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Acked-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: x86: bpf: don't forget to free sk_filter (v2) · 98bbc06a
      Andrey Vagin committed
      sk_filter isn't freed if bpf_func is equal to sk_run_filter.
      
      This memory leak was introduced by v3.12-rc3-224-gd45ed4a4
      "net: fix unsafe set_memory_rw from softirq".
      
      Before this patch sk_filter was freed in sk_filter_release_rcu,
      now it should be freed in bpf_jit_free.
      
      Here is output of kmemleak:
      unreferenced object 0xffff8800b774eab0 (size 128):
        comm "systemd", pid 1, jiffies 4294669014 (age 124.062s)
        hex dump (first 32 bytes):
          00 00 00 00 0b 00 00 00 20 63 7f b7 00 88 ff ff  ........ c......
          60 d4 55 81 ff ff ff ff 30 d9 55 81 ff ff ff ff  `.U.....0.U.....
        backtrace:
          [<ffffffff816444be>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811845af>] __kmalloc+0xef/0x260
          [<ffffffff81534028>] sock_kmalloc+0x38/0x60
          [<ffffffff8155d4dd>] sk_attach_filter+0x5d/0x190
          [<ffffffff815378a1>] sock_setsockopt+0x991/0x9e0
          [<ffffffff81531bd6>] SyS_setsockopt+0xb6/0xd0
          [<ffffffff8165f3e9>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      v2: add extra { } after else
      
      Fixes: d45ed4a4 ("net: fix unsafe set_memory_rw from softirq")
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc_fragmentation' · 95ed4019
      David S. Miller committed
      Erik Hugne says:
      
      ====================
      tipc: message reassembly using fragment chain
      
      We introduce a new reassembly algorithm that improves performance
      and eliminates the risk of causing out-of-memory situations.
      
      v3: -Use skb_try_coalesce, and revert to fraglist if this does not succeed.
          -Make sure reassembly list head is uncloned.
      
      v2: -Rebased on Ying's indentation fix.
          -Node unlock call in msg_fragmenter case moved from patch #2 to #1.
           ('continue' with this lock held would cause spinlock recursion if only
            patch #1 is used)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: reassembly failures should cause link reset · a715b49e
      Erik Hugne committed
      If appending a received fragment to the pending fragment chain
      in a unicast link fails, the current code tries to force a retransmission
      of the fragment by decrementing the 'next received sequence number'
      field in the link. This is done under the assumption that the failure
      is caused by an out-of-memory situation, an assumption that does
      not hold true after the previous patch in this series.
      
      A failure to append a fragment can now only be caused by a protocol
      violation by the sending peer, and it must hence be assumed that it
      is either malicious or buggy.  Either way, the correct behavior is now
      to reset the link instead of trying to revert its sequence number.
      So, this is what we do in this commit.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: message reassembly using fragment chain · 40ba3cdf
      Erik Hugne committed
      When the first fragment of a long data message is received on a link, a
      reassembly buffer large enough to hold the data from this and all subsequent
      fragments of the message is allocated. The payload of each new fragment is
      copied into this buffer upon arrival. When the last fragment is received, the
      reassembled message is delivered upwards to the port/socket layer.
      
      Not only is this an inefficient approach, but it may also cause bursts of
      reassembly failures in low memory situations, since we may fail to allocate
      the necessary large buffer in the first place. Furthermore, after 100 subsequent
      such failures the link will be reset, something that in reality aggravates the
      situation.
      
      To remedy this problem, this patch introduces a different approach. Instead of
      allocating a big reassembly buffer, we now append the arriving fragments
      to a reassembly chain on the link, and deliver the whole chain up to the
      socket layer once the last fragment has been received. This is safe because
      the retransmission layer of a TIPC link always delivers packets in strict
      uninterrupted order, to the reassembly layer as to all other upper layers.
      Hence there can never be more than one fragment chain pending reassembly at
      any given time in a link, and we can trust (but still verify) that the
      fragments will be chained up in the correct order.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
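The chain-based reassembly described in this commit can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not the TIPC implementation: `struct frag`, `struct link_model`, `recv_fragment` and `chain_len` are hypothetical names, and plain linked-list nodes stand in for socket buffers. It relies on the property the message states, namely that fragments arrive in strict order and at most one chain is pending per link, and it still verifies the order, reporting a violation so the caller can reset the link.

```c
#include <assert.h>
#include <stddef.h>

enum frag_id { FIRST_FRAGMENT, FRAGMENT, LAST_FRAGMENT };

/* Hypothetical fragment node; a real link would chain socket buffers. */
struct frag {
	enum frag_id id;
	size_t len;		/* payload bytes carried by this fragment */
	struct frag *next;
};

/* Per-link reassembly state: at most one pending chain. */
struct link_model {
	struct frag *head;
	struct frag *tail;
};

/*
 * Returns 1 and stores the complete chain in *out when the last fragment
 * arrives, 0 while more fragments are expected, and -1 on an ordering
 * violation (the caller would then reset the link).
 */
static int recv_fragment(struct link_model *l, struct frag *f,
			 struct frag **out)
{
	f->next = NULL;

	if (f->id == FIRST_FRAGMENT) {
		if (l->head)		/* a chain is already pending */
			goto violation;
		l->head = l->tail = f;
		return 0;
	}
	if (!l->head)			/* continuation without a start */
		goto violation;

	l->tail->next = f;		/* append, no payload copy */
	l->tail = f;

	if (f->id == LAST_FRAGMENT) {
		*out = l->head;
		l->head = l->tail = NULL;
		return 1;
	}
	return 0;

violation:
	l->head = l->tail = NULL;	/* drop the pending chain */
	return -1;
}

/* Total payload length of a delivered chain. */
static size_t chain_len(const struct frag *f)
{
	size_t total = 0;

	for (; f; f = f->next)
		total += f->len;
	return total;
}
```

Feeding a first, middle and last fragment in order yields one delivered chain whose length is the sum of the parts. A continuation fragment with no pending chain returns -1, which corresponds to the link-reset case handled by the following patch in the series.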
    • tipc: don't reroute message fragments · 528f6f4b
      Erik Hugne committed
      When a message fragment is received in a broadcast or unicast link,
      the reception code will append the fragment payload to a big reassembly
      buffer through a call to the function tipc_recv_fragm(). However, after
      the return of that call, the logic goes on and passes the fragment
      buffer to the function tipc_net_route_msg(), which will simply drop it.
      This behavior is a remnant from the now obsolete multi-cluster
      functionality, and has no relevance in the current code base.
      
      Although currently harmless, this unnecessary call would be fatal
      after applying the next patch in this series, which introduces
      a completely new reassembly algorithm. So we change the code to
      eliminate the redundant call.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • phy: Add MOXA MDIO driver · b0db7b0c
      Jonas Jensen committed
      The MOXA UC-711X hardware has an Ethernet controller that seems
      to have been developed internally. The IC used is "RTL8201CP".
      
      This patch adds an MDIO driver which handles the MII bus.
      Signed-off-by: Jonas Jensen <jonas.jensen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: document the new packets_per_slave option · 12465fb8
      Nikolay Aleksandrov committed
      Add new documentation for the packets_per_slave option available
      for balance-rr mode.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: extend round-robin mode with packets_per_slave · 73958329
      Nikolay Aleksandrov committed
      This patch aims to extend round-robin mode with a new option called
      packets_per_slave which can have the following values and effects:
       0 - choose a random slave
       1 (default) - standard round-robin, 1 packet per slave
       >1 - move to the next slave after that many packets have been
            transmitted on the current one
      The allowed values are between 0 and 65535.
      This patch also fixes the comment style in bond_xmit_roundrobin().
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
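The three packets_per_slave cases listed in this commit can be illustrated with a small selection function. This is a simplified model and not the bonding driver's code: `rr_state` and `rr_pick_slave` are hypothetical names, slaves are represented by an index, and `rand()` stands in for whatever random source the driver uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-bond transmit state. */
struct rr_state {
	uint32_t counter;	/* packets transmitted so far */
};

/*
 * Return the index of the slave that should carry the next packet.
 *   packets_per_slave == 0 : pick a random slave
 *   packets_per_slave == 1 : classic round-robin, one packet per slave
 *   packets_per_slave  > 1 : stay on a slave for that many packets
 */
static unsigned int rr_pick_slave(struct rr_state *st,
				  unsigned int slave_cnt,
				  unsigned int packets_per_slave)
{
	unsigned int idx;

	assert(slave_cnt > 0);

	if (packets_per_slave == 0)
		idx = (unsigned int)rand() % slave_cnt;
	else
		/* with a value of 1 this reduces to counter % slave_cnt */
		idx = (st->counter / packets_per_slave) % slave_cnt;

	st->counter++;
	return idx;
}
```

With two slaves and packets_per_slave set to 3, successive calls return 0, 0, 0, 1, 1, 1, 0 and so on. With the default of 1 the sequence alternates per packet, which matches the standard round-robin behaviour the message describes.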
  2. 07 Nov, 2013 6 commits
  3. 06 Nov, 2013 10 commits
    • virtio-net: switch to use XPS to choose txq · 9bb8ca86
      Jason Wang committed
      We used to use a percpu structure vq_index to record the cpu to queue
      mapping. This is suboptimal since it duplicates the work of XPS and
      loses all other XPS functionality such as allowing user to configure
      their own transmission steering strategy.
      
      So this patch switches to XPS and suggests a default mapping when
      the number of cpus is equal to the number of queues. With XPS support,
      there's no need for keeping per-cpu vq_index and .ndo_select_queue(),
      so they were removed also.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: drop the judgement in rt6_alloc_cow() · 249a3630
      Duan Jiong committed
      Now rt6_alloc_cow() is only called by ip6_pol_route() when
      rt->rt6i_flags doesn't contain both RTF_NONEXTHOP and RTF_GATEWAY,
      and rt->rt6i_flags hasn't been changed in ip6_rt_copy().
      So there is no need to check whether rt->rt6i_flags contains
      RTF_GATEWAY or not.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix headroom calculation in udp6_ufo_fragment · 0e033e04
      Hannes Frederic Sowa committed
      Commit 1e2bd517 ("udp6: Fix udp fragmentation for tunnel traffic.")
      changed the check for whether there is enough space to include a
      fragment header in the skb from one derived from skb->mac_header to
      skb_headroom. Because we already peeled off the skb to transport_header
      this is wrong. Change this back to check if we have enough room before
      the mac_header.
      
      This fixes a panic Saran Neti reported. He used the tbf scheduler which
      skb_gso_segments the skb. The offsets get negative and we panic in memcpy
      because the skb was erroneously not expanded at the head.
      Reported-by: Saran Neti <Saran.Neti@telus.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mv643xx_eth: Add missing phy_addr_set in DT mode · 1cce16d3
      Jason Gunthorpe committed
      Commit cc9d4598 'net: mv643xx_eth: use of_phy_connect if phy_node
      present' made the call to phy_scan optional, if the DT has a link to
      the phy node.
      
      However phy_scan has the side effect of calling phy_addr_set, which
      writes the phy MDIO address to the ethernet controller. If phy_addr_set
      is not called, and the bootloader has not set the correct address then
      the driver will fail to function.
      
      Tested on Kirkwood.
      Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Tested-by: Arnaud Ebalard <arno@natisbad.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE · 482fc609
      Hannes Frederic Sowa committed
      Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
      their sockets won't accept and install new path mtu information and they
      will always use the interface mtu for outgoing packets. It is guaranteed
      that the packet is not fragmented locally. But we won't set the DF-Flag
      on the outgoing frames.
      
      Florian Weimer had the idea to use this flag to ensure DNS servers are
      never generating outgoing fragments. They may well be fragmented on the
      path, but the server never stores or uses path mtu values, which could
      well be forged in an attack.
      
      (The root of the problem with path MTU discovery is that there is
      no reliable way to authenticate ICMP Fragmentation Needed But DF Set
      messages because they are sent from intermediate routers with their
      source addresses, and the ICMP payload will not always contain sufficient
      information to identify a flow.)
      
      Recent research in the DNS community showed that it is possible to
      implement an attack where DNS cache poisoning is feasible by spoofing
      fragments. This work was done by Amir Herzberg and Haya Shulman:
      <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>
      
      This issue was previously discussed among the DNS community, e.g.
      <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
      without leading to fixes.
      
      This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
      regarding local fragmentation with UFO/CORK" for the enforcement of the
      non-fragmentable checks. If other users than ip_append_page/data should
      use this semantic too, we have to add a new flag to IPCB(skb)->flags to
      suppress local fragmentation and check for this in ip_finish_output.
      
      Many thanks to Florian Weimer for the idea and feedback while implementing
      this patch.
      
      Cc: David S. Miller <davem@davemloft.net>
      Suggested-by: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'huawei_cdc_ncm' · b9155501
      David S. Miller committed
      Bjørn Mork says:
      
      ====================
      The huawei_cdc_ncm driver.
      
      Enrico has been kind enough to let me repost his driver with the changes
      requested by Oliver Neukum during the last review of this series.
      
      The changes I have made from Enrico's original v5 series to this version
      are:
      
      v6:
       - fix to avoid corrupting drvstate->pmcount
       - fix error return value from huawei_cdc_ncm_suspend()
       - drop redundant testing for subdriver->suspend during resume
       - broke a few lines to keep within the 80 columns recommendation
       - rebased on top of current net-next
      
      Enrico's original introduction to the v5 series follows below.  It explains
      the background much better than I can.
      
      Bjørn
      
      [quote Enrico Mioso]
      
      So this is a new, revised, edition of the huawei_cdc_ncm.c driver, which
      supports devices resembling the NCM standard, but using it also as a means
      to encapsulate other protocols, as is the case for the Huawei E3131 and
      E3251 modem devices.
      Some clarifications are needed however - and I encourage discussion on this: and
      that's why I'm sending this message with a broader CC.
      Merging those patches might change:
      - the way Modem Manager interacts with those devices
      - some regressions might be possible if there are some unknown firmware
        variants around (Franko?)
      
      First of all: I observed the behaviours of two devices.
      Huawei E3131: this device doesn't accept NDIS setup requests unless they're
      sent via the embedded AT channel exposed by this driver.
      So actually we gain functionality in this case!
      
      The second case, is the Huawei E3251: which works with standard NCM driver,
      still exposing an AT embedded channel. With this patch set applied, you gain
      some functionality, losing the ability to catch standard NCM events for now.
      The device will work in both ways with no problems, but this has to be
      acknowledged and discussed. Might be we can develop this driver further to
      change this, when more devices are tested.
      
      We were thinking Huawei changed their interfaces on new devices - but probably
      this driver only works around a nice firmware bug present in E3131, which
      prevented the modem from being used in NDIS mode.
      
      I think committing this is definitely worthwhile, since it will allow for
      more Huawei devices to be used without serial connection. Some devices like the
      E3251 also report some status information only via the embedded AT channel,
      at least in my case.
      Note: I'm not subscribed to any list except the Modem Manager's one, so please
      CC me, thanks!!
      
      [/quote]
      
      Enrico Mioso (3):
        net: cdc_ncm: Export cdc_ncm_{tx,rx}_fixup functions for re-use
        net: huawei_cdc_ncm: Introduce the huawei_cdc_ncm driver
        net: cdc_ncm: remove non-standard NCM device IDs
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_ncm: remove non-standard NCM device IDs · 9fea037d
      Enrico Mioso committed
      Remove device IDs of NCM-like (but not NCM-conformant) devices, that are
      handled by the huawei_cdc_ncm driver now.
      Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: huawei_cdc_ncm: Introduce the huawei_cdc_ncm driver · 41c47d8c
      Enrico Mioso committed
      This driver supports devices using the NCM protocol as an encapsulation layer
      for other protocols, like the E3131 Huawei 3G modem. This driver's approach was
      heavily inspired by the qmi_wwan/cdc_mbim approach & code model.
      Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_ncm: Export cdc_ncm_{tx, rx}_fixup functions for re-use · 2f69702c
      Enrico Mioso committed
      Some drivers implementing NCM-like protocols may re-use those functions, as is
      the case in the huawei_cdc_ncm driver.
      Export them via EXPORT_SYMBOL_GPL, in accordance with how other functions have
      been exported.
      Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: remove old conditions on flow label sharing · b579035f
      Florent Fourcot committed
      The code of flow label in Linux Kernel follows
      the rules of RFC 1809 (an informational one) for
      conditions on flow label sharing. These rules are
      not in the last proposed standard for flow label
      (RFC 6437), or in the previous one (RFC 3697).
      
      Since this code does not follow any current or
      old standard, we can remove it.
      
      With this removal, the ipv6_opt_cmp function is
      now dead code and can be removed too.
      
      Changelog to v1:
       * add justification for the change
       * remove the condition on IPv6 options
      
      [ Remove ipv6_hdr_cmp as it is now unused as well. -DaveM ]
      Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 05 Nov, 2013 14 commits
    • Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next · cfce0a2b
      David S. Miller committed
      John W. Linville says:
      
      ====================
      Please accept the following pull request intended for the 3.13 tree...
      
      I had intended to pass most of these to you as much as two weeks ago.
      Unfortunately, I failed to account for the effects of bad Internet
      connections and my own fatigue/laziness while traveling.  On the bright
      side, at least these have been baking in linux-next for some time!
      
      For the mac80211 bits, Johannes says:
      
      "This time I have two fixes for P2P (which requires not using CCK rates)
      and a workaround for APs with broken WMM information."
      
      For the iwlwifi bits, Johannes says:
      
      "I have a few fixes for warnings/issues: one from Alex, fixing scan
      timings, one from Emmanuel fixing a WARN_ON in the DVM driver, one from
      Stanislaw removing a trigger-happy WARN_ON in the MVM driver and a
      change from myself to try to recover when the device isn't processing
      commands quickly."
      
      And:
      
      "For this round, I have a lot of changes:
       * power management improvements
       * BT coexistence improvements/updates
       * new device support
       * VHT support
       * IBSS support (though due to a small bug it requires new firmware)
       * various other fixes/improvements."
      
      For the Bluetooth bits, Gustavo says:
      
      "More patches for 3.12, busy times for Bluetooth. More than a 100 commits since
      the last pull. The bulk of work comes from Johan and Marcel, they are doing
      fixes and improvements all over the Bluetooth subsystem, as the diffstat can
      show."
      
      For the ath10k and ath6kl bits, Kalle says:
      
      "Bartosz added support to ath10k for our 10.x AP firmware branch, which
      gives us AP specific features and fixes. We still support the main
      firmware branch as well just like before, ath10k detects runtime what
      firmware is used. Unfortunately the firmware interface in 10.x branch is
      somewhat different so there was quite a lot of changes in ath10k for
      this.
      
      Michal and Sujith did some performance improvements in ath10k. Vladimir
      fixed a compiler warning and Fengguang removed an extra semicolon."
      
      For the NFC bits, Samuel says:
      
      "It's a fairly big one, with the following highlights:
      
      - NFC digital layer implementation: Most NFC chipsets implement the NFC
        digital layer in firmware, but others have more basic functionalities
        and expect the host to implement the digital layer. This layer sits
        below the NFC core.
      
      - Sony's port100 support: This is "soft" NFC USB dongle that expects the
        digital layer to be implemented on the host. This is the first user of
        our NFC digital stack implementation.
      
      - Secure element API: We now provide a netlink API for enabling,
        disabling and discovering NFC attached (embedded or UICC ones) secure
        elements. With some userspace help, this allows us to support NFC
        payments.
        Only the pn544 driver currently supports that API.
      
      - NCI SPI fixes and improvements: In order to support NCI devices over
        SPI, we fixed and improved our NCI/SPI implementation. The currently
        most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
        and we're planning to use our NCI/SPI framework to implement a
        driver for it.
      
      - pn533 fragmentation support in target mode: This was the only missing
        feature from our pn533 implementation. We now support fragmentation in
        both Tx and Rx modes, in target mode."
      
      On top of all that, brcmfmac and rt2x00 both get the usual flurry
      of updates.  A few other drivers get hit here or there as well.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: coalesce rx frags when possible during rx · ba275241
      Jason Wang committed
      Commit 2613af0e (virtio_net: migrate mergeable
      rx buffers to page frag allocators) tried to increase the payload/truesize
      ratio for MTU-sized traffic. But this introduces extra overhead for GSO
      packets received, because of the frag list. This commit reduces that
      overhead by coalescing rx frags when possible during rx. Test results show
      about a 15% improvement in full-size GSO packet receiving (even better than
      before commit 2613af0e).
      
      Before this commit:
      ./netperf -H 192.168.100.4
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
      () port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    20303.87
      
      After this commit:
      ./netperf -H 192.168.100.4
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
      () port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    23841.26
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Michael Dalton <mwdalton@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce skb_coalesce_rx_frag() · f8e617e1
      Jason Wang committed
      Sometimes we need to coalesce the rx frags to avoid a frag list. One example is
      the virtio-net driver, which tries to use small frags for both MTU-sized packets
      and GSO packets. So this patch introduces skb_coalesce_rx_frag() to do this.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Michael Dalton <mwdalton@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)) · e50fddc8
      Duan Jiong committed
      Trivial patch converting ERR_PTR(PTR_ERR()) into ERR_CAST().
      No functional changes.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
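For readers unfamiliar with the idiom, the conversion can be shown with a userspace model of the kernel's error-pointer helpers. The four helpers below are simplified re-implementations written for this sketch, not the kernel's include/linux/err.h, and `sock_model`, `dev_model`, `create_sock` and `create_dev` are hypothetical names. The idea is that an errno value is encoded in the top of the pointer range, and ERR_CAST forwards such a pointer under a different pointer type without decoding and re-encoding it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace models of the kernel helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline void *ERR_CAST(const void *ptr)
{
	return (void *)ptr;	/* same value, only the type changes */
}

struct sock_model { int unused; };
struct dev_model { struct sock_model *sock; };

static struct sock_model the_sock;

/* Returns a valid object, or an errno encoded as a pointer. */
static struct sock_model *create_sock(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &the_sock;
}

static struct dev_model *create_dev(int fail)
{
	static struct dev_model dev;
	struct sock_model *s = create_sock(fail);

	if (IS_ERR(s))
		return ERR_CAST(s);	/* previously ERR_PTR(PTR_ERR(s)) */

	dev.sock = s;
	return &dev;
}
```

Calling `create_dev(1)` returns a pointer for which `IS_ERR()` is true and `PTR_ERR()` yields `-ENOMEM`, exactly as the longer ERR_PTR(PTR_ERR(...)) form would, which is why the commit can describe itself as having no functional changes.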
    • net: codel: Avoid undefined behavior from signed overflow · 1ba3aab3
      Jesper Dangaard Brouer committed
      As described in commit 5a581b36 (jiffies: Avoid undefined
      behavior from signed overflow), according to the C standard
      3.4.3p3, overflow of a signed integer results in undefined
      behavior.
      
      To fix this, do as the above commit does: perform an unsigned
      subtraction and interpret the result as a signed
      two's-complement number.  This is based on the theory from
      RFC 1982 and is nicely described in wikipedia here:
       https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution
      
      As a side note, I have seen practical issues with the previous logic
      when dealing with 16-bit values on a 64-bit machine (gcc version
      4.4.5). The case here is 32-bit, with which I have not observed issues.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
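The technique the message describes, an unsigned subtraction whose result is then read as a signed two's-complement value, can be sketched as follows. The helper names `ts_after`, `ts_before` and `ts16_after` are hypothetical and are not the macros in the codel header. Note that converting an out-of-range unsigned value to a signed type is implementation-defined in C rather than undefined, and the sketch assumes the usual two's-complement wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Serial-number comparison on 32-bit timestamps (cf. RFC 1982).
 * The subtraction is done on unsigned operands, where wraparound is
 * well defined; only the result is reinterpreted as signed.
 */
static inline bool ts_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

static inline bool ts_before(uint32_t a, uint32_t b)
{
	return ts_after(b, a);
}

/*
 * 16-bit variant. Both operands are promoted to int before the
 * subtraction, so the difference must be truncated back to 16 bits
 * before it is interpreted as signed.
 */
static inline bool ts16_after(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(a - b) > 0;
}
```

For example, `ts_after(5, 0xfffffff0u)` is true: the counter wrapped, and 5 is 21 ticks later than 0xfffffff0. A plain `a > b` comparison would get this wrong, and a signed subtraction that overflows would be undefined behaviour.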
    • Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next · 13521a57
      David S. Miller committed
      Marc Kleine-Budde says:
      
      ====================
      here's a pull request for net-next.
      
      It includes a patch by Oliver Hartkopp et al. that adds documentation
      for the broadcast manager to Documentation/networking/can.txt. Three
      patches by me that clean up the netlink handling code in the CAN core.
      And another patch that removes a not needed function from the ti_hecc
      driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: properly handle stretch acks in slow start · 9f9843a7
      Yuchung Cheng committed
      Slow start currently increases cwnd by 1 if an ACK acknowledges some packets,
      regardless of the number of packets. Consequently slow start performance
      is highly dependent on the degree of the stretch ACKs caused by
      receiver or network ACK compression mechanisms (e.g., delayed-ACK,
      GRO, etc).  But the slow start algorithm is meant to send twice as many
      packets as have left the network, so it should process a stretch ACK of
      degree N as if it were N ACKs of degree 1, then exit when cwnd exceeds
      ssthresh. A follow up patch will use the remainder of the N (if greater
      than 1) to adjust cwnd in the congestion avoidance phase.
      
      In addition this patch retires the experimental limited slow start
      (LSS) feature. LSS has multiple drawbacks but questionable benefit. The
      fractional cwnd increase in LSS requires a loop in slow start even
      though it's rarely used. Configuring such an increase step via a global
      sysctl on different BDPs seems hard. Finally and most importantly the
      slow start overshoot concern is now better covered by the Hybrid slow
      start (hystart) enabled by default.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
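The stretch-ACK handling described above can be modelled in a few lines. This is a simplified sketch and not the kernel's slow start function: `cc_model` and `slow_start` are hypothetical names, and returning the unused part of the ACK is done here purely for illustration, since the message assigns the use of that remainder to a follow-up patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical congestion-control state, counted in packets. */
struct cc_model {
	uint32_t cwnd;		/* congestion window */
	uint32_t ssthresh;	/* slow start threshold */
	uint32_t cwnd_clamp;	/* upper bound on cwnd */
};

/*
 * Process one ACK that newly acknowledges `acked` packets as if it were
 * `acked` separate ACKs: cwnd grows by one per packet and growth stops
 * once cwnd has just exceeded ssthresh. Returns the number of
 * acknowledged packets that slow start did not consume.
 */
static uint32_t slow_start(struct cc_model *cc, uint32_t acked)
{
	uint32_t cwnd;

	assert(cc->cwnd <= cc->ssthresh);	/* only valid in slow start */

	cwnd = cc->cwnd + acked;
	if (cwnd > cc->ssthresh)
		cwnd = cc->ssthresh + 1;	/* exit point of slow start */

	acked -= cwnd - cc->cwnd;
	cc->cwnd = cwnd < cc->cwnd_clamp ? cwnd : cc->cwnd_clamp;
	return acked;
}
```

Starting from cwnd 10 and ssthresh 20, an ACK covering 4 packets moves cwnd to 14 with nothing left over. A following ACK covering 10 packets moves cwnd to 21 and leaves 3 packets unconsumed, whereas the older per-ACK rule would have added only 1 in each case.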
    • tcp: enable sockets to use MSG_FASTOPEN by default · 0d41cca4
      Yuchung Cheng committed
      Applications have started to use Fast Open (e.g., Chrome browser has
      such an optional flag) and the feature has gone through several
      generations of kernels since 3.7 with many real network tests. It's
      time to enable this flag by default for applications to test more
      conveniently and extensively.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables · f8785c55
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      This batch contains five nf_tables patches for your net-next tree,
      they are:
      
      * Fix possible use after free in the module removal path of the
        x_tables compatibility layer, from Dan Carpenter.
      
      * Add filter chain type for the bridge family, from myself.
      
      * Fix Kconfig dependencies of the nf_tables bridge family with
        the core, from myself.
      
      * Fix sparse warnings in nft_nat, from Tomasz Bursztyka.
      
      * Remove duplicated include in the IPv4 family support for nf_tables,
        from Wei Yongjun.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8785c55
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 72c39a0a
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      This is another batch containing Netfilter/IPVS updates for your net-next
      tree, they are:
      
      * Six patches to make the ipt_CLUSTERIP target support network
        namespaces, from Gao feng.
      
      * Two cleanups for the nf_conntrack_acct infrastructure, introducing
        a new structure to encapsulate conntrack counters, from Holger
        Eitzenberger.
      
      * Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann.
      
      * Skip checksum recalculation in SCTP support for IPVS, also from
        Daniel Borkmann.
      
      * Fix behavioural change in xt_socket after IP early demux, from
        Florian Westphal.
      
      * Fix bogus large memory allocation in the bitmap port set type in ipset,
        from Jozsef Kadlecsik.
      
      * Fix possible compilation issues in the hash netnet set type in ipset,
        also from Jozsef Kadlecsik.
      
      * Define constants to identify netlink callback data in ipset dumps,
        again from Jozsef Kadlecsik.
      
      * Use sock_gen_put() in xt_socket to replace xt_socket_put_sk,
        from Eric Dumazet.
      
      * Improvements for the SH scheduler in IPVS, from Alexander Frolkin.
      
      * Remove extra delay due to unneeded rcu barrier in IPVS net namespace
        cleanup path, from Julian Anastasov.
      
      * Save some cycles in ip6t_REJECT by skipping checksum validation in
        packets leaving from our stack, from Stanislav Fomichev.
      
      * Fix IPVS_CMD_ATTR_MAX definition in IPVS, which was larger than
        required, from Julian Anastasov.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72c39a0a
    • D
      netfilter: nft_compat: use _safe version of list_for_each · c359c415
      Dan Carpenter committed
      We need to use the _safe version of list_for_each_entry() here otherwise
      we have a use after free bug.
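
The pattern behind the fix can be shown with a small user-space list. This is an illustrative sketch with hypothetical names (`struct node`, `push()`, `free_all()`), not the nft_compat code: the successor is saved before the current entry is freed, which is what the _safe iterator variants do. Advancing with `pos = pos->next` after `free(pos)` would read freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked list node. */
struct node {
    int id;
    struct node *next;
};

/* Prepend a node and return the new head. */
static struct node *push(struct node *head, int id)
{
    struct node *n = malloc(sizeof(*n));

    assert(n);
    n->id = id;
    n->next = head;
    return n;
}

/* Free every node in the list and return how many were freed. */
static int free_all(struct node *head)
{
    struct node *pos, *tmp;
    int freed = 0;

    for (pos = head; pos != NULL; pos = tmp) {
        tmp = pos->next; /* save the successor before pos goes away */
        free(pos);
        freed++;
    }
    return freed;
}
```
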
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      c359c415
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · 6fcf018a
      David S. Miller committed
      Jesse Gross says:
      
      ====================
      Open vSwitch
      
      A set of updates for net-next/3.13. Major changes are:
       * Restructure flow handling code to be more logically organized and
         easier to read.
       * Rehashing of the flow table is moved from a workqueue to flow
         installation time. Before, heavy load could block the workqueue for
         excessive periods of time.
       * Additional debugging information is provided to help diagnose megaflows.
       * It's now possible to match on TCP flags.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fcf018a
    • D
      Merge branch 'mlx4' · 5a6e55c4
      David S. Miller committed
      Or Gerlitz says:
      
      ====================
      Mellanox driver updates
      
      This patch set from Jack Morgenstein does the following:
      
      1. Fix MAC/VLAN SRIOV implementation, and add wrapper functions for VLAN allocation
         and de-allocation (patches 1-6).
      
      2. Implement resource quotas when running under SRIOV (patches 7-10).
         Patch 7 is a small bug fix, and patches 8-10 implement the quotas.
      
      Quotas are implemented per resource type for VFs and the PF, to prevent
      any entity from simply grabbing all the resources for itself and leaving
      the other entities unable to obtain such resources.
      
      The series is against net-next commit ba486502 "ipv6: remove the unnecessary statement in find_match()"
      
      changes from V0:
       - dropped the 1st patch which needs to go to -stable and hence through net,
         not net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a6e55c4
    • J
      net/mlx4_core: Implement resource quota enforcement · 146f3ef4
      Jack Morgenstein committed
      Implements resource quota grant decision when resources are requested,
      for the following resources:  QPs, CQs, SRQs, MPTs, MTTs, vlans, MACs,
      and Counters.
      
      When granting a resource, the quota system increases the allocated-count
      for that slave.
      
      When the slave later frees the resource, its allocated-count is reduced.
      
      A spinlock is used to protect the integrity of each resource's free-pool counter.
      (One slave may be in the process of being granted a resource while another
      slave has crashed, initiating cleanup of that slave's resource quotas).
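
The grant and free accounting described above might look roughly like this in user space. Everything here is an assumption made for illustration: the names (`struct res_quota`, `quota_grant()`, `quota_release()`), the fixed slave count, the single per-slave quota value, and the use of a pthread mutex in place of a kernel spinlock. The driver's actual policy is not reproduced.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_SLAVES 4 /* hypothetical; chosen only for this sketch */

/* Accounting for one resource type (e.g. QPs). */
struct res_quota {
    pthread_mutex_t lock;      /* protects free_pool and allocated[] */
    int free_pool;             /* resources not yet handed out */
    int quota[MAX_SLAVES];     /* per-slave upper limit */
    int allocated[MAX_SLAVES]; /* per-slave allocated-count */
};

static void quota_init(struct res_quota *q, int free_pool, int per_slave)
{
    pthread_mutex_init(&q->lock, NULL);
    q->free_pool = free_pool;
    for (int i = 0; i < MAX_SLAVES; i++) {
        q->quota[i] = per_slave;
        q->allocated[i] = 0;
    }
}

/* Grant `count` resources to `slave` if its quota and the pool allow it. */
static bool quota_grant(struct res_quota *q, int slave, int count)
{
    bool ok = false;

    if (slave < 0 || slave >= MAX_SLAVES || count <= 0)
        return false;

    pthread_mutex_lock(&q->lock);
    if (q->allocated[slave] + count <= q->quota[slave] &&
        count <= q->free_pool) {
        q->allocated[slave] += count; /* raise the slave's allocated-count */
        q->free_pool -= count;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Return up to `count` resources from `slave` to the free pool. */
static void quota_release(struct res_quota *q, int slave, int count)
{
    if (slave < 0 || slave >= MAX_SLAVES || count <= 0)
        return;

    pthread_mutex_lock(&q->lock);
    if (count > q->allocated[slave])
        count = q->allocated[slave]; /* never release more than was held */
    q->allocated[slave] -= count;
    q->free_pool += count;
    pthread_mutex_unlock(&q->lock);
}
```

Holding the lock across both the check and the update is the point of the design: a grant for one slave and the cleanup of another cannot interleave and corrupt the free-pool counter.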
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      146f3ef4