1. 09 7月, 2014 4 次提交
  2. 08 7月, 2014 10 次提交
    • Y
      tcp: fix false undo corner cases · 6e08d5e3
      Yuchung Cheng 提交于
      The undo code assumes that, upon entering loss recovery, TCP
      1) always retransmit something
      2) the retransmission never fails locally (e.g., qdisc drop)
      
      so undo_marker is set in tcp_enter_recovery() and undo_retrans is
      incremented only when tcp_retransmit_skb() is successful.
      
      When the assumption is broken because TCP's cwnd is too small to
      retransmit or the retransmit fails locally. The next (DUP)ACK
      would incorrectly revert the cwnd and the congestion state in
      tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs
      may enter the recovery state. The sender repeatedly enter and
      (incorrectly) exit recovery states if the retransmits continue to
      fail locally while receiving (DUP)ACKs.
      
      The fix is to initialize undo_retrans to -1 and start counting on
      the first retransmission. Always increment undo_retrans even if the
      retransmissions fail locally because they couldn't cause DSACKs to
      undo the cwnd reduction.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e08d5e3
    • O
      net/mlx4_en: Don't configure the HW vxlan parser when vxlan offloading isn't set · e326f2f1
      Or Gerlitz 提交于
      The add_vxlan_port ndo driver code was wrongly testing whether HW vxlan offloads
      are supported by the device instead of checking if they are currently enabled.
      
      This causes the driver to configure the HW parser to conduct matching for vxlan
      packets but since no steering rules were set, vxlan packets are dropped on RX.
      
      Fix that by doing the right test, as done in the del_vxlan_port ndo handler.
      
      Fixes: 1b136de1 ('net/mlx4: Implement vxlan ndo calls')
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e326f2f1
    • D
      igmp: fix the problem when mc leave group · 52ad353a
      dingtianhong 提交于
      The problem was triggered by these steps:
      
      1) create socket, bind and then setsockopt for add mc group.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
         setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
      
      2) drop the mc group for this socket.
         mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
         mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
         setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));
      
      3) and then drop the socket, I found the mc group was still used by the dev:
      
         netstat -g
      
         Interface       RefCnt Group
         --------------- ------ ---------------------
         eth2		   1	  255.0.0.37
      
      Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
      to be released for the netdev when drop the socket, but this process was broken when
      route default is NULL, the reason is that:
      
      The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
      is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
      then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
      the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
      released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
      when dropping the socket, the mc_list is NULL and the dev still keep this group.
      
      v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
      	so I add the checking for the in_dev before polling the mc_list, make sure when
      	we remove the mc group, dec the refcnt to the real dev which was using the mc address.
      	The problem would never happened again.
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52ad353a
    • L
      net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush · 54951194
      Loic Prylli 提交于
      A bug was introduced in NETDEV_CHANGE notifier sequence causing the
      arp table to be sometimes spuriously cleared (including manual arp
      entries marked permanent), upon network link carrier changes.
      
      The changed argument for the notifier was applied only to a single
      caller of NETDEV_CHANGE, missing among others netdev_state_change().
      So upon net_carrier events induced by the network, which are
      triggering a call to netdev_state_change(), arp_netdev_event() would
      decide whether to clear or not arp cache based on random/junk stack
      values (a kind of read buffer overflow).
      
      Fixes: be9efd36 ("net: pass changed flags along with NETDEV_CHANGE event")
      Fixes: 6c8b4e3f ("arp: flush arp cache on IFF_NOARP change")
      Signed-off-by: NLoic Prylli <loicp@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54951194
    • B
      net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2 · 8dcb4b15
      Bernd Wachter 提交于
      There's a new version of the Telewell 4G modem working with, but not
      recognized by this driver.
      Signed-off-by: NBernd Wachter <bernd.wachter@jolla.com>
      Acked-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dcb4b15
    • D
      Revert "net: stmmac: add platform init/exit for Altera's ARM socfpga" · 26a9ebca
      David S. Miller 提交于
      This reverts commit 0acf1676.
      
      Breaks the build due to missing reference to phy_resume in
      the resulting dwmac-socfpga.o object.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26a9ebca
    • Z
      powerpc/ucc_geth: deal with a compile warning · 8844a006
      Zhao Qiang 提交于
      deal with a compile warning: comparison between
      'enum qe_fltr_largest_external_tbl_lookup_key_size'
      and 'enum qe_fltr_tbl_lookup_key_size'
      
      the code:
      	"if (ug_info->largestexternallookupkeysize ==
      	     QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES)"
      is warned because different enum, so modify it.
      
      	"enum qe_fltr_largest_external_tbl_lookup_key_size
      	             largestexternallookupkeysize;
      
      	enum qe_fltr_tbl_lookup_key_size {
      		 QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES
      			 = 0x3f,         /* LookupKey parsed by the Generate LookupKey
      					    CMD is truncated to 8 bytes */
      		 QE_FLTR_TABLE_LOOKUP_KEY_SIZE_16_BYTES
      			 = 0x5f,         /* LookupKey parsed by the Generate LookupKey
      					    CMD is truncated to 16 bytes */
      	 };
      
      	 /* QE FLTR extended filtering Largest External Table Lookup Key Size */
      	 enum qe_fltr_largest_external_tbl_lookup_key_size {
      		 QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_NONE
      			 = 0x0,/* not used */
      		 QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_8_BYTES
      			 = QE_FLTR_TABLE_LOOKUP_KEY_SIZE_8_BYTES,        /* 8 bytes */
      		 QE_FLTR_LARGEST_EXTERNAL_TABLE_LOOKUP_KEY_SIZE_16_BYTES
      			 = QE_FLTR_TABLE_LOOKUP_KEY_SIZE_16_BYTES,       /* 16 bytes */
      	 };"
      Signed-off-by: NZhao Qiang <B45475@freescale.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8844a006
    • D
      Merge branch 'net_ovs_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch · edc1bb0b
      David S. Miller 提交于
      Pravin B Shelar says:
      
      ====================
      Open vSwitch
      
      A set of fixes for net.
      First bug is related flow-table management.  Second one is in sample
      action. Third is related flow stats and last one add gre-err handler for ovs.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      edc1bb0b
    • T
      net: Performance fix for process_backlog · 11ef7a89
      Tom Herbert 提交于
      In process_backlog the input_pkt_queue is only checked once for new
      packets and quota is artificially reduced to reflect precisely the
      number of packets on the input_pkt_queue so that the loop exits
      appropriately.
      
      This patches changes the behavior to be more straightforward and
      less convoluted. Packets are processed until either the quota
      is met or there are no more packets to process.
      
      This patch seems to provide a small, but noticeable performance
      improvement. The performance improvement is a result of staying
      in the process_backlog loop longer which can reduce number of IPI's.
      
      Performance data using super_netperf TCP_RR with 200 flows:
      
      Before fix:
      
      88.06% CPU utilization
      125/190/309 90/95/99% latencies
      1.46808e+06 tps
      1145382 intrs.sec.
      
      With fix:
      
      87.73% CPU utilization
      122/183/296 90/95/99% latencies
      1.4921e+06 tps
      1021674.30 intrs./sec.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11ef7a89
    • E
      ipv4: icmp: Fix pMTU handling for rare case · 68b7107b
      Edward Allcutt 提交于
      Some older router implementations still send Fragmentation Needed
      errors with the Next-Hop MTU field set to zero. This is explicitly
      described as an eventuality that hosts must deal with by the
      standard (RFC 1191) since older standards specified that those
      bits must be zero.
      
      Linux had a generic (for all of IPv4) implementation of the algorithm
      described in the RFC for searching a list of MTU plateaus for a good
      value. Commit 46517008 ("ipv4: Kill ip_rt_frag_needed().")
      removed this as part of the changes to remove the routing cache.
      Subsequently any Fragmentation Needed packet with a zero Next-Hop
      MTU has been discarded without being passed to the per-protocol
      handlers or notifying userspace for raw sockets.
      
      When there is a router which does not implement RFC 1191 on an
      MTU limited path then this results in stalled connections since
      large packets are discarded and the local protocols are not
      notified so they never attempt to lower the pMTU.
      
      One example I have seen is an OpenBSD router terminating IPSec
      tunnels. It's worth pointing out that this case is distinct from
      the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU
      since the commit in question dismissed that as a valid concern.
      
      All of the per-protocols handlers implement the simple approach from
      RFC 1191 of immediately falling back to the minimum value. Although
      this is sub-optimal it is vastly preferable to connections hanging
      indefinitely.
      
      Remove the Next-Hop MTU != 0 check and allow such packets
      to follow the normal path.
      
      Fixes: 46517008 ("ipv4: Kill ip_rt_frag_needed().")
      Signed-off-by: NEdward Allcutt <edward.allcutt@openmarket.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68b7107b
  3. 03 7月, 2014 14 次提交
    • D
      Merge branch 'stmmac' · a921e2a3
      David S. Miller 提交于
      Vince Bridgers says:
      
      ====================
      net: stmmac: Correct socfpga init/exit and
      
      This patch series adds platform specific init/exit code so that socfpga
      suspend/resume works as expected, and corrects a minor issue detected by
      cppcheck.
      
      V2: Address review comments by adding a line break at end of function and
          structure declaration. Add another trivial cppcheck patch.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a921e2a3
    • V
      net: stmmac: Remove unneeded I/O read caught by cppcheck · c8df8ce3
      Vince Bridgers 提交于
      Cppcheck found a case where a local variable was being assigned a value,
      but not used. There seems to be no reason to read this register before
      assigning a new value, so addressing thie issue.
      
      cppcheck --force --enable=all --inline-suppr . shows ...
      
      Variable 'value' is reassigned a value before the old one has been used.
      Signed-off-by: NVince Bridgers <vbridgers2013@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8df8ce3
    • V
      net: stmmac: Correct duplicate if/then/else case found by cppcheck · 43d24e48
      Vince Bridgers 提交于
      Cppcheck found a duplicate if/then/else case where a receive descriptor
      was being processed. This patch corrects that issue.
      
      cppcheck --force --enable=all --inline-suppr .
      ...
      Checking enh_desc.c...
      [enh_desc.c:148] -> [enh_desc.c:144]: (style) Found duplicate if expressions.
      ...
      Signed-off-by: NVince Bridgers <vbridgers2013@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43d24e48
    • V
      net: stmmac: add platform init/exit for Altera's ARM socfpga · 0acf1676
      Vince Bridgers 提交于
      This patch adds platform init/exit functions and modifications to support
      suspend/resume for the Altera Cyclone 5 SOC Ethernet controller. The platform
      exit function puts the controller into reset using the socfpga reset
      controller driver. The platform init function sets up the Synopsys mac by
      first making sure the Ethernet controller is held in reset, programming the
      phy mode through external support logic, then deasserts reset through
      the socfpga reset manager driver.
      Signed-off-by: NVince Bridgers <vbridgers2013@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0acf1676
    • A
      ieee802154: reassembly: fix possible buffer overflow · 48bc0343
      Alexander Aring 提交于
      The max_dsize attribute in ctl_table for lowpan_frags_ns_ctl_table is
      configured with integer accessing methods. This patch change the
      max_dsize attribute to int to avoid a possible buffer overflow.
      Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48bc0343
    • D
      Merge branch 'mlx4' · 98846e69
      David S. Miller 提交于
      Amir Vadai says:
      
      ====================
      Mellanox EN driver fixes 2014-06-23
      
      Below are some fixes to patches submitted to 3.16.
      
      First patch is according to discussions with Ben [1] and Thomas [2] - to do not
      use affinity notifier, since it breaks RFS. Instead detect changes in IRQ
      affinity map, by checking if current CPU is set in affinity map on NAPI poll.
      The two other patches fix some bugs introduced in commit [3].
      
      Patches were applied and tested over commit dba63115: ('powerpc: bpf: Fix the
      broken LD_VLAN_TAG_PRESENT test')
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98846e69
    • A
      net/mlx4_en: IRQ affinity hint is not cleared on port down · bb273617
      Amir Vadai 提交于
      Need to remove affinity hint at mlx4_en_deactivate_cq() and not at
      mlx4_en_destroy_cq() - since affinity_mask might be free'd while still
      being used by procfs.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb273617
    • A
      lib/cpumask: cpumask_set_cpu_local_first to use all cores when numa node is not defined · 143b5ba2
      Amir Vadai 提交于
      When device is non numa aware (numa_node == -1), use all online cpu's.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      143b5ba2
    • A
      net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map · 35f6f453
      Amir Vadai 提交于
      IRQ affinity notifier can only have a single notifier - cpu_rmap
      notifier. Can't use it to track changes in IRQ affinity map.
      Detect IRQ affinity changes by comparing CPU to current IRQ affinity map
      during NAPI poll thread.
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ben Hutchings <ben@decadent.org.uk>
      Fixes: 2eacc23c ("net/mlx4_core: Enforce irq affinity changes immediatly")
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35f6f453
    • M
      defxx: Fix !DYNAMIC_BUFFERS compilation warnings · 1b037474
      Maciej W. Rozycki 提交于
      This fixes compilation warnings:
      
      drivers/net/fddi/defxx.c:294: warning: 'dfx_rcv_flush' declared inline after being called
      drivers/net/fddi/defxx.c:294: warning: previous declaration of 'dfx_rcv_flush' was here
      drivers/net/fddi/defxx.c:2854: warning: 'my_skb_align' defined but not used
      
      triggered when the driver is built with DYNAMIC_BUFFERS undefined.  Code
      tested to work just fine with these changes and a few DEFPA and DEFTA
      boards.
      Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b037474
    • M
      defxx: Remove an incorrectly inverted preprocessor conditional · f46d53d0
      Maciej W. Rozycki 提交于
      The RX handler of the driver has two paths switched between, depending on
      the size of the frame received, as determined by SKBUFF_RX_COPYBREAK.
      
      When a small frame is received, a new skb allocated has data space large
      enough to hold the incoming frame only, and data is copied there from the
      original skb whose buffer is returned to the DMA RX ring; in that case
      `rx_in_place' is 0.  When a large frame is received, a new skb allocated
      has data space large enough to hold the largest frame possible, including
      the overhead for alignment, the receive status and padding, over 4.5kiB
      overall, and its buffer is placed on the DMA RX ring while the original
      buffer is passed up to the network stack avoiding the need to copy data;
      in that case `rx_in_place' is 1.
      
      However the latter scenario is only possible when dynamic buffers are
      used, as determined by DYNAMIC_BUFFERS, because otherwise the buffers used
      for the DMA RX ring are fixed at the time the interface is brought up.
      
      That leads to an observation that the preprocessor conditional around the
      `rx_in_place' check is inverted, the check only really matters when
      dynamic buffers are in use.  It has gone unnoticed for many years since
      support for using dynamic buffers on the DMA RX ring was introduced in
      2.1.40 -- because the only problem that results is in the case where
      `rx_in_place' is 1 frame data received is unnecessarily copied to the
      newly-allocated buffer, before the buffer placed on the the DMA receive RX
      and its contents ignored.  Therefore the only symptom is some performance
      loss.
      
      Rather than flipping the condition though I decided to discard the
      conditional altogether -- in the case of static buffers `rx_in_place' is
      always 0 so GCC will optimise the C conditional away instead.
      
      Tested on a few DEFPA and DEFTA boards successfully using both small and
      large frames, both with DYNAMIC_BUFFERS defined and with the macro
      undefined.
      Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f46d53d0
    • C
      tcp: Fix divide by zero when pushing during tcp-repair · 5924f17a
      Christoph Paasch 提交于
      When in repair-mode and TCP_RECV_QUEUE is set, we end up calling
      tcp_push with mss_now being 0. If data is in the send-queue and
      tcp_set_skb_tso_segs gets called, we crash because it will divide by
      mss_now:
      
      [  347.151939] divide error: 0000 [#1] SMP
      [  347.152907] Modules linked in:
      [  347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4
      [  347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [  347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000
      [  347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1
      [  347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0
      [  347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000
      [  347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00
      [  347.152907]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      [  347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0
      [  347.152907] Stack:
      [  347.152907]  c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080
      [  347.152907]  f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85
      [  347.152907]  c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8
      [  347.152907] Call Trace:
      [  347.152907]  [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34
      [  347.152907]  [<c16013e6>] tcp_init_tso_segs+0x36/0x50
      [  347.152907]  [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0
      [  347.152907]  [<c10a0188>] ? up+0x28/0x40
      [  347.152907]  [<c10acc85>] ? console_unlock+0x295/0x480
      [  347.152907]  [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0
      [  347.152907]  [<c1605716>] __tcp_push_pending_frames+0x36/0xd0
      [  347.152907]  [<c15f4860>] tcp_push+0xf0/0x120
      [  347.152907]  [<c15f7641>] tcp_sendmsg+0xf1/0xbf0
      [  347.152907]  [<c116d920>] ? kmem_cache_free+0xf0/0x120
      [  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
      [  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
      [  347.152907]  [<c114f0f0>] ? do_wp_page+0x3e0/0x850
      [  347.152907]  [<c161c36a>] inet_sendmsg+0x4a/0xb0
      [  347.152907]  [<c1150269>] ? handle_mm_fault+0x709/0xfb0
      [  347.152907]  [<c15a006b>] sock_aio_write+0xbb/0xd0
      [  347.152907]  [<c1180b79>] do_sync_write+0x69/0xa0
      [  347.152907]  [<c1181023>] vfs_write+0x123/0x160
      [  347.152907]  [<c1181d55>] SyS_write+0x55/0xb0
      [  347.152907]  [<c167f0d8>] sysenter_do_call+0x12/0x28
      
      This can easily be reproduced with the following packetdrill-script (the
      "magic" with netem, sk_pacing and limit_output_bytes is done to prevent
      the kernel from pushing all segments, because hitting the limit without
      doing this is not so easy with packetdrill):
      
      0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      
      +0  bind(3, ..., ...) = 0
      +0  listen(3, 1) = 0
      
      +0  < S 0:0(0) win 32792 <mss 1460>
      +0  > S. 0:0(0) ack 1 <mss 1460>
      +0.1  < . 1:1(0) ack 1 win 65000
      
      +0  accept(3, ..., ...) = 4
      
      // This forces that not all segments of the snd-queue will be pushed
      +0 `tc qdisc add dev tun0 root netem delay 10ms`
      +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2`
      +0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0
      
      +0 write(4,...,10000) = 10000
      +0 write(4,...,10000) = 10000
      
      // Set tcp-repair stuff, particularly TCP_RECV_QUEUE
      +0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0
      +0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0
      
      // This now will make the write push the remaining segments
      +0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0
      +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000`
      
      // Now we will crash
      +0 write(4,...,1000) = 1000
      
      This happens since ec342325 (tcp: fix retransmission in repair
      mode). Prior to that, the call to tcp_push was prevented by a check for
      tp->repair.
      
      The patch fixes it, by adding the new goto-label out_nopush. When exiting
      tcp_sendmsg and a push is not required, which is the case for tp->repair,
      we go to this label.
      
      When repairing and calling send() with TCP_RECV_QUEUE, the data is
      actually put in the receive-queue. So, no push is required because no
      data has been added to the send-queue.
      
      Cc: Andrew Vagin <avagin@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Fixes: ec342325 (tcp: fix retransmission in repair mode)
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Acked-by: NAndrew Vagin <avagin@openvz.org>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5924f17a
    • E
      net: fix sparse warning in sk_dst_set() · 5925a055
      Eric Dumazet 提交于
      sk_dst_cache has __rcu annotation, so we need a cast to avoid
      following sparse error :
      
      include/net/sock.h:1774:19: warning: incorrect type in initializer (different address spaces)
      include/net/sock.h:1774:19:    expected struct dst_entry [noderef] <asn:4>*__ret
      include/net/sock.h:1774:19:    got struct dst_entry *dst
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Fixes: 7f502361 ("ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5925a055
    • E
      vlan: free percpu stats in device destructor · a48e5faf
      Eric Dumazet 提交于
      Madalin-Cristian reported crashs happening after a recent commit
      (5a4ae5f6 "vlan: unnecessary to check if vlan_pcpu_stats is NULL")
      
      -----------------------------------------------------------------------
      root@p5040ds:~# vconfig add eth8 1
      root@p5040ds:~# vconfig rem eth8.1
      Unable to handle kernel paging request for data at address 0x2bc88028
      Faulting instruction address: 0xc058e950
      Oops: Kernel access of bad area, sig: 11 [#1]
      SMP NR_CPUS=8 CoreNet Generic
      Modules linked in:
      CPU: 3 PID: 2167 Comm: vconfig Tainted: G        W     3.16.0-rc3-00346-g65e85bf #2
      task: e7264d90 ti: e2c2c000 task.ti: e2c2c000
      NIP: c058e950 LR: c058ea30 CTR: c058e900
      REGS: e2c2db20 TRAP: 0300   Tainted: G        W      (3.16.0-rc3-00346-g65e85bf)
      MSR: 00029002 <CE,EE,ME>  CR: 48000428  XER: 20000000
      DEAR: 2bc88028 ESR: 00000000
      GPR00: c047299c e2c2dbd0 e7264d90 00000000 2bc88000 00000000 ffffffff 00000000
      GPR08: 0000000f 00000000 000000ff 00000000 28000422 10121928 10100000 10100000
      GPR16: 10100000 00000000 c07c5968 00000000 00000000 00000000 e2c2dc48 e7838000
      GPR24: c07c5bac c07c58a8 e77290cc c07b0000 00000000 c05de6c0 e7838000 e2c2dc48
      NIP [c058e950] vlan_dev_get_stats64+0x50/0x170
      LR [c058ea30] vlan_dev_get_stats64+0x130/0x170
      Call Trace:
      [e2c2dbd0] [ffffffea] 0xffffffea (unreliable)
      [e2c2dc20] [c047299c] dev_get_stats+0x4c/0x140
      [e2c2dc40] [c0488ca8] rtnl_fill_ifinfo+0x3d8/0x960
      [e2c2dd70] [c0489f4c] rtmsg_ifinfo+0x6c/0x110
      [e2c2dd90] [c04731d4] rollback_registered_many+0x344/0x3b0
      [e2c2ddd0] [c047332c] rollback_registered+0x2c/0x50
      [e2c2ddf0] [c0476058] unregister_netdevice_queue+0x78/0xf0
      [e2c2de00] [c058d800] unregister_vlan_dev+0xc0/0x160
      [e2c2de20] [c058e360] vlan_ioctl_handler+0x1c0/0x550
      [e2c2de90] [c045d11c] sock_ioctl+0x28c/0x2f0
      [e2c2deb0] [c010d070] do_vfs_ioctl+0x90/0x7b0
      [e2c2df20] [c010d7d0] SyS_ioctl+0x40/0x80
      [e2c2df40] [c000f924] ret_from_syscall+0x0/0x3c
      
      Fix this problem by freeing percpu stats from dev->destructor() instead
      of ndo_uninit()
      Reported-by: NMadalin-Cristian Bucur <madalin.bucur@freescale.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NMadalin-Cristian Bucur <madalin.bucur@freescale.com>
      Fixes: 5a4ae5f6 ("vlan: unnecessary to check if vlan_pcpu_stats is NULL")
      Cc: Li RongQing <roy.qing.li@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a48e5faf
  4. 02 7月, 2014 9 次提交
  5. 01 7月, 2014 2 次提交
    • E
      ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix · 7f502361
      Eric Dumazet 提交于
      We have two different ways to handle changes to sk->sk_dst
      
      First way (used by TCP) assumes socket lock is owned by caller, and use
      no extra lock : __sk_dst_set() & __sk_dst_reset()
      
      Another way (used by UDP) uses sk_dst_lock because socket lock is not
      always taken. Note that sk_dst_lock is not softirq safe.
      
      These ways are not inter changeable for a given socket type.
      
      ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
      the socket lock as synchronization, but users might be UDP sockets.
      
      Instead of converting sk_dst_lock to a softirq safe version, use xchg()
      as we did for sk_rx_dst in commit e47eb5df ("udp: ipv4: do not use
      sk_dst_lock from softirq context")
      
      In a follow up patch, we probably can remove sk_dst_lock, as it is
      only used in IPv6.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Fixes: 9cb3a50c ("ipv4: Invalidate the socket cached route on pmtu events if possible")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f502361
    • A
      openvswitch: Use exact lookup for flow_get and flow_del. · 4a46b24e
      Alex Wang 提交于
      Due to the race condition in userspace, there is chance that two
      overlapping megaflows could be installed in datapath.  And this
      causes userspace unable to delete the less inclusive megaflow flow
      even after it timeout, since the flow_del logic will stop at the
      first match of masked flow.
      
      This commit fixes the bug by making the kernel flow_del and flow_get
      logic check all masks in that case.
      
      Introduced by 03f0d916 (openvswitch: Mega flow implementation).
      Signed-off-by: NAlex Wang <alexw@nicira.com>
      Acked-by: NAndy Zhou <azhou@nicira.com>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      4a46b24e
  6. 30 6月, 2014 1 次提交