1. 11 1月, 2017 1 次提交
  2. 09 1月, 2017 1 次提交
    • J
      mac80211: implement multicast forwarding on fast-RX path · eeb0d56f
      Johannes Berg 提交于
      In AP (or VLAN) mode, when unicast 802.11 packets are received,
      they might actually be multicast after conversion. In this case
      the fast-RX path didn't handle them properly to send them back
      to the wireless medium. Implement that by copying the SKB and
      sending it back out.
      
      The possible alternative would be to just punt the packet back
      to the regular (slow) RX path, but since we have almost all of
      the required code here already it's not so complicated to add
      here. Punting it back would also mean acquiring the spinlock,
      which would be bad for the stated purpose of the fast-RX path,
      to enable well-performing parallel RX.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      eeb0d56f
  3. 07 1月, 2017 2 次提交
  4. 05 1月, 2017 1 次提交
    • J
      nl80211: fix sched scan netlink socket owner destruction · 753aacfd
      Johannes Berg 提交于
      A single netlink socket might own multiple interfaces *and* a
      scheduled scan request (which might belong to another interface),
      so when it goes away both may need to be destroyed.
      
      Remove the schedule_scan_stop indirection to fix this - it's only
      needed for interface destruction because of the way this works
      right now, with a single work taking care of all interfaces.
      
      Cc: stable@vger.kernel.org
      Fixes: 93a1e86c ("nl80211: Stop scheduled scan if netlink client disappears")
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      753aacfd
  5. 04 1月, 2017 1 次提交
  6. 03 1月, 2017 3 次提交
    • A
      ipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules · 5350d54f
      Alexander Duyck 提交于
      In the case of custom rules being present we need to handle the case of the
      LOCAL table being intialized after the new rule has been added.  To address
      that I am adding a new check so that we can make certain we don't use an
      alias of MAIN for LOCAL when allocating a new table.
      
      Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
      Reported-by: NOliver Brunel <jjk@jjacky.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5350d54f
    • M
      igmp: Make igmp group member RFC 3376 compliant · 7ababb78
      Michal Tesar 提交于
      5.2. Action on Reception of a Query
      
       When a system receives a Query, it does not respond immediately.
       Instead, it delays its response by a random amount of time, bounded
       by the Max Resp Time value derived from the Max Resp Code in the
       received Query message.  A system may receive a variety of Queries on
       different interfaces and of different kinds (e.g., General Queries,
       Group-Specific Queries, and Group-and-Source-Specific Queries), each
       of which may require its own delayed response.
      
       Before scheduling a response to a Query, the system must first
       consider previously scheduled pending responses and in many cases
       schedule a combined response.  Therefore, the system must be able to
       maintain the following state:
      
       o A timer per interface for scheduling responses to General Queries.
      
       o A per-group and interface timer for scheduling responses to Group-
         Specific and Group-and-Source-Specific Queries.
      
       o A per-group and interface list of sources to be reported in the
         response to a Group-and-Source-Specific Query.
      
       When a new Query with the Router-Alert option arrives on an
       interface, provided the system has state to report, a delay for a
       response is randomly selected in the range (0, [Max Resp Time]) where
       Max Resp Time is derived from Max Resp Code in the received Query
       message.  The following rules are then used to determine if a Report
       needs to be scheduled and the type of Report to schedule.  The rules
       are considered in order and only the first matching rule is applied.
      
       1. If there is a pending response to a previous General Query
          scheduled sooner than the selected delay, no additional response
          needs to be scheduled.
      
       2. If the received Query is a General Query, the interface timer is
          used to schedule a response to the General Query after the
          selected delay.  Any previously pending response to a General
          Query is canceled.
      --8<--
      
      Currently the timer is rearmed with new random expiration time for
      every incoming query regardless of possibly already pending report.
      Which is not aligned with the above RFE.
      It also might happen that higher rate of incoming queries can
      postpone the report after the expiration time of the first query
      causing group membership loss.
      
      Now the per interface general query timer is rearmed only
      when there is no pending report already scheduled on that interface or
      the newly selected expiration time is before the already pending
      scheduled report.
      Signed-off-by: NMichal Tesar <mtesar@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ababb78
    • I
      flow_dissector: Update pptp handling to avoid null pointer deref. · d0af6834
      Ian Kumlien 提交于
      __skb_flow_dissect can be called with a skb or a data packet, either
      can be NULL. All calls seems to have been moved to __skb_header_pointer
      except the pptp handling which is still calling skb_header_pointer.
      
      skb_header_pointer will use skb->data and thus:
      [  109.556866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
      [  109.557102] IP: [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
      [  109.557263] PGD 0
      [  109.557338]
      [  109.557484] Oops: 0000 [#1] SMP
      [  109.557562] Modules linked in: chaoskey
      [  109.557783] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0 #79
      [  109.557867] Hardware name: Supermicro A1SRM-LN7F/LN5F/A1SRM-LN7F-2758, BIOS 1.0c 11/04/2015
      [  109.557957] task: ffff94085c27bc00 task.stack: ffffb745c0068000
      [  109.558041] RIP: 0010:[<ffffffff88dc02f8>]  [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
      [  109.558203] RSP: 0018:ffff94087fc83d40  EFLAGS: 00010206
      [  109.558286] RAX: 0000000000000130 RBX: ffffffff8975bf80 RCX: ffff94084fab6800
      [  109.558373] RDX: 0000000000000010 RSI: 000000000000000c RDI: 0000000000000000
      [  109.558460] RBP: 0000000000000b88 R08: 0000000000000000 R09: 0000000000000022
      [  109.558547] R10: 0000000000000008 R11: ffff94087fc83e04 R12: 0000000000000000
      [  109.558763] R13: ffff94084fab6800 R14: ffff94087fc83e04 R15: 000000000000002f
      [  109.558979] FS:  0000000000000000(0000) GS:ffff94087fc80000(0000) knlGS:0000000000000000
      [  109.559326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  109.559539] CR2: 0000000000000080 CR3: 0000000281809000 CR4: 00000000001026e0
      [  109.559753] Stack:
      [  109.559957]  000000000000000c ffff94084fab6822 0000000000000001 ffff94085c2b5fc0
      [  109.560578]  0000000000000001 0000000000002000 0000000000000000 0000000000000000
      [  109.561200]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [  109.561820] Call Trace:
      [  109.562027]  <IRQ>
      [  109.562108]  [<ffffffff88dfb4fa>] ? eth_get_headlen+0x7a/0xf0
      [  109.562522]  [<ffffffff88c5a35a>] ? igb_poll+0x96a/0xe80
      [  109.562737]  [<ffffffff88dc912b>] ? net_rx_action+0x20b/0x350
      [  109.562953]  [<ffffffff88546d68>] ? __do_softirq+0xe8/0x280
      [  109.563169]  [<ffffffff8854704a>] ? irq_exit+0xaa/0xb0
      [  109.563382]  [<ffffffff8847229b>] ? do_IRQ+0x4b/0xc0
      [  109.563597]  [<ffffffff8902d4ff>] ? common_interrupt+0x7f/0x7f
      [  109.563810]  <EOI>
      [  109.563890]  [<ffffffff88d57530>] ? cpuidle_enter_state+0x130/0x2c0
      [  109.564304]  [<ffffffff88d57520>] ? cpuidle_enter_state+0x120/0x2c0
      [  109.564520]  [<ffffffff8857eacf>] ? cpu_startup_entry+0x19f/0x1f0
      [  109.564737]  [<ffffffff8848d55a>] ? start_secondary+0x12a/0x140
      [  109.564950] Code: 83 e2 20 a8 80 0f 84 60 01 00 00 c7 04 24 08 00
      00 00 66 85 d2 0f 84 be fe ff ff e9 69 fe ff ff 8b 34 24 89 f2 83 c2
      04 66 85 c0 <41> 8b 84 24 80 00 00 00 0f 49 d6 41 8d 31 01 d6 41 2b 84
      24 84
      [  109.569959] RIP  [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
      [  109.570245]  RSP <ffff94087fc83d40>
      [  109.570453] CR2: 0000000000000080
      
      Fixes: ab10dccb ("rps: Inspect PPTP encapsulated by GRE to get flow hash")
      Signed-off-by: NIan Kumlien <ian.kumlien@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0af6834
  7. 02 1月, 2017 5 次提交
  8. 31 12月, 2016 1 次提交
  9. 30 12月, 2016 4 次提交
    • D
      net: ipv4: dst for local input routes should use l3mdev if relevant · f5a0aab8
      David Ahern 提交于
      IPv4 output routes already use l3mdev device instead of loopback for dst's
      if it is applicable. Change local input routes to do the same.
      
      This fixes icmp responses for unreachable UDP ports which are directed
      to the wrong table after commit 9d1a6c4e because local_input
      routes use the loopback device. Moving from ingress device to loopback
      loses the L3 domain causing responses based on the dst to get to lost.
      
      Fixes: 9d1a6c4e ("net: icmp_route_lookup should use rt dev to
      		       determine L3 domain")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5a0aab8
    • W
      net: fix incorrect original ingress device index in PKTINFO · f0c16ba8
      Wei Zhang 提交于
      When we send a packet for our own local address on a non-loopback
      interface (e.g. eth0), due to the change had been introduced from
      commit 0b922b7a ("net: original ingress device index in PKTINFO"), the
      original ingress device index would be set as the loopback interface.
      However, the packet should be considered as if it is being arrived via the
      sending interface (eth0), otherwise it would break the expectation of the
      userspace application (e.g. the DHCPRELEASE message from dhcp_release
      binary would be ignored by the dnsmasq daemon, since it come from lo which
      is not the interface dnsmasq bind to)
      
      Fixes: 0b922b7a ("net: original ingress device index in PKTINFO")
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NWei Zhang <asuka.com@163.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0c16ba8
    • M
      rtnl: stats - add missing netlink message size checks · 4775cc1f
      Mathias Krause 提交于
      We miss to check if the netlink message is actually big enough to contain
      a struct if_stats_msg.
      
      Add a check to prevent userland from sending us short messages that would
      make us access memory beyond the end of the message.
      
      Fixes: 10c9ead9 ("rtnetlink: add new RTM_GETSTATS message to dump...")
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4775cc1f
    • Z
      ipv6: Should use consistent conditional judgement for ip6 fragment between... · e4c5e13a
      Zheng Li 提交于
      ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
      
      There is an inconsistent conditional judgement between __ip6_append_data
      and ip6_finish_output functions, the variable length in __ip6_append_data
      just include the length of application's payload and udp6 header, don't
      include the length of ipv6 header, but in ip6_finish_output use
      (skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the
      length of ipv6 header.
      
      That causes some particular application's udp6 payloads whose length are
      between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even
      though the rst->dev support UFO feature.
      
      Add the length of ipv6 header to length in __ip6_append_data to keep
      consistent conditional judgement as ip6_finish_output for ip6 fragment.
      Signed-off-by: NZheng Li <james.z.li@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4c5e13a
  10. 29 12月, 2016 2 次提交
  11. 28 12月, 2016 3 次提交
  12. 27 12月, 2016 1 次提交
    • D
      net, sched: fix soft lockup in tc_classify · 628185cf
      Daniel Borkmann 提交于
      Shahar reported a soft lockup in tc_classify(), where we run into an
      endless loop when walking the classifier chain due to tp->next == tp
      which is a state we should never run into. The issue only seems to
      trigger under load in the tc control path.
      
      What happens is that in tc_ctl_tfilter(), thread A allocates a new
      tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
      with it. In that classifier callback we had to unlock/lock the rtnl
      mutex and returned with -EAGAIN. One reason why we need to drop there
      is, for example, that we need to request an action module to be loaded.
      
      This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
      after we loaded and found the requested action, we need to redo the
      whole request so we don't race against others. While we had to unlock
      rtnl in that time, thread B's request was processed next on that CPU.
      Thread B added a new tp instance successfully to the classifier chain.
      When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
      and destroying its tp instance which never got linked, we goto replay
      and redo A's request.
      
      This time when walking the classifier chain in tc_ctl_tfilter() for
      checking for existing tp instances we had a priority match and found
      the tp instance that was created and linked by thread B. Now calling
      again into tp->ops->change() with that tp was successful and returned
      without error.
      
      tp_created was never cleared in the second round, thus kernel thinks
      that we need to link it into the classifier chain (once again). tp and
      *back point to the same object due to the match we had earlier on. Thus
      for thread B's already public tp, we reset tp->next to tp itself and
      link it into the chain, which eventually causes the mentioned endless
      loop in tc_classify() once a packet hits the data path.
      
      Fix is to clear tp_created at the beginning of each request, also when
      we replay it. On the paths that can cause -EAGAIN we already destroy
      the original tp instance we had and on replay we really need to start
      from scratch. It seems that this issue was first introduced in commit
      12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
      and avoid kernel panic when we use cls_cgroup").
      
      Fixes: 12186be7 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
      Reported-by: NShahar Klein <shahark@mellanox.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NShahar Klein <shahark@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      628185cf
  13. 26 12月, 2016 3 次提交
    • T
      ktime: Get rid of ktime_equal() · 1f3a8e49
      Thomas Gleixner 提交于
      No point in going through loops and hoops instead of just comparing the
      values.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      1f3a8e49
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
    • T
      ktime: Get rid of the union · 2456e855
      Thomas Gleixner 提交于
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
      variant for 32bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      become completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855
  14. 25 12月, 2016 1 次提交
  15. 24 12月, 2016 8 次提交
    • J
      tipc: don't send FIN message from connectionless socket · 693c5649
      Jon Paul Maloy 提交于
      In commit 6f00089c ("tipc: remove SS_DISCONNECTING state") the
      check for socket type is in the wrong place, causing a closing socket
      to always send out a FIN message even when the socket was never
      connected. This is normally harmless, since the destination node for
      such messages most often is zero, and the message will be dropped, but
      it is still a wrong and confusing behavior.
      
      We fix this in this commit.
      Reviewed-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      693c5649
    • M
      sctp: fix recovering from 0 win with small data chunks · 1636098c
      Marcelo Ricardo Leitner 提交于
      Currently if SCTP closes the receive window with window pressure, mostly
      caused by excessive skb overhead on payload/overheads ratio, SCTP will
      close the window abruptly while saving the delta on rwnd_press. It will
      start recovering rwnd as the chunks are consumed by the application and
      the rwnd_press will be only recovered after rwnd reach the same value as
      of rwnd_press, mostly to prevent silly window syndrome.
      
      Thing is, this is very inefficient with small data chunks, as with those
      it will never reach back that value, and thus it will never recover from
      such pressure. This means that we will not issue window updates when
      recovering from 0 window and will rely on a sender retransmit to notice
      it.
      
      The fix here is to remove such threshold, as no value is good enough: it
      depends on the (avg) chunk sizes being used.
      
      Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by
      sending SIGSTOP to netserver, sleep 1.2, and SIGCONT.
      Rate limited to 845kbps, for visibility. Capture done at netserver side.
      
      Previously:
      01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [
      01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS
      01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
      01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS
      01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS
      01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
        (window update missed)
      04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g
      04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
      04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS
      04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g
      
      After:
      02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153]
      02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S
      02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
      02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S
      02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S
      02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508]
      03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017]
      03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526]
      03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035]
      03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544]
      03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053]
      03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562]
        (following data transmission triggered by window updates above)
      03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000]
      03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
      03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S
      03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894]
      03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S
      03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1636098c
    • M
      sctp: do not loose window information if in rwnd_over · 58b94d88
      Marcelo Ricardo Leitner 提交于
      It's possible that we receive a packet that is larger than current
      window. If it's the first packet in this way, it will cause it to
      increase rwnd_over. Then, if we receive another data chunk (specially as
      SCTP allows you to have one data chunk in flight even during 0 window),
      rwnd_over will be overwritten instead of added to.
      
      In the long run, this could cause the window to grow bigger than its
      initial size, as rwnd_over would be charged only for the last received
      data chunk while the code will try open the window for all packets that
      were received and had its value in rwnd_over overwritten. This, then,
      can lead to the worsening of payload/buffer ratio and cause rwnd_press
      to kick in more often.
      
      The fix is to sum it too, same as is done for rwnd_press, so that if we
      receive 3 chunks after closing the window, we still have to release that
      same amount before re-opening it.
      
      Log snippet from sctp_test exhibiting the issue:
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58b94d88
    • I
      neigh: Send netevent after marking neigh as dead · 53f800e3
      Ido Schimmel 提交于
      neigh_cleanup_and_release() is always called after marking a neighbour
      as dead, but it only notifies user space and not in-kernel listeners of
      the netevent notification chain.
      
      This can cause multiple problems. In my specific use case, it causes the
      listener (a switch driver capable of L3 offloads) to believe a neighbour
      entry is still valid, and is thus erroneously kept in the device's
      table.
      
      Fix that by sending a netevent after marking the neighbour as dead.
      
      Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53f800e3
    • D
      ipv6: handle -EFAULT from skb_copy_bits · a98f9175
      Dave Jones 提交于
      By setting certain socket options on ipv6 raw sockets, we can confuse the
      length calculation in rawv6_push_pending_frames triggering a BUG_ON.
      
      RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
      RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
      RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
      RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
      RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
      R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
      R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
      
      Call Trace:
       [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
       [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
       [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
       [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
       [<ffffffff816da27e>] SyS_sendto+0xe/0x10
       [<ffffffff81002910>] do_syscall_64+0x50/0xa0
       [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.
      
      Reproducer:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      
      #define LEN 504
      
      int main(int argc, char* argv[])
      {
      	int fd;
      	int zero = 0;
      	char buf[LEN];
      
      	memset(buf, 0, LEN);
      
      	fd = socket(AF_INET6, SOCK_RAW, 7);
      
      	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
      	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
      
      	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
      }
      Signed-off-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a98f9175
    • W
      inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets · 39b2dd76
      Willem de Bruijn 提交于
      Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
      the packet. For sockets that have transport headers pulled, transport
      offset can be negative. Use signed comparison to avoid overflow.
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Reported-by: NNisar Jagabar <njagabar@cloudmark.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39b2dd76
    • O
      net/sched: cls_flower: Mandate mask when matching on flags · d9724772
      Or Gerlitz 提交于
      When matching on flags, we should require the user to provide the
      mask and avoid using an all-ones mask. Not doing so causes matching
      on flags provided w.o mask to hit on the value being unset for all
      flags, which may not what the user wanted to happen.
      
      Fixes: faa3ffce ('net/sched: cls_flower: Add support for matching on flags')
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Reported-by: NPaul Blakey <paulb@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9724772
    • O
      net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6 · dc594ecd
      Or Gerlitz 提交于
      The UDP dst port was provided to the helper function which sets the
      IPv6 IP tunnel meta-data under a wrong param order, fix that.
      
      Fixes: 75bfbca0 ('net/sched: act_tunnel_key: Add UDP dst port option')
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: NHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc594ecd
  16. 23 12月, 2016 2 次提交
    • X
      netfilter: ipt_CLUSTERIP: check duplicate config when initializing · 6c5d5cfb
      Xin Long 提交于
      Now when adding an ipt_CLUSTERIP rule, it only checks duplicate config in
      clusterip_config_find_get(). But after that, there may be still another
      thread to insert a config with the same ip, then it leaves proc_create_data
      to do duplicate check.
      
      It's more reasonable to check duplicate config by ipt_CLUSTERIP itself,
      instead of checking it by proc fs duplicate file check. Before, when proc
      fs allowed duplicate name files in a directory, It could even crash kernel
      because of use-after-free.
      
      This patch is to check duplicate config under the protection of clusterip
      net lock when initializing a new config and correct the return err.
      
      Note that it also moves proc file node creation after adding new config, as
      proc_create_data may sleep, it couldn't be called under the clusterip_net
      lock. clusterip_config_find_get returns NULL if c->pde is null to make sure
      it can't be used until the proc file node creation is done.
      Suggested-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      6c5d5cfb
    • L
      net: ipv4: Don't crash if passing a null sk to ip_do_redirect. · 7d995694
      Lorenzo Colitti 提交于
      Commit e2d118a1 ("net: inet: Support UID-based routing in IP
      protocols.") made ip_do_redirect call sock_net(sk) to determine
      the network namespace of the passed-in socket. This crashes if sk
      is NULL.
      
      Fix this by getting the network namespace from the skb instead.
      
      Fixes: e2d118a1 ("net: inet: Support UID-based routing in IP protocols.")
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d995694
  17. 22 12月, 2016 1 次提交