1. 10 7月, 2015 5 次提交
    • Y
      tcp: update congestion state first before raising cwnd · b20a3fa3
      Yuchung Cheng 提交于
      The congestion state and cwnd can be updated in the wrong order.
      For example, upon receiving a dubious ACK, we incorrectly raise
      the cwnd first (tcp_may_raise_cwnd()/tcp_cong_avoid()) because
      the state is still Open, then enter recovery state to reduce cwnd.
      
      For another example, if the ACK indicates spurious timeout or
      retransmits, we first revert the cwnd reduction and congestion
      state back to Open state.  But we don't raise the cwnd even though
      the ACK does not indicate any congestion.
      
      To fix this problem we should first call tcp_fastretrans_alert() to
      process the dubious ACK and update the congestion state, then call
      tcp_may_raise_cwnd() that raises cwnd based on the current state.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b20a3fa3
    • Y
      tcp: do not slow start when cwnd equals ssthresh · 76174004
      Yuchung Cheng 提交于
      In the original design slow start is only used to raise cwnd
      when cwnd is stricly below ssthresh. It makes little sense
      to slow start when cwnd == ssthresh: especially
      when hystart has set ssthresh in the initial ramp, or after
      recovery when cwnd resets to ssthresh. Not doing so will
      also help reduce the buffer bloat slightly.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76174004
    • Y
      tcp: add tcp_in_slow_start helper · 071d5080
      Yuchung Cheng 提交于
      Add a helper to test the slow start condition in various congestion
      control modules and other places. This is to prepare a slight improvement
      in policy as to exactly when to slow start.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      071d5080
    • A
      net: skb_defer_rx_timestamp should check for phydev before setting up classify · 1007f59d
      Alexander Duyck 提交于
      This change makes it so that the call skb_defer_rx_timestamp will first
      check for a phydev before going in and manipulating the skb->data and
      skb->len values.  By doing this we can avoid unnecessary work on network
      devices that don't support phydev.  As a result we reduce the total
      instruction count needed to process this on most devices.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1007f59d
    • J
      tcp: v1 always send a quick ack when quickacks are enabled · 2251ae46
      Jon Maxwell 提交于
      V1 of this patch contains Eric Dumazet's suggestion to move the per
      dst RTAX_QUICKACK check into tcp_in_quickack_mode(). Thanks Eric.
      
      I ran some tests and after setting the "ip route change quickack 1"
      knob there were still many delayed ACKs sent. This occured
      because when icsk_ack.quick=0 the !icsk_ack.pingpong value is
      subsequently ignored as tcp_in_quickack_mode() checks both these
      values. The condition for a quick ack to trigger requires
      that both icsk_ack.quick != 0 and icsk_ack.pingpong=0. Currently
      only icsk_ack.pingpong is controlled by the knob. But the
      icsk_ack.quick value changes dynamically depending on heuristics.
      The crux of the matter is that delayed acks still cannot be entirely
      disabled even with the RTAX_QUICKACK per dst knob enabled. This
      patch ensures that a quick ack is always sent when the RTAX_QUICKACK
      per dst knob is turned on.
      
      The "ip route change quickack 1" knob was recently added to enable
      quickacks. It was modeled around the TCP_QUICKACK setsockopt() option.
      This issue is that even with "ip route change quickack 1" enabled
      we still see delayed ACKs under some conditions. It would be nice
      to be able to completely disable delayed ACKs.
      
      Here is an example:
      
      # netstat -s|grep dela
          3 delayed acks sent
      
      For all routes enable the knob
      
      # ip route change quickack 1
      
      Generate some traffic across a slow link and we still see the delayed
      acks.
      
      # netstat -s|grep dela
          106 delayed acks sent
          1 delayed acks further delayed because of locked socket
      
      The issue is that both the "ip route change quickack 1" knob and
      the TCP_QUICKACK option set the icsk_ack.pingpong variable to 0.
      However at the business end in the __tcp_ack_snd_check() routine,
      tcp_in_quickack_mode() checks that both icsk_ack.quick != 0
      and icsk_ack.pingpong=0 in order to trigger a quickack. As
      icsk_ack.quick is determined by heuristics it can be 0. When
      that occurs the icsk_ack.pingpong value is ignored and a delayed
      ACK is sent regardless.
      
      This patch moves the RTAX_QUICKACK per dst check into the
      tcp_in_quickack_mode() routine which ensures that a quickack is
      always sent when the quickack knob is enabled for that dst.
      Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2251ae46
  2. 09 7月, 2015 9 次提交
  3. 04 7月, 2015 4 次提交
  4. 03 7月, 2015 1 次提交
  5. 01 7月, 2015 2 次提交
    • T
      Bluetooth: Reinitialize the list after deletion for session user list · ab944c83
      Tedd Ho-Jeong An 提交于
      If the user->list is deleted with list_del(), it doesn't initialize the
      entry which can cause the issue with list_empty(). According to the
      comment from the list.h, list_empty() returns false even if the list is
      empty and put the entry in an undefined state.
      
      /**
       * list_del - deletes entry from list.
       * @entry: the element to delete from the list.
       * Note: list_empty() on entry does not return true after this, the entry is
       * in an undefined state.
       */
      
      Because of this behavior, list_empty() returns false even if list is empty
      when the device is reconnected.
      
      So, user->list needs to be re-initialized after list_del(). list.h already
      have a macro list_del_init() which deletes the entry and initailze it again.
      Signed-off-by: NTedd Ho-Jeong An <tedd.an@intel.com>
      Tested-by: NJörg Otte <jrg.otte@gmail.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      ab944c83
    • C
      sock_diag: don't broadcast kernel sockets · b922622e
      Craig Gallek 提交于
      Kernel sockets do not hold a reference for the network namespace to
      which they point.  Socket destruction broadcasting relies on the
      network namespace and will cause the splat below when a kernel socket
      is destroyed.
      
      This fix simply ignores kernel sockets when they are destroyed.
      
      Reported as:
      general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      CPU: 1 PID: 9130 Comm: kworker/1:1 Not tainted 4.1.0-gelk-debug+ #1
      Workqueue: sock_diag_events sock_diag_broadcast_destroy_work
      Stack:
       ffff8800b9c586c0 ffff8800b9c586c0 ffff8800ac4692c0 ffff8800936d4a90
       ffff8800352efd38 ffffffff8469a93e ffff8800352efd98 ffffffffc09b9b90
       ffff8800352efd78 ffff8800ac4692c0 ffff8800b9c586c0 ffff8800831b6ab8
      Call Trace:
       [<ffffffff8469a93e>] ? mutex_unlock+0xe/0x10
       [<ffffffffc09b9b90>] ? inet_diag_handler_get_info+0x110/0x1fb [inet_diag]
       [<ffffffff845c868d>] netlink_broadcast+0x1d/0x20
       [<ffffffff8469a93e>] ? mutex_unlock+0xe/0x10
       [<ffffffff845b2bf5>] sock_diag_broadcast_destroy_work+0xd5/0x160
       [<ffffffff8408ea97>] process_one_work+0x147/0x420
       [<ffffffff8408f0f9>] worker_thread+0x69/0x470
       [<ffffffff8409fda3>] ? preempt_count_sub+0xa3/0xf0
       [<ffffffff8408f090>] ? rescuer_thread+0x320/0x320
       [<ffffffff84093cd7>] kthread+0x107/0x120
       [<ffffffff84093bd0>] ? kthread_create_on_node+0x1b0/0x1b0
       [<ffffffff8469d31f>] ret_from_fork+0x3f/0x70
       [<ffffffff84093bd0>] ? kthread_create_on_node+0x1b0/0x1b0
      
      Tested:
        Using a debug kernel while 'ss -E' is running:
        ip netns add test-ns
        ip netns delete test-ns
      
      Fixes: eb4cb008 sock_diag: define destruction multicast groups
      Fixes: 26abe143 net: Modify sk_alloc to not reference count the
        netns of kernel sockets.
      Reported-by: NDave Jones <davej@codemonkey.org.uk>
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b922622e
  6. 30 6月, 2015 1 次提交
    • A
      sctp: Fix race between OOTB responce and route removal · 29c4afc4
      Alexander Sverdlin 提交于
      There is NULL pointer dereference possible during statistics update if the route
      used for OOTB responce is removed at unfortunate time. If the route exists when
      we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
      ABORT, but in the meantime route is removed under our feet, we take "no_route"
      path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).
      
      But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
      sctp_transport_set_owner() and therefore there is no asoc associated with this
      packet. Probably temporary asoc just for OOTB responces is overkill, so just
      introduce a check like in all other places in sctp_packet_transmit(), where
      "asoc" is dereferenced.
      
      To reproduce this, one needs to
      0. ensure that sctp module is loaded (otherwise ABORT is not generated)
      1. remove default route on the machine
      2. while true; do
           ip route del [interface-specific route]
           ip route add [interface-specific route]
         done
      3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
         responce
      
      On x86_64 the crash looks like this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
      PGD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: ...
      CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.0.5-1-ARCH #1
      Hardware name: ...
      task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
      RIP: 0010:[<ffffffffa05ec9ac>]  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
      RSP: 0018:ffff880127c037b8  EFLAGS: 00010296
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
      RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
      RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
      R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
      R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
      FS:  0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
      Stack:
       ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
       ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
       0000000000000000 0000000000000001 0000000000000000 0000000000000000
      Call Trace:
       <IRQ>
       [<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
       [<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
       [<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
       [<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
       [<ffffffff810e0329>] ? update_process_times+0x59/0x60
       [<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
       [<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
       [<ffffffff8101f599>] ? read_tsc+0x9/0x10
       [<ffffffff8116d4b5>] ? put_page+0x55/0x60
       [<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
       [<ffffffff81462b68>] ? skb_free_head+0x58/0x80
       [<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
       [<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
       [<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
       [<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
       [<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
       [<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
       [<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
       [<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
       [<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
       [<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
       [<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
       [<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
       [<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
       [<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
       [<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
       [<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
       [<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
       [<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
       [<ffffffff8147896a>] net_rx_action+0x21a/0x360
       [<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
       [<ffffffff8107912d>] irq_exit+0xad/0xb0
       [<ffffffff8157d158>] do_IRQ+0x58/0xf0
       [<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
       <EOI>
       [<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
       [<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
       [<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
       [<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
       [<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
       [<ffffffff8156b365>] rest_init+0x85/0x90
       [<ffffffff818eb035>] start_kernel+0x48b/0x4ac
       [<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
       [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
       [<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
      Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
      RIP  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
       RSP <ffff880127c037b8>
      CR2: 0000000000000020
      ---[ end trace 5aec7fd2dc983574 ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
      drm_kms_helper: panic occurred, switching back to text console
      ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: NAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29c4afc4
  7. 29 6月, 2015 6 次提交
    • G
      dsa: fix promiscuity leak on slave dev open error · 4fdeddfe
      Gilad Ben-Yossef 提交于
      DSA master netdev promiscuity counter was not being properly
      decremented on slave device open error path.
      Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com>
      CC: Gilad Ben-Yossef <giladb@ezchip.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Florian Fainelli <f.fainelli@gmail.com>
      CC: Guenter Roeck <linux@roeck-us.net>
      CC: Andrew Lunn <andrew@lunn.ch>
      CC: Scott Feldman <sfeldma@gmail.com>
      Acked-by: NAndrew Lunn <andrew@lunn.ch>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fdeddfe
    • D
      net: Kill sock->sk_protinfo · 1830fcea
      David Miller 提交于
      No more users, so it can now be removed.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1830fcea
    • D
      ax25: Stop using sock->sk_protinfo. · 3200392b
      David Miller 提交于
      Just make a ax25_sock structure that provides the ax25_cb pointer.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3200392b
    • G
      flow_dissector: Pre-initialize ip_proto in __skb_flow_dissect() · 8e690ffd
      Geert Uytterhoeven 提交于
      net/core/flow_dissector.c: In function ‘__skb_flow_dissect’:
      net/core/flow_dissector.c:132: warning: ‘ip_proto’ may be used uninitialized in this function
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e690ffd
    • A
      ipv4: fix RCU lockdep warning from linkdown changes · 96ac5cc9
      Andy Gospodarek 提交于
      The following lockdep splat was seen due to the wrong context for
      grabbing in_dev.
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244 Not tainted
      -------------------------------
      include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      2 locks held by ip/403:
       #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81453305>] rtnl_lock+0x17/0x19
       #1:  ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff8105c327>] __blocking_notifier_call_chain+0x35/0x6a
      
      stack backtrace:
      CPU: 2 PID: 403 Comm: ip Not tainted 4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244
       0000000000000001 ffff8800b189b728 ffffffff8150a542 ffffffff8107a8b3
       ffff880037bbea40 ffff8800b189b758 ffffffff8107cb74 ffff8800379dbd00
       ffff8800bec85800 ffff8800bf9e13c0 00000000000000ff ffff8800b189b7d8
      Call Trace:
       [<ffffffff8150a542>] dump_stack+0x4c/0x6e
       [<ffffffff8107a8b3>] ? up+0x39/0x3e
       [<ffffffff8107cb74>] lockdep_rcu_suspicious+0xf7/0x100
       [<ffffffff814b63c3>] fib_dump_info+0x227/0x3e2
       [<ffffffff814b6624>] rtmsg_fib+0xa6/0x116
       [<ffffffff814b978f>] fib_table_insert+0x316/0x355
       [<ffffffff814b362e>] fib_magic+0xb7/0xc7
       [<ffffffff814b4803>] fib_add_ifaddr+0xb1/0x13b
       [<ffffffff814b4d09>] fib_inetaddr_event+0x36/0x90
       [<ffffffff8105c086>] notifier_call_chain+0x4c/0x71
       [<ffffffff8105c340>] __blocking_notifier_call_chain+0x4e/0x6a
       [<ffffffff8105c370>] blocking_notifier_call_chain+0x14/0x16
       [<ffffffff814a7f50>] __inet_insert_ifa+0x1a5/0x1b3
       [<ffffffff814a894d>] inet_rtm_newaddr+0x350/0x35f
       [<ffffffff81457866>] rtnetlink_rcv_msg+0x17b/0x18a
       [<ffffffff8107e7c3>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff8146965f>] ? netlink_deliver_tap+0x1cb/0x1f7
       [<ffffffff814576eb>] ? rtnl_newlink+0x72a/0x72a
      ...
      
      This patch resolves that splat.
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Reported-by: NSergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96ac5cc9
    • J
      tipc: purge backlog queue counters when broadcast link is reset · 7d967b67
      Jon Paul Maloy 提交于
      In commit 1f66d161
      ("tipc: introduce starvation free send algorithm")
      we introduced a counter per priority level for buffers
      in the link backlog queue. We also introduced a new
      function tipc_link_purge_backlog(), to reset these
      counters to zero when the link is reset.
      
      Unfortunately, we missed to call this function when
      the broadcast link is reset, with the result that the
      values of these counters might be permanently skewed
      when new nodes are attached. This may in the worst case
      lead to permananent, but spurious, broadcast link
      congestion, where no broadcast packets can be sent at
      all.
      
      We fix this bug with this commit.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d967b67
  8. 27 6月, 2015 1 次提交
  9. 25 6月, 2015 1 次提交
  10. 24 6月, 2015 7 次提交
    • N
      bridge: vlan: flush the dynamically learned entries on port vlan delete · 1ea2d020
      Nikolay Aleksandrov 提交于
      Add a new argument to br_fdb_delete_by_port which allows to specify a
      vid to match when flushing entries and use it in nbp_vlan_delete() to
      flush the dynamically learned entries of the vlan/port pair when removing
      a vlan from a port. Before this patch only the local mac was being
      removed and the dynamically learned ones were left to expire.
      Note that the do_all argument is still respected and if specified, the
      vid will be ignored.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ea2d020
    • N
      bridge: multicast: add a comment to br_port_state_selection about blocking state · 9aa66382
      Nikolay Aleksandrov 提交于
      Add a comment to explain why we're not disabling port's multicast when it
      goes in blocking state. Since there's a check in the timer's function which
      bypasses the timer if the port's in blocking/disabled state, the timer will
      simply expire and stop without sending more queries.
      Suggested-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9aa66382
    • P
      net: inet_diag: export IPV6_V6ONLY sockopt · 20462155
      Phil Sutter 提交于
      For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
      exported to userspace. It indicates whether a socket bound to in6addr_any
      listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not
      listed by e.g. 'ss -l -4'.
      
      This patch is accompanied by an appropriate one for iproute2 to enable
      the additional information in 'ss -e'.
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20462155
    • A
      net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f
      Andy Gospodarek 提交于
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, will report to userspace that a route is
      dead and will no longer resolve to this nexthop when performing a fib
      lookup.  This will signal to userspace that the route will not be
      selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
      if the sysctl is enabled and link is down.  This was done as without it
      the netlink listeners would have no idea whether or not a nexthop would
      be selected.   The kernel only sets RTNH_F_DEAD internally if the
      interface has IFF_UP cleared.
      
      With the new sysctl set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the sysctl is not set, the following output would be expected when
      p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      Since the dead flag does not appear, there should be no expectation that
      the kernel would skip using this route due to link being down.
      
      v2: Split kernel changes into 2 patches, this actually makes a
      behavioral change if the sysctl is set.  Also took suggestion from Alex
      to simplify code by only checking sysctl during fib lookup and
      suggestion from Scott to add a per-interface sysctl.
      
      v3: Code clean-ups to make it more readable and efficient as well as a
      reverse path check fix.
      
      v4: Drop binary sysctl
      
      v5: Whitespace fixups from Dave
      
      v6: Style changes from Dave and checkpatch suggestions
      
      v7: One more checkpatch fixup
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0eeb075f
    • A
      net: track link-status of ipv4 nexthops · 8a3d0316
      Andy Gospodarek 提交于
      Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
      reachable via an interface where carrier is off.  No action is taken,
      but additional flags are passed to userspace to indicate carrier status.
      
      This also includes a cleanup to fib_disable_ip to more clearly indicate
      what event made the function call to replace the more cryptic force
      option previously used.
      
      v2: Split out kernel functionality into 2 patches, this patch simply
      sets and clears new nexthop flag RTNH_F_LINKDOWN.
      
      v3: Cleanups suggested by Alex as well as a bug noticed in
      fib_sync_down_dev and fib_sync_up when multipath was not enabled.
      
      v5: Whitespace and variable declaration fixups suggested by Dave.
      
      v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
      all.
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a3d0316
    • V
      net: switchdev: ignore unsupported bridge flags · 5c8079d0
      Vivien Didelot 提交于
      switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS
      attributes, but a driver doesn't need to implement this in order to get
      bridge link information.
      
      So error out only on errors different than -EOPNOTSUPP.
      
      (This is a follow-up patch for 7d4f8d87.)
      
      Fixes: 8793d0a6 ("switchdev: add new switchdev_port_bridge_getlink")
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Acked-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5c8079d0
    • J
      ip: report the original address of ICMP messages · 34b99df4
      Julian Anastasov 提交于
      ICMP messages can trigger ICMP and local errors. In this case
      serr->port is 0 and starting from Linux 4.0 we do not return
      the original target address to the error queue readers.
      Add function to define which errors provide addr_offset.
      With this fix my ping command is not silent anymore.
      
      Fixes: c247f053 ("ip: fix error queue empty skb handling")
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34b99df4
  11. 23 6月, 2015 3 次提交
    • S
      switchdev: change BUG_ON to WARN for attr set failure case · e9fdaec0
      Scott Feldman 提交于
      This particular BUG_ON condition was checking for attr set err in the
      COMMIT phase, which isn't expected (it's a driver bug if PREPARE phase is
      OK but COMMIT fails).  But BUG_ON() is too strong for this case, so change
      to WARN().  BUG_ON() would be warranted if the system was corrupted beyond
      repair, but this is not the case here.
      Signed-off-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9fdaec0
    • S
      switchdev; add VLAN support for port's bridge_getlink · 7d4f8d87
      Scott Feldman 提交于
      One more missing piece of the puzzle.  Add vlan dump support to switchdev
      port's bridge_getlink.  iproute2 "bridge vlan show" cmd already knows how
      to show the vlans installed on the bridge and the device , but (until now)
      no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
      msg.  Before this patch, "bridge vlan show":
      
      	$ bridge -c vlan show
      	port    vlan ids
      	sw1p1    30-34			<< bridge side vlans
      		 57
      
      	sw1p1				<< device side vlans (missing)
      
      	sw1p2    57
      
      	sw1p2
      
      	sw1p3
      
      	sw1p4
      
      	br0     None
      
      (When the port is bridged, the output repeats the vlan list for the vlans
      on the bridge side of the port and the vlans on the device side of the
      port.  The listing above show no vlans for the device side even though they
      are installed).
      
      After this patch:
      
      	$ bridge -c vlan show
      	port    vlan ids
      	sw1p1    30-34			<< bridge side vlan
      		 57
      
      	sw1p1    30-34			<< device side vlans
      		 57
      		 3840 PVID
      
      	sw1p2    57
      
      	sw1p2    57
      		 3840 PVID
      
      	sw1p3    3842 PVID
      
      	sw1p4    3843 PVID
      
      	br0     None
      
      I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func.
      switchdev support adds an obj dump for VLAN objects, using the same
      call-back scheme as FDB dump.  Support included for both compressed and
      un-compressed vlan dumps.
      Signed-off-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d4f8d87
    • S
      switchdev: rename vlan vid_start to vid_begin · 3e3a78b4
      Scott Feldman 提交于
      Use vid_begin/end to be consistent with BRIDGE_VLAN_INFO_RANGE_BEGIN/END.
      Signed-off-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e3a78b4