1. 30 Mar, 2015 3 commits
  2. 25 Mar, 2015 2 commits
    • ipv6: Don't reduce hop limit for an interface · 6fd99094
      Committed by D.S. Ljungmark
      A local route may have a lower hop_limit set than global routes do.
      
      RFC 3756, Section 4.2.7, "Parameter Spoofing"
      
      >   1.  The attacker includes a Current Hop Limit of one or another small
      >       number which the attacker knows will cause legitimate packets to
      >       be dropped before they reach their destination.
      
      >   As an example, one possible approach to mitigate this threat is to
      >   ignore very small hop limits.  The nodes could implement a
      >   configurable minimum hop limit, and ignore attempts to set it below
      >   said limit.
      Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
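The mitigation quoted from RFC 3756 can be sketched in user-space C. This is not the kernel patch itself: `accept_ra_hop_limit()` and its `min_hop_limit` parameter are hypothetical names, and the policy shown (ignore advertised values that are zero, below a configured floor, or lower than the current limit) is an assumption drawn from the commit title and the RFC text.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: decide which hop limit an
 * interface keeps after a Router Advertisement (RA) arrives.
 *
 * current       - hop limit currently configured on the interface
 * advertised    - Cur Hop Limit field from the RA (0 means "unspecified")
 * min_hop_limit - assumed configurable floor, as suggested by RFC 3756
 */
uint8_t accept_ra_hop_limit(uint8_t current, uint8_t advertised,
                            uint8_t min_hop_limit)
{
    if (advertised == 0)             /* router left the value unspecified */
        return current;
    if (advertised < min_hop_limit)  /* suspiciously small: ignore it */
        return current;
    if (advertised < current)        /* never reduce the interface limit */
        return current;
    return advertised;               /* increases are accepted */
}
```

With a current limit of 64 and a floor of 32, an advertised value of 1 is ignored, while an advertised value of 128 is accepted.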
    • net: use for_each_netdev_safe() in rtnl_group_changelink() · d079535d
      Committed by WANG Cong
      In case we move the whole dev group to another netns,
      we should call for_each_netdev_safe(), otherwise we get
      a soft lockup:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ip:798]
       irq event stamp: 255424
       hardirqs last  enabled at (255423): [<ffffffff81a2aa95>] restore_args+0x0/0x30
       hardirqs last disabled at (255424): [<ffffffff81a2ad5a>] apic_timer_interrupt+0x6a/0x80
       softirqs last  enabled at (255422): [<ffffffff81079ebc>] __do_softirq+0x2c1/0x3a9
       softirqs last disabled at (255417): [<ffffffff8107a190>] irq_exit+0x41/0x95
       CPU: 0 PID: 798 Comm: ip Not tainted 4.0.0-rc4+ #881
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8800d1b88000 ti: ffff880119530000 task.ti: ffff880119530000
       RIP: 0010:[<ffffffff810cad11>]  [<ffffffff810cad11>] debug_lockdep_rcu_enabled+0x28/0x30
       RSP: 0018:ffff880119533778  EFLAGS: 00000246
       RAX: ffff8800d1b88000 RBX: 0000000000000002 RCX: 0000000000000038
       RDX: 0000000000000000 RSI: ffff8800d1b888c8 RDI: ffff8800d1b888c8
       RBP: ffff880119533778 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 000000000000b5c2 R12: 0000000000000246
       R13: ffff880119533708 R14: 00000000001d5a40 R15: ffff88011a7d5a40
       FS:  00007fc01315f740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00007f367a120988 CR3: 000000011849c000 CR4: 00000000000007f0
       Stack:
        ffff880119533798 ffffffff811ac868 ffffffff811ac831 ffffffff811ac828
        ffff8801195337c8 ffffffff811ac8c9 ffff8801195339b0 ffff8801197633e0
        0000000000000000 ffff8801195339b0 ffff8801195337d8 ffffffff811ad2d7
       Call Trace:
        [<ffffffff811ac868>] rcu_read_lock+0x37/0x6e
        [<ffffffff811ac831>] ? rcu_read_unlock+0x5f/0x5f
        [<ffffffff811ac828>] ? rcu_read_unlock+0x56/0x5f
        [<ffffffff811ac8c9>] __fget+0x2a/0x7a
        [<ffffffff811ad2d7>] fget+0x13/0x15
        [<ffffffff811be732>] proc_ns_fget+0xe/0x38
        [<ffffffff817c7714>] get_net_ns_by_fd+0x11/0x59
        [<ffffffff817df359>] rtnl_link_get_net+0x33/0x3e
        [<ffffffff817df3d7>] do_setlink+0x73/0x87b
        [<ffffffff810b28ce>] ? trace_hardirqs_off+0xd/0xf
        [<ffffffff81a2aa95>] ? retint_restore_args+0xe/0xe
        [<ffffffff817e0301>] rtnl_newlink+0x40c/0x699
        [<ffffffff817dffe0>] ? rtnl_newlink+0xeb/0x699
        [<ffffffff81a29246>] ? _raw_spin_unlock+0x28/0x33
        [<ffffffff8143ed1e>] ? security_capable+0x18/0x1a
        [<ffffffff8107da51>] ? ns_capable+0x4d/0x65
        [<ffffffff817de5ce>] rtnetlink_rcv_msg+0x181/0x194
        [<ffffffff817de407>] ? rtnl_lock+0x17/0x19
        [<ffffffff817de407>] ? rtnl_lock+0x17/0x19
        [<ffffffff817de44d>] ? __rtnl_unlock+0x17/0x17
        [<ffffffff818327c6>] netlink_rcv_skb+0x4d/0x93
        [<ffffffff817de42f>] rtnetlink_rcv+0x26/0x2d
        [<ffffffff81830f18>] netlink_unicast+0xcb/0x150
        [<ffffffff8183198e>] netlink_sendmsg+0x501/0x523
        [<ffffffff8115cba9>] ? might_fault+0x59/0xa9
        [<ffffffff817b5398>] ? copy_from_user+0x2a/0x2c
        [<ffffffff817b7b74>] sock_sendmsg+0x34/0x3c
        [<ffffffff817b7f6d>] ___sys_sendmsg+0x1b8/0x255
        [<ffffffff8115c5eb>] ? handle_pte_fault+0xbd5/0xd4a
        [<ffffffff8100a2b0>] ? native_sched_clock+0x35/0x37
        [<ffffffff8109e94b>] ? sched_clock_local+0x12/0x72
        [<ffffffff8109eb9c>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810cadbf>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff811ac1d8>] ? __fcheck_files+0x4c/0x58
        [<ffffffff811ac946>] ? __fget_light+0x2d/0x52
        [<ffffffff817b8adc>] __sys_sendmsg+0x42/0x60
        [<ffffffff817b8b0c>] SyS_sendmsg+0x12/0x1c
        [<ffffffff81a29e32>] system_call_fastpath+0x12/0x17
      
      Fixes: e7ed828f ("netlink: support setting devgroup parameters")
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
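Why a "safe" iterator matters when the loop body moves entries to another list can be shown with a small user-space sketch. `struct dev` and `move_group()` are hypothetical stand-ins, not kernel types; the only point illustrated is that the successor is saved before the current node is relinked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a net device on a per-namespace list. */
struct dev {
    int group;
    struct dev *next;
};

/*
 * Move every device in `group` from the list at *src to the list at *dst.
 * The successor is saved before the node is unlinked, which is the idea
 * behind the kernel's *_safe() list iterators: the loop never follows a
 * pointer that the loop body has just rewritten.
 * Returns the number of devices moved.
 */
int move_group(struct dev **src, struct dev **dst, int group)
{
    struct dev **link = src;   /* address of the pointer to the current node */
    struct dev *cur, *next;
    int moved = 0;

    for (cur = *src; cur != NULL; cur = next) {
        next = cur->next;      /* saved first: cur->next changes below */
        if (cur->group == group) {
            *link = next;      /* unlink from the source list */
            cur->next = *dst;  /* push onto the destination list */
            *dst = cur;
            moved++;
        } else {
            link = &cur->next;
        }
    }
    return moved;
}
```

If the loop advanced with `cur = cur->next` instead, it would continue along the destination list after the first move and could revisit nodes indefinitely.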
  3. 24 Mar, 2015 1 commit
  4. 23 Mar, 2015 1 commit
  5. 21 Mar, 2015 5 commits
  6. 20 Mar, 2015 1 commit
  7. 19 Mar, 2015 1 commit
    • netfilter: restore rule tracing via nfnetlink_log · 4017a7ee
      Committed by Pablo Neira Ayuso
      Since fab4085f ("netfilter: log: nf_log_packet() as real unified
      interface"), the loginfo structure that is passed to nf_log_packet() is
      used to explicitly indicate the logger type you want to use.
      
      This is a problem for people tracing rules through nfnetlink_log since
      packets are always routed to the NF_LOG_TYPE logger after the
      aforementioned patch.
      
      We can fix this by removing the trace loginfo structures, but that still
      changes the log level from 4 to 5 for tracing messages, and someone out
      there may be relying on this. So let's just introduce a new
      nf_log_trace() function that restores the former behaviour.
      Reported-by: Markus Kötter <koetter@rrzn.uni-hannover.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  8. 18 Mar, 2015 3 commits
    • act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome · ced585c8
      Committed by Daniel Borkmann
      Revisiting commit d23b8ad8 ("tc: add BPF based action") with regards
      to eBPF support, I was thinking that it might be better to improve
      return semantics from a BPF program invoked through BPF_PROG_RUN().
      
      Currently, in case filter_res is 0, we overwrite the default action
      opcode with TC_ACT_SHOT. A default action opcode configured through tc's
      m_bpf can be: TC_ACT_RECLASSIFY, TC_ACT_PIPE, TC_ACT_SHOT, TC_ACT_UNSPEC,
      TC_ACT_OK.
      
      In cls_bpf, we have the possibility to overwrite the default class
      associated with the classifier in case filter_res is _not_ 0xffffffff
      (-1).
      
      That allows us to fold multiple [e]BPF programs into a single one, where
      they would otherwise need to be defined as a separate classifier with
      its own classid, needlessly redoing parsing work, etc.
      
      Similarly, we could do better in act_bpf: Since above TC_ACT* opcodes
      are exported to UAPI anyway, we reuse them for return-code-to-tc-opcode
      mapping, where we would allow above possibilities. Thus, like in cls_bpf,
      a filter_res of 0xffffffff (-1) means that the configured _default_ action
      is used. Any unknown return code from the BPF program would fail in
      tcf_bpf() with TC_ACT_UNSPEC.
      
      Should we one day want to make use of TC_ACT_STOLEN or TC_ACT_QUEUED,
      which both have the same semantics, we have the option to either use
      that as a default action (filter_res of 0xffffffff) or non-default BPF
      return code.
      
      All that will allow us to transparently use tcf_bpf() for both BPF
      flavours.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
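The return-code-to-opcode mapping described in this message can be sketched as a plain C function. The `ACT_*` constants are local copies of the `TC_ACT_*` values from `<linux/pkt_cls.h>`, redefined so the sketch builds without kernel headers; `map_bpf_result()` is a hypothetical name and this is an illustration of the described semantics, not the code in tcf_bpf().

```c
#include <assert.h>

/* Local copies of the TC_ACT_* opcode values from <linux/pkt_cls.h>. */
enum {
    ACT_UNSPEC     = -1,
    ACT_OK         = 0,
    ACT_RECLASSIFY = 1,
    ACT_SHOT       = 2,
    ACT_PIPE       = 3,
};

/*
 * Hypothetical helper: map a BPF program's return value to a tc action.
 * default_action is the opcode the user configured for the action.
 */
int map_bpf_result(int filter_res, int default_action)
{
    switch (filter_res) {
    case ACT_OK:
    case ACT_RECLASSIFY:
    case ACT_SHOT:
    case ACT_PIPE:
        return filter_res;      /* the program chose an opcode explicitly */
    case ACT_UNSPEC:
        return default_action;  /* 0xffffffff (-1): use the configured default */
    default:
        return ACT_UNSPEC;      /* unknown return code */
    }
}
```

Under this mapping a return value of 0 yields `ACT_OK` rather than being overwritten with a drop, which is the behavioural change the message argues for.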
    • inet: Clean up inet_csk_wait_for_connect() vs. might_sleep() · cb7cf8a3
      Committed by Eric Dumazet
      I got the following trace with the current net-next kernel:
      
      [14723.885290] WARNING: CPU: 26 PID: 22658 at kernel/sched/core.c:7285 __might_sleep+0x89/0xa0()
      [14723.885325] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810e8734>] prepare_to_wait_exclusive+0x34/0xa0
      [14723.885355] CPU: 26 PID: 22658 Comm: netserver Not tainted 4.0.0-dbg-DEV #1379
      [14723.885359]  ffffffff81a223a8 ffff881fae9e7ca8 ffffffff81650b5d 0000000000000001
      [14723.885364]  ffff881fae9e7cf8 ffff881fae9e7ce8 ffffffff810a72e7 0000000000000000
      [14723.885367]  ffffffff81a57620 000000000000093a 0000000000000000 ffff881fae9e7e64
      [14723.885371] Call Trace:
      [14723.885377]  [<ffffffff81650b5d>] dump_stack+0x4c/0x65
      [14723.885382]  [<ffffffff810a72e7>] warn_slowpath_common+0x97/0xe0
      [14723.885386]  [<ffffffff810a73e6>] warn_slowpath_fmt+0x46/0x50
      [14723.885390]  [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [14723.885393]  [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0
      [14723.885396]  [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0
      [14723.885399]  [<ffffffff810ccdc9>] __might_sleep+0x89/0xa0
      [14723.885403]  [<ffffffff81581846>] lock_sock_nested+0x36/0xb0
      [14723.885406]  [<ffffffff815829a3>] ? release_sock+0x173/0x1c0
      [14723.885411]  [<ffffffff815ea1f7>] inet_csk_accept+0x157/0x2a0
      [14723.885415]  [<ffffffff810e8900>] ? abort_exclusive_wait+0xc0/0xc0
      [14723.885419]  [<ffffffff8161b96d>] inet_accept+0x2d/0x150
      [14723.885424]  [<ffffffff8157db6f>] SYSC_accept4+0xff/0x210
      [14723.885428]  [<ffffffff8165a451>] ? retint_swapgs+0xe/0x44
      [14723.885431]  [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [14723.885437]  [<ffffffff81369c0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [14723.885441]  [<ffffffff8157ef40>] SyS_accept+0x10/0x20
      [14723.885444]  [<ffffffff81659872>] system_call_fastpath+0x12/0x17
      [14723.885447] ---[ end trace ff74cd83355b1873 ]---
      
      In commit 26cabd31
      Peter added a sched_annotate_sleep() in sk_wait_event()
      
      Is the following patch needed as well ?
      
      Alternative would be to use sk_wait_event() from inet_csk_wait_for_connect()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: fix error code when tunnel exists · 37355565
      Committed by Nicolas Dichtel
      After commit 2b0bb01b, the kernel returns -ENOBUFS when user tries to add
      an existing tunnel with ioctl API:
      $ ip -6 tunnel add ip6tnl1 mode ip6ip6 dev eth1
      add tunnel "ip6tnl0" failed: No buffer space available
      
      This is confusing; the right error is EEXIST.
      
      This patch also slightly changes two other error codes that are returned:
       - ENOBUFS -> ENOMEM
       - ENOENT -> ENODEV
      
      Fixes: 2b0bb01b ("ip6_tunnel: Return an error when adding an existing tunnel.")
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      Reported-by: Pierre Cheynier <me@pierre-cheynier.net>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 17 Mar, 2015 1 commit
  10. 16 Mar, 2015 6 commits
  11. 15 Mar, 2015 1 commit
  12. 14 Mar, 2015 1 commit
  13. 13 Mar, 2015 2 commits
  14. 12 Mar, 2015 7 commits
    • netfilter: Zero the tuple in nfnl_cthelper_parse_tuple() · 78146572
      Committed by Ian Wilson
      nfnl_cthelper_parse_tuple() is called from nfnl_cthelper_new(),
      nfnl_cthelper_get() and nfnl_cthelper_del().  In each case they pass
      a pointer to an nf_conntrack_tuple data structure local variable:
      
          struct nf_conntrack_tuple tuple;
          ...
          ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);
      
      The problem is that this local variable is not initialized, and
      nfnl_cthelper_parse_tuple() only initializes two fields: src.l3num and
      dst.protonum.  This leaves all other fields with undefined values
      based on whatever is on the stack:
      
          tuple->src.l3num = ntohs(nla_get_be16(tb[NFCTH_TUPLE_L3PROTONUM]));
          tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]);
      
      The symptom observed was that when the rpc and tns helpers were added
      then traffic to port 1536 was being sent to user-space.
      Signed-off-by: Ian Wilson <iwilson@brocade.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
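The fix named in the commit title, zeroing the tuple inside the parse function, can be sketched in user space as follows. `struct tuple` is a simplified, hypothetical stand-in for struct nf_conntrack_tuple, and `parse_tuple()` takes plain integers instead of netlink attributes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct nf_conntrack_tuple. */
struct tuple {
    struct { uint16_t l3num; uint16_t port; } src;
    struct { uint8_t protonum; uint16_t port; } dst;
};

/*
 * Parse helper that sets only two fields, as described above. Clearing the
 * whole structure first guarantees the remaining fields are zero instead of
 * whatever happened to be on the caller's stack.
 */
void parse_tuple(struct tuple *t, uint16_t l3num, uint8_t protonum)
{
    memset(t, 0, sizeof(*t));
    t->src.l3num = l3num;
    t->dst.protonum = protonum;
}
```

Zeroing in the callee rather than in each of the three callers keeps the guarantee in one place.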
    • rds: avoid potential stack overflow · f862e07c
      Committed by Arnd Bergmann
      The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
      on the stack in order to pass a pair of addresses. This happens to just
      fit within the 1024 byte stack size warning limit on x86, but just
      exceeds that limit on ARM, which gives us this warning:
      
      net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      As the use of this large variable is basically bogus, we can rearrange
      the code to not do that. Instead of passing an rds socket into
      rds_iw_get_device, we now just pass the two addresses that we have
      available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
      to create two address structures on the stack there.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock: fix possible NULL sk dereference in __skb_tstamp_tx · 3a8dd971
      Committed by Willem de Bruijn
      Test that sk != NULL before reading sk->sk_tsflags.
      
      Fixes: 49ca0d8b ("net-timestamp: no-payload option")
      Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xps: must clear sender_cpu before forwarding · c29390c6
      Committed by Eric Dumazet
      John reported that my previous commit added a regression
      on his router.
      
      This is because sender_cpu & napi_id share a common location,
      so get_xps_queue() can see garbage and perform an out-of-bounds access.
      
      We need to make sure sender_cpu is cleared before doing the transmit,
      otherwise any NIC busy poll enabled (skb_mark_napi_id()) can trigger
      this bug.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: John <jw@nuclearfallout.net>
      Bisected-by: John <jw@nuclearfallout.net>
      Fixes: 2bd82484 ("xps: fix xps for stacked devices")
      Signed-off-by: David S. Miller <davem@davemloft.net>
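How two fields that share a location can leak a stale value from one code path into another is easy to reproduce in user space. `struct pkt` and `forward_prepare()` are hypothetical; only the field names come from the commit message.

```c
#include <assert.h>

/*
 * Hypothetical, simplified packet descriptor: as in the commit message,
 * napi_id and sender_cpu occupy the same storage.
 */
struct pkt {
    union {
        unsigned int napi_id;     /* written on the receive (busy poll) side */
        unsigned int sender_cpu;  /* read on the transmit side */
    };
};

/*
 * Clear the shared field before a received packet enters the transmit path,
 * so that a leftover napi_id is never interpreted as a CPU number.
 */
void forward_prepare(struct pkt *p)
{
    p->sender_cpu = 0;
}
```

Without the call to `forward_prepare()`, a reader of `sender_cpu` sees whatever the receive path stored in `napi_id`.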
    • net: sysctl_net_core: check SNDBUF and RCVBUF for min length · b1cb59cf
      Committed by Alexey Kodanev
      sysctl has net.core.rmem_*/wmem_* parameters which can be
      set to incorrect values. Given that 'struct sk_buff' allocates from
      rcvbuf, an incorrectly set buffer length could result in memory
      allocation failures. For example, set them as follows:
      
          # sysctl net.core.rmem_default=64
            net.core.rmem_default = 64
          # sysctl net.core.wmem_default=64
            net.core.wmem_default = 64
          # ping localhost -s 1024 -i 0 > /dev/null
      
      This could result in the following failure:
      
      skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
      head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
      kernel BUG at net/core/skbuff.c:102!
      invalid opcode: 0000 [#1] SMP
      ...
      task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
      RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
      RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
      RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
      RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
      RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
      FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
      Stack:
       ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
       ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
       ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
      Call Trace:
       [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
       [<ffffffff81556f56>] sock_write_iter+0x146/0x160
       [<ffffffff811d9612>] new_sync_write+0x92/0xd0
       [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
       [<ffffffff811da499>] SyS_write+0x59/0xd0
       [<ffffffff81651532>] system_call_fastpath+0x12/0x17
      Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
            00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
            eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
      RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
      RSP <ffff88003ae8bc68>
      Kernel panic - not syncing: Fatal exception
      
      Moreover, the possible minimum is 1, so we can get another kernel panic:
      ...
      BUG: unable to handle kernel paging request at ffff88013caee5c0
      IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0
      ...
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
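The kind of lower-bound check the commit title refers to can be sketched as a setter that rejects values below a floor. `MIN_BUF_BYTES` is an assumed value chosen for illustration and `set_buf_default()` is a hypothetical name; the kernel derives its own minimums and applies them through its sysctl handlers.

```c
#include <assert.h>
#include <errno.h>

/* Assumed floor, for illustration only. */
#define MIN_BUF_BYTES 2048

/*
 * Hypothetical setter modelled on a min-checked sysctl handler: values below
 * the floor are rejected and the previous setting is kept.
 * Returns 0 on success or -EINVAL if the request is too small.
 */
int set_buf_default(int *setting, int requested)
{
    if (requested < MIN_BUF_BYTES)
        return -EINVAL;
    *setting = requested;
    return 0;
}
```

With such a check, the `sysctl net.core.rmem_default=64` request from the example would fail up front instead of leading to a panic later in the allocation path.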
    • tcp: restore 1.5x per RTT limit to CUBIC cwnd growth in congestion avoidance · d578e18c
      Committed by Neal Cardwell
      Commit 814d488c ("tcp: fix the timid additive increase on stretch
      ACKs") fixed a bug where tcp_cong_avoid_ai() would either credit a
      connection with an increase of snd_cwnd_cnt, or increase snd_cwnd, but
      not both, resulting in cwnd increasing by 1 packet on at most every
      alternate invocation of tcp_cong_avoid_ai().
      
      Although the commit correctly implemented the CUBIC algorithm, which
      can increase cwnd by as much as 1 packet per 1 packet ACKed (2x per
      RTT), in practice that could be too aggressive: in tests on network
      paths with small buffers, YouTube server retransmission rates nearly
      doubled.
      
      This commit restores CUBIC to a maximum cwnd growth rate of 1 packet
      per 2 packets ACKed (1.5x per RTT). In YouTube tests this restored
      retransmit rates to low levels.
      
      Testing: This patch has been tested in datacenter netperf transfers
      and live youtube.com and google.com servers.
      
      Fixes: 9cd981dc ("tcp: fix stretch ACK bugs in CUBIC")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
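The "2x per RTT" and "1.5x per RTT" figures follow from simple arithmetic: in one round trip roughly cwnd packets are ACKed, so growing by one packet per N ACKed packets adds cwnd/N. A minimal model, with the hypothetical name `cwnd_after_rtt()` and the simplifying assumption that every packet in the window is ACKed exactly once per RTT:

```c
#include <assert.h>

/*
 * Hypothetical model of one round trip: every packet in the window is
 * ACKed once, and cwnd grows by one packet per `acks_per_incr` ACKs.
 */
unsigned int cwnd_after_rtt(unsigned int cwnd, unsigned int acks_per_incr)
{
    return cwnd + cwnd / acks_per_incr;
}
```

For a window of 100 packets, one increase per ACKed packet gives 200 (2x), while one increase per two ACKed packets gives 150 (1.5x).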
    • tcp: fix tcp_cong_avoid_ai() credit accumulation bug with decreases in w · 9949afa4
      Committed by Neal Cardwell
      The recent change to tcp_cong_avoid_ai() to handle stretch ACKs
      introduced a bug where snd_cwnd_cnt could accumulate a very large
      value while w was large, and then if w was reduced snd_cwnd could be
      incremented by a large delta, leading to a large burst and high packet
      loss. This was tickled when CUBIC's bictcp_update() sets "ca->cnt =
      100 * cwnd".
      
      This bug crept in while preparing the upstream version of
      814d488c.
      
      Testing: This patch has been tested in datacenter netperf transfers
      and live youtube.com and google.com servers.
      
      Fixes: 814d488c ("tcp: fix the timid additive increase on stretch ACKs")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
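The accumulation problem can be reproduced with a small user-space model of the additive-increase step. `struct cc_state` and `cong_avoid_ai()` are hypothetical stand-ins written from the description above, not a copy of the kernel function; the guard at the top is one way to make sure credits earned under a large `w` cannot turn into a large jump once `w` shrinks. The caller is assumed to pass a nonzero `w`.

```c
#include <assert.h>

/* Hypothetical stand-in for the congestion-control fields of a TCP socket. */
struct cc_state {
    unsigned int snd_cwnd;      /* congestion window, in packets */
    unsigned int snd_cwnd_cnt;  /* ACK credits toward the next increase */
};

/* Additive increase: grow snd_cwnd by one packet per w packets ACKed. */
void cong_avoid_ai(struct cc_state *s, unsigned int w, unsigned int acked)
{
    /* Credits earned under a larger w are cashed in for one packet only. */
    if (s->snd_cwnd_cnt >= w) {
        s->snd_cwnd_cnt = 0;
        s->snd_cwnd++;
    }

    s->snd_cwnd_cnt += acked;
    if (s->snd_cwnd_cnt >= w) {
        unsigned int delta = s->snd_cwnd_cnt / w;

        s->snd_cwnd_cnt -= delta * w;  /* keep the remainder as credit */
        s->snd_cwnd += delta;
    }
}
```

Starting from a window of 10 with 900 accumulated credits, a call with `w = 10` raises the window to 11. Without the guard the same call would add 90 packets at once.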
  15. 11 Mar, 2015 2 commits
  16. 10 Mar, 2015 3 commits
    • net_sched: fix struct tc_u_hnode layout in u32 · 5778d39d
      Committed by WANG Cong
      We dynamically allocate divisor+1 entries for ->ht[] in tc_u_hnode:
      
        ht = kzalloc(sizeof(*ht) + divisor*sizeof(void *), GFP_KERNEL);
      
      So ->ht is supposed to be the last field of this struct, however
      this is broken, since an rcu head is appended after it.
      
      Fixes: 1ce87720 ("net: sched: make cls_u32 lockless")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
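The layout rule at stake, that a trailing variable-length array must be the last member of its struct, can be shown with a user-space sketch. It uses a C99 flexible array member and `calloc()` rather than the kernel's `kzalloc()` pattern quoted above, and `struct hnode` and `hnode_alloc()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified hash node. The flexible array member must be the
 * last field so that the extra bytes allocated past the struct belong to it;
 * any field placed after it would overlap the buckets.
 */
struct hnode {
    unsigned int divisor;  /* highest valid bucket index */
    void *ht[];            /* divisor + 1 buckets follow the struct */
};

/* Allocate a zeroed node with divisor + 1 buckets; returns NULL on failure. */
struct hnode *hnode_alloc(unsigned int divisor)
{
    struct hnode *n = calloc(1, sizeof(*n) +
                                ((size_t)divisor + 1) * sizeof(void *));

    if (n != NULL)
        n->divisor = divisor;
    return n;
}
```

The caller owns the returned node and releases it with `free()`.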
    • tipc: fix bug in link failover handling · e6441bae
      Committed by Jon Paul Maloy
      In commit c637c103
      ("tipc: resolve race problem at unicast message reception") we
      introduced a new mechanism for delivering buffers upwards from link
      to socket layer.
      
      That code contains a bug in how we handle the new link input queue
      during failover. When a link is reset, some of its users may be blocked
      because of congestion, and in order to resolve this, we add any pending
      wakeup pseudo messages to the link's input queue, and deliver them to
      the socket. This misses the case where the other, remaining link also
      may have congested users. Currently, the owner node's reference to the
      remaining link's input queue is unconditionally overwritten by the
      reset link's input queue. This has the effect that wakeup events from
      the remaining link may be unduly delayed (but not lost) for a
      potentially long period.
      
      We fix this by adding the pending events from the reset link to the
      input queue that is currently referenced by the node, whichever one
      it is.
      
      This commit should be applied to both net and net-next.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: delete stale packet_mclist entries · 82f17091
      Committed by Francesco Ruggeri
      When an interface is deleted from a net namespace the ifindex in the
      corresponding entries in PF_PACKET sockets' mclists becomes stale.
      This can create inconsistencies if later an interface with the same ifindex
      is moved from a different namespace (not that unlikely since ifindexes are
      per-namespace).
      In particular we saw problems with dev->promiscuity, resulting
      in "promiscuity touches roof, set promiscuity failed. promiscuity
      feature of device might be broken" warnings and EOVERFLOW failures of
      setsockopt(PACKET_ADD_MEMBERSHIP).
      This patch deletes the mclist entries for interfaces that are deleted.
      Since this now causes setsockopt(PACKET_DROP_MEMBERSHIP) to fail with
      EADDRNOTAVAIL if called after the interface is deleted, also make
      packet_mc_drop not fail.
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
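Purging per-interface entries when the interface goes away, and treating a later drop of an already-purged entry as a non-error, can be sketched with a simple linked list. `struct mc_entry`, `mclist_add()` and `mclist_purge()` are hypothetical user-space stand-ins, not the PF_PACKET implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical membership entry keyed by interface index. */
struct mc_entry {
    int ifindex;
    struct mc_entry *next;
};

/* Prepend an entry for ifindex; returns 0 on success, -1 if out of memory. */
int mclist_add(struct mc_entry **head, int ifindex)
{
    struct mc_entry *e = malloc(sizeof(*e));

    if (e == NULL)
        return -1;
    e->ifindex = ifindex;
    e->next = *head;
    *head = e;
    return 0;
}

/*
 * Free every entry that refers to ifindex and return how many were removed.
 * A result of 0 is not an error: the entries may already have been purged
 * when the interface was deleted.
 */
int mclist_purge(struct mc_entry **head, int ifindex)
{
    int removed = 0;

    while (*head != NULL) {
        struct mc_entry *e = *head;

        if (e->ifindex == ifindex) {
            *head = e->next;  /* unlink, then free */
            free(e);
            removed++;
        } else {
            head = &e->next;
        }
    }
    return removed;
}
```

Once the entries for a deleted interface are gone, a new interface that later receives the same ifindex starts with a clean list.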