1. 23 Sep 2014: 8 commits
    • tcp: avoid possible arithmetic overflows · fcdd1cf4
      Committed by Eric Dumazet
      icsk_rto is a 32-bit field, and icsk_backoff can reach 15 by default,
      or more if some sysctls (e.g. tcp_retries2) are changed.

      It is better to use 64-bit arithmetic for icsk_rto << icsk_backoff operations.
      
      As Joe Perches suggested, add a helper for this.
      
      Yuchung spotted the tcp_v4_err() case.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
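The overflow can be reproduced outside the kernel. The sketch below is a minimal illustration, not the kernel's helper: struct conn_timers and rto_backoff() are hypothetical names standing in for icsk_rto, icsk_backoff and the helper this commit adds. It widens the 32-bit value to 64 bits before shifting and clamps the result to a caller-supplied maximum.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields named in the commit. */
struct conn_timers {
    uint32_t rto;      /* retransmission timeout, 32 bits like icsk_rto */
    uint8_t  backoff;  /* exponential backoff count, like icsk_backoff */
};

/* Compute rto << backoff in 64 bits, then clamp to max_when.
 * Shifting the 32-bit field directly would silently drop high bits.
 * Assumes backoff < 64 so the 64-bit shift itself is well defined. */
static uint64_t rto_backoff(const struct conn_timers *t, uint64_t max_when)
{
    uint64_t when = (uint64_t)t->rto << t->backoff;

    return when < max_when ? when : max_when;
}
```

With rto = 2^20 and backoff = 15 the true value is 2^35, which a 32-bit shift truncates to 0.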
    • ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback · 35f7aa53
      Committed by Daniel Borkmann
      RFC2710 (MLDv1), section 3.7. says:
      
        The length of a received MLD message is computed by taking the
        IPv6 Payload Length value and subtracting the length of any IPv6
        extension headers present between the IPv6 header and the MLD
        message. If that length is greater than 24 octets, that indicates
        that there are other fields present *beyond* the fields described
        above, perhaps belonging to a *future backwards-compatible* version
        of MLD. An implementation of the version of MLD specified in this
        document *MUST NOT* send an MLD message longer than 24 octets and
        MUST ignore anything past the first 24 octets of a received MLD
        message.
      
      RFC3810 (MLDv2), section 8.2.1. states for *listeners* regarding
      presence of MLDv1 routers:
      
        In order to be compatible with MLDv1 routers, MLDv2 hosts MUST
        operate in version 1 compatibility mode. [...] When Host
        Compatibility Mode is MLDv2, a host acts using the MLDv2 protocol
        on that interface. When Host Compatibility Mode is MLDv1, a host
        acts in MLDv1 compatibility mode, using *only* the MLDv1 protocol,
        on that interface. [...]
      
      While section 8.3.1. specifies *router* behaviour regarding presence
      of MLDv1 routers:
      
        MLDv2 routers may be placed on a network where there is at least
        one MLDv1 router. The following requirements apply:
      
        If an MLDv1 router is present on the link, the Querier MUST use
        the *lowest* version of MLD present on the network. This must be
        administratively assured. Routers that desire to be compatible
        with MLDv1 MUST have a configuration option to act in MLDv1 mode;
        if an MLDv1 router is present on the link, the system administrator
        must explicitly configure all MLDv2 routers to act in MLDv1 mode.
        When in MLDv1 mode, the Querier MUST send periodic General Queries
        truncated at the Multicast Address field (i.e., 24 bytes long),
        and SHOULD also warn about receiving an MLDv2 Query (such warnings
        must be rate-limited). The Querier MUST also fill in the Maximum
        Response Delay in the Maximum Response Code field, i.e., the
        exponential algorithm described in section 5.1.3. is not used. [...]
      
      That means that we should not get queries from different versions of
      MLD. When an MLDv1 router is present, MLDv2 enforces truncation
      and MRC == MRD (both fields overlap within the 24-octet range).
      
      Section 8.3.2. specifies behaviour in the presence of MLDv1 multicast
      address *listeners*:
      
        MLDv2 routers may be placed on a network where there are hosts
        that have not yet been upgraded to MLDv2. In order to be compatible
        with MLDv1 hosts, MLDv2 routers MUST operate in version 1 compatibility
        mode. MLDv2 routers keep a compatibility mode per multicast address
        record. The compatibility mode of a multicast address is determined
        from the Multicast Address Compatibility Mode variable, which can be
        in one of the two following states: MLDv1 or MLDv2.
      
        The Multicast Address Compatibility Mode of a multicast address
        record is set to MLDv1 whenever an MLDv1 Multicast Listener Report is
        *received* for that multicast address. At the same time, the Older
        Version Host Present timer for the multicast address is set to Older
        Version Host Present Timeout seconds. The timer is re-set whenever a
        new MLDv1 Report is received for that multicast address. If the Older
        Version Host Present timer expires, the router switches back to
        Multicast Address Compatibility Mode of MLDv2 for that multicast
        address. [...]
      
      That means the following scenario can happen: hosts can act in
      MLDv1 compatibility mode when they have previously received an
      MLDv1 query (or simply operate in MLDv1-only mode), while at the
      same time an MLDv2 router starts up and transmits MLDv2 startup
      query messages, unaware of the current operational mode.
      
      Given RFC2710, section 3.7, we need to answer such a query with an
      MLDv1 listener report, so that the router, per RFC3810, section
      8.3.2., receives it and internally switches to MLDv1 compatibility
      as well.
      
      Right now, I believe since the initial implementation of MLDv2, Linux
      hosts just silently drop such MLDv2 queries instead of replying
      with an MLDv1 listener report, which prevents an MLDv2 router from
      going into fallback mode (until it receives other MLDv1 queries).
      
      Since the mapping of MRC to MRD in exactly such cases can make use of
      the exponential algorithm from section 5.1.3, an MLDv1 host cannot,
      strictly speaking, know how the MRC is encoded, and the RFC does not
      seem to address this either. Since both encodings are identical up to
      32767, we assume that value as a hard upper limit and clamp to it in
      this situation. We asked one of the RFC authors about this, and he
      mentioned that there do not seem to be any implementations that use
      the exponential algorithm on startup messages. In any case, this
      patch fixes this MLD interoperability issue.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
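A minimal sketch of the decision described above, assuming the host already knows whether it is in MLDv1 compatibility mode. The function and enum names are hypothetical. mldv2_mrd() follows the Maximum Response Code encoding of RFC 3810, section 5.1.3; v1_fallback_delay() applies the 32767 clamp this commit describes; query_reply() treats 24-octet queries as MLDv1 and queries of 28 octets or more as MLDv2 (an assumption taken from RFC 3810's query version distinctions, not from this commit), and in fallback mode answers the latter with an MLDv1 report instead of dropping it.

```c
#include <stddef.h>
#include <stdint.h>

#define MLDV1_LEN            24  /* RFC 2710: MLDv1 message size */
#define MLDV2_QUERY_MIN_LEN  28  /* an MLDv2 Query adds at least 4 octets */

enum mld_reply { MLD_DROP, MLD_REPLY_V1, MLD_REPLY_V2 };

/* RFC 3810, 5.1.3: decode Maximum Response Code into Maximum Response
 * Delay (milliseconds). Values below 32768 are literal; otherwise the
 * field is 1 | exp (3 bits) | mant (12 bits). */
static uint32_t mldv2_mrd(uint16_t mrc)
{
    if (mrc < 32768)
        return mrc;
    uint32_t exponent = (mrc >> 12) & 0x7;
    uint32_t mantissa = mrc & 0x0fff;
    return (mantissa | 0x1000) << (exponent + 3);
}

/* Delay used when answering in MLDv1 compatibility mode: the field is
 * read as a plain MLDv1 MRD and clamped to 32767, the largest value on
 * which the two encodings agree. */
static uint32_t v1_fallback_delay(uint16_t mrc)
{
    return mrc > 32767 ? 32767 : mrc;
}

/* Decide how to answer a query of 'len' octets (hypothetical helper). */
static enum mld_reply query_reply(size_t len, int v1_compat_mode)
{
    if (len == MLDV1_LEN)
        return MLD_REPLY_V1;
    if (len >= MLDV2_QUERY_MIN_LEN)
        /* RFC 2710, 3.7: an MLDv1 host ignores octets past the 24th, so in
         * fallback mode answer with an MLDv1 report instead of dropping. */
        return v1_compat_mode ? MLD_REPLY_V1 : MLD_REPLY_V2;
    return MLD_DROP;  /* 25..27 octets: treated as invalid in this sketch */
}
```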
    • net: sched: cls_u32 changes to knode must appear atomic to readers · de5df632
      Committed by John Fastabend
      Changes to the cls_u32 classifier must appear atomic to
      readers. Before this patch, if a change was requested for both
      the exts and ifindex, the ifindex was updated first and then the
      exts with tcf_exts_change(). This opened a small window where
      a reader could see an exts chain with an incorrect ifindex. This
      violates RCU semantics.
      
      Here we resolve this by always passing u32_set_parms() a copy
      of the tc_u_knode to work on and then inserting it into the hash
      table after the updates have been successfully applied.
      
      Tested with the following short script:
      
      #tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 handle 1: \
      	       u32 divisor 256
      
      #tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 \
      	       u32 link 1: hashkey mask ffffff00 at 12    \
      	       match ip src 192.168.8.0/2
      
      #tc filter add dev p3p2 parent 8001:0 protocol ip prio 102    \
      	       handle 1::10 u32 classid 1:2 ht 1: 	      \
      	       match ip src 192.168.8.0/8 match ip tos 0x0a 1e
      
      #tc filter change dev p3p2 parent 8001:0 protocol ip prio 102 \
      		 handle 1::10 u32 classid 1:2 ht 1:        \
      		 match ip src 1.1.0.0/8 match ip tos 0x0b 1e
      
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
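The copy-then-publish pattern can be sketched in userspace C11. struct knode, publish_initial(), update() and read_snapshot() are hypothetical; an atomic release store plays the role of rcu_assign_pointer() and an acquire load the role of rcu_dereference(). The sketch assumes a single writer (in the kernel, tc filter updates are serialized by RTNL) and leaves freeing the old node after a grace period to the caller.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for tc_u_knode: two fields that readers must
 * always observe as a consistent pair. */
struct knode {
    int ifindex;
    int exts_id;
};

static _Atomic(struct knode *) published;

static void publish_initial(struct knode *n)
{
    atomic_store_explicit(&published, n, memory_order_release);
}

/* Reader: one acquire load yields either the old or the new node,
 * never a mix of old ifindex and new exts. */
static struct knode read_snapshot(void)
{
    struct knode *n = atomic_load_explicit(&published, memory_order_acquire);
    return *n;
}

/* Writer: copy the live node, apply every change to the copy, then
 * publish it with a single release store. Returns the old node, or
 * NULL (with nothing published) if allocation fails. Requires that
 * publish_initial() has been called. */
static struct knode *update(int ifindex, int exts_id)
{
    struct knode *old = atomic_load_explicit(&published, memory_order_relaxed);
    struct knode *copy = malloc(sizeof(*copy));

    if (!copy)
        return NULL;
    *copy = *old;
    copy->ifindex = ifindex;
    copy->exts_id = exts_id;
    atomic_store_explicit(&published, copy, memory_order_release);
    return old;
}
```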
    • net: cls_u32: fix missed pcpu_success free_percpu · a1ddcfee
      Committed by John Fastabend
      This fixes a missed free_percpu in the unwind code path and when
      keys are destroyed.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
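A minimal sketch of the goto-based unwind pattern with a second per-key allocation, using calloc()/free() in place of alloc_percpu()/free_percpu(). struct key, key_create() and key_destroy() are hypothetical, as is the later_step_err parameter that simulates a failure after both counters exist. The point is that every allocation needs a matching release both on the error path and on destroy.

```c
#include <stdlib.h>

/* Hypothetical key with two separately allocated counter arrays. */
struct key {
    unsigned long *pf;
    unsigned long *pcpu_success;
};

static void key_destroy(struct key *k)
{
    if (!k)
        return;
    free(k->pcpu_success);  /* release on destroy */
    free(k->pf);
    free(k);
}

static struct key *key_create(size_t ncpus, int later_step_err)
{
    struct key *k = calloc(1, sizeof(*k));

    if (!k)
        return NULL;
    k->pf = calloc(ncpus, sizeof(*k->pf));
    if (!k->pf)
        goto err_key;
    k->pcpu_success = calloc(ncpus, sizeof(*k->pcpu_success));
    if (!k->pcpu_success)
        goto err_pf;
    if (later_step_err)
        goto err_success;
    return k;

err_success:
    free(k->pcpu_success);  /* release on the unwind path */
err_pf:
    free(k->pf);
err_key:
    free(k);
    return NULL;
}
```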
    • udp: Need to make ip6_udp_tunnel.c have GPL license · 3fcb95a8
      Committed by Tom Herbert
      Unable to load various tunneling modules without this:
      
      [   80.679049] fou: Unknown symbol udp_sock_create6 (err 0)
      [   91.439939] ip6_udp_tunnel: Unknown symbol ip6_local_out (err 0)
      [   91.439954] ip6_udp_tunnel: Unknown symbol __put_net (err 0)
      [   91.457792] vxlan: Unknown symbol udp_sock_create6 (err 0)
      [   91.457831] vxlan: Unknown symbol udp_tunnel6_xmit_skb (err 0)
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: keep original skb which only needs header checking during software GSO · cecda693
      Committed by Jason Wang
      Commit ce93718f ("net: Don't keep around original SKB when we
      software segment GSO frames") frees the original skb after software
      GSO even for dodgy GSO skbs. This breaks stream throughput from
      untrusted sources, since for those only header checking was done
      during software GSO instead of true segmentation. This patch fixes
      this by freeing the original GSO skb only when it was really
      segmented by software.
      
      Fixes: ce93718f ("net: Don't keep around original SKB when we software segment GSO frames.")
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
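A simplified model of the rule the fix restores: if software GSO only checked the headers of a dodgy packet and produced no segments, the original must still be transmitted rather than freed. struct pkt, gso_segment() and xmit() are hypothetical; the real code works on sk_buffs and segment lists, and this model only tracks whether the original was released.

```c
#include <stdbool.h>

/* Hypothetical packet model. */
struct pkt {
    bool dodgy;            /* GSO packet from an untrusted source */
    bool dev_can_segment;  /* device can segment it once headers are checked */
    bool freed;            /* original released by the transmit path */
};

/* Called when the stack runs software GSO. Returns the number of
 * software segments produced; 0 means only the headers were checked
 * and the original packet must be sent as-is. */
static int gso_segment(const struct pkt *p, int mss_segments)
{
    if (p->dodgy && p->dev_can_segment)
        return 0;            /* header check only */
    return mss_segments;     /* real software segmentation */
}

/* Transmit path: free the original only if software segments replaced it.
 * Returns the number of packets handed to the device. */
static int xmit(struct pkt *p, int mss_segments)
{
    int n = gso_segment(p, mss_segments);

    if (n == 0)
        return 1;            /* keep and send the original */
    p->freed = true;         /* original replaced by n segments */
    return n;
}
```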
    • net: dsa: add {get, set}_wol callbacks to slave devices · 19e57c4e
      Committed by Florian Fainelli
      Allow switch drivers to implement per-port Wake-on-LAN getters and
      setters.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
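A sketch of how a slave device might forward ethtool Wake-on-LAN requests to optional switch driver callbacks. struct wol_info, struct switch_ops, struct slave and the simplified signatures are hypothetical. The sketch reports no WoL support when the driver has no getter and returns -EOPNOTSUPP when it has no setter.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified WoL state and driver ops table. */
struct wol_info {
    uint32_t supported;
    uint32_t wolopts;
};

struct switch_ops {
    void (*get_wol)(int port, struct wol_info *w);        /* optional */
    int  (*set_wol)(int port, const struct wol_info *w);  /* optional */
};

struct slave {
    int port;
    const struct switch_ops *ops;
};

/* Default to "nothing supported", then let the driver fill in details. */
static void slave_get_wol(const struct slave *s, struct wol_info *w)
{
    w->supported = 0;
    w->wolopts = 0;
    if (s->ops->get_wol)
        s->ops->get_wol(s->port, w);
}

static int slave_set_wol(const struct slave *s, const struct wol_info *w)
{
    if (!s->ops->set_wol)
        return -EOPNOTSUPP;
    return s->ops->set_wol(s->port, w);
}
```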
    • net: dsa: allow switch drivers to implement suspend/resume hooks · 24462549
      Committed by Florian Fainelli
      Add an abstraction layer to suspend/resume switch devices, doing the
      following split:
      
      - suspend/resume the slave network devices and their corresponding PHY
        devices
      - suspend/resume the switch hardware using switch driver callbacks
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
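A sketch of the two-layer split, with hypothetical names: the ports (slave network devices and their PHYs) are stopped first, then the optional driver hook suspends the switch hardware. Resuming in the reverse order, hardware before ports, is this sketch's assumption.

```c
#include <stdbool.h>

#define MAX_PORTS 8

/* Hypothetical switch model with optional driver hooks. */
struct sw {
    int nports;
    bool port_running[MAX_PORTS];       /* slave netdev and its PHY */
    int (*drv_suspend)(struct sw *sw);  /* may be NULL */
    int (*drv_resume)(struct sw *sw);   /* may be NULL */
};

static int sw_suspend(struct sw *sw)
{
    /* Layer 1: stop the slave network devices and their PHYs. */
    for (int i = 0; i < sw->nports; i++)
        sw->port_running[i] = false;
    /* Layer 2: suspend the switch hardware, if the driver supports it. */
    return sw->drv_suspend ? sw->drv_suspend(sw) : 0;
}

static int sw_resume(struct sw *sw)
{
    /* Bring the hardware back before the ports that depend on it. */
    if (sw->drv_resume) {
        int ret = sw->drv_resume(sw);
        if (ret)
            return ret;
    }
    for (int i = 0; i < sw->nports; i++)
        sw->port_running[i] = true;
    return 0;
}
```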
  2. 20 Sep 2014: 16 commits
  3. 17 Sep 2014: 7 commits
    • net: sched: cls_cgroup need tcf_exts_init in all cases · 9f6c38e7
      Committed by John Fastabend
      This ensures that tcf_exts_init() is called in all cases.
      
      Fixes: 952313bd ("net: sched: cls_cgroup use RCU")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_fw: add missing tcf_exts_init call in fw_change() · e1f93eb0
      Committed by John Fastabend
      When allocating a new structure we also need to call tcf_exts_init
      to initialize exts.
      
      A follow-up patch might be in order to remove some of this code
      and use tcf_exts_assign(). With this we could remove the
      tcf_exts_init/tcf_exts_change pattern for some of the classifiers.
      This will need to be done as part of the future tcf_actions RCU
      series. For now, fix the call here.
      
      Fixes: e35a8ee5 ("net: sched: fw use RCU")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_cgroup fix possible memory leak of 'new' · d14cbfc8
      Committed by John Fastabend
      tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
      head:   54996b52
      commit: c7953ef2 [625/646] net: sched: cls_cgroup use RCU
      
      net/sched/cls_cgroup.c:130 cls_cgroup_change() warn: possible memory leak of 'new'
      net/sched/cls_cgroup.c:135 cls_cgroup_change() warn: possible memory leak of 'new'
      net/sched/cls_cgroup.c:139 cls_cgroup_change() warn: possible memory leak of 'new'
      
      Fixes: c7953ef2 ("net: sched: cls_cgroup use RCU")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_u32 add missing rcu_assign_pointer and annotation · a96366bf
      Committed by John Fastabend
      Add missing rcu_assign_pointer and missing  annotation for ht_up
      in cls_u32.c
      
      Caught by the kbuild bot:
      
      >> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces)
         net/sched/cls_u32.c:378:36:    expected struct tc_u_hnode *ht
         net/sched/cls_u32.c:378:36:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
      >> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces)
         net/sched/cls_u32.c:610:54:    expected struct tc_u_hnode *ht
         net/sched/cls_u32.c:610:54:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
      >> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces)
         net/sched/cls_u32.c:684:18:    expected struct tc_u_hnode [noderef] <asn:4>*ht_up
         net/sched/cls_u32.c:684:18:    got struct tc_u_hnode *[assigned] ht
      >> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: fix unused cpu variable · 80aab73d
      Committed by John Fastabend
      The kbuild test robot reported an unused variable cpu in cls_u32.c
      after the patch below. This happens when the PERF and MARK config
      options are disabled.

      Fix this by using separate variables for perf and mark
      and defining the cpu variable inside the ifdef logic.
      
      Fixes: 459d5f62 ("net: sched: make cls_u32 per cpu")
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
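A sketch of the fix pattern, with hypothetical DEMO_PERF and DEMO_MARK macros standing in for the real config options: each per-CPU variable is declared inside the #ifdef block that uses it, so disabling an option removes its variable too and no unused-variable warning can appear. Commenting out either define still builds cleanly.

```c
/* Hypothetical stand-ins for the PERF and MARK config options. */
#define DEMO_PERF 1
/* #define DEMO_MARK 1 */

#define DEMO_NR_CPUS 4

#ifdef DEMO_PERF
static unsigned long perf_hits[DEMO_NR_CPUS];
#endif
#ifdef DEMO_MARK
static unsigned long mark_hits[DEMO_NR_CPUS];
#endif

/* Record one match on this_cpu; returns how many counters were bumped. */
static int record_match(int this_cpu)
{
    int bumped = 0;

    (void)this_cpu;  /* unused when both options are disabled */
#ifdef DEMO_PERF
    int perf_cpu = this_cpu;  /* separate variable, scoped to PERF */
    perf_hits[perf_cpu]++;
    bumped++;
#endif
#ifdef DEMO_MARK
    int mark_cpu = this_cpu;  /* separate variable, scoped to MARK */
    mark_hits[mark_cpu]++;
    bumped++;
#endif
    return bumped;
}
```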
    • net_sched: fix a null pointer dereference in tcindex_set_parms() · 69301eaa
      Committed by WANG Cong
      This patch fixes the following crash:
      
      [   42.199159] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [   42.200027] IP: [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
      [   42.200027] PGD d2319067 PUD d4ffe067 PMD 0
      [   42.200027] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [   42.200027] CPU: 0 PID: 541 Comm: tc Not tainted 3.17.0-rc4+ #603
      [   42.200027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   42.200027] task: ffff8800d22d2670 ti: ffff8800ce790000 task.ti: ffff8800ce790000
      [   42.200027] RIP: 0010:[<ffffffff817e3fc4>]  [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
      [   42.200027] RSP: 0018:ffff8800ce793898  EFLAGS: 00010202
      [   42.200027] RAX: 0000000000000001 RBX: ffff8800d1786498 RCX: 0000000000000000
      [   42.200027] RDX: ffffffff82114ec8 RSI: ffffffff82114ec8 RDI: ffffffff82114ec8
      [   42.200027] RBP: ffff8800ce793958 R08: 00000000000080d0 R09: 0000000000000001
      [   42.200027] R10: ffff8800ce7939a0 R11: 0000000000000246 R12: ffff8800d017d238
      [   42.200027] R13: 0000000000000018 R14: ffff8800d017c6a0 R15: ffff8800d1786620
      [   42.200027] FS:  00007f4e24539740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
      [   42.200027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   42.200027] CR2: 0000000000000018 CR3: 00000000cff38000 CR4: 00000000000006f0
      [   42.200027] Stack:
      [   42.200027]  ffff8800ce0949f0 0000000000000000 0000000200000003 ffff880000000000
      [   42.200027]  ffff8800ce7938b8 ffff8800ce7938b8 0000000600000007 0000000000000000
      [   42.200027]  ffff8800ce7938d8 ffff8800ce7938d8 0000000600000007 ffff8800ce0949f0
      [   42.200027] Call Trace:
      [   42.200027]  [<ffffffff817e4169>] tcindex_change+0xdb/0xee
      [   42.200027]  [<ffffffff817c16ca>] tc_ctl_tfilter+0x44d/0x63f
      [   42.200027]  [<ffffffff8179d161>] rtnetlink_rcv_msg+0x181/0x194
      [   42.200027]  [<ffffffff8179cf9d>] ? rtnl_lock+0x17/0x19
      [   42.200027]  [<ffffffff8179cfe0>] ? __rtnl_unlock+0x17/0x17
      [   42.200027]  [<ffffffff817ee296>] netlink_rcv_skb+0x49/0x8b
      [   43.462494]  [<ffffffff8179cfc2>] rtnetlink_rcv+0x23/0x2a
      [   43.462494]  [<ffffffff817ec8df>] netlink_unicast+0xc7/0x148
      [   43.462494]  [<ffffffff817ed413>] netlink_sendmsg+0x5cb/0x63d
      [   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
      [   43.462494]  [<ffffffff817757b8>] __sock_sendmsg_nosec+0x25/0x27
      [   43.462494]  [<ffffffff81778165>] sock_sendmsg+0x57/0x71
      [   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
      [   43.462494]  [<ffffffff81152c06>] ? might_fault+0xa0/0xa4
      [   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
      [   43.462494]  [<ffffffff817838fd>] ? verify_iovec+0x69/0xb7
      [   43.462494]  [<ffffffff817784f8>] ___sys_sendmsg+0x21d/0x2bb
      [   43.462494]  [<ffffffff81009db3>] ? native_sched_clock+0x35/0x37
      [   43.462494]  [<ffffffff8109ab53>] ? sched_clock_local+0x12/0x72
      [   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
      [   43.462494]  [<ffffffff8109ada4>] ? sched_clock_cpu+0xa0/0xb9
      [   43.462494]  [<ffffffff810aee37>] ? __lock_acquire+0x5fe/0xde4
      [   43.462494]  [<ffffffff8119f570>] ? rcu_read_lock_held+0x36/0x38
      [   43.462494]  [<ffffffff8119f75a>] ? __fcheck_files.isra.7+0x4b/0x57
      [   43.462494]  [<ffffffff8119fbf2>] ? __fget_light+0x30/0x54
      [   43.462494]  [<ffffffff81779012>] __sys_sendmsg+0x42/0x60
      [   43.462494]  [<ffffffff81779042>] SyS_sendmsg+0x12/0x1c
      [   43.462494]  [<ffffffff819d24d2>] system_call_fastpath+0x16/0x1b
      
      'p->h' could be NULL while 'cp->h' is always up to date.
      
      Fixes: 331b7292 ("net: sched: RCU cls_tcindex")
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
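A reduced sketch of the rule behind the fix, using hypothetical names: while building the private copy cp, later steps must work on cp->h, because the live p->h may not have been allocated yet. The sketch also gives the copy its own table so the published data is never modified in place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduction of the tcindex data: p is the published
 * filter data, cp the private copy being built for the update. */
struct tc_data {
    int *h;       /* perfect-hash table, may not exist yet */
    size_t hash;  /* table size */
};

/* Returns 0 on success, -1 on a bad index or allocation failure. */
static int set_parms(const struct tc_data *p, struct tc_data *cp,
                     size_t idx, int val)
{
    if (idx >= p->hash)
        return -1;
    cp->hash = p->hash;
    cp->h = calloc(cp->hash, sizeof(*cp->h));  /* private table for the copy */
    if (!cp->h)
        return -1;
    if (p->h)  /* the live table may not exist yet */
        memcpy(cp->h, p->h, cp->hash * sizeof(*cp->h));
    cp->h[idx] = val;  /* use cp->h: p->h may still be NULL here */
    return 0;
}
```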
    • net_sched: fix memory leak in cls_tcindex · 44b75e43
      Committed by WANG Cong
      Fixes: 331b7292 ("net: sched: RCU cls_tcindex")
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: John Fastabend <john.r.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 16 Sep 2014: 9 commits