1. 03 Feb 2015: 5 commits
  2. 29 Jan 2015: 4 commits
  3. 28 Jan 2015: 7 commits
  4. 27 Jan 2015: 18 commits
    • netlink: Kill redundant net argument in netlink_insert · 8ea65f4a
      Committed by Herbert Xu
      The socket already carries the net namespace with it, so there is
      no need to pass another net around.
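      A minimal sketch of the idea (illustrative only; __netlink_insert below is a
      hypothetical helper, but sock_net() is the standard accessor):

          /* Illustrative: the namespace can always be derived from the socket
           * itself via sock_net(), so callers need not pass a struct net. */
          static int __netlink_insert(struct netlink_table *table, struct sock *sk)
          {
                  struct net *net = sock_net(sk); /* namespace comes from the socket */

                  /* ... hash the (portid, net) pair and link sk into the table ... */
                  return 0;
          }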
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too · 6e9e16e6
      Committed by Hannes Frederic Sowa
      Lubomir Rintel reported that when replacing a route, the interface
      reference counter isn't correctly decremented.
      
      To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
      | [root@rhel7-5 lkundrak]# sh -x lal
      | + ip link add dev0 type dummy
      | + ip link set dev0 up
      | + ip link add dev1 type dummy
      | + ip link set dev1 up
      | + ip addr add 2001:db8:8086::2/64 dev dev0
      | + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
      | + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
      | + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
      | + ip link del dev0 type dummy
      | Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
      |  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
      |
      | Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
      |  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
      
      During replacement of a rt6_info we must walk all parent nodes and check
      whether the rt6_info to be replaced was propagated into them. If so,
      replace it with a live one.
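      A rough sketch of the walk (illustrative only; the field names are simplified
      and purge_propagated_rt() is a hypothetical helper, not the upstream function):

          /* Illustrative: when rt is replaced, make sure no parent fib6_node
           * still holds it as a propagated/cached route. */
          static void purge_propagated_rt(struct fib6_node *fn, struct rt6_info *rt,
                                          struct rt6_info *replacement)
          {
                  for (; fn; fn = fn->parent) {           /* walk towards the root */
                          if (fn->leaf == rt)             /* propagated reference? */
                                  fn->leaf = replacement; /* swap in a live route  */
                  }
          }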
      
      Fixes: 4a287eba ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
      Reported-by: Lubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Tested-by: Lubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ping: Fix race in free in receive path · fc752f1f
      Committed by subashab@codeaurora.org
      An exception is seen in the ICMP ping receive path where the skb
      destructor sock_rfree() tries to access a freed socket. This happens
      because ping_rcv() releases the socket reference with sock_put(), which
      internally frees the socket. Later, icmp_rcv() tries to free the
      skb; as part of this, the skb destructor is called, which leads
      to a kernel panic because the socket was already freed in ping_rcv().
      
      -->|exception
      -007|sk_mem_uncharge
      -007|sock_rfree
      -008|skb_release_head_state
      -009|skb_release_all
      -009|__kfree_skb
      -010|kfree_skb
      -011|icmp_rcv
      -012|ip_local_deliver_finish
      
      Fix this incorrect free by cloning the skb and processing the
      cloned skb instead.
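      A minimal sketch of the approach (illustrative; the exact placement inside
      ping_rcv() may differ):

          /* Illustrative: hand a clone to the ping socket so the original skb,
           * which icmp_rcv() frees later, no longer references the socket. */
          struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);

          if (skb2)
                  ping_queue_rcv_skb(sk, skb2); /* clone has its own destructor state */
          sock_put(sk);                         /* drop the lookup reference as before */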
      
      This patch was suggested by Eric Dumazet
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp_diag: Fix socket skipping within chain · 86f3cddb
      Committed by Herbert Xu
      While working on rhashtable walking I noticed that the UDP diag
      dumping code is buggy.  In particular, the socket skipping within
      a chain never happens, even though we record the number of sockets
      that should be skipped.
      
      As this code was supposedly copied from TCP, this patch does what
      TCP does and resets num before we walk a chain.
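      For reference, the usual diag dump skip pattern looks roughly like this
      (illustrative sketch; hashtable/last_slot are placeholder names):

          /* Illustrative: 'num' counts sockets within the current chain and must
           * be reset per chain, otherwise the skip test against the saved
           * position (s_num) never behaves as intended. */
          for (slot = s_slot; slot <= last_slot; s_num = 0, slot++) {
                  num = 0;                        /* the fix: per-chain reset */
                  sk_for_each(sk, &hashtable[slot]) {
                          if (num < s_num) {      /* skip sockets already dumped */
                                  num++;
                                  continue;
                          }
                          /* ... dump this socket ... */
                          num++;
                  }
          }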
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: try to cache dst_entries which would cause a redirect · df4d9254
      Committed by Hannes Frederic Sowa
      Not caching dst_entries which cause redirects could be exploited by hosts
      on the same subnet, causing a severe DoS attack. This effect has been
      aggravated since commit f8864972 ("ipv4: fix dst race in sk_dst_get()").
      
      Lookups causing redirects will be allocated with DST_NOCACHE set, which
      forces dst_release() to free them via RCU. Unfortunately, waiting for the
      RCU grace period just takes too long; we can end up with >1M dst_entries
      waiting to be released and the system will run out of memory. The rcuos
      threads cannot catch up under high softirq load.
      
      Attaching a flag to the specific skb, marking that a redirect should be
      emitted later, allows us to cache those dst_entries, thus reducing the
      pressure on allocation and deallocation.
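      A rough sketch of the mechanism (illustrative; the IPSKB_DOREDIRECT flag name
      is an assumption based on this change):

          /* Illustrative: instead of marking the dst uncacheable, tag the skb so
           * the forwarding path knows to emit the ICMP redirect later. */
          if (skb_would_need_redirect)            /* placeholder for the route checks */
                  IPCB(skb)->flags |= IPSKB_DOREDIRECT;

          /* later, in the forwarding path: */
          if (IPCB(skb)->flags & IPSKB_DOREDIRECT)
                  ip_rt_send_redirect(skb);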
      
      This issue was discovered by Marcelo Leitner.
      
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: fix slab corruption from use after free on INIT collisions · 600ddd68
      Committed by Daniel Borkmann
      When hitting an INIT collision case during the 4WHS with AUTH enabled, as
      already described in detail in commit 1be9a950 ("net: sctp: inherit
      auth_capable on INIT collisions"), it can happen that we occasionally
      still remotely trigger the following panic on the server side, which seems
      to have been uncovered by the fix from commit 1be9a950 ...
      
      [  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
      [  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
      [  533.940559] PGD 5030f2067 PUD 0
      [  533.957104] Oops: 0000 [#1] SMP
      [  533.974283] Modules linked in: sctp mlx4_en [...]
      [  534.939704] Call Trace:
      [  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
      [  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
      [  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
      [  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
      [  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
      [  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
      [  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
      [  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      ... or, depending on the application, for example this one:
      
      [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
      [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
      [ 1370.054568] PGD 633c94067 PUD 0
      [ 1370.070446] Oops: 0000 [#1] SMP
      [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
      [ 1370.963431] Call Trace:
      [ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
      [ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
      [ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
      [ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
      [ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      With slab debugging enabled, we can see that the poison has been overwritten:
      
      [  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
      [  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
      [  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
      [  669.826424]  __slab_alloc+0x4bf/0x566
      [  669.826433]  __kmalloc+0x280/0x310
      [  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
      [  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
      [  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
      [  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
      [  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
      [  669.826635]  __slab_free+0x39/0x2a8
      [  669.826643]  kfree+0x1d6/0x230
      [  669.826650]  kzfree+0x31/0x40
      [  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
      [  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
      [  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]
      
      Since this only triggers in some collision cases with AUTH, the problem at
      heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
      while the refcnt is 1: once directly in sctp_assoc_update() and again from
      within sctp_auth_asoc_init_active_key(), called via sctp_assoc_update(), on
      the already kzfree'd memory. This is also consistent with the observed
      poison decrease from 0x6b to 0x6a (note: the overwrite is detected later,
      when the poison is checked on a new allocation).
      
      Reference counting of auth keys revisited:
      
      Shared keys for AUTH chunks are stored in endpoints and associations
      in the endpoint_shared_keys list. On endpoint creation, a null key is
      added; on association creation, all endpoint shared keys are cached
      and thus cloned over to the association. struct sctp_shared_key only holds
      a pointer to the actual key bytes, that is, struct sctp_auth_bytes, which
      keeps track of users internally through refcounting. Naturally, on assoc
      or endpoint destruction, the sctp_shared_keys are destroyed directly and
      the reference on sctp_auth_bytes dropped.
      
      User space can add keys to either list via setsockopt(2) through struct
      sctp_authkey and by passing that to sctp_auth_set_key(), which replaces
      or adds a new auth key. There, sctp_auth_create_key() creates a new
      sctp_auth_bytes with refcount 1 and, in case of replacement, drops the
      reference on the old sctp_auth_bytes. A key can be set active from user
      space through setsockopt() on its id via sctp_auth_set_active_key(), which
      iterates through the relevant endpoint_shared_keys list and, in case of an
      assoc, invokes (as one of various places) sctp_auth_asoc_init_active_key().
      
      sctp_auth_asoc_init_active_key() computes the actual secret from local's
      and peer's random, hmac and shared key parameters and returns a new key
      directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
      the reference if there was a previous one. The secret on which we
      eventually double-drop the reference comes from sctp_auth_asoc_set_secret()
      with an initial refcount of 1, which also stays unchanged in
      sctp_assoc_update(). This key is later used by the crypto layer to set
      the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().
      
      To close the loop: asoc->asoc_shared_key is freshly allocated secret
      material and independent of the sctp_shared_key management, which keeps
      track only of shared keys in endpoints and assocs. Hence, commit 4184b2a7
      ("net: sctp: fix memory leak in auth key management") is also independent
      of this bug here, since it concerns a different layer (though the same
      structures are eventually used). asoc->asoc_shared_key has its reference
      dropped correctly on assoc destruction in sctp_association_free(), and when
      active keys are replaced in sctp_auth_asoc_init_active_key() it always has
      a refcount of 1. Hence, it's freed prematurely in sctp_assoc_update(). The
      simple fix is to remove that sctp_auth_key_put() from there, which fixes
      these panics.
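      A minimal sketch of the imbalance (illustrative; only the relevant calls are
      shown):

          /* Illustrative: asoc->asoc_shared_key starts with refcount 1. */
          sctp_auth_key_put(asoc->asoc_shared_key);  /* refcnt 1 -> 0, kzfree()d    */
          sctp_auth_asoc_init_active_key(asoc, gfp); /* puts the old key once more, */
                                                     /* now on already-freed memory */
          /* Fix: drop the explicit put above and let the replacement in
           * sctp_auth_asoc_init_active_key() manage the single reference. */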
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: add tipc support · 08bfc9cb
      Committed by Erik Hugne
      The flows are hashed on the sending node address, which allows us
      to spread out the TIPC link processing to RPS-enabled cores. There
      is no point in including the destination address in the hash, as it
      will always be the same for all inbound links. We experimented
      with a 3-tuple hash over [srcnode, sport, dport], but this gave
      slightly lower performance because of increased lock contention
      when the same link was handled by multiple cores.
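      A sketch along these lines (illustrative; the header layout and flow_keys
      field names are assumptions):

          case htons(ETH_P_TIPC): {
                  struct {
                          __be32 pre[3];
                          __be32 srcnode;
                  } *hdr, _hdr;

                  hdr = skb_header_pointer(skb, nhoff, sizeof(_hdr), &_hdr);
                  if (!hdr)
                          return false;
                  /* hash only on the sending node address */
                  flow->src = hdr->srcnode;
                  flow->dst = 0;
                  flow->n_proto = proto;
                  flow->thoff = (u16)nhoff;
                  return true;
          }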
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix excessive network event logging · 3fa9cacd
      Committed by Erik Hugne
      If a large number of namespaces is spawned on a node and TIPC is
      enabled in each of these, the excessive printk tracing of network
      events will cause the system to grind to a near halt.
      The traces are still of debug value, so instead of removing them
      completely we change the link state and node availability logging
      to debug-level traces.
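      The change essentially amounts to lowering the log level (illustrative; the
      actual messages and field names differ):

          /* before: logged unconditionally, flooding the log when many
           * namespaces each run their own TIPC stack */
          pr_info("Established link <%s> on network plane %c\n",
                  l_ptr->name, l_ptr->net_plane);

          /* after: only emitted with dynamic debug / DEBUG enabled */
          pr_debug("Established link <%s> on network plane %c\n",
                   l_ptr->name, l_ptr->net_plane);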
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: act_bpf: fix size mismatch on filter preparation · fd3e646c
      Committed by Daniel Borkmann
      As in cls_bpf, this code also needs to reject such size mismatches.
      
      Reference: http://article.gmane.org/gmane.linux.network/347406
      Fixes: d23b8ad8 ("tc: add BPF based action")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cls_basic: return from walking on match in basic_get · 1c1bc6bd
      Committed by Daniel Borkmann
      As soon as we've found a matching handle in basic_get(), we can
      return it. There's no need to continue walking until the end of
      a filter chain, since handles are unique anyway.
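      A minimal sketch of the pattern (illustrative; struct and field names are
      simplified):

          /* Illustrative: stop walking as soon as the unique handle matches. */
          list_for_each_entry(f, &head->flist, link) {
                  if (f->handle == handle)
                          return (unsigned long) f; /* handles are unique: done */
          }
          return 0;                                 /* not found */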
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cls_bpf: fix auto generation of per list handles · 3f2ab135
      Committed by Daniel Borkmann
      When creating a bpf classifier in tc with priority collisions and
      invoking automatic unique handle assignment, cls_bpf_grab_new_handle()
      will return a wrong handle id which in fact is non-unique. Usually,
      altering of specific filters is addressed via the major id, but in
      case of collisions we end up with a filter chain, where handle ids
      address individual cls_bpf_progs inside the classifier.
      
      The issue is that in cls_bpf_grab_new_handle() we probe for the
      head->hgen handle via cls_bpf_get(), and in case we find a free handle
      we're supposed to use exactly head->hgen. If we run out of handles,
      we bail out later, as handle id 0 is not allowed.
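      A sketch of the intended probing loop (illustrative; modelled on the common
      tc classifier pattern, not necessarily the exact upstream code):

          /* Illustrative: bump the generator until cls_bpf_get() reports the
           * candidate as free, then return exactly that candidate. */
          static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
                                             struct cls_bpf_head *head)
          {
                  unsigned int i = 0x80000000;

                  do {
                          if (++head->hgen == 0x7FFFFFFF)
                                  head->hgen = 1;
                  } while (--i > 0 && cls_bpf_get(tp, head->hgen));

                  if (unlikely(i == 0))
                          return 0;         /* exhausted; 0 is rejected later */

                  return head->hgen;        /* use the handle we actually probed */
          }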
      
      Fixes: 7d1d65cb ("net: sched: cls_bpf: add BPF-based classifier")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cls_bpf: fix size mismatch on filter preparation · 7913ecf6
      Committed by Daniel Borkmann
      In cls_bpf_modify_existing(), we read out the number of filter blocks,
      do some sanity checks, allocate a block of that size, and copy over the
      BPF instruction blob from user space, then pass everything through the
      classic BPF checker prior to installation of the classifier.
      
      We should reject mismatches here; there are two scenarios: the number of
      filter blocks could be smaller than the provided instruction blob, so
      we do a partial copy of the BPF program, and the instructions will
      either be rejected by the verifier or a valid BPF program will be run;
      in the other case, we end up copying more than we are supposed to,
      and most likely the trailing garbage will be rejected by the verifier
      as well (i.e. it needs to fit the instruction pattern, ret {A,K} needs
      to be the last instruction, loads/stores must be correct, etc.); if
      not, we would leak memory when dumping back the instruction patterns.
      The code should have only used nla_len(), as Dave noted, to avoid this
      from the beginning. Anyway, let's fix it by rejecting such load attempts.
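      A minimal sketch of the check (illustrative; the attribute names used here
      are assumptions, not quoted from the patch):

          u16 bpf_size, bpf_num_ops;

          bpf_num_ops = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
          if (bpf_num_ops > BPF_MAXINSNS || bpf_num_ops == 0)
                  return -EINVAL;

          bpf_size = bpf_num_ops * sizeof(struct sock_filter);
          if (bpf_size != nla_len(tb[TCA_BPF_OPS]))   /* reject length mismatches */
                  return -EINVAL;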
      
      Fixes: 7d1d65cb ("net: sched: cls_bpf: add BPF-based classifier")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add support for unique flow IDs. · 74ed7ab9
      Committed by Joe Stringer
      Previously, flows were manipulated by userspace specifying a full,
      unmasked flow key. This puts a significant burden on flow
      serialization/deserialization, particularly when dumping flows.
      
      This patch adds an alternative way to refer to flows using a
      variable-length "unique flow identifier" (UFID). At flow setup time,
      userspace may specify a UFID for a flow, which is stored with the flow
      and inserted into a separate table for lookup, in addition to the
      standard flow table. Flows created using a UFID must be fetched or
      deleted using the UFID.
      
      All flow dump operations may now be made more terse with OVS_UFID_F_*
      flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
      omit the flow key from a datapath operation if the flow has a
      corresponding UFID. This significantly reduces the time spent assembling
      and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
      enabled, the datapath only returns the UFID and statistics for each flow
      during flow dump, increasing ovs-vswitchd revalidator performance by 40%
      or more.
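      Roughly, a flow request can then identify the flow by UFID and ask the
      datapath to omit the heavyweight attributes (illustrative sketch; the exact
      attribute usage is an assumption based on the description above):

          /* Illustrative: the reply will carry only the UFID and statistics. */
          nla_put(skb, OVS_FLOW_ATTR_UFID, ufid_len, ufid);
          nla_put_u32(skb, OVS_FLOW_ATTR_UFID_FLAGS,
                      OVS_UFID_F_OMIT_KEY | OVS_UFID_F_OMIT_MASK |
                      OVS_UFID_F_OMIT_ACTIONS);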
      Signed-off-by: Joe Stringer <joestringer@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Use sw_flow_key_range for key ranges. · 272c2cf8
      Committed by Joe Stringer
      These minor tidyups make a future patch a little tidier.
      Signed-off-by: Joe Stringer <joestringer@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Refactor ovs_flow_tbl_insert(). · d29ab6f8
      Committed by Joe Stringer
      Rework so that ovs_flow_tbl_insert() calls flow_{key,mask}_insert().
      This tidies up a future patch.
      Signed-off-by: Joe Stringer <joestringer@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Refactor ovs_nla_fill_match(). · 5b4237bb
      Committed by Joe Stringer
      Refactor the ovs_nla_fill_match() function into separate netlink
      serialization functions ovs_nla_put_{unmasked_key,mask}(). Modify
      ovs_nla_put_flow() to handle attribute nesting and expose the 'is_mask'
      parameter: all callers need to nest the flow, and callers have better
      knowledge of whether they are serializing a mask or not.
      Signed-off-by: Joe Stringer <joestringer@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • NFC: nfc_disable_se Remove useless blank line at beginning of function · 511e78a3
      Committed by Christophe Ricard
      Remove one useless blank line at beginning of nfc_disable_se function.
      Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
      Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
    • NFC: nfc_enable_se Remove useless blank line at beginning of function · ec068489
      Committed by Christophe Ricard
      Remove one useless blank line at beginning of nfc_enable_se function.
      Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
      Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
  5. 26 Jan 2015: 6 commits