1. 04 6月, 2014 18 次提交
  2. 29 5月, 2014 1 次提交
    • D
      net, sunrpc: suppress allocation warning in rpc_malloc() · c6c8fe79
      David Rientjes 提交于
      rpc_malloc() allocates with GFP_NOWAIT without making any attempt at
      reclaim so it easily fails when low on memory.  This ends up spamming the
      kernel log:
      
      SLAB: Unable to allocate memory on node 0 (gfp=0x4000)
        cache: kmalloc-8192, object size: 8192, order: 1
        node 0: slabs: 207/207, objs: 207/207, free: 0
      rekonq: page allocation failure: order:1, mode:0x204000
      CPU: 2 PID: 14321 Comm: rekonq Tainted: G           O  3.15.0-rc3-12.gfc9498b-desktop+ #6
      Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 2105    07/23/2010
       0000000000000000 ffff880010ff17d0 ffffffff815e693c 0000000000204000
       ffff880010ff1858 ffffffff81137bd2 0000000000000000 0000001000000000
       ffff88011ffebc38 0000000000000001 0000000000204000 ffff88011ffea000
      Call Trace:
       [<ffffffff815e693c>] dump_stack+0x4d/0x6f
       [<ffffffff81137bd2>] warn_alloc_failed+0xd2/0x140
       [<ffffffff8113be19>] __alloc_pages_nodemask+0x7e9/0xa30
       [<ffffffff811824a8>] kmem_getpages+0x58/0x140
       [<ffffffff81183de6>] fallback_alloc+0x1d6/0x210
       [<ffffffff81183be3>] ____cache_alloc_node+0x123/0x150
       [<ffffffff81185953>] __kmalloc+0x203/0x490
       [<ffffffffa06b0ee2>] rpc_malloc+0x32/0xa0 [sunrpc]
       [<ffffffffa06a6999>] call_allocate+0xb9/0x170 [sunrpc]
       [<ffffffffa06b19d8>] __rpc_execute+0x88/0x460 [sunrpc]
       [<ffffffffa06b2da9>] rpc_execute+0x59/0xc0 [sunrpc]
       [<ffffffffa06a932b>] rpc_run_task+0x6b/0x90 [sunrpc]
       [<ffffffffa077b5c1>] nfs4_call_sync_sequence+0x51/0x80 [nfsv4]
       [<ffffffffa077d45d>] _nfs4_do_setattr+0x1ed/0x280 [nfsv4]
       [<ffffffffa0782a72>] nfs4_do_setattr+0x72/0x180 [nfsv4]
       [<ffffffffa078334c>] nfs4_proc_setattr+0xbc/0x140 [nfsv4]
       [<ffffffffa074a7e8>] nfs_setattr+0xd8/0x240 [nfs]
       [<ffffffff811baa71>] notify_change+0x231/0x380
       [<ffffffff8119cf5c>] chmod_common+0xfc/0x120
       [<ffffffff8119df80>] SyS_chmod+0x40/0x90
       [<ffffffff815f4cfd>] system_call_fastpath+0x1a/0x1f
      ...
      
      If the allocation fails, simply return NULL and avoid spamming the kernel
      log.
      Reported-by: NMarc Dietrich <marvin24@gmx.de>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      c6c8fe79
  3. 19 5月, 2014 1 次提交
  4. 13 4月, 2014 2 次提交
    • N
      vti: don't allow to add the same tunnel twice · 8d89dcdf
      Nicolas Dichtel 提交于
      Before the patch, it was possible to add two times the same tunnel:
      ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
      ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41
      
      It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
      argument dev->type, which was set only later (when calling ndo_init handler
      in register_netdevice()). Let's set this type in the setup handler, which is
      called before newlink handler.
      
      Introduced by commit b9959fd3 ("vti: switch to new ip tunnel code").
      
      CC: Cong Wang <amwang@redhat.com>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d89dcdf
    • N
      gre: don't allow to add the same tunnel twice · 5a455275
      Nicolas Dichtel 提交于
      Before the patch, it was possible to add two times the same tunnel:
      ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
      ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249
      
      It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
      argument dev->type, which was set only later (when calling ndo_init handler
      in register_netdevice()). Let's set this type in the setup handler, which is
      called before newlink handler.
      
      Introduced by commit c5441932 ("GRE: Refactor GRE tunneling code.").
      
      CC: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a455275
  5. 12 4月, 2014 4 次提交
    • D
      pktgen: be friendly to LLTX devices · 0f2eea4b
      Daniel Borkmann 提交于
      Similarly to commit 43279500 ("packet: respect devices with
      LLTX flag in direct xmit"), we can basically apply the very same
      to pktgen. This will help testing against LLTX devices such as
      dummy driver (or others), which only have a single netdevice txq
      and would otherwise require locking their txq from pktgen side
      while e.g. in dummy case, we would not need any locking. Fix this
      by making use of HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will
      be respected.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f2eea4b
    • L
      net: ipv6: Fix oif in TCP SYN+ACK route lookup. · a36dbdb2
      Lorenzo Colitti 提交于
      net-next commit 9c76a114, ipv6: tcp_ipv6 policy route issue, had
      a boolean logic error that caused incorrect behaviour for TCP
      SYN+ACK when oif-based rules are in use. Specifically:
      
      1. If a SYN comes in from a global address, and sk_bound_dev_if
         is not set, the routing lookup has oif set to the interface
         the SYN came in on. Instead, it should have oif unset,
         because for global addresses, the incoming interface doesn't
         necessarily have any bearing on the interface the SYN+ACK is
         sent out on.
      2. If a SYN comes in from a link-local address, and
         sk_bound_dev_if is set, the routing lookup has oif set to the
         interface the SYN came in on. Instead, it should have oif set
         to sk_bound_dev_if, because that's what the application
         requested.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a36dbdb2
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
    • T
      bridge: Fix double free and memory leak around br_allowed_ingress · eb707618
      Toshiaki Makita 提交于
      br_allowed_ingress() has two problems.
      
      1. If br_allowed_ingress() is called by br_handle_frame_finish() and
      vlan_untag() in br_allowed_ingress() fails, skb will be freed by both
      vlan_untag() and br_handle_frame_finish().
      
      2. If br_allowed_ingress() is called by br_dev_xmit() and
      br_allowed_ingress() fails, the skb will not be freed.
      
      Fix these two problems by freeing the skb in br_allowed_ingress()
      if it fails.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb707618
  6. 11 4月, 2014 1 次提交
  7. 10 4月, 2014 2 次提交
    • D
      l2tp: take PMTU from tunnel UDP socket · f34c4a35
      Dmitry Petukhov 提交于
      When l2tp driver tries to get PMTU for the tunnel destination, it uses
      the pointer to struct sock that represents PPPoX socket, while it
      should use the pointer that represents UDP socket of the tunnel.
      Signed-off-by: NDmitry Petukhov <dmgenp@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f34c4a35
    • D
      net: sctp: test if association is dead in sctp_wake_up_waiters · 1e1cdf8a
      Daniel Borkmann 提交于
      In function sctp_wake_up_waiters(), we need to involve a test
      if the association is declared dead. If so, we don't have any
      reference to a possible sibling association anymore and need
      to invoke sctp_write_space() instead, and normally walk the
      socket's associations and notify them of new wmem space. The
      reason for special casing is that otherwise, we could run
      into the following issue when a sctp_primitive_SEND() call
      from sctp_sendmsg() fails, and tries to flush an association's
      outq, i.e. in the following way:
      
      sctp_association_free()
      `-> list_del(&asoc->asocs)         <-- poisons list pointer
          asoc->base.dead = true
          sctp_outq_free(&asoc->outqueue)
          `-> __sctp_outq_teardown()
           `-> sctp_chunk_free()
            `-> consume_skb()
             `-> sctp_wfree()
              `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
                                             if asoc->ep->sndbuf_policy=0
      
      Therefore, only walk the list in an 'optimized' way if we find
      that the current association is still active. We could also use
      list_del_init() in addition when we call sctp_association_free(),
      but as Vlad suggests, we want to trap such bugs and thus leave
      it poisoned as is.
      
      Why is it safe to resolve the issue by testing for asoc->base.dead?
      Parallel calls to sctp_sendmsg() are protected under socket lock,
      that is lock_sock()/release_sock(). Only within that path under
      lock held, we're setting skb/chunk owner via sctp_set_owner_w().
      Eventually, chunks are freed directly by an association still
      under that lock. So when traversing association list on destruction
      time from sctp_wake_up_waiters() via sctp_wfree(), a different
      CPU can't be running sctp_wfree() while another one calls
      sctp_association_free() as both happens under the same lock.
      Therefore, this can also not race with setting/testing against
      asoc->base.dead as we are guaranteed for this to happen in order,
      under lock. Further, Vlad says: the times we check asoc->base.dead
      is when we've cached an association pointer for later processing.
      In between cache and processing, the association may have been
      freed and is simply still around due to reference counts. We check
      asoc->base.dead under a lock, so it should always be safe to check
      and not race against sctp_association_free(). Stress-testing seems
      fine now, too.
      
      Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e1cdf8a
  8. 09 4月, 2014 1 次提交
    • D
      net: sctp: wake up all assocs if sndbuf policy is per socket · 52c35bef
      Daniel Borkmann 提交于
      SCTP charges chunks for wmem accounting via skb->truesize in
      sctp_set_owner_w(), and sctp_wfree() respectively as the
      reverse operation. If a sender runs out of wmem, it needs to
      wait via sctp_wait_for_sndbuf(), and gets woken up by a call
      to __sctp_write_space() mostly via sctp_wfree().
      
      __sctp_write_space() is being called per association. Although
      we assign sk->sk_write_space() to sctp_write_space(), which
      is then being done per socket, it is only used if send space
      is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
      is set and therefore not invoked in sock_wfree().
      
      Commit 4c3a5bda ("sctp: Don't charge for data in sndbuf
      again when transmitting packet") fixed an issue where in case
      sctp_packet_transmit() manages to queue up more than sndbuf
      bytes, sctp_wait_for_sndbuf() will never be woken up again
      unless it is interrupted by a signal. However, a still
      remaining issue is that if net.sctp.sndbuf_policy=0, that is
      accounting per socket, and one-to-many sockets are in use,
      the reclaimed write space from sctp_wfree() is 'unfairly'
      handed back on the server to the association that is the lucky
      one to be woken up again via __sctp_write_space(), while
      the remaining associations are never be woken up again
      (unless by a signal).
      
      The effect disappears with net.sctp.sndbuf_policy=1, that
      is wmem accounting per association, as it guarantees a fair
      share of wmem among associations.
      
      Therefore, if we have reclaimed memory in case of per socket
      accounting, wake all related associations to a socket in a
      fair manner, that is, traverse the socket association list
      starting from the current neighbour of the association and
      issue a __sctp_write_space() to everyone until we end up
      waking ourselves. This guarantees that no association is
      preferred over another and even if more associations are
      taken into the one-to-many session, all receivers will get
      messages from the server and are not stalled forever on
      high load. This setting still leaves the advantage of per
      socket accounting in touch as an association can still use
      up global limits if unused by others.
      
      Fixes: 4eb701df ("[SCTP] Fix SCTP sendbuffer accouting.")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52c35bef
  9. 08 4月, 2014 6 次提交
    • C
      net: replace __this_cpu_inc in route.c with raw_cpu_inc · 3ed66e91
      Christoph Lameter 提交于
      The RT_CACHE_STAT_INC macro triggers the new preemption checks
      for __this_cpu ops.
      
      I do not see any other synchronization that would allow the use of a
      __this_cpu operation here however in commit dbd2915c ("[IPV4]:
      RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
      raw_smp_processor_id() here because "we do not care" about races.  In
      the past we agreed that the price of disabling interrupts here to get
      consistent counters would be too high.  These counters may be inaccurate
      due to race conditions.
      
      The use of __this_cpu op improves the situation already from what commit
      dbd2915c did since the single instruction emitted on x86 does not
      allow the race to occur anymore.  However, non x86 platforms could still
      experience a race here.
      
      Trace:
      
        __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193
        caller is __this_cpu_preempt_check+0x38/0x60
        CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF            3.12.0-rc4+ #187
        Call Trace:
          check_preemption_disabled+0xec/0x110
          __this_cpu_preempt_check+0x38/0x60
          __ip_route_output_key+0x575/0x8c0
          ip_route_output_flow+0x27/0x70
          udp_sendmsg+0x825/0xa20
          inet_sendmsg+0x85/0xc0
          sock_sendmsg+0x9c/0xd0
          ___sys_sendmsg+0x37c/0x390
          __sys_sendmsg+0x49/0x90
          SyS_sendmsg+0x12/0x20
          tracesys+0xe1/0xe6
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ed66e91
    • V
      netdev: remove potentially harmful checks · 6859e7df
      Veaceslav Falico 提交于
      Currently we're checking a variable for != NULL after actually
      dereferencing it, in netdev_lower_get_next_private*().
      
      It's counter-intuitive at best, and can lead to faulty usage (as it implies
      that the variable can be NULL), so fix it by removing the useless checks.
      Reported-by: NDaniel Borkmann <dborkman@redhat.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: stephen hemminger <stephen@networkplumber.org>
      CC: Jerry Chu <hkchu@google.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6859e7df
    • D
      pktgen: fix xmit test for BQL enabled devices · 6f25cd47
      Daniel Borkmann 提交于
      Similarly as in commit 8e2f1a63 ("packet: fix packet_direct_xmit
      for BQL enabled drivers"), we test for __QUEUE_STATE_STACK_XOFF bit
      in pktgen's xmit, which would not fully fill the device's TX ring for
      BQL drivers that use netdev_tx_sent_queue(). Fix is to use, similarly
      as we do in packet sockets, netif_xmit_frozen_or_drv_stopped() test.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6f25cd47
    • G
      tipc: Let tipc_release() return 0 · 065d7e39
      Geert Uytterhoeven 提交于
      net/tipc/socket.c: In function ‘tipc_release’:
      net/tipc/socket.c:352: warning: ‘res’ is used uninitialized in this function
      
      Introduced by commit 24be34b5 ("tipc:
      eliminate upcall function pointers between port and socket"), which
      removed the sole initializer of "res".
      
      Just return 0 to fix it.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      065d7e39
    • J
      mac802154: fix duplicate #include headers · 6c6a9855
      Jean Sacren 提交于
      The commit e6278d92 ("mac802154: use header operations to
      create/parse headers") included the header
      
      		net/ieee802154_netdev.h
      
      which had been included by the commit b70ab2e8 ("ieee802154:
      enforce consistent endianness in the 802.15.4 stack"). Fix this
      duplicate #include by deleting the latter one as the required header
      has already been in place.
      Signed-off-by: NJean Sacren <sakiwit@gmail.com>
      Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Cc: linux-zigbee-devel@lists.sourceforge.net
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c6a9855
    • D
      net: filter: be more defensive on div/mod by X==0 · 5f9fde5f
      Daniel Borkmann 提交于
      The old interpreter behaviour was that we returned with 0
      whenever we found a division by 0 would take place. In the new
      interpreter we would currently just skip that instead and
      continue execution.
      
      It's true that a value of 0 as return might not be appropriate
      in all cases, but current users (socket filters -> drop
      packet, seccomp -> SECCOMP_RET_KILL, cls_bpf -> unclassified,
      etc) seem fine with that behaviour. Better this than undefined
      BPF program behaviour as it's expected that A contains the
      result of the division. In future, as more use cases open up,
      we could further adapt this return value to our needs, if
      necessary.
      
      So reintroduce return of 0 for division by 0 as in the old
      interpreter. Also in case of K which is guaranteed to be 32bit
      wide, sk_chk_filter() already takes care of preventing division
      by 0 invoked through K, so we can generally spare us these tests.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Reviewed-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f9fde5f
  10. 05 4月, 2014 4 次提交