1. 10 5月, 2015 1 次提交
  2. 26 4月, 2015 1 次提交
    • E
      net: fix crash in build_skb() · 2ea2f62c
      Eric Dumazet 提交于
      When I added pfmemalloc support in build_skb(), I forgot netlink
      was using build_skb() with a vmalloc() area.
      
      In this patch I introduce __build_skb() for netlink use,
      and build_skb() is a wrapper handling both skb->head_frag and
      skb->pfmemalloc
      
      This means netlink no longer has to hack skb->head_frag
      
      [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26!
      [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      [ 1567.700067] Dumping ftrace buffer:
      [ 1567.700067]    (ftrace buffer empty)
      [ 1567.700067] Modules linked in:
      [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167
      [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000
      [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3))
      [ 1567.700067] RSP: 0018:ffff8802467779d8  EFLAGS: 00010202
      [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c
      [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049
      [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000
      [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000
      [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000
      [ 1567.700067] FS:  00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000
      [ 1567.700067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0
      [ 1567.700067] Stack:
      [ 1567.700067]  ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000
      [ 1567.700067]  ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08
      [ 1567.700067]  ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821
      [ 1567.700067] Call Trace:
      [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316)
      [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329)
      [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623)
      [ 1567.774369] sock_write_iter (net/socket.c:823)
      [ 1567.774369] ? sock_sendmsg (net/socket.c:806)
      [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491)
      [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249)
      [ 1567.774369] ? default_llseek (fs/read_write.c:487)
      [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701)
      [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4))
      [ 1567.774369] vfs_write (fs/read_write.c:539)
      [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
      [ 1567.774369] ? SyS_read (fs/read_write.c:577)
      [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
      [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636)
      [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
      [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261)
      
      Fixes: 79930f58 ("net: do not deplete pfmemalloc reserve")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ea2f62c
  3. 26 3月, 2015 1 次提交
  4. 25 3月, 2015 1 次提交
  5. 24 3月, 2015 1 次提交
  6. 21 3月, 2015 2 次提交
  7. 19 3月, 2015 1 次提交
  8. 03 3月, 2015 1 次提交
  9. 28 2月, 2015 1 次提交
  10. 05 2月, 2015 1 次提交
  11. 04 2月, 2015 1 次提交
    • A
      netlink: make the check for "send from tx_ring" deterministic · a8866ff6
      Al Viro 提交于
      As it is, zero msg_iovlen means that the first iovec in the kernel
      array of iovecs is left uninitialized, so checking if its ->iov_base
      is NULL is random.  Since the real users of that thing are doing
      sendto(fd, NULL, 0, ...), they are getting msg_iovlen = 1 and
      msg_iov[0] = {NULL, 0}, which is what this test is trying to catch.
      As suggested by davem, let's just check that msg_iovlen was 1 and
      msg_iov[0].iov_base was NULL - _that_ is well-defined and it catches
      what we want to catch.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a8866ff6
  12. 31 1月, 2015 1 次提交
  13. 29 1月, 2015 1 次提交
    • C
      net: remove sock_iocb · 7cc05662
      Christoph Hellwig 提交于
      The sock_iocb structure is allocate on stack for each read/write-like
      operation on sockets, and contains various fields of which only the
      embedded msghdr and sometimes a pointer to the scm_cookie is ever used.
      Get rid of the sock_iocb and put a msghdr directly on the stack and pass
      the scm_cookie explicitly to netlink_mmap_sendmsg.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7cc05662
  14. 27 1月, 2015 1 次提交
  15. 17 1月, 2015 1 次提交
    • J
      genetlink: synchronize socket closing and family removal · ee1c2442
      Johannes Berg 提交于
      In addition to the problem Jeff Layton reported, I looked at the code
      and reproduced the same warning by subscribing and removing the genl
      family with a socket still open. This is a fairly tricky race which
      originates in the fact that generic netlink allows the family to go
      away while sockets are still open - unlike regular netlink which has
      a module refcount for every open socket so in general this cannot be
      triggered.
      
      Trying to resolve this issue by the obvious locking isn't possible as
      it will result in deadlocks between unregistration and group unbind
      notification (which incidentally lockdep doesn't find due to the home
      grown locking in the netlink table.)
      
      To really resolve this, introduce a "closing socket" reference counter
      (for generic netlink only, as it's the only affected family) in the
      core netlink code and use that in generic netlink to wait for all the
      sockets that are being closed at the same time as a generic netlink
      family is removed.
      
      This fixes the race that when a socket is closed, it will should call
      the unbind, but if the family is removed at the same time the unbind
      will not find it, leading to the warning. The real problem though is
      that in this case the unbind could actually find a new family that is
      registered to have a multicast group with the same ID, and call its
      mcast_unbind() leading to confusing.
      
      Also remove the warning since it would still trigger, but is now no
      longer a problem.
      
      This also moves the code in af_netlink.c to before unreferencing the
      module to avoid having the same problem in the normal non-genl case.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee1c2442
  16. 16 1月, 2015 1 次提交
  17. 14 1月, 2015 1 次提交
  18. 04 1月, 2015 4 次提交
    • T
      netlink: Lockless lookup with RCU grace period in socket release · 21e4902a
      Thomas Graf 提交于
      Defers the release of the socket reference using call_rcu() to
      allow using an RCU read-side protected call to rhashtable_lookup()
      
      This restores behaviour and performance gains as previously
      introduced by e341694e ("netlink: Convert netlink_lookup() to use
      RCU protected hash table") without the side effect of severely
      delayed socket destruction.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21e4902a
    • T
      rhashtable: Per bucket locks & deferred expansion/shrinking · 97defe1e
      Thomas Graf 提交于
      Introduces an array of spinlocks to protect bucket mutations. The number
      of spinlocks per CPU is configurable and selected based on the hash of
      the bucket. This allows for parallel insertions and removals of entries
      which do not share a lock.
      
      The patch also defers expansion and shrinking to a worker queue which
      allows insertion and removal from atomic context. Insertions and
      deletions may occur in parallel to it and are only held up briefly
      while the particular bucket is linked or unzipped.
      
      Mutations of the bucket table pointer is protected by a new mutex, read
      access is RCU protected.
      
      In the event of an expansion or shrinking, the new bucket table allocated
      is exposed as a so called future table as soon as the resize process
      starts.  Lookups, deletions, and insertions will briefly use both tables.
      The future table becomes the main table after an RCU grace period and
      initial linking of the old to the new table was performed. Optimization
      of the chains to make use of the new number of buckets follows only the
      new table is in use.
      
      The side effect of this is that during that RCU grace period, a bucket
      traversal using any rht_for_each() variant on the main table will not see
      any insertions performed during the RCU grace period which would at that
      point land in the future table. The lookup will see them as it searches
      both tables if needed.
      
      Having multiple insertions and removals occur in parallel requires nelems
      to become an atomic counter.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97defe1e
    • T
      rhashtable: Convert bucket iterators to take table and index · 88d6ed15
      Thomas Graf 提交于
      This patch is in preparation to introduce per bucket spinlocks. It
      extends all iterator macros to take the bucket table and bucket
      index. It also introduces a new rht_dereference_bucket() to
      handle protected accesses to buckets.
      
      It introduces a barrier() to the RCU iterators to the prevent
      the compiler from caching the first element.
      
      The lockdep verifier is introduced as stub which always succeeds
      and properly implement in the next patch when the locks are
      introduced.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88d6ed15
    • T
      rhashtable: Do hashing inside of rhashtable_lookup_compare() · 8d24c0b4
      Thomas Graf 提交于
      Hash the key inside of rhashtable_lookup_compare() like
      rhashtable_lookup() does. This allows to simplify the hashing
      functions and keep them private.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d24c0b4
  19. 27 12月, 2014 4 次提交
  20. 19 12月, 2014 2 次提交
    • T
      netlink: Don't reorder loads/stores before marking mmap netlink frame as available · a18e6a18
      Thomas Graf 提交于
      Each mmap Netlink frame contains a status field which indicates
      whether the frame is unused, reserved, contains data or needs to
      be skipped. Both loads and stores may not be reordeded and must
      complete before the status field is changed and another CPU might
      pick up the frame for use. Use an smp_mb() to cover needs of both
      types of callers to netlink_set_status(), callers which have been
      reading data frame from the frame, and callers which have been
      filling or releasing and thus writing to the frame.
      
      - Example code path requiring a smp_rmb():
        memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
        netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
      
      - Example code path requiring a smp_wmb():
        hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
        hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
        netlink_frame_flush_dcache(hdr);
        netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
      
      Fixes: f9c228 ("netlink: implement memory mapped recvmsg()")
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a18e6a18
    • D
      netlink: Always copy on mmap TX. · 4682a035
      David Miller 提交于
      Checking the file f_count and the nlk->mapped count is not completely
      sufficient to prevent the mmap'd area contents from changing from
      under us during netlink mmap sendmsg() operations.
      
      Be careful to sample the header's length field only once, because this
      could change from under us as well.
      
      Fixes: 5fd96123 ("netlink: implement memory mapped sendmsg()")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      4682a035
  21. 11 12月, 2014 1 次提交
  22. 10 12月, 2014 1 次提交
    • A
      put iov_iter into msghdr · c0371da6
      Al Viro 提交于
      Note that the code _using_ ->msg_iter at that point will be very
      unhappy with anything other than unshifted iovec-backed iov_iter.
      We still need to convert users to proper primitives.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c0371da6
  23. 24 11月, 2014 1 次提交
  24. 20 11月, 2014 1 次提交
  25. 14 11月, 2014 3 次提交
  26. 13 11月, 2014 1 次提交
  27. 06 11月, 2014 1 次提交
    • D
      net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b
      David S. Miller 提交于
      This encapsulates all of the skb_copy_datagram_iovec() callers
      with call argument signature "skb, offset, msghdr->msg_iov, length".
      
      When we move to iov_iters in the networking, the iov_iter object will
      sit in the msghdr.
      
      Having a helper like this means there will be less places to touch
      during that transformation.
      
      Based upon descriptions and patch from Al Viro.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51f3d02b
  28. 22 10月, 2014 1 次提交
  29. 09 10月, 2014 1 次提交
    • A
      fix misuses of f_count() in ppp and netlink · 24dff96a
      Al Viro 提交于
      we used to check for "nobody else could start doing anything with
      that opened file" by checking that refcount was 2 or less - one
      for descriptor table and one we'd acquired in fget() on the way to
      wherever we are.  That was race-prone (somebody else might have
      had a reference to descriptor table and do fget() just as we'd
      been checking) and it had become flat-out incorrect back when
      we switched to fget_light() on those codepaths - unlike fget(),
      it doesn't grab an extra reference unless the descriptor table
      is shared.  The same change allowed a race-free check, though -
      we are safe exactly when refcount is less than 2.
      
      It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
      to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
      netlink hadn't grown that check until 3.9 and ppp used to live
      in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
      well before that, though, and the same fix used to apply in old
      location of file.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      24dff96a
  30. 15 8月, 2014 1 次提交