  1. 27 Dec 2014, 1 commit
  2. 19 Dec 2014, 2 commits
    • netlink: Don't reorder loads/stores before marking mmap netlink frame as available · a18e6a18
      Committed by Thomas Graf
      Each mmap Netlink frame contains a status field which indicates
      whether the frame is unused, reserved, contains data or needs to
      be skipped. Loads and stores must not be reordered and must
      complete before the status field is changed and another CPU might
      pick up the frame for use. Use an smp_mb() to cover needs of both
      types of callers to netlink_set_status(), callers which have been
      reading data from the frame, and callers which have been
      filling or releasing and thus writing to the frame.
      
      - Example code path requiring an smp_rmb():
        memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
        netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
      
      - Example code path requiring an smp_wmb():
        hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
        hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
        netlink_frame_flush_dcache(hdr);
        netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
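The ordering requirement above can be illustrated with a user-space analogue using C11 atomics. This is a minimal sketch, not the kernel code: `struct frame`, `frame_init()`, `frame_set_status()`, `frame_fill()`, `frame_consume()` and the `FRAME_*` values are hypothetical stand-ins for the mmap frame header, `netlink_set_status()` and `NL_MMAP_STATUS_*`, and `atomic_thread_fence(memory_order_seq_cst)` plays the role of `smp_mb()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status values modeled on NL_MMAP_STATUS_*. */
enum frame_status { FRAME_UNUSED, FRAME_RESERVED, FRAME_VALID, FRAME_COPY, FRAME_SKIP };

/* Hypothetical frame shared between a producer and a consumer. */
struct frame {
    _Atomic int status;     /* ownership flag polled by the other side */
    unsigned int len;       /* payload length (plain memory) */
    char data[64];          /* payload (plain memory) */
};

static void frame_init(struct frame *f)
{
    atomic_init(&f->status, FRAME_UNUSED);
    f->len = 0;
}

/* Analogue of netlink_set_status(): the full fence keeps every earlier load
 * and store from being reordered past the status update. */
static void frame_set_status(struct frame *f, int status)
{
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&f->status, status, memory_order_relaxed);
}

/* Producer path: payload stores must be visible before FRAME_VALID. */
static void frame_fill(struct frame *f, const char *msg)
{
    size_t n = strlen(msg);

    assert(n <= sizeof(f->data));
    memcpy(f->data, msg, n);
    f->len = (unsigned int)n;
    frame_set_status(f, FRAME_VALID);
}

/* Consumer path: payload loads must complete before FRAME_UNUSED hands the
 * frame back for reuse. Returns bytes copied, or 0 if the frame is not ready. */
static size_t frame_consume(struct frame *f, char *out, size_t outsz)
{
    if (atomic_load_explicit(&f->status, memory_order_acquire) != FRAME_VALID)
        return 0;

    size_t n = f->len;
    assert(n <= outsz);
    memcpy(out, f->data, n);
    frame_set_status(f, FRAME_UNUSED);
    return n;
}
```

One full fence serves both paths because the producer needs its stores ordered before the status store while the consumer needs its loads ordered before it, which is the same reasoning the commit gives for choosing `smp_mb()`.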
      
      Fixes: f9c228 ("netlink: implement memory mapped recvmsg()")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: Always copy on mmap TX. · 4682a035
      Committed by David Miller
      Checking the file f_count and the nlk->mapped count is not completely
      sufficient to prevent the mmap'd area contents from changing from
      under us during netlink mmap sendmsg() operations.
      
      Be careful to sample the header's length field only once, because this
      could change from under us as well.
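Sampling the length once can be sketched in user-space C. The sketch assumes a hypothetical frame layout (`struct tx_frame`, `TX_DATA_SIZE`, `tx_copy_frame()`) that user space may keep modifying; the point is that `nm_len` is loaded a single time into a local, and only that local is validated and used for the copy.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TX_DATA_SIZE 252u   /* hypothetical payload capacity of one frame */

/* Hypothetical frame in memory shared with user space. */
struct tx_frame {
    _Atomic unsigned int nm_len;        /* user space may rewrite this at any time */
    unsigned char data[TX_DATA_SIZE];   /* payload, also user-writable */
};

/* Copy the payload into private memory. Returns bytes copied or -EINVAL. */
static int tx_copy_frame(struct tx_frame *f, unsigned char *dst, size_t dstsz)
{
    assert(f != NULL && dst != NULL);

    /* Single load: the bounds check and the memcpy() use the same value. */
    unsigned int len = atomic_load_explicit(&f->nm_len, memory_order_relaxed);

    if (len > TX_DATA_SIZE || len > dstsz)
        return -EINVAL;

    /* Always copy: later processing works on dst, never on the shared frame. */
    memcpy(dst, f->data, len);
    return (int)len;
}
```

Re-reading `f->nm_len` after the bounds check would reopen the window in which user space enlarges it between the check and the copy.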
      
      Fixes: 5fd96123 ("netlink: implement memory mapped sendmsg()")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
  3. 11 Dec 2014, 1 commit
  4. 10 Dec 2014, 1 commit
    • put iov_iter into msghdr · c0371da6
      Committed by Al Viro
      Note that the code _using_ ->msg_iter at that point will be very
      unhappy with anything other than unshifted iovec-backed iov_iter.
      We still need to convert users to proper primitives.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. 24 Nov 2014, 1 commit
  6. 20 Nov 2014, 1 commit
  7. 14 Nov 2014, 3 commits
  8. 13 Nov 2014, 1 commit
  9. 06 Nov 2014, 1 commit
    • net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b
      Committed by David S. Miller
      This encapsulates all of the skb_copy_datagram_iovec() callers
      with call argument signature "skb, offset, msghdr->msg_iov, length".
      
      When we move to iov_iters in the networking, the iov_iter object will
      sit in the msghdr.
      
      Having a helper like this means there will be fewer places to touch
      during that transformation.
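The benefit of such a helper is that callers hand over only the `msghdr`, so only the helper must change when the destination inside `msghdr` changes. A user-space sketch of the pattern follows; `struct fake_skb`, `copy_to_iovec()` and `copy_datagram_msg()` are hypothetical stand-ins for the skb, `skb_copy_datagram_iovec()` and `skb_copy_datagram_msg()`, built on the POSIX `struct msghdr` and `struct iovec`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical stand-in for an skb: a flat, read-only data buffer. */
struct fake_skb {
    const unsigned char *data;
    size_t len;
};

/* Analogue of skb_copy_datagram_iovec(): copy len bytes starting at offset
 * into the iovec array. Returns 0 on success, -1 if the range is invalid or
 * the iovecs are too small. */
static int copy_to_iovec(const struct fake_skb *skb, size_t offset,
                         const struct iovec *iov, size_t iovcnt, size_t len)
{
    if (offset > skb->len || len > skb->len - offset)
        return -1;

    const unsigned char *src = skb->data + offset;
    for (size_t i = 0; i < iovcnt && len > 0; i++) {
        size_t n = iov[i].iov_len < len ? iov[i].iov_len : len;
        memcpy(iov[i].iov_base, src, n);
        src += n;
        len -= n;
    }
    return len == 0 ? 0 : -1;
}

/* Analogue of skb_copy_datagram_msg(): callers pass the whole msghdr, so only
 * this helper knows where the destination lives inside it. */
static int copy_datagram_msg(const struct fake_skb *skb, size_t offset,
                             struct msghdr *msg, size_t len)
{
    assert(msg != NULL);
    return copy_to_iovec(skb, offset, msg->msg_iov, (size_t)msg->msg_iovlen, len);
}
```

If the destination later becomes an iterator object inside `msghdr`, only `copy_datagram_msg()` needs rewriting; its callers stay unchanged.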
      
      Based upon descriptions and patch from Al Viro.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 22 Oct 2014, 1 commit
  11. 09 Oct 2014, 1 commit
    • fix misuses of f_count() in ppp and netlink · 24dff96a
      Committed by Al Viro
      we used to check for "nobody else could start doing anything with
      that opened file" by checking that refcount was 2 or less - one
      for descriptor table and one we'd acquired in fget() on the way to
      wherever we are.  That was race-prone (somebody else might have
      had a reference to descriptor table and do fget() just as we'd
      been checking) and it had become flat-out incorrect back when
      we switched to fget_light() on those codepaths - unlike fget(),
      it doesn't grab an extra reference unless the descriptor table
      is shared.  The same change allowed a race-free check, though -
      we are safe exactly when refcount is less than 2.
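The refcount argument can be condensed into two predicates. This is a hypothetical sketch of the reasoning, not the ppp or netlink code; `exclusive_old()` and `exclusive_new()` are illustrative names, and it assumes, as the commit describes, that an fget_light()-style lookup on an unshared descriptor table takes no extra reference, so a file reachable only through this one descriptor has a count of 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Old test: assumed one reference from the descriptor table plus one taken
 * by fget(). Racy, and wrong once no extra reference is taken. */
static bool exclusive_old(long f_count)
{
    return f_count <= 2;
}

/* Fixed test: with no extra reference taken, a count of 2 or more means
 * someone else holds the file and could start using it. */
static bool exclusive_new(long f_count)
{
    assert(f_count >= 1);   /* the caller itself holds the file */
    return f_count < 2;
}
```

A count of 2 is the case where the two predicates disagree: the old test wrongly treats a second holder as absent.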
      
      It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
      to ppp one) and 2.6.17 for sendmsg() (netlink one).  OTOH,
      netlink hadn't grown that check until 3.9 and ppp used to live
      in drivers/net, not drivers/net/ppp until 3.1.  The bug existed
      well before that, though, and the same fix used to apply in old
      location of file.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  12. 15 Aug 2014, 1 commit
  13. 08 Aug 2014, 1 commit
  14. 07 Aug 2014, 1 commit
  15. 05 Aug 2014, 1 commit
  16. 03 Aug 2014, 1 commit
    • netlink: Convert netlink_lookup() to use RCU protected hash table · e341694e
      Committed by Thomas Graf
      Heavy Netlink users such as Open vSwitch spend a considerable amount of
      time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
      RCU relieves the lock contention.
      
      Makes use of the new resizable hash table to avoid locking on the
      lookup.
      
      The hash table will grow if the number of entries exceeds 75% of the table size, up to a
      total table size of 64K. It will automatically shrink if usage falls
      below 30%.
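The resize policy can be written as a small decision function. It is a hypothetical sketch that only encodes the thresholds stated above (grow above 75% load up to 64K buckets, shrink below 30%); `resize_decision()` is an illustrative name, the real rhashtable callbacks and any minimum table size are not modeled, and integer arithmetic avoids floating point.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_MAX_SIZE 65536u   /* "up to a total table size of 64K" */

enum resize_action { RESIZE_NONE, RESIZE_GROW, RESIZE_SHRINK };

/* Decide whether a table with `size` buckets holding `nelems` entries
 * should be resized. */
static enum resize_action resize_decision(size_t nelems, size_t size)
{
    assert(size > 0);
    if (nelems * 4 > size * 3 && size < TABLE_MAX_SIZE)   /* load > 75% */
        return RESIZE_GROW;
    if (nelems * 10 < size * 3)                           /* load < 30% */
        return RESIZE_SHRINK;
    return RESIZE_NONE;
}
```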
      
      Also splits nl_table_lock into a separate mutex to protect hash table
      mutations and allow synchronize_rcu() to sleep while waiting for readers
      during expansion and shrinking.
      
      Before:
         9.16%  kpktgend_0  [openvswitch]      [k] masked_flow_lookup
         6.42%  kpktgend_0  [pktgen]           [k] mod_cur_headers
         6.26%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
         6.23%  kpktgend_0  [kernel.kallsyms]  [k] memset
         4.79%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
         4.37%  kpktgend_0  [kernel.kallsyms]  [k] memcpy
         3.60%  kpktgend_0  [openvswitch]      [k] ovs_flow_extract
         2.69%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
      
      After:
        15.26%  kpktgend_0  [openvswitch]      [k] masked_flow_lookup
         8.12%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
         7.92%  kpktgend_0  [pktgen]           [k] mod_cur_headers
         5.11%  kpktgend_0  [kernel.kallsyms]  [k] memset
         4.11%  kpktgend_0  [openvswitch]      [k] ovs_flow_extract
         4.06%  kpktgend_0  [kernel.kallsyms]  [k] _raw_spin_lock
         3.90%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
         [...]
         0.67%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 01 Aug 2014, 1 commit
  18. 17 Jul 2014, 1 commit
  19. 10 Jul 2014, 1 commit
    • netlink: Fix handling of error from netlink_dump(). · ac30ef83
      Committed by Ben Pfaff
      netlink_dump() returns a negative errno value on error.  Until now,
      netlink_recvmsg() directly recorded that negative value in sk->sk_err, but
      that's wrong since sk_err takes positive errno values.  (This manifests as
      userspace receiving a positive return value from the recv() system call,
      falsely indicating success.) This bug was introduced in the commit that
      started checking the netlink_dump() return value, commit b44d211e (netlink:
      handle errors from netlink_dump()).
      
      Multithreaded Netlink dumps are one way to trigger this behavior in
      practice, as described in the commit message for the userspace workaround
      posted here:
          http://openvswitch.org/pipermail/dev/2014-June/042339.html
      
      This commit also fixes the same bug in netlink_poll(), introduced in commit
      cd1df525 (netlink: add flow control for memory mapped I/O).
      Signed-off-by: Ben Pfaff <blp@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 08 Jul 2014, 1 commit
  21. 03 Jun 2014, 1 commit
    • netlink: Only check file credentials for implicit destinations · 2d7a85f4
      Committed by Eric W. Biederman
      It was possible to get a setuid root or setcap executable to write to
      its stdout or stderr (which has been made a netlink socket) and
      inadvertently reconfigure the networking stack.
      
      To prevent this we check that both the creator of the socket and
      the current application have permission to reconfigure the network
      stack.
      
      Unfortunately this breaks Zebra which always uses sendto/sendmsg
      and creates its socket without any privileges.
      
      To keep Zebra working don't bother checking if the creator of the
      socket has privilege when a destination address is specified.  Instead
      rely exclusively on the privileges of the sender of the socket.
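The resulting permission rule can be summarized as a predicate. This sketch is hypothetical and reduces the capability checks to booleans: `explicit_dest` stands for a destination address passed to sendto()/sendmsg(), `creator_capable` for the socket opener's privileges, and `sender_capable` for the current task's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the check described above. */
static bool netlink_send_allowed(bool explicit_dest, bool creator_capable,
                                 bool sender_capable)
{
    if (explicit_dest)
        return sender_capable;               /* Zebra case: sender only */
    return creator_capable && sender_capable; /* e.g. write() to stdout */
}
```

The implicit-destination branch is what stops a privileged setuid program from acting on a socket an unprivileged user opened and passed in as stdout.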
      
      Note from Andy: This is exactly Eric's code except for some comment
      clarifications and formatting fixes.  Neither I nor, I think, anyone
      else is thrilled with this approach, but I'm hesitant to wait on a
      better fix since 3.15 is almost here.
      
      Note to stable maintainers: This is a mess.  An earlier series of
      patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
      but they did so in a way that breaks Zebra.  The offending series
      includes:
      
          commit aa4cf945
          Author: Eric W. Biederman <ebiederm@xmission.com>
          Date:   Wed Apr 23 14:28:03 2014 -0700
      
              net: Add variants of capable for use on netlink messages
      
      If a given kernel version is missing that series of fixes, it's
      probably worth backporting it and this patch.  If that series is
      present, then this fix is critical if you care about Zebra.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 25 Apr 2014, 2 commits
  23. 23 Apr 2014, 2 commits
  24. 12 Apr 2014, 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      Committed by David S. Miller
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access potentially
      touches freed memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about its
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
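A user-space sketch of the fixed calling convention follows. `struct fake_sock`, `struct pkt`, `queue_tail()`, `deliver()` and `count_ready()` are hypothetical; the point is that the notification callback takes only the socket, so the producer never dereferences the packet after queueing it, and a consumer derives any length it needs from the queue itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet on a singly linked receive queue. */
struct pkt {
    size_t len;
    struct pkt *next;
};

/* Hypothetical socket: receive queue plus a readiness callback that takes
 * only the socket, with no length argument. */
struct fake_sock {
    struct pkt *head, *tail;
    void (*data_ready)(struct fake_sock *sk);
    size_t bytes_seen;
};

static void queue_tail(struct fake_sock *sk, struct pkt *p)
{
    assert(p != NULL);
    p->next = NULL;
    if (sk->tail)
        sk->tail->next = p;
    else
        sk->head = p;
    sk->tail = p;
}

/* Producer: once p is queued another context may consume and free it, so p
 * is not touched again and no p->len is passed to the callback. */
static void deliver(struct fake_sock *sk, struct pkt *p)
{
    queue_tail(sk, p);
    sk->data_ready(sk);
}

/* Example consumer: any length it needs comes from the queue itself. */
static void count_ready(struct fake_sock *sk)
{
    size_t total = 0;

    for (const struct pkt *q = sk->head; q != NULL; q = q->next)
        total += q->len;
    sk->bytes_seen = total;
}
```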
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 11 Mar 2014, 1 commit
    • netlink: autosize skb lengthes · 9063e21f
      Committed by Eric Dumazet
      One known problem with netlink is the fact that NLMSG_GOODSIZE is
      really small on PAGE_SIZE==4096 architectures, and it is difficult
      to know in advance what buffer size is used by the application.
      
      This patch adds an automatic learning of the size.
      
      The first netlink message will still be limited to ~4K, but if the user used
      bigger buffers, then following messages will be able to use up to 16KB.
      
      This speeds up dump() operations by a large factor and should be safe
      for legacy applications.
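The learning step can be sketched as follows. This is a hypothetical model, not the kernel code: `struct nl_state`, `learn_recv_size()` and `dump_alloc_size()` are illustrative names, and `GOODSIZE` is an assumed ~4K default standing in for NLMSG_GOODSIZE, whose real value depends on PAGE_SIZE and skb overhead. Only the 16KB cap comes from the commit text.

```c
#include <assert.h>
#include <stddef.h>

#define GOODSIZE 4096u    /* assumed ~4K default, stand-in for NLMSG_GOODSIZE */
#define MAX_AUTO 16384u   /* cap quoted in the commit text */

/* Hypothetical per-socket state. */
struct nl_state {
    size_t max_recvmsg_len;   /* largest recvmsg() buffer seen so far, capped */
};

/* Called on every recvmsg(): remember the largest user buffer, up to MAX_AUTO. */
static void learn_recv_size(struct nl_state *s, size_t user_buf_len)
{
    assert(s != NULL);
    if (user_buf_len > s->max_recvmsg_len)
        s->max_recvmsg_len = user_buf_len;
    if (s->max_recvmsg_len > MAX_AUTO)
        s->max_recvmsg_len = MAX_AUTO;
}

/* Allocation size for the next dump message: the default until user space
 * has shown that it reads with a larger buffer. */
static size_t dump_alloc_size(const struct nl_state *s)
{
    return s->max_recvmsg_len > GOODSIZE ? s->max_recvmsg_len : GOODSIZE;
}
```

The first dump still uses the default size because nothing has been learned yet, matching the ~4K first-message behaviour described above; legacy applications with small buffers never raise the learned value.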
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 26 Feb 2014, 1 commit
  27. 18 Feb 2014, 1 commit
  28. 19 Jan 2014, 1 commit
  29. 07 Jan 2014, 1 commit
  30. 02 Jan 2014, 1 commit
  31. 01 Jan 2014, 2 commits
    • netlink: specify netlink packet direction for nlmon · 604d13c9
      Committed by Daniel Borkmann
      In order to facilitate development for netlink protocol dissector,
      fill the unused field skb->pkt_type of the cloned skb with a hint
      of the address space of the new owner (receiver) socket in the
      notion of either "to kernel" or "to user".
      
      At the time we invoke __netlink_deliver_tap_skb(), we already have
      set the new skb owner via netlink_skb_set_owner_r(), so we can use
      that for netlink_is_kernel() probing.
      
      In normal PF_PACKET network traffic, this field denotes if the
      packet is destined for us (PACKET_HOST), if it's broadcast
      (PACKET_BROADCAST), etc.
      
      As we only have 3 bits reserved, we can use the value (= 6) of
      PACKET_FASTROUTE, as it is _not used_ anywhere in the whole kernel,
      is not supported anywhere, and packets of that type were never
      exposed to user space, so there are no overlapping users of such
      kind. Thus, as intended, this seems the only way to make both
      PACKET_* values non-overlapping and therefore device agnostic.
      
      By using those two flags for netlink skbs on nlmon devices, they
      can be made available and picked up via sll_pkttype (previously
      unused in netlink context) in struct sockaddr_ll. We now have
      these two directions:
      
       - PACKET_USER (= 6)    ->  to user space
       - PACKET_KERNEL (= 7)  ->  to kernel space
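Setting and reading the direction hint can be sketched as below. `nlmon_pkt_type()` and `nlmon_to_kernel()` are hypothetical helpers; the constants are the values given above, defined locally so the example does not depend on kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

#define PACKET_USER   6   /* to user space */
#define PACKET_KERNEL 7   /* to kernel space */

/* Sender side: the hint depends on whether the new owner (receiver) socket
 * is a kernel socket. */
static int nlmon_pkt_type(bool receiver_is_kernel)
{
    return receiver_is_kernel ? PACKET_KERNEL : PACKET_USER;
}

/* Capture side: interpret sll_pkttype from an nlmon frame. */
static bool nlmon_to_kernel(int sll_pkttype)
{
    assert(sll_pkttype >= 0 && sll_pkttype <= 7);   /* pkt_type is a 3-bit field */
    return sll_pkttype == PACKET_KERNEL;
}
```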
      
      Partial `ip a` example strace for sa_family=AF_NETLINK with
      detected nl msg direction:
      
      syscall:                     direction:
      sendto(3,  ...) = 40         /* to kernel */
      recvmsg(3, ...) = 3404       /* to user */
      recvmsg(3, ...) = 1120       /* to user */
      recvmsg(3, ...) = 20         /* to user */
      sendto(3,  ...) = 40         /* to kernel */
      recvmsg(3, ...) = 168        /* to user */
      recvmsg(3, ...) = 144        /* to user */
      recvmsg(3, ...) = 20         /* to user */
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: only do not deliver to tap when both sides are kernel sks · 73bfd370
      Committed by Daniel Borkmann
      We should also deliver packets to nlmon devices when we are in
      netlink_unicast_kernel(), and only one of the {src,dst} sockets
      is user sk and the other one kernel sk. That's e.g. the case in
      netlink diag, netlink route, etc. Still, do not deliver messages
      from kernel to kernel sks.
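The delivery rule reduces to a one-line predicate. The sketch below is hypothetical (`deliver_to_tap()` is an illustrative name) and models each endpoint only by whether it is a kernel socket.

```c
#include <assert.h>
#include <stdbool.h>

/* Deliver to nlmon taps unless both endpoints are kernel sockets. */
static bool deliver_to_tap(bool src_is_kernel, bool dst_is_kernel)
{
    return !(src_is_kernel && dst_is_kernel);
}
```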
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 21 Nov 2013, 1 commit
    • net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Committed by Hannes Frederic Sowa
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
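The handler side of this contract can be sketched in user-space C. `struct fake_addr` and `fill_msg_name()` are hypothetical; the sketch assumes, as described above, that the core always passes `msg_namelen` as 0 and passes `msg_name` as NULL when the caller supplied no address buffer, so the handler writes the name and sets the length only when a buffer exists.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical source address returned by the handler. */
struct fake_addr {
    sa_family_t family;
    unsigned int port_id;
};

/* recvmsg-handler pattern: fill the name only when a buffer was supplied. */
static void fill_msg_name(struct msghdr *msg, unsigned int src_port)
{
    if (msg->msg_name == NULL)
        return;                         /* msg_namelen stays 0 */

    struct fake_addr *addr = msg->msg_name;
    memset(addr, 0, sizeof(*addr));     /* clear padding so nothing leaks */
    addr->family = AF_NETLINK;
    addr->port_id = src_port;
    msg->msg_namelen = sizeof(*addr);
    assert(msg->msg_namelen <= sizeof(struct sockaddr_storage));
}
```

Because the length starts at 0, a handler that forgets to fill the name copies nothing back, instead of leaking whatever was in the kernel's address buffer.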
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove:
      	if (!uaddr || msg_sys->msg_namelen == 0)
      		msg->msg_name = NULL;
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  33. 20 Nov 2013, 1 commit
  34. 07 Sep 2013, 1 commit