1. 15 Aug, 2014 1 commit
  2. 08 Aug, 2014 1 commit
  3. 07 Aug, 2014 1 commit
  4. 05 Aug, 2014 1 commit
  5. 03 Aug, 2014 1 commit
    • netlink: Convert netlink_lookup() to use RCU protected hash table · e341694e
      Committed by Thomas Graf
      Heavy Netlink users such as Open vSwitch spend a considerable amount of
      time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
      RCU relieves the lock contention.
      
      Makes use of the new resizable hash table to avoid locking on the
      lookup.
      
      The hash table grows when the number of entries exceeds 75% of the
      table size, up to a total table size of 64K. It automatically shrinks
      when usage falls below 30%.
      
      Also splits nl_table_lock into a separate mutex to protect hash table
      mutations and allow synchronize_rcu() to sleep while waiting for readers
      during expansion and shrinking.
      
      Before:
         9.16%  kpktgend_0  [openvswitch]      [k] masked_flow_lookup
         6.42%  kpktgend_0  [pktgen]           [k] mod_cur_headers
         6.26%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
         6.23%  kpktgend_0  [kernel.kallsyms]  [k] memset
         4.79%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
         4.37%  kpktgend_0  [kernel.kallsyms]  [k] memcpy
         3.60%  kpktgend_0  [openvswitch]      [k] ovs_flow_extract
         2.69%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
      
      After:
        15.26%  kpktgend_0  [openvswitch]      [k] masked_flow_lookup
         8.12%  kpktgend_0  [pktgen]           [k] pktgen_thread_worker
         7.92%  kpktgend_0  [pktgen]           [k] mod_cur_headers
         5.11%  kpktgend_0  [kernel.kallsyms]  [k] memset
         4.11%  kpktgend_0  [openvswitch]      [k] ovs_flow_extract
         4.06%  kpktgend_0  [kernel.kallsyms]  [k] _raw_spin_lock
         3.90%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
         [...]
         0.67%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
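The resize thresholds quoted in the commit message above (grow past 75% load up to 64K, shrink below 30%) can be sketched as two predicates. This is a minimal userspace illustration, not the kernel's hash table code: the `sketch_` names, the `min_buckets` floor, and the reading of "64K" as a bucket count are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap, assuming "64K" in the commit message counts buckets. */
#define SKETCH_MAX_BUCKETS (64u * 1024u)

/* Grow when entries exceed 75% of the bucket count, unless already at the cap. */
bool sketch_should_grow(size_t entries, size_t buckets)
{
	return buckets < SKETCH_MAX_BUCKETS && entries * 4 > buckets * 3;
}

/* Shrink when usage falls below 30%, but never below an assumed minimum size. */
bool sketch_should_shrink(size_t entries, size_t buckets, size_t min_buckets)
{
	return buckets > min_buckets && entries * 10 < buckets * 3;
}
```

Integer arithmetic is used for both comparisons so that the thresholds are exact: a table of 100 buckets grows at 76 entries and shrinks at 29.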
  6. 01 Aug, 2014 1 commit
  7. 17 Jul, 2014 1 commit
  8. 10 Jul, 2014 1 commit
    • netlink: Fix handling of error from netlink_dump(). · ac30ef83
      Committed by Ben Pfaff
      netlink_dump() returns a negative errno value on error.  Until now,
      netlink_recvmsg() directly recorded that negative value in sk->sk_err, but
      that's wrong since sk_err takes positive errno values.  (This manifests as
      userspace receiving a positive return value from the recv() system call,
      falsely indicating success.) This bug was introduced in the commit that
      started checking the netlink_dump() return value, commit b44d211e (netlink:
      handle errors from netlink_dump()).
      
      Multithreaded Netlink dumps are one way to trigger this behavior in
      practice, as described in the commit message for the userspace workaround
      posted here:
          http://openvswitch.org/pipermail/dev/2014-June/042339.html
      
      This commit also fixes the same bug in netlink_poll(), introduced in commit
      cd1df525 (netlink: add flow control for memory mapped I/O).
      Signed-off-by: Ben Pfaff <blp@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
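The sign convention behind this fix can be shown in a few lines. The sketch below is a userspace model under stated assumptions: `struct sketch_sock` and `sketch_record_dump_result()` are hypothetical stand-ins, and only the rule itself (a negative errno return must be negated before it is stored in `sk_err`) comes from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the one field of struct sock that matters here. */
struct sketch_sock {
	int sk_err;	/* expected to hold a positive errno value, or 0 */
};

/* Record the result of a dump-style call that returns 0 or a negative errno. */
void sketch_record_dump_result(struct sketch_sock *sk, int ret)
{
	if (ret < 0)
		sk->sk_err = -ret;	/* flip the sign before storing */
}
```

Storing `ret` unchanged would leave a negative value in `sk_err`, which is the bug the commit describes.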
  9. 08 Jul, 2014 1 commit
  10. 03 Jun, 2014 2 commits
    • netlink: Only check file credentials for implicit destinations · 2d7a85f4
      Committed by Eric W. Biederman
      It was possible to get a setuid root or setcap executable to write to
      its stdout or stderr (which has been made a netlink socket) and
      inadvertently reconfigure the networking stack.
      
      To prevent this we check that both the creator of the socket and
      the current application have permission to reconfigure the network
      stack.
      
      Unfortunately this breaks Zebra which always uses sendto/sendmsg
      and creates its socket without any privileges.
      
      To keep Zebra working don't bother checking if the creator of the
      socket has privilege when a destination address is specified.  Instead
      rely exclusively on the privileges of the sender of the socket.
      
      Note from Andy: This is exactly Eric's code except for some comment
      clarifications and formatting fixes.  Neither I nor, I think, anyone
      else is thrilled with this approach, but I'm hesitant to wait on a
      better fix since 3.15 is almost here.
      
      Note to stable maintainers: This is a mess.  An earlier series of
      patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
      but they did so in a way that breaks Zebra.  The offending series
      includes:
      
          commit aa4cf945
          Author: Eric W. Biederman <ebiederm@xmission.com>
          Date:   Wed Apr 23 14:28:03 2014 -0700
      
              net: Add variants of capable for use on netlink messages
      
      If a given kernel version is missing that series of fixes, it's
      probably worth backporting it and this patch.  If that series is
      present, then this fix is critical if you care about Zebra.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
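The rule this commit describes reduces to a small decision function. The sketch below models it with plain booleans; `sketch_may_reconfigure()` and its parameters are hypothetical and do not correspond to the kernel's capability helpers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the rule described above. For an implicit
 * destination (no address passed to the send call), both the socket's
 * creator and the current sender must be privileged. With an explicit
 * destination, only the sender is checked.
 */
bool sketch_may_reconfigure(bool has_dest_addr, bool creator_privileged,
			    bool sender_privileged)
{
	if (has_dest_addr)
		return sender_privileged;
	return creator_privileged && sender_privileged;
}
```

The Zebra case is an unprivileged creator with a privileged sender and an explicit destination, which passes. The setuid case is a privileged program writing to an inherited socket with no destination, which is still refused because the creator was unprivileged.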
    • genetlink: remove superfluous assignment · 2f91abd4
      Committed by Denis ChengRq
      The local variables ops and n_ops were just read from family and
      not changed, hence there is no need to assign them back.
      
      Validation functions should operate on const parameters and not
      change anything.
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 25 Apr, 2014 3 commits
  12. 23 Apr, 2014 2 commits
  13. 12 Apr, 2014 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      Committed by David S. Miller
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about its
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: David S. Miller <davem@davemloft.net>
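The shape of the fix can be illustrated with a userspace model in which the notification callback takes no length argument, so the caller has no reason to touch the buffer after handing it over. Everything here is hypothetical (`struct sketch_sock`, `sketch_deliver()`, the counters); it only mirrors the ordering problem the commit describes.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_sock;

/* Callback without a length argument, mirroring the new signature. */
typedef void (*sketch_data_ready_t)(struct sketch_sock *sk);

struct sketch_sock {
	size_t queued;			/* buffers handed to the receiver */
	unsigned int wakeups;		/* times data_ready fired */
	sketch_data_ready_t data_ready;
};

/* Example callback: just counts notifications. */
void sketch_count_wakeup(struct sketch_sock *sk)
{
	sk->wakeups++;
}

/* Hand a buffer to the receiver, then notify without reading the buffer. */
void sketch_deliver(struct sketch_sock *sk, void *buf)
{
	(void)buf;	/* ownership passes to the queue; buf may be freed from here on */
	sk->queued++;
	sk->data_ready(sk);
}
```

Because the callback needs nothing from the buffer, no caller can reintroduce the read-after-enqueue pattern through it.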
  14. 11 Mar, 2014 1 commit
    • netlink: autosize skb lengthes · 9063e21f
      Committed by Eric Dumazet
      One known problem with netlink is the fact that NLMSG_GOODSIZE is
      really small on PAGE_SIZE==4096 architectures, and it is difficult
      to know in advance what buffer size is used by the application.
      
      This patch adds an automatic learning of the size.
      
      The first netlink message will still be limited to ~4K, but if the user
      used bigger buffers, then the following messages will be able to use up
      to 16KB.
      
      This speeds up dump() operations by a large factor and should be safe
      for legacy applications.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
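The size learning described above amounts to remembering the largest receive buffer seen and clamping the next allocation between the default and 16KB. The following is a sketch under assumptions: the constants approximate the "~4K" and "16KB" figures from the commit message, and the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants: the ~4K default and the 16KB ceiling named above. */
#define SKETCH_GOODSIZE   4096u
#define SKETCH_MAX_ALLOC 16384u

struct sketch_nlsock {
	size_t max_recvmsg_len;	/* largest receive buffer seen so far */
};

/* Remember the largest buffer the application has passed to recvmsg(). */
void sketch_note_recv_buffer(struct sketch_nlsock *nlk, size_t len)
{
	if (len > nlk->max_recvmsg_len)
		nlk->max_recvmsg_len = len;
}

/* Pick the allocation size for the next message. */
size_t sketch_next_alloc_size(const struct sketch_nlsock *nlk)
{
	size_t size = nlk->max_recvmsg_len;

	if (size < SKETCH_GOODSIZE)
		size = SKETCH_GOODSIZE;
	if (size > SKETCH_MAX_ALLOC)
		size = SKETCH_MAX_ALLOC;
	return size;
}
```

A legacy application that only ever passes small buffers never raises the learned size, which is why the change is described as safe for such applications.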
  15. 26 Feb, 2014 1 commit
  16. 18 Feb, 2014 1 commit
  17. 19 Jan, 2014 1 commit
  18. 07 Jan, 2014 2 commits
  19. 02 Jan, 2014 1 commit
  20. 01 Jan, 2014 2 commits
    • netlink: specify netlink packet direction for nlmon · 604d13c9
      Committed by Daniel Borkmann
      In order to facilitate development of a netlink protocol dissector,
      fill the unused field skb->pkt_type of the cloned skb with a hint
      of the address space of the new owner (receiver) socket in the
      notion of "to kernel" or "to user", respectively.
      
      At the time we invoke __netlink_deliver_tap_skb(), we already have
      set the new skb owner via netlink_skb_set_owner_r(), so we can use
      that for netlink_is_kernel() probing.
      
      In normal PF_PACKET network traffic, this field denotes if the
      packet is destined for us (PACKET_HOST), if it's broadcast
      (PACKET_BROADCAST), etc.
      
      As we only have 3 bits reserved, we can use the value (= 6) of
      PACKET_FASTROUTE as it's _not used_ anywhere in the whole kernel
      and not supported anywhere, and packets of such type were never
      exposed to user space, so there are no overlapping users of such
      kind. Thus, as wished, that seems the only way to make both
      PACKET_* values non-overlapping and therefore device agnostic.
      
      By using those two flags for netlink skbs on nlmon devices, they
      can be made available and picked up via sll_pkttype (previously
      unused in netlink context) in struct sockaddr_ll. We now have
      these two directions:
      
       - PACKET_USER (= 6)    ->  to user space
       - PACKET_KERNEL (= 7)  ->  to kernel space
      
      Partial `ip a` example strace for sa_family=AF_NETLINK with
      detected nl msg direction:
      
      syscall:                     direction:
      sendto(3,  ...) = 40         /* to kernel */
      recvmsg(3, ...) = 3404       /* to user */
      recvmsg(3, ...) = 1120       /* to user */
      recvmsg(3, ...) = 20         /* to user */
      sendto(3,  ...) = 40         /* to kernel */
      recvmsg(3, ...) = 168        /* to user */
      recvmsg(3, ...) = 144        /* to user */
      recvmsg(3, ...) = 20         /* to user */
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
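A dissector reading an nlmon capture could turn the sll_pkttype value into a direction label as sketched below. The values 6 and 7 are the ones quoted in the commit message; the `SKETCH_` macro names and the function are hypothetical, chosen so the block does not depend on which kernel headers are installed.

```c
#include <assert.h>
#include <string.h>

/* Values quoted in the commit message; the SKETCH_ names are hypothetical. */
#define SKETCH_PACKET_USER   6
#define SKETCH_PACKET_KERNEL 7

/* Map the sll_pkttype seen on an nlmon capture to a direction label. */
const char *sketch_nl_direction(unsigned char sll_pkttype)
{
	switch (sll_pkttype) {
	case SKETCH_PACKET_USER:
		return "to user";
	case SKETCH_PACKET_KERNEL:
		return "to kernel";
	default:
		return "unknown";
	}
}
```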
    • netlink: only do not deliver to tap when both sides are kernel sks · 73bfd370
      Committed by Daniel Borkmann
      We should also deliver packets to nlmon devices when we are in
      netlink_unicast_kernel(), and only one of the {src,dst} sockets
      is user sk and the other one kernel sk. That's e.g. the case in
      netlink diag, netlink route, etc. Still, forbid delivering messages
      from kernel to kernel sks.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
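The delivery rule stated above is a single condition: skip the tap only when both endpoints are kernel sockets. A minimal sketch, with a hypothetical function name and boolean inputs standing in for the socket checks:

```c
#include <assert.h>
#include <stdbool.h>

/* Deliver to nlmon taps unless source and destination are both kernel sockets. */
bool sketch_deliver_to_tap(bool src_is_kernel, bool dst_is_kernel)
{
	return !(src_is_kernel && dst_is_kernel);
}
```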
  21. 29 Nov, 2013 2 commits
    • genetlink/pmcraid: use proper genetlink multicast API · 5e53e689
      Committed by Johannes Berg
      The pmcraid driver is abusing the genetlink API and is using its
      family ID as the multicast group ID, which is invalid and may
      belong to somebody else (and likely will.)
      
      Make it use the correct API, but since this may already be used
      as-is by userspace, reserve a family ID for this code and also
      reserve that group ID to not break userspace assumptions.
      
      My previous patch broke event delivery in the driver as I missed
      that it wasn't using the right API and forgot to update it later
      in my series.
      
      While changing this, I noticed that the genetlink code could use
      the static group ID instead of a strcmp(), so also do that for
      the VFS_DQUOT family.
      
      Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: Fix uninitialized variable in genl_validate_assign_mc_groups() · 0f0e2159
      Committed by Geert Uytterhoeven
      net/netlink/genetlink.c: In function ‘genl_validate_assign_mc_groups’:
      net/netlink/genetlink.c:217: warning: ‘err’ may be used uninitialized in this
      function
      
      Commit 2a94fe48 ("genetlink: make multicast
      groups const, prevent abuse") split genl_register_mc_group() into multiple
      functions, but dropped the initialization of err.
      
      Initialize err to zero to fix this.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 22 Nov, 2013 1 commit
    • genetlink: fix genlmsg_multicast() bug · 220815a9
      Committed by Johannes Berg
      Unfortunately, I introduced a tremendously stupid bug into
      genlmsg_multicast() when doing all those multicast group
      changes: it adjusts the group number, but then passes it
      to genlmsg_multicast_netns() which does that again.
      
      Somehow, my tests failed to catch this, so add a warning
      into genlmsg_multicast_netns() and remove the offending
      group ID adjustment.
      
      Also add a warning to the similar code in other functions
      so people who misuse them are more loudly warned.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
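The double adjustment described above can be modeled as translating a family-relative group index into a global group ID exactly once, with a range check that catches an index that has already been translated. The struct, field names, and the choice of returning 0 on a bad index are assumptions for this sketch; the commit itself adds a warning rather than a return code.

```c
#include <assert.h>

/* Hypothetical family descriptor: groups occupy [mcgrp_offset, mcgrp_offset + n_mcgrps). */
struct sketch_family {
	unsigned int mcgrp_offset;
	unsigned int n_mcgrps;
};

/* Translate a family-relative index to a global group ID; 0 means "rejected". */
unsigned int sketch_global_group(const struct sketch_family *f, unsigned int group)
{
	if (group >= f->n_mcgrps)
		return 0;	/* stand-in for the warning the commit adds */
	return f->mcgrp_offset + group;
}
```

Passing an already adjusted ID back in, which is what the buggy wrapper effectively did, falls outside the family's range and is rejected instead of silently addressing another family's group.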
  23. 21 Nov, 2013 1 commit
    • net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Committed by Hannes Frederic Sowa
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
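The contract this commit establishes (the caller zeroes msg_namelen, and a handler sets it only when it actually fills in msg_name) can be modeled in userspace as follows. The types and functions are hypothetical simplifications, and the address contents are arbitrary placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct msghdr and a socket address. */
struct sketch_msghdr {
	void *msg_name;
	size_t msg_namelen;
};

struct sketch_addr {
	unsigned short family;
	unsigned int pid;
};

/* Handler side: report an address only if the caller asked for one. */
void sketch_recv_handler(struct sketch_msghdr *msg)
{
	if (msg->msg_name) {
		struct sketch_addr a = { 1, 0 };	/* arbitrary placeholder values */

		memcpy(msg->msg_name, &a, sizeof(a));
		msg->msg_namelen = sizeof(a);
	}
}

/* Caller side: always reset msg_namelen to 0 before dispatching. */
size_t sketch_recvmsg(struct sketch_msghdr *msg)
{
	msg->msg_namelen = 0;
	sketch_recv_handler(msg);
	return msg->msg_namelen;
}
```

With the reset in the caller, a handler that forgets to set msg_namelen returns a zero-length name instead of exposing whatever stale length the field held.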
  24. 20 Nov, 2013 8 commits
  25. 19 Nov, 2013 1 commit
  26. 16 Nov, 2013 1 commit