1. 28 Feb, 2013: 1 commit
    • hlist: drop the node parameter from iterators · b67bfe0d
      Authored by Sasha Levin
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones. The list ones look like this:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
         were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
         properly, so those had to be fixed up manually.
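A minimal userspace sketch of the three-argument iterator, assuming simplified stand-ins for struct hlist_node, struct hlist_head, container_of() and hlist_entry_safe() built with GNU C __typeof__ and statement expressions. The item type and the sum_items()/find_node() helpers are hypothetical; find_node() shows how a loop that once relied on the 'node' cursor can use &obj->member instead.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's hlist types. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like container_of(), but maps a NULL node to a NULL entry (GNU C). */
#define hlist_entry_safe(ptr, type, member)                      \
	({ __typeof__(ptr) ____ptr = (ptr);                      \
	   ____ptr ? container_of(____ptr, type, member) : NULL; })

/* Three-argument iterator: 'pos' is the only cursor; no 'node' needed. */
#define hlist_for_each_entry(pos, head, member)                                 \
	for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
	     pos;                                                                \
	     pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Insert n at the front of list h. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	assert(n != NULL && h != NULL);
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Hypothetical entry type embedding an hlist_node. */
struct item {
	int value;
	struct hlist_node link;
};

static int sum_items(struct hlist_head *head)
{
	struct item *obj;
	int sum = 0;

	hlist_for_each_entry(obj, head, link)
		sum += obj->value;
	return sum;
}

/* Where old code returned the 'node' cursor, return &obj->link instead. */
static struct hlist_node *find_node(struct hlist_head *head, int value)
{
	struct item *obj;

	hlist_for_each_entry(obj, head, link)
		if (obj->value == value)
			return &obj->link;
	return NULL;
}
```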
      
      The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 23 Feb, 2013: 1 commit
  3. 19 Feb, 2013: 2 commits
  4. 10 Jan, 2013: 1 commit
  5. 18 Dec, 2012: 2 commits
  6. 19 Nov, 2012: 1 commit
    • net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm · df008c91
      Authored by Eric W. Biederman
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or ns_capable(net->user_ns, CAP_NET_RAW) calls.
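A simplified userspace model of the difference between capable() and ns_capable(), assuming the hierarchical rule used by the kernel's capability code at the time, as we understand it: a capability held in a user namespace applies to its descendants, and the creator of a child namespace holds all capabilities in it. The types, model_ns_capable() and model_capable() are hypothetical; only the CAP_NET_ADMIN (12) and CAP_NET_RAW (13) values match <linux/capability.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP_NET_ADMIN 12        /* values as in <linux/capability.h> */
#define CAP_NET_RAW   13

/* Hypothetical, simplified user namespace and credential types. */
struct user_ns {
	const struct user_ns *parent;   /* NULL for the initial namespace */
	unsigned int owner_euid;        /* euid of the namespace's creator */
};

struct cred {
	const struct user_ns *user_ns;  /* namespace the task lives in */
	unsigned int euid;
	unsigned long cap_effective;    /* one bit per capability */
};

/* Does 'cred' hold 'cap' over objects owned by namespace 'target'? */
static bool model_ns_capable(const struct cred *cred,
			     const struct user_ns *target, int cap)
{
	assert(cred != NULL && target != NULL);
	for (;;) {
		/* Same namespace: consult the effective capability set. */
		if (target == cred->user_ns)
			return (cred->cap_effective >> cap) & 1UL;
		/* Reached the initial namespace without meeting the task's. */
		if (target->parent == NULL)
			return false;
		/* The creator, living in the parent, has all caps in the child. */
		if (target->parent == cred->user_ns &&
		    target->owner_euid == cred->euid)
			return true;
		/* Caps held in an ancestor apply to all its descendants. */
		target = target->parent;
	}
}

/* capable() is the same check against the initial namespace. */
static bool model_capable(const struct cred *cred,
			  const struct user_ns *init_ns, int cap)
{
	return model_ns_capable(cred, init_ns, cap);
}
```

With this model, an unprivileged uid 1000 that created a child namespace fails model_capable() but passes model_ns_capable() against its own child namespace, which is the effect the patch relies on.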
      
      Allow creation of af_key sockets.
      Allow creation of llc sockets.
      Allow creation of af_packet sockets.
      
      Allow sending xfrm netlink control messages.
      
      Allow binding to netlink multicast groups.
      Allow sending to netlink multicast groups.
      Allow adding and dropping netlink multicast groups.
      Allow sending to all netlink multicast groups and port ids.
      
      Allow reading the netfilter SO_IP_SET socket option.
      Allow sending netfilter netlink messages.
      Allow setting and getting ip_vs netfilter socket options.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 19 Oct, 2012: 1 commit
    • netlink: use kfree_rcu() in netlink_release() · 6d772ac5
      Authored by Eric Dumazet
      On some suspend/resume operations involving a WiMAX device, we have
      noticed some intermittent memory corruptions in netlink code.

      Stéphane Marchesin tracked this corruption in netlink_update_listeners()
      and suggested a patch.

      It appears netlink_release() should use kfree_rcu() instead of kfree()
      for the listeners structure, as it may be used by other CPUs under RCU
      protection.

      netlink_release() must set the listeners pointer to NULL when
      it is about to be freed.

      We also have to protect netlink_update_listeners() and
      netlink_has_listeners() against a NULL listeners pointer.

      Add a nl_deref_protected() lockdep helper to properly document which
      locks protect us.
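A single-threaded userspace model of the reader/release rules above, assuming plain pointers in place of rcu_dereference()/rcu_assign_pointer() and a small deferred-free list in place of kfree_rcu(). The release path unpublishes the pointer before queueing the free, and the reader treats NULL as "no listeners". All names are hypothetical; netlink groups are 1-based, so group N tests bit N-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-protocol listeners bitmap (one word in this sketch). */
struct listeners {
	unsigned long masks[1];
};

struct nl_table_model {
	struct listeners *listeners;    /* becomes NULL on release */
};

/* Stand-in for kfree_rcu(): blocks are freed only by run_deferred(). */
struct deferred {
	void *ptrs[8];
	int n;
};

static void defer_free(struct deferred *d, void *p)
{
	assert(d->n < 8);
	d->ptrs[d->n++] = p;
}

/* Called once no reader can still hold an old pointer. */
static void run_deferred(struct deferred *d)
{
	while (d->n > 0)
		free(d->ptrs[--d->n]);
}

/* Reader side: a NULL listeners pointer simply means "no listeners". */
static bool has_listeners(const struct nl_table_model *tbl, unsigned int group)
{
	const struct listeners *l = tbl->listeners;

	if (l == NULL || group == 0 || group > 8 * sizeof(unsigned long))
		return false;
	return (l->masks[0] >> (group - 1)) & 1UL;
}

/* Release side: unpublish first, then free after readers are done. */
static void release_listeners(struct nl_table_model *tbl, struct deferred *d)
{
	struct listeners *old = tbl->listeners;

	tbl->listeners = NULL;
	if (old != NULL)
		defer_free(d, old);
}
```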
      Reported-by: Jonathan Kliegman <kliegs@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stéphane Marchesin <marcheu@google.com>
      Cc: Sam Leffler <sleffler@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 07 Oct, 2012: 1 commit
    • netlink: add reference of module in netlink_dump_start · 6dc878a8
      Authored by Gao feng
      I get a panic when I use ss -a and rmmod inet_diag at the
      same time.
      
      It's because netlink_dump uses inet_diag_dump, which belongs to the
      module inet_diag.

      I searched the code and found that many modules have the same problem.
      We need to add a reference to the module that cb->dump belongs to.

      Thanks for all the help from Stephen, Jan, Eric, Steffen and Pablo.

      Changes from v3:
      make netlink_dump_start inline, as suggested by Pablo and
      Eric.

      Changes from v2:
      delete netlink_dump_done, and call module_put in netlink_dump
      and netlink_sock_destruct.
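A hypothetical userspace model of the fix: taking a module reference when a dump starts and dropping it when the dump finishes (or the socket is destroyed mid-dump) prevents rmmod from unloading the code behind cb->dump while it can still be called. The struct and function names are stand-ins; in the kernel the corresponding primitives are try_module_get() and module_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical loadable module with a reference count. */
struct module_model {
	int refcnt;
	bool loaded;
};

static bool model_try_module_get(struct module_model *m)
{
	if (!m->loaded)
		return false;           /* module already gone: refuse */
	m->refcnt++;
	return true;
}

static void model_module_put(struct module_model *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* rmmod succeeds only when nobody holds a reference. */
static bool model_unload(struct module_model *m)
{
	if (m->refcnt > 0)
		return false;
	m->loaded = false;
	return true;
}

/* Dump state remembering which module owns the dump callback. */
struct dump_model {
	struct module_model *owner;
	bool running;
};

static int model_dump_start(struct dump_model *cb, struct module_model *owner)
{
	if (!model_try_module_get(owner))
		return -1;
	cb->owner = owner;
	cb->running = true;
	return 0;
}

/* Called when the dump completes or the socket is destroyed mid-dump. */
static void model_dump_done(struct dump_model *cb)
{
	if (!cb->running)
		return;
	cb->running = false;
	model_module_put(cb->owner);
}
```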
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 11 Sep, 2012: 1 commit
  10. 09 Sep, 2012: 2 commits
  11. 08 Sep, 2012: 1 commit
    • scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie. · dbe9a417
      Authored by Eric W. Biederman
      Passing uids and gids on NETLINK_CB from a process in one user
      namespace to a process in another user namespace can result in the
      wrong uid or gid being presented to userspace.  Avoid that problem by
      passing kuids and kgids instead.
      
      - define struct scm_creds for use in scm_cookie and netlink_skb_parms
        that holds uid and gid information in kuid_t and kgid_t.
      
      - Modify scm_set_cred to fill out scm_creds by hand instead of using
        cred_to_ucred to fill out struct ucred.  This conversion ensures
        userspace does not get incorrect uid or gid values to look at.

      - Modify scm_recv to convert from struct scm_creds to struct ucred
        before copying credential values to userspace.

      - Modify __scm_send to populate struct scm_creds in the scm_cookie,
        instead of just copying struct ucred from userspace.
      
      - Modify netlink_sendmsg to copy scm_creds instead of struct ucred
        into the NETLINK_CB.
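A simplified model of why carrying kuids helps, assuming a single-extent uid_map per user namespace: a sender that is root inside a container mapped to global uid 100000 appears as uid 0 to a receiver in that container but as 100000 in the initial namespace, and a uid with no mapping becomes the overflow uid instead of being passed through raw. The types, the helper name and the 65534 overflow value are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical single-extent uid_map: ns uids [first, first+count)
 * map onto global uids [lower_first, lower_first+count). */
struct uid_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

#define OVERFLOW_UID 65534u     /* assumed default overflow uid */

typedef struct { uint32_t val; } kuid_model;   /* global uid, like kuid_t */

/* Translate a global uid into the uid seen inside namespace 'ns'. */
static uint32_t model_from_kuid_munged(const struct uid_extent *ns, kuid_model k)
{
	assert(ns != NULL);
	if (k.val >= ns->lower_first && k.val - ns->lower_first < ns->count)
		return ns->first + (k.val - ns->lower_first);
	return OVERFLOW_UID;    /* not representable in this namespace */
}
```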
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 25 Aug, 2012: 1 commit
    • netlink: fix possible spoofing from non-root processes · 20e1db19
      Authored by Pablo Neira Ayuso
      Non-root user-space processes can send Netlink messages to other
      processes that are well-known for being subscribed to Netlink
      asynchronous notifications. This allows illegitimate non-root
      processes to send forged messages to Netlink subscribers.
      
      The userspace process usually verifies the legitimate origin in
      two ways:
      
      a) Socket credentials. If UID != 0, then the message comes from
         some illegitimate process and the message needs to be dropped.

      b) Netlink portID. In general, portID == 0 means that the message
         comes from the kernel, so any message not coming from the
         kernel is discarded.
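A minimal userspace sketch of a receiver applying both checks (a) and (b) to one message, using the real struct ucred and struct sockaddr_nl types; how the receiver obtained them (recvmsg() with SCM_CREDENTIALS ancillary data and msg_name) is assumed and not shown. looks_like_kernel_message() is a hypothetical helper, and as the following paragraphs explain, these checks alone do not help for ctnetlink.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* struct ucred */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* (a) the sender's credentials must say uid 0, and
 * (b) the source port ID must be 0, i.e. the kernel. */
static bool looks_like_kernel_message(const struct ucred *cred,
				      const struct sockaddr_nl *src)
{
	assert(src != NULL && src->nl_family == AF_NETLINK);
	if (cred == NULL || cred->uid != 0)
		return false;           /* (a) missing or non-root credentials */
	return src->nl_pid == 0;        /* (b) portID 0 means the kernel */
}
```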
      
      However, ctnetlink sets the portID in event messages that have
      been triggered by some user-space process, e.g. the conntrack utility.
      So other processes subscribed to ctnetlink events, e.g. conntrackd,
      know that the event was triggered by some user-space action.

      Neither of the two ways to discard illegitimate messages coming
      from non-root processes can help for ctnetlink.
      
      This patch adds capability validation in case dst_pid is set
      in netlink_sendmsg(). This approach is aggressive since existing
      applications using any Netlink bus to deliver messages between
      two user-space processes will break. Note that the exception is
      NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
      userspace communication.

      Still, if anyone wants their Netlink bus to allow netlink-to-netlink
      userspace communication, they can set NL_NONROOT_SEND. However, by
      default, I don't think it makes sense to allow NETLINK_ROUTE to be
      used for communication between two processes exchanging arbitrary
      information unrelated to link/neighbouring/routing. They should be
      using NETLINK_USERSOCK instead for that.
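A hypothetical model of the send-side rule described above: a message addressed to another process (non-zero dst_pid) is allowed only if the protocol opts in with a flag playing the role of NL_NONROOT_SEND, as NETLINK_USERSOCK does, or if the sender passes the capability check. The flag value, proto_model and model_may_send() are illustrative, and treating the capability as CAP_NET_ADMIN is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Illustrative flag standing in for NL_NONROOT_SEND (value assumed). */
#define MODEL_NONROOT_SEND 0x2u

struct proto_model {
	unsigned int nonroot;   /* flags that relax the privilege requirement */
};

/* Messages to the kernel (dst_pid == 0) are unaffected; messages to
 * another process need the protocol opt-in or the capability. */
static bool model_may_send(const struct proto_model *p, uint32_t dst_pid,
			   bool has_cap_net_admin)
{
	assert(p != NULL);
	if (dst_pid == 0)
		return true;
	return (p->nonroot & MODEL_NONROOT_SEND) || has_cap_net_admin;
}
```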
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 22 Aug, 2012: 1 commit
    • af_netlink: force credentials passing [CVE-2012-3520] · e0e3cea4
      Authored by Eric Dumazet
      Pablo Neira Ayuso discovered that avahi and
      potentially NetworkManager accept spoofed Netlink messages because of a
      kernel bug.  The kernel passes all-zero SCM_CREDENTIALS ancillary data
      to the receiver if the sender did not provide such data, instead of not
      including any such data at all or including the correct data from the
      peer (as is the case with AF_UNIX).
      
      This bug was introduced in commit 16e57262
      (af_unix: dont send SCM_CREDENTIALS by default)
      
      This patch forces passing credentials for netlink, as
      before the regression.
      
      Another fix would be to not add SCM_CREDENTIALS in
      netlink messages if not provided by the sender, but it
      might break some programs.
      
      With help from Florian Weimer & Petr Matousek
      
      This issue is designated as CVE-2012-3520
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 15 Aug, 2012: 1 commit
  15. 30 Jun, 2012: 2 commits
  16. 24 Apr, 2012: 2 commits
  17. 20 Apr, 2012: 1 commit
  18. 06 Apr, 2012: 1 commit
  19. 27 Feb, 2012: 2 commits
  20. 31 Jan, 2012: 1 commit
  21. 24 Dec, 2011: 1 commit
    • netlink: Undo const marker in netlink_is_kernel(). · 035c4c16
      Authored by David S. Miller
      We can't do this without propagating the const to nlk_sk()
      too, otherwise:
      
      net/netlink/af_netlink.c: In function ‘netlink_is_kernel’:
      net/netlink/af_netlink.c:103:2: warning: passing argument 1 of ‘nlk_sk’ discards ‘const’ qualifier from pointer target type [enabled by default]
      net/netlink/af_netlink.c:96:36: note: expected ‘struct sock *’ but argument is of type ‘const struct sock *’
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 23 Dec, 2011: 2 commits
  23. 29 Sep, 2011: 1 commit
    • af_unix: dont send SCM_CREDENTIALS by default · 16e57262
      Authored by Eric Dumazet
      Since commit 7361c36c (af_unix: Allow credentials to work across
      user and pid namespaces) af_unix performance dropped a lot.
      
      This is because we now take a reference on pid and cred in each write(),
      and release them in read(), usually done from another process,
      possibly from another CPU. This triggers false sharing.
      
      # Events: 154K cycles
      #
      # Overhead  Command       Shared Object        Symbol
      # ........  .......  ..................  .........................
      #
          10.40%  hackbench  [kernel.kallsyms]   [k] put_pid
           8.60%  hackbench  [kernel.kallsyms]   [k] unix_stream_recvmsg
           7.87%  hackbench  [kernel.kallsyms]   [k] unix_stream_sendmsg
           6.11%  hackbench  [kernel.kallsyms]   [k] do_raw_spin_lock
           4.95%  hackbench  [kernel.kallsyms]   [k] unix_scm_to_skb
           4.87%  hackbench  [kernel.kallsyms]   [k] pid_nr_ns
           4.34%  hackbench  [kernel.kallsyms]   [k] cred_to_ucred
           2.39%  hackbench  [kernel.kallsyms]   [k] unix_destruct_scm
           2.24%  hackbench  [kernel.kallsyms]   [k] sub_preempt_count
           1.75%  hackbench  [kernel.kallsyms]   [k] fget_light
           1.51%  hackbench  [kernel.kallsyms]   [k] __mutex_lock_interruptible_slowpath
           1.42%  hackbench  [kernel.kallsyms]   [k] sock_alloc_send_pskb
      
      This patch includes SCM_CREDENTIALS information in an af_unix message/skb
      only if requested by the sender [see man 7 unix for details on how to
      include ancillary data using the sendmsg() system call].

      Note: This might break buggy applications that expected SCM_CREDENTIALS
      from an unaware write() system call, with a receiver not using the
      SO_PASSCRED socket option.
      
      If SOCK_PASSCRED is set on source or destination socket, we still
      include credentials for mere write() syscalls.
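A minimal sketch of the opt-in path described above, assuming Linux and glibc: the sender attaches SCM_CREDENTIALS explicitly with sendmsg() and the CMSG_* macros, and the receiver, which has enabled SO_PASSCRED, reads the credentials back with recvmsg(). The helper names send_with_creds() and recv_with_creds() are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* struct ucred, SCM_CREDENTIALS */
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one struct ucred. */
union cred_ctl {
	char raw[CMSG_SPACE(sizeof(struct ucred))];
	struct cmsghdr align;
};

/* Send 'len' bytes and explicitly attach our own pid/uid/gid. */
static ssize_t send_with_creds(int fd, const void *buf, size_t len)
{
	struct ucred cred = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
	union cred_ctl ctl;
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctl.raw, .msg_controllen = sizeof(ctl.raw),
	};
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_CREDENTIALS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(struct ucred));
	memcpy(CMSG_DATA(cmsg), &cred, sizeof(cred));
	return sendmsg(fd, &msg, 0);
}

/* Receive one message; *have_cred is 1 only if SCM_CREDENTIALS arrived. */
static ssize_t recv_with_creds(int fd, void *buf, size_t len,
			       struct ucred *out, int *have_cred)
{
	union cred_ctl ctl;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctl.raw, .msg_controllen = sizeof(ctl.raw),
	};
	struct cmsghdr *cmsg;
	ssize_t n = recvmsg(fd, &msg, 0);

	*have_cred = 0;
	if (n < 0)
		return n;
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_CREDENTIALS) {
			memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
			*have_cred = 1;
		}
	}
	return n;
}
```

For a quick check, create socketpair(AF_UNIX, SOCK_DGRAM, 0, sv), enable SO_PASSCRED on sv[1], send with send_with_creds(sv[0], ...) and read with recv_with_creds(sv[1], ...); the received pid and uid should be the caller's own.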
      
      Performance boost in hackbench: more than 50% gain on a 16-thread
      machine (2 quad-core cpus, 2 threads per core)
      
      hackbench 20 thread 2000
      
      4.228 sec instead of 9.102 sec
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 12 Aug, 2011: 1 commit
  25. 23 Jun, 2011: 1 commit
    • netlink: advertise incomplete dumps · 670dc283
      Authored by Johannes Berg
      Consider the following situation:
       * a dump that would show 8 entries, four in the first
         round, and four in the second
       * between the first and second rounds, 6 entries are
         removed
       * now the second round will not show any entry, and
         even if there is a sequence/generation counter the
         application will not know
      
      To solve this problem, add a new flag NLM_F_DUMP_INTR
      to the netlink header that indicates that the dump wasn't
      consistent. This flag can also be set on the NLMSG_DONE
      message that terminates the dump, so the above
      situation can be detected.

      To achieve this, add a sequence counter to the netlink
      callback struct. Of course, netlink code still needs
      to use this new functionality. The correct way to do
      that is to always set cb->seq when a dumpit callback
      is invoked and to call nl_dump_check_consistent() for
      each new message. The core code will also call this
      function for the final NLMSG_DONE message.
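A minimal userspace model of the consistency check, assuming the counter logic described above: a message is flagged with NLM_F_DUMP_INTR when cb->seq differs from the value recorded for the previous message, with 0 meaning "nothing recorded yet". struct cb_model and model_dump_check_consistent() are hypothetical stand-ins for struct netlink_callback and nl_dump_check_consistent().

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>

#ifndef NLM_F_DUMP_INTR
#define NLM_F_DUMP_INTR 0x10    /* value from newer <linux/netlink.h> */
#endif

/* Minimal stand-in for the two callback fields used here. */
struct cb_model {
	unsigned int seq;       /* generation seen by the current dump step */
	unsigned int prev_seq;  /* generation recorded for the previous message */
};

/* Flag 'nlh' as interrupted if the generation changed mid-dump. */
static void model_dump_check_consistent(struct cb_model *cb,
					struct nlmsghdr *nlh)
{
	assert(cb != NULL && nlh != NULL);
	if (cb->prev_seq && cb->seq != cb->prev_seq)
		nlh->nlmsg_flags |= NLM_F_DUMP_INTR;
	cb->prev_seq = cb->seq;
}
```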
      
      To make it usable with generic netlink, a new function
      genlmsg_nlhdr() is needed to obtain the netlink header
      from the genetlink user header.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
  26. 17 Jun, 2011: 1 commit
  27. 10 Jun, 2011: 1 commit
    • rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Authored by Greg Rose
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough space to satisfy the request.
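A hypothetical sketch of the sizing idea: compute the size of the largest single-device ifinfo message up front and use it as a lower bound for the dump buffer, keeping the default size when it is already large enough. The dev_model type, the byte counts and the helper names are placeholders, not the kernel's rtnetlink code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device size: fixed part plus a per-VF part. */
struct dev_model {
	size_t base_size;
	size_t per_vf_size;
	size_t num_vfs;
};

static size_t model_ifinfo_size(const struct dev_model *d)
{
	return d->base_size + d->per_vf_size * d->num_vfs;
}

/* Largest single-device message, so one buffer always fits one entry. */
static size_t model_min_dump_alloc(const struct dev_model *devs, size_t n)
{
	size_t max = 0;

	for (size_t i = 0; i < n; i++) {
		size_t sz = model_ifinfo_size(&devs[i]);

		if (sz > max)
			max = sz;
	}
	return max;
}

/* Allocate at least the default size, more if some device needs it. */
static size_t model_dump_alloc_size(size_t min_dump_alloc, size_t default_size)
{
	assert(default_size > 0);
	return min_dump_alloc > default_size ? min_dump_alloc : default_size;
}
```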
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  28. 24 May, 2011: 1 commit
    • net: convert %p usage to %pK · 71338aa7
      Authored by Dan Rosenberg
      The %pK format specifier is designed to hide exposed kernel pointers,
      specifically via /proc interfaces.  Exposing these pointers provides an
      easy target for kernel write vulnerabilities, since they reveal the
      locations of writable structures containing easily triggerable function
      pointers.  The behavior of %pK depends on the kptr_restrict sysctl.
      
      If kptr_restrict is set to 0, no deviation from the standard %p behavior
      occurs.  If kptr_restrict is set to 1, the default, and the current user
      (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
      (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
      If kptr_restrict is set to 2, kernel pointers using %pK are printed as
      0's regardless of privileges.  Replacing with 0's was chosen over the
      default "(null)", which cannot be parsed by userland %p, which expects
      "(nil)".

      The supporting code for kptr_restrict and %pK is currently in the -mm
      tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
      pointers to the syslog are not covered, since this would eliminate useful
      information for postmortem debugging and the reading of the syslog is
      already optionally protected by the dmesg_restrict sysctl.
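A userspace model of the three kptr_restrict levels described above. It assumes zero-padded hexadecimal output of pointer width for both the visible and the hidden case; the helper names are hypothetical and this is not the kernel's vsprintf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 0: show the pointer; 1: show only to CAP_SYSLOG readers; 2: never. */
static bool model_kptr_visible(int kptr_restrict, bool reader_has_cap_syslog)
{
	switch (kptr_restrict) {
	case 0:
		return true;
	case 1:
		return reader_has_cap_syslog;
	default:
		return false;
	}
}

/* Format 'ptr' as pointer-width zero-padded hex, or as all zeros if hidden. */
static void model_format_pK(char *out, size_t outlen, const void *ptr,
			    int kptr_restrict, bool has_cap_syslog)
{
	uintptr_t v = model_kptr_visible(kptr_restrict, has_cap_syslog)
		      ? (uintptr_t)ptr : 0;

	assert(outlen > 0);
	snprintf(out, outlen, "%0*jx", (int)(2 * sizeof(void *)), (uintmax_t)v);
}
```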
      Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Thomas Graf <tgraf@infradead.org>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 08 May, 2011: 1 commit
  30. 04 Mar, 2011: 2 commits
  31. 01 Mar, 2011: 1 commit
    • netlink: handle errors from netlink_dump() · b44d211e
      Authored by Andrey Vagin
      netlink_dump() may fail, but nobody handles its error.
      It generates output data when the previous portion has been returned to
      user space. This mechanism works when not all of the data fits in one skb.
      If we enter netlink_recvmsg() and there is no skb in the receive queue,
      netlink_dump() will not be executed. So if netlink_dump() fails once,
      new data never appears and the reader sleeps forever.

      netlink_dump() is called from two places:

      1. from netlink_sendmsg->...->netlink_dump_start().
         Here we can report the error directly and it will be returned
         by sendmsg().

      2. from netlink_recvmsg
         There we can't report the error directly, because we have a portion
         of valid output data and call netlink_dump() to prepare the next
         portion. If netlink_dump() fails, the socket will be marked with the
         error and the next recvmsg will fail.
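A hypothetical single-threaded model of the two call paths: an error on the first dump step is returned directly, while an error while refilling after recvmsg() is stored as a sticky socket error and reported by the next recvmsg(), instead of leaving the reader blocked. The struct, the helper names and the choice of -ENOBUFS as the failure are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket state for a multi-part dump. */
struct dump_sock_model {
	int pending;     /* queued portions not yet read */
	int sk_err;      /* sticky error, reported on the next recv */
	int remaining;   /* portions still to be produced */
	bool fail_next;  /* inject a failure into the next dump step */
};

/* Produce the next portion; returns 0 or a negative errno. */
static int model_dump(struct dump_sock_model *sk)
{
	if (sk->remaining == 0)
		return 0;
	if (sk->fail_next) {
		sk->fail_next = false;
		return -ENOBUFS;
	}
	sk->remaining--;
	sk->pending++;
	return 0;
}

/* Path 1: started from sendmsg(); the error goes straight back. */
static int model_dump_start(struct dump_sock_model *sk)
{
	return model_dump(sk);
}

/* Path 2: hand out one queued portion, then refill. A refill error is
 * stored and reported by the next call rather than returned now. */
static int model_recvmsg(struct dump_sock_model *sk)
{
	int err;

	assert(sk != NULL);
	if (sk->sk_err) {
		err = sk->sk_err;
		sk->sk_err = 0;
		return -err;
	}
	if (sk->pending == 0)
		return -EAGAIN;  /* a blocking reader would sleep here */
	sk->pending--;
	err = model_dump(sk);
	if (err)
		sk->sk_err = -err;
	return 1;                /* one portion delivered */
}
```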
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 25 Oct, 2010: 1 commit
    • netlink: fix netlink_change_ngroups() · 5c398dc8
      Authored by Eric Dumazet
      commit 6c04bb18 (netlink: use call_rcu for netlink_change_ngroups)
      used a somewhat convoluted and racy way to perform call_rcu().
      
      The old block of memory is freed after a grace period, but the rcu_head
      used to track it is located in the new block.

      This can clash if we call netlink_change_ngroups() two or more times
      and one block is freed before another. call_rcu() called on different
      CPUs makes no guarantee about the order of callbacks.

      Fix this using a more standard way of handling this: each block of
      memory contains its own rcu_head, so that no use-after-free can
      happen.
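A userspace sketch of the pattern the fix adopts, assuming a trivial call_rcu() stand-in that queues callbacks and runs them at a simulated end of a grace period. Each listeners block embeds its own rcu_head, so freeing an old block never touches memory owned by a newer one. All names here are hypothetical models of the kernel structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct rcu_head; callbacks run at model_grace_period_end(). */
struct rcu_head_model {
	struct rcu_head_model *next;
	void (*func)(struct rcu_head_model *head);
};

static struct rcu_head_model *pending_cbs;

static void model_call_rcu(struct rcu_head_model *head,
			   void (*func)(struct rcu_head_model *head))
{
	head->func = func;
	head->next = pending_cbs;
	pending_cbs = head;
}

/* Simulated end of a grace period: run and drain all callbacks. */
static void model_grace_period_end(void)
{
	while (pending_cbs != NULL) {
		struct rcu_head_model *h = pending_cbs;

		pending_cbs = h->next;  /* read before func() frees h */
		h->func(h);
	}
}

/* Each block carries its own rcu_head. */
struct listeners_model {
	struct rcu_head_model rcu;
	unsigned long masks[];          /* one bit per multicast group */
};

static int freed_blocks;

static void listeners_free_rcu(struct rcu_head_model *head)
{
	struct listeners_model *l = (struct listeners_model *)
		((char *)head - offsetof(struct listeners_model, rcu));

	free(l);
	freed_blocks++;
}

/* Grow the bitmap: copy the old bits, then free the OLD block through
 * the OLD block's own rcu_head. */
static struct listeners_model *model_change_ngroups(struct listeners_model *old,
						    size_t old_words,
						    size_t new_words)
{
	struct listeners_model *nl;

	assert(new_words >= old_words);
	nl = calloc(1, sizeof(*nl) + new_words * sizeof(unsigned long));
	assert(nl != NULL);
	if (old != NULL) {
		for (size_t i = 0; i < old_words; i++)
			nl->masks[i] = old->masks[i];
		model_call_rcu(&old->rcu, listeners_free_rcu);
	}
	return nl;
}
```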
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Johannes Berg <johannes@sipsolutions.net>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>