1. 12 Mar 2020, 1 commit
  2. 21 Feb 2020, 1 commit
    • net: netlink: cap max groups which will be considered in netlink_bind() · 3a20773b
      Committed by Nikolay Aleksandrov
      Since nl_groups is a u32, we can't bind more than 32 groups via the ->bind
      (netlink_bind) call, but netlink has supported more groups via
      setsockopt() for a long time, and thus nlk->ngroups could be over 32.
      Recently I added support for per-vlan notifications and increased the
      groups to 33 for NETLINK_ROUTE which exposed an old bug in the
      netlink_bind() code causing out-of-bounds access on archs where unsigned
      long is 32 bits via test_bit() on a local variable. Fix this by capping the
      maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively
      capping them at 32 which is the minimum of allocated groups and the
      maximum groups which can be bound via netlink_bind().
      
      CC: Christophe Leroy <christophe.leroy@c-s.fr>
      CC: Richard Guy Briggs <rgb@redhat.com>
      Fixes: 4f520900 ("netlink: have netlink per-protocol bind function return an error code.")
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
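      A minimal userspace sketch of the fixed logic, assuming a uint32_t bitmap in place of the kernel's unsigned long groups and a hypothetical bind_groups() helper that counts the groups the kernel would pass to nlk->netlink_bind(). The mask by ngroups is guarded against the shift width, and the loop never reads past the 32 bits nl_groups can carry, even when ngroups is 33 or more.

```c
#include <stdint.h>

/* Same idea as the kernel's BITS_PER_TYPE() macro. */
#define BITS_PER_TYPE(t) (sizeof(t) * 8)

/*
 * Hypothetical helper: returns how many groups the kernel would hand to
 * nlk->netlink_bind() for a given nl_groups bitmap and allocated ngroups.
 */
static unsigned int bind_groups(uint32_t groups, unsigned int ngroups)
{
	unsigned int nbound = 0;

	/* Drop bits beyond the allocated groups; guard the shift against the type width. */
	if (ngroups < BITS_PER_TYPE(groups))
		groups &= (UINT32_C(1) << ngroups) - 1;

	/* Cap the loop at 32: nl_groups cannot carry more, even if ngroups is larger. */
	for (unsigned int group = 0; group < BITS_PER_TYPE(uint32_t); group++) {
		if (groups & (UINT32_C(1) << group))
			nbound++; /* kernel: err = nlk->netlink_bind(net, group + 1); */
	}
	return nbound;
}
```

      With ngroups = 33 (the NETLINK_ROUTE case from the commit) and all bits set, only 32 groups are considered.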
  3. 18 Feb 2020, 1 commit
  4. 10 Dec 2019, 1 commit
  5. 15 Jun 2019, 1 commit
  6. 12 Jun 2019, 1 commit
  7. 31 May 2019, 1 commit
  8. 20 May 2019, 1 commit
  9. 13 Apr 2019, 1 commit
  10. 22 Feb 2019, 1 commit
  11. 20 Jan 2019, 1 commit
  12. 15 Dec 2018, 1 commit
  13. 16 Oct 2018, 1 commit
    • netlink: Add answer_flags to netlink_callback · 22e6c58b
      Committed by David Ahern
      With dump filtering we need a way to ensure the NLM_F_DUMP_FILTERED
      flag is set on a message back to the user if the data returned is
      influenced by some input attributes. Normally this can be done as
      messages are added to the skb, but if the filter results in no data
      being returned, the user could be confused as to why.
      
      This patch adds answer_flags to the netlink_callback, allowing dump
      handlers to set NLM_F_DUMP_FILTERED, at a minimum, in the
      NLMSG_DONE message, ensuring the flag gets back to the user.
      
      The netlink_callback space is initialized to 0 via a memset in
      __netlink_dump_start, so init of the new answer_flags is covered.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
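      A rough sketch of how a dump handler could use answer_flags, assuming hypothetical fake_callback, dump_handler() and build_done() helpers rather than the kernel's own structures. In this sketch the done header ORs cb->answer_flags into NLM_F_MULTI, so NLM_F_DUMP_FILTERED reaches userspace even when the filter matched nothing. The zeroed callback mirrors the memset in __netlink_dump_start; the fallback define of NLM_F_DUMP_FILTERED is only for older headers.

```c
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

#ifndef NLM_F_DUMP_FILTERED
#define NLM_F_DUMP_FILTERED 0x20 /* fallback for older headers */
#endif

/* Hypothetical, trimmed-down stand-in for struct netlink_callback. */
struct fake_callback {
	uint16_t answer_flags; /* extra flags for the NLMSG_DONE message */
};

/* Hypothetical dump handler: a non-zero ifindex filter shapes the result. */
static int dump_handler(struct fake_callback *cb, int filter_ifindex)
{
	if (filter_ifindex)
		cb->answer_flags = NLM_F_DUMP_FILTERED;
	/* Suppose nothing matched: no data messages are emitted at all. */
	return 0;
}

/* Build the NLMSG_DONE header, OR-ing in whatever the handler asked for. */
static struct nlmsghdr build_done(const struct fake_callback *cb, uint32_t seq)
{
	struct nlmsghdr nlh;

	memset(&nlh, 0, sizeof(nlh));
	nlh.nlmsg_len = NLMSG_LENGTH(sizeof(int)); /* payload: dump_done_errno */
	nlh.nlmsg_type = NLMSG_DONE;
	nlh.nlmsg_flags = NLM_F_MULTI | cb->answer_flags;
	nlh.nlmsg_seq = seq;
	return nlh;
}
```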
  14. 09 Oct 2018, 2 commits
    • netlink: Add new socket option to enable strict checking on dumps · 89d35528
      Committed by David Ahern
      Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace
      can use via setsockopt to request strict checking of headers and
      attributes on dump requests.
      
      To get dump features such as kernel side filtering based on data in
      the header or attributes appended to the dump request, userspace
      must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero
      value. Since the netlink sock and its flags are private to the
      af_netlink code, the strict checking flag is passed to dump handlers
      via a flag in the netlink_callback struct.
      
      For old userspace on new kernel there is no impact as all of the data
      checks in later patches are wrapped in a check on the new strict flag.
      
      For new userspace on old kernel, the setsockopt will fail and even if
      new userspace sets data in the headers and appended attributes the
      kernel will silently ignore it. Moving forward, when the setsockopt
      succeeds but the userspace is newer than the kernel, the dump request
      can pass an attribute the kernel does not understand, and the dump
      will then fail.
      
      New userspace on new kernel setting the socket option gets the benefit
      of the improved data dump.
      
      Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic
      NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter
      checking on the NEW, DEL, and SET commands.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
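      From userspace, opting in looks roughly like the sketch below. One assumption to flag: current uapi headers spell the option NETLINK_GET_STRICT_CHK (value 12) rather than the NETLINK_DUMP_STRICT_CHK name used in this commit message, so the sketch uses that name, with a fallback define. On a kernel without the option, setsockopt() is expected to fail with ENOPROTOOPT, which the hypothetical enable_strict_dump_check() helper reports as "not supported".

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_GET_STRICT_CHK
#define NETLINK_GET_STRICT_CHK 12 /* assumption: value from current uapi headers */
#endif

/*
 * Hypothetical helper. Returns 1 if strict checking is now enabled,
 * 0 if the kernel does not know the option, -1 on any other error.
 */
static int enable_strict_dump_check(int fd)
{
	int one = 1;

	if (setsockopt(fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK, &one, sizeof(one)) == 0)
		return 1;
	/* Older kernels reject unknown netlink options with ENOPROTOOPT. */
	return errno == ENOPROTOOPT ? 0 : -1;
}
```

      A caller would create the socket with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE), call the helper, and only append filter attributes to dump requests when it returns 1.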
    • netlink: Pass extack to dump handlers · 4a19edb6
      Committed by David Ahern
      Declare extack in netlink_dump and pass to dump handlers via
      netlink_callback. Add any extack message after the dump_done_errno
      allowing error messages to be returned. This will be useful when
      strict checking is done on dump requests, returning why the dump
      fails with EINVAL.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 12 Sep 2018, 1 commit
  16. 06 Sep 2018, 1 commit
    • netlink: Make groups check less stupid in netlink_bind() · 428f944b
      Committed by Dmitry Safonov
      As Linus noted, the test for 0 is needless, groups type can follow the
      usual kernel style and 8*sizeof(unsigned long) is BITS_PER_LONG:
      
      > The code [..] isn't technically incorrect...
      > But it is stupid.
      > Why stupid? Because the test for 0 is pointless.
      >
      > Just doing
      >        if (nlk->ngroups < 8*sizeof(groups))
      >                groups &= (1UL << nlk->ngroups) - 1;
      >
      > would have been fine and more understandable, since the "mask by shift
      > count" already does the right thing for a ngroups value of 0. Now that
      > test for zero makes me go "what's special about zero?". It turns out
      > that the answer to that is "nothing".
      [..]
      > The type of "groups" is kind of silly too.
      >
      > Yeah, "long unsigned int" isn't _technically_ wrong. But we normally
      > call that type "unsigned long".
      
      Cleanup my piece of pointlessness.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Fairly-blamed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
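      To check the claim that the zero test is pointless, this sketch compares a form with the explicit ngroups == 0 test against the form Linus suggests, for every ngroups from 0 to 64. mask_old() and mask_new() are hypothetical names; the logic follows the snippet quoted above.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* With the explicit zero test that the quote calls pointless. */
static unsigned long mask_old(unsigned long groups, unsigned int ngroups)
{
	if (ngroups == 0)
		groups = 0;
	else if (ngroups < BITS_PER_LONG)
		groups &= (1UL << ngroups) - 1;
	return groups;
}

/* Linus's form: (1UL << 0) - 1 == 0 already clears every bit. */
static unsigned long mask_new(unsigned long groups, unsigned int ngroups)
{
	if (ngroups < BITS_PER_LONG)
		groups &= (1UL << ngroups) - 1;
	return groups;
}

/* True if both forms agree for every ngroups from 0 to 64 (the Android case). */
static bool forms_agree(unsigned long groups)
{
	for (unsigned int n = 0; n <= 64; n++)
		if (mask_old(groups, n) != mask_new(groups, n))
			return false;
	return true;
}
```

      The width guard is what keeps ngroups = 64 safe: the shift is skipped and groups is left untouched.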
  17. 05 Aug 2018, 1 commit
    • netlink: Don't shift on 64 for ngroups · 91874ecf
      Committed by Dmitry Safonov
      It's legal to have 64 groups for netlink_sock.
      
      As the user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
      only to the first 32 groups.
      
      The userspace-supplied .bind() parameter is checked by applying a mask
      built by shifting by ngroups. This broke Android, which has 64 groups,
      because the shift used to build the mask overflowed.
      
      Fixes: 61f4b237 ("netlink: Don't shift with UB on nlk->ngroups")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
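      Because nl_groups is __u32, bind() can only express groups 1 to 32; higher groups have to be joined through setsockopt(). A minimal sketch, assuming hypothetical helpers bind_mask_for_group() and join_group(); NETLINK_ADD_MEMBERSHIP and SOL_NETLINK are the real uapi names, with a fallback define for SOL_NETLINK in case the libc headers lack it.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/*
 * Hypothetical helper: the nl_groups bit for a 1-based group number, or -1
 * if the group cannot be expressed in the __u32 bitmap bind() takes.
 */
static int64_t bind_mask_for_group(unsigned int group)
{
	if (group == 0 || group > 32)
		return -1;
	return (int64_t)(UINT32_C(1) << (group - 1));
}

/* Hypothetical helper: join any group by number, including those above 32. */
static int join_group(int fd, unsigned int group)
{
	return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &group, sizeof(group));
}
```

      The masks for groups 1 to 32 would be OR-ed into sockaddr_nl.nl_groups before a single bind(); anything above 32 goes through join_group().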
  18. 02 Aug 2018, 1 commit
    • netlink: Fix spectre v1 gadget in netlink_create() · bc5b6c0b
      Committed by Jeremy Cline
      'protocol' is a user-controlled value, so sanitize it after the bounds
      check to avoid using it for speculative out-of-bounds access to arrays
      indexed by it.
      
      This addresses the following accesses detected with the help of smatch:
      
      * net/netlink/af_netlink.c:654 __netlink_create() warn: potential
        spectre issue 'nlk_cb_mutex_keys' [w]
      
      * net/netlink/af_netlink.c:654 __netlink_create() warn: potential
        spectre issue 'nlk_cb_mutex_key_strings' [w]
      
      * net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre
        issue 'nl_table' [w] (local cap)
      
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jeremy Cline <jcline@redhat.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
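      The usual remedy for this kind of gadget is array_index_nospec(), which clamps an out-of-bounds index to 0 without a conditional branch. The sketch below mirrors the generic C fallback for the mask computation; architectures such as x86 provide their own versions, and in userspace this only demonstrates the arithmetic, it is not a guaranteed speculation barrier. It assumes index and size are at most LONG_MAX and that the compiler performs arithmetic right shifts on negative long values (true for GCC and Clang).

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * ~0UL when index < size, 0 otherwise, computed without a branch.
 * Assumes index, size <= LONG_MAX and an arithmetic right shift of
 * negative long values (GCC and Clang behave this way).
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1));
}

/* Clamp a bounds-checked index so a mispredicted branch cannot use it out of range. */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```

      Applied after the bounds check, an in-range 'protocol' passes through unchanged, while a speculatively out-of-range value collapses to 0 before it indexes nl_table or the mutex key arrays.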
  19. 31 Jul 2018, 1 commit
  20. 30 Jul 2018, 1 commit
  21. 25 Jul 2018, 1 commit
  22. 29 Jun 2018, 1 commit
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Committed by Linus Torvalds
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 26 May 2018, 1 commit
  24. 16 May 2018, 1 commit
  25. 05 May 2018, 1 commit
  26. 08 Apr 2018, 1 commit
  27. 28 Mar 2018, 1 commit
  28. 26 Mar 2018, 1 commit
  29. 23 Feb 2018, 1 commit
    • netlink: put module reference if dump start fails · b87b6194
      Committed by Jason A. Donenfeld
      Before, if cb->start() failed, the module reference would never be put,
      because cb->cb_running is intentionally false at this point. Users are
      generally annoyed by this because they can no longer unload modules that
      leak references. Also, it may be possible to tediously wrap a reference
      counter back to zero, especially since module.c still uses atomic_inc
      instead of refcount_inc.
      
      This patch expands the error path to simply call module_put if
      cb->start() fails.
      
      Fixes: 41c87425 ("netlink: do not set cb_running if dump's start() errs")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
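      The shape of the fix, as a sketch with hypothetical fake_* stand-ins for try_module_get()/module_put() and a simplified dump_start(): the reference taken before cb->start() must be released on that function's own error path, because the normal teardown only runs once cb_running is set.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a module and try_module_get()/module_put(). */
struct fake_module { int refcnt; };
static bool fake_try_module_get(struct fake_module *m) { m->refcnt++; return true; }
static void fake_module_put(struct fake_module *m) { m->refcnt--; }

/* Hypothetical start() callbacks for the two outcomes. */
static int start_ok(void) { return 0; }
static int start_fail(void) { return -ENOMEM; }

/* Simplified dump start showing where the fix puts the reference back. */
static int dump_start(struct fake_module *m, int (*start)(void), bool *cb_running)
{
	int ret;

	if (!fake_try_module_get(m))
		return -EPROTONOSUPPORT;

	ret = start ? start() : 0;
	if (ret) {
		/* cb_running stays false, so no later path would drop this ref. */
		fake_module_put(m);
		return ret;
	}

	*cb_running = true; /* from here on, teardown releases the reference */
	return 0;
}
```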
  30. 13 Feb 2018, 3 commits
    • net: Convert netlink_tap_net_ops · b86b47a3
      Committed by Kirill Tkhai
      These pernet_operations only initialize just-allocated net memory,
      so they can obviously be executed in parallel with any
      others.
      
      v3: New
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Convert netlink_net_ops · 194b95d2
      Committed by Kirill Tkhai
      The methods of netlink_net_ops create and destroy the "netlink"
      file, which is not interesting for foreign pernet_operations.
      So, netlink_net_ops may safely be made async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Committed by Denys Vlasenko
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
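      A sketch of the new calling convention, using a hypothetical fake_netlink_getname() rather than any real protocol's function: the sockaddr length is the return value on success, errors are negative errno values, and callers therefore test `len < 0` instead of `err != 0`.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical getname in the new style: the sockaddr length on success
 * (always >= 0), or a negative errno. No int *socklen out-parameter.
 */
static int fake_netlink_getname(unsigned int portid, struct sockaddr *addr, int peer)
{
	struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;

	if (peer)
		return -ENOTCONN; /* pretend this socket has no peer */

	memset(nladdr, 0, sizeof(*nladdr));
	nladdr->nl_family = AF_NETLINK;
	nladdr->nl_pid = portid;
	return (int)sizeof(*nladdr);
}
```

      A caller that does not care about the length can simply ignore the return value on success, with no on-stack socklen to provide.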
  31. 19 Jan 2018, 1 commit
  32. 17 Jan 2018, 1 commit
    • net: delete /proc THIS_MODULE references · 96890d62
      Committed by Alexey Dobriyan
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  33. 16 Jan 2018, 1 commit
  34. 12 Dec 2017, 1 commit
    • netlink: Add netns check on taps · 93c64764
      Committed by Kevin Cernekee
      Currently, a nlmon link inside a child namespace can observe systemwide
      netlink activity.  Filter the traffic so that nlmon can only sniff
      netlink messages from its own netns.
      
      Test case:
      
          vpnns -- bash -c "ip link add nlmon0 type nlmon; \
                            ip link set nlmon0 up; \
                            tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" &
          sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \
              spi 0x1 mode transport \
              auth sha1 0x6162633132330000000000000000000000000000 \
              enc aes 0x00000000000000000000000000000000
          grep --binary abc123 /tmp/nlmon.pcap
      Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  35. 11 Dec 2017, 3 commits
    • netlink: convert netlink tap spinlock to mutex · b1042d35
      Committed by Cong Wang
      Both netlink_add_tap() and netlink_remove_tap() are
      called in process context, so there is no need for a spinlock.
      
      Note, in fact, currently we always hold RTNL when calling
      these two functions, so we don't need any other lock at
      all, but keeping this lock doesn't harm anything.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: make netlink tap per netns · 25e3f70f
      Committed by Cong Wang
      nlmon device is not supposed to capture netlink events from
      other netns, so instead of filtering events, we can simply
      make netlink tap itself per netns.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Kevin Cernekee <cernekee@chromium.org>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: Change rhashtable_walk_start to return void · 97a6ec4a
      Committed by Tom Herbert
      Most callers of rhashtable_walk_start don't care about a resize event
      which is indicated by a return value of -EAGAIN. So calls to
      rhashtable_walk_start are wrapped with code to ignore -EAGAIN. Something
      like this is common:
      
             ret = rhashtable_walk_start(rhiter);
             if (ret && ret != -EAGAIN)
                     goto out;
      
      Since zero and -EAGAIN are the only possible return values from the
      function this check is pointless. The condition never evaluates to true.
      
      This patch changes rhashtable_walk_start to return void. This simplifies
      code for the callers that ignore -EAGAIN. For the few cases where the
      caller cares about the resize event, particularly where the table can be
      walked in multiple parts for netlink or seq file dump, the function
      rhashtable_walk_start_check has been added that returns -EAGAIN on a
      resize event.
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
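      A small sketch of why the old check was dead code and how the split API is meant to be used. old_check_bails_out(), struct fake_iter and the fake_walk_start*() helpers are hypothetical stand-ins, not the rhashtable API; the _check variant models rhashtable_walk_start_check() reporting a resize with -EAGAIN.

```c
#include <errno.h>
#include <stdbool.h>

/* The old caller pattern, given that ret can only be 0 or -EAGAIN. */
static bool old_check_bails_out(int ret)
{
	return ret && ret != -EAGAIN;
}

/* Hypothetical walker state: whether a resize happened since the last stop. */
struct fake_iter { bool resized; };

/* Like the new void rhashtable_walk_start(): callers simply ignore resizes. */
static void fake_walk_start(struct fake_iter *it)
{
	it->resized = false;
}

/* Like rhashtable_walk_start_check(): report a resize to callers that care. */
static int fake_walk_start_check(struct fake_iter *it)
{
	int ret = it->resized ? -EAGAIN : 0;

	it->resized = false;
	return ret;
}
```

      For both possible return values the old condition is false, which is exactly why it could be dropped along with the return value.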