1. 30 3月, 2020 1 次提交
  2. 28 3月, 2020 1 次提交
    • D
      bpf: Add netns cookie and enable it for bpf cgroup hooks · f318903c
      Daniel Borkmann 提交于
      In Cilium we're mainly using BPF cgroup hooks today in order to implement
      kube-proxy free Kubernetes service translation for ClusterIP, NodePort (*),
      ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic
      between Cilium managed nodes. While this works in its current shape and avoids
      packet-level NAT for inter Cilium managed node traffic, there is one major
      limitation we're facing today, that is, lack of netns awareness.
      
      In Kubernetes, the concept of Pods (which hold one or multiple containers)
      has been built around network namespaces, so while we can use the global scope
      of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing
      NodePort ports on loopback addresses), we also have the need to differentiate
      between initial network namespaces and non-initial one. For example, ExternalIP
      services mandate that non-local service IPs are not to be translated from the
      host (initial) network namespace as one example. Right now, we have an ugly
      work-around in place where non-local service IPs for ExternalIP services are
      not xlated from connect() and friends BPF hooks but instead via less efficient
      packet-level NAT on the veth tc ingress hook for Pod traffic.
      
      On top of determining whether we're in initial or non-initial network namespace
      we also have a need for a socket-cookie like mechanism for network namespaces
      scope. Socket cookies have the nice property that they can be combined as part
      of the key structure e.g. for BPF LRU maps without having to worry that the
      cookie could be recycled. We are planning to use this for our sessionAffinity
      implementation for services. Therefore, add a new bpf_get_netns_cookie() helper
      which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would
      provide the cookie for the initial network namespace while passing the context
      instead of NULL would provide the cookie from the application's network namespace.
      We're using a hole, so no size increase; the assignment happens only once.
      Therefore this allows for a comparison on initial namespace as well as regular
      cookie usage as we have today with socket cookies. We could later on enable
      this helper for other program types as well as we would see need.
      
        (*) Both externalTrafficPolicy={Local|Cluster} types
        [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.cSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
      f318903c
  3. 17 1月, 2020 1 次提交
  4. 26 10月, 2019 1 次提交
    • G
      netns: fix GFP flags in rtnl_net_notifyid() · d4e4fdf9
      Guillaume Nault 提交于
      In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
      rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
      but there are a few paths calling rtnl_net_notifyid() from atomic
      context or from RCU critical sections. The later also precludes the use
      of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
      call is wrong too, as it uses GFP_KERNEL unconditionally.
      
      Therefore, we need to pass the GFP flags as parameter and propagate it
      through function calls until the proper flags can be determined.
      
      In most cases, GFP_KERNEL is fine. The exceptions are:
        * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
          indirectly call rtnl_net_notifyid() from RCU critical section,
      
        * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
          parameter.
      
      Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
      by nlmsg_new(). The function is allowed to sleep, so better make the
      flags consistent with the ones used in the following
      ovs_vport_cmd_fill_info() call.
      
      Found by code inspection.
      
      Fixes: 9a963454 ("netns: notify netns id events")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4e4fdf9
  5. 20 10月, 2019 1 次提交
    • E
      net: reorder 'struct net' fields to avoid false sharing · 2a06b898
      Eric Dumazet 提交于
      Intel test robot reported a ~7% regression on TCP_CRR tests
      that they bisected to the cited commit.
      
      Indeed, every time a new TCP socket is created or deleted,
      the atomic counter net->count is touched (via get_net(net)
      and put_net(net) calls)
      
      So cpus might have to reload a contended cache line in
      net_hash_mix(net) calls.
      
      We need to reorder 'struct net' fields to move @hash_mix
      in a read mostly cache line.
      
      We move in the first cache line fields that can be
      dirtied often.
      
      We probably will have to address in a followup patch
      the __randomize_layout that was added in linux-4.13,
      since this might break our placement choices.
      
      Fixes: 355b9855 ("netns: provide pure entropy for net_hash_mix()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nkernel test robot <oliver.sang@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a06b898
  6. 02 10月, 2019 2 次提交
  7. 07 9月, 2019 1 次提交
  8. 22 8月, 2019 1 次提交
  9. 10 8月, 2019 1 次提交
    • D
      sock: make cookie generation global instead of per netns · cd48bdda
      Daniel Borkmann 提交于
      Generating and retrieving socket cookies are a useful feature that is
      exposed to BPF for various program types through bpf_get_socket_cookie()
      helper.
      
      The fact that the cookie counter is per netns is quite a limitation
      for BPF in practice in particular for programs in host namespace that
      use socket cookies as part of a map lookup key since they will be
      causing socket cookie collisions e.g. when attached to BPF cgroup hooks
      or cls_bpf on tc egress in host namespace handling container traffic
      from veth or ipvlan devices with peer in different netns. Change the
      counter to be global instead.
      
      Socket cookie consumers must assume the value as opqaue in any case.
      Not every socket must have a cookie generated and knowledge of the
      counter value itself does not provide much value either way hence
      conversion to global is fine.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Martynas Pumputis <m@lambda.lt>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd48bdda
  10. 26 7月, 2019 1 次提交
    • O
      crypto: user - make NETLINK_CRYPTO work inside netns · 91b05a7e
      Ondrej Mosnacek 提交于
      Currently, NETLINK_CRYPTO works only in the init network namespace. It
      doesn't make much sense to cut it out of the other network namespaces,
      so do the minor plumbing work necessary to make it work in any network
      namespace. Code inspired by net/core/sock_diag.c.
      
      Tested using kcapi-dgst from libkcapi [1]:
      Before:
          # unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
          libkcapi - Error: Netlink error: sendmsg failed
          libkcapi - Error: Netlink error: sendmsg failed
          libkcapi - Error: NETLINK_CRYPTO: cannot obtain cipher information for hmac(sha512) (is required crypto_user.c patch missing? see documentation)
          0
      
      After:
          # unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
          32
      
      [1] https://github.com/smuellerDD/libkcapiSigned-off-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      91b05a7e
  11. 27 6月, 2019 1 次提交
    • D
      keys: Network namespace domain tag · 9b242610
      David Howells 提交于
      Create key domain tags for network namespaces and make it possible to
      automatically tag keys that are used by networked services (e.g. AF_RXRPC,
      AFS, DNS) with the default network namespace if not set by the caller.
      
      This allows keys with the same description but in different namespaces to
      coexist within a keyring.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: netdev@vger.kernel.org
      cc: linux-nfs@vger.kernel.org
      cc: linux-cifs@vger.kernel.org
      cc: linux-afs@lists.infradead.org
      9b242610
  12. 19 6月, 2019 1 次提交
    • E
      netns: add pre_exit method to struct pernet_operations · d7d99872
      Eric Dumazet 提交于
      Current struct pernet_operations exit() handlers are highly
      discouraged to call synchronize_rcu().
      
      There are cases where we need them, and exit_batch() does
      not help the common case where a single netns is dismantled.
      
      This patch leverages the existing synchronize_rcu() call
      in cleanup_net()
      
      Calling optional ->pre_exit() method before ->exit() or
      ->exit_batch() allows to benefit from a single synchronize_rcu()
      call.
      
      Note that the synchronize_rcu() calls added in this patch
      are only in error paths or slow paths.
      
      Tested:
      
      $ time for i in {1..1000}; do unshare -n /bin/false;done
      
      real	0m2.612s
      user	0m0.171s
      sys	0m2.216s
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7d99872
  13. 29 5月, 2019 1 次提交
    • D
      net: Initial nexthop code · ab84be7e
      David Ahern 提交于
      Barebones start point for nexthops. Implementation for RTM commands,
      notifications, management of rbtree for holding nexthops by id, and
      kernel side data structures for nexthops and nexthop config.
      
      Nexthops are maintained in an rbtree sorted by id. Similar to routes,
      nexthops are configured per namespace using netns_nexthop struct added
      to struct net.
      
      Nexthop notifications are sent when a nexthop is added or deleted,
      but NOT if the delete is due to a device event or network namespace
      teardown (which also involves device events). Applications are
      expected to use the device down event to flush nexthops and any
      routes used by the nexthops.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab84be7e
  14. 29 3月, 2019 1 次提交
  15. 25 1月, 2019 1 次提交
  16. 15 9月, 2018 1 次提交
  17. 21 7月, 2018 1 次提交
  18. 18 6月, 2018 1 次提交
  19. 30 3月, 2018 1 次提交
    • K
      net: Introduce net_rwsem to protect net_namespace_list · f0b07bb1
      Kirill Tkhai 提交于
      rtnl_lock() is used everywhere, and contention is very high.
      When someone wants to iterate over alive net namespaces,
      he/she has no a possibility to do that without exclusive lock.
      But the exclusive rtnl_lock() in such places is overkill,
      and it just increases the contention. Yes, there is already
      for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
      and this can't be sleepable. Also, sometimes it may be need
      really prevent net_namespace_list growth, so for_each_net_rcu()
      is not fit there.
      
      This patch introduces new rw_semaphore, which will be used
      instead of rtnl_mutex to protect net_namespace_list. It is
      sleepable and allows not-exclusive iterations over net
      namespaces list. It allows to stop using rtnl_lock()
      in several places (what is made in next patches) and makes
      less the time, we keep rtnl_mutex. Here we just add new lock,
      while the explanation of we can remove rtnl_lock() there are
      in next patches.
      
      Fine grained locks generally are better, then one big lock,
      so let's do that with net_namespace_list, while the situation
      allows that.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0b07bb1
  20. 28 3月, 2018 3 次提交
  21. 22 3月, 2018 1 次提交
    • C
      net: add uevent socket member · 94e5e308
      Christian Brauner 提交于
      This commit adds struct uevent_sock to struct net. Since struct uevent_sock
      records the position of the uevent socket in the uevent socket list we can
      trivially remove it from the uevent socket list during cleanup. This speeds
      up the old removal codepath.
      Note, list_del() will hit __list_del_entry_valid() in its call chain which
      will validate that the element is a member of the list. If it isn't it will
      take care that the list is not modified.
      Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94e5e308
  22. 13 3月, 2018 1 次提交
  23. 21 2月, 2018 2 次提交
  24. 13 2月, 2018 1 次提交
  25. 25 1月, 2018 1 次提交
    • D
      net: tcp: close sock if net namespace is exiting · 4ee806d5
      Dan Streetman 提交于
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold a reference to their interface, which are all transferred to the
      namespace's loopback interface when the real interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811Signed-off-by: NDan Streetman <ddstreet@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ee806d5
  26. 16 1月, 2018 1 次提交
  27. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  28. 04 8月, 2017 1 次提交
    • I
      net: core: Make the FIB notification chain generic · 04b1d4e5
      Ido Schimmel 提交于
      The FIB notification chain is currently soley used by IPv4 code.
      However, we're going to introduce IPv6 FIB offload support, which
      requires these notification as well.
      
      As explained in commit c3852ef7 ("ipv4: fib: Replay events when
      registering FIB notifier"), upon registration to the chain, the callee
      receives a full dump of the FIB tables and rules by traversing all the
      net namespaces. The integrity of the dump is ensured by a per-namespace
      sequence counter that is incremented whenever a change to the tables or
      rules occurs.
      
      In order to allow more address families to use the chain, each family is
      expected to register its fib_notifier_ops in its pernet init. These
      operations allow the common code to read the family's sequence counter
      as well as dump its tables and rules in the given net namespace.
      
      Additionally, a 'family' parameter is added to sent notifications, so
      that listeners could distinguish between the different families.
      
      Implement the common code that allows listeners to register to the chain
      and for address families to register their fib_notifier_ops. Subsequent
      patches will implement these operations in IPv6.
      
      In the future, ipmr and ip6mr will be extended to provide these
      notifications as well.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04b1d4e5
  29. 01 7月, 2017 2 次提交
  30. 20 6月, 2017 1 次提交
    • F
      netns: add and use net_ns_barrier · 7866cc57
      Florian Westphal 提交于
      Quoting Joe Stringer:
        If a user loads nf_conntrack_ftp, sends FTP traffic through a network
        namespace, destroys that namespace then unloads the FTP helper module,
        then the kernel will crash.
      
      Events that lead to the crash:
      1. conntrack is created with ftp helper in netns x
      2. This netns is destroyed
      3. netns destruction is scheduled
      4. netns destruction wq starts, removes netns from global list
      5. ftp helper is unloaded, which resets all helpers of the conntracks
      via for_each_net()
      
      but because netns is already gone from list the for_each_net() loop
      doesn't include it, therefore all of these conntracks are unaffected.
      
      6. helper module unload finishes
      7. netns wq invokes destructor for rmmod'ed helper
      
      CC: "Eric W. Biederman" <ebiederm@xmission.com>
      Reported-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      7866cc57
  31. 04 4月, 2017 1 次提交
    • M
      can: initial support for network namespaces · 8e8cda6d
      Mario Kicherer 提交于
      This patch adds initial support for network namespaces. The changes only
      enable support in the CAN raw, proc and af_can code. GW and BCM still
      have their checks that ensure that they are used only from the main
      namespace.
      
      The patch boils down to moving the global structures, i.e. the global
      filter list and their /proc stats, into a per-namespace structure and passing
      around the corresponding "struct net" in a lot of different places.
      
      Changes since v1:
       - rebased on current HEAD (2bfe01ef)
       - fixed overlong line
      Signed-off-by: NMario Kicherer <dev@kicherer.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      8e8cda6d
  32. 19 11月, 2016 1 次提交
  33. 18 11月, 2016 1 次提交
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan 提交于
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into an zero based array and
      thus is unsigned entity. Using negative value is out-of-bound
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preffered to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is 3 byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a semmingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d03a00
  34. 09 8月, 2016 1 次提交
  35. 03 8月, 2016 1 次提交