1. 30 10月, 2016 18 次提交
  2. 29 10月, 2016 1 次提交
  3. 28 10月, 2016 7 次提交
    • A
      net: skip genenerating uevents for network namespaces that are exiting · 002d8a1a
      Andrey Vagin 提交于
      No one can see these events, because a network namespace can not be
      destroyed, if it has sockets.
      
      Unlike other devices, uevent-s for network devices are generated
      only inside their network namespaces. They are filtered in
      kobj_bcast_filter()
      
      My experiments shows that net namespaces are destroyed more 30% faster
      with this optimization.
      
      Here is a perf output for destroying network namespaces without this
      patch.
      
      -   94.76%     0.02%  kworker/u48:1  [kernel.kallsyms]     [k] cleanup_net
         - 94.74% cleanup_net
            - 94.64% ops_exit_list.isra.4
               - 41.61% default_device_exit_batch
                  - 41.47% unregister_netdevice_many
                     - rollback_registered_many
                        - 40.36% netdev_unregister_kobject
                           - 14.55% device_del
                              + 13.71% kobject_uevent
                           - 13.04% netdev_queue_update_kobjects
                              + 12.96% kobject_put
                           - 12.72% net_rx_queue_update_kobjects
                                kobject_put
                              - kobject_release
                                 + 12.69% kobject_uevent
                        + 0.80% call_netdevice_notifiers_info
               + 19.57% nfsd_exit_net
               + 11.15% tcp_net_metrics_exit
               + 8.25% rpcsec_gss_exit_net
      
      It's very critical to optimize the exit path for network namespaces,
      because they are destroyed under net_mutex and many namespaces can be
      destroyed for one iteration.
      
      v2: use dev_set_uevent_suppress()
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      002d8a1a
    • J
      genetlink: mark families as __ro_after_init · 56989f6d
      Johannes Berg 提交于
      Now genl_register_family() is the only thing (other than the
      users themselves, perhaps, but I didn't find any doing that)
      writing to the family struct.
      
      In all families that I found, genl_register_family() is only
      called from __init functions (some indirectly, in which case
      I've add __init annotations to clarifly things), so all can
      actually be marked __ro_after_init.
      
      This protects the data structure from accidental corruption.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56989f6d
    • J
      genetlink: use idr to track families · 2ae0f17d
      Johannes Berg 提交于
      Since generic netlink family IDs are small integers, allocated
      densely, IDR is an ideal match for lookups. Replace the existing
      hand-written hash-table with IDR for allocation and lookup.
      
      This lets the families only be written to once, during register,
      since the list_head can be removed and removal of a family won't
      cause any writes.
      
      It also slightly reduces the code size (by about 1.3k on x86-64).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ae0f17d
    • J
      genetlink: statically initialize families · 489111e5
      Johannes Berg 提交于
      Instead of providing macros/inline functions to initialize
      the families, make all users initialize them statically and
      get rid of the macros.
      
      This reduces the kernel code size by about 1.6k on x86-64
      (with allyesconfig).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      489111e5
    • J
      genetlink: no longer support using static family IDs · a07ea4d9
      Johannes Berg 提交于
      Static family IDs have never really been used, the only
      use case was the workaround I introduced for those users
      that assumed their family ID was also their multicast
      group ID.
      
      Additionally, because static family IDs would never be
      reserved by the generic netlink code, using a relatively
      low ID would only work for built-in families that can be
      registered immediately after generic netlink is started,
      which is basically only the control family (apart from
      the workaround code, which I also had to add code for so
      it would reserve those IDs)
      
      Thus, anything other than GENL_ID_GENERATE is flawed and
      luckily not used except in the cases I mentioned. Move
      those workarounds into a few lines of code, and then get
      rid of GENL_ID_GENERATE entirely, making it more robust.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a07ea4d9
    • J
      genetlink: introduce and use genl_family_attrbuf() · c90c39da
      Johannes Berg 提交于
      This helper function allows family implementations to access
      their family's attrbuf. This gets rid of the attrbuf usage
      in families, and also adds locking validation, since it's not
      valid to use the attrbuf with parallel_ops or outside of the
      dumpit callback.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c90c39da
    • A
      skbedit: allow the user to specify bitmask for mark · 4fe77d82
      Antonio Quartulli 提交于
      The user may want to use only some bits of the skb mark in
      his skbedit rules because the remaining part might be used by
      something else.
      
      Introduce the "mask" parameter to the skbedit actor in order
      to implement such functionality.
      
      When the mask is specified, only those bits selected by the
      latter are altered really changed by the actor, while the
      rest is left untouched.
      Signed-off-by: NAntonio Quartulli <antonio@open-mesh.com>
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fe77d82
  4. 27 10月, 2016 3 次提交
  5. 24 10月, 2016 4 次提交
    • C
      net: ip, diag -- Add diag interface for raw sockets · 432490f9
      Cyrill Gorcunov 提交于
      In criu we are actively using diag interface to collect sockets
      present in the system when dumping applications. And while for
      unix, tcp, udp[lite], packet, netlink it works as expected,
      the raw sockets do not have. Thus add it.
      
      v2:
       - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
       - implement @destroy for diag requests (by dsa@)
      
      v3:
       - add export of raw_abort for IPv6 (by dsa@)
       - pass net-admin flag into inet_sk_diag_fill due to
         changes in net-next branch (by dsa@)
      
      v4:
       - use @pad in struct inet_diag_req_v2 for raw socket
         protocol specification: raw module carries sockets
         which may have custom protocol passed from socket()
         syscall and sole @sdiag_protocol is not enough to
         match underlied ones
       - start reporting protocol specifed in socket() call
         when sockets are raw ones for the same reason: user
         space tools like ss may parse this attribute and use
         it for socket matching
      
      v5 (by eric.dumazet@):
       - use sock_hold in raw_sock_get instead of atomic_inc,
         we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
         when looking up so counter won't be zero here.
      
      v6:
       - use sdiag_raw_protocol() helper which will access @pad
         structure used for raw sockets protocol specification:
         we can't simply rename this member without breaking uapi
      
      v7:
       - sine sdiag_raw_protocol() helper is not suitable for
         uapi lets rather make an alias structure with proper
         names. __check_inet_diag_req_raw helper will catch
         if any of structure unintentionally changed.
      
      CC: David S. Miller <davem@davemloft.net>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David Ahern <dsa@cumulusnetworks.com>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Andrey Vagin <avagin@openvz.org>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      432490f9
    • T
      lwt: Remove unused len field · f76a9db3
      Thomas Graf 提交于
      The field is initialized by ILA and MPLS but never used. Remove it.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f76a9db3
    • A
      net: allow to kill a task which waits net_mutex in copy_new_ns · 7281a665
      Andrey Vagin 提交于
      net_mutex can be locked for a long time. It may be because many
      namespaces are being destroyed or many processes decide to create
      a network namespace.
      
      Both these operations are heavy, so it is better to have an ability to
      kill a process which is waiting net_mutex.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7281a665
    • S
      net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames · d65f2fa6
      Shmulik Ladkani 提交于
      META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
      is zero, then no vlan accel tag is present.
      
      This is incorrect for zero VID vlan accel packets, making the following
      match fail:
        tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...
      
      Apparently 'int_vlan_tag' was implemented prior VLAN_TAG_PRESENT was
      introduced in 05423b24 "vlan: allow null VLAN ID to be used"
      (and at time introduced, the 'vlan_tx_tag_get' call in em_meta was not
       adapted).
      
      Fix, testing skb_vlan_tag_present instead of testing skb_vlan_tag_get's
      value.
      
      Fixes: 05423b24 ("vlan: allow null VLAN ID to be used")
      Fixes: 1a31f204 ("netsched: Allow meta match on vlan tag on receive")
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d65f2fa6
  6. 23 10月, 2016 4 次提交
  7. 22 10月, 2016 1 次提交
  8. 21 10月, 2016 2 次提交
    • J
      ipv4/6: use core net MTU range checking · b96f9afe
      Jarod Wilson 提交于
      ipv4/ip_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_vti:
      - min_mtu = 1280, max_mtu = 65535
      - remove redundant vti6_change_mtu
      
      ipv6/sit:
      - min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen
      - remove redundant ipip6_tunnel_change_mtu
      
      CC: netdev@vger.kernel.org
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b96f9afe
    • J
      net: use core MTU range checking in misc drivers · b3e3893e
      Jarod Wilson 提交于
      firewire-net:
      - set min/max_mtu
      - remove fwnet_change_mtu
      
      nes:
      - set max_mtu
      - clean up nes_netdev_change_mtu
      
      xpnet:
      - set min/max_mtu
      - remove xpnet_dev_change_mtu
      
      hippi:
      - set min/max_mtu
      - remove hippi_change_mtu
      
      batman-adv:
      - set max_mtu
      - remove batadv_interface_change_mtu
      - initialization is a little async, not 100% certain that max_mtu is set
        in the optimal place, don't have hardware to test with
      
      rionet:
      - set min/max_mtu
      - remove rionet_change_mtu
      
      slip:
      - set min/max_mtu
      - streamline sl_change_mtu
      
      um/net_kern:
      - remove pointless ndo_change_mtu
      
      hsi/clients/ssi_protocol:
      - use core MTU range checking
      - remove now redundant ssip_pn_set_mtu
      
      ipoib:
      - set a default max MTU value
      - Note: ipoib's actual max MTU can vary, depending on if the device is in
        connected mode or not, so we'll just set the max_mtu value to the max
        possible, and let the ndo_change_mtu function continue to validate any new
        MTU change requests with checks for CM or not. Note that ipoib has no
        min_mtu set, and thus, the network core's mtu > 0 check is the only lower
        bounds here.
      
      mptlan:
      - use net core MTU range checking
      - remove now redundant mpt_lan_change_mtu
      
      fddi:
      - min_mtu = 21, max_mtu = 4470
      - remove now redundant fddi_change_mtu (including export)
      
      fjes:
      - min_mtu = 8192, max_mtu = 65536
      - The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to
        get past the core net MTU range checks so fjes_change_mtu can validate a
        new MTU against what it supports (see fjes_support_mtu in fjes_hw.c)
      
      hsr:
      - min_mtu = 0 (calls ether_setup, max_mtu is 1500)
      
      f_phonet:
      - min_mtu = 6, max_mtu = 65541
      
      u_ether:
      - min_mtu = 14, max_mtu = 15412
      
      phonet/pep-gprs:
      - min_mtu = 576, max_mtu = 65530
      - remove redundant gprs_set_mtu
      
      CC: netdev@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: Stefan Richter <stefanr@s5r6.in-berlin.de>
      CC: Faisal Latif <faisal.latif@intel.com>
      CC: linux-rdma@vger.kernel.org
      CC: Cliff Whickman <cpw@sgi.com>
      CC: Robin Holt <robinmholt@gmail.com>
      CC: Jes Sorensen <jes@trained-monkey.org>
      CC: Marek Lindner <mareklindner@neomailbox.ch>
      CC: Simon Wunderlich <sw@simonwunderlich.de>
      CC: Antonio Quartulli <a@unstable.cc>
      CC: Sathya Prakash <sathya.prakash@broadcom.com>
      CC: Chaitra P B <chaitra.basappa@broadcom.com>
      CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
      CC: MPT-FusionLinux.pdl@broadcom.com
      CC: Sebastian Reichel <sre@kernel.org>
      CC: Felipe Balbi <balbi@kernel.org>
      CC: Arvid Brodin <arvid.brodin@alten.se>
      CC: Remi Denis-Courmont <courmisch@gmail.com>
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3e3893e