1. 01 11月, 2016 3 次提交
  2. 31 10月, 2016 1 次提交
    • A
      net: add an ioctl to get a socket network namespace · c62cce2c
      Andrey Vagin 提交于
      Each socket operates in a network namespace where it has been created,
      so if we want to dump and restore a socket, we have to know its network
      namespace.
      
      We have a socket_diag to get information about sockets, it doesn't
      report sockets which are not bound or connected.
      
      This patch introduces a new socket ioctl, which is called SIOCGSKNS
      and used to get a file descriptor for a socket network namespace.
      
      A task must have CAP_NET_ADMIN in a target network namespace to
      use this ioctl.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c62cce2c
  3. 30 10月, 2016 15 次提交
  4. 29 10月, 2016 1 次提交
  5. 28 10月, 2016 12 次提交
    • A
      mac80211: fils_aead: fix encrypt error handling · 51487718
      Arnd Bergmann 提交于
      gcc -Wmaybe-uninitialized reports a bug in aes_siv_encryp:
      
      net/mac80211/fils_aead.c: In function ‘aes_siv_encrypt.constprop’:
      net/mac80211/fils_aead.c:84:26: error: ‘tfm2’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      At the time that the memory allocation fails, 'tfm2' has not been
      allocated, so we should not attempt to free it later, and we can
      simply return an error.
      
      Fixes: 39404fee ("mac80211: FILS AEAD protection for station mode association frames")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      51487718
    • A
      net: skip genenerating uevents for network namespaces that are exiting · 002d8a1a
      Andrey Vagin 提交于
      No one can see these events, because a network namespace can not be
      destroyed, if it has sockets.
      
      Unlike other devices, uevent-s for network devices are generated
      only inside their network namespaces. They are filtered in
      kobj_bcast_filter()
      
      My experiments shows that net namespaces are destroyed more 30% faster
      with this optimization.
      
      Here is a perf output for destroying network namespaces without this
      patch.
      
      -   94.76%     0.02%  kworker/u48:1  [kernel.kallsyms]     [k] cleanup_net
         - 94.74% cleanup_net
            - 94.64% ops_exit_list.isra.4
               - 41.61% default_device_exit_batch
                  - 41.47% unregister_netdevice_many
                     - rollback_registered_many
                        - 40.36% netdev_unregister_kobject
                           - 14.55% device_del
                              + 13.71% kobject_uevent
                           - 13.04% netdev_queue_update_kobjects
                              + 12.96% kobject_put
                           - 12.72% net_rx_queue_update_kobjects
                                kobject_put
                              - kobject_release
                                 + 12.69% kobject_uevent
                        + 0.80% call_netdevice_notifiers_info
               + 19.57% nfsd_exit_net
               + 11.15% tcp_net_metrics_exit
               + 8.25% rpcsec_gss_exit_net
      
      It's very critical to optimize the exit path for network namespaces,
      because they are destroyed under net_mutex and many namespaces can be
      destroyed for one iteration.
      
      v2: use dev_set_uevent_suppress()
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      002d8a1a
    • J
      net sched filters: fix notification of filter delete with proper handle · 9ee78374
      Jamal Hadi Salim 提交于
      Daniel says:
      
      While trying out [1][2], I noticed that tc monitor doesn't show the
      correct handle on delete:
      
      $ tc monitor
      qdisc clsact ffff: dev eno1 parent ffff:fff1
      filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
      deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80
      
      some context to explain the above:
      The user identity of any tc filter is represented by a 32-bit
      identifier encoded in tcm->tcm_handle. Example 0x2a in the bpf filter
      above. A user wishing to delete, get or even modify a specific filter
      uses this handle to reference it.
      Every classifier is free to provide its own semantics for the 32 bit handle.
      Example: classifiers like u32 use schemes like 800:1:801 to describe
      the semantics of their filters represented as hash table, bucket and
      node ids etc.
      Classifiers also have internal per-filter representation which is different
      from this externally visible identity. Most classifiers set this
      internal representation to be a pointer address (which allows fast retrieval
      of said filters in their implementations). This internal representation
      is referenced with the "fh" variable in the kernel control code.
      
      When a user successfuly deletes a specific filter, by specifying the correct
      tcm->tcm_handle, an event is generated to user space which indicates
      which specific filter was deleted.
      
      Before this patch, the "fh" value was sent to user space as the identity.
      As an example what is shown in the sample bpf filter delete event above
      is 0xf3be0c80. This is infact a 32-bit truncation of 0xffff8807f3be0c80
      which happens to be a 64-bit memory address of the internal filter
      representation (address of the corresponding filter's struct cls_bpf_prog);
      
      After this patch the appropriate user identifiable handle as encoded
      in the originating request tcm->tcm_handle is generated in the event.
      One of the cardinal rules of netlink rules is to be able to take an
      event (such as a delete in this case) and reflect it back to the
      kernel and successfully delete the filter. This patch achieves that.
      
      Note, this issue has existed since the original TC action
      infrastructure code patch back in 2004 as found in:
      https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/
      
      [1] http://patchwork.ozlabs.org/patch/682828/
      [2] http://patchwork.ozlabs.org/patch/682829/
      
      Fixes: 4e54c4816bfe ("[NET]: Add tc extensions infrastructure.")
      Reported-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ee78374
    • A
      flow_dissector: fix vlan tag handling · bc72f3dd
      Arnd Bergmann 提交于
      gcc warns about an uninitialized pointer dereference in the vlan
      priority handling:
      
      net/core/flow_dissector.c: In function '__skb_flow_dissect':
      net/core/flow_dissector.c:281:61: error: 'vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      As pointed out by Jiri Pirko, the variable is never actually used
      without being initialized first as the only way it end up uninitialized
      is with skb_vlan_tag_present(skb)==true, and that means it does not
      get accessed.
      
      However, the warning hints at some related issues that I'm addressing
      here:
      
      - the second check for the vlan tag is different from the first one
        that tests the skb for being NULL first, causing both the warning
        and a possible NULL pointer dereference that was not entirely fixed.
      - The same patch that introduced the NULL pointer check dropped an
        earlier optimization that skipped the repeated check of the
        protocol type
      - The local '_vlan' variable is referenced through the 'vlan' pointer
        but the variable has gone out of scope by the time that it is
        accessed, causing undefined behavior
      
      Caching the result of the 'skb && skb_vlan_tag_present(skb)' check
      in a local variable allows the compiler to further optimize the
      later check. With those changes, the warning also disappears.
      
      Fixes: 3805a938 ("flow_dissector: Check skb for VLAN only if skb specified.")
      Fixes: d5709f7a ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NEric Garver <e@erig.me>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc72f3dd
    • D
      net: ipv6: Do not consider link state for nexthop validation · d5d32e4b
      David Ahern 提交于
      Similar to IPv4, do not consider link state when validating next hops.
      
      Currently, if the link is down default routes can fail to insert:
       $ ip -6 ro add vrf blue default via 2100:2::64 dev eth2
       RTNETLINK answers: No route to host
      
      With this patch the command succeeds.
      
      Fixes: 8c14586f ("net: ipv6: Use passed in table for nexthop lookups")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5d32e4b
    • D
      net: ipv6: Fix processing of RAs in presence of VRF · 830218c1
      David Ahern 提交于
      rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB
      table from the device index, but the corresponding rt6_get_route_info
      and rt6_get_dflt_router functions were not leading to the failure to
      process RA's:
      
          ICMPv6: RA: ndisc_router_discovery failed to add default route
      
      Fix the 'get' functions by using the table id associated with the
      device when applicable.
      
      Also, now that default routes can be added to tables other than the
      default table, rt6_purge_dflt_routers needs to be updated as well to
      look at all tables. To handle that efficiently, add a flag to the table
      denoting if it is has a default route via RA.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      830218c1
    • J
      genetlink: mark families as __ro_after_init · 56989f6d
      Johannes Berg 提交于
      Now genl_register_family() is the only thing (other than the
      users themselves, perhaps, but I didn't find any doing that)
      writing to the family struct.
      
      In all families that I found, genl_register_family() is only
      called from __init functions (some indirectly, in which case
      I've add __init annotations to clarifly things), so all can
      actually be marked __ro_after_init.
      
      This protects the data structure from accidental corruption.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56989f6d
    • J
      genetlink: use idr to track families · 2ae0f17d
      Johannes Berg 提交于
      Since generic netlink family IDs are small integers, allocated
      densely, IDR is an ideal match for lookups. Replace the existing
      hand-written hash-table with IDR for allocation and lookup.
      
      This lets the families only be written to once, during register,
      since the list_head can be removed and removal of a family won't
      cause any writes.
      
      It also slightly reduces the code size (by about 1.3k on x86-64).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ae0f17d
    • J
      genetlink: statically initialize families · 489111e5
      Johannes Berg 提交于
      Instead of providing macros/inline functions to initialize
      the families, make all users initialize them statically and
      get rid of the macros.
      
      This reduces the kernel code size by about 1.6k on x86-64
      (with allyesconfig).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      489111e5
    • J
      genetlink: no longer support using static family IDs · a07ea4d9
      Johannes Berg 提交于
      Static family IDs have never really been used, the only
      use case was the workaround I introduced for those users
      that assumed their family ID was also their multicast
      group ID.
      
      Additionally, because static family IDs would never be
      reserved by the generic netlink code, using a relatively
      low ID would only work for built-in families that can be
      registered immediately after generic netlink is started,
      which is basically only the control family (apart from
      the workaround code, which I also had to add code for so
      it would reserve those IDs)
      
      Thus, anything other than GENL_ID_GENERATE is flawed and
      luckily not used except in the cases I mentioned. Move
      those workarounds into a few lines of code, and then get
      rid of GENL_ID_GENERATE entirely, making it more robust.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a07ea4d9
    • J
      genetlink: introduce and use genl_family_attrbuf() · c90c39da
      Johannes Berg 提交于
      This helper function allows family implementations to access
      their family's attrbuf. This gets rid of the attrbuf usage
      in families, and also adds locking validation, since it's not
      valid to use the attrbuf with parallel_ops or outside of the
      dumpit callback.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c90c39da
    • A
      skbedit: allow the user to specify bitmask for mark · 4fe77d82
      Antonio Quartulli 提交于
      The user may want to use only some bits of the skb mark in
      his skbedit rules because the remaining part might be used by
      something else.
      
      Introduce the "mask" parameter to the skbedit actor in order
      to implement such functionality.
      
      When the mask is specified, only those bits selected by the
      latter are altered really changed by the actor, while the
      rest is left untouched.
      Signed-off-by: NAntonio Quartulli <antonio@open-mesh.com>
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fe77d82
  6. 27 10月, 2016 8 次提交