1. 16 12月, 2015 4 次提交
  2. 15 12月, 2015 1 次提交
  3. 12 12月, 2015 4 次提交
  4. 10 12月, 2015 1 次提交
    • T
      cgroup: fix sock_cgroup_data initialization on earlier compilers · ad2c8c73
      Tejun Heo 提交于
      sock_cgroup_data is a struct containing an anonymous union.
      sock_cgroup_set_prioidx() and sock_cgroup_set_classid() were
      initializing a field inside the anonymous union as follows.
      
       struct sock_ccgroup_data skcd_buf = { .val = VAL };
      
      While this is fine on more recent compilers, gcc-4.4.7 triggers the
      following errors.
      
       include/linux/cgroup-defs.h: In function ‘sock_cgroup_set_prioidx’:
       include/linux/cgroup-defs.h:619: error: unknown field ‘val’ specified in initializer
       include/linux/cgroup-defs.h:619: warning: missing braces around initializer
       include/linux/cgroup-defs.h:619: warning: (near initialization for ‘skcd_buf.<anonymous>’)
      
      This is because .val belongs to the anonymous union nested inside the
      struct but the initializer is missing the nesting.  Fix it by adding
      an extra pair of braces.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NAlaa Hleihel <alaa@dev.mellanox.co.il>
      Fixes: bd1060a1 ("sock, cgroup: add sock->sk_cgroup")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad2c8c73
  5. 09 12月, 2015 3 次提交
    • T
      sock, cgroup: add sock->sk_cgroup · bd1060a1
      Tejun Heo 提交于
      In cgroup v1, dealing with cgroup membership was difficult because the
      number of membership associations was unbound.  As a result, cgroup v1
      grew several controllers whose primary purpose is either tagging
      membership or pull in configuration knobs from other subsystems so
      that cgroup membership test can be avoided.
      
      net_cls and net_prio controllers are examples of the latter.  They
      allow configuring network-specific attributes from cgroup side so that
      network subsystem can avoid testing cgroup membership; unfortunately,
      these are not only cumbersome but also problematic.
      
      Both net_cls and net_prio aren't properly hierarchical.  Both inherit
      configuration from the parent on creation but there's no interaction
      afterwards.  An ancestor doesn't restrict the behavior in its subtree
      in anyway and configuration changes aren't propagated downwards.
      Especially when combined with cgroup delegation, this is problematic
      because delegatees can mess up whatever network configuration
      implemented at the system level.  net_prio would allow the delegatees
      to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
      the same for classid.
      
      While it is possible to solve these issues from controller side by
      implementing hierarchical allowable ranges in both controllers, it
      would involve quite a bit of complexity in the controllers and further
      obfuscate network configuration as it becomes even more difficult to
      tell what's actually being configured looking from the network side.
      While not much can be done for v1 at this point, as membership
      handling is sane on cgroup v2, it'd be better to make cgroup matching
      behave like other network matches and classifiers than introducing
      further complications.
      
      In preparation, this patch updates sock->sk_cgrp_data handling so that
      it points to the v2 cgroup that sock was created in until either
      net_prio or net_cls is used.  Once either of the two is used,
      sock->sk_cgrp_data reverts to its previous role of carrying prioidx
      and classid.  This is to avoid adding yet another cgroup related field
      to struct sock.
      
      As the mode switching can happen at most once per boot, the switching
      mechanism is aimed at lowering hot path overhead.  It may leak a
      finite, likely small, number of cgroup refs and report spurious
      prioidx or classid on switching; however, dynamic updates of prioidx
      and classid have always been racy and lossy - socks between creation
      and fd installation are never updated, config changes don't update
      existing sockets at all, and prioidx may index with dead and recycled
      cgroup IDs.  Non-critical inaccuracies from small race windows won't
      make any noticeable difference.
      
      This patch doesn't make use of the pointer yet.  The following patch
      will implement netfilter match for cgroup2 membership.
      
      v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
          cgroup specific field.
      
      v3: Add comments explaining why sock_data_prioidx() and
          sock_data_classid() use different fallback values.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd1060a1
    • T
      net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct · 2a56a1fe
      Tejun Heo 提交于
      Introduce sock->sk_cgrp_data which is a struct sock_cgroup_data.
      ->sk_cgroup_prioidx and ->sk_classid are moved into it.  The struct
      and its accessors are defined in cgroup-defs.h.  This is to prepare
      for overloading the fields with a cgroup pointer.
      
      This patch mostly performs equivalent conversions but the followings
      are noteworthy.
      
      * Equality test before updating classid is removed from
        sock_update_classid().  This shouldn't make any noticeable
        difference and a similar test will be implemented on the helper side
        later.
      
      * sock_update_netprioidx() now takes struct sock_cgroup_data and can
        be moved to netprio_cgroup.h without causing include dependency
        loop.  Moved.
      
      * The dummy version of sock_update_netprioidx() converted to a static
        inline function while at it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a56a1fe
    • S
      Revert "Merge branch 'vsock-virtio'" · 8ac2837c
      Stefan Hajnoczi 提交于
      This reverts commit 0d76d6e8 and merge
      commit c402293b, reversing changes made
      to c89359a4.
      
      The virtio-vsock device specification is not finalized yet.  Michael
      Tsirkin voiced concerned about merging this code when the hardware
      interface (and possibly the userspace interface) could still change.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ac2837c
  6. 07 12月, 2015 2 次提交
    • R
      core: enable more fine-grained datagram reception control · ea3793ee
      Rainer Weikusat 提交于
      The __skb_recv_datagram routine in core/ datagram.c provides a general
      skb reception factility supposed to be utilized by protocol modules
      providing datagram sockets. It encompasses both the actual recvmsg code
      and a surrounding 'sleep until data is available' loop. This is
      inconvenient if a protocol module has to use additional locking in order
      to maintain some per-socket state the generic datagram socket code is
      unaware of (as the af_unix code does). The patch below moves the recvmsg
      proper code into a new __skb_try_recv_datagram routine which doesn't
      sleep and renames wait_for_more_packets to
      __skb_wait_for_more_packets, both routines being exported interfaces. The
      original __skb_recv_datagram routine is reimplemented on top of these
      two functions such that its user-visible behaviour remains unchanged.
      Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea3793ee
    • M
      net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode · 5f61385d
      Moni Shoua 提交于
      Due to HW limitations, indexes to MAC and VLAN tables are always taken
      from the table of the actual port. So, if a resource holds an index to
      a table, it may refer to different values during the lifetime of the
      resource,  unless the tables are mirrored. Also, even when
      driver is not in HA mode the policy of allocating an index to these
      tables is such to make sure, as much as possible, that when the time
      comes the mirroring will be successful. This means that in multifunction
      mode the allocation of a free index in a port's table tries to make sure
      that the same index in the other's port table is also free.
      Signed-off-by: NMoni Shoua <monis@mellanox.com>
      Reviewed-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f61385d
  7. 06 12月, 2015 2 次提交
  8. 04 12月, 2015 23 次提交