1. 10 Dec 2015 (18 commits)
  2. 09 Dec 2015 (5 commits)
    • sock, cgroup: add sock->sk_cgroup · bd1060a1
      Authored by Tejun Heo
      In cgroup v1, dealing with cgroup membership was difficult because the
      number of membership associations was unbounded.  As a result, cgroup v1
      grew several controllers whose primary purpose is either tagging
      membership or pulling in configuration knobs from other subsystems so
      that cgroup membership tests can be avoided.
      
      The net_cls and net_prio controllers are examples of the latter.  They
      allow configuring network-specific attributes from the cgroup side so
      that the network subsystem can avoid testing cgroup membership;
      unfortunately, they are not only cumbersome but also problematic.
      
      Neither net_cls nor net_prio is properly hierarchical.  Both inherit
      configuration from the parent on creation, but there is no interaction
      afterwards.  An ancestor doesn't restrict the behavior in its subtree
      in any way and configuration changes aren't propagated downwards.
      Especially when combined with cgroup delegation, this is problematic
      because delegatees can mess up whatever network configuration is
      implemented at the system level.  net_prio lets delegatees set any
      priority value regardless of CAP_NET_ADMIN, and net_cls does the same
      for classid.
      
      While it is possible to solve these issues from the controller side by
      implementing hierarchical allowable ranges in both controllers, it
      would involve quite a bit of complexity in the controllers and further
      obfuscate network configuration, as it would become even harder to
      tell what's actually being configured when looking from the network
      side.  While not much can be done for v1 at this point, as membership
      handling is sane in cgroup v2, it'd be better to make cgroup matching
      behave like other network matches and classifiers rather than
      introducing further complications.
      
      In preparation, this patch updates sock->sk_cgrp_data handling so that
      it points to the v2 cgroup the sock was created in until either
      net_prio or net_cls is used.  Once either of the two is used,
      sock->sk_cgrp_data reverts to its previous role of carrying prioidx
      and classid.  This avoids adding yet another cgroup-related field to
      struct sock; a sketch of the dual-mode field follows this entry.
      
      As the mode switching can happen at most once per boot, the switching
      mechanism is aimed at lowering hot path overhead.  It may leak a
      finite, likely small, number of cgroup refs and report spurious
      prioidx or classid on switching; however, dynamic updates of prioidx
      and classid have always been racy and lossy - socks between creation
      and fd installation are never updated, config changes don't update
      existing sockets at all, and prioidx may index with dead and recycled
      cgroup IDs.  Non-critical inaccuracies from small race windows won't
      make any noticeable difference.
      
      This patch doesn't make use of the pointer yet.  The following patch
      will implement netfilter match for cgroup2 membership.
      
      v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
          cgroup specific field.
      
      v3: Add comments explaining why sock_data_prioidx() and
          sock_data_classid() use different fallback values.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd1060a1
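      A hedged sketch of the dual-mode field described above; the layout,
      names and helpers are illustrative stand-ins, not the exact kernel
      definitions.  The idea: a cgroup pointer is always at least
      word-aligned (its low bit is 0), so an odd tag in the same word can
      mark "prioidx/classid mode" and the hot path needs only one load and
      one test.

      #include <linux/types.h>
      #include <linux/compiler.h>     /* READ_ONCE()/WRITE_ONCE() */

      struct cgroup;                  /* opaque for this sketch */

      /* Illustrative layout, little-endian only. */
      struct skcd_sketch {
              union {
                      u64 val;        /* whole word, one atomic load */
                      struct {        /* valid once is_data is set   */
                              u8  is_data;    /* odd => prioidx/classid mode */
                              u8  pad;
                              u16 prioidx;
                              u32 classid;
                      };
              };
      };

      /* Default mode: val simply holds the v2 cgroup pointer (always even).
       * The real code falls back to the default root cgroup rather than
       * returning NULL. */
      static inline struct cgroup *skcd_sketch_cgroup(struct skcd_sketch *skcd)
      {
              unsigned long v = READ_ONCE(skcd->val);

              return (v & 1) ? NULL : (struct cgroup *)v;
      }

      /* One-way switch: the first net_prio/net_cls update flips the mode;
       * the cgroup reference held in the pointer is simply abandoned, which
       * is the finite leak the commit message accepts. */
      static inline void skcd_sketch_set_prioidx(struct skcd_sketch *skcd, u16 idx)
      {
              struct skcd_sketch buf = { .val = READ_ONCE(skcd->val) };

              if (!(buf.is_data & 1)) {       /* still a cgroup pointer */
                      buf.val = 0;
                      buf.is_data = 1;
              }
              buf.prioidx = idx;
              WRITE_ONCE(skcd->val, buf.val);
      }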
    • net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct · 2a56a1fe
      Authored by Tejun Heo
      Introduce sock->sk_cgrp_data which is a struct sock_cgroup_data.
      ->sk_cgroup_prioidx and ->sk_classid are moved into it.  The struct
      and its accessors are defined in cgroup-defs.h.  This is to prepare
      for overloading the fields with a cgroup pointer.
      
      This patch mostly performs equivalent conversions, but the following
      points are noteworthy.
      
      * The equality test before updating classid is removed from
        sock_update_classid().  This shouldn't make any noticeable
        difference, and a similar test will be implemented on the helper
        side later.
      
      * sock_update_netprioidx() now takes struct sock_cgroup_data and can
        be moved to netprio_cgroup.h without causing an include dependency
        loop.  Moved.
      
      * The dummy version of sock_update_netprioidx() was converted to a
        static inline function while at it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a56a1fe
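      A minimal sketch of the wrapping itself, before any pointer
      overloading happens.  Struct and helper names are simplified here; in
      the tree the struct and its accessors live in cgroup-defs.h and the
      dummy helper in netprio_cgroup.h.

      #include <linux/types.h>

      /* The two formerly separate sock fields, grouped into one struct. */
      struct sock_cgroup_data_sketch {
              u16 prioidx;            /* was sock->sk_cgrp_prioidx */
              u32 classid;            /* was sock->sk_classid      */
      };

      static inline u16 skcd_sketch_prioidx(const struct sock_cgroup_data_sketch *skcd)
      {
              return skcd->prioidx;
      }

      static inline u32 skcd_sketch_classid(const struct sock_cgroup_data_sketch *skcd)
      {
              return skcd->classid;
      }

      /* The dummy update helper becomes a static inline when netprio is not
       * built in, mirroring the last bullet above. */
      #ifndef CONFIG_CGROUP_NET_PRIO
      static inline void sock_update_netprioidx_sketch(struct sock_cgroup_data_sketch *skcd)
      {
      }
      #endif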
    • netprio_cgroup: limit the maximum css->id to USHRT_MAX · 297dbde1
      Authored by Tejun Heo
      netprio builds a per-netdev contiguous priomap array which is indexed
      by css->id.  The array is allocated using kzalloc(), effectively
      limiting the maximum supported ID to somewhere in the low thousands.
      This patch caps the maximum supported css->id at USHRT_MAX, which
      should be way above what is actually usable.
      
      This allows reducing sock->sk_cgrp_prioidx from u32 to u16.  The
      freed-up space will be used to overload the cgroup-related fields.
      sock->sk_cgrp_prioidx's position is swapped with sk_mark so that the
      two cgroup-related fields are adjacent.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      297dbde1
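      A hedged sketch of what the cap amounts to; where exactly the check
      sits in netprio_cgroup.c and the exact error code are assumptions
      here, not taken from the patch.

      #include <linux/kernel.h>       /* USHRT_MAX */
      #include <linux/errno.h>
      #include <linux/cgroup.h>

      static int netprio_css_online_sketch(struct cgroup_subsys_state *css)
      {
              /*
               * priomap entries are indexed by css->id and that index now
               * has to fit in a u16 prioidx, so IDs above USHRT_MAX are
               * rejected outright instead of being silently truncated.
               */
              if (css->id > USHRT_MAX)
                      return -ENOSPC;

              return 0;
      }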
    • Revert "Merge branch 'vsock-virtio'" · 8ac2837c
      Authored by Stefan Hajnoczi
      This reverts commit 0d76d6e8 and merge
      commit c402293b, reversing changes made
      to c89359a4.
      
      The virtio-vsock device specification is not finalized yet.  Michael
      Tsirkin voiced concerns about merging this code while the hardware
      interface (and possibly the userspace interface) could still change.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ac2837c
    • net: Fix inverted test in __skb_recv_datagram · 760a4322
      Authored by Rainer Weikusat
      As the kernel generally uses negated error numbers, *err needs to be
      compared with -EAGAIN (d'oh).
      Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Fixes: ea3793ee ("core: enable more fine-grained datagram reception control")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      760a4322
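      A hedged before/after illustration of the one-line fix, with the
      surrounding __skb_recv_datagram() loop trimmed away.
      __skb_try_recv_datagram() stores a negated errno in *err, so only
      -EAGAIN means "no data yet, keep waiting".

      /* before: *err holds -EAGAIN when there is simply no data, which
       * differs from EAGAIN, so the wait loop bailed out immediately */
      if (*err != EAGAIN)
              break;

      /* after: compare against the negated errno that is actually stored */
      if (*err != -EAGAIN)
              break;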
  3. 08 Dec 2015 (4 commits)
  4. 07 Dec 2015 (3 commits)
    • mac80211: handle HW ROC expired properly · 1b894521
      Authored by Ilan Peer
      In the HW ROC case, when the driver reports that the ROC expired, it
      is not sufficient to purge the ROCs based on the remaining time, as it
      is possible that the device finished the ROC session before the
      requested duration had elapsed.

      To handle such cases, on a ROC-expired notification from the driver,
      complete all the ROCs which are marked with hw_begun, regardless of
      the remaining duration.
      Signed-off-by: Ilan Peer <ilan.peer@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      1b894521
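      A hedged sketch of that handling; the structure and helper names below
      (roc_item, roc_complete_and_free, roc_list) are hypothetical stand-ins
      for the mac80211 internals, shown only to illustrate the "complete
      everything that has hw_begun set" rule.

      #include <linux/list.h>
      #include <linux/types.h>

      struct roc_item {                       /* hypothetical */
              struct list_head list;
              bool hw_begun;                  /* hardware actually started it */
      };

      void roc_complete_and_free(struct list_head *roc_list,
                                 struct roc_item *roc);  /* hypothetical */

      static void hw_roc_expired_sketch(struct list_head *roc_list)
      {
              struct roc_item *roc, *tmp;

              /* Finish every item the hardware started, even if its
               * requested duration has not elapsed yet; items that never
               * began stay queued. */
              list_for_each_entry_safe(roc, tmp, roc_list, list) {
                      if (!roc->hw_begun)
                              continue;
                      roc_complete_and_free(roc_list, roc);
              }
      }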
    • af_unix: fix unix_dgram_recvmsg entry locking · 64874280
      Authored by Rainer Weikusat
      The current unix_dgram_recvmsg code acquires the u->readlock mutex in
      order to protect access to the peek offset prior to calling
      __skb_recv_datagram to actually receive data. This implies that a
      blocking reader will go to sleep with this mutex held if there's
      presently no data to return to userspace. Two undesirable side effects
      of this are that a later non-blocking read call on the same socket will
      block on the ->readlock mutex until the earlier blocking call releases
      it (or the reader is interrupted) and that later blocking read calls
      will wait longer than the effective socket read timeout says they
      should: the timeout only starts 'ticking' once such a reader hits the
      schedule_timeout in wait_for_more_packets (core.c), while the time it
      already had to wait until it could acquire the mutex is unaccounted for.
      
      The patch avoids both by using the __skb_try_recv_datagram and
      __skb_wait_for_more_packets functions created by the first patch to
      implement a unix_dgram_recvmsg read loop which releases the readlock
      mutex prior to going to sleep and reacquires it as needed afterwards
      (a simplified sketch of the loop follows this entry). Non-blocking
      readers will thus immediately return with -EAGAIN if there's no data
      available regardless of any concurrent blocking readers, and all
      blocking readers will end up sleeping via schedule_timeout, thus
      honouring the configured socket receive timeout.
      Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      64874280
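      A hedged, simplified sketch of the reworked loop; error handling and
      the copy-to-user path are trimmed, and sk, u (the unix_sock) and flags
      come from the surrounding unix_dgram_recvmsg().  The point is that
      u->readlock is only held around the non-blocking dequeue attempt,
      never across the sleep.

      struct sk_buff *skb, *last;
      long timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
      int err, peeked, skip;

      do {
              mutex_lock(&u->readlock);
              skip = sk_peek_offset(sk, flags);
              skb = __skb_try_recv_datagram(sk, flags, &peeked, &skip,
                                            &err, &last);
              if (skb)
                      break;          /* got data; mutex still held for the
                                         peek-offset update that follows */

              mutex_unlock(&u->readlock);

              if (err != -EAGAIN)
                      break;          /* real error; we are not holding the lock */
      } while (timeo &&
               !__skb_wait_for_more_packets(sk, &err, &timeo, last));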
    • core: enable more fine-grained datagram reception control · ea3793ee
      Authored by Rainer Weikusat
      The __skb_recv_datagram routine in core/datagram.c provides a general
      skb reception facility intended to be used by protocol modules
      providing datagram sockets. It encompasses both the actual recvmsg
      code and a surrounding 'sleep until data is available' loop. This is
      inconvenient if a protocol module has to use additional locking in
      order to maintain some per-socket state the generic datagram socket
      code is unaware of (as the af_unix code does). The patch below moves
      the recvmsg code proper into a new __skb_try_recv_datagram routine
      which doesn't sleep, and renames wait_for_more_packets to
      __skb_wait_for_more_packets, both routines being exported interfaces.
      The original __skb_recv_datagram routine is reimplemented on top of
      these two functions such that its user-visible behaviour remains
      unchanged.
      Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea3793ee
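      A hedged sketch of the resulting split; prototypes and the wrapper are
      simplified, and the -EAGAIN comparison shown here already includes the
      follow-up fix listed earlier on this page.

      #include <linux/skbuff.h>
      #include <net/sock.h>

      /* The two newly exported building blocks (simplified prototypes). */
      struct sk_buff *__skb_try_recv_datagram(struct sock *sk, unsigned int flags,
                                              int *peeked, int *off, int *err,
                                              struct sk_buff **last);
      int __skb_wait_for_more_packets(struct sock *sk, int *err, long *timeo_p,
                                      const struct sk_buff *skb);

      /* The original entry point, rebuilt on top of them so that its
       * user-visible behaviour stays the same. */
      struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
                                          int *peeked, int *off, int *err)
      {
              struct sk_buff *skb, *last;
              long timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);

              do {
                      skb = __skb_try_recv_datagram(sk, flags, peeked, off,
                                                    err, &last);
                      if (skb)
                              return skb;

                      if (*err != -EAGAIN)
                              break;
              } while (timeo &&
                       !__skb_wait_for_more_packets(sk, err, &timeo, last));

              return NULL;
      }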
  5. 06 Dec 2015 (3 commits)
  6. 04 Dec 2015 (7 commits)
    • mac80211: reject zero cookie in mgmt-tx/roc cancel · 7d37fcd4
      Authored by Johannes Berg
      When cancelling, you can cancel "any" (first in list) mgmt-tx
      or remain-on-channel operation by using the value 0 for the
      cookie along with the *opposite* operation, i.e.
       * cancel the first mgmt-tx by cancelling roc with 0 cookie
       * cancel the first roc by cancelling mgmt-tx with 0 cookie
      
      This isn't really that bad since userspace should only pass cookies
      that we gave it, but it could lead to hard-to-debug issues, so better
      to prevent it and reject zero values, since we never hand those out.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      7d37fcd4
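      A hedged sketch of the guard; the function name is a hypothetical
      stand-in and the exact placement and return value in the two cancel
      paths may differ.

      #include <linux/errno.h>
      #include <linux/types.h>

      static int cancel_by_cookie_sketch(u64 cookie)
      {
              /* Zero is never handed out as a cookie, so refuse it up front
               * instead of letting it match the first queued item. */
              if (!cookie)
                      return -ENOENT;

              /* ... look up and cancel the matching item ... */
              return 0;
      }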
    • mac80211: Allow a STA to join an IBSS with 80+80 MHz channel · c39b336d
      Authored by Jouni Malinen
      While it was possible to create an IBSS with an 80+80 MHz channel,
      joining such an IBSS resulted in falling back to a 20 MHz channel with
      VHT disabled, due to a missing switch case for 80+80 MHz.
      Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      c39b336d
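      A hedged sketch of the kind of switch involved; the surrounding join
      logic (including adjusting the centre frequencies on fallback) is
      omitted, and only the enum values are taken from nl80211.

      switch (chandef->width) {
      case NL80211_CHAN_WIDTH_40:
      case NL80211_CHAN_WIDTH_80:
      case NL80211_CHAN_WIDTH_80P80:  /* the previously missing case */
      case NL80211_CHAN_WIDTH_160:
              /* keep the wide (VHT) channel definition as requested */
              break;
      default:
              /* anything unhandled falls back to 20 MHz with VHT disabled */
              chandef->width = NL80211_CHAN_WIDTH_20;
              break;
      }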
    • cfg80211: reg: Refactor calculation of bandwidth flags · 1aeb135f
      Authored by Michal Sojka
      The same piece of code appears in two places. Make a function from it.
      Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      1aeb135f
    • mac80211: rewrite remain-on-channel logic · aaa016cc
      Authored by Johannes Berg
      Jouni found a bug in the remain-on-channel logic: when a short item
      is queued, a long item is combined with it (extending the original
      one), and the long item is then deleted, the timeout doesn't go back
      to the short one, and the short item ends up taking a long time. In
      this case, this showed up as scanning being blocked when running two
      test cases back to back: the scan from the second one was delayed
      even though all the remain-on-channel items should long have been
      gone.
      
      Fixing this with the current data structures turns out to be a bit
      complicated: for now, we just remove the long item from the dependents
      list and don't recalculate the timeouts.
      
      There's a somewhat similar bug where we delete the short item and
      all the dependents go with it; to fix this we'd have to move them
      from the dependents to the real list.
      
      Instead of trying to do that, rewrite the code to not have all this
      complexity in the data structures: use a single list and allow more
      than one entry in it to be marked as started. This makes the code a
      bit more complex, since the worker needs to understand that it might
      need to remove just one of the started items while keeping the device
      off-channel, but that's no more complicated than the nested data
      structures were.
      
      This then fixes both issues described, and makes it easier to also
      limit the overall off-channel time when combining.
      
      TODO: as before, with hardware remain-on-channel, deleting an item
      after combining results in cancelling them all - we can keep track
      of the time elapsed and only cancel after that to fix this.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      aaa016cc
    • mac80211: simplify ack_skb handling · 5ee00dbd
      Authored by Johannes Berg
      Since the cookie is assigned inside ieee80211_make_ack_skb()
      now, we no longer need to return the ack_skb as the cookie
      and can simplify the function's return and the callers. Also
      rename it to ieee80211_attach_ack_skb() to more accurately
      reflect its purpose.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      5ee00dbd
    • mac80211: move off-channel/mgmt-tx code to offchannel.c · a2fcfccb
      Authored by Johannes Berg
      This is quite a bit of code that logically belongs here, since it has
      to deal with all the remain-on-channel logic.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      a2fcfccb
    • mac80211: fix mgmt-tx abort cookie and leak · e673a659
      Authored by Johannes Berg
      If a mgmt-tx operation is aborted before it runs, the wrong cookie is
      reported back to userspace, and the ack_skb gets leaked because the
      frame is freed directly instead of via ieee80211_free_txskb(). Fix
      that.
      
      Fixes: 3b79af97 ("mac80211: stop using pointers as userspace cookies")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      e673a659
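      A hedged illustration of the leak half of the fix, context trimmed:
      frames carrying an ack_skb must be released through mac80211's TX-skb
      helper so the status path can detach and free the ack_skb.

      /* before: drops the frame but leaks the attached ack_skb */
      kfree_skb(skb);

      /* after: the TX status path gets to clean up the ack_skb as well */
      ieee80211_free_txskb(&local->hw, skb);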