1. 02 4月, 2012 1 次提交
    • T
      cgroup: relocate cftype and cgroup_subsys definitions in controllers · 676f7c8f
      Tejun Heo 提交于
      blk-cgroup, netprio_cgroup, cls_cgroup and tcp_memcontrol
      unnecessarily define cftype array and cgroup_subsys structures at the
      top of the file, which is unconventional and necessiates forward
      declaration of methods.
      
      This patch relocates those below the definitions of the methods and
      removes the forward declarations.  Note that forward declaration of
      tcp_files[] is added in tcp_memcontrol.c for tcp_init_cgroup().  This
      will be removed soon by another patch.
      
      This patch doesn't introduce any functional change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
      676f7c8f
  2. 29 3月, 2012 1 次提交
  3. 26 3月, 2012 1 次提交
  4. 22 3月, 2012 1 次提交
  5. 20 3月, 2012 1 次提交
  6. 12 3月, 2012 1 次提交
  7. 07 3月, 2012 1 次提交
  8. 06 3月, 2012 1 次提交
  9. 05 3月, 2012 1 次提交
  10. 01 3月, 2012 1 次提交
  11. 27 2月, 2012 1 次提交
  12. 25 2月, 2012 1 次提交
  13. 24 2月, 2012 4 次提交
  14. 22 2月, 2012 5 次提交
    • G
      rtnetlink: Fix problem with buffer allocation · 115c9b81
      Greg Rose 提交于
      Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
      is a 32 bit value that can be used to indicate to the kernel that
      certain extended ifinfo values are requested by the user application.
      At this time the only mask value defined is RTEXT_FILTER_VF to
      indicate that the user wants the ifinfo dump to send information
      about the VFs belonging to the interface.
      
      This patch fixes a bug in which certain applications do not have
      large enough buffers to accommodate the extra information returned
      by the kernel with large numbers of SR-IOV virtual functions.
      Those applications will not send the new netlink attribute with
      the interface info dump request netlink messages so they will
      not get unexpectedly large request buffers returned by the kernel.
      
      Modifies the rtnl_calcit function to traverse the list of net
      devices and compute the minimum buffer size that can hold the
      info dumps of all matching devices based upon the filter passed
      in via the new netlink attribute filter mask.  If no filter
      mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.
      
      With this change it is possible to add yet to be defined netlink
      attributes to the dump request which should make it fairly extensible
      in the future.
      Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      115c9b81
    • M
      neighbour: Fixed race condition at tbl->nht · 84338a6c
      Michel Machado 提交于
      When the fixed race condition happens:
      
      1. While function neigh_periodic_work scans the neighbor hash table
      pointed by field tbl->nht, it unlocks and locks tbl->lock between
      buckets in order to call cond_resched.
      
      2. Assume that function neigh_periodic_work calls cond_resched, that is,
      the lock tbl->lock is available, and function neigh_hash_grow runs.
      
      3. Once function neigh_hash_grow finishes, and RCU calls
      neigh_hash_free_rcu, the original struct neigh_hash_table that function
      neigh_periodic_work was using doesn't exist anymore.
      
      4. Once back at neigh_periodic_work, whenever the old struct
      neigh_hash_table is accessed, things can go badly.
      Signed-off-by: NMichel Machado <michel@digirati.com.br>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84338a6c
    • P
      sock: Introduce the SO_PEEK_OFF sock option · ef64a54f
      Pavel Emelyanov 提交于
      This one specifies where to start MSG_PEEK-ing queue data from. When
      set to negative value means that MSG_PEEK works as ususally -- peeks
      from the head of the queue always.
      
      When some bytes are peeked from queue and the peeking offset is non
      negative it is moved forward so that the next peek will return next
      portion of data.
      
      When non-peeking recvmsg occurs and the peeking offset is non negative
      is is moved backward so that the next peek will still peek the proper
      data (i.e. the one that would have been picked if there were no non
      peeking recv in between).
      
      The offset is set using per-proto opteration to let the protocol handle
      the locking issues and to check whether the peeking offset feature is
      supported by the protocol the socket belongs to.
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef64a54f
    • P
      datagram: Add offset argument to __skb_recv_datagram · 3f518bf7
      Pavel Emelyanov 提交于
      This one is only considered for MSG_PEEK flag and the value pointed by
      it specifies where to start peeking bytes from. If the offset happens to
      point into the middle of the returned skb, the offset within this skb is
      put back to this very argument.
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f518bf7
    • P
      datagram: Factor out sk queue referencing · 4934b032
      Pavel Emelyanov 提交于
      This makes lines shorter and simplifies further patching.
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4934b032
  15. 15 2月, 2012 1 次提交
  16. 14 2月, 2012 1 次提交
  17. 11 2月, 2012 3 次提交
  18. 09 2月, 2012 2 次提交
    • E
      gro: more generic L2 header check · 5ca3b72c
      Eric Dumazet 提交于
      Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
      only, and failed on IB/ipoib traffic.
      
      He provided a patch faking a zeroed header to let GRO aggregates frames.
      
      Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
      check to be more generic, ie not assuming L2 header is 14 bytes, but
      taking into account hard_header_len.
      
      __napi_gro_receive() has special handling for the common case (Ethernet)
      to avoid a memcmp() call and use an inline optimized function instead.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Reported-by: NShlomo Pongratz <shlomop@mellanox.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Tested-by: NSean Hefty <sean.hefty@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ca3b72c
    • E
      gro: more generic L2 header check · 43480aec
      Eric Dumazet 提交于
      Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
      only, and failed on IB/ipoib traffic.
      
      He provided a patch faking a zeroed header to let GRO aggregates frames.
      
      Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
      check to be more generic, ie not assuming L2 header is 14 bytes, but
      taking into account hard_header_len.
      
      __napi_gro_receive() has special handling for the common case (Ethernet)
      to avoid a memcmp() call and use an inline optimized function instead.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Reported-by: NShlomo Pongratz <shlomop@mellanox.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Tested-by: NSean Hefty <sean.hefty@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43480aec
  19. 05 2月, 2012 1 次提交
  20. 03 2月, 2012 1 次提交
    • L
      cgroup: remove cgroup_subsys argument from callbacks · 761b3ef5
      Li Zefan 提交于
      The argument is not used at all, and it's not necessary, because
      a specific callback handler of course knows which subsys it
      belongs to.
      
      Now only ->pupulate() takes this argument, because the handlers of
      this callback always call cgroup_add_file()/cgroup_add_files().
      
      So we reduce a few lines of code, though the shrinking of object size
      is minimal.
      
       16 files changed, 113 insertions(+), 162 deletions(-)
      
         text    data     bss     dec     hex filename
      5486240  656987 7039960 13183187         c928d3 vmlinux.o.orig
      5486170  656987 7039960 13183117         c9288d vmlinux.o
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      761b3ef5
  21. 02 2月, 2012 4 次提交
  22. 31 1月, 2012 1 次提交
    • T
      net: Allow ipv6 proxies and arp proxies be shown with iproute2 · 84920c14
      Tony Zelenoff 提交于
      Add ability to return neighbour proxies list to caller if
      it sent full ndmsg structure and has NTF_PROXY flag set.
      
      Before this patch (and before iproute2 patches):
      $ ip neigh add proxy 2001::1 dev eth0
      $ ip -6 neigh show
      $
      
      After it and with applied iproute2 patches:
      $ ip neigh add proxy 2001::1 dev eth0
      $ ip -6 neigh show
      2001::1 dev eth0  proxy
      $
      
      Compatibility with old versions of iproute2 is not broken,
      kernel checks for incoming structure size and properly
      works if old structure is came.
      
      [v2]
      * changed comments style.
      * removed useless line with continue and curly bracket.
      * changed incoming message size check from equal to more or
        equal.
      
      CC: davem@davemloft.net
      CC: kuznet@ms2.inr.ac.ru
      CC: netdev@vger.kernel.org
      CC: xemul@parallels.com
      Signed-off-by: NTony Zelenoff <antonz@parallels.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84920c14
  23. 27 1月, 2012 2 次提交
    • S
      net: RTNETLINK adjusting values of min_ifinfo_dump_size · f18da145
      Stefan Gula 提交于
      Setting link parameters on a netdevice changes the value
      of if_nlmsg_size(), therefore it is necessary to recalculate
      min_ifinfo_dump_size.
      Signed-off-by: NStefan Gula <steweg@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f18da145
    • E
      netns: fix net_alloc_generic() · 073862ba
      Eric Dumazet 提交于
      When a new net namespace is created, we should attach to it a "struct
      net_generic" with enough slots (even empty), or we can hit the following
      BUG_ON() :
      
      [  200.752016] kernel BUG at include/net/netns/generic.h:40!
      ...
      [  200.752016]  [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180
      [  200.752016]  [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20
      [  200.752016]  [<ffffffff825c41be>] caif_device_notify+0x2e/0x530
      [  200.752016]  [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110
      [  200.752016]  [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20
      [  200.752016]  [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60
      [  200.752016]  [<ffffffff821c2b26>] register_netdevice+0x196/0x300
      [  200.752016]  [<ffffffff821c2ca9>] register_netdev+0x19/0x30
      [  200.752016]  [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0
      [  200.752016]  [<ffffffff821b5e62>] ops_init+0x42/0x180
      [  200.752016]  [<ffffffff821b600b>] setup_net+0x6b/0x100
      [  200.752016]  [<ffffffff821b6466>] copy_net_ns+0x86/0x110
      [  200.752016]  [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190
      
      net_alloc_generic() should take into account the maximum index into the
      ptr array, as a subsystem might use net_generic() anytime.
      
      This also reduces number of reallocations in net_assign_generic()
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Tested-by: NSasha Levin <levinsasha928@gmail.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Sjur Brændeland <sjur.brandeland@stericsson.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      073862ba
  24. 25 1月, 2012 1 次提交
  25. 23 1月, 2012 2 次提交
    • M
      ethtool: allow ETHTOOL_GSSET_INFO for users · f80400a2
      Michał Mirosław 提交于
      Allow ETHTOOL_GSSET_INFO ethtool ioctl() for unprivileged users.
      ETHTOOL_GSTRINGS is already allowed, but is unusable without this one.
      Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
      Acked-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f80400a2
    • G
      net: introduce res_counter_charge_nofail() for socket allocations · 0e90b31f
      Glauber Costa 提交于
      There is a case in __sk_mem_schedule(), where an allocation
      is beyond the maximum, but yet we are allowed to proceed.
      It happens under the following condition:
      
      	sk->sk_wmem_queued + size >= sk->sk_sndbuf
      
      The network code won't revert the allocation in this case,
      meaning that at some point later it'll try to do it. Since
      this is never communicated to the underlying res_counter
      code, there is an inbalance in res_counter uncharge operation.
      
      I see two ways of fixing this:
      
      1) storing the information about those allocations somewhere
         in memcg, and then deducting from that first, before
         we start draining the res_counter,
      2) providing a slightly different allocation function for
         the res_counter, that matches the original behavior of
         the network code more closely.
      
      I decided to go for #2 here, believing it to be more elegant,
      since #1 would require us to do basically that, but in a more
      obscure way.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      CC: Tejun Heo <tj@kernel.org>
      CC: Li Zefan <lizf@cn.fujitsu.com>
      CC: Laurent Chavey <chavey@google.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e90b31f