1. 05 Jun, 2012 17 commits
  2. 03 Jun, 2012 1 commit
    • tty: Revert the tty locking series, it needs more work · f309532b
      Committed by Linus Torvalds
      This reverts the tty layer change to use per-tty locking, because it's
      not correct yet, and fixing it will require some more deep surgery.
      
      The main revert is d29f3ef3 ("tty_lock: Localise the lock"), but
      there are several smaller commits that built upon it, they also get
      reverted here. The list of reverted commits is:
      
        fde86d31 - tty: add lockdep annotations
        8f6576ad - tty: fix ldisc lock inversion trace
        d3ca8b64 - pty: Fix lock inversion
        b1d679af - tty: drop the pty lock during hangup
        abcefe5f - tty/amiserial: Add missing argument for tty_unlock()
        fd11b42e - cris: fix missing tty arg in wait_event_interruptible_tty call
        d29f3ef3 - tty_lock: Localise the lock
      
      The revert had a trivial conflict in the 68360serial.c staging driver
      that got removed in the meantime.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 02 Jun, 2012 2 commits
    • tcp: reflect SYN queue_mapping into SYNACK packets · fff32699
      Committed by Eric Dumazet
      While testing how Linux behaves under a SYNFLOOD attack on a multiqueue
      device (ixgbe), I found that SYNACK messages were dropped at the Qdisc
      level because we send them all on a single queue.

      The obvious choice is to reflect the incoming SYN packet's
      @queue_mapping onto the SYNACK packet.

      Under stress, my machine could only send 25,000 SYNACKs per second (for
      200,000 incoming SYNs per second). NIC: ixgbe with 16 rx/tx queues.
      
      After patch, not a single SYNACK is dropped.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
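The queue_mapping reflection described in the commit above can be modelled in user space. This is only a minimal sketch: `struct pkt`, `make_synack()` and `distinct_tx_queues()` are hypothetical names invented for illustration and are not the kernel's `struct sk_buff` or TCP code. It shows that when each SYNACK inherits the queue index of the SYN it answers, replies are spread over as many queues as the incoming traffic instead of piling up on one.

```c
#include <stdint.h>

/* Hypothetical stand-in for a network packet; only the field that matters
 * for this sketch is modelled. */
struct pkt {
    uint16_t queue_mapping; /* index of the queue the packet is tied to */
};

/* Build a SYNACK for an incoming SYN, copying the SYN's queue mapping so the
 * reply leaves on the queue that corresponds to the one the SYN arrived on. */
static struct pkt make_synack(const struct pkt *syn)
{
    struct pkt synack = { 0 };

    synack.queue_mapping = syn->queue_mapping;
    return synack;
}

/* Count how many distinct TX queues a batch of SYNACKs would use.
 * Assumes 1 <= nqueues <= 64 so a 64-bit mask can track the queues. */
static unsigned distinct_tx_queues(const struct pkt *syns, unsigned n,
                                   unsigned nqueues)
{
    uint64_t seen = 0;
    unsigned count = 0;

    for (unsigned i = 0; i < n; i++) {
        struct pkt synack = make_synack(&syns[i]);
        unsigned q = synack.queue_mapping % nqueues;

        if (!(seen & (UINT64_C(1) << q))) {
            seen |= UINT64_C(1) << q;
            count++;
        }
    }
    return count;
}
```

With SYNs arriving evenly on 16 queues, the model reports 16 distinct transmit queues, whereas a variant that left `queue_mapping` at zero would report one.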
    • tcp: do not create inetpeer on SYNACK message · 7433819a
      Committed by Eric Dumazet
      Another problem under a SYNFLOOD/DDoS attack is the inetpeer cache
      getting larger and larger, using lots of memory and CPU time.
      
      tcp_v4_send_synack()
      ->inet_csk_route_req()
       ->ip_route_output_flow()
        ->rt_set_nexthop()
         ->rt_init_metrics()
          ->inet_getpeer( create = true)
      
      This is a side effect of commit a4daad6b ("net: Pre-COW metrics for
      TCP"), added in 2.6.39.
      
      Possible solution :
      
      Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS
      
      Before patch :
      
      # grep peer /proc/slabinfo
      inet_peer_cache   4175430 4175430    192   42    2 : tunables    0    0    0 : slabdata  99415  99415      0
      
      Samples: 41K of event 'cycles', Event count (approx.): 30716565122
      +  20,24%      ksoftirqd/0  [kernel.kallsyms]           [k] inet_getpeer
      +   8,19%      ksoftirqd/0  [kernel.kallsyms]           [k] peer_avl_rebalance.isra.1
      +   4,81%      ksoftirqd/0  [kernel.kallsyms]           [k] sha_transform
      +   3,64%      ksoftirqd/0  [kernel.kallsyms]           [k] fib_table_lookup
      +   2,36%      ksoftirqd/0  [ixgbe]                     [k] ixgbe_poll
      +   2,16%      ksoftirqd/0  [kernel.kallsyms]           [k] __ip_route_output_key
      +   2,11%      ksoftirqd/0  [kernel.kallsyms]           [k] kernel_map_pages
      +   2,11%      ksoftirqd/0  [kernel.kallsyms]           [k] ip_route_input_common
      +   2,01%      ksoftirqd/0  [kernel.kallsyms]           [k] __inet_lookup_established
      +   1,83%      ksoftirqd/0  [kernel.kallsyms]           [k] md5_transform
      +   1,75%      ksoftirqd/0  [kernel.kallsyms]           [k] check_leaf.isra.9
      +   1,49%      ksoftirqd/0  [kernel.kallsyms]           [k] ipt_do_table
      +   1,46%      ksoftirqd/0  [kernel.kallsyms]           [k] hrtimer_interrupt
      +   1,45%      ksoftirqd/0  [kernel.kallsyms]           [k] kmem_cache_alloc
      +   1,29%      ksoftirqd/0  [kernel.kallsyms]           [k] inet_csk_search_req
      +   1,29%      ksoftirqd/0  [kernel.kallsyms]           [k] __netif_receive_skb
      +   1,16%      ksoftirqd/0  [kernel.kallsyms]           [k] copy_user_generic_string
      +   1,15%      ksoftirqd/0  [kernel.kallsyms]           [k] kmem_cache_free
      +   1,02%      ksoftirqd/0  [kernel.kallsyms]           [k] tcp_make_synack
      +   0,93%      ksoftirqd/0  [kernel.kallsyms]           [k] _raw_spin_lock_bh
      +   0,87%      ksoftirqd/0  [kernel.kallsyms]           [k] __call_rcu
      +   0,84%      ksoftirqd/0  [kernel.kallsyms]           [k] rt_garbage_collect
      +   0,84%      ksoftirqd/0  [kernel.kallsyms]           [k] fib_rules_lookup
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 01 Jun, 2012 9 commits
  5. 30 May, 2012 7 commits
    • drop_monitor: Add module alias to enable automatic module loading · 3fdcbd45
      Committed by Neil Horman
      Now that we have module alias macros for generic netlink families, let's use
      those to mark modules with the appropriate family names for loading.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: Build a generic netlink family module alias · e9412c37
      Committed by Neil Horman
      Generic netlink searches for -type- formatted aliases when requesting a module to
      fulfill a protocol request (i.e. net-pf-16-proto-16-type-<x>, where x is a type
      value).  However generic netlink protocols have no well defined type numbers,
      they have string names.  Modify genl_ctrl_getfamily to request an alias in the
      format net-pf-16-proto-16-family-<x> instead, where x is a generic string, and
      add a macro that builds on the previously added MODULE_ALIAS_NET_PF_PROTO_NAME
      macro to allow modules to specify those generic strings.
      
      Note, l2tp previously hacked together a net-pf-16-proto-16-type-l2tp alias
      using the MODULE_ALIAS macro; with these updates we can convert that to use the
      PROTO_NAME macro.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: James Chapman <jchapman@katalix.com>
      CC: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
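The alias format named in the commit above, net-pf-16-proto-16-family-<x>, can be illustrated with a small string builder. This is a user-space sketch only: `genl_family_alias()` and the two `_NUM` constants are hypothetical names made up here, and the kernel builds the alias at compile time with its own macros rather than with `snprintf()`. The two 16s are the numeric values of PF_NETLINK and NETLINK_GENERIC on Linux.

```c
#include <stdio.h>

#define PF_NETLINK_NUM      16 /* numeric value of PF_NETLINK on Linux */
#define NETLINK_GENERIC_NUM 16 /* numeric value of NETLINK_GENERIC */

/* Write "net-pf-16-proto-16-family-<family>" into buf.
 * Returns 0 on success, -1 if the alias does not fit in len bytes. */
static int genl_family_alias(char *buf, size_t len, const char *family)
{
    int n = snprintf(buf, len, "net-pf-%d-proto-%d-family-%s",
                     PF_NETLINK_NUM, NETLINK_GENERIC_NUM, family);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with the family name `"l2tp"` produces the string a module loader would be asked to resolve under the new scheme, in place of the older `-type-` form.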
    • memcg: decrement static keys at real destroy time · 3f134619
      Committed by Glauber Costa
      We call the destroy function when a cgroup starts to be removed, such as
      by a rmdir event.
      
      However, because of our reference counters, some objects are still
      inflight.  Right now, we are decrementing the static_keys at destroy()
      time, meaning that if we get rid of the last static_key reference, some
      objects will still have charges, but the code to properly uncharge them
      won't be run.
      
      This becomes a problem especially if accounting is ever enabled again,
      because new charges will then be added to the stale charges, making it
      pretty much impossible to keep the accounting correct.
      
      We just need to be careful with the static branch activation: since there
      is no particular preferred order of their activation, we need to make sure
      that we only start using it after all call sites are active.  This is
      achieved by having a per-memcg flag that is only updated after
      static_key_slow_inc() returns.  At this time, we are sure all sites are
      active.
      
      This is made per-memcg, not global, for a reason: it also has the effect
      of making socket accounting more consistent.  The first memcg to be
      limited will trigger static_key() activation, therefore, accounting.  But
      all the others will then be accounted no matter what.  After this patch,
      only limited memcgs will have their sockets accounted.
      
      [akpm@linux-foundation.org: move enum sock_flag_bits into sock.h,
                                  document enum sock_flag_bits,
                                  convert memcg_proto_active() and memcg_proto_activated() to test_bit(),
                                  redo tcp_update_limit() comment to 80 cols]
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Acked-by: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
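The scheme in the commit above (a per-memcg flag set only after the static key increment returns, and the decrement deferred to real destroy time) can be modelled roughly as follows. Everything here is a hypothetical user-space model: `struct group`, `static_key_count` and the three functions are invented for illustration, and a plain counter stands in for static_key_slow_inc() and its decrement counterpart.

```c
#include <stdbool.h>

/* Models the global static key reference count. */
static int static_key_count;

/* Hypothetical per-group state, standing in for the per-memcg flags. */
struct group {
    bool activated; /* this group holds a reference on the key */
    bool active;    /* this group is currently limited, so it is accounted */
};

/* Called when a limit is set (limited = true) or removed (limited = false). */
static void group_set_limit(struct group *g, bool limited)
{
    if (!limited) {
        /* Stop accounting new objects, but keep the key reference:
         * objects charged earlier may still be in flight. */
        g->active = false;
        return;
    }
    if (!g->activated) {
        static_key_count++;  /* stands in for static_key_slow_inc() */
        g->activated = true; /* set only after the increment completed */
    }
    g->active = true;
}

/* Only groups that are themselves limited get accounted. */
static bool group_accounted(const struct group *g)
{
    return static_key_count > 0 && g->active;
}

/* Called when the last reference to the group is gone, not at rmdir time. */
static void group_real_destroy(struct group *g)
{
    if (g->activated) {
        static_key_count--; /* drop the reference taken in group_set_limit() */
        g->activated = false;
    }
    g->active = false;
}
```

In this model, removing a limit leaves the key count untouched, and a group that never had a limit is never accounted even while another group keeps the key enabled.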
    • rds_rdma: don't assume infiniband device is PCI · a0c6ffbc
      Committed by Thadeu Lima de Souza Cascardo
      RDS code assumes that the struct ib_device dma_device member, which is a
      pointer, points to a struct device embedded in a struct pci_dev.
      
      This is not the case for ehca, for example, which is an OF driver, and
      makes dma_device point to a struct device embedded in a struct
      platform_device.
      
      This will make the system crash when rds_rdma is loaded in a system
      with ehca, since it will try to access the bus member of a non-existent
      struct pci_dev.
      
      The only reason rds_rdma uses the struct pci_dev is to get the NUMA node
      the device is attached to. Using dev_to_node for that is much better,
      since it won't assume which bus the infiniband is attached to.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: dledford@redhat.com
      Cc: Jes.Sorensen@redhat.com
      Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
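The point of the commit above is that the NUMA node is a property of the generic device, so no assumption about the enclosing bus structure is needed. The following is a simplified user-space model, not kernel code: the three structs are cut-down, hypothetical versions of their kernel namesakes, `dev_to_node()` here only imitates the kernel helper of the same name, and `ib_numa_node()` is an invented wrapper.

```c
/* Cut-down models of the kernel structures; field sets are illustrative. */
struct device {
    int numa_node;
};

struct pci_dev {
    int bus_number;
    struct device dev; /* embedded generic device */
};

struct platform_device {
    const char *name;
    struct device dev; /* embedded generic device */
};

/* Imitates the kernel helper: needs nothing beyond the generic device. */
static int dev_to_node(const struct device *dev)
{
    return dev->numa_node;
}

/* Hypothetical wrapper: look up the node without caring which kind of
 * bus device the generic device is embedded in. */
static int ib_numa_node(const struct device *dma_device)
{
    return dev_to_node(dma_device);
}
```

The same call works whether the generic device sits inside a `struct pci_dev` or a `struct platform_device`, which is what makes it safe for a driver like ehca.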
    • l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case · c51ce497
      Committed by James Chapman
      An application may call connect() to disconnect a socket using an
      address with family AF_UNSPEC. The L2TP IP sockets were not handling
      this case when the socket is not bound and an attempt to connect()
      using AF_UNSPEC in such cases would result in an oops. This patch
      addresses the problem by protecting the sk_prot->disconnect() call
      against trying to unhash the socket before it is bound.
      
      The L2TP IPv4 and IPv6 sockets have the same problem. Both are fixed
      by this patch.
      
      The patch also adds more checks that the sockaddr supplied to bind()
      and connect() calls is valid.
      
       RIP: 0010:[<ffffffff82e133b0>]  [<ffffffff82e133b0>] inet_unhash+0x50/0xd0
       RSP: 0018:ffff88001989be28  EFLAGS: 00010293
       Stack:
        ffff8800407a8000 0000000000000000 ffff88001989be78 ffffffff82e3a249
        ffffffff82e3a050 ffff88001989bec8 ffff88001989be88 ffff8800407a8000
        0000000000000010 ffff88001989bec8 ffff88001989bea8 ffffffff82e42639
       Call Trace:
       [<ffffffff82e3a249>] udp_disconnect+0x1f9/0x290
       [<ffffffff82e42639>] inet_dgram_connect+0x29/0x80
       [<ffffffff82d012fc>] sys_connect+0x9c/0x100
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
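The guard described in the commit above (do not try to unhash a socket that was never bound) can be shown with a toy state machine. This is a deliberately simplified, hypothetical model: `struct sock_model` and both functions are invented here, and the real disconnect path has more conditions than this sketch captures.

```c
#include <stdbool.h>

/* Hypothetical, simplified socket state; not the kernel's struct sock. */
struct sock_model {
    bool bound;       /* true once bind() has hashed the socket */
    bool connected;   /* true once connect() has set a peer */
    int unhash_calls; /* how many times the unhash path ran */
};

/* Remove the socket from the (imaginary) hash table. */
static void model_unhash(struct sock_model *sk)
{
    sk->unhash_calls++;
    sk->bound = false;
}

/* Models connect() with an AF_UNSPEC address, i.e. a disconnect request. */
static int model_disconnect(struct sock_model *sk)
{
    sk->connected = false;
    if (!sk->bound)
        return 0; /* never hashed, so there is nothing to unhash */
    model_unhash(sk);
    return 0;
}
```

Without the `bound` check, the unbound case would reach the unhash path with no hash entry to remove, which corresponds to the inet_unhash oops in the trace.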
    • mac80211: fix ADDBA declined after suspend with wowlan · 7b21aea0
      Committed by Eyal Shapira
      WLAN_STA_BLOCK_BA is set while suspending but doesn't get cleared
      when resuming in case of wowlan. This causes further ADDBA requests
      received to be rejected. Fix it by clearing it in the wowlan path
      as well.
      Signed-off-by: Eyal Shapira <eyal@wizery.com>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • C
      c26a0e10
  6. 27 May, 2012 2 commits
    • ipv6: fix incorrect ipsec fragment · 0c183379
      Committed by Gao feng
      Since commit ad0081e4 ("ipv6: Fragment locally generated tunnel-mode
      IPSec6 packets as needed"), packets are fragmented incorrectly: tunnel
      mode needs IPsec headers and trailer for all fragments, while in
      transport mode it is sufficient to add the headers to the first
      fragment and the trailer to the last.
      
      So modify mtu and maxfraglen based on the IPsec mode and on whether the
      fragment is the first or the last.
      
      In my tests it works well (every fragment's size is the mtu) and does
      not trigger the slow fragment path.
      
      Changes from v1:
      	through optimization, mtu_prev and maxfraglen_prev can be deleted.
      	replace xfrm mode codes with dst_entry's new flag DST_XFRM_TUNNEL.
      	add function ip6_append_data_mtu to make the code clearer.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: take net hdr len into account for esp payload size calculation · 91657eaf
      Committed by Benjamin Poirier
      Corrects the function that determines the esp payload size. The calculations
      done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for
      certain mtu values and suboptimal frames for others.
      
      According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(),
      net_header_len must be taken into account before doing the alignment
      calculation.
      Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
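The idea in the commit above, accounting for the network header before the alignment step, can be worked through with a small calculation. This is a sketch of the arithmetic under stated assumptions, not the kernel's esp{4,6}_get_mtu(): the function name and all parameters are hypothetical, `net_adj` is assumed to be the network header length in transport mode and 0 in tunnel mode, `align` is assumed to be a power of two, and the 2 subtracted at the end is the ESP trailer's pad-length and next-header bytes.

```c
/* Largest payload (network header included in transport mode) that still
 * fits in mtu once ESP overhead is added.
 *
 * mtu        link MTU in bytes
 * header_len bytes ESP adds in front of the protected data
 * authsize   bytes of the integrity check value at the end
 * align      cipher block alignment, must be a power of two
 * net_adj    network header bytes that stay outside the encrypted part
 */
static unsigned esp_payload_size(unsigned mtu, unsigned header_len,
                                 unsigned authsize, unsigned align,
                                 unsigned net_adj)
{
    /* Room left for the encrypted part, rounded down to the alignment. */
    unsigned encrypted = (mtu - header_len - authsize - net_adj) & ~(align - 1);

    /* Add the unencrypted network header back and drop the 2 trailer bytes. */
    return encrypted + net_adj - 2;
}
```

For example, with an MTU of 1500, 24 bytes of ESP header plus IV, a 12-byte ICV, 16-byte alignment and a 20-byte IPv4 header in transport mode, the room for the encrypted part is 1444 bytes, which rounds down to 1440, giving 1440 + 20 - 2 = 1458. Aligning before subtracting the network header is the kind of ordering that can give overlength or suboptimal frames for some MTU values.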
  7. 25 May, 2012 2 commits