1. 26 5月, 2013 1 次提交
    • L
      net: ipv6: Add IPv6 support to the ping socket. · 6d0bfe22
      Lorenzo Colitti 提交于
      This adds the ability to send ICMPv6 echo requests without a
      raw socket. The equivalent ability for ICMPv4 was added in
      2011.
      
      Instead of having separate code paths for IPv4 and IPv6, make
      most of the code in net/ipv4/ping.c dual-stack and only add a
      few IPv6-specific bits (like the protocol definition) to a new
      net/ipv6/ping.c. Hopefully this will reduce divergence and/or
      duplication of bugs in the future.
      
      Caveats:
      
      - Setting options via ancillary data (e.g., using IPV6_PKTINFO
        to specify the outgoing interface) is not yet supported.
      - There are no separate security settings for IPv4 and IPv6;
        everything is controlled by /proc/net/ipv4/ping_group_range.
      - The proc interface does not yet display IPv6 ping sockets
        properly.
      
      Tested with a patched copy of ping6 and using raw socket calls.
      Compiles and works with all of CONFIG_IPV6={n,m,y}.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d0bfe22
  2. 24 5月, 2013 1 次提交
  3. 22 5月, 2013 1 次提交
    • R
      Add include dependencies to <linux/printk.h>. · 154c2670
      Ralf Baechle 提交于
      If <linux/linkage.h> has not been included before <linux/printk.h>,
      a build error like the below one will result:
      
        CC      arch/mips/kernel/idle.o
      In file included from arch/mips/kernel/idle.c:17:0:
      include/linux/printk.h:109:1: error: data definition has no type or storage class [-Werror]
      include/linux/printk.h:109:1: error: type defaults to ‘int’ in declaration of ‘asmlinkage’ [-Werror=implicit-int]
      include/linux/printk.h:110:1: error: ‘format’ attribute only applies to function types [-Werror=attributes]
      include/linux/printk.h:110:1: error: expected ‘,’ or ‘;’ before ‘int’
      include/linux/printk.h:114:1: error: data definition has no type or storage class [-Werror]
      include/linux/printk.h:114:1: error: type defaults to ‘int’ in declaration of ‘asmlinkage’ [-Werror=implicit-int]
      include/linux/printk.h:115:1: error: ‘format’ attribute only applies to function types [-Werror=attributes]
      include/linux/printk.h:115:1: error: expected ‘,’ or ‘;’ before ‘int’
      include/linux/printk.h:117:1: error: data definition has no type or storage class [-Werror]
      include/linux/printk.h:117:1: error: type defaults to ‘int’ in declaration of ‘asmlinkage’ [-Werror=implicit-int]
      include/linux/printk.h:118:1: error: ‘format’ attribute only applies to function types [-Werror=attributes]
      include/linux/printk.h:118:1: error: ‘__cold__’ attribute ignored [-Werror=attributes]
      include/linux/printk.h:118:1: error: expected ‘,’ or ‘;’ before ‘asmlinkage’
      include/linux/printk.h:122:1: error: data definition has no type or storage class [-Werror]
      include/linux/printk.h:122:1: error: type defaults to ‘int’ in declaration of ‘asmlinkage’ [-Werror=implicit-int]
      include/linux/printk.h:123:1: error: ‘format’ attribute only applies to function types [-Werror=attributes]
      include/linux/printk.h:123:1: error: ‘__cold__’ attribute ignored [-Werror=attributes]
      include/linux/printk.h:123:1: error: expected ‘,’ or ‘;’ before ‘int’
      In file included from include/linux/kernel.h:14:0,
                       from include/linux/sched.h:15,
                       from arch/mips/kernel/idle.c:18:
      include/linux/dynamic_debug.h: In function ‘ddebug_dyndbg_module_param_cb’:
      include/linux/dynamic_debug.h:124:3: error: implicit declaration of function ‘printk’ [-Werror=implicit-function-declaration]
      
      Fixed by including <linux/linkage.h>.
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      154c2670
  4. 21 5月, 2013 7 次提交
    • F
      phy: add phy_mac_interrupt() to use with PHY_IGNORE_INTERRUPT · 5ea94e76
      Florian Fainelli 提交于
      There is currently no way for an Ethernet MAC driver servicing PHY link
      interrupts to notify this to the PHY state machine without defining its
      own state machine. Since most drivers are not so special, introduce a
      helper: phy_mac_interrupt() which can be called from a link up/down
      interrupt routine to update the PHY state machine. To avoid code
      duplication some refactoring has been done to expose the workqueue and
      its corresponding callback internally.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ea94e76
    • F
      phy: fix the use of PHY_IGNORE_INTERRUPT · 2c7b4921
      Florian Fainelli 提交于
      When a PHY device is registered with the special IRQ value
      PHY_IGNORE_INTERRUPT (-2) it will not properly be handled by the PHY
      library:
      
      - it continues to poll its register, while we do not want this
        because such PHY link events or register changes are serviced by an
        Ethernet MAC
      - it will still try to configure PHY interrupts at the PHY level, such
        interrupts do not exist at the PHY but at the MAC level
      - the state machine only handles PHY_POLL, but should also handle
        PHY_IGNORE_INTERRUPT similarly
      
      This patch updates the PHY state machine and initialization paths to
      account for the specific PHY_IGNORE_INTERRUPT. Based on an earlier patch
      by Thomas Petazzoni, and reworked to add the missing bits. Add a helper
      phy_interrupt_is_valid() which specifically tests for a PHY interrupt
      not to be PHY_POLL or PHY_IGNORE_INTERRUPT and use it throughout the
      code.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c7b4921
    • E
      tcp: md5: remove spinlock usage in fast path · 71cea17e
      Eric Dumazet 提交于
      TCP md5 code uses per cpu variables but protects access to them with
      a shared spinlock, which is a contention point.
      
      [ tcp_md5sig_pool_lock is locked twice per incoming packet ]
      
      Makes things much simpler, by allocating crypto structures once, first
      time a socket needs md5 keys, and not deallocating them as they are
      really small.
      
      Next step would be to allow crypto allocations being done in a NUMA
      aware way.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71cea17e
    • D
      net: ipv6: remove 'next' member from inet6_dev · 168fc21a
      Daniel Borkmann 提交于
      The next pointer within the inet6_dev structure seems not to be used
      anywhere. So just remove it. Tested with allmodconfig on x86_64.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      168fc21a
    • W
      rps: selective flow shedding during softnet overflow · 99bbc707
      Willem de Bruijn 提交于
      A cpu executing the network receive path sheds packets when its input
      queue grows to netdev_max_backlog. A single high rate flow (such as a
      spoofed source DoS) can exceed a single cpu processing rate and will
      degrade throughput of other flows hashed onto the same cpu.
      
      This patch adds a more fine grained hashtable. If the netdev backlog
      is above a threshold, IRQ cpus track the ratio of total traffic of
      each flow (using 4096 buckets, configurable). The ratio is measured
      by counting the number of packets per flow over the last 256 packets
      from the source cpu. Any flow that occupies a large fraction of this
      (set at 50%) will see packet drop while above the threshold.
      
      Tested:
      Setup is a muli-threaded UDP echo server with network rx IRQ on cpu0,
      kernel receive (RPS) on cpu0 and application threads on cpus 2--7
      each handling 20k req/s. Throughput halves when hit with a 400 kpps
      antagonist storm. With this patch applied, antagonist overload is
      dropped and the server processes its complete load.
      
      The patch is effective when kernel receive processing is the
      bottleneck. The above RPS scenario is a extreme, but the same is
      reached with RFS and sufficient kernel processing (iptables, packet
      socket tap, ..).
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99bbc707
    • P
      tty/vt: Fix vc_deallocate() lock order · 421b40a6
      Peter Hurley 提交于
      Now that the tty port owns the flip buffers and i/o is allowed
      from the driver even when no tty is attached, the destruction
      of the tty port (and the flip buffers) must ensure that no
      outstanding work is pending.
      
      Unfortunately, this creates a lock order problem with the
      console_lock (see attached lockdep report [1] below).
      
      For single console deallocation, drop the console_lock prior
      to port destruction. When multiple console deallocation,
      defer port destruction until the consoles have been
      deallocated.
      
      tty_port_destroy() is not required if the port has not
      been used; remove from vc_allocate() failure path.
      
      [1] lockdep report from Dave Jones <davej@redhat.com>
      
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       3.9.0+ #16 Not tainted
       -------------------------------------------------------
       (agetty)/26163 is trying to acquire lock:
       blocked:  ((&buf->work)){+.+...}, instance: ffff88011c8b0020, at: [<ffffffff81062065>] flush_work+0x5/0x2e0
      
       but task is already holding lock:
       blocked:  (console_lock){+.+.+.}, instance: ffffffff81c2fde0, at: [<ffffffff813bc201>] vt_ioctl+0xb61/0x1230
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (console_lock){+.+.+.}:
              [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
              [<ffffffff810416c7>] console_lock+0x77/0x80
              [<ffffffff813c3dcd>] con_flush_chars+0x2d/0x50
              [<ffffffff813b32b2>] n_tty_receive_buf+0x122/0x14d0
              [<ffffffff813b7709>] flush_to_ldisc+0x119/0x170
              [<ffffffff81064381>] process_one_work+0x211/0x700
              [<ffffffff8106498b>] worker_thread+0x11b/0x3a0
              [<ffffffff8106ce5d>] kthread+0xed/0x100
              [<ffffffff81601cac>] ret_from_fork+0x7c/0xb0
      
       -> #0 ((&buf->work)){+.+...}:
              [<ffffffff810b349a>] __lock_acquire+0x193a/0x1c00
              [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
              [<ffffffff810620ae>] flush_work+0x4e/0x2e0
              [<ffffffff81065305>] __cancel_work_timer+0x95/0x130
              [<ffffffff810653b0>] cancel_work_sync+0x10/0x20
              [<ffffffff813b8212>] tty_port_destroy+0x12/0x20
              [<ffffffff813c65e8>] vc_deallocate+0xf8/0x110
              [<ffffffff813bc20c>] vt_ioctl+0xb6c/0x1230
              [<ffffffff813b01a5>] tty_ioctl+0x285/0xd50
              [<ffffffff811ba825>] do_vfs_ioctl+0x305/0x530
              [<ffffffff811baad1>] sys_ioctl+0x81/0xa0
              [<ffffffff81601d59>] system_call_fastpath+0x16/0x1b
      
       other info that might help us debug this:
      
       [ 6760.076175]  Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(console_lock);
                                      lock((&buf->work));
                                      lock(console_lock);
         lock((&buf->work));
      
        *** DEADLOCK ***
      
       1 lock on stack by (agetty)/26163:
        #0: blocked:  (console_lock){+.+.+.}, instance: ffffffff81c2fde0, at: [<ffffffff813bc201>] vt_ioctl+0xb61/0x1230
       stack backtrace:
       Pid: 26163, comm: (agetty) Not tainted 3.9.0+ #16
       Call Trace:
        [<ffffffff815edb14>] print_circular_bug+0x200/0x20e
        [<ffffffff810b349a>] __lock_acquire+0x193a/0x1c00
        [<ffffffff8100a269>] ? sched_clock+0x9/0x10
        [<ffffffff8100a269>] ? sched_clock+0x9/0x10
        [<ffffffff8100a200>] ? native_sched_clock+0x20/0x80
        [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
        [<ffffffff81062065>] ? flush_work+0x5/0x2e0
        [<ffffffff810620ae>] flush_work+0x4e/0x2e0
        [<ffffffff81062065>] ? flush_work+0x5/0x2e0
        [<ffffffff810b15db>] ? mark_held_locks+0xbb/0x140
        [<ffffffff8113c8a3>] ? __free_pages_ok.part.57+0x93/0xc0
        [<ffffffff810b15db>] ? mark_held_locks+0xbb/0x140
        [<ffffffff810652f2>] ? __cancel_work_timer+0x82/0x130
        [<ffffffff81065305>] __cancel_work_timer+0x95/0x130
        [<ffffffff810653b0>] cancel_work_sync+0x10/0x20
        [<ffffffff813b8212>] tty_port_destroy+0x12/0x20
        [<ffffffff813c65e8>] vc_deallocate+0xf8/0x110
        [<ffffffff813bc20c>] vt_ioctl+0xb6c/0x1230
        [<ffffffff810aec41>] ? lock_release_holdtime.part.30+0xa1/0x170
        [<ffffffff813b01a5>] tty_ioctl+0x285/0xd50
        [<ffffffff812b00f6>] ? inode_has_perm.isra.46.constprop.61+0x56/0x80
        [<ffffffff811ba825>] do_vfs_ioctl+0x305/0x530
        [<ffffffff812b04db>] ? selinux_file_ioctl+0x5b/0x110
        [<ffffffff811baad1>] sys_ioctl+0x81/0xa0
        [<ffffffff81601d59>] system_call_fastpath+0x16/0x1b
      
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      421b40a6
    • A
      2a0f9055
  5. 20 5月, 2013 5 次提交
    • E
      filter: do not output bpf image address for security reason · 16495445
      Eric Dumazet 提交于
      Do not leak starting address of BPF JIT code for non root users,
      as it might help intruders to perform an attack.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16495445
    • Y
      tcp: remove bad timeout logic in fast recovery · 3e59cb0d
      Yuchung Cheng 提交于
      tcp_timeout_skb() was intended to trigger fast recovery on timeout,
      unfortunately in reality it often causes spurious retransmission
      storms during fast recovery. The particular sign is a fast retransmit
      over the highest sacked sequence (SND.FACK).
      
      Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion
      to avoid spurious timeout: when SND.UNA advances the sender re-arms
      RTO and extends the timeout by icsk_rto. The sender does not offset
      the time elapsed since the packet at SND.UNA was sent.
      
      But if the next (DUP)ACK arrives later than ~RTTVAR and triggers
      tcp_fastretrans_alert(), then tcp_timeout_skb() will mark any packet
      sent before the icsk_rto interval lost, including one that's above the
      highest sacked sequence. Most likely a large part of scorebard will be
      marked.
      
      If most packets are not lost then the subsequent DUPACKs with new SACK
      blocks will cause the sender to continue to retransmit packets beyond
      SND.FACK spuriously. Even if only one packet is lost the sender may
      falsely retransmit almost the entire window.
      
      The situation becomes common in the world of bufferbloat: the RTT
      continues to grow as the queue builds up but RTTVAR remains small and
      close to the minimum 200ms. If a data packet is lost and the DUPACK
      triggered by the next data packet is slightly delayed, then a spurious
      retransmission storm forms.
      
      As the original comment on tcp_timeout_skb() suggests: the usefulness
      of this feature is questionable. It also wastes cycles walking the
      sack scoreboard and is actually harmful because of false recovery.
      
      It's time to remove this.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NNandita Dukkipati <nanditad@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e59cb0d
    • M
      virtio_console: fix uapi header · 6407d75a
      Michael S. Tsirkin 提交于
      uapi should use __u32 not u32.
      Fix a macro in virtio_console.h which uses u32.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      6407d75a
    • R
      Hoist memcpy_fromiovec/memcpy_toiovec into lib/ · d2f83e90
      Rusty Russell 提交于
      ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!
      
      That function is only present with CONFIG_NET.  Turns out that
      crypto/algif_skcipher.c also uses that outside net, but it actually
      needs sockets anyway.
      
      In addition, commit 6d4f0139 added
      CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
      that function and revert that commit too.
      
      socket.h already includes uio.h, so no callers need updating; trying
      only broke things fo x86_64 randconfig (thanks Fengguang!).
      Reported-by: NRandy Dunlap <rdunlap@infradead.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      d2f83e90
    • N
      ipv6: add support of peer address · caeaba79
      Nicolas Dichtel 提交于
      This patch adds the support of peer address for IPv6. For example, it is
      possible to specify the remote end of a 6inY tunnel.
      This was already possible in IPv4:
       ip addr add ip1 peer ip2 dev dev1
      
      The peer address is specified with IFA_ADDRESS and the local address with
      IFA_LOCAL (like explained in include/uapi/linux/if_addr.h).
      Note that the API is not changed, because before this patch, it was not
      possible to specify two different addresses in IFA_LOCAL and IFA_REMOTE.
      There is a small change for the dump: if the peer is different from ::,
      IFA_ADDRESS will contain the peer address instead of the local address.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      caeaba79
  6. 18 5月, 2013 2 次提交
  7. 17 5月, 2013 5 次提交
  8. 16 5月, 2013 1 次提交
  9. 15 5月, 2013 5 次提交
  10. 14 5月, 2013 1 次提交
  11. 13 5月, 2013 2 次提交
  12. 12 5月, 2013 4 次提交
  13. 10 5月, 2013 5 次提交