1. 15 3月, 2011 9 次提交
  2. 11 3月, 2011 5 次提交
  3. 10 3月, 2011 4 次提交
    • D
      ipv6: Don't create clones of host routes. · 7343ff31
      David S. Miller 提交于
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462
      
      In commit d80bc0fd ("ipv6: Always
      clone offlink routes.") we forced the kernel to always clone offlink
      routes.
      
      The reason we do that is to make sure we never bind an inetpeer to a
      prefixed route.
      
      The logic turned on here has existed in the tree for many years,
      but was always off due to a protecting CPP define.  So perhaps
      it's no surprise that there is a logic bug here.
      
      The problem is that we canot clone a route that is already a
      host route (ie. has DST_HOST set).  Because if we do, an identical
      entry already exists in the routing tree and therefore the
      ip6_rt_ins() call is going to fail.
      
      This sets off a series of failures and high cpu usage, because when
      ip6_rt_ins() fails we loop retrying this operation a few times in
      order to handle a race between two threads trying to clone and insert
      the same host route at the same time.
      
      Fix this by simply using the route as-is when DST_HOST is set.
      
      Reported-by: slash@ac.auone-net.jp
      Reported-by: NErnst Sjöstrand <ernstp@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7343ff31
    • V
      net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules · 8909c9ad
      Vasiliy Kulikov 提交于
      Since a8f80e8f any process with
      CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
      that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
      limited to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't
      allow anybody load any module not related to networking.
      
      This patch restricts an ability of autoloading modules to netdev modules
      with explicit aliases.  This fixes CVE-2011-1019.
      
      Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
      of loading netdev modules by name (without any prefix) for processes
      with CAP_SYS_MODULE to maintain the compatibility with network scripts
      that use autoloading netdev modules by aliases like "eth0", "wlan0".
      
      Currently there are only three users of the feature in the upstream
      kernel: ipip, ip_gre and sit.
      
          root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	fffffff800001000
          CapEff:	fffffff800001000
          CapBnd:	fffffff800001000
          root@albatros:~# modprobe xfs
          FATAL: Error inserting xfs
          (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit
          sit: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit0
          sit0      Link encap:IPv6-in-IPv4
      	      NOARP  MTU:1480  Metric:1
      
          root@albatros:~# lsmod | grep sit
          sit                    10457  0
          tunnel4                 2957  1 sit
      
      For CAP_SYS_MODULE module loading is still relaxed:
      
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	ffffffffffffffff
          CapEff:	ffffffffffffffff
          CapBnd:	ffffffffffffffff
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          xfs                   745319  0
      
      Reference: https://lkml.org/lkml/2011/2/24/203Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: NMichael Tokarev <mjt@tls.msk.ru>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NKees Cook <kees.cook@canonical.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      8909c9ad
    • D
      pktgen: fix errata in show results · 03a14ab1
      Daniel Turull 提交于
      The units in show_results in pktgen were not correct.
      The results are in usec but it was displayed nsec.
      Reported-by: NJong-won Lee <ljw@handong.edu>
      Signed-off-by: NDaniel Turull <daniel.turull@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03a14ab1
    • D
      ipv4: Fix erroneous uses of ifa_address. · 6c91afe1
      David S. Miller 提交于
      In usual cases ifa_address == ifa_local, but in the case where
      SIOCSIFDSTADDR sets the destination address on a point-to-point
      link, ifa_address gets set to that destination address.
      
      Therefore we should use ifa_local when we want the local interface
      address.
      
      There were two cases where the selection was done incorrectly:
      
      1) When devinet_ioctl() does matching, it checks ifa_address even
         though gifconf correct reported ifa_local to the user
      
      2) IN_DEV_ARP_NOTIFY handling sends a gratuitous ARP using
         ifa_address instead of ifa_local.
      Reported-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c91afe1
  4. 09 3月, 2011 1 次提交
    • N
      rds: prevent BUG_ON triggering on congestion map updates · 6094628b
      Neil Horman 提交于
      Recently had this bug halt reported to me:
      
      kernel BUG at net/rds/send.c:329!
      Oops: Exception in kernel mode, sig: 5 [#1]
      SMP NR_CPUS=1024 NUMA pSeries
      Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg
      ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt
      dm_mod [last unloaded: scsi_wait_scan]
      NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770
      REGS: c000000175cab980 TRAP: 0700   Not tainted  (2.6.32-118.el6.ppc64)
      MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 44000022  XER: 00000000
      TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0
      GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030
      GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030
      GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000
      GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00
      GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001
      GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000
      GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860
      GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8
      NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds]
      LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
      Call Trace:
      [c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
      (unreliable)
      [c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds]
      [c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0
      [c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0
      [c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70
      Instruction dump:
      4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c
      7d094a78 7d290074 7929d182 394a0020 <0b090000> 40e2ff68 4bffffa4 39200000
      Kernel panic - not syncing: Fatal exception
      Call Trace:
      [c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable)
      [c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4
      [c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0
      [c000000175cab750] [c000000000030000] ._exception+0x110/0x220
      [c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6094628b
  5. 08 3月, 2011 2 次提交
    • R
      net: fix multithreaded signal handling in unix recv routines · b3ca9b02
      Rainer Weikusat 提交于
      The unix_dgram_recvmsg and unix_stream_recvmsg routines in
      net/af_unix.c utilize mutex_lock(&u->readlock) calls in order to
      serialize read operations of multiple threads on a single socket. This
      implies that, if all n threads of a process block in an AF_UNIX recv
      call trying to read data from the same socket, one of these threads
      will be sleeping in state TASK_INTERRUPTIBLE and all others in state
      TASK_UNINTERRUPTIBLE. Provided that a particular signal is supposed to
      be handled by a signal handler defined by the process and that none of
      this threads is blocking the signal, the complete_signal routine in
      kernel/signal.c will select the 'first' such thread it happens to
      encounter when deciding which thread to notify that a signal is
      supposed to be handled and if this is one of the TASK_UNINTERRUPTIBLE
      threads, the signal won't be handled until the one thread not blocking
      on the u->readlock mutex is woken up because some data to process has
      arrived (if this ever happens). The included patch fixes this by
      changing mutex_lock to mutex_lock_interruptible and handling possible
      error returns in the same way interruptions are handled by the actual
      receive-code.
      Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3ca9b02
    • T
      net: Enter net/ipv6/ even if CONFIG_IPV6=n · 2ea6d8c4
      Thomas Graf 提交于
      exthdrs_core.c and addrconf_core.c in net/ipv6/ contain bits which
      must be made available even if IPv6 is disabled.
      
      net/ipv6/Makefile already correctly includes them if CONFIG_IPV6=n
      but net/Makefile prevents entering the subdirectory.
      Signed-off-by: NThomas Graf <tgraf@infradead.org>
      Acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ea6d8c4
  6. 05 3月, 2011 3 次提交
    • S
      libceph: fix msgr standby handling · e00de341
      Sage Weil 提交于
      The standby logic used to be pretty dependent on the work requeueing
      behavior that changed when we switched to WQ_NON_REENTRANT.  It was also
      very fragile.
      
      Restructure things so that:
       - We clear WRITE_PENDING when we set STANDBY.  This ensures we will
         requeue work when we wake up later.
       - con_work backs off if STANDBY is set.  There is nothing to do if we are
         in standby.
       - clear_standby() helper is called by both con_send() and con_keepalive(),
         the two actions that can wake us up again.  Move the connect_seq++
         logic here.
      Signed-off-by: NSage Weil <sage@newdream.net>
      e00de341
    • S
      libceph: fix msgr keepalive flag · e76661d0
      Sage Weil 提交于
      There was some broken keepalive code using a dead variable.  Shift to using
      the proper bit flag.
      Signed-off-by: NSage Weil <sage@newdream.net>
      e76661d0
    • S
      libceph: fix msgr backoff · 60bf8bf8
      Sage Weil 提交于
      With commit f363e45f we replaced a bunch of hacky workqueue mutual
      exclusion logic with the WQ_NON_REENTRANT flag.  One pieces of fallout is
      that the exponential backoff breaks in certain cases:
      
       * con_work attempts to connect.
       * we get an immediate failure, and the socket state change handler queues
         immediate work.
       * con_work calls con_fault, we decide to back off, but can't queue delayed
         work.
      
      In this case, we add a BACKOFF bit to make con_work reschedule delayed work
      next time it runs (which should be immediately).
      Signed-off-by: NSage Weil <sage@newdream.net>
      60bf8bf8
  7. 04 3月, 2011 3 次提交
    • D
      DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076] · 1362fa07
      David Howells 提交于
      When a DNS resolver key is instantiated with an error indication, attempts to
      read that key will result in an oops because user_read() is expecting there to
      be a payload - and there isn't one [CVE-2011-1076].
      
      Give the DNS resolver key its own read handler that returns the error cached in
      key->type_data.x[0] as an error rather than crashing.
      
      Also make the kenter() at the beginning of dns_resolver_instantiate() limit the
      amount of data it prints, since the data is not necessarily NUL-terminated.
      
      The buggy code was added in:
      
      	commit 4a2d7892
      	Author: Wang Lei <wang840925@gmail.com>
      	Date:   Wed Aug 11 09:37:58 2010 +0100
      	Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2]
      
      This can trivially be reproduced by any user with the following program
      compiled with -lkeyutils:
      
      	#include <stdlib.h>
      	#include <keyutils.h>
      	#include <err.h>
      	static char payload[] = "#dnserror=6";
      	int main()
      	{
      		key_serial_t key;
      		key = add_key("dns_resolver", "a", payload, sizeof(payload),
      			      KEY_SPEC_SESSION_KEYRING);
      		if (key == -1)
      			err(1, "add_key");
      		if (keyctl_read(key, NULL, 0) == -1)
      			err(1, "read_key");
      		return 0;
      	}
      
      What should happen is that keyctl_read() reports error 6 (ENXIO) to the user:
      
      	dns-break: read_key: No such device or address
      
      but instead the kernel oopses.
      
      This cannot be reproduced with the 'keyutils add' or 'keyutils padd' commands
      as both of those cut the data down below the NUL termination that must be
      included in the data.  Without this dns_resolver_instantiate() will return
      -EINVAL and the key will not be instantiated such that it can be read.
      
      The oops looks like:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: [<ffffffff811b99f7>] user_read+0x4f/0x8f
      PGD 3bdf8067 PUD 385b9067 PMD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
      CPU 0
      Modules linked in:
      
      Pid: 2150, comm: dns-break Not tainted 2.6.38-rc7-cachefs+ #468                  /DG965RY
      RIP: 0010:[<ffffffff811b99f7>]  [<ffffffff811b99f7>] user_read+0x4f/0x8f
      RSP: 0018:ffff88003bf47f08  EFLAGS: 00010246
      RAX: 0000000000000001 RBX: ffff88003b5ea378 RCX: ffffffff81972368
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003b5ea378
      RBP: ffff88003bf47f28 R08: ffff88003be56620 R09: 0000000000000000
      R10: 0000000000000395 R11: 0000000000000002 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffa1
      FS:  00007feab5751700(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 000000003de40000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process dns-break (pid: 2150, threadinfo ffff88003bf46000, task ffff88003be56090)
      Stack:
       ffff88003b5ea378 ffff88003b5ea3a0 0000000000000000 0000000000000000
       ffff88003bf47f68 ffffffff811b708e ffff88003c442bc8 0000000000000000
       00000000004005a0 00007fffba368060 0000000000000000 0000000000000000
      Call Trace:
       [<ffffffff811b708e>] keyctl_read_key+0xac/0xcf
       [<ffffffff811b7c07>] sys_keyctl+0x75/0xb6
       [<ffffffff81001f7b>] system_call_fastpath+0x16/0x1b
      Code: 75 1f 48 83 7b 28 00 75 18 c6 05 58 2b fb 00 01 be bb 00 00 00 48 c7 c7 76 1c 75 81 e8 13 c2 e9 ff 4c 8b b3 e0 00 00 00 4d 85 ed <41> 0f b7 5e 10 74 2d 4d 85 e4 74 28 e8 98 79 ee ff 49 39 dd 48
      RIP  [<ffffffff811b99f7>] user_read+0x4f/0x8f
       RSP <ffff88003bf47f08>
      CR2: 0000000000000010
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      cc: Wang Lei <wang840925@gmail.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      1362fa07
    • S
      libceph: retry after authorization failure · 692d20f5
      Sage Weil 提交于
      If we mark the connection CLOSED we will give up trying to reconnect to
      this server instance.  That is appropriate for things like a protocol
      version mismatch that won't change until the server is restarted, at which
      point we'll get a new addr and reconnect.  An authorization failure like
      this is probably due to the server not properly rotating it's secret keys,
      however, and should be treated as transient so that the normal backoff and
      retry behavior kicks in.
      Signed-off-by: NSage Weil <sage@newdream.net>
      692d20f5
    • S
      libceph: fix handling of short returns from get_user_pages · 38815b78
      Sage Weil 提交于
      get_user_pages() can return fewer pages than we ask for.  We were returning
      a bogus pointer/error code in that case.  Instead, loop until we get all
      the pages we want or get an error we can return to the caller.
      Signed-off-by: NSage Weil <sage@newdream.net>
      38815b78
  8. 03 3月, 2011 2 次提交
  9. 02 3月, 2011 3 次提交
    • J
      netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values · 9ef0298a
      Jan Engelhardt 提交于
      Like many other places, we have to check that the array index is
      within allowed limits, or otherwise, a kernel oops and other nastiness
      can ensue when we access memory beyond the end of the array.
      
      [ 5954.115381] BUG: unable to handle kernel paging request at 0000004000000000
      [ 5954.120014] IP:  __find_logger+0x6f/0xa0
      [ 5954.123979]  nf_log_bind_pf+0x2b/0x70
      [ 5954.123979]  nfulnl_recv_config+0xc0/0x4a0 [nfnetlink_log]
      [ 5954.123979]  nfnetlink_rcv_msg+0x12c/0x1b0 [nfnetlink]
      ...
      
      The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind
      was decoupled from nf_log_register.
      
      Reported-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>,
        via irc.freenode.net/#netfilter
      Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      9ef0298a
    • G
      dccp: fix oops on Reset after close · 720dc34b
      Gerrit Renker 提交于
      This fixes a bug in the order of dccp_rcv_state_process() that still permitted
      reception even after closing the socket. A Reset after close thus causes a NULL
      pointer dereference by not preventing operations on an already torn-down socket.
      
       dccp_v4_do_rcv() 
      	|
      	| state other than OPEN
      	v
       dccp_rcv_state_process()
      	|
      	| DCCP_PKT_RESET
      	v
       dccp_rcv_reset()
      	|
      	v
       dccp_time_wait()
      
       WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128()
       Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah
       [<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common)
       [<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n)
       [<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd)
       [<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai)
       [<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces)
       [<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_)
       [<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0)
       [<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380)
       [<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70)
      
      The fix is by testing the socket state first. Receiving a packet in Closed state
      now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1.
      Reported-and-tested-by: NJohan Hovold <jhovold@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      720dc34b
    • J
      ipvs: fix dst_lock locking on dest update · ff75f40f
      Julian Anastasov 提交于
      	Fix dst_lock usage in __ip_vs_update_dest. We need
      _bh locking because destination is updated in user context.
      Can cause lockups on frequent destination updates.
      Problem reported by Simon Kirby. Bug was introduced
      in 2.6.37 from the "ipvs: changes for local real server"
      change.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NHans Schillstrom <hans@schillstrom.com>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      ff75f40f
  10. 01 3月, 2011 1 次提交
    • A
      netlink: handle errors from netlink_dump() · b44d211e
      Andrey Vagin 提交于
      netlink_dump() may failed, but nobody handle its error.
      It generates output data, when a previous portion has been returned to
      user space. This mechanism works when all data isn't go in skb. If we
      enter in netlink_recvmsg() and skb is absent in the recv queue, the
      netlink_dump() will not been executed. So if netlink_dump() is failed
      one time, the new data never appear and the reader will sleep forever.
      
      netlink_dump() is called from two places:
      
      1. from netlink_sendmsg->...->netlink_dump_start().
         In this place we can report error directly and it will be returned
         by sendmsg().
      
      2. from netlink_recvmsg
         There we can't report error directly, because we have a portion of
         valid output data and call netlink_dump() for prepare the next portion.
         If netlink_dump() is failed, the socket will be mark as error and the
         next recvmsg will be failed.
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b44d211e
  11. 26 2月, 2011 3 次提交
  12. 23 2月, 2011 4 次提交