1. 18 May, 2012 12 commits
  2. 27 April, 2012 1 commit
    • sh: Fix up tracepoint build fallout from static key introduction. · ec2ccd88
      Committed by Nobuhiro Iwamatsu
      With the introduction of static keys, anything using tracepoints blows up
      in the following manner:
      
      include/trace/events/oom.h:8:13: error: initializer element is not constant
      include/trace/events/oom.h:8:13: error: (near initialization for '__tracepoint_oom_score_adj_update')
      include/trace/events/oom.h:8:13: error: initializer element is not constant
      include/trace/events/oom.h:8:13: error: (near initialization for '__tracepoint_oom_score_adj_update.key')
      
      This is a result of the STATIC_KEY_INIT_xxx defs wrapping ATOMIC_INIT()
      which on sh includes an atomic_t typecast. Given that we don't really
      need the typecast for anything anymore, the simplest solution is simply
      to kill off the cast.
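
The failure mode can be illustrated outside the kernel. The following is a minimal sketch, assuming simplified stand-ins for atomic_t, ATOMIC_INIT() and the static key structure; every demo_* / DEMO_* name is hypothetical, and the real sh definitions may differ in detail.

```c
/* Simplified stand-in for the kernel's atomic_t. */
typedef struct { int counter; } atomic_t;

/*
 * Cast form, as described in the commit message. The cast turns the
 * initializer into a compound literal, which is not a constant
 * expression, so a compiler may reject it when it is nested inside
 * another static initializer:
 *
 *   #define ATOMIC_INIT(i)  ((atomic_t) { (i) })
 */

/* Cast-free form: a plain brace initializer, usable in static storage. */
#define ATOMIC_INIT(i)  { (i) }

/* Hypothetical miniature of a static key wrapping an atomic counter. */
struct demo_static_key { atomic_t enabled; };
#define DEMO_STATIC_KEY_INIT_FALSE  { .enabled = ATOMIC_INIT(0) }
#define DEMO_STATIC_KEY_INIT_TRUE   { .enabled = ATOMIC_INIT(1) }

/* Hypothetical tracepoint-like object that embeds the key. */
struct demo_tracepoint {
	const char *name;
	struct demo_static_key key;
};

/* Static initialization, the context in which the build error appeared. */
static struct demo_tracepoint demo_tp_off = {
	.name = "oom_score_adj_update",
	.key  = DEMO_STATIC_KEY_INIT_FALSE,
};

static struct demo_tracepoint demo_tp_on = {
	.name = "demo_enabled",
	.key  = DEMO_STATIC_KEY_INIT_TRUE,
};

/* Returns 1 if the embedded key is enabled, 0 otherwise. */
static int demo_key_enabled(const struct demo_tracepoint *tp)
{
	return tp->key.enabled.counter > 0;
}
```

With the cast-free macro both objects are initialized at compile time; demo_key_enabled() is only there to make the initial values observable.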
      Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  3. 19 April, 2012 1 commit
    • sh: Fix error synchronising kernel page tables · 8d9a784d
      Committed by Stuart Menefy
      The problem is caused by the interaction of two features in the Linux
      memory management code.
      
      A process's address space is described by a struct mm_struct, and
      every thread has a pointer to the mm it should run in. The exceptions
      to this are kernel threads, which don't have an mm, and so borrow
      the mm from the last thread which ran. The system is bootstrapped
      by the initial kernel thread using init's mm (even though init hasn't
      been created yet, its mm is the static init_mm).
      
      The other feature is how the kernel handles the page table which
      describes the portion of the address space which is only visible when
      executing inside the kernel, and which is shared by all threads. On
      the SH4 the only portion of the kernel's address space which is described
      using the page table is called P3, from 0xc0000000 to 0xdfffffff. This
      portion of the address space is divided into three:
        - mappings for dma_alloc_coherent()
        - mappings for vmalloc() and ioremap()
        - fixmap mappings, primarily used in copy_user_pages() to create
          kernel mappings of user pages with the correct cache colour.
      
      To optimise the TLB miss handler we don't want to add an additional
      condition which checks whether the faulting address is in the user or
      the kernel portion of the address space, and so all page tables have a
      common portion which describes the kernel part of the address
      space. As the SH4 uses a two level page table, only the kernel portion
      of the first level page table (the pgd entries) is duplicated. These all
      point to the same second level entries (the ptes), and so no memory
      is wasted.
      
      The reference page table for the kernel is called the swapper_pg_dir,
      and when a new page table is created for a new process the kernel
      portion of the page table is copied from swapper_pg_dir. This works
      fine when changes only occur in the second level of the kernel's page
      table, or the first level entries are created before any new user
      processes. However if a change occurs to the first level of the page
      table, and there are existing processes which don't have this entry in
      their page table, this new entry needs to be added. This is done on
      demand: when the kernel accesses a P3 address which isn't mapped using
      the current page table, the code in vmalloc_fault() copies the entry
      from the reference page table (swapper_pg_dir) into the current
      process's page table.
      
      The bug which this patch addresses is that the code in vmalloc_fault()
      was not copying entries for addresses which fell in the
      dma_alloc_coherent() portion of the address space, and it should have
      been copying entries for any P3 address.
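
The on-demand copy and the range check at fault can be modelled in user space. This is a minimal sketch, not the kernel's vmalloc_fault(): the P3 bounds come from the text above, while DEMO_VMALLOC_START, DEMO_PGDIR_SHIFT, the table size and every demo_* name are assumptions made only for illustration.

```c
#include <stdint.h>

#define DEMO_P3_START       0xc0000000u  /* start of P3, from the text */
#define DEMO_P3_END         0xe0000000u  /* one past 0xdfffffff */
#define DEMO_VMALLOC_START  0xc8000000u  /* hypothetical start of vmalloc inside P3 */
#define DEMO_PGDIR_SHIFT    22           /* assumed: 4 MiB per first level entry */
#define DEMO_PTRS_PER_PGD   1024         /* assumed: 32-bit space / 4 MiB */

/* A first level entry; 0 means "not present" in this model. */
typedef uintptr_t demo_pgd_t;

static unsigned int demo_pgd_index(uint32_t address)
{
	return address >> DEMO_PGDIR_SHIFT;
}

/* Range check modelling the bug: the dma_alloc_coherent() area is skipped. */
static int demo_in_range_old(uint32_t address)
{
	return address >= DEMO_VMALLOC_START && address < DEMO_P3_END;
}

/* Range check modelling the fix: any P3 address qualifies. */
static int demo_in_range_new(uint32_t address)
{
	return address >= DEMO_P3_START && address < DEMO_P3_END;
}

/*
 * Copy the first level entry covering 'address' from the reference table
 * into the current table. Returns 0 if the fault was resolved, -1 if the
 * address is outside the checked range or the reference has no entry.
 */
static int demo_vmalloc_fault(demo_pgd_t *current_pgd,
			      const demo_pgd_t *reference_pgd,
			      uint32_t address,
			      int (*in_range)(uint32_t))
{
	unsigned int idx;

	if (!in_range(address))
		return -1;

	idx = demo_pgd_index(address);
	if (reference_pgd[idx] == 0)
		return -1;

	current_pgd[idx] = reference_pgd[idx];
	return 0;
}
```

In this model a fault at 0xc0001000, below the hypothetical vmalloc start, is left unresolved by demo_in_range_old and resolved by demo_in_range_new, which mirrors the behaviour change the commit message describes.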
      
      Why we hadn't seen this before, and what made this hard to reproduce,
      is that normally the kernel will have called dma_alloc_coherent(), and
      accessed the memory mapping created, before any user process
      runs. Typically drivers such as USB or SATA will have created and used
      mappings of this type during the kernel initialisation, when probing
      for the attached devices, before init runs. Ethernet is slightly
      different, as it normally only creates and accesses
      dma_alloc_coherent() mappings when the network is brought up, but if
      kernel level IP configuration is used this will also occur before any
      user space process runs. So the first reproduction of this problem
      which we saw occurred when USB and SATA were removed from the
      kernel, and Ethernet was then brought up from user space using ifconfig.
      I'd like to thank Joseph Bormolini who did the hard work reducing the
      problem to these simple-to-reproduce criteria.
      
      In your case the situation is slightly different, and turns out to
      depend on the exact kernel configuration (which we had) and your
      ramdisk contents (which we didn't - hence the need for some assumptions).
      
      In this case the problem is a side effect of kernel level module
      loading. Kernel subsystems sometimes trigger the load of kernel
      modules directly, for example the crypto subsystem tries to load the
      cryptomgr and MTD tries to load modules for Flash partitioning if
      these are not built into the kernel. This is done by the kernel
      creating a user process which runs insmod to try and load the
      appropriate module.
      
      In order for this to cause problems the system must be running with an
      initrd or initramfs, which contains an insmod executable - if the
      kernel can't find an insmod to run, no user process is created, and
      the problem doesn't occur.  If an insmod is found, a process is
      created to run it, which will inherit the kernel portion of the
      swapper_pg_dir first level page table. It doesn't matter whether the
      insmod is successful or not, but when the kernel scheduler context
      switches back to the kernel initialisation thread, the insmod's mm is
      'borrowed' by the kernel thread, as it doesn't have an address space
      of its own. (Reference counting is used to ensure this mm is not
      destroyed, even though the user process which caused its creation may no
      longer exist.) If this address space doesn't have a first level page
      table entry for the consistent mappings, and a driver tries to access
      such a mapping, we are in the same situation as described above,
      except this time in a kernel thread rather than a user thread
      executing inside the kernel.
      
      See bugzilla: 15425, 15836, 15862, 16106, 16793
      Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  4. 04 April, 2012 1 commit
  5. 30 March, 2012 4 commits
  6. 29 March, 2012 6 commits
  7. 28 March, 2012 10 commits
  8. 26 March, 2012 1 commit
  9. 24 March, 2012 1 commit
    • coredump: remove VM_ALWAYSDUMP flag · 909af768
      Committed by Jason Baron
      The motivation for this patchset was that I was looking at a way for a
      qemu-kvm process to exclude the guest memory from its core dump, which
      can be quite large.  There are already a number of filter flags in
      /proc/<pid>/coredump_filter, however, these allow one to specify 'types'
      of kernel memory, not specific address ranges (which is needed in this
      case).
      
      Since there are no more vma flags available, the first patch eliminates
      the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
      the kernel to mark vdso and vsyscall pages.  However, it is simple
      enough to check if a vma covers a vdso or vsyscall page without the need
      for this flag.
      
      The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
      'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
      'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
      continue to work the same as before unless 'MADV_DONTDUMP' is set on the
      region.
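
From userspace, the new flags are passed to madvise(2). A minimal sketch follows, assuming a Linux system whose kernel supports MADV_DONTDUMP; the helper names demo_alloc_undumped() and demo_restore_dumping() are hypothetical and not part of this patchset or of qemu.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and MADV_* on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Older libc headers may lack the constants; values match the Linux UAPI. */
#ifndef MADV_DONTDUMP
#define MADV_DONTDUMP 16
#endif
#ifndef MADV_DODUMP
#define MADV_DODUMP 17
#endif

/*
 * Map 'len' bytes of anonymous memory and ask the kernel to leave the
 * region out of core dumps. Returns NULL on failure.
 */
void *demo_alloc_undumped(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	if (madvise(p, len, MADV_DONTDUMP) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}

/* Undo MADV_DONTDUMP so the region is dumped again. Returns 0 on success. */
int demo_restore_dumping(void *p, size_t len)
{
	return madvise(p, len, MADV_DODUMP);
}
```

A process holding a large buffer it never wants in a core file, such as guest memory, could allocate it through a helper like demo_alloc_undumped() and leave the rest of its address space governed by the usual coredump filters.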
      
      The qemu code which implements this feature is at:
      
        http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch
      
      In my testing the qemu core dump shrank from 383MB to 13MB with this
      patch.
      
      I also believe that the 'MADV_DONTDUMP' flag might be useful for
      security sensitive apps, which might want to select which areas are
      dumped.
      
      This patch:
      
      The VM_ALWAYSDUMP flag is currently used by the coredump code to
      indicate that a vma is part of a vsyscall or vdso section.  However, we
      can determine if a vma is in one of these sections by checking it against
      the gate_vma and checking for a non-NULL return value from
      arch_vma_name().  This frees a valuable vma bit.
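
The replacement check can be sketched with mock types. This is a user-space model only: struct demo_vma, demo_gate_vma and demo_arch_vma_name() are hypothetical stand-ins for the kernel's vm_area_struct, gate_vma and arch_vma_name(), and the address values are arbitrary.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct vm_area_struct. */
struct demo_vma {
	unsigned long start;
	unsigned long end;
	const char *arch_name;  /* what arch_vma_name() would report, or NULL */
};

/* Stand-in for the gate vma that covers the vsyscall page. */
static struct demo_vma demo_gate_vma = { 0xffffe000UL, 0xfffff000UL, NULL };

/* Stand-in for arch_vma_name(): non-NULL for special arch mappings. */
static const char *demo_arch_vma_name(const struct demo_vma *vma)
{
	return vma->arch_name;
}

/*
 * Decide whether a vma must always be dumped, using only the two checks
 * named in the commit message instead of a dedicated vma flag.
 */
static int demo_always_dump_vma(const struct demo_vma *vma)
{
	/* vsyscall mapping: identified by comparing against the gate vma. */
	if (vma == &demo_gate_vma)
		return 1;

	/* vdso or similar: the architecture gives the mapping a name. */
	if (demo_arch_vma_name(vma) != NULL)
		return 1;

	return 0;
}
```

Because the decision is derived from information the kernel already has, no per-vma bit needs to be reserved for it.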
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Acked-by: Roland McGrath <roland@hack.frob.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 21 March, 2012 1 commit
  11. 20 March, 2012 1 commit
  12. 15 March, 2012 1 commit