1. 17 12月, 2012 1 次提交
  2. 12 12月, 2012 1 次提交
    • A
      mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB · 42d7395f
      Andi Kleen 提交于
      There was some desire in large applications using MAP_HUGETLB or
      SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
      others.  This is useful together with NUMA policy: use 2MB interleaving
      on some mappings, but 1GB on local mappings.
      
      This patch extends the IPC/SHM syscall interfaces slightly to allow
      specifying the page size.
      
      It borrows some upper bits in the existing flag arguments and allows
      encoding the log of the desired page size in addition to the *_HUGETLB
      flag.  When 0 is specified the default size is used, this makes the
      change fully compatible.
      
      Extending the internal hugetlb code to handle this is straight forward.
      Instead of a single mount it just keeps an array of them and selects the
      right mount based on the specified page size.  When no page size is
      specified it uses the mount of the default page size.
      
      The change is not visible in /proc/mounts because internal mounts don't
      appear there.  It also has very little overhead: the additional mounts
      just consume a super block, but not more memory when not used.
      
      I also exported the new flags to the user headers (they were previously
      under __KERNEL__).  Right now only symbols for x86 and some other
      architecture for 1GB and 2MB are defined.  The interface should already
      work for all other architectures though.  Only architectures that define
      multiple hugetlb sizes actually need it (that is currently x86, tile,
      powerpc).  However tile and powerpc have user configurable hugetlb
      sizes, so it's not easy to add defines.  A program on those
      architectures would need to query sysfs and use the appropiate log2.
      
      [akpm@linux-foundation.org: cleanups]
      [rientjes@google.com: fix build]
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42d7395f
  3. 06 12月, 2012 1 次提交
  4. 29 11月, 2012 9 次提交
  5. 16 11月, 2012 1 次提交
    • J
      TTY: call tty_port_destroy in the rest of drivers · 191c5f10
      Jiri Slaby 提交于
      After commit "TTY: move tty buffers to tty_port", the tty buffers are
      not freed in some drivers. This is because tty_port_destructor is not
      called whenever a tty_port is freed. This was an assumption I counted
      with but was unfortunately untrue. So fix the drivers to fulfil this
      assumption.
      
      To be sure, the TTY buffers (and later some stuff) are gone along with
      the tty_port, we have to call tty_port_destroy at tear-down places.
      This is mostly where the structure containing a tty_port is freed.
      This patch does exactly that -- put tty_port_destroy at those places.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      191c5f10
  6. 14 11月, 2012 1 次提交
  7. 09 11月, 2012 1 次提交
  8. 01 11月, 2012 1 次提交
    • P
      sk-filter: Add ability to get socket filter program (v2) · a8fc9277
      Pavel Emelyanov 提交于
      The SO_ATTACH_FILTER option is set only. I propose to add the get
      ability by using SO_ATTACH_FILTER in getsockopt. To be less
      irritating to eyes the SO_GET_FILTER alias to it is declared. This
      ability is required by checkpoint-restore project to be able to
      save full state of a socket.
      
      There are two issues with getting filter back.
      
      First, kernel modifies the sock_filter->code on filter load, thus in
      order to return the filter element back to user we have to decode it
      into user-visible constants. Fortunately the modification in question
      is interconvertible.
      
      Second, the BPF_S_ALU_DIV_K code modifies the command argument k to
      speed up the run-time division by doing kernel_k = reciprocal(user_k).
      Bad news is that different user_k may result in same kernel_k, so we
      can't get the original user_k back. Good news is that we don't have
      to do it. What we need to is calculate a user2_k so, that
      
        reciprocal(user2_k) == reciprocal(user_k) == kernel_k
      
      i.e. if it's re-loaded back the compiled again value will be exactly
      the same as it was. That said, the user2_k can be calculated like this
      
        user2_k = reciprocal(kernel_k)
      
      with an exception, that if kernel_k == 0, then user2_k == 1.
      
      The optlen argument is treated like this -- when zero, kernel returns
      the amount of sock_fprog elements in filter, otherwise it should be
      large enough for the sock_fprog array.
      
      changes since v1:
      * Declared SO_GET_FILTER in all arch headers
      * Added decode of vlan-tag codes
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8fc9277
  9. 26 10月, 2012 1 次提交
  10. 24 10月, 2012 1 次提交
  11. 21 10月, 2012 1 次提交
  12. 13 10月, 2012 2 次提交
    • J
      vfs: define struct filename and have getname() return it · 91a27b2a
      Jeff Layton 提交于
      getname() is intended to copy pathname strings from userspace into a
      kernel buffer. The result is just a string in kernel space. It would
      however be quite helpful to be able to attach some ancillary info to
      the string.
      
      For instance, we could attach some audit-related info to reduce the
      amount of audit-related processing needed. When auditing is enabled,
      we could also call getname() on the string more than once and not
      need to recopy it from userspace.
      
      This patchset converts the getname()/putname() interfaces to return
      a struct instead of a string. For now, the struct just tracks the
      string in kernel space and the original userland pointer for it.
      
      Later, we'll add other information to the struct as it becomes
      convenient.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      91a27b2a
    • A
      alpha: switch to saner kernel_execve() semantics · 5522be6a
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5522be6a
  13. 12 10月, 2012 4 次提交
    • A
      alpha: get rid of switch_stack argument of do_work_pending() · cb450766
      Al Viro 提交于
      ... and now the asm glue side of that.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cb450766
    • A
      alpha: don't bother passing switch_stack separately from regs · d9d0738a
      Al Viro 提交于
      It's needed only in setup_sigcontext() and it's always reg - <constant>;
      no point passing it all way down through the call chain.  This is just
      the signal.c side of that stuff; next will come the asm glue one...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d9d0738a
    • A
      alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c · 6972d6f2
      Al Viro 提交于
      Turn the slow side of work_pending into C function, including all
      the looping.  What we get out of that:
      	* we do _not_ call get_signal_to_deliver() with IRQs disabled
      anymore
      	* no need to save/restore volatiles on each pass if there
      turns to be more than one (unlikely, but still)
      	* all double-restart prevention is in C now.
      	* glue gets simpler.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6972d6f2
    • A
      alpha: simplify TIF_NEED_RESCHED handling · 7721d3c2
      Al Viro 提交于
      In case we have both NEED_RESCHED and SIGPENDING/NOTIFY_RESUME,
      handle the latter first.  We'll get to original priorities in
      the next commit, but now that allows to simplify the treatment
      of NEED_RESCHED-only case nicely.  Namely, now there no need to
      preserve the data for restarts across the call of schedule() in
      $work_resched; we can get there only if we had either returned
      from syscall without SIGPENDING (in which case we should've
      had no restart-worthy return value and want no restarts) or
      already got through do_notify_resume() call (in which case we
      want no restarts anymore).  So we can just slap 0 into $19
      instead of preserving it (and $20).
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7721d3c2
  14. 09 10月, 2012 1 次提交
    • K
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov 提交于
      A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
      currently it lost original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes reserved_vm counter from mm_struct.  Seems like nobody
      cares about it, it does not exported into userspace directly, it only
      reduces total_vm showed in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      314e51b9
  15. 04 10月, 2012 1 次提交
  16. 03 10月, 2012 1 次提交
  17. 01 10月, 2012 6 次提交
  18. 28 9月, 2012 1 次提交
    • D
      Make most arch asm/module.h files use asm-generic/module.h · 786d35d4
      David Howells 提交于
      Use the mapping of Elf_[SPE]hdr, Elf_Addr, Elf_Sym, Elf_Dyn, Elf_Rel/Rela,
      ELF_R_TYPE() and ELF_R_SYM() to either the 32-bit version or the 64-bit version
      into asm-generic/module.h for all arches bar MIPS.
      
      Also, use the generic definition mod_arch_specific where possible.
      
      To this end, I've defined three new config bools:
      
       (*) HAVE_MOD_ARCH_SPECIFIC
      
           Arches define this if they don't want to use the empty generic
           mod_arch_specific struct.
      
       (*) MODULES_USE_ELF_RELA
      
           Arches define this if their modules can contain RELA records.  This causes
           the Elf_Rela mapping to be emitted and allows apply_relocate_add() to be
           defined by the arch rather than have the core emit an error message.
      
       (*) MODULES_USE_ELF_REL
      
           Arches define this if their modules can contain REL records.  This causes
           the Elf_Rel mapping to be emitted and allows apply_relocate() to be
           defined by the arch rather than have the core emit an error message.
      
      Note that it is possible to allow both REL and RELA records: m68k and mips are
      two arches that do this.
      
      With this, some arch asm/module.h files can be deleted entirely and replaced
      with a generic-y marker in the arch Kbuild file.
      
      Additionally, I have removed the bits from m32r and score that handle the
      unsupported type of relocation record as that's now handled centrally.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      786d35d4
  19. 27 9月, 2012 2 次提交
  20. 23 9月, 2012 2 次提交
    • F
      alpha: Add missing RCU idle APIs on idle loop · 4c94cada
      Frederic Weisbecker 提交于
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the Alpha's idle loop.
      Reported-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Tested-by: NMichael Cree <mcree@orcon.net.nz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: alpha <linux-alpha@vger.kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org> # 3.3+
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      4c94cada
    • F
      alpha: Fix preemption handling in idle loop · 6a6c0272
      Frederic Weisbecker 提交于
      cpu_idle() is called on the boot CPU by the init code with
      preemption disabled. But the cpu_idle() function in alpha
      doesn't handle this when it calls schedule() directly.
      
      Fix it by converting it into schedule_preempt_disabled().
      
      Also disable preemption before calling cpu_idle() from
      secondary CPU entry code to stay consistent with this
      state.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Tested-by: NMichael Cree <mcree@orcon.net.nz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: alpha <linux-alpha@vger.kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      6a6c0272
  21. 21 9月, 2012 1 次提交