1. 11 4月, 2013 1 次提交
  2. 03 4月, 2013 1 次提交
    • P
      x86: remove the x32 syscall bitmask from syscall_get_nr() · 8b4b9f27
      Paul Moore 提交于
      Commit fca460f9 simplified the x32
      implementation by creating a syscall bitmask, equal to 0x40000000, that
      could be applied to x32 syscalls such that the masked syscall number
      would be the same as a x86_64 syscall.  While that patch was a nice
      way to simplify the code, it went a bit too far by adding the mask to
      syscall_get_nr(); returning the masked syscall numbers can cause
      confusion with callers that expect syscall numbers matching the x32
      ABI, e.g. unmasked syscall numbers.
      
      This patch fixes this by simply removing the mask from syscall_get_nr()
      while preserving the other changes from the original commit.  While
      there are several syscall_get_nr() callers in the kernel, most simply
      check that the syscall number is greater than zero, in this case this
      patch will have no effect.  Of those remaining callers, they appear
      to be few, seccomp and ftrace, and from my testing of seccomp without
      this patch the original commit definitely breaks things; the seccomp
      filter does not correctly filter the syscalls due to the difference in
      syscall numbers in the BPF filter and the value from syscall_get_nr().
      Applying this patch restores the seccomp BPF filter functionality on
      x32.
      
      I've tested this patch with the seccomp BPF filters as well as ftrace
      and everything looks reasonable to me; needless to say general usage
      seemed fine as well.
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Link: http://lkml.kernel.org/r/20130215172143.12549.10292.stgit@localhost
      Cc: <stable@vger.kernel.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      8b4b9f27
  3. 25 3月, 2013 1 次提交
  4. 22 3月, 2013 1 次提交
  5. 20 3月, 2013 1 次提交
  6. 18 3月, 2013 1 次提交
  7. 07 3月, 2013 2 次提交
  8. 24 2月, 2013 3 次提交
    • A
      switch lseek to COMPAT_SYSCALL_DEFINE · 561c6731
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      561c6731
    • W
      cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the node · e13fe869
      Wen Congyang 提交于
      When the node is offlined, there is no memory/cpu on the node.  If a
      sleep task runs on a cpu of this node, it will be migrated to the cpu on
      the other node.  So we can clear cpu-to-node mapping.
      
      [akpm@linux-foundation.org: numa_clear_node() and numa_set_node() can no longer be __cpuinit]
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e13fe869
    • W
      memory-hotplug: common APIs to support page tables hot-remove · ae9aae9e
      Wen Congyang 提交于
      When memory is removed, the corresponding pagetables should alse be
      removed.  This patch introduces some common APIs to support vmemmap
      pagetable and x86_64 architecture direct mapping pagetable removing.
      
      All pages of virtual mapping in removed memory cannot be freed if some
      pages used as PGD/PUD include not only removed memory but also other
      memory.  So this patch uses the following way to check whether a page
      can be freed or not.
      
      1) When removing memory, the page structs of the removed memory are
         filled with 0FD.
      
      2) All page structs are filled with 0xFD on PT/PMD, PT/PMD can be
         cleared.  In this case, the page used as PT/PMD can be freed.
      
      For direct mapping pages, update direct_pages_count[level] when we freed
      their pagetables.  And do not free the pages again because they were
      freed when offlining.
      
      For vmemmap pages, free the pages and their pagetables.
      
      For larger pages, do not split them into smaller ones because there is
      no way to know if the larger page has been split.  As a result, there is
      no way to decide when to split.  We deal the larger pages in the
      following way:
      
      1) For direct mapped pages, all the pages were freed when they were
         offlined.  And since menmory offline is done section by section, all
         the memory ranges being removed are aligned to PAGE_SIZE.  So only need
         to deal with unaligned pages when freeing vmemmap pages.
      
      2) For vmemmap pages being used to store page_struct, if part of the
         larger page is still in use, just fill the unused part with 0xFD.  And
         when the whole page is fulfilled with 0xFD, then free the larger page.
      
      [akpm@linux-foundation.org: fix typo in comment]
      [tangchen@cn.fujitsu.com: do not calculate direct mapping pages when freeing vmemmap pagetables]
      [tangchen@cn.fujitsu.com: do not free direct mapping pages twice]
      [tangchen@cn.fujitsu.com: do not free page split from hugepage one by one]
      [tangchen@cn.fujitsu.com: do not split pages when freeing pagetable pages]
      [akpm@linux-foundation.org: use pmd_page_vaddr()]
      [akpm@linux-foundation.org: fix used-uninitialised bug]
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ae9aae9e
  9. 23 2月, 2013 1 次提交
  10. 20 2月, 2013 2 次提交
  11. 16 2月, 2013 1 次提交
  12. 14 2月, 2013 4 次提交
  13. 13 2月, 2013 5 次提交
    • M
      x86/mm: Check if PUD is large when validating a kernel address · 0ee364eb
      Mel Gorman 提交于
      A user reported the following oops when a backup process reads
      /proc/kcore:
      
       BUG: unable to handle kernel paging request at ffffbb00ff33b000
       IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
       [...]
      
       Call Trace:
        [<ffffffff811b8aaa>] read_kcore+0x17a/0x370
        [<ffffffff811ad847>] proc_reg_read+0x77/0xc0
        [<ffffffff81151687>] vfs_read+0xc7/0x130
        [<ffffffff811517f3>] sys_read+0x53/0xa0
        [<ffffffff81449692>] system_call_fastpath+0x16/0x1b
      
      Investigation determined that the bug triggered when reading
      system RAM at the 4G mark. On this system, that was the first
      address using 1G pages for the virt->phys direct mapping so the
      PUD is pointing to a physical address, not a PMD page.
      
      The problem is that the page table walker in kern_addr_valid() is
      not checking pud_large() and treats the physical address as if
      it was a PMD.  If it happens to look like pmd_none then it'll
      silently fail, probably returning zeros instead of real data. If
      the data happens to look like a present PMD though, it will be
      walked resulting in the oops above.
      
      This patch adds the necessary pud_large() check.
      
      Unfortunately the problem was not readily reproducible and now
      they are running the backup program without accessing
      /proc/kcore so the patch has not been validated but I think it
      makes sense.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.coM>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: stable@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0ee364eb
    • K
      X86: Handle Hyper-V vmbus interrupts as special hypervisor interrupts · bc2b0331
      K. Y. Srinivasan 提交于
      Starting with win8, vmbus interrupts can be delivered on any VCPU in the guest
      and furthermore can be concurrently active on multiple VCPUs. Support this
      interrupt delivery model by setting up a separate IDT entry for Hyper-V vmbus.
      interrupts. I would like to thank Jan Beulich <JBeulich@suse.com> and
      Thomas Gleixner <tglx@linutronix.de>, for their help.
      
      In this version of the patch, based on the feedback, I have merged the IDT
      vector for Xen and Hyper-V and made the necessary adjustments. Furhermore,
      based on Jan's feedback I have added the necessary compilation switches.
      Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
      Link: http://lkml.kernel.org/r/1359940959-32168-3-git-send-email-kys@microsoft.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      bc2b0331
    • H
      x86, doc: Clarify the use of asm("%edx") in uaccess.h · ff52c3b0
      H. Peter Anvin 提交于
      Put in a comment that explains that the use of asm("%edx") in
      uaccess.h doesn't actually necessarily mean %edx alone.
      
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: H. J. Lu <hjl.tools@gmail.com>
      Link: http://lkml.kernel.org/r/511ACDFB.1050707@zytor.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      ff52c3b0
    • S
      tracing/syscalls: Allow archs to ignore tracing compat syscalls · f431b634
      Steven Rostedt 提交于
      The tracing of ia32 compat system calls has been a bit of a pain as they
      use different system call numbers than the 64bit equivalents.
      
      I wrote a simple 'lls' program that lists files. I compiled it as a i686
      ELF binary and ran it under a x86_64 box. This is the result:
      
      echo 0 > /debug/tracing/tracing_on
      echo 1 > /debug/tracing/events/syscalls/enable
      echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on
      
      grep lls /debug/tracing/trace
      
      [.. skipping calls before TS_COMPAT is set ...]
      
                   lls-1127  [005] d...   936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
                   lls-1127  [005] d...   936.409190: sys_recvfrom -> 0x8a77000
                   lls-1127  [005] d...   936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
                   lls-1127  [005] d...   936.409215: sys_lgetxattr -> 0xf76ff000
                   lls-1127  [005] d...   936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
                   lls-1127  [005] d...   936.409228: sys_dup2 -> 0xfffffffffffffffe
                   lls-1127  [005] d...   936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
                   lls-1127  [005] d...   936.409242: sys_newfstat -> 0x3
                   lls-1127  [005] d...   936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
                   lls-1127  [005] d...   936.409244: sys_removexattr -> 0x0
                   lls-1127  [005] d...   936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
                   lls-1127  [005] d...   936.409248: sys_lgetxattr -> 0xf76e5000
                   lls-1127  [005] d...   936.409248: sys_newlstat(filename: 3, statbuf: 19614)
                   lls-1127  [005] d...   936.409249: sys_newlstat -> 0x0
                   lls-1127  [005] d...   936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
                   lls-1127  [005] d...   936.409279: sys_newfstat -> 0x3
                   lls-1127  [005] d...   936.409279: sys_close(fd: 3)
                   lls-1127  [005] d...   936.421550: sys_close -> 0x200
                   lls-1127  [005] d...   936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
                   lls-1127  [005] d...   936.421560: sys_removexattr -> 0x0
                   lls-1127  [005] d...   936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
                   lls-1127  [005] d...   936.421574: sys_lgetxattr -> 0x4d564000
                   lls-1127  [005] d...   936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
                   lls-1127  [005] d...   936.421580: sys_capget -> 0x0
                   lls-1127  [005] d...   936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
                   lls-1127  [005] d...   936.421589: sys_lgetxattr -> 0x4d710000
                   lls-1127  [005] d...   936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
                   lls-1127  [005] d...   936.426141: sys_lgetxattr -> 0x4d713000
                   lls-1127  [005] d...   936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
                   lls-1127  [005] d...   936.426146: sys_newlstat -> 0x0
                   lls-1127  [005] d...   936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
      
      Obviously I'm not calling newfstat with a fd of 4d55b085. The calls are
      obviously incorrect, and confusing.
      
      Other efforts have been made to fix this:
      
      https://lkml.org/lkml/2012/3/26/367
      
      But the real solution is to rewrite the syscall internals and come up
      with a fixed solution. One that doesn't require all the kluge that the
      current solution has.
      
      Thus for now, instead of outputting incorrect data, simply ignore them.
      With this patch the changes now have:
      
       #> grep lls /debug/tracing/trace
       #>
      
      Compat system calls simply are not traced. If users need compat
      syscalls, then they should just use the raw syscall tracepoints.
      
      For an architecture to make their compat syscalls ignored, it must
      define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also
      define an arch_trace_is_compat_syscall() function that will return true
      if the current task should ignore tracing the syscall.
      
      I want to stress that this change does not affect actual syscalls in any
      way, shape or form. It is only used within the tracing system and
      doesn't interfere with the syscall logic at all. The changes are
      consolidated nicely into trace_syscalls.c and asm/ftrace.h.
      
      I had to make one small modification to asm/thread_info.h and that was
      to remove the include of asm/ftrace.h. As asm/ftrace.h required the
      current_thread_info() it was causing include hell. That include was
      added back in 2008 when the function graph tracer was added:
      
       commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"
      
      It does not need to be included there.
      
      Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.homeAcked-by: NH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f431b634
    • H
      x86, mm: Redesign get_user with a __builtin_choose_expr hack · 3578baae
      H. Peter Anvin 提交于
      Instead of using a bitfield, use an odd little trick using typeof,
      __builtin_choose_expr, and sizeof.  __builtin_choose_expr is
      explicitly defined to not convert its type (its argument is required
      to be a constant expression) so this should be well-defined.
      
      The code is still not 100% preturbation-free versus the baseline
      before 64-bit get_user(), but the differences seem to be very small,
      mostly related to padding and to gcc deciding when to spill registers.
      
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: H. J. Lu <hjl.tools@gmail.com>
      Link: http://lkml.kernel.org/r/511A8922.6050908@zytor.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      3578baae
  14. 12 2月, 2013 4 次提交
  15. 10 2月, 2013 3 次提交
    • L
      x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok flag · 27be4570
      Len Brown 提交于
      Remove 32-bit x86 a cmdline param "no-hlt",
      and the cpuinfo_x86.hlt_works_ok that it sets.
      
      If a user wants to avoid HLT, then "idle=poll"
      is much more useful, as it avoids invocation of HLT
      in idle, while "no-hlt" failed to do so.
      
      Indeed, hlt_works_ok was consulted in only 3 places.
      
      First, in /proc/cpuinfo where "hlt_bug yes"
      would be printed if and only if the user booted
      the system with "no-hlt" -- as there was no other code
      to set that flag.
      
      Second, check_hlt() would not invoke halt() if "no-hlt"
      were on the cmdline.
      
      Third, it was consulted in stop_this_cpu(), which is invoked
      by native_machine_halt()/reboot_interrupt()/smp_stop_nmi_callback() --
      all cases where the machine is being shutdown/reset.
      The flag was not consulted in the more frequently invoked
      play_dead()/hlt_play_dead() used in processor offline and suspend.
      
      Since Linux-3.0 there has been a run-time notice upon "no-hlt" invocations
      indicating that it would be removed in 2012.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: x86@kernel.org
      27be4570
    • L
      x86 idle: remove mwait_idle() and "idle=mwait" cmdline param · 69fb3676
      Len Brown 提交于
      mwait_idle() is a C1-only idle loop intended to be more efficient
      than HLT, starting on Pentium-4 HT-enabled processors.
      
      But mwait_idle() has been replaced by the more general
      mwait_idle_with_hints(), which handles both C1 and deeper C-states.
      ACPI processor_idle and intel_idle use only mwait_idle_with_hints(),
      and no longer use mwait_idle().
      
      Here we simplify the x86 native idle code by removing mwait_idle(),
      and the "idle=mwait" bootparam used to invoke it.
      
      Since Linux 3.0 there has been a boot-time warning when "idle=mwait"
      was invoked saying it would be removed in 2012.  This removal
      was also noted in the (now removed:-) feature-removal-schedule.txt.
      
      After this change, kernels configured with
      (CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware
      that supports MWAIT will simply use HLT.  If MWAIT is desired
      on those systems, cpuidle and the cpuidle drivers above
      can be enabled.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: x86@kernel.org
      69fb3676
    • L
      xen idle: make xen-specific macro xen-specific · 6a377ddc
      Len Brown 提交于
      This macro is only invoked by Xen,
      so make its definition specific to Xen.
      
      > set_pm_idle_to_default()
      < xen_set_default_idle()
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: xen-devel@lists.xensource.com
      6a377ddc
  16. 09 2月, 2013 3 次提交
    • L
      intel_idle: remove assumption of one C-state per MWAIT flag · e022e7eb
      Len Brown 提交于
      Remove the assumption that cstate_tables are
      indexed by MWAIT flag values.  Each entry
      identifies itself via its own flags value.
      This change is needed to support multiple states
      that share the same MWAIT flags.
      
      Note that this can have an effect on what state is described
      by 'N' on cmdline intel_idle.max_cstate=N on some systems.
      
      intel_idle.max_cstate=0 still disables the driver
      intel_idle.max_cstate=1 still results in just C1(E)
      However, "place holders" in the sparse C-state name-space
      (eg. Atom) have been removed.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      e022e7eb
    • L
      intel_idle: remove use and definition of MWAIT_MAX_NUM_CSTATES · 137ecc77
      Len Brown 提交于
      Cosmetic only.
      
      Replace use of MWAIT_MAX_NUM_CSTATES with CPUIDLE_STATE_MAX.
      They are both 8, so this patch has no functional change.
      
      The reason to change is that intel_idle will soon be able
      to export more than the 8 "major" states supported by MWAIT.
      When we hit that limit, it is important to know
      where the limit comes from.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      137ecc77
    • L
      tools/power turbostat: decode MSR_IA32_POWER_CTL · 67920418
      Len Brown 提交于
      When verbose is enabled, print the C1E-Enable
      bit in MSR_IA32_POWER_CTL.
      
      also delete some redundant tests on the verbose variable.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      67920418
  17. 08 2月, 2013 1 次提交
  18. 07 2月, 2013 1 次提交
  19. 06 2月, 2013 1 次提交
  20. 04 2月, 2013 3 次提交