1. 31 7月, 2008 2 次提交
  2. 29 7月, 2008 1 次提交
    • L
      cpu masks: optimize and clean up cpumask_of_cpu() · e56b3bc7
      Linus Torvalds 提交于
      Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.
      
      Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
      creating a huge array of constant bitmasks, realize that the zero words
      can be shared.
      
      In other words, on a 64-bit architecture, we only ever need 64 of these
      arrays - with a different bit set in one single world (with enough zero
      words around it so that we can create any bitmask by just offsetting in
      that big array). And then we just put enough zeroes around it that we
      can point every single cpumask to be one of those things.
      
      So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each,
      with one bit set in each array - 2MB memory total), we have exactly 64
      arrays instead, each 8k bits in size (64kB total).
      
      And then we just point cpumask(n) to the right position (which we can
      calculate dynamically). Once we have the right arrays, getting
      "cpumask(n)" ends up being:
      
        static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
        {
                const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
                p -= cpu / BITS_PER_LONG;
                return (const cpumask_t *)p;
        }
      
      This brings other advantages and simplifications as well:
      
       - we are not wasting memory that is just filled with a single bit in
         various different places
      
       - we don't need all those games to re-create the arrays in some dense
         format, because they're already going to be dense enough.
      
      if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory
      is a non-issue (especially since by doing this "overlapping" trick we
      probably get better cache behaviour anyway).
      
      [ mingo@elte.hu:
      
        Converted Linus's mails into a commit. See:
      
           http://lkml.org/lkml/2008/7/27/156
           http://lkml.org/lkml/2008/7/28/320
      
        Also applied a family filter - which also has the side-effect of leaving
        out the bits where Linus calls me an idio... Oh, never mind ;-)
      ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Mike Travis <travis@sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e56b3bc7
  3. 28 7月, 2008 1 次提交
  4. 27 7月, 2008 5 次提交
    • Y
      x86: add apic probe for genapic 64bit - fix · d25ae38b
      Yinghai Lu 提交于
      intr_remapping_enabled get assigned later, so need to check that
      in setup_apic_routing
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d25ae38b
    • H
      kexec jump: save/restore device state · 89081d17
      Huang Ying 提交于
      This patch implements devices state save/restore before after kexec.
      
      This patch together with features in kexec_jump patch can be used for
      following:
      
      - A simple hibernation implementation without ACPI support.  You can kexec a
        hibernating kernel, save the memory image of original system and shutdown
        the system.  When resuming, you restore the memory image of original system
        via ordinary kexec load then jump back.
      
      - Kernel/system debug through making system snapshot.  You can make system
        snapshot, jump back, do some thing and make another system snapshot.
      
      - Cooperative multi-kernel/system.  With kexec jump, you can switch between
        several kernels/systems quickly without boot process except the first time.
        This appears like swap a whole kernel/system out/in.
      
      - A general method to call program in physical mode (paging turning
        off). This can be used to invoke BIOS code under Linux.
      
      The following user-space tools can be used with kexec jump:
      
      - kexec-tools needs to be patched to support kexec jump. The patches
        and the precompiled kexec can be download from the following URL:
             source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
             patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
             binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
      
      - makedumpfile with patches are used as memory image saving tool, it
        can exclude free pages from original kernel memory image file. The
        patches and the precompiled makedumpfile can be download from the
        following URL:
             source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
             patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
             binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10
      
      - An initramfs image can be used as the root file system of kexeced
        kernel. An initramfs image built with "BuildRoot" can be downloaded
        from the following URL:
             initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
        All user space tools above are included in the initramfs image.
      
      Usage example of simple hibernation:
      
      1. Compile and install patched kernel with following options selected:
      
      CONFIG_X86_32=y
      CONFIG_RELOCATABLE=y
      CONFIG_KEXEC=y
      CONFIG_CRASH_DUMP=y
      CONFIG_PM=y
      CONFIG_HIBERNATION=y
      CONFIG_KEXEC_JUMP=y
      
      2. Build an initramfs image contains kexec-tool and makedumpfile, or
         download the pre-built initramfs image, called rootfs.gz in
         following text.
      
      3. Prepare a partition to save memory image of original kernel, called
         hibernating partition in following text.
      
      4. Boot kernel compiled in step 1 (kernel A).
      
      5. In the kernel A, load kernel compiled in step 1 (kernel B) with
         /sbin/kexec. The shell command line can be as follow:
      
         /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000
           --mem-max=0xffffff --initrd=rootfs.gz
      
      6. Boot the kernel B with following shell command line:
      
         /sbin/kexec -e
      
      7. The kernel B will boot as normal kexec. In kernel B the memory
         image of kernel A can be saved into hibernating partition as
         follow:
      
         jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='`
         echo $jump_back_entry > kexec_jump_back_entry
         cp /proc/vmcore dump.elf
      
         Then you can shutdown the machine as normal.
      
      8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
         root file system.
      
      9. In kernel C, load the memory image of kernel A as follow:
      
         /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf
      
      10. Jump back to the kernel A as follow:
      
         /sbin/kexec -e
      
         Then, kernel A is resumed.
      
      Implementation point:
      
      To support jumping between two kernels, before jumping to (executing)
      the new kernel and jumping back to the original kernel, the devices
      are put into quiescent state, and the state of devices and CPU is
      saved. After jumping back from kexeced kernel and jumping to the new
      kernel, the state of devices and CPU are restored accordingly. The
      devices/CPU state save/restore code of software suspend is called to
      implement corresponding function.
      
      Known issues:
      
      - Because the segment number supported by sys_kexec_load is limited,
        hibernation image with many segments may not be load. This is
        planned to be eliminated by adding a new flag to sys_kexec_load to
        make a image can be loaded with multiple sys_kexec_load invoking.
      
      Now, only the i386 architecture is supported.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89081d17
    • H
      kexec jump · 3ab83521
      Huang Ying 提交于
      This patch provides an enhancement to kexec/kdump.  It implements the
      following features:
      
      - Backup/restore memory used by the original kernel before/after
        kexec.
      
      - Save/restore CPU state before/after kexec.
      
      The features of this patch can be used as a general method to call program in
      physical mode (paging turning off).  This can be used to call BIOS code under
      Linux.
      
      kexec-tools needs to be patched to support kexec jump. The patches and
      the precompiled kexec can be download from the following URL:
      
             source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
             patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
             binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
      
      Usage example of calling some physical mode code and return:
      
      1. Compile and install patched kernel with following options selected:
      
      CONFIG_X86_32=y
      CONFIG_KEXEC=y
      CONFIG_PM=y
      CONFIG_KEXEC_JUMP=y
      
      2. Build patched kexec-tool or download the pre-built one.
      
      3. Build some physical mode executable named such as "phy_mode"
      
      4. Boot kernel compiled in step 1.
      
      5. Load physical mode executable with /sbin/kexec. The shell command
         line can be as follow:
      
         /sbin/kexec --load-preserve-context --args-none phy_mode
      
      6. Call physical mode executable with following shell command line:
      
         /sbin/kexec -e
      
      Implementation point:
      
      To support jumping without reserving memory.  One shadow backup page (source
      page) is allocated for each page used by kexeced code image (destination
      page).  When do kexec_load, the image of kexeced code is loaded into source
      pages, and before executing, the destination pages and the source pages are
      swapped, so the contents of destination pages are backupped.  Before jumping
      to the kexeced code image and after jumping back to the original kernel, the
      destination pages and the source pages are swapped too.
      
      C ABI (calling convention) is used as communication protocol between
      kernel and called code.
      
      A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
      indicate that the loaded kernel image is used for jumping back.
      
      Now, only the i386 architecture is supported.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ab83521
    • A
      x86 calgary: fix handling of devices that aren't behind the Calgary · 1956a96d
      Alexis Bruemmer 提交于
      The calgary code can give drivers addresses above 4GB which is very bad
      for hardware that is only 32bit DMA addressable.
      
      With this patch, the calgary code sets the global dma_ops to swiotlb or
      nommu properly, and the dma_ops of devices behind the Calgary/CalIOC2
      to calgary_dma_ops.  So the calgary code can handle devices safely that
      aren't behind the Calgary/CalIOC2.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAlexis Bruemmer <alexisb@us.ibm.com>
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1956a96d
    • F
      dma-mapping: add the device argument to dma_mapping_error() · 8d8bb39b
      FUJITA Tomonori 提交于
      Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
      architecture does:
      
      This enables us to cleanly fix the Calgary IOMMU issue that some devices
      are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
      
      I think that per-device dma_mapping_ops support would be also helpful for
      KVM people to support PCI passthrough but Andi thinks that this makes it
      difficult to support the PCI passthrough (see the above thread).  So I
      CC'ed this to KVM camp.  Comments are appreciated.
      
      A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
      pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
      NULL, the system-wide dma_ops pointer is used as before.
      
      If it's useful for KVM people, I plan to implement a mechanism to register
      a hook called when a new pci (or dma capable) device is created (it works
      with hot plugging).  It enables IOMMUs to set up an appropriate
      dma_mapping_ops per device.
      
      The major obstacle is that dma_mapping_error doesn't take a pointer to the
      device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
      device.  Note all the POWER IOMMUs use the same dma_mapping_error function
      so this is not a problem for POWER but x86 IOMMUs use different
      dma_mapping_error functions.
      
      The first patch adds the device argument to dma_mapping_error.  The patch
      is trivial but large since it touches lots of drivers and dma-mapping.h in
      all the architecture.
      
      This patch:
      
      dma_mapping_error() doesn't take a pointer to the device unlike other DMA
      operations.  So we can't have dma_mapping_ops per device.
      
      Note that POWER already has dma_mapping_ops per device but all the POWER
      IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
      argument.
      
      [akpm@linux-foundation.org: fix sge]
      [akpm@linux-foundation.org: fix svc_rdma]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix bnx2x]
      [akpm@linux-foundation.org: fix s2io]
      [akpm@linux-foundation.org: fix pasemi_mac]
      [akpm@linux-foundation.org: fix sdhci]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix sparc]
      [akpm@linux-foundation.org: fix ibmvscsi]
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Avi Kivity <avi@qumranet.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8d8bb39b
  5. 26 7月, 2008 10 次提交
    • M
      cpumask: change cpumask_of_cpu_ptr to use new cpumask_of_cpu · 0bc3cc03
      Mike Travis 提交于
        * Replace previous instances of the cpumask_of_cpu_ptr* macros
          with a the new (lvalue capable) generic cpumask_of_cpu().
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0bc3cc03
    • M
      cpumask: put cpumask_of_cpu_map in the initdata section · 6524d938
      Mike Travis 提交于
        * Create the cpumask_of_cpu_map statically in the init data section
          using NR_CPUS but replace it during boot up with one sized by
          nr_cpu_ids (num possible cpus).
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6524d938
    • S
      x64, fpu: fix possible FPU leakage in error conditions · 6ffac1e9
      Suresh Siddha 提交于
      On Thu, Jul 24, 2008 at 03:43:44PM -0700, Linus Torvalds wrote:
      > So how about this patch as a starting point? This is the RightThing(tm) to
      > do regardless, and if it then makes it easier to do some other cleanups,
      > we should do it first. What do you think?
      
      restore_fpu_checking() calls init_fpu() in error conditions.
      
      While this is wrong(as our main intention is to clear the fpu state of
      the thread), this was benign before commit 92d140e2 ("x86: fix taking
      DNA during 64bit sigreturn").
      
      Post commit 92d140e2, live FPU registers may not belong to this
      process at this error scenario.
      
      In the error condition for restore_fpu_checking() (especially during the
      64bit signal return), we are doing init_fpu(), which saves the live FPU
      register state (possibly belonging to some other process context) into
      the thread struct (through unlazy_fpu() in init_fpu()). This is wrong
      and can leak the FPU data.
      
      For the signal handler restore error condition in restore_i387(), clear
      the fpu state present in the thread struct(before ultimately sending a
      SIGSEGV for badframe).
      
      For the paranoid error condition check in math_state_restore(), send a
      SIGSEGV, if we fail to restore the state.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: <stable@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6ffac1e9
    • Y
      x86: mach_summit to summit · e8c48efd
      Yinghai Lu 提交于
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e8c48efd
    • Y
      x86: add setup_ioapic_ids for numaq in x86_quirks · a4dbc34d
      Yinghai Lu 提交于
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a4dbc34d
    • J
      x86, AMD IOMMU: include amd_iommu_last_bdf in device initialization · 3a61ec38
      Joerg Roedel 提交于
      All the values read while searching for amd_iommu_last_bdf are defined as
      inclusive. Let the code handle this value as such. Found by Wei Wang. Thanks
      Wei.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: bhavna.sarathy@amd.com
      Cc: robert.richter@amd.com
      Cc: Wei Wang <wei.wang2@amd.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a61ec38
    • J
      x86 gart: replace to_pages macro with iommu_num_pages · 87e39ea5
      Joerg Roedel 提交于
      This patch removes the to_pages macro from x86 GART code and calls the generic
      iommu_num_pages function instead.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: bhavna.sarathy@amd.com
      Cc: robert.richter@amd.com
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      87e39ea5
    • J
      x86, AMD IOMMU: replace to_pages macro with iommu_num_pages · a8132e5f
      Joerg Roedel 提交于
      This patch removes the to_pages macro from AMD IOMMU code and calls the generic
      iommu_num_pages function instead.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: bhavna.sarathy@amd.com
      Cc: robert.richter@amd.com
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a8132e5f
    • C
      calgary iommu: use the first kernels TCE tables in kdump · 95b68dec
      Chandru 提交于
      kdump kernel fails to boot with calgary iommu and aacraid driver on a x366
      box.  The ongoing dma's of aacraid from the first kernel continue to exist
      until the driver is loaded in the kdump kernel.  Calgary is initialized
      prior to aacraid and creation of new tce tables causes wrong dma's to
      occur.  Here we try to get the tce tables of the first kernel in kdump
      kernel and use them.  While in the kdump kernel we do not allocate new tce
      tables but instead read the base address register contents of calgary
      iommu and use the tables that the registers point to.  With these changes
      the kdump kernel and hence aacraid now boots normally.
      Signed-off-by: NChandru Siddalingappa <chandru@in.ibm.com>
      Acked-by: NMuli Ben-Yehuda <muli@il.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      95b68dec
    • S
      kprobes: improve kretprobe scalability with hashed locking · ef53d9c5
      Srinivasa D S 提交于
      Currently list of kretprobe instances are stored in kretprobe object (as
      used_instances,free_instances) and in kretprobe hash table.  We have one
      global kretprobe lock to serialise the access to these lists.  This causes
      only one kretprobe handler to execute at a time.  Hence affects system
      performance, particularly on SMP systems and when return probe is set on
      lot of functions (like on all systemcalls).
      
      Solution proposed here gives fine-grain locks that performs better on SMP
      system compared to present kretprobe implementation.
      
      Solution:
      
       1) Instead of having one global lock to protect kretprobe instances
          present in kretprobe object and kretprobe hash table.  We will have
          two locks, one lock for protecting kretprobe hash table and another
          lock for kretporbe object.
      
       2) We hold lock present in kretprobe object while we modify kretprobe
          instance in kretprobe object and we hold per-hash-list lock while
          modifying kretprobe instances present in that hash list.  To prevent
          deadlock, we never grab a per-hash-list lock while holding a kretprobe
          lock.
      
       3) We can remove used_instances from struct kretprobe, as we can
          track used instances of kretprobe instances using kretprobe hash
          table.
      
      Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
      with return probes set on all systemcalls looks like this.
      
      cacheline              non-cacheline             Un-patched kernel
      aligned patch 	       aligned patch
      ===============================================================================
      real    9m46.784s       9m54.412s                  10m2.450s
      user    40m5.715s       40m7.142s                  40m4.273s
      sys     2m57.754s       2m58.583s                  3m17.430s
      ===========================================================
      
      Time duration for kernel compilation ("make -j 8) on the same system, when
      kernel is not probed.
      =========================
      real    9m26.389s
      user    40m8.775s
      sys     2m7.283s
      =========================
      Signed-off-by: NSrinivasa DS <srinivasa@in.ibm.com>
      Signed-off-by: NJim Keniston <jkenisto@us.ibm.com>
      Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ef53d9c5
  6. 25 7月, 2008 13 次提交
    • L
      x86-64: Clean up 'save/restore_i387()' usage · b30f3ae5
      Linus Torvalds 提交于
      Suresh Siddha wants to fix a possible FPU leakage in error conditions,
      but the fact that save/restore_i387() are inlines in a header file makes
      that harder to do than necessary.  So start off with an obvious cleanup.
      
      This just moves the x86-64 version of save/restore_i387() out of the
      header file, and moves it to the only file that it is actually used in:
      arch/x86/kernel/signal_64.c.  So exposing it in a header file was wrong
      to begin with.
      
      [ Side note: I'd like to fix up some of the games we play with the
        32-bit version of these functions too, but that's a separate
        matter.  The 32-bit versions are shared - under different names
        at that! - by both the native x86-32 code and the x86-64 32-bit
        compatibility code ]
      Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b30f3ae5
    • L
      x86-64: make BUILD_IRQ() also reset section back · 6209ed9d
      Linus Torvalds 提交于
      Commit 9d25d4db ("x86: BUILD_IRQ say
      .text to avoid .data.percpu") added a ".text" specifier to make sure
      that BUILD_IRQ() builds the irq trampoline in the text segment rather
      than in some random left-over segment that the compiler happened to
      leave the asm in.
      
      However, we should also make sure that we switch back by adding a
      ".previous" at the end, so that there are no subtle issues with
      subsequent compiler-generated code.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6209ed9d
    • D
      rtc-cmos: avoid spurious irqs · 7e2a31da
      David Brownell 提交于
      This fixes kernel http://bugzilla.kernel.org/show_bug.cgi?id=11112 (bogus
      RTC update IRQs reported) for rtc-cmos, in two ways:
      
        - When HPET is stealing the IRQs, use the first IRQ to grab
          the seconds counter which will be monitored (instead of
          using whatever was previously in that memory);
      
        - In sane IRQ handling modes, scrub out old IRQ status before
          enabling IRQs.
      
      That latter is done by tightening up IRQ handling for rtc-cmos everywhere,
      also ensuring that when HPET is used it's the only thing triggering IRQ
      reports to userspace; net object shrink.
      
      Also fix a bogus HPET message related to its RTC emulation.
      Signed-off-by: NDavid Brownell <dbrownell@users.sourceforge.net>
      Report-by: NW Unruh <unruh@physics.ubc.ca>
      Cc: Andrew Victor <avictor.za@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e2a31da
    • U
      flag parameters add-on: remove epoll_create size param · 9fe5ad9c
      Ulrich Drepper 提交于
      Remove the size parameter from the new epoll_create syscall and renames the
      syscall itself.  The updated test program follows.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_epoll_create2
      # ifdef __x86_64__
      #  define __NR_epoll_create2 291
      # elif defined __i386__
      #  define __NR_epoll_create2 329
      # else
      #  error "need __NR_epoll_create2"
      # endif
      #endif
      
      #define EPOLL_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_epoll_create2, 0);
        if (fd == -1)
          {
            puts ("epoll_create2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("epoll_create2(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_epoll_create2, EPOLL_CLOEXEC);
        if (fd == -1)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9fe5ad9c
    • U
      flag parameters: inotify_init · 4006553b
      Ulrich Drepper 提交于
      This patch introduces the new syscall inotify_init1 (note: the 1 stands for
      the one parameter the syscall takes, as opposed to no parameter before).  The
      values accepted for this parameter are function-specific and defined in the
      inotify.h header.  Here the values must match the O_* flags, though.  In this
      patch CLOEXEC support is introduced.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_inotify_init1
      # ifdef __x86_64__
      #  define __NR_inotify_init1 294
      # elif defined __i386__
      #  define __NR_inotify_init1 332
      # else
      #  error "need __NR_inotify_init1"
      # endif
      #endif
      
      #define IN_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd;
        fd = syscall (__NR_inotify_init1, 0);
        if (fd == -1)
          {
            puts ("inotify_init1(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("inotify_init1(0) set close-on-exit");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
        if (fd == -1)
          {
            puts ("inotify_init1(IN_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4006553b
    • U
      flag parameters: pipe · ed8cae8b
      Ulrich Drepper 提交于
      This patch introduces the new syscall pipe2 which is like pipe but it also
      takes an additional parameter which takes a flag value.  This patch implements
      the handling of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd[2];
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed8cae8b
    • U
      flag parameters: dup2 · 336dd1f7
      Ulrich Drepper 提交于
      This patch adds the new dup3 syscall.  It extends the old dup2 syscall by one
      parameter which is meant to hold a flag value.  Support for the O_CLOEXEC flag
      is added in this patch.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_dup3
      # ifdef __x86_64__
      #  define __NR_dup3 292
      # elif defined __i386__
      #  define __NR_dup3 330
      # else
      #  error "need __NR_dup3"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd = syscall (__NR_dup3, 1, 4, 0);
        if (fd == -1)
          {
            puts ("dup3(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("dup3(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC);
        if (fd == -1)
          {
            puts ("dup3(O_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("dup3(O_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      336dd1f7
    • U
      flag parameters: epoll_create · a0998b50
      Ulrich Drepper 提交于
      This patch adds the new epoll_create2 syscall.  It extends the old epoll_create
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is EPOLL_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name EPOLL_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_epoll_create2
      # ifdef __x86_64__
      #  define __NR_epoll_create2 291
      # elif defined __i386__
      #  define __NR_epoll_create2 329
      # else
      #  error "need __NR_epoll_create2"
      # endif
      #endif
      
      #define EPOLL_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_epoll_create2, 1, 0);
        if (fd == -1)
          {
            puts ("epoll_create2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("epoll_create2(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_epoll_create2, 1, EPOLL_CLOEXEC);
        if (fd == -1)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0998b50
    • U
      flag parameters: eventfd · b087498e
      Ulrich Drepper 提交于
      This patch adds the new eventfd2 syscall.  It extends the old eventfd
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is EFD_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name EFD_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_eventfd2
      # ifdef __x86_64__
      #  define __NR_eventfd2 290
      # elif defined __i386__
      #  define __NR_eventfd2 328
      # else
      #  error "need __NR_eventfd2"
      # endif
      #endif
      
      #define EFD_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_eventfd2, 1, 0);
        if (fd == -1)
          {
            puts ("eventfd2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("eventfd2(0) sets close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC);
        if (fd == -1)
          {
            puts ("eventfd2(EFD_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b087498e
    • U
      flag parameters: signalfd · 9deb27ba
      Ulrich Drepper 提交于
      This patch adds the new signalfd4 syscall.  It extends the old signalfd
      syscall by one parameter which is meant to hold a flag value.  In this
      patch the only flag support is SFD_CLOEXEC which causes the close-on-exec
      flag for the returned file descriptor to be set.
      
      A new name SFD_CLOEXEC is introduced which in this implementation must
      have the same value as O_CLOEXEC.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <signal.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_signalfd4
      # ifdef __x86_64__
      #  define __NR_signalfd4 289
      # elif defined __i386__
      #  define __NR_signalfd4 327
      # else
      #  error "need __NR_signalfd4"
      # endif
      #endif
      
      #define SFD_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        sigset_t ss;
        sigemptyset (&ss);
        sigaddset (&ss, SIGUSR1);
        int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
        if (fd == -1)
          {
            puts ("signalfd4(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("signalfd4(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC);
        if (fd == -1)
          {
            puts ("signalfd4(SFD_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9deb27ba
    • S
      pm: acpi hibernation: utilize hardware signature · bdfe6b7c
      Shaohua Li 提交于
      ACPI defines a hardware signature.  BIOS calculates the signature according to
      hardware configure and if hardware changes while hibernated, the signature
      will change.  In that case, S4 resume should fail.
      
      Still, there may be systems on which this mechanism does not work correctly,
      so it is better to provide a workaround for them.  For this reason, add a new
      switch to the acpi_sleep= command line argument allowing one to disable
      hardware signature checking.
      
      [shaohua.li@intel.com: build fix]
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Len Brown <lenb@kernel.org>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: <Valdis.Kletnieks@vt.edu>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bdfe6b7c
    • A
      remove include/linux/pm_legacy.h · d75f65fd
      Adrian Bunk 提交于
      Remove the obsolete and no longer used include/linux/pm_legacy.h
      Reviewed-by: NRobert P. J. Day <rpjday@crashcourse.ca>
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Cc: Pavel Machek <pavel@suse.cz>
      Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d75f65fd
    • A
      PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures · 27ac792c
      Andrea Righi 提交于
      On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
      boundary. For example:
      
      	u64 val = PAGE_ALIGN(size);
      
      always returns a value < 4GB even if size is greater than 4GB.
      
      The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
      example):
      
      #define PAGE_SHIFT      12
      #define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
      #define PAGE_MASK       (~(PAGE_SIZE-1))
      ...
      #define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)
      
      The "~" is performed on a 32-bit value, so everything in "and" with
      PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
      Using the ALIGN() macro seems to be the right way, because it uses
      typeof(addr) for the mask.
      
      Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
      include/linux/mm.h.
      
      See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
      
      [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
      [akpm@linux-foundation.org: fix v850]
      [akpm@linux-foundation.org: fix powerpc]
      [akpm@linux-foundation.org: fix arm]
      [akpm@linux-foundation.org: fix mips]
      [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
      [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
      [akpm@linux-foundation.org: fix powerpc]
      Signed-off-by: NAndrea Righi <righi.andrea@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27ac792c
  7. 24 7月, 2008 7 次提交
    • Y
      x86: move declaring x2apic_extra_bits · 510b3725
      Yinghai Lu 提交于
      it is for uv only
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      510b3725
    • H
      x86: BUILD_IRQ say .text to avoid .data.percpu · 9d25d4db
      Hugh Dickins 提交于
      When I edit the x86_64 Makefile to -fno-unit-at-a-time, bootup panics
      on 0xCCs in IRQ0x3e_interrupt(): IRQ0x20_interrupt etc. have got linked
      into .data.percpu.  Perhaps there are other ways of triggering that:
      specify ".text" in the BUILD_IRQ() macro for safety.
      
      I've been using -fno-unit-at-a-time (to lessen inlining, for easier
      debugging) for a long time.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Mike Travis <travis@sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9d25d4db
    • J
      x86: call early_cpu_init at the same point · 9e882c92
      Jeremy Fitzhardinge 提交于
      Call early_cpu_init() at the same (early) point in setup_arch().
      The x86_64 code was calling it relatively late, after when other arch
      code need to do cpu-related setup which depends on it.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9e882c92
    • R
      i386 syscall audit fast-path · af0575bb
      Roland McGrath 提交于
      This adds fast paths for 32-bit syscall entry and exit when
      TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
      These paths does not need to save and restore all registers as
      the general case of tracing does.  Avoiding the iret return path
      when syscall audit is enabled helps performance a lot.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      af0575bb
    • R
      x86_64 ia32 syscall audit fast-path · 5cbf1565
      Roland McGrath 提交于
      This adds fast paths for 32-bit syscall entry and exit when
      TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
      These paths does not need to save and restore all registers as
      the general case of tracing does.  Avoiding the iret return path
      when syscall audit is enabled helps performance a lot.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      5cbf1565
    • R
      x86_64 syscall audit fast-path · 86a1c34a
      Roland McGrath 提交于
      This adds a fast path for 64-bit syscall entry and exit when
      TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
      This path does not need to save and restore all registers as
      the general case of tracing does.  Avoiding the iret return path
      when syscall audit is enabled helps performance a lot.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      86a1c34a
    • R
      x86_64: remove bogus optimization in sysret_signal · 15e8f348
      Roland McGrath 提交于
      This short-circuit path in sysret_signal looks wrong to me.
      AFAICT, in practice the branch is never taken--and if it were,
      it would go wrong.  To wit, try loading a module whose init
      function does set_thread_flag(TIF_IRET), and see insmod crash
      (presumably with a wrong user stack pointer).
      
      This is because the FIXUP_TOP_OF_STACK work hasn't been done yet
      when we jump around the call to ptregscall_common and get to
      int_with_check--where it expects the user RSP,SS,CS and EFLAGS to
      have been stored by FIXUP_TOP_OF_STACK.
      
      I don't think it's normally possible to get to sysret_signal with no
      _TIF_DO_NOTIFY_MASK bits set anyway, so these two instructions are
      already superfluous.  If it ever did happen, it is harmless to call
      do_notify_resume with nothing for it to do.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      15e8f348
  8. 23 7月, 2008 1 次提交