1. 29 Mar, 2012 3 commits
    • x86: Preserve lazy irq disable semantics in fixup_irqs() · 99dd5497
      Liu, Chuansheng committed
      The default irq_disable() semantics are to mark the interrupt disabled,
      but keep it unmasked. If the interrupt is delivered while marked
      disabled, the low level interrupt handler masks it and marks it
      pending. This is important for detecting wakeup interrupts during
      suspend and for edge type interrupts to avoid losing interrupts.
      
      fixup_irqs() moves the interrupts away from an offlined cpu. For
      certain interrupt types it needs to mask the interrupt line before
      changing the affinity. After affinity has changed the interrupt line
      is unmasked again, but only if it is not marked disabled.
      
      This breaks the lazy irq disable semantics and causes problems in
      suspend as the interrupt can be lost or wakeup functionality is
      broken.
      
      Check irqd_irq_masked() instead of irqd_irq_disabled() because
      irqd_irq_masked() is only set, when the core code actually masked the
      interrupt line. If it's not set, we unmask the interrupt and let the
      lazy irq disable logic deal with an eventually incoming interrupt.
      
      [ tglx: Massaged changelog and added a comment ]
      Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A05DFB3@SHSMSX101.ccr.corp.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
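The difference between the two checks described in the fixup_irqs() commit can be illustrated with a small user-space model. This is only a sketch: `struct irq_line`, `lazy_disable()`, `deliver()` and `migrate()` are hypothetical names invented here, not the kernel's data structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one interrupt line. */
struct irq_line {
    bool disabled;  /* software state: a lazy disable was requested */
    bool masked;    /* hardware state: the line is really masked */
    bool pending;   /* an interrupt arrived while the line was disabled */
};

/* Lazy disable: only mark the line disabled, leave it unmasked. */
static void lazy_disable(struct irq_line *irq)
{
    assert(irq != NULL);
    irq->disabled = true;
}

/* Low-level delivery: a masked line delivers nothing; a disabled but
 * unmasked line gets masked now and the event is remembered. */
static void deliver(struct irq_line *irq)
{
    assert(irq != NULL);
    if (irq->masked)
        return;
    if (irq->disabled) {
        irq->masked = true;
        irq->pending = true;
    }
}

/* Model of moving the line away from an offlined CPU.
 * check_masked == false: old behaviour, unmask only if not disabled.
 * check_masked == true:  new behaviour, unmask unless the line was
 *                        already masked before the move. */
static void migrate(struct irq_line *irq, bool check_masked)
{
    assert(irq != NULL);
    bool was_masked = irq->masked;

    irq->masked = true;  /* mask while the affinity changes */
    /* ... the affinity change would happen here ... */
    irq->masked = check_masked ? was_masked : irq->disabled;
}
```

In this model, calling `lazy_disable()`, then `migrate(&irq, false)`, then `deliver()` leaves `pending` false, so the wakeup is lost. With `migrate(&irq, true)` the same sequence ends with `pending` and `masked` both true.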
    • x86/apic/amd: Be more verbose about LVT offset assignments · 8abc3122
      Robert Richter committed
      Add information about LVT offset assignments to better debug firmware
      bugs related to this. See following examples.
      
       # dmesg | grep -i 'offset\|ibs'
       LVT offset 0 assigned for vector 0xf9
       [Firmware Bug]: cpu 0, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
       [Firmware Bug]: cpu 0, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
       Failed to setup IBS, -22
      
      In this case the BIOS assigns both offsets for MCE (0xf9) and IBS
      (0x400) vectors to offset 0, which is why the second APIC setup (IBS)
      failed.
      
      With correct setup you get:
      
       # dmesg | grep -i 'offset\|ibs'
       LVT offset 0 assigned for vector 0xf9
       LVT offset 1 assigned for vector 0x400
       IBS: LVT offset 1 assigned
       perf: AMD IBS detected (0x00000007)
       oprofile: AMD IBS detected (0x00000007)
      
      Note: The vector includes also the message type to handle also NMIs
      (0x400). In the firmware bug message the format is the same as of the
      APIC500 register and includes the mask bit (bit 16) in addition.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
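To read the values in the LVT offset messages above, it may help to split them into fields. The sketch below assumes the usual LVT entry layout (vector in bits 7:0, message type in bits 10:8, mask bit at bit 16); `lvt_encode()`, `lvt_decode()` and `struct lvt_fields` are hypothetical helpers written for this example, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LVT_MSG_FIXED 0x0u        /* ordinary vectored interrupt */
#define LVT_MSG_NMI   0x4u        /* message type seen as 0x400 in the log */
#define LVT_MASK_BIT  (1u << 16)  /* mask bit, per the commit message */

/* Hypothetical decoded view of one LVT entry value. */
struct lvt_fields {
    uint8_t vector;    /* bits 7:0 */
    uint8_t msg_type;  /* bits 10:8 */
    bool masked;       /* bit 16 */
};

/* Build a register-format value from its fields. */
static uint32_t lvt_encode(uint8_t vector, uint8_t msg_type, bool masked)
{
    assert(msg_type <= 0x7u);  /* the field is three bits wide */
    return (uint32_t)vector |
           ((uint32_t)msg_type << 8) |
           (masked ? LVT_MASK_BIT : 0u);
}

/* Split a register-format value into its fields. */
static struct lvt_fields lvt_decode(uint32_t reg)
{
    struct lvt_fields f;

    f.vector = (uint8_t)(reg & 0xffu);
    f.msg_type = (uint8_t)((reg >> 8) & 0x7u);
    f.masked = (reg & LVT_MASK_BIT) != 0;
    return f;
}
```

Under these assumptions, `lvt_decode(0x10400)` yields vector 0, message type 4 (NMI) and the mask bit set, while `lvt_decode(0xf9)` yields vector 0xf9 with a fixed message type and no mask bit.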
    • x86, tls: Off by one limit check · 8f0750f1
      Dan Carpenter committed
      These are used as offsets into an array of GDT_ENTRY_TLS_ENTRIES members
      so GDT_ENTRY_TLS_ENTRIES is one past the end of the array.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Link: http://lkml.kernel.org/r/20120324075250.GA28258@elgon.mountain
      Cc: <stable@vger.kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
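The off-by-one described in the TLS commit is the common mistake of testing an array index with `>` instead of `>=`. A minimal sketch follows; `TLS_ENTRIES` stands in for GDT_ENTRY_TLS_ENTRIES with a value assumed purely for illustration, and the function names are hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLS_ENTRIES 3  /* assumed value, for illustration only */

/* Buggy form: rejects only idx > TLS_ENTRIES, so idx == TLS_ENTRIES
 * (one past the end of the array) is accepted. */
static bool index_ok_buggy(size_t idx)
{
    return !(idx > TLS_ENTRIES);
}

/* Fixed form: valid indices are 0 .. TLS_ENTRIES - 1. */
static bool index_ok_fixed(size_t idx)
{
    return !(idx >= TLS_ENTRIES);
}

/* Read one slot of an array with TLS_ENTRIES members. */
static int tls_slot_read(const int slots[TLS_ENTRIES], size_t idx)
{
    assert(index_ok_fixed(idx));  /* an out-of-range index never reaches the access */
    return slots[idx];
}
```

With `TLS_ENTRIES` set to 3, `index_ok_buggy(3)` returns true even though the last valid index is 2, whereas `index_ok_fixed(3)` returns false.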
  2. 28 Mar, 2012 2 commits
  3. 26 Mar, 2012 1 commit
  4. 24 Mar, 2012 2 commits
  5. 23 Mar, 2012 19 commits
  6. 22 Mar, 2012 13 commits
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 5375871d
      Linus Torvalds committed
      Pull powerpc merge from Benjamin Herrenschmidt:
       "Here's the powerpc batch for this merge window.  It is going to be a
        bit more nasty than usual as in touching things outside of
        arch/powerpc mostly due to the big iSeriesectomy :-) We finally got
        rid of the bugger (legacy iSeries support) which was a PITA to
        maintain and that nobody really used anymore.
      
        Here are some of the highlights:
      
         - Legacy iSeries is gone.  Thanks Stephen ! There's still some bits
           and pieces remaining if you do a grep -ir series arch/powerpc but
           they are harmless and will be removed in the next few weeks
           hopefully.
      
         - The 'fadump' functionality (Firmware Assisted Dump) replaces the
           previous (equivalent) "pHyp assisted dump"...  it's a rewrite of a
           mechanism to get the hypervisor to do crash dumps on pSeries, the
           new implementation hopefully being much more reliable.  Thanks
           Mahesh Salgaonkar.
      
         - The "EEH" code (pSeries PCI error handling & recovery) got a big
           spring cleaning, motivated by the need to be able to implement a
           new backend for it on top of some new different type of firmware.
      
           The work isn't complete yet, but a good chunk of the cleanups is
           there.  Note that this adds a field to struct device_node which is
           not very nice and which Grant objects to.  I will have a patch soon
           that moves that to a powerpc private data structure (hopefully
           before rc1) and we'll improve things further later on (hopefully
           getting rid of the need for that pointer completely).  Thanks Gavin
           Shan.
      
         - I dug into our exception & interrupt handling code to improve the
           way we do lazy interrupt handling (and make it work properly with
           "edge" triggered interrupt sources), and while at it found & fixed
           a wagon of issues in those areas, including adding support for page
           fault retry & fatal signals on page faults.
      
         - Your usual random batch of small fixes & updates, including a bunch
           of new embedded boards, both Freescale and APM based ones, etc..."
      
      I fixed up some conflicts with the generalized irq-domain changes from
      Grant Likely, hopefully correctly.
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (141 commits)
        powerpc/ps3: Do not adjust the wrapper load address
        powerpc: Remove the rest of the legacy iSeries include files
        powerpc: Remove the remaining CONFIG_PPC_ISERIES pieces
        init: Remove CONFIG_PPC_ISERIES
        powerpc: Remove FW_FEATURE ISERIES from arch code
        tty/hvc_vio: FW_FEATURE_ISERIES is no longer selectable
        powerpc/spufs: Fix double unlocks
        powerpc/5200: convert mpc5200 to use of_platform_populate()
        powerpc/mpc5200: add options to mpc5200_defconfig
        powerpc/mpc52xx: add a4m072 board support
        powerpc/mpc5200: update mpc5200_defconfig to fit for charon board
        Documentation/powerpc/mpc52xx.txt: Checkpatch cleanup
        powerpc/44x: Add additional device support for APM821xx SoC and Bluestone board
        powerpc/44x: Add support PCI-E for APM821xx SoC and Bluestone board
        MAINTAINERS: Update PowerPC 4xx tree
        powerpc/44x: The bug fixed support for APM821xx SoC and Bluestone board
        powerpc: document the FSL MPIC message register binding
        powerpc: add support for MPIC message register API
        powerpc/fsl: Added aliased MSIIR register address to MSI node in dts
        powerpc/85xx: mpc8548cds - add 36-bit dts
        ...
    • Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · b57cb723
      Linus Torvalds committed
      Pull m68knommu arch updates from Greg Ungerer:
       "Includes a cleanup of the non-MMU linker script (it now almost
        exclusively uses the well defined linker script support macros and
        definitions).  Some more merging of MMU and non-MMU common files
        (specifically the arch process.c, ptrace and time.c).  And a big
        cleanup of the massively duplicated ColdFire device definition code.
      
        Overall we remove about 2000 lines of code, and end up with a single
        set of platform device definitions for the serial ports, ethernet
        ports and QSPI ports common in most ColdFire SoCs.
      
        I expect you will get a merge conflict on arch/m68k/kernel/process.c,
        in cpu_idle().  It should be relatively straightforward to fixup."
      
      And cpu_idle() conflict resolution was indeed trivial (merging the
      nommu/mmu versions of process.c trivially conflicting with the
      conversion to use the schedule_preempt_disabled() helper function)
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: (57 commits)
        m68knommu: factor more common ColdFire cpu reset code
        m68knommu: make 528x CPU reset register addressing consistent
        m68knommu: make 527x CPU reset register addressing consistent
        m68knommu: make 523x CPU reset register addressing consistent
        m68knommu: factor some common ColdFire cpu reset code
        m68knommu: move old ColdFire timers init from CPU init to timers code
        m68knommu: clean up init code in ColdFire 532x startup
        m68knommu: clean up init code in ColdFire 528x startup
        m68knommu: clean up init code in ColdFire 523x startup
        m68knommu: merge common ColdFire QSPI platform setup code
        m68knommu: make 532x QSPI platform addressing consistent
        m68knommu: make 528x QSPI platform addressing consistent
        m68knommu: make 527x QSPI platform addressing consistent
        m68knommu: make 5249 QSPI platform addressing consistent
        m68knommu: make 523x QSPI platform addressing consistent
        m68knommu: make 520x QSPI platform addressing consistent
        m68knommu: merge common ColdFire FEC platform setup code
        m68knommu: make 532x FEC platform addressing consistent
        m68knommu: make 528x FEC platform addressing consistent
        m68knommu: make 527x FEC platform addressing consistent
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw · ad12ab25
      Linus Torvalds committed
      Pull gfs2 changes from Steven Whitehouse.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
        GFS2: Change truncate page allocation to be GFP_NOFS
        GFS2: call gfs2_write_alloc_required for each chunk
        GFS2: Clean up log flush header writing
        GFS2: Remove a __GFP_NOFAIL allocation
        GFS2: Flush pending glock work when evicting an inode
        GFS2: make sure rgrps are up to date in func gfs2_blk2rgrpd
        GFS2: Eliminate sd_rindex_mutex
        GFS2: Unlock rindex mutex on glock error
        GFS2: Make bd_cmp() static
        GFS2: Sort the ordered write list
        GFS2: FITRIM ioctl support
        GFS2: Move two functions from log.c to lops.c
        GFS2: glock statistics gathering
    • memcg: avoid THP split in task migration · 12724850
      Naoya Horiguchi committed
      Currently we can't do task migration among memory cgroups without THP
      split, which means processes heavily using THP experience large overhead
      in task migration.  This patch introduces the code for moving charge of
      THP and makes THP more valuable.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE · d8c37c48
      Naoya Horiguchi committed
      These macros will be used in a later patch, where all usages are expected
      to be optimized away without #ifdef CONFIG_TRANSPARENT_HUGEPAGE.  But to
      detect unexpected usages, we convert the existing BUG() to BUILD_BUG().
      
      [akpm@linux-foundation.org: fix build in mm/pgtable-generic.c]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: clean up existing move charge code · 8d32ff84
      Naoya Horiguchi committed
      - Replace lengthy function name is_target_pte_for_mc() with a shorter
        one in order to avoid ugly line breaks.
      
      - explicitly use MC_TARGET_* instead of simply using integers.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event() · 45f3e385
      Anton Vorontsov committed
      In the following code:
      
      	if (type == _MEM)
      		thresholds = &memcg->thresholds;
      	else if (type == _MEMSWAP)
      		thresholds = &memcg->memsw_thresholds;
      	else
      		BUG();
      
      	BUG_ON(!thresholds);
      
      The BUG_ON() seems redundant.
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcontrol.c: s/stealed/stolen/ · 13fd1dd9
      Andrew Morton committed
      A grammatical fix.
      
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix performance of mem_cgroup_begin_update_page_stat() · 4331f7d3
      KAMEZAWA Hiroyuki committed
      mem_cgroup_begin_update_page_stat() should be very fast because it's
      called very frequently.  Now, it needs to look up page_cgroup and its
      memcg....this is slow.
      
      This patch adds a global variable to check "any memcg is moving or not".
      With this, the caller doesn't need to visit page_cgroup and memcg.
      
      Here is a test result.  A test program makes page faults onto a file,
      MAP_SHARED and makes each page's page_mapcount(page) > 1, and free the
      range by madvise() and page fault again.  This program causes 26214400
      page faults onto a file (size was 1G) and shows the cost of
      mem_cgroup_begin_update_page_stat().
      
      Before this patch for mem_cgroup_begin_update_page_stat()
      
          [kamezawa@bluextal test]$ time ./mmap 1G
      
          real    0m21.765s
          user    0m5.999s
          sys     0m15.434s
      
          27.46%     mmap  mmap               [.] reader
          21.15%     mmap  [kernel.kallsyms]  [k] page_fault
           9.17%     mmap  [kernel.kallsyms]  [k] filemap_fault
           2.96%     mmap  [kernel.kallsyms]  [k] __do_fault
           2.83%     mmap  [kernel.kallsyms]  [k] __mem_cgroup_begin_update_page_stat
      
      After this patch
      
          [root@bluextal test]# time ./mmap 1G
      
          real    0m21.373s
          user    0m6.113s
          sys     0m15.016s
      
      In the usual path, calls to __mem_cgroup_begin_update_page_stat() go away.
      
      Note: we may be able to remove this optimization in future if
            we can get pointer to memcg directly from struct page.
      
      [akpm@linux-foundation.org: don't return a void]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Greg Thelen <gthelen@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
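The optimization in the commit above (a global "is any memcg moving" check that lets callers skip the page_cgroup lookup) can be sketched in user-space C as follows. All names here (`memcg_moving_count`, `begin_move()`, `end_move()`, `slow_begin_update()`, `begin_update_page_stat()`) are hypothetical stand-ins, and the slow path is reduced to a call counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global: number of charge moves currently in flight. */
static atomic_int memcg_moving_count;

/* Counts how often the expensive path was taken (for demonstration). */
static int slow_path_calls;

static void begin_move(void)
{
    atomic_fetch_add(&memcg_moving_count, 1);
}

static void end_move(void)
{
    int old = atomic_fetch_sub(&memcg_moving_count, 1);

    assert(old > 0);  /* every end_move() must pair with a begin_move() */
}

/* Stand-in for the expensive lookup and locking work. */
static void slow_begin_update(bool *locked)
{
    slow_path_calls++;
    *locked = true;
}

/* Fast path: when nothing is moving, return without any lookup. */
static void begin_update_page_stat(bool *locked)
{
    *locked = false;
    if (atomic_load(&memcg_moving_count) == 0)
        return;
    slow_begin_update(locked);
}
```

In this sketch, `begin_update_page_stat()` only reaches `slow_begin_update()` between a `begin_move()` and the matching `end_move()`. Outside that window it costs a single atomic load.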
    • memcg: remove PCG_FILE_MAPPED · 2ff76f11
      KAMEZAWA Hiroyuki committed
      With the new lock scheme for updating memcg's page stat, we don't need a
      flag PCG_FILE_MAPPED which was duplicated information of page_mapped().
      
      [hughd@google.com: cosmetic fix]
      [hughd@google.com: add comment to MEM_CGROUP_CHARGE_TYPE_MAPPED case in __mem_cgroup_uncharge_common()]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Greg Thelen <gthelen@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: use new logic for page stat accounting · 89c06bd5
      KAMEZAWA Hiroyuki committed
      Now, page-stat-per-memcg is recorded into per page_cgroup flag by
      duplicating page's status into the flag.  The reason is that memcg has a
      feature to move a page from a group to another group and we have race
      between "move" and "page stat accounting".
      
      Under current logic, assume CPU-A and CPU-B.  CPU-A does "move" and CPU-B
      does "page stat accounting".
      
      When CPU-A goes 1st,
      
                  CPU-A                           CPU-B
                                          update "struct page" info.
          move_lock_mem_cgroup(memcg)
          see pc->flags
          copy page stat to new group
          overwrite pc->mem_cgroup.
          move_unlock_mem_cgroup(memcg)
                                          move_lock_mem_cgroup(mem)
                                          set pc->flags
                                          update page stat accounting
                                          move_unlock_mem_cgroup(mem)
      
      stat accounting is guarded by move_lock_mem_cgroup() and "move" logic
      (CPU-A) doesn't see changes in "struct page" information.
      
      But it's costly to have the same information both in 'struct page' and
      'struct page_cgroup'.  And, there is a potential problem.
      
      For example, assume we have PG_dirty accounting in memcg.
      PG_..is a flag for struct page.
      PCG_ is a flag for struct page_cgroup.
      (This is just an example. The same problem can be found in any
       kind of page stat accounting.)
      
      	  CPU-A                               CPU-B
            TestSet PG_dirty
            (delay)                        TestClear PG_dirty
                                           if (TestClear(PCG_dirty))
                                                memcg->nr_dirty--
            if (TestSet(PCG_dirty))
                memcg->nr_dirty++
      
      Here, memcg->nr_dirty = +1, this is wrong.  This race was reported by Greg
      Thelen <gthelen@google.com>.  Now, only FILE_MAPPED is supported but
      fortunately, it's serialized by page table lock and this is not real bug,
      _now_,
      
      If this potential problem is caused by having duplicated information in
      struct page and struct page_cgroup, we may be able to fix this by using
      original 'struct page' information.  But we'll have a problem in "move
      account"
      
      Assume we use only PG_dirty.
      
               CPU-A                   CPU-B
          TestSet PG_dirty
          (delay)                    move_lock_mem_cgroup()
                                     if (PageDirty(page))
                                            new_memcg->nr_dirty++
                                     pc->mem_cgroup = new_memcg;
                                     move_unlock_mem_cgroup()
          move_lock_mem_cgroup()
          memcg = pc->mem_cgroup
          new_memcg->nr_dirty++
      
      accounting information may be double-counted.  This was original reason to
      have PCG_xxx flags but it seems PCG_xxx has another problem.
      
      I think we need a bigger lock as
      
           move_lock_mem_cgroup(page)
           TestSetPageDirty(page)
           update page stats (without any checks)
           move_unlock_mem_cgroup(page)
      
      This fixes both of problems and we don't have to duplicate page flag into
      page_cgroup.  Please note: move_lock_mem_cgroup() is held only when there
      are possibility of "account move" under the system.  So, in most path,
      status update will go without atomic locks.
      
      This patch introduces mem_cgroup_begin_update_page_stat() and
      mem_cgroup_end_update_page_stat() both should be called at modifying
      'struct page' information if memcg takes care of it.  as
      
           mem_cgroup_begin_update_page_stat()
           modify page information
           mem_cgroup_update_page_stat()
           => never check any 'struct page' info, just update counters.
           mem_cgroup_end_update_page_stat().
      
      This patch is slow because we need to call begin_update_page_stat()/
      end_update_page_stat() regardless of accounted will be changed or not.  A
      following patch adds an easy optimization and reduces the cost.
      
      [akpm@linux-foundation.org: s/lock/locked/]
      [hughd@google.com: fix deadlock by avoiding stat lock when anon]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Greg Thelen <gthelen@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
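The PG_dirty/PCG_dirty interleaving from the commit message above can be replayed deterministically in a single thread to see why the duplicated flag miscounts. This is a sketch only: `struct page_model` and the helpers are hypothetical, the "move" lock is not modelled, and `set_flag()`/`clear_flag()` report whether the call changed the flag, which is the sense in which the commit's pseudocode uses TestSet/TestClear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one page, its duplicated flag, one counter. */
struct page_model {
    bool pg_dirty;   /* flag in struct page */
    bool pcg_dirty;  /* duplicate flag in struct page_cgroup */
    long nr_dirty;   /* per-memcg counter */
};

/* Set the flag; return true if this call changed it. */
static bool set_flag(bool *flag)
{
    bool changed = !*flag;

    *flag = true;
    return changed;
}

/* Clear the flag; return true if this call changed it. */
static bool clear_flag(bool *flag)
{
    bool changed = *flag;

    *flag = false;
    return changed;
}

/* Replay with duplicated flags: CPU-A is delayed between its two steps. */
static struct page_model replay_duplicated_flags(void)
{
    struct page_model p = { false, false, 0 };

    set_flag(&p.pg_dirty);         /* CPU-A: TestSet PG_dirty, then delay */
    clear_flag(&p.pg_dirty);       /* CPU-B: TestClear PG_dirty */
    if (clear_flag(&p.pcg_dirty))  /* CPU-B: PCG_dirty not set yet */
        p.nr_dirty--;
    if (set_flag(&p.pcg_dirty))    /* CPU-A resumes */
        p.nr_dirty++;
    return p;
}

/* Same interleaving when only the struct page flag drives the counter. */
static struct page_model replay_single_flag(void)
{
    struct page_model p = { false, false, 0 };
    bool a_changed = set_flag(&p.pg_dirty);  /* CPU-A, then delay */

    if (clear_flag(&p.pg_dirty))             /* CPU-B */
        p.nr_dirty--;
    if (a_changed)                           /* CPU-A resumes */
        p.nr_dirty++;
    return p;
}

/* Self-check of the two outcomes described in the text. */
static void check_replays(void)
{
    assert(replay_duplicated_flags().nr_dirty == 1);
    assert(replay_single_flag().nr_dirty == 0);
}
```

In the first replay the page ends up clean while `nr_dirty` is 1, which is the miscount the commit describes. In the second, each counter update follows a real transition of the single flag, so the counter returns to 0 regardless of which CPU updates it first.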
    • memcg: remove PCG_MOVE_LOCK flag from page_cgroup · 312734c0
      KAMEZAWA Hiroyuki committed
      PCG_MOVE_LOCK is used for bit spinlock to avoid race between overwriting
      pc->mem_cgroup and page statistics accounting per memcg.  This lock helps
      to avoid the race but the race is very rare because moving tasks between
      cgroup is not a usual job.  So, it seems using 1bit per page is too
      costly.
      
      This patch changes this lock as per-memcg spinlock and removes
      PCG_MOVE_LOCK.
      
      If smaller lock is required, we'll be able to add some hashes but I'd like
      to start from this.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>