1. 03 6月, 2011 1 次提交
  2. 01 6月, 2011 2 次提交
    • Y
      intel-iommu: Enable super page (2MiB, 1GiB, etc.) support · 6dd9a7c7
      Youquan Song 提交于
      There are no externally-visible changes with this. In the loop in the
      internal __domain_mapping() function, we simply detect if we are mapping:
        - size >= 2MiB, and
        - virtual address aligned to 2MiB, and
        - physical address aligned to 2MiB, and
        - on hardware that supports superpages.
      
      (and likewise for larger superpages).
      
      We automatically use a superpage for such mappings. We never have to
      worry about *breaking* superpages, since we trust that we will always
      *unmap* the same range that was mapped. So all we need to do is ensure
      that dma_pte_clear_range() will also cope with superpages.
      
      Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
      it can return a PTE at the appropriate level rather than always
      extending the page tables all the way down to level 1. Again, this is
      simplified by the fact that we should never encounter existing small
      pages when we're creating a mapping; any old mapping that used the same
      virtual range will have been entirely removed and its obsolete page
      tables freed.
      
      Provide an 'intel_iommu=sp_off' argument on the command line as a
      chicken bit. Not that it should ever be required.
      
      ==
      
      The original commit seen in the iommu-2.6.git was Youquan's
      implementation (and completion) of my own half-baked code which I'd
      typed into an email. Followed by half a dozen subsequent 'fixes'.
      
      I've taken the unusual step of rewriting history and collapsing the
      original commits in order to keep the main history simpler, and make
      life easier for the people who are going to have to backport this to
      older kernels. And also so I can give it a more coherent commit comment
      which (hopefully) gives a better explanation of what's going on.
      
      The original sequence of commits leading to identical code was:
      
      Youquan Song (3):
            intel-iommu: super page support
            intel-iommu: Fix superpage alignment calculation error
            intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()
      
      David Woodhouse (4):
            intel-iommu: Precalculate superpage support for dmar_domain
            intel-iommu: Fix hardware_largepage_caps()
            intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
            intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages
      Signed-off-by: NYouquan Song <youquan.song@intel.com>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      6dd9a7c7
    • R
      mtd: fix physmap.h warnings · 63da0290
      Randy Dunlap 提交于
      Fix build warnings in physmap.h:
      
      include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list
      include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want
      include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list
      include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      63da0290
  3. 30 5月, 2011 10 次提交
  4. 29 5月, 2011 7 次提交
    • M
      dm kcopyd: return client directly and not through a pointer · fa34ce73
      Mikulas Patocka 提交于
      Return client directly from dm_kcopyd_client_create, not through a
      parameter, making it consistent with dm_io_client_create.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fa34ce73
    • M
      dm kcopyd: reserve fewer pages · 5f43ba29
      Mikulas Patocka 提交于
      Reserve just the minimum of pages needed to process one job.
      
      Because we allocate pages from page allocator, we don't need to reserve
      a large number of pages.  The maximum job size is SUB_JOB_SIZE and we
      calculate the number of reserved pages based on this.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      5f43ba29
    • M
      dm io: use fixed initial mempool size · bda8efec
      Mikulas Patocka 提交于
      Replace the arbitrary calculation of an initial io struct mempool size
      with a constant.
      
      The code calculated the number of reserved structures based on the request
      size and used a "magic" multiplication constant of 4.  This patch changes
      it to reserve a fixed number - itself still chosen quite arbitrarily.
      Further testing might show if there is a better number to choose.
      
      Note that if there is no memory pressure, we can still allocate an
      arbitrary number of "struct io" structures.  One structure is enough to
      process the whole request.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      bda8efec
    • M
      dm table: allow targets to support discards internally · 4c259327
      Mike Snitzer 提交于
      Permit a target to support discards regardless of whether or not all its
      underlying devices do.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      4c259327
    • H
      [S390] mm: fix storage key handling · a43a9d93
      Heiko Carstens 提交于
      page_get_storage_key() and page_set_storage_key() expect a page address
      and not its page frame number. This got inconsistent with 2d42552d
      "[S390] merge page_test_dirty and page_clear_dirty".
      
      Result is that we read/write storage keys from random pages and do not
      have a working dirty bit tracking at all.
      E.g. SetPageUpdate() doesn't clear the dirty bit of requested pages, which
      for example ext4 doesn't like very much and panics after a while.
      
      Unable to handle kernel paging request at virtual user address (null)
      Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in:
      CPU: 1 Not tainted 2.6.39-07551-g139f37f5-dirty #152
      Process flush-94:0 (pid: 1576, task: 000000003eb34538, ksp: 000000003c287b70)
      Krnl PSW : 0704c00180000000 0000000000316b12 (jbd2_journal_file_inode+0x10e/0x138)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
      Krnl GPRS: 0000000000000000 0000000000000000 0000000000000000 0700000000000000
                 0000000000316a62 000000003eb34cd0 0000000000000025 000000003c287b88
                 0000000000000001 000000003c287a70 000000003f1ec678 000000003f1ec000
                 0000000000000000 000000003e66ec00 0000000000316a62 000000003c287988
      Krnl Code: 0000000000316b04: f0a0000407f4       srp     4(11,%r0),2036,0
                 0000000000316b0a: b9020022           ltgr    %r2,%r2
                 0000000000316b0e: a7740015           brc     7,316b38
                >0000000000316b12: e3d0c0000024       stg     %r13,0(%r12)
                 0000000000316b18: 4120c010           la      %r2,16(%r12)
                 0000000000316b1c: 4130d060           la      %r3,96(%r13)
                 0000000000316b20: e340d0600004       lg      %r4,96(%r13)
                 0000000000316b26: c0e50002b567       brasl   %r14,36d5f4
      Call Trace:
      ([<0000000000316a62>] jbd2_journal_file_inode+0x5e/0x138)
       [<00000000002da13c>] mpage_da_map_and_submit+0x2e8/0x42c
       [<00000000002daac2>] ext4_da_writepages+0x2da/0x504
       [<00000000002597e8>] writeback_single_inode+0xf8/0x268
       [<0000000000259f06>] writeback_sb_inodes+0xd2/0x18c
       [<000000000025a700>] writeback_inodes_wb+0x80/0x168
       [<000000000025aa92>] wb_writeback+0x2aa/0x324
       [<000000000025abde>] wb_do_writeback+0xd2/0x274
       [<000000000025ae3a>] bdi_writeback_thread+0xba/0x1c4
       [<00000000001737be>] kthread+0xa6/0xb0
       [<000000000056c1da>] kernel_thread_starter+0x6/0xc
       [<000000000056c1d4>] kernel_thread_starter+0x0/0xc
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
       [<0000000000316a8a>] jbd2_journal_file_inode+0x86/0x138
      Reported-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      a43a9d93
    • T
      idle governor: Avoid lock acquisition to read pm_qos before entering idle · 333c5ae9
      Tim Chen 提交于
      Thanks to the reviews and comments by Rafael, James, Mark and Andi.
      Here's version 2 of the patch incorporating your comments and also some
      update to my previous patch comments.
      
      I noticed that before entering idle state, the menu idle governor will
      look up the current pm_qos target value according to the list of qos
      requests received.  This look up currently needs the acquisition of a
      lock to access the list of qos requests to find the qos target value,
      slowing down the entrance into idle state due to contention by multiple
      cpus to access this list.  The contention is severe when there are a lot
      of cpus waking and going into idle.  For example, for a simple workload
      that has 32 pair of processes ping ponging messages to each other, where
      64 cpu cores are active in test system, I see the following profile with
      37.82% of cpu cycles spent in contention of pm_qos_lock:
      
      -     37.82%          swapper  [kernel.kallsyms]          [k]
      _raw_spin_lock_irqsave
         - _raw_spin_lock_irqsave
            - 95.65% pm_qos_request
                 menu_select
                 cpuidle_idle_call
               - cpu_idle
                    99.98% start_secondary
      
      A better approach will be to cache the updated pm_qos target value so
      reading it does not require lock acquisition as in the patch below.
      With this patch the contention for pm_qos_lock is removed and I saw a
      2.2X increase in throughput for my message passing workload.
      
      cc: stable@kernel.org
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJames Bottomley <James.Bottomley@suse.de>
      Acked-by: Nmark gross <markgross@thegnar.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      333c5ae9
    • A
      Cache xattr security drop check for write v2 · 69b45732
      Andi Kleen 提交于
      Some recent benchmarking on btrfs showed that a major scaling bottleneck
      on large systems on btrfs is currently the xattr lookup on every write.
      
      Why xattr lookup on every write I hear you ask?
      
      write wants to drop suid and security related xattrs that could set o
      capabilities for executables.  To do that it currently looks up
      security.capability on EVERY write (even for non executables) to decide
      whether to drop it or not.
      
      In btrfs this causes an additional tree walk, hitting some per file system
      locks and quite bad scalability. In a simple read workload on a 8S
      system I saw over 90% CPU time in spinlocks related to that.
      
      Chris Mason tells me this is also a problem in ext4, where it hits
      the global mbcache lock.
      
      This patch adds a simple per inode to avoid this problem.  We only
      do the lookup once per file and then if there is no xattr cache
      the decision. All xattr changes clear the flag.
      
      I also used the same flag to avoid the suid check, although
      that one is pretty cheap.
      
      A file system can also set this flag when it creates the inode,
      if it has a cheap way to do so.  This is done for some common file systems
      in followon patches.
      
      With this patch a major part of the lock contention disappears
      for btrfs. Some testing on smaller systems didn't show significant
      performance changes, but at least it helps the larger systems
      and is generally more efficient.
      
      v2: Rename is_sgid. add file system helper.
      Cc: chris.mason@oracle.com
      Cc: josef@redhat.com
      Cc: viro@zeniv.linux.org.uk
      Cc: agruen@linbit.com
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      69b45732
  5. 28 5月, 2011 7 次提交
  6. 27 5月, 2011 13 次提交