1. 23 11月, 2005 3 次提交
    • H
      [PATCH] unpaged: unifdefed PageCompound · 664beed0
      Hugh Dickins 提交于
      It looks like snd_xxx is not the only nopage to be using PageReserved as a way
      of holding a high-order page together: which no longer works, but is masked by
      our failure to free from VM_RESERVED areas.  We cannot fix that bug without
      first substituting another way to hold the high-order page together, while
      farming out the 0-order pages from within it.
      
      That's just what PageCompound is designed for, but it's been kept under
      CONFIG_HUGETLB_PAGE.  Remove the #ifdefs: which saves some space (out- of-line
      put_page), doesn't slow down what most needs to be fast (already using
      hugetlb), and unifies the way we handle high-order pages.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      664beed0
    • H
      [PATCH] unpaged: private write VM_RESERVED · 83e9b7e9
      Hugh Dickins 提交于
      The PageReserved removal in 2.6.15-rc1 issued a "deprecated" message when you
      tried to mmap or mprotect MAP_PRIVATE PROT_WRITE a VM_RESERVED, and failed
      with -EACCES: because do_wp_page lacks the refinement to COW pages in those
      areas, nor do we expect to find anonymous pages in them; and it seemed just
      bloat to add code for handling such a peculiar case.  But immediately it
      caused vbetool and ddcprobe (using lrmi) to fail.
      
      So revert the "deprecated" messages, letting mmap and mprotect succeed.  But
      leave do_wp_page's BUG_ON(vma->vm_flags & VM_RESERVED) in place until we've
      added the code to do it right: so this particular patch is only good if the
      app doesn't really need to write to that private area.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      83e9b7e9
    • H
      [PATCH] unpaged: get_user_pages VM_RESERVED · ed5297a9
      Hugh Dickins 提交于
      The PageReserved removal in 2.6.15-rc1 prohibited get_user_pages on the areas
      flagged VM_RESERVED in place of PageReserved.  That is correct in theory - we
      ought not to interfere with struct pages in such a reserved area; but in
      practice it broke BTTV for one.
      
      So revert to prohibiting only on VM_IO: if someone gets into trouble with
      get_user_pages on VM_RESERVED, it'll just be a "don't do that".
      
      You can argue that videobuf_mmap_mapper shouldn't set VM_RESERVED in the first
      place, but now's not the time for breaking drivers without notice.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ed5297a9
  2. 19 11月, 2005 1 次提交
  3. 18 11月, 2005 2 次提交
  4. 15 11月, 2005 3 次提交
    • A
      [PATCH] x86_64: Remove obsolete ARCH_HAS_ATOMIC_UNSIGNED and page_flags_t · 07808b74
      Andi Kleen 提交于
      Has been introduced for x86-64 at some point to save memory
      in struct page, but has been obsolete for some time. Just
      remove it.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      07808b74
    • A
      b0d41693
    • A
      [PATCH] x86_64: Add 4GB DMA32 zone · a2f1b424
      Andi Kleen 提交于
      Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.
      
      As a bit of historical background: when the x86-64 port
      was originally designed we had some discussion if we should
      use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
      both. Both was ruled out at this point because it was in early
      2.4 when VM is still quite shakey and had bad troubles even
      dealing with one DMA zone.  We settled on the 16MB DMA zone mainly
      because we worried about older soundcards and the floppy.
      
      But this has always caused problems since then because
      device drivers had trouble getting enough DMA able memory. These days
      the VM works much better and the wide use of NUMA has proven
      it can deal with many zones successfully.
      
      So this patch adds both zones.
      
      This helps drivers who need a lot of memory below 4GB because
      their hardware is not accessing more (graphic drivers - proprietary
      and free ones, video frame buffer drivers, sound drivers etc.).
      Previously they could only use IOMMU+16MB GFP_DMA, which
      was not enough memory.
      
      Another common problem is that hardware who has full memory
      addressing for >4GB misses it for some control structures in memory
      (like transmit rings or other metadata).  They tended to allocate memory
      in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent,
      but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory
      (even on AMD systems the IOMMU tends to be quite small) especially if you have
      many devices.  With the new zone pci_alloc_consistent can just put
      this stuff into memory below 4GB which works better.
      
      One argument was still if the zone should be 4GB or 2GB. The main
      motivation for 2GB would be an unnamed not so unpopular hardware
      raid controller (mostly found in older machines from a particular four letter
      company) who has a strange 2GB restriction in firmware. But
      that one works ok with swiotlb/IOMMU anyways, so it doesn't really
      need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
      it seems to be the most common restriction.
      
      The new zone is so far added only for x86-64.
      
      For other architectures who don't set up this
      new zone nothing changes. Architectures can set a compatibility
      define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32
      as GFP_DMA. Otherwise it's a nop because on 32bit architectures
      it's normally not needed because GFP_NORMAL (=0) is DMA able
      enough.
      
      One problem is still that GFP_DMA means different things on different
      architectures. e.g. some drivers used to have #ifdef ia64  use GFP_DMA
      (trusting it to be 4GB) #elif __x86_64__ (use other hacks like
      the swiotlb because 16MB is not enough) ... . This was quite
      ugly and is now obsolete.
      
      These should be now converted to use GFP_DMA32 unconditionally. I haven't done
      this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent
      which will use GFP_DMA32 transparently.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a2f1b424
  5. 14 11月, 2005 6 次提交
  6. 11 11月, 2005 1 次提交
  7. 08 11月, 2005 1 次提交
  8. 07 11月, 2005 15 次提交
  9. 02 11月, 2005 1 次提交
  10. 31 10月, 2005 7 次提交
    • M
      [PATCH] Error checks omitted in init_tmpfs() in mm/tiny-shmem.c · 5d57bd39
      Matt Mackall 提交于
      From: Hareesh Nagarajan <hnagar2@gmail.com>
      Signed-off-by: NHareesh Nagarajan <hnagar2@gmail.com>
      Acked-by: NMatt Mackall <mpm@selenic.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5d57bd39
    • P
      [PATCH] RCU torture-testing kernel module · a241ec65
      Paul E. McKenney 提交于
      This patch is a rewrite of the one submitted on October 1st, using modules
      (http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2).
      
      This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an
      intense torture test of the RCU infratructure.  This is needed due to the
      continued changes to the RCU infrastructure to accommodate dynamic ticks,
      CPU hotplug, realtime, and so on.  Most of the code is in a separate file
      that is compiled only if the CONFIG variable is set.  Documentation on how
      to run the test and interpret the output is also included.
      
      This code has been tested on i386 and ppc64, and an earlier version of the
      code has received extensive testing on a number of architectures as part of
      the PREEMPT_RT patchset.
      Signed-off-by: N"Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a241ec65
    • T
      [PATCH] vm: remove redundant assignment from __pagevec_release_nonlru() · c7e9dd4d
      Tejun Heo 提交于
      This patch removes redundant assignment from __pagevec_release_nonlru().
      pages_to_free.cold is set to pvec->cold by pagevec_init() call right above
      the assignment.
      Signed-off-by: NTejun Heo <htejun@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c7e9dd4d
    • T
      [PATCH] fs: error case fix in __generic_file_aio_read · 39e88ca2
      Tejun Heo 提交于
      When __generic_file_aio_read() hits an error during reading, it reports the
      error iff nothing has successfully been read yet.  This is condition - when
      an error occurs, if nothing has been read/written, report the error code;
      otherwise, report the amount of bytes successfully transferred upto that
      point.
      
      This corner case can be exposed by performing readv(2) with the following
      iov.
      
       iov[0] = len0 @ ptr0
       iov[1] = len1 @ NULL (or any other invalid pointer)
       iov[2] = len2 @ ptr2
      
      When file size is enough, performing above readv(2) results in
      
       len0 bytes from file_pos @ ptr0
       len2 bytes from file_pos + len0 @ ptr2
      
      And the return value is len0 + len2.  Test program is attached to this
      mail.
      
      This patch makes __generic_file_aio_read()'s error handling identical to
      other functions.
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/uio.h>
      #include <errno.h>
      #include <string.h>
      
      int main(int argc, char **argv)
      {
      	const char *path;
      	struct stat stbuf;
      	size_t len0, len1;
      	void *buf0, *buf1;
      	struct iovec iov[3];
      	int fd, i;
      	ssize_t ret;
      
      	if (argc < 2) {
      		fprintf(stderr, "Usage: testreadv path (better be a "
      			"small text file)\n");
      		return 1;
      	}
      	path = argv[1];
      
      	if (stat(path, &stbuf) < 0) {
      		perror("stat");
      		return 1;
      	}
      
      	len0 = stbuf.st_size / 2;
      	len1 = stbuf.st_size - len0;
      
      	if (!len0 || !len1) {
      		fprintf(stderr, "Dude, file is too small\n");
      		return 1;
      	}
      
      	if ((fd = open(path, O_RDONLY)) < 0) {
      		perror("open");
      		return 1;
      	}
      
      	if (!(buf0 = malloc(len0)) || !(buf1 = malloc(len1))) {
      		perror("malloc");
      		return 1;
      	}
      
      	memset(buf0, 0, len0);
      	memset(buf1, 0, len1);
      
      	iov[0].iov_base = buf0;
      	iov[0].iov_len = len0;
      	iov[1].iov_base = NULL;
      	iov[1].iov_len = len1;
      	iov[2].iov_base = buf1;
      	iov[2].iov_len = len1;
      
      	printf("vector ");
      	for (i = 0; i < 3; i++)
      		printf("%p:%zu ", iov[i].iov_base, iov[i].iov_len);
      	printf("\n");
      
      	ret = readv(fd, iov, 3);
      	if (ret < 0)
      		perror("readv");
      
      	printf("readv returned %zd\nbuf0 = [%s]\nbuf1 = [%s]\n",
      	       ret, (char *)buf0, (char *)buf1);
      
      	return 0;
      }
      Signed-off-by: NTejun Heo <htejun@gmail.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      39e88ca2
    • P
      [PATCH] cpusets: automatic numa mempolicy rebinding · 68860ec1
      Paul Jackson 提交于
      This patch automatically updates a tasks NUMA mempolicy when its cpuset
      memory placement changes.  It does so within the context of the task,
      without any need to support low level external mempolicy manipulation.
      
      If a system is not using cpusets, or if running on a system with just the
      root (all-encompassing) cpuset, then this remap is a no-op.  Only when a
      task is moved between cpusets, or a cpusets memory placement is changed
      does the following apply.  Otherwise, the main routine below,
      rebind_policy() is not even called.
      
      When mixing cpusets, scheduler affinity, and NUMA mempolicies, the
      essential role of cpusets is to place jobs (several related tasks) on a set
      of CPUs and Memory Nodes, the essential role of sched_setaffinity is to
      manage a jobs processor placement within its allowed cpuset, and the
      essential role of NUMA mempolicy (mbind, set_mempolicy) is to manage a jobs
      memory placement within its allowed cpuset.
      
      However, CPU affinity and NUMA memory placement are managed within the
      kernel using absolute system wide numbering, not cpuset relative numbering.
      
      This is ok until a job is migrated to a different cpuset, or what's the
      same, a jobs cpuset is moved to different CPUs and Memory Nodes.
      
      Then the CPU affinity and NUMA memory placement of the tasks in the job
      need to be updated, to preserve their cpuset-relative position.  This can
      be done for CPU affinity using sched_setaffinity() from user code, as one
      task can modify anothers CPU affinity.  This cannot be done from an
      external task for NUMA memory placement, as that can only be modified in
      the context of the task using it.
      
      However, it easy enough to remap a tasks NUMA mempolicy automatically when
      a task is migrated, using the existing cpuset mechanism to trigger a
      refresh of a tasks memory placement after its cpuset has changed.  All that
      is needed is the old and new nodemask, and notice to the task that it needs
      to rebind its mempolicy.  The tasks mems_allowed has the old mask, the
      tasks cpuset has the new mask, and the existing
      cpuset_update_current_mems_allowed() mechanism provides the notice.  The
      bitmap/cpumask/nodemask remap operators provide the cpuset relative
      calculations.
      
      This patch leaves open a couple of issues:
      
       1) Updating vma and shmfs/tmpfs/hugetlbfs memory policies:
      
          These mempolicies may reference nodes outside of those allowed to
          the current task by its cpuset.  Tasks are migrated as part of jobs,
          which reside on what might be several cpusets in a subtree.  When such
          a job is migrated, all NUMA memory policy references to nodes within
          that cpuset subtree should be translated, and references to any nodes
          outside that subtree should be left untouched.  A future patch will
          provide the cpuset mechanism needed to mark such subtrees.  With that
          patch, we will be able to correctly migrate these other memory policies
          across a job migration.
      
       2) Updating cpuset, affinity and memory policies in user space:
      
          This is harder.  Any placement state stored in user space using
          system-wide numbering will be invalidated across a migration.  More
          work will be required to provide user code with a migration-safe means
          to manage its cpuset relative placement, while preserving the current
          API's that pass system wide numbers, not cpuset relative numbers across
          the kernel-user boundary.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      68860ec1
    • P
      [PATCH] cpusets: confine pdflush to its cpuset · 28a42b9e
      Paul Jackson 提交于
      This patch keeps pdflush daemons on the same cpuset as their parent, the
      kthread daemon.
      
      Some large NUMA configurations put as much as they can of kernel threads
      and other classic Unix load in what's called a bootcpuset, keeping the rest
      of the system free for dedicated jobs.
      
      This effort is thwarted by pdflush, which dynamically destroys and
      recreates pdflush daemons depending on load.
      
      It's easy enough to force the originally created pdflush deamons into the
      bootcpuset, at system boottime.  But the pdflush threads created later were
      allowed to run freely across the system, due to the necessary line in their
      startup kthread():
      
              set_cpus_allowed(current, CPU_MASK_ALL);
      
      By simply coding pdflush to start its threads with the cpus_allowed
      restrictions of its cpuset (inherited from kthread, its parent) we can
      ensure that dynamically created pdflush threads are also kept in the
      bootcpuset.
      
      On systems w/o cpusets, or w/o a bootcpuset implementation, the following
      will have no affect, leaving pdflush to run on any CPU, as before.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      28a42b9e
    • J
      [PATCH] ext3: Fix unmapped buffers in transaction's lists · aaa4059b
      Jan Kara 提交于
      Fix the problem (BUG 4964) with unmapped buffers in transaction's
      t_sync_data list.  The problem is we need to call filesystem's own
      invalidatepage() from block_write_full_page().
      
      block_write_full_page() must call filesystem's invalidatepage().  Otherwise
      following nasty race can happen:
      
         proc 1                                        proc 2
         ------                                        ------
      - write some new data to 'offset'
        => bh gets to the transactions data list
                                                    - starts truncate
                                                      => i_size set to new size
      - mpage_writepages()
        - ext3_ordered_writepage() to 'offset'
          - block_write_full_page()
            - page->index > end_index+1
              - block_invalidatepage()
                - discard_buffer()
                  - clear_buffer_mapped()
      
      - commit triggers and finds unmapped buffer - BOOM!
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      aaa4059b