1. 11 March 2014, 2 commits
  2. 10 February 2014, 2 commits
    • locking/mcs: Allow architecture specific asm files to be used for contended case · ddf1d169
      Tim Chen authored
      This patch allows each architecture to add its own assembly-optimized
      arch_mcs_spin_lock_contended() and arch_mcs_spin_unlock_contended()
      for the MCS lock and unlock slow paths.
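      For context, the generic C fallbacks follow the override pattern
      sketched below (a paraphrase from memory, not the patch text); an
      architecture opts in by providing the macros in its own
      asm/mcs_spinlock.h before the generic definitions are seen:

      #ifndef arch_mcs_spin_lock_contended
      /* Generic fallback: spin until our node's lock word is set, with
       * acquire semantics so the critical section stays after the load. */
      #define arch_mcs_spin_lock_contended(l)                         \
      do {                                                            \
              while (!(smp_load_acquire(l)))                          \
                      arch_mutex_cpu_relax();                         \
      } while (0)
      #endif

      #ifndef arch_mcs_spin_unlock_contended
      /* Generic fallback: hand the lock to the next waiter with release
       * semantics so the critical section stays before the store. */
      #define arch_mcs_spin_unlock_contended(l)                       \
              smp_store_release((l), 1)
      #endif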
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: George Spelvin <linux@horizon.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Figo.zhang" <figo1802@gmail.com>
      Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order · b119fa61
      Tim Chen authored
      We clean up the Kbuild file in each architecture, ordering the
      generic-y header entries alphabetically by running the script below.
      
      for i in arch/*/include/asm/Kbuild
      do
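              # Collect the headers named on (possibly backslash-continued)
              # "generic-y +=" lines, pass every other line through
              # unchanged, then append one sorted "generic-y +=" line
              # per header.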
              cat $i | gawk '/^generic-y/ {
                      i = 3;
                      do {
                              for (; i <= NF; i++) {
                                      if ($i == "\\") {
                                              getline;
                                              i = 1;
                                              continue;
                                      }
                                      if ($i != "")
                                              hdr[$i] = $i;
                              }
                              break;
                      } while (1);
                      next;
              }
              // {
                      print $0;
              }
              END {
                      n = asort(hdr);
                      for (i = 1; i <= n; i++)
                              print "generic-y += " hdr[i];
              }' > ${i}.sorted;
              mv ${i}.sorted $i;
      done
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "Figo.zhang" <figo1802@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Waiman Long <waiman.long@hp.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: George Spelvin <linux@horizon.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      [ Fixed build bug. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 29 January 2014, 1 commit
  4. 19 January 2014, 1 commit
    • net: introduce SO_BPF_EXTENSIONS · ea02f941
      Michal Sekletar authored
      For user space packet capturing libraries such as libpcap, there is
      currently only one way to check which BPF extensions are supported
      by the kernel: probing for each of them, as introduced by commit
      aa1113d9 ("net: filter: return -EINVAL if BPF_S_ANC* operation is
      not supported"). For querying all extensions at once this might be
      rather inconvenient.
      
      Therefore, this patch introduces a new option which can be used as
      an argument for getsockopt(), and allows one to obtain information
      about which BPF extensions are supported by the current kernel.
      
      As David Miller suggests, we do not need to define any bits right
      now; the status quo can simply return 0 to state that this version
      supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later additions
      to the BPF extensions need to add their bits to the
      bpf_tell_extensions() function, as documented in the comment.
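
      A minimal userspace probe might look like this sketch (the fd is
      assumed to be an open socket; the fallback define covers older
      headers):

      #include <sys/socket.h>

      #ifndef SO_BPF_EXTENSIONS
      #define SO_BPF_EXTENSIONS 48    /* value from asm-generic/socket.h */
      #endif

      /* Returns the extension bitmask (0 on this kernel version), or -1
       * if the running kernel predates the option. */
      static int probe_bpf_extensions(int sockfd)
      {
              int val;
              socklen_t len = sizeof(val);

              if (getsockopt(sockfd, SOL_SOCKET, SO_BPF_EXTENSIONS,
                             &val, &len) < 0)
                      return -1;
              return val;
      }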
      Signed-off-by: Michal Sekletar <msekleta@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 12 January 2014, 2 commits
    • arch: Introduce smp_load_acquire(), smp_store_release() · 47933ad4
      Peter Zijlstra authored
      A number of situations currently require the heavyweight smp_mb(),
      even though there is no need to order prior stores against later
      loads.  Many architectures have much cheaper ways to handle these
      situations, but the Linux kernel currently has no portable way
      to make use of them.
      
      This commit therefore supplies smp_load_acquire() and
      smp_store_release() to remedy this situation.  The new
      smp_load_acquire() primitive orders the specified load against
      any subsequent reads or writes, while the new smp_store_release()
      primitive orders the specified store against any prior reads or
      writes.  These primitives allow array-based circular FIFOs to be
      implemented without an smp_mb(), and also allow a theoretical
      hole in rcu_assign_pointer() to be closed at no additional
      expense on most architectures.
      
      In addition, RCU's experience transitioning from explicit
      smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
      and rcu_assign_pointer(), respectively, resulted in substantial
      improvements in readability.  It therefore seems likely that
      replacing other explicit barriers with smp_load_acquire() and
      smp_store_release() will provide similar benefits.  It appears
      that roughly half of the explicit barriers in core kernel code
      might be so replaced.
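
      As a hedged illustration of the FIFO case above (illustrative names,
      not code from the patch), a single-producer/single-consumer ring can
      be ordered entirely by the two new primitives:

      #define RING_SIZE 256                   /* power of two, illustrative */

      struct ring {
              unsigned long head;             /* written by the producer */
              unsigned long tail;             /* written by the consumer */
              void *slot[RING_SIZE];
      };

      /* Producer: publish the slot contents before advancing head. */
      static bool ring_put(struct ring *r, void *item)
      {
              unsigned long head = r->head;

              if (head - smp_load_acquire(&r->tail) >= RING_SIZE)
                      return false;           /* full */
              r->slot[head & (RING_SIZE - 1)] = item;
              smp_store_release(&r->head, head + 1);
              return true;
      }

      /* Consumer: read the slot only after observing the matching head. */
      static void *ring_get(struct ring *r)
      {
              unsigned long tail = r->tail;
              void *item;

              if (tail == smp_load_acquire(&r->head))
                      return NULL;            /* empty */
              item = r->slot[tail & (RING_SIZE - 1)];
              smp_store_release(&r->tail, tail + 1);
              return item;
      }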
      
      [Changelog by PaulMck]
      Reviewed-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h · 93ea02bb
      Peter Zijlstra authored
      We're going to be adding a few new barrier primitives, and in order
      to avoid endless duplication, make more aggressive use of
      asm-generic/barrier.h.
      
      Change asm-generic/barrier.h so that it allows partial barrier
      definitions and fills in the rest with defaults.
      
      There are a few architectures (m32r, m68k) that could probably
      do away with their barrier.h file entirely but are kept for now due to
      their unconventional nop() implementation.
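
      The pattern the generic header adopts looks roughly like this (a
      sketch, not the full file): an architecture defines only the
      barriers it implements specially, and the generic header supplies
      the remaining ones as defaults.

      #ifndef mb
      #define mb()    barrier()
      #endif

      #ifndef rmb
      #define rmb()   mb()
      #endif

      #ifndef wmb
      #define wmb()   mb()
      #endif

      #ifdef CONFIG_SMP
      #ifndef smp_mb
      #define smp_mb()        mb()
      #endif
      #else
      #ifndef smp_mb
      #define smp_mb()        barrier()
      #endif
      #endif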
      Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20131213150640.846368594@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 05 January 2014, 1 commit
  7. 19 December 2013, 1 commit
    • mm: fix TLB flush race between migration, and change_protection_range · 20841405
      Rik van Riel authored
      There are a few subtle races between change_protection_range (used
      by mprotect and change_prot_numa) on one side, and NUMA page
      migration and compaction on the other side.
      
      The basic race is that there is a time window between when the PTE gets
      made non-present (PROT_NONE or NUMA), and the TLB is flushed.
      
      During that time, a CPU may continue writing to the page.
      
      This is fine most of the time; however, compaction or the NUMA
      migration code may come in and migrate the page away.
      
      When that happens, the CPU may continue writing, through the cached
      translation, to what is no longer the current memory location of the
      process.
      
      This only affects x86, which has a somewhat optimistic pte_accessible.
      All other architectures appear to be safe, and will either always flush,
      or flush whenever there is a valid mapping, even with no permissions
      (SPARC).
      
      The basic race looks like this:
      
      CPU A			CPU B			CPU C
      
      						load TLB entry
      make entry PTE/PMD_NUMA
      			fault on entry
      						read/write old page
      			start migrating page
      			change PTE/PMD to new page
      						read/write old page [*]
      flush TLB
      						reload TLB from new entry
      						read/write new page
      						lose data
      
      [*] the old page may belong to a new user at this point!
      
      The obvious fix is to flush remote TLB entries, by making sure that
      pte_accessible is aware that PROT_NONE and PROT_NUMA memory may
      still be accessible if there is a TLB flush pending for the mm.
      
      This should fix both NUMA migration and compaction.
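
      The x86 side of the fix has roughly the following shape (a sketch
      from memory; mm_tlb_flush_pending() is the new helper backed by a
      flag set before the PTE change and cleared after the flush):

      static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
      {
              if (pte_flags(a) & _PAGE_PRESENT)
                      return true;

              /* A PROT_NONE/NUMA pte still counts as accessible while a
               * TLB flush for this mm is pending, so migration flushes
               * remote TLBs before copying the page. */
              if ((pte_flags(a) & (_PAGE_PROTNONE | _PAGE_NUMA)) &&
                  mm_tlb_flush_pending(mm))
                      return true;

              return false;
      }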
      
      [mgorman@suse.de: fix build]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 18 December 2013, 1 commit
  9. 19 November 2013, 1 commit
  10. 15 November 2013, 1 commit
  11. 14 November 2013, 3 commits
  12. 13 November 2013, 8 commits
    • errno.h: remove "NFS" from descriptions in comments · 0ca43435
      Eric Sandeen authored
      glibc recently changed the error string for ESTALE to remove "NFS" -
      
      https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=96945714ec61951cc748da2b4b8a80cf02127ee9
      
      from: [ERR_REMAP (ESTALE)] = N_("Stale NFS file handle"),
      to:   [ERR_REMAP (ESTALE)] = N_("Stale file handle"),
      
      And some have expressed concern that the kernel's errno.h
      comments still refer to NFS.
      
      So make that change... note that this is a comment-only change,
      and has no functional difference.
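
      The corresponding kernel change then has this shape (an illustrative
      diff; formatting approximate):

      -#define ESTALE          116     /* Stale NFS file handle */
      +#define ESTALE          116     /* Stale file handle */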
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparc64: Move to 64-bit PGDs and PMDs. · 2b77933c
      David S. Miller authored
      To make the page tables compact, we were using 32-bit PGDs and PMDs.
      We only had to support <= 43 bits of physical addresses so this was
      quite feasible.
      
      In order to support larger physical addresses we have to move to
      64-bit PGDs and PMDs.
      
      Most of the changes are straightforward:
      
      1) {pgd,pmd}_t --> unsigned long
      
      2) Anything that tries to use plain "unsigned int" types with pgd/pmd
         values needs to be adjusted.  In particular things like "0U" become
         "0UL".
      
      3) {PGDIR,PMD}_BITS decrease by one.
      
      4) In the assembler page table walkers, use "ldxa" instead of "lduwa"
         and adjust the low bit masks to clear out the low 3 bits instead of
         just the low 2 bits during pgd/pmd address formation.
      
      Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the
      swapper_{pg_dir,low_pmd_dir} arrays.
      
      This patch does not try to take advantage of having 64-bits in the
      PMDs to simplify the hugepage code, that will come in a subsequent
      change.
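
      As an illustration of point 1) above (a sketch; the exact
      definitions are assumed from the description, not quoted from the
      diff):

      /* Before (sketch): compact 32-bit entries, <= 43 bits of physical
       * address.
       *
       *   typedef struct { unsigned int pgd; } pgd_t;
       *   typedef struct { unsigned int pmd; } pmd_t;
       */

      /* After (sketch): full 64-bit entries, making room for larger
       * physical addresses. */
      typedef struct { unsigned long pgd; } pgd_t;
      typedef struct { unsigned long pmd; } pmd_t;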
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Move from 4MB to 8MB huge pages. · 37b3a8ff
      David S. Miller authored
      The impetus for this is that we would like to move to 64-bit PMDs and
      PGDs, but that would result in only supporting a 42-bit address space
      with the current page table layout.  It'd be nice to support at least
      43-bits.
      
      The reason we'd end up with only 42-bits after making PMDs and PGDs
      64-bit is that we only use half-page sized PTE tables in order to make
      PMDs line up to 4MB, the hardware huge page size we use.
      
      So what we do here is we make huge pages 8MB, and fabricate them using
      4MB hw TLB entries.
      
      Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
      places that really need to operate on hardware 4MB pages.
      
      Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
      PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
      make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.
      
      This makes the pgtable cache completely unused, so remove the code
      managing it and the state used in mm_context_t.  Now we have less
      spinlocks taken in the page table allocation path.
      
      The technique we use to fabricate the 8MB pages is to transfer bit 22
      from the missing virtual address into the PTEs physical address field.
      That takes care of the transparent huge pages case.
      
      For hugetlb, we fill things in at the PTE level and that code already
      puts the sub huge page physical bits into the PTEs, based upon the
      offset, so there is nothing special we need to do.  It all just works
      out.
      
      So, a small amount of complexity in the THP case, but this code is
      about to get much simpler when we move the 64-bit PMDs as we can move
      away from the fancy 32-bit huge PMD encoding and just put a real PTE
      value in there.
      
      With bug fixes and help from Bob Picco.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Make PAGE_OFFSET variable. · b2d43834
      David S. Miller authored
      Choose PAGE_OFFSET dynamically based upon cpu type.
      
      Original UltraSPARC-I (spitfire) chips only supported a 44-bit
      virtual address space.
      
      Newer chips (T4 and later) support 52-bit virtual addresses
      and up to 47-bits of physical memory space.
      
      Therefore we have to adjust PAGE_OFFSET dynamically based upon
      the capabilities of the chip.
      
      Note that this change alone does not allow us to support > 43-bit
      physical memory, to do that we need to re-arrange our page table
      support.  The current encodings of the pmd_t and pgd_t pointers
      restricts us to "32 + 11" == 43 bits.
      
      This change can waste quite a bit of memory for the various tables.
      In particular, a future change should work to size and allocate
      kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
      This isn't easy as we really cannot take a TLB miss when accessing
      kern_linear_bitmap[].  We'd have to lock it into the TLB or similar.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Fix inconsistent max-physical-address defines. · f998c9c0
      David S. Miller authored
      Some parts of the code use '41', others use '42'; make them
      all use the same value.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Document the shift counts used to validate linear kernel addresses. · bb7b4353
      David S. Miller authored
      This way we can see exactly what they are derived from, and in particular
      how they would change if we were to use a different PAGE_OFFSET value.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Define PAGE_OFFSET in terms of physical address bits. · e0a45e35
      David S. Miller authored
      This makes the implications of a given chosen value clearer.
      
      Based upon patches by Bob Picco.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Clean up 64-bit mmap exclusion defines. · c920745e
      David S. Miller authored
      Older UltraSPARC chips had an address space hole due to the MMU only
      supporting 44-bit virtual addresses.
      
      The top end of this hole also has the same value as the current
      definition of PAGE_OFFSET, so this can be confusing.
      
      Consolidate the defines for the userspace mmap exclusion range into
      page_64.h and use them in sys_sparc_64.c and hugetlbpage.c.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
  13. 11 October 2013, 1 commit
  14. 10 October 2013, 2 commits
  15. 03 October 2013, 1 commit
  16. 29 September 2013, 1 commit
    • net: introduce SO_MAX_PACING_RATE · 62748f32
      Eric Dumazet authored
      As mentioned in commit afe4fd06 ("pkt_sched: fq: Fair Queue packet
      scheduler"), this patch adds a new socket option.
      
      SO_MAX_PACING_RATE offers the application the ability to cap the
      rate computed by the transport layer. The value is in bytes per second.
      
      u32 val = 1000000;
      setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
      
      To be effectively paced, a flow must use the FQ packet scheduler.
      
      Note that the packet scheduler takes headers into account in its
      computations, so the effective payload rate depends on the MSS and
      on retransmits, if any.
      
      I chose to make this pacing rate a SOL_SOCKET option instead of a
      TCP one because this can be used by other protocols.
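
      A slightly fuller version of the snippet above, with the FQ
      prerequisite spelled out (the helper name and error handling are
      illustrative):

      #include <stdio.h>
      #include <sys/socket.h>

      #ifndef SO_MAX_PACING_RATE
      #define SO_MAX_PACING_RATE 47   /* value from asm-generic/socket.h */
      #endif

      /* Cap a flow at bytes_per_sec; pacing only takes effect once the
       * fq qdisc is attached, e.g. "tc qdisc add dev eth0 root fq". */
      static int cap_pacing_rate(int sockfd, unsigned int bytes_per_sec)
      {
              if (setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE,
                             &bytes_per_sec, sizeof(bytes_per_sec)) < 0) {
                      perror("setsockopt(SO_MAX_PACING_RATE)");
                      return -1;
              }
              return 0;
      }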
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Steinar H. Gunderson <sesse@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 25 September 2013, 1 commit
  18. 01 August 2013, 1 commit
  19. 13 July 2013, 1 commit
    • Safer ABI for O_TMPFILE · bb458c64
      Al Viro authored
      [suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE;
      that will fail on old kernels in a lot more cases than what I came up with.
      And make sure O_CREAT doesn't get there...
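
      Typical usage then looks like this sketch (path and error handling
      are illustrative):

      #define _GNU_SOURCE
      #include <fcntl.h>
      #include <stdio.h>

      /* Create an unnamed, unlinked temporary file in /tmp.  On kernels
       * that predate O_TMPFILE, the O_DIRECTORY bit makes this open()
       * fail outright (O_DIRECTORY | O_RDWR is rejected) instead of
       * doing something unintended. */
      static int make_tmpfile(void)
      {
              int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);

              if (fd < 0)
                      perror("open(O_TMPFILE)");
              return fd;
      }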
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  20. 11 July 2013, 2 commits
  21. 29 June 2013, 2 commits
  22. 20 June 2013, 1 commit
  23. 19 June 2013, 2 commits
  24. 18 June 2013, 1 commit