- 24 March 2012, 2 commits
-
-
Submitted by Jason Baron
Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit for a 'VM_NODUMP' flag. The idea is to add a new madvise() flag: MADV_DONTDUMP, which can be set by applications to specifically request memory regions which should not dump core.

The specific application I have in mind is qemu: we can add a flag there that wouldn't dump all of guest memory when qemu dumps core. This flag might also be useful for security sensitive apps that want to absolutely make sure that parts of memory are not dumped. To clear the flag use: MADV_DODUMP.

[akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
[akpm@linux-foundation.org: fix up the architectures which broke]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
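A minimal userspace sketch of how an application might use the new flags (assuming a kernel and libc that expose MADV_DONTDUMP/MADV_DODUMP through <sys/mman.h>):

    #include <sys/mman.h>
    #include <stddef.h>

    /* Sketch: exclude a buffer from core dumps, then re-enable dumping. */
    int main(void)
    {
            size_t len = 16 * 1024 * 1024;
            void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (buf == MAP_FAILED)
                    return 1;

            /* Skip this range when writing a core dump (guest RAM, secrets). */
            if (madvise(buf, len, MADV_DONTDUMP))
                    return 1;

            /* ... use buf ... */

            /* Re-include the range if it is needed for debugging. */
            madvise(buf, len, MADV_DODUMP);
            munmap(buf, len);
            return 0;
    }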
-
Submitted by Jan Beulich
Due to the alignment of following variables, these typically consume more than just the single byte that 'bool' requires, and as there are a few hundred instances, the cache pollution (not so much the waste of memory) adds up. Put these variables into their own section, outside of any halfway frequently used memory range.

Do the same also to the __warned variable of rcu_lockdep_assert(). (Don't, however, include the ones used by printk_once() and the like, as they can potentially be hot.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 March 2012, 1 commit
-
-
Submitted by Andrea Arcangeli
In some cases it may happen that pmd_none_or_clear_bad() is called with the mmap_sem hold in read mode. In those cases the huge page faults can allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a false positive from pmd_bad() that will not like to see a pmd materializing as trans huge.

It's not khugepaged causing the problem, khugepaged holds the mmap_sem in write mode (and all those sites must hold the mmap_sem in read mode to prevent pagetables to go away from under them, during code review it seems vm86 mode on 32bit kernels requires that too unless it's restricted to 1 thread per process or UP builds). The race is only with the huge pagefaults that can convert a pmd_none() into a pmd_trans_huge().

Effectively all these pmd_none_or_clear_bad() sites running with mmap_sem in read mode are somewhat speculative with the page faults, and the result is always undefined when they run simultaneously. This is probably why it wasn't common to run into this. For example if the madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page fault, the hugepage will not be zapped, if the page fault runs first it will be zapped.

Altering pmd_bad() not to error out if it finds hugepmds won't be enough to fix this, because zap_pmd_range would then proceed to call zap_pte_range (which would be incorrect if the pmd become a pmd_trans_huge()).

The simplest way to fix this is to read the pmd in the local stack (regardless of what we read, no need of actual CPU barriers, only compiler barrier needed), and be sure it is not changing under the code that computes its value. Even if the real pmd is changing under the value we hold on the stack, we don't care. If we actually end up in zap_pte_range it means the pmd was not none already and it was not huge, and it can't become huge from under us (khugepaged locking explained above). All we need is to enforce that there is no way anymore that in a code path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad can run into a hugepmd. The overhead of a barrier() is just a compiler tweak and should not be measurable (I only added it for THP builds). I don't exclude different compiler versions may have prevented the race too by caching the value of *pmd on the stack (that hasn't been verified, but it wouldn't be impossible considering pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines and there's no external function called in between pmd_trans_huge and pmd_none_or_clear_bad).

	if (pmd_trans_huge(*pmd)) {
		if (next-addr != HPAGE_PMD_SIZE) {
			VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
			split_huge_page_pmd(vma->vm_mm, pmd);
		} else if (zap_huge_pmd(tlb, vma, pmd, addr))
			continue;
		/* fall through */
	}
	if (pmd_none_or_clear_bad(pmd))

Because this race condition could be exercised without special privileges this was reported in CVE-2012-1179.

The race was identified and fully explained by Ulrich who debugged it. I'm quoting his accurate explanation below, for reference.

====== start quote =======

mapcount 0 page_mapcount 1
kernel BUG at mm/huge_memory.c:1384!

At some point prior to the panic, a "bad pmd ..." message similar to the following is logged on the console:

	mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).

The "bad pmd ..." message is logged by pmd_clear_bad() before it clears the page's PMD table entry.

	    143 void pmd_clear_bad(pmd_t *pmd)
	    144 {
	->  145         pmd_ERROR(*pmd);
	    146         pmd_clear(pmd);
	    147 }

After the PMD table entry has been cleared, there is an inconsistency between the actual number of PMD table entries that are mapping the page and the page's map count (_mapcount field in struct page). When the page is subsequently reclaimed, __split_huge_page() detects this inconsistency.

	   1381 if (mapcount != page_mapcount(page))
	   1382         printk(KERN_ERR "mapcount %d page_mapcount %d\n",
	   1383                mapcount, page_mapcount(page));
	-> 1384 BUG_ON(mapcount != page_mapcount(page));

The root cause of the problem is a race of two threads in a multithreaded process. Thread B incurs a page fault on a virtual address that has never been accessed (PMD entry is zero) while Thread A is executing an madvise() system call on a virtual address within the same 2 MB (huge page) range.

	           virtual address space
	          .---------------------.
	          |                     |
	          |                     |
	        .-|---------------------|
	        | |                     |
	        | |                     |<-- B(fault)
	        | |                     |
	  2 MB  | |/////////////////////|-.
	  huge <  |/////////////////////|  > A(range)
	  page  | |/////////////////////|-'
	        | |                     |
	        | |                     |
	        '-|---------------------|
	          |                     |
	          |                     |
	          '---------------------'

- Thread A is executing an madvise(..., MADV_DONTNEED) system call on the virtual address range "A(range)" shown in the picture.

	sys_madvise
	  // Acquire the semaphore in shared mode.
	  down_read(&current->mm->mmap_sem)
	  ...
	  madvise_vma
	    switch (behavior)
	    case MADV_DONTNEED:
	         madvise_dontneed
	           zap_page_range
	             unmap_vmas
	               unmap_page_range
	                 zap_pud_range
	                   zap_pmd_range
	                     //
	                     // Assume that this huge page has never been accessed.
	                     // I.e. content of the PMD entry is zero (not mapped).
	                     //
	                     if (pmd_trans_huge(*pmd)) {
	                         // We don't get here due to the above assumption.
	                     }
	                     //
	                     // Assume that Thread B incurred a page fault and
	         .---------> // sneaks in here as shown below.
	         |           //
	         |           if (pmd_none_or_clear_bad(pmd))
	         |               {
	         |                 if (unlikely(pmd_bad(*pmd)))
	         |                     pmd_clear_bad
	         |                     {
	         |                       pmd_ERROR
	         |                         // Log "bad pmd ..." message here.
	         |                       pmd_clear
	         |                         // Clear the page's PMD entry.
	         |                         // Thread B incremented the map count
	         |                         // in page_add_new_anon_rmap(), but
	         |                         // now the page is no longer mapped
	         |                         // by a PMD entry (-> inconsistency).
	         |                     }
	         |               }
	         |
	         v

- Thread B is handling a page fault on virtual address "B(fault)" shown in the picture.

	...
	do_page_fault
	  __do_page_fault
	    // Acquire the semaphore in shared mode.
	    down_read_trylock(&mm->mmap_sem)
	    ...
	    handle_mm_fault
	      if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
	        // We get here due to the above assumption (PMD entry is zero).
	        do_huge_pmd_anonymous_page
	          alloc_hugepage_vma
	            // Allocate a new transparent huge page here.
	          ...
	          __do_huge_pmd_anonymous_page
	            ...
	            spin_lock(&mm->page_table_lock)
	            ...
	            page_add_new_anon_rmap
	              // Here we increment the page's map count (starts at -1).
	              atomic_set(&page->_mapcount, 0)
	            set_pmd_at
	              // Here we set the page's PMD entry which will be cleared
	              // when Thread A calls pmd_clear_bad().
	            ...
	            spin_unlock(&mm->page_table_lock)

The mmap_sem does not prevent the race because both threads are acquiring it in shared mode (down_read). Thread B holds the page_table_lock while the page's map count and PMD table entry are updated. However, Thread A does not synchronize on that lock.

====== end quote =======

[akpm@linux-foundation.org: checkpatch fixes]
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Jones <davej@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [2.6.38+]
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
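The fix described above reads the pmd into a local variable and needs only a compiler barrier. A sketch of that shape follows; the helper name and exact checks are assumptions for illustration, not the literal patch:

	/* Illustrative sketch; name and details assumed, not the literal patch. */
	static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
	{
		pmd_t pmdval = *pmd;	/* read once into the local stack copy */

	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		/* Keep the compiler from re-reading *pmd below; a huge page
		 * fault under mmap_sem held for read may change it under us. */
		barrier();
	#endif
		/* A none or freshly materialized huge pmd is reported as "skip
		 * this range" rather than being fed to pmd_bad()/pmd_clear_bad(). */
		if (pmd_none(pmdval) || pmd_trans_huge(pmdval))
			return 1;
		if (unlikely(pmd_bad(pmdval))) {
			pmd_clear_bad(pmd);
			return 1;
		}
		return 0;
	}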
-
- 05 March 2012, 1 commit
-
-
Submitted by Paul Gortmaker
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present.

We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
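A hypothetical header illustrating the rule: if a static inline uses BUILD_BUG_ON or BUG_ON, the header pulls in <linux/bug.h> itself instead of relying on some other header having done so.

	/* foo.h -- hypothetical header used only to illustrate the rule */
	#ifndef _LINUX_FOO_H
	#define _LINUX_FOO_H

	#include <linux/bug.h>		/* for BUILD_BUG_ON()/BUG_ON() used below */
	#include <linux/types.h>

	struct foo { u32 flags; };

	static inline void foo_check(struct foo *f)
	{
		BUILD_BUG_ON(sizeof(*f) != 4);	/* needs <linux/bug.h> */
		BUG_ON(!f);
	}

	#endif /* _LINUX_FOO_H */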
-
- 27 February 2012, 1 commit
-
-
Submitted by James Bottomley
The problem in

	commit fea80311
	Author: Randy Dunlap <rdunlap@xenotime.net>
	Date:   Sun Jul 24 11:39:14 2011 -0700

	    iomap: make IOPORT/PCI mapping functions conditional

is that if your architecture supplies pci_iomap/pci_iounmap, it expects always to supply them. Adding empty body definitions in the !CONFIG_PCI case, which is what this patch does, breaks the parisc compile because the functions become doubly defined. It took us a while to spot this, because we don't actually build !CONFIG_PCI very often (only if someone is brave enough to test the snake/asp machines).

Since the note in the commit log says this is to fix a CONFIG_GENERIC_IOMAP issue (which it does because CONFIG_GENERIC_IOMAP supplies pci_iounmap only if CONFIG_PCI is set), there should actually have been a condition upon this. This should make sure no other architecture's !CONFIG_PCI compile breaks in the same way as parisc.

The fix had to be updated to take account of the GENERIC_PCI_IOMAP separation.

Reported-by: Rolf Eike Beer <eike@sf-mail.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-
- 25 February 2012, 2 commits
-
-
Submitted by Oleg Nesterov
This patch is intentionally incomplete to simplify the review. It ignores ep_unregister_pollwait() which plays with the same wqh. See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything f_op->poll() needs. In particular it assumes that the wait queue can't go away until eventpoll_release(). This is not true in case of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for epoll. It expects that init_poll_funcptr()'ed hook should do the necessary cleanup. Perhaps it should be defined as EPOLLFREE in eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if ->signalfd_wqh is not empty, we add the new signalfd_cleanup() helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list). This makes this poll entry inconsistent, but we don't care. If you share an epoll fd which contains our sigfd with another process you should blame yourself. signalfd is "really special". I simply do not know how we can define the "right" semantics if it is used with epoll.

The main problem is, epoll calls signalfd_poll() once to establish the connection with the wait queue, after that signalfd_poll(NULL) returns the different/inconsistent results depending on who does EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms. It also assumes that nobody can take tasklist_lock under epoll locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll() is fine,
	  poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE, we need a
	  couple of simple changes in eventpoll.c to make sure it can't
	  be "lost".

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Joerg Roedel
The new get_order macro introduced in commit d66acc39 does not use parentheses around all uses of the parameter n. This causes new compile warnings, for example in the amd_iommu_init.c function:

	drivers/iommu/amd_iommu_init.c:561:6: warning: suggest parentheses around comparison in operand of ‘&’ [-Wparentheses]
	drivers/iommu/amd_iommu_init.c:561:6: warning: suggest parentheses around comparison in operand of ‘&’ [-Wparentheses]

Fix those warnings by adding the missing parentheses.

Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1330088295-28732-1-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
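The underlying macro-hygiene issue, shown with a hypothetical macro rather than the actual get_order() body:

	/* Hypothetical illustration of the precedence problem. */
	#define BAD_IS_ZERO(n)	(n == 0 ? 1 : 0)	/* 'n' used unparenthesized */
	#define GOOD_IS_ZERO(n)	((n) == 0 ? 1 : 0)	/* every use of 'n' wrapped */

	/*
	 * BAD_IS_ZERO(flags & MASK) expands to (flags & MASK == 0 ? 1 : 0).
	 * Since '==' binds tighter than '&', that is (flags & (MASK == 0)),
	 * which is why gcc prints "suggest parentheses around comparison in
	 * operand of '&'" -- and why the result is wrong.
	 */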
-
- 24 February 2012, 4 commits
-
-
Submitted by Ben Greear
This is useful for testing RX handling of frames with bad CRCs. Requires driver support to actually put the packet on the wire properly.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Submitted by Bjorn Helgaas
Everybody uses the generic pcibios_resource_to_bus() supplied by the core now, so remove the ARCH_HAS_GENERIC_PCI_OFFSETS used during conversion.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Submitted by Bjorn Helgaas
This replaces the generic versions of pcibios_resource_to_bus() and pcibios_bus_to_resource() in asm-generic/pci.h with versions that use pci_resource_to_bus() and pci_bus_to_resource(). The replacements are equivalent except that they can apply host bridge window offsets when the arch has supplied them by using pci_add_resource_offset().

Each arch can convert to using pci_add_resource_offset() individually by removing its device resource fixups from pcibios_fixup_bus() and supplying ARCH_HAS_GENERIC_PCI_OFFSETS. ARCH_HAS_GENERIC_PCI_OFFSETS can be removed after all have converted.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Submitted by Bjorn Helgaas
Add a pci_clear_flags() for cases when we statically initialize pci_flags, then decide to clear things out later.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 22 February 2012, 2 commits
-
-
Submitted by Hitoshi Mitake
This provides unified readq()/writeq() helper functions for 32-bit drivers.

For some cases, readq/writeq without atomicity is harmful, and the order of io access has to be specified explicitly. So in this patch, two new header files which contain non-atomic readq/writeq are added:

- <asm-generic/io-64-nonatomic-lo-hi.h> provides non-atomic readq/writeq with the order of lower address -> higher address
- <asm-generic/io-64-nonatomic-hi-lo.h> provides non-atomic readq/writeq with reversed order

This allows us to remove some readq()s that were added to drivers when the default non-atomic ones were removed in commit dbee8a0a ("x86: remove 32-bit versions of readq()/writeq()").

The drivers which need readq/writeq but can do with the non-atomic ones must add the line:

	#include <asm-generic/io-64-nonatomic-lo-hi.h> /* or hi-lo.h */

But this will be a nop in 64-bit environments, and no other #ifdefs are required. So I believe that this patch can solve the problem of
1. driver-specific readq/writeq
2. atomicity and order of io access

This patch is tested with building allyesconfig and allmodconfig as ARCH=x86 and ARCH=i386 on top of tip/master.

Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
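A hypothetical driver fragment showing the intended usage pattern (the register offset, mmio pointer, and function names here are made up):

	/* Hypothetical driver fragment; FOO_REG64 and the mmio base are made up. */
	#include <linux/io.h>
	#include <asm-generic/io-64-nonatomic-lo-hi.h>	/* or io-64-nonatomic-hi-lo.h */

	#define FOO_REG64	0x40

	static u64 foo_read_counter(void __iomem *mmio)
	{
		/* On 32-bit builds this becomes two 32-bit reads, low word first. */
		return readq(mmio + FOO_REG64);
	}

	static void foo_write_counter(void __iomem *mmio, u64 val)
	{
		writeq(val, mmio + FOO_REG64);
	}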
-
Submitted by Pavel Emelyanov
This one specifies where to start MSG_PEEK-ing queue data from. When set to a negative value it means that MSG_PEEK works as usual -- it always peeks from the head of the queue.

When some bytes are peeked from the queue and the peeking offset is non-negative, it is moved forward so that the next peek will return the next portion of data.

When a non-peeking recvmsg occurs and the peeking offset is non-negative, it is moved backward so that the next peek will still peek the proper data (i.e. the data that would have been picked if there were no non-peeking recv in between).

The offset is set using a per-proto operation to let the protocol handle the locking issues and to check whether the peeking offset feature is supported by the protocol the socket belongs to.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
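A userspace sketch of the intended behaviour, assuming the socket option exposed for this is SO_PEEK_OFF (the option name is an assumption here, inferred from the rest of this series) and a protocol that implements the new per-proto operation:

	#include <sys/socket.h>

	/* Sketch: walk queued data with MSG_PEEK without consuming it. */
	static void peek_walk(int fd)
	{
		char buf[128];
		int off = 0;	/* non-negative: enable the sliding peek offset */

		setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off));

		/* First peek returns the first chunk, second peek the next one. */
		recv(fd, buf, sizeof(buf), MSG_PEEK);
		recv(fd, buf, sizeof(buf), MSG_PEEK);

		/* A normal read moves the offset back so later peeks stay aligned
		 * with what is still queued. */
		recv(fd, buf, sizeof(buf), 0);
	}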
-
- 21 February 2012, 2 commits
-
-
Submitted by David Howells
Optimise get_order() to use bit scanning instructions if such exist rather than a loop. Also, make it possible to use get_order() in static initialisations too by building it on top of ilog2() in the constant parameter case.

This has been tested for i386 and x86_64 using the following userspace program, and for FRV by making appropriate substitutions for fls() and fls64(). It will abort if the case for get_order() deviates from the original except for the order of 0, for which get_order() produces an undefined result. This program tests both dynamic and static parameters.

	#include <stdlib.h>
	#include <stdio.h>

	#ifdef __x86_64__
	#define BITS_PER_LONG 64
	#else
	#define BITS_PER_LONG 32
	#endif

	#define PAGE_SHIFT 12

	typedef unsigned long long __u64, u64;
	typedef unsigned int __u32, u32;
	#define noinline __attribute__((noinline))

	static inline int fls(int x)
	{
		int bitpos = -1;
		asm("bsrl %1,%0" : "+r" (bitpos) : "rm" (x));
		return bitpos + 1;
	}

	static __always_inline int fls64(__u64 x)
	{
	#if BITS_PER_LONG == 64
		long bitpos = -1;
		asm("bsrq %1,%0" : "+r" (bitpos) : "rm" (x));
		return bitpos + 1;
	#else
		__u32 h = x >> 32, l = x;
		int bitpos = -1;
		asm("bsrl %1,%0 \n"
		    "subl %2,%0 \n"
		    "bsrl %3,%0 \n"
		    : "+r" (bitpos)
		    : "rm" (l), "i"(32), "rm" (h));
		return bitpos + 33;
	#endif
	}

	static inline __attribute__((const)) int __ilog2_u32(u32 n)
	{
		return fls(n) - 1;
	}

	static inline __attribute__((const)) int __ilog2_u64(u64 n)
	{
		return fls64(n) - 1;
	}

	extern __attribute__((const, noreturn)) int ____ilog2_NaN(void);

	#define ilog2(n)			\
	(					\
		__builtin_constant_p(n) ? (	\
		(n) < 1 ? ____ilog2_NaN() :	\
		(n) & (1ULL << 63) ? 63 :	\
		(n) & (1ULL << 62) ? 62 :	\
		(n) & (1ULL << 61) ? 61 :	\
		(n) & (1ULL << 60) ? 60 :	\
		(n) & (1ULL << 59) ? 59 :	\
		(n) & (1ULL << 58) ? 58 :	\
		(n) & (1ULL << 57) ? 57 :	\
		(n) & (1ULL << 56) ? 56 :	\
		(n) & (1ULL << 55) ? 55 :	\
		(n) & (1ULL << 54) ? 54 :	\
		(n) & (1ULL << 53) ? 53 :	\
		(n) & (1ULL << 52) ? 52 :	\
		(n) & (1ULL << 51) ? 51 :	\
		(n) & (1ULL << 50) ? 50 :	\
		(n) & (1ULL << 49) ? 49 :	\
		(n) & (1ULL << 48) ? 48 :	\
		(n) & (1ULL << 47) ? 47 :	\
		(n) & (1ULL << 46) ? 46 :	\
		(n) & (1ULL << 45) ? 45 :	\
		(n) & (1ULL << 44) ? 44 :	\
		(n) & (1ULL << 43) ? 43 :	\
		(n) & (1ULL << 42) ? 42 :	\
		(n) & (1ULL << 41) ? 41 :	\
		(n) & (1ULL << 40) ? 40 :	\
		(n) & (1ULL << 39) ? 39 :	\
		(n) & (1ULL << 38) ? 38 :	\
		(n) & (1ULL << 37) ? 37 :	\
		(n) & (1ULL << 36) ? 36 :	\
		(n) & (1ULL << 35) ? 35 :	\
		(n) & (1ULL << 34) ? 34 :	\
		(n) & (1ULL << 33) ? 33 :	\
		(n) & (1ULL << 32) ? 32 :	\
		(n) & (1ULL << 31) ? 31 :	\
		(n) & (1ULL << 30) ? 30 :	\
		(n) & (1ULL << 29) ? 29 :	\
		(n) & (1ULL << 28) ? 28 :	\
		(n) & (1ULL << 27) ? 27 :	\
		(n) & (1ULL << 26) ? 26 :	\
		(n) & (1ULL << 25) ? 25 :	\
		(n) & (1ULL << 24) ? 24 :	\
		(n) & (1ULL << 23) ? 23 :	\
		(n) & (1ULL << 22) ? 22 :	\
		(n) & (1ULL << 21) ? 21 :	\
		(n) & (1ULL << 20) ? 20 :	\
		(n) & (1ULL << 19) ? 19 :	\
		(n) & (1ULL << 18) ? 18 :	\
		(n) & (1ULL << 17) ? 17 :	\
		(n) & (1ULL << 16) ? 16 :	\
		(n) & (1ULL << 15) ? 15 :	\
		(n) & (1ULL << 14) ? 14 :	\
		(n) & (1ULL << 13) ? 13 :	\
		(n) & (1ULL << 12) ? 12 :	\
		(n) & (1ULL << 11) ? 11 :	\
		(n) & (1ULL << 10) ? 10 :	\
		(n) & (1ULL << 9) ? 9 :		\
		(n) & (1ULL << 8) ? 8 :		\
		(n) & (1ULL << 7) ? 7 :		\
		(n) & (1ULL << 6) ? 6 :		\
		(n) & (1ULL << 5) ? 5 :		\
		(n) & (1ULL << 4) ? 4 :		\
		(n) & (1ULL << 3) ? 3 :		\
		(n) & (1ULL << 2) ? 2 :		\
		(n) & (1ULL << 1) ? 1 :		\
		(n) & (1ULL << 0) ? 0 :		\
		____ilog2_NaN()			\
		) :				\
		(sizeof(n) <= 4) ?		\
		__ilog2_u32(n) :		\
		__ilog2_u64(n)			\
	)

	static noinline __attribute__((const)) int old_get_order(unsigned long size)
	{
		int order;

		size = (size - 1) >> (PAGE_SHIFT - 1);
		order = -1;
		do {
			size >>= 1;
			order++;
		} while (size);
		return order;
	}

	static noinline __attribute__((const)) int __get_order(unsigned long size)
	{
		int order;

		size--;
		size >>= PAGE_SHIFT;
	#if BITS_PER_LONG == 32
		order = fls(size);
	#else
		order = fls64(size);
	#endif
		return order;
	}

	#define get_order(n)						\
	(								\
		__builtin_constant_p(n) ? (				\
			(n == 0UL) ? BITS_PER_LONG - PAGE_SHIFT :	\
			((n < (1UL << PAGE_SHIFT)) ? 0 :		\
			 ilog2((n) - 1) - PAGE_SHIFT + 1)		\
		) :							\
		__get_order(n)						\
	)

	#define order(N)						\
		{ (1UL << N) - 1, get_order((1UL << N) - 1) },		\
		{ (1UL << N),     get_order((1UL << N)) },		\
		{ (1UL << N) + 1, get_order((1UL << N) + 1) }

	struct order {
		unsigned long n, order;
	};

	static const struct order order_table[] = {
		order(0), order(1), order(2), order(3),
		order(4), order(5), order(6), order(7),
		order(8), order(9), order(10), order(11),
		order(12), order(13), order(14), order(15),
		order(16), order(17), order(18), order(19),
		order(20), order(21), order(22), order(23),
		order(24), order(25), order(26), order(27),
		order(28), order(29), order(30), order(31),
	#if BITS_PER_LONG == 64
		order(32), order(33), order(34), order(35),
	#endif
		{ 0x2929 }
	};

	void check(int loop, unsigned long n)
	{
		unsigned long old, new;

		printf("[%2d]: %09lx | ", loop, n);
		old = old_get_order(n);
		new = get_order(n);
		printf("%3ld, %3ld\n", old, new);
		if (n != 0 && old != new)
			abort();
	}

	int main(int argc, char **argv)
	{
		const struct order *p;
		unsigned long n;
		int loop;

		for (loop = 0; loop <= BITS_PER_LONG - 1; loop++) {
			n = 1UL << loop;
			check(loop, n - 1);
			check(loop, n);
			check(loop, n + 1);
		}

		for (p = order_table; p->n != 0x2929; p++) {
			unsigned long old, new;
			old = old_get_order(p->n);
			new = p->order;
			printf("%09lx\t%3ld, %3ld\n", p->n, old, new);
			if (p->n != 0 && old != new)
				abort();
		}

		return 0;
	}

Disassembling the x86_64 version of the above code shows:

	0000000000400510 <old_get_order>:
	  400510:	48 83 ef 01          	sub    $0x1,%rdi
	  400514:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
	  400519:	48 c1 ef 0b          	shr    $0xb,%rdi
	  40051d:	0f 1f 00             	nopl   (%rax)
	  400520:	83 c0 01             	add    $0x1,%eax
	  400523:	48 d1 ef             	shr    %rdi
	  400526:	75 f8                	jne    400520 <old_get_order+0x10>
	  400528:	f3 c3                	repz retq
	  40052a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

	0000000000400530 <__get_order>:
	  400530:	48 83 ef 01          	sub    $0x1,%rdi
	  400534:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
	  40053b:	48 c1 ef 0c          	shr    $0xc,%rdi
	  40053f:	48 0f bd c7          	bsr    %rdi,%rax
	  400543:	83 c0 01             	add    $0x1,%eax
	  400546:	c3                   	retq
	  400547:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
	  40054e:	00 00

As can be seen, the new __get_order() function is simpler than the old_get_order() function.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120220223928.16199.29548.stgit@warthog.procyon.org.uk
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Submitted by David Howells
Adjust the comment on get_order() to note that the result of passing a size of 0 results in an undefined value.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120220223917.16199.9416.stgit@warthog.procyon.org.uk
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
- 01 February 2012, 1 commit
-
-
Submitted by Michael S. Tsirkin
Some architectures need to override the way IO port mapping is done on PCI devices. Supply a generic macro that calls ioport_map, and make it possible for architectures to override.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
-
- 13 January 2012, 1 commit
-
-
Submitted by Shaohua Li
We have tlb_remove_tlb_entry to indicate that a pte tlb entry should be flushed, but no corresponding API for a pmd entry. This isn't a problem so far because THP is only for x86 currently and tlb_flush() under x86 will flush the entire TLB. But this is confusing and could be missed if thp is ported to another arch.

Also convert tlb->need_flush = 1 to a VM_BUG_ON(!tlb->need_flush) in __tlb_remove_page() as suggested by Andrea Arcangeli. The __tlb_remove_page() function is supposed to be called after tlb_remove_xxx_tlb_entry() and we can catch any misuse.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 January 2012, 1 commit
-
-
Submitted by John Crispin
This patch adds 2 functions that allow managed devices to request GPIOs. These GPIOs will then be managed by drivers/base/devres.c.

Signed-off-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
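A sketch of how a driver probe routine could use a managed GPIO request; the helper name devm_gpio_request() is assumed from the changelog, and the GPIO number and label are made up:

	/* Sketch; devm_gpio_request() assumed, GPIO number and label made up. */
	#include <linux/gpio.h>
	#include <linux/platform_device.h>

	#define FOO_RESET_GPIO	42

	static int foo_probe(struct platform_device *pdev)
	{
		int ret;

		ret = devm_gpio_request(&pdev->dev, FOO_RESET_GPIO, "foo-reset");
		if (ret)
			return ret;

		gpio_direction_output(FOO_RESET_GPIO, 1);
		/* No gpio_free() needed in remove(): devres releases it for us. */
		return 0;
	}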
-
- 04 January 2012, 1 commit
-
-
Submitted by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 30 December 2011, 1 commit
-
-
Submitted by Andreas Schwab
Commit 2a95ea6c ("procfs: do not overflow get_{idle,iowait}_time for nohz") did not take into account that on some architectures jiffies and cputime use different units. This causes get_idle_time() to return numbers in the wrong units, making the idle time fields in /proc/stat wrong.

Instead of converting the usec value returned by get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function usecs_to_cputime64 to convert it to the correct unit of cputime64_t.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Artem S. Tashkinov" <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 December 2011, 1 commit
-
-
Submitted by Martin Schwidefsky
Make cputime_t and cputime64_t nocast to enable sparse checking to detect incorrect use of cputime. Drop the cputime macros for simple scalar operations. The conversion macros are still needed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 13 December 2011, 3 commits
-
-
Submitted by David Howells
Rearrange the definition of HZ in asm-generic/param.h so that the user-specific one is declared before the kernel-specific one. We then explicitly #undef the userspace HZ value and replace it with the kernel HZ value. This allows the userspace params to be excised into a separate header as part of the UAPI header split.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
-
Submitted by Grant Likely
of_parse_phandle_with_args() needs to return quite a bit of data. Rather than making each datum a separate **out_ argument, this patch creates struct of_phandle_args to contain all the returned data and reworks the users of the function. This patch also enables of_parse_phandle_with_args() to return the device node pointer for the phandle node.

This patch also ends up being fairly major surgery to of_parse_phandle_with_args(). The existing structure didn't work well when extending to use of_phandle_args, and I discovered bugs during testing. I also took the opportunity to rename the function to be like the existing of_parse_phandle().

v2: - moved declaration of of_phandle_args to fix compile on non-DT builds
    - fixed incorrect index in example usage
    - fixed incorrect return code handling for empty entries

Reviewed-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
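A sketch of the reworked call as described above; the property names are made up and the exact signature and field names are inferred from the changelog rather than quoted from the patch:

	/* Sketch with made-up bindings; signature inferred from the changelog. */
	#include <linux/kernel.h>
	#include <linux/of.h>

	static int foo_get_clock_spec(struct device_node *np)
	{
		struct of_phandle_args clkspec;
		int ret;

		/* Parse entry 0 of "clocks", sized by "#clock-cells" in the target. */
		ret = of_parse_phandle_with_args(np, "clocks", "#clock-cells",
						 0, &clkspec);
		if (ret)
			return ret;

		pr_info("provider %s, %d args, first arg %u\n",
			clkspec.np->name, clkspec.args_count,
			clkspec.args_count ? clkspec.args[0] : 0);

		of_node_put(clkspec.np);	/* the returned node is refcounted */
		return 0;
	}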
-
Submitted by Grant Likely
A large chunk of qe_pin_request() is unnecessarily cut-and-paste directly from of_get_named_gpio_flags(). This patch cuts out the duplicate code and replaces it with a call to of_get_gpio().

v2: fixed compile error due to missing gpio_to_chip()

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
-
- 04 December 2011, 1 commit
-
-
Submitted by Chris Metcalf
Also prototype the "compat" functions so they can be referenced from C code.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
-
- 29 November 2011, 1 commit
-
-
Submitted by Michael S. Tsirkin
Many architectures want a generic pci_iomap but not the rest of iomap.c. Split that out into a separate .c file and add a new config symbol, selected automatically by GENERIC_IOMAP.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- 10 November 2011, 1 commit
-
-
Submitted by Johannes Berg
The 802.1X EAPOL handshake hostapd does requires knowing whether the frame was ack'ed by the peer. Currently, we fudge this pretty badly by not even transmitting the frame as a normal data frame but injecting it with radiotap and getting the status out of radiotap monitor as well. This is rather complex, confuses users (mon.wlan0 presence) and doesn't work with all hardware.

To get rid of that hack, introduce a real wifi TX status option for data frame transmissions. This works similar to the existing TX timestamping in that it reflects the SKB back to the socket's error queue with a SCM_WIFI_STATUS cmsg that has an int indicating ACK status (0/1).

Since it is possible that at some point we will want to have TX timestamping and wifi status in a single errqueue SKB (there's little point in not doing that), redefine SO_EE_ORIGIN_TIMESTAMPING to SO_EE_ORIGIN_TXSTATUS which can collect more than just the timestamp; keep the old constant as an alias of course. Currently the internal APIs don't make that possible, but it wouldn't be hard to split them up in a way that makes it possible.

Thanks to Neil Horman for helping me figure out the functions that add the control messages.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
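A userspace sketch of how a hostapd-style sender could consume the status; the socket option and cmsg names (SO_WIFI_STATUS/SCM_WIFI_STATUS) and the SOL_SOCKET cmsg level are assumptions here, not quoted from the patch:

	#include <sys/socket.h>
	#include <string.h>

	/* Sketch: request wifi TX status, then read the ACK flag back from
	 * the socket error queue. Option/cmsg names assumed. */
	static int read_wifi_ack(int fd)
	{
		int one = 1, acked = -1;
		char data[256], control[256];
		struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
		struct msghdr msg = {
			.msg_iov = &iov, .msg_iovlen = 1,
			.msg_control = control, .msg_controllen = sizeof(control),
		};
		struct cmsghdr *cmsg;

		setsockopt(fd, SOL_SOCKET, SO_WIFI_STATUS, &one, sizeof(one));

		/* ... send the EAPOL frame as a normal data frame ... */

		if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
			return -1;
		for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg))
			if (cmsg->cmsg_level == SOL_SOCKET &&
			    cmsg->cmsg_type == SCM_WIFI_STATUS)
				memcpy(&acked, CMSG_DATA(cmsg), sizeof(acked));
		return acked;	/* 1 = ack'ed by the peer, 0 = not ack'ed */
	}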
-
- 01 November 2011, 5 commits
-
-
Submitted by Linas Vepstas
There are plausible reasons for architectures to provide their own versions of csum_partial_copy_nocheck and csum_tcpudp_magic. By protecting these, the architecture can still re-use the asm-generic checksum.h, instead of copying it.

Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Richard Kuo
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Joe Perches
Standardize the style for compiler-based printf format verification. Standardize the location of __printf too.

Done via script and a little typing.

	$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
	  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
	  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
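For reference, the shape of the conversion on a hypothetical declaration (argument 2 is the format string, variadic arguments start at position 3):

	/* Old style being replaced: */
	int foo_log(int level, const char *fmt, ...)
		__attribute__((format(printf, 2, 3)));

	/* Standardized style, annotation placed before the declaration: */
	__printf(2, 3)
	int foo_log(int level, const char *fmt, ...);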
-
Submitted by Sonic Zhang
On NOMMU architectures, if physical memory doesn't start from 0, ARCH_PFN_OFFSET is defined to generate the page index in the mem_map array. Because virtual address is equal to physical address, PAGE_OFFSET is always 0. virt_to_page and page_to_virt should not index pages by PAGE_OFFSET directly.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Clemens Ladisch
Commit 5fd75a78 (dma-mapping: remove unnecessary sync_single_range_* in dma_map_ops) unified not only the dma_map_ops but also the corresponding debug_dma_sync_* calls. This led to spurious WARN()ings like the following because the DMA debug code was no longer able to detect the DMA buffer base address without the separate offset parameter:

	WARNING: at lib/dma-debug.c:911 check_sync+0xce/0x446()
	firewire_ohci 0000:04:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x00000000cedaa400] [size=1024 bytes]
	Call Trace:
	 ...
	 [<ffffffff811326a5>] check_sync+0xce/0x446
	 [<ffffffff81132ad9>] debug_dma_sync_single_for_device+0x39/0x3b
	 [<ffffffffa01d6e6a>] ohci_queue_iso+0x4f3/0x77d [firewire_ohci]
	 ...

To fix this, unshare the sync_single_* and sync_single_range_* implementations so that we are able to call the correct debug_dma_sync_* functions.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 October 2011, 1 commit
-
-
Submitted by Hamo
Should call the platform-specific __gpio_{get,set}_value instead of the generic gpio_{get,set}_value.

Signed-off-by: Yang Bai <hamo.by@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
-
- 25 October 2011, 1 commit
-
-
Submitted by Kirill Tkhai
Markers have already been removed twice:

	1: fc537766
	2: eb878b3b

but a little bit is still here.

Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
-
- 24 October 2011, 1 commit
-
-
Submitted by Mark Brown
Currently struct gpio is only defined when using gpiolib, which makes the stub gpio_request_array() much less useful in drivers than is ideal, as they can't work with struct gpio. Since there are no other definitions in the kernel, instead make the definition always available regardless of whether gpiolib is selectable or selected, ensuring that drivers can always use the type.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
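A sketch of the kind of driver code this enables even without gpiolib (GPIO numbers, flags and labels are made up):

	/* Sketch; GPIO numbers, flags and labels are made up. */
	#include <linux/kernel.h>
	#include <linux/gpio.h>

	static const struct gpio foo_gpios[] = {
		{ 10, GPIOF_OUT_INIT_LOW, "foo-reset" },
		{ 11, GPIOF_IN,           "foo-irq"   },
	};

	static int foo_setup_gpios(void)
	{
		/* Compiles against the stub too, now that struct gpio always exists. */
		return gpio_request_array(foo_gpios, ARRAY_SIZE(foo_gpios));
	}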
-
- 07 October 2011, 3 commits
-
-
Submitted by Mark Salter
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
-
Submitted by Mark Salter
asm-generic/page.h had several problems when used with a non-zero PAGE_OFFSET. This patch adds a default ARCH_PFN_OFFSET and fixes the __va, __pa, and pfn_valid macros to work with non-zero PAGE_OFFSETs.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
-
Submitted by Mark Salter
The existing __strnlen_user macro simply resolved to strnlen. However, the count returned by strnlen_user should include the NULL byte. This patch fixes the __strnlen_user macro to include the NULL byte in the count.

Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
-
- 08 September 2011, 1 commit
-
-
Submitted by Michal Hocko
Get rid of the trailing semicolon so that those expressions can also be used somewhere else than just in an assignment.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Link: http://lkml.kernel.org/r/7565417ce30d7e6b1ddc169843af0777dbf66e75.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
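A generic illustration of why the trailing semicolon matters (hypothetical macros, not the actual asm-generic helpers):

	/* Hypothetical macros, not the actual asm-generic helpers. */
	#define USECS_TO_TICKS_BAD(u)	((u) / 1000);	/* trailing ';' */
	#define USECS_TO_TICKS(u)	((u) / 1000)	/* plain expression */

	/*
	 * total = USECS_TO_TICKS_BAD(a) + USECS_TO_TICKS_BAD(b);
	 * expands to "total = ((a) / 1000); + ((b) / 1000);;", which assigns
	 * only the first term and silently discards the second. Without the
	 * semicolon the macro also works inside larger expressions, function
	 * arguments and initializers.
	 */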
-
- 27 August 2011, 1 commit
-
-
Submitted by NeilBrown
The nfsservctl system call is now gone, so we should remove all linkage for it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-