- 05 February 2019, 16 commits
-
-
Committed by Michael Mueller

Ensure a GISA is in use before accessing the IPM to avoid a null pointer dereference.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reported-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-16-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Initialize the GIB so that it can be used by the kvm host.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-15-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

The patch implements a handler for GIB alert interruptions on the host. Its task is to alert guests that interrupts are pending for them. A GIB alert interrupt statistic counter is added as well:

    $ cat /proc/interrupts
              CPU0       CPU1
    ...
    GAL:        23         37   [I/O] GIB Alert
    ...

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20190131085247.13826-14-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Function kvm_s390_gisa_clear() now clears the Interruption Pending Mask of the GISA asap. If the GISA is in the alert list at this time, it stays in the list but is removed by process_gib_alert_list().

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20190131085247.13826-13-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Add the Interruption Alert Mask (IAM) to the architecture-specific kvm struct. This mask in the GISA defines the ISCs for which a GIB alert will be issued.

The functions kvm_s390_gisc_register() and kvm_s390_gisc_unregister() are used to register and unregister a GISC (guest ISC) with a virtual machine and its GISA. Upon successful completion, kvm_s390_gisc_register() returns the ISC to be used for GIB alert interruptions; a negative return code indicates an error during registration. These functions will be used by other adapter types, such as AP and PCI, to request pass-through interruption support.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-12-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
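A rough, hypothetical sketch of how a pass-through adapter driver might use this interface. Only kvm_s390_gisc_register() and kvm_s390_gisc_unregister() come from the patch; my_enable_passthrough_irq(), struct my_dev and its fields are invented for illustration:

    /* kernel-context sketch, not a complete or buildable driver */
    static int my_enable_passthrough_irq(struct kvm *kvm, struct my_dev *dev)
    {
            int host_isc;

            /* ask KVM to issue GIB alerts for the guest ISC this adapter uses */
            host_isc = kvm_s390_gisc_register(kvm, dev->guest_isc);
            if (host_isc < 0)
                    return host_isc;        /* registration failed */

            dev->alert_isc = host_isc;      /* ISC on which GIB alert interruptions arrive */
            return 0;
    }

    static void my_disable_passthrough_irq(struct kvm *kvm, struct my_dev *dev)
    {
            kvm_s390_gisc_unregister(kvm, dev->guest_isc);
    }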
-
Committed by Michael Mueller

Adding the kvm reference to struct sie_page2 makes it possible to determine the kvm a given gisa belongs to:

    container_of(gisa, struct sie_page2, gisa)->kvm

This functionality will be required to process a gisa in gib alert interruption context.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-11-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
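For readers unfamiliar with the container_of() idiom relied on here, a minimal, self-contained userspace illustration; the struct layouts below are simplified stand-ins, not the real s390 definitions:

    #include <stdio.h>
    #include <stddef.h>

    /* simplified stand-in for the kernel's container_of() */
    #define container_of(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))

    struct kvm { int id; };                     /* placeholder */
    struct kvm_s390_gisa { unsigned int ipm; }; /* placeholder */

    struct sie_page2 {
            struct kvm_s390_gisa gisa;          /* embedded GISA */
            struct kvm *kvm;                    /* back-reference added by the patch */
    };

    int main(void)
    {
            struct kvm vm = { .id = 7 };
            struct sie_page2 page = { .kvm = &vm };
            struct kvm_s390_gisa *gisa = &page.gisa;

            /* recover the owning kvm from the gisa pointer alone */
            struct kvm *owner = container_of(gisa, struct sie_page2, gisa)->kvm;

            printf("owner id = %d\n", owner->id);
            return 0;
    }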
-
Committed by Michael Mueller

The Guest Information Block (GIB) links the GISAs of all guests that have adapter interrupts pending. These interrupts cannot be delivered because all vcpus of these guests are currently in WAIT state or have masked the respective Interruption Sub Class (ISC). If enabled, a GIB alert is issued on the host to schedule these guests to run suitable vcpus that consume the pending interruptions. This mechanism makes it possible to process adapter interrupts for guests that are currently not running.

The GIB is created during host initialization and associated with the Adapter Interruption Facility in case an Adapter Interruption Virtualization Facility is available. The GIB initialization, and thus the activation of the related code, will be done in an upcoming patch of this series.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-10-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

This patch implements the Set Guest Information Block operation to request association or disassociation of a Guest Information Block (GIB) with the Adapter Interruption Facility. The operation is required to receive GIB alert interrupts for guest adapters in conjunction with AIV and GISA.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-9-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Use this struct analogously to the kvm interruption structs for kvm-emulated floating and local interruptions. GIB handling will add further fields to this structure as required.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-8-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

This will shorten the length of code lines. All GISA-related static inline functions are local to interrupt.c.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-7-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Interruption types that are not represented in the GISA shall use pending_irqs_no_gisa() to test for pending interruptions.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-6-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

The change helps to reduce line length and increases code readability.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-5-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

The vcpu idle_mask state is used by, but not specific to, the emulated floating interruptions. The state is relevant to gisa-related interruptions as well.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-4-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Use a consistent bitmap declaration throughout the code.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-3-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

The explicit else path specified in set_intercept_indicators_io is not required, as the function returns in case the first branch is taken anyway.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-2-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

As suggested by our ID department, here are some kernel message updates.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 07 January 2019, 1 commit
-
-
Committed by Linus Torvalds

Commit 594cc251 ("make 'user_access_begin()' do 'access_ok()'") broke both alpha and SH booting in qemu, as noticed by Guenter Roeck.

It turns out that the bug wasn't actually in that commit itself (which would have been surprising: it was mostly a no-op), but in how the addition of access_ok() to the strncpy_from_user() and strnlen_user() functions now triggered the case where those functions would test the access of the very last byte of the user address space.

The string functions actually did that user range test before too, but they did it manually by just comparing against user_addr_max(). But with user_access_begin() doing the check (using "access_ok()"), it now exposed problems in the architecture implementations of that function.

For example, on alpha, the access_ok() helper macro looked like this:

    #define __access_ok(addr, size) \
        ((get_fs().seg & (addr | size | (addr+size))) == 0)

and what it basically tests is if any of the high bits get set (the USER_DS masking value is 0xfffffc0000000000).

And that's completely wrong for the "addr+size" check. Because it's off-by-one for the case where we check to the very end of the user address space, which is exactly what the strn*_user() functions do. Why? Because "addr+size" will be exactly the size of the address space, so trying to access the last byte of the user address space will fail the __access_ok() check, even though it shouldn't. As a result, the user string accessor functions failed consistently - because they literally don't know how long the string is going to be, and the max access is going to be that last byte of the user address space.

Side note: that alpha macro is buggy for another reason too - it re-uses the arguments twice.

And SH has another version of almost the exact same bug:

    #define __addr_ok(addr) \
        ((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)

so far so good: yes, a user address must be below the limit. But then:

    #define __access_ok(addr, size) \
        (__addr_ok((addr) + (size)))

is wrong with the exact same off-by-one case: the case when "addr+size" is exactly _equal_ to the limit is actually perfectly fine (think "one byte access at the last address of the user address space").

The SH version is actually seriously buggy in another way: it doesn't actually check for overflow, even though it did copy the _comment_ that talks about overflow.

So it turns out that both SH and alpha actually have completely buggy implementations of access_ok(), but they happened to work in practice (although the SH overflow one is a serious serious security bug, not that anybody likely cares about SH security).

This fixes the problems by using a similar macro on both alpha and SH. It isn't trying to be clever, the end address is based on this logic:

    unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;

which basically says "add start and length, and then subtract one unless the length was zero". We can't subtract one for a zero length, or we'd just hit an underflow instead.

For a lot of access_ok() users the length is a constant, so this isn't actually as expensive as it initially looks.

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
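A small, self-contained userspace sketch of the off-by-one described above; the limit value and the exact shape of the checks are illustrative, not the actual alpha or SH code:

    #include <stdio.h>

    #define LIMIT 0x1000UL  /* made-up user address space limit */

    /* buggy shape: requires "addr + size" to be strictly below the limit,
     * so an access that ends exactly at the last user byte is rejected */
    static int buggy_ok(unsigned long addr, unsigned long size)
    {
            return addr + size < LIMIT;
    }

    /* fixed shape: compute the last byte actually touched ("add start and
     * length, then subtract one unless the length was zero") and require
     * that it is still below the limit */
    static int fixed_ok(unsigned long addr, unsigned long size)
    {
            unsigned long end = addr + size - !!size;

            return end < LIMIT && end >= addr;  /* second term catches wraparound */
    }

    int main(void)
    {
            /* one-byte access at the very last user address */
            unsigned long addr = LIMIT - 1, size = 1;

            printf("buggy: %d, fixed: %d\n", buggy_ok(addr, size), fixed_ok(addr, size));
            /* prints "buggy: 0, fixed: 1" - the old check wrongly rejects it */
            return 0;
    }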
-
- 06 January 2019, 9 commits
-
-
Committed by Masahiro Yamada

You do not have to use define ... endef for filechk_* rules. For simple cases, the use of an assignment looks cleaner, IMHO. I updated the usage for scripts/Kbuild.include in case somebody misunderstands that 'define ... endef' is a requirement.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
-
Committed by Masahiro Yamada

Now that Kbuild automatically creates asm-generic wrappers for missing mandatory headers, it is redundant to list the same headers in both generic-y and mandatory-y.

Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
-
Committed by Masahiro Yamada

These comments are leftovers of commit fcc8487d ("uapi: export all headers under uapi directories"). Prior to that commit, exported headers had to be explicitly added to header-y. Now, all headers under the uapi/ directories are exported.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada

This commit removes redundant generic-y defines in arch/riscv/include/asm/Kbuild.

[1] It is redundant to define the same generic-y in both arch/$(ARCH)/include/asm/Kbuild and arch/$(ARCH)/include/uapi/asm/Kbuild. Remove the following generic-y:

    errno.h fcntl.h ioctl.h ioctls.h ipcbuf.h mman.h msgbuf.h param.h
    poll.h posix_types.h resource.h sembuf.h setup.h shmbuf.h signal.h
    socket.h sockios.h stat.h statfs.h swab.h termbits.h termios.h types.h

[2] It is redundant to define generic-y when an arch-specific implementation exists in arch/$(ARCH)/include/asm/*.h. Remove the following generic-y:

    cacheflush.h module.h

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada

filechk_* rules often consist of multiple 'echo' lines. They must be surrounded with { } or ( ) to work correctly. Otherwise, only the string from the last 'echo' would be written into the target. Let's take care of that in the 'filechk' in scripts/Kbuild.include to clean up filechk_* rules.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada

Since commit 9c2af1c7 ("kbuild: add .DELETE_ON_ERROR special target"), the target file is automatically deleted on failure. The boilerplate code

    ... || { rm -f $@; false; }

is unneeded.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada

Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label". The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this:

    #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
    # define HAVE_JUMP_LABEL
    #endif

We can improve this by testing 'asm goto' support in Kconfig, then making JUMP_LABEL depend on CC_HAS_ASM_GOTO. The ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match the real kernel capability.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
-
Committed by Masahiro Yamada

This commit removes redundant generic-y defines in arch/nds32/include/asm/Kbuild.

[1] It is redundant to define the same generic-y in both arch/$(ARCH)/include/asm/Kbuild and arch/$(ARCH)/include/uapi/asm/Kbuild. Remove the following generic-y:

    bitsperlong.h bpf_perf_event.h errno.h fcntl.h ioctl.h ioctls.h
    mman.h shmbuf.h stat.h

[2] It is redundant to define generic-y when an arch-specific implementation exists in arch/$(ARCH)/include/asm/*.h. Remove the following generic-y:

    ftrace.h

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada
kernel/dma/Kconfig globally defines HAS_DMA as follows:

    config HAS_DMA
            bool
            depends on !NO_DMA
            default y

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 05 January 2019, 14 commits
-
-
Committed by Christoph Hellwig

In many cases we don't have to create a GART mapping at all, which also means there is nothing to unmap. Fix the range check that was incorrectly modified when removing the mapping_error method.

Fixes: 9e8aa6b5 ("x86/amd_gart: remove the mapping_error dma_map_ops method")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
-
Committed by Christoph Hellwig

Some non-generic ia64 configs don't build swiotlb, and thus should not pull in the generic non-coherent DMA infrastructure.

Fixes: 68c60834 ("swiotlb: remove dma_mark_clean")
Reported-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Linus Torvalds

This has been broken forever, and nobody ever really noticed because it's purely a performance issue.

Long long ago, in commit 6175ddf0 ("x86: Clean up mem*io functions") Brian Gerst simplified the memory copies to and from iomem, since on x86, the instructions to access iomem are exactly the same as the regular instructions.

That is technically true, and things worked, and nobody said anything. Besides, back then the regular memcpy was pretty simple and worked fine.

Nobody noticed except for David Laight, that is. David was testing a TLP monitor he was writing for an FPGA, and has been occasionally complaining about how memcpy_toio() writes things one byte at a time. Which is completely unacceptable from a performance standpoint, even if it happens to technically work.

The reason it's writing one byte at a time is because while it's technically true that accesses to iomem are the same as accesses to regular memory on x86, the _granularity_ (and ordering) of accesses matter to iomem in ways that they don't matter to regular cached memory.

In particular, when ERMS is set, we default to using "rep movsb" for larger memory copies. That is indeed perfectly fine for real memory, since the whole point is that the CPU is going to do cacheline optimizations and executes the memory copy efficiently for cached memory.

With iomem? Not so much. With iomem, "rep movsb" will indeed work, but it will copy things one byte at a time. Slowly and ponderously.

Now, originally, back in 2010 when commit 6175ddf0 was done, we didn't use ERMS, and this was much less noticeable. Our normal memcpy() was simpler in other ways too.

Because in fact, it's not just about using the string instructions. Our memcpy() these days does things like "read and write overlapping values" to handle the last bytes of the copy. Again, for normal memory, overlapping accesses isn't an issue. For iomem? It can be.

So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and memset_io() functions. It doesn't particularly optimize them, but it tries to at least not be horrid, or do overlapping accesses. In fact, this uses the existing __inline_memcpy() function that we still had lying around that uses our very traditional "rep movsl" loop followed by movsw/movsb for the final bytes.

Somebody may decide to try to improve on it, but if we've gone almost a decade with only one person really ever noticing and complaining, maybe it's not worth worrying about further, once it's not _completely_ broken?

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Linus Torvalds
This actually enables the __put_user_goto() functionality in unsafe_put_user().

For an example of the effect of this, this is the code generated for the

    unsafe_put_user(signo, &infop->si_signo, Efault);

in the waitid() system call:

    movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_2]

It's just one single store instruction, along with generating an exception table entry pointing to the Efault label case in case that instruction faults.

Before, we would generate this:

    xorl %edx, %edx
    movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_3]
    testl %edx, %edx
    jne .L309

with the exception table generated for that 'mov' instruction causing us to jump to a stub that set %edx to -EFAULT and then jumped back to the 'testl' instruction.

So not only do we now get rid of the extra code in the normal sequence, we also avoid unnecessarily keeping that extra error register live across it all.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Linus Torvalds

This is finally the actual reason for the odd error handling in the "unsafe_get/put_user()" functions, introduced over three years ago.

Using a "jump to error label" interface is somewhat odd, but very convenient as a programming interface, and more importantly, it fits very well with simply making the target be the exception handler address directly from the inline asm.

The reason it took over three years to actually do this? We need "asm goto" support for it, which only became the default on x86 last year. It's now been a year that we've forced asm goto support (see commit e501ce95 "x86: Force asm-goto"), and so let's just do it here too.

[ Side note: this commit was originally done back in 2016. The above commentary about timing is obviously about it only now getting merged into my real upstream tree - Linus ]

Sadly, gcc still only supports "asm goto" with asms that do not have any outputs, so we are limited to only the put_user case for this. Maybe in several more years we can do the get_user case too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
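For readers unfamiliar with the "asm goto" construct this depends on, a minimal compilable sketch (gcc, x86-64); check_nonzero() and its label are invented purely for illustration:

    #include <stdio.h>

    /* the inline asm may branch directly to a C label ("is_zero"), so no
     * error value has to be produced and tested afterwards; as noted in
     * the commit, the goto form cannot have output operands */
    static int check_nonzero(unsigned long v)
    {
            asm goto("test %0, %0\n\t"
                     "jz %l[is_zero]"
                     : /* no outputs permitted with asm goto */
                     : "r" (v)
                     : "cc"
                     : is_zero);
            return 1;

    is_zero:
            return 0;
    }

    int main(void)
    {
            printf("%d %d\n", check_nonzero(5), check_nonzero(0)); /* prints "1 0" */
            return 0;
    }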
-
Committed by Helge Deller

The alternative coding patch for parisc in kernel 4.20 broke booting machines with PA8500-PA8700 CPUs. The problem is that for such machines the parisc kernel automatically uses huge pages to access kernel text code, but the set_kernel_text_rw() function, which is used shortly before applying any alternative patches, didn't use correctly hugepage-aligned addresses to remap the kernel text read-writeable.

Fixes: 3847dab7 ("parisc: Add alternative coding infrastructure")
Cc: <stable@vger.kernel.org> [4.20]
Signed-off-by: Helge Deller <deller@gmx.de>
-
Committed by Masahiro Yamada

Enable the UniPhier MIO DMAC driver. This is used as the DMA engine for accelerating the SD/eMMC controller drivers.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
-
Committed by Davidlohr Bueso
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joel Fernandes (Google)

Moving page tables at the PMD level on x86 is known to be safe. Enable this option so that we can do fast mremap when possible.

Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joel Fernandes (Google)

Android needs to mremap large regions of memory during memory-management-related operations. The mremap system call can be really slow if THP is not enabled. The bottleneck is move_page_tables, which copies one pte at a time, and can be really slow across a large map. Turning on THP may not be a viable option, and is not for us. This patch speeds up the performance for non-THP systems by copying at the PMD level when possible.

The speedup is an order of magnitude on x86 (~20x). On a 1GB mremap, the mremap completion time drops from 3.4-3.6 milliseconds to 144-160 microseconds.

Before:
    Total mremap time for 1GB data: 3521942 nanoseconds.
    Total mremap time for 1GB data: 3449229 nanoseconds.
    Total mremap time for 1GB data: 3488230 nanoseconds.

After:
    Total mremap time for 1GB data: 150279 nanoseconds.
    Total mremap time for 1GB data: 144665 nanoseconds.
    Total mremap time for 1GB data: 158708 nanoseconds.

If THP is enabled the optimization is mostly skipped except in certain situations.

[joel@joelfernandes.org: fix 'move_normal_pmd' unused function warning]
Link: http://lkml.kernel.org/r/20181108224457.GB209347@google.com
Link: http://lkml.kernel.org/r/20181108181201.88826-3-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
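A rough userspace sketch of the kind of measurement quoted above; the size, flags and timing approach are illustrative and do not reproduce the author's exact benchmark:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <time.h>

    int main(void)
    {
            size_t len = 1UL << 30;         /* 1 GiB */
            struct timespec t0, t1;

            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }
            memset(p, 1, len);              /* fault pages in so page tables exist */

            /* reserve a destination so the pages are forced to actually move */
            void *dst = mmap(NULL, len, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (dst == MAP_FAILED) { perror("mmap dst"); return 1; }

            clock_gettime(CLOCK_MONOTONIC, &t0);
            void *q = mremap(p, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            if (q == MAP_FAILED) { perror("mremap"); return 1; }

            long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
            printf("Total mremap time for 1GB data: %ld nanoseconds.\n", ns);
            return 0;
    }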
-
Committed by Joel Fernandes (Google)

Patch series "Add support for fast mremap".

This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle, architecture-related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since it is unused. This patch therefore removes this argument tree-wide, resulting in a nice negative diff as well. Along the way, it also ensures that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization.

Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing.

The changes were obtained by applying the following Coccinelle script (thanks Julia for answering all Coccinelle questions!). The following fix-ups were done manually:
* Removal of address argument from pte_fragment_alloc
* Removal of pte_alloc_one_fast definitions from m68k and microblaze.

// Options: --include-headers --no-includes
// Note: I split the 'identifier fn' line, so if you are manually
// running it, please unsplit it so it runs for you.

virtual patch

@pte_alloc_func_def depends on patch exists@
identifier E2;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
type T2;
@@

fn(...
- , T2 E2
 )
{ ... }

@pte_alloc_func_proto_noarg depends on patch exists@
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1, T2);
+ T3 fn(T1);
|
- T3 fn(T1, T2, T4);
+ T3 fn(T1, T2);
)

@pte_alloc_func_proto depends on patch exists@
identifier E1, E2, E4;
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1 E1, T2 E2);
+ T3 fn(T1 E1);
|
- T3 fn(T1 E1, T2 E2, T4 E4);
+ T3 fn(T1 E1, T2 E2);
)

@pte_alloc_func_call depends on patch exists@
expression E2;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

fn(...
-, E2
 )

@pte_alloc_macro depends on patch exists@
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
identifier a, b, c;
expression e;
position p;
@@

(
- #define fn(a, b, c) e
+ #define fn(a, b) e
|
- #define fn(a, b) e
+ #define fn(a) e
)

Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Matthew Wilcox
When testing in userspace, UBSAN pointed out that shifting into the sign bit is undefined behaviour. It doesn't really make sense to ask for the highest set bit of a negative value, so just turn the argument type into an unsigned int. Some architectures (eg ppc) already had it declared as an unsigned int, so I don't expect too many problems.

Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
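A small, self-contained sketch of the idea; fls_demo() is a generic loop-based "find last set bit", not the kernel's optimized fls() implementation:

    #include <stdio.h>

    /* returns the 1-based position of the most significant set bit, or 0 if
     * no bit is set; taking an unsigned int means a negative caller value is
     * converted in a well-defined way instead of asking for the highest set
     * bit of a negative number */
    static int fls_demo(unsigned int x)
    {
            int r = 0;

            while (x) {
                    x >>= 1;
                    r++;
            }
            return r;
    }

    int main(void)
    {
            printf("%d %d %d\n", fls_demo(1), fls_demo(0x80000000u), fls_demo(0));
            /* prints "1 32 0" */
            return 0;
    }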
-
Committed by Linus Torvalds

Originally, the rule used to be that you'd have to do access_ok() separately, and then user_access_begin() before actually doing the direct (optimized) user access.

But experience has shown that people then decide not to do access_ok() at all, and instead rely on it being implied by other operations or similar. Which makes it very hard to verify that the access has actually been range-checked.

If you use the unsafe direct user accesses, hardware features (either SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged Access Never - on ARM) do force you to use user_access_begin(). But nothing really forces the range check.

By putting the range check into user_access_begin(), we actually force people to do the right thing (tm), and the range check will be visible near the actual accesses. We have way too long a history of people trying to avoid them.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
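A minimal kernel-context sketch of the calling pattern this implies; put_status() and its error handling are invented for illustration, while user_access_begin()/user_access_end() and unsafe_put_user() are the interfaces discussed above:

    /* kernel-context sketch, not a buildable userspace program */
    static int put_status(int __user *uptr, int status)
    {
            /* user_access_begin() now performs the access_ok() range check itself */
            if (!user_access_begin(uptr, sizeof(*uptr)))
                    return -EFAULT;

            /* a fault in this store jumps straight to the Efault label */
            unsafe_put_user(status, uptr, Efault);

            user_access_end();
            return 0;

    Efault:
            user_access_end();
            return -EFAULT;
    }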
-
Committed by Linus Torvalds

These two architectures actually had an intentional use of the 'type' argument to access_ok() just to avoid warnings. I had actually noticed the powerpc one, but forgot to then fix it up. And I missed the sparc32 case entirely. This is hopefully all of it.

Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 96d4f267 ("Remove 'type' argument from access_ok() function")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-