1. 05 6月, 2018 1 次提交
  2. 29 5月, 2018 1 次提交
    • S
      tracing: Do not reference event data in post call triggers · c94e45bc
      Steven Rostedt (VMware) 提交于
      Trace event triggers can be called before or after the event has been
      committed. If it has been called after the commit, there's a possibility
      that the event no longer exists. Currently, the two post callers is the
      trigger to disable tracing (traceoff) and the one that will record a stack
      dump (stacktrace). Neither of them reference the trace event entry record,
      as that would lead to a race condition that could pass in corrupted data.
      
      To prevent any other users of the post data triggers from using the trace
      event record, pass in NULL to the post call trigger functions for the event
      record, as they should never need to use them in the first place.
      
      This does not fix any bug, but prevents bugs from happening by new post call
      trigger users.
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      c94e45bc
  3. 03 5月, 2018 1 次提交
  4. 29 4月, 2018 1 次提交
    • A
      <linux/stringhash.h>: fix end_name_hash() for 64bit long · 19b9ad67
      Amir Goldstein 提交于
      The comment claims that this helper will try not to loose bits, but for
      64bit long it looses the high bits before hashing 64bit long into 32bit
      int.  Use the helper hash_long() to do the right thing for 64bit long.
      For 32bit long, there is no change.
      
      All the callers of end_name_hash() either assign the result to
      qstr->hash, which is u32 or return the result as an int value (e.g.
      full_name_hash()).  Change the helper return type to int to conform to
      its users.
      
      [ It took me a while to apply this, because my initial reaction to it
        was - incorrectly - that it could make for slower code.
      
        After having looked more at it, I take back all my complaints about
        the patch, Amir was right and I was mis-reading things or just being
        stupid.
      
        I also don't worry too much about the possible performance impact of
        this on 64-bit, since most architectures that actually care about
        performance end up not using this very much (the dcache code is the
        most performance-critical, but the word-at-a-time case uses its own
        hashing anyway).
      
        So this ends up being mostly used for filesystems that do their own
        degraded hashing (usually because they want a case-insensitive
        comparison function).
      
        A _tiny_ worry remains, in that not everybody uses DCACHE_WORD_ACCESS,
        and then this potentially makes things more expensive on 64-bit
        architectures with slow or lacking multipliers even for the normal
        case.
      
        That said, realistically the only such architecture I can think of is
        PA-RISC. Nobody really cares about performance on that, it's more of a
        "look ma, I've got warts^W an odd machine" platform.
      
        So the patch is fine, and all my initial worries were just misplaced
        from not looking at this properly.   - Linus ]
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19b9ad67
  5. 27 4月, 2018 2 次提交
  6. 26 4月, 2018 4 次提交
    • O
      blk-mq: fix sysfs inflight counter · bf0ddaba
      Omar Sandoval 提交于
      When the blk-mq inflight implementation was added, /proc/diskstats was
      converted to use it, but /sys/block/$dev/inflight was not. Fix it by
      adding another helper to count in-flight requests by data direction.
      
      Fixes: f299b7c7 ("blk-mq: provide internal in-flight variant")
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bf0ddaba
    • T
      Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME · a3ed0e43
      Thomas Gleixner 提交于
      Revert commits
      
      92af4dcb ("tracing: Unify the "boot" and "mono" tracing clocks")
      127bfa5f ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
      7250a404 ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
      d6c7270e ("timekeeping: Remove boot time specific code")
      f2d6fdbf ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
      d6ed449a ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
      72199320 ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")
      
      As stated in the pull request for the unification of CLOCK_MONOTONIC and
      CLOCK_BOOTTIME, it was clear that we might have to revert the change.
      
      As reported by several folks systemd and other applications rely on the
      documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
      changes. After resume daemons time out and other timeout related issues are
      observed. Rafael compiled this list:
      
      * systemd kills daemons on resume, after >WatchdogSec seconds
        of suspending (Genki Sky).  [Verified that that's because systemd uses
        CLOCK_MONOTONIC and expects it to not include the suspend time.]
      
      * systemd-journald misbehaves after resume:
        systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
      corrupted or uncleanly shut down, renaming and replacing.
        (Mike Galbraith).
      
      * NetworkManager reports "networking disabled" and networking is broken
        after resume 50% of the time (Pavel).  [May be because of systemd.]
      
      * MATE desktop dims the display and starts the screensaver right after
        system resume (Pavel).
      
      * Full system hang during resume (me).  [May be due to systemd or NM or both.]
      
      That happens on debian and open suse systems.
      
      It's sad, that these problems were neither catched in -next nor by those
      folks who expressed interest in this change.
      Reported-by: NRafael J. Wysocki <rjw@rjwysocki.net>
      Reported-by: Genki Sky <sky@genki.is>,
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kevin Easton <kevin@guarana.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      a3ed0e43
    • A
      remoteproc: fix crashed parameter logic on stop call · fcd58037
      Arnaud Pouliquen 提交于
      Fix rproc_add_subdev parameter name and inverse the crashed logic.
      
      Fixes: 880f5b38 ("remoteproc: Pass type of shutdown to subdev remove")
      Reviewed-by: NAlex Elder <elder@linaro.org>
      Signed-off-by: NArnaud Pouliquen <arnaud.pouliquen@st.com>
      Signed-off-by: NBjorn Andersson <bjorn.andersson@linaro.org>
      fcd58037
    • M
      virtio: add ability to iterate over vqs · 24a7e4d2
      Michael S. Tsirkin 提交于
      For cleanup it's helpful to be able to simply scan all vqs and discard
      all data. Add an iterator to do that.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      24a7e4d2
  7. 25 4月, 2018 2 次提交
  8. 24 4月, 2018 3 次提交
  9. 23 4月, 2018 3 次提交
  10. 21 4月, 2018 3 次提交
    • A
      kasan: add no_sanitize attribute for clang builds · 12c8f25a
      Andrey Konovalov 提交于
      KASAN uses the __no_sanitize_address macro to disable instrumentation of
      particular functions.  Right now it's defined only for GCC build, which
      causes false positives when clang is used.
      
      This patch adds a definition for clang.
      
      Note, that clang's revision 329612 or higher is required.
      
      [andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
        Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
      Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com>
      Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Paul Lawrence <paullawrence@google.com>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12c8f25a
    • G
      writeback: safer lock nesting · 2e898e4c
      Greg Thelen 提交于
      lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
      the page's memcg is undergoing move accounting, which occurs when a
      process leaves its memcg for a new one that has
      memory.move_charge_at_immigrate set.
      
      unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
      the given inode is switching writeback domains.  Switches occur when
      enough writes are issued from a new domain.
      
      This existing pattern is thus suspicious:
          lock_page_memcg(page);
          unlocked_inode_to_wb_begin(inode, &locked);
          ...
          unlocked_inode_to_wb_end(inode, locked);
          unlock_page_memcg(page);
      
      If both inode switch and process memcg migration are both in-flight then
      unlocked_inode_to_wb_end() will unconditionally enable interrupts while
      still holding the lock_page_memcg() irq spinlock.  This suggests the
      possibility of deadlock if an interrupt occurs before unlock_page_memcg().
      
          truncate
          __cancel_dirty_page
          lock_page_memcg
          unlocked_inode_to_wb_begin
          unlocked_inode_to_wb_end
          <interrupts mistakenly enabled>
                                          <interrupt>
                                          end_page_writeback
                                          test_clear_page_writeback
                                          lock_page_memcg
                                          <deadlock>
          unlock_page_memcg
      
      Due to configuration limitations this deadlock is not currently possible
      because we don't mix cgroup writeback (a cgroupv2 feature) and
      memory.move_charge_at_immigrate (a cgroupv1 feature).
      
      If the kernel is hacked to always claim inode switching and memcg
      moving_account, then this script triggers lockup in less than a minute:
      
        cd /mnt/cgroup/memory
        mkdir a b
        echo 1 > a/memory.move_charge_at_immigrate
        echo 1 > b/memory.move_charge_at_immigrate
        (
          echo $BASHPID > a/cgroup.procs
          while true; do
            dd if=/dev/zero of=/mnt/big bs=1M count=256
          done
        ) &
        while true; do
          sync
        done &
        sleep 1h &
        SLEEP=$!
        while true; do
          echo $SLEEP > a/cgroup.procs
          echo $SLEEP > b/cgroup.procs
        done
      
      The deadlock does not seem possible, so it's debatable if there's any
      reason to modify the kernel.  I suggest we should to prevent future
      surprises.  And Wang Long said "this deadlock occurs three times in our
      environment", so there's more reason to apply this, even to stable.
      Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
      see "[PATCH for-4.4] writeback: safer lock nesting"
      https://lkml.org/lkml/2018/4/11/146
      
      Wang Long said "this deadlock occurs three times in our environment"
      
      [gthelen@google.com: v4]
        Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
      [akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
      Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
      Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
      Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Reported-by: NWang Long <wanglong19@meituan.com>
      Acked-by: NWang Long <wanglong19@meituan.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>	[v4.2+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2e898e4c
    • K
      fork: unconditionally clear stack on fork · e01e8063
      Kees Cook 提交于
      One of the classes of kernel stack content leaks[1] is exposing the
      contents of prior heap or stack contents when a new process stack is
      allocated.  Normally, those stacks are not zeroed, and the old contents
      remain in place.  In the face of stack content exposure flaws, those
      contents can leak to userspace.
      
      Fixing this will make the kernel no longer vulnerable to these flaws, as
      the stack will be wiped each time a stack is assigned to a new process.
      There's not a meaningful change in runtime performance; it almost looks
      like it provides a benefit.
      
      Performing back-to-back kernel builds before:
      	Run times: 157.86 157.09 158.90 160.94 160.80
      	Mean: 159.12
      	Std Dev: 1.54
      
      and after:
      	Run times: 159.31 157.34 156.71 158.15 160.81
      	Mean: 158.46
      	Std Dev: 1.46
      
      Instead of making this a build or runtime config, Andy Lutomirski
      recommended this just be enabled by default.
      
      [1] A noisy search for many kinds of stack content leaks can be seen here:
      https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak
      
      I did some more with perf and cycle counts on running 100,000 execs of
      /bin/true.
      
      before:
      Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
      Mean:  221015379122.60
      Std Dev: 4662486552.47
      
      after:
      Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
      Mean:  217745009865.40
      Std Dev: 5935559279.99
      
      It continues to look like it's faster, though the deviation is rather
      wide, but I'm not sure what I could do that would be less noisy.  I'm
      open to ideas!
      
      Link: http://lkml.kernel.org/r/20180221021659.GA37073@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e01e8063
  11. 20 4月, 2018 1 次提交
    • R
      fsnotify: Fix fsnotify_mark_connector race · d90a10e2
      Robert Kolchmeyer 提交于
      fsnotify() acquires a reference to a fsnotify_mark_connector through
      the SRCU-protected pointer to_tell->i_fsnotify_marks. However, it
      appears that no precautions are taken in fsnotify_put_mark() to
      ensure that fsnotify() drops its reference to this
      fsnotify_mark_connector before assigning a value to its 'destroy_next'
      field. This can result in fsnotify_put_mark() assigning a value
      to a connector's 'destroy_next' field right before fsnotify() tries to
      traverse the linked list referenced by the connector's 'list' field.
      Since these two fields are members of the same union, this behavior
      results in a kernel panic.
      
      This issue is resolved by moving the connector's 'destroy_next' field
      into the object pointer union. This should work since the object pointer
      access is protected by both a spinlock and the value of the 'flags'
      field, and the 'flags' field is cleared while holding the spinlock in
      fsnotify_put_mark() before 'destroy_next' is updated. It shouldn't be
      possible for another thread to accidentally read from the object pointer
      after the 'destroy_next' field is updated.
      
      The offending behavior here is extremely unlikely; since
      fsnotify_put_mark() removes references to a connector (specifically,
      it ensures that the connector is unreachable from the inode it was
      formerly attached to) before updating its 'destroy_next' field, a
      sizeable chunk of code in fsnotify_put_mark() has to execute in the
      short window between when fsnotify() acquires the connector reference
      and saves the value of its 'list' field. On the HEAD kernel, I've only
      been able to reproduce this by inserting a udelay(1) in fsnotify().
      However, I've been able to reproduce this issue without inserting a
      udelay(1) anywhere on older unmodified release kernels, so I believe
      it's worth fixing at HEAD.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=199437
      Fixes: 08991e83
      CC: stable@vger.kernel.org
      Signed-off-by: NRobert Kolchmeyer <rkolchmeyer@google.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      d90a10e2
  12. 19 4月, 2018 4 次提交
  13. 18 4月, 2018 2 次提交
    • D
      block: add blk_queue_fua() helper function · 0ce91444
      Dave Chinner 提交于
      So we can check FUA support status from the iomap direct IO code.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0ce91444
    • T
      vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi · 7ce23672
      Toshiaki Makita 提交于
      Syzkaller spotted an old bug which leads to reading skb beyond tail by 4
      bytes on vlan tagged packets.
      This is caused because skb_vlan_tagged_multi() did not check
      skb_headlen.
      
      BUG: KMSAN: uninit-value in eth_type_vlan include/linux/if_vlan.h:283 [inline]
      BUG: KMSAN: uninit-value in skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
      BUG: KMSAN: uninit-value in vlan_features_check include/linux/if_vlan.h:672 [inline]
      BUG: KMSAN: uninit-value in dflt_features_check net/core/dev.c:2949 [inline]
      BUG: KMSAN: uninit-value in netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
      CPU: 1 PID: 3582 Comm: syzkaller435149 Not tainted 4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x185/0x1d0 lib/dump_stack.c:53
        kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
        __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
        eth_type_vlan include/linux/if_vlan.h:283 [inline]
        skb_vlan_tagged_multi include/linux/if_vlan.h:656 [inline]
        vlan_features_check include/linux/if_vlan.h:672 [inline]
        dflt_features_check net/core/dev.c:2949 [inline]
        netif_skb_features+0xd1b/0xdc0 net/core/dev.c:3009
        validate_xmit_skb+0x89/0x1320 net/core/dev.c:3084
        __dev_queue_xmit+0x1cb2/0x2b60 net/core/dev.c:3549
        dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
        packet_snd net/packet/af_packet.c:2944 [inline]
        packet_sendmsg+0x7c57/0x8a10 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        sock_write_iter+0x3b9/0x470 net/socket.c:909
        do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
        do_iter_write+0x30d/0xd40 fs/read_write.c:932
        vfs_writev fs/read_write.c:977 [inline]
        do_writev+0x3c9/0x830 fs/read_write.c:1012
        SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
        SyS_writev+0x56/0x80 fs/read_write.c:1082
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x43ffa9
      RSP: 002b:00007fff2cff3948 EFLAGS: 00000217 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ffa9
      RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00000000006cb018 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004018d0
      R13: 0000000000401960 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
        kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
        kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
        slab_post_alloc_hook mm/slab.h:445 [inline]
        slab_alloc_node mm/slub.c:2737 [inline]
        __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:984 [inline]
        alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
        sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
        packet_alloc_skb net/packet/af_packet.c:2803 [inline]
        packet_snd net/packet/af_packet.c:2894 [inline]
        packet_sendmsg+0x6444/0x8a10 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        sock_write_iter+0x3b9/0x470 net/socket.c:909
        do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
        do_iter_write+0x30d/0xd40 fs/read_write.c:932
        vfs_writev fs/read_write.c:977 [inline]
        do_writev+0x3c9/0x830 fs/read_write.c:1012
        SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
        SyS_writev+0x56/0x80 fs/read_write.c:1082
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 58e998c6 ("offloading: Force software GSO for multiple vlan tags.")
      Reported-and-tested-by: syzbot+0bbe42c764feafa82c5a@syzkaller.appspotmail.com
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ce23672
  14. 17 4月, 2018 6 次提交
  15. 16 4月, 2018 1 次提交
  16. 14 4月, 2018 5 次提交
    • P
      kernel/kexec_file.c: move purgatories sha256 to common code · df6f2801
      Philipp Rudo 提交于
      The code to verify the new kernels sha digest is applicable for all
      architectures.  Move it to common code.
      
      One problem is the string.c implementation on x86.  Currently sha256
      includes x86/boot/string.h which defines memcpy and memset to be gcc
      builtins.  By moving the sha256 implementation to common code and
      changing the include to linux/string.h both functions are no longer
      defined.  Thus definitions have to be provided in x86/purgatory/string.c
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-12-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com>
      Acked-by: NDave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      df6f2801
    • P
      kernel/kexec_file.c: allow archs to set purgatory load address · 3be3f61d
      Philipp Rudo 提交于
      For s390 new kernels are loaded to fixed addresses in memory before they
      are booted.  With the current code this is a problem as it assumes the
      kernel will be loaded to an 'arbitrary' address.  In particular,
      kexec_locate_mem_hole searches for a large enough memory region and sets
      the load address (kexec_bufer->mem) to it.
      
      Luckily there is a simple workaround for this problem.  By returning 1
      in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off.  This
      allows the architecture to set kbuf->mem by hand.  While the trick works
      fine for the kernel it does not for the purgatory as here the
      architectures don't have access to its kexec_buffer.
      
      Give architectures access to the purgatories kexec_buffer by changing
      kexec_load_purgatory to take a pointer to it.  With this change
      architectures have access to the buffer and can edit it as they need.
      
      A nice side effect of this change is that we can get rid of the
      purgatory_info->purgatory_load_address field.  As now the information
      stored there can directly be accessed from kbuf->mem.
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com>
      Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: NDave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3be3f61d
    • P
      kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations* · 8aec395b
      Philipp Rudo 提交于
      When the relocations are applied to the purgatory only the section the
      relocations are applied to is writable.  The other sections, i.e.  the
      symtab and .rel/.rela, are in read-only kexec_purgatory.  Highlight this
      by marking the corresponding variables as 'const'.
      
      While at it also change the signatures of arch_kexec_apply_relocations* to
      take section pointers instead of just the index of the relocation section.
      This removes the second lookup and sanity check of the sections in arch
      code.
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com>
      Acked-by: NDave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8aec395b
    • P
      kernel/kexec_file.c: make purgatory_info->ehdr const · 65c225d3
      Philipp Rudo 提交于
      The kexec_purgatory buffer is read-only.  Thus all pointers into
      kexec_purgatory are read-only, too.  Point this out by explicitly
      marking purgatory_info->ehdr as 'const' and update the comments in
      purgatory_info.
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-4-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com>
      Acked-by: NDave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65c225d3
    • P
      include/linux/kexec.h: silence compile warnings · ee6ebeda
      Philipp Rudo 提交于
      Patch series "kexec_file: Clean up purgatory load", v2.
      
      Following the discussion with Dave and AKASHI, here are the common code
      patches extracted from my recent patch set (Add kexec_file_load support
      to s390) [1].  The patches were extracted to allow upstream integration
      together with AKASHI's common code patches before the arch code gets
      adjusted to the new base.
      
      The reason for this series is to prepare common code for adding
      kexec_file_load to s390 as well as cleaning up the mis-use of the
      sh_offset field during purgatory load.  In detail this series contains:
      
      Patch #1&2: Minor cleanups/fixes.
      
      Patch #3-9: Clean up the purgatory load/relocation code.  Especially
      remove the mis-use of the purgatory_info->sechdrs->sh_offset field,
      currently holding a pointer into either kexec_purgatory (ro) or
      purgatory_buf (rw) depending on the section.  With these patches the
      section address will be calculated verbosely and sh_offset will contain
      the offset of the section in the stripped purgatory binary
      (purgatory_buf).
      
      Patch #10: Allows architectures to set the purgatory load address.  This
      patch is important for s390 as the kernel and purgatory have to be
      loaded to fixed addresses.  In current code this is impossible as the
      purgatory load is opaque to the architecture.
      
      Patch #11: Moves x86 purgatories sha implementation to common lib/
      directory to allow reuse in other architectures.
      
      This patch (of 11)
      
      When building the kernel with CONFIG_KEXEC_FILE enabled gcc prints a
      compile warning multiple times.
      
        In file included from <path>/linux/init/initramfs.c:526:0:
        <path>/include/linux/kexec.h:120:9: warning: `struct kimage' declared inside parameter list [enabled by default]
                 unsigned long cmdline_len);
                 ^
      
      This is because the typedefs for kexec_file_load uses struct kimage
      before it is declared.  Fix this by simply forward declaring struct
      kimage.
      
      Link: http://lkml.kernel.org/r/20180321112751.22196-2-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com>
      Acked-by: NDave Young <dyoung@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee6ebeda