1. 17 January 2019 (2 commits)
    • livepatch: Remove signal sysfs attribute · 0b3d5279
      By Miroslav Benes
      The fake signal is sent automatically now. We can rely on it completely
      and remove the sysfs attribute.
      Signed-off-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Send a fake signal periodically · cba82dea
      By Miroslav Benes
      An administrator may send a fake signal to all remaining blocking tasks
      of a running transition by writing to the
      /sys/kernel/livepatch/<patch>/signal attribute. Let's do it
      automatically after 15 seconds. The timeout is chosen deliberately. It
      gives the tasks enough time to transition themselves.
      
      Theoretically, sending it once should be more than enough. However,
      every task must get outside of a patched function to be successfully
      transitioned. That might turn out not to be simple, and resending can
      be helpful in that case.
      
      A new workqueue job could be a cleaner solution to achieve it, but it
      could also introduce deadlocks and cause more headaches with
      synchronization and cancelling.
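
      Roughly, the resend piggybacks on the once-per-second retry of the
      transition (a sketch, not the literal diff; klp_send_signals() is the
      helper behind the existing sysfs attribute):

        #define SIGNALS_TIMEOUT	15	/* retry periods between signals */

        static unsigned int klp_signals_cnt;

        /* in the "transition not yet complete" retry path: */
        if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT))
        	klp_send_signals();
        klp_signals_cnt++;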
      
      [jkosina@suse.cz: removed added newline]
      Signed-off-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  2. 12 January 2019 (9 commits)
    • livepatch: Remove ordering (stacking) of the livepatches · d67a5372
      By Petr Mladek
      The atomic replace and cumulative patches were introduced as a more secure
      way to handle dependent patches. They simplify the logic:
      
        + Any new cumulative patch is supposed to take over shadow variables
          and changes made by callbacks from previous livepatches.
      
        + All replaced patches are discarded and the modules can be unloaded.
          As a result, there is only one scenario when a cumulative livepatch
          gets disabled.
      
      The different handling of "normal" and cumulative patches might cause
      confusion. It would make sense to keep only one mode. On the other hand,
      it would be rude to enforce using the cumulative livepatches even for
      trivial and independent (hot) fixes.
      
      However, the stack of patches is not really necessary any longer.
      The patch ordering was never clearly visible via the sysfs interface.
      Also the "normal" patches need a lot of caution anyway.
      
      Note that the list of enabled patches is still necessary but the ordering
      is no longer enforced.
      
      Otherwise, the code is ready to disable livepatches in a random order.
      Namely, klp_check_stack_func() always looks for the function from
      the livepatch that is being disabled. klp_func structures are just
      removed from the related func_stack. Finally, the ftrace handler
      is removed only when the func_stack becomes empty.
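
      In code terms, unpatching one function regardless of ordering reduces
      to roughly the following sketch (assuming a klp_find_ops() lookup keyed
      by the old function):

        struct klp_ops *ops = klp_find_ops(func->old_func);

        if (list_is_singular(&ops->func_stack)) {
        	/* Last patch for this function: drop the ftrace handler too. */
        	unregister_ftrace_function(&ops->fops);
        	list_del_rcu(&func->stack_node);
        	list_del(&ops->node);
        	kfree(ops);
        } else {
        	list_del_rcu(&func->stack_node);
        }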
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Remove Nop structures when unused · d697bad5
      By Petr Mladek
      Replaced patches are removed from the stack when the transition is
      finished. It means that Nop structures will never be needed again
      and can be removed. Why should we care?
      
        + Nop structures give the impression that the function is patched
          even though the ftrace handler has no effect.
      
        + Ftrace handlers do not come for free. They cause slowdown that might
          be visible in some workloads. The ftrace-related slowdown might
          actually be the reason why the function is no longer patched in
          the new cumulative patch. One would expect that a cumulative patch
          would help solve these problems as well.
      
        + Cumulative patches are supposed to replace any earlier version of
          the patch. The number of NOPs depends on which version was replaced.
          This multiplies the number of scenarios that might happen.
      
          One might say that NOPs are innocent. But there are even optimized
          NOP instructions for different processors, for example, see
          arch/x86/kernel/alternative.c. And klp_ftrace_handler() is much
          more complicated.
      
        + It sounds natural to clean up a mess that is no longer needed.
          It could only be worse if we do not do it.
      
      This patch allows the dynamic structures to be unpatched and freed
      independently when the transition finishes.
      
      The free part is a bit tricky because the kobject free callbacks are
      called asynchronously. We cannot easily wait for them. Fortunately, we
      do not have to: any further access can be avoided by removing the
      structures from the dynamic lists.
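
      Concretely, the release can look roughly like this (a sketch; the list
      and flag names follow the dynamic structures introduced by the atomic
      replace patch):

        struct klp_func *func, *tmp;

        /* Safe iterator: entries are deleted while walking the list. */
        list_for_each_entry_safe(func, tmp, &obj->func_list, node) {
        	if (!func->nop)
        		continue;
        	list_del(&func->node);		/* no further access possible */
        	kobject_put(&func->kobj);	/* memory freed asynchronously */
        }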
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Add atomic replace · e1452b60
      By Jason Baron
      Sometimes we would like to revert a particular fix. Currently, this
      is not easy because we want to keep all other fixes active, and we
      can revert only the last applied patch.
      
      One solution would be to apply a new patch that implements all
      the reverted functions as in the original code. It would work
      as expected, but there would be unnecessary redirections. In addition,
      it would also require knowing which functions need to be reverted at
      build time.
      
      Another problem is when there are many patches that touch the same
      functions. There might be dependencies between patches that are
      not enforced on the kernel side. Also it might be pretty hard to
      actually prepare the patch and ensure compatibility with the other
      patches.
      
      Atomic replace && cumulative patches:
      
      A better solution is to create a cumulative patch and declare that
      it replaces all older ones.
      
      This patch adds a new "replace" flag to struct klp_patch. When it is
      enabled, a set of 'nop' klp_func structures is dynamically created for
      all functions that are already being patched but that will no longer be
      modified by the new patch. They are used as a new target during
      the patch transition.
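
      For illustration, a cumulative patch can then be declared like this
      (a minimal sketch in the samples/livepatch style; the objs array is
      assumed to be defined as usual):

        static struct klp_patch patch = {
        	.mod = THIS_MODULE,
        	.objs = objs,
        	.replace = true,	/* take over all earlier livepatches */
        };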
      
      The idea is to handle Nops' structures like the static ones. When
      the dynamic structures are allocated, we initialize all values that
      are normally statically defined.
      
      The only exception is "new_func" in struct klp_func. It has to point
      to the original function and the address is known only when the object
      (module) is loaded. Note that we really need to set it. The address is
      used, for example, in klp_check_stack_func().
      
      Nevertheless we still need to distinguish the dynamically allocated
      structures in some operations. For this, we add a "nop" flag to
      struct klp_func and a "dynamic" flag to struct klp_object. They
      need special handling in the following situations:
      
        + The structures are added into the lists of objects and functions
          immediately. In fact, the lists were created for this purpose.
      
        + The address of the original function is known only when the patched
          object (module) is loaded. Therefore it is copied later in
          klp_init_object_loaded().
      
        + The ftrace handler must not set PC to func->new_func. It would cause
          infinite loop because the address points back to the beginning of
          the original function.
      
        + The various free() functions must free the structure itself.
      
      Note that other ways to detect the dynamic structures are not considered
      safe. For example, even a statically defined struct klp_object might
      include an empty funcs array; it might be there just to run some callbacks.
      
      Also note that the safe iterator must be used in the free() functions.
      Otherwise already freed structures might get accessed.
      
      Special callbacks handling:
      
      The callbacks from the replaced patches are intentionally _not_ called.
      It would be pretty hard to define reasonable semantics and implement it.
      
      It might even be counter-productive. The new patch is cumulative. It is
      supposed to include most of the changes from older patches. In most cases,
      it will not want to call the pre_unpatch() and post_unpatch() callbacks
      from the replaced patches. That would disable/break things for no good
      reason. Also, it should be easier to handle various scenarios in a single
      script in the new patch than to think about interactions caused by running
      many scripts from older patches. Not to mention that the old scripts would
      not even expect to be called in this situation.
      
      Removing replaced patches:
      
      One nice effect of the cumulative patches is that the code from the
      older patches is no longer used. Therefore the replaced patches can
      be removed. This has several advantages:
      
        + Nops' structs will no longer be necessary and might be removed.
          This saves memory, restores performance (no ftrace handler), and
          allows a clear view of what is really patched.
      
        + Disabling the patch will cause the original code to be used everywhere,
          so the livepatch callbacks have to handle only one scenario.
          Note that the situation is already complex enough when the patch
          gets enabled. It is currently solved by calling callbacks only from
          the new cumulative patch.
      
        + The state is clean in both the sysfs interface and lsmod. The modules
          with the replaced livepatches might even get removed from the system.
      
      Some people actually expected this behavior from the beginning. After
      all, a cumulative patch is supposed to "completely" replace an existing
      one. It is like when a new version of an application replaces an older one.
      
      This patch does the first step. It removes the replaced patches from
      the list of patches. It is safe: the consistency model ensures that
      they are no longer used. In other words, each process works only with
      the structures from klp_transition_patch.
      
      The removal is done by a special function. It combines actions done by
      __disable_patch() and klp_complete_transition(). But it is a fast
      track without all the transaction-related stuff.
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      [pmladek@suse.com: Split, reuse existing code, simplified]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Use lists to manage patches, objects and functions · 20e55025
      By Jason Baron
      Currently klp_patch contains a pointer to a statically allocated array of
      struct klp_object, and struct klp_object contains a pointer to a statically
      allocated array of struct klp_func. In order to allow for the dynamic
      allocation of objects and functions, link klp_patch, klp_object, and
      klp_func together via linked lists. This allows us to more easily allocate
      new objects and functions, while having the iterator be a simple linked
      list walk.
      
      The static structures are added to the lists early. This allows adding
      the dynamically allocated objects before the klp_init_object() and
      klp_init_func() calls, and therefore reduces further changes
      to the code.
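
      The iterators can then be plain list walks, roughly like this sketch
      (the list head and node field names are assumptions):

        #define klp_for_each_object(patch, obj) \
        	list_for_each_entry(obj, &(patch)->obj_list, node)

        #define klp_for_each_func(obj, func) \
        	list_for_each_entry(func, &(obj)->func_list, node)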
      
      This patch does not change the existing behavior.
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      [pmladek@suse.com: Initialize lists before init calls]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Simplify API by removing registration step · 958ef1e3
      By Petr Mladek
      The possibility to re-enable a registered patch was useful for immediate
      patches where the livepatch module had to stay until the system reboot.
      The improved consistency model makes it possible to achieve the same
      result by unloading and loading the livepatch module again.
      
      Also, we are going to add a feature called atomic replace. It will allow
      creating a patch that replaces all already registered patches.
      The aim is to handle dependent patches more securely. It will obsolete
      the stack of patches that helped to handle the dependencies so far.
      It might then be unclear when re-enabling a cumulative patch is safe.
      
      It would be complicated to support all these modes. Instead, we can
      actually make the API and code easier to understand.
      
      Therefore, remove the two-step public API. All the checks and init calls
      are moved from klp_register_patch() to klp_enable_patch(). Also, the patch
      is automatically freed, including the sysfs interface, when the transition
      to the disabled state is completed.
      
      As a result, there is never a disabled patch on top of the stack.
      Therefore we do not need to check the stack in __klp_enable_patch(),
      and we can simplify the check in __klp_disable_patch().
      
      The API and logic are also much easier. It is enough to call
      klp_enable_patch() from the module_init() callback. The patch can be
      disabled by writing '0' into /sys/kernel/livepatch/<patch>/enabled. Then
      the module can be removed once the transition finishes and the sysfs
      interface is freed.
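
      A livepatch module then boils down to something like this sketch
      (mirroring the samples/livepatch pattern; the patch definition itself
      is omitted):

        static int livepatch_init(void)
        {
        	return klp_enable_patch(&patch);
        }

        static void livepatch_exit(void)
        {
        	/* Nothing to do; the patch is freed once disabled via sysfs. */
        }

        module_init(livepatch_init);
        module_exit(livepatch_exit);
        MODULE_LICENSE("GPL");
        MODULE_INFO(livepatch, "Y");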
      
      The only problem is how to free the structures and kobjects safely.
      The operation is triggered from the sysfs interface. We cannot put
      the related kobject from there because it would cause lock inversion
      between klp_mutex and the kernfs locks, see the kn->count lockdep map.
      
      Therefore, offload the free task to a workqueue. It is perfectly fine:
      
        + The patch can no longer be used in the livepatch operations.
      
        + The module cannot be removed until the free operation finishes
          and module_put() is called.
      
        + The operation is asynchronous already when the first
          klp_try_complete_transition() fails and another call
          is queued with a delay.
      Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Don't block the removal of patches loaded after a forced transition · 68007289
      By Petr Mladek
      module_put() is currently never called in klp_complete_transition() when
      klp_force is set. As a result, we might keep the reference count even when
      klp_enable_patch() fails and klp_cancel_transition() is called.
      
      This might give the impression that a module might get blocked in some
      strange init state. Fortunately, it is not the case. The reference count
      is ignored when mod->init fails and erroneous modules are always removed.
      
      Anyway, this might be confusing. Instead, this patch moves
      the global klp_forced flag into struct klp_patch. As a result,
      we block only modules that might still be in use after a forced
      transition. Newly loaded livepatches can eventually be completely
      removed later.
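
      The check in klp_complete_transition() then becomes per-patch, roughly
      (sketch):

        /* Keep the module pinned only if this very patch was forced. */
        if (!klp_transition_patch->forced)
        	module_put(klp_transition_patch->mod);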
      
      It is not a big deal. But the code is at least consistent with
      the reality.
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Consolidate klp_free functions · 0430f78b
      By Petr Mladek
      The code for freeing livepatch structures is a bit scattered and tricky:
      
        + direct calls to klp_free_*_limited() and kobject_put() are
          used to release partially initialized objects
      
        + klp_free_patch() removes the patch from the public list
          and releases all objects except for patch->kobj
      
        + kobject_put(&patch->kobj) and the related wait_for_completion()
          are called directly outside klp_mutex; this code is duplicated
      
      Now, we are going to remove the registration stage to simplify the API
      and the code. This would require handling more situations in
      klp_enable_patch() error paths.
      
      More importantly, we are going to add a feature called atomic replace.
      It will need to dynamically create func and object structures. We will
      want to reuse the existing init() and free() functions. This would
      create even more error path scenarios.
      
      This patch implements more straightforward free functions:
      
        + checks kobj_added flag instead of @limit[*]
      
        + initializes patch->list early so that the check for empty list
          always works
      
        + The actions that have to be done outside klp_mutex are done
          in a separate klp_free_patch_finish() function. It waits only
          when patch->kobj was really released via the _start() part;
          see the sketch below.
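
      The resulting call pattern is roughly (a sketch):

        mutex_lock(&klp_mutex);
        klp_free_patch_start(patch);	/* unlink the patch, free objects */
        mutex_unlock(&klp_mutex);
        klp_free_patch_finish(patch);	/* put patch->kobj, wait for release */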
      
      The patch does not change the existing behavior.
      
      [*] We need our own flag to track that the kobject was successfully
          added to the hierarchy.  Note that kobj.state_initialized only
          indicates that the kobject has been initialized, not whether it has
          been added (and needs to be removed on cleanup).
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Shuffle klp_enable_patch()/klp_disable_patch() code · 26c3e98e
      By Petr Mladek
      We are going to simplify the API and code by removing the registration
      step. This will require calling the init/free functions from the
      enable/disable ones.
      
      This patch just moves the code to prevent more forward declarations.
      
      This patch does not change the code except for two forward declarations.
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • livepatch: Change unsigned long old_addr -> void *old_func in struct klp_func · 19514910
      By Petr Mladek
      The addresses of the function to be patched and of the new function are
      stored in struct klp_func as:
      
      	void *new_func;
      	unsigned long old_addr;
      
      The different naming scheme and type are derived from the way
      the addresses are set. @old_addr is assigned at runtime using
      kallsyms-based search. @new_func is statically initialized,
      for example:
      
        static struct klp_func funcs[] = {
      	{
      		.old_name = "cmdline_proc_show",
      		.new_func = livepatch_cmdline_proc_show,
      	}, { }
        };
      
      This patch changes unsigned long old_addr -> void *old_func. It removes
      some confusion when these addresses are later used in the code. It is
      motivated by a followup patch that adds a special NOP struct klp_func
      where we want to assign func->new_func = func->old_addr, or with the
      new naming, func->new_func = func->old_func.
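
      After the change the two fields are symmetric; the relevant fragment of
      struct klp_func becomes (sketch):

        	const char *old_name;	/* function to be patched */
        	void *new_func;		/* replacement function */
        	void *old_func;		/* was: unsigned long old_addr */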
      
      This patch does not modify the existing behavior.
      Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Miroslav Benes <mbenes@suse.cz>
      Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
      Acked-by: Alice Ferrazzi <alice.ferrazzi@gmail.com>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  3. 05 January 2019 (10 commits)
  4. 04 January 2019 (1 commit)
    • Remove 'type' argument from access_ok() function · 96d4f267
      By Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
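
      A typical conversion looks like this (an illustrative call site, not
      from any particular file):

        /* before */
        if (!access_ok(VERIFY_WRITE, buf, count))
        	return -EFAULT;

        /* after */
        if (!access_ok(buf, count))
        	return -EFAULT;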
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 03 January 2019 (8 commits)
    • bpf: prevent out of bounds speculation on pointer arithmetic · 979d63d5
      By Daniel Borkmann
      Jann reported that the original commit back in b2157399
      ("bpf: prevent out-of-bounds speculation") was not sufficient
      to stop the CPU from speculating out-of-bounds memory accesses:
      While b2157399 only focused on masking array map access
      for unprivileged users for tail calls and data access, such
      that the user-provided index gets sanitized on the BPF program
      and syscall side, there is still a more generic form in BPF
      programs that applies to most maps holding user data, in
      relation to dynamic map access when dealing with unknown
      scalars or "slow" known scalars as the access offset, for
      example:
      
        - Load a map value pointer into R6
        - Load an index into R7
        - Do a slow computation (e.g. with a memory dependency) that
          loads a limit into R8 (e.g. load the limit from a map for
          high latency, then mask it to make the verifier happy)
        - Exit if R7 >= R8 (mispredicted branch)
        - Load R0 = R6[R7]
        - Load R0 = R6[R0]
      
      For unknown scalars there are two options in the BPF verifier
      where we could derive knowledge from in order to guarantee
      safe access to the memory: i) While the </>/<=/>= variants won't
      allow deriving any lower or upper bounds from the unknown
      scalar where it would be safe to add it to the map value
      pointer, it is possible through an ==/!= test. ii) Another
      option is to transform the unknown scalar into a known scalar,
      for example, through ALU ops combination such as R &= <imm>
      followed by R |= <imm> or any similar combination where the
      original information from the unknown scalar would be destroyed
      entirely leaving R with a constant. The initial slow load still
      precedes the latter ALU ops on that register, so the CPU
      executes speculatively from that point. Once we have the known
      scalar, any compare operation would work then. A third option
      only involving registers with known scalars could be crafted
      as described in [0] where a CPU port (e.g. Slow Int unit)
      would be filled with many dependent computations such that
      the subsequent condition depending on its outcome has to wait
      for evaluation on its execution port and thereby executing
      speculatively if the speculated code can be scheduled on a
      different execution port, or any other form of mistraining
      as described in [1], for example. Given this is not limited
      to unknown scalars only, not only map but also stack access
      is affected since both are accessible to unprivileged users
      and could potentially be used for out-of-bounds access under
      speculation.
      
      In order to prevent any of these cases, the verifier is now
      sanitizing pointer arithmetic on the offset such that any
      out of bounds speculation would be masked in a way where the
      pointer arithmetic result in the destination register will
      stay unchanged, meaning offset masked into zero similar as
      in array_index_nospec() case. With regards to implementation,
      there are three options that were considered: i) new insn
      for sanitation, ii) push/pop insn and sanitation as inlined
      BPF, iii) reuse of ax register and sanitation as inlined BPF.
      
      Option i) has the downside that we end up using reserved
      bits in the opcode space, but also that we would require
      each JIT to emit the masking as native arch opcodes, meaning
      the mitigation would see slow adoption until everyone eventually
      implements it, which is counter-productive. Options ii) and iii)
      have both in common that a temporary register is needed in
      order to implement the sanitation as inlined BPF since we
      are not allowed to modify the source register. While a push /
      pop insn in ii) would be useful to have in any case, it
      requires once again that every JIT needs to implement it
      first. While possible, amount of changes needed would also
      be unsuitable for a -stable patch. Therefore, the path which
      has fewer changes, less BPF instructions for the mitigation
      and does not require anything to be changed in the JITs is
      option iii) which this work is pursuing. The ax register is
      already mapped to a register in all JITs (modulo arm32 where
      it's mapped to stack as various other BPF registers there)
      and used in constant blinding for JITs-only so far. It can
      be reused for verifier rewrites under certain constraints.
      The interpreter's tmp "register" has therefore been remapped
      into extending the register set with hidden ax register and
      reusing that for a number of instructions that needed the
      prior temporary variable internally (e.g. div, mod). This
      allows for zero increase in stack space usage in the interpreter,
      and enables (restricted) generic use in rewrites otherwise as
      long as such a patchlet does not make use of these instructions.
      The sanitation mask is dynamic and relative to the offset the
      map value or stack pointer currently holds.
      
      There are various cases that need to be taken into consideration
      for the masking, e.g. such an operation could look as follows:
      ptr += val or val += ptr or ptr -= val. Thus, the value to be
      sanitized could reside either in source or in destination
      register, and the limit is different depending on whether
      the ALU op is addition or subtraction and depending on the
      current known and bounded offset. The limit is derived as
      follows: limit := max_value_size - (smin_value + off). For
      subtraction: limit := umax_value + off. This holds because
      we do not allow any pointer arithmetic that would
      temporarily go out of bounds or would have an unknown
      value with mixed signed bounds where it is unclear at
      verification time whether the actual runtime value would
      be either negative or positive. For example, we have a
      derived map pointer value with constant offset and bounded
      one, so limit based on smin_value works because the verifier
      requires that statically analyzed arithmetic on the pointer
      must be in bounds, and thus it checks if resulting
      smin_value + off and umax_value + off is still within map
      value bounds at time of arithmetic in addition to time of
      access. Similarly, for the case of stack access we derive
      the limit as follows: MAX_BPF_STACK + off for subtraction
      and -off for the case of addition where off := ptr_reg->off +
      ptr_reg->var_off.value. Subtraction is a special case for
      the masking which can be in form of ptr += -val, ptr -= -val,
      or ptr -= val. In the first two cases where we know that
      the value is negative, we need to temporarily negate the
      value in order to do the sanitation on a positive value
      where we later swap the ALU op, and restore original source
      register if the value was in source.
      
      The sanitation of pointer arithmetic alone is still not fully
      sufficient as is, since a scenario like the following could
      happen ...
      
        PTR += 0x1000 (e.g. K-based imm)
        PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
        PTR += 0x1000
        PTR -= BIG_NUMBER_WITH_SLOW_COMPARISON
        [...]
      
      ... which under speculation could end up as ...
      
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        [...]
      
      ... and therefore still access out of bounds. To prevent such
      a case, the verifier is also analyzing safety for potential out
      of bounds access under speculative execution. Meaning, it is
      also simulating pointer access under truncation. We therefore
      "branch off" and push the current verification state after the
      ALU operation with known 0 to the verification stack for later
      analysis. Given the current path analysis succeeded it is
      likely that the one under speculation can be pruned. In any
      case, it is also subject to existing complexity limits and
      therefore anything beyond this point will be rejected. In
      terms of pruning, it needs to be ensured that the verification
      state from speculative execution simulation must never prune
      a non-speculative execution path, therefore, we mark verifier
      state accordingly at the time of push_stack(). If verifier
      detects out of bounds access under speculative execution from
      one of the possible paths that includes a truncation, it will
      reject such program.
      
      Given we mask every reg-based pointer arithmetic for
      unprivileged programs, we've been looking into how it could
      affect real-world programs in terms of size increase. As the
      majority of programs are targeted for privileged-only use
      case, we've unconditionally enabled masking (with its alu
      restrictions on top of it) for privileged programs for the
      sake of testing in order to check i) whether they get rejected
      in its current form, and ii) by how much the number of
      instructions and size will increase. We've tested this by
      using Katran, Cilium and test_l4lb from the kernel selftests.
      For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
      and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
      we've used test_l4lb.o as well as test_l4lb_noinline.o. We
      found that none of the programs got rejected by the verifier
      with this change, and that impact is rather minimal to none.
      balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
      7,797 bytes JITed before and after the change. Most complex
      program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
      and 18,538 bytes JITed before and after and none of the other
      tail call programs in bpf_lxc.o had any changes either. For
      the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
      increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
      before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
      after the change. Other programs from that object file had
      similar small increase. Both test_l4lb.o had no change and
      remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
      JITed and for test_l4lb_noinline.o constant at 5,080 bytes
      (634 insns) xlated and 3,313 bytes JITed. This can be explained
      in that LLVM typically optimizes stack based pointer arithmetic
      by using K-based operations and that use of dynamic map access
      is not overly frequent. However, in future we may decide to
      optimize the algorithm further under known guarantees from
      branch and value speculation. The latter also seems unclear in
      terms of prediction heuristics that today's CPUs apply as well
      as whether there could be collisions in e.g. the predictor's
      Value History/Pattern Table for triggering out of bounds access,
      thus masking is performed unconditionally at this point but could
      be subject to relaxation later on. We were generally also
      brainstorming various other approaches for mitigation, but the
      blocker was always lack of available registers at runtime and/or
      overhead for runtime tracking of limits belonging to a specific
      pointer. Thus, we found this to be minimally intrusive under
      given constraints.
      
      With that in place, a simple example with sanitized access on
      unprivileged load at post-verification time looks as follows:
      
        # bpftool prog dump xlated id 282
        [...]
        28: (79) r1 = *(u64 *)(r7 +0)
        29: (79) r2 = *(u64 *)(r7 +8)
        30: (57) r1 &= 15
        31: (79) r3 = *(u64 *)(r0 +4608)
        32: (57) r3 &= 1
        33: (47) r3 |= 1
        34: (2d) if r2 > r3 goto pc+19
        35: (b4) (u32) r11 = (u32) 20479  |
        36: (1f) r11 -= r2                | Dynamic sanitation for pointer
        37: (4f) r11 |= r2                | arithmetic with registers
        38: (87) r11 = -r11               | containing bounded or known
        39: (c7) r11 s>>= 63              | scalars in order to prevent
        40: (5f) r11 &= r2                | out of bounds speculation.
        41: (0f) r4 += r11                |
        42: (71) r4 = *(u8 *)(r4 +0)
        43: (6f) r4 <<= r1
        [...]
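
      Read back into plain C, the r11 sequence in insns 35-41 above is
      roughly equivalent to this sketch (helper name hypothetical, kernel
      u64/s64 types, arithmetic right shift assumed):

        static u64 sanitize_spec_off(u64 off, u64 limit)
        {
        	s64 mask;

        	mask = limit - off;	/* negative iff off > limit */
        	mask |= off;		/* negative also if off itself is */
        	mask = -mask;
        	mask >>= 63;		/* all-ones iff off is within limit */
        	return off & mask;	/* in-bounds off, or 0 */
        }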
      
      For the case where the scalar sits in the destination register
      as opposed to the source register, the following code is emitted
      for the above example:
      
        [...]
        16: (b4) (u32) r11 = (u32) 20479
        17: (1f) r11 -= r2
        18: (4f) r11 |= r2
        19: (87) r11 = -r11
        20: (c7) r11 s>>= 63
        21: (5f) r2 &= r11
        22: (0f) r2 += r0
        23: (61) r0 = *(u32 *)(r2 +0)
        [...]
      
      JIT blinding example with non-conflicting use of r10:
      
        [...]
         d5:	je     0x0000000000000106    _
         d7:	mov    0x0(%rax),%edi       |
         da:	mov    $0xf153246,%r10d     | Index load from map value and
         e0:	xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
         e7:	and    %r10,%rdi            |_
         ea:	mov    $0x2f,%r10d          |
         f0:	sub    %rdi,%r10            | Sanitized addition. Both use r10
         f3:	or     %rdi,%r10            | but do not interfere with each
         f6:	neg    %r10                 | other. (Neither do these instructions
         f9:	sar    $0x3f,%r10           | interfere with the use of ax as temp
         fd:	and    %r10,%rdi            | in interpreter.)
        100:	add    %rax,%rdi            |_
        103:	mov    0x0(%rdi),%eax
       [...]
      
      Tested that it fixes Jann's reproducer, and also checked that the
      test_verifier and test_progs suites run successfully with the
      interpreter, the JIT, and the JIT with hardening enabled on x86-64
      and arm64.
      
        [0] Speculose: Analyzing the Security Implications of Speculative
            Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
            https://arxiv.org/pdf/1801.04084.pdf
      
        [1] A Systematic Evaluation of Transient Execution Attacks and
            Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
            Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
            Dmitry Evtyushkin, Daniel Gruss,
            https://arxiv.org/pdf/1811.05441.pdf
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix check_map_access smin_value test when pointer contains offset · b7137c4e
      By Daniel Borkmann
      In check_map_access() we probe actual bounds through __check_map_access()
      with offset of reg->smin_value + off for lower bound and offset of
      reg->umax_value + off for the upper bound. However, even though the
      reg->smin_value could have a negative value, the final result of the
      sum with off could be positive when pointer arithmetic with known and
      unknown scalars is combined. In this case we reject the program with
      an error such as "R<x> min value is negative, either use unsigned index
      or do a if (index >=0) check." even though the access itself would be
      fine. Therefore extend the check to probe whether the actual resulting
      reg->smin_value + off is less than zero.
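
      The extended probe is essentially this sketch of the
      check_map_access() change:

        if (reg->smin_value + off < 0) {
        	verbose(env, "R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
        		regno);
        	return -EACCES;
        }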
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: restrict unknown scalars of mixed signed bounds for unprivileged · 9d7eceed
      By Daniel Borkmann
      For unknown scalars of mixed signed bounds, meaning their smin_value is
      negative and their smax_value is positive, we need to reject arithmetic
      with a pointer to a map value. For unprivileged users the goal is to
      mask all map pointer arithmetic, and this cannot reliably be done when
      it is unknown at verification time whether the scalar value is negative or
      positive. Given this is a corner case, the likelihood of breaking should
      be very small.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: restrict stack pointer arithmetic for unprivileged · e4298d25
      By Daniel Borkmann
      Restrict stack pointer arithmetic for unprivileged users in that
      arithmetic itself must not go out of bounds as opposed to the actual
      access later on. Therefore after each adjust_ptr_min_max_vals() with
      a stack pointer as a destination we simulate a check_stack_access()
      of 1 byte on the destination and once that fails the program is
      rejected for unprivileged program loads. This is analogous to map
      value pointer arithmetic and is needed for the masking later on.
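
      In pseudo form the added verifier check is roughly this sketch:

        /* after adjust_ptr_min_max_vals() with a stack pointer as dst */
        if (!env->allow_ptr_leaks && dst_reg->type == PTR_TO_STACK) {
        	int off = dst_reg->off + dst_reg->var_off.value;

        	if (check_stack_access(env, dst_reg, off, 1)) {
        		verbose(env, "R%d stack pointer arithmetic goes out of range\n", dst);
        		return -EACCES;
        	}
        }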
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: restrict map value pointer arithmetic for unprivileged · 0d6303db
      By Daniel Borkmann
      Restrict map value pointer arithmetic for unprivileged users in that
      arithmetic itself must not go out of bounds as opposed to the actual
      access later on. Therefore after each adjust_ptr_min_max_vals() with a
      map value pointer as a destination it will simulate a check_map_access()
      of 1 byte on the destination and once that fails the program is rejected
      for unprivileged program loads. We use this later on for masking any
      pointer arithmetic with the remainder of the map value space. The
      likelihood of breaking any existing real-world unprivileged eBPF
      program is very small for this corner case.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: enable access to ax register also from verifier rewrite · 9b73bfdd
      By Daniel Borkmann
      Right now we are using the BPF ax register in the JIT for constant
      blinding as well as in the interpreter as a temporary variable. The
      verifier will not be able to use it simply because its use would get
      overridden by the former in bpf_jit_blind_insn(). However, it can be
      made to work in that blinding is skipped if there is a prior use of ax
      in either the source or destination register of the instruction. Taking
      the constraints of ax into account, the verifier is then open to use it
      in rewrites under some constraints. Note that the ax register already
      has a mapping in every eBPF JIT.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: move tmp variable into ax register in interpreter · 144cd91c
      By Daniel Borkmann
      This change moves the on-stack 64-bit tmp variable in ___bpf_prog_run()
      into the hidden ax register. The latter is currently only used in JITs
      for constant blinding as a temporary scratch register, meaning the BPF
      interpreter will never see the use of ax. Therefore it is safe to use
      it for the cases where tmp has been used earlier. This is needed to
      later allow restricted hidden use of ax in both the interpreter and JITs.
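
      The remap itself is small; a sketch of the filter.h and interpreter
      side:

        /* Hidden scratch register, invisible to programs and to the
         * verifier's user-facing state, but usable by rewrites.
         */
        #define BPF_REG_AX		MAX_BPF_REG
        #define MAX_BPF_EXT_REG	(MAX_BPF_REG + 1)

        /* ___bpf_prog_run(): regs[] grows by one, tmp uses become AX */
        #define AX	regs[BPF_REG_AX]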
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: move {prev_,}insn_idx into verifier env · c08435ec
      By Daniel Borkmann
      Move prev_insn_idx and insn_idx from the do_check() function into
      the verifier environment, so they can be read inside the various
      helper functions for handling the instructions. It's easier to put
      this into the environment rather than changing all call sites just
      to pass it along. insn_idx is useful in particular since it later
      on allows holding state in env->insn_aux_data[env->insn_idx].
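
      The environment simply grows two fields (sketch):

        /* in struct bpf_verifier_env */
        u32 insn_idx;
        u32 prev_insn_idx;

      which makes per-instruction bookkeeping a one-liner, e.g.:

        env->insn_aux_data[env->insn_idx].seen = true;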
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  6. 30 December 2018 (6 commits)
    • sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f654 · c40f7d74
      By Linus Torvalds
      Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the
      scheduler under high loads, starting at around the v4.18 time frame,
      and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list
      manipulation.
      
      Do a (manual) revert of:
      
        a9e7f654 ("sched/fair: Fix O(nr_cgroups) in load balance path")
      
      It turns out that the list_del_leaf_cfs_rq() introduced by this commit
      has a surprising property that was not considered in follow-up commits
      such as:
      
        9c2791f9 ("sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list")
      
      As Vincent Guittot explains:
      
       "I think that there is a bigger problem with commit a9e7f654 and
        cfs_rq throttling:
      
        Let take the example of the following topology TG2 --> TG1 --> root:
      
         1) The 1st time a task is enqueued, we will add TG2 cfs_rq then TG1
            cfs_rq to leaf_cfs_rq_list and we are sure to do the whole branch in
            one path because it has never been used and can't be throttled so
            tmp_alone_branch will point to leaf_cfs_rq_list at the end.
      
         2) Then TG1 is throttled
      
         3) and we add TG3 as a new child of TG1.
      
         4) The 1st enqueue of a task on TG3 will add TG3 cfs_rq just before TG1
            cfs_rq and tmp_alone_branch will stay  on rq->leaf_cfs_rq_list.
      
        With commit a9e7f654, we can del a cfs_rq from rq->leaf_cfs_rq_list.
        So if the load of TG1 cfs_rq becomes NULL before step 2) above, TG1
        cfs_rq is removed from the list.
        Then at step 4), TG3 cfs_rq is added at the beginning of rq->leaf_cfs_rq_list
        but tmp_alone_branch still points to TG3 cfs_rq because its throttled
        parent can't be enqueued when the lock is released.
        tmp_alone_branch doesn't point to rq->leaf_cfs_rq_list whereas it should.
      
        So if TG3 cfs_rq is removed or destroyed before tmp_alone_branch
        points on another TG cfs_rq, the next TG cfs_rq that will be added,
        will be linked outside rq->leaf_cfs_rq_list - which is bad.
      
        In addition, we can break the ordering of the cfs_rq in
        rq->leaf_cfs_rq_list but this ordering is used to update and
        propagate the update from leaf down to root."
      
      Instead of trying to work through all these cases and trying to reproduce
      the very high loads that produced the lockup to begin with, simplify
      the code temporarily by reverting a9e7f654 - a change that was clearly
      not thought through completely.
      
      This (hopefully) gives us a kernel that doesn't lock up so people
      can continue to enjoy their holidays without worrying about regressions. ;-)
      
      [ mingo: Wrote changelog, fixed weird spelling in code comment while at it. ]
      Analyzed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Analyzed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reported-by: Zhipeng Xie <xiezhipeng1@huawei.com>
      Reported-by: Sargun Dhillon <sargun@sargun.me>
      Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
      Tested-by: Zhipeng Xie <xiezhipeng1@huawei.com>
      Tested-by: Sargun Dhillon <sargun@sargun.me>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: <stable@vger.kernel.org> # v4.13+
      Cc: Bin Li <huawei.libin@huawei.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: a9e7f654 ("sched/fair: Fix O(nr_cgroups) in load balance path")
      Link: http://lkml.kernel.org/r/1545879866-27809-1-git-send-email-xiexiuqi@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kdb: use bool for binary state indicators · 7faedcd4
      By Nicholas Mc Guire
      defcmd_in_progress is the state trace for command group processing
      (within a command group or not), and usable indicates whether a command
      set is valid (allocated/non-empty), so use a bool for these binary
      indicators.
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    • kdb: Don't back trace on a cpu that didn't round up · 162bc7f5
      By Douglas Anderson
      If you have a CPU that fails to round up and then run 'btc', you'll end
      up crashing in kdb because we dereference NULL.  Let's add a check.
      It's wise to also set the task to NULL when leaving the debugger so
      that if we fail to round up on a later entry into the debugger we
      won't backtrace a stale task.
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    • kgdb: Don't round up a CPU that failed rounding up before · 87b09592
      By Douglas Anderson
      If we're using the default implementation of kgdb_roundup_cpus() that
      uses smp_call_function_single_async() we can end up hanging
      kgdb_roundup_cpus() if we try to round up a CPU that failed to round
      up before.
      
      Specifically smp_call_function_single_async() will try to wait on the
      csd lock for the CPU that we're trying to round up.  If the previous
      round up never finished then that lock could still be held and we'll
      just sit there hanging.
      
      There's not a lot of use trying to round up a CPU that failed to round
      up before.  Let's keep a flag that indicates whether the CPU started
      but didn't finish to round up before.  If we see that flag set then
      we'll skip the next round up.
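
      The skip itself is a couple of lines in the roundup loop; roughly
      (a sketch, with the flag cleared again in kgdb_nmicallback() once the
      CPU really arrives):

        if (kgdb_info[cpu].rounding_up)
        	continue;	/* still stuck in a previous round up */
        kgdb_info[cpu].rounding_up = true;

        csd = &per_cpu(kgdb_roundup_csd, cpu);
        ret = smp_call_function_single_async(cpu, csd);
        if (ret)
        	kgdb_info[cpu].rounding_up = false;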
      
      In general we have a few goals here:
      - We never want to end up calling smp_call_function_single_async()
        when the csd is still locked.  This is accomplished because
        flush_smp_call_function_queue() unlocks the csd _before_ invoking
        the callback.  That means that when kgdb_nmicallback() runs we know
        for sure that the csd is no longer locked.  Thus when we set
        "rounding_up = false" we know for sure that the csd is unlocked.
      - If there are no timeouts rounding up we should never skip a round
        up.
      
      NOTE #1: In general trying to continue running after failing to round
      up CPUs doesn't appear to be supported in the debugger.  When I
      simulate this I find that kdb reports "Catastrophic error detected"
      when I try to continue.  I can overrule and continue anyway, but it
      should be noted that we may be entering the land of dragons here.
      Possibly the "Catastrophic error detected" was added _because_ of the
      future failure to round up, but even so this is an area of the code
      that hasn't been strongly tested.
      
      NOTE #2: I did a bit of testing before and after this change.  I
      introduced a 10 second hang in the kernel while holding a spinlock
      that I could invoke on a certain CPU with 'taskset -c 3 cat /sys/...'.
      
      Before this change if I did:
      - Invoke hang
      - Enter debugger
      - g (which warns about Catastrophic error, g again to go anyway)
      - g
      - Enter debugger
      
      ...I'd hang the rest of the 10 seconds without getting a debugger
      prompt.  After this change I end up in the debugger the 2nd time after
      only 1 second with the standard warning about 'Timed out waiting for
      secondary CPUs.'
      
      I'll also note that once the CPU finished waiting I could actually
      debug it (aka "btc" worked).
      
      I won't promise that everything works perfectly if the errant CPU
      comes back at just the wrong time (like as we're entering or exiting
      the debugger) but it certainly seems like an improvement.
      
      NOTE #3: setting 'kgdb_info[cpu].rounding_up = false' is in
      kgdb_nmicallback() instead of kgdb_call_nmi_hook() because some
      implementations override kgdb_call_nmi_hook().  It shouldn't hurt to
      have it in kgdb_nmicallback() in any case.
      
      NOTE #4: this logic is really only needed because there is no API call
      like "smp_try_call_function_single_async()" or "smp_csd_is_locked()".
      If such an API existed then we'd use it instead, but it seemed a bit
      much to add an API like this just for kgdb.
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    • kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function() · 3cd99ac3
      By Douglas Anderson
      When I had lockdep turned on and dropped into kgdb I got a nice splat
      on my system.  Specifically it hit:
        DEBUG_LOCKS_WARN_ON(current->hardirq_context)
      
      Specifically it looked like this:
        sysrq: SysRq : DEBUG
        ------------[ cut here ]------------
        DEBUG_LOCKS_WARN_ON(current->hardirq_context)
        WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
        pstate: 604003c9 (nZCv DAIF +PAN -UAO)
        pc : lockdep_hardirqs_on+0xf0/0x160
        ...
        Call trace:
         lockdep_hardirqs_on+0xf0/0x160
         trace_hardirqs_on+0x188/0x1ac
         kgdb_roundup_cpus+0x14/0x3c
         kgdb_cpu_enter+0x53c/0x5cc
         kgdb_handle_exception+0x180/0x1d4
         kgdb_compiled_brk_fn+0x30/0x3c
         brk_handler+0x134/0x178
         do_debug_exception+0xfc/0x178
         el1_dbg+0x18/0x78
         kgdb_breakpoint+0x34/0x58
         sysrq_handle_dbg+0x54/0x5c
         __handle_sysrq+0x114/0x21c
         handle_sysrq+0x30/0x3c
         qcom_geni_serial_isr+0x2dc/0x30c
        ...
        ...
        irq event stamp: ...45
        hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
        hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
        softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
        softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
        ---[ end trace adf21f830c46e638 ]---
      
      Looking closely at it, it seems like a really bad idea to be calling
      local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
      like it could violate spinlock semantics and cause a deadlock.
      
      Instead, let's use a private csd alongside
      smp_call_function_single_async() to round up the other CPUs.  Using
      smp_call_function_single_async() doesn't require interrupts to be
      enabled so we can remove the offending bit of code.
      
      In order to avoid duplicating this across all the architectures that
      use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
      to debug_core.c.
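
      The weak default can be as simple as this sketch of the debug_core.c
      version:

        static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);

        void __weak kgdb_call_nmi_hook(void *ignored)
        {
        	kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
        }

        void __weak kgdb_roundup_cpus(void)
        {
        	call_single_data_t *csd;
        	int this_cpu = raw_smp_processor_id();
        	int cpu;

        	for_each_online_cpu(cpu) {
        		if (cpu == this_cpu)
        			continue;

        		csd = &per_cpu(kgdb_roundup_csd, cpu);
        		csd->func = kgdb_call_nmi_hook;
        		smp_call_function_single_async(cpu, csd);
        	}
        }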
      
      Looking at all the people who previously had copies of this code,
      there were a few variants.  I've attempted to keep the variants
      working like they used to.  Specifically:
      * For arch/arc we passed NULL to kgdb_nmicallback() instead of
        get_irq_regs().
      * For arch/mips there was a bit of extra code around
        kgdb_nmicallback()
      
      NOTE: In this patch we will still get into trouble if we try to round
      up a CPU that failed to round up before.  We'll try to round it up
      again and potentially hang when we try to grab the csd lock.  That's
      not new behavior but we'll still try to do better in a future patch.
      Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    • kgdb: Remove irq flags from roundup · 9ef7fa50
      By Douglas Anderson
      The function kgdb_roundup_cpus() was passed a parameter that was
      documented as:
      
      > the flags that will be used when restoring the interrupts. There is
      > local_irq_save() call before kgdb_roundup_cpus().
      
      Nobody used those flags.  Anyone who wanted to temporarily turn on
      interrupts just did local_irq_enable() and local_irq_disable() without
      looking at them.  So we can definitely remove the flags.
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
  7. 29 December 2018 (4 commits)