1. 25 Jul, 2017 1 commit
    • E
      signal: Remove kernel internal si_code magic · cc731525
      Eric W. Biederman authored
      struct siginfo is a union and the kernel since 2.4 has been hiding a union
      tag in the high 16 bits of si_code using the values:
      __SI_KILL
      __SI_TIMER
      __SI_POLL
      __SI_FAULT
      __SI_CHLD
      __SI_RT
      __SI_MESGQ
      __SI_SYS
      
      While this looks plausible on the surface, in practice this situation has
      not worked well.
      
      - Injected positive signals are not copied to user space properly
        unless they have these magic high bits set.
      
      - Injected positive signals are not reported properly by signalfd
        unless they have these magic high bits set.
      
      - These kernel internal values leaked to userspace via ptrace_peek_siginfo
      
      - It was possible to inject these kernel internal values and cause
        the kernel to misbehave.
      
      - Kernel developers got confused and expected these kernel internal values
        in userspace in kernel self tests.
      
      - Kernel developers got confused and set si_code to __SI_FAULT which
        is SI_USER in userspace which causes userspace to think an ordinary user
        sent the signal and that it was not kernel generated.
      
      - The values make it impossible to reorganize the code to transform
        siginfo_copy_to_user into a plain copy_to_user, as si_code must
        be massaged before being passed to userspace.
      
      So remove these kernel internal si codes and make the kernel code simpler
      and more maintainable.
      
      To replace these kernel internal magic si_codes introduce the helper
      function siginfo_layout, that takes a signal number and an si_code and
      computes which union member of siginfo is being used.  Have
      siginfo_layout return an enumeration so that gcc will have enough
      information to warn if a switch statement does not handle all of the
      union members.
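
      The helper described above can be sketched in userspace C roughly as
      follows. This is a simplified illustration, not the kernel's code: the
      function name siginfo_layout_sketch, the enum tag sil_sketch and the
      helper layout_name are hypothetical, the SIL_ enumerator list is an
      assumption based on the "SIL_ values" named in this message, and the
      architecture-specific special cases are left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* expose SI_KERNEL, SIGPOLL, SIGSYS in <signal.h> */
#endif
#include <assert.h>
#include <signal.h>

/* Hypothetical enumeration: one value per siginfo union member. */
enum sil_sketch {
    SIL_KILL, SIL_TIMER, SIL_POLL, SIL_FAULT, SIL_CHLD, SIL_RT, SIL_SYS
};

/* Derive the union member from (signal, si_code) instead of from a
 * tag hidden in the high bits of si_code. */
enum sil_sketch siginfo_layout_sketch(int sig, int si_code)
{
    if (si_code > 0 && si_code != SI_KERNEL) {
        /* Positive codes are signal specific. */
        switch (sig) {
        case SIGILL: case SIGFPE: case SIGSEGV: case SIGBUS: case SIGTRAP:
            return SIL_FAULT;
        case SIGCHLD:
            return SIL_CHLD;
        case SIGPOLL:
            return SIL_POLL;
        case SIGSYS:
            return SIL_SYS;
        default:
            return SIL_KILL;
        }
    }
    if (si_code == SI_TIMER)
        return SIL_TIMER;
    if (si_code == SI_SIGIO)
        return SIL_POLL;
    if (si_code < 0)
        return SIL_RT;         /* SI_QUEUE and other negative codes */
    return SIL_KILL;           /* SI_USER or SI_KERNEL */
}

/* A switch over the enum with no default: if an enumerator is added
 * later and not handled here, gcc's -Wswitch reports it. */
const char *layout_name(enum sil_sketch layout)
{
    switch (layout) {
    case SIL_KILL:  return "kill";
    case SIL_TIMER: return "timer";
    case SIL_POLL:  return "poll";
    case SIL_FAULT: return "fault";
    case SIL_CHLD:  return "chld";
    case SIL_RT:    return "rt";
    case SIL_SYS:   return "sys";
    }
    return "?";
}
```

      A caller would pass the signal number and si_code it is about to copy,
      for example siginfo_layout_sketch(SIGSEGV, SEGV_MAPERR), and switch on
      the result to pick the union member.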
      
      A couple of architectures have a messed up ABI that defines signal
      specific duplications of SI_USER which causes more special cases in
      siginfo_layout than I would like.  The good news is that only the
      problem architectures pay the cost.
      
      Update all of the code that used the previous magic __SI_ values to
      use the new SIL_ values and to call siginfo_layout to get those
      values.  Except where not all of the cases are handled remove the
      defaults in the switch statements so that if a new case is missed in
      the future the lack will show up at compile time.
      
      Modify the code that copies siginfo si_code to userspace to just copy
      the value and not cast si_code to a short first.  The high bits are no
      longer used to hold a magic union member.
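
      The effect of the old cast can be illustrated with a small userspace
      sketch. The tag value (3 << 16) for __SI_FAULT is recalled from the
      pre-patch headers and should be treated as an assumption; the macro and
      function names below are hypothetical. The conversion to short relies
      on gcc's wrap-around behaviour for out-of-range values.

```c
#include <assert.h>

/* Assumed pre-patch internal tag for __SI_FAULT (high 16 bits). */
#define OLD_SI_FAULT_TAG  (3 << 16)
#define SEGV_MAPERR_CODE  1   /* userspace si_code for an unmapped address */
#define SI_USER_CODE      0   /* userspace si_code for kill(2) senders */

/* Old style: truncate to 16 bits so the internal tag is stripped
 * on the way to userspace. */
int old_user_si_code(int kernel_si_code)
{
    return (short)kernel_si_code;
}

/* New style: si_code carries no hidden tag, so it is copied as is. */
int new_user_si_code(int kernel_si_code)
{
    return kernel_si_code;
}
```

      With these definitions, old_user_si_code(OLD_SI_FAULT_TAG |
      SEGV_MAPERR_CODE) yields SEGV_MAPERR_CODE, while a bare
      OLD_SI_FAULT_TAG collapses to SI_USER_CODE, which is the confusion
      described in the list of problems above.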
      
      Fixup the siginfo header files to stop including the __SI_ values in
      their constants and for the headers that were missing it to properly
      update the number of si_codes for each signal type.
      
      The fixes to copy_siginfo_from_user32 implementations have the
      interesting property that several of them previously should never have
      worked as the __SI_ values they depended upon were kernel internal.
      With that dependency gone those implementations should work much
      better.
      
      The idea of not passing the __SI_ values out to userspace and then
      not reinserting them has been tested with criu and criu worked without
      changes.
      
      Ref: 2.4.0-test1
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  2. 20 Jul, 2017 2 commits
    • K
      prctl: Allow local CAP_SYS_ADMIN changing exe_file · 4d28df61
      Kirill Tkhai authored
      During checkpoint and restore of userspace tasks
      we ran into a situation where it is not possible
      to restore tasks whose user namespace does not
      have uid 0 or gid 0 mapped.
      
      People create user namespace mappings as they like,
      and nothing requires that any particular uid or gid
      be mapped. So, if there is no uid 0 or gid 0
      in the mapping, it's impossible to restore mm->exe_file
      of the processes belonging to this user namespace.
      
      There is no workaround either. It's impossible
      to create a temporary uid/gid mapping, because
      only one write to /proc/[pid]/uid_map and gid_map
      is allowed during a namespace's lifetime.
      If there is already an entry, no more mappings can be
      written. If there is no entry yet, we can't write
      one either, because then the user task would not be
      able to write its own mapping later.
      
      The patch changes the check to look for CAP_SYS_ADMIN
      instead of uid 0 and gid 0. This allows a task to be
      restored independently of its user namespace mappings.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Serge Hallyn <serge@hallyn.com>
      CC: "Eric W. Biederman" <ebiederm@xmission.com>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: Michal Hocko <mhocko@suse.com>
      CC: Andrei Vagin <avagin@openvz.org>
      CC: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
      CC: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • E
      userns,pidns: Verify the userns for new pid namespaces · a2b42626
      Eric W. Biederman authored
      It is pointless and confusing to allow a pid namespace hierarchy and
      the user namespace hierarchy to get out of sync.  The owner of a child
      pid namespace should be the owner of the parent pid namespace or
      a descendant of the owner of the parent pid namespace.
      
      Otherwise it is possible to construct scenarios where a process has a
      capability over a parent pid namespace but does not have the
      capability over a child pid namespace, which confusingly makes
      permission checks non-transitive.
      
      It requires use of setns into a pid namespace (but not into a user
      namespace) to create such a scenario.
      
      Add the function in_userns to help in making this determination.
      
      v2: Optimized in_userns by using level as suggested
          by: Kirill Tkhai <ktkhai@virtuozzo.com>
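
      A minimal userspace sketch of the ancestry test, using the level
      shortcut mentioned in the v2 note, might look like the following. The
      struct userns_sketch type and both function names are hypothetical
      stand-ins; the assumption is that each namespace records its parent and
      its depth, with the root at level 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace. */
struct userns_sketch {
    struct userns_sketch *parent;   /* NULL for the root namespace */
    int level;                      /* depth in the hierarchy, root is 0 */
};

/* True if child is ancestor itself or one of its descendants.
 * Walking stops as soon as the depth matches, so no walk to the
 * root is needed. */
bool in_userns_sketch(const struct userns_sketch *ancestor,
                      const struct userns_sketch *child)
{
    const struct userns_sketch *ns = child;

    while (ns->level > ancestor->level)
        ns = ns->parent;
    return ns == ancestor;
}

/* The rule from the message: the owner of a child pid namespace must
 * be the owner of the parent pid namespace or a descendant of it. */
bool child_pidns_owner_ok(const struct userns_sketch *parent_owner,
                          const struct userns_sketch *child_owner)
{
    return in_userns_sketch(parent_owner, child_owner);
}
```

      Because a namespace at the same level as the ancestor can only match by
      identity, siblings and unrelated branches are rejected after at most
      (child level - ancestor level) steps.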
      
      Ref: 49f4d8b9 ("pidns: Capture the user namespace and filter ns_last_pid")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  3. 15 Jul, 2017 2 commits
  4. 13 Jul, 2017 17 commits
  5. 12 Jul, 2017 6 commits
  6. 11 Jul, 2017 8 commits
  7. 09 Jul, 2017 1 commit
  8. 08 Jul, 2017 3 commits