1. 04 Aug, 2018 3 commits
    • E
      fork: Have new threads join on-going signal group stops · 924de3b8
      Eric W. Biederman committed
      There are only two signals that are delivered to every member of a
      signal group: SIGSTOP and SIGKILL.  Signal delivery requires every
      signal appear to be delivered either before or after a clone syscall.
      SIGKILL terminates the clone so does not need to be considered.  Which
      leaves only SIGSTOP that needs to be considered when creating new
      threads.
      
      Today in the event of a group stop TIF_SIGPENDING will get set and the
      fork will restart ensuring the fork syscall participates in the group
      stop.
      
      A fork (especially of a process with a lot of memory) is one of the
      most expensive system calls, so we really only want to restart a fork
      when necessary.
      
      It is easy to check whether a SIGSTOP is ongoing and have the new
      thread join it immediately after the clone completes, making it appear
      that the clone completed just before the SIGSTOP.
      
      The calculate_sigpending function will see the bits set in jobctl and
      set TIF_SIGPENDING to ensure the new task takes the slow path to userspace.
      
      V2: The call to task_join_group_stop was moved before the new task is
          added to the thread group list.  This should not matter as
          sighand->siglock is held over both the addition of the threads,
          the call to task_join_group_stop and do_signal_stop.  But the change
          is trivial and it is one less thing to worry about when reading
          the code.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      924de3b8
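The join described in this commit can be modelled in userspace as follows. This is only a minimal sketch: the `sketch_*` names, the struct layouts, and the bit values are hypothetical stand-ins chosen for illustration, not the kernel's definitions, and the locking that the commit message mentions (`sighand->siglock`) is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for jobctl bits; the values are illustrative only. */
#define SKETCH_STOP_SIGMASK  0xffffUL
#define SKETCH_STOP_PENDING  (1UL << 17)
#define SKETCH_STOP_CONSUME  (1UL << 18)

struct sketch_group {
	int group_stop_count;      /* threads that still have to stop */
};

struct sketch_task {
	unsigned long jobctl;      /* models task->jobctl */
	bool sigpending;           /* models TIF_SIGPENDING */
	struct sketch_group *group;
};

/*
 * If the parent is part of an ongoing group stop, copy the stop signal
 * number to the new thread and mark the stop as pending for it, so the
 * clone appears to have completed just before the SIGSTOP.
 */
static void sketch_join_group_stop(const struct sketch_task *parent,
				   struct sketch_task *child)
{
	if (parent->jobctl & SKETCH_STOP_PENDING) {
		unsigned long signr = parent->jobctl & SKETCH_STOP_SIGMASK;

		child->jobctl |= signr | SKETCH_STOP_PENDING | SKETCH_STOP_CONSUME;
		child->group->group_stop_count++;
	}
}

/* Pending jobctl bits force the new task onto the slow path to userspace. */
static void sketch_calculate_sigpending(struct sketch_task *t)
{
	t->sigpending = (t->jobctl & SKETCH_STOP_PENDING) != 0;
}
```

In this model a child created while the parent has a stop pending ends up with `sigpending` set without the fork having to restart.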
    • E
      fork: Skip setting TIF_SIGPENDING in ptrace_init_task · 4390e9ea
      Eric W. Biederman committed
      The code in calculate_sigpending will now handle this so
      it is just redundant and possibly a little confusing
      to continue setting TIF_SIGPENDING in ptrace_init_task.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      4390e9ea
    • E
      signal: Add calculate_sigpending() · 088fe47c
      Eric W. Biederman committed
      Add a function calculate_sigpending to test to see if any signals are
      pending for a new task immediately following fork.  Signals have to
      happen either before or after fork.  Today our practice is to push
      all of the signals to before the fork, but that has the downside that
      frequent or periodic signals can make fork take much much longer than
      normal or prevent fork from completing entirely.
      
      So we need to move the signals that we can to after the fork to prevent that.
      
      This updates the code to set TIF_SIGPENDING on a new task if there
      are signals or other activities that have moved so that they appear
      to happen after the fork.
      
      As the code today restarts if it sees any such activity this won't
      immediately have an effect, as there will be no reason for it
      to set TIF_SIGPENDING immediately after the fork.
      
      Adding calculate_sigpending means the code in fork can safely be
      changed to not always restart if a signal is pending.
      
      The new calculate_sigpending function sets sigpending if there
      are pending bits in jobctl, pending signals, the freezer needs
      to freeze the new task or the live kernel patching framework
      needs the new thread to take the slow path to userspace.
      
      I have verified that setting TIF_SIGPENDING does make a new process
      take the slow path to userspace before it executes its first userspace
      instruction.
      
      I have looked at the callers of signal_wake_up and the code paths
      setting TIF_SIGPENDING and I don't see anything else that needs to be
      handled.  The code probably doesn't need to set TIF_SIGPENDING for the
      kernel live patching as it uses a separate thread flag as well.  But
      at this point it seems safer to reuse the recalc_sigpending logic and get
      the kernel live patching folks to sort out their story later.
      
      V2: I have moved the test into schedule_tail where siglock can
          be grabbed and recalc_sigpending can be reused directly.
          Further as the last action of setting up a new task this
          guarantees that TIF_SIGPENDING will be properly set in the
          new process.
      
          The helper calculate_sigpending takes the siglock and
          unconditionally sets TIF_SIGPENDING and lets recalc_sigpending
          clear TIF_SIGPENDING if it is unnecessary.  This allows reusing
          the existing code and keeps maintenance of the conditions simple.
      
          Oleg Nesterov <oleg@redhat.com>  suggested the movement
          and pointed out the need to take siglock if this code
          was going to be called while the new task is discoverable.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      088fe47c
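The "set unconditionally, then let the recalculation clear it" pattern from the V2 note can be illustrated with a small userspace model. Everything here is an assumption made for illustration: the `model_*` names and the struct fields are hypothetical, the siglock is not modelled, and the live patching condition is left out.

```c
#include <stdbool.h>

struct model_task {
	bool tif_sigpending;            /* models TIF_SIGPENDING */
	unsigned long jobctl_pending;   /* pending jobctl bits */
	unsigned long pending_signals;  /* bitmask of queued signals */
	unsigned long blocked;          /* bitmask of blocked signals */
	bool freezing;                  /* the freezer wants this task */
};

/* True if anything requires the task to take the slow path to userspace. */
static bool model_has_work(const struct model_task *t)
{
	return t->jobctl_pending != 0 ||
	       (t->pending_signals & ~t->blocked) != 0 ||
	       t->freezing;
}

/* Clears the flag only when nothing needs handling. */
static void model_recalc_sigpending(struct model_task *t)
{
	if (!model_has_work(t))
		t->tif_sigpending = false;
}

/*
 * Set the flag first, then reuse the recalculation to clear it when it
 * turns out to be unnecessary. The conditions live in one place only.
 */
static void model_calculate_sigpending(struct model_task *t)
{
	t->tif_sigpending = true;
	model_recalc_sigpending(t);
}
```

The design choice the commit message describes is visible here: `model_calculate_sigpending` carries no conditions of its own, so any new condition only has to be added to the recalculation.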
  2. 23 Jul, 2018 2 commits
    • E
      fork: Unconditionally exit if a fatal signal is pending · 7673bf55
      Eric W. Biederman committed
      In practice this does not change anything as testing for fatal_signal_pending
      and exiting with an error code duplicates the work of the next clause
      which recalculates pending signals and then exits fork if any are pending.
      In both cases the pending signal will trigger the slow path when exiting
      to userspace, and the fatal signal will cause do_exit to be called.
      
      The advantage of making this a separate test is that it makes it clear
      processing the fatal signal will terminate the fork, and it allows the
      rest of the signal logic to be updated without fear that this important
      case will be lost.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      7673bf55
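The ordering of the two tests described in this commit can be sketched as below. This is a userspace illustration, not the fork code itself: the `fork_state` struct and function name are hypothetical, and the specific error codes (`EINTR` for the fatal case and a restart code with the assumed kernel-internal value 513) are assumptions, since the commit message only says "an error code".

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed value of a kernel-internal restart code; not a userspace errno. */
#define SKETCH_ERESTARTNOINTR 513

struct fork_state {
	bool fatal_signal_pending;  /* e.g. a SIGKILL has been queued */
	bool signal_pending;        /* any signal needs handling */
};

/*
 * The fatal test comes first so that it stays explicit even if the
 * generic "restart on any pending signal" clause is later relaxed.
 */
static int sketch_fork_signal_check(const struct fork_state *s)
{
	if (s->fatal_signal_pending)
		return -EINTR;                  /* fork is terminated */
	if (s->signal_pending)
		return -SKETCH_ERESTARTNOINTR;  /* fork is restarted */
	return 0;                               /* fork may proceed */
}
```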
    • E
      fork: Move and describe why the code examines PIDNS_ADDING · 4ca1d3ee
      Eric W. Biederman committed
      Normally this would be something that would be handled by handling
      signals that are sent to a group of processes but in this case the
      forking process is not a member of the group being signaled.  Thus
      special code is needed to prevent a race with pid namespaces exiting,
      and fork adding new processes within them.
      
      Move this test up before the signal restart just in case signals are
      also pending.  Fatal conditions should take precedence over restarts.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      4ca1d3ee
  3. 22 Jul, 2018 6 commits
  4. 21 Jul, 2018 9 commits
    • E
      signal: Pass pid and pid type into send_sigqueue · 24122c7f
      Eric W. Biederman committed
      Make the code more maintainable by performing more of the signal
      related work in send_sigqueue.
      
      A quick inspection of do_timer_create will show that this code path
      does not look up a thread group by a thread's pid, making it safe
      to find the task pointed to by it_pid with "pid_task(it_pid, type)".
      
      This supports the changes needed in fork to tell if a signal was sent
      to a single process or a group of processes.
      
      Having the pid to task transition in signal.c will also make it easier
      to sort out races with de_thread and the thread group leader
      exiting when it comes time to address that.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      24122c7f
    • E
      posix-timers: Noralize good_sigevent · 2118e1f5
      Eric W. Biederman committed
      In good_sigevent directly compute the default return value as
      "task_tgid(current)".  This is exactly the same as
      "task_pid(current->group_leader)" but written more clearly.
      
      In the thread case first compute the thread's pid.  Then verify that
      attached to that pid is a thread of the current thread group.
      
      This has the net effect of making the code a little clearer, and
      making it obvious that posix timers never look up a process by the
      pid of a thread.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      2118e1f5
    • E
      signal: Use PIDTYPE_TGID to clearly store where file signals will be sent · 01919134
      Eric W. Biederman committed
      When f_setown is called a pid and a pid type are stored.  Replace the use
      of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread
      group.  Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now
      is only for a thread.
      
      Update the users of __f_setown to use PIDTYPE_TGID instead of
      PIDTYPE_PID.
      
      For now the code continues to capture task_pid (when task_tgid would
      really be appropriate), and iterate on PIDTYPE_PID (even when type ==
      PIDTYPE_TGID) out of an abundance of caution to preserve existing
      behavior.
      
      Oleg Nesterov suggested that the test that ensures we use PIDTYPE_PID
      for tgid lookup also be used to avoid taking the tasklist lock.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      01919134
    • E
      pid: Implement PIDTYPE_TGID · 6883f81a
      Eric W. Biederman committed
      Everywhere except in the pid array we distinguish between a task's pid and
      a task's tgid (thread group id).  Even in the enumeration we want that
      distinction sometimes so we have added __PIDTYPE_TGID.  With leader_pid
      we almost have an implementation of PIDTYPE_TGID in struct signal_struct.
      
      Add PIDTYPE_TGID as a first class member of the pid_type enumeration and
      into the pids array.  Then remove the __PIDTYPE_TGID special case and the
      leader_pid in signal_struct.
      
      The net size increase is just an extra pointer added to struct pid and
      an extra pair of pointers of an hlist_node added to task_struct.
      
      The effect on code maintenance is the removal of a number of special
      cases today and the potential to remove many more special cases as
      PIDTYPE_TGID gets used to its fullest.  The long term potential
      is allowing zombie thread group leaders to exit, which will remove
      a lot more special cases in the code.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      6883f81a
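To illustrate what a first-class PIDTYPE_TGID buys, here is a small userspace model in which every pid type except the per-thread one is looked up through a table shared by the whole thread group. The `sketch_*` types and the exact layout are hypothetical; they only mirror the idea in the commit message and are not the kernel's structures.

```c
#include <stddef.h>

/* Hypothetical mirror of a pid type enumeration with TGID as a full member. */
enum sketch_pid_type {
	SKETCH_PIDTYPE_PID,   /* a single thread */
	SKETCH_PIDTYPE_TGID,  /* the whole thread group */
	SKETCH_PIDTYPE_PGID,  /* process group */
	SKETCH_PIDTYPE_SID,   /* session */
	SKETCH_PIDTYPE_MAX
};

struct sketch_pid {
	int nr;
};

/* Shared by every thread in the group (the PID slot is unused here). */
struct sketch_signal {
	struct sketch_pid *pids[SKETCH_PIDTYPE_MAX];
};

struct sketch_thread {
	struct sketch_pid *thread_pid;  /* per-thread pid */
	struct sketch_signal *signal;   /* shared group state */
};

/* One uniform lookup with no special case for the thread group id. */
static struct sketch_pid *sketch_task_pid_type(const struct sketch_thread *t,
					       enum sketch_pid_type type)
{
	if (type == SKETCH_PIDTYPE_PID)
		return t->thread_pid;
	return t->signal->pids[type];
}
```

Because the group-wide entries do not depend on which thread is currently the leader, a lookup by `SKETCH_PIDTYPE_TGID` gives the same answer from any thread in the group.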
    • E
      pids: Move the pgrp and session pid pointers from task_struct to signal_struct · 2c470475
      Eric W. Biederman committed
      To access these fields the code always has to go to group leader so
      going to signal struct is no loss and is actually a fundamental simplification.
      
      This saves a little bit of memory by only allocating the pid pointer array
      once instead of once for every thread, and even better this removes a
      few potential races caused by the fact that group_leader can be changed
      by de_thread, while signal_struct can not.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      2c470475
    • E
      71dbc8a9
    • E
      pids: Compute task_tgid using signal->leader_pid · 7a36094d
      Eric W. Biederman committed
      The cost is the same and this removes the need
      to worry about complications that come from de_thread
      and group_leader changing.
      
      __task_pid_nr_ns has been updated to take advantage of this change.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      7a36094d
    • E
      pids: Move task_pid_type into sched/signal.h · 1fb53567
      Eric W. Biederman committed
      The function is general and inline so there is no need
      to hide it inside of exit.c
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      1fb53567
    • E
      pids: Initialize leader_pid in init_task · 2896b0f0
      Eric W. Biederman committed
      This is cheap, so we might as well.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      2896b0f0
  5. 17 Jun, 2018 5 commits
    • L
      Linux 4.18-rc1 · ce397d21
      Linus Torvalds committed
      ce397d21
    • L
      Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block · 265c5596
      Linus Torvalds committed
      Pull block fixes from Jens Axboe:
       "A collection of fixes that should go into -rc1. This contains:
      
         - bsg_open vs bsg_unregister race fix (Anatoliy)
      
         - NVMe pull request from Christoph, with fixes for regressions in
           this window, FC connect/reconnect path code unification, and a
           trace point addition.
      
         - timeout fix (Christoph)
      
         - remove a few unused functions (Christoph)
      
         - blk-mq tag_set reinit fix (Roman)"
      
      * tag 'for-linus-20180616' of git://git.kernel.dk/linux-block:
        bsg: fix race of bsg_open and bsg_unregister
        block: remov blk_queue_invalidate_tags
        nvme-fabrics: fix and refine state checks in __nvmf_check_ready
        nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
        nvme-fabrics: refactor queue ready check
        blk-mq: remove blk_mq_tagset_iter
        nvme: remove nvme_reinit_tagset
        nvme-fc: fix nulling of queue data on reconnect
        nvme-fc: remove reinit_request routine
        blk-mq: don't time out requests again that are in the timeout handler
        nvme-fc: change controllers first connect to use reconnect path
        nvme: don't rely on the changed namespace list log
        nvmet: free smart-log buffer after use
        nvme-rdma: fix error flow during mapping request data
        nvme: add bio remapping tracepoint
        nvme: fix NULL pointer dereference in nvme_init_subsystem
        blk-mq: reinit q->tag_set_list entry only after grace period
      265c5596
    • L
      Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental · 5e7b9212
      Linus Torvalds committed
      Pull documentation fixes from Mauro Carvalho Chehab:
       "This solves a series of broken links for files under Documentation,
        and improves a script meant to detect such broken links (see
        scripts/documentation-file-ref-check).
      
        The changes on this series are:
      
         - can.rst: fix a footnote reference;
      
         - crypto_engine.rst: Fix two parsing warnings;
      
         - Fix a lot of broken references to Documentation/*;
      
         - improve the scripts/documentation-file-ref-check script, in order
           to help detecting/fixing broken references, preventing
           false-positives.
      
        After this patch series, only 33 broken references to doc files are
        detected by scripts/documentation-file-ref-check"
      
      * tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
        fix a series of Documentation/ broken file name references
        Documentation: rstFlatTable.py: fix a broken reference
        ABI: sysfs-devices-system-cpu: remove a broken reference
        devicetree: fix a series of wrong file references
        devicetree: fix name of pinctrl-bindings.txt
        devicetree: fix some bindings file names
        MAINTAINERS: fix location of DT npcm files
        MAINTAINERS: fix location of some display DT bindings
        kernel-parameters.txt: fix pointers to sound parameters
        bindings: nvmem/zii: Fix location of nvmem.txt
        docs: Fix more broken references
        scripts/documentation-file-ref-check: check tools/*/Documentation
        scripts/documentation-file-ref-check: get rid of false-positives
        scripts/documentation-file-ref-check: hint: dash or underline
        scripts/documentation-file-ref-check: add a fix logic for DT
        scripts/documentation-file-ref-check: accept more wildcards at filenames
        scripts/documentation-file-ref-check: fix help message
        media: max2175: fix location of driver's companion documentation
        media: v4l: fix broken video4linux docs locations
        media: dvb: point to the location of the old README.dvb-usb file
        ...
      5e7b9212
    • L
      Merge tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · dbb2816f
      Linus Torvalds committed
      Pull fsnotify updates from Jan Kara:
       "fsnotify cleanups unifying handling of different watch types.
      
        This is the shortened fsnotify series from Amir with the last five
        patches pulled out. Amir has modified those patches to not change
        struct inode but obviously it's too late for those to go into this
        merge window"
      
      * tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        fsnotify: add fsnotify_add_inode_mark() wrappers
        fanotify: generalize fanotify_should_send_event()
        fsnotify: generalize send_to_group()
        fsnotify: generalize iteration of marks by object type
        fsnotify: introduce marks iteration helpers
        fsnotify: remove redundant arguments to handle_event()
        fsnotify: use type id to identify connector object type
      dbb2816f
    • L
      Merge tag 'fbdev-v4.18' of git://github.com/bzolnier/linux · 644f2639
      Linus Torvalds committed
      Pull fbdev updates from Bartlomiej Zolnierkiewicz:
       "There is nothing really major here, few small fixes, some cleanups and
        dead drivers removal:
      
         - mark omapfb drivers as orphans in MAINTAINERS file (Tomi Valkeinen)
      
         - add missing module license tags to omap/omapfb driver (Arnd
           Bergmann)
      
         - add missing GPIOLIB dependency to omap2/omapfb driver (Arnd
           Bergmann)
      
         - convert savagefb, aty128fb & radeonfb drivers to use msleep & co.
           (Jia-Ju Bai)
      
         - allow COMPILE_TEST build for viafb driver (media part was reviewed
           by media subsystem Maintainer)
      
         - remove unused MERAM support from sh_mobile_lcdcfb and shmob-drm
           drivers (drm parts were acked by shmob-drm driver Maintainer)
      
         - remove unused auo_k190xfb drivers
      
         - misc cleanups (Souptick Joarder, Wolfram Sang, Markus Elfring, Andy
           Shevchenko, Colin Ian King)"
      
      * tag 'fbdev-v4.18' of git://github.com/bzolnier/linux: (26 commits)
        fb_omap2: add gpiolib dependency
        video/omap: add module license tags
        MAINTAINERS: make omapfb orphan
        video: fbdev: pxafb: match_string() conversion fixup
        video: fbdev: nvidia: fix spelling mistake: "scaleing" -> "scaling"
        video: fbdev: fix spelling mistake: "frambuffer" -> "framebuffer"
        video: fbdev: pxafb: Convert to use match_string() helper
        video: fbdev: via: allow COMPILE_TEST build
        video: fbdev: remove unused sh_mobile_meram driver
        drm: shmobile: remove unused MERAM support
        video: fbdev: sh_mobile_lcdcfb: remove unused MERAM support
        video: fbdev: remove unused auo_k190xfb drivers
        video: omap: Improve a size determination in omapfb_do_probe()
        video: sm501fb: Improve a size determination in sm501fb_probe()
        video: fbdev-MMP: Improve a size determination in path_init()
        video: fbdev-MMP: Delete an error message for a failed memory allocation in two functions
        video: auo_k190x: Delete an error message for a failed memory allocation in auok190x_common_probe()
        video: sh_mobile_lcdcfb: Delete an error message for a failed memory allocation in two functions
        video: sh_mobile_meram: Delete an error message for a failed memory allocation in sh_mobile_meram_probe()
        video: fbdev: sh_mobile_meram: Drop SUPERH platform dependency
        ...
      644f2639
  6. 16 Jun, 2018 15 commits