1. 04 5月, 2017 12 次提交
    • B
      blk-mq: Do not invoke queue operations on a dead queue · 18d4d7d0
      Bart Van Assche 提交于
      In commit e869b546 ("blk-mq: Unregister debugfs attributes
      earlier"), we shuffled the debugfs cleanup around so that the "state"
      attribute was removed before we freed the blk-mq data structures.
      However, later changes are going to undo that, so we need to explicitly
      disallow running a dead queue.
      
      [Omar: rebased and updated commit message]
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      18d4d7d0
    • O
      blk-mq-debugfs: get rid of a bunch of boilerplate · f57de23a
      Omar Sandoval 提交于
      A large part of blk-mq-debugfs.c is file_operations and seq_file
      boilerplate. This sucks as is but will suck even more when schedulers
      can define their own debugfs entries. Factor it all out into a single
      blk_mq_debugfs_fops which multiplexes as needed. We store the
      request_queue, blk_mq_hw_ctx, or blk_mq_ctx in the parent directory
      dentry, which is kind of hacky, but it works.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      f57de23a
    • O
      blk-mq-debugfs: rename hw queue directories from <n> to hctx<n> · 88aabbd7
      Omar Sandoval 提交于
      It's not clear what these numbered directories represent unless you
      consult the code. We're about to get rid of the intermediate "mq"
      directory, so these would be even more confusing without that context.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      88aabbd7
    • O
      blk-mq-debugfs: don't open code strstrip() · 71b90511
      Omar Sandoval 提交于
      Slightly more readable, plus we also strip leading spaces.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      71b90511
    • O
      blk-mq-debugfs: error on long write to queue "state" file · c7e4145a
      Omar Sandoval 提交于
      blk_queue_flags_store() currently truncates and returns a short write if
      the operation being written is too long. This can give us weird results,
      like here:
      
      $ echo "run            bar"
      echo: write error: invalid argument
      $ dmesg
      [ 1103.075435] blk_queue_flags_store: unsupported operation bar. Use either 'run' or 'start'
      
      Instead, return an error if the user does this. While we're here, make
      the argument names consistent with everywhere else in this file.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c7e4145a
    • O
      blk-mq-debugfs: clean up flag definitions · 1a435111
      Omar Sandoval 提交于
      Make sure the spelled out flag names match the definition. This also
      adds a missing hctx state, BLK_MQ_S_START_ON_RUN, and a missing
      cmd_flag, __REQ_NOUNMAP.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1a435111
    • O
      blk-mq-debugfs: separate flags with | · bec03d6b
      Omar Sandoval 提交于
      This reads more naturally than spaces.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      bec03d6b
    • J
      nfs: Fix bdi handling for cloned superblocks · 9052c7cf
      Jan Kara 提交于
      In commit 0d3b12584972 "nfs: Convert to separately allocated bdi" I have
      wrongly cloned bdi reference in nfs_clone_super(). Further inspection
      has shown that originally the code was actually allocating a new bdi (in
      ->clone_server callback) which was later registered in
      nfs_fs_mount_common() and used for sb->s_bdi in nfs_initialise_sb().
      This could later result in bdi for the original superblock not getting
      unregistered when that superblock got shutdown (as the cloned sb still
      held bdi reference) and later when a new superblock was created under
      the same anonymous device number, a clash in sysfs has happened on bdi
      registration:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 10284 at /linux-next/fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x74
      sysfs: cannot create duplicate filename '/devices/virtual/bdi/0:32'
      Modules linked in: axp20x_usb_power gpio_axp209 nvmem_sunxi_sid sun4i_dma sun4i_ss virt_dma
      CPU: 1 PID: 10284 Comm: mount.nfs Not tainted 4.11.0-rc4+ #14
      Hardware name: Allwinner sun7i (A20) Family
      [<c010f19c>] (unwind_backtrace) from [<c010bc74>] (show_stack+0x10/0x14)
      [<c010bc74>] (show_stack) from [<c03c6e24>] (dump_stack+0x78/0x8c)
      [<c03c6e24>] (dump_stack) from [<c0122200>] (__warn+0xe8/0x100)
      [<c0122200>] (__warn) from [<c0122250>] (warn_slowpath_fmt+0x38/0x48)
      [<c0122250>] (warn_slowpath_fmt) from [<c02ac178>] (sysfs_warn_dup+0x64/0x74)
      [<c02ac178>] (sysfs_warn_dup) from [<c02ac254>] (sysfs_create_dir_ns+0x84/0x94)
      [<c02ac254>] (sysfs_create_dir_ns) from [<c03c8b8c>] (kobject_add_internal+0x9c/0x2ec)
      [<c03c8b8c>] (kobject_add_internal) from [<c03c8e24>] (kobject_add+0x48/0x98)
      [<c03c8e24>] (kobject_add) from [<c048d75c>] (device_add+0xe4/0x5a0)
      [<c048d75c>] (device_add) from [<c048ddb4>] (device_create_groups_vargs+0xac/0xbc)
      [<c048ddb4>] (device_create_groups_vargs) from [<c048dde4>] (device_create_vargs+0x20/0x28)
      [<c048dde4>] (device_create_vargs) from [<c02075c8>] (bdi_register_va+0x44/0xfc)
      [<c02075c8>] (bdi_register_va) from [<c023d378>] (super_setup_bdi_name+0x48/0xa4)
      [<c023d378>] (super_setup_bdi_name) from [<c0312ef4>] (nfs_fill_super+0x1a4/0x204)
      [<c0312ef4>] (nfs_fill_super) from [<c03133f0>] (nfs_fs_mount_common+0x140/0x1e8)
      [<c03133f0>] (nfs_fs_mount_common) from [<c03335cc>] (nfs4_remote_mount+0x50/0x58)
      [<c03335cc>] (nfs4_remote_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
      [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
      [<c025cba0>] (vfs_kern_mount) from [<c033352c>] (nfs_do_root_mount+0x80/0xa0)
      [<c033352c>] (nfs_do_root_mount) from [<c0333818>] (nfs4_try_mount+0x28/0x3c)
      [<c0333818>] (nfs4_try_mount) from [<c0313874>] (nfs_fs_mount+0x2cc/0x8c4)
      [<c0313874>] (nfs_fs_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
      [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
      [<c025cba0>] (vfs_kern_mount) from [<c02600f0>] (do_mount+0x158/0xc7c)
      [<c02600f0>] (do_mount) from [<c0260f98>] (SyS_mount+0x8c/0xb4)
      [<c0260f98>] (SyS_mount) from [<c0107840>] (ret_fast_syscall+0x0/0x3c)
      
      Fix the problem by always creating new bdi for a superblock as we used
      to do.
      Reported-and-tested-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
      Fixes: 0d3b12584972ce5781179ad3f15cca3cdb5cae05
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9052c7cf
    • P
      block/mq: Cure cpu hotplug lock inversion · eabe0659
      Peter Zijlstra 提交于
      By poking at /debug/sched_features I triggered the following splat:
      
       [] ======================================================
       [] WARNING: possible circular locking dependency detected
       [] 4.11.0-00873-g964c8b7-dirty #694 Not tainted
       [] ------------------------------------------------------
       [] bash/2109 is trying to acquire lock:
       []  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff8120cb8b>] static_key_slow_dec+0x1b/0x50
       []
       [] but task is already holding lock:
       []  (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
       []
       [] which lock already depends on the new lock.
       []
       []
       [] the existing dependency chain (in reverse order) is:
       []
       [] -> #2 (&sb->s_type->i_mutex_key#4){+++++.}:
       []        lock_acquire+0x100/0x210
       []        down_write+0x28/0x60
       []        start_creating+0x5e/0xf0
       []        debugfs_create_dir+0x13/0x110
       []        blk_mq_debugfs_register+0x21/0x70
       []        blk_mq_register_dev+0x64/0xd0
       []        blk_register_queue+0x6a/0x170
       []        device_add_disk+0x22d/0x440
       []        loop_add+0x1f3/0x280
       []        loop_init+0x104/0x142
       []        do_one_initcall+0x43/0x180
       []        kernel_init_freeable+0x1de/0x266
       []        kernel_init+0xe/0x100
       []        ret_from_fork+0x31/0x40
       []
       [] -> #1 (all_q_mutex){+.+.+.}:
       []        lock_acquire+0x100/0x210
       []        __mutex_lock+0x6c/0x960
       []        mutex_lock_nested+0x1b/0x20
       []        blk_mq_init_allocated_queue+0x37c/0x4e0
       []        blk_mq_init_queue+0x3a/0x60
       []        loop_add+0xe5/0x280
       []        loop_init+0x104/0x142
       []        do_one_initcall+0x43/0x180
       []        kernel_init_freeable+0x1de/0x266
       []        kernel_init+0xe/0x100
       []        ret_from_fork+0x31/0x40
      
       []  *** DEADLOCK ***
       []
       [] 3 locks held by bash/2109:
       []  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff81292bcd>] vfs_write+0x17d/0x1a0
       []  #1:  (debugfs_srcu){......}, at: [<ffffffff8155a90d>] full_proxy_write+0x5d/0xd0
       []  #2:  (&sb->s_type->i_mutex_key#4){+++++.}, at: [<ffffffff81140216>] sched_feat_write+0x86/0x170
       []
       [] stack backtrace:
       [] CPU: 9 PID: 2109 Comm: bash Not tainted 4.11.0-00873-g964c8b7-dirty #694
       [] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
       [] Call Trace:
      
       []  lock_acquire+0x100/0x210
       []  get_online_cpus+0x2a/0x90
       []  static_key_slow_dec+0x1b/0x50
       []  static_key_disable+0x20/0x30
       []  sched_feat_write+0x131/0x170
       []  full_proxy_write+0x97/0xd0
       []  __vfs_write+0x28/0x120
       []  vfs_write+0xb5/0x1a0
       []  SyS_write+0x49/0xa0
       []  entry_SYSCALL_64_fastpath+0x23/0xc2
      
      This is because of the cpu hotplug lock rework. Break the chain at #1
      by reversing the lock acquisition order. This way i_mutex_key#4 no
      longer depends on cpu_hotplug_lock and things are good.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      eabe0659
    • J
      lightnvm: fix bad back free on error path · 507f7d68
      Javier González 提交于
      Free memory correctly when an allocation fails on a loop and we free
      backwards previously successful allocations.
      Signed-off-by: NJavier González <javier@cnexlabs.com>
      Reviewed-by: NMatias Bjørling <matias@cnexlabs.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      507f7d68
    • J
      lightnvm: create cmd before allocating request · 2e13f33a
      Javier González 提交于
      Create nvme command before allocating a request using
      nvme_alloc_request, which uses the command direction. Up until now, the
      command has been zeroized, so all commands have been allocated as a
      read operation.
      Signed-off-by: NJavier González <javier@cnexlabs.com>
      Reviewed-by: NMatias Bjørling <matias@cnexlabs.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2e13f33a
    • J
      blk-mq: don't use sync workqueue flushing from drivers · 2719aa21
      Jens Axboe 提交于
      A previous commit introduced the sync flush, which we need from
      internal callers like blk_mq_quiesce_queue(). However, we also
      call the stop helpers from drivers, particularly from ->queue_rq()
      when we have to stop processing for a bit. We can't block from
      those locations, and we don't have to guarantee that we're
      fully flushed.
      
      Fixes: 9f993737 ("blk-mq: unify hctx delayed_run_work and run_work")
      Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2719aa21
  2. 03 5月, 2017 3 次提交
  3. 02 5月, 2017 10 次提交
    • C
      blk-mq: update ->init_request and ->exit_request prototypes · d6296d39
      Christoph Hellwig 提交于
      Remove the request_idx parameter, which can't be used safely now that we
      support I/O schedulers with blk-mq.  Except for a superflous check in
      mtip32xx it was unused anyway.
      
      Also pass the tag_set instead of just the driver data - this allows drivers
      to avoid some code duplication in a follow on cleanup.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      d6296d39
    • J
      Revert "mtip32xx: pass BLK_MQ_F_NO_SCHED" · a800ce8b
      Jens Axboe 提交于
      This reverts commit 4981d04d.
      
      The driver has been converted to using the proper infrastructure
      for issuing internal commands. This means it's now safe to use with
      the scheduling infrastruture, so we can now revert the change
      that turned off scheduling for mtip32xx.
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      a800ce8b
    • J
      blk-mq-sched: remove hack that bypasses scheduler for reserved requests · 9f2779bf
      Jens Axboe 提交于
      We have update the troublesome driver (mtip32xx) to deal with this
      appropriately. So kill the hack that bypassed scheduler allocation
      and insertion for reserved requests.
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9f2779bf
    • J
      mtip32xx: convert internal command issue to block IO path · 3f5e6a35
      Jens Axboe 提交于
      The driver special cases certain things for command issue, depending
      on whether it's an internal command or not. Make the internal commands
      use the regular infrastructure for issuing IO.
      
      Since this is an 8-group souped up AHCI variant, we have to deal
      with NCQ vs non-queueable commands. Do this from the queue_rq
      handler, by backing off unless the drive is idle.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      3f5e6a35
    • J
      mtip32xx: abstract out "are any commands active" helper · baed548a
      Jens Axboe 提交于
      This is a prep patch for backoff in ->queue_rq() for non-ncq commands.
      Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      baed548a
    • J
      mtip32xx: kill atomic argument to mtip_quiesce_io() · 8afdd94c
      Jens Axboe 提交于
      All callers now pass in GFP_KERNEL, get rid of the argument.
      Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8afdd94c
    • J
      mtip32xx: get rid of 'atomic' argument to mtip_exec_internal_command() · 0f6422a2
      Jens Axboe 提交于
      All callers can safely block. Kill the atomic/block argument, and
      remove the argument from all callers.
      Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      0f6422a2
    • B
      block: Remove elevator_change() · c0332694
      Bart Van Assche 提交于
      Since commit 84253394 ("remove the mg_disk driver") removed the
      only caller of elevator_change(), also remove the elevator_change()
      function itself.
      Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c0332694
    • L
      Merge branch 'for-4.12/post-merge' of git://git.kernel.dk/linux-block · 08c521a2
      Linus Torvalds 提交于
      Pull second round of block layer updates from Jens Axboe:
      
       - Further fixups to the NVMe APST code, from Andy.
      
       - Various fixes for (mostly) nvme-fc, from Christoph and James.
      
       - NVMe scsi fixes from Jon and Christoph.
      
      * 'for-4.12/post-merge' of git://git.kernel.dk/linux-block: (39 commits)
        nvme-scsi: remove nvme_trans_security_protocol
        nvme-lightnvm: add missing endianess conversion in nvme_nvm_end_io
        nvme-scsi: Consider LBA format in IO splitting calculation
        nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twice
        lpfc: Fix memory corruption of the lpfc_ncmd->list pointers
        nvme: Add nvme_core.force_apst to ignore the NO_APST quirk
        nvme: Display raw APST configuration via DYNAMIC_DEBUG
        nvme: Fix APST comment
        lpfc revison 11.2.0.12
        Fix Express lane queue creation.
        Update ABORT processing for NVMET.
        Fix implicit logo and RSCN handling for NVMET
        Add Fabric assigned WWN support.
        Fix max_sgl_segments settings for NVME / NVMET
        Fix crash after issuing lip reset
        Fix driver load issues when MRQ=8
        Remove hba lock from NVMET issue WQE.
        Fix nvme initiator handling when not enabled.
        Fix driver usage of 128B WQEs when WQ_CREATE is V1.
        Fix driver unload/reload operation.
        ...
      08c521a2
    • L
      Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block · 69475292
      Linus Torvalds 提交于
      Pull block layer updates from Jens Axboe:
      
       - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
         was initially a fork of CFQ, but subsequently changed to implement
         fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
         to be used on desktop type single drives, providing good fairness.
         From Paolo.
      
       - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
         using a scalable token based algorithm that throttles IO based on
         live completion IO stats, similary to blk-wbt. From Omar.
      
       - A series from Jan, moving users to separately allocated backing
         devices. This continues the work of separating backing device life
         times, solving various problems with hot removal.
      
       - A series of updates for lightnvm, mostly from Javier. Includes a
         'pblk' target that exposes an open channel SSD as a physical block
         device.
      
       - A series of fixes and improvements for nbd from Josef.
      
       - A series from Omar, removing queue sharing between devices on mostly
         legacy drivers. This helps us clean up other bits, if we know that a
         queue only has a single device backing. This has been overdue for
         more than a decade.
      
       - Fixes for the blk-stats, and improvements to unify the stats and user
         windows. This both improves blk-wbt, and enables other users to
         register a need to receive IO stats for a device. From Omar.
      
       - blk-throttle improvements from Shaohua. This provides a scalable
         framework for implementing scalable priotization - particularly for
         blk-mq, but applicable to any type of block device. The interface is
         marked experimental for now.
      
       - Bucketized IO stats for IO polling from Stephen Bates. This improves
         efficiency of polled workloads in the presence of mixed block size
         IO.
      
       - A few fixes for opal, from Scott.
      
       - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
         From a variety of folks, mostly Sagi and James Smart.
      
       - A series from Bart, improving our exposed info and capabilities from
         the blk-mq debugfs support.
      
       - A series from Christoph, cleaning up how handle WRITE_ZEROES.
      
       - A series from Christoph, cleaning up the block layer handling of how
         we track errors in a request. On top of being a nice cleanup, it also
         shrinks the size of struct request a bit.
      
       - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
         never used by platforms, and the latter has outlived it's usefulness.
      
       - Various little bug fixes and cleanups from a wide variety of folks.
      
      * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
        block: hide badblocks attribute by default
        blk-mq: unify hctx delay_work and run_work
        block: add kblock_mod_delayed_work_on()
        blk-mq: unify hctx delayed_run_work and run_work
        nbd: fix use after free on module unload
        MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
        blk-mq-sched: alloate reserved tags out of normal pool
        mtip32xx: use runtime tag to initialize command header
        scsi: Implement blk_mq_ops.show_rq()
        blk-mq: Add blk_mq_ops.show_rq()
        blk-mq: Show operation, cmd_flags and rq_flags names
        blk-mq: Make blk_flags_show() callers append a newline character
        blk-mq: Move the "state" debugfs attribute one level down
        blk-mq: Unregister debugfs attributes earlier
        blk-mq: Only unregister hctxs for which registration succeeded
        blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
        blk-mq: Let blk_mq_debugfs_register() look up the queue name
        blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
        ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
        ide-pm: always pass 0 error to __blk_end_request_all
        ..
      69475292
  4. 01 5月, 2017 3 次提交
  5. 30 4月, 2017 3 次提交
  6. 29 4月, 2017 9 次提交
    • L
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 0060e79a
      Linus Torvalds 提交于
      Pull clk fix from Stephen Boyd:
       "One odd config build fix for a recent Allwinner clock driver change
        that got merged. The common code called code in another file that
        wasn't always built. This just forces it on so people don't run into
        this bad configuration"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: sunxi-ng: always select CCU_GATE
      0060e79a
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0e911788
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "Just a couple more stragglers, I really hope this is it.
      
        1) Don't let frags slip down into the GRO segmentation handlers, from
           Steffen Klassert.
      
        2) Truesize under-estimation triggers warnings in TCP over loopback
           with socket filters, 2 part fix from Eric Dumazet.
      
        3) Fix undesirable reset of bonding MTU to ETH_HLEN on slave removal,
           from Paolo Abeni.
      
        4) If we flush the XFRM policy after garbage collection, it doesn't
           work because stray entries can be created afterwards. Fix from Xin
           Long.
      
        5) Hung socket connection fixes in TIPC from Parthasarathy Bhuvaragan.
      
        6) Fix GRO regression with IPSEC when netfilter is disabled, from
           Sabrina Dubroca.
      
        7) Fix cpsw driver Kconfig dependency regression, from Arnd Bergmann"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: hso: register netdev later to avoid a race condition
        net: adjust skb->truesize in ___pskb_trim()
        tcp: do not underestimate skb->truesize in tcp_trim_head()
        bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
        ipv4: Don't pass IP fragments to upper layer GRO handlers.
        cpsw/netcp: refine cpts dependency
        tipc: close the connection if protocol messages contain errors
        tipc: improve error validations for sockets in CONNECTING state
        tipc: Fix missing connection request handling
        xfrm: fix GRO for !CONFIG_NETFILTER
        xfrm: do the garbage collection after flushing policy
      0e911788
    • A
      net: hso: register netdev later to avoid a race condition · 4c761daf
      Andreas Kemnade 提交于
      If the netdev is accessed before the urbs are initialized,
      there will be NULL pointer dereferences. That is avoided by
      registering it when it is fully initialized.
      
      This case occurs e.g. if dhcpcd is running in the background
      and the device is probed, either after insmod hso or
      when the device appears on the usb bus.
      
      A backtrace is the following:
      
      [ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap
      [ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800
      [ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0
      [ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem
      [ 1357.574096] usb 1-2: Manufacturer: Option N.V.
      [ 1357.685882] hso 1-2:1.5: Not our interface
      [ 1460.886352] hso: unloaded
      [ 1460.889984] usbcore: deregistering interface driver hso
      [ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless
      [ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual address 00000030
      [ 1513.887664] hso 1-2:1.5: Not our interface
      [ 1513.906890] usbcore: registered new interface driver hso
      [ 1513.937988] pgd = ecdec000
      [ 1513.949890] [00000030] *pgd=acd15831, *pte=00000000, *ppte=00000000
      [ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
      [ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_accel_i2c tsc2007
      [ 1514.037384]  bmc150_magn bmc150_accel_core media leds_tca6507 bno055 industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio w2sg0004 ehci_omap omapdss [last unloaded: hso]
      [ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: G        W       4.11.0-rc8-letux+ #1
      [ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree)
      [ 1514.077758] task: ee748240 task.stack: ecdd6000
      [ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso]
      [ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso]
      [ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
      sp : ecdd7e20  ip : 00000000  fp : ffffffff
      [ 1514.105316] r10: 00000000  r9 : ed0e080c  r8 : ecd8fe2c
      [ 1514.110839] r7 : bf79cef4  r6 : ecd8fe00  r5 : 00000000  r4 : ed0dbd80
      [ 1514.117706] r3 : 00000000  r2 : c0020c80  r1 : 00000000  r0 : ecdb7800
      [ 1514.124572] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [ 1514.132110] Control: 10c5387d  Table: acdec019  DAC: 00000051
      [ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218)
      [ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000)
      [ 1514.149078] 7e20: ed0dbd80 ecd8fe98 00000001 00000000 ecd8f800 ecd8fe00 ecd8fe60 00000000
      [ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 00000001 bf7a0258 ecd8f830 c068d958
      [ 1514.166320] 7e60: c068d8b8 ecd8f800 00000001 00001091 00001090 c068dba4 ecd8f800 00001090
      [ 1514.174926] 7e80: ecd8f940 ecd8f800 00000000 c068dc60 00000000 00000001 ed0e0800 ecd8f800
      [ 1514.183563] 7ea0: 00000000 c06feaa8 c0ca39c2 beea57dc 00000020 00000000 306f7368 00000000
      [ 1514.192169] 7ec0: 00000000 00000000 00001091 00000000 00000000 00000000 00000000 00008914
      [ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 00000006 00000000 00046858 c066a948
      [ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 00000006 c0282c90 0000c000 c0283654
      [ 1514.218017] 7f20: c09b0c00 c098bc31 00000001 c0c5e513 c0c5e513 00000000 c0151354 c01a20c0
      [ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 00000000 ee748240
      [ 1514.235260] 7f60: ee748734 00000000 ecc6b0c0 ecc6b0c0 beea57dc 00008914 00000006 00000000
      [ 1514.243896] 7f80: 00046858 c02837b0 00001091 0003a1f0 00046608 0003a248 00000036 c01071e4
      [ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 00000006 00008914 beea57dc 00001091
      [ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 00000036 0003ac0c 00046608 00046610 00046858
      [ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 00000006 00000000 00000000
      [ 1514.278411] [<bf79c304>] (hso_start_net_device [hso]) from [<bf79ced8>] (hso_net_open+0x68/0x84 [hso])
      [ 1514.288238] [<bf79ced8>] (hso_net_open [hso]) from [<c068d958>] (__dev_open+0xa0/0xf4)
      [ 1514.296600] [<c068d958>] (__dev_open) from [<c068dba4>] (__dev_change_flags+0x8c/0x130)
      [ 1514.305023] [<c068dba4>] (__dev_change_flags) from [<c068dc60>] (dev_change_flags+0x18/0x48)
      [ 1514.313934] [<c068dc60>] (dev_change_flags) from [<c06feaa8>] (devinet_ioctl+0x348/0x714)
      [ 1514.322540] [<c06feaa8>] (devinet_ioctl) from [<c066a948>] (sock_ioctl+0x2b0/0x308)
      [ 1514.330627] [<c066a948>] (sock_ioctl) from [<c0282c90>] (vfs_ioctl+0x20/0x34)
      [ 1514.338165] [<c0282c90>] (vfs_ioctl) from [<c0283654>] (do_vfs_ioctl+0x82c/0x93c)
      [ 1514.346038] [<c0283654>] (do_vfs_ioctl) from [<c02837b0>] (SyS_ioctl+0x4c/0x74)
      [ 1514.353759] [<c02837b0>] (SyS_ioctl) from [<c0107040>] (ret_fast_syscall+0x0/0x1c)
      [ 1514.361755] Code: e3822103 e3822080 e1822781 e5981014 (e5832030)
      [ 1514.510833] ---[ end trace dfb3e53c657f34a0 ]---
      Reported-by: NH. Nikolaus Schaller <hns@goldelico.com>
      Signed-off-by: NAndreas Kemnade <andreas@kemnade.info>
      Reviewed-by: NJohan Hovold <johan@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c761daf
    • E
      net: adjust skb->truesize in ___pskb_trim() · c21b48cc
      Eric Dumazet 提交于
      Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
      skb_try_coalesce() using syzkaller and a filter attached to a TCP
      socket.
      
      As we did recently in commit 158f323b ("net: adjust skb->truesize in
      pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
      via a call to skb_condense().
      
      If all frags were freed, then skb->truesize can be recomputed.
      
      This call can be done if skb is not yet owned, or destructor is
      sock_edemux().
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c21b48cc
    • E
      tcp: do not underestimate skb->truesize in tcp_trim_head() · 7162fb24
      Eric Dumazet 提交于
      Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
      skb_try_coalesce() using syzkaller and a filter attached to a TCP
      socket over loopback interface.
      
      I believe one issue with looped skbs is that tcp_trim_head() can end up
      producing skb with under estimated truesize.
      
      It hardly matters for normal conditions, since packets sent over
      loopback are never truncated.
      
      Bytes trimmed from skb->head should not change skb truesize, since
      skb->head is not reallocated.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7162fb24
    • P
      bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal · 19cdead3
      Paolo Abeni 提交于
      On slave list updates, the bonding driver computes its hard_header_len
      as the maximum of all enslaved devices's hard_header_len.
      If the slave list is empty, e.g. on last enslaved device removal,
      ETH_HLEN is used.
      
      Since the bonding header_ops are set only when the first enslaved
      device is attached, the above can lead to header_ops->create()
      being called with the wrong skb headroom in place.
      
      If bond0 is configured on top of ipoib devices, with the
      following commands:
      
      ifup bond0
      for slave in $BOND_SLAVES_LIST; do
      	ip link set dev $slave nomaster
      done
      ping -c 1 <ip on bond0 subnet>
      
      we will obtain a skb_under_panic() with a similar call trace:
      	skb_push+0x3d/0x40
      	push_pseudo_header+0x17/0x30 [ib_ipoib]
      	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
      	arp_create+0x12f/0x220
      	arp_send_dst.part.19+0x28/0x50
      	arp_solicit+0x115/0x290
      	neigh_probe+0x4d/0x70
      	__neigh_event_send+0xa7/0x230
      	neigh_resolve_output+0x12e/0x1c0
      	ip_finish_output2+0x14b/0x390
      	ip_finish_output+0x136/0x1e0
      	ip_output+0x76/0xe0
      	ip_local_out+0x35/0x40
      	ip_send_skb+0x19/0x40
      	ip_push_pending_frames+0x33/0x40
      	raw_sendmsg+0x7d3/0xb50
      	inet_sendmsg+0x31/0xb0
      	sock_sendmsg+0x38/0x50
      	SYSC_sendto+0x102/0x190
      	SyS_sendto+0xe/0x10
      	do_syscall_64+0x67/0x180
      	entry_SYSCALL64_slow_path+0x25/0x25
      
      This change addresses the issue avoiding updating the bonding device
      hard_header_len when the slaves list become empty, forbidding to
      shrink it below the value used by header_ops->create().
      
      The bug is there since commit 54ef3137 ("[PATCH] bonding: Handle large
      hard_header_len") but the panic can be triggered only since
      commit fc791b63 ("IB/ipoib: move back IB LL address into the hard
      header").
      Reported-by: NNorbert P <noe@physik.uzh.ch>
      Fixes: 54ef3137 ("[PATCH] bonding: Handle large hard_header_len")
      Fixes: fc791b63 ("IB/ipoib: move back IB LL address into the hard header")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19cdead3
    • S
      ipv4: Don't pass IP fragments to upper layer GRO handlers. · 9b83e031
      Steffen Klassert 提交于
      Upper layer GRO handlers can not handle IP fragments, so
      exit GRO processing in this case.
      
      This fixes ESP GRO because the packet must be reassembled
      before we can decapsulate, otherwise we get authentication
      failures.
      
      It also aligns IPv4 to IPv6 where packets with fragmentation
      headers are not passed to upper layer GRO handlers.
      
      Fixes: 7785bba2 ("esp: Add a software GRO codepath")
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b83e031
    • A
      cpsw/netcp: refine cpts dependency · 504926df
      Arnd Bergmann 提交于
      Tony Lindgren reports a kernel oops that resulted from my compile-time
      fix on the default config. This shows two problems:
      
      a) configurations that did not already enable PTP_1588_CLOCK will
         now miss the cpts driver
      
      b) when cpts support is disabled, the driver crashes. This is a
         preexisting problem that we did not notice before my patch.
      
      While the second problem is still being investigated, this modifies
      the dependencies again, getting us back to the original state, with
      another 'select NET_PTP_CLASSIFY' added in to avoid the original
      link error we got, and the 'depends on POSIX_TIMERS' to hide
      the CPTS support when turning it on would be useless.
      
      Cc: stable@vger.kernel.org # 4.11 needs this
      Fixes: 07fef362 ("cpsw/netcp: cpts depends on posix_timers")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Tested-by: NTony Lindgren <tony@atomide.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      504926df
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 5577e679
      David S. Miller 提交于
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2017-04-28
      
      1) Do garbage collecting after a policy flush to remove old
         bundles immediately. From Xin Long.
      
      2) Fix GRO if netfilter is not defined.
         From Sabrina Dubroca.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5577e679