1. 04 May, 2017 1 commit
    • blk-mq: untangle debugfs and sysfs · 9c1051aa
      Omar Sandoval committed
      Originally, I tied debugfs registration/unregistration together with
      sysfs. There's no reason to do this, and it's getting in the way of
      letting schedulers define their own debugfs attributes. Instead, tie the
      debugfs registration to the lifetime of the structures themselves.
      
      The saner lifetimes mean we can also get rid of the extra mq directory
      and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now
      just nvme0n1/hctx0/tags.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  2. 02 May, 2017 2 commits
  3. 28 April, 2017 3 commits
  4. 27 April, 2017 3 commits
    • uapi: change the type of struct statx_timestamp.tv_nsec to unsigned · 1741937d
      Dmitry V. Levin committed
      The comment asserting that the value of struct statx_timestamp.tv_nsec
      must be negative when statx_timestamp.tv_sec is negative is wrong, as
      can be seen from the following example:
      
      	#define _FILE_OFFSET_BITS 64
      	#include <assert.h>
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      	#include <linux/stat.h>
      
      	int main(void)
      	{
      		static const struct timespec ts[2] = {
      			{ .tv_nsec = UTIME_OMIT },
      			{ .tv_sec = -2, .tv_nsec = 42 }
      		};
      		assert(utimensat(AT_FDCWD, ".", ts, 0) == 0);
      
      		struct stat st;
      		assert(stat(".", &st) == 0);
      		printf("st_mtim.tv_sec = %lld, st_mtim.tv_nsec = %lu\n",
      		       (long long) st.st_mtim.tv_sec,
      		       (unsigned long) st.st_mtim.tv_nsec);
      
      		struct statx stx;
      		assert(syscall(__NR_statx, AT_FDCWD, ".", 0, 0, &stx) == 0);
      		printf("stx_mtime.tv_sec = %lld, stx_mtime.tv_nsec = %lu\n",
      		       (long long) stx.stx_mtime.tv_sec,
      		       (unsigned long) stx.stx_mtime.tv_nsec);
      
      		return 0;
      	}
      
      As expected, it prints:
      st_mtim.tv_sec = -2, st_mtim.tv_nsec = 42
      stx_mtime.tv_sec = -2, stx_mtime.tv_nsec = 42
      
      The more generic comment asserting that the value of struct
      statx_timestamp.tv_nsec might be negative is confusing to say the least.
      
      It contradicts both the struct stat.st_[acm]time_nsec tradition and
      the struct timespec.tv_nsec requirements of the utimensat syscall.
      If the statx syscall ever returns a stx_[acm]time containing a negative
      tv_nsec that cannot be passed unmodified to the utimensat syscall,
      it will cause immense confusion.
      
      Fix this source of confusion by changing the type of struct
      statx_timestamp.tv_nsec from __s32 to __u32.
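
A minimal sketch of why an unsigned nanosecond field is unambiguous, assuming the utimensat-style convention that tv_nsec always counts forward from tv_sec. The struct and function names below are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct statx_timestamp after this change:
 * seconds are signed, nanoseconds are unsigned (0..999999999). */
struct demo_timestamp {
	int64_t  tv_sec;
	uint32_t tv_nsec;
};

/* Total nanoseconds relative to the epoch. tv_nsec always counts forward
 * from tv_sec, even when tv_sec is negative. */
static int64_t demo_total_nsec(struct demo_timestamp ts)
{
	assert(ts.tv_nsec < 1000000000u);
	return ts.tv_sec * INT64_C(1000000000) + (int64_t)ts.tv_nsec;
}
```

Under this convention the pair { -2, 42 } from the example above denotes a single instant, 42 ns after second -2, and no sign rule for tv_nsec is needed.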
      
      Fixes: a528d35e ("statx: Add a system call to make enhanced file info available")
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: linux-api@vger.kernel.org
      cc: mtk.manpages@gmail.com
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • blk-mq: Add blk_mq_ops.show_rq() · 2836ee4b
      Bart Van Assche committed
      This new callback function will be used in the next patch to show
      more information about SCSI requests.
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • net: phy: fix auto-negotiation stall due to unavailable interrupt · f555f34f
      Alexander Kochetkov committed
      The Ethernet link on an interrupt-driven PHY did not come up if the Ethernet
      cable was plugged in before the Ethernet interface was brought up.
      
      This patch triggers the PHY state machine to update the link state if the PHY
      was requested to do auto-negotiation and the auto-negotiation complete flag is
      already set.
      
      During the power-up cycle the PHY does auto-negotiation, generates an interrupt
      and sets the auto-negotiation complete flag. The interrupt is handled by the PHY
      state machine, but the link state is not updated because the PHY is in the
      PHY_READY state. Some time later the MAC is brought up and started, and it
      requests the PHY to do auto-negotiation. If there are no new settings to
      advertise, genphy_config_aneg() doesn't start PHY auto-negotiation. The PHY
      stays in the auto-negotiation complete state and doesn't fire an interrupt.
      Meanwhile, the PHY state machine expects that the PHY has started
      auto-negotiation and waits for an interrupt from the PHY that never arrives.
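
The stall and its fix can be sketched with a simplified userspace model. Everything below (the enum, the struct, the function name) is a hypothetical illustration of the logic described above, not the phylib code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of states; the real state machine has more. */
enum demo_phy_state { DEMO_PHY_READY, DEMO_PHY_AN };

struct demo_phy {
	enum demo_phy_state state;
	bool aneg_done;     /* hardware "auto-negotiation complete" flag */
	bool irq_expected;  /* will the PHY raise another interrupt? */
};

/* Models a request to start auto-negotiation. Returns true when the state
 * machine must be run right away because no further interrupt will come. */
static bool demo_start_aneg(struct demo_phy *phy, bool settings_changed)
{
	assert(phy != NULL);
	phy->state = DEMO_PHY_AN;

	if (settings_changed) {
		/* Negotiation really restarts; completion raises an interrupt. */
		phy->aneg_done = false;
		phy->irq_expected = true;
		return false;
	}

	/* Nothing new to advertise: the hardware keeps its current status.
	 * If negotiation already completed, no interrupt will ever fire,
	 * so the caller has to trigger the state machine itself. */
	phy->irq_expected = !phy->aneg_done;
	return phy->aneg_done;
}
```

In this model, the buggy behaviour corresponds to ignoring the return value: the state machine sits in the negotiating state waiting for an interrupt that the hardware has already delivered.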
      
      Fixes: 321beec5 ("net: phy: Use interrupts when available in NOLINK state")
      Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
      Cc: stable <stable@vger.kernel.org> # v4.9+
      Tested-by: Roger Quadros <rogerq@ti.com>
      Tested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 22 April, 2017 2 commits
    • block: get rid of blk_integrity_revalidate() · 19b7ccf8
      Ilya Dryomov committed
      Commit 25520d55 ("block: Inline blk_integrity in struct gendisk")
      introduced blk_integrity_revalidate(), which seems to assume ownership
      of the stable pages flag and unilaterally clears it if no blk_integrity
      profile is registered:
      
          if (bi->profile)
                  disk->queue->backing_dev_info->capabilities |=
                          BDI_CAP_STABLE_WRITES;
          else
                  disk->queue->backing_dev_info->capabilities &=
                          ~BDI_CAP_STABLE_WRITES;
      
      It's called from revalidate_disk() and rescan_partitions(), making it
      impossible to enable stable pages for drivers that support partitions
      and don't use blk_integrity: while the call in revalidate_disk() can be
      trivially worked around (see zram, which doesn't support partitions and
      hence gets away with zram_revalidate_disk()), rescan_partitions() can
      be triggered from userspace at any time.  This breaks rbd, where the
      ceph messenger is responsible for generating/verifying CRCs.
      
      Since blk_integrity_{un,}register() "must" be used for (un)registering
      the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
      setting there.  This way drivers that call blk_integrity_register() and
      use integrity infrastructure won't interfere with drivers that don't
      but still want stable pages.
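
The ownership change can be sketched in a small userspace model. The struct, the flag value, and all function names here are hypothetical stand-ins chosen for illustration; only the idea (the flag is touched in register/unregister, never in rescan) comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BDI_CAP_STABLE_WRITES. */
#define DEMO_CAP_STABLE_WRITES 0x1u

struct demo_disk {
	unsigned int capabilities;
	const void *profile;  /* non-NULL while an integrity profile is registered */
};

/* Registering a profile is what turns stable writes on. */
static void demo_integrity_register(struct demo_disk *d, const void *profile)
{
	assert(d != NULL && profile != NULL);
	d->profile = profile;
	d->capabilities |= DEMO_CAP_STABLE_WRITES;
}

/* Unregistering the profile turns them off again. */
static void demo_integrity_unregister(struct demo_disk *d)
{
	assert(d != NULL);
	d->capabilities &= ~DEMO_CAP_STABLE_WRITES;
	d->profile = NULL;
}

/* A rescan no longer touches the flag, so a driver that set it on its
 * own (without any integrity profile) keeps it. */
static void demo_rescan_partitions(struct demo_disk *d)
{
	assert(d != NULL);
}
```

A driver in the position of rbd would set the flag directly and expect it to survive any number of rescans triggered from userspace.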
      
      Fixes: 25520d55 ("block: Inline blk_integrity in struct gendisk")
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.4+, needs backporting
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • net: ipv6: RTF_PCPU should not be settable from userspace · 557c44be
      David Ahern committed
      Andrey reported a fault in the IPv6 route code:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff880069809600 task.stack: ffff880062dc8000
      RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
      RSP: 0018:ffff880062dced30 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
      RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
      RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
      FS:  00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
      Call Trace:
       ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
       ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
      ...
      
      Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
      set. Flags passed to the kernel are blindly copied to the allocated
      rt6_info by ip6_route_info_create making a newly inserted route appear
      as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
      and expects rt->dst.from to be set - which it is not since it is not
      really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
      generates the fault.
      
      Fix by checking for the flag and failing with EINVAL.
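
The shape of such a check can be sketched as follows. The struct and function names are hypothetical, and the flag value is an assumption (it is the value I recall from the uapi header, so verify it before relying on it); this is not the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of RTF_PCPU; defined locally so the sketch is self-contained. */
#define DEMO_RTF_PCPU 0x40000000u

/* Hypothetical, reduced version of the userspace route request. */
struct demo_rtmsg {
	uint32_t rtmsg_flags;
};

/* Reject kernel-internal flags before they are copied into a new route. */
static int demo_route_add(const struct demo_rtmsg *msg)
{
	assert(msg != NULL);
	if (msg->rtmsg_flags & DEMO_RTF_PCPU)
		return -EINVAL;
	return 0;  /* route creation would continue here */
}
```

Rejecting the request outright, rather than silently masking the bit, makes the misuse visible to the caller.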
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 21 April, 2017 19 commits
  7. 20 April, 2017 4 commits
  8. 19 April, 2017 2 commits
    • block, bfq: add full hierarchical scheduling and cgroups support · e21b7a0b
      Arianna Avanzini committed
      Add complete support for full hierarchical scheduling, with a cgroups
      interface. Full hierarchical scheduling is implemented through the
      'entity' abstraction: both bfq_queues, i.e., the internal BFQ queues
      associated with processes, and groups are represented in general by
      entities. Given the bfq_queues associated with the processes belonging
      to a given group, the entities representing these queues are children of
      the entity representing the group. At higher levels, if a group, say
      G, contains other groups, then the entity representing G is the parent
      entity of the entities representing the groups in G.
      
      Hierarchical scheduling is performed as follows: if the timestamps of
      a leaf entity (i.e., of a bfq_queue) change, and such a change lets
      the entity become the next-to-serve entity for its parent entity, then
      the timestamps of the parent entity are recomputed as a function of
      the budget of its new next-to-serve leaf entity. If the parent entity
      belongs, in its turn, to a group, and its new timestamps let it become
      the next-to-serve for its parent entity, then the timestamps of the
      latter parent entity are recomputed as well, and so on. When a new
      bfq_queue must be set in service, the reverse path is followed: the
      next-to-serve highest-level entity is chosen, then its next-to-serve
      child entity, and so on, until the next-to-serve leaf entity is
      reached, and the bfq_queue that this entity represents is set in
      service.
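
The top-down selection described above can be sketched as a walk along next-to-serve links. The struct layout and names are hypothetical, and the sketch assumes every group entity on the path already has a next-to-serve child; timestamp recomputation on the way up is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entity: either a group (has a next-to-serve child) or a
 * leaf representing a bfq_queue (next_in_service is NULL). */
struct demo_entity {
	struct demo_entity *parent;
	struct demo_entity *next_in_service;
	int id;
};

/* Start at the highest-level entity and follow next-to-serve links
 * until a leaf is reached; that leaf's queue is set in service. */
static struct demo_entity *demo_pick_leaf(struct demo_entity *root)
{
	struct demo_entity *e = root;

	assert(root != NULL);
	while (e->next_in_service != NULL)
		e = e->next_in_service;
	return e;
}
```

The bottom-up half of the scheme would update next_in_service on each ancestor whenever a child's timestamps change, so that this walk never has to compare entities itself.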
      
      Writeback is accounted for on a per-group basis, i.e., for each group,
      the async I/O requests of the processes of the group are enqueued in a
      distinct bfq_queue, and the entity associated with this queue is a
      child of the entity associated with the group.
      
      Weights can be assigned explicitly to groups and processes through the
      cgroups interface, unlike what happens for single processes when the
      cgroups interface is not used (as explained in the description of the
      previous patch). In particular, since each node has a full scheduler,
      each group can be assigned its own weight.
      Signed-off-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • mmc: sdio: fix alignment issue in struct sdio_func · 5ef1ecf0
      Heiner Kallweit committed
      Certain 64-bit systems (e.g. Amlogic Meson GX) require buffers to be
      used for DMA to be 8-byte-aligned. struct sdio_func has an embedded
      small DMA buffer not meeting this requirement.
      As a result, SDIO breaks when testing the switch to descriptor chain
      mode in the meson-gx driver. Fix this by allocating the small DMA buffer
      separately, as kmalloc ensures that the returned memory area is
      properly aligned for every basic data type.
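
The alignment difference can be illustrated in userspace, with malloc standing in for kmalloc. Both struct layouts and all names below are hypothetical and much smaller than the real struct sdio_func; they only show why an embedded byte array carries no alignment guarantee while a separately allocated buffer does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "before" layout: the scratch buffer is embedded, so its
 * address depends on whatever fields precede it. */
struct demo_func_embedded {
	uint8_t num;
	uint8_t tmpbuf[4];
};

/* Hypothetical "after" layout: the buffer is allocated on its own. */
struct demo_func_separate {
	uint8_t num;
	uint8_t *tmpbuf;
};

static int demo_is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p % align) == 0;
}

/* Allocate the scratch buffer separately; returns 0 on success. */
static int demo_func_init(struct demo_func_separate *f)
{
	assert(f != NULL);
	f->tmpbuf = malloc(4);
	return f->tmpbuf != NULL ? 0 : -1;
}
```

In the embedded layout the buffer starts at offset 1, so it cannot be 8-byte aligned whenever the struct itself is; the separately allocated buffer inherits the allocator's alignment instead.
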
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Tested-by: Helmut Klein <hgkr.klein@gmail.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
  9. 17 April, 2017 4 commits