- 19 1月, 2019 8 次提交
-
-
由 Toke Høiland-Jørgensen 提交于
This adds TX airtime statistics to the cfg80211 station dump (to go along with the RX info already present), and adds a new parameter to set the airtime weight of each station. The latter allows userspace to implement policies for different stations by varying their weights. Signed-off-by: NToke Høiland-Jørgensen <toke@toke.dk> [rmanohar@codeaurora.org: fixed checkpatch warnings] Signed-off-by: NRajkumar Manoharan <rmanohar@codeaurora.org> [move airtime weight != 0 check into policy] Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Eran Ben Elisha 提交于
Add devlink health dump commands, in order to run an dump operation over a specific reporter. The supported operations are dump_get in order to get last saved dump (if not exist, dump now) and dump_clear to clear last saved dump. It is expected from driver's callback for diagnose command to fill it via the buffer descriptors API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Reviewed-by: NMoshe Shemesh <moshe@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eran Ben Elisha 提交于
Add devlink health diagnose command, in order to run a diagnose operation over a specific reporter. It is expected from driver's callback for diagnose command to fill it via the buffer descriptors API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Reviewed-by: NMoshe Shemesh <moshe@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eran Ben Elisha 提交于
Add devlink health recover command to the uapi, in order to allow the user to execute a recover operation over a specific reporter. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Reviewed-by: NMoshe Shemesh <moshe@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eran Ben Elisha 提交于
Add devlink health set command, in order to set configuration parameters for a specific reporter. Supported parameters are: - graceful_period: Time interval between auto recoveries (in msec) - auto_recover: Determines if the devlink shall execute recover upon receiving error for the reporter Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Reviewed-by: NMoshe Shemesh <moshe@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eran Ben Elisha 提交于
Add devlink health get command to provide reporter/s data for user space. Add the ability to get data per reporter or dump data from all available reporters. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Reviewed-by: NMoshe Shemesh <moshe@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eran Ben Elisha 提交于
Devlink health buffer is a mechanism to pass descriptors between drivers and devlink. The API allows the driver to add objects, object pair, value array (nested attributes), value and name. Driver can use this API to fill the buffers in a format which can be translated by the devlink to the netlink message. In order to fulfill it, an internal buffer descriptor is defined. This will hold the data and metadata per each attribute and by used to pass actual commands to the netlink. This mechanism will be later used in devlink health for dump and diagnose data store by the drivers. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Reviewed-by: NMoshe Shemesh <moshe@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cong Wang 提交于
Although matchall always matches packets, however, it still relies on a protocol match first. So it is still useful to have such a counter for matchall. Of course, unlike u32, every time we hit a matchall filter, it is always a success, so we don't have to distinguish them. Sample output: filter protocol 802.1Q pref 100 matchall chain 0 filter protocol 802.1Q pref 100 matchall chain 0 handle 0x1 not_in_hw (rule hit 10) action order 1: vlan pop continue index 1 ref 1 bind 1 installed 40 sec used 1 sec Action statistics: Sent 836 bytes 10 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Reported-by: NMartin Olsson <martin.olsson+netdev@sentorsecurity.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 1月, 2019 1 次提交
-
-
由 David Herrmann 提交于
This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. User-space often refers to network-interfaces via their index, but has to temporarily resolve it to a name for a call into SO_BINDTODEVICE. This might pose problems when the network-device is renamed asynchronously by other parts of the system. When this happens, the SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong device. In most cases user-space only ever operates on devices which they either manage themselves, or otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries this guarantee is non-obvious and it would be nice if that race-condition would simply not exist. It would make it easier for those libraries to operate even in situations where the device-name might change under the hood. A real use-case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network-interfaces during the transition from initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network-namespaces). However, the network manager now has to make sure its management engine for dormant devices will not run in parallel to these renames. Particularly, when you offload operations like DHCP into separate processes, these might setup their sockets early, and thus have to resolve the device-name possibly running into this race-condition. By avoiding a call to resolve the device-name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this race. Reviewed-by: NTom Gundersen <teg@jklm.no> Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 1月, 2019 1 次提交
-
-
由 Eugene Syromiatnikov 提交于
The ioctl command is read/write (or just read, if the fact that user space writes n_samples field is ignored). Signed-off-by: NEugene Syromiatnikov <esyr@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 1月, 2019 1 次提交
-
-
由 David Abdurachmanov 提交于
On RISC-V (riscv) audit is supported through generic lib/audit.c. The patch adds required arch specific definitions. Signed-off-by: NDavid Abdurachmanov <david.abdurachmanov@gmail.com> Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
-
- 06 1月, 2019 2 次提交
-
-
由 Eric Biggers 提交于
Add support for the Adiantum encryption mode to fscrypt. Adiantum is a tweakable, length-preserving encryption mode with security provably reducible to that of XChaCha12 and AES-256, subject to a security bound. It's also a true wide-block mode, unlike XTS. See the paper "Adiantum: length-preserving encryption for entry-level processors" (https://eprint.iacr.org/2018/720.pdf) for more details. Also see commit 059c2a4d ("crypto: adiantum - add Adiantum support"). On sufficiently long messages, Adiantum's bottlenecks are XChaCha12 and the NH hash function. These algorithms are fast even on processors without dedicated crypto instructions. Adiantum makes it feasible to enable storage encryption on low-end mobile devices that lack AES instructions; currently such devices are unencrypted. On ARM Cortex-A7, on 4096-byte messages Adiantum encryption is about 4 times faster than AES-256-XTS encryption; decryption is about 5 times faster. In fscrypt, Adiantum is suitable for encrypting both file contents and names. With filenames, it fixes a known weakness: when two filenames in a directory share a common prefix of >= 16 bytes, with CTS-CBC their encrypted filenames share a common prefix too, leaking information. Adiantum does not have this problem. Since Adiantum also accepts long tweaks (IVs), it's also safe to use the master key directly for Adiantum encryption rather than deriving per-file keys, provided that the per-file nonce is included in the IVs and the master key isn't used for any other encryption mode. This configuration saves memory and improves performance. A new fscrypt policy flag is added to allow users to opt-in to this configuration. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Masahiro Yamada 提交于
These comments are leftovers of commit fcc8487d ("uapi: export all headers under uapi directories"). Prior to that commit, exported headers must be explicitly added to header-y. Now, all headers under the uapi/ directories are exported. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
-
- 05 1月, 2019 6 次提交
-
-
由 Feng Tang 提交于
So that we can also runtime chose to print out the needed system info for panic, other than setting the kernel cmdline. Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.comSigned-off-by: NFeng Tang <feng.tang@intel.com> Suggested-by: NSteven Rostedt <rostedt@goodmis.org> Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tigran Aivazian 提交于
Strengthen validation of BFS superblock against corruption. Make in-core inode bitmap static part of superblock info structure. Print a warning when mounting a BFS filesystem created with "-N 512" option as only 510 files can be created in the root directory. Make the kernel messages more uniform. Update the 'prefix' passed to bfs_dump_imap() to match the current naming of operations. White space and comments cleanup. Link: http://lkml.kernel.org/r/CAK+_RLkFZMduoQF36wZFd3zLi-6ZutWKsydjeHFNdtRvZZEb4w@mail.gmail.comSigned-off-by: NTigran Aivazian <aivazian.tigran@gmail.com> Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Carmeli Tamir 提交于
MAX_FAT is useless in msdos_fs.h, since it uses the MSDOS_SB function that is defined in fat.h. So really, this macro can be only called from code that already includes fat.h. Hence, this patch moves it to fat.h, right after MSDOS_SB is defined. I also changed it to an inline function in order to save the double call to MSDOS_SB. This was suggested by joe@perches.com in the previous version. This patch is required for the next in the series, in which the variant (whether this is FAT12, FAT16 or FAT32) checks are replaced with new macros. Link: http://lkml.kernel.org/r/1544990640-11604-3-git-send-email-carmeli.tamir@gmail.comSigned-off-by: NCarmeli Tamir <carmeli.tamir@gmail.com> Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Carmeli Tamir 提交于
The comment edited in this patch was the only reference to the FAT_FIRST_ENT macro, which is not used anymore. Moreover, the commented line of code does not compile with the current code. Since the FAT_FIRST_ENT macro checks the FAT variant in a way that the patch series changes, I removed it, and instead wrote a clear explanation of what was checked. I verified that the changed comment is correct according to Microsoft FAT spec, search for "BPB_Media" in the following references: 1. Microsoft FAT specification 2005 (http://read.pudn.com/downloads77/ebook/294884/FAT32%20Spec%20%28SDA%20Contribution%29.pdf). Search for 'volume label'. 2. Microsoft Extensible Firmware Initiative, FAT32 File System Specification (https://staff.washington.edu/dittrich/misc/fatgen103.pdf). Search for 'volume label'. Link: http://lkml.kernel.org/r/1544990640-11604-2-git-send-email-carmeli.tamir@gmail.comSigned-off-by: NCarmeli Tamir <carmeli.tamir@gmail.com> Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Carmeli Tamir 提交于
The FAT file system volume label file stored in the root directory should match the volume label field in the FAT boot sector. As consequence, the max length of these fields ought to be the same. This patch replaces the magic '11' usef in the struct fat_boot_sector with MSDOS_NAME, which is used in struct msdos_dir_entry. Please check the following references: 1. Microsoft FAT specification 2005 (http://read.pudn.com/downloads77/ebook/294884/FAT32%20Spec%20%28SDA%20Contribution%29.pdf). Search for 'volume label'. 2. Microsoft Extensible Firmware Initiative, FAT32 File System Specification (https://staff.washington.edu/dittrich/misc/fatgen103.pdf). Search for 'volume label'. 3. User space code that creates FAT filesystem sometimes uses MSDOS_NAME for the label, sometimes not. Search for 'if (memcmp(label, NO_NAME, MSDOS_NAME))'. I consider to make the same patch there as well. https://github.com/dosfstools/dosfstools/blob/master/src/mkfs.fat.c Link: http://lkml.kernel.org/r/1543096879-82837-1-git-send-email-carmeli.tamir@gmail.comSigned-off-by: NCarmeli Tamir <carmeli.tamir@gmail.com> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ian Kent 提交于
Commit 092a5345 ("autofs: take more care to not update last_used on path walk") helped to (partially) resolve a problem where automounts were not expiring due to aggressive accesses from user space. This patch was later reverted because, for very large environments, it meant more mount requests from clients and when there are a lot of clients this caused a fairly significant increase in server load. But there is a need for both types of expire check, depending on use case, so add a mount option to allow for strict update of last use of autofs dentrys (which just means not updating the last use on path walk access). Link: http://lkml.kernel.org/r/154296973880.9889.14085372741514507967.stgit@pluto-themaw-netSigned-off-by: NIan Kent <raven@themaw.net> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 1月, 2019 1 次提交
-
-
由 Manivannan Sadhasivam 提交于
Add UART driver for RDA Micro RDA8810PL SoC. Signed-off-by: NAndreas Färber <afaerber@suse.de> Signed-off-by: NManivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NOlof Johansson <olof@lixom.net>
-
- 30 12月, 2018 2 次提交
-
-
由 Dmitry V. Levin 提交于
syscall_get_arch() is required to be implemented on all architectures in order to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. Cc: Guo Ren <guoren@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Elvira Khabirova <lineprinter@altlinux.org> Cc: Eugene Syromyatnikov <esyr@redhat.com> Cc: linux-audit@redhat.com Signed-off-by: NDmitry V. Levin <ldv@altlinux.org> Signed-off-by: NGuo Ren <guoren@kernel.org> arch/csky/include/asm/syscall.h | 7 +++++++ include/uapi/linux/audit.h | 1 + 2 files changed, 8 insertions(+)
-
由 Dmitry V. Levin 提交于
The uapi/linux/audit.h header is going to use EM_CSKY in order to define AUDIT_ARCH_CSKY which is needed to implement syscall_get_arch() which in turn is required to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. The value for EM_CSKY has been taken from arch/csky/include/asm/elf.h and confirmed by binutils:include/elf/common.h Cc: Guo Ren <guoren@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Elvira Khabirova <lineprinter@altlinux.org> Cc: Eugene Syromyatnikov <esyr@redhat.com> Signed-off-by: NDmitry V. Levin <ldv@altlinux.org> Signed-off-by: NGuo Ren <guoren@kernel.org>
-
- 21 12月, 2018 5 次提交
-
-
由 Alexey Kardashevskiy 提交于
POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not pluggable PCIe devices but still have PCIe links which are used for config space and MMIO. In addition to that the GPUs have 6 NVLinks which are connected to other GPUs and the POWER9 CPU. POWER9 chips have a special unit on a die called an NPU which is an NVLink2 host bus adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each. These systems also support ATS (address translation services) which is a part of the NVLink2 protocol. Such GPUs also share on-board RAM (16GB or 32GB) to the system via the same NVLink2 so a CPU has cache-coherent access to a GPU RAM. This exports GPU RAM to the userspace as a new VFIO device region. This preregisters the new memory as device memory as it might be used for DMA. This inserts pfns from the fault handler as the GPU memory is not onlined until the vendor driver is loaded and trained the NVLinks so doing this earlier causes low level errors which we fence in the firmware so it does not hurt the host system but still better be avoided; for the same reason this does not map GPU RAM into the host kernel (usual thing for emulated access otherwise). This exports an ATSD (Address Translation Shootdown) register of NPU which allows TLB invalidations inside GPU for an operating system. The register conveniently occupies a single 64k page. It is also presented to the userspace as a new VFIO device region. One NPU has 8 ATSD registers, each of them can be used for TLB invalidation in a GPU linked to this NPU. This allocates one ATSD register per an NVLink bridge allowing passing up to 6 registers. Due to the host firmware bug (just recently fixed), only 1 ATSD register per NPU was actually advertised to the host system so this passes that alone register via the first NVLink bridge device in the group which is still enough as QEMU collects them all back and presents to the guest via vPHB to mimic the emulated NPU PHB on the host. In order to provide the userspace with the information about GPU-to-NVLink connections, this exports an additional capability called "tgt" (which is an abbreviated host system bus address). The "tgt" property tells the GPU its own system address and allows the guest driver to conglomerate the routing information so each GPU knows how to get directly to the other GPUs. For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to know LPID (a logical partition ID or a KVM guest hardware ID in other words) and PID (a memory context ID of a userspace process, not to be confused with a linux pid). This assigns a GPU to LPID in the NPU and this is why this adds a listener for KVM on an IOMMU group. A PID comes via NVLink from a GPU and NPU uses a PID wildcard to pass it through. This requires coherent memory and ATSD to be available on the host as the GPU vendor only supports configurations with both features enabled and other configurations are known not to work. Because of this and because of the ways the features are advertised to the host system (which is a device tree with very platform specific properties), this requires enabled POWERNV platform. The V100 GPUs do not advertise any of these capabilities via the config space and there are more than just one device ID so this relies on the platform to tell whether these GPUs have special abilities such as NVLinks. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Guralnik 提交于
Add a method for query port under the uverbs global methods. Current ib_port_attr struct is passed as a single attribute and port_cap_flags2 is added as a new attribute to the function. Signed-off-by: NMichael Guralnik <michaelgur@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Michael Guralnik 提交于
port_cap_flags2 represents IBTA PortInfo:CapabilityMask2. The field safely extends the RDMA_NLDEV_ATTR_CAP_FLAGS operand as it was exported as 64 bit to allow this kind of extension. Signed-off-by: NMichael Guralnik <michaelgur@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Rob Clark 提交于
BACKLIGHT_CLASS_DEVICE is already tristate, but a dependency FB_BACKLIGHT prevents it from being built as a module. There doesn't seem to be any particularly good reason for this, so switch FB_BACKLIGHT over to tristate. Signed-off-by: NRob Clark <robdclark@gmail.com> Tested-by: NArnd Bergmann <arnd@arndb.de> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ulf Magnusson <ulfalizer@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Hans de Goede <j.w.r.degoede@gmail.com> Signed-off-by: NBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
-
由 David Howells 提交于
Only the mount namespace code that implements mount(2) should be using the MS_* flags. Suppress them inside the kernel unless uapi/linux/mount.h is included. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Reviewed-by: NDavid Howells <dhowells@redhat.com>
-
- 20 12月, 2018 3 次提交
-
-
由 wenxu 提交于
ip l add dev tun type gretap external ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap For gretap Key example when the command set the id but don't set the TUNNEL_KEY flags. There is no key field in the send packet In the lwtunnel situation, some TUNNEL_FLAGS should can be set by userspace Signed-off-by: Nwenxu <wenxu@ucloud.cn> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Bonzini 提交于
vhost structs are shared by vhost-kernel and vhost-user. Split them into a separate file to ease copying them into programs that implement either the server or the client side of vhost-user. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Changpeng Liu 提交于
In commit 88c85538, "virtio-blk: add discard and write zeroes features to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio block specification has been extended to add VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for discard and write zeroes in the virtio-blk driver when the device advertises the corresponding features, VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES. Signed-off-by: NChangpeng Liu <changpeng.liu@intel.com> Signed-off-by: NDaniel Verkamp <dverkamp@chromium.org> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
-
- 19 12月, 2018 7 次提交
-
-
由 Christian Brauner 提交于
As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the implementation of binderfs. /* Abstract */ binderfs is a backwards-compatible filesystem for Android's binder ipc mechanism. Each ipc namespace will mount a new binderfs instance. Mounting binderfs multiple times at different locations in the same ipc namespace will not cause a new super block to be allocated and hence it will be the same filesystem instance. Each new binderfs mount will have its own set of binder devices only visible in the ipc namespace it has been mounted in. All devices in a new binderfs mount will follow the scheme binder%d and numbering will always start at 0. /* Backwards compatibility */ Devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES for the initial ipc namespace will work as before. They will be registered via misc_register() and appear in the devtmpfs mount. Specifically, the standard devices binder, hwbinder, and vndbinder will all appear in their standard locations in /dev. Mounting or unmounting the binderfs mount in the initial ipc namespace will have no effect on these devices, i.e. they will neither show up in the binderfs mount nor will they disappear when the binderfs mount is gone. /* binder-control */ Each new binderfs instance comes with a binder-control device. No other devices will be present at first. The binder-control device can be used to dynamically allocate binder devices. All requests operate on the binderfs mount the binder-control device resides in. Assuming a new instance of binderfs has been mounted at /dev/binderfs via mount -t binderfs binderfs /dev/binderfs. Then a request to create a new binder device can be made as illustrated in [2]. Binderfs devices can simply be removed via unlink(). /* Implementation details */ - dynamic major number allocation: When binderfs is registered as a new filesystem it will dynamically allocate a new major number. The allocated major number will be returned in struct binderfs_device when a new binder device is allocated. - global minor number tracking: Minor are tracked in a global idr struct that is capped at BINDERFS_MAX_MINOR. The minor number tracker is protected by a global mutex. This is the only point of contention between binderfs mounts. - struct binderfs_info: Each binderfs super block has its own struct binderfs_info that tracks specific details about a binderfs instance: - ipc namespace - dentry of the binder-control device - root uid and root gid of the user namespace the binderfs instance was mounted in - mountable by user namespace root: binderfs can be mounted by user namespace root in a non-initial user namespace. The devices will be owned by user namespace root. - binderfs binder devices without misc infrastructure: New binder devices associated with a binderfs mount do not use the full misc_register() infrastructure. The misc_register() infrastructure can only create new devices in the host's devtmpfs mount. binderfs does however only make devices appear under its own mountpoint and thus allocates new character device nodes from the inode of the root dentry of the super block. This will have the side-effect that binderfs specific device nodes do not appear in sysfs. This behavior is similar to devpts allocated pts devices and has no effect on the functionality of the ipc mechanism itself. [1]: https://goo.gl/JL2tfX [2]: program to allocate a new binderfs binder device: #define _GNU_SOURCE #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <unistd.h> #include <linux/android/binder_ctl.h> int main(int argc, char *argv[]) { int fd, ret, saved_errno; size_t len; struct binderfs_device device = { 0 }; if (argc < 2) exit(EXIT_FAILURE); len = strlen(argv[1]); if (len > BINDERFS_MAX_NAME) exit(EXIT_FAILURE); memcpy(device.name, argv[1], len); fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC); if (fd < 0) { printf("%s - Failed to open binder-control device\n", strerror(errno)); exit(EXIT_FAILURE); } ret = ioctl(fd, BINDER_CTL_ADD, &device); saved_errno = errno; close(fd); errno = saved_errno; if (ret < 0) { printf("%s - Failed to allocate new binder device\n", strerror(errno)); exit(EXIT_FAILURE); } printf("Allocated new binder device with major %d, minor %d, and " "name %s\n", device.major, device.minor, device.name); exit(EXIT_SUCCESS); } Cc: Martijn Coenen <maco@android.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com> Acked-by: NTodd Kjos <tkjos@google.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Davide Caratti 提交于
Herton reports the following error when building a userspace program that includes net_stamp.h: In file included from foo.c:2: /usr/include/linux/net_tstamp.h:158:2: error: unknown type name ‘clockid_t’ clockid_t clockid; /* reference clockid */ ^~~~~~~~~ Fix it by using __kernel_clockid_t in place of clockid_t. Fixes: 80b14dee ("net: Add a new socket option for a future transmit time.") Cc: Timothy Redaelli <tredaelli@redhat.com> Reported-by: NHerton R. Krzesinski <herton@redhat.com> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Tested-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
This adds metadata to sk_msg_md for BPF programs to read the sk_msg size. When the SK_MSG program is running under an application that is using sendfile the data is not copied into sk_msg buffers by default. Rather the BPF program uses sk_msg_pull_data to read the bytes in. This avoids doing the costly memcopy instructions when they are not in fact needed. However, if we don't know the size of the sk_msg we have to guess if needed bytes are available by doing a pull request which may fail. By including the size of the sk_msg BPF programs can check the size before issuing sk_msg_pull_data requests. Additionally, the same applies for sendmsg calls when the application provides multiple iovs. Here the BPF program needs to pull in data to update data pointers but its not clear where the data ends without a size parameter. In many cases "guessing" is not easy to do and results in multiple calls to pull and without bounded loops everything gets fairly tricky. Clean this up by including a u32 size field. Note, all writes into sk_msg_md are rejected already from sk_msg_is_valid_access so nothing additional is needed there. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Moni Shoua 提交于
Add new ioctl method for the MR object - ADVISE_MR. This command can be used by users to give an advice or directions to the kernel about an address range that belongs to memory regions. A new ib_device callback, advise_mr(), is introduced here to suupport the new command. This command takes the following arguments: - pd: The protection domain to which all memory regions belong - advice: The type of the advice * IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH - Pre-fetch a range of an on-demand paging MR * IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE - Pre-fetch a range of an on-demand paging MR with write intention - flags: The properties of the advice * IB_UVERBS_ADVISE_MR_FLAG_FLUSH - Operation must end before return to the caller - sg_list: The list of memory ranges - num_sge: The number of memory ranges in the list - attrs: More attributes to be parsed by the provider Signed-off-by: NMoni Shoua <monis@mellanox.com> Reviewed-by: NGuy Levi <guyle@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Add an ioctl method to destroy the PD, MR, MW, AH, flow, RWQ indirection table and XRCD objects by handle which doesn't require any output response during destruction. Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Introduce a helper function gather_objects_handle() to copy object handles under a spin lock. Expose these objects handles via the uverbs ioctl interface. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Now that the handlers do not process their own udata we can make a sensible ioctl that wrappers them. The ioctl follows the same format as the write_ex() and has the user explicitly specify the core and driver in/out opaque structures and a command number. This works for all forms of write commands. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 18 12月, 2018 3 次提交
-
-
由 Sriram R 提交于
Currently radar detection and corresponding channel switch is handled at the AP device. STA ignores these detected radar events since the radar signal can be seen mostly by the AP as well. But in scenarios where a radar signal is seen only at STA, notifying this event to the AP which can trigger a channel switch can be useful. Stations can report such radar events autonomously through Spectrum management (Measurement Report) action frame to its AP. The userspace on processing the report can notify the kernel with the use of the added NL80211_CMD_NOTIFY_RADAR to indicate the detected event and inturn adding the reported channel to NOL. Signed-off-by: NSriram R <srirrama@codeaurora.org> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Johannes Berg 提交于
The older code and current userspace assumed that this data is the content of the Measurement Report element, starting with the Measurement Token. Clarify this in the documentation. Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Yonghong Song 提交于
This patch fixed two issues with BTF. One is related to struct/union bitfield encoding and the other is related to forward type. Issue #1 and solution: ====================== Current btf encoding of bitfield follows what pahole generates. For each bitfield, pahole will duplicate the type chain and put the bitfield size at the final int or enum type. Since the BTF enum type cannot encode bit size, pahole workarounds the issue by generating an int type whenever the enum bit size is not 32. For example, -bash-4.4$ cat t.c typedef int ___int; enum A { A1, A2, A3 }; struct t { int a[5]; ___int b:4; volatile enum A c:4; } g; -bash-4.4$ gcc -c -O2 -g t.c The current kernel supports the following BTF encoding: $ pahole -JV t.o [1] TYPEDEF ___int type_id=2 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED [3] ENUM A size=4 vlen=3 A1 val=0 A2 val=1 A3 val=2 [4] STRUCT t size=24 vlen=3 a type_id=5 bits_offset=0 b type_id=9 bits_offset=160 c type_id=11 bits_offset=164 [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5 [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none) [7] VOLATILE (anon) type_id=3 [8] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none) [9] TYPEDEF ___int type_id=8 [10] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED [11] VOLATILE (anon) type_id=10 Two issues are in the above: . by changing enum type to int, we lost the original type information and this will not be ideal later when we try to convert BTF to a header file. . the type duplication for bitfields will cause BTF bloat. Duplicated types cannot be deduplicated later if the bitfield size is different. To fix this issue, this patch implemented a compatible change for BTF struct type encoding: . the bit 31 of struct_type->info, previously reserved, now is used to indicate whether bitfield_size is encoded in btf_member or not. . if bit 31 of struct_type->info is set, btf_member->offset will encode like: bit 0 - 23: bit offset bit 24 - 31: bitfield size if bit 31 is not set, the old behavior is preserved: bit 0 - 31: bit offset So if the struct contains a bit field, the maximum bit offset will be reduced to (2^24 - 1) instead of MAX_UINT. The maximum bitfield size will be 256 which is enough for today as maximum bitfield in compiler can be 128 where int128 type is supported. This kernel patch intends to support the new BTF encoding: $ pahole -JV t.o [1] TYPEDEF ___int type_id=2 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED [3] ENUM A size=4 vlen=3 A1 val=0 A2 val=1 A3 val=2 [4] STRUCT t kind_flag=1 size=24 vlen=3 a type_id=5 bitfield_size=0 bits_offset=0 b type_id=1 bitfield_size=4 bits_offset=160 c type_id=7 bitfield_size=4 bits_offset=164 [5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5 [6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none) [7] VOLATILE (anon) type_id=3 Issue #2 and solution: ====================== Current forward type in BTF does not specify whether the original type is struct or union. This will not work for type pretty print and BTF-to-header-file conversion as struct/union must be specified. $ cat tt.c struct t; union u; int foo(struct t *t, union u *u) { return 0; } $ gcc -c -g -O2 tt.c $ pahole -JV tt.o [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED [2] FWD t type_id=0 [3] PTR (anon) type_id=2 [4] FWD u type_id=0 [5] PTR (anon) type_id=4 To fix this issue, similar to issue #1, type->info bit 31 is used. If the bit is set, it is union type. Otherwise, it is a struct type. $ pahole -JV tt.o [1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED [2] FWD t kind_flag=0 type_id=0 [3] PTR (anon) kind_flag=0 type_id=2 [4] FWD u kind_flag=1 type_id=0 [5] PTR (anon) kind_flag=0 type_id=4 Pahole/LLVM change: =================== The new kind_flag functionality has been implemented in pahole and llvm: https://github.com/yonghong-song/pahole/tree/bitfield https://github.com/yonghong-song/llvm/tree/bitfield Note that pahole hasn't implemented func/func_proto kind and .BTF.ext. So to print function signature with bpftool, the llvm compiler should be used. Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)") Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-