- 24 11月, 2017 2 次提交
-
-
由 Bjorn Helgaas 提交于
There are no in-tree callers of ht_create_irq(), the driver interface for HyperTransport interrupts, left. Remove the unused entry point and all the supporting code. See 8b955b0d ("[PATCH] Initial generic hypertransport interrupt support"). Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-pci@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
-
由 Willem de Bruijn 提交于
Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b~1..d9d30adf. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 93991221 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6 ("net: avoid skb_warn_bad_offload false positives on UFO"). (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn. Tested Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device: host: nc -l -p -u 8000 & tcpdump -n -i tap0 guest: dd if=/dev/zero of=payload.txt bs=1 count=2000 nc -u 192.16.1.1 8000 < payload.txt Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests) Changes v1 -> v2 - simplified set_offload change (review comment) - documented test procedure Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: NMichal Kubecek <mkubecek@suse.cz> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 11月, 2017 2 次提交
-
-
由 Alexei Starovoitov 提交于
when the verifier detects that register contains a runtime constant and it's compared with another constant it will prune exploration of the branch that is guaranteed not to be taken at runtime. This is all correct, but malicious program may be constructed in such a way that it always has a constant comparison and the other branch is never taken under any conditions. In this case such path through the program will not be explored by the verifier. It won't be taken at run-time either, but since all instructions are JITed the malicious program may cause JITs to complain about using reserved fields, etc. To fix the issue we have to track the instructions explored by the verifier and sanitize instructions that are dead at run time with NOPs. We cannot reject such dead code, since llvm generates it for valid C code, since it doesn't do as much data flow analysis as the verifier does. Fixes: 17a52670 ("bpf: verifier (add verifier core)") Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Gianluca Borello 提交于
With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO and the verifier can prove the value of this next argument is 0. However, most helpers are just interested in handling <!NULL, 0>, so forcing them to deal with <NULL, 0> makes the implementation of those helpers more complicated for no apparent benefits, requiring them to explicitly handle those corner cases with checks that bpf programs could start relying upon, preventing the possibility of removing them later. Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case. Currently, the only helper that needs this is bpf_csum_diff_proto(), so change arg1 and arg3 to this new type as well. Also add a new battery of tests that explicitly test the !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the various <NULL, 0> variations are focused on bpf_csum_diff, so cover also other helpers. Signed-off-by: NGianluca Borello <g.borello@gmail.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 22 11月, 2017 9 次提交
-
-
由 Kees Cook 提交于
With all callbacks converted, and the timer callback prototype switched over, the TIMER_FUNC_TYPE cast is no longer needed, so remove it. Conversion was done with the following scripts: perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \ $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u) perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \ $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u) The now unused macros are also dropped from include/linux/timer.h. Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
With __init_timer*() now matching __setup_timer*(), remove the redundant internal interface, clean up the resulting definitions and add more documentation. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shaohua Li <shli@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
In preparation for removing more macros, pass the function down to the initialization routines instead of doing it in macros. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
With the .data field removed, the ignored data arguments in timer macros can be removed. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohua Li <shli@fb.com> Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
Since all callbacks have been converted, we can switch the core prototype to "struct timer_list *" now too. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
Now that all timer callbacks are already taking their struct timer_list pointer as the callback argument, just do this unconditionally and remove the .data field. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
With all callers converted to timer_setup(), the old setup_*timer() interface can be removed. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
All users of init_timer() have been updated. Remove the ancient interface. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Kees Cook 提交于
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Jens Axboe <axboe@kernel.dk> Cc: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: linux-block@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 21 11月, 2017 5 次提交
-
-
由 Dan Carpenter 提交于
It doesn't cause a run-time bug, but these bitfields should be unsigned. When it's signed ->dl_throttled is set to either 0 or -1, instead of 0 and 1 as expected. The sched.h file is included into tons of places so Sparse generates a flood of warnings like this: ./include/linux/sched.h:477:54: error: dubious one-bit signed bitfield Reported-by: NMatthew Wilcox <willy@infradead.org> Reported-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NLuca Abeni <luca.abeni@santannapisa.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Cc: luca abeni <luca.abeni@santannapisa.it> Link: http://lkml.kernel.org/r/20171013070121.dzcncojuj2f4utij@mwandaSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jakub Kicinski 提交于
Header implementation of bpf_prog_offload_verifier_prep() which is used if CONFIG_NET=n should be a static inline. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
This reverts commit bd601b6a ("bpf: report offload info to user space"). The ifindex by itself is not sufficient, we should provide information on which network namespace this ifindex belongs to. After considering some options we concluded that it's best to just remove this API for now, and rework it in -next. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
bpf_prog_get_type() is identical to bpf_prog_get_type_dev(), with false passed as attach_drv. Instead of keeping it as an exported symbol turn it into static inline wrapper. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jakub Kicinski 提交于
With TC shared block changes we can't depend on correct netdev pointer being available in cls_bpf. Move the device validation to the driver. Core will only make sure that offloaded programs are always attached in the driver (or in HW by the driver). We trust that drivers which implement offload callbacks will perform necessary checks. Moving the checks to the driver is generally a useful thing, in practice the check should be against a switchdev instance, not a netdev, given that most ASICs will probably allow using the same program on many ports. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 19 11月, 2017 7 次提交
-
-
由 Logan Gunthorpe 提交于
Add a skeleton NTB driver which will be filled out in subsequent patches. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: NAllen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
由 Logan Gunthorpe 提交于
Seeing the Switchtec NTB hardware shares the same endpoint as the management endpoint we utilize the class_interface API to register an NTB driver for every Switchtec device in the system that has the NTB class code. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: NAllen Hubbe <Allen.Hubbe@dell.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
由 Logan Gunthorpe 提交于
Adds a comment and a check to ntb_mw_get_align() so that it always fails if the function is called before the link is up. Also adds a comment to ntb_mw_count() to note that it may return 0 if it is called before the link is up. This is to prevent accidental mis-use in clients that are testing on hardware that this doesn't matter for. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Acked-by: NAllen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
由 Logan Gunthorpe 提交于
In order for the Switchtec NTB code to handle link change events we create a notifier callback in the switchtec code which gets called whenever an appropriate event interrupt occurs. In order to preserve userspace's ability to follow these events, we compare the event count with a stored copy from last time we checked. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
由 Logan Gunthorpe 提交于
There are two additional regions: ctrl and dbmsg. The first is for generic NTB control and memory windows. The second is for doorbells and message registers. This patch also adds a number of related constants for using these registers. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
由 Logan Gunthorpe 提交于
We export the class pointer symbol and add an extern define in the Switchtec header file. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
由 Logan Gunthorpe 提交于
Create the switchtec.h header in include/linux with hardware defines and the switchtec_dev structure. Both moved directly from switchtec.c. This is a prep patch for creating an NTB driver for Switchtec. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
- 18 11月, 2017 15 次提交
-
-
由 Davidlohr Bueso 提交于
For a custom microbenchmark on a 3.30GHz Xeon SandyBridge, which calls IPC_STAT over and over, it was calculated that, on avg the cost of ipc_get_maxid() for increasing amounts of keys was: 10 keys: ~900 cycles 100 keys: ~15000 cycles 1000 keys: ~150000 cycles 10000 keys: ~2100000 cycles This is unsurprising as maxid is currently O(n). By having the max_id available in O(1) we save all those cycles for each semctl(_STAT) command, the idr_find can be expensive -- which some real (customer) workloads actually poll on. Note that this used to be the case, until commit 7ca7e564 ("ipc: store ipcs into IDRs"). The cost is the extra idr_find when doing RMIDs, but we simply go backwards, and should not take too many iterations to find the new value. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170831172049.14576-5-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Patch series "sysvipc: ipc-key management improvements". Here are a few improvements I spotted while eyeballing Guillaume's rhashtable implementation for ipc keys. The first and fourth patches are the interesting ones, the middle two are trivial. This patch (of 4): The next_id object-allocation functionality was introduced in commit 03f59566 ("ipc: add sysctl to specify desired next object id"). Given that these new entries are _only_ exported under the CONFIG_CHECKPOINT_RESTORE option, there is no point for the common case to even know about ->next_id. As such rewrite ipc_buildid() such that it can do away with the field as well as unnecessary branches when adding a new identifier. The end result also better differentiates both cases, so the code ends up being cleaner; albeit the small duplications regarding the default case. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170831172049.14576-2-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Smirnov 提交于
Add devm_* wrapper around register_reboot_notifier to simplify device specific reboot notifier registration/unregistration. [akpm@linux-foundation.org: move `struct device' forward decl to top-of-file] Link: http://lkml.kernel.org/r/20170320171753.1705-1-andrew.smirnov@gmail.comSigned-off-by: NAndrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Victor Chibotaru 提交于
Enables kcov to collect comparison operands from instrumented code. This is done by using Clang's -fsanitize=trace-cmp instrumentation (currently not available for GCC). The comparison operands help a lot in fuzz testing. E.g. they are used in Syzkaller to cover the interiors of conditional statements with way less attempts and thus make previously unreachable code reachable. To allow separate collection of coverage and comparison operands two different work modes are implemented. Mode selection is now done via a KCOV_ENABLE ioctl call with corresponding argument value. Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.comSigned-off-by: NVictor Chibotaru <tchibo@google.com> Signed-off-by: NAlexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: <syzkaller@googlegroups.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Borislav Petkov 提交于
This is the gist of a patch which we've been forward-porting in our kernels for a long time now and it probably would make a good sense to have such TAINT_AUX flag upstream which can be used by each distro etc, how they see fit. This way, we won't need to forward-port a distro-only version indefinitely. Add an auxiliary taint flag to be used by distros and others. This obviates the need to forward-port whatever internal solutions people have in favor of a single flag which they can map arbitrarily to a definition of their pleasing. The "X" mnemonic could also mean eXternal, which would be taint from a distro or something else but not the upstream kernel. We will use it to mark modules for which we don't provide support. I.e., a really eXternal module. Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gargi Sharma 提交于
pidhash is no longer required as all the information can be looked up from idr tree. nr_hashed represented the number of pids that had been hashed. Since, nr_hashed and PIDNS_HASH_ADDING are no longer relevant, it has been renamed to pid_allocated and PIDNS_ADDING respectively. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.comSigned-off-by: NGargi Sharma <gs051095@gmail.com> Reviewed-by: NRik van Riel <riel@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> [ia64] Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gargi Sharma 提交于
Patch series "Replacing PID bitmap implementation with IDR API", v4. This series replaces kernel bitmap implementation of PID allocation with IDR API. These patches are written to simplify the kernel by replacing custom code with calls to generic code. The following are the stats for pid and pid_namespace object files before and after the replacement. There is a noteworthy change between the IDR and bitmap implementation. Before text data bss dec hex filename 8447 3894 64 12405 3075 kernel/pid.o After text data bss dec hex filename 3397 304 0 3701 e75 kernel/pid.o Before text data bss dec hex filename 5692 1842 192 7726 1e2e kernel/pid_namespace.o After text data bss dec hex filename 2854 216 16 3086 c0e kernel/pid_namespace.o The following are the stats for ps, pstree and calling readdir on /proc for 10,000 processes. ps: With IDR API With bitmap real 0m1.479s 0m2.319s user 0m0.070s 0m0.060s sys 0m0.289s 0m0.516s pstree: With IDR API With bitmap real 0m1.024s 0m1.794s user 0m0.348s 0m0.612s sys 0m0.184s 0m0.264s proc: With IDR API With bitmap real 0m0.059s 0m0.074s user 0m0.000s 0m0.004s sys 0m0.016s 0m0.016s This patch (of 2): Replace the current bitmap implementation for Process ID allocation. Functions that are no longer required, for example, free_pidmap(), alloc_pidmap(), etc. are removed. The rest of the functions are modified to use the IDR API. The change was made to make the PID allocation less complex by replacing custom code with calls to generic API. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com [avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl] Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.comSigned-off-by: NGargi Sharma <gs051095@gmail.com> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Lawrence 提交于
pipe_max_size is assigned directly via procfs sysctl: static struct ctl_table fs_table[] = { ... { .procname = "pipe-max-size", .data = &pipe_max_size, .maxlen = sizeof(int), .mode = 0644, .proc_handler = &pipe_proc_fn, .extra1 = &pipe_min_size, }, ... int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf, size_t *lenp, loff_t *ppos) { ... ret = proc_dointvec_minmax(table, write, buf, lenp, ppos) ... and then later rounded in-place a few statements later: ... pipe_max_size = round_pipe_size(pipe_max_size); ... This leaves a window of time between initial assignment and rounding that may be visible to other threads. (For example, one thread sets a non-rounded value to pipe_max_size while another reads its value.) Similar reads of pipe_max_size are potentially racy: pipe.c :: alloc_pipe_info() pipe.c :: pipe_set_size() Add a new proc_dopipe_max_size() that consolidates reading the new value from the user buffer, verifying bounds, and calling round_pipe_size() with a single assignment to pipe_max_size. Link: http://lkml.kernel.org/r/1507658689-11669-4-git-send-email-joe.lawrence@redhat.comSigned-off-by: NJoe Lawrence <joe.lawrence@redhat.com> Reported-by: NMikulas Patocka <mpatocka@redhat.com> Reviewed-by: NMikulas Patocka <mpatocka@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Bates 提交于
If the amount of resources allocated to a gen_pool exceeds 2^32 then the avail atomic overflows and this causes problems when clients try and borrow resources from the pool. This is only expected to be an issue on 64 bit systems. Add the <linux/atomic.h> header to pull in atomic_long* operations. So that 32 bit systems continue to use atomic32_t but 64 bit systems can use atomic64_t. Link: http://lkml.kernel.org/r/1509033843-25667-1-git-send-email-sbates@raithlin.comSigned-off-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: NDaniel Mentz <danielmentz@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masahiro Yamada 提交于
This include was added by commit 187f1882 ("BUG: headers with BUG/BUG_ON etc. need linux/bug.h") because BUG_ON() was used in this header at that time. Some time later, commit 6d75f366 ("lib: radix-tree: check accounting of existing slot replacement users") removed the use of BUG_ON() from this header. Since then, there is no reason to include <linux/bug.h>. Link: http://lkml.kernel.org/r/1505660151-4383-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Mi <chrism@mellanox.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masahiro Yamada 提交于
Since commit bc6245e5 ("bug: split BUILD_BUG stuff out into <linux/build_bug.h>"), #include <linux/build_bug.h> is better to pull minimal headers needed for BUILG_BUG() family. Link: http://lkml.kernel.org/r/1505700775-19826-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com> Cc: Dinan Gunawardena <dinan.gunawardena@netronome.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sandipan Das 提交于
The GCC randomize layout plugin can randomize the member offsets of sensitive kernel data structures. To use this feature, certain annotations and members are added to the structures which affect the member offsets even if this plugin is not used. All of these structures are completely randomized, except for task_struct which leaves out some of its members. All the other members are wrapped within an anonymous struct with the __randomize_layout attribute. This is done using the randomized_struct_fields_start and randomized_struct_fields_end defines. When the plugin is disabled, the behaviour of this attribute can vary based on the GCC version. For GCC 5.1+, this attribute maps to __designated_init otherwise it is just an empty define but the anonymous structure is still present. For other compilers, both randomized_struct_fields_start and randomized_struct_fields_end default to empty defines meaning the anonymous structure is not introduced at all. So, if a module compiled with Clang, such as a BPF program, needs to access task_struct fields such as pid and comm, the offsets of these members as recognized by Clang are different from those recognized by modules compiled with GCC. If GCC 4.6+ is used to build the kernel, this can be solved by introducing appropriate defines for Clang so that the anonymous structure is seen when determining the offsets for the members. Link: http://lkml.kernel.org/r/20171109064645.25581-1-sandipan@linux.vnet.ibm.comSigned-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
When we pass the result of a multiplication as the timeout or the delay, we can get a warning from gcc-7: drivers/mmc/host/bcm2835.c:596:149: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/mfd/arizona-core.c:247:195: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c:49:27: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] The warning is a bit questionable inside of a macro, but this is intentional on the side of the gcc developers. It is also an indication of another problem: we evaluate the timeout and sleep arguments multiple times, which can have undesired side-effects when those are complex expressions. This changes the two iopoll variants to use local variables for storing copies of the timeouts. This adds some more type safety, and avoids both the double-evaluation and the gcc warning. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81484 Link: http://lkml.kernel.org/r/20170726133756.2161367-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20171102114048.1526955-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NMark Brown <broonie@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
Some architectures store the WARN_ONCE state in the flags field of the bug_entry. Clear that one too when resetting once state through /sys/kernel/debug/clear_warn_once Pointed out by Michael Ellerman Improves the earlier patch that add clear_warn_once. [ak@linux.intel.com: add a missing ifdef CONFIG_MODULES] Link: http://lkml.kernel.org/r/20171020170633.9593-1-andi@firstfloor.org [akpm@linux-foundation.org: fix unused var warning] [akpm@linux-foundation.org: Use 0200 for clear_warn_once file, per mpe] [akpm@linux-foundation.org: clear BUGFLAG_DONE in clear_once_table(), per mpe] Link: http://lkml.kernel.org/r/20171019204642.7404-1-andi@firstfloor.orgSigned-off-by: NAndi Kleen <ak@linux.intel.com> Tested-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
It is pointless to migrate hugetlb memory as part of memory compaction if the hugetlb size is equal to the pageblock order. No defragmentation is occurring in this condition. It is also pointless to for the freeing scanner to scan a pageblock where a hugetlb page is pinned. Unconditionally skip these pageblocks, and do so peristently so that they are not rescanned until it is observed that these hugepages are no longer pinned. It would also be possible to do this by involving the hugetlb subsystem in marking pageblocks to no longer be skipped when they hugetlb pages are freed. This is a simple solution that doesn't involve any additional subsystems in pageblock skip manipulation. [rientjes@google.com: fix build] Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708201734390.117182@chino.kir.corp.google.com Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708151639130.106658@chino.kir.corp.google.comSigned-off-by: NDavid Rientjes <rientjes@google.com> Tested-by: NMichal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-