1. 16 4月, 2019 5 次提交
    • G
      net: kernel hookers service for toa module · aff365b6
      George Zhang 提交于
      LVS fullnat will replace network traffic's source ip with its local ip,
      and thus the backend servers cannot obtain the real client ip.
      
      To solve this, LVS has introduced the tcp option address (TOA) to store
      the essential ip address information in the last tcp ack packet of the
      3-way handshake, and the backend servers need to retrieve it from the
      packet header.
      
      In this patch, we have introduced the sk_toa_data member in the sock
      structure to hold the TOA information. There used to be an in-tree
      module for TOA managing, whereas it has now been maintained as an
      standalone module.
      
      In this case, the toa module should register its hook function(s) using
      the provided interfaces in the hookers module.
      
      TOA in sock structure:
      
      	__be32 sk_toa_data[16];
      
      The hookers module only provides the sk_toa_data placeholder, and the
      toa module can use this variable through the layout it needs.
      
      Hook interfaces:
      
      The hookers module replaces the kernel's syn_recv_sock and getname
      handler with a stub that chains the toa module's hook function(s) to the
      original handling function. The hookers module allows hook functions to
      be installed and uninstalled in any order.
      
      toa module:
      
      The external toa module will be provided in separate RPM package.
      
      [xuyu@linux.alibaba.com: amend commit log]
      Signed-off-by: NGeorge Zhang <georgezhang@linux.alibaba.com>
      Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: NCaspar Zhang <caspar@linux.alibaba.com>
      aff365b6
    • C
      virtio_blk: add discard and write zeroes support · 068b6c62
      Changpeng Liu 提交于
      commit 1f23816b8eb8fdc39990abe166c10a18c16f6b21 upstream.
      
      In commit 88c85538, "virtio-blk: add discard and write zeroes features
      to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
      block specification has been extended to add VIRTIO_BLK_T_DISCARD and
      VIRTIO_BLK_T_WRITE_ZEROES commands.  This patch enables support for
      discard and write zeroes in the virtio-blk driver when the device
      advertises the corresponding features, VIRTIO_BLK_F_DISCARD and
      VIRTIO_BLK_F_WRITE_ZEROES.
      Signed-off-by: NChangpeng Liu <changpeng.liu@intel.com>
      Signed-off-by: NDaniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      068b6c62
    • E
      ext4: adjust reserved cluster count when removing extents · e640e0c0
      Eric Whitney 提交于
      commit 9fe671496b6c286f9033aedfc1718d67721da0ae upstream.
      
      Modify ext4_ext_remove_space() and the code it calls to correct the
      reserved cluster count for pending reservations (delayed allocated
      clusters shared with allocated blocks) when a block range is removed
      from the extent tree.  Pending reservations may be found for the clusters
      at the ends of written or unwritten extents when a block range is removed.
      If a physical cluster at the end of an extent is freed, it's necessary
      to increment the reserved cluster count to maintain correct accounting
      if the corresponding logical cluster is shared with at least one
      delayed and unwritten extent as found in the extents status tree.
      
      Add a new function, ext4_rereserve_cluster(), to reapply a reservation
      on a delayed allocated cluster sharing blocks with a freed allocated
      cluster.  To avoid ENOSPC on reservation, a flag is applied to
      ext4_free_blocks() to briefly defer updating the freeclusters counter
      when an allocated cluster is freed.  This prevents another thread
      from allocating the freed block before the reservation can be reapplied.
      
      Redefine the partial cluster object as a struct to carry more state
      information and to clarify the code using it.
      
      Adjust the conditional code structure in ext4_ext_remove_space to
      reduce the indentation level in the main body of the code to improve
      readability.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      e640e0c0
    • E
      ext4: fix reserved cluster accounting at delayed write time · fc9a1389
      Eric Whitney 提交于
      commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream.
      
      The code in ext4_da_map_blocks sometimes reserves space for more
      delayed allocated clusters than it should, resulting in premature
      ENOSPC, exceeded quota, and inaccurate free space reporting.
      
      Fix this by checking for written and unwritten blocks shared in the
      same cluster with the newly delayed allocated block.  A cluster
      reservation should not be made for a cluster for which physical space
      has already been allocated.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      fc9a1389
    • E
      ext4: generalize extents status tree search functions · a700b393
      Eric Whitney 提交于
      commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 upstream.
      
      Ext4 contains a few functions that are used to search for delayed
      extents or blocks in the extents status tree.  Rather than duplicate
      code to add new functions to search for extents with different status
      values, such as written or a combination of delayed and unwritten,
      generalize the existing code to search for caller-specified extents
      status values.  Also, move this code into extents_status.c where it
      is better associated with the data structures it operates upon, and
      where it can be more readily used to implement new extents status tree
      functions that might want a broader scope for i_es_lock.
      
      Three missing static specifiers in RFC version of patch reported and
      fixed by Fengguang Wu <fengguang.wu@intel.com>.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      a700b393
  2. 06 4月, 2019 12 次提交
    • F
      netfilter: physdev: relax br_netfilter dependency · 2e6bcc32
      Florian Westphal 提交于
      [ Upstream commit 8e2f311a68494a6677c1724bdcb10bada21af37c ]
      
      Following command:
        iptables -D FORWARD -m physdev ...
      causes connectivity loss in some setups.
      
      Reason is that iptables userspace will probe kernel for the module revision
      of the physdev patch, and physdev has an artificial dependency on
      br_netfilter (xt_physdev use makes no sense unless a br_netfilter module
      is loaded).
      
      This causes the "phydev" module to be loaded, which in turn enables the
      "call-iptables" infrastructure.
      
      bridged packets might then get dropped by the iptables ruleset.
      
      The better fix would be to change the "call-iptables" defaults to 0 and
      enforce explicit setting to 1, but that breaks backwards compatibility.
      
      This does the next best thing: add a request_module call to checkentry.
      This was a stray '-D ... -m physdev' won't activate br_netfilter
      anymore.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2e6bcc32
    • O
      cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting · d0bc74c5
      Oleg Nesterov 提交于
      [ Upstream commit 51bee5abeab2058ea5813c5615d6197a23dbf041 ]
      
      The only user of cgroup_subsys->free() callback is pids_cgrp_subsys which
      needs pids_free() to uncharge the pid.
      
      However, ->free() is called from __put_task_struct()->cgroup_free() and this
      is too late. Even the trivial program which does
      
      	for (;;) {
      		int pid = fork();
      		assert(pid >= 0);
      		if (pid)
      			wait(NULL);
      		else
      			exit(0);
      	}
      
      can run out of limits because release_task()->call_rcu(delayed_put_task_struct)
      implies an RCU gp after the task/pid goes away and before the final put().
      
      Test-case:
      
      	mkdir -p /tmp/CG
      	mount -t cgroup2 none /tmp/CG
      	echo '+pids' > /tmp/CG/cgroup.subtree_control
      
      	mkdir /tmp/CG/PID
      	echo 2 > /tmp/CG/PID/pids.max
      
      	perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' &
      	echo $! > /tmp/CG/PID/cgroup.procs
      
      Without this patch the forking process fails soon after migration.
      
      Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite
      into the new helper, cgroup_release(), called by release_task() which actually
      frees the pid(s).
      Reported-by: NHerton R. Krzesinski <hkrzesin@redhat.com>
      Reported-by: NJan Stancek <jstancek@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d0bc74c5
    • V
      bpf: fix missing prototype warnings · ae92cf47
      Valdis Kletnieks 提交于
      [ Upstream commit 116bfa96a255123ed209da6544f74a4f2eaca5da ]
      
      Compiling with W=1 generates warnings:
      
        CC      kernel/bpf/core.o
      kernel/bpf/core.c:721:12: warning: no previous prototype for ?bpf_jit_alloc_exec_limit? [-Wmissing-prototypes]
        721 | u64 __weak bpf_jit_alloc_exec_limit(void)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~
      kernel/bpf/core.c:757:14: warning: no previous prototype for ?bpf_jit_alloc_exec? [-Wmissing-prototypes]
        757 | void *__weak bpf_jit_alloc_exec(unsigned long size)
            |              ^~~~~~~~~~~~~~~~~~
      kernel/bpf/core.c:762:13: warning: no previous prototype for ?bpf_jit_free_exec? [-Wmissing-prototypes]
        762 | void __weak bpf_jit_free_exec(void *addr)
            |             ^~~~~~~~~~~~~~~~~
      
      All three are weak functions that archs can override, provide
      proper prototypes for when a new arch provides their own.
      Signed-off-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ae92cf47
    • A
      sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock() · e8e0bd49
      Andrea Parri 提交于
      [ Upstream commit c546951d9c9300065bad253ecdf1ac59ce9d06c8 ]
      
      move_queued_task() synchronizes with task_rq_lock() as follows:
      
      	move_queued_task()		task_rq_lock()
      
      	[S] ->on_rq = MIGRATING		[L] rq = task_rq()
      	WMB (__set_task_cpu())		ACQUIRE (rq->lock);
      	[S] ->cpu = new_cpu		[L] ->on_rq
      
      where "[L] rq = task_rq()" is ordered before "ACQUIRE (rq->lock)" by an
      address dependency and, in turn, "ACQUIRE (rq->lock)" is ordered before
      "[L] ->on_rq" by the ACQUIRE itself.
      
      Use READ_ONCE() to load ->cpu in task_rq() (c.f., task_cpu()) to honor
      this address dependency.  Also, mark the accesses to ->cpu and ->on_rq
      with READ_ONCE()/WRITE_ONCE() to comply with the LKMM.
      Signed-off-by: NAndrea Parri <andrea.parri@amarulasolutions.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/r/20190121155240.27173-1-andrea.parri@amarulasolutions.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e8e0bd49
    • M
      perf/aux: Make perf_event accessible to setup_aux() · efd85d83
      Mathieu Poirier 提交于
      [ Upstream commit 840018668ce2d96783356204ff282d6c9b0e5f66 ]
      
      When pmu::setup_aux() is called the coresight PMU needs to know which
      sink to use for the session by looking up the information in the
      event's attr::config2 field.
      
      As such simply replace the cpu information by the complete perf_event
      structure and change all affected customers.
      Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
      Reviewed-by: NSuzuki Poulouse <suzuki.poulose@arm.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-s390@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190131184714.20388-2-mathieu.poirier@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      efd85d83
    • T
      genirq: Avoid summation loops for /proc/stat · 1f369486
      Thomas Gleixner 提交于
      [ Upstream commit 1136b0728969901a091f0471968b2b76ed14d9ad ]
      
      Waiman reported that on large systems with a large amount of interrupts the
      readout of /proc/stat takes a long time to sum up the interrupt
      statistics. In principle this is not a problem. but for unknown reasons
      some enterprise quality software reads /proc/stat with a high frequency.
      
      The reason for this is that interrupt statistics are accounted per cpu. So
      the /proc/stat logic has to sum up the interrupt stats for each interrupt.
      
      This can be largely avoided for interrupts which are not marked as
      'PER_CPU' interrupts by simply adding a per interrupt summation counter
      which is incremented along with the per interrupt per cpu counter.
      
      The PER_CPU interrupts need to avoid that and use only per cpu accounting
      because they share the interrupt number and the interrupt descriptor and
      concurrent updates would conflict or require unwanted synchronization.
      Reported-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NWaiman Long <longman@redhat.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NDavidlohr Bueso <dbueso@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lkml.kernel.org/r/20190208135020.925487496@linutronix.de
      
      8<-------------
      
      v2: Undo the unintentional layout change of struct irq_desc.
      
       include/linux/irqdesc.h |    1 +
       kernel/irq/chip.c       |   12 ++++++++++--
       kernel/irq/internals.h  |    8 +++++++-
       kernel/irq/irqdesc.c    |    7 ++++++-
       4 files changed, 24 insertions(+), 4 deletions(-)
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1f369486
    • L
      sched/topology: Fix percpu data types in struct sd_data & struct s_data · 845d4849
      Luc Van Oostenryck 提交于
      [ Upstream commit 99687cdbb3f6c8e32bcc7f37496e811f30460e48 ]
      
      The percpu members of struct sd_data and s_data are declared as:
      
      	struct ... ** __percpu member;
      
      So their type is:
      
      	__percpu pointer to pointer to struct ...
      
      But looking at how they're used, their type should be:
      
      	pointer to __percpu pointer to struct ...
      
      and they should thus be declared as:
      
      	struct ... * __percpu *member;
      
      So fix the placement of '__percpu' in the definition of these
      structures.
      
      This addresses a bunch of Sparse's warnings like:
      
      	warning: incorrect type in initializer (different address spaces)
      	  expected void const [noderef] <asn:3> *__vpp_verify
      	  got struct sched_domain **
      Signed-off-by: NLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190118144936.79158-1-luc.vanoostenryck@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      845d4849
    • S
      scsi: fcoe: make use of fip_mode enum complete · 1ef1b20f
      Sedat Dilek 提交于
      [ Upstream commit 8beb90aaf334a6efa3e924339926b5f93a234dbb ]
      
      commit 1917d42d ("fcoe: use enum for fip_mode") introduces a separate
      enum for the fip_mode that shall be used during initialisation handling
      until it is passed to fcoe_ctrl_link_up to set the initial fip_state.  That
      change was incomplete and gcc quietly converted in various places between
      the fip_mode and the fip_state enum values with implicit enum conversions,
      which fortunately cannot cause any issues in the actual code's execution.
      
      clang however warns about these implicit enum conversions in the scsi
      drivers. This commit consolidates the use of the two enums, guided by
      clang's enum-conversion warnings.
      
      This commit now completes the use of the fip_mode: It expects and uses
      fip_mode in {bnx2fc,fcoe}_interface_create and fcoe_ctlr_init, and it calls
      fcoe_ctrl_set_set() with the correct values in fcoe_ctlr_link_up().  It
      also breaks the association between FIP_MODE_AUTO and FIP_ST_AUTO to
      indicate these two enums are distinct.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/151
      Fixes: 1917d42d ("fcoe: use enum for fip_mode")
      Reported-by: NDmitry Golovin <dima@golovin.in>
      Original-by: NLukas Bulwahn <lukas.bulwahn@gmail.com>
      CC: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      CC: Nick Desaulniers <ndesaulniers@google.com>
      CC: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
      Tested-by: NNathan Chancellor <natechancellor@gmail.com>
      Suggested-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1ef1b20f
    • K
      clk: fractional-divider: check parent rate only if flag is set · 763a895a
      Katsuhiro Suzuki 提交于
      [ Upstream commit d13501a2bedfbea0983cc868d3f1dc692627f60d ]
      
      Custom approximation of fractional-divider may not need parent clock
      rate checking. For example Rockchip SoCs work fine using grand parent
      clock rate even if target rate is greater than parent.
      
      This patch checks parent clock rate only if CLK_SET_RATE_PARENT flag
      is set.
      
      For detailed example, clock tree of Rockchip I2S audio hardware.
        - Clock rate of CPLL is 1.2GHz, GPLL is 491.52MHz.
        - i2s1_div is integer divider can divide N (N is 1~128).
          Input clock is CPLL or GPLL. Initial divider value is N = 1.
          Ex) PLL = CPLL, N = 10, i2s1_div output rate is
            CPLL / 10 = 1.2GHz / 10 = 120MHz
        - i2s1_frac is fractional divider can divide input to x/y, x and
          y are 16bit integer.
      
      CPLL --> | selector | ---> i2s1_div -+--> | selector | --> I2S1 MCLK
      GPLL --> |          | ,--------------'    |          |
                            `--> i2s1_frac ---> |          |
      
      Clock mux system try to choose suitable one from i2s1_div and
      i2s1_frac for master clock (MCLK) of I2S1.
      
      Bad scenario as follows:
        - Try to set MCLK to 8.192MHz (32kHz audio replay)
          Candidate setting is
          - i2s1_div: GPLL / 60 = 8.192MHz
          i2s1_div candidate is exactly same as target clock rate, so mux
          choose this clock source. i2s1_div output rate is changed
          491.52MHz -> 8.192MHz
      
        - After that try to set to 11.2896MHz (44.1kHz audio replay)
          Candidate settings are
          - i2s1_div : CPLL / 107 = 11.214945MHz
          - i2s1_frac: i2s1_div   = 8.192MHz
            This is because clk_fd_round_rate() thinks target rate
            (11.2896MHz) is higher than parent rate (i2s1_div = 8.192MHz)
            and returns parent clock rate.
      
      Above is current upstreamed behavior. Clock mux system choose
      i2s1_div, but this clock rate is not acceptable for I2S driver, so
      users cannot replay audio.
      
      Expected behavior is:
        - Try to set master clock to 11.2896MHz (44.1kHz audio replay)
          Candidate settings are
          - i2s1_div : CPLL / 107          = 11.214945MHz
          - i2s1_frac: i2s1_div * 147/6400 = 11.2896MHz
                       Change i2s1_div to GPLL / 1 = 491.52MHz at same
                       time.
      
      If apply this commit, clk_fd_round_rate() calls custom approximate
      function of Rockchip even if target rate is higher than parent.
      Custom function changes both grand parent (i2s1_div) and parent
      (i2s_frac) settings at same time. Clock mux system can choose
      i2s1_frac and audio works fine.
      Signed-off-by: NKatsuhiro Suzuki <katsuhiro@katsuster.net>
      Reviewed-by: NHeiko Stuebner <heiko@sntech.de>
      [sboyd@kernel.org: Make function into a macro instead]
      Signed-off-by: NStephen Boyd <sboyd@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      763a895a
    • C
      f2fs: fix to check inline_xattr_size boundary correctly · 4ab78f4d
      Chao Yu 提交于
      [ Upstream commit 500e0b28ecd3c5aade98f3c3a339d18dcb166bb6 ]
      
      We use below condition to check inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There is there problems in that check:
      - we should allow inline_xattr_size equaling to min size of inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size unit, previous one is 4 bytes, latter one is 1 bytes.
      - DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
      however, we need to consider min size of inline dentry area as well,
      minimal inline dentry should at least contain two entries: '.' and
      '..', so that min inline_dentry size is 40 bytes.
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4ab78f4d
    • L
      include/linux/relay.h: fix percpu annotation in struct rchan · d6ad08aa
      Luc Van Oostenryck 提交于
      [ Upstream commit 62461ac2e5b6520b6d65fc6d7d7b4b8df4b848d8 ]
      
      The percpu member of this structure is declared as:
      	struct ... ** __percpu member;
      So its type is:
      	__percpu pointer to pointer to struct ...
      
      But looking at how it's used, its type should be:
      	pointer to __percpu pointer to struct ...
      and it should thus be declared as:
      	struct ... * __percpu *member;
      
      So fix the placement of '__percpu' in the definition of this
      structures.
      
      This silents a few Sparse's warnings like:
      	warning: incorrect type in initializer (different address spaces)
      	  expected void const [noderef] <asn:3> *__vpp_verify
      	  got struct sched_domain **
      
      Link: http://lkml.kernel.org/r/20190118144902.79065-1-luc.vanoostenryck@gmail.com
      Fixes: 017c59c0 ("relay: Use per CPU constructs for the relay channel buffer pointers")
      Signed-off-by: NLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Jens Axboe <axboe@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d6ad08aa
    • D
      tracing: kdb: Fix ftdump to not sleep · b73c7d02
      Douglas Anderson 提交于
      [ Upstream commit 31b265b3baaf55f209229888b7ffea523ddab366 ]
      
      As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
      BUG for "sleeping function called from invalid context".
      
      kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
      atomic context.  A very simple solution for this is to add allocation
      flags to ring_buffer_read_prepare() so kdb can call it without
      triggering the allocation error.  This patch does that.
      
      Note that in the original email thread about this, it was suggested
      that perhaps the solution for kdb was to either preallocate the buffer
      ahead of time or create our own iterator.  I'm hoping that this
      alternative of adding allocation flags to ring_buffer_read_prepare()
      can be considered since it means I don't need to duplicate more of the
      core trace code into "trace_kdb.c" (for either creating my own
      iterator or re-preparing a ring allocator whose memory was already
      allocated).
      
      NOTE: another option for kdb is to actually figure out how to make it
      reuse the existing ftrace_dump() function and totally eliminate the
      duplication.  This sounds very appealing and actually works (the "sr
      z" command can be seen to properly dump the ftrace buffer).  The
      downside here is that ftrace_dump() fully consumes the trace buffer.
      Unless that is changed I'd rather not use it because it means "ftdump
      | grep xyz" won't be very useful to search the ftrace buffer since it
      will throw away the whole trace on the first grep.  A future patch to
      dump only the last few lines of the buffer will also be hard to
      implement.
      
      [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com
      
      Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.orgReported-by: NBrian Norris <briannorris@chromium.org>
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b73c7d02
  3. 03 4月, 2019 4 次提交
    • H
      drivers: base: Helpers for adding device connection descriptions · e99d90ce
      Heikki Krogerus 提交于
      commit cd7753d371388e712e3ee52b693459f9b71aaac2 upstream.
      
      Introducing helpers for adding and removing multiple device
      connection descriptions at once.
      Acked-by: NHans de Goede <hdegoede@redhat.com>
      Tested-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NHeikki Krogerus <heikki.krogerus@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e99d90ce
    • N
      mm: add support for kmem caches in DMA32 zone · 62d342d6
      Nicolas Boichat 提交于
      commit 6d6ea1e967a246f12cfe2f5fb743b70b2e608d4a upstream.
      
      Patch series "iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables",
      v6.
      
      This is a followup to the discussion in [1], [2].
      
      IOMMUs using ARMv7 short-descriptor format require page tables (level 1
      and 2) to be allocated within the first 4GB of RAM, even on 64-bit
      systems.
      
      For L1 tables that are bigger than a page, we can just use
      __get_free_pages with GFP_DMA32 (on arm64 systems only, arm would still
      use GFP_DMA).
      
      For L2 tables that only take 1KB, it would be a waste to allocate a full
      page, so we considered 3 approaches:
       1. This series, adding support for GFP_DMA32 slab caches.
       2. genalloc, which requires pre-allocating the maximum number of L2 page
          tables (4096, so 4MB of memory).
       3. page_frag, which is not very memory-efficient as it is unable to reuse
          freed fragments until the whole page is freed. [3]
      
      This series is the most memory-efficient approach.
      
      stable@ note:
        We confirmed that this is a regression, and IOMMU errors happen on 4.19
        and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
        most likely starts from commit ad67f5a6 ("arm64: replace ZONE_DMA
        with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
        platforms (and maybe others?).
      
      [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
      [2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html
      [3] https://patchwork.codeaurora.org/patch/671639/
      
      This patch (of 3):
      
      IOMMUs using ARMv7 short-descriptor format require page tables to be
      allocated within the first 4GB of RAM, even on 64-bit systems.  On arm64,
      this is done by passing GFP_DMA32 flag to memory allocation functions.
      
      For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
      a full page using get_free_pages, so we considered 3 approaches:
       1. This patch, adding support for GFP_DMA32 slab caches.
       2. genalloc, which requires pre-allocating the maximum number of L2
          page tables (4096, so 4MB of memory).
       3. page_frag, which is not very memory-efficient as it is unable
          to reuse freed fragments until the whole page is freed.
      
      This change makes it possible to create a custom cache in DMA32 zone using
      kmem_cache_create, then allocate memory using kmem_cache_alloc.
      
      We do not create a DMA32 kmalloc cache array, as there are currently no
      users of kmalloc(..., GFP_DMA32).  These calls will continue to trigger a
      warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.
      
      This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
      kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
      unnecessary).
      
      Link: http://lkml.kernel.org/r/20181210011504.122604-2-drinkcat@chromium.orgSigned-off-by: NNicolas Boichat <drinkcat@chromium.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Sasha Levin <Alexander.Levin@microsoft.com>
      Cc: Huaisheng Ye <yehs1@lenovo.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Yong Wu <yong.wu@mediatek.com>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: Tomasz Figa <tfiga@google.com>
      Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62d342d6
    • X
      sctp: get sctphdr by offset in sctp_compute_cksum · 97265479
      Xin Long 提交于
      [ Upstream commit 273160ffc6b993c7c91627f5a84799c66dfe4dee ]
      
      sctp_hdr(skb) only works when skb->transport_header is set properly.
      
      But in Netfilter, skb->transport_header for ipv6 is not guaranteed
      to be right value for sctphdr. It would cause to fail to check the
      checksum for sctp packets.
      
      So fix it by using offset, which is always right in all places.
      
      v1->v2:
        - Fix the changelog.
      
      Fixes: e6d8b64b ("net: sctp: fix and consolidate SCTP checksumming code")
      Reported-by: NLi Shuang <shuali@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97265479
    • M
      packets: Always register packet sk in the same order · 69cea7cf
      Maxime Chevallier 提交于
      [ Upstream commit a4dc6a49156b1f8d6e17251ffda17c9e6a5db78a ]
      
      When using fanouts with AF_PACKET, the demux functions such as
      fanout_demux_cpu will return an index in the fanout socket array, which
      corresponds to the selected socket.
      
      The ordering of this array depends on the order the sockets were added
      to a given fanout group, so for FANOUT_CPU this means sockets are bound
      to cpus in the order they are configured, which is OK.
      
      However, when stopping then restarting the interface these sockets are
      bound to, the sockets are reassigned to the fanout group in the reverse
      order, due to the fact that they were inserted at the head of the
      interface's AF_PACKET socket list.
      
      This means that traffic that was directed to the first socket in the
      fanout group is now directed to the last one after an interface restart.
      
      In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
      socket that used to receive traffic from the last CPU after an interface
      restart.
      
      This commit introduces a helper to add a socket at the tail of a list,
      then uses it to register AF_PACKET sockets.
      
      Note that this changes the order in which sockets are listed in /proc and
      with sock_diag.
      
      Fixes: dc99f600 ("packet: Add fanout support")
      Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69cea7cf
  4. 27 3月, 2019 1 次提交
  5. 24 3月, 2019 11 次提交
  6. 14 3月, 2019 3 次提交
    • A
      drm: disable uncached DMA optimization for ARM and arm64 · f9a0a08d
      Ard Biesheuvel 提交于
      [ Upstream commit e02f5c1bb2283cfcee68f2f0feddcc06150f13aa ]
      
      The DRM driver stack is designed to work with cache coherent devices
      only, but permits an optimization to be enabled in some cases, where
      for some buffers, both the CPU and the GPU use uncached mappings,
      removing the need for DMA snooping and allocation in the CPU caches.
      
      The use of uncached GPU mappings relies on the correct implementation
      of the PCIe NoSnoop TLP attribute by the platform, otherwise the GPU
      will use cached mappings nonetheless. On x86 platforms, this does not
      seem to matter, as uncached CPU mappings will snoop the caches in any
      case. However, on ARM and arm64, enabling this optimization on a
      platform where NoSnoop is ignored results in loss of coherency, which
      breaks correct operation of the device. Since we have no way of
      detecting whether NoSnoop works or not, just disable this
      optimization entirely for ARM and arm64.
      
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Zhou <David1.Zhou@amd.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Junwei Zhang <Jerry.Zhang@amd.com>
      Cc: Michel Daenzer <michel.daenzer@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Maxime Ripard <maxime.ripard@bootlin.com>
      Cc: Sean Paul <sean@poorly.run>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>
      Cc: dri-devel <dri-devel@lists.freedesktop.org>
      Reported-by: NCarsten Haitzler <Carsten.Haitzler@arm.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: NChristian König <christian.koenig@amd.com>
      Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.kernel.org/patch/10778815/Signed-off-by: NChristian König <christian.koenig@amd.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f9a0a08d
    • Z
      irqchip/gic-v3-its: Fix ITT_entry_size accessor · 2a5c84e1
      Zenghui Yu 提交于
      [ Upstream commit 56841070ccc87b463ac037d2d1f2beb8e5e35f0c ]
      
      According to ARM IHI 0069C (ID070116), we should use GITS_TYPER's
      bits [7:4] as ITT_entry_size instead of [8:4]. Although this is
      pretty annoying, it only results in a potential over-allocation
      of memory, and nothing bad happens.
      
      Fixes: 3dfa576b ("irqchip/gic-v3-its: Add probing for VLPI properties")
      Signed-off-by: NZenghui Yu <yuzenghui@huawei.com>
      [maz: massaged subject and commit message]
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2a5c84e1
    • J
      net: stmmac: Fallback to Platform Data clock in Watchdog conversion · 46ba03c5
      Jose Abreu 提交于
      [ Upstream commit 4ec5302fa906ec9d86597b236f62315bacdb9622 ]
      
      If we don't have DT then stmmac_clk will not be available. Let's add a
      new Platform Data field so that we can specify the refclk by this mean.
      
      This way we can still use the coalesce command in PCI based setups.
      Signed-off-by: NJose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      46ba03c5
  7. 10 3月, 2019 4 次提交