1. 23 May, 2017 22 commits
    • sched/rt: Remove unnecessary condition in push_rt_task() · de16b91e
      Committed by Byungchul Park
      pick_next_pushable_task(rq) has BUG_ON(rq_cpu != task_cpu(task)) when
      it returns a task other than NULL, which means that task_cpu(task) must
      be rq->cpu. So if task == next_task, then task_cpu(next_task) must be
      rq->cpu as well. Remove the redundant condition and make the code simpler.
      
      This way one unnecessary branch and two LOAD operations can be avoided.
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Reviewed-by: Juri Lelli <juri.lelli@arm.com>
      Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: <kernel-team@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1494551143-22219-1-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Use the new llist_for_each_entry_safe() primitive · 73215849
      Committed by Byungchul Park
      Now that we've added llist_for_each_entry_safe(), use it to simplify
      an open coded version of it in sched_ttwu_pending().
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <kernel-team@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1494549584-11730-1-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • llist: Provide a safe version for llist_for_each() · d714893e
      Committed by Byungchul Park
      Sometimes we have to dereference the next field of an llist node before
      entering the loop body, because the node might be deleted or its next
      field might be modified within the loop. So this adds a safe version of
      llist_for_each(), that is, llist_for_each_safe().
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Huang, Ying <ying.huang@intel.com>
      Cc: <kernel-team@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1494549416-10539-1-git-send-email-byungchul.park@lge.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
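The idea behind a "safe" list walk can be sketched in plain userspace C. This is only an illustration of the technique the llist commit describes: struct node, list_for_each_safe(), push() and sum_and_free() are hypothetical names for this sketch, not the kernel's definitions. The macro reads the successor into a second cursor before the loop body runs, so the body may free the current node.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a singly linked list node. */
struct node {
	struct node *next;
	int value;
};

/*
 * Safe iteration: n is loaded from pos->next before the body executes,
 * so the body is free to delete or relink pos.
 */
#define list_for_each_safe(pos, n, head) \
	for ((pos) = (head); (pos) && ((n) = (pos)->next, 1); (pos) = (n))

/* Push a new node holding value onto the front of the list. */
static struct node *push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->value = value;
	n->next = head;
	return n;
}

/* Free every node while walking the list; returns the sum of the values. */
static int sum_and_free(struct node *head)
{
	struct node *pos, *n;
	int sum = 0;

	list_for_each_safe(pos, n, head) {
		sum += pos->value;
		free(pos);	/* fine: n already holds the successor */
	}
	return sum;
}
```

With a non-safe loop that advances via pos->next after the body, the free() above would be a use-after-free.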
    • smp, cpumask: Use non-atomic cpumask_{set,clear}_cpu() · 6c8557bd
      Committed by Peter Zijlstra
      The cpumasks in smp_call_function_many() are private and not subject
      to concurrency, so atomic bitops are pointless and expensive.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • smp: Avoid sending needless IPI in smp_call_function_many() · 3fc5b3b6
      Committed by Aaron Lu
      An Inter-Processor Interrupt (IPI) is needed when a page is unmapped and
      the process' mm_cpumask() shows the process has run on other CPUs. Page
      migration and page reclaim both need IPIs. The number of IPIs sent
      to different CPUs is especially large for multi-threaded workloads since
      mm_cpumask() is per process.

      For smp_call_function_many(), whenever a CPU queues a CSD to a target
      CPU, it will send an IPI to let the target CPU handle the work.
      This isn't necessary - we need only send an IPI when queueing a CSD
      to an empty call_single_queue.
      
      The reason:
      
      flush_smp_call_function_queue() that is called upon a CPU receiving an
      IPI will empty the queue and then handle all of the CSDs there. So if
      the target CPU's call_single_queue is not empty, we know that:
      i.  An IPI for the target CPU has already been sent by 'previous queuers';
      ii. flush_smp_call_function_queue() hasn't emptied that CPU's queue yet.
      Thus, it's safe for us to just queue our CSD there without sending an
      additional IPI. And for the 'previous queuers', we can limit it to the
      first queuer.
      
      To demonstrate the effect of this patch, a multi-threaded workload that
      spawns 80 threads to equally consume 100G of memory is used. This is tested
      on a 2-node Broadwell-EP machine with 44 cores/88 threads and 32G of memory.
      So after the 32G of memory is used up, page reclaim starts to happen a lot.

      With this patch, the number of IPIs dropped by 88% and throughput increased
      by about 15% for the above workload.
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170519075331.GE2084@aaronlu.sh.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
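The "only the first queuer sends the IPI" reasoning can be modelled in userspace with C11 atomics. This is a minimal sketch under stated assumptions, not the kernel code: struct work, struct queue, queue_add(), queue_work(), flush_queue() and the ipis_sent counter are all hypothetical names, and the IPI is simulated by incrementing a counter. The push reports whether the queue was empty beforehand, and the producer notifies the consumer only in that case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued call descriptor (CSD). */
struct work {
	struct work *next;
};

struct queue {
	_Atomic(struct work *) first;
	int ipis_sent;	/* simulated IPI counter; single-threaded demo only */
};

/* Lock-free push; returns true if the queue was empty before the push. */
static bool queue_add(struct queue *q, struct work *w)
{
	struct work *old = atomic_load(&q->first);

	assert(w != NULL);
	do {
		w->next = old;	/* old is refreshed whenever the CAS fails */
	} while (!atomic_compare_exchange_weak(&q->first, &old, w));

	return old == NULL;
}

/* Producer side: notify the target only when we are the first queuer. */
static void queue_work(struct queue *q, struct work *w)
{
	if (queue_add(q, w))
		q->ipis_sent++;	/* stand-in for sending the IPI */
}

/* Consumer side: detach the whole list at once, then handle every entry. */
static int flush_queue(struct queue *q)
{
	struct work *w = atomic_exchange(&q->first, NULL);
	int handled = 0;

	for (; w; w = w->next)
		handled++;
	return handled;
}
```

Because the consumer empties the queue in one step before handling entries, a non-empty queue implies that a notification is already pending and has not been consumed yet, which is the argument given in points i and ii above.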
    • Merge branch 'linus' into sched/core, to pick up fixes · 386b5548
      Committed by Ingo Molnar
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · fde8e33d
      Committed by Linus Torvalds
      Pull crypto fix from Herbert Xu:
       "This fixes a regression in the skcipher interface that allows bogus
        key parameters to hit underlying implementations which can cause
        crashes"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: skcipher - Add missing API setkey checks
    • Merge tag 'pstore-v4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · fadd2ce5
      Committed by Linus Torvalds
      Pull pstore fix from Kees Cook:
       "Marta noticed another misbehavior in EFI pstore, which this fixes.
      
        Hopefully this is the last of the v4.12 fixes for pstore!"
      
      * tag 'pstore-v4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        efi-pstore: Fix write/erase id tracking
    • Merge tag 'acpi-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 74a9e7db
      Committed by Linus Torvalds
      Pull ACPI fixes from Rafael Wysocki:
       "These revert a 4.11 change that turned out to be problematic and add a
        .gitignore file.
      
        Specifics:
      
         - Revert a 4.11 commit related to the ACPI-based handling of laptop
           lids that made changes incompatible with existing user space stacks
           and broke things there (Lv Zheng).
      
         - Add .gitignore to the ACPI tools directory (Prarit Bhargava)"
      
      * tag 'acpi-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Revert "ACPI / button: Remove lid_init_state=method mode"
        tools/power/acpi: Add .gitignore file
    • Merge tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 801099be
      Committed by Linus Torvalds
      Pull power management fixes from Rafael Wysocki:
       "These fix RTC wakeup from suspend-to-idle broken recently, fix CPU
        idleness detection condition in the schedutil cpufreq governor, fix a
        cpufreq driver build failure, fix an error code path in the power
        capping framework, clean up the hibernate core and update the
        intel_pstate documentation.
      
        Specifics:
      
         - Fix RTC wakeup from suspend-to-idle broken by the recent rework of
           ACPI wakeup handling (Rafael Wysocki).
      
         - Update intel_pstate driver documentation to reflect the current
           code and explain how it works in more detail (Rafael Wysocki).
      
         - Fix an issue related to CPU idleness detection on systems with
           shared cpufreq policies in the schedutil governor (Juri Lelli).
      
         - Fix a possible build issue in the dbx500 cpufreq driver (Arnd
           Bergmann).
      
         - Fix a function in the power capping framework core to return an
           error code instead of 0 when there's an error (Dan Carpenter).
      
         - Clean up variable definition in the hibernation core (Pushkar
           Jambhlekar)"
      
      * tag 'pm-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: dbx500: add a Kconfig symbol
        PM / hibernate: Declare variables as static
        PowerCap: Fix an error code in powercap_register_zone()
        RTC: rtc-cmos: Fix wakeup from suspend-to-idle
        PM / wakeup: Fix up wakeup_source_report_event()
        cpufreq: intel_pstate: Document the current behavior and user interface
        cpufreq: schedutil: use now as reference when aggregating shared policy requests
    • i2c: designware: Fix bogus sda_hold_time due to uninitialized vars · ad258fb9
      Committed by Jan Kiszka
      We need to initialize those variables to 0 for platforms that do not
      provide ACPI parameters. Otherwise, we set sda_hold_time to random
      values, breaking e.g. Galileo and IOT2000 boards.
      Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
      Fixes: 9d640843 ("i2c: designware: don't infer timings described by ACPI from clock rate")
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • efi-pstore: Fix write/erase id tracking · c10e8031
      Committed by Kees Cook
      Prior to the pstore interface refactoring, the "id" generated during
      a backend pstore_write() was only retained by the internal pstore
      inode tracking list. Additionally the "part" was ignored, so EFI
      would encode this in the id. This corrects the misunderstandings
      and correctly sets "id" during pstore_write(), and uses "part"
      directly during pstore_erase().
      Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
      Fixes: 76cc9580 ("pstore: Replace arguments for write() API")
      Fixes: a61072aa ("pstore: Replace arguments for erase() API")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 86ca984c
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
       "Mostly netfilter bug fixes in here, but we have some bits elsewhere as
        well.
      
         1) Don't do SNAT replies for non-NATed connections in IPVS, from
            Julian Anastasov.
      
         2) Don't delete conntrack helpers while they are still in use, from
            Liping Zhang.
      
         3) Fix zero padding in xtables's xt_data_to_user(), from Willem de
            Bruijn.
      
         4) Add proper RCU protection to nf_tables_dump_set() because we
            cannot guarantee that we hold the NFNL_SUBSYS_NFTABLES lock. From
            Liping Zhang.
      
         5) Initialize rcv_mss in tcp_disconnect(), from Wei Wang.
      
         6) smsc95xx devices can't handle IPV6 checksums fully, so don't
            advertise support for offloading them. From Nisar Sayed.
      
         7) Fix out-of-bounds access in __ip6_append_data(), from Eric
            Dumazet.
      
         8) Make atl2_probe() propagate the error code properly on failures,
            from Alexey Khoroshilov.
      
         9) arp_target[] in bond_check_params() is used uninitialized. This
            got changed from a global static to a local variable, which is how
            this mistake happened. Fix from Jarod Wilson.
      
        10) Fix fallout from unnecessary NULL check removal in cls_matchall,
            from Jiri Pirko. This is definitely brown paper bag territory..."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
        net: sched: cls_matchall: fix null pointer dereference
        vsock: use new wait API for vsock_stream_sendmsg()
        bonding: fix randomly populated arp target array
        net: Make IP alignment calulations clearer.
        bonding: fix accounting of active ports in 3ad
        net: atheros: atl2: don't return zero on failure path in atl2_probe()
        ipv6: fix out of bound writes in __ip6_append_data()
        bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
        smsc95xx: Support only IPv4 TCP/UDP csum offload
        arp: always override existing neigh entries with gratuitous ARP
        arp: postpone addr_type calculation to as late as possible
        arp: decompose is_garp logic into a separate function
        arp: fixed error in a comment
        tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
        netfilter: xtables: fix build failure from COMPAT_XT_ALIGN outside CONFIG_COMPAT
        ebtables: arpreply: Add the standard target sanity check
        netfilter: nf_tables: revisit chain/object refcounting from elements
        netfilter: nf_tables: missing sanitization in data from userspace
        netfilter: nf_tables: can't assume lock is acquired when dumping set elems
        netfilter: synproxy: fix conntrackd interaction
        ...
    • net: sched: cls_matchall: fix null pointer dereference · 2d76b2f8
      Committed by Jiri Pirko
      Since the head is guaranteed by the check above to be null, the call_rcu
      would explode. Remove the previously logically dead code that was made
      logically very much alive and kicking.
      
      Fixes: 985538ee ("net/sched: remove redundant null check on head")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: use new wait API for vsock_stream_sendmsg() · 499fde66
      Committed by WANG Cong
      As reported by Michal, vsock_stream_sendmsg() could still
      sleep at vsock_stream_has_space() after prepare_to_wait():
      
        vsock_stream_has_space
          vmci_transport_stream_has_space
            vmci_qpair_produce_free_space
              qp_lock
                qp_acquire_queue_mutex
                  mutex_lock
      
      Just switch to the new wait API like we did for commit
      d9dc8b0f ("net: fix sleeping for sk_wait_event()").
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix randomly populated arp target array · 72ccc471
      Committed by Jarod Wilson
      In commit dc9c4d0f, the arp_target array moved from a static global
      to a local variable. By the nature of static globals, the array used to
      be initialized to all 0. At present, it's full of random data, which
      gets interpreted as arp_target values when none have actually been
      specified. Systems end up booting with spew along these lines:
      
      [   32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0
      [   32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.204892] lacp0: Setting MII monitoring interval to 100
      [   32.211071] lacp0: Removing ARP target 216.124.228.17
      [   32.216824] lacp0: Removing ARP target 218.160.255.255
      [   32.222646] lacp0: Removing ARP target 185.170.136.184
      [   32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal
      [   32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
      [   32.243987] lacp0: Removing ARP target 56.125.228.17
      [   32.249625] lacp0: Removing ARP target 218.160.255.255
      [   32.255432] lacp0: Removing ARP target 15.157.233.184
      [   32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal
      [   32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
      [   32.276632] lacp0: Removing ARP target 16.0.0.0
      [   32.281755] lacp0: Removing ARP target 218.160.255.255
      [   32.287567] lacp0: Removing ARP target 72.125.228.17
      [   32.293165] lacp0: Removing ARP target 218.160.255.255
      [   32.298970] lacp0: Removing ARP target 8.125.228.17
      [   32.304458] lacp0: Removing ARP target 218.160.255.255
      
      None of these were actually specified as ARP targets, and the driver does
      seem to clean up the mess okay, but it's rather noisy and confusing, leaks
      values to userspace, and the 255.255.255.255 spew shows up even when debug
      prints are disabled.
      
      The fix: just zero out arp_target at init time.
      
      While we're in here, init arp_all_targets_value in the right place.
      
      Fixes: dc9c4d0f ("bonding: reduce scope of some global variables")
      CC: Mahesh Bandewar <maheshb@google.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      CC: stable@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
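The underlying C rule is that objects with static storage duration start out zeroed, while automatic (local) arrays do not. A minimal sketch of the "zero it at init time" fix follows; count_targets() and MAX_ARP_TARGETS are hypothetical names chosen for illustration and are not taken from the bonding driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ARP_TARGETS 16	/* illustrative size for this sketch */

/*
 * Copy the configured addresses into a local array and return how many
 * non-zero targets the array ends up holding.
 */
static int count_targets(const uint32_t *configured, int n_configured)
{
	uint32_t arp_target[MAX_ARP_TARGETS];
	int i, count = 0;

	assert(configured != NULL || n_configured == 0);

	/* A local array is not implicitly zeroed, so clear it explicitly. */
	memset(arp_target, 0, sizeof(arp_target));

	for (i = 0; i < n_configured && i < MAX_ARP_TARGETS; i++)
		arp_target[i] = configured[i];

	for (i = 0; i < MAX_ARP_TARGETS; i++)
		if (arp_target[i])
			count++;
	return count;
}
```

Without the memset(), the unused slots would hold indeterminate values and the count could include targets that were never configured, which matches the boot-time spew quoted above.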
    • Merge branches 'pm-sleep' and 'powercap' · bb47e964
      Committed by Rafael J. Wysocki
      * pm-sleep:
        PM / hibernate: Declare variables as static
        RTC: rtc-cmos: Fix wakeup from suspend-to-idle
        PM / wakeup: Fix up wakeup_source_report_event()
      
      * powercap:
        PowerCap: Fix an error code in powercap_register_zone()
    • Merge branches 'acpi-button' and 'acpi-tools' · e3170cc0
      Committed by Rafael J. Wysocki
      * acpi-button:
        Revert "ACPI / button: Remove lid_init_state=method mode"
      
      * acpi-tools:
        tools/power/acpi: Add .gitignore file
    • Merge branches 'intel_pstate', 'pm-cpufreq' and 'pm-cpufreq-sched' · 079c1812
      Committed by Rafael J. Wysocki
      * intel_pstate:
        cpufreq: intel_pstate: Document the current behavior and user interface
      
      * pm-cpufreq:
        cpufreq: dbx500: add a Kconfig symbol
      
      * pm-cpufreq-sched:
        cpufreq: schedutil: use now as reference when aggregating shared policy requests
    • net: Make IP alignment calulations clearer. · e4eda884
      Committed by David S. Miller
      The assignment:
      
      	ip_align = strict ? 2 : NET_IP_ALIGN;
      
      in compare_pkt_ptr_alignment() trips up Coverity because we can only
      get to this code when strict is true, therefore ip_align will always
      be 2 regardless of NET_IP_ALIGN's value.
      
      So just assign directly to '2' and explain the situation in the
      comment above.
      Reported-by: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix accounting of active ports in 3ad · 751da2a6
      Committed by Jarod Wilson
      As of 7bb11dc9 and 0622cab0, bond slaves in a 3ad bond are not
      removed from the aggregator when they are down, and the active slave count
      is NOT equal to number of ports in the aggregator, but rather the number
      of ports in the aggregator that are still enabled. The sysfs spew for
      bonding_show_ad_num_ports() has a comment that says "Show number of active
      802.3ad ports.", but it's currently showing total number of ports, both
      active and inactive. Remedy it by using the same logic introduced in
      0622cab0 in __bond_3ad_get_active_agg_info(), so sysfs, procfs and
      netlink all report the number of active ports. Note that this means that
      IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of
      NUM_PORTS, and thus perhaps should be renamed for clarity.
      
      Lightly tested on a dual i40e lacp bond. Simulating link downs with "ip
      link set dev <slave2> down", I was able to produce the state where both
      slaves are in the same aggregator, but the number of ports count is 1.
      
      MII Status: up
      Active Aggregator Info:
              Aggregator ID: 1
              Number of ports: 2 <---
      Slave Interface: ens10
      MII Status: up <---
      Aggregator ID: 1
      Slave Interface: ens11
      MII Status: up
      Aggregator ID: 1
      
      MII Status: up
      Active Aggregator Info:
              Aggregator ID: 1
              Number of ports: 1 <---
      Slave Interface: ens10
      MII Status: down <---
      Aggregator ID: 1
      Slave Interface: ens11
      MII Status: up
      Aggregator ID: 1
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
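The distinction between "ports in the aggregator" and "active ports" can be shown with a small counting sketch. The struct port layout and active_ports() below are hypothetical and only model the accounting rule described above: a port counts when it belongs to the aggregator and is still enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a bond slave port. */
struct port {
	int aggregator_id;
	bool enabled;	/* false models a slave whose link is down */
};

/* Count only the enabled ports that belong to aggregator agg_id. */
static int active_ports(const struct port *ports, size_t n, int agg_id)
{
	int count = 0;
	size_t i;

	assert(ports != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (ports[i].aggregator_id == agg_id && ports[i].enabled)
			count++;
	return count;
}
```

For the second status dump above (ens10 down, ens11 up, both in aggregator 1), this rule yields 1 rather than 2.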
    • net: atheros: atl2: don't return zero on failure path in atl2_probe() · bd703a15
      Committed by Alexey Khoroshilov
      If dma mask checks fail in atl2_probe(), it breaks off initialization,
      deallocates all resources, but returns zero.
      
      The patch adds a proper error code return value and
      unifies the error code setup.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
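The bug class here is a cleanup path that releases everything but still returns 0, so the caller believes the probe succeeded. A minimal sketch of the usual goto-based pattern follows; probe(), the dma_mask_ok flag and the choice of -EIO are assumptions made for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical probe routine: returns 0 on success, negative errno on failure. */
static int probe(bool dma_mask_ok)
{
	int err;
	void *res = malloc(64);	/* stand-in for resources acquired earlier */

	if (!res)
		return -ENOMEM;

	if (!dma_mask_ok) {
		err = -EIO;	/* set the error before jumping to cleanup */
		goto err_free;
	}

	/* Further setup would go here; the demo simply releases and succeeds. */
	free(res);
	return 0;

err_free:
	free(res);
	return err;	/* never 0 on this path */
}
```

Setting err at every failure site and returning it from a single cleanup label keeps the error code setup uniform, which is what the patch description asks for.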
  2. 22 May, 2017 18 commits
    • ipv6: fix out of bound writes in __ip6_append_data() · 232cd35d
      Committed by Eric Dumazet
      Andrey Konovalov and idaifish@gmail.com reported crashes caused by
      one skb shared_info being overwritten from __ip6_append_data().

      Andrey's program led to the following state:
      
      copy -4200 datalen 2000 fraglen 2040
      maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200
      
      The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
      fraggap, 0); is overwriting skb->head and skb_shared_info
      
      Since we apparently detect this rare condition too late, move the
      code earlier to even avoid allocating skb and risking crashes.
      
      Once again, many thanks to Andrey and syzkaller team.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Reported-by: <idaifish@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Linux 4.12-rc2 · 08332893
      Committed by Linus Torvalds
    • x86: fix 32-bit case of __get_user_asm_u64() · 33c9e972
      Committed by Linus Torvalds
      The code to fetch a 64-bit value from user space was entirely buggered,
      and has been since the code was merged in early 2016 in commit
      b2f68038 ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
      kernels").
      
      Happily the buggered routine is almost certainly entirely unused, since
      the normal way to access user space memory is just with the non-inlined
      "get_user()", and the inlined version didn't even historically exist.
      
      The normal "get_user()" case is handled by external hand-written asm in
      arch/x86/lib/getuser.S that doesn't have either of these issues.
      
      There were two independent bugs in __get_user_asm_u64():
      
       - it still did the STAC/CLAC user space access marking, even though
         that is now done by the wrapper macros, see commit 11f1a4b9
         ("x86: reorganize SMAP handling in user space accesses").
      
         This didn't result in a semantic error, it just means that the
         inlined optimized version was hugely less efficient than the
         allegedly slower standard version, since the CLAC/STAC overhead is
         quite high on modern Intel CPUs.
      
       - the double register %eax/%edx was marked as an output, but the %eax
         part of it was touched early in the asm, and could thus clobber other
         inputs to the asm that gcc didn't expect it to touch.
      
         In particular, that meant that the generated code could look like
         this:
      
              mov    (%eax),%eax
              mov    0x4(%eax),%edx
      
         where the load of %edx obviously was _supposed_ to be from the 32-bit
         word that followed the source of %eax, but because %eax was
         overwritten by the first instruction, the source of %edx was
         basically random garbage.
      
      The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
      the 64-bit output as early-clobber to let gcc know that no inputs should
      alias with the output register.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable@kernel.org   # v4.8+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Clean up x86 unsafe_get/put_user() type handling · 334a023e
      Committed by Linus Torvalds
      Al noticed that unsafe_put_user() had type problems, and fixed them in
      commit a7cc722f ("fix unsafe_put_user()"), which made me look more
      at those functions.
      
      It turns out that unsafe_get_user() had a type issue too: it limited the
      largest size of the type it could handle to "unsigned long".  Which is
      fine with the current users, but doesn't match our existing normal
      get_user() semantics, which can also handle "u64" even when that does
      not fit in a long.
      
      While at it, also clean up the type cast in unsafe_put_user().  We
      actually want to just make it an assignment to the expected type of the
      pointer, because we actually do want warnings from types that don't
      convert silently.  And it makes the code more readable by not having
      that one very long and complex line.
      
      [ This patch might become stable material if we ever end up back-porting
        any new users of the unsafe uaccess code, but as things stand now this
        doesn't matter for any current existing uses. ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f3926e4c
      Committed by Linus Torvalds
      Pull misc uaccess fixes from Al Viro:
       "Fix for unsafe_put_user() (no callers currently in mainline, but
        anyone starting to use it will step into that) + alpha osf_wait4()
        infoleak fix"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        osf_wait4(): fix infoleak
        fix unsafe_put_user()
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 970c305a
      Committed by Linus Torvalds
      Pull scheduler fix from Thomas Gleixner:
       "A single scheduler fix:
      
        Prevent idle task from ever being preempted. That makes sure that
        synchronize_rcu_tasks() which is ignoring idle task does not pretend
        that no task is stuck in preempted state. If that happens and idle was
        preempted on a ftrace trampoline the machine crashes due to
        inconsistent state"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/core: Call __schedule() from do_idle() without enabling preemption
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e7a3d627
      Committed by Linus Torvalds
      Pull irq fixes from Thomas Gleixner:
       "A set of small fixes for the irq subsystem:
      
         - Cure a data ordering problem with chained interrupts
      
         - Three small fixlets for the mbigen irq chip"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Fix chained interrupt data ordering
        irqchip/mbigen: Fix the clear register offset calculation
        irqchip/mbigen: Fix potential NULL dereferencing
        irqchip/mbigen: Fix memory mapping code
    • bridge: start hello_timer when enabling KERNEL_STP in br_stp_start · 6d18c732
      Committed by Xin Long
      Since commit 76b91c32 ("bridge: stp: when using userspace stp stop
      kernel hello and hold timers"), bridge would not start hello_timer if
      stp_enabled is not KERNEL_STP when br_dev_open.
      
      The problem is that even if users later set stp_enabled to KERNEL_STP,
      the timer will still not be started, so KERNEL_STP cannot really
      work. Users have to re-ifup the bridge to work around this.

      This patch fixes it by starting br->hello_timer when enabling
      KERNEL_STP in br_stp_start.

      As an improvement, it also restarts hello_timer in
      br_hello_timer_expired only when br->stp_enabled is KERNEL_STP; there
      is no reason to start the timer again when it's NO_STP.
      
      Fixes: 76b91c32 ("bridge: stp: when using userspace stp stop kernel hello and hold timers")
      Reported-by: Haidong Li <haili@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Ivan Vecera <cera@cera.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • smsc95xx: Support only IPv4 TCP/UDP csum offload · fe0cd8ca
      Committed by Nisar Sayed
      When TX checksum offload is used, if the computed checksum is 0 the
      LAN95xx device does not alter the checksum to 0xffff.  In the case of an
      IPv4 UDP checksum, a zero value tells the receiver that no checksum was
      calculated. Under IPv6, a UDP checksum that yields a result of zero must
      be changed to 0xffff. Hence, disable checksum offload for IPv6 packets.
      Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
      Reported-by: popcorn mix <popcornmix@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
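The zero-to-0xffff substitution that the hardware skips can be sketched as follows. This is a simplified software illustration, not the driver's code: inet_checksum() and udp_checksum_field() are hypothetical helpers, and the buffer is assumed to already contain everything the checksum covers (pseudo-header, UDP header and payload).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over 16-bit big-endian words, folded and inverted. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries */
	return (uint16_t)~sum;
}

/*
 * Value to place in the UDP checksum field: a computed 0 is sent as
 * 0xffff, because 0 on the wire means "no checksum" for IPv4 and is
 * not allowed for IPv6.
 */
static uint16_t udp_checksum_field(const uint8_t *data, size_t len)
{
	uint16_t csum = inet_checksum(data, len);

	return csum ? csum : 0xffff;
}
```

A device that offloads the sum but omits the final substitution produces valid packets almost always, which is why the problem only shows up for the rare payloads whose checksum computes to zero.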
    • Merge branch 'arp-always-override-existing-neigh-entries-with-gratuitous-ARP' · 776ee323
      Committed by David S. Miller
      Ihar Hrachyshka says:
      
      ====================
      arp: always override existing neigh entries with gratuitous ARP
      
      This patchset is spurred by discussion started at
      https://patchwork.ozlabs.org/patch/760372/ where we figured that there is no
      real reason for enforcing override by gratuitous ARP packets only when
      arp_accept is 1. Same should happen when it's 0 (the default value).
      
      changelog v2: handled review comments by Julian Anastasov
      - fixed a mistake in a comment;
      - postponed addr_type calculation to as late as possible.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      776ee323
    • I
      arp: always override existing neigh entries with gratuitous ARP · 7d472a59
      Ihar Hrachyshka committed
      Currently, when arp_accept is 1, we always override existing neigh
      entries with incoming gratuitous ARP replies. Otherwise, we override
      them only if new replies satisfy the _locktime_ condition (packets
      arrive no earlier than _locktime_ seconds after the last update to the
      neigh entry).
      
      The idea behind locktime is to pick the very first (=> close) reply
      received in a unicast burst when ARP proxies are used. This helps to
      avoid ARP thrashing where Linux would switch back and forth from one
      proxy to another.
      
      This logic has nothing to do with gratuitous ARP replies, which are
      generally not aligned in time when multiple IP address carriers send
      them into the network.
      
      This patch enforces overriding of existing neigh entries by all incoming
      gratuitous ARP packets, irrespective of their time of arrival. This will
      make the kernel honour all incoming gratuitous ARP packets.
      Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d472a59
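The override decision described in this commit can be sketched as a single predicate. This is a user-space illustration under assumptions: `model_time_after()` imitates the kernel's wraparound-safe `time_after()` comparison, `model_should_override()` is a hypothetical name, and the time values are plain tick counts rather than jiffies taken from a real neigh entry.

```c
#include <stdbool.h>

/* Wraparound-safe "a is later than b", in the spirit of time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Decide whether an incoming ARP reply may override an existing entry.
 * Gratuitous ARP always overrides; any other reply must arrive after
 * the locktime window that started at the entry's last update.
 */
static bool model_should_override(unsigned long now, unsigned long updated,
                                  unsigned long locktime, bool is_garp)
{
    return is_garp || model_time_after(now, updated + locktime);
}
```

With `updated = 95` and `locktime = 10`, a non-gratuitous reply at `now = 100` falls inside the window and is ignored, while a gratuitous one at the same time is accepted.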
    • I
      arp: postpone addr_type calculation to as late as possible · d9ef2e7b
      Ihar Hrachyshka committed
      The addr_type retrieval can be costly, so it's worth trying to avoid its
      calculation as much as possible. This patch makes it calculated only
      for gratuitous ARP packets. This is especially important since later we
      may want to move is_garp calculation outside of arp_accept block, at
      which point the costly operation will be executed for all setups.
      
      The patch is the result of a discussion in net-dev:
      http://marc.info/?l=linux-netdev&m=149506354216994
      Suggested-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d9ef2e7b
    • I
      arp: decompose is_garp logic into a separate function · 6fd05633
      Ihar Hrachyshka committed
      The code is already involved enough to earn a separate function of
      its own. If anything, this helps arp_process readability.
      Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fd05633
    • I
      arp: fixed error in a comment · 34eb5fe0
      Ihar Hrachyshka committed
      The is_garp code deals only with gratuitous ARP packets, not every
      unsolicited packet.
      
      This patch is a result of a discussion in netdev:
      http://marc.info/?l=linux-netdev&m=149506354216994
      Suggested-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34eb5fe0
    • W
      tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0 · 499350a5
      Wei Wang committed
      When tcp_disconnect() is called, inet_csk_delack_init() sets
      icsk->icsk_ack.rcv_mss to 0.
      This could potentially cause the tcp_recvmsg() => tcp_cleanup_rbuf() =>
      __tcp_select_window() call path to hit a division by zero.
      So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0.
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      499350a5
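The hazard described in this commit can be modelled in user space as follows. This is a rough sketch, not the kernel code: `struct ack_state_model`, `model_reset_ack_state()`, and `model_round_window()` are hypothetical, the value 88 for `MODEL_TCP_MIN_MSS` is assumed to mirror the kernel's TCP_MIN_MSS, and the real `__tcp_select_window()` does considerably more than this one division.

```c
/* Assumed to mirror the kernel's TCP_MIN_MSS; treat as illustrative. */
#define MODEL_TCP_MIN_MSS 88U

/* Hypothetical stand-in for the delayed-ACK state that holds rcv_mss. */
struct ack_state_model {
    unsigned int rcv_mss;
};

/*
 * Reset the state on disconnect. Leaving rcv_mss at 0 here is what
 * would make the later division fault; a non-zero floor avoids that.
 */
static void model_reset_ack_state(struct ack_state_model *ack)
{
    ack->rcv_mss = MODEL_TCP_MIN_MSS;
}

/*
 * Round the free receive space down to a multiple of rcv_mss.
 * The division is undefined when rcv_mss is 0.
 */
static unsigned int model_round_window(unsigned int free_space,
                                       unsigned int rcv_mss)
{
    return (free_space / rcv_mss) * rcv_mss;
}
```

After `model_reset_ack_state()`, calling `model_round_window()` with the stored `rcv_mss` is always well defined, which is the property the patch restores.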
    • A
      osf_wait4(): fix infoleak · a8c39544
      Al Viro committed
      failing sys_wait4() won't fill struct rusage...
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a8c39544
    • A
      fix unsafe_put_user() · a7cc722f
      Al Viro committed
      __put_user_size() relies upon its first argument having the same type as what
      the second one points to; the only other user makes sure of that and
      unsafe_put_user() should do the same.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a7cc722f
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 23416e23
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for your net tree,
      they are:
      
      1) When using IPVS in direct-routing mode, normal traffic from the LVS
         host to a back-end server is sometimes incorrectly NATed on the way
         back into the LVS host. Patch to fix this from Julian Anastasov.
      
      2) Calm down clang compilation warning in ctnetlink due to type
         mismatch, from Matthias Kaehlcke.
      
      3) Do not re-setup NAT for conntracks that are already confirmed, this
         is fixing a problem that was introduced in the previous nf-next batch.
         Patch from Liping Zhang.
      
      4) Do not allow conntrack helper removal from userspace cthelper
         infrastructure if already in use. This comes with an initial patch
         to introduce nf_conntrack_helper_put() that is required by this fix.
         From Liping Zhang.
      
      5) Zero the pad when copying data to userspace, otherwise iptables fails
         to remove rules. This is a follow up on the patchset that sorts out
         the internal match/target structure pointer leak to userspace. Patch
         from the same author, Willem de Bruijn. This also comes with a build
         failure when CONFIG_COMPAT is not on, coming in the last patch of
         this series.
      
      6) SYNPROXY crashes with conntrack entries that are created via
         ctnetlink, more specifically via conntrackd state sync. Patch from
         Eric Leblond.
      
      7) RCU safe iteration on set element dumping in nf_tables, from
         Liping Zhang.
      
      8) Missing sanitization of immediate data for the bitwise and cmp
         expressions in nf_tables.
      
      9) Refcounting logic for chain and objects from set elements does not
         integrate into the nf_tables 2-phase commit protocol.
      
      10) Missing sanitization of target verdict in ebtables arpreply target,
          from Gao Feng.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23416e23