1. 19 Jan, 2022 1 commit
  2. 18 Jan, 2022 6 commits
    • [SPARK-37949][SQL] Improve Rebalance statistics estimation · 1f496fbe
      Committed by ulysses-you
      ### What changes were proposed in this pull request?
      
      Match `RebalancePartitions` in `SizeInBytesOnlyStatsPlanVisitor` and `BasicStatsPlanVisitor`.
      
      ### Why are the changes needed?
      
      The default statistics estimation only considers the size in bytes, which may lose the row count and column statistics.
      
      `RebalancePartitions` does not actually change the statistics of the plan, so we can reuse the statistics of its child for a more accurate estimate.
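      
      The pass-through idea can be illustrated with a minimal sketch. It uses a toy plan model, not Spark's classes: `Plan`, `Stats`, `Kind`, and `estimate` are hypothetical names introduced here, and the fallback branch only mimics a size-only default.
      
      ```java
      import java.util.OptionalLong;
      
      final class RebalanceStatsSketch {
          enum Kind { SCAN, REBALANCE, OTHER }
      
          /** Toy statistics: the size is always known, the row count may be missing. */
          static final class Stats {
              final long sizeInBytes;
              final OptionalLong rowCount;
      
              Stats(long sizeInBytes, OptionalLong rowCount) {
                  this.sizeInBytes = sizeInBytes;
                  this.rowCount = rowCount;
              }
          }
      
          /** Toy plan node (hypothetical, not Spark's LogicalPlan). */
          static final class Plan {
              final Kind kind;
              final Plan child;      // null for SCAN
              final Stats leafStats; // only set for SCAN
      
              Plan(Kind kind, Plan child, Stats leafStats) {
                  this.kind = kind;
                  this.child = child;
                  this.leafStats = leafStats;
              }
          }
      
          static Stats estimate(Plan plan) {
              switch (plan.kind) {
                  case SCAN:
                      return plan.leafStats;
                  case REBALANCE:
                      // Rebalancing only redistributes rows, so the child's statistics carry over.
                      return estimate(plan.child);
                  default:
                      // Size-only fallback: keep the byte size and drop the row count.
                      return new Stats(estimate(plan.child).sizeInBytes, OptionalLong.empty());
              }
          }
      
          public static void main(String[] args) {
              Plan scan = new Plan(Kind.SCAN, null, new Stats(1024L, OptionalLong.of(10L)));
              Stats s = estimate(new Plan(Kind.REBALANCE, scan, null));
              System.out.println(s.sizeInBytes + " bytes, rows=" + s.rowCount);
          }
      }
      ```
      
      In this sketch the `REBALANCE` node keeps the child's row count, whereas a node that falls through to the default branch reports only the size.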
      
      ### Does this PR introduce _any_ user-facing change?
      
      No, this only affects the statistics of the plan.
      
      ### How was this patch tested?
      
      Unify the test in `BasicStatsEstimationSuite`
      
      Closes #35235 from ulysses-you/SPARK-37949.
      Authored-by: ulysses-you <ulyssesyou18@gmail.com>
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    • [SPARK-37768][SQL][FOLLOWUP] Schema pruning for the metadata struct · 54f91d39
      Committed by yaohua
      ### What changes were proposed in this pull request?
      Follow-up PR of #34575. Support the metadata struct schema pruning for all file formats.
      
      ### Why are the changes needed?
      Performance improvements.
      
      ### Does this PR introduce _any_ user-facing change?
      No
      
      ### How was this patch tested?
      Existing UTs and a new UT.
      
      Closes #35147 from Yaohua628/spark-37768.
      Authored-by: yaohua <yaohua.zhao@databricks.com>
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    • [SPARK-37906][SQL] spark-sql should not pass last comment to backend · 450418bd
      Committed by Angerszhuuuu
      ### What changes were proposed in this pull request?
      In https://github.com/apache/spark/pull/34815 we restored passing unclosed bracketed comments to the backend.
      
      However, that change missed cases such as:
      ```
      SELECT 1; --comment
      ```
      
      ```
      SELECT 1; /* comment */
      ```
      
      This is a common pattern in SQL jobs, so we should ignore a comment at the end of a SQL script.
      
      Note that with `-e` the SQL is passed directly to `splitSemiColon`, whereas with `-f` CliDriver appends a `\n` to each line of the query:
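      ```
        public int processReader(BufferedReader r) throws IOException {
          StringBuilder qsb = new StringBuilder();
      
          String line;
          while((line = r.readLine()) != null) {
            // Lines that begin with a simple comment are skipped entirely.
            if (!line.startsWith("--")) {
              // Every kept line gets a trailing "\n", so -f input always ends with a newline.
              qsb.append(line + "\n");
            }
          }
      
          return this.processLine(qsb.toString());
        }
      ```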
      
      So `splitSemiColon` must handle both cases.
      
      With this PR, the final behavior is as follows.
      
      For `-e`:
      
      | Query | Behavior before | Behavior now |
      |-------|-----------------|--------------|
      | `SELECT 1; --comment` | Passes both `SELECT 1` and `--comment` to the backend engine and throws an exception, since `--comment` can't be executed | Passes only `SELECT 1` to the backend engine and ignores the simple comment |
      | `SELECT 1;  /* comment */ ` | Passes both `SELECT 1` and `/* comment */` to the backend engine and throws an exception, since `/* comment */` can't be executed | Passes only `SELECT 1` to the backend engine |
      | `SELECT 1;  /* comment ` | Passes `SELECT 1` and `/* comment` to the backend engine | Passes `SELECT 1` and `/* comment` to the backend engine |
      | `SELECT 1; /* comment SELECT 1` | Passes `SELECT 1` and `/* comment SELECT 1` to the backend engine | Passes `SELECT 1` and `/* comment SELECT 1` to the backend engine |
      | `/ * comment SELECT 1;` | Passes `/ * comment SELECT 1;` to the backend engine and throws an `unclosed bracketed comment` exception | Passes `/ * comment SELECT 1;` to the backend engine and throws an `unclosed bracketed comment` exception |
      
      For `-f`, which appends a `\n` to every line that does not start with `--`:
      
      | Query | Behavior before | Behavior now |
      |-------|-----------------|--------------|
      | `SELECT 1; --comment\n` | Passes both `SELECT 1` and `--comment` to the backend engine and throws an exception, since `--comment` can't be executed | Passes only `SELECT 1` to the backend engine and ignores the simple comment |
      | `SELECT 1;  /* comment */ \n` | Passes both `SELECT 1` and `/* comment */` to the backend engine and throws an exception, since `/* comment */` can't be executed | Passes only `SELECT 1` to the backend engine |
      | `SELECT 1;  /* comment \n` | Passes `SELECT 1` and `/* comment\n` to the backend engine | Passes `SELECT 1` and `/* comment\n` to the backend engine |
      | `SELECT 1; /* comment SELECT 1\n` | Passes `SELECT 1` and `/* comment SELECT 1\n` to the backend engine | Passes `SELECT 1` and `/* comment SELECT 1\n` to the backend engine |
      | `/ * comment SELECT 1;\n` | Passes `/ * comment SELECT 1;\n` to the backend engine and throws an `unclosed bracketed comment` exception | Passes `/ * comment SELECT 1;\n` to the backend engine and throws an `unclosed bracketed comment` exception |
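      
      The behavior in the tables can be approximated with a small standalone splitter. This is a simplified sketch, not Spark's `splitSemiColon`: the class and method names are hypothetical, quote handling is minimal, and it drops any fragment that contains only whitespace and closed comments, while an unclosed bracketed comment is still emitted so the backend can report it.
      
      ```java
      import java.util.ArrayList;
      import java.util.List;
      
      final class TrailingCommentSplitSketch {
          static List<String> split(String script) {
              List<String> out = new ArrayList<>();
              StringBuilder cur = new StringBuilder();
              boolean hasCode = false;  // fragment holds something other than whitespace or comments
              boolean inLine = false;   // inside a "--" comment
              boolean inBlock = false;  // inside a "/* ... */" comment
              char quote = 0;           // active quote character, or 0 outside a string literal
      
              for (int i = 0; i < script.length(); i++) {
                  char c = script.charAt(i);
                  char next = i + 1 < script.length() ? script.charAt(i + 1) : 0;
      
                  if (inLine) {
                      if (c == '\n') inLine = false;  // a simple comment ends at the newline
                      cur.append(c);
                  } else if (inBlock) {
                      cur.append(c);
                      if (c == '*' && next == '/') { cur.append(next); i++; inBlock = false; }
                  } else if (quote != 0) {
                      cur.append(c);
                      if (c == '\\' && next != 0) { cur.append(next); i++; }  // keep escaped char
                      else if (c == quote) quote = 0;
                  } else if (c == '-' && next == '-') {
                      inLine = true;
                      cur.append(c);
                  } else if (c == '/' && next == '*') {
                      inBlock = true;
                      cur.append(c).append(next);
                      i++;
                  } else if (c == ';') {
                      // Statement boundary: emit the fragment only if it contains real code.
                      if (hasCode) out.add(cur.toString().trim());
                      cur.setLength(0);
                      hasCode = false;
                  } else {
                      if (c == '\'' || c == '"') quote = c;
                      if (!Character.isWhitespace(c)) hasCode = true;
                      cur.append(c);
                  }
              }
              // Tail: keep real code and unclosed bracketed comments, drop complete comments.
              if (hasCode || inBlock) out.add(cur.toString().trim());
              return out;
          }
      
          public static void main(String[] args) {
              System.out.println(split("SELECT 1; --comment"));
              System.out.println(split("SELECT 1; /* comment "));
          }
      }
      ```
      
      Because the `-f` path only adds a trailing `\n`, the same routine covers both input modes: the newline simply ends the simple comment before the end of input is reached.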
      
      ### Why are the changes needed?
      spark-sql should not pass a trailing comment to the backend.
      
      ### Does this PR introduce _any_ user-facing change?
      Users can write SQL scripts that end with a comment:
      ```
      SELECT 1; --comment
      ```
      
      ```
      SELECT 1; /* comment */
      ```
      
      ### How was this patch tested?
      Added UT
      
      Closes #35206 from AngersZhuuuu/SPARK-37906.
      Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
      Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    • [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default · 2c825d19
      Committed by PengLei
      ### What changes were proposed in this pull request?
      
      1. Add `quoted(identifier: TableIdentifier)` to quote the table name in the V1 command (`SHOW CREATE TABLE [AS SERDE]`) so that it matches the V2 behavior. Quoting is applied only when `quoteIfNeeded` requires it.
      2. Change `addV2TableProperties` of `V1Table`: the `location` property is added only when `external == true`.
      3. Change `V1Table`.`Schema` to reconstruct the original schema from the string.
      4. Use the V2 command by default for `SHOW CREATE TABLE`.
      5. Change the V2 behavior of `ShowTablePartitions` to match the V1 behavior.
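      
      As an illustration of quoting only when needed (item 1), here is a minimal sketch. The rule used (quote unless the part consists solely of letters, digits, and underscores and is not purely numeric, doubling embedded backticks) is an assumption made for this example and is not taken from this PR; the class and method names are hypothetical.
      
      ```java
      final class QuoteIfNeededSketch {
          /** Wraps one name part in backticks unless it is a plain identifier (assumed rule). */
          static String quoteIfNeeded(String part) {
              boolean plain = part.matches("[a-zA-Z0-9_]+") && !part.matches("\\d+");
              // Embedded backticks are doubled so the quoted name stays parseable.
              return plain ? part : "`" + part.replace("`", "``") + "`";
          }
      
          /** Joins the parts of a multi-part table name, quoting each part independently. */
          static String quoted(String... parts) {
              StringBuilder sb = new StringBuilder();
              for (int i = 0; i < parts.length; i++) {
                  if (i > 0) sb.append('.');
                  sb.append(quoteIfNeeded(parts[i]));
              }
              return sb.toString();
          }
      
          public static void main(String[] args) {
              System.out.println(quoted("default", "my_table"));
              System.out.println(quoted("default", "my table"));
          }
      }
      ```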
      
      ### Why are the changes needed?
      
      It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.
      
      ### Does this PR introduce _any_ user-facing change?
      
      The V2 command is now used by default for `SHOW CREATE TABLE`.
      If `LEGACY_USE_V1_COMMAND` is `true`, the V1 command is used instead.
      
      ### How was this patch tested?
      build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowCreateTableSuite"
      
      Closes #35204 from Peng-Lei/SPARK-37878.
      Authored-by: PengLei <peng.8lei@gmail.com>
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    • [SPARK-37712][YARN] Spark request yarn cluster metrics slow cause delay · df7447bc
      Committed by Angerszhuuuu
      ### What changes were proposed in this pull request?
      Spark requests the YARN cluster metrics only to log the number of NodeManagers. The log line is not important, and this RPC is always slow.
      ![image](https://user-images.githubusercontent.com/46485123/147055954-30698764-b313-419f-8759-772ad9f301ff.png)
      
      We can lower this log to debug level.
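      
      A minimal sketch of why moving the message to debug level avoids the delay, assuming the log message is built lazily so the RPC runs only when debug logging is enabled. It uses `java.util.logging` rather than Spark's logging trait, and `fetchNodeManagerCount` is a hypothetical stand-in for the slow metrics call.
      
      ```java
      import java.util.concurrent.atomic.AtomicInteger;
      import java.util.logging.Level;
      import java.util.logging.Logger;
      
      final class LazyDebugLogSketch {
          static final AtomicInteger rpcCalls = new AtomicInteger();
      
          /** Hypothetical stand-in for the slow cluster-metrics RPC. */
          static int fetchNodeManagerCount() {
              rpcCalls.incrementAndGet();
              return 42;
          }
      
          static void logClusterSize(Logger log) {
              // Supplier overload: the lambda, and the RPC inside it, runs only if FINE is loggable.
              log.fine(() -> "Requesting a new application from cluster with "
                      + fetchNodeManagerCount() + " NodeManagers");
          }
      
          public static void main(String[] args) {
              Logger log = Logger.getLogger("sketch");
              log.setLevel(Level.INFO);
              logClusterSize(log);
              System.out.println("RPC calls at INFO level: " + rpcCalls.get());
          }
      }
      ```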
      
      ### Why are the changes needed?
      Avoid an unnecessary delay when submitting an application.
      
      ### Does this PR introduce _any_ user-facing change?
      No
      
      ### How was this patch tested?
      Not needed.
      
      Closes #34982 from AngersZhuuuu/SPARK-37712.
      Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
      Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
    • [SPARK-37498][PYTHON] Add eventually for test_reuse_worker_of_parallelize_range · 732477b2
      Committed by Yikun Jiang
      ### What changes were proposed in this pull request?
      Add eventually for test_reuse_worker_of_parallelize_range
      
      ### Why are the changes needed?
      Avoid `test_reuse_worker_of_parallelize_range` becoming flaky when resources are tight or for some other reason.
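      
      The retry-until-timeout pattern behind an `eventually` helper can be sketched as follows. This is a generic illustration, not PySpark's test utility: the `Check` interface, the parameter names, and the timeout semantics are assumptions made for this example.
      
      ```java
      final class EventuallySketch {
          /** A check that signals failure by throwing. */
          interface Check {
              void run() throws Exception;
          }
      
          /** Re-runs the check until it passes or the timeout elapses. */
          static void eventually(Check check, long timeoutMs, long intervalMs) {
              long start = System.nanoTime();
              long timeoutNanos = timeoutMs * 1_000_000L;
              while (true) {
                  Throwable last;
                  try {
                      check.run();
                      return; // condition met
                  } catch (AssertionError | Exception e) {
                      last = e; // remember the most recent failure and retry
                  }
                  if (System.nanoTime() - start >= timeoutNanos) {
                      throw new AssertionError("condition not met within " + timeoutMs + " ms", last);
                  }
                  try {
                      Thread.sleep(intervalMs);
                  } catch (InterruptedException ie) {
                      Thread.currentThread().interrupt();
                      throw new AssertionError("interrupted while waiting", ie);
                  }
              }
          }
      
          public static void main(String[] args) {
              long begin = System.currentTimeMillis();
              // Passes once roughly 50 ms have elapsed; earlier attempts are retried.
              eventually(() -> {
                  if (System.currentTimeMillis() - begin < 50) throw new AssertionError("not yet");
              }, 2000, 10);
              System.out.println("condition met");
          }
      }
      ```
      
      Wrapping a timing-sensitive assertion this way tolerates transient failures while still failing the test if the condition never holds.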
      
      ### Does this PR introduce _any_ user-facing change?
      No
      
      ### How was this patch tested?
      UT passed.
      
      Closes #35228 from Yikun/SPARK-37498.
      Authored-by: Yikun Jiang <yikunkero@gmail.com>
      Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
  3. 17 Jan, 2022 4 commits
  4. 16 Jan, 2022 2 commits
  5. 15 Jan, 2022 8 commits
  6. 14 Jan, 2022 12 commits
  7. 13 Jan, 2022 7 commits