Unverified commit b896d839, author: Hans Zeller, committer: GitHub

Correctly compute distribution policy in phase 1 of gpexpand (#555)

Consider a hash-distributed partitioned table foo. After the first stage
of gpexpand, the root table will have a hashed distribution policy, but
the leaf partitions will have random distribution. In such a case,
ORCA sets m_convert_hash_to_random to true in the table
descriptor. When the flag is true, we should treat the distribution
policy of the table as random during planning.

Note that this code will need to change when we plan to handle
gpexpand phase 1 in GPDB 6 and later. At that time, we won't be able
to treat these tables as random-partitioned anymore.

Co-authored-by: Sambitesh Dash <sdash@pivotal.io>
Co-authored-by: Hans Zeller <hzeller@pivotal.io>
Parent 1eec040b
......@@ -5,7 +5,7 @@ project(gpopt LANGUAGES CXX C)
set(CMAKE_CXX_STANDARD 98)
set(GPORCA_VERSION_MAJOR 3)
set(GPORCA_VERSION_MINOR 84)
set(GPORCA_VERSION_MINOR 85)
set(GPORCA_VERSION_PATCH 0)
set(GPORCA_VERSION_STRING "${GPORCA_VERSION_MAJOR}.${GPORCA_VERSION_MINOR}.${GPORCA_VERSION_PATCH}")
......
This diff is collapsed.
......@@ -340,7 +340,7 @@ explain insert into pt2 select * from r;
<dxl:ConstValue TypeMdid="0.16.1.0" Value="true"/>
</dxl:ResidualFilter>
<dxl:PropagationExpression>
<dxl:ConstValue TypeMdid="0.23.1.0" IsNull="true" IsByValue="true"/>
<dxl:ConstValue TypeMdid="0.23.1.0" IsNull="true"/>
</dxl:PropagationExpression>
<dxl:PrintableFilter>
<dxl:ConstValue TypeMdid="0.16.1.0" Value="true"/>
......
......@@ -340,7 +340,7 @@ explain insert into pt2 select * from r;
<dxl:ConstValue TypeMdid="0.16.1.0" Value="true"/>
</dxl:ResidualFilter>
<dxl:PropagationExpression>
<dxl:ConstValue TypeMdid="0.23.1.0" IsNull="true" IsByValue="true"/>
<dxl:ConstValue TypeMdid="0.23.1.0" IsNull="true"/>
</dxl:PropagationExpression>
<dxl:PrintableFilter>
<dxl:ConstValue TypeMdid="0.16.1.0" Value="true"/>
......
......@@ -193,7 +193,9 @@ namespace gpopt
return 0 < m_pdrgpulPart->Size();
}
// true iff a hash distributed table needs to be considered as random
// true iff a hash distributed table needs to be considered as random;
// this happens when we are in phase 1 of a gpexpand or (for GPDB 5X)
// when we have a mix of hash-distributed and random distributed partitions
BOOL ConvertHashToRandom() const
{
return m_convert_hash_to_random;
......
......@@ -71,6 +71,51 @@ CPhysicalDML::CPhysicalDML
NULL != pcrCtid && NULL != pcrSegmentId);
m_pds = CPhysical::PdsCompute(m_mp, m_ptabdesc, pdrgpcrSource);
if (CDistributionSpec::EdtHashed == m_pds->Edt() && ptabdesc->ConvertHashToRandom())
{
// The "convert hash to random" flag indicates that we have a table that was hash-partitioned
// originally but then we either entered phase 1 of a gpexpand or we altered some of the partitions
// to be randomly distributed (works on GPDB 5X only).
// If this is the case, we want to handle DMLs in the following way:
//
// Insert: Use a hash redistribution for the insert, that means that we insert the data into
// the random partitions using a hash function, which can still be considered "random"
// Delete: Use a "strict random" distribution, which will use a routed repartition operator,
// based on the gp_segment_id of the row, which will work for both hash and random partitions
// Update without updating the distribution key: Same method as for delete
// Update of the distribution key: This will be handled with a Split node below the DML node,
// with the split deleting the existing rows and this DML node inserting the new rows,
// so this is handled here like an insert, using hash distribution for all partitions.
BOOL is_update_without_changing_distribution_key = false;
if (CLogicalDML::EdmlUpdate == edmlop)
{
CDistributionSpecHashed *hashDistSpec = CDistributionSpecHashed::PdsConvert(m_pds);
CColRefSet *updatedCols = GPOS_NEW(mp) CColRefSet(mp);
CColRefSet *distributionCols = hashDistSpec->PcrsUsed(mp);
// compute a ColRefSet of the updated columns
for (ULONG c=0; c < pdrgpcrSource->Size(); c++)
{
if (pbsModified->Get(c))
{
updatedCols->Include((*pdrgpcrSource)[c]);
}
}
is_update_without_changing_distribution_key = !updatedCols->FIntersects(distributionCols);
updatedCols->Release();
distributionCols->Release();
}
if (CLogicalDML::EdmlDelete == edmlop || is_update_without_changing_distribution_key)
{
m_pds->Release();
m_pds = GPOS_NEW(mp) CDistributionSpecRandom();
}
}
m_pos = PosComputeRequired(mp, ptabdesc);
ComputeRequiredLocalColumns(mp);
}
......
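The decision described in the CPhysicalDML comment above can be sketched in isolation. This is a minimal standalone illustration, not ORCA code: the enums `DmlOp` and `DistrKind` and the functions `TouchesDistributionKey` and `ChooseDmlDistribution` are hypothetical stand-ins, and columns are modeled as plain strings rather than `CColRef` objects. It only mirrors the rule that a delete, or an update that leaves the distribution key untouched, switches a hashed spec to random when the "convert hash to random" flag is set.

```cpp
#include <set>
#include <string>

// Hypothetical stand-ins for ORCA's DML operator and distribution types.
enum DmlOp { DmlInsert, DmlDelete, DmlUpdate };
enum DistrKind { DistrHashed, DistrRandom };

// Returns true if any updated column is also a distribution-key column.
bool TouchesDistributionKey(const std::set<std::string> &updated_cols,
                            const std::set<std::string> &dist_cols)
{
    for (std::set<std::string>::const_iterator it = updated_cols.begin();
         it != updated_cols.end(); ++it)
    {
        if (dist_cols.count(*it) > 0)
        {
            return true;
        }
    }
    return false;
}

// Picks the distribution a DML node would use, following the rules in the
// comment: only a hashed spec on a table flagged "convert hash to random"
// is ever changed, and only for deletes and key-preserving updates.
DistrKind ChooseDmlDistribution(DmlOp op,
                                DistrKind computed,
                                bool convert_hash_to_random,
                                const std::set<std::string> &updated_cols,
                                const std::set<std::string> &dist_cols)
{
    if (computed != DistrHashed || !convert_hash_to_random)
    {
        return computed;
    }

    const bool update_keeps_key =
        (op == DmlUpdate) && !TouchesDistributionKey(updated_cols, dist_cols);

    if (op == DmlDelete || update_keeps_key)
    {
        // Rows are routed by their existing segment, which works for both
        // hash-distributed and randomly distributed partitions.
        return DistrRandom;
    }

    // Inserts, and updates of the distribution key (handled as
    // delete-plus-insert by a Split below the DML), keep hash distribution.
    return DistrHashed;
}
```

With a distribution key of `{a}`, an update that modifies only `b` would yield `DistrRandom`, while an update that modifies `a` would stay `DistrHashed`, matching the three new minidump test cases for delete, update without the distribution key, and update of the distribution key.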
......@@ -38,6 +38,9 @@ const CHAR *rgszDMLFileNames[] =
"../data/dxl/minidump/InsertRandomDistr.mdp",
"../data/dxl/minidump/InsertMismatchedDistrubution.mdp",
"../data/dxl/minidump/InsertMismatchedDistrubution-2.mdp",
"../data/dxl/minidump/DeleteMismatchedDistribution.mdp",
"../data/dxl/minidump/UpdateNoDistKeyMismatchedDistribution.mdp",
"../data/dxl/minidump/UpdateDistKeyMismatchedDistribution.mdp",
"../data/dxl/minidump/InsertConstTupleRandomDistribution.mdp",
"../data/dxl/minidump/InsertMasterOnlyTable.mdp",
"../data/dxl/minidump/InsertMasterOnlyTableConstTuple.mdp",
......