ADAPTIVE HANDLING OF SKEW FOR DISTRIBUTED JOINS IN A CLUSTER
Abstract
Techniques are disclosed for detecting data skew while performing a distributed join operation on tables in a cluster of nodes managed by a database management system (cDBMS). In an embodiment, heavy hitter values in a join column of a table are determined at runtime, during a distributed join of that table with another table. The cDBMS keeps in a datastore a count for each unique value read from the join column of the table. The datastore may be a hash table with the unique values serving as keys, and may additionally include a heap or a sorted array for efficient count-based traversal. When the count for a particular value in the datastore exceeds a threshold, that value is identified as a heavy hitter value. The tuples of the table that contain the heavy hitter value are kept local at the node to which they were originally distributed, while the tuples of the other joined table are broadcast to one or more nodes of the cDBMS, which include at least the nodes to which the heavy hitter tuples were originally distributed.
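The counting and routing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `HeavyHitterDetector`, `route_skewed_side`, and `route_other_side` are hypothetical, and the sketch assumes that non-skewed tuples are hash-partitioned on the join key and that broadcasting goes to every node (the abstract only requires a set of nodes that includes the originally distributed ones). Nodes are simulated as integer ids.

```python
from collections import Counter


class HeavyHitterDetector:
    """Counts join-key values in a hash table and flags heavy hitters."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed tunable; not specified in the abstract
        self.counts = Counter()      # hash table: unique value -> count
        self.heavy = set()           # values whose count exceeded the threshold

    def observe(self, value):
        """Record one occurrence; return True if the value is a heavy hitter."""
        self.counts[value] += 1
        if self.counts[value] > self.threshold:
            self.heavy.add(value)
        return value in self.heavy


def route_skewed_side(rows, key, node_id, num_nodes, detector):
    """Return (destination_node, row) pairs for the table being scanned.

    Rows carrying a heavy hitter value stay on the scanning node; all
    other rows are hash-partitioned on the join key (an assumption).
    """
    routed = []
    for row in rows:
        value = row[key]
        if detector.observe(value):
            routed.append((node_id, row))                  # keep local
        else:
            routed.append((hash(value) % num_nodes, row))  # normal shuffle
    return routed


def route_other_side(rows, key, num_nodes, heavy):
    """Return (destination_node, row) pairs for the other joined table.

    Rows matching a heavy hitter value are broadcast to every node
    (a simplification); the rest are hash-partitioned as usual.
    """
    routed = []
    for row in rows:
        value = row[key]
        if value in heavy:
            routed.extend((n, row) for n in range(num_nodes))  # broadcast
        else:
            routed.append((hash(value) % num_nodes, row))
    return routed


if __name__ == "__main__":
    det = HeavyHitterDetector(threshold=2)
    left = [{"k": 7}, {"k": 7}, {"k": 7}, {"k": 3}]
    print(route_skewed_side(left, "k", node_id=0, num_nodes=4, detector=det))
    print(route_other_side([{"k": 7}, {"k": 3}], "k", 4, det.heavy))
```

In this sketch, rows seen before a value crosses the threshold have already been hash-partitioned, so broadcasting the matching rows of the other table to all nodes still lets every copy of the heavy hitter value find its join partners. A count-ordered traversal, which the abstract attributes to a heap or sorted array, could be approximated here with `Counter.most_common`.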