Class RDDAggregateUtils
- java.lang.Object
-
- org.apache.sysds.runtime.instructions.spark.utils.RDDAggregateUtils
-
public class RDDAggregateUtils extends Object
Collection of utility methods for aggregating binary block rdds. As a general policy always call stable algorithms which maintain corrections over blocks per key. The performance overhead over a simple reducebykey is roughly 7-10% and with that acceptable.
-
-
Constructor Summary
Constructors Constructor Description RDDAggregateUtils()
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, boolean deepCopyCombiner)static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, int numPartitions, boolean deepCopyCombiner)static MatrixBlockaggStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)Single block aggregation over pair rdds with corrections for numerical stability.static MatrixBlockaggStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in, AggregateOperator aop)Single block aggregation over rdds with corrections for numerical stability.static TensorBlockaggStableTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> in, AggregateOperator aop)Single block aggregation over pair rdds with corrections for numerical stability.static TensorBlockaggStableTensor(org.apache.spark.api.java.JavaRDD<TensorBlock> in, AggregateOperator aop)Single block aggregation over rdds with corrections for numerical stability.static doublemax(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)Merges disjoint data of all blocks per key.static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)Merges disjoint data of all blocks per key.static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)Merges disjoint data of all blocks per key.static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>mergeRowsByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,RowMatrixBlock> in)Merges disjoint data of all blocks per key.static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double>sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in)static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double>sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in, int numParts)static MatrixBlocksumStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)static MatrixBlocksumStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in)
-
-
-
Method Detail
-
sumStable
public static MatrixBlock sumStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
-
sumStable
public static MatrixBlock sumStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in)
-
sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
-
sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
-
sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
-
sumCellsByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in)
-
sumCellsByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in, int numParts)
-
aggStable
public static MatrixBlock aggStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
Single block aggregation over pair rdds with corrections for numerical stability.- Parameters:
in- matrix asJavaPairRDD<MatrixIndexes, MatrixBlock>aop- aggregate operator- Returns:
- matrix block
-
aggStable
public static MatrixBlock aggStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in, AggregateOperator aop)
Single block aggregation over rdds with corrections for numerical stability.- Parameters:
in- matrix asJavaRDD<MatrixBlock>aop- aggregate operator- Returns:
- matrix block
-
aggStableTensor
public static TensorBlock aggStableTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> in, AggregateOperator aop)
Single block aggregation over pair rdds with corrections for numerical stability.- Parameters:
in- tensor asJavaPairRDD<TensorIndexes, TensorBlock>aop- aggregate operator- Returns:
- tensor block
-
aggStableTensor
public static TensorBlock aggStableTensor(org.apache.spark.api.java.JavaRDD<TensorBlock> in, AggregateOperator aop)
Single block aggregation over rdds with corrections for numerical stability.- Parameters:
in- tensor asJavaRDD<TensorBlock>aop- aggregate operator- Returns:
- tensor block
-
aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
-
aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, boolean deepCopyCombiner)
-
aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, int numPartitions, boolean deepCopyCombiner)
-
max
public static double max(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
-
mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.- Parameters:
in- matrix asJavaPairRDD<MatrixIndexes, MatrixBlock>- Returns:
- matrix as
JavaPairRDD<MatrixIndexes, MatrixBlock>
-
mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.- Parameters:
in- matrix asJavaPairRDD<MatrixIndexes, MatrixBlock>deepCopyCombiner- indicator if the createCombiner functions needs to deep copy the input block- Returns:
- matrix as
JavaPairRDD<MatrixIndexes, MatrixBlock>
-
mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.- Parameters:
in- matrix asJavaPairRDD<MatrixIndexes, MatrixBlock>numPartitions- number of output partitionsdeepCopyCombiner- indicator if the createCombiner functions needs to deep copy the input block- Returns:
- matrix as
JavaPairRDD<MatrixIndexes, MatrixBlock>
-
mergeRowsByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeRowsByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,RowMatrixBlock> in)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.- Parameters:
in- matrix asJavaPairRDD<MatrixIndexes, RowMatrixBlock>- Returns:
- matrix as
JavaPairRDD<MatrixIndexes, MatrixBlock>
-
-