The `commons-math3` distributions used in the reference data generator in the archetypes are really slow.
During a local test of an experiment suite that I am working on with @ggevay, I observed the following numbers when generating `dataset.A` with a key cardinality of 100000:
- with `Uniform` key distribution, the job takes ~5 seconds
- with `Binomial` key distribution, the job takes ~25 seconds
- with `Zipfian` key distribution, the datagen job exceeded the allowed limit of 600 seconds.
The fix should be pushed to the peel-wordcount repository (see peelframework/peel-wordcount#1).
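
For illustration only, here is a minimal sketch of one way to make Zipfian key sampling cheap. It is not necessarily the fix tracked in peelframework/peel-wordcount#1. It assumes the cost is dominated by per-sample inverse-CDF work, which I have not profiled, and it replaces that with a cumulative table built once plus a binary search per sample. The class name `ZipfSampler` and its parameters are hypothetical and use only the JDK.

```java
import java.util.Random;

/**
 * Hypothetical Zipf sampler over keys 1..cardinality.
 * Builds the normalized cumulative distribution once (O(n) time and memory),
 * then draws each sample with a binary search (O(log n)).
 */
final class ZipfSampler {
    private final double[] cdf;
    private final Random rng;

    ZipfSampler(int cardinality, double exponent, long seed) {
        if (cardinality <= 0) {
            throw new IllegalArgumentException("cardinality must be positive");
        }
        cdf = new double[cardinality];
        double sum = 0.0;
        // Unnormalized weights 1 / k^exponent, accumulated into a running sum.
        for (int k = 1; k <= cardinality; k++) {
            sum += 1.0 / Math.pow(k, exponent);
            cdf[k - 1] = sum;
        }
        // Normalize so that the last entry is exactly 1.0.
        for (int i = 0; i < cardinality; i++) {
            cdf[i] /= sum;
        }
        cdf[cardinality - 1] = 1.0;
        rng = new Random(seed);
    }

    /** Returns a key in [1, cardinality]; smaller keys are more likely. */
    int sample() {
        double u = rng.nextDouble(); // uniform in [0, 1)
        int lo = 0;
        int hi = cdf.length - 1;
        // Find the first index whose cumulative probability is >= u.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] < u) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo + 1;
    }
}
```

For a key cardinality of 100000 the table holds 100000 doubles (about 800 KB), so the trade-off is a small amount of memory per generator instance in exchange for logarithmic sampling cost.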