We have a data generator for a KMeans benchmark and want to use it with the PEEL framework.
The generator runs as a Flink job and produces two files, points and centers. We want to save these files in <hdfs-root-directory>/kmeans using the GeneratedDataSet class and then pick them up with the KMeans Flink job.
My question is: how can we configure PEEL to create the directory kmeans in HDFS and then copy the files to that directory? Our current configuration, shown below, does not work.
<!--************************************************************************
* Data Generators
*************************************************************************-->
<bean id="datagen.kmeans" class="org.peelframework.flink.beans.job.FlinkJob">
    <constructor-arg name="runner" ref="flink-1.0.3"/>
    <constructor-arg name="command">
        <value><![CDATA[
          -v -c org.apache.flink.examples.java.clustering.util.KMeansDataGenerator \
          ${app.path.datagens}/KMeans.jar \
          --points ${datagen.points} \
          --k ${datagen.k} \
          --output ${system.hadoop-2.path.input}/kmeans
        ]]>
        </value>
    </constructor-arg>
</bean>

<!--************************************************************************
* Data Sets
*************************************************************************-->
<bean id="dataset.kmeans.generated" class="org.peelframework.core.beans.data.GeneratedDataSet">
    <constructor-arg name="src" ref="datagen.kmeans"/>
    <constructor-arg name="dst" value="${system.hadoop-2.path.input}/kmeans"/>
    <constructor-arg name="fs" ref="hdfs-2.7.1"/>
</bean>
Our data generator is used in much the same way as the WordGenerator, except that it produces two files instead of just one.
Do you have an idea how we could solve this problem with PEEL, or do we have to adjust our data generator?
Thanks!