Data generator with multiple file output

We have a data generator for a KMeans benchmark and want to use it with the PEEL framework.  
The generator runs as a Flink job and produces two files, points and centers. We want to save these files in `<hdfs-root-directory>/kmeans` using the `GeneratedDataSet` class and then pick them up with the KMeans Flink job.

My question is: how can we configure PEEL to create the directory `kmeans` in HDFS and then copy the files to that directory? Our current configuration, shown below, does not work.

``` xml
    <!-- Data generator, run as a Flink job -->
    <bean id="datagen.kmeans" class="org.peelframework.flink.beans.job.FlinkJob">
        <constructor-arg name="runner" ref="flink-1.0.3"/>
        <constructor-arg name="command">
            <value><![CDATA[
              -v -c org.apache.flink.examples.java.clustering.util.KMeansDataGenerator \
              ${app.path.datagens}/KMeans.jar                                          \
              --points ${datagen.points}                                               \
              --k ${datagen.k}                                                         \
              --output ${system.hadoop-2.path.input}/kmeans
            ]]>
            </value>
        </constructor-arg>
    </bean>

    <!-- Generated data set produced by the job above; target is the kmeans directory in HDFS -->
    <bean id="dataset.kmeans.generated" class="org.peelframework.core.beans.data.GeneratedDataSet">
        <constructor-arg name="src" ref="datagen.kmeans"/>
        <constructor-arg name="dst" value="${system.hadoop-2.path.input}/kmeans"/>
        <constructor-arg name="fs" ref="hdfs-2.7.1"/>
    </bean>
```

Our data generator is used in much the same way as the `WordGenerator`, except that it produces two files instead of just one.

Do you have an idea how we could solve this problem with PEEL, or do we have to adjust our data generator?

Thanks!


Data generator with multiple file output #105
