We built a workflow to train a model. It runs fast enough on our local, perhaps not so powerful, machine. So far.
The data set is growing. Each month a considerable number of new records is added. Each month the training workflow becomes slower. Should we start thinking about scalability? Should we consider big data platforms? Could our neat and elegant KNIME workflow be replicated on a big data platform? Indeed it can.
The KNIME Big Data Extension offers nodes to build and configure workflows that run on the big data platform of your choice. The cool feature of the KNIME Big Data Extension is its node GUI. The configuration window of each Big Data node has been built to be as similar as possible to the configuration window of the corresponding KNIME node. For example, the configuration window of a Spark Joiner node looks essentially the same as the configuration window of a Joiner node.
Thus, it is not only possible to replicate your original workflow on a big data platform, it is also extremely easy, since you do not need to learn new scripting languages or tool-specific instructions. The KNIME Big Data Extension brings KNIME's ease of use to the scalability of big data.
This video shows how we replicated an existing classical analytics workflow on a big data platform.