Productionizing Data Science

Collaboration is necessary to be successful with Big Data, and the right server facilitates collaboration.

by Tom Smith
Nov. 21, 18 · Opinion

It was great speaking with Michael Berthold, Founder and CEO of KNIME, during the company's fall summit. KNIME provides an open source analytics platform for data science. It lets developers, scientists, analysts, and business owners design and implement data science workflows, with added leverage from KNIME Integrations, KNIME Extensions, Community Extensions, and Partner Extensions.

According to Michael, multiple users working on the same projects need to share files, opinions, and current work, collaborating to build the best solution. A data science project rarely finishes with a trained model; the conclusive step is to deploy the model within a production application. Scalability in real-world applications is another concern. Finally, all workflows, models, metanodes, and the data produced within the group need access rights, monitoring, versioning, and management.

KNIME Server extends the power of KNIME Analytics Platform by improving the productivity of data science teams with collaboration, scalability, deployment, and management features, giving them more freedom and flexibility.

The server enables collaboration: users can build a workflow and get feedback by sharing it with team members, who can comment on, correct, tag, and rate it via the workflow hub. Users can access interactive overviews of the workflows, including the image, meta information, and the plugins the workflow requires. This is a great way to encourage discussion about data science problems among stakeholders. The workflow hub helps retain knowledge as team members come and go and promotes the reuse of successful workflows. The public workflow hub serves as a place for the KNIME community to share and rate workflows.

Database connections, logical groups of nodes, and complex Python, R, or other scripts can all be shared as metanode templates so that they are available to new users. To use one, simply drag and drop it into your workflow editor. This creates a read-only link to the original metanode template.

There are three deployment options with KNIME Server: 1) run via remote execution; 2) run via a web browser on the KNIME web portal; and 3) run as a REST API.
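As an illustration of the third option, a client might invoke a deployed workflow over HTTP. The sketch below only builds the request and does not send it. The server address, the workflow path, the URL layout, and the input parameter name are all placeholders assumed for illustration, not confirmed KNIME Server API details, so check your server's REST documentation for the real ones.

```python
import base64
import json
import urllib.request


def build_workflow_request(base_url, workflow_path, inputs, user, password):
    """Build (but do not send) a POST request that would run a workflow.

    The URL layout used here is a hypothetical placeholder, not a
    documented KNIME Server endpoint.
    """
    url = f"{base_url.rstrip('/')}/{workflow_path.strip('/')}"
    body = json.dumps(inputs).encode("utf-8")
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_workflow_request(
    "https://knime.example.com/rest",  # hypothetical server address
    "/teams/churn/score-customers",    # hypothetical workflow path
    {"customer_id": 42},               # hypothetical workflow input
    "analyst",
    "secret",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. Keeping construction separate from sending makes the call easy to inspect and test before pointing it at a real server.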

Workflows can be set to run on a schedule, as well as remotely. You can monitor your workflow and make changes to the configuration. This is useful when your local machine lacks the high-performance hardware required to process extremely large datasets, or the GPUs needed for deep learning applications.

Security and administration controls are part of the server. After deployment, you can set permissions and control access to workflows and data files to comply with data protection policies, internal business rules, and team processes. It's also possible to integrate authentication with corporate LDAP/Active Directory setups and manage permissions for groups and individual users.

For large deployments, keeping control of common, approved preferences, such as allowed database connections, proxy settings, and Python or R settings, can become difficult. The management features of the server make it easy to manage all client preferences.

When deploying or updating a workflow, you can keep track of changes by taking snapshots so you can roll back to a previous version if necessary. You can also identify subtle differences between workflows with the workflow difference functionality.
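To make the snapshot-and-rollback idea concrete, here is a minimal sketch in plain Python. It is not how KNIME Server implements versioning: the `SnapshotHistory` class is hypothetical, and a workflow is modeled as nothing more than a list of node names chosen for illustration.

```python
import difflib


class SnapshotHistory:
    """Keeps immutable copies of a workflow so earlier versions can be restored."""

    def __init__(self):
        self._snapshots = []  # list of (comment, tuple of node names)

    def snapshot(self, workflow, comment):
        """Store a copy of the workflow and return its version number."""
        self._snapshots.append((comment, tuple(workflow)))
        return len(self._snapshots) - 1

    def rollback(self, version):
        """Return a fresh copy of the workflow as it was at the given version."""
        return list(self._snapshots[version][1])

    def diff(self, old, new):
        """Return only the removed ("-") and added ("+") nodes between versions."""
        a, b = self._snapshots[old][1], self._snapshots[new][1]
        lines = difflib.unified_diff(a, b, f"v{old}", f"v{new}", lineterm="")
        return [
            line for line in lines
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
        ]


history = SnapshotHistory()
workflow = ["CSV Reader", "Row Filter", "Decision Tree Learner"]
v0 = history.snapshot(workflow, "initial version")

workflow[1] = "Missing Value"  # change one step of the workflow
v1 = history.snapshot(workflow, "swap filter for imputation")

changes = history.diff(v0, v1)   # what changed between the two versions
workflow = history.rollback(v0)  # restore the earlier version
print(changes, workflow)
```

Storing each snapshot as a tuple means later edits to the working list cannot silently alter history, which is the property that makes a rollback trustworthy.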

Finally, the server keeps a record of all jobs and enables you to scale by moving workflows to a distributed platform, the public cloud, or distributed executors of KNIME Server using RabbitMQ.
