DZone

The Latest Big Data Topics

Feature Store - Why Do You Need One?
How to set up a data architecture that saves your data scientists time and effort.
August 22, 2022
by Roger Oriol
· 4,598 Views · 1 Like
Data Pipeline Orchestration
In this article, I share an idea of what a data pipeline orchestration model might look like.
August 22, 2022
by Ivan Nikitsenka
· 6,356 Views · 2 Likes
Bootstrap K3S Data: For Beginners
Let’s look at how to create a well-configured initial cluster for managing bootstrap data, with a description of K3S and high-availability (HA) clusters.
August 21, 2022
by Hrittik Roy
· 6,942 Views · 2 Likes
Kubernetes: Persistent Disk or StatefulSet?
When and why to use Persistent Disk and StatefulSet with Kubernetes.
August 21, 2022
by Sylvain Kalache
· 4,504 Views · 1 Like
Optimal Transport and its Applications to Fairness
As fairness in AI becomes a growing area of focus across industries, data scientists should consider the value of optimal transport.
August 21, 2022
by Terrence Alsup
· 6,532 Views · 2 Likes
How to Debug an Unresponsive Elasticsearch Cluster
While highly scalable, Elasticsearch is complex to set up. Read on for a cheat sheet of common integration issues, what they mean, and how to solve them.
August 21, 2022
by Derric Gilling CORE
· 6,064 Views · 2 Likes
CHAR vs. VARCHAR: What's the Difference?
In this article, we are going to go through the similarities and differences between two MySQL data types: VARCHAR and CHAR.
August 21, 2022
by Lukas Vileikis
· 5,569 Views · 2 Likes
A Review of Popular IoT Communication Protocols
Every IoT communication protocol has a set of parameters that make it a success in one Internet of Things deployment and a total failure in another.
August 19, 2022
by Roman Lapa
· 5,811 Views · 4 Likes
How Pro-Coders and Low-Coders Can Find Common Ground
Pro-coders and low-coders can work together if each recognizes the unique talents and skills the other brings to enterprises.
August 19, 2022
by Amy Groden-Morrison
· 4,119 Views · 2 Likes
Processing of Streaming Data: Kappa vs Lambda Architectures
In today’s Big Data landscape, Lambda architecture is a new archetype for handling a vast amount of data. How does it compare to Kappa architecture?
August 19, 2022
by Gautam Goswami CORE
· 5,380 Views · 1 Like
Integrate Oracle Database With Apache Kafka Using Debezium
Oracle Databases are used for traditional enterprise applications and departmental systems in large enterprises. Debezium connector for Oracle is a great way to capture data changes from the transactional system of record and make them available for application use.
August 16, 2022
by Hugo Guerrero CORE
· 4,381 Views · 1 Like
On Some Aspects of Big Data Processing in Apache Spark, Part 2: Useful Design Patterns
In this post, learn to construct Spark applications in a maintainable and upgradable way, where at the same time "task not serializable" exceptions are avoided.
August 16, 2022
by Alexander Eleseev CORE
· 5,187 Views · 4 Likes
IoT + Cloud Growth = Greatest Cybersecurity Risk
Cybersecurity platforms help achieve full visibility, security, and control across every user on every device.
August 15, 2022
by Tom Smith CORE
· 6,012 Views · 1 Like
What Is Pydantic?
Pydantic can be used with any Python-based framework, and it supports native JSON encoding and decoding as well. Here, learn how simple it is to adopt Pydantic.
August 15, 2022
by Sameer Shukla CORE
· 7,744 Views · 3 Likes
“The Data of Experience” at SXSW by Yet Analytics and the Advanced Distributed Learning Initiative
South by Southwest (SXSW) Edu and Interactive is a big event. Big Data startup Yet Analytics and ADL will present their groundbreaking model of data technology at SXSW, which centers on the Experience API (xAPI).
August 13, 2022
by Allie Tscheulin
· 3,096 Views · 6 Likes
The Problem Is, This Jeeves Can’t Think
Learn about the risks of driverless cars and how to future-proof the driverless experience.
August 13, 2022
by Yashodeep Sengupta
· 3,580 Views · 3 Likes
The 5 Most Promising Frameworks of the First Half of 2016
An in-depth look at what distinguishes some newer JavaScript frameworks like Polymer, Aurelia, Meteor, Webix, and React.
August 13, 2022
by Ivan Petrenko
· 41,597 Views · 37 Likes
APIs Are the Backbone of New IoT Standards
Here is a breakdown of how APIs impact IoT and what the World Wide Web Consortium, among others, is doing to make IoT communication broader.
August 12, 2022
by Jennifer Riggins
· 7,349 Views · 1 Like
Does Datameer Support a Full Big Data Analysis Process?
Over the last few days, I had the chance to test the Datameer Analytics Solution (DAS). DAS is a platform for Hadoop that includes data source integration, an analytics engine, and visualization functionality. This promise of a fully integrated big data analysis process motivated me to test the product. It includes all the required functionality for data management or ETL, it provides standard tools to analyze data, and there are nice ways to build visualization dashboards. For example, connectors are available for Twitter, IMAP, HDFS, and FTP. All menus and processes are self-explanatory, and the complete interface is strongly Excel- or spreadsheet-oriented. If you are familiar with Excel, you can analyze your big data out of the box.

To keep on-the-fly analysis fast, you work with only a subset of your data, and the analyses you store are automatically transformed into a kind of procedure. In the end, or according to a schedule you set, you “run” the analyses on your big data: DAS collects the latest data for you, creates MapReduce jobs in the background, and updates all your spreadsheets and visualizations. To close the analysis circle, you can use the connectors to write your results back to HDFS, to a database such as HBase, or to many other technologies.

Analytics the Way You Need Them

Datameer can prove useful in many ways. It is likely a good way to display a large amount of data in a format that is digestible and useful to you. As individual human beings, we can only absorb so much data, but we can use the information we receive to make important decisions about our business, our products, and the future experiences of our customers. It makes sense that many people want to use Datameer for this. If you have seen all of your data on a big spreadsheet and been able to visualize how it comes together, then you know that Datameer and similar products can help you get the results you need.

DAS is really designed for big data. If you test it with small data, you will be frustrated by the performance, because the overhead of creating MapReduce jobs dominates in this situation. But as soon as you start with real big data analyses, this overhead becomes negligible and DAS takes over a lot of your programming work.

My Test Infrastructure

The following figure provides a nice overview of the Datameer infrastructure. DAS supports many data sources, it runs on all Hadoop distributions, it provides a REST API, and you can add plugins as connectors for other modeling languages such as R (#rstats).

I tested DAS version 3.1.2 running on our MapR Hadoop cluster version 3.0.2. After I got the latest package version from Datameer support, the installation was straightforward and it worked out of the box. Thanks to Datameer for providing a full test license.

There are several online tutorials and videos available, and there are some tutorial apps. Apps are another great feature of Datameer. You can download Datameer apps, which include connectors, workbooks, and visualizations for different analysis examples. You can also create your own app from your analyses and share it with your colleagues or the community.

My Test Data And Analyses

I tested DAS with the famous “airline on-time performance” data set, consisting of flight arrival and departure details for all commercial flights within the USA from October 1987 to April 2008. I downloaded all the data (including supplements) to MapR FS, created connectors for the data, and imported the data into a workbook.

In the workbook, I tested many classical statistical counting analyses:

  • Grouping by airport and counting the number of flights
  • Grouping by airline and calculating different statistics, such as mean values for the air time
  • Using joins to add additional information, like the airline name, to the airline identifier
  • Sorting to extract the most interesting airports depending on different measures

I am not an Excel expert, so it took me some time to get used to this low-level process of doing analysis on spreadsheets. But in the end, it is a very intuitive process of creating analyses. Every new analysis is available in a new tab in your workbook. There are several nice functionalities to support your work. For example, there is a “sheet dependencies” overview, which provides information about the dependencies between sheets.

Apart from the classical analyses, DAS provides some data mining functionality called “smart analytics”. So far, it covers k-means clustering, decision trees, column dependencies, and recommendations. It works out of the box but is not yet at a level that is satisfying for real analyses. For example, for k-means clustering there is no support for choosing the right number of clusters (k), and you cannot switch between different distance functions (the default is Euclidean distance).

Finally, I visualized all my results in a nice “infographic”. There are many different visualization tools and parameters available. After playing around with the settings, you can create a nice dashboard and share it with your colleagues.

Please be aware that the complete data set is about 5 GB. Importing the data set took about 30 minutes, and running the workbook took more than 3 hours in my case. In the end, I split my analyses into several workbooks to make them more feasible to run.

Summary

It was easy to get started with the Datameer Analytics Solution (DAS). It is definitely a great tool for doing big data analyses without any detailed Hadoop or big data knowledge. Furthermore, it covers many use cases and provides all the required functionality for your daily analysis process. However, as soon as your analyses get more complex, the limitations of Datameer become apparent, and you will probably look for a more powerful toolset or start implementing your big data analyses directly on Hadoop.

Finally, Datameer supports many steps in the big data analysis process, it works efficiently, and the usability is straightforward. But big data is more than ETL, data analysis, and visualizing the results. You should never forget to think about your use case and the business value that you want to extract from your data. In the end, this is what should guide you in choosing the tools and/or implementations to use.
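For readers who want to try the classical counting analyses from the test outside a spreadsheet, here is a minimal sketch in plain Python. It is not how Datameer runs them (the article says it generates MapReduce jobs in the background); the sample rows, the field names (`origin`, `carrier`, `air_time`), the carrier lookup table, and all function names are assumptions made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sample rows; field names and values are assumptions,
# not taken from the real data set.
flights = [
    {"origin": "ORD", "carrier": "AA", "air_time": 120},
    {"origin": "ORD", "carrier": "UA", "air_time": 95},
    {"origin": "ATL", "carrier": "DL", "air_time": 80},
    {"origin": "ORD", "carrier": "AA", "air_time": 100},
    {"origin": "ATL", "carrier": "DL", "air_time": 60},
    {"origin": "SFO", "carrier": "UA", "air_time": 105},
]

# Hypothetical lookup table used for the join step.
carrier_names = {"AA": "American Airlines", "UA": "United Air Lines", "DL": "Delta Air Lines"}


def flights_per_airport(rows):
    """Group by origin airport and count the flights."""
    return Counter(row["origin"] for row in rows)


def mean_air_time_per_carrier(rows):
    """Group by carrier code and compute the mean air time."""
    times = defaultdict(list)
    for row in rows:
        times[row["carrier"]].append(row["air_time"])
    return {code: mean(values) for code, values in times.items()}


def with_carrier_names(stats, names):
    """Join: replace each carrier code with its name, keeping the code if unknown."""
    return {names.get(code, code): value for code, value in stats.items()}


def busiest_airports(rows, n):
    """Sort airports by flight count and keep the top n."""
    return flights_per_airport(rows).most_common(n)


if __name__ == "__main__":
    print(busiest_airports(flights, 2))
    print(with_carrier_names(mean_air_time_per_carrier(flights), carrier_names))
```

Each function corresponds to one of the four analysis types listed in the test description (group and count, group and aggregate, join, sort), so the results can be compared step by step with what a workbook tab would show.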
August 12, 2022
by Comsysto Gmbh
· 8,505 Views · 1 Like
2016: The Year in Big Data
It was a big year for Big Data with new advances in tools, an expanded focus on IoT, and new ways of ingesting and manipulating data.
August 12, 2022
by Tim Spann CORE
· 10,199 Views · 10 Likes