Statistics for Rookies: Learn Data-Driven Decision-Making the Fun Way

Learn about the basics of inferential statistics and see what inferential statistics can do for you in terms of data-driven decision-making.

By Sunil Kappal · Jan. 09, 18 · Tutorial

This article is an excerpt from my forthcoming e-book, Statistics for Rookies: Learn Data-Driven Decision-Making the Fun Way.

Inferential Statistics: Sampling Techniques

Inferential statistics is the practice of generalizing the patterns and insights found in a sample to the overall population. To cement this definition in our minds, we first need to understand a few basics:

  1. Population (N)

  2. Sample population (n)

  3. Sampling techniques

    1. Random sampling

    2. Stratified sampling

    3. Distributions

Before we start talking about these basics, though, it's a good idea to understand what inferential statistics can do for us.

Inferential statistics helps us move from a simple guess to an educated guess. By applying various inferential analyses and tests, we can confirm whether what we guessed was right or wrong. These guesses are called hypotheses, which we will cover later.

Definition of population:

Population is the collection of all individuals or items under consideration in a statistical study.

Definition of sample population:

The sample is the part of the population from which information is collected.

Population vs. Sample

In statistics, we rely a lot on samples to draw inferences about the entire population. Inferential statistics provides a way to extend our conclusions from the sample to the population by inferring the parameters of the population from the statistics of the sample. (Two common parameters are the population mean, μ, and the population standard deviation, σ.)

I know this is getting a bit heavy, so let me put it this way: inferential statistics gives us a way to generalize the patterns observed in the sample data to the overall population, based on the inferential analyses and tests performed on that sample.
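To make this concrete, here is a minimal sketch in Python. It assumes a made-up population of 10,000 heights (the numbers and the helper name `sample_vs_population` are mine, purely for illustration) and shows how the sample statistics approximate the population parameters μ and σ.

```python
import random
import statistics


def sample_vs_population(population, n, seed=0):
    """Draw a random sample of size n and return the population
    parameters next to the matching sample statistics."""
    rng = random.Random(seed)            # seeded so the run is repeatable
    sample = rng.sample(population, n)   # sampling without replacement
    return {
        "mu": statistics.fmean(population),      # population mean
        "sigma": statistics.pstdev(population),  # population standard deviation
        "x_bar": statistics.fmean(sample),       # sample mean, estimates mu
        "s": statistics.stdev(sample),           # sample standard deviation, estimates sigma
    }


# Hypothetical population: 10,000 heights in cm (invented data).
_rng = random.Random(42)
population = [_rng.gauss(170, 10) for _ in range(10_000)]

result = sample_vs_population(population, n=200)
print(result)
```

With only 200 of the 10,000 values, the sample mean and standard deviation should land close to the population values, which is exactly the kind of inference described above.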

This section can be considered the most important part of my book. We will develop the basic intuition for picking the most appropriate sample. It is also worth mentioning that it is very important for a researcher to work with samples rather than with the entire population.

So, what are those sampling techniques?

There are a variety of sampling techniques available. As the theme of this book (statistics for rookies) suggests, I will focus on two main sampling techniques: random sampling and stratified sampling. For the brainiacs who adore the intricacies of statistics, though, I have created a hierarchical view of various sampling methods:

[Figure: hierarchical view of sampling methods]

The following table briefly describes various sampling methods with the associated pros and cons. Following that is an example based on survey data.

[Table: sampling methods with their pros and cons]

[Figure: example based on survey data]

A survey was conducted with 2,000 people from the population of a particular state. In this example, the population is everyone living in the state, the sample is the 2,000 people surveyed, and the sample size is n = 2,000.

Random Sample

In a purely random sample, every unit of the population has an equal chance of being selected, removing bias from the selection procedure. To conduct a random sample, first, a population and a target sample size are defined. Units of the population are then chosen at random.

Because the selection is random, the sample is assumed to be representative of the population, and the information collected can be used to develop inferences about the whole population.

Caveat: Conducting a truly random sample may be challenging if the population is large, dispersed, or hidden.
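As a rough sketch of the procedure described above, the snippet below draws a simple random sample with Python's standard `random` module. The population of 50,000 resident IDs and the function name `simple_random_sample` are assumptions made up for this illustration; only the sample size of 2,000 echoes the survey example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Return n units chosen at random, without replacement, so that
    every unit has the same chance of being selected."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: 50,000 resident IDs (invented for the example).
residents = [f"resident_{i}" for i in range(1, 50_001)]

sample = simple_random_sample(residents, n=2_000, seed=1)
print(len(sample))   # 2000
print(sample[:5])    # first few selected units
```

Note that this only works when you can actually list the whole population, which is the practical difficulty the caveat points to.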

Stratified Sample

Stratified sampling is a random sampling method in which you divide the members of a population into strata, or homogeneous subgroups, and then sample from each stratum. It allows identified subgroups in the defined population to be represented in the sample in the same proportion in which they exist in the population.

Steps to perform stratified sampling:

  1. Identify and define the population.

  2. Determine the desired sample size.

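  3. Identify the variables and subgroups (strata) for which you want to guarantee appropriate representation.

  4. Classify every member of the population into exactly one stratum.

  5. Randomly select the required number of individuals from each stratum.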

Its advantages include that it provides a more precise sample, that it can be used for both proportional and equal-sized samples from each stratum, and that the sample is guaranteed to represent the desired strata.
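Here is one way the proportional version of these steps could look in Python. The strata ("urban", "suburban", "rural"), their sizes, and the helper `stratified_sample` are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(units, stratum_of, n, seed=None):
    """Proportional stratified sample: each stratum contributes a share
    of n equal to its share of the population.

    Because of rounding, the total can differ slightly from n when the
    proportions do not divide evenly."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:                       # group units by stratum
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))  # proportional allocation
        sample.extend(rng.sample(members, k))     # random draw within the stratum
    return sample


# Hypothetical population of 1,000 people in three strata.
people = (
    [("urban", i) for i in range(600)]
    + [("suburban", i) for i in range(300)]
    + [("rural", i) for i in range(100)]
)

sample = stratified_sample(people, stratum_of=lambda p: p[0], n=100, seed=7)
counts = Counter(stratum for stratum, _ in sample)
print(counts)
```

A population that is 60% urban, 30% suburban, and 10% rural yields a sample of 60, 30, and 10 people respectively, so the sample mirrors the population's makeup.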

Cluster Sampling

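Cluster sampling can be defined as the process of randomly selecting intact groups, not individuals, within the defined population, where the members of each group share similar characteristics. When clusters are selected in successive stages (for example, districts first and then schools within those districts), it is called multistage sampling.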

Steps to perform cluster sampling:

  1. Identify and define the population.

  2. Determine the desired sample size.

  3. Identify and define a logical cluster.

  4. List all clusters that make up the population.

  5. Estimate the average number of population members per cluster.

  6. Determine the number of clusters needed by dividing the sample size by the estimated size of a cluster.

  7. Randomly select the needed number of clusters by using a table of random numbers.

  8. Include in your study all population members in each selected cluster.

Its advantages include that it is efficient, that you don’t need excessive details about the population members, and that it is very useful for educational research.
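The steps above can be sketched as follows. This is only an illustration under my own assumptions: the clusters are 20 invented schools of 30 students each, the helper name `cluster_sample` is made up, and a pseudo-random number generator stands in for the table of random numbers.

```python
import math
import random


def cluster_sample(clusters, desired_n, seed=None):
    """clusters maps a cluster name to the list of its members.
    Returns the chosen cluster names and every member of those clusters."""
    rng = random.Random(seed)
    # Step 5: estimate the average number of members per cluster.
    avg_size = sum(len(m) for m in clusters.values()) / len(clusters)
    # Step 6: clusters needed = sample size / estimated cluster size.
    k = min(len(clusters), math.ceil(desired_n / avg_size))
    # Step 7: randomly select that many clusters.
    chosen = rng.sample(sorted(clusters), k)
    # Step 8: include all members of each selected cluster.
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample


# Hypothetical population: 20 schools with 30 students each.
schools = {
    f"school_{s:02d}": [f"school_{s:02d}_student_{i:02d}" for i in range(30)]
    for s in range(20)
}

chosen, sample = cluster_sample(schools, desired_n=120, seed=3)
print(chosen)
print(len(sample))   # 120
```

Notice that the function never needs a list of individual students up front, only a list of clusters, which is where the efficiency comes from.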

Systematic Sampling

Systematic sampling is the process of selecting individuals within the defined population from a list by taking every Nth name. (Here, N is the sampling interval, not the population size.)

Steps to perform systematic sampling:

  1. Identify and define the population.

  2. Determine the desired sample size.

  3. Obtain a list of the population.

  4. Determine what N is equal to by dividing the size of the population by the desired sample size.

  5. Start at some random place in the population list. Close your eyes and point your finger to a name!

  6. Starting at that point, take every Nth name on the list until the desired sample size is reached.

  7. If you reach the end of the list before you reach the desired sample size, go back to the top of the list.

Its main advantage is that the sample selection process is simple.
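To show how simple the selection really is, here is a small sketch of the steps above. The list of 1,000 names and the function name `systematic_sample` are invented for the example, and a random number generator replaces the close-your-eyes-and-point step.

```python
import random


def systematic_sample(names, n, seed=None):
    """Take every interval-th name from a random starting point,
    wrapping around to the top of the list when the end is reached."""
    if not 0 < n <= len(names):
        raise ValueError("sample size must be between 1 and the list length")
    rng = random.Random(seed)
    interval = len(names) // n           # step 4: population size / sample size
    start = rng.randrange(len(names))    # step 5: random starting place
    # Steps 6 and 7: every interval-th name, wrapping with the modulo.
    return [names[(start + i * interval) % len(names)] for i in range(n)]


# Hypothetical population list of 1,000 names.
names = [f"name_{i:04d}" for i in range(1000)]

sample = systematic_sample(names, n=50, seed=5)
print(len(sample))   # 50
print(sample[:3])
```

With 1,000 names and a desired sample of 50, the interval works out to 20, so every 20th name after the starting point is selected.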

Conclusion

To conclude, the process of selecting a number of individuals for a study in such a way that the individuals represent the larger group from which they were selected is called sampling. The group of individuals selected for a study whose characteristics exemplify the larger group from which they are selected is called a sample. The larger group from which individuals are selected to participate in a study is called a population.

Stay tuned for more details on the full release of this e-book!
