
Submitted URL: https://towardsdatascience.com/what-if-your-data-is-not-normal-d7293f7b8f0

WHAT IF YOUR DATA IS NOT NORMAL?


IN THIS ARTICLE, WE DISCUSS CHEBYSHEV’S BOUND FOR STATISTICAL DATA ANALYSIS.
WHEN NOTHING IS KNOWN ABOUT THE NORMALITY OF A GIVEN DATA SET, THIS BOUND
CAN BE USED TO GAUGE THE CONCENTRATION OF DATA AROUND THE MEAN.

Tirthajyoti Sarkar

Published in Towards Data Science · 7 min read · Nov 2, 2018





INTRODUCTION

This is Halloween week, and in between the tricks and treats, we data geeks
are chuckling over this cute meme on social media.



You think this is a joke? Let me tell you, this is not a laughing matter. It is
scary, true to the spirit of Halloween!

> If we cannot assume that most of our data (of business, social, economic, or
> scientific origin) are at least approximately ‘Normal’ (i.e. they are
> generated by a Gaussian process or by a sum of multiple such processes), then
> we are doomed!

Here is an extremely brief list of things that will no longer be valid:

 * The whole concept of six-sigma
 * The famous 68–95–99.7 rule
 * The ‘holy’ concept of p=0.05 (which corresponds to roughly a 2-sigma
   interval) in statistical analysis

Scary enough? Let’s talk more about it…
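
To make the stakes concrete, here is a minimal sketch (not code from the article) that compares the coverage the 68–95–99.7 rule promises under a Normal assumption with the distribution-free guarantee given by Chebyshev's bound, 1 - 1/k^2, which holds for any distribution with finite variance. The function names are hypothetical and chosen only for this illustration.

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a Normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction within k standard deviations guaranteed by Chebyshev's
    inequality for ANY distribution with finite variance (trivial for k <= 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1.0 - 1.0 / k**2)


if __name__ == "__main__":
    print(" k   Normal coverage   Chebyshev guarantee")
    for k in (1, 2, 3):
        print(f"{k:>2}   {normal_coverage(k):>15.4f}   {chebyshev_lower_bound(k):>19.4f}")

    # Two-sided tail probability beyond 1.96 sigma under Normality:
    # this is where the conventional p = 0.05 threshold comes from.
    print(f"Tail beyond 1.96 sigma: {1 - normal_coverage(1.96):.4f}")
```

Under Normality, about 95% of the data falls within 2 sigma; without that assumption, Chebyshev's bound only guarantees 75%. That gap is why the items in the list above stop being valid when the data are not Normal.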


THE OMNIPOTENT AND OMNIPRESENT NORMAL DISTRIBUTION

Let’s keep this section short and sweet.

The Normal (Gaussian) distribution is the most widely known probability
distribution. Here are some links to articles describing its power and wide
applicability:

 * Why Data Scientists love Gaussian




 * How to Dominate the Statistics Portion of Your Data Science Interview
 * What’s So Important about the Normal Distribution?
