Data Visualization and Descriptive Statistics

Getting Started with Data

Introduction to the world of data, its collection, structure and preparation for efficient analysis.

Data Visualization

The best way to start is to illustrate the data. Many stories can be revealed simply and quickly with appropriate charts.

Descriptive Statistics

Describing data sets with statistical indicators is the best primer before jumping into analytics.

Probability Laws

It is all about how likely what you observe could simply be the result of chance ... and nothing else!

Central Limit Theorem

Should you use your entire data set for analytics? What if, beyond a certain point, any sub-sample you pick at random generates the same results?
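
A minimal sketch of that idea in Python with NumPy (the skewed data and the sub-sample sizes are invented for illustration): the means of random sub-samples settle around the same value once the sub-samples are large enough.

    import numpy as np

    rng = np.random.default_rng(42)
    population = rng.exponential(scale=3.0, size=100_000)   # a skewed "complete" data set

    for n in (30, 300, 3000):
        # means of 200 random sub-samples of size n
        means = [rng.choice(population, size=n).mean() for _ in range(200)]
        print(n, round(np.mean(means), 3), round(np.std(means), 3))
    # the sub-sample means concentrate around the same value as n grows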

Estimations

When you deal with samples, it is imperative to estimate what the reality would be if you could handle the complete population data.

Data Analysis for "Professionals"

Hypothesis Testing

All our decisions are based on hypotheses. In data analysis, the sample's results confirm or reject the ones we claim to be true, and actions follow accordingly.

One Group Tests

Comparing results calculated over a sample to one or more standards and checking the main deviations is the starting point of efficient data analysis.
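
A hedged illustration in Python with SciPy (the standard value of 500 and the measurements are invented): a one-sample t-test compares a sample mean to a known standard.

    import numpy as np
    from scipy import stats

    sample = np.array([498, 502, 505, 497, 510, 503, 499, 508])  # invented measurements
    standard = 500                                               # the claimed standard

    t_stat, p_value = stats.ttest_1samp(sample, popmean=standard)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
    # a small p-value suggests the sample really deviates from the standard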

Two Group Tests

Two groups are compared on variables with different measurement units. Sorting the differences in one single chart is a state-of-the-art touch to add to your reports.

Multiple Groups Tests

Finding differences between multiple groups is not enough. The analysis should also highlight the pairs that caused such a difference ... or not!

Simple Linear Regression

Did you know that simple linear regression is about explaining a "quantitative" output with a "quantitative" input ...

Simple Logistic Regression

... while logistic regression does the same, but for a "qualitative" output?
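
A minimal sketch of that contrast, assuming scikit-learn and invented toy data: LinearRegression fits a quantitative output, LogisticRegression a qualitative (0/1) one.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    X = np.array([[1], [2], [3], [4], [5], [6]])         # one quantitative input

    y_quant = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])  # quantitative output
    print(LinearRegression().fit(X, y_quant).predict([[7]]))

    y_qual = np.array([0, 0, 0, 1, 1, 1])                # qualitative (binary) output
    print(LogisticRegression().fit(X, y_qual).predict([[7]]))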

Data Analysis for "Experts"

Dependent Samples

If you want to measure the effect of a factor on a population characteristic, running your experiment on dependent samples is better than on independent ones.

Non Parametric Tests

Did you know that most specialized data analysis techniques impose strict requirements on the data? Not so for Non Parametric tests :)

Power Analysis

What is the sample size required for reliable results? Power analysis will bring a detailed answer to your quest.

"Supervised" Machine Learning

Multiple Regressions

When explaining a "quantitative" output with one single "quantitative" input (simple regression) is not enough, make the inputs ... multiple!

Discriminant Analysis

If you are not satisfied with comparing two groups on different variables separately, simply use them all in one single shot with Discriminant Analysis!

Decision Trees

Frequently, you would like to separate your data sets into homogeneous groups to better describe their behaviors. Decision Trees are then your best choice.

Support Vector Machines

Many roads lead to Rome! The same goes for predicting (mainly) a "qualitative" output from different inputs. Though complex to apply, SVMs are powerful classifiers and ... estimators!

k Nearest Neighbors

If your nearest neighbor lives in a fancy house, you might or might not have the same standard of living. But if your 10 (k) nearest neighbors live in luxurious places, then most likely you are rich as well!
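
A small sketch of the same intuition with scikit-learn (the house prices, incomes, and labels are invented): the class of a new point is decided by a vote among its k nearest neighbors.

    from sklearn.neighbors import KNeighborsClassifier

    # invented data: [house price, neighborhood income] -> rich (1) / not rich (0)
    X = [[900, 120], [850, 110], [300, 40], [280, 35], [950, 130], [320, 45]]
    y = [1, 1, 0, 0, 1, 0]

    model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    print(model.predict([[880, 115]]))   # classified by the majority of its 3 neighbors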

Naive Bayes

For fans of prediction via probabilistic methods, you cannot be better served than with Naive Bayes.

"Unsupervised" Machine Learning

Principal Component Analysis

Did you know that you can visualize 4- or even 10-dimensional data in one single plane?

Yes you can!
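
A brief sketch with scikit-learn (the random 10-dimensional array stands in for real data): PCA projects the observations onto one single plane.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 10))    # 100 observations, 10 dimensions

    plane = PCA(n_components=2).fit_transform(data)
    print(plane.shape)                   # (100, 2): every observation now lives in one plane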

Multi Dimensional Scaling

A is like B, but B is different from C and similar to D.

If you cannot illustrate the similarities between these letters, MDS will do it! By the way, only experts can tell the difference between MDS and PCA.

Clustering Analysis

Those who are alike are put in the same groups. The resulting clusters are then homogeneous on the inside, but different from each other on the outside.
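
A minimal k-means sketch with scikit-learn (the 2-D points and the choice of three clusters are invented):

    from sklearn.cluster import KMeans

    points = [[1, 2], [1, 3], [2, 2],    # invented data with visible groups
              [8, 8], [9, 8], [8, 9],
              [1, 9], [2, 9], [1, 8]]

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)
    print(labels)                        # alike points receive the same cluster label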

Correspondence Analysis

Englishmen speak English, but English is a universal language! The Chinese language, however, is mostly exclusive to the Chinese, and vice versa. Correspondence Analysis will translate that into a simple map!

Quadrant Analysis

The concept is easy, but the content can span from simple information up to the most complex KPIs. It all depends on your knowledge and experience in data analytics!

"Reinforcement" Learning

Agent and Environment

Another way of learning is to set an agent free in an environment and let it explore the path to your ultimate objective.

Q-Learning

It summarizes the results of all the episodes explored by the agent, delivering a "learned" guidance matrix to reach the ultimate objective.
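
A tiny sketch of how that matrix is updated (the 3-state toy environment, the reward, and the learning parameters are all invented):

    import numpy as np

    n_states, n_actions = 3, 2
    Q = np.zeros((n_states, n_actions))   # the "learned" guidance matrix
    alpha, gamma = 0.1, 0.9               # learning rate and discount factor

    # one invented experience: in state 0, action 1 gives reward 5 and leads to state 2
    s, a, r, s_next = 0, 1, 5, 2
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    print(Q)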

Markov Decision Process

MDP is the "policy" finder that will allow the agent to optimize the reward during its quest for the ultimate objective.

Deep Learning

Artificial Neural Networks

Designed to think like humans, Artificial Neural Networks try to replicate human decision making as closely as they can, but at the speed of light.

Convolutional N.N

An advanced and more sophisticated version of ANN, CNNs are image recognition algorithms that revolutionized AI.

Recurrent N.N

Highly efficient in text mining, translation, and sentiment analysis, RNNs are specific Neural Networks that convey text memories through their hidden layers for ... text prediction.

Long Short-Term Memory

An enhancement of the RNN, LSTMs are empowered with a stronger memory of the past. Their accuracy is therefore higher, but at the price of added complexity.

Gated Recurrent Unit

A simplified version of the LSTM with fewer tensors inside the main cell. Its usage should be justified by proven advantages over its predecessor.

Natural Language Processing

Text Preparation

Text preparation is the prerequisite to all NLP algorithms. It is about cleaning the text of any confusing structure prior to analysis.

Sentiment Analysis

"This movie is like those I like most. But I didn't like it though!" Is my sentiment Positive or Negative?

Topic Modeling

Finding the most relevant topics in a text means highlighting key words and quantifying their importance in a model.

Bag Of Words & TF-IDF

When words taken apart might lead to confusion, several should be put in one "bag" and used together.
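
A brief sketch with scikit-learn (the two toy sentences are invented): CountVectorizer builds the bag of words, TfidfVectorizer weighs each word's importance.

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ["the movie was good", "the movie was not good at all"]

    bag = CountVectorizer().fit(docs)
    print(bag.get_feature_names_out())    # the shared vocabulary (the "bag")
    print(bag.transform(docs).toarray())  # raw word counts per document

    tfidf = TfidfVectorizer().fit_transform(docs)
    print(tfidf.toarray().round(2))       # the same counts re-weighted by TF-IDF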

Word2Vec

"Apple day keep doctor" ... "Away". Completing the sentence is possible with Word2Vec.

Big Data & Related

Big Data

An extension of BI, the Big Data program covers the complete tools and techniques related to the four layers Ingest - Store - Prepare - Serve, as well as the architectures behind a successful implementation.

Internet of Things

Millions of devices are connected to the web, generating trillions of actions. This workshop explains how the flow of information runs, covering IoT virtualization, containerization, protocols, and architecture best practices.

Cyber Security

With the growth of the technological ecosystem, especially on the web, infiltrating secure information is becoming an easy task. To counter the increasing daily hacking attacks, cyber security is becoming an unavoidable discipline for all types of companies.

Business Intelligence

It is the "must-have" knowledge prior to Big Data. This workshop covers the four classic layers of data management, starting with the ingestion of data and ending with analysis & visualization, with, in between, all about ETL and data warehousing.

Forecasting Methodologies

Trends

In the forecasting family, Trends are the most basic methods. Yet knowing them is essential to understand the logic behind more sophisticated and complex ones.

Averaging

Forecasting frequently depends on historical data, at least the most recent observations. Moving average methods are quite effective and easy to implement.
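
A minimal moving average forecast with pandas (the demand history is invented):

    import pandas as pd

    demand = pd.Series([120, 132, 128, 141, 150, 138, 160, 155])  # invented history

    ma3 = demand.rolling(window=3).mean()   # 3-period moving average
    print(ma3.iloc[-1])                     # the last average serves as the next forecast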

Exponential Smoothing

More sophisticated than Moving Averages, Exponential Smoothing algorithms take into account "trend" and "seasonality", whether their effect is exploding or vanishing.
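
A sketch of the simplest case in plain Python (the history and alpha = 0.3 are invented); the Holt and Holt-Winters variants add trend and seasonality terms on top of this same recursion.

    demand = [120, 132, 128, 141, 150, 138, 160, 155]   # invented history
    alpha = 0.3                                          # smoothing factor

    level = demand[0]
    for x in demand[1:]:
        level = alpha * x + (1 - alpha) * level          # blend the new actual with the previous estimate
    print(round(level, 1))                               # forecast for the next period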

Time Series

Time Series analysis breaks down the variability of sequential data into four components, facilitating the comprehension of their impact on near-future estimates.

ARIMA Models

Different from all other methods, ARIMA holds its specificity by accounting for previous estimates as well as for their incurred "errors"!
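
A minimal sketch with statsmodels (the series and the (1, 0, 1) order are assumptions): the AR part uses previous values, the MA part the previous errors.

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    series = pd.Series([120, 132, 128, 141, 150, 138, 160, 155, 162, 158])  # invented history

    model = ARIMA(series, order=(1, 0, 1)).fit()   # 1 lagged value, 1 lagged error
    print(model.forecast(steps=2))                 # the next two estimates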

Industrial Quality Control

SPC for "Measurements"

The proper behavior of processes is monitored with SPC charts. If the output is a quantitative characteristic, SPC for "measurements" is the one to implement.

SPC for "Attributes"

SPC for "attributes" applies to qualitative outputs. Both categories only tell you whether the process is under control, not necessarily whether it is within specs!

Process Capability Analysis

How can you make sure that all the production falls within the required specifications? Your process capability indicators should all be satisfactory.
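
A short worked sketch of the usual Cp and Cpk indicators (the specification limits and process figures are invented):

    lsl, usl = 9.0, 11.0                         # invented specification limits
    mu, sigma = 10.2, 0.25                       # invented process mean and standard deviation

    cp = (usl - lsl) / (6 * sigma)               # potential capability
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # capability accounting for centering
    print(round(cp, 2), round(cpk, 2))           # values above roughly 1.33 are usually considered satisfactory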

R&R Analysis

How can you be sure that an out-of-control process is really affected by an external factor? What if the measurements themselves are not controlled? R&R analysis will let you know.

Design Of Experiments

Increasing or decreasing an output depends on how you calibrate your inputs. DOE helps you find the combination that will reach the desired output.

AI in Manufacturing

Manufacturing, like other industries, is being invaded by AI tools and methods.

As an example, "digital twinning" might cut the costs of many daily mishaps.

Epidemiology and Healthcare

Measures in Epidemiology

Epidemiology has its own specific statistical measures. They all relate to epidemics, mortality, etc.

Studies in Epidemiology

Studies in epidemiology are grouped into four categories. Some are close to classic research (descriptive), but some have their very own specificity (etiological).

Measurement Properties

What if you are diagnosed positive when in fact you are not? To be confident, you should simply ask for the "False +" and "False -" rates of the adopted test.
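
A short worked example (the counts in the confusion table are invented) showing how the "False +" and "False -" rates are read off a test's results:

    # invented results of a diagnostic test against the true condition
    true_pos, false_neg = 90, 10     # people who really have the disease
    false_pos, true_neg = 40, 860    # people who do not

    false_pos_rate = false_pos / (false_pos + true_neg)   # healthy people flagged positive
    false_neg_rate = false_neg / (false_neg + true_pos)   # sick people missed by the test
    print(round(false_pos_rate, 3), round(false_neg_rate, 3))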

Diagnostic Tests Performance

Many indicators and illustrations exist to evaluate how reliable your predictions are. The ROC chart is one of the most straightforward: the more it stretches toward the upper left corner, the more reliable your test outputs are.

Survival Curves

Originally designed to track death rates through time, survival curves can be used in many other situations, even opposite to their primary objective: tracking health recovery!

Tools 1

Excel

Data Visualization, Descriptive Statistics, Data Analysis and Trends.  

SPSS

Descriptive Statistics, Data Analysis, Machine Learning and Forecasting. 

STATISTICA

Descriptive Statistics, Data Analysis, Machine Learning, Forecasting and Quality Control Measures.  

Tableau

Descriptive Statistics, Data Visualization and basics of Data Analysis and Forecasting.

Power BI

Data Visualization and Dashboarding.

Tools 2

Alteryx

Building analysis flow charts with a complete set of analysis tools.

SAS - Enterprise Guide

Descriptive Statistics, Data Analysis, Machine Learning and Forecasting.  

SAS - E Miner

Machine Learning and Advanced Predictive models.

RStudio

Descriptive Statistics, Data Visualization, Data Analysis, Machine Learning and Forecasting.

Python

Descriptive Statistics, Data Visualization, Data Analysis, Machine Learning and Forecasting.