Data Visualization and Descriptive Statistics

Getting Started with Data

An introduction to the world of data: its collection, structure, and preparation for efficient analysis.

Data Visualization

The best way to start is to illustrate your data. Many stories can be revealed simply and quickly with the appropriate charts.

Descriptive Statistics

Describing data sets with statistical indicators is the best primer before jumping into analytics.

Probability Laws

It is all about how likely what you observe could be simply the result of chance ... and nothing else!

Central Limit Theorem

Should you use your entire data set for analytics? What if, beyond a certain point, any sub-sample you pick at random generates essentially the same results?
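As a quick illustration of this idea, here is a minimal sketch in pure Python: the skewed "population" and sample sizes are invented for the demo, yet the means of random sub-samples cluster tightly around the population mean, as the Central Limit Theorem predicts.

```python
import random
import statistics

random.seed(42)

# A skewed, exponential-like "population" (hypothetical data).
population = [random.expovariate(1.0) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Means of many random sub-samples cluster around the population mean,
# and their distribution approaches a normal curve as sample size grows.
sample_means = [
    statistics.mean(random.sample(population, 500)) for _ in range(200)
]
spread = statistics.stdev(sample_means)
```

With samples of 500, the spread of the sample means is already tiny compared to the spread of the raw data.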

Estimations

image73

When you deal with samples, it is imperative to make sure what could be the reality if you were capable to handle the complete and possible data.

Data Analysis for "Professionals"

Hypothesis Testing

All our decisions are based on hypotheses. In data analysis, a sample's results should confirm or reject the ones we claim to be true, and actions will follow accordingly.
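A minimal sketch of the idea, assuming the simplest case: a two-sided one-sample z-test with a known population standard deviation (the sample values, claimed mean, and sigma below are invented for the demo).

```python
import math
import statistics

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mean == mu0, with known population sigma."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF via the error function; two-sided p-value.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical data: does this sample support a claimed mean of 50?
sample = [52.1, 49.8, 53.4, 51.0, 50.6, 52.8, 51.9, 50.2]
z, p = one_sample_z_test(sample, mu0=50, sigma=1.5)
reject_h0 = p < 0.05
```

If the p-value falls below the chosen significance level, the hypothesis is rejected and action follows.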

One Group Tests

Comparing results calculated over a single sample against established standards, and checking for significant deviations, is the start of efficient data analysis.

Two Group Tests

Two groups are compared on variables with different measurement units. Summarizing the significant differences in one single chart is a state-of-the-art addition to your reports.

Multiple Group Tests

Finding out that multiple groups differ is not enough. The analysis should highlight the pairs that caused the difference ... or not!

Simple Linear Regression

Did you know that simple linear regression is about explaining a "quantitative" output with a "quantitative" input ...
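The closed-form least-squares fit behind simple linear regression is short enough to write out; the five data points below are invented and lie exactly on y = 2x + 1.

```python
import statistics

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = a + b * x (closed-form solution)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
        sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_simple_linear_regression(xs, ys)
```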

Simple Logistic Regression

... while logistic regression does the same, but for a "qualitative" output?
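A minimal sketch of single-input logistic regression, fitted by plain batch gradient descent (the data, learning rate, and epoch count are invented for the demo):

```python
import math

def fit_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Single-input logistic regression fitted by gradient descent."""
    a, b = 0.0, 0.0  # intercept and slope
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(a + b * x)))  # predicted probability
            grad_a += (p - y) / n
            grad_b += (p - y) * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical data: the qualitative output flips from 0 to 1 as x grows.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]
a, b = fit_logistic_regression(xs, ys)
predict = lambda x: 1 / (1 + math.exp(-(a + b * x)))
```

The fitted curve outputs a probability between 0 and 1 instead of a raw quantity, which is exactly the "qualitative" twist.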

Data Analysis for "Experts"

Dependent Samples

If you want to measure the effect of a factor on a population characteristic, running your experiment on dependent (paired) samples is often better than on independent ones.

Non-Parametric Tests

Did you know that most classic data analysis techniques impose strict requirements on the data? Not non-parametric tests :)

Power Analysis

What sample size is required for reliable results? Power analysis will bring a detailed answer to your quest.

"Supervised" Machine Learning

Multiple Regressions

When explaining a "quantitative" output with one single "quantitative" input (simple regression) is not enough, make the inputs ... multiple!

Discriminant Analysis

If you are not satisfied with comparing two groups on different variables separately, simply use them all in one single shot with Discriminant Analysis!

Decision Trees

Frequently, you would like to split your data set into homogeneous groups to better describe their behavior. Decision Trees are then your best choice.

Support Vector Machines

Many roads lead to Rome! The same goes for predicting a (mainly) "qualitative" output from different inputs. Though complex to apply, SVMs are powerful classifiers and ... estimators!

k Nearest Neighbors

If your nearest neighbor lives in a fancy house, you might or might not have the same standard of living. But if your 10 (k) nearest neighbors all live in luxurious places, then most likely you are rich as well!
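The neighborly intuition above translates directly into code. A minimal k-NN classifier in pure Python, with invented (house size, income) points:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: (house_size, income) points labeled by life standard.
train = [
    ((50, 30), "modest"), ((60, 35), "modest"), ((55, 28), "modest"),
    ((200, 150), "wealthy"), ((220, 170), "wealthy"), ((180, 160), "wealthy"),
]
label = knn_predict(train, (210, 155), k=3)
```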

Naive Bayes

For fans of prediction via probabilistic methods, you cannot be better served than by Naive Bayes.

"Unsupervised" Machine Learning

Principal Component Analysis

Did you know that you can visualize 4- or even 10-dimensional data in one single plane?

Yes you can!

Multi Dimensional Scaling

A is like B, but B is different from C, similar to D.

If you cannot illustrate the similarities between those letters, MDS will do it! By the way, only experts can tell the difference between MDS and PCA.

Clustering Analysis

Those who are alike are put in the same group. The resulting clusters are then homogeneous on the inside, but different from each other on the outside.
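A minimal k-means sketch in pure Python shows the "alike together" principle; the six 2-D points are invented (real cluster analysis would normally use a library such as scikit-learn):

```python
import math
import random

def k_means(points, k, iters=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster emptied out
                centroids[i] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, clusters

# Hypothetical data: two obvious groups around (0, 0) and (10, 10).
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, clusters = k_means(points, k=2)
```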

Correspondence Analysis

Englishmen speak English, but English is a universal language! Chinese, however, is almost exclusive to the Chinese, and vice versa. Correspondence Analysis will translate such associations into a simple map!

Quadrant Analysis

The concept is easy, but the content can span from simple information up to the most complex KPIs. It depends on your knowledge and experience in data analytics!

"Reinforcement" Learning

Agent and Environment

Another way of learning is to set an agent free in an environment and let it explore the path to your ultimate objective.

Q-Learning

Q-Learning summarizes the results of all the episodes explored by the agent, delivering a "learned" guidance matrix for reaching the ultimate objective.
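A minimal sketch of the Q-learning update on an invented toy world: a 5-state chain where the agent must walk right to reach the goal (states, rewards, and hyperparameters are all hypothetical).

```python
import random

# A tiny chain world: states 0..4, goal at state 4 (hypothetical setup).
# Actions: 0 = step left, 1 = step right. Reward 1 only on reaching the goal.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

q = [[0.0, 0.0] for _ in range(N_STATES)]  # the "guidance matrix"
rng = random.Random(0)

for _ in range(500):  # episodes
    state = 0
    while state != GOAL:
        # Epsilon-greedy: explore sometimes, otherwise act greedily.
        if rng.random() < EPSILON:
            action = rng.choice([0, 1])
        else:
            action = max((0, 1), key=lambda a: q[state][a])
        nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
        reward = 1.0 if nxt == GOAL else 0.0
        # The Q-learning update rule.
        q[state][action] += ALPHA * (
            reward + GAMMA * max(q[nxt]) - q[state][action]
        )
        state = nxt

greedy_action_at_start = max((0, 1), key=lambda a: q[0][a])
```

After training, reading the matrix greedily at every state traces the learned path to the objective.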

Markov Decision Process

The MDP is the formal framework in which the agent searches for the "policy" that optimizes its reward during the quest for the ultimate objective.

Deep Learning

Artificial Neural Networks

Designed to think like humans, Artificial Neural Networks try to replicate human decision making as closely as they can, but at the speed of light.

Convolutional N.N

A more advanced and sophisticated version of the ANN, CNNs are the image recognition algorithms that revolutionized AI.

Recurrent N.N

Highly efficient in text mining, translation, and sentiment analysis, RNNs are a specific kind of Neural Network that carries memories of the text through its hidden layers for ... text prediction.

Long Short-Term Memory

An enhancement of the RNN, LSTMs are empowered with a stronger memory of the past. Their accuracy is therefore higher, but at the price of added complexity.

Gated Recurrent Unit

A simplified version of the LSTM, with fewer tensors inside the main cell. Its usage should be justified by proven advantages over its predecessor.

Natural Language Processing

Text Preparation

Text preparation is the prerequisite to all NLP algorithms. It is about cleaning the text of any confusing structure prior to analysis.

Sentiment Analysis

"This movie is like those I like most. But I didn't like it though!" Is my sentiment Positive or Negative?

Topic Modeling

To find the most relevant topics in a text, highlight its key words and quantify their importance in a model.

Bag of Words & TF-IDF

When words taken apart might lead to confusion, several should be put in one "bag" and used together.
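A minimal TF-IDF sketch over an invented three-document corpus: a word that appears in every document ("the") scores zero, while a distinctive word ("cat") scores high.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Term frequency x inverse document frequency for a list of texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical mini-corpus: "cat" is distinctive, "the" appears everywhere.
docs = ["the cat sat", "the dog ran", "the dog slept"]
scores = tf_idf(docs)
```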

Word2Vec

"Apple day keep doctor" ... "away". Completing the sentence is possible with Word2Vec.

Big Data & Related

Big Data

image103

    Extension of the BI, the Big Data program covers the complete tools and techniques related to the Ingest - Store - Prepare – Serve four layers, as well as the architectures behind a successful implementation. 

Internet of Things

Millions of devices are connected to the web, generating trillions of actions. This workshop covers how the flow of information runs, addressing IoT virtualization, containerization, protocols, and architecture best practices.

Cyber Security

With the growth of the technological ecosystem, specifically on the web, infiltrating secure information is becoming an easy task. To counter the increasing daily hacking attacks, cyber security is becoming an unavoidable discipline for all types of companies.

Business Intelligence

It is the "must-have" knowledge prior to Big Data. This workshop covers the four classic layers of data management, starting with the ingestion of data and ending with analysis & visualization, with everything about ETL and data warehousing in between.

Forecasting Methodologies

Trends

In the forecasting series, Trends are the most basic methods. Yet knowing them is essential to understanding the logic behind the more sophisticated and complex ones.

Averaging

Forecasting frequently depends on historical data, at least the most recent observations. Moving average methods are quite effective and easy to implement.
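The method fits in a few lines; the monthly sales figures below are invented for the demo.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

# Hypothetical monthly sales figures.
sales = [120, 130, 125, 140, 135, 145]
next_month = moving_average_forecast(sales, window=3)  # (140 + 135 + 145) / 3
```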

Exponential Smoothing

More sophisticated than Moving Averages, Exponential Smoothing algorithms take "trend" and "seasonality" into account, whether their effect is exploding or vanishing.
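In its simplest form (no trend or seasonality terms yet), exponential smoothing just gives recent observations a heavier weight; the series and alpha below are invented for the demo.

```python
def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing: recent points weigh more as alpha grows."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level  # forecast for the next period

# Hypothetical monthly sales figures.
sales = [120, 130, 125, 140, 135, 145]
forecast = exponential_smoothing(sales, alpha=0.5)
```

Holt and Holt-Winters extend this same recursion with explicit trend and seasonality components.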

Time Series

Time Series analysis allows you to break down the variability of sequential data into four components, making it easier to understand their impact when estimating the near future.

ARIMA Models

Different from all the other methods, ARIMA holds its specificity by accounting for previous estimations as well as for their incurred "errors"!

Industrial Quality Control

SPC for "Measurements"

The proper behavior of processes is monitored with SPC charts. If the output is a quantitative characteristic, SPC charts for "measurements" are the ones to implement.

SPC for "Attributes"

SPC charts for "attributes" apply to qualitative outputs. Both categories only tell you whether production is under control, not necessarily within specs!

Process Capability Analysis

How can you make sure that all of the production falls within the required specifications? Your process capability indicators should all be satisfactory.
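The two classic indicators, Cp and Cpk, follow directly from the sample statistics and the spec limits; the measurements and limits below are invented for the demo.

```python
import statistics

def capability_indices(samples, lsl, usl):
    """Cp and Cpk from sample mean/stdev and spec limits (common formulas)."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    cp = (usl - lsl) / (6 * sigma)          # spec width vs process spread
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # also penalizes off-centering
    return cp, cpk

# Hypothetical measurements with specification limits 9.0 .. 11.0.
samples = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]
cp, cpk = capability_indices(samples, lsl=9.0, usl=11.0)
capable = cpk >= 1.33  # a common rule of thumb
```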

R&R Analysis

How can you be sure that an out-of-control process is really affected by an external factor? What if the measurements themselves are not under control? An R&R study will let you know.

Design Of Experiments

Does increasing or decreasing an output depend on how you calibrate your inputs? DOE helps you find the combination that will make you reach the desired output.

AI in Manufacturing

Manufacturing, like other industries, is being invaded by AI tools and methods.

As an example, "digital twinning" can cut the costs of many daily mishaps.

Epidemiology and Healthcare

Measures in Epidemiology

Epidemiology has its own specific statistical measures. They all relate to epidemics, mortality, etc.

Studies in Epidemiology

Studies in epidemiology are grouped into four categories. Some are close to classic research designs (descriptive), but some have their very own specificity (etiological).

Measurement Properties

What if you are diagnosed positive when in fact you are not? To be confident, you should simply ask for the "False +" and "False -" rates of the adopted test.

Diagnostic Tests Performance

Many indicators and illustrations exist to evaluate how reliable your predictions are. The ROC chart is one of the most straightforward: the more it stretches toward the upper-left corner, the more reliable your test outputs are.

Survival Curves

Originally designed to track death rates over time, survival curves can be used in many other situations, even the opposite of their primary objective: tracking health recovery!

Tools 1

Excel

Data Visualization, Descriptive Statistics, Data Analysis and Trends.

SPSS

Descriptive Statistics, Data Analysis, Machine Learning and Forecasting.

STATISTICA

Descriptive Statistics, Data Analysis, Machine Learning, Forecasting and Quality Control Measures.

Tableau

Descriptive Statistics, Data Visualization and basics of Data Analysis and Forecasting.

Power BI

Data Visualization and Dashboarding.

Tools 2

Alteryx

Building flow charts with a complete set of analysis tools.

SAS - Enterprise Guide

Descriptive Statistics, Data Analysis, Machine Learning and Forecasting.

SAS - Enterprise Miner

Machine Learning and Advanced Predictive models.

R Studio

Descriptive Statistics, Data Visualization, Data Analysis, Machine Learning and Forecasting.

Python

Descriptive Statistics, Data Visualization, Data Analysis, Machine Learning and Forecasting.