Data Visualization and Descriptive Statistics

Getting Started with Data


Introduction to the world of data, its collection, structure and preparation for efficient analysis.

Data Visualization


The best way to start is to illustrate the data. Many stories can be revealed simply and quickly with appropriate charts.

Descriptive Statistics


Describing data sets with statistical indicators is the best primer before jumping into analytics.
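As a rough illustration of such indicators, the sketch below summarizes a small made-up sample using only Python's standard library. The values are hypothetical and the choice of indicators is an assumption, not the workshop's own list.

```python
import statistics

# Hypothetical sample, invented for illustration only.
values = [12, 15, 11, 19, 15, 22, 15]

summary = {
    "mean": statistics.fmean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
    "min": min(values),
    "max": max(values),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```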

Probability Laws


It is all about how likely it is that what you observe is simply the result of chance ... and chance only!

Central Limit Theorem


Should you use your whole data set for analytics? What if, beyond a certain point, any sub-sample you pick at random generates the same results?
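The idea can be checked with a small simulation. The sketch below assumes a hypothetical, strongly skewed population (exponential with mean 1) and shows that the means of random sub-samples cluster around the population mean, with a spread close to sigma / sqrt(n). The population, sample size and seeds are arbitrary choices made for illustration.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Return the means of n_samples random sub-samples (drawn with replacement)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical skewed population: exponential distribution with mean 1.
population_rng = random.Random(42)
population = [population_rng.expovariate(1.0) for _ in range(100_000)]

sample_size = 200
means = sample_means(population, sample_size=sample_size, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("population mean       :", round(pop_mean, 3))
print("mean of sample means  :", round(statistics.fmean(means), 3))
print("spread of sample means:", round(statistics.stdev(means), 3))
print("sigma / sqrt(n)       :", round(pop_sd / sample_size ** 0.5, 3))
```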



When you deal with samples, it is imperative to estimate what the reality would be if you were able to handle the complete data.

Data Analysis for "Professionals"

Hypothesis Testing


All our decisions are based on hypotheses. In data analysis, a sample's results should confirm or reject any hypothesis we claim to be true, and actions follow accordingly.

One Group Tests


Comparing results calculated over a sample to multiple standards and checking the main deviations is the start of an efficient data analysis.

Two Group Tests


Two groups are compared on variables with different measurement units. Sorting the differences in one single chart is a state-of-the-art addition to your reports.



Finding out that multiple groups differ is not enough. The analysis should also highlight the pairs that caused the difference ... or not!



Did you know that simple linear regression is about explaining a "quantitative" output with a "quantitative" input ...

Simple Logistic Regression


... while logistic regression does the same for a "qualitative" output?

Data Analysis for "Experts"

Dependent Samples


If you want to measure the effect of a factor on a population's characteristic, running your experiment on dependent samples is better than running it on independent ones.

Non Parametric Tests


Did you know that most specific data analysis techniques have strict requirements concerning the data? Not so for non-parametric tests :)

Power Analysis


What is the sample size required for reliable results? Power analysis gives a detailed answer to that question.
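One common way to answer the sample-size question is the normal-approximation formula for comparing two group means, n = 2 * ((z_alpha + z_beta) / d)^2 per group, where d is the standardized effect size. The sketch below assumes a two-sided test with equal group sizes; the function name and default values are illustrative, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation; effect_size is the standardized difference
    (difference in means divided by the common standard deviation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium effect needs far fewer observations than a small one.
print(sample_size_per_group(0.5))
print(sample_size_per_group(0.2))
```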

"Supervised" Machine Learning

Multiple Regressions


When explaining a "quantitative" output with one single "quantitative" input is not enough, go beyond simple regression and make the inputs ... multiple!

Discriminant Analysis


If you are not satisfied with comparing two groups on different variables separately, simply use them all in one single shot with Discriminant Analysis!

Decision Trees


Frequently, you would like to split your data set into homogeneous groups to better describe their behavior. Decision Trees are then your best choice.

Support Vector Machines


Many roads lead to Rome! The same holds when predicting (mainly) a "qualitative" output from different inputs. Though complex to apply, SVMs are powerful classifiers and ... estimators!

k Nearest Neighbors


If your nearest neighbor lives in a fancy house, you might or might not have the same standard of living. But if your 10 (K) nearest neighbors live in luxurious places, then most likely you are rich as well!
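The neighbor analogy above can be sketched as a majority vote among the k closest points. The house coordinates and wealth labels below are hypothetical, and the function is a minimal illustration rather than a production classifier.

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(points, labels)
    )
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical neighborhood: (x, y) house locations and a wealth label.
houses = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6)]
wealth = ["rich", "rich", "rich", "modest", "modest", "modest", "modest"]

you = (0.9, 0.9)
print(knn_predict(houses, wealth, you, k=1))  # the single closest neighbor decides
print(knn_predict(houses, wealth, you, k=3))  # a vote among three neighbors
```

With k=1 the single closest house decides alone, while k=3 lets the surrounding houses outvote it, which is why the choice of K matters.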

Naive Bayes

For the fans of prediction via probabilistic methods, you cannot be better served than with Naive Bayes.

"Unsupervised" Machine Learning

Principal Component Analysis


Did you know that you can visualize 4- or even 10-dimensional data on one single plane?

Yes you can!

Multi Dimensional Scaling


A is like B, but B is different from C, similar to D.

If you cannot picture the similarities between the letters, MDS will do it for you! By the way, only experts can tell the difference between MDS and PCA.

Clustering Analysis


Those who are alike are put in the same group. The resulting clusters are homogeneous on the inside but different from one another on the outside.

Correspondence Analysis


Englishmen speak English, but English is a universal language! The Chinese language, however, is exclusive to the Chinese, and vice versa. CA will translate that into a simple map!

Quadrant Analysis


The concept is easy, but the content can span from simple information up to the most complex KPIs. It depends on your knowledge and experience in data analytics!

"Reinforcement" Learning

Agent and Environment

Another way of learning is to set an agent free in an environment and let it explore the path to your ultimate objective.

Q-Learning

It summarizes the results of all the episodes explored by the agent, delivering a "learned" guidance matrix for reaching the ultimate objective.
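To make the "guidance matrix" concrete, here is a minimal sketch of tabular Q-learning on a hypothetical five-state corridor, where the agent starts at state 0 and is rewarded on reaching state 4. The environment, the reward, the purely random exploration and the hyperparameters (alpha, gamma, number of episodes) are all assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)      # action 0 moves left, action 1 moves right

def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a Q matrix (one row per state, one column per action)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            a = rng.randrange(2)  # purely random exploration
            next_state, reward = step(state, ACTIONS[a])
            # Q-learning update: move toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = q_learning()
policy = ["left" if row[0] > row[1] else "right" for row in q[:GOAL]]
print(policy)
```

Reading the best action off each row of the matrix gives the learned guidance for every non-terminal state.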

Markov Decision Process

MDP is the "policy" finder that will allow the agent to optimize the reward during its quest for the ultimate objective.

Deep Learning

Artificial Neural Networks


Designed to think like humans, Artificial Neural Networks try to replicate human decision making as closely as they can, but at the speed of light.

Convolutional N.N


An advanced and more sophisticated version of ANN, CNNs are image recognition algorithms that revolutionized AI.

Recurrent N.N


Highly efficient in text mining, translation, and sentiment analysis, RNNs are a specific kind of Neural Network that carries text memory through its hidden layers for ... text prediction.

Long Short-Term Memory


An enhancement of the RNN, LSTMs are empowered with a stronger memory of the past. Their accuracy is therefore higher, but at the price of greater complexity.

Gated Recurrent Unit


A simplified version of the LSTM with fewer tensors inside the main cell. Its usage should be justified with proven advantages over its predecessor.

Natural Language Processing

Text Preparation


Text preparation is the prerequisite to all NLP algorithms. It is about cleaning the text of any confusing structure prior to analysis.

Sentiment Analysis


"This movie is like those I like most. But I didn't like it though!" Is my sentiment Positive or Negative?

Topic Modeling


To find the most relevant topics in a text is to highlight key words and quantify their importance in a model.

Bag of Words & TF-IDF


When words taken apart might lead to confusion, several should then be put in one "bag" and used together.
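As a rough sketch of both ideas, the code below builds a bag of words per document and weights each word by TF-IDF. It assumes whitespace tokenization and the plain variant tf = count / document length, idf = log(N / document frequency); other variants exist, and the three documents are made up.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return (bags, weights): word counts and TF-IDF weights per document."""
    bags = [Counter(doc.lower().split()) for doc in documents]
    n_docs = len(bags)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for bag in bags for word in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return bags, weights

# Hypothetical mini-corpus.
docs = ["the movie was good", "the movie was bad", "the plot was good"]
bags, weights = tf_idf(docs)

print(bags[1])
print(weights[1])
```

Words that appear in every document, such as "the", receive a weight of zero, while a word unique to one document, such as "bad", stands out.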



"Apple day keep doctor" ... "Away". Completing the sentence was possible with Word2Vec.

Big Data & Related

Big Data


An extension of BI, the Big Data program covers the complete set of tools and techniques related to the four layers Ingest - Store - Prepare - Serve, as well as the architectures behind a successful implementation.

Internet of Things


Millions of devices are connected to the web with trillions of actions. This workshop explains how the information flows, covering IoT virtualization, containerization, protocols and architecture best practices.

Cyber Security


With the growth of the technological ecosystem, specifically on the web, infiltrating secure information is becoming an easy task. To counter the increasing number of daily hacking attacks, cyber security is becoming an unavoidable discipline for all types of companies.

Business Intelligence


It is the "must" knowledge prior to Big Data. This workshop covers the four classic layers of data management, starting with the ingestion of data and ending with analysis & visualization. In between, it is all about ETL and data warehousing.

Forecasting Methodologies



In the forecasting series, Trends are the most basic methods. Yet knowing them is essential to understand the logic behind the more sophisticated and complex ones.



Forecasting frequently depends on historical data, at least on the most recent values. Moving average methods are quite effective and easy to implement.
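A minimal sketch of the simplest such method: the next value is forecast as the mean of the last few observations. The sales figures and the window of 3 are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window

# Hypothetical monthly sales.
sales = [120, 130, 125, 140, 150, 145]

# Mean of the last three values: (140 + 150 + 145) / 3
print(moving_average_forecast(sales, window=3))
```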

Exponential Smoothing


More sophisticated than Moving Averages, Exponential Smoothing algorithms take into account "trend" and "seasonality", whether their effect is exploding or vanishing.
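The core recursion can be sketched with simple exponential smoothing, the most basic member of the family. Note that this sketch deliberately leaves out the trend and seasonality terms mentioned above; the smoothing factor alpha and the series are hypothetical.

```python
def simple_exponential_smoothing(series, alpha=0.5):
    """Return the smoothed levels; the last level is the one-step-ahead forecast.

    Simple variant only: no trend or seasonality component.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize with the first observation
    levels = [level]
    for value in series[1:]:
        # New level = weighted mix of the latest observation and the previous level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels

# Hypothetical demand series.
demand = [10, 20, 30]
print(simple_exponential_smoothing(demand, alpha=0.5))
```

A larger alpha reacts faster to recent observations, while a smaller alpha gives more weight to the past.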

Time Series


Time Series analysis breaks down the variability of sequential data into four components, making it easier to understand their impact on near-future estimates.

ARIMA Models


Different from all other methods, ARIMA stands apart by accounting for previous estimates as well as their incurred "errors"!

Industrial Quality Control

SPC for "Measurements"


The proper behavior of processes is monitored with SPC charts. When the output is a quantitative characteristic, SPC charts for "measurements" are the ones to implement.

SPC for "Attributes"


SPC charts for "attributes" apply to qualitative outputs. Both categories only tell whether the process is under control, not necessarily whether it is within specs!

Process Capability Analysis


How can you make sure that all the production falls within the required specifications? Your process capability indicators should all be satisfactory.
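Two widely used capability indicators are Cp = (USL - LSL) / (6 * sigma) and Cpk = min(USL - mean, mean - LSL) / (3 * sigma). The sketch below assumes the overall sample standard deviation as the estimate of sigma (some conventions use a within-subgroup estimate instead); the measurements and specification limits are hypothetical.

```python
import statistics

def process_capability(measurements, lsl, usl):
    """Return (Cp, Cpk) for the given lower and upper specification limits."""
    mean = statistics.fmean(measurements)
    sigma = statistics.stdev(measurements)  # assumption: overall sample std dev
    cp = (usl - lsl) / (6 * sigma)                   # potential capability
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)  # accounts for centering
    return cp, cpk

# Hypothetical part diameters (mm) with specs 9.4 to 10.6 mm.
diameters = [9.8, 10.0, 10.2, 10.1, 9.9]
cp, cpk = process_capability(diameters, lsl=9.4, usl=10.6)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

When the process is centered between the limits, Cp and Cpk coincide; an off-center process shows a Cpk lower than its Cp.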

R&R Analysis


How can you be sure that an out-of-control process is really affected by an external factor? What if the measurements themselves are not under control? R&R will let you know.

Design Of Experiments


Increasing or decreasing an output depends on how you calibrate your inputs. DOE helps find the combination that lets you reach the desired output.

AI in Manufacturing


Manufacturing, like other industries, is being invaded by AI tools and methods.

As an example, "digital twinning" might cut the costs of many daily mishaps.

Epidemiology and Healthcare

Measures in Epidemiology


Epidemiology has its own specificity in statistical measures. They all relate to epidemics, mortality, etc.

Studies in Epidemiology


Studies in epidemiology are grouped in four categories. Some are close to classic research (descriptive), but some have their very own specificity (etiological).

Measurement Properties


What if you are diagnosed positive when you are not? To be confident, you should simply ask for the "False +" and "False -" rates of the adopted test.
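These rates follow directly from the four cells of a test's confusion table. The sketch below uses hypothetical counts; the function name and dictionary keys are illustrative.

```python
def error_rates(tp, fp, tn, fn):
    """Compute the basic error rates of a diagnostic test from its confusion counts."""
    return {
        "false_positive_rate": fp / (fp + tn),  # healthy people flagged as sick
        "false_negative_rate": fn / (fn + tp),  # sick people missed by the test
        "sensitivity": tp / (tp + fn),          # 1 - false negative rate
        "specificity": tn / (tn + fp),          # 1 - false positive rate
    }

# Hypothetical study: 100 sick and 900 healthy people.
rates = error_rates(tp=90, fp=50, tn=850, fn=10)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```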

Diagnostic Tests Performance


Many indicators and illustrations exist to evaluate how reliable your predictions are. The ROC chart is one of the most straightforward illustrations: the more the curve stretches toward the upper left corner, the more reliable your test outputs are.
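A ROC curve can be traced by sweeping the decision threshold over the predicted scores, and the area under it (AUC) summarizes how far the curve stretches toward the upper left corner. The sketch below assumes distinct scores (ties are not handled) and binary labels coded as 1 and 0; the scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) points of the ROC curve.

    Assumes distinct scores and labels coded 1 (positive) / 0 (negative).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )

# Hypothetical test scores and true conditions.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print(points)
print(round(auc(points), 3))
```

An AUC of 1.0 corresponds to a curve hugging the upper left corner, while 0.5 corresponds to the diagonal of a test that does no better than chance.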

Survival Curves


Originally meant to track the death rate through time, survival curves can be used in many other situations, even ones opposite to their primary objective: tracking health recovery!

Tools 1



Data Visualization, Descriptive Statistics, Data Analysis and Trends.  



Descriptive Statistics, Data Analysis, Machine Learning and Forecasting. 



Descriptive Statistics, Data Analysis, Machine Learning, Forecasting and Quality Control Measures.  



Descriptive Statistics, Data visualization and basics of Data Analysis and Forecasting.

Power BI


Data visualization and dashboarding.

Tools 2



Building flow charts with complete analysis tools.

SAS - Enterprise Guide


Descriptive Statistics, Data Analysis, Machine Learning and Forecasting.  

SAS - Enterprise Miner


Machine Learning and Advanced Predictive models.

RStudio


Descriptive Statistics, Data visualization, Data Analysis, Machine Learning and Forecasting.  



 Descriptive Statistics, Data visualization, Data Analysis, Machine Learning and Forecasting.