Data Visualization and Descriptive Statistics

Introduction to the world of data, its collection, structure and preparation for efficient analysis.

The best way to start is to illustrate data. Many stories can be revealed simply and quickly with appropriate charts.

Describing data sets with statistical indicators is the best primer to start with before jumping into analytics.

It is all about probability: how likely is it that what you observe is simply the result of chance ... and chance only?

Should you use your entire data set for analytics? What if, beyond a certain point, any sub-sample you pick at random generates the same results?

When you deal with samples, it is imperative to estimate what the reality would be if you were able to handle the complete data set.

Data Analysis for "Professionals"

All our decisions are based on hypotheses. In data analysis, a sample's results should confirm or reject whichever hypothesis we claim to be true, and actions will follow accordingly.

Comparing results calculated over a sample to multiple standards and checking the main deviations is the start of an efficient data analysis.

Two groups are compared on variables with different measurement units. Sorting the differences in one single chart is a state-of-the-art addition to your reports.

Finding out that multiple groups differ is not enough. You should also highlight the pairs that caused the difference ... or not!

Did you know that the simple linear regression is about explaining a "quantitative" output with a "quantitative" input ...

... while logistic regression does the same but for a "qualitative" output?
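
To make the "quantitative input, quantitative output" idea concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares. The function name `fit_simple_regression` and the spend/sales figures are hypothetical and only serve as illustration.

```python
from statistics import mean

def fit_simple_regression(x, y):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of the input.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: advertising spend (input) and sales (output).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_regression(spend, sales)
print(slope, intercept)
```

With these made-up numbers the points lie exactly on a line, so the fit recovers it. Real data would scatter around the fitted line.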

Data Analysis for "Experts"

If you want to measure the effect of a factor on any characteristic of a population, running your experiment on dependent samples is better than running it on independent ones.

Did you know that most data analysis techniques impose strict requirements on the data? Not so for non-parametric tests :)

What is the sample size required for reliable results? Power analysis will bring a detailed answer to your quest.

"Supervised" Machine Learning

When explaining a "quantitative" output with one single "quantitative" input (simple regression) is not enough, make the inputs ... multiple!

If not satisfied by comparing two groups on different variables separately, simply use them all in one single shot with Discriminant Analysis!

Frequently, you would like to separate your data sets into homogeneous groups to better describe their behaviors. Decision Trees are then your best choice.

Many roads lead to Rome! The same holds when predicting (mainly) a "qualitative" output from different inputs. Though complex to apply, SVMs are powerful classifiers and ... estimators!

If your nearest neighbor lives in a fancy house, you might or might not have the same standard of living. But if your 10 (K) nearest neighbors live in luxurious places, then most likely you are rich as well!
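
The neighbor analogy can be sketched as a majority vote among the K closest points. This is a minimal illustration, assuming Euclidean distance and a tiny made-up data set. The function `knn_predict`, the coordinates and the labels are all hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical households described by two numeric features.
houses = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["modest", "modest", "modest", "rich", "rich", "rich"]
prediction = knn_predict(houses, labels, query=(7, 8), k=3)
print(prediction)
```

A larger K makes the vote more stable but blurs the boundary between groups, so K is usually tuned on held-out data.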

For the fans of prediction via probabilistic methods, you cannot be better served than with Naive Bayes.

"Unsupervised" Machine Learning

Did you know that you can visualize 4-dimensional or even 10-dimensional data on one single plane?

Yes you can!

A is like B, but B is different from C, similar to D.

If you cannot illustrate similarities between the letters, the MDS will do it! By the way, only experts can tell the difference between MDS and PCA.

Those who are alike are put in the same groups. The resulting clusters are homogeneous on the inside but different from each other on the outside.

Englishmen speak English, but English is a universal language! However, the Chinese language is exclusive to the Chinese and vice versa. The CA will translate that into a simple map!

The concept is easy, but the content can span from simple information up to the most complex KPIs. It depends on your knowledge and experience in data analytics!

"Reinforcement" Learning

Another way of learning is to set an agent free in an environment and let it explore the path to your ultimate objective.

It is the summary of the results of all episodes explored by the agent, delivering a "learned" guidance matrix to reach the ultimate objective.
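
As a rough illustration, and assuming the "guidance matrix" refers to a Q-table as in tabular Q-learning (the text does not name the method), the sketch below lets an agent wander randomly along a five-cell corridor and learn which way to step. The corridor, the reward of 1 at the goal, and the values of `alpha` and `gamma` are all hypothetical choices.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right

def train_q_table(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Fill a state-by-action table from randomly explored episodes."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.randrange(2)  # pure random exploration
            # Walls at both ends: the agent cannot leave the corridor.
            next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train_q_table()
# Greedy guidance for each non-goal cell: index of the best action.
greedy_policy = [row.index(max(row)) for row in q_table[:-1]]
print(greedy_policy)
```

Reading the table row by row gives the guidance: in every non-goal cell the larger value points toward the goal.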

MDP is the "policy" finder that will allow the agent to optimize the reward during its quest for the ultimate objective.

Deep Learning

Designed to think like humans, Artificial Neural Networks try to replicate human decision making as closely as they can, but at the speed of light.

An advanced and more sophisticated version of ANN, CNNs are image recognition algorithms that revolutionized AI.

Highly efficient in text mining, translation, and sentiment analysis, RNNs are a specific type of Neural Network that conveys text memory through its hidden layers for ... text prediction.

An enhancement of the RNN, LSTMs are empowered with a stronger memory of the past. Their accuracy is therefore higher, but at the price of greater complexity.

A simplified version of LSTM with fewer tensors inside the main cell. Its usage should be justified with proven advantages over its predecessor.

Natural Language Processing

Text preparation is the prerequisite to all NLP algorithms. It is about cleaning text from any confusing structure prior to analysis.

"This movie is like those I like most. But I didn't like it though!" Is my sentiment Positive or Negative?

Finding the most relevant topics in a text means highlighting key words and quantifying their importance in a model.

When words taken apart might lead to confusion, several should then be put in one "bag" and used together.
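
Assuming the "bag" above refers to the classic bag-of-words representation, a minimal sketch is simply a count of how often each word appears. The helper `bag_of_words` and its very basic punctuation handling are hypothetical, and real pipelines use a proper tokenizer.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, strip basic punctuation, count words."""
    tokens = [word.strip(".,!?\"'").lower() for word in text.split()]
    return Counter(token for token in tokens if token)

bag = bag_of_words(
    "This movie is like those I like most. But I didn't like it though!"
)
print(bag.most_common(2))
```

Word order is lost in the bag, which is exactly why the sentence above is hard to score: "like" dominates the counts even though the overall opinion is negative.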

"Apple day keep doctor" ... "Away". Completing the sentence was possible with Word2Vec.

Big Data & Related

An extension of BI, the Big Data program covers the complete set of tools and techniques related to the four layers (Ingest - Store - Prepare - Serve), as well as the architectures behind a successful implementation.

Millions of devices are connected to the web with trillions of actions. This workshop covers how the flow of information runs, by covering IoT virtualization, containerization, protocols and architecture best practices.

With the growth of the technological ecosystem, specifically on the web, infiltrating secure information is becoming an easy task. To counter the increasing number of daily hacking attacks, cyber security is becoming an unavoidable discipline for all types of companies.

It is the "must" knowledge prior to Big Data. This workshop covers the four classic layers of data management, starting with the ingestion of data and ending with analysis & visualization. In between, it is all about ETL and data warehousing.

Forecasting Methodologies

In the forecasting series, Trends are the most basic methods. Yet knowing them is essential to understand the logic behind more sophisticated and complex ones.

Forecasting frequently depends on historical data, at least on the most recent values. Moving average methods are quite effective and easy to implement.
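
A minimal sketch of the idea, assuming a simple (unweighted) moving average: the next value is forecast as the mean of the last few observations. The function name, the window of 3 and the demand figures are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window

# Hypothetical monthly demand figures.
demand = [10, 12, 11, 13, 15, 14]
forecast = moving_average_forecast(demand, window=3)
print(forecast)
```

A short window reacts quickly to recent changes, while a long window smooths out noise but lags behind a trend.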

More sophisticated than Moving Averages, Exponential Smoothing algorithms take into account "trend" and "seasonality", whether their effect is exploding or vanishing.
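
As an illustration of the "trend" part only, here is a sketch of Holt's linear (double) exponential smoothing. Seasonality is left out to keep it short. The smoothing weights `alpha` and `beta`, the initialization from the first two points, and the sample series are all assumptions made for this example.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=1):
    """Holt's linear method: smooth a level and a trend, then extrapolate."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    for value in series[1:]:
        previous_level = level
        # The level blends the new observation with the previous projection.
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # The trend blends the latest level change with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend

# Hypothetical series growing by 2 units per period.
sales = [2.0, 4.0, 6.0, 8.0]
next_value = holt_forecast(sales)
print(next_value)
```

Values of `alpha` and `beta` close to 1 follow the latest data closely, and values close to 0 give more weight to history.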

Time Series allow you to break down the effects on sequential data variability into four components, making it easier to understand their impact on near-future estimates.

Different from all other methods, ARIMA stands out by accounting for previous estimations as well as for their incurred "errors"!

Industrial Quality Control

The proper behavior of processes is monitored with SPC charts. If the output is a quantitative characteristic, SPC charts for "measurements" are the ones to implement.

SPC charts for "attributes" apply to qualitative outputs. Both categories only tell you whether the process is under control, not necessarily whether it is within specs!

How can you make sure that all the production falls within the required specifications? Your process capability indicators should all be satisfactory.
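
Two common capability indicators are Cp and Cpk. The sketch below computes them under the simplifying assumption that the overall sample standard deviation stands in for the process sigma, whereas in practice a within-subgroup estimate is often used. The measurements and spec limits are made up.

```python
from statistics import mean, stdev

def process_capability(measurements, lsl, usl):
    """Return (Cp, Cpk) for lower and upper spec limits lsl and usl."""
    mu = mean(measurements)
    sigma = stdev(measurements)  # assumption: overall sample std deviation
    # Cp compares the spec width to the natural process spread (6 sigma).
    cp = (usl - lsl) / (6 * sigma)
    # Cpk also penalizes a process that is off-center.
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk

# Hypothetical part diameters with specs 9.4 to 10.6.
diameters = [9.9, 10.0, 10.1, 10.0, 9.9, 10.1]
cp, cpk = process_capability(diameters, lsl=9.4, usl=10.6)
print(round(cp, 2), round(cpk, 2))
```

Cpk can never exceed Cp. The two are equal only when the process mean sits exactly midway between the spec limits.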

How can you be sure that an uncontrolled process is really affected by an external factor? What if measures themselves are not controlled? R&R will let you know that.

Increasing or decreasing an output depends on how you calibrate your inputs. DOE helps you find the combination that will make you reach the desired output.

Manufacturing, like other industries, is being invaded by AI tools and methods.

As an example, the "digital twinning" might cut costs on many daily mishaps.

Epidemiology and Healthcare

Epidemiology has its own specific statistical measures. They all relate to epidemics, mortality, etc.

Studies in epidemiology are grouped into four categories. Some are close to classic research (descriptive), while others have their very own specificity (etiological).

What if you are diagnosed positive when you are not? To be confident, you should simply ask for the "False +" and "False -" rates of the adopted test.
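
As a minimal sketch, the two rates can be computed from a list of true conditions and test results, coded here as 1 (positive) and 0 (negative). The function name and the ten made-up patients are hypothetical.

```python
def error_rates(actual, predicted):
    """Return (false positive rate, false negative rate) for 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, result in zip(actual, predicted):
        if truth == 1 and result == 1:
            tp += 1
        elif truth == 0 and result == 1:
            fp += 1
        elif truth == 0 and result == 0:
            tn += 1
        else:
            fn += 1
    # False + rate: healthy people wrongly flagged as positive.
    false_positive_rate = fp / (fp + tn)
    # False - rate: sick people the test failed to detect.
    false_negative_rate = fn / (fn + tp)
    return false_positive_rate, false_negative_rate

# Hypothetical results: 4 sick patients, 6 healthy ones.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
fpr, fnr = error_rates(actual, predicted)
print(fpr, fnr)
```

Here one healthy patient out of six tests positive and one sick patient out of four is missed.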

Many indicators and illustrations exist to evaluate how reliable your predictions are. The ROC chart is one of the most straightforward illustrations: the more the curve stretches toward the upper left corner, the more reliable your test outputs are.

Originally designed to track death rates through time, S.C can be used in many other situations, even ones opposite to their primary objective: tracking health recovery!

Tools 1

Data Visualization, Descriptive Statistics, Data Analysis and Trends.

Descriptive Statistics, Data Analysis, Machine Learning and Forecasting.

Descriptive Statistics, Data Analysis, Machine Learning, Forecasting and Quality Control Measures.

Descriptive Statistics, Data visualization and basics of Data Analysis and Forecasting.

Tools 2

Building flow charts with a complete set of analysis tools.

Descriptive Statistics, Data Analysis, Machine Learning and Forecasting.

Machine Learning and Advanced Predictive models.

Descriptive Statistics, Data visualization, Data Analysis, Machine Learning and Forecasting.

Descriptive Statistics, Data visualization, Data Analysis, Machine Learning and Forecasting.