Descriptive analysis is an important first step in any statistical analysis. It gives you an idea of the distribution of your data, helps you detect outliers and typos, and enables you to identify associations among variables, thus preparing you for further statistical analyses.

However, with so many graphical and summary approaches available, investigators often get confused about which one to use for their data. They either conduct too many types of analyses, wasting their time, or skip this crucial step altogether, increasing their chances of making erroneous decisions.

Yet descriptive analyses are neither difficult nor time-consuming if done systematically. It is easier to think about descriptive analyses if you divide them into two types:

  1. Descriptive analysis for each individual variable
  2. Descriptive analysis for combinations of variables

The best approach is to first identify the type of each variable and then choose the descriptive analyses that suit that type.

Broadly, variables can be classified as quantitative or categorical. Quantitative variables represent quantities or numerical values (e.g. age, weight, phone bill, volume), while categorical variables describe qualities or characteristics of individuals (e.g. colour, ethnicity, gender). Both variable types have further sub-classifications, but this broad classification is usually sufficient for deciding how to conduct descriptive analyses.
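As a rough illustration of this first step, the sketch below guesses a variable's type from its raw values. The function name `guess_variable_type` and the example values are hypothetical and are not part of Statulator or the dataset. The rule used (a column is quantitative if every non-missing value parses as a number) is only an assumption and should always be checked by hand.

```python
def guess_variable_type(values):
    """Guess 'quantitative' if every non-missing value parses as a number,
    otherwise 'categorical'. This is a heuristic, not a definitive rule."""
    present = [v for v in values if v not in ("", None)]
    try:
        for v in present:
            float(v)
    except (TypeError, ValueError):
        return "categorical"
    return "quantitative"


# Made-up values for illustration only.
print(guess_variable_type(["64", "76", "38"]))      # numeric strings
print(guess_variable_type(["Female", "Male"]))      # text labels
print(guess_variable_type(["1", "2", "1", "2"]))    # numeric codes for categories
```

Note that the last call reports a quantitative variable even though numeric codes such as 1 and 2 may stand for categories, which is why the guessed type needs to be verified before any analysis.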

Let’s use an example dataset available from this website: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/plasma.html but copied here for your convenience. This dataset comes from a study conducted to investigate the relationship between personal characteristics and dietary factors, and plasma concentrations of retinol, beta-carotene and other carotenoids. Details about the dataset and the variables can be obtained from that website.

Descriptive analysis for each individual variable

For quantitative variables, create a histogram and a box-and-whisker plot to get an idea of the shape of the distribution. If the shape is symmetric, calculate and present the mean and standard deviation; if it is skewed, calculate and present the median and quartiles. You could also present the minimum and maximum values. These descriptive analyses also help you identify outlying and improbable values so that you can check for data entry errors.
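The numerical part of this advice can be sketched in a few lines of Python using only the standard library. The function name `summarize` and the example values are hypothetical, and the quartiles follow the default method of `statistics.quantiles`, which may differ slightly from the method another tool uses. Plots are left out of this sketch.

```python
import statistics


def summarize(values):
    """Return summary statistics for one quantitative variable."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


# Made-up values for illustration only; they are not taken from the plasma dataset.
example = [2, 4, 4, 5, 7, 9, 11, 30]
for name, value in summarize(example).items():
    print(f"{name}: {value}")
```

In this made-up example the mean (9) sits well above the median (6) because of the single large value 30. That kind of gap hints at a skewed shape or at a value worth checking against the data entry records.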

For categorical variables, create frequency tables and present them as bar charts, pie charts or doughnut charts. These approaches are sufficient to get an idea of the distribution of each variable and to spot typos and other data entry errors.
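A frequency table is simple to build by hand, as in the minimal sketch below. The function name `frequency_table` and the category labels are hypothetical; the labels are made up and are not the actual coding of vituse in the dataset.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


# Made-up entries for illustration; "Ofen" is a deliberate typo.
vituse = ["Often", "No", "Often", "Not often", "No", "Ofen", "No", "Often"]
for category, count, percent in frequency_table(vituse):
    print(f"{category:<10} {count:>3} {percent:>6.1f}%")
```

The misspelt entry shows up as its own category with a single observation, which is how a frequency table helps reveal typos.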

Statulator conducts all these analyses simultaneously. Some of the summary and graphical statistics produced by Statulator based on the above dataset are copied below.

[Figure: Summary statistics and a histogram of the variable ‘cholesterol’]
[Figure: A box-and-whisker plot of the variable ‘retplasma’]
[Figure: A bar chart of the variable ‘vituse’]
[Figure: A frequency table of the variable ‘vituse’]

Descriptive analysis for combinations of variables

The next step is to conduct descriptive analyses for combinations of variables to obtain preliminary information about associations between them. Since each variable can be either quantitative or categorical, a pair of variables can form three combinations, and hence there are three types of descriptive analysis:

  1. Both variables quantitative: Create a scatter plot.
  2. One variable categorical and the other quantitative: Calculate summary statistics and create box-and-whisker plots of the quantitative variable classified by the categorical variable.
  3. Both variables categorical: Prepare a contingency table.
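The second and third combinations in the list above can be sketched with the standard library alone. The function names `summary_by_group` and `contingency_table` are hypothetical, and all values are made up rather than taken from the plasma dataset. The scatter plot is left out because it needs a plotting library.

```python
import statistics
from collections import Counter, defaultdict


def summary_by_group(groups, values):
    """Median and quartiles of a quantitative variable within each category.
    Each category needs at least two observations."""
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    result = {}
    for group, vals in by_group.items():
        q1, median, q3 = statistics.quantiles(vals, n=4)
        result[group] = {"n": len(vals), "q1": q1, "median": median, "q3": q3}
    return result


def contingency_table(rows, cols):
    """Cross-tabulate two categorical variables as {row: {col: count}}."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Made-up values for illustration only.
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
retplasma = [400, 500, 600, 700, 500, 700, 900, 1100]
for group, stats in summary_by_group(sex, retplasma).items():
    print(group, stats)

vituse = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes"]
smokstat = ["Never", "Former", "Never", "Current", "Never", "Never", "Former", "Never"]
for level, row in contingency_table(vituse, smokstat).items():
    print(level, row)
```

Cells with no observations appear as zero counts in the contingency table, so sparse combinations are easy to spot.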

Statulator conducts all these analyses in one go. Some of the descriptive analyses for combinations of variables produced by Statulator based on the above dataset are copied below.

[Figure: A scatter plot to evaluate an association between retplasma and age]
[Figure: Summary statistics and a box-and-whisker plot of retplasma by sex]
[Figure: A contingency table and a stacked bar chart of vituse by smokstat]

Of course, there are a number of other graphical approaches, but the ones above give you enough information about the association between two variables to proceed with further statistical analyses.

Both the univariate and bivariate descriptive analyses described above can be very easily conducted using our Descriptive Analysis tool in four simple steps:

  1. Go to the website (http://statulator.com/descriptive.html) and upload data by clicking the button Choose file.
  2. Select the variables you wish to analyse by clicking the button Select Variables and verify their type. Specify Retplasma as the response variable and click Save changes.
  3. Click the button Univariate to conduct descriptive analyses for each individual variable. You may download all the summary tables by clicking the button ‘Display summary tables’.
  4. Click the button Bivariate to conduct descriptive analyses for evaluation of associations of all explanatory variables with the outcome.

Have a go! We will discuss these descriptive approaches in detail in a future post.

We look forward to receiving your feedback.

Descriptive Analysis: Take it easy!

Navneet Dhand

Navneet is an epidemiologist and biostatistician passionate about making statistical methods simpler and easier to use. He has more than 15 years of experience teaching epidemiology and biostatistics and currently lectures at The University of Sydney, Australia. See more at: http://statulator.com/about.html
