Available courses

What is Data Journalism: short video and PowerPoint.

Introduction to Excel: basic skills

Cracking .pdf tables with Tabula

 - using Tableau to crack and visualize a .pdf table

Scraping data with Chrome Table Capture

Working with Tableau

 - add your own map to Tableau

Working with Datawrapper

Calculating the Big Mac Index

Downloading World Bank data

Advanced Excel: pivot tables, charts and statistics

Scraping with Outwit Hub

Cleaning data with Refine

Content analysis with Voyant: word count, word cloud, word network

Analysis of Twitter using Workbench

Python is a programming language that can be used for data analysis. Compared with R, Python is more general-purpose. There is a lot of discussion about which of the two is better; see, for example, https://www.r-bloggers.com/2018/12/why-r-for-data-science-and-not-python/ .

However, the Python libraries used for data analysis can do more or less the same things as the R packages; only the commands differ.


The setup is as follows:

1. Install Python on your machine: https://www.python.org/downloads/

2. Install Anaconda, a Python distribution that includes the Anaconda Navigator GUI: https://docs.anaconda.com/anaconda/

3. Start Anaconda Navigator.

From here you can see different programs and interfaces. For working with Python, Jupyter Notebook is the recommended choice. However, you can also work directly from the Python prompt.

For data analysis, some libraries need to be installed (use conda install from the prompt):

- numpy

- pandas

- matplotlib

- scipy
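
To confirm that the four libraries listed above are available, a small check like the following can be run from the Python prompt or a notebook cell. It is only a sketch: the helper name check_libraries is made up for this example, and it tests whether each library can be found, not whether it works correctly.

```python
import importlib.util

# Libraries named in the list above.
REQUIRED = ["numpy", "pandas", "matplotlib", "scipy"]


def check_libraries(names):
    """Return a dict mapping each library name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_libraries(REQUIRED).items():
        # Suggest the conda command mentioned above when a library is missing.
        status = "installed" if found else f"missing (try: conda install {name})"
        print(f"{name}: {status}")
```

Anything reported as missing can then be installed with conda install before opening the notebooks.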

Download the attached scripts and data to work with the examples.

Jupyter in Anaconda

Now we are ready to start a first analysis. Open the following notebook (.ipynb) in Jupyter and run the commands in each cell.

Download and run the notebook gem2.ipynb together with the dataset gemeentedata.xls. This is just a short introduction to the basics of statistics using Python.
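
For a first impression of what basic statistics in Python look like, here is a minimal sketch. It does not reproduce the contents of gem2.ipynb: the numbers are made up for illustration, and it uses only the standard-library statistics module so that it runs even before the libraries above are installed.

```python
import statistics

# Made-up values for illustration only; they do not come from gemeentedata.xls.
values = [10, 20, 30, 40, 100]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted list
stdev = statistics.stdev(values)    # sample standard deviation

print(f"mean={mean}, median={median}, stdev={stdev:.2f}")
```

Here the mean (40) lies well above the median (30) because of the single high value 100, which is a common reason to report the median alongside the average.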

RStudio in Anaconda

As you can see in the Navigator, it is also possible to run R (RStudio) and use scripts (.Rmd) for data analysis. Download and run the following script together with the data: rforjournalists_stats2.Rmd and gemeentedata.xls.

R kernel in Jupyter

It is also possible to use Jupyter Notebook for R as well as for Python. After installation, the R kernel is available for Jupyter notebooks. I had some trouble getting R up and running in Jupyter; my setup is based on the following: https://www.thetopsites.net/article/50566743.shtml.

Open a Jupyter notebook and load gem_r_injupyter.ipynb together with the dataset.

Convert .Rmd into .ipynb

Finally, how to translate .Rmd into .ipynb, so that you can use the R script as a Jupyter notebook with the R kernel. Install the following: https://rdrr.io/github/mkearney/rmd2jupyter/f/README.md

Run in R:


file saved as rforjournalists-stats.ipynb

Now you can run the .ipynb file in Jupyter Notebook with an R kernel.
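
To show what such a conversion involves, here is a rough Python sketch of the idea. It is not the rmd2jupyter package and does not claim to follow its method: R code chunks become code cells and everything else becomes markdown cells. The function name rmd_to_notebook is hypothetical, the kernel name "ir" assumes a default IRkernel installation, and the YAML header of an .Rmd file would simply end up as text in the first markdown cell.

```python
import json
import re

# Opening fence of an R chunk, e.g. ```{r} or ```{r setup, echo=FALSE}
CHUNK_OPEN = re.compile(r"^```\{r[^}]*\}\s*$")
# Closing fence of a chunk: three backticks on their own line
CHUNK_CLOSE = re.compile(r"^```\s*$")


def rmd_to_notebook(rmd_text):
    """Split R Markdown text into notebook cells (markdown and R code)."""
    cells = []
    buffer = []
    in_code = False

    def flush(cell_type):
        # Turn the buffered lines into one cell, skipping empty cells.
        source = "\n".join(buffer).strip("\n")
        buffer.clear()
        if not source:
            return
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            cell["execution_count"] = None
            cell["outputs"] = []
        cells.append(cell)

    for line in rmd_text.splitlines():
        if not in_code and CHUNK_OPEN.match(line):
            flush("markdown")
            in_code = True
        elif in_code and CHUNK_CLOSE.match(line):
            flush("code")
            in_code = False
        else:
            buffer.append(line)
    flush("code" if in_code else "markdown")

    return {
        "cells": cells,
        # Assumption: the R kernel is registered under the name "ir".
        "metadata": {"kernelspec": {"name": "ir", "display_name": "R", "language": "R"}},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# A tiny made-up R Markdown document for illustration.
example = "# Title\n\nSome text.\n\n```{r}\nx <- c(1, 2, 3)\nmean(x)\n```\n\nMore text.\n"
nb = rmd_to_notebook(example)
print([cell["cell_type"] for cell in nb["cells"]])

# The notebook is plain JSON, so it could be written to disk with json.dump.
notebook_json = json.dumps(nb, indent=1)
```

The sketch ignores chunk options such as echo=FALSE and inline R code, which a real converter may need to handle.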

This e-learning module aims to introduce R to data journalists and to show the advantages of R compared with Excel.

It is a hands-on training; for each step into R, a new handout with instructions and data will be provided.

Before getting started, there is a short introduction covering:

- Why use R

- How to start with R; software needed

- Some literature to get started

- Basic statistics in R

- Manipulation of data in a data frame

- Mapping data with R

- Analyzing tweets

- Analyzing Twitter networks: members of the Tweede Kamer

- Scraping with R

- Text analysis with R: Bob Dylan lyrics

- Experiments with AI and machine learning in R

This resource page features course content from the Knight Center for Journalism in the Americas' massive open online course (MOOC) titled "Data Journalism and Visualization with Free Tools." The six-week course took place from October 14 to November 24, 2019. We are now making the content free and available to students who took the course and anyone else who's interested in data journalism and visualization.

The course, which was powered by Google News Initiative, was taught by Alberto Cairo, Simon Rogers, and a great team of instructors. They created and curated the content for the course, which includes video classes, readings, exercises, and more.

The course materials are broken up into seven modules.


As you review this resource page, we encourage you to watch the videos, review the readings, and complete the exercises as time allows. The course materials build off each other, but the videos and readings also act as standalone resources that you can return to over time.

We hope you enjoy the materials. If you have any questions, please contact us at journalismcourses@austin.utexas.edu.

Here you will find some more advanced software and tools that could be useful for data science and data journalism.

- Regex: working with regular expressions for finding or selecting words or characters

- Mapping: making maps with QGIS

- Social Network Analysis with Gephi
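
As a taste of the regex item above, here is a minimal sketch using Python's built-in re module to find and select words and numbers in a sentence. The variable names are only illustrative, and the patterns are deliberately simple.

```python
import re

text = "The six-week course took place from October 14 to November 24, 2019."

# \b marks a word boundary; \d+ matches one or more digits.
numbers = re.findall(r"\b\d+\b", text)

# \d{4} matches exactly four digits, such as a year.
years = re.findall(r"\b\d{4}\b", text)

# One capital letter followed by one or more lowercase letters.
capitalised = re.findall(r"\b[A-Z][a-z]+\b", text)

print(numbers)      # ['14', '24', '2019']
print(years)        # ['2019']
print(capitalised)  # ['The', 'October', 'November']
```

The same patterns can be used in the find-and-replace dialogs of many text editors, although the exact syntax differs slightly between tools.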