
RStudio for Ubuntu

Step by Step Installation of RStudio

RStudio is an integrated development environment (IDE) for R, a language for statistical computing and graphics. RStudio is available as a desktop application and as a server application. RStudio Desktop is available for Windows, macOS, and Linux.

In this tutorial, we will install RStudio on Ubuntu Linux, specifically Ubuntu 16.04 LTS.

To begin the installation:

  1. To install RStudio, you first need to download and install R for Linux. R can be installed from the software center, from the web, or from the terminal. If the software center is not up to date, R may be difficult to locate there. To download R from the web, open a web browser in Ubuntu and go to https://cran.rstudio.com.

Screen Capture of RStudio Web Home Page

To install from the terminal, open a terminal and run the following commands:

  • Add the CRAN signing key to the Ubuntu keyring by typing:
    gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
    This requests the key from the Ubuntu key server. Then add it to apt:
    gpg -a --export E084DAB9 | sudo apt-key add -

Terminal Screen Shot Update Ubuntu keyring

  • Next, update the package list and install R by typing these commands into the terminal:
    sudo apt-get update
    sudo apt-get install r-base r-base-dev

Screen Shot Installing R

The first command refreshes the package lists from the Ubuntu repositories. The second command resolves the requested packages, reports how much disk space the installation will use, and asks whether to continue, Yes (Y) or No (N).

Install R Confirmation Screen Shot

If the user proceeds, the package is unpacked, and the installation will begin.

ScreenShot Terminal Unpack Deb Files

A successful installation is confirmed in the terminal.

Screen Shot of Terminal Setup R CRAN

You can confirm it further by searching for R in the search box on the desktop.

R Studio Desktop Screen Shot

  • Next, install RStudio from the terminal. Open the terminal and first install gdebi, a tool for installing local .deb packages:
    sudo apt-get install gdebi-core
    sudo will ask for your password. Enter it to proceed. The package will be read and installed.

Screen Shot Download RStudio Terminal

wget https://download1.rstudio.org/rstudio-0.99.896-amd64.deb

This downloads the RStudio package from the RStudio download server to the current directory.

Terminal Screen Shot RStudio Download

sudo gdebi -n rstudio-0.99.896-amd64.deb

sudo may ask for your password again. Enter it to continue. The package will be read and loaded. If the terminal then asks for permission to install, type y to install RStudio. With the -n (non-interactive) option, gdebi normally proceeds without asking.

Screen Shot of Terminal Install RStudio Begin

  • A successful installation is shown in the terminal.

Screen Shot of RStudio Install Completion

  • To open RStudio, open the search box and type “R”. RStudio appears in the list of installed applications. Click it and RStudio will open.

RStudio Home Screen Ubuntu

Tutorial of how to use RStudio:

1. Basic Data Analysis using RStudio

RStudio can be used to explore data and to build visual representations of it. Basic data analysis in RStudio follows these steps:

  1. Downloading or importing data in R
  2. Transforming Data and Running queries on data
  3. Basic data analysis using statistical averages function
  4. Plotting data distribution

The sections below walk through these steps one at a time.

1.1 Importing Data in RStudio

In this tutorial, we have used the sample 2010 census population data by zip code. There are two different ways to import the data in R.

(a) The following command imports the data programmatically. Run it in the console window of RStudio:

cpd <- read.csv(url("https://data.lacity.org/api/views/nxs9-385f/rows.csv?accessType=DOWNLOAD"))

When you press Enter, the data set is downloaded from the web, read as a CSV file, and assigned to the variable cpd.

(b) You can also download the data set to your local machine first and then use the Import Dataset feature of RStudio. The steps are:

  1. Go to the Environment tab in the top-right section and click the Import Dataset button. Choose the file you need to import and click Open. The Import Dataset dialog will appear.
  2. Set the decimal mark, name, separator, and other parameters, then click the Import button. This imports the data set into RStudio and assigns it to the variable name you chose in the dialog.

You can also view any data set by giving the following command:

View(cpd)
where cpd is the variable that holds the data set.
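The import step can be tried without a network connection. The sketch below is only an illustration: the three rows are made-up values in roughly the same shape as the census file, and the names csv_text, sample_cpd, and raw_cpd are hypothetical. It shows how read.csv treats column headers that contain spaces, which matters for every later command that refers to a column by name.

```r
# Hypothetical three-row sample; the values are made up for illustration.
csv_text <- "Zip Code,Total Population,Total Males,Total Females
90001,100,48,52
90002,200,105,95
90003,150,75,75"

# text = lets read.csv parse a string instead of a file or URL.
sample_cpd <- read.csv(text = csv_text)

# With the default check.names = TRUE, spaces in headers become dots.
print(names(sample_cpd))

# check.names = FALSE keeps the original headers. Such columns then need
# backticks, for example raw_cpd$`Total Population`.
raw_cpd <- read.csv(text = csv_text, check.names = FALSE)
print(names(raw_cpd))
```

Running str(sample_cpd) or head(sample_cpd) right after an import is a quick way to check the column names and types before going further.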

1.2 Transforming Data and Running queries on data

After importing the data into RStudio, you can use the data transformation features of R to manipulate it. Below are examples of basic data access techniques.

  • To access a particular column, for example Total Population. By default, read.csv replaces spaces in column names with dots, so the column is named Total.Population:
    cpd$Total.Population
  • To access a single value by row and column index (here row 1, column 3):
    cpd[1,3]

You can use the subset function of R to run queries on the data. Suppose you want the rows of the data set in which Total Males is greater than Total Females. Run the following command in the console:

a <- subset(cpd, Total.Males > Total.Females)

The first parameter of the subset function is the data frame to filter, and the second is the boolean condition that is checked for each row to decide whether the row is included. The statement above selects the rows in which Total Males is greater than Total Females and stores them in a.
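As a small illustration of how subset evaluates its condition row by row, here is a sketch on a made-up data frame. The data frame toy and its values are hypothetical and are not taken from the census file.

```r
# Hypothetical data frame with dotted column names, as read.csv would produce.
toy <- data.frame(
  Zip.Code      = c(90001, 90002, 90003),
  Total.Males   = c(48, 105, 75),
  Total.Females = c(52, 95, 75)
)

# Keep only rows where the condition is TRUE (here, the second row).
more_males <- subset(toy, Total.Males > Total.Females)
print(more_males)

# The same filter written with logical indexing.
same_rows <- toy[toy$Total.Males > toy$Total.Females, ]

# The select argument restricts the columns that are returned.
zips_only <- subset(toy, Total.Males > Total.Females, select = Zip.Code)
print(zips_only)
```

Inside subset, column names can be written without the toy$ prefix, which keeps interactive queries short. Logical indexing is the more explicit form and is often preferred inside functions.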

1.3 Basic data analysis using statistical averages function

The following functions compute averages and other summary statistics of a column:

  1. Mean of any column, run: mean(cpd$Total.Males)
  2. Median of any column, run: median(cpd$Total.Females)
  3. Quantiles of any column, run: quantile(cpd$Total.Population)
  4. Variance of any column, run: var(cpd$Total.Males)
  5. Standard deviation of any column, run: sd(cpd$Total.Females)

A statistical summary can also be obtained by running the summary function on either a single column or the complete data set, as below.

 summary(cpd)
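The functions listed above can be checked by hand on a short vector. The sketch below uses made-up numbers (the vector males is hypothetical) and also shows two details worth knowing: var and sd use the sample (n - 1) denominator, and a missing value makes the result NA unless na.rm = TRUE is passed.

```r
# Hypothetical values, chosen so the results are easy to verify by hand.
males <- c(2, 4, 4, 4, 5, 5, 7, 9)

print(mean(males))      # arithmetic mean: 40 / 8
print(median(males))    # middle of the sorted values: (4 + 5) / 2
print(quantile(males))  # minimum, quartiles, and maximum
print(var(males))       # sample variance: sum of squared deviations / (n - 1)
print(sd(males))        # square root of the sample variance

# A single missing value propagates unless it is removed explicitly.
with_gap <- c(males, NA)
print(mean(with_gap))                # NA
print(mean(with_gap, na.rm = TRUE))  # same as mean(males)
```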

1.4 Plotting data distribution

RStudio's built-in plot viewer is one of its most popular features. A data set imported into RStudio can be visualized with plot and various other functions of R.

You can run the following command in the console to create a scatter plot of a data set:

plot(x = a$Total.Males, y = a$Total.Females, type = 'p')

Here, a is the subset of the original data set created in section 1.2, and type 'p' sets the plot type to points. To draw lines or another plot type instead, change the type argument, for example to 'l' (lowercase L) for lines.

R offers several functions, packages, and tools for plotting data distributions. For example:

  • You can run the command below to draw a histogram of a column:
    hist(cpd$Total.Households)
  • Similarly, run the following commands for a bar plot:
    counts <- table(cpd$Total.Population)
    barplot(counts, main="Total Population Distribution", xlab="Number of Total Population")

This tutorial should give you a basic idea of how to do simple statistics in R and RStudio.
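Calling table on a numeric column counts each distinct value, so for a column with mostly unique values the bar plot ends up with one bar per row. One possible variation, sketched below on made-up numbers, is to group the values into intervals with cut before counting. The vector population and the break points are assumptions chosen for illustration.

```r
# Hypothetical population values, made up for illustration.
population <- c(120, 450, 980, 1500, 2300, 2750, 3100, 4800)

# cut() assigns each value to an interval; table() counts each interval.
breaks <- seq(0, 5000, by = 1000)
bins   <- cut(population, breaks = breaks)
counts <- table(bins)
print(counts)

# One bar per interval instead of one bar per distinct value.
barplot(counts,
        main = "Total Population Distribution",
        xlab = "Total Population (binned)",
        ylab = "Number of zip codes")

# hist() performs the same binning itself; plot = FALSE returns the counts only.
h <- hist(population, breaks = breaks, plot = FALSE)
print(h$counts)
```

In RStudio the bar plot appears in the Plots pane. When the script is run non-interactively with Rscript, R writes the plot to a PDF file instead.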

The zoo package in RStudio

If your data is an irregular time series, the zoo package is a good fit, because it only requires that the observations be ordered by their time index. The zoo package is listed in the Packages component, which appears in the lower-right panel of RStudio. First load the zoo package so the data can be converted into zoo objects. Then call the constructor of the same name, zoo, to create a zoo object: the first argument is the data and the second is the index to order by. Several series can then be combined into one zoo object.

A zoo object is also recommended for its convenient plot method. Type plot in the console, and the function completion lists the plot methods available with the zoo package.
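A minimal sketch of these steps follows, assuming the zoo package is already installed (for example with install.packages("zoo")). The dates and values are made up, and merge is used here as one way to combine the series; the original text does not say which function it used.

```r
library(zoo)  # assumes zoo is installed

# Hypothetical irregularly spaced observations.
dates  <- as.Date(c("2016-01-04", "2016-01-07", "2016-01-15"))
values <- c(10.2, 11.5, 9.8)

# First argument: the data. order.by: the index that orders the observations.
z1 <- zoo(values, order.by = dates)

# A second series observed on partly different dates.
dates2 <- as.Date(c("2016-01-07", "2016-01-20"))
z2 <- zoo(c(3.1, 2.9), order.by = dates2)

# merge() aligns both series on the union of their dates and fills gaps with NA.
combined <- merge(z1, z2)
print(combined)

# The plot method for zoo objects draws one panel per series by default.
plot(combined)
```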

Note:

  • To see the usage of a function in RStudio, type the name of the function and press Ctrl+Space to open the auto-completion window.
  • To view the official documentation, type “?” before any function name, for example ?mean.
  • Data cleaning can also be performed in RStudio.

Advanced features of RStudio

Many add-on packages for R can be managed from RStudio. The Packages component in RStudio lets you choose which packages to load or unload, and it also provides links to their documentation. Below are some of the add-on packages that can appear in the Packages section of RStudio.

  1. WMCapacity: A GUI implementing Bayesian working memory models.
  2. xlsx: Reads, writes, and formats Excel 2007 and Excel 97/2000/XP/2003 files.
  3. xlsxjars: Provides the Java JAR files required by the xlsx package.
  4. XML: Provides tools for parsing and generating XML within R and S-Plus.
  5. xtable: Exports tables to LaTeX or HTML.
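The same information the Packages component shows can also be queried from the console. The sketch below checks which of a few packages from the list are installed on the current machine. It makes no assumption about the result, and the install line is left commented out because it needs network access.

```r
# Packages to check; the names are taken from the list above.
pkgs <- c("xlsx", "XML", "xtable")

# installed.packages() returns one row per installed package,
# with the package names as row names.
installed <- pkgs %in% rownames(installed.packages())
names(installed) <- pkgs
print(installed)

# Install whatever is missing (needs network access, so left commented out):
# install.packages(pkgs[!installed])

# Ticking a package in the Packages component corresponds to library(),
# for example library(xtable) once xtable is installed.
```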

Next Steps

Follow the R Programming tutorial to go from total beginner to machine learning in just minutes: R Programming Tutorial

 
