R-Program: Module-2



R for Data Management and Cleaning
(Week-2 and 3)

Welcome to Module-2 of this R-program course. If you have not completed Module-1, please complete it first. Before heading into Module-2, let's have a short quiz on last week's sessions.

Module-1 Revise Quiz



Data Object Types
Logical: Also known as the Boolean data type. It can take only two values: TRUE or FALSE.

Numeric: Real numbers with or without decimals. For example: 1, 4.5556, 5.99, 1.9


Integer: Whole numbers, i.e. numbers without decimals. For example: 1, 7, 9, 3000. Use the suffix "L" to tell R to store a value as an integer (for example, 7L); without the suffix, R stores the number as numeric.
Character: Used to store text (character, string or word values) in a variable. Wrap the value in single or double quotes. For example: 'Hari', "shyam", 'red', "a"

Complex: Used to store complex numbers, which have a real part and an imaginary part. The suffix i marks the imaginary part. For example: 34+2i.


Factor: A categorical variable with a fixed set of levels (for example, 1: Male; 2: Female; 3: Other).
Note:
table(obj) gives the frequency table of obj
contrasts(obj) applies to factor variables and helps us identify the reference category (level)
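
To see these types in action, here is a small sketch you can paste into the console. The variable names and the example sex codes are made up for illustration; only class(), factor(), table(), levels() and contrasts() are standard R functions.

```r
# One value of each basic type; class() reports the type R assigned.
flag  <- TRUE       # logical
price <- 4.5556     # numeric
count <- 7L         # integer, note the L suffix
name  <- "Hari"     # character
z     <- 34 + 2i    # complex: real part 34, imaginary part 2

class(flag)   # "logical"
class(price)  # "numeric"
class(count)  # "integer"
class(name)   # "character"
class(z)      # "complex"

# A factor stores categories as labelled levels (hypothetical codes).
sex_code <- c(1, 2, 2, 3, 1, 2)
sex <- factor(sex_code,
              levels = c(1, 2, 3),
              labels = c("Male", "Female", "Other"))

table(sex)      # frequency of each level
levels(sex)     # the first level, "Male", is the reference by default
contrasts(sex)  # the reference level is the row containing only zeros
```

Without the L suffix, class(7) would return "numeric", which is why the suffix matters when you really need an integer.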




Data structure

String: A string is a sequence of characters. For example, "Hello world" is a string made up of the characters H, e, l, l, o, (space), w, o, r, l, d.


Vector: The basic data structure in R. It has a single dimension and all of its elements are of the same type (integer, numeric, character, logical or factor). For example: c(1, 3, 5, 8, 9)

Matrix: A two-dimensional data structure in which data of the same type are arranged into rows and columns.


List: A collection of items that can be of the same or different types. We can use the list() function to create a list.


Array: A data structure that stores data of the same type in one or more dimensions. Vectors, matrices and arrays differ only in their number of dimensions:
  • Vectors are uni-dimensional
  • Matrices are two-dimensional
  • Arrays can have more than two dimensions


Data frame: A two-dimensional data structure that stores data in tabular format. It has rows and columns: rows represent observations and columns represent variables. Columns can be of different types, but all columns have the same length.


Note: To identify the data type or data structure of an object, use these two functions:
class(object_name)
mode(object_name)
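
The structures described above can be sketched as follows. The object names and values are made up for illustration; the point is only to show what class() and mode() report for each structure.

```r
v  <- c(1, 3, 5, 8, 9)                                 # vector: one dimension, one type
m  <- matrix(1:6, nrow = 2, ncol = 3)                  # matrix: 2 rows x 3 columns
a  <- array(1:24, dim = c(2, 3, 4))                    # array: three dimensions
l  <- list(id = 1, name = "Hari", scores = c(80, 90))  # list: mixed types
df <- data.frame(id = 1:3, name = c("a", "b", "c"))    # data frame: a table

class(v)   # "numeric"
class(m)   # "matrix" "array" in R 4.0 or later
class(a)   # "array"
class(l)   # "list"
class(df)  # "data.frame"

mode(df)   # "list", because a data frame is stored as a list of columns
dim(a)     # 2 3 4
```

Notice that class() tells you how R treats the object, while mode() tells you how it is stored. That is why a data frame has class "data.frame" but mode "list".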



Learn to create Dataframe/ dataset

Let's create a data frame. We can create one using the function data.frame().
dataframe1 <- data.frame(
   first_col  = c(value11, value12, ...),
   second_col = c(val21, val22, ...),
   ...
)

Here, first_col and second_col are vectors. Each vector holds values of a single data type, and all vectors must have the same length.

To create a data frame, we first create a vector for each variable. Here, I'm generating vectors of participant ID, gender, age, weight and height.
PID<-c(1,2,3,4,5,6,7)
gender<-c("F", "M","F","F","M","M","M")
Age<-c(23,32,21,22,24,32,29)
weight<-c(67,89,45,65,59,90,56)
height<-c(1.5,1.7,1.8,1.3,1.65,1.9,1.4)

dataset<-data.frame(PID,gender,Age,weight,height) 




A dataset (data frame) is indexed in dataset[row, column] format.

Select specific cell:
dataset[3, 5]  # selects the value in the 3rd row, 5th column

Select specific column:
dataset[, 4]           # row index left empty: keep all rows of the 4th column
dataset$var1           # selects the column named var1
dataset[["var_name"]]  # selects the column named var_name (the name is a quoted string)

Select specific row:
dataset[3, ]  # column index left empty: keep all columns of the 3rd row
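
As a quick check, the selections above can be tried on the participant dataset from this module. It is recreated here so the snippet runs on its own. The last line, which filters rows with a logical condition, is an extra illustration that goes slightly beyond the forms listed above.

```r
dataset <- data.frame(
  PID    = c(1, 2, 3, 4, 5, 6, 7),
  gender = c("F", "M", "F", "F", "M", "M", "M"),
  Age    = c(23, 32, 21, 22, 24, 32, 29),
  weight = c(67, 89, 45, 65, 59, 90, 56),
  height = c(1.5, 1.7, 1.8, 1.3, 1.65, 1.9, 1.4)
)

dataset[3, 5]        # 1.8, the height of participant 3
dataset[, 4]         # all weights, returned as a vector
dataset$weight       # the same column, selected by name
dataset[["weight"]]  # the same column again, name given as a quoted string
dataset[3, ]         # row 3, returned as a one-row data frame

# Extra: rows where Age is above 30, keeping only two columns
dataset[dataset$Age > 30, c("PID", "Age")]
```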



Import dataset in R

You can import a dataset in two ways: a) through code and b) visually.

Download datasets  and code books here


Visual method:
To import a dataset, follow the steps below:
  1. Go to the top-right panel (in RStudio), where you can see the Import Dataset button. Click it and select the format of the dataset (SPSS, STATA, Excel, CSV, Text, SAS).
  2. Add the path to the dataset and rename the dataset if needed.
  3. Import the dataset.



Codes: You can use different functions to import datasets in different formats (.sav, .dta, .csv, .xlsx, .xls, .txt, ...). The following functions import datasets in the most common formats.

Import CSV
dataset <- read.csv(file = "location_of_file/filename.csv", header = TRUE, sep = ",")
dataset <- read.csv(file = file.choose(), header = TRUE, sep = ",")  # pick the file interactively

Import excel datasets
library(readxl)
dataset <- read_excel(path = "location_of_file/filename.xlsx", sheet = "sheet1")

Import *.sav and *.dta files
library(haven)
dataset <- read_dta(file = "location_of_file/filename.dta")  # Stata
dataset <- read_sav(file = "location_of_file/filename.sav")  # SPSS


Import from google sheet
library(gsheet)
dataset<-gsheet2tbl(url="https://drive.google.com/file/d/1yhFs7ju5qPWwE-8qRMxr6O_BOt521vwv/view?usp=sharing", sheetid = "Sheet1")


Import from kobotoolbox
# install_github() comes from the devtools package
library(devtools)
install_github("mrdwab/koboloadeR")
library(koboloadeR)
# download a specific dataset directly from KoboToolbox
dataset <- kobo_data_downloader(formid = "######",
                                user = "Username:*******",
                                api = "https://kc.kobotoolbox.org/api/v1/",
                                check = TRUE
)

Note:
  • If your dataset is in your working directory, you can give just the file name with its extension inside double quotes instead of the full path (use the getwd() and setwd() functions to check and set the working directory).
  • When typing the path to a file, type the drive name followed by :/ and press the Tab key, then select the right folder and file from the suggestions.
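
The first note can be sketched as below. This is only an illustration: it uses R's temporary folder as the working directory and a made-up file called example.csv, so that it runs on any computer without touching your own files.

```r
old_wd <- getwd()     # remember the current working directory
setwd(tempdir())      # switch to a scratch folder for this demo

# Create a small CSV file inside the working directory (hypothetical data).
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)

# The file is in the working directory, so the bare file name is enough.
dataset <- read.csv("example.csv", header = TRUE, sep = ",")
file.exists("example.csv")   # TRUE

setwd(old_wd)         # go back to the original working directory
```

In your own project you would replace tempdir() with the folder that holds your datasets.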

Export dataset in R

To export a dataset into different formats, you can use different functions. Let's see some examples:

Export to CSV format
write.csv(dataset, file = "location_to_export/filename.csv")

# Export to .dta (Stata) and .sav (SPSS) formats
library(haven)
write_dta(dataset, path = "location_to_export/filename.dta", version = 14)
write_sav(dataset, path = "location_to_export/filename.sav")

# Export to R objects (.RData and .rds)
save.image(file = ".RData")  # saves every object in the workspace
saveRDS(dataset, file = "path_to_save_file/file.rds")  # saves a single object
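
To check that an export worked, you can read the file back in. The sketch below uses only base R functions and temporary files, with a made-up three-row dataset, so nothing is written to your project folder.

```r
dataset <- data.frame(PID = c(1, 2, 3), gender = c("F", "M", "F"))

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# CSV: row.names = FALSE avoids writing an extra column of row numbers.
write.csv(dataset, file = csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
class(from_csv$PID)   # "integer": a CSV file does not store column types

# RDS: stores the R object exactly as it is.
saveRDS(dataset, file = rds_path)
from_rds <- readRDS(rds_path)
identical(from_rds, dataset)   # TRUE
```

A CSV file is easy to share with other software, while an .rds file is the safer choice when the data will be opened again in R, because column types and factor levels are kept.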



Save R-scripts 
You can save an R script by clicking the Save button just above the top-left panel.

Save R-History

The history of the code that has already been run in the console can be saved with the function savehistory().

savehistory(file = "file_history.Rhistory")





Working with Dataframe


Step-1: Find the dimensions of the data frame (number of rows and number of columns).

dim(dataset) # displays the number of rows and columns


Step-2: View the dataset

View(dataframe_name)   # opens the data frame in a spreadsheet-style viewer

str(dataframe_name) # displays the variables with their types

library(dplyr)
glimpse(dataframe_name) # displays the variables with their types, from the "dplyr" package

Step-3: Find out the data type of a specific variable
class(dataset$var_name)
mode(dataset$var_name)
str(dataset$var_name)
glimpse(dataset$var_name)
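
Here is how the three steps look on the participant dataset from this module, recreated so the snippet runs on its own. It uses base R only; View() and glimpse() are left out because they need an interactive session and the dplyr package respectively. The sapply() line is an extra shortcut not covered above.

```r
dataset <- data.frame(
  PID    = c(1, 2, 3, 4, 5, 6, 7),
  gender = c("F", "M", "F", "F", "M", "M", "M"),
  Age    = c(23, 32, 21, 22, 24, 32, 29),
  weight = c(67, 89, 45, 65, 59, 90, 56),
  height = c(1.5, 1.7, 1.8, 1.3, 1.65, 1.9, 1.4)
)

# Step-1: dimensions
dim(dataset)    # 7 5, i.e. 7 rows and 5 columns

# Step-2: structure of the whole data frame
str(dataset)

# Step-3: type of a single variable
class(dataset$Age)      # "numeric"
class(dataset$gender)   # "character" in R 4.0 or later

# Extra: the class of every column at once
sapply(dataset, class)
```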






May 15, 2025