Grouping and Aggregating Data


Sometimes you may want to divide a quantitative variable into groups to get summarized information. The cut() function converts a quantitative variable into a grouping factor.

Agegroups = cut(Age,c(20,50,70,80))
Agegroups
x1 = table(Sex,Treatment,Agegroups,Improved)
ftable(x1)
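To see what cut() does with the breaks, here is a minimal sketch on a small made-up Age vector (the values are hypothetical, not from the data set used above). By default each interval is open on the left and closed on the right, so a value of exactly 50 lands in the first group.

```r
# Hypothetical ages, made up for illustration only
Age <- c(23, 35, 50, 51, 64, 70, 78)

# Breaks 20, 50, 70, 80 define three right-closed intervals
Agegroups <- cut(Age, breaks = c(20, 50, 70, 80))

levels(Agegroups)   # "(20,50]" "(50,70]" "(70,80]"
table(Agegroups)    # counts per interval: 3, 3, 1
```

Any value outside the range of the breaks (for example, an age of 85 here) would be coded as NA.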

You can also use the quantile() function to break the “Age” variable into roughly equal-sized groups. By default, quantile() returns the minimum, the three quartiles, and the maximum of the data (see next section).

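# include.lowest=TRUE keeps the minimum Age in the first group (otherwise it becomes NA)
Agegroups = cut(Age,quantile(Age),include.lowest=TRUE)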
table(Agegroups)
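One detail worth checking when the breaks come from quantile(): the first break is the minimum of the data, and cut() intervals are open on the left by default, so the smallest observation is dropped to NA unless include.lowest = TRUE is set. A short sketch on hypothetical ages shows the difference:

```r
# Hypothetical ages, made up for illustration only
Age <- c(23, 35, 50, 51, 64, 70, 78, 41)

q <- quantile(Age)   # minimum, quartiles, maximum

without_lowest <- cut(Age, q)                         # minimum falls outside (min, Q1]
with_lowest    <- cut(Age, q, include.lowest = TRUE)  # first interval becomes [min, Q1]

sum(is.na(without_lowest))   # 1: the youngest person is lost
sum(is.na(with_lowest))      # 0
table(with_lowest)           # 2 observations in each of the four groups
```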

Aggregating Numerical Data:

The function tapply(numeric variable, grouping variable, function to aggregate) is useful for computing summary statistics (length, mean, range, quantile, sd, etc.) within groups.

In the tapply function, the first argument is numeric and the second is a grouping variable, which is generally a factor. The results are returned in tabular form. For example, x1 = tapply(Gas,Insul,mean) will compute the mean of “Gas” for each level of “Insul” and store it in x1. If there is missing data, pass na.rm=TRUE as an additional argument, e.g. tapply(Gas,Insul,mean,na.rm=TRUE).
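As a small illustration of the missing-data point, the sketch below uses made-up Gas and Insul vectors (hypothetical values, not the original data). Extra arguments given after the function name are passed on to that function for every group.

```r
# Hypothetical data with one missing Gas value
Gas   <- c(7.2, 6.9, NA, 4.8, 4.1, 3.9)
Insul <- factor(c("Before", "Before", "Before", "After", "After", "After"))

# Without na.rm the mean of any group containing an NA is itself NA
tapply(Gas, Insul, mean)

# na.rm = TRUE is handed to mean() for each group
x1 <- tapply(Gas, Insul, mean, na.rm = TRUE)
x1["Before"]   # 7.05, the mean of 7.2 and 6.9
```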

If you have multiple grouping variables, then you can use the list argument as follows:

tapply(Salaries$salary,list(Salaries$sex,Salaries$rank),mean)
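With two grouping variables the result is a two-way table (a matrix) with one row per level of the first factor and one column per level of the second. The sketch below builds a tiny stand-in data frame with the same column names; the rows are invented for illustration and are not the real Salaries data.

```r
# Toy stand-in for the Salaries data (hypothetical rows)
Salaries <- data.frame(
  salary = c(100, 120, 80, 90, 150, 130),
  sex    = c("Female", "Male", "Female", "Male", "Female", "Male"),
  rank   = c("Prof", "Prof", "AsstProf", "AsstProf", "Prof", "Prof")
)

# Rows are levels of sex, columns are levels of rank
m <- tapply(Salaries$salary, list(Salaries$sex, Salaries$rank), mean)

dim(m)                # 2 2
m["Female", "Prof"]   # 125, the mean of 100 and 150
```

Individual cells can be picked out by name, as in the last line. A combination of levels that never occurs in the data would appear as NA.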

Histogram Command:

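To get a traditional frequency (count) histogram, set freq=TRUE. Note that when the breaks are unequally spaced, as below, hist() plots densities by default and issues a warning if freq=TRUE is forced, because the bar areas no longer reflect the counts:

hist(weight,breaks=c(50,60,80,100),freq=TRUE)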
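To inspect what the histogram would draw without opening a plot, hist() can be called with plot = FALSE, which returns the bin counts and densities. The weight values below are hypothetical and chosen only to show how counts and densities differ when the bins have unequal widths.

```r
# Hypothetical weights, made up for illustration only
weight <- c(52, 58, 61, 65, 72, 79, 85, 99)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(weight, breaks = c(50, 60, 80, 100), plot = FALSE)

h$counts    # 2 4 2            (what freq = TRUE would draw)
h$density   # 0.025 0.025 0.0125  (count / (n * bin width))
h$equidist  # FALSE, since the bins are 10, 20 and 20 units wide
```

Here the first two bins have the same density even though the second holds twice as many observations, because it is also twice as wide.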