## Grouping and Aggregating Data

Sometimes you may want to split a quantitative variable into groups to get summarized information. The `cut()` function converts a quantitative variable to a grouping factor.

```
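# Bin Age into the intervals (20,50], (50,70], (70,80]
Agegroups = cut(Age,c(20,50,70,80))
Agegroups
# Cross-tabulate the four factors and print the result as a flat table
x1 = table(Sex,Treatment,Agegroups,Improved)
ftable(x1)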
```
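As a self-contained illustration of how `cut()` assigns values to intervals, here is a minimal sketch. The `age` vector and the label names are invented for this example and are not part of the data used above.

```r
# Hypothetical ages, invented for illustration only
age <- c(23, 35, 50, 51, 64, 70, 74)

# Breaks at 20, 50, 70, 80 give the intervals (20,50], (50,70], (70,80]
age_groups <- cut(age, breaks = c(20, 50, 70, 80))

# Same breaks with custom labels (the label names are made up)
age_labeled <- cut(age, breaks = c(20, 50, 70, 80),
                   labels = c("young", "middle", "older"))

# Count how many values fall in each group
group_counts <- table(age_groups)
print(group_counts)
```

Because the intervals are closed on the right by default, an age of exactly 50 lands in the first group and an age of exactly 70 in the second.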

You can also use the `quantile()` function to break the “Age” variable into roughly equal-sized groups. By default, `quantile()` returns the minimum, the three quartiles, and the maximum of the data (see next section).

```
# include.lowest=TRUE keeps the minimum Age, which would otherwise become NA
Agegroups = cut(Age,quantile(Age),include.lowest=TRUE)
table(Agegroups)
```
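One detail worth checking when quantiles are used as breaks is what happens to the smallest value, since `cut()` intervals are open on the left by default. The sketch below uses an invented `age` vector to show the difference that `include.lowest = TRUE` makes.

```r
# Hypothetical ages, invented for illustration only
age <- c(23, 30, 41, 46, 52, 57, 63, 74)

# Default quantile() output: 0%, 25%, 50%, 75%, 100%
q <- quantile(age)

# Without include.lowest, the minimum falls outside the first interval
groups_open <- cut(age, breaks = q)
n_dropped <- sum(is.na(groups_open))
print(n_dropped)

# With include.lowest = TRUE, the first interval is closed on the left
groups_closed <- cut(age, breaks = q, include.lowest = TRUE)
print(table(groups_closed))
```

With these eight values, each of the four quantile groups ends up with two observations once the minimum is included.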

Aggregating Numerical Data:

The function `tapply(numeric variable, grouping variable, function to aggregate)` is useful for computing summary statistics (length, mean, range, quantile, sd, etc.) within groups.

In `tapply()`, the first argument is numeric and the second is a grouping variable, which is generally a factor. The results are returned in tabular form. For example, `x1 = tapply(Gas,Insul,mean)` computes the mean of “Gas” for each level of “Insul” and stores it in `x1`. If there is missing data, add `na.rm=TRUE` as an extra argument so that it is passed on to the aggregating function.
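A minimal sketch of the effect of missing values on `tapply()` follows. The `Gas` and `Insul` vectors here are small invented stand-ins that only borrow the names used in the text, not the real data.

```r
# Hypothetical data, invented for illustration (names mirror the text)
Gas   <- c(7.2, 6.9, NA, 4.8, 4.1, 3.9)
Insul <- factor(c("Before", "Before", "Before", "After", "After", "After"))

# Without na.rm, any group containing an NA gets an NA mean
raw_means <- tapply(Gas, Insul, mean)
print(raw_means)

# Extra arguments after the function are passed through to it
x1 <- tapply(Gas, Insul, mean, na.rm = TRUE)
print(x1)
```

The result is indexed by the factor levels, so a single group mean can be pulled out with `x1["Before"]`.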

If you have multiple grouping variables, wrap them in `list()` and pass the list as the second argument:

```
tapply(Salaries$salary,list(Salaries$sex,Salaries$rank),mean)
```
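To show the shape of the result with two grouping variables, here is a small sketch. The `pay` data frame is a hypothetical stand-in with invented values; only its column names follow the `Salaries` example.

```r
# Hypothetical stand-in for the Salaries data (values are invented)
pay <- data.frame(
  salary = c(100, 120, 90, 80, 150, 130),
  sex    = c("Female", "Male", "Female", "Male", "Male", "Female"),
  rank   = c("Prof", "Prof", "Asst", "Asst", "Prof", "Prof")
)

# Rows are indexed by the first grouping variable, columns by the second
means <- tapply(pay$salary, list(pay$sex, pay$rank), mean)
print(means)

# Individual cells can be looked up by row and column name
male_prof <- means["Male", "Prof"]
print(male_prof)
```

The result is a matrix with one row per level of the first variable and one column per level of the second. Combinations with no observations would appear as `NA`.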

Histogram Command:

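To get a traditional frequency histogram, with counts on the vertical axis, use the “freq” option. When the bins have unequal widths, as with the breaks in this example, R warns that the bar areas are misleading and recommends `freq=FALSE` instead: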

```
hist(weight,breaks=c(50,60,80,100),freq=TRUE)
```
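To inspect what a histogram would display without drawing it, `hist()` can be called with `plot = FALSE`. The sketch below uses an invented `weight` vector, so the numbers are for illustration only.

```r
# Hypothetical weights, invented for illustration only
weight <- c(52, 55, 61, 67, 72, 79, 83, 88, 95)

# plot = FALSE returns the histogram object without drawing it
h <- hist(weight, breaks = c(50, 60, 80, 100), plot = FALSE)

# Number of observations in each bin (what freq = TRUE would draw)
print(h$counts)

# counts / (n * bin width), what freq = FALSE would draw
print(h$density)
```

Comparing `h$counts` with `h$density` shows how the two scalings differ when the bins are not all the same width.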