# The Formula, ModelFrame and ModelMatrix Types

In regression modeling, we often want to describe the relationship between a
response variable and one or more input variables in terms of main effects
and interactions. To facilitate the specification of a regression model in
terms of the columns of a `DataFrame`, the DataFrames package provides a
`Formula` type, which is created by the `~` binary operator in Julia:

```
using DataFrames
fm = Z ~ X + Y
```

A `Formula` object can be used to transform a `DataFrame` into a `ModelFrame`
object:

```
df = DataFrame(X = randn(10), Y = randn(10), Z = randn(10))
mf = ModelFrame(Z ~ X + Y, df)
```

A `ModelFrame` object is just a simple wrapper around a `DataFrame`. For
modeling purposes, one generally wants to go one step further and construct a
`ModelMatrix`, which builds a `Matrix{Float64}` that can be used directly to
fit a statistical model:

```
mm = ModelMatrix(ModelFrame(Z ~ X + Y, df))
```

Note that `mm` contains an additional column consisting entirely of `1.0`
values. This is used to fit an intercept term in a regression model.
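As a rough sketch of the layout for `Z ~ X + Y` on the ten-row `df` above: the row index $i$, the coefficients $\beta_0, \beta_1, \beta_2$, and the assumption that the column of ones comes first are notation introduced here for illustration, not something taken from the package output.

```latex
M =
\begin{bmatrix}
1 & X_1 & Y_1 \\
1 & X_2 & Y_2 \\
\vdots & \vdots & \vdots \\
1 & X_{10} & Y_{10}
\end{bmatrix},
\qquad
Z_i \approx \beta_0 \cdot 1 + \beta_1 X_i + \beta_2 Y_i
```

The coefficient $\beta_0$ multiplies the column of ones in every row, which is why that column gives the model its intercept.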

In addition to specifying main effects, it is possible to specify interactions
using the `&` operator inside a `Formula`:

```
mm = ModelMatrix(ModelFrame(Z ~ X + Y + X&Y, df))
```

If you would like to specify both main effects and an interaction term at once,
use the `*` operator inside a `Formula`:

```
mm = ModelMatrix(ModelFrame(Z ~ X*Y, df))
```

Model matrices make it easy to formulate complex statistical models. The GLM package uses them to good effect.