This node type generates a new node from an ordinary R expression, which may include function calls or any other valid R syntax. It is useful for combining components of a node that need to be simulated with separate node calls, or simply as a convenient shorthand for variable transformations.

Usage

node_identity(data, parents, formula)

Arguments

data

A data.table (or something that can be coerced to a data.table) containing all columns specified by parents.

parents

A character vector specifying the names of the parents that this particular child node has. When using this function as a node type in node or node_td, this argument usually does not need to be specified because the formula argument is required and contains all needed information already.

formula

A formula object containing a ~ symbol with nothing on the LHS, and any valid R expression that can be evaluated on data on the RHS. This expression needs to contain at least one variable name (otherwise users may simply use rconstant as node type). It may contain any number of function calls or other valid R syntax, provided that all objects it refers to are available in the global environment. Note that, unlike the other node types supporting this argument, interactions and levels of a categorical variable are not supported when specifying the formula here.

Details

Custom functions and objects can be used in the formula, but they need to be present in the global environment; otherwise the underlying eval() call will fail. Using this function outside of node or node_td is essentially equivalent to using with(data, eval(formula)) (without the ~ in the formula).
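
The equivalence described above can be illustrated with a minimal base R sketch. This is not the package's actual implementation; the data, the column names and the object names (dat, form, rhs, out) are assumptions made for illustration only. It strips the ~ by extracting the right-hand side of a one-sided formula and evaluates that expression with the columns of the data as variables.

```r
# hypothetical data; the column names are made up for illustration
dat <- data.frame(age = c(40, 50, 60), sex = c(TRUE, FALSE, TRUE))

# a one-sided formula of the kind accepted by the formula argument
form <- ~ age / 2 + ifelse(sex, 2, 3)

# "without the ~": for a one-sided formula, the second element is the RHS
rhs <- form[[2]]

# evaluate the RHS using the columns of dat as variables; anything not
# found in dat (such as ifelse) is looked up via the global environment
out <- eval(rhs, envir = dat, enclos = globalenv())
out
#> [1] 22 28 32
```

The result has one value per row of dat, which matches the return value described for this function.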

Author

Robin Denz

Value

Returns a numeric vector of length nrow(data).

Examples

library(simDAG)

set.seed(12455432)

# define a DAG
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("bmi", type="identity", formula= ~ age + sex + 2)

sim_dat <- sim_from_dag(dag=dag, n_sim=100)
head(sim_dat)
#>         age    sex      bmi
#>       <num> <lgcl>    <num>
#> 1: 46.90669  FALSE 48.90669
#> 2: 47.25599  FALSE 49.25599
#> 3: 47.05490  FALSE 49.05490
#> 4: 48.06434  FALSE 50.06434
#> 5: 51.62119  FALSE 53.62119
#> 6: 54.11589  FALSE 56.11589

# more complex alternative
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("bmi", type="identity",
       formula= ~ age / 2 + age^2 - ifelse(sex, 2, 3) + 2)

sim_dat <- sim_from_dag(dag=dag, n_sim=100)
head(sim_dat)
#>         age    sex      bmi
#>       <num> <lgcl>    <num>
#> 1: 48.75325   TRUE 2401.256
#> 2: 48.62980   TRUE 2389.173
#> 3: 56.19418   TRUE 3185.883
#> 4: 53.39997  FALSE 2877.257
#> 5: 45.73172   TRUE 2114.256
#> 6: 56.06411  FALSE 3170.217
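
As noted in the Details, custom functions may appear in the formula as long as they exist in the global environment. The following sketch assumes it is run at the top level of an R session; the helper center() and the node name age_centered are hypothetical and only serve as an illustration.

```r
library(simDAG)

set.seed(42)

# hypothetical helper, defined in the global environment so that it can
# be found when the formula is evaluated
center <- function(x) x - mean(x)

# use the custom function inside the formula of an identity node
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("age_centered", type="identity", formula= ~ center(age))

sim_dat <- sim_from_dag(dag=dag, n_sim=100)
head(sim_dat)
```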