Simulate a Node based on an expression
node_identity.Rd
This node type may be used to generate a new node from a regular R expression, which may include function calls or any other valid R syntax. This may be useful to combine components of a node that need to be simulated with separate node calls, or simply as a convenient shorthand for some variable transformations.
Arguments
- data
A data.table (or something that can be coerced to a data.table) containing all columns specified by parents.
- parents
A character vector specifying the names of the parents that this particular child node has. When using this function as a node type in node or node_td, this argument usually does not need to be specified because the formula argument is required and already contains all needed information.
- formula
A formula object containing a ~ symbol with nothing on the LHS, and any valid R expression that can be evaluated on data on the RHS. This expression needs to contain at least one variable name (otherwise users may simply use rconstant as node type). It may contain any number of function calls or other valid R syntax, provided that all functions and objects it refers to, other than the columns of data, are available in the global environment. Note that, in contrast to the other node types supporting this argument, interactions and levels of a categorical variable are not supported when specifying the formula here.
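To make the expected shape of the formula argument more concrete, the following base R sketch inspects a one-sided formula. It does not call simDAG itself; the object names f, rhs and vars are purely illustrative, and the checks shown are not necessarily how the package validates its input.

```r
# a one-sided formula: nothing on the LHS, an arbitrary R expression on the RHS
f <- ~ age / 2 + ifelse(sex, 2, 3)

# a one-sided formula has two elements: the `~` symbol and the RHS expression
length(f)
#> [1] 2

# extract the RHS as an unevaluated call
rhs <- f[[2]]

# variable names referenced on the RHS (function names are not included)
vars <- all.vars(f)
vars
#> [1] "age" "sex"
```

Here age and sex would have to be columns of data, while ifelse is found as an ordinary function.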
Details
Custom functions and objects can be used in the formula without issues, but they need to be present in the global environment; otherwise the underlying eval() function call will fail. Using this function outside of node or node_td is essentially equivalent to using with(data, eval(formula)) (without the ~ in the formula).
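The equivalence described above can be sketched in base R as follows. This is only an illustration under stated assumptions: double_it is a hypothetical helper, dat is a made-up dataset standing in for the parent columns, and the code mirrors the with(data, eval(...)) idea rather than reproducing the package internals.

```r
# hypothetical custom helper; it must be visible from the global environment
double_it <- function(x) {
  x * 2
}

# small made-up dataset standing in for the parent columns
dat <- data.frame(age = c(40, 50, 60), sex = c(TRUE, FALSE, TRUE))

# the RHS of the formula, i.e. the formula without the `~`
rhs <- quote(double_it(age) - ifelse(sex, 2, 3))

# evaluate the expression with the columns of `dat` available as variables
bmi <- with(dat, eval(rhs))
bmi
#> [1]  78  97 118
```

If double_it were not defined before the call, the eval() step would stop with an error because the function could not be found.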
Examples
library(simDAG)
set.seed(12455432)
# define a DAG
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("bmi", type="identity", formula= ~ age + sex + 2)
sim_dat <- sim_from_dag(dag=dag, n_sim=100)
head(sim_dat)
#>         age    sex      bmi
#>       <num> <lgcl>    <num>
#> 1: 46.90669  FALSE 48.90669
#> 2: 47.25599  FALSE 49.25599
#> 3: 47.05490  FALSE 49.05490
#> 4: 48.06434  FALSE 50.06434
#> 5: 51.62119  FALSE 53.62119
#> 6: 54.11589  FALSE 56.11589
# more complex alternative
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("bmi", type="identity",
       formula= ~ age / 2 + age^2 - ifelse(sex, 2, 3) + 2)
sim_dat <- sim_from_dag(dag=dag, n_sim=100)
head(sim_dat)
#>         age    sex      bmi
#>       <num> <lgcl>    <num>
#> 1: 48.75325   TRUE 2401.256
#> 2: 48.62980   TRUE 2389.173
#> 3: 56.19418   TRUE 3185.883
#> 4: 53.39997  FALSE 2877.257
#> 5: 45.73172   TRUE 2114.256
#> 6: 56.06411  FALSE 3170.217