
Extract "event" times from start-stop format datasets
Given a data.table-like object in the start-stop format, this function returns a new data.table containing the times at which events of a particular type happened.
Arguments
- data
A data.table-like object including at least three columns: id (the unique case identifier), start (the beginning of the time-interval) and stop (the end of the time-interval). May also be any object that can be coerced to a data.table, such as a data.frame or a tibble. Intervals should be coded as [start, stop), like in all other functions of this package.
- id
A single character string specifying the column containing the unique case identifier.
- name
A single character string specifying the "event" column in data. The specified column should be of class "logical" (containing only TRUE or FALSE). Alternatively, it may be a numeric variable containing only 0 (considered FALSE) and 1 (considered TRUE).
- type
A single character string specifying which type of variable the column specified by name is. If the variable is an actual event, meaning that existing intervals end at the exact time that name occurred, it should be set to type="event". In this case, the stop values of all intervals where name is TRUE are extracted. If the variable refers to a time-varying binary variable instead (for example a time-dependent exposure that can be present or absent), it should be set to "var", in which case the start time of each uninterrupted duration where name is TRUE is extracted. See details.
- start
A single character string specifying the column in data that contains the beginning of a time-interval. Defaults to "start".
- stop
A single character string specifying the column in data that contains the end of a time-interval. Defaults to "stop".
- time_name
A single character string specifying the name that the "time" variable should have in the output data. Defaults to "time".
Details
This function may be useful to extract the times of occurrence of binary time-dependent exposures or actual events from start-stop data.
Use on Time-Varying Variables:
If type="var" is used, the variable specified by name is treated as a simple time-varying variable and only the start time of each uninterrupted duration where this variable is TRUE is extracted. For example, if the variable starts being TRUE at t = 20, stops being TRUE at t = 123, and was never TRUE before or after these times, the column named by time_name would simply contain 20 for this individual, regardless of how many intervals are present where name is TRUE. This is done because the variable is continuously TRUE and we only want to extract the initial time at which it "occurred" or "happened". If name goes back to FALSE and is TRUE again later, for example at t = 700, the output would contain another entry for this id with the time 700, because this constitutes another occurrence.
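The rule described above can be illustrated with a minimal data.table sketch. This is not the package's implementation, only an approximation under the assumption that the intervals of each id are sorted and contiguous. The object names d, prev and onsets are hypothetical and chosen for illustration.

```r
library(data.table)

# hypothetical example data in start-stop format
d <- data.table(id=c(1, 1, 1, 1, 1),
                start=c(0, 10, 25, 812, 1092),
                stop=c(10, 25, 812, 1092, 34334),
                exposure=c(TRUE, TRUE, FALSE, TRUE, FALSE))
setkey(d, id, start)

# value of "exposure" in the previous interval of the same id
# (assumed FALSE before the first interval)
d[, prev := shift(exposure, n=1, type="lag", fill=FALSE), by=id]

# keep only intervals in which a new uninterrupted TRUE duration begins
onsets <- d[exposure == TRUE & prev == FALSE, .(id, time=start)]
```

Here the first two intervals are merged into a single occurrence, so only the start times of the first and fourth intervals are kept.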
Use on actual events:
If type="event" is used instead, every single occurrence of TRUE in the input data is considered to be a separate event occurrence of name, regardless of whether these intervals directly follow one another. This is the classic difference between coding time-varying variables and events in start-stop data, as discussed in the survival package documentation and in the vignettes of this package.
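For comparison, the event rule can be sketched in the same way. Again, this is only an illustrative approximation using plain data.table code, not the package's own implementation, and the object names d and events are hypothetical.

```r
library(data.table)

# hypothetical example data in start-stop format
d <- data.table(id=c(1, 1, 1, 1, 1),
                start=c(0, 10, 25, 812, 1092),
                stop=c(10, 25, 812, 1092, 34334),
                exposure=c(TRUE, TRUE, FALSE, TRUE, FALSE))
setkey(d, id, start)

# every interval with a TRUE value counts as its own event,
# which is assumed to happen at the end of that interval
events <- d[exposure == TRUE, .(id, time=stop)]
```

In contrast to the time-varying case, the first two intervals yield two separate rows here, one for each stop time.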
Examples
library(MatchTime)
library(data.table)
# define some example start-stop data
data <- data.table(id=c(1, 1, 1, 1, 1, 2, 2, 2),
                   start=c(0, 10, 25, 812, 1092, 90, 9023, 10000),
                   stop=c(10, 25, 812, 1092, 34334, 8021, 9823, 220022),
                   exposure=c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE,
                              TRUE))
# treating it as an exposure
# NOTE: in this case, the first two rows of id = 1 are considered to be
# one continuous occurrence, because "exposure" stayed TRUE the entire
# time
out1 <- times_from_start_stop(data, id="id", name="exposure", type="var")
head(out1)
#> Key: <id, time>
#>       id  time
#>    <num> <num>
#> 1:     1     0
#> 2:     1   812
#> 3:     2 10000
# treating it as an event
# NOTE: in this case, the first two rows of id = 1 are considered to be
# two independent events, because events force a time-interval to stop
out2 <- times_from_start_stop(data, id="id", name="exposure", type="event")
head(out2)
#> Key: <id, time>
#>       id   time
#>    <num>  <num>
#> 1:     1     10
#> 2:     1     25
#> 3:     1   1092
#> 4:     2 220022