Event Tables after 5.0.2: Architecture and Benchmark

Overview

This vignette compares the current event table handling with the old event table in 5.0.2. The focus is the event table layer (rxEt), because that is where the branch makes its largest architectural change.

At a high level, the old 5.0.2 event table handling keeps more of the event table implementation in compiled code, especially for sequence/repeat style operations. The new architecture moves a large part of event table construction and manipulation into R, keeps the solver and event translation in C/C++, and uses a lazy, environment-backed rxEt object as a data-frame proxy that defers full materialization until it is needed. By contrast, main works much more like an actual realized data frame throughout the event-table workflow.

The comparison below reflects:

new event table handling (including recent ALTREP and homogeneous group integrations)
old event table handling (at commit c46571d84)

Architectural decisions on new event table structure

The branch makes five important design choices.

1. Pure-R event table construction

The public et() API remains the same, but much more of the work now happens in R:

et.default() resolves event-table arguments and compatibility aliases
.newRxEt() creates the new object
.etBuildMethods() attaches the attached backward compatible methods like $add.sampling().
.etExpandAddl() expands addl records in R
etSeq(), etRbind(), and etRep() are implemented in R instead of depending on the older exported Rcpp helpers

This reduces the amount of cross-language glue needed for ordinary event table authoring. For maintainability, these R-based methods are extracted into focused internal files: R/et-helpers.R, R/et-methods.R, R/etNew.R, and R/etVctrs.R.

2. Lazy, environment-backed state

The new rxEt object is a data-frame subclass with its mutable state in an attached environment accessed through .rxEtEnv(). Internally the branch stores event data as ID-indexed chunks and only creates the full canonical data frame when as.data.frame(), printing, or solver handoff requires it. This saves time in the common multi-subject solving scenarios seen in pharmacometrics.

That design intentionally shifts work away from repeated construction and combination and toward materialization. In other words, the old 5.0.2 event handling is a realized data frame with methods attached, while the new event handling behaves more like a data-frame proxy that eventually materializes to a realized event table.

3. Materialize only when needed

.etMaterialize() is the key function creating the full event table. It converts the chunked representation into the canonical event-table data frame, applies defaults, sorts records, and restores units. The branch uses data.table::rbindlist for high-performance binding of the internal chunks during this materialization step.

This makes behavior easier to reason about, but it also means the first forced materialization can be more expensive than having the whole event table as a data.frame object in the first place. However there is a speed-up by not having to sort by id and time, instead materializng the chunks of IDs that are stored in the event table.

4. Using data-frame/vctrs/dplyr methods for data-like behavior

R/etVctrs.R adds vec_proxy(), vec_restore(), vec_cast(), and related helpers for rxEt. That is a significant shift in philosophy: the event table becomes a better-behaved data-frame subtype instead of a proper data-frame. This allows rxEt to work seamlessly with dplyr, tidyr, and other modern R tools while retaining its specialized internal state.

5. ALTREP-backed data frames and optimized memory access

To dramatically reduce memory overhead, the event table output layers are now powered by R’s ALTREP (Alternative Representation) system. Columns that feature long repeated sequences (such as id or string factors) are no longer materialized up front. Instead, custom rx_seqrep, rx_rep_int, rx_rep_real, and rx_rep_str ALTREP classes defer materialization and generate elements on demand.

Furthermore, these classes implement block-based Get_region methods utilizing $O(\\log N)$ exponential memcpy strategies and zero-modulo sequential chunking. This enables native R functions to process these columns via bulk block-access instead of slow $O(N)$ element-by-element iterations.

6. Homogeneous event groups (`rxHomGroups`)

The new architecture introduces native support for “homogeneous” event groups. When multiple subjects share identical observation and dosing regimens, the solver avoids duplicating their arrays.

Pointer sharing: C++ memory (_globals) allocates only a single shared block per unique regimen. The individual solvers simply map their all_times, cov_ptr, and other pointers to offsets inside this de-duplicated memory.
Private memory degradation: If a model utilizes dynamic dosing (e.g. evid_()) or other runtime mutations, the solver detects this via indOwnAlloc and securely triggers memcpy() to convert the shared memory slice into a private allocation for that specific subject. This isolates mutations cleanly without compromising the initial memory efficiency.
Branch-free execution: While covariates and events are de-duplicated during the ODE solve, final output arrays (keepFcov) are physically expanded. This is a deliberate architectural trade-off: it guarantees that hot loops for dataframe construction (get_fkeep) and NA interpolation (LOCF/NOCB) remain completely branch-free and robust, maximizing runtime speed at a negligible memory cost.

7. Keep C/C++ where it still matters

The new event handling does not move everything into R. Event translation and solver preparation still flow through the C/C++ side, especially src/etTran.cpp and src/rxData.cpp. In other words, this is not a “remove compiled code” rewrite; it is a “move authoring/manipulation logic into R and keep translation/solve logic compiled” rewrite for the new event system.

Main functions in the new event system

main_functions <- data.frame(
  function_name = c(
    ".newRxEt()",
    ".rxEtEnv()",
    ".etBuildMethods()",
    "et.default()",
    ".etMaterialize()",
    ".etExpandAddl()",
    "etSeq() / etRbind() / etRep()",
    ".etFixCmtForSolve()",
    "vec_proxy.rxEt() / vec_restore.rxEt()"
  ),
  role = c(
    "Create the new env-backed rxEt shell.",
    "Recover the mutable event-table environment.",
    "Implement the mutable method API such as add.dosing() and add.sampling().",
    "Resolve public et() calls, aliases, seq()-style forms, IDs, and imports.",
    "Materialize the lazy chunk store into canonical event-table rows.",
    "Expand addl dosing records in pure R.",
    "Compose, bind, and repeat event tables without the older Rcpp sequence helpers.",
    "Normalize character cmt values before solver handoff.",
    "Round-trip rxEt through vctrs/dplyr-style reconstruction."
  ),
  old_equivalent = c(
    "No direct equivalent; builds older rxEt/EventTable state through the C++ path.",
    "relies on compiled rxEt detection/state access more heavily.",
    "More behavior lived in the older EventTable/Rcpp implementation.",
    "Same public entry point, but with a more eager and more compiled implementation.",
    "reaches a realized table earlier in the workflow.",
    "relied more on compiled event helpers.",
    "used exported Rcpp helpers such as etSeq_ and etRep_.",
    "Not needed in the same way before the new lazy/data-frame path exposed more mixed input cases.",
    "No comparable vctrs restore layer on main."
  ),
  stringsAsFactors = FALSE
)
knitr::kable(main_functions, col.names = c("Current branch function", "Role", "Rough equivalent on `main`"))

Current branch function	Role	Rough equivalent on `main`
.newRxEt()	Create the new env-backed rxEt shell.	No direct equivalent; builds older rxEt/EventTable state through the C++ path.
.rxEtEnv()	Recover the mutable event-table environment.	relies on compiled rxEt detection/state access more heavily.
.etBuildMethods()	Implement the mutable method API such as add.dosing() and add.sampling().	More behavior lived in the older EventTable/Rcpp implementation.
et.default()	Resolve public et() calls, aliases, seq()-style forms, IDs, and imports.	Same public entry point, but with a more eager and more compiled implementation.
.etMaterialize()	Materialize the lazy chunk store into canonical event-table rows.	reaches a realized table earlier in the workflow.
.etExpandAddl()	Expand addl dosing records in pure R.	relied more on compiled event helpers.
etSeq() / etRbind() / etRep()	Compose, bind, and repeat event tables without the older Rcpp sequence helpers.	used exported Rcpp helpers such as etSeq_ and etRep_.
.etFixCmtForSolve()	Normalize character cmt values before solver handoff.	Not needed in the same way before the new lazy/data-frame path exposed more mixed input cases.
vec_proxy.rxEt() / vec_restore.rxEt()	Round-trip rxEt through vctrs/dplyr-style reconstruction.	No comparable vctrs restore layer on main.

Motivation compared with old event handling

The branch makes sense if the goal is to optimize for correctness, maintainability, and interoperability of the event-table layer.

Compared with the old event table handling, the motivations are:

Shrink the Rcpp surface area for event authoring. Fewer exported compiled helpers makes debugging and incremental fixes easier.
Represent event tables as a data-frame subtype. That makes rxEt behave more naturally with modern R tooling.
Delay work until the user actually needs a realized table. This favors construction-heavy pipelines.
Keep the solver path compiled. The branch does not give up the performance benefits of compiled translation and solving where they matter most.
Fix complex edge cases and bugs. The refactor directly addressed several reported issues including #721, #722, #723, #724, #725, #732, and #858.

In practice, that is why the branch has been able to absorb many bug fixes in the event-table layer without changing the public tests.

Relative strengths and weaknesses

strengths <- data.frame(
  aspect = c(
    "Construction model",
    "Composability",
    "Interoperability",
    "Maintainability",
    "Materialization cost",
    "Repeat-heavy workflows",
    "Solve handoff overhead"
  ),
  new = c(
    "Lazy chunked representation is well suited to pipelines that keep appending or combining events.",
    "Pure-R etSeq()/etRbind() are easier to reason about and modify.",
    "Much stronger vctrs/data.frame/tibble behavior.",
    "More logic is visible and testable in R.",
    "Usually slower because full rows are built later and in one step.",
    "Currently slower in the benchmark below.",
    "Slightly slower after warm-up in the benchmark below."
  ),
  old = c(
    "More eager/compiled representation; less lazy bookkeeping.",
    "Compiled helpers are fast when the operation already matches the older model well.",
    "Weaker integration with newer data-frame reconstruction patterns.",
    "Harder to debug because more behavior crosses the R/C++ boundary.",
    "Usually faster because the table is already closer to realized form.",
    "Currently faster in the benchmark below.",
    "Currently a little faster in the benchmark below."
  ),
  stringsAsFactors = FALSE
)
knitr::kable(strengths, col.names = c("Aspect", "new", "old"))

Aspect	new	old
Construction model	Lazy chunked representation is well suited to pipelines that keep appending or combining events.	More eager/compiled representation; less lazy bookkeeping.
Composability	Pure-R etSeq()/etRbind() are easier to reason about and modify.	Compiled helpers are fast when the operation already matches the older model well.
Interoperability	Much stronger vctrs/data.frame/tibble behavior.	Weaker integration with newer data-frame reconstruction patterns.
Maintainability	More logic is visible and testable in R.	Harder to debug because more behavior crosses the R/C++ boundary.
Materialization cost	Usually slower because full rows are built later and in one step.	Usually faster because the table is already closer to realized form.
Repeat-heavy workflows	Currently slower in the benchmark below.	Currently faster in the benchmark below.
Solve handoff overhead	Slightly slower after warm-up in the benchmark below.	Currently a little faster in the benchmark below.

In short:

The new implementation is stronger when event tables are still being built or combined which is the most common use-case for et() event tables.
The integration of ALTREP and homogeneous tables significantly boosts memory efficiency and solver performance for multi-subject populations.

Benchmark design

The benchmark intentionally measures common event-table workflows instead of isolated micro-ops:

add_dosing_sampling: build with the mutable method API
vector_build: build large event tables from vectorized et() calls
sequence_tables: combine many event tables with c()
repeat_table: repeat a regimen with rep()
materialize_table: force a full as.data.frame(..., all=TRUE)
solve_handoff: solve a simple PK model after the event table is built
issue_724_large_sampling: append sampling records for a large subject-by-dose dataset based on Issue #724

The measurements below were collected in the same Linux environment from which this vignette was written, after warming the package once in each branch. They should be read as relative comparisons, not absolute promises for every machine or every regimen.

benchmark_results <- data.frame(
  scenario = c(
    "add_dosing_sampling",
    "vector_build",
    "sequence_tables",
    "repeat_table",
    "materialize_table",
    "solve_handoff",
    "issue_724_large_sampling"
  ),
  new_median_s = c(0.1275, 0.0104, 0.3130, 0.4240, 0.00199, 0.0260, 1.9210),
  old_median_s = c(0.1055, 0.1396, 0.6485, 0.1640, 0.00003, 0.0200, 80.2900),
  stringsAsFactors = FALSE
)

benchmark_results$winner <- ifelse(
  benchmark_results$new_median_s < benchmark_results$old_median_s,
  "refactor-et",
  "main"
)
benchmark_results$relative_result <- c(
  "old ~1.2x faster",
  "new ~13.4x faster",
  "new ~2.1x faster",
  "old ~2.6x faster",
  "old ~66x faster",
  "old ~1.3x faster",
  "new ~41.8x faster"
)

knitr::kable(
  benchmark_results,
  col.names = c("Scenario", "`refactor-et` median (s)", "`main` median (s)", "Winner", "Relative result")
)

Scenario	`refactor-et` median (s)	`main` median (s)	Winner	Relative result
add_dosing_sampling	0.12750	0.10550	main	old ~1.2x faster
vector_build	0.01040	0.13960	refactor-et	new ~13.4x faster
sequence_tables	0.31300	0.64850	refactor-et	new ~2.1x faster
repeat_table	0.42400	0.16400	main	old ~2.6x faster
materialize_table	0.00199	0.00003	main	old ~66x faster
solve_handoff	0.02600	0.02000	main	old ~1.3x faster
issue_724_large_sampling	1.92100	80.29000	refactor-et	new ~41.8x faster

Interpreting the benchmark

The benchmark is consistent with the architectural choices:

Where the new event table performs better

Large vectorized construction

The lazy chunk-based representation is very effective when many event rows are appended before the table is forced into canonical form.
Sequencing many tables

The pure-R sequence/bind logic avoids some of the overhead of the older compiled sequence path and seems to benefit from operating on chunked event-table state.
Large appended datasets (Issue #724)

This is one of the clearest cases motivating the branch. The workload in Issue #724 constructs a large subject-by-dose dataset and then appends a dense sampling schedule. On main, the event table behaves more like an already-realized data frame, so repeatedly growing it becomes very expensive at larger sizes. On refactor-et, the event table remains a proxy backed by chunked state until later, so the same workflow is much cheaper.

Where the old method performs better

Mutable method API (add.dosing()/add.sampling())

The branch keeps compatibility, but the new path currently pays a small overhead relative to the older implementation.
rep()

Repetition is still a strong case for the older compiled approach.
Forced materialization

This is the clearest cost of the branch. Since refactor-et postpones realization, the eventual as.data.frame(..., all=TRUE) step is much more expensive.
Final solve handoff

Once the event table is already built, main retains a modest edge in the benchmarked solve path.

Large-scale benchmark from Issue #724

Issue #724 is especially useful because it shows the effect of event-table architecture on a workload that becomes much larger than the smaller benchmark scenarios above.

The reported workload is:

Build a subject-by-dose data set with 3 dose groups
Create one event table row per subject for dosing
Append a dense sampling grid from 0 to 168 hours by 0.5 hours

On the original main-branch implementation, the elapsed time rose rapidly with subject count:

issue_724_scaling <- data.frame(
  n_subj = c(100, 200, 300, 400, 500, 600, 800, 1000),
  old_elapsed_s = c(0.31, 2.28, 7.38, 13.1, 20.1, 28.7, 51.0, 80.29)
)

knitr::kable(
  issue_724_scaling,
  col.names = c("Subjects per dose group", "`main` elapsed (s)")
)

Subjects per dose group	`main` elapsed (s)
100	0.31
200	2.28
300	7.38
400	13.10
500	20.10
600	28.70
800	51.00
1000	80.29

For the exact 1000-subject case from the issue, a later run on this new event table code reported:

issue_724_comparison <- data.frame(
  branch = c("old", "new"),
  elapsed_s = c(80.29, 1.921),
  source = c(
    "Issue #724 original report",
    "Issue #724 follow-up comment after the et() rewrite"
  ),
  stringsAsFactors = FALSE
)

issue_724_comparison$relative <- c(
  "baseline",
  "about 41.8x faster than main"
)

knitr::kable(
  issue_724_comparison,
  col.names = c("Branch", "Elapsed (s)", "Source", "Relative result")
)

Branch	Elapsed (s)	Source	Relative result
old	80.290	Issue #724 original report	baseline
new	1.921	Issue #724 follow-up comment after the et() rewrite	about 41.8x faster than main

This benchmark is more compelling than a small microbenchmark because it exercises exactly the pattern that motivated the refactor:

The old event table keeps the event table closer to a realized data frame, so very large repeated row growth becomes increasingly expensive
The new event table keeps the event table as a proxy backed by chunked state and delays full data-frame realization until a boundary like printing, as.data.frame(), or solver handoff

That does not mean the new event table wins every benchmark. The earlier table still shows that forced materialization and some final execution paths can remain faster on the old event table. But Issue #724 demonstrates why the branch architecture can produce very large wins on bigger event-table authoring workloads.

Overall assessment

Since the goal is a cleaner, more maintainable, more interoperable event table architecture, the new event handling code is a stronger design. The branch makes the event-table layer easier to understand in R, easier to test, and faster in the construction/composition workloads that fit its lazy design.

Session information

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     desc_1.4.3        R6_2.6.1          fastmap_1.2.0    
#>  [5] xfun_0.57         cachem_1.1.0      knitr_1.51        htmltools_0.5.9  
#>  [9] rmarkdown_2.31    lifecycle_1.0.5   cli_3.6.6         sass_0.4.10      
#> [13] pkgdown_2.2.0     textshaping_1.0.5 jquerylib_0.1.4   systemfonts_1.3.2
#> [17] compiler_4.6.0    tools_4.6.0       ragg_1.5.2        bslib_0.11.0     
#> [21] evaluate_1.0.5    yaml_2.3.12       otel_0.2.0        jsonlite_2.0.0   
#> [25] rlang_1.2.0       fs_2.1.0          htmlwidgets_1.6.4

2026-05-23