Title: | Kaplan-Meier Estimator with Constraints for Right Censored Data -- a Recursive Computational Algorithm |
---|---|
Description: | Given constraints for right censored data, we use a recursive computational algorithm to calculate the the "constrained" Kaplan-Meier estimator. The constraint is assumed given in linear estimating equations or mean functions. We also illustrate how this leads to the empirical likelihood ratio test with right censored data and accelerated failure time model with given coefficients. EM algorithm from emplik package is used to get the initial value. The properties and performance of the EM algorithm is discussed in Mai Zhou and Yifan Yang (2015)<doi: 10.1007/s00180-015-0567-9> and Mai Zhou and Yifan Yang (2017) <doi: 10.1002/wics.1400>. More applications could be found in Mai Zhou (2015) <doi: 10.1201/b18598>. |
Authors: | Yifan Yang [aut, cre, cph], Mai Zhou [aut] |
Maintainer: | Yifan Yang <[email protected]> |
License: | LGPL-3 |
Version: | 0.4-3 |
Built: | 2025-01-30 05:38:18 UTC |
Source: | https://github.com/yfyang86/kmc |
To derive the empirical likelihood with constraints, we need to make sure there are solutions. Dines' method is used here to check whether the linear constraintsare proper or not.
check_G_mat(gmat)
check_G_mat(gmat)
gmat |
A p by n. Here p is the number of constraints, n is the number of observations. The matrix is defined in <doi: 10.1201/b18598>. |
flg |
A flag: - 0: not proper - 1: proper |
Yifan Yang([email protected])
Dines, L. L. (1926). On positive solutions of a system of linear equations Annals of Mathematics pages 386–392
Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood Computational Statistics. Online ISSN 1613-9658.
#### A Proper Example #### x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) f1 <-function(x) { x - 3.7} f2 <- function(x) {x^2 - 16.5 } g <- list(f1, f2) re = kmc.clean(x, d) p = length(g) n = length(re$kmc.time) gmat<-matrix(0, p, n); for(i in 1:p){ gmat[i,] = g[[i]](re$kmc.time) } # You may want to require(Rcpp) on some platforms (such Mac OSX-ARM) # library(Rcpp) # check_G_mat(gmat)
#### A Proper Example #### x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) f1 <-function(x) { x - 3.7} f2 <- function(x) {x^2 - 16.5 } g <- list(f1, f2) re = kmc.clean(x, d) p = length(g) n = length(re$kmc.time) gmat<-matrix(0, p, n); for(i in 1:p){ gmat[i,] = g[[i]](re$kmc.time) } # You may want to require(Rcpp) on some platforms (such Mac OSX-ARM) # library(Rcpp) # check_G_mat(gmat)
Use the empirical likelihood ratio and Wilks theorem to test if the regression coefficient equals beta.
with constraints
Instead of EM algorithm, this function calculates the Kaplan-Meier estimator with mean constraints recursively to test in the accelerated failure time model:
where is distribution free.
kmc.bjtest(y, d, x, beta,init.st="naive")
kmc.bjtest(y, d, x, beta,init.st="naive")
y |
Response variable vector (length n). |
d |
Status vector (length n), 0: right censored; 1 uncensored. |
x |
n by p explanatory variable matrix. |
beta |
The value of the regression coeffiecnt vector (length p) to be tested. |
init.st |
Type of methods to initialize the algorithm. By default, init.st is set to 1/n |
The empirical likelihood is the likelihood of the error term when the coefficients are specified. Model assumptions are the same as requirements of a standard Buckley-James estimator.
a list with the following components:
prob |
the probabilities that max the empirical likelihood under estimating equation. |
logel1 |
the log empirical likelihood without constraints, i.e. under Kaplan-Merier of residuals' |
logel2 |
the log empirical likelihood with constraints, i.e. under null hypotheses or estimation equations. |
"-2LLR" |
the -2 loglikelihood ratio; have approximate chisq distribution under null hypotheses |
convergence |
an indicator: 0: fails to converge 1: converged |
Mai Zhou([email protected]), Yifan Yang([email protected])
Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika, 66 429-36
Zhou, M., & Li, G. (2008). Empirical likelihood analysis of the Buckley-James estimator. Journal of multivariate analysis, 99(4), 649-664.
Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood Computational Statistics. Online ISSN 1613-9658.
library(survival) stanford5 <- stanford2[!is.na(stanford2$t5), ] y <- log10(stanford5$time) d <- stanford5$status oy <- order(y, -d) d <- d[oy] y <- y[oy] x <- cbind(1, stanford5$age)[oy,] beta0 <- c(3.2, -0.015) ss <- kmc.bjtest(y, d, x=x, beta=beta0, init.st="naive")
library(survival) stanford5 <- stanford2[!is.na(stanford2$t5), ] y <- log10(stanford5$time) d <- stanford5$status oy <- order(y, -d) d <- d[oy] y <- y[oy] x <- cbind(1, stanford5$age)[oy,] beta0 <- c(3.2, -0.015) ss <- kmc.bjtest(y, d, x=x, beta=beta0, init.st="naive")
The kmc.clean function clean the (kmc.time, delta) for the randomized censored data:
- Reorder the data according to the observed time and status;
- Clean the (right) censored data point(s) if they happen before the first uncesored data.
- If there are ties in the data. For the time points contain ties, e.g.
, we re-arranged the data in a manner that those with d=1 are ordered ahead of those with d=0. As d=0 indicates the data point is right censored, such procedure is trivial.
kmc.clean(kmc.time, delta)
kmc.clean(kmc.time, delta)
kmc.time |
Non-negative real vector. The observed time. |
delta |
0/1 vector. Censoring status indictator, 0: right censored; 1 uncensored |
A list with the following components:
kmc.time |
The cleaned observed time. |
delta |
The cleaned censoring status indictator, 0: right censored; 1 uncensored |
Yifan Yang([email protected])
Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood Computational Statistics Online ISSN 1613-9658.
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) kmc.clean(x, d)
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) kmc.clean(x, d)
This function calculate the Kaplan-Meier estimator with mean constraints recursively.
with constraints
It uses Lagrange multiplier directly.
kmc.solve(x, d, g, em.boost = T, using.num = T, using.Fortran = T, using.C = F, tmp.tag = T, rtol = 1e-09, control = list(nr.it = 20, nr.c = 1, em.it = 3),...)
kmc.solve(x, d, g, em.boost = T, using.num = T, using.Fortran = T, using.C = F, tmp.tag = T, rtol = 1e-09, control = list(nr.it = 20, nr.c = 1, em.it = 3),...)
x |
Non-negative real vector. The observed time. |
d |
0/1 vector. Censoring status indictator, 0: right censored; 1 uncensored |
g |
list of contraint functions. It should be a list of functions list(f1,f2,...) |
em.boost |
A logical value. It determines whether the EM algorithm is used to get the initial value, default=TRUE. See 'Details' for EM control. |
using.num |
A logical value. It determines whether the numeric derivatives is used in iterations, default=TRUE. |
using.Fortran |
A logical value. It determines whether Fortran is used in root solving, default=F. |
using.C |
A logical value. It determines whether to use Rcpp in iteraruib, default=T. This option will promote the computational efficiency of the KMC algorithm. Development version works on one constraint only, otherwise it will generate a Error information. It won't work on using.num=F. |
tmp.tag |
Development version needs it, keep it as TRUE. |
rtol |
Tolerance used in rootSolve(multiroot) package, see 'rootSolve::multiroot'. |
control |
A list. The entry nr.it controls max iterations allowed in N-R algorithm default=20; nr.c is the scaler used in N-R algorithm default=1; em.it is max iteration if use EM algorithm (em.boost) to get the initial value of lambda, default=3. |
... |
Unspecified yet. |
The function check_G_mat checks whether the solution space is null or not under the constraint. But due to the computational complexity, it will detect at most two conditions.
A list with the following components:
loglik.ha |
The log empirical likelihood without constraints |
loglik.h0 |
The log empirical likelihood with constraints |
"-2LLR" |
The -2 Log empirical likelihood ratio |
phat |
|
pvalue |
The p-value of the test |
df |
Degree(s) of freedom. It equals the number of constraints. |
lambda |
The lambda is the Lagrangian multiplier described in reference. |
Mai Zhou([email protected]), Yifan Yang([email protected])
Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood Computational Statistics Online ISSN 1613-9658.
# positive time x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) # status censored/uncensored d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) ################# # dim =1 ################# f <- function(x) {x-3.7} # \sum f(ti) wi ~ 0 g <- list(f=f) ; #define constraint as a list kmc.solve(x, d, g) ; #using default kmc.solve(x, d, g, using.C=TRUE) ; #using Rcpp ################# # dim =2 ################# myfun5 <- function( x) { x^2-16.5 } g <- list( f1=f,f2=myfun5) ; #define constraint as a list re0 <- kmc.solve(x,d,g); ################################################### # Print Estimation and other information # with option: digits = 5 ################################################### #' Print kmc object #' #' @param x kmc object #' @param digits minimal number of significant digits, see print.default. f_print <- function(x, digits = 5){ cat("\n----------------------------------------------------------------\n") cat("A Recursive Formula for the Kaplan-Meier Estimator with Constraint\n") cat("Information:\n") cat("Number of Constraints:\t", length(x$g), "\n") cat("lamda(s):\t", x$lambda,'\n'); cat("\n----------------------------------------------------------------\n") names <- c("Log-likelihood(Ha)", "Log-likelihood(H0)", "-2LLR", paste("p-Value(df=", length(x$g), ")",sep = "")) re <- matrix(c(x[[1]], x[[2]], x[[3]], 1 - pchisq(x[[3]], length(x$g))), nrow = 1) colnames(re) <- names rownames(re) <- "Est" print.default(format(re, digits = digits), print.gap = 2, quote = FALSE, df = length(x$g)) cat("------------------------------------------------------------------\n") } f_print(re0)
# positive time x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) # status censored/uncensored d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) ################# # dim =1 ################# f <- function(x) {x-3.7} # \sum f(ti) wi ~ 0 g <- list(f=f) ; #define constraint as a list kmc.solve(x, d, g) ; #using default kmc.solve(x, d, g, using.C=TRUE) ; #using Rcpp ################# # dim =2 ################# myfun5 <- function( x) { x^2-16.5 } g <- list( f1=f,f2=myfun5) ; #define constraint as a list re0 <- kmc.solve(x,d,g); ################################################### # Print Estimation and other information # with option: digits = 5 ################################################### #' Print kmc object #' #' @param x kmc object #' @param digits minimal number of significant digits, see print.default. f_print <- function(x, digits = 5){ cat("\n----------------------------------------------------------------\n") cat("A Recursive Formula for the Kaplan-Meier Estimator with Constraint\n") cat("Information:\n") cat("Number of Constraints:\t", length(x$g), "\n") cat("lamda(s):\t", x$lambda,'\n'); cat("\n----------------------------------------------------------------\n") names <- c("Log-likelihood(Ha)", "Log-likelihood(H0)", "-2LLR", paste("p-Value(df=", length(x$g), ")",sep = "")) re <- matrix(c(x[[1]], x[[2]], x[[3]], 1 - pchisq(x[[3]], length(x$g))), nrow = 1) colnames(re) <- names rownames(re) <- "Est" print.default(format(re, digits = digits), print.gap = 2, quote = FALSE, df = length(x$g)) cat("------------------------------------------------------------------\n") } f_print(re0)
This function calculate the Kaplan-Meier estimator with mean constraints recursively.
with constraints
It uses Lagrange multiplier directly. This function is a lite version of kmc.solve.
kmc.solve(x, d, g, rtol = 1e-09, control = list(nr.it = 20, nr.c = 1, em.it = 3),...)
kmc.solve(x, d, g, rtol = 1e-09, control = list(nr.it = 20, nr.c = 1, em.it = 3),...)
x |
Non-negative real vector. The observed time. |
d |
0/1 vector. Censoring status indictator, 0: right censored; 1 uncensored |
g |
list of contraint functions. It should be a list of functions list(f1,f2,...) |
rtol |
Tolerance used in rootSolve(multiroot) package, see 'rootSolve::multiroot'. |
control |
A list. The entry nr.it controls max iterations allowed in N-R algorithm default=20; nr.c is the scaler used in N-R algorithm default=1; em.it is max iteration if use EM algorithm (em.boost) to get the initial value of lambda, default=3. |
... |
Unspecified yet. |
The function check_G_mat checks whether the solution space is null or not under the constraint. But due to the computational complexity, it will detect at most two conditions.
A list with the following components:
loglik.ha |
The log empirical likelihood without constraints |
loglik.h0 |
The log empirical likelihood with constraints |
"-2LLR" |
The -2 Log empirical likelihood ratio |
phat |
|
pvalue |
The p-value of the test |
df |
Degree(s) of freedom. It equals the number of constraints. |
lambda |
The lambda is the Lagrangian multiplier described in reference. |
Mai Zhou([email protected]), Yifan Yang([email protected])
Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood Computational Statistics Online ISSN 1613-9658.
# positive time x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) # status censored/uncensored d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) ################# # dim =1 ################# f <- function(x) {x-3.7} # \sum f(ti) wi ~ 0 g <- list(f=f) #define constraint as a list kmc.solve(x, d, g) #using default kmc.solvelite(x, d, g) #using default ################# # dim =2 ################# myfun5 <- function( x) { x^2-16.5 } g <- list(f1=f, f2=myfun5) ; #define constraint as a list re0 <- kmc.solvelite(x, d, g); ################################################### # Real Data: Stanford Heart Transplant ################################################### # library(survival) # stanford5 <- stanford2[!is.na(stanford2$t5), ] # y=log10(stanford5$time) # d <- stanford5$status # g <- list(f = function(x) {(x-2.4 < 0.) - 0.5} ) # \sum I(x_i<2.4) wi = 0.5) # kmc.solvelite(y, d, g) ################################## ## Example (Cont'd): -2LLR Curve ################################## # iters = 100 # scale = 2. # starter = 2.5 # result = rep(0., 20) # observe_range = starter + (1:iters)/(iters * scale) # for (i in 1:iters){ # g <- list(f = function(x) {(x- (starter + i/(iters * scale)) < 0.) - 0.5}) # result[i] = kmc.solvelite(y, d, g)$`-2LLR` # } # plot(x = observe_range, result, xlab = 'time', ylab = '-2LLR', type = 'b')
# positive time x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) # status censored/uncensored d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) ################# # dim =1 ################# f <- function(x) {x-3.7} # \sum f(ti) wi ~ 0 g <- list(f=f) #define constraint as a list kmc.solve(x, d, g) #using default kmc.solvelite(x, d, g) #using default ################# # dim =2 ################# myfun5 <- function( x) { x^2-16.5 } g <- list(f1=f, f2=myfun5) ; #define constraint as a list re0 <- kmc.solvelite(x, d, g); ################################################### # Real Data: Stanford Heart Transplant ################################################### # library(survival) # stanford5 <- stanford2[!is.na(stanford2$t5), ] # y=log10(stanford5$time) # d <- stanford5$status # g <- list(f = function(x) {(x-2.4 < 0.) - 0.5} ) # \sum I(x_i<2.4) wi = 0.5) # kmc.solvelite(y, d, g) ################################## ## Example (Cont'd): -2LLR Curve ################################## # iters = 100 # scale = 2. # starter = 2.5 # result = rep(0., 20) # observe_range = starter + (1:iters)/(iters * scale) # for (i in 1:iters){ # g <- list(f = function(x) {(x- (starter + i/(iters * scale)) < 0.) - 0.5}) # result[i] = kmc.solvelite(y, d, g)$`-2LLR` # } # plot(x = observe_range, result, xlab = 'time', ylab = '-2LLR', type = 'b')
Given a kmc object, this function will produce contour plot if there were two constraints.
plotkmc2D(resultkmc, flist=list(f1=function(x){x}, f2=function(x){x^2}), range0=c(0.2, 3,20))
plotkmc2D(resultkmc, flist=list(f1=function(x){x}, f2=function(x){x^2}), range0=c(0.2, 3,20))
resultkmc |
S3 Object of kmcS3. |
flist |
list of two functions,flist=list( f1=function( x ) x ,f2=function( x ) x^2 ) |
range0 |
A vector that helps to determine the range of the contour plot, i.e (center[1]-range0[1], center[2]-range0[2]) to (center+range0[1], center[2]+range0[2]). The third parameter defines the number of grids would be used. |
X |
x.grid |
Y |
y.grid |
Z |
grid value |
Yifan Yang([email protected])
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) f<-function( x) { x-3.7} myfun5 <- function( x) { x^2-16.5 } # construnct g as a LIST! g=list( f1=f,f2=myfun5) ; kmc.solve( x,d,g) ->re0; #plotkmc2D(re0) ->ZZ; # run this to generate contour plot #Advanced PLOT option using ggplot2: not run #library(reshape2) #volcano3d <- melt(ZZ$Z) #names(volcano3d) <- c("x", "y", "z") #volcano3d$x <- ZZ$X[volcano3d$x]; #volcano3d$y <- ZZ$Y[volcano3d$y]; #### Plot: use ggplot2 #### #library(ggplot2) # v <- ggplot(volcano3d, aes(x, y, z=z)); # v + geom_tile(aes(fill = z)) + # stat_contour()+ # scale_fill_gradientn("Custom Colours",colours=grey.colors(10)); #### Plot: use qplot #### #qplot(x, y, z = z, data = volcano3d, stat = "contour", geom = "path")
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) f<-function( x) { x-3.7} myfun5 <- function( x) { x^2-16.5 } # construnct g as a LIST! g=list( f1=f,f2=myfun5) ; kmc.solve( x,d,g) ->re0; #plotkmc2D(re0) ->ZZ; # run this to generate contour plot #Advanced PLOT option using ggplot2: not run #library(reshape2) #volcano3d <- melt(ZZ$Z) #names(volcano3d) <- c("x", "y", "z") #volcano3d$x <- ZZ$X[volcano3d$x]; #volcano3d$y <- ZZ$Y[volcano3d$y]; #### Plot: use ggplot2 #### #library(ggplot2) # v <- ggplot(volcano3d, aes(x, y, z=z)); # v + geom_tile(aes(fill = z)) + # stat_contour()+ # scale_fill_gradientn("Custom Colours",colours=grey.colors(10)); #### Plot: use qplot #### #qplot(x, y, z = z, data = volcano3d, stat = "contour", geom = "path")