Package 'kmc' reference manual

Title:	Kaplan-Meier Estimator with Constraints for Right Censored Data -- a Recursive Computational Algorithm
Description:	Given constraints for right censored data, we use a recursive computational algorithm to calculate the the "constrained" Kaplan-Meier estimator. The constraint is assumed given in linear estimating equations or mean functions. We also illustrate how this leads to the empirical likelihood ratio test with right censored data and accelerated failure time model with given coefficients. EM algorithm from emplik package is used to get the initial value. The properties and performance of the EM algorithm is discussed in Mai Zhou and Yifan Yang (2015)<doi: 10.1007/s00180-015-0567-9> and Mai Zhou and Yifan Yang (2017) <doi: 10.1002/wics.1400>. More applications could be found in Mai Zhou (2015) <doi: 10.1201/b18598>.
Authors:	Yifan Yang [aut, cre, cph], Mai Zhou [aut]
Maintainer:	Yifan Yang <[email protected]>
License:	LGPL-3
Version:	0.4-3
Built:	2025-03-01 05:37:15 UTC
Source:	https://github.com/yfyang86/kmc

Check the contraints of KMC

Description

To derive the empirical likelihood with constraints, we need to make sure there are solutions. Dines' method is used here to check whether the linear constraintsare proper or not.

Usage

check_G_mat(gmat)
check_G_mat(gmat)

Arguments

gmat

A p by n. Here p is the number of constraints, n is the number of observations. The matrix is defined in <doi: 10.1201/b18598>.

Value

flg

A flag: - 0: not proper - 1: proper

Author(s)

Yifan Yang([email protected])

References

Dines, L. L. (1926). On positive solutions of a system of linear equations Annals of Mathematics pages 386–392

Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood Computational Statistics. Online ISSN 1613-9658.

Examples

#### A Proper Example ####
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 
f1 <-function(x) { x - 3.7} 
f2 <- function(x) {x^2 - 16.5 } 
g <- list(f1, f2)
re = kmc.clean(x, d)
p = length(g)
n = length(re$kmc.time)
gmat<-matrix(0, p, n);
for(i in 1:p){
  gmat[i,] = g[[i]](re$kmc.time)
  }
# You may want to require(Rcpp) on some platforms (such Mac OSX-ARM)
# library(Rcpp)
# check_G_mat(gmat)
#### A Proper Example ####
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 
f1 <-function(x) { x - 3.7} 
f2 <- function(x) {x^2 - 16.5 } 
g <- list(f1, f2)
re = kmc.clean(x, d)
p = length(g)
n = length(re$kmc.time)
gmat<-matrix(0, p, n);
for(i in 1:p){
  gmat[i,] = g[[i]](re$kmc.time)
  }
# You may want to require(Rcpp) on some platforms (such Mac OSX-ARM)
# library(Rcpp)
# check_G_mat(gmat)

Calculate the NPMLE with constriants for accelerated failure time model with given coefficients.

Description

Use the empirical likelihood ratio and Wilks theorem to test if the regression coefficient equals beta.

$El(F)=\prod_{i=1}^{n}(\Delta F(T_i))^{\delta_i}(1-F(T_i))^{1-\delta_i}$

with constraints

$\sum_i g(T_i)\Delta F(T_i)=0,\quad,i=1,2,\ldots$

Instead of EM algorithm, this function calculates the Kaplan-Meier estimator with mean constraints recursively to test $H_0:~\beta=\beta_0$ in the accelerated failure time model:

$\log(T_i) = y_i = x_i\beta^\top + \epsilon_i,$

where $\epsilon$ is distribution free.

Usage

kmc.bjtest(y, d, x, beta,init.st="naive")
kmc.bjtest(y, d, x, beta,init.st="naive")

Arguments

`y`	Response variable vector (length n).
`d`	Status vector (length n), 0: right censored; 1 uncensored.
`x`	n by p explanatory variable matrix.
`beta`	The value of the regression coeffiecnt vector (length p) to be tested.
`init.st`	Type of methods to initialize the algorithm. By default, init.st is set to 1/n

Details

The empirical likelihood is the likelihood of the error term when the coefficients are specified. Model assumptions are the same as requirements of a standard Buckley-James estimator.

Value

a list with the following components:

`prob`	the probabilities that max the empirical likelihood under estimating equation.
`logel1`	the log empirical likelihood without constraints, i.e. under Kaplan-Merier of residuals'
`logel2`	the log empirical likelihood with constraints, i.e. under null hypotheses or estimation equations.
`"-2LLR"`	the -2 loglikelihood ratio; have approximate chisq distribution under null hypotheses
`convergence`	an indicator: 0: fails to converge 1: converged

Author(s)

Mai Zhou([email protected]), Yifan Yang([email protected])

References

Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika, 66 429-36

Zhou, M., & Li, G. (2008). Empirical likelihood analysis of the Buckley-James estimator. Journal of multivariate analysis, 99(4), 649-664.

Examples

library(survival)
stanford5 <- stanford2[!is.na(stanford2$t5), ]
y <- log10(stanford5$time)
d <- stanford5$status
oy <- order(y, -d)
d <- d[oy]
y <- y[oy]
x <- cbind(1, stanford5$age)[oy,]
beta0  <-  c(3.2, -0.015)
ss  <-  kmc.bjtest(y, d, x=x, beta=beta0, init.st="naive")
library(survival)
stanford5 <- stanford2[!is.na(stanford2$t5), ]
y <- log10(stanford5$time)
d <- stanford5$status
oy <- order(y, -d)
d <- d[oy]
y <- y[oy]
x <- cbind(1, stanford5$age)[oy,]
beta0  <-  c(3.2, -0.015)
ss  <-  kmc.bjtest(y, d, x=x, beta=beta0, init.st="naive")

Perform Data Clean for the kmc Algorithm

Description

The kmc.clean function clean the (kmc.time, delta) for the randomized censored data:

- Reorder the data according to the observed time and status;

- Clean the (right) censored data point(s) if they happen before the first uncesored data.

- If there are ties in the data. For the time points contain ties, e.g.

$(T_{i_s}, d_{i_s}), i_s \in S \forall j \in S, T_{j} \equiv T$

, we re-arranged the data in a manner that those with d=1 are ordered ahead of those with d=0. As d=0 indicates the data point is right censored, such procedure is trivial.

Usage

kmc.clean(kmc.time, delta)
kmc.clean(kmc.time, delta)

Arguments

`kmc.time`	Non-negative real vector. The observed time.
`delta`	0/1 vector. Censoring status indictator, 0: right censored; 1 uncensored

Value

A list with the following components:

`kmc.time`	The cleaned observed time.
`delta`	The cleaned censoring status indictator, 0: right censored; 1 uncensored

Author(s)

Yifan Yang([email protected])

References

Examples

x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 
kmc.clean(x, d)
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 
kmc.clean(x, d)

Calculate NPMLE with constriants for right censored data

Description

This function calculate the Kaplan-Meier estimator with mean constraints recursively.

$El(F)=\prod_{i=1}^{n}(\Delta F(T_i))^{\delta_i}(1-F(T_i))^{1-\delta_i}$

with constraints

$\sum_i g(T_i)\Delta F(T_i)=0,\quad,i=1,2,...$

It uses Lagrange multiplier directly.

Usage

kmc.solve(x, d, g, em.boost = T, using.num = T, using.Fortran =
                 T, using.C = F, tmp.tag = T, rtol = 1e-09, control =
                 list(nr.it = 20, nr.c = 1, em.it = 3),...)
kmc.solve(x, d, g, em.boost = T, using.num = T, using.Fortran =
                 T, using.C = F, tmp.tag = T, rtol = 1e-09, control =
                 list(nr.it = 20, nr.c = 1, em.it = 3),...)

Arguments

`x`	Non-negative real vector. The observed time.
`d`	0/1 vector. Censoring status indictator, 0: right censored; 1 uncensored
`g`	list of contraint functions. It should be a list of functions list(f1,f2,...)
`em.boost`	A logical value. It determines whether the EM algorithm is used to get the initial value, default=TRUE. See 'Details' for EM control.
`using.num`	A logical value. It determines whether the numeric derivatives is used in iterations, default=TRUE.
`using.Fortran`	A logical value. It determines whether Fortran is used in root solving, default=F.
`using.C`	A logical value. It determines whether to use Rcpp in iteraruib, default=T. This option will promote the computational efficiency of the KMC algorithm. Development version works on one constraint only, otherwise it will generate a Error information. It won't work on using.num=F.
`tmp.tag`	Development version needs it, keep it as TRUE.
`rtol`	Tolerance used in rootSolve(multiroot) package, see 'rootSolve::multiroot'.
`control`	A list. The entry nr.it controls max iterations allowed in N-R algorithm default=20; nr.c is the scaler used in N-R algorithm default=1; em.it is max iteration if use EM algorithm (em.boost) to get the initial value of lambda, default=3.
`...`	Unspecified yet.

Details

The function check_G_mat checks whether the solution space is null or not under the constraint. But due to the computational complexity, it will detect at most two conditions.

Value

A list with the following components:

`loglik.ha`	The log empirical likelihood without constraints
`loglik.h0`	The log empirical likelihood with constraints
`"-2LLR"`	The -2 Log empirical likelihood ratio
`phat`	$\Delta F(T_i)$
`pvalue`	The p-value of the test
`df`	Degree(s) of freedom. It equals the number of constraints.
`lambda`	The lambda is the Lagrangian multiplier described in reference.

Author(s)

Mai Zhou([email protected]), Yifan Yang([email protected])

References

Examples

# positive time
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
# status censored/uncensored
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1)               

#################
# dim =1
#################

f <- function(x) {x-3.7}                     # \sum f(ti) wi ~ 0 
g <- list(f=f) ;                             #define constraint as a list

kmc.solve(x, d, g) ;                         #using default
kmc.solve(x, d, g, using.C=TRUE) ;           #using Rcpp

#################
# dim =2
#################

myfun5 <- function( x)  { 
 x^2-16.5
} 

g <- list( f1=f,f2=myfun5) ;                 #define constraint as a list

re0 <- kmc.solve(x,d,g);

###################################################
# Print Estimation and other information 
# with option: digits = 5
###################################################

#' Print kmc object
#' 
#' @param x kmc object
#' @param digits minimal number of significant digits, see print.default.
f_print <- function(x, digits = 5){
  cat("\n----------------------------------------------------------------\n")
  cat("A Recursive Formula for the Kaplan-Meier Estimator with Constraint\n")
  cat("Information:\n")
  cat("Number of Constraints:\t", length(x$g), "\n")
  cat("lamda(s):\t", x$lambda,'\n');
  cat("\n----------------------------------------------------------------\n")
  names <- c("Log-likelihood(Ha)", "Log-likelihood(H0)",
  "-2LLR", paste("p-Value(df=", length(x$g), ")",sep = ""))
  re <- matrix(c(x[[1]], x[[2]], x[[3]], 1 - pchisq(x[[3]],
  length(x$g))), nrow = 1)
  colnames(re) <- names
  rownames(re) <- "Est"
  print.default(format(re, digits = digits), print.gap = 2,
                quote = FALSE, df = length(x$g))
  cat("------------------------------------------------------------------\n")
}

f_print(re0)
# positive time
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
# status censored/uncensored
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1)               

#################
# dim =1
#################

f <- function(x) {x-3.7}                     # \sum f(ti) wi ~ 0 
g <- list(f=f) ;                             #define constraint as a list

kmc.solve(x, d, g) ;                         #using default
kmc.solve(x, d, g, using.C=TRUE) ;           #using Rcpp

#################
# dim =2
#################

myfun5 <- function( x)  { 
 x^2-16.5
} 

g <- list( f1=f,f2=myfun5) ;                 #define constraint as a list

re0 <- kmc.solve(x,d,g);

###################################################
# Print Estimation and other information 
# with option: digits = 5
###################################################

#' Print kmc object
#' 
#' @param x kmc object
#' @param digits minimal number of significant digits, see print.default.
f_print <- function(x, digits = 5){
  cat("\n----------------------------------------------------------------\n")
  cat("A Recursive Formula for the Kaplan-Meier Estimator with Constraint\n")
  cat("Information:\n")
  cat("Number of Constraints:\t", length(x$g), "\n")
  cat("lamda(s):\t", x$lambda,'\n');
  cat("\n----------------------------------------------------------------\n")
  names <- c("Log-likelihood(Ha)", "Log-likelihood(H0)",
  "-2LLR", paste("p-Value(df=", length(x$g), ")",sep = ""))
  re <- matrix(c(x[[1]], x[[2]], x[[3]], 1 - pchisq(x[[3]],
  length(x$g))), nrow = 1)
  colnames(re) <- names
  rownames(re) <- "Est"
  print.default(format(re, digits = digits), print.gap = 2,
                quote = FALSE, df = length(x$g))
  cat("------------------------------------------------------------------\n")
}

f_print(re0)

Calculate NPMLE with constriants for right censored data

Description

This function calculate the Kaplan-Meier estimator with mean constraints recursively.

$El(F)=\prod_{i=1}^{n}(\Delta F(T_i))^{\delta_i}(1-F(T_i))^{1-\delta_i}$

with constraints

$\sum_i g(T_i)\Delta F(T_i)=0,\quad,i=1,2,...$

It uses Lagrange multiplier directly. This function is a lite version of kmc.solve.

Usage

kmc.solve(x, d, g, rtol = 1e-09, 
          control = list(nr.it = 20, nr.c = 1, em.it = 3),...)
kmc.solve(x, d, g, rtol = 1e-09, 
          control = list(nr.it = 20, nr.c = 1, em.it = 3),...)

Arguments

`x`	Non-negative real vector. The observed time.
`d`	0/1 vector. Censoring status indictator, 0: right censored; 1 uncensored
`g`	list of contraint functions. It should be a list of functions list(f1,f2,...)
`rtol`	Tolerance used in rootSolve(multiroot) package, see 'rootSolve::multiroot'.
`control`	A list. The entry nr.it controls max iterations allowed in N-R algorithm default=20; nr.c is the scaler used in N-R algorithm default=1; em.it is max iteration if use EM algorithm (em.boost) to get the initial value of lambda, default=3.
`...`	Unspecified yet.

Details

The function check_G_mat checks whether the solution space is null or not under the constraint. But due to the computational complexity, it will detect at most two conditions.

Value

A list with the following components:

`loglik.ha`	The log empirical likelihood without constraints
`loglik.h0`	The log empirical likelihood with constraints
`"-2LLR"`	The -2 Log empirical likelihood ratio
`phat`	$\Delta F(T_i)$
`pvalue`	The p-value of the test
`df`	Degree(s) of freedom. It equals the number of constraints.
`lambda`	The lambda is the Lagrangian multiplier described in reference.

Author(s)

Mai Zhou([email protected]), Yifan Yang([email protected])

References

Examples

# positive time
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
# status censored/uncensored
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1)               

#################
# dim =1
#################

f <- function(x) {x-3.7}                  # \sum f(ti) wi ~ 0 
g <- list(f=f)                            #define constraint as a list

kmc.solve(x, d, g)                        #using default
kmc.solvelite(x, d, g)                    #using default

#################
# dim =2
#################

myfun5 <- function( x)  { 
 x^2-16.5
} 
g <- list(f1=f, f2=myfun5) ;              #define constraint as a list
re0 <- kmc.solvelite(x, d, g);

###################################################
# Real Data: Stanford Heart Transplant
###################################################
# library(survival)
# stanford5 <- stanford2[!is.na(stanford2$t5), ]
# y=log10(stanford5$time)
# d <- stanford5$status
# g <- list(f = function(x) {(x-2.4 < 0.) - 0.5} )  # \sum I(x_i<2.4) wi = 0.5)
# kmc.solvelite(y, d, g)
##################################
## Example (Cont'd): -2LLR Curve
##################################
# iters = 100
# scale = 2.
# starter = 2.5
# result = rep(0., 20)
# observe_range = starter + (1:iters)/(iters * scale)
# for (i in 1:iters){
#         g <- list(f = function(x) {(x- (starter + i/(iters * scale)) < 0.) - 0.5})
#         result[i] = kmc.solvelite(y, d, g)$`-2LLR`
# }
# plot(x = observe_range, result, xlab = 'time', ylab = '-2LLR', type = 'b')
# positive time
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
# status censored/uncensored
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1)               

#################
# dim =1
#################

f <- function(x) {x-3.7}                  # \sum f(ti) wi ~ 0 
g <- list(f=f)                            #define constraint as a list

kmc.solve(x, d, g)                        #using default
kmc.solvelite(x, d, g)                    #using default

#################
# dim =2
#################

myfun5 <- function( x)  { 
 x^2-16.5
} 
g <- list(f1=f, f2=myfun5) ;              #define constraint as a list
re0 <- kmc.solvelite(x, d, g);

###################################################
# Real Data: Stanford Heart Transplant
###################################################
# library(survival)
# stanford5 <- stanford2[!is.na(stanford2$t5), ]
# y=log10(stanford5$time)
# d <- stanford5$status
# g <- list(f = function(x) {(x-2.4 < 0.) - 0.5} )  # \sum I(x_i<2.4) wi = 0.5)
# kmc.solvelite(y, d, g)
##################################
## Example (Cont'd): -2LLR Curve
##################################
# iters = 100
# scale = 2.
# starter = 2.5
# result = rep(0., 20)
# observe_range = starter + (1:iters)/(iters * scale)
# for (i in 1:iters){
#         g <- list(f = function(x) {(x- (starter + i/(iters * scale)) < 0.) - 0.5})
#         result[i] = kmc.solvelite(y, d, g)$`-2LLR`
# }
# plot(x = observe_range, result, xlab = 'time', ylab = '-2LLR', type = 'b')

Plot the contour plot of log-likelihood around the H0 (dim=2).

Description

Given a kmc object, this function will produce contour plot if there were two constraints.

Usage

plotkmc2D(resultkmc, flist=list(f1=function(x){x}, f2=function(x){x^2}), 
          range0=c(0.2, 3,20))
plotkmc2D(resultkmc, flist=list(f1=function(x){x}, f2=function(x){x^2}), 
          range0=c(0.2, 3,20))

Arguments

`resultkmc`	S3 Object of kmcS3.
`flist`	list of two functions,flist=list( f1=function( x ) x ,f2=function( x ) x^2 )
`range0`	A vector that helps to determine the range of the contour plot, i.e (center[1]-range0[1], center[2]-range0[2]) to (center+range0[1], center[2]+range0[2]). The third parameter defines the number of grids would be used.

Value

`X`	x.grid
`Y`	y.grid
`Z`	grid value

Author(s)

Yifan Yang([email protected])

Examples


x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 

f<-function( x) { x-3.7} 

myfun5 <- function( x)  { 
 x^2-16.5
} 
# construnct g as a LIST!

g=list( f1=f,f2=myfun5) ; 
kmc.solve( x,d,g) ->re0;

#plotkmc2D(re0) ->ZZ; # run this to generate contour plot
#Advanced PLOT option using ggplot2: not run
#library(reshape2)
#volcano3d <- melt(ZZ$Z)
#names(volcano3d) <- c("x", "y", "z")

#volcano3d$x <- ZZ$X[volcano3d$x];
#volcano3d$y <- ZZ$Y[volcano3d$y];

#### Plot: use ggplot2 ####
#library(ggplot2)
# v <- ggplot(volcano3d, aes(x, y, z=z));
# v + geom_tile(aes(fill = z)) + 
#   stat_contour()+
#   scale_fill_gradientn("Custom Colours",colours=grey.colors(10));

#### Plot: use qplot ####
#qplot(x, y, z = z, data = volcano3d, stat = "contour", geom = "path")
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 

f<-function( x) { x-3.7} 

myfun5 <- function( x)  { 
 x^2-16.5
} 
# construnct g as a LIST!

g=list( f1=f,f2=myfun5) ; 
kmc.solve( x,d,g) ->re0;

#plotkmc2D(re0) ->ZZ; # run this to generate contour plot
#Advanced PLOT option using ggplot2: not run
#library(reshape2)
#volcano3d <- melt(ZZ$Z)
#names(volcano3d) <- c("x", "y", "z")

#volcano3d$x <- ZZ$X[volcano3d$x];
#volcano3d$y <- ZZ$Y[volcano3d$y];

#### Plot: use ggplot2 ####
#library(ggplot2)
# v <- ggplot(volcano3d, aes(x, y, z=z));
# v + geom_tile(aes(fill = z)) + 
#   stat_contour()+
#   scale_fill_gradientn("Custom Colours",colours=grey.colors(10));

#### Plot: use qplot ####
#qplot(x, y, z = z, data = volcano3d, stat = "contour", geom = "path")

Package 'kmc'

Help Index

Check the contraints of KMC

Description

Usage

Arguments

Value

Author(s)

References

Examples

Calculate the NPMLE with constriants for accelerated failure time model with given coefficients.

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Perform Data Clean for the kmc Algorithm

Description

Usage

Arguments

Value

Author(s)

References

Examples

Calculate NPMLE with constriants for right censored data

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Calculate NPMLE with constriants for right censored data

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Plot the contour plot of log-likelihood around the H0 (dim=2).

Description

Usage

Arguments

Value

Author(s)

Examples