Package 'kmc'

Title: Kaplan-Meier Estimator with Constraints for Right Censored Data -- a Recursive Computational Algorithm
Description: Given constraints for right censored data, we use a recursive computational algorithm to calculate the "constrained" Kaplan-Meier estimator. The constraint is assumed to be given in linear estimating equations or mean functions. We also illustrate how this leads to the empirical likelihood ratio test with right censored data and the accelerated failure time model with given coefficients. The EM algorithm from the 'emplik' package is used to obtain the initial value. The properties and performance of the EM algorithm are discussed in Mai Zhou and Yifan Yang (2015) <doi:10.1007/s00180-015-0567-9> and Mai Zhou and Yifan Yang (2017) <doi:10.1002/wics.1400>. More applications can be found in Mai Zhou (2015) <doi:10.1201/b18598>.
Authors: Yifan Yang [aut, cre, cph], Mai Zhou [aut]
Maintainer: Yifan Yang <[email protected]>
License: LGPL-3
Version: 0.4-3
Built: 2025-01-30 05:38:18 UTC
Source: https://github.com/yfyang86/kmc

Help Index


Check the constraints of KMC

Description

To derive the empirical likelihood with constraints, we need to make sure solutions exist. Dines' method is used here to check whether the linear constraints are proper or not.

Usage

check_G_mat(gmat)

Arguments

gmat

A p by n matrix, where p is the number of constraints and n is the number of observations. The matrix is defined in <doi:10.1201/b18598>.

Value

flg

A flag:

- 0: not proper

- 1: proper

Author(s)

Yifan Yang([email protected])

References

Dines, L. L. (1926). On positive solutions of a system of linear equations. Annals of Mathematics, 386–392.

Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood. Computational Statistics. Online ISSN 1613-9658.

Examples

#### A Proper Example ####
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 
f1 <- function(x) { x - 3.7 }
f2 <- function(x) { x^2 - 16.5 }
g <- list(f1, f2)
re <- kmc.clean(x, d)
p <- length(g)
n <- length(re$kmc.time)
gmat <- matrix(0, p, n)
for (i in 1:p) {
  gmat[i, ] <- g[[i]](re$kmc.time)
}
# You may need to require(Rcpp) on some platforms (such as macOS on ARM)
# library(Rcpp)
# check_G_mat(gmat)
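
# A sketch, not part of the original example, showing one way the returned
# flag could be used (assumes the kmc package is attached):
# flg <- check_G_mat(gmat)
# if (flg == 1) {
#   re2 <- kmc.solve(x, d, g)   # constraints are proper; solve the constrained NPMLE
# } else {
#   message("Constraints are not proper; revise g before calling kmc.solve.")
# }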

Calculate the NPMLE with constraints for the accelerated failure time model with given coefficients.

Description

Use the empirical likelihood ratio and Wilks theorem to test if the regression coefficient equals beta.

El(F) = \prod_{i=1}^{n} (\Delta F(T_i))^{\delta_i} (1 - F(T_i))^{1 - \delta_i}

with constraints

\sum_i g(T_i) \Delta F(T_i) = 0, \quad i = 1, 2, \ldots

Instead of the EM algorithm, this function calculates the Kaplan-Meier estimator with mean constraints recursively to test H_0: \beta = \beta_0 in the accelerated failure time model:

\log(T_i) = y_i = x_i \beta^\top + \epsilon_i,

where \epsilon is distribution free.

Usage

kmc.bjtest(y, d, x, beta, init.st = "naive")

Arguments

y

Response variable vector (length n).

d

Status vector (length n), 0: right censored; 1: uncensored.

x

n by p explanatory variable matrix.

beta

The value of the regression coefficient vector (length p) to be tested.

init.st

Type of method used to initialize the algorithm. With the default "naive" setting, the initial value is set to 1/n.

Details

The empirical likelihood is the likelihood of the error term when the coefficients are specified. The model assumptions are the same as those of the standard Buckley-James estimator.

Value

A list with the following components:

prob

The probabilities that maximize the empirical likelihood under the estimating equation constraints.

logel1

The log empirical likelihood without constraints, i.e. under the Kaplan-Meier estimator of the residuals.

logel2

The log empirical likelihood with constraints, i.e. under the null hypothesis or estimating equations.

"-2LLR"

The -2 log-likelihood ratio; it has an approximate chi-squared distribution under the null hypothesis.

convergence

An indicator:

0: fails to converge

1: converged

Author(s)

Mai Zhou([email protected]), Yifan Yang([email protected])

References

Buckley, J. and James, I. (1979). Linear regression with censored data. Biometrika, 66, 429-436.

Zhou, M. and Li, G. (2008). Empirical likelihood analysis of the Buckley-James estimator. Journal of Multivariate Analysis, 99(4), 649-664.

Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood. Computational Statistics. Online ISSN 1613-9658.

See Also

plotkmc2D, bjtest.

Examples

library(survival)
stanford5 <- stanford2[!is.na(stanford2$t5), ]
y <- log10(stanford5$time)
d <- stanford5$status
oy <- order(y, -d)
d <- d[oy]
y <- y[oy]
x <- cbind(1, stanford5$age)[oy,]
beta0  <-  c(3.2, -0.015)
ss  <-  kmc.bjtest(y, d, x=x, beta=beta0, init.st="naive")
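
# A brief sketch (not in the original example) of inspecting the returned list,
# assuming the components documented in 'Value' and a chi-squared reference
# distribution with df equal to the number of tested coefficients:
# ss$"-2LLR"                                   # -2 log empirical likelihood ratio
# 1 - pchisq(ss$"-2LLR", df = length(beta0))   # approximate p-value
# ss$convergence                               # 1: converged; 0: failed to converge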

Perform Data Cleaning for the kmc Algorithm

Description

The kmc.clean function cleans the (kmc.time, delta) data under random right censoring:

- Reorders the data according to the observed time and status;

- Removes (right) censored data point(s) that occur before the first uncensored observation;

- Handles ties: for time points that contain ties, e.g.

(T_{i_s}, d_{i_s}), \quad i_s \in S, \quad T_j \equiv T \ \forall j \in S,

the data are rearranged so that observations with d = 1 are placed ahead of those with d = 0. As d = 0 indicates a right censored data point, this rearrangement follows the usual convention for ties.

Usage

kmc.clean(kmc.time, delta)

Arguments

kmc.time

Non-negative real vector. The observed time.

delta

0/1 vector. Censoring status indicator, 0: right censored; 1: uncensored.

Value

A list with the following components:

kmc.time

The cleaned observed time.

delta

The cleaned censoring status indicator, 0: right censored; 1: uncensored.

Author(s)

Yifan Yang([email protected])

References

Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood. Computational Statistics. Online ISSN 1613-9658.

Examples

x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 
kmc.clean(x, d)
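
# A small sketch with hypothetical data illustrating the cleaning rules above:
# the censored point at time 0.5 precedes the first uncensored time and is
# removed, and ties at 1.5 are reordered so that d = 1 comes before d = 0.
x2 <- c(0.5, 1, 1.5, 1.5, 3)
d2 <- c(0,   1, 0,   1,   1)
kmc.clean(x2, d2)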

Calculate the NPMLE with constraints for right censored data

Description

This function calculates the Kaplan-Meier estimator with mean constraints recursively.

El(F) = \prod_{i=1}^{n} (\Delta F(T_i))^{\delta_i} (1 - F(T_i))^{1 - \delta_i}

with constraints

\sum_i g(T_i) \Delta F(T_i) = 0, \quad i = 1, 2, \ldots

It uses the Lagrange multiplier directly.

Usage

kmc.solve(x, d, g, em.boost = T, using.num = T, using.Fortran = T,
          using.C = F, tmp.tag = T, rtol = 1e-09,
          control = list(nr.it = 20, nr.c = 1, em.it = 3), ...)

Arguments

x

Non-negative real vector. The observed time.

d

0/1 vector. Censoring status indicator, 0: right censored; 1: uncensored.

g

List of constraint functions. It should be a list of functions, e.g. list(f1, f2, ...).

em.boost

A logical value. It determines whether the EM algorithm is used to get the initial value, default=TRUE. See 'Details' for EM control.

using.num

A logical value. It determines whether numeric derivatives are used in the iterations, default=TRUE.

using.Fortran

A logical value. It determines whether Fortran is used in root solving, default=F.

using.C

A logical value. It determines whether Rcpp is used in the iterations, default=T. This option improves the computational efficiency of the KMC algorithm. The development version works with one constraint only; otherwise it will generate an error. It does not work with using.num=F.

tmp.tag

The development version needs it; keep it as TRUE.

rtol

Tolerance used in the rootSolve package; see 'rootSolve::multiroot'.

control

A list. The entry nr.it controls the maximum number of iterations allowed in the Newton-Raphson algorithm, default=20; nr.c is the scale parameter used in the Newton-Raphson algorithm, default=1; em.it is the maximum number of iterations when the EM algorithm (em.boost) is used to obtain the initial value of lambda, default=3.

...

Unspecified yet.

Details

The function check_G_mat checks whether the solution space is empty under the given constraints. Due to the computational complexity, it checks at most two constraints.

Value

A list with the following components:

loglik.ha

The log empirical likelihood without constraints

loglik.h0

The log empirical likelihood with constraints

"-2LLR"

The -2 Log empirical likelihood ratio

phat

\Delta F(T_i)

pvalue

The p-value of the test

df

Degree(s) of freedom. It equals the number of constraints.

lambda

The Lagrange multiplier described in the reference.

Author(s)

Mai Zhou([email protected]), Yifan Yang([email protected])

References

Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood. Computational Statistics. Online ISSN 1613-9658.

See Also

plotkmc2D.

Examples

# positive time
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
# status censored/uncensored
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1)               

#################
# dim =1
#################

f <- function(x) {x-3.7}                     # \sum f(ti) wi ~ 0 
g <- list(f=f) ;                             #define constraint as a list

kmc.solve(x, d, g) ;                         #using default
kmc.solve(x, d, g, using.C=TRUE) ;           #using Rcpp

#################
# dim =2
#################

myfun5 <- function( x)  { 
 x^2-16.5
} 

g <- list( f1=f,f2=myfun5) ;                 #define constraint as a list

re0 <- kmc.solve(x,d,g);

###################################################
# Print Estimation and other information 
# with option: digits = 5
###################################################

#' Print kmc object
#' 
#' @param x kmc object
#' @param digits minimal number of significant digits, see print.default.
f_print <- function(x, digits = 5){
  cat("\n----------------------------------------------------------------\n")
  cat("A Recursive Formula for the Kaplan-Meier Estimator with Constraint\n")
  cat("Information:\n")
  cat("Number of Constraints:\t", length(x$g), "\n")
  cat("lamda(s):\t", x$lambda,'\n');
  cat("\n----------------------------------------------------------------\n")
  names <- c("Log-likelihood(Ha)", "Log-likelihood(H0)",
  "-2LLR", paste("p-Value(df=", length(x$g), ")",sep = ""))
  re <- matrix(c(x[[1]], x[[2]], x[[3]], 1 - pchisq(x[[3]],
  length(x$g))), nrow = 1)
  colnames(re) <- names
  rownames(re) <- "Est"
  print.default(format(re, digits = digits), print.gap = 2,
                quote = FALSE)
  cat("------------------------------------------------------------------\n")
}

f_print(re0)
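
# A sketch, not in the original example, of passing the documented solver
# controls (tighter tolerance, more Newton-Raphson and EM iterations):
re1 <- kmc.solve(x, d, g, rtol = 1e-10,
                 control = list(nr.it = 50, nr.c = 1, em.it = 5))
re1$"-2LLR"     # -2 log empirical likelihood ratio
re1$pvalue      # p-value with re1$df degrees of freedom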

Calculate the NPMLE with constraints for right censored data (lite version)

Description

This function calculates the Kaplan-Meier estimator with mean constraints recursively.

El(F) = \prod_{i=1}^{n} (\Delta F(T_i))^{\delta_i} (1 - F(T_i))^{1 - \delta_i}

with constraints

\sum_i g(T_i) \Delta F(T_i) = 0, \quad i = 1, 2, \ldots

It uses the Lagrange multiplier directly. This function is a lite version of kmc.solve.

Usage

kmc.solvelite(x, d, g, rtol = 1e-09,
              control = list(nr.it = 20, nr.c = 1, em.it = 3), ...)

Arguments

x

Non-negative real vector. The observed time.

d

0/1 vector. Censoring status indicator, 0: right censored; 1: uncensored.

g

List of constraint functions. It should be a list of functions, e.g. list(f1, f2, ...).

rtol

Tolerance used in the rootSolve package; see 'rootSolve::multiroot'.

control

A list. The entry nr.it controls the maximum number of iterations allowed in the Newton-Raphson algorithm, default=20; nr.c is the scale parameter used in the Newton-Raphson algorithm, default=1; em.it is the maximum number of iterations when the EM algorithm (em.boost) is used to obtain the initial value of lambda, default=3.

...

Unspecified yet.

Details

The function check_G_mat checks whether the solution space is empty under the given constraints. Due to the computational complexity, it checks at most two constraints.

Value

A list with the following components:

loglik.ha

The log empirical likelihood without constraints

loglik.h0

The log empirical likelihood with constraints

"-2LLR"

The -2 Log empirical likelihood ratio

phat

\Delta F(T_i)

pvalue

The p-value of the test

df

Degree(s) of freedom. It equals the number of constraints.

lambda

The Lagrange multiplier described in the reference.

Author(s)

Mai Zhou([email protected]), Yifan Yang([email protected])

References

Zhou, M. and Yang, Y. (2015). A recursive formula for the Kaplan-Meier estimator with mean constraints and its application to empirical likelihood. Computational Statistics. Online ISSN 1613-9658.

See Also

plotkmc2D.

Examples

# positive time
x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
# status censored/uncensored
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1)               

#################
# dim =1
#################

f <- function(x) {x-3.7}                  # \sum f(ti) wi ~ 0 
g <- list(f=f)                            #define constraint as a list

kmc.solve(x, d, g)                        # full solver (default options)
kmc.solvelite(x, d, g)                    # lite version

#################
# dim =2
#################

myfun5 <- function( x)  { 
 x^2-16.5
} 
g <- list(f1=f, f2=myfun5) ;              #define constraint as a list
re0 <- kmc.solvelite(x, d, g);

###################################################
# Real Data: Stanford Heart Transplant
###################################################
# library(survival)
# stanford5 <- stanford2[!is.na(stanford2$t5), ]
# y=log10(stanford5$time)
# d <- stanford5$status
# g <- list(f = function(x) {(x-2.4 < 0.) - 0.5} )  # \sum I(x_i<2.4) wi = 0.5)
# kmc.solvelite(y, d, g)
##################################
## Example (Cont'd): -2LLR Curve
##################################
# iters = 100
# scale = 2.
# starter = 2.5
# result = rep(0., iters)
# observe_range = starter + (1:iters)/(iters * scale)
# for (i in 1:iters){
#         g <- list(f = function(x) {(x- (starter + i/(iters * scale)) < 0.) - 0.5})
#         result[i] = kmc.solvelite(y, d, g)$`-2LLR`
# }
# plot(x = observe_range, result, xlab = 'time', ylab = '-2LLR', type = 'b')
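
# A sketch (not part of the original example) reading off the components
# documented under 'Value' from the dim = 2 fit above:
re0$"-2LLR"     # -2 log empirical likelihood ratio
re0$pvalue      # p-value with re0$df degrees of freedom
re0$lambda      # Lagrange multiplier(s)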

Plot the contour of the log-likelihood around H0 (dim=2).

Description

Given a kmc object, this function produces a contour plot if there are two constraints.

Usage

plotkmc2D(resultkmc, flist = list(f1 = function(x) {x}, f2 = function(x) {x^2}),
          range0 = c(0.2, 3, 20))

Arguments

resultkmc

An S3 object of class kmcS3.

flist

A list of two functions, e.g. flist = list(f1 = function(x) x, f2 = function(x) x^2).

range0

A vector that helps to determine the range of the contour plot, i.e. from (center[1] - range0[1], center[2] - range0[2]) to (center[1] + range0[1], center[2] + range0[2]). The third entry defines the number of grid points to be used.

Value

X

The x-axis grid.

Y

The y-axis grid.

Z

The matrix of values on the grid (used to draw the contour).

Author(s)

Yifan Yang([email protected])

Examples

x <- c( 1, 1.5, 2, 3, 4.2, 5.0, 6.1, 5.3, 4.5, 0.9, 2.1, 4.3) 
d <- c( 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1) 

f<-function( x) { x-3.7} 

myfun5 <- function( x)  { 
 x^2-16.5
} 
# construct g as a LIST!

g <- list(f1 = f, f2 = myfun5)
re0 <- kmc.solve(x, d, g)

# ZZ <- plotkmc2D(re0)   # run this to generate the contour plot
# Advanced plot option using ggplot2 (not run):
#library(reshape2)
#volcano3d <- melt(ZZ$Z)
#names(volcano3d) <- c("x", "y", "z")

#volcano3d$x <- ZZ$X[volcano3d$x];
#volcano3d$y <- ZZ$Y[volcano3d$y];

#### Plot: use ggplot2 ####
#library(ggplot2)
# v <- ggplot(volcano3d, aes(x, y, z=z));
# v + geom_tile(aes(fill = z)) + 
#   stat_contour()+
#   scale_fill_gradientn("Custom Colours",colours=grey.colors(10));

#### Plot: use qplot ####
#qplot(x, y, z = z, data = volcano3d, stat = "contour", geom = "path")
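
#### A sketch (not in the original example) with a custom grid ####
# Uses only the arguments documented above; range0 = c(0.2, 3, 30) means
# half-widths of 0.2 and 3 around the center, with 30 grid points (range0[3]).
# ZZ2 <- plotkmc2D(re0,
#                  flist = list(f1 = function(x) x - 3.7,
#                               f2 = function(x) x^2 - 16.5),
#                  range0 = c(0.2, 3, 30))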