NONPARAMETRIC TESTS FOR ORDERED DIVERSITY IN A GENOMIC SEQUENCE
Keywords:
Gini-Simpson diversity index, Hamming distance, High-dimensional qualitative data models, U-statistics.
Abstract
In genomics (SNP and RNA amino acid studies), one typically encounters very
high-dimensional qualitative (categorical) data models whose categories have no
natural ordering. This precludes the use of conventional measures of dispersion
(variation or diversity), as well as other measures that assume some latent trait
variable(s). The Gini-Simpson diversity measure, often advocated for diversity
analysis in one-dimensional models, has been adapted to formulate measures of
diversity and co-diversity based on the Hamming distance in the multidimensional
setup. Based on certain (molecular) biologically interpretable monotone diversity
perspectives, an ordering of the Gini-Simpson measures across the genome (positions)
is formulated in a meaningful way. Motivated by this feature, nonparametric
inference for such ordered measures is considered here, and their applications
are stressed.
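
To make the two basic quantities concrete, the following is a minimal sketch, not the paper's own formulation. It assumes the standard Gini-Simpson index 1 - sum_c p_c^2 at a single position, estimated by its usual U-statistic, and it assumes the multidimensional measure is the average pairwise Hamming distance divided by the number of positions K. The function names and the four toy sequences are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import combinations


def gini_simpson_u(column):
    """U-statistic estimate of the Gini-Simpson index 1 - sum_c p_c^2
    at one position, computed from the observed category counts."""
    n = len(column)
    counts = Counter(column)
    # Number of ordered pairs of sequences that agree at this position.
    agreeing = sum(c * (c - 1) for c in counts.values())
    return 1.0 - agreeing / (n * (n - 1))


def hamming_diversity(seqs):
    """Average Hamming distance over all unordered pairs of sequences,
    divided by the number of positions K (an assumed normalization)."""
    k = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / (len(pairs) * k)


# Hypothetical toy data: 4 aligned sequences over K = 4 positions.
toy_sequences = ["ACGT", "ACGA", "ATGA", "ACCA"]

# zip(*...) transposes the sequences into per-position columns.
per_position = [gini_simpson_u(col) for col in zip(*toy_sequences)]
pooled = hamming_diversity(toy_sequences)
```

Under these assumptions the pooled Hamming-based measure equals the mean of the per-position Gini-Simpson estimates, because the pairwise kernel at each position is simply the indicator that two sequences differ there. An ordering hypothesis across positions would then be a statement about the per-position values.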