Regression analysis and introduction to linear models. Topics:
Multiple regression, analysis of covariance, least squares means,
logistic regression, and non-linear regression. This course
includes a one-hour computer lab and emphasizes hands-on
applications to datasets from the health sciences.
Advanced presentation of statistical methods for comparing
populations and estimating and testing associations between
variables. Topics: Point estimation, confidence intervals,
hypothesis testing, ANOVA models for 1-, 2-, and k-way
classifications, multiple comparisons, chi-square test of
homogeneity, Fisher's exact test, McNemar's test, measures of
association (including odds ratios and relative risks),
Mantel-Haenszel tests of association, standardized rates, repeated
measures ANOVA, and simple regression and correlation. This course includes a
one-hour computing lab and emphasizes hands-on applications to
datasets from the health-related sciences.
The purpose of this course is to familiarize students with PC-based statistical computing applications for public health. It is a companion course for EEH 505: Introduction to Biostatistics. The course will develop basic skills in the use of a statistical package through classroom demonstrations and independent lab assignments that complement the material covered in EEH 505. The course will emphasize data definition, verification, descriptive and inferential statistics, and graphical presentation. Students should come away familiar with the use of a statistical package and with the skills needed for effective data management, data manipulation, and data analysis at a basic level.
NOTE: Concurrent registration in prerequisite is
Statistical tools for analyzing experiments involving genomic data. Topics: Basic genetics and statistics, linkage analysis and map construction using genetic markers, association studies, Quantitative Trait Loci analysis with ANOVA, variance components analysis and marker regression (including multiple and partial regression), QTL mapping with interval mapping and composite interval mapping, LOD test, supervised and unsupervised methods for gene expression microarray data across multiple conditions.
This course is intended for students interested in statistical computing. The goal of this course is to enable students to do essential computations and statistical analysis using SAS and R software. Topics include descriptive statistics, graphical presentation, estimation, hypothesis testing, sample size and power; emphasis on learning statistical methods and concepts through hands-on experience with real data.
Introduces alternate methods for designing and analyzing
comparative studies that may be used when some or all of the
assumptions underlying the usual parametric methods are
questionable. Topics: 1-, 2-, and k-sample location problems,
randomized block and repeated measures designs, the independence
problem, rank transformation tests, randomization tests, the
2-sample dispersion problem, and other topics as time
This course provides students with useful methods for analyzing
categorical data. Topics: Cross-classification tables, tests for
independence, log-linear models, Poisson regression, ordinal
logistic regression, and multinomial regression for the logistic
Provides students with the probability and distribution theory necessary for the study of statistics. Topics: axioms of probability theory, independence, conditional probability, random variables, discrete and continuous probability distributions, functions of random variables, moment generating functions, the Law of Large Numbers, and the Central Limit Theorem.
Introduces principles of statistical inference. Classical methods of estimation, tests of significance, and Neyman-Pearson Theory of testing hypotheses, maximum likelihood methods, and Bayesian statistics are introduced and developed.
Since the completion of the human genome project, there is a
burgeoning field of new applications for statistics involving
high-throughput experiments designed to gather large amounts of
information on biological systems. This course is focused on
discussing the wide array of approaches and technologies
implemented to gather this information and the statistical issues
involved from initial data processing steps to end-stage research
objectives. Specifically, time permitting, the technologies
we will examine include two-dimensional protein gel
electrophoresis, protein mass spectrometry, and several flavors of
microarray experiments. Much of the work for the course will
involve analyzing data sets from class and from the text using the
Introduction to fundamental principles and planning
techniques for designing and analyzing statistical
experiments. Recommended for students in applied fields. Topics:
Justification for randomized controlled clinical trials, methods of
randomization, blinding and placebos, ethical issues, parallel
groups design, crossover trials, inclusion of covariates,
determining sample size, sequential designs, interim analyses,
repeated measures studies.
Introduction to theory and practice of sample surveys
involving collection of statistical data from planned
Introduces factorial experiments, fractional factorial
experiments, confounding, lattice designs, various incomplete block
designs, efficiency of experimentation, and problems of design
Deals with statistical methods for estimation and testing
hypotheses when samples are observed and analyzed
This course presents the topic of data mining from a statistical perspective, with attention directed towards both applied and theoretical considerations. An emphasis will be placed on supervised learning methods. Topics include: linear and logistic regression, discriminant analysis, shrinkage methods, subset selection, dimension reduction techniques, classification and regression trees, ensemble methods, neural networks, and random forests. Model selection and estimation of generalization error will be emphasized. Considerations and issues that arise with high-dimensional (N<<p) applications will be highlighted. Applications will be presented in R to illustrate methods and concepts.
This course presents the topic of data mining from a statistical perspective, with attention directed towards both applied and theoretical considerations. An emphasis will be placed on unsupervised learning methods, especially those designed to discover and model patterns in data. Applications to high-dimensional data (N<<p) and big data (N>>p) will be highlighted. Topics include: market basket analysis, hierarchical and center-based clustering, self-organizing maps, factor analysis, computer vision, eigenfaces, data visualization, and graphical models. Applications will be presented in MATLAB and R to illustrate methods and concepts.
For graduate students who have had an introduction to
probability theory and advanced calculus. Concepts,
properties, basic theory, and applications of stochastic
Introduction to methods for analyzing longitudinal and time
series data. Topics: Random coefficient regression models, growth
curve analysis, hierarchical linear models, general mixed models,
autoregressive and moving average models for time series data, and
the analysis of cross-section time series data.
The Bayesian approach to statistical design and analysis can be
viewed as a philosophical approach or as a procedure-generator. The
use of Bayesian design and analysis is burgeoning. In this
introduction to Bayesian methods, we consider basic examples of
Bayesian thinking and formalism on which more complicated and
comprehensive approaches are built. These include adjusting
estimates using related information, the use of Bayes Factors in
testing of hypotheses, the relationship of the prior and posterior
distributions, and the key steps in a Bayesian analysis. We
consider the Bayesian approach that requires a data likelihood (the
sampling distribution) and a prior distribution. From these, the
posterior distribution can be computed and used to inform
statistical design and analysis. Applications of this technique are
It can be said there are no new problems in statistics, only new applications. Since the completion of the human genome project, there is a burgeoning field of new applications for statistics involving high-throughput experiments designed to gather large amounts of information on biological systems. This course is focused on discussing the wide array of approaches and technologies implemented to gather this information and the statistical issues involved from initial data processing steps to end-stage research objectives. Specifically, time permitting, the technologies we will examine include two-dimensional protein gel electrophoresis, protein mass spectrometry, several flavors of microarrays, and Xerogel sensor experiments.
We will use the text, "Bioinformatics and Computational Biology
Solutions Using R and Bioconductor." Much of the work for the
course will involve analyzing data sets from class and from the
text using the R language.
An advanced course on the use of life tables and the
analysis of failure time data. Topics: Kaplan-Meier survival
curves, the log-rank test, the Cox proportional hazards model,
evaluating the proportionality assumption, dealing with
non-proportionality, stratified Cox procedure, extension to
time-dependent variables, and comparison with logistic regression
Presents methods for analyzing multiple outcome variables
simultaneously, and for classification and variable reduction.
Topics: Multivariate normal distribution; simple, partial, and
multiple correlation; Hotelling's T-squared; multivariate analysis
of variance and the general linear hypothesis; discriminant
analysis, cluster analysis, principal components analysis, and