
Cambridge Immunology Network



Supervisor: Chris Wallace

The analysis of Flow Cytometry data for immune-phenotype to genotype association studies in the context of Type 1 Diabetes.

I work on methods for visualising and analysing biological data, in particular computationally efficient unsupervised and supervised techniques for finding clusters in data. In a nutshell, the objective of clustering is to assign elements to groups such that elements within a group are more similar to each other than to elements in other groups. Clusters may represent populations of cells, as in flow cytometry data, or different genotypes, as in SNP array or quantitative PCR data. However, the level of uncertainty differs greatly between experimental data sets, both in data quality (signal-to-noise ratio) and in prior knowledge (expected number of clusters), so clustering needs to take a more probabilistic approach. An important aspect of my work is to allow for this uncertainty when assigning elements to clusters by giving probabilistic weights of cluster membership (a number between 0 and 1) instead of making discrete calls (0 or 1). Model-based clustering (i.e. fitting a mixture of distributions) is particularly suited to this task. After clustering, statistical analysis is conducted, often using generalised linear regression, to test for a significant association with Type 1 Diabetes or another response variable of interest. One guiding principle I like to keep in mind in science: there is no right or wrong model, merely one that is more useful for the job at hand.
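The difference between probabilistic weights and discrete calls can be illustrated with a minimal sketch. It assumes a two-component mixture of univariate Gaussians fitted by expectation-maximisation (EM) on simulated one-dimensional data; the function names, the starting values and the simulated populations are illustrative assumptions, not the code or data used in this project.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership(x, weights, means, sds):
    """Posterior probability (between 0 and 1) that x belongs to component 1."""
    p0 = weights[0] * normal_pdf(x, means[0], sds[0])
    p1 = weights[1] * normal_pdf(x, means[1], sds[1])
    total = p0 + p1
    return p1 / total if total > 0.0 else 0.5


def fit_two_gaussians(xs, n_iter=100):
    """Fit a two-component 1D Gaussian mixture by EM; return (weights, means, sds)."""
    lo, hi = min(xs), max(xs)
    spread = max((hi - lo) / 4.0, 1e-6)
    # Assumed starting point: one component at each extreme of the data.
    weights, means, sds = [0.5, 0.5], [lo, hi], [spread, spread]
    n = len(xs)
    for _ in range(n_iter):
        # E-step: soft assignment of every point to component 1.
        r1 = [membership(x, weights, means, sds) for x in xs]
        r = [[1.0 - v for v in r1], r1]
        # M-step: re-estimate each component from its weighted points.
        for j in (0, 1):
            nj = max(sum(r[j]), 1e-9)
            means[j] = sum(w * x for w, x in zip(r[j], xs)) / nj
            var = sum(w * (x - means[j]) ** 2 for w, x in zip(r[j], xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)
            weights[j] = nj / n
    return weights, means, sds


# Simulated data standing in for a dim and a bright population (not real measurements).
rng = random.Random(0)
values = [rng.gauss(0.0, 1.0) for _ in range(200)]
values += [rng.gauss(5.0, 1.0) for _ in range(200)]

weights, means, sds = fit_two_gaussians(values)
for x in (0.0, 2.5, 5.0):
    # Points near a cluster centre get a weight close to 0 or 1;
    # points between the clusters get an intermediate weight.
    print(x, round(membership(x, weights, means, sds), 3))
```

A discrete call would round each weight to 0 or 1, discarding the information that a point halfway between the two populations is genuinely ambiguous.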

Association of CD25 expression with IL2RA genotype

I started my PhD by reanalysing previously published data with computational methods (k-means, mixtures of univariate distributions) to test the correlation between CD25 expression on naive and memory T cells and regulating SNPs in the IL2RA gene. I found that automatic methods outdid manual methods, improving both the reproducibility of the results and the strength of the correlation, but did poorly when the data did not fit the prior assumptions (e.g. number of clusters larger than expected). This seeming weakness of the automatic method is in fact one of its strengths, as it makes for an excellent outlier detection tool, capable of spotting cases where the data do not fit the expected model.
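One way such outlier detection could work is sketched below. It assumes that each sample is a list of one-dimensional measurements, that Gaussian mixtures with different numbers of components are compared using the Bayesian information criterion (BIC), and that a sample is flagged when the preferred number of components differs from the expected one. The BIC criterion, the function names and the simulated samples are assumptions made for illustration; the text above does not state which criterion was actually used.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def mixture_log_likelihood(xs, k, n_iter=100):
    """Fit a k-component 1D Gaussian mixture by EM.

    Returns the log-likelihood evaluated in the final E-step.
    """
    n = len(xs)
    ordered = sorted(xs)
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    # Floor on the sd keeps a component from collapsing onto a single point.
    sd_floor = max(0.05 * grand_sd, 1e-6)
    # Assumed starting point: means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [max(grand_sd / k, sd_floor)] * k
    weights = [1.0 / k] * k
    log_lik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities via log-sum-exp for numerical stability.
        resp, log_lik = [], 0.0
        for x in xs:
            logs = [math.log(weights[j]) + log_normal_pdf(x, means[j], sds[j])
                    for j in range(k)]
            top = max(logs)
            total = top + math.log(sum(math.exp(v - top) for v in logs))
            log_lik += total
            resp.append([math.exp(v - total) for v in logs])
        # M-step: weighted re-estimation of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), sd_floor)
            weights[j] = nj / n
    return log_lik


def select_k(xs, max_k=4):
    """Number of components (1..max_k) with the lowest BIC."""
    def bic(k):
        n_params = 3 * k - 1  # k means, k sds, k - 1 free weights
        return -2.0 * mixture_log_likelihood(xs, k) + n_params * math.log(len(xs))
    return min(range(1, max_k + 1), key=bic)


def flag_unexpected(samples, expected_k):
    """Names of samples whose preferred number of clusters is not expected_k."""
    return [name for name, xs in samples.items() if select_k(xs) != expected_k]


# Simulated samples (hypothetical): one with two populations, one with three.
rng = random.Random(1)
typical = [rng.gauss(m, 1.0) for m in (0.0, 5.0) for _ in range(200)]
atypical = [rng.gauss(m, 1.0) for m in (0.0, 5.0, 10.0) for _ in range(200)]
samples = {"typical": typical, "atypical": atypical}

print(flag_unexpected(samples, expected_k=2))
```

Flagged samples are not necessarily errors; they are candidates for manual inspection because the expected model does not describe them well.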

Nikolas Pontikos
Not available for consultancy


Person keywords: 
data visualisation
flow cytometry
machine learning
statistical analysis