Dan He - Research

My research interests span several fields that deal with massive amounts of information, including database systems, data mining, bioinformatics, information retrieval, and machine learning. I am especially interested in designing efficient methods, whether theoretical or experimental, that reduce the processing time and storage requirements of such information.

Design efficient algorithms for Bioinformatics and Computational Biology problems

I am interested in designing efficient algorithms for biological sequence analysis problems, such as multiple sequence alignment, constrained multiple sequence alignment, discovery of repetitive patterns in DNA sequences, and discovery of patterns with wildcards in strings. I have published a number of conference and journal papers on these problems. Techniques such as dynamic programming, greedy algorithms, heuristic search, suffix trees, and parallel algorithms have been customized and applied to them.

Biomedical Literature Classification

Unlike conventional classification methods, our approach maps biomedical documents into new feature spaces using the MeSH ontology. We then develop classification methods that exploit the properties of the domain ontology to improve classification accuracy. The project draws on many machine learning and statistical techniques. One paper has been published on this topic.

One-To-Some shortest path algorithm

In this project, many state-of-the-art shortest path algorithms are investigated and adapted to the one-to-some shortest path problem. One paper has been published on this topic.
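
To make the problem concrete, here is a minimal sketch, assuming "one-to-some" means finding shortest paths from a single source to a given subset of target nodes. It uses plain Dijkstra with early termination once every target is settled, which is only a baseline and not the algorithm from the published paper. The function name `one_to_some` and the adjacency-list format are hypothetical choices for illustration.

```python
import heapq


def one_to_some(graph, source, targets):
    """Shortest distances from `source` to each node in `targets`.

    Assumed input format: `graph` maps a node to a list of
    (neighbor, weight) pairs with non-negative weights.
    Unreachable targets are simply absent from the result.
    """
    remaining = set(targets)
    dist = {source: 0}
    settled = set()
    result = {}
    heap = [(0, source)]
    # Stop as soon as every requested target has been settled.
    while heap and remaining:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u in remaining:
            remaining.discard(u)
            result[u] = d
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 1)],
    "d": [],
}
print(one_to_some(graph, "a", ["c", "d"]))
```

The early exit is the only difference from a full single-source run: when the targets lie close to the source, most of the graph is never explored.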

Mining frequent patterns with wildcards in data streams (Supported by NSF)

This data mining project aims to efficiently find frequent patterns with wildcards in data streams, especially when the alphabet size of the data stream is large. Two papers have been published on this project, and one more is in preparation.
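
As a rough illustration of the basic operation involved, the sketch below counts the occurrences of one wildcard pattern in a single pass over a stream. It is not the project's mining algorithm: it assumes a wildcard matches exactly one symbol, that the pattern is non-empty and given in advance, and the names `WILDCARD`, `matches`, and `count_in_stream` are hypothetical.

```python
from collections import deque

WILDCARD = "*"  # assumption: matches exactly one arbitrary symbol


def matches(window, pattern):
    """True if every non-wildcard position of `pattern` equals the window symbol."""
    return all(p == WILDCARD or p == s for p, s in zip(pattern, window))


def count_in_stream(stream, pattern):
    """Count occurrences of a non-empty `pattern` while reading `stream` once.

    Only the last len(pattern) symbols are kept in memory, so the
    stream itself never has to be stored.
    """
    m = len(pattern)
    window = deque(maxlen=m)
    count = 0
    for symbol in stream:
        window.append(symbol)
        if len(window) == m and matches(window, pattern):
            count += 1
    return count


print(count_in_stream("abcabdabc", "ab*"))
print(count_in_stream("abcabdabc", "a*c"))
```

A pattern would then be called frequent if its count exceeds a chosen support threshold. The difficult part, which this sketch does not address, is enumerating candidate patterns efficiently when the alphabet is large.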

Optimize classifiers for biased data sets in data streams