My research interests span several fields that deal with massive amounts of information, including database systems, data mining, bioinformatics, information retrieval, and machine learning. I am especially interested in designing efficient methods, whether theoretical or experimental, that reduce the processing time and storage requirements of information.
- I am interested in designing efficient
algorithms for biological sequence analysis problems, such as multiple
sequence alignment, constrained multiple sequence alignment, discovery of
repetitive patterns in DNA sequences, and discovery of patterns with
wildcards in strings. I have published a number of conference and journal
papers on these problems. Numerous techniques, such as dynamic programming,
greedy algorithms, heuristic search, suffix trees, and parallel algorithms,
have been customized and applied to these problems.
- Unlike conventional classification methods,
we map biomedical documents into new feature spaces using the MeSH ontology.
We then develop classification methods that exploit the properties of the
domain ontology to improve classification accuracy. The project draws on
many machine learning and statistical techniques. One paper has been
published on this topic.
- In this project, many state-of-the-art shortest
path algorithms are investigated and adapted to the problem.
One paper has been published on this topic.
- This data mining project aims to efficiently
find frequent patterns with wildcards in data streams, especially when the
alphabet size of the data stream is large. Two papers have been published
from this project, and one more is in preparation.
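
To make the notion of a pattern with wildcards concrete, here is a minimal sketch of counting its occurrences in a sliding window over a stream. It is only an illustration and not the method from the papers: the `?` wildcard symbol, the function names `matches_at` and `count_in_window`, and the fixed-size sliding-window model are all assumptions made for this example, and the brute-force scan ignores the efficiency concerns that the project actually targets.

```python
from collections import deque

WILDCARD = "?"  # assumed symbol: matches any single item of the stream


def matches_at(window, start, pattern):
    """Return True if `pattern` matches `window` beginning at index `start`."""
    if start + len(pattern) > len(window):
        return False
    return all(
        p == WILDCARD or p == window[start + i]
        for i, p in enumerate(pattern)
    )


def count_in_window(stream, pattern, window_size):
    """Yield the number of occurrences of `pattern` in each full window.

    Naive approach: every full window is rescanned from scratch.
    """
    window = deque(maxlen=window_size)  # oldest item drops out automatically
    for item in stream:
        window.append(item)
        if len(window) == window_size:
            items = list(window)
            last_start = window_size - len(pattern)
            yield sum(
                matches_at(items, s, pattern) for s in range(last_start + 1)
            )
```

For example, `list(count_in_window("abcadcab", "a?c", 6))` gives `[2, 1, 1]`: the first window `abcadc` contains both `abc` and `adc`, while the next two windows contain only `adc`. A pattern would then be called frequent in a window when its count reaches some chosen threshold.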