Molecular representation learning for GNNs and LLMs

This line of work explores how molecular structure can be represented more faithfully for both graph neural networks (GNNs) and large language models (LLMs). One direction studies motif-driven molecular graph representation learning, in which each motif is decomposed into a functional encoding and a structural encoding to improve the expressive power of GNNs.
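The idea of separate functional and structural motif encodings can be sketched in a toy form. This is an illustrative sketch only, not the actual method: the motif classes, features, and one-hot-plus-statistics design below are assumptions made for demonstration.

```python
# Toy sketch of motif-level encodings for a molecular graph.
# The functional-class vocabulary and structural features are hypothetical.

# A motif here is described by a functional class (its chemical role) and
# simple structural statistics (atom count and a ring indicator).
FUNCTIONAL_CLASSES = ["hydroxyl", "carbonyl", "aromatic_ring", "amine"]

def functional_encoding(motif_class: str) -> list[int]:
    """One-hot encoding of the motif's functional class."""
    return [1 if c == motif_class else 0 for c in FUNCTIONAL_CLASSES]

def structural_encoding(num_atoms: int, is_ring: bool) -> list[int]:
    """Simple structural descriptor: motif size plus a ring indicator."""
    return [num_atoms, int(is_ring)]

def motif_encoding(motif_class: str, num_atoms: int, is_ring: bool) -> list[int]:
    """Concatenate the two encodings into a single motif feature vector."""
    return functional_encoding(motif_class) + structural_encoding(num_atoms, is_ring)

# Example: a benzene-like aromatic ring motif with six atoms.
print(motif_encoding("aromatic_ring", 6, True))  # [0, 0, 1, 0, 6, 1]
```

In a GNN, such motif features would typically be attached to motif-level nodes or pooled into the graph representation, letting the model distinguish motifs that share a chemical role but differ in topology.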

Another direction, From Graphs to Tokens, introduces a substructure-aware tokenizer for molecular LLMs. Together, these projects aim to make molecular models more expressive, more reusable, and better at generalizing to unseen compounds.
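One plausible shape for a substructure-aware tokenizer is greedy longest-match over a vocabulary of chemically meaningful SMILES fragments, with a character-level fallback. The sketch below is an assumption about the general design, not the project's actual tokenizer, and its tiny vocabulary is invented for illustration.

```python
# Minimal sketch of a substructure-aware SMILES tokenizer (assumed design:
# greedy longest-match against a substructure vocabulary, falling back to
# single characters; the vocabulary below is illustrative only).

SUBSTRUCTURE_VOCAB = {"c1ccccc1", "C(=O)O", "C(=O)", "OC", "N"}

def tokenize(smiles: str, vocab=SUBSTRUCTURE_VOCAB) -> list[str]:
    """Split a SMILES string into substructure tokens where possible."""
    tokens, i = [], 0
    max_len = max(len(t) for t in vocab)
    while i < len(smiles):
        # Try the longest vocabulary match starting at position i.
        for length in range(min(max_len, len(smiles) - i), 0, -1):
            candidate = smiles[i : i + length]
            if candidate in vocab:
                tokens.append(candidate)
                i += length
                break
        else:
            # No substructure matched: emit a single-character token.
            tokens.append(smiles[i])
            i += 1
    return tokens

# Aspirin-like SMILES: the benzene ring and carboxyl-like fragments
# each become a single token instead of many character tokens.
print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))
# ['C', 'C(=O)O', 'c1ccccc1', 'C(=O)O']
```

Compared with character-level tokenization, grouping recurring substructures into single tokens shortens sequences and gives the LLM units that align with chemical structure rather than raw syntax.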