Protein hydration and crystallization condition prediction

This ongoing project focuses on protein hydration, crystallization condition prediction, and protein representation learning. I use large language models to extract structured crystallization conditions from free-text Protein Data Bank records, filter out incomplete entries, and build a curated protein sub-database with both structural data and complete experimental conditions.
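The filtering step can be pictured with a minimal sketch, assuming the language model emits one JSON object per record. The field names in `REQUIRED_FIELDS` and the helpers `is_complete` and `filter_complete` are hypothetical, chosen only for illustration; the schema actually used in the project is not described here.

```python
import json

# Hypothetical schema: fields a record must carry to count as "complete".
REQUIRED_FIELDS = ("method", "ph", "temperature_k", "precipitant", "precipitant_conc")


def is_complete(record):
    """Return True if every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def filter_complete(json_lines):
    """Parse one JSON object per line and keep only the complete records."""
    kept = []
    for line in json_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip malformed model output instead of guessing at a repair.
            continue
        if isinstance(record, dict) and is_complete(record):
            kept.append(record)
    return kept
```

Dropping malformed or partial records, rather than imputing the missing values, keeps the curated subset limited to entries whose experimental conditions were actually reported.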

On top of that dataset, I build graph convolutional models to predict major crystallization variables such as PEG concentration and degree of polymerization. A current direction is to inject hydration information into existing protein embedding models so they can capture richer biophysical context.
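For readers unfamiliar with graph convolution, the sketch below shows one standard layer (symmetrically normalized adjacency with self-loops, followed by ReLU) plus a mean-pool and linear readout to a single scalar such as a PEG concentration. It is written in plain Python for clarity and is an assumption about the general shape of such a model, not the project's actual architecture; `matmul`, `gcn_layer`, and `predict` are illustrative names.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node (e.g. a residue) keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def predict(adj, feats, weight, readout):
    """Mean-pool node embeddings, then apply a linear readout to one scalar."""
    h = gcn_layer(adj, feats, weight)
    pooled = [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]
    return sum(p * r for p, r in zip(pooled, readout))
```

In practice a model like this would be stacked several layers deep and trained with a library that provides sparse graph operations and automatic differentiation; the dense version here only illustrates the message-passing step.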