Protein hydration and crystallization condition prediction
This ongoing project focuses on protein hydration, crystallization condition prediction, and protein representation learning. I use large language models to extract structured crystallization conditions from free-text Protein Data Bank records, filter out incomplete entries, and build a curated protein sub-database that pairs structural data with complete experimental conditions.
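The filtering step can be illustrated with a minimal sketch. The record schema, field names, and entry identifiers below are hypothetical placeholders rather than the project's actual schema; the only assumption is that each extracted record either has a value for every required condition or is dropped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrystallizationRecord:
    """Hypothetical schema for one LLM-extracted record (illustrative field names)."""
    entry_id: str
    ph: Optional[float] = None
    temperature_k: Optional[float] = None
    peg_molecular_weight: Optional[int] = None     # e.g. 3350 for "PEG 3350"
    peg_concentration_pct: Optional[float] = None  # percent, as reported in the text


# Assumed set of fields that must all be present for an entry to be kept.
REQUIRED_FIELDS = ("ph", "temperature_k", "peg_molecular_weight", "peg_concentration_pct")


def is_complete(record: CrystallizationRecord) -> bool:
    """Return True when every required field was successfully extracted."""
    return all(getattr(record, name) is not None for name in REQUIRED_FIELDS)


def filter_complete(records: List[CrystallizationRecord]) -> List[CrystallizationRecord]:
    """Keep only the records whose required fields are all present."""
    return [r for r in records if is_complete(r)]


# Placeholder records: one fully specified, one missing most conditions.
records = [
    CrystallizationRecord("entry_a", ph=7.5, temperature_k=293.0,
                          peg_molecular_weight=3350, peg_concentration_pct=20.0),
    CrystallizationRecord("entry_b", ph=6.0),
]
curated = filter_complete(records)
```

Here `curated` retains only `entry_a`, since `entry_b` lacks temperature and PEG information.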
On top of that dataset, I build graph convolutional models to predict key crystallization variables such as PEG concentration and PEG degree of polymerization. A current direction is to inject hydration information into existing protein embedding models so they can capture richer biophysical context.
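As a rough illustration of the graph convolutional idea, the sketch below implements one layer with the commonly used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python. This particular rule, the toy graph, and the feature and weight values are assumptions for illustration only; the project's actual architecture and inputs may differ.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbors, apply weights, then ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


def mean_pool(node_features):
    """Average node features into a single graph-level vector."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


# Toy 3-node graph (e.g. three residues in contact along a chain); values are made up.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2 features per node
weights = [[0.5, -0.25], [0.25, 0.5]]            # 2 -> 2 hidden units

hidden = gcn_layer(adj, features, weights)
graph_vector = mean_pool(hidden)  # would feed a regression head, e.g. for PEG concentration
```

Mean pooling is used here only because it yields a fixed-size vector regardless of protein size; other readouts would work equally well in this sketch.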
