Description
Greetings Tencent Team!
Thank you for developing and releasing the open-source tFold protein folding model, especially for a task as non-trivial as antibody and nanobody development.
As for my problem: I'm currently working on a project that aims to build a full-stack protocol/pipeline capable of constructing nanobodies against given antigens completely de novo (from Fw to CDR design). In this project I use the tFold-Ab and tFold-Ag models for virtual screening, to assess the binding potential of a custom dataset of nanobody sequences (>5000 designed structures). In general, the results in terms of performance and usability were mediocre (although I was delighted that tFold runs on a CPU cluster with good performance on Intel Xeon Platinum processors), and I've run into some problems while using your software:
-
First things first: your installation and usage recipe seems incomplete and superficial. After installing all of the uniref, colabfold and pdb100 protein databases, even adjusting the paths in the gen_msa.py file to my own installation directories did not fix the reference to the colabfold DB, which resulted in an aberrant slash ("/") being appended during MSA generation for an antigen. Also, I'd advise future users to set num_workers to 80-90% of the CPU threads available on their machine to boost MMseqs2 performance. Another problem with these databases was the installation process: the aria2c download function was rather restricted, so downloading was very slow until I manually changed the aria2c settings (it was solved with this change in the setup_databases.sh file:
aria2c --max-connection-per-server="$ARIA_NUM_CONN" -s 16 --allow-overwrite=true -o "$FILENAME" -d "$DIR" "$URL" && set -e && return 0).
-
About the functionality of tFold-Ab and tFold-Ag themselves. As far as I understand, the overall tFold pipeline inherits the functionality of the AF2.3-M model but lacks equivalent output for judging the accuracy of the final model. AF2.3 and AF3 have their own .json output covering the pLDDT, pTM and ipTM values across the complete structure. This is frustrating to me, because AF-derived models have a flawed ipTM calculation method, as reported in the Dunbrack publication (https://pmc.ncbi.nlm.nih.gov/articles/PMC11844409/). That wouldn't be a problem if I could read those .json parameters and reassess the accuracy of my predicted structures. If you could help me integrate Dunbrack's GitHub solution to this problem manually, I'd be very grateful.
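
A minimal sketch of the kind of post-processing meant here, assuming a confidence file in an AF-style JSON layout. The key names `plddt`, `ptm` and `iptm`, the function name `summarize_confidence` and the example file name are assumptions made for illustration; tFold does not currently write such a file, and the exact keys differ between AF-based tools.

```python
import json
from statistics import mean


def summarize_confidence(json_path, plddt_key="plddt", ptm_key="ptm", iptm_key="iptm"):
    """Load an AF-style confidence JSON and return mean pLDDT, pTM and ipTM.

    The key names are assumptions; pass different ones if the file uses
    another layout. Missing entries are reported as None.
    """
    with open(json_path) as fh:
        scores = json.load(fh)

    plddt = scores.get(plddt_key)  # expected: list of per-residue values
    return {
        "mean_plddt": mean(plddt) if plddt else None,
        "ptm": scores.get(ptm_key),
        "iptm": scores.get(iptm_key),
    }


# Hypothetical usage; the file name is a placeholder:
# print(summarize_confidence("nanobody_0001_scores.json"))
```

With the per-residue and per-complex values available in this form, an alternative interface score could be recomputed offline for each of the screened structures.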
-
The most serious problem I faced during my nanobody design was confusing, aberrant atomic structures of the predicted nanobodies and antigens, both in standalone prediction mode (Ab) and in Nb-Ag mode (Ag). They are rendered differently by different molecular viewers (PyMOL, SAMSON, etc.), resulting in unrealistic valences or broken bonds. At first, I thought that de novo design based on a scaffold (side-chain-free nanobodies) lacks some amino acid information in the variable regions (predominantly in the CDR loops), since the scaffolds do not reach 128 AA in length (they range from 105 to 119). But re-evaluating my designed sequences with NanoBodyBuilder2 did not show this problem, I think because of the renumbering step built into NB2, so that the folding process does not try to "pull" an incomplete sequence onto the trained model. Perhaps integrating a renumbering tool such as ANARCI would solve this problem, together with adding CONECT records to the predicted PDB files so that different viewers won't have to guess the bonding. I'd be very grateful if you could help me with this, because integrating tFold with such rough artifacts is questionable, and I am now considering switching to other models. Pictures of the artifacts in the structures are attached.
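
To illustrate the CONECT idea (CONECT is the PDB-format spelling of the record name), here is a minimal Python sketch that derives explicit backbone bonds from the ATOM records of a PDB file. It is an assumption-laden illustration, not part of tFold: the function name `backbone_conect_records` is hypothetical, only backbone bonds (N-CA, CA-C, C-O and the C-N peptide link) are covered, and side chains would need per-residue bond templates.

```python
def backbone_conect_records(pdb_text):
    """Return CONECT lines for backbone bonds found in the ATOM records.

    Covers N-CA, CA-C, C-O within a residue and the C-N peptide link to
    the next residue of the same chain. Consecutive residues are linked
    without a distance check, so real chain breaks are NOT detected.
    """
    residues = []  # residue keys in file order
    atoms_by_residue = {}  # key -> {atom name: serial number}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        serial = int(line[6:11])  # columns 7-11: atom serial
        name = line[12:16].strip()  # columns 13-16: atom name
        key = (line[21], line[22:27])  # chain ID, residue number + insertion code
        if key not in atoms_by_residue:
            atoms_by_residue[key] = {}
            residues.append(key)
        atoms_by_residue[key][name] = serial

    bonds = []
    for i, key in enumerate(residues):
        atoms = atoms_by_residue[key]
        for a, b in (("N", "CA"), ("CA", "C"), ("C", "O")):
            if a in atoms and b in atoms:
                bonds.append((atoms[a], atoms[b]))
        # Peptide bond to the following residue, only within the same chain.
        if i + 1 < len(residues) and residues[i + 1][0] == key[0]:
            nxt = atoms_by_residue[residues[i + 1]]
            if "C" in atoms and "N" in nxt:
                bonds.append((atoms["C"], nxt["N"]))

    # CONECT layout: record name followed by 5-character serial fields.
    return [f"CONECT{a:5d}{b:5d}" for a, b in bonds]
```

The returned lines would be written after the coordinate records and before the final END record of the file. Because the residue key includes the insertion code, the sketch also works on files that have already been renumbered with an antibody numbering scheme.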
Respectfully,
Danil Kotelnikov

