CifX

A robust, human-friendly mmCIF file parser. The legacy PDB format is not supported.

Installation

First install ANARCI, following the instructions at https://github.com/oxpig/ANARCI
Then run:

pip install -e .

Quick start

View in Jupyter notebook: tutorial.ipynb

1. Basic reading and hierarchical access

Reading a file

Use the parse function from cifx.parser to read an mmCIF file (for example, examples/1bzq.cif).

from cifx.parser import parse

stucture = parse('examples/1bzq.cif')

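Viewing the structure hierarchy

Printing the stucture object directly shows an overview of the **models (Model) and chains (Chain)** it contains, including each chain's type, length, and sequence preview, along with its chain_id and auth_chain_id.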

print(stucture)

Example output:

Structure[1bzq]
  * Model[1](12 chains)
    - Chain[A/A](type=Polymer.PeptideL, length=124, seq=KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHES...)
    - Chain[E/K](type=Polymer.PeptideL, length=124, seq=QVQLVESGGGLVQAGGSLRLSCAASGYAYTYIYMGWFRQAPGKEREGVAA...)
    ...
    - Chain[I/A](type=NonPolymer, length=1, seq= PO4 )
...

Hierarchical access

CifX supports hierarchical access by index or by ID (such as chain_id, res_id, or atom_name):

| Level | Access by | Example |
|---|---|---|
| Model | index | `stucture[0]` |
| Chain | index or `chain_id` | `stucture[0][0]` or `stucture[0]['A']` |
| Residue | index or `res_id` | `stucture[0][0][0]` or `stucture[0][0]['1']` |
| Atom | index or `atom_name` | `stucture[0][0]['1'][0]` or `stucture[0][0]['1']['CA']` |

Example: get chain 'A' from the first model, then the chain_id and auth_chain_id of chain 'E'.

# Get the first chain of the first model (chain 'A')
chain_A = stucture[0][0]
# Chain[A/A](type=Polymer.PeptideL, length=124, seq=KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHES...)

# Get chain 'E' and print its IDs
chain_E = stucture[0]['E']
print(chain_E.chain_id, chain_E.auth_chain_id)
# Output: E K
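
The dual index/ID lookup described above can be pictured with a small container. This is an illustrative sketch only, not CifX's actual implementation: the class name `IdIndexedList` and its `key` argument are hypothetical, and plain strings stand in for chain objects.

```python
class IdIndexedList:
    """Hypothetical container: integer keys are positions, string keys are IDs."""

    def __init__(self, items, key):
        self._items = list(items)
        # Map each item's ID to its position so string lookups are O(1).
        self._index = {key(item): i for i, item in enumerate(self._items)}

    def __getitem__(self, k):
        if isinstance(k, int):
            return self._items[k]            # positional access
        return self._items[self._index[k]]   # ID access (KeyError if unknown)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


# Toy data: each "chain" is just its own ID string.
chains = IdIndexedList(["A", "E", "F"], key=lambda chain_id: chain_id)
first_by_index = chains[0]
same_by_id = chains["A"]
```

Under a scheme like this, an integer is always a position and a string is always an ID, which would explain why a residue is addressed as `[0]` by position but as `['1']` by res_id.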

2. Traversing the structure

Structure objects are iterable. Nested loops visit every element in the structure:

for model in stucture:
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(chain.chain_id, atom.atom_name)

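3. Handling missing residues

Unlike BioPython or gemmi, CifX automatically reads all residues whose coordinates are missing and marks them with is_missing_residue=True. Ignoring such residues is unreasonable in some scenarios. For example, when a generative model is trained on structures that silently omit their missing residues, it may learn unrealistic conformations.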

stucture = parse('examples/6QG8.cif')

for residue in stucture[0]['A']:
    print(residue, residue.is_missing_residue)

Example output (partial):

Chain[A].Residue[31](ASP) False
Chain[A].Residue[32](ALA) True  # missing residue
Chain[A].Residue[33](GLY) True  # missing residue
Chain[A].Residue[47](GLU) False
...

Skipping missing residues

Use the get_residues(skip_missing_residues=True) method to skip missing residues while iterating:

for residue in stucture[0]['A'].get_residues(skip_missing_residues=True, skip_nonstandard_residues=False):
    print(residue, residue.is_missing_residue)

Getting the sequence

Get the amino acid sequence of a chain:

stucture[0]['A'].get_sequence(skip_missing_residues=False, skip_nonstandard_residues=False)
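
To make the effect of the skip_missing_residues flag concrete, here is a minimal sketch on toy data. It does not use CifX: `ResidueRecord` and `sequence` are hypothetical stand-ins, and the residue-name table is a small subset chosen for the example.

```python
from dataclasses import dataclass

# Three-letter to one-letter codes (small subset, enough for the toy data).
THREE_TO_ONE = {"ASP": "D", "ALA": "A", "GLY": "G", "GLU": "E"}


@dataclass
class ResidueRecord:
    """Hypothetical stand-in for a residue; not the CifX Residue class."""
    res_id: str
    name: str
    is_missing_residue: bool


def sequence(residues, skip_missing_residues=False):
    """Join one-letter codes, optionally dropping residues without coordinates."""
    return "".join(
        THREE_TO_ONE.get(r.name, "X")  # unknown names become 'X'
        for r in residues
        if not (skip_missing_residues and r.is_missing_residue)
    )


# Toy chain: the two middle residues have no coordinates.
residues = [
    ResidueRecord("1", "ASP", False),
    ResidueRecord("2", "ALA", True),
    ResidueRecord("3", "GLY", True),
    ResidueRecord("4", "GLU", False),
]
full_seq = sequence(residues)
observed_seq = sequence(residues, skip_missing_residues=True)
```

Keeping the missing residues preserves the true spacing between the flanking residues, whereas the skipped version makes them look adjacent.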

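4. Handling atom and residue variants

CifX automatically handles multi-conformation atoms and point-mutation residues, wrapping each in a dedicated object. By default the first conformation supplies the object's attributes, so these atoms and residues can be used like ordinary ones.

Multi-conformation atoms (DisorderAtom)

A multi-conformation atom is parsed into a DisorderAtom object, which inherits all attributes and methods of Atom.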

stucture[0]['A']['163'].atom_view()

Example output:

Residue[163](ASN)
  - Atom[N]
  - DisorderAtom[CA](CA.A, CA.B) # contains two conformations, A and B
  ...

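Point-mutation residues (PointMutationResidue)

A residue with a point mutation (that is, different residue types at the same position) is wrapped in a PointMutationResidue object. The first residue is selected by default.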

stucture = parse('examples/1EN2.cif')

print(stucture[0]['A']['10'])
# Output: PointMutationResidue[10](SER, GLY)
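
The "first conformation by default" behaviour can be sketched with a thin wrapper. This is an assumption-laden illustration, not CifX code: `SimpleAtom` and `AltlocGroup` are hypothetical names, and the sketch uses delegation rather than the inheritance that DisorderAtom is described as using.

```python
from dataclasses import dataclass


@dataclass
class SimpleAtom:
    """Hypothetical atom record with an alternate-location label."""
    atom_name: str
    altloc: str
    coord: tuple


class AltlocGroup:
    """Hypothetical wrapper that behaves like its first conformation."""

    def __init__(self, atoms):
        if not atoms:
            raise ValueError("at least one conformation is required")
        self.atoms = list(atoms)

    def __getattr__(self, name):
        # Only called when normal lookup fails; guard 'atoms' to avoid recursion.
        if name == "atoms":
            raise AttributeError(name)
        return getattr(self.atoms[0], name)


# Toy CA atom with two conformations, A and B.
ca = AltlocGroup([
    SimpleAtom("CA", "A", (1.0, 2.0, 3.0)),
    SimpleAtom("CA", "B", (1.2, 2.1, 3.0)),
])
default_coord = ca.coord          # taken from conformation A
n_conformations = len(ca.atoms)   # all conformations remain reachable
```

Code that only reads ordinary atom attributes does not need to know whether it was handed a single atom or a group, while the alternate conformations stay available through `atoms`.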

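5. Interface and contact analysis

CifX provides efficient KD-tree-based methods for finding contacting residues or atoms between molecules. Two residues are generally considered to interact if any pair of their atoms is closer than a cutoff (for example, 4.5 Å).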

from cifx.tools.contact import get_interface_residues, get_contact_atoms

stucture = parse('examples/1bzq.cif')
chain_A = stucture[0]['A']
chain_F = stucture[0]['F']

# Get the interacting residue pairs between chain A and chain F
interface_residues = get_interface_residues(chain_A, chain_F, distance_cutoff=4.5)
# [(Chain[A].Residue[115](TYR), Chain[F].Residue[1109](GLY)), ...]

# Get the contacting atom pairs between chain A and chain F
contact_atoms = get_contact_atoms(chain_A, chain_F, distance_cutoff=4.5)
# [(Atom[O], Atom[CE1]), (Atom[C], Atom[CE1]), ...]
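
The distance criterion itself is simple to state in code. The sketch below is a brute-force version that compares every atom pair. It is not the KD-tree search CifX uses, and the function names and toy coordinates are made up for illustration.

```python
import math


def contact_pairs(atoms_a, atoms_b, distance_cutoff=4.5):
    """Return (i, j) index pairs whose atoms are closer than distance_cutoff (Å)."""
    pairs = []
    for i, a in enumerate(atoms_a):
        for j, b in enumerate(atoms_b):
            if math.dist(a, b) < distance_cutoff:
                pairs.append((i, j))
    return pairs


def residues_in_contact(res_a, res_b, distance_cutoff=4.5):
    """Two residues interact if any pair of their atoms is within the cutoff."""
    return bool(contact_pairs(res_a, res_b, distance_cutoff))


# Toy coordinates in Å: only the first atom of each list is within 4.5 Å.
atoms_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
atoms_b = [(3.0, 0.0, 0.0), (0.0, 20.0, 0.0)]
pairs = contact_pairs(atoms_a, atoms_b)
in_contact = residues_in_contact(atoms_a, atoms_b)
```

This version costs one distance evaluation per atom pair. A KD-tree gives the same answer while only examining atoms near each query point, which matters for large chains.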

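6. Immune molecule annotation

The cifx.tools.immune module annotates immune molecules in a structure (such as antibodies and TCRs), automatically identifying chain types and providing IMGT numbering.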

from cifx.tools.immune import annotate_structure

# Annotate the structure
stucture = annotate_structure(stucture)

# Inspect the annotated chain
print(stucture[0]['E'])
# Example output: Chain[E/K][Antibody: H](type=Polymer.PeptideL, length=124, seq=...)

Getting Domain and Region information

After annotation, immunology-related information is available on the chain objects, for example the details of the V domain:

# Get V domain information
stucture[0]['E'].imgt_domains
# Example output: [{'start_res_id': '801', 'end_res_id': '921', 'chain_type': 'H', 'species': 'alpaca', ...}]

# Get the list of residues in a CDR region
stucture[0]['E'].get_cdr_residues(cdr='cdr1', domain_index=0)
# Example output: [Chain[E].Residue[826](GLY)[IMGT: 27  | CDR1], ...]

# Get the list of residues in the FR regions
stucture[0]['E'].get_framework_residues(fr='all', domain_index=0)

# Get region information for a single residue
print(stucture[0]['E'][0].imgt_region)  # fr1
print(stucture[0]['E'][0].imgt_species) # alpaca
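
For reference, the relation between an IMGT position and its region can be written as a small lookup. The boundaries below are the standard IMGT V-domain delimitations. The function `imgt_region` is a hypothetical helper, not part of CifX, and it ignores insertion codes such as 111A.

```python
# Standard IMGT V-domain regions as (name, first position, last position).
IMGT_REGIONS = [
    ("fr1", 1, 26),
    ("cdr1", 27, 38),
    ("fr2", 39, 55),
    ("cdr2", 56, 65),
    ("fr3", 66, 104),
    ("cdr3", 105, 117),
    ("fr4", 118, 128),
]


def imgt_region(position):
    """Return the region name for an integer IMGT position (1 to 128)."""
    for name, start, end in IMGT_REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"IMGT position out of range: {position}")


region_of_27 = imgt_region(27)
region_of_1 = imgt_region(1)
```

Position 27 falls in CDR1 under this table, which agrees with the `IMGT: 27  | CDR1` label in the example output above.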
