Commit 06bb17b

v0.1.7 upload

- Revised the model classes KRRModel, KRRLocalModel, SORFModel, SORFLocalModel, and FJKModel. The most notable change is that they all now perform hyperparameter optimization using leave-one-out errors, with a two-tier optimization of lambda and sigma (similar to the MSORF models).
- MSORF can be used with PCA and can now be trained on datasets too large to be stored in RAM at once.
- The dimensionality of forces can now be changed to a value different from nCartDim=3 (introduced for alchemical derivatives).
- Importing from qml2.ensemble no longer requires installation of rdkit, pyscf, etc.
- Added a constructor from ASE objects.
- Minor bugfixes.

1 parent 8907496   commit 06bb17b
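The leave-one-out hyperparameter optimization mentioned above can be done cheaply for kernel ridge regression because the LOO residuals have a closed form: with A = K + lambda*I and alpha = A^-1 y, the i-th LOO residual equals alpha_i / (A^-1)_ii, so no retraining per held-out point is needed. The sketch below illustrates the identity in plain NumPy and checks it against brute-force retraining; it is an illustration of the principle, not qml2's actual implementation, and all names in it are made up for this example.

```python
import numpy as np

def loo_residuals(K, y, l2reg):
    # Closed-form leave-one-out residuals for kernel ridge regression:
    # with A = K + l2reg*I and alpha = A^-1 @ y, residual_i = alpha_i / (A^-1)_ii.
    Ainv = np.linalg.inv(K + l2reg * np.eye(len(y)))
    return (Ainv @ y) / np.diag(Ainv)

# Brute-force check: actually retrain with each point held out.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / 2.0)  # Gaussian kernel
y = rng.normal(size=20)
l2reg = 1e-3

fast = loo_residuals(K, y, l2reg)
brute = np.empty(20)
for i in range(20):
    mask = np.arange(20) != i
    alpha = np.linalg.solve(K[np.ix_(mask, mask)] + l2reg * np.eye(19), y[mask])
    brute[i] = y[i] - K[i, mask] @ alpha
print(np.allclose(fast, brute))
```

A two-tier optimization then amounts to scanning lambda (cheap, kernel fixed) inside a scan over sigma (expensive, kernel rebuilt), minimizing a loss over these residuals.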


69 files changed: +3232 additions, -1067 deletions
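The commit message notes that MSORF models can now be trained on datasets too large to hold in RAM at once. A standard way to achieve this for random-feature models is to accumulate the normal equations batch by batch and solve once at the end. The sketch below shows the principle in plain NumPy; the toy feature map and all names are illustrative, not qml2's actual implementation (qml2 uses structured orthogonal random feature transforms rather than a dense random projection).

```python
import numpy as np

rng = np.random.default_rng(1)
nfeatures, dim, l2reg = 64, 5, 1e-6
W = rng.normal(size=(dim, nfeatures))  # fixed random projection (toy stand-in for SORF transforms)

def features(X):
    # toy random-feature map; only its fixed, deterministic nature matters here
    return np.cos(X @ W)

# Accumulate Z^T Z and Z^T y over batches; only one batch is in memory at a time.
ZtZ = np.zeros((nfeatures, nfeatures))
Zty = np.zeros(nfeatures)
all_Z, all_y = [], []  # kept only to verify the result below
for _ in range(10):  # in practice each batch would be streamed from disk
    X = rng.normal(size=(100, dim))
    y = X[:, 0]  # toy labels
    Z = features(X)
    ZtZ += Z.T @ Z
    Zty += Z.T @ y
    all_Z.append(Z)
    all_y.append(y)

streamed = np.linalg.solve(ZtZ + l2reg * np.eye(nfeatures), Zty)

# Identical (up to round-off) to solving with the full feature matrix in memory:
Zfull, yfull = np.vstack(all_Z), np.concatenate(all_y)
in_memory = np.linalg.solve(Zfull.T @ Zfull + l2reg * np.eye(nfeatures), yfull @ Zfull)
print(np.allclose(streamed, in_memory))
```

The memory cost of the accumulated matrices depends only on the number of features, not on the number of training points.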

.pre-commit-config.yaml

Lines changed: 3 additions & 3 deletions

@@ -78,7 +78,7 @@ repos:
       stages: [commit-msg]
       args: []

-  - repo: https://github.com/kieran-ryan/pyprojectsort
-    rev: v0.4.0
+  - repo: https://github.com/tox-dev/pyproject-fmt
+    rev: v2.5.0
     hooks:
-      - id: pyprojectsort
+      - id: pyproject-fmt

README.md

Lines changed: 7 additions & 4 deletions

@@ -18,21 +18,21 @@ Some parts of the code depend on additional dependencies that can be installed w
 - `orb_ml` - for FJK (machine learning from orbital information).

-- `msorf` - for MSORF (everything in `qml2.multilevel_sorf`).
+- `models` - for hyperparameter optimization procedures in `qml2.models` and `qml2.multilevel_sorf`.

 - `morfeus` - for applications dependent on the `morfeus-ml` package (everything related to conformer ensemble generation).

 - `torch` - Torch functionality (efficiency questionable right now TBH).

-For example, to use the `orb_ml` and `msorf` optional dependency flags in your installation use
+For example, to use the `orb_ml` and `models` optional dependency flags in your installation use

 ```bash
-pip install .[orb_ml,msorf]
+pip install .[orb_ml,models]
 ```
 or, if `makefile` is installed,

 ```
-make install OPT=[orb_ml,msorf]
+make install OPT=[orb_ml,models]
 ```

 ## :clipboard: Testing
@@ -72,6 +72,9 @@ This will create `manual.html` file that can be opened with an Internet browser.

 `QML2_AVOID_NUMBA_NUMPY_PARALLELIZATION` - some Numba routines in the code call Numpy routines in parallel, which creates problems in some setups (e.g. when both Numba and Numpy try to parallelize over a large number of threads without taking each other into account). Setting this environment variable to `1` disables Numba parallelization in such routines, leaving them to be parallelized exclusively with Numpy.

+`QML2_AVOID_SORF_NUMBA_PARALLELIZATION` - setting this environment variable to `1` disables Numba parallelization over feature vectors for SORF routines in `qml2.kernels.sorf` and `qml2.kernels.gradient_sorf` (also referred to in the corresponding `qml2.models` classes). Helps if reductors are used in systems where Numba and Numpy try to parallelize simultaneously (see `QML2_AVOID_NUMBA_NUMPY_PARALLELIZATION`).
+
 ### Experimental

 `QML2_DEFAULT_JIT` - setting to `NUMBA` (default) or `TORCH` (both are case insensitive) determines whether Numba or TorchScript JIT compilation is used. Also see `jit_interfaces.set_default_jit`.
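The environment variables described in the README section above are set in the shell before launching a run, for example:

```shell
# Keep Numba from parallelizing routines that already parallelize via Numpy
# (variable names are taken from the README text above; the commented-out
# script name is hypothetical).
export QML2_AVOID_NUMBA_NUMPY_PARALLELIZATION=1
export QML2_AVOID_SORF_NUMBA_PARALLELIZATION=1
# python run_training.py
```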

Lines changed: 51 additions & 0 deletions

import csv
import random
import tarfile

import numpy as np

from qml2 import Compound
from qml2.models.loss_functions import MAE
from qml2.models.sorf import SORFModel

xyzs = []
energies = []

training_set_size = 2001
test_set_size = 1000
num_mols = training_set_size + test_set_size

with open("../../tests/test_data/hof_qm7.txt") as csvfile:
    reader = csv.reader(csvfile, delimiter=" ")
    all_rows = list(reader)
random.shuffle(all_rows)
for row in all_rows[:num_mols]:
    xyzs.append(row[0])
    energies.append(float(row[1]))

energies = np.array(energies)

compounds = []
with tarfile.open("../../tests/test_data/qm7.tar.gz") as tar:
    for xyz_name in xyzs:
        xyz = tar.extractfile(xyz_name)
        comp = Compound(xyz=xyz)
        compounds.append(comp)

train_compounds = compounds[:training_set_size]
test_compounds = compounds[training_set_size:]

train_quantities = energies[:training_set_size]
test_quantities = energies[training_set_size:]

# NOTE: shift_quantities=True shifts the labels by their mean; since the labels
# in this example are extensive, this is not guaranteed to improve the results.
model = SORFModel(shift_quantities=True)

model.train(training_compounds=train_compounds, training_quantities=train_quantities)

print("Optimized sigma:", model.sigma)
print("Optimized l2reg divided by average kernel element:", model.l2reg_diag_ratio)

predictions = model.predict_from_compounds(test_compounds)
print("Prediction MAE:", MAE()(predictions - test_quantities))
print("Test set quantity STD:", np.std(test_quantities))

Lines changed: 51 additions & 0 deletions

import csv
import random
import tarfile

import numpy as np

from qml2 import Compound
from qml2.models.krr import KRRModel
from qml2.models.loss_functions import MAE

xyzs = []
energies = []

training_set_size = 2001
test_set_size = 1000
num_mols = training_set_size + test_set_size

with open("../../tests/test_data/hof_qm7.txt") as csvfile:
    reader = csv.reader(csvfile, delimiter=" ")
    all_rows = list(reader)
random.shuffle(all_rows)
for row in all_rows[:num_mols]:
    xyzs.append(row[0])
    energies.append(float(row[1]))

energies = np.array(energies)

compounds = []
with tarfile.open("../../tests/test_data/qm7.tar.gz") as tar:
    for xyz_name in xyzs:
        xyz = tar.extractfile(xyz_name)
        comp = Compound(xyz=xyz)
        compounds.append(comp)

train_compounds = compounds[:training_set_size]
test_compounds = compounds[training_set_size:]

train_quantities = energies[:training_set_size]
test_quantities = energies[training_set_size:]

# NOTE: we could use shift_quantities=True to shift the labels by their mean,
# but since the labels in this example are extensive, that is not likely to
# improve the results.
model = KRRModel(shift_quantities=False)

model.train(training_compounds=train_compounds, training_quantities=train_quantities)

print("Optimized sigma:", model.sigma)
print("Optimized l2reg divided by average kernel element:", model.l2reg_diag_ratio)

predictions = model.predict_from_compounds(test_compounds)
print("Prediction MAE:", MAE()(predictions - test_quantities))
print("Test set quantity STD:", np.std(test_quantities))

Lines changed: 51 additions & 0 deletions

import csv
import random
import tarfile

import numpy as np

from qml2 import Compound
from qml2.models.loss_functions import MAE
from qml2.models.sorf import SORFLocalModel

xyzs = []
energies = []

training_set_size = 501
test_set_size = 1000
num_mols = training_set_size + test_set_size

with open("../../tests/test_data/hof_qm7.txt") as csvfile:
    reader = csv.reader(csvfile, delimiter=" ")
    all_rows = list(reader)
random.shuffle(all_rows)
for row in all_rows[:num_mols]:
    xyzs.append(row[0])
    energies.append(float(row[1]))

energies = np.array(energies)

compounds = []
with tarfile.open("../../tests/test_data/qm7.tar.gz") as tar:
    for xyz_name in xyzs:
        xyz = tar.extractfile(xyz_name)
        comp = Compound(xyz=xyz)
        compounds.append(comp)

train_compounds = compounds[:training_set_size]
test_compounds = compounds[training_set_size:]

train_quantities = energies[:training_set_size]
test_quantities = energies[training_set_size:]

# Using shift_quantities=True enables the dressed-atom approach; it requires
# defining `possible_nuclear_charges`, though.
model = SORFLocalModel(shift_quantities=True, possible_nuclear_charges=np.array([1, 6, 7, 8, 16]))

model.train(training_compounds=train_compounds, training_quantities=train_quantities)

print("Optimized sigma:", model.sigma)
print("Optimized l2reg divided by average kernel element:", model.l2reg_diag_ratio)

predictions = model.predict_from_compounds(test_compounds)
print("Prediction MAE:", MAE()(predictions - test_quantities))
print("Test set quantity STD:", np.std(test_quantities))

examples/models/ex_FCHL19_model.py

Lines changed: 51 additions & 0 deletions

import csv
import random
import tarfile

import numpy as np

from qml2 import Compound
from qml2.models.krr import KRRLocalModel
from qml2.models.loss_functions import MAE

xyzs = []
energies = []

training_set_size = 501
test_set_size = 1000
num_mols = training_set_size + test_set_size

with open("../../tests/test_data/hof_qm7.txt") as csvfile:
    reader = csv.reader(csvfile, delimiter=" ")
    all_rows = list(reader)
random.shuffle(all_rows)
for row in all_rows[:num_mols]:
    xyzs.append(row[0])
    energies.append(float(row[1]))

energies = np.array(energies)

compounds = []
with tarfile.open("../../tests/test_data/qm7.tar.gz") as tar:
    for xyz_name in xyzs:
        xyz = tar.extractfile(xyz_name)
        comp = Compound(xyz=xyz)
        compounds.append(comp)

train_compounds = compounds[:training_set_size]
test_compounds = compounds[training_set_size:]

train_quantities = energies[:training_set_size]
test_quantities = energies[training_set_size:]

# Using shift_quantities=True enables the dressed-atom approach; it requires
# defining `possible_nuclear_charges`, though.
model = KRRLocalModel(shift_quantities=True, possible_nuclear_charges=np.array([1, 6, 7, 8, 16]))

model.train(training_compounds=train_compounds, training_quantities=train_quantities)

print("Optimized sigma:", model.sigma)
print("Optimized l2reg divided by average kernel element:", model.l2reg_diag_ratio)

predictions = model.predict_from_compounds(test_compounds)
print("Prediction MAE:", MAE()(predictions - test_quantities))
print("Test set quantity STD:", np.std(test_quantities))

Lines changed: 68 additions & 0 deletions

import csv
import random
import tarfile

import numpy as np

from qml2 import Compound
from qml2.kernels import local_dn_matern_kernel, local_dn_matern_kernel_symmetric
from qml2.models.krr import KRRLocalModel
from qml2.models.loss_functions import MAE
from qml2.representations.calculators import SLATMCalculator
from qml2.utils import get_sorted_elements

xyzs = []
energies = []

training_set_size = 501
test_set_size = 1000
num_mols = training_set_size + test_set_size

with open("../../tests/test_data/hof_qm7.txt") as csvfile:
    reader = csv.reader(csvfile, delimiter=" ")
    all_rows = list(reader)
random.shuffle(all_rows)
for row in all_rows[:num_mols]:
    xyzs.append(row[0])
    energies.append(float(row[1]))

energies = np.array(energies)
all_nuclear_charges = []

compounds = []
with tarfile.open("../../tests/test_data/qm7.tar.gz") as tar:
    for xyz_name in xyzs:
        xyz = tar.extractfile(xyz_name)
        comp = Compound(xyz=xyz)
        compounds.append(comp)
        all_nuclear_charges.append(comp.nuclear_charges)

train_compounds = compounds[:training_set_size]
test_compounds = compounds[training_set_size:]

train_quantities = energies[:training_set_size]
test_quantities = energies[training_set_size:]

slatm_calculator = SLATMCalculator(all_nuclear_charges)
possible_nuclear_charges = get_sorted_elements(np.concatenate(all_nuclear_charges))
print("Nuclear charges found:", possible_nuclear_charges)

# Using shift_quantities=True enables the dressed-atom approach; it requires
# defining `possible_nuclear_charges`, though.
model = KRRLocalModel(
    shift_quantities=True,
    possible_nuclear_charges=possible_nuclear_charges,
    representation_function=slatm_calculator,
    rep_kwargs={"local": True},
    kernel_kwargs={"order": 0, "metric": "l2"},
    kernel_function=local_dn_matern_kernel,
    kernel_function_symmetric=local_dn_matern_kernel_symmetric,
)

model.train(training_compounds=train_compounds, training_quantities=train_quantities)

print("Optimized sigma:", model.sigma)
print("Optimized l2reg divided by average kernel element:", model.l2reg_diag_ratio)

predictions = model.predict_from_compounds(test_compounds)
print("Prediction MAE:", MAE()(predictions - test_quantities))
print("Test set quantity STD:", np.std(test_quantities))

Lines changed: 67 additions & 0 deletions

import csv
import random
import tarfile

import numpy as np

from qml2 import Compound
from qml2.models.loss_functions import MAE
from qml2.models.sorf import SORFLocalModel
from qml2.representations.calculators import SLATMCalculator
from qml2.utils import get_sorted_elements

xyzs = []
energies = []

training_set_size = 501
test_set_size = 1000
num_mols = training_set_size + test_set_size

with open("../../tests/test_data/hof_qm7.txt") as csvfile:
    reader = csv.reader(csvfile, delimiter=" ")
    all_rows = list(reader)
random.shuffle(all_rows)
for row in all_rows[:num_mols]:
    xyzs.append(row[0])
    energies.append(float(row[1]))

energies = np.array(energies)
all_nuclear_charges = []

compounds = []
with tarfile.open("../../tests/test_data/qm7.tar.gz") as tar:
    for xyz_name in xyzs:
        xyz = tar.extractfile(xyz_name)
        comp = Compound(xyz=xyz)
        compounds.append(comp)
        all_nuclear_charges.append(comp.nuclear_charges)

train_compounds = compounds[:training_set_size]
test_compounds = compounds[training_set_size:]

train_quantities = energies[:training_set_size]
test_quantities = energies[training_set_size:]

slatm_calculator = SLATMCalculator(all_nuclear_charges)
possible_nuclear_charges = get_sorted_elements(np.concatenate(all_nuclear_charges))
print("Nuclear charges found:", possible_nuclear_charges)

# Using shift_quantities=True enables the dressed-atom approach; it requires
# defining `possible_nuclear_charges`, though.
# NOTE: nfeatures might need to be increased if the aSLATM representation is too large.
model = SORFLocalModel(
    shift_quantities=True,
    possible_nuclear_charges=possible_nuclear_charges,
    representation_function=slatm_calculator,
    rep_kwargs={"local": True},
    nfeatures=32768,
    ntransforms=3,
)

model.train(training_compounds=train_compounds, training_quantities=train_quantities)

print("Optimized sigma:", model.sigma)
print("Optimized l2reg divided by average kernel element:", model.l2reg_diag_ratio)

predictions = model.predict_from_compounds(test_compounds)
print("Prediction MAE:", MAE()(predictions - test_quantities))
print("Test set quantity STD:", np.std(test_quantities))
