Metadata-Version: 2.1
Name: theia-pypi
Version: 0.2
Summary: Explainable prediction of EC numbers using a multilayer perceptron.
Home-page: https://github.com/daenuprobst/theia
Author: Daniel Probst
Author-email: daniel.probst@hey.com
License: MIT
Project-URL: Documentation, https://github.com/daenuprobst/theia
Project-URL: Source, https://github.com/daenuprobst/theia
Project-URL: Twitter, https://twitter.com/skepteis
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Unix
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8
License-File: LICENSE

# :anchor: Theia

<!-- - <a href="#Quickstart">Quickstart</a>
- <a href="#Quickstart">Colab</a> -->

## Quickstart
You need at least Python 3.9 to get started, so I suggest using conda to create an environment with an up-to-date Python version (3.11 is really, really fast, so I suggest switching to it as soon as rdkit supports it). For now, let's go with Python 3.10: `conda create -n theia python==3.10 && conda activate theia` is all you need (ha). Then go ahead and install theia using pip (the name theia was already taken on PyPI, so make sure to install theia-pypi, unless you want to parse log files):
```
pip install theia-pypi
```
That's pretty much it. Now you can start theia by simply typing:
```
theia
```
and open the URL `http://127.0.0.1:5000/` in your web browser.

<img src="https://github.com/daenuprobst/theia/raw/main/img/demo.gif">

In case you don't want or need a UI, you can also use the CLI to predict an EC number from an arbitrary reaction:
```
theia-cli "rheadb.ec123" "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1"
```
If you want a bit more information than just the predicted EC class, you can also get the top-5 probabilities:
```
theia-cli "rheadb.ec123" "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1" --probs
```
Or, if you want human-readable output, you can make it pretty:
```
theia-cli "rheadb.ec123" "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1" --probs --pretty
```
and you'll get a neat table...
```
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Prediction ┃ Probability [%] ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 2.7.4      │           14.22 │
│ 2.3.2      │           11.03 │
│ 2.3.1      │            7.15 │
│ 2.7.8      │            4.62 │
│ 2.6.1      │            4.05 │
└────────────┴─────────────────┘
```
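If you'd rather drive the CLI from a Python script, a minimal sketch could look like the following. It only assembles the argument list from the model name, the reaction SMILES, and the two flags shown above, then hands it to `subprocess.run`. The helper names `build_theia_command` and `run_theia` are made up for this example and are not part of theia, and the sketch assumes `theia-cli` is on your `PATH`. The output is returned as raw text, since its exact format isn't specified here.

```python
import subprocess


def build_theia_command(model, reaction, probs=False, pretty=False):
    """Assemble the theia-cli argument list (hypothetical helper, not part of theia)."""
    cmd = ["theia-cli", model, reaction]
    if probs:
        cmd.append("--probs")
    if pretty:
        cmd.append("--pretty")
    return cmd


def run_theia(model, reaction, probs=False, pretty=False):
    """Run theia-cli and return its stdout as text; assumes theia-cli is on PATH."""
    cmd = build_theia_command(model, reaction, probs=probs, pretty=pretty)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Example (requires theia-cli to be installed):
# print(run_theia("rheadb.ec123", "S=C=NCC1=CC=CC=C1>>N#CSCC1=CC=CC=C1", probs=True, pretty=True))
```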
Of course, there are more models than `rheadb.ec123`, which we used in the previous examples. Here's a complete list of all the included models:

| Model         | Trained on  | Name            |
|---------------|-------------|-----------------|
| Rhea ECX      | Rhea        | `rheadb.ec1`    |
| Rhea ECXY     | Rhea        | `rheadb.ec12`   |
| Rhea ECXYZ    | Rhea        | `rheadb.ec123`  |
| ECREACT ECX   | ECREACT 1.0 | `ecreact.ec1`   |
| ECREACT ECXY  | ECREACT 1.0 | `ecreact.ec12`  |
| ECREACT ECXYZ | ECREACT 1.0 | `ecreact.ec123` |

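The `ec1`, `ec12`, and `ec123` suffixes presumably correspond to how many levels of the EC number a model predicts (X, X.Y, or X.Y.Z), which matches the three-level predictions in the table output above. Under that assumption, the following sketch splits a model name into its data set and depth and truncates a full EC number accordingly. `parse_model_name` and `truncate_ec` are illustrative helpers, not part of theia's API.

```python
# Model names copied from the table above.
MODELS = [
    "rheadb.ec1", "rheadb.ec12", "rheadb.ec123",
    "ecreact.ec1", "ecreact.ec12", "ecreact.ec123",
]


def parse_model_name(name):
    """Split e.g. 'rheadb.ec123' into ('rheadb', 3).

    Assumes the digits after 'ec' list the predicted EC levels (an
    interpretation of the naming scheme, not documented behaviour).
    """
    if name not in MODELS:
        raise ValueError(f"unknown model: {name!r}")
    dataset, suffix = name.split(".")
    return dataset, len(suffix) - len("ec")


def truncate_ec(ec_number, depth):
    """Keep the first `depth` levels of an EC number such as '2.7.4.6'."""
    return ".".join(ec_number.split(".")[:depth])


dataset, depth = parse_model_name("rheadb.ec123")
print(dataset, depth, truncate_ec("2.7.4.6", depth))  # rheadb 3 2.7.4
```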
<!-- ## Colab
You can also give the API a spin in this Google colab notebook. -->

## Reproduction / Custom Models
To get started, install the reproduction requirements with:
```
pip install -r reproduction_requirements.txt
```
The training, validation, and test sets used in the manuscript can be recreated using the following commands (of course, you can plug in your own data sets here to get a custom model):
```
mkdir -p experiments/data
python scripts/encode_split_data.py data/rheadb.csv.gz experiments/data/rheadb
python scripts/encode_split_data.py data/ecreact-nofilter-1.0.csv.gz experiments/data/ecreact
```
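To plug in your own data, the same script is called once per data set with an input file and an output prefix. The sketch below just assembles those calls for a list of data sets. `my_reactions` is a hypothetical name, and the sketch assumes your file follows the same format as the bundled CSVs, which is not described here.

```python
from pathlib import PurePosixPath


def encode_split_commands(datasets, data_dir="data", out_dir="experiments/data"):
    """Build one encode_split_data.py call per (input file, output name) pair."""
    commands = []
    for filename, name in datasets:
        commands.append([
            "python", "scripts/encode_split_data.py",
            str(PurePosixPath(data_dir) / filename),
            str(PurePosixPath(out_dir) / name),
        ])
    return commands


datasets = [
    ("rheadb.csv.gz", "rheadb"),
    ("ecreact-nofilter-1.0.csv.gz", "ecreact"),
    ("my_reactions.csv.gz", "my_reactions"),  # hypothetical custom data set
]

for cmd in encode_split_commands(datasets):
    print(" ".join(cmd))
    # To actually run a command: subprocess.run(cmd, check=True)
```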
The training of the models can be started with:
```
chmod +x train_all.sh
./train_all.sh
```
If you want to train the 6 additional models for cross-validation, you can run the following:
```
chmod +x train_all_cross.sh
./train_all_cross.sh
```
Finally, to reproduce the figures, you first have to run some additional data crunching scripts:
```
python scripts/class_counts.py data/ecreact-nofilter-1.0.csv.gz experiments/data/ecreact_counts.csv
python scripts/class_counts.py data/rheadb.csv.gz experiments/data/rheadb_counts.csv
```
Then it's time to draw:
```
cd figures
chmod +x generate_figures.sh
./generate_figures.sh
```
fin.
