A Collection of Machine Learning Tools

Best of ml: https://github.com/ml-tooling/best-of-ml-python

EDA made easy: Pandas-Profiling, Sweetviz, AutoViz, D-Tale

Classification metrics

Confusion Matrix
ROC AUC
Gini Coefficient
Gain and Lift Charts
KS Chart (Kolmogorov-Smirnov)
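
As a rough illustration of how the metrics above relate to one another, here is a minimal pure-Python sketch. The function names (`confusion_counts`, `roc_auc`, `gini`, `ks_statistic`, `lift_at`) are hypothetical helpers written for this example, not the API of any library listed on this page. It assumes binary labels in {0, 1}, and it takes the Gini coefficient to be the common 2 * AUC - 1 definition.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def _split_scores(y_true, scores):
    """Split scores into those of positives and those of negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    return pos, neg


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This is O(n_pos * n_neg), fine for small data.
    """
    pos, neg = _split_scores(y_true, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """Gini coefficient, assuming the definition Gini = 2 * AUC - 1."""
    return 2.0 * roc_auc(y_true, scores) - 1.0


def ks_statistic(y_true, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = _split_scores(y_true, scores)
    best = 0.0
    for thr in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= thr) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= thr) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def lift_at(y_true, scores, fraction):
    """Positive rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(t for _, t in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold 0.5 is arbitrary

print(confusion_counts(y_true, y_pred))
print(roc_auc(y_true, scores), gini(y_true, scores))
print(ks_statistic(y_true, scores), lift_at(y_true, scores, 0.25))
```

For real work, a library implementation such as the ones in scikit-learn is likely a better choice; the sketch only shows what each number measures.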

Regression metrics

MSE, RMSE
MAE
MAD
RAE (Relative Absolute Error)
RSE (Relative Squared Error)
R-squared and Adj R-squared
Analysis of Residuals
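
The regression metrics above can be sketched in a few lines of plain Python. `regression_metrics` is a hypothetical helper written for this example, and `n_features` (the number of predictors used for adjusted R-squared) is an assumed parameter. RAE and RSE are computed relative to a baseline that always predicts the mean of the targets. MAD is left out because the abbreviation is ambiguous (mean vs. median absolute deviation).

```python
import math


def regression_metrics(y_true, y_pred, n_features=1):
    """Compute common regression metrics and return them in a dict.

    Assumes y_true is not constant and len(y_true) > n_features + 1.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    sse = sum(e * e for e in residuals)              # sum of squared errors
    sae = sum(abs(e) for e in residuals)             # sum of absolute errors
    sst = sum((t - mean_y) ** 2 for t in y_true)     # squared error of mean baseline
    sat = sum(abs(t - mean_y) for t in y_true)       # absolute error of mean baseline

    mse = sse / n
    rse = sse / sst
    r2 = 1.0 - rse
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sae / n,
        "rae": sae / sat,
        "rse": rse,
        "r2": r2,
        "adj_r2": adj_r2,
        "residuals": residuals,  # inspect these for the analysis of residuals
    }


metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
for name, value in metrics.items():
    print(name, value)
```

Note that R-squared is just 1 - RSE under these definitions, which is why the two are computed together.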

Competition

Feature Selection: eli5, lofo
Data loading: Faker, Tensorflow Datasets, datasets, Pdfminer.six
Imbalanced data: imblearn
Parameter Optimization: Optuna, Keras Tuner, skopt(scikit-optimize), Hyperopt
AutoML: H2O, NNI (Neural Network Intelligence), auto-sklearn, AutoKeras, TPOT
Algo: LightGBM, XGBoost, CatBoost, Lazypredict
More effective pandas: Vaex
Model Interpretability: LIME, SHAP, interpret, alibi
Missing value imputer: sklearn.impute.IterativeImputer
Training (Workflow & Experiment Tracking): TensorBoard, MLflow, TensorWatch, Data Version Control (DVC), Metaflow

NLP

NLP: Kashgari, FastNLP, TextBlob
QA: NeuralQA
NER: NeuroNER
Neural Relation Extraction(NRE): OpenNRE
Label: doccano
Seq2Seq: fairseq

CV

Image: Pillow, torchvision, scikit-image
Image model: PyTorch Image Models, GluonCV
Image label: labelImg
Object detection: Detectron2, MMDetection
Face Recognition: face_recognition, facenet_pytorch
Segmentation: segmentation_models
Image data augmentation: imgaug, Albumentations, Augmentor
Finding duplicate images: imagededup
Explainable : cv2.saliency
Faster image loading : libjpeg-turbo, PyVips

Time Series

Time Series Feature extraction: tsfresh
Time series smoothing and outlier detection: tsmoothie
Time Series forecasting: Prophet and NeuralProphet, sktime, pytorch-forecasting, pmdarima
Better datetime: python-dateutil
generalized framework : Kats
Markov model: deeptime

OCR

OCR: Tesseract, EasyOCR, PaddleOCR

Medical

Medical: MNE-Python, Nilearn, Lifelines

Recommendation System

Building and analysis : recommenders, torchrec, TensorFlow Recommenders, pyspark.mllib.recommendation, surprise
Collaborative Filtering : Implicit
Factorization Machine : LightFM

Face detection

Facenet, OpenFace, VGG-Face, DeepFace, Dlib

Visualization

Data visualization: Plotly, Bokeh, HoloViews, Datashader
Data validation: pydantic, schema
Plotting architecture: PlotNeuralNet
High Dimensional Data : UMAP

Production

Training pipeline: Kubeflow, Airflow, Prefect, Metaflow, Tensorflow Extended
Data Versioning: DVC
Data Validation: TensorFlow Data Validation (TFDV), datatest
Distributed: Ray, PySpark, DeepSpeed
Model Monitoring: Seldon, MLWatcher
Model registry : MLFlow
Explainable : SHAP
Experiment Monitoring : TensorBoard, Weights & Biases
Measurement of Model time : PyTorch Profiler, Tensorflow Profiler
Code Review: ReviewNB
API: Flask, FastAPI
Large scale: Pyspark, TensorFlowOnSpark, Horovod, BigDL
Python SQL: BlazingSQL, dask-sql
CI/CD: GoCD, AutoRABIT
C++: Dlib, mlpack
Label : labelstudio
Model testing: checklist (NLP bias)

low-code

pycaret

Probabilistic Programming

PyMC
pyro
