A Collection of Machine Learning Tools
Best of ml: https://github.com/ml-tooling/best-of-ml-python
EDA made easy: Pandas-Profiling, Sweetviz, AutoViz, D-Tale
Classification metrics
- Confusion Matrix
- ROC AUC
- Gini Coefficient
- Gain and Lift Charts
- KS Chart (Kolmogorov-Smirnov)
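As a rough illustration of how the metrics above relate to each other, here is a minimal pure-Python sketch for binary labels encoded as 0 and 1. The function names are hypothetical and are not taken from any library in this list. The Gini coefficient is computed as 2 * AUC - 1, and lift is shown for a single cut-off rather than as a full chart. For real work, a tested library implementation is the safer choice.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tn, fp, fn, tp) for binary labels encoded as 0 and 1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def _split_scores(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Separate the scores of positive and negative examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2), fine for a demo.
    """
    pos, neg = _split_scores(y_true, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient derived from AUC."""
    return 2.0 * roc_auc(y_true, scores) - 1.0


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the score CDFs of positives and negatives."""
    pos, neg = _split_scores(y_true, scores)
    best = 0.0
    for thr in sorted(set(scores)):
        cdf_pos = sum(s <= thr for s in pos) / len(pos)
        cdf_neg = sum(s <= thr for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def lift_at(y_true: Sequence[int], scores: Sequence[float], fraction: float) -> float:
    """Positive rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Toy data, made up for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # 0.5 is an arbitrary threshold

print(confusion_matrix(y_true, y_pred))
print(roc_auc(y_true, scores), gini(y_true, scores))
print(ks_statistic(y_true, scores), lift_at(y_true, scores, 0.25))
```

The confusion matrix depends on the chosen threshold, while AUC, Gini and KS only depend on how the scores rank the examples.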
Regression metrics
- MSE, RMSE
- MAE
- MAD
- RAE (Relative Absolute Error)
- RSE (Relative Squared Error)
- R-squared and Adj R-squared
- Analysis of Residuals
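The regression metrics above can all be derived from the same residuals, which the following pure-Python sketch tries to make explicit. The function name `regression_metrics` and the `n_features` parameter are hypothetical. "MAD" has several meanings; it is read here as the median absolute error, which is an assumption, since the list does not say which one is meant.

```python
import math
from statistics import mean, median
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], n_features: int
) -> Dict[str, float]:
    """Compute common regression metrics from residuals.

    n_features is the number of predictors used by the model; it is only
    needed for adjusted R-squared and must satisfy n - n_features - 1 > 0.
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)

    sse = sum(r * r for r in residuals)            # sum of squared errors
    sae = sum(abs(r) for r in residuals)           # sum of absolute errors
    sst = sum((t - y_bar) ** 2 for t in y_true)    # squared error of the mean baseline
    sat = sum(abs(t - y_bar) for t in y_true)      # absolute error of the mean baseline

    mse = sse / n
    r2 = 1.0 - sse / sst
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sae / n,
        # Assumption: MAD interpreted as the median absolute error.
        "mad": median([abs(r) for r in residuals]),
        "rae": sae / sat,   # relative to always predicting the mean
        "rse": sse / sst,   # equals 1 - R-squared
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1),
    }


# Toy data, made up for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
for name, value in regression_metrics(y_true, y_pred, n_features=1).items():
    print(f"{name}: {value:.4f}")
```

The `residuals` list is also the starting point for residual analysis, for example plotting it against the predictions to look for structure the model missed.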
Competition
- Feature Selection: eli5, LOFO
- Data loading: Faker, TensorFlow Datasets, datasets, pdfminer.six
- Imbalanced data: imblearn
- Parameter Optimization: Optuna, Keras Tuner, skopt(scikit-optimize), Hyperopt
- AutoML: H2O, NNI (Neural Network Intelligence), auto-sklearn, AutoKeras, TPOT
- Algo: LightGBM, XGBoost, CatBoost, LazyPredict
- More effective pandas: Vaex
- Model Interpretability: LIME, SHAP, interpret, alibi
- Missing value imputer: sklearn.impute.IterativeImputer
- Training (Workflow & Experiment Tracking): TensorBoard, MLflow, TensorWatch, Data Version Control (DVC), Metaflow
NLP
- NLP: Kashgari, FastNLP, TextBlob
- QA: NeuralQ
- NER: NeuroNER
- Neural Relation Extraction(NRE): OpenNRE
- Label: doccano
- Seq2Seq: fairseq
CV
- Image: Pillow, torchvision, scikit-image
- Image model: PyTorch Image Models, GluonCV
- Image label: labelImg
- Object detection: detectron2, mmdetection
- Face Recognition: face_recognition, facenet_pytorch
- Segmentation: segmentation_models
- Image data augmentation: imgaug, Albumentations, Augmentor
- Finding duplicate image: imagededup
- Explainable : cv2.saliency
- Faster image loading : libjpeg-turbo, PyVips
Time Series
- Time Series Feature extraction: tsfresh
- TS smoothing and outlier detection: tsmoothie
- Time Series forecasting: Prophet and NeuralProphet, sktime, pytorch-forecasting, pmdarima
- Better datetime: python-dateutil
- Generalized framework: Kats
- Markov model: Deeptime
OCR
- OCR: Tesseract, EasyOCR, PaddleOCR
Medical
- Medical: MNE-Python, Nilearn, Lifelines
Recommendation System
- Building and analysis: recommenders, torchrec, TensorFlow Recommenders, pyspark.mllib.recommendation, Surprise
- Collaborative Filtering : Implicit
- Factorization Machine : lightFM
Face detection
- Facenet, OpenFace, VGG-Face, DeepFace, Dlib
Visualization
- Data visualization: Plotly, Bokeh, Holoviews, Datashader, pydantic, schema
- Plotting architecture: PlotNeuralNet
- High Dimensional Data : UMAP
Production
- Training pipeline: Kubeflow, Airflow, Prefect, Metaflow, TensorFlow Extended (TFX)
- Data Versioning: DVC
- Data Validation: TensorFlow Data Validation (TFDV), datatest
- Distributed: Ray, PySpark, DeepSpeed
- Model Monitoring: Seldon, MLWatcher
- Model registry: MLflow
- Explainable : SHAP
- Experiment Monitoring : TensorBoard, Weights & Biases
- Measurement of model time: PyTorch Profiler, TensorFlow Profiler
- Code Review: ReviewNB
- API: Flask, FastAPI
- Large scale: PySpark, TensorFlowOnSpark, Horovod, BigDL
- Python SQL: BlazingSQL, dask-sql
- CI/CD: GoCD, AutoRABIT
- C++: Dlib, mlpack
- Label : labelstudio
- Model testing: checklist (NLP bias)
Low-code
- PyCaret
Probabilistic Programming
- PyMC
- Pyro