5 Minutes of Data Science - week 3
Highlights from January 16 to January 22
Foreword
There were a couple of big stories last week. First, Google released the free Deep Learning Tuning Playbook. Second, a how-to on building GPT-2 from scratch was published.
Come say hi on Mastodon. See you next week!
Reddit’s top posts
- 300,000+ Tech jobs have been vanished in the last 12 months. (Sad but true fact), at r/datascience (💬186)
- Thoughts?, at r/datascience (💬81)
- I asked ChatGPT to explain ROC AUC, the level of collaboration is beyond my expectation, at r/datascience (💬81)
- OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic, at r/MachineLearning (💬246)
- Getty Images is suing the creators of AI art tool Stable Diffusion for scraping its content, at r/MachineLearning (💬270)
- [ICLR'2023 Spotlight🌟]: The first BERT-style pretraining on CNNs!, at r/MachineLearning (💬20)
- How do you lie about numbers?, at r/AskStatistics (💬27)
- [QUESTION] GLM, at r/AskStatistics (💬3)
- What to do if nothing in class makes sense?, at r/AskStatistics (💬15)
- This AI can clone your voice! VALL-E (explained), at r/LatestInML (💬0)
GitHub Jupyter notebook trends
- AI-For-Beginners: 12 Weeks, 24 Lessons, AI for All!
- nn-zero-to-hero: Neural Networks: Zero to Hero
- diff-svc: Singing Voice Conversion via diffusion model
- micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
- VToonify: [SIGGRAPH Asia 2022] Controllable High-Resolution Portrait Video Style Transfer
- YOLOv6: a single-stage object detection framework dedicated to industrial applications.
- whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
- FinRL: Financial Reinforcement Learning 🔥
- PythonDataScienceHandbook: Python Data Science Handbook: full text in Jupyter Notebooks
- chatgpt-comparison-detection: Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
- mlbookcamp-code: The code from the Machine Learning Bookcamp book and a free course based on the book
- amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
- mlops-zoomcamp: Free MLOps course from DataTalks.Club
- nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.
- balance: The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to some target population of interest.
- blog: Public repo for HF blog posts
- DeepLearningSystem: Deep Learning System core principles introduction.
- ColabFold: Making Protein folding accessible to all!
- Python-and-Machine-Learning: 6th Feb 2021
GitHub Python trends
- langchain: ⚡ Building applications with LLMs through composability ⚡
- gpt_index: An index created by GPT to organize external information and answer queries!
- CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
- Gymnasium: A standard API for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
- mage-ai: 🧙 The modern replacement for Airflow.
- dvc: 🦉 Data Version Control | Git for Data & Models | ML Experiments Management
- ultralytics: YOLOv8 🚀 in PyTorch > ONNX > CoreML > TFLite
- SHARK: High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters
- mmyolo: OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
- tiktoken: A fast BPE tokeniser for use with OpenAI's models
- azure-cli: Azure Command-Line Interface
- CodeGen: CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
- black: The uncompromising Python code formatter
Newsletters
- The Impact and Future of ChatGPT, by Last Week in AI
- Everything you need to know about Stable Diffusion, by The AI Edge
Podcasts
- Do Results Generalize for Privacy and Security Surveys, by Data Skeptic
- Machine learning at small organizations, by Practical AI
- Indie Hacking - Pauline Clavelloux, by Data Talks
Blogs
- Using large language models (LLMs) to synthesize training data, by Amazon Science
- Domain data trumps teacher knowledge for distilling NLU models, by Amazon Science