5 Minutes of Data Science - week 4

Highlights from January 16 to January 22

Foreword

Have a great week, everyone!

Newsletters

  • Last Week in AI #203: ChatGPT’s new rival, generative AI continues to make waves, new concerns about Deepfakes and more!, by Last Week in AI
  • How to ARCHITECT a search engine like Google Search, by The AI Edge
  • Revolutionizing Data Science: The Latest Trends in Automation, Experimentation, and Language Model Evaluation, by Gradient Flow

Podcasts

  • Do Results Generalize for Privacy and Security Surveys, by Data Skeptic
  • Machine learning at small organizations, by Practical AI
  • AI Trends 2023: Reinforcement Learning - RLHF, Robotic Pre-Training, and Offline RL with Sergey Levine - #612, by The TWIML AI Podcast
  • Indie Hacking - Pauline Clavelloux, by Data Talks

Reddit’s top posts

  • data-engineering-zoomcamp: Free Data Engineering course!
  • AI-For-Beginners: 12 Weeks, 24 Lessons, AI for All!
  • nn-zero-to-hero: Neural Networks: Zero to Hero
  • diff-svc: Singing Voice Conversion via diffusion model
  • micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
  • VToonify: [SIGGRAPH Asia 2022] Controllable High-Resolution Portrait Video Style Transfer
  • YOLOv6: a single-stage object detection framework dedicated to industrial applications.
  • whisper: Robust Speech Recognition via Large-Scale Weak Supervision
  • ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
  • FinRL: Financial Reinforcement Learning. 🔥
  • PythonDataScienceHandbook: Python Data Science Handbook: full text in Jupyter Notebooks
  • chatgpt-comparison-detection: Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
  • mlbookcamp-code: The code from the Machine Learning Bookcamp book and a free course based on the book
  • amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
  • mlops-zoomcamp: Free MLOps course from DataTalks.Club
  • nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.
  • balance: The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to some target population of interest.
  • blog: Public repo for HF blog posts
  • DeepLearningSystem: Deep Learning System core principles introduction.
  • ColabFold: Making Protein folding accessible to all!
  • langchain: ⚡ Building applications with LLMs through composability ⚡
  • gpt_index: An index created by GPT to organize external information and answer queries!
  • CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
  • yolov8_tracking: Real-time multi-object tracking and segmentation using YOLOv8
  • Gymnasium: A standard API for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
  • mypy: Optional static typing for Python
  • zulip: Zulip server and web application. Open-source team chat that helps teams stay productive and focused.
  • mage-ai: 🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
  • dvc: 🦉 Data Version Control | Git for Data & Models | ML Experiments Management
  • ultralytics: YOLOv8 🚀 in PyTorch > ONNX > CoreML > TFLite
  • SHARK: High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters
  • mmyolo: OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
  • tiktoken: (no description provided)
  • azure-cli: Azure Command-Line Interface
  • CodeGen: CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
  • black: The uncompromising Python code formatter

Blogs

  • Using large language models (LLMs) to synthesize training data, by Amazon Science
  • Domain data trumps teacher knowledge for distilling NLU models, by Amazon Science