5 Minutes of Data Science - week 4
Highlights from January 16 to January 22
Have a great week, everyone!
- Last Week in AI #203: ChatGPT’s new rival, generative AI continues to make waves, new concerns about Deepfakes and more!, by Last Week in AI
- How to ARCHITECT a search engine like Google Search, by The AI Edge
- Revolutionizing Data Science: The Latest Trends in Automation, Experimentation, and Language Model Evaluation, by Gradient Flow
- Do Results Generalize for Privacy and Security Surveys, by Data Skeptic
- Machine learning at small organizations, by Practical AI
- AI Trends 2023: Reinforcement Learning - RLHF, Robotic Pre-Training, and Offline RL with Sergey Levine - #612, by The TWIML AI
- Indie Hacking - Pauline Clavelloux, by Data Talks
Reddit’s top posts
- 300,000+ Tech jobs have been vanished in the last 12 months. (Sad but true fact), at r/Data Science (💬186)
- Thoughts?, at r/Data Science (💬85)
- Another One, at r/Data Science (💬102)
- OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic, at r/Machine Learning (💬253)
- [ICLR'2023 Spotlight🌟]: The first BERT-style pretraining on CNNs!, at r/Machine Learning (💬39)
- Getty Images is suing the creators of AI art tool Stable Diffusion for scraping its content, at r/Machine Learning (💬270)
- How do you lie about numbers?, at r/Ask Statistics (💬27)
- What to do if nothing in class makes sense?, at r/Ask Statistics (💬15)
- Confidence intervals question, at r/Ask Statistics (💬2)
- This AI can clone your voice! VALL-E (explained), at r/Latest in ML (💬0)
GitHub Jupyter Notebook trends
- data-engineering-zoomcamp: Free Data Engineering course!
- AI-For-Beginners: 12 Weeks, 24 Lessons, AI for All!
- nn-zero-to-hero: Neural Networks: Zero to Hero
- diff-svc: Singing Voice Conversion via diffusion model
- micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
- VToonify: [SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
- YOLOv6: YOLOv6: a single-stage object detection framework dedicated to industrial applications.
- whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
- FinRL: FinRL: Financial Reinforcement Learning.🔥
- PythonDataScienceHandbook: Python Data Science Handbook: full text in Jupyter Notebooks
- chatgpt-comparison-detection: Human ChatGPT Comparison Corpus (HC3), Detectors, and more!🔥
- mlbookcamp-code: The code from the Machine Learning Bookcamp book and a free course based on the book
- amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
- mlops-zoomcamp: Free MLOps course from DataTalks.Club
- nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.
- balance: The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to some target population of interest.
- blog: Public repo for HF blog posts
- DeepLearningSystem: Deep Learning System core principles introduction.
- ColabFold: Making Protein folding accessible to all!
GitHub Python trends
- langchain: ⚡Building applications with LLMs through composability⚡
- gpt_index: An index created by GPT to organize external information and answer queries!
- CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
- yolov8_tracking: Real-time multi-object tracking and segmentation using YOLOv8
- Gymnasium: A standard API for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
- mypy: Optional static typing for Python
- zulip: Zulip server and web application. Open-source team chat that helps teams stay productive and focused.
- mage-ai: 🧙The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
- dvc: 🦉Data Version Control | Git for Data & Models | ML Experiments Management
- ultralytics: YOLOv8 🚀 in PyTorch > ONNX > CoreML > TFLite
- SHARK: SHARK - High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters
- mmyolo: OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
- tiktoken
- azure-cli: Azure Command-Line Interface
- CodeGen: CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
- black: The uncompromising Python code formatter
- Using large language models (LLMs) to synthesize training data, by Amazon Science
- Domain data trumps teacher knowledge for distilling NLU models, by Amazon Science