5 Minutes of Data Science - week 2

Highlights from January 09 to January 15


Hi everyone!

It seems that GitHub and Reddit links are the most clicked in this newsletter, so I moved them to the top for quick access.

If you have any feedback, or feeds that you’d like included, ping me on Mastodon. See you next week!

Reddit’s top posts

  • A millennial founder who sold her company to JP Morgan for $175 million allegedly paid a [DATA SCIENCE] college professor $18K to fabricate 4 million accounts. Their email exchange is a doozy, at r/Data Science (💬69)
  • The true reason I chose to be a DS.., at r/Data Science (💬16)
  • I wrote up a guide showing how to do Data Science with ChatGPT., at r/Data Science (💬97)
  • I built an app that allows you to build Image Classifiers completely on your phone. Collect data, Train models, and Preview the predictions in realtime. You can also export the model/dataset to be used anywhere else. Would love some feedback., at r/Machine Learning (💬80)
  • Class-action lawsuit filed against Stability AI, DeviantArt, and Midjourney for using the text-to-image AI Stable Diffusion, at r/Machine Learning (💬710)
  • Microsoft ChatGPT investment isn’t about Bing but about Cortana, at r/Machine Learning (💬171)
  • Unsure why academic has used one one tail critical instead of the two tail., at r/Ask Statistics (💬16)
  • Unemployment duration linear probability, probit or exponential, at r/Ask Statistics (💬4)
  • Poisson regression - how to account for proportionality, at r/Ask Statistics (💬7)
  • What happened in AI research in 2022 - My curated list of AI breakthroughs with a video explanation, article, and code for each paper, at r/Latest in ML (💬0)

GitHub repositories

  • nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.
  • data-engineering-zoomcamp: Free Data Engineering course!
  • VToonify: [SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
  • ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
  • nn-zero-to-hero: Neural Networks: Zero to Hero
  • amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
  • introtodeeplearning: Lab Materials for MIT 6.S191: Introduction to Deep Learning
  • chatgpt-clone: Build Your own ChatGPT with OpenAI API and Streamlit
  • PythonDataScienceHandbook: Python Data Science Handbook: full text in Jupyter Notebooks
  • micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
  • CLIP: Contrastive Language-Image Pretraining
  • pydata-book: Materials and IPython notebooks for “Python for Data Analysis” by Wes McKinney, published by O’Reilly Media
  • Dash-by-Plotly: Interactive data analytics
  • shap: A game theoretic approach to explain the output of any machine learning model.
  • tutorials: MONAI Tutorials
  • aima-python: Python implementation of algorithms from Russell And Norvig’s “Artificial Intelligence - A Modern Approach”
  • pycaret: An open-source, low-code machine learning library in Python
  • YOLOv6: YOLOv6: a single-stage object detection framework dedicated to industrial applications.
  • lora: Using Low-rank adaptation to quickly fine-tune diffusion models.
  • gpt_index: An index created by GPT to organize external information and answer queries!
  • openai-cookbook: Examples and guides for using the OpenAI API
  • lama-cleaner: Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
  • minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
  • tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
  • unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
  • diagrams: 🎨 Diagram as Code for prototyping cloud system architectures
  • CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
  • GLM-130B: GLM-130B: An Open Bilingual Pre-Trained Model
  • django: The Web framework for perfectionists with deadlines.
  • pytorch-image-models: PyTorch image models, scripts, pretrained weights – ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
  • streamlit: Streamlit — The fastest way to build data apps in Python
  • chatgpt-wrapper: API for interacting with ChatGPT using Python and from Shell.
  • TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
  • OpenBBTerminal: Investment Research for Everyone, Anywhere.
  • langchain: ⚡ Building applications with LLMs through composability ⚡


  • Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk, by OpenAI
  • Teaching speech recognizers new words — without retraining, by Amazon Science
  • Amazon and Tennessee State University announce collaboration, by Amazon Science
  • Better differential privacy for end-to-end speech recognition, by Amazon Science


  • A novel method to generate reliable data with Parallel Domain CEO Kevin McNamara (Ep. 214), by Data Science At Home
  • 4 out of 5 Data Scientists Agree, by Data Skeptic
  • ChatGPT goes prime time!, by Practical AI
  • #121 — ChatGPT and How Generative AI is Augmenting Workflows, by DataFramed
  • Doing Software Engineering in Academia - Johanna Bayer, by Data Talks


  • The AI Buzz, Episode #1: ChatGPT, Transformers and Attention, by StatQuest