5 Minutes of Data Science - week 2

Highlights from January 09 to January 15


Reddit’s top posts

  • A millennial founder who sold her company to JP Morgan for $175 million allegedly paid a [DATA SCIENCE] college professor $18K to fabricate 4 million accounts. Their email exchange is a doozy, at r/Data Science (💬69)
  • The true reason I chose to be a DS.., at r/Data Science (💬16)
  • I wrote up a guide showing how to do Data Science with ChatGPT, at r/Data Science (💬97)
  • I built an app that allows you to build Image Classifiers completely on your phone. Collect data, Train models, and Preview the predictions in real time. You can also export the model/dataset to be used anywhere else. Would love some feedback, at r/Machine Learning (💬80)
  • Class-action lawsuit filed against Stability AI, DeviantArt, and Midjourney for using the text-to-image AI Stable Diffusion, at r/Machine Learning (💬710)
  • Microsoft ChatGPT investment isn’t about Bing but about Cortana, at r/Machine Learning (💬171)
  • Unsure why academic has used one tail critical instead of the two tail, at r/Ask Statistics (💬16)
  • Unemployment duration linear probability, probit or exponential, at r/Ask Statistics (💬4)
  • Poisson regression - how to account for proportionality, at r/Ask Statistics (💬7)
  • What happened in AI research in 2022 - My curated list of AI breakthroughs with a video explanation, article, and code for each paper, at r/Latest in ML (💬0)


GitHub repositories

  • nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.
  • data-engineering-zoomcamp: Free Data Engineering course!
  • VToonify: [SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
  • ML-For-Beginners: 12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
  • nn-zero-to-hero: Neural Networks: Zero to Hero
  • amazon-sagemaker-examples: Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
  • introtodeeplearning: Lab Materials for MIT 6.S191: Introduction to Deep Learning
  • chatgpt-clone: Build Your own ChatGPT with OpenAI API and Streamlit
  • PythonDataScienceHandbook: Python Data Science Handbook: full text in Jupyter Notebooks
  • micrograd: A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
  • CLIP: Contrastive Language-Image Pretraining
  • pydata-book: Materials and IPython notebooks for “Python for Data Analysis” by Wes McKinney, published by O’Reilly Media
  • Dash-by-Plotly: Interactive data analytics
  • shap: A game theoretic approach to explain the output of any machine learning model.
  • tutorials: MONAI Tutorials
  • aima-python: Python implementation of algorithms from Russell and Norvig’s “Artificial Intelligence - A Modern Approach”
  • pycaret: An open-source, low-code machine learning library in Python
  • YOLOv6: YOLOv6: a single-stage object detection framework dedicated to industrial applications.
  • lora: Using Low-rank adaptation to quickly fine-tune diffusion models.
  • gpt_index: An index created by GPT to organize external information and answer queries!
  • openai-cookbook: Examples and guides for using the OpenAI API
  • lama-cleaner: Image inpainting tool powered by a SOTA AI model. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
  • minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
  • tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
  • unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
  • diagrams: 🎨 Diagram as Code for prototyping cloud system architectures
  • CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
  • GLM-130B: GLM-130B: An Open Bilingual Pre-Trained Model
  • django: The Web framework for perfectionists with deadlines.
  • pytorch-image-models: PyTorch image models, scripts, pretrained weights – ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
  • streamlit: Streamlit — The fastest way to build data apps in Python
  • chatgpt-wrapper: API for interacting with ChatGPT using Python and from Shell.
  • TTS: 🐸💬- a deep learning toolkit for Text-to-Speech, battle-tested in research and production
  • OpenBBTerminal: Investment Research for Everyone, Anywhere.
  • langchain: ⚡ Building applications with LLMs through composability ⚡


Blog posts

  • Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk, by OpenAI
  • Teaching speech recognizers new words — without retraining, by Amazon Science
  • Amazon and Tennessee State University announce collaboration, by Amazon Science
  • Better differential privacy for end-to-end speech recognition, by Amazon Science


Podcasts

  • A novel method to generate reliable data with Parallel Domain CEO Kevin McNamara (Ep. 214), by Data Science At Home
  • 4 out of 5 Data Scientists Agree, by Data Skeptic
  • ChatGPT goes prime time!, by Practical AI
  • #121 — ChatGPT and How Generative AI is Augmenting Workflows, by DataFramed
  • Doing Software Engineering in Academia - Johanna Bayer, by Data Talks


Videos

  • The AI Buzz, Episode #1: ChatGPT, Transformers and Attention, by StatQuest