<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Pedro Madruga</title>
    <link>http://pedromadruga.com/blog/</link>
    <description>Recent content on Pedro Madruga</description>
    <generator>Hugo -- 0.147.8</generator>
    <language>en-us</language>
    <atom:link href="http://pedromadruga.com/blog/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>The art of (AI) science and engineering - an intro</title>
      <link>http://pedromadruga.com/blog/art-of-science-and-engineering/</link>
      <pubDate>Mon, 21 Jul 2025 11:30:03 +0000</pubDate>
      <guid>http://pedromadruga.com/blog/art-of-science-and-engineering/</guid>
      <description>A few thoughts about SIGIR 2025 and how to apply academic research by experimenting in the industry.</description>
      <content:encoded><![CDATA[<blockquote>
<p><strong><em>&ldquo;In science if you know what you are doing you should not be doing it.
In engineering if you do not know what you are doing you should not be doing it.&rdquo;</em></strong></p></blockquote>
<p>― Richard Hamming, The Art of Doing Science and Engineering</p>
<p>Just returned from Padova after attending <a href="https://sigir2025.dei.unipd.it/">SIGIR 2025</a>, the most prestigious conference on Information Retrieval (IR) in the world (A* CORE Ranking). An experience full of learnings - but I was struck by the amount of research that could be applied in the industry already. It&rsquo;s evident that the gap between research and industry is incredibly narrow nowadays.</p>
<p>A bit of context: where LLMs fall short on things like actual and truthful answers (two essential factors for the legal industry), IR capabilities come to the rescue. The rapid pace of AI development (especially in IR) demands a tight integration between research and product development. In other words, experimentation must be embedded into product development, not treated as an afterthought, a nuisance or a nice-to-have.</p>
<p>Only experimentation will determine whether something from academia is applicable, but also if something is hype or not. This means that research and product development need to walk hand-in-hand, between science and engineering, between experimentation and implementation, between uncertainty and the familiar. And companies need to take shots at delivering the best of what AI has to offer to customers. At the end of the day, customers are the ones who matter the most.</p>
<p>And the dance between science and engineering is a very hard one, but far from impossible.</p>
<p>Traditional approaches handle experiments &ldquo;when there&rsquo;s time&rdquo; and some claim &ldquo;it takes too long&rdquo;, not understanding that they&rsquo;re creating AI legacy by doing that. AI development is far more fluid and experimental than traditional development. Unknowns happen way more often and some scramble for safety - which comes in the form of confidence. However, confidence is still being mistaken for competence.</p>
<p>There&rsquo;s also a trade-off here (hence being a dance): too much experimentation and there&rsquo;s a risk of a low rate of productionalization, whereas too much engineering and there&rsquo;s a risk of AI legacy. The companies that release products in tune with this balance between science and engineering will come out on top.</p>
<p>But for now, in my role, I have to find which AI capabilities can benefit our customers. And SIGIR was the perfect place for that. There&rsquo;s a lot to unpack in this topic, which I&rsquo;ll keep writing about. Feel free to follow me <a href="/newsletter">here</a>.</p>
<p>Thanks to Karnov for enabling this learning opportunity at SIGIR 2025 in Padova.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Getting started with Task Groups</title>
      <link>http://pedromadruga.com/blog/airflow-taskgroup/</link>
      <pubDate>Sun, 22 Aug 2021 11:30:03 +0000</pubDate>
      <guid>http://pedromadruga.com/blog/airflow-taskgroup/</guid>
      <description>A simple pipeline with two groups of tasks, using the @task_group decorator of the TaskFlow API from Airflow 2.</description>
      <content:encoded><![CDATA[<h2 id="source-code">Source code</h2>
<p>The complete code is available <a href="https://github.com/pmadruga/airflow-dags/blob/main/taskgroup.py">here</a>.</p>
<h2 id="intro">Intro</h2>
<p>Before Task Groups in Airflow 2.0, SubDAGs were the go-to API to group tasks. With Airflow 2.0, SubDAGs <a href="https://www.astronomer.io/guides/subdags">are being phased out</a> in favor of the Task Group feature. The TaskFlow API is simple and allows for a proper code structure, favoring a clear separation of concerns.</p>
<p>What we&rsquo;re building today is a simple DAG with two groups of tasks, using the <code>@task_group</code> decorator from the TaskFlow API in Airflow 2. The graph view is:</p>
<p><img
    src="https://pedromadruga.com/posts/2021/08/taskgroup.png"
    alt="taskgroup - graphview"
    loading="lazy"
    decoding="async"
  /></p>
<p>This pipeline applies a series of manipulations to a given initial value. The <code>init()</code> task returns the value <code>0</code>. That value is then passed to a group of subtasks (<code>group_1</code>) that manipulate it. <code>group_2</code> aggregates all the resulting values into one. Finally, the <code>end()</code> task prints out the final result.</p>
<p>Let&rsquo;s get started by breaking the pipeline down into parts.</p>
<h2 id="grouping-tasks---breakdown">Grouping tasks - breakdown</h2>
<p>As you can see in the image above, there&rsquo;s an <code>init()</code> and <code>end()</code> task. In between, there are two groups of tasks, but let&rsquo;s start with the first and last task of the pipeline.</p>
<h3 id="init-task-and-end-task"><code>init()</code> task and <code>end()</code> task</h3>
<p><img
    src="https://pedromadruga.com/posts/2021/08/taskgroup_2.png"
    alt="taskgroup - init&#43;end"
    loading="lazy"
    decoding="async"
  /></p>
<p>The <code>init()</code> task is the starting point for this pipeline: it returns the initial value that will be manipulated throughout the pipeline, <code>0</code>. The <code>end()</code> task prints the final result of those manipulations to the console.</p>
<p>Let&rsquo;s look at the code for the <code>init()</code> task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@task</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">init</span>():
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> <span style="color:#ae81ff">0</span>
</span></span></code></pre></div><p>That&rsquo;s it. Now the code for the <code>end()</code> task:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@task</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">end</span>(value):
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#39;this is the end: </span><span style="color:#e6db74">{</span>value<span style="color:#e6db74">}</span><span style="color:#e6db74">&#39;</span>)
</span></span></code></pre></div><p>It&rsquo;s also quite simple to define the flow of the whole pipeline, returned by the function that wraps everything:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">return</span> end(group_2(group_1(init())))
</span></span></code></pre></div><p>Looking at the code above it&rsquo;s possible to see that:</p>
<ol>
<li>The <code>init()</code> function &ldquo;feeds&rdquo; <code>group_1</code>;</li>
<li>The result of <code>group_1</code> is &ldquo;sent&rdquo; to <code>group_2</code>;</li>
<li><code>end()</code> receives the outcome of <code>group_2</code>.</li>
</ol>
<h3 id="task-group-1-group_1">Task Group #1 (<code>group_1</code>)</h3>
<p><img
    src="https://pedromadruga.com/posts/2021/08/taskgroup_group_1.png"
    alt="taskgroup - group_1"
    loading="lazy"
    decoding="async"
  /></p>
<p><code>group_1</code> has a set of three tasks that manipulate the original number:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># This task group has three subtasks:</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># each subtask will perform an operation on the initial value</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># this group will return a list with all the values of the subtasks</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@task_group</span>(group_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;group_1&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">group_1</span>(value):
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># The @tasks below can be defined outside function `group_1`</span>
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># What matters is where they are referenced</span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">@task</span>(task_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;subtask_1&#39;</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_1</span>(value):
</span></span><span style="display:flex;"><span>    task_1_result <span style="color:#f92672">=</span> value <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> task_1_result
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">@task</span>(task_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;subtask_2&#39;</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_2</span>(value):
</span></span><span style="display:flex;"><span>    task_2_result <span style="color:#f92672">=</span> value <span style="color:#f92672">+</span> <span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> task_2_result
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">@task</span>(task_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;subtask_3&#39;</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_3</span>(value):
</span></span><span style="display:flex;"><span>    task_3_result <span style="color:#f92672">=</span> value <span style="color:#f92672">+</span> <span style="color:#ae81ff">3</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> task_3_result
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># tasks are referenced here</span>
</span></span><span style="display:flex;"><span>  task_1_result <span style="color:#f92672">=</span> task_1(value)
</span></span><span style="display:flex;"><span>  task_2_result <span style="color:#f92672">=</span> task_2(value)
</span></span><span style="display:flex;"><span>  task_3_result <span style="color:#f92672">=</span> task_3(value)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># sending this list to `group_2`</span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> [task_1_result, task_2_result, task_3_result]
</span></span></code></pre></div><p>The <code>group_1</code> function receives the result from the <code>init()</code> task.
And notice what&rsquo;s being returned here: a list of three values, one each from <code>subtask_1</code>, <code>subtask_2</code> and <code>subtask_3</code>. This list of values is what&rsquo;s going to be sent to <code>group_2</code>.</p>
<h3 id="task-group-2-group_2">Task Group #2 (<code>group_2</code>)</h3>
<p><img
    src="https://pedromadruga.com/posts/2021/08/taskgroup_group_2.png"
    alt="taskgroup - group_2"
    loading="lazy"
    decoding="async"
  /></p>
<p><code>group_2</code> is rather simple. It receives the list sent from <code>group_1</code> and sums all its values (in <code>subtask_4</code>); <code>subtask_5</code> then multiplies the result of <code>subtask_4</code> by two:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#a6e22e">@task_group</span>(group_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;group_2&#39;</span>)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">group_2</span>(list):
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">@task</span>(task_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;subtask_4&#39;</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_4</span>(values):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> sum(values)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#a6e22e">@task</span>(task_id<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;subtask_5&#39;</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_5</span>(value):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> value<span style="color:#f92672">*</span><span style="color:#ae81ff">2</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># task_4 will sum the values of the list sent by group_1</span>
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># task_5 will multiply it by two.</span>
</span></span><span style="display:flex;"><span>  task_5_result <span style="color:#f92672">=</span> task_5(task_4(list))
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#66d9ef">return</span> task_5_result
</span></span></code></pre></div><p>That&rsquo;s it. The next task is <code>end()</code>, which was covered earlier in this post. Again, it&rsquo;s possible to see the full code <a href="https://github.com/pmadruga/airflow-dags/blob/main/taskgroup.py">here</a>.</p>
<p>If you check the log of the <code>end()</code> task (see my previous post to know how to check for task logs), you&rsquo;ll see the result printed. The final result should be <code>12</code>.</p>
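<p>The expected value can also be checked by hand. The sketch below mirrors the arithmetic of the pipeline in plain Python, without Airflow. The functions are simplified stand-ins for the tasks and groups above (each group is collapsed into a single function), not the DAG code itself:</p>

```python
# Plain-Python stand-ins for the tasks above (no Airflow involved).
def init():
    return 0


def group_1(value):
    # subtask_1, subtask_2 and subtask_3 add 1, 2 and 3 respectively
    return [value + 1, value + 2, value + 3]


def group_2(values):
    # subtask_4 sums the list, subtask_5 doubles the sum
    return sum(values) * 2


result = group_2(group_1(init()))
print(result)  # 12
```

<p>Starting from <code>0</code>, the first group yields <code>[1, 2, 3]</code>, the sum is <code>6</code>, and doubling it gives <code>12</code>.</p>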
<p>Success! 🎉</p>
<h2 id="conclusion">Conclusion</h2>
<p>Creating task groups in Airflow 2 is easy - it removes complexity that existed before and allows creating pipelines with clean code.</p>
]]></content:encoded>
    </item>
    <item>
      <title>A simple DAG using Airflow 2.0</title>
      <link>http://pedromadruga.com/blog/airflow2-simple-dag/</link>
      <pubDate>Thu, 05 Aug 2021 00:00:00 +0000</pubDate>
      <guid>http://pedromadruga.com/blog/airflow2-simple-dag/</guid>
      <description>Airflow 2.x is a game-changer, especially regarding its simplified syntax using the new TaskFlow API. In this tutorial, we&amp;#39;re building a DAG with only two tasks. The DAG&amp;#39;s tasks include generating a random number (task 1) and printing that number (task 2).</description>
      <content:encoded><![CDATA[<h2 id="intro">Intro</h2>
<h3 id="background">Background</h3>
<p>This blog post is part of a series where an <a href="https://pedromadruga.com/posts/etl-series/">entire ETL pipeline</a> is built using Airflow 2.0&rsquo;s newest syntax and Raspberry Pis. It assumes knowledge of some terms, so <a href="https://www.astronomer.io/guides/intro-to-airflow">here&rsquo;s</a> a great place to refresh your memory. Also, check my previous post on how to install <a href="https://pedromadruga.com/posts/airflow-install/">Airflow 2 on a Raspberry Pi</a>.</p>
<p>The full code is on <a href="https://github.com/pmadruga/airflow-dags/blob/main/simplest.py">GitHub</a>.</p>
<h2 id="create-a-dag-definition-file">Create a DAG definition file</h2>
<p>We&rsquo;ll start by creating a DAG definition file inside the <code>airflow/dags</code> folder:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>touch example.py
</span></span></code></pre></div><p>Let&rsquo;s populate it by adding a DAG.</p>
<h2 id="dag-breakdown">DAG breakdown</h2>
<h3 id="the-dag-decorator">The <code>@dag</code> decorator</h3>
<p>A DAG has tasks. In this example, it has two tasks where one is dependent on the result of the other. For this, we&rsquo;ll be using the newest Airflow decorators: <code>@dag</code> and <code>@task</code>.</p>
<p>We start by defining the DAG and its parameters. We&rsquo;ll determine the interval in which the set of tasks should run (<code>schedule_interval</code>) and the start date (<code>start_date</code>). Of course, there are other parameters to choose from, but we&rsquo;ll keep the scope to the minimum here.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Use the DAG decorator from Airflow</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># `schedule_interval=&#39;@daily&#39;` means the DAG will run every day at midnight.</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># It&#39;s possible to set the schedule_interval to None (without quotes).</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@dag</span>(schedule_interval<span style="color:#f92672">=</span><span style="color:#e6db74">&#39;@daily&#39;</span>, start_date<span style="color:#f92672">=</span>days_ago(<span style="color:#ae81ff">2</span>))
</span></span><span style="display:flex;"><span><span style="color:#75715e"># The function name will be the ID of the DAG.</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># In this case, it&#39;s called `EXAMPLE_simple`.</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># In this case it&#39;s called `EXAMPLE_simple`.</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">EXAMPLE_simple</span>():
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Placeholder for the tasks inside the DAG</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">pass</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># call the decorated function to create the DAG</span>
</span></span><span style="display:flex;"><span>dag <span style="color:#f92672">=</span> EXAMPLE_simple()
</span></span></code></pre></div><p>Notice the <code>@dag</code> decorator on top of the function <code>EXAMPLE_simple</code>. The function name will also be the DAG id. Finally, we call the decorated function, which creates the DAG so that Airflow can pick it up.</p>
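<p>To build some intuition for how a decorator can take the ID from the function name, here&rsquo;s a toy sketch in plain Python. Everything in it (<code>toy_dag</code>, <code>REGISTRY</code>) is hypothetical and only illustrates the general pattern; it is not how Airflow implements <code>@dag</code>:</p>

```python
import functools

# Hypothetical registry, for illustration only.
# Airflow keeps track of DAGs in its own way.
REGISTRY = {}


def toy_dag(schedule_interval=None):
    """Toy decorator factory: registers a function under its own name."""
    def decorator(func):
        @functools.wraps(func)
        def factory(*args, **kwargs):
            # The ID is taken from the name of the decorated function.
            dag_id = func.__name__
            REGISTRY[dag_id] = {
                "schedule_interval": schedule_interval,
                "result": func(*args, **kwargs),
            }
            return dag_id
        return factory
    return decorator


@toy_dag(schedule_interval="@daily")
def EXAMPLE_simple():
    return "tasks would be declared here"


# Nothing is registered until the decorated function is called.
assert "EXAMPLE_simple" not in REGISTRY
dag_id = EXAMPLE_simple()
print(dag_id)  # EXAMPLE_simple
```

<p>The takeaway is the same as in the real thing: decorating the function is not enough, it also has to be called.</p>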
<h3 id="the-task-decorator">The <code>@task</code> decorator</h3>
<p>Now that the <code>@dag</code> wrapper is settled, we need to define the two tasks inside. Remember, this DAG has two tasks: <code>task_1</code> generates a random number and <code>task_2</code> receives the result of the first task and prints it, like the following:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># ...</span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">@task</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_1</span>():
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> random()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">@task</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_2</span>(value):
</span></span><span style="display:flex;"><span>        print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#39;The randomly generated number is </span><span style="color:#e6db74">{</span>value<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># This will determine the direction of the tasks.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># As you can see, task_2 runs after task_1 is done.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Task_2 then uses the result from task_1.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> task_2(task_1())
</span></span><span style="display:flex;"><span><span style="color:#75715e"># ...</span>
</span></span></code></pre></div><p>Visually, the DAG graph view will look like this:</p>
<p><img
    src="https://pedromadruga.com/posts/simple_dag_1.png"
    alt="Graph View"
    loading="lazy"
    decoding="async"
  /></p>
<p>The elided code before and after (<code># ...</code>) refers to the <code>@dag</code> decorator and the dependencies (imports). Next, we&rsquo;ll put everything together:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> airflow.decorators <span style="color:#f92672">import</span> dag, task
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> airflow.utils.dates <span style="color:#f92672">import</span> days_ago
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> random <span style="color:#f92672">import</span> random
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Use the DAG decorator from Airflow</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># `schedule_interval=None` means the DAG only runs when it is triggered manually.</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Set it to `&#39;@daily&#39;` instead to run the DAG every day at midnight.</span>
</span></span><span style="display:flex;"><span><span style="color:#a6e22e">@dag</span>(schedule_interval<span style="color:#f92672">=</span><span style="color:#66d9ef">None</span>, start_date<span style="color:#f92672">=</span>days_ago(<span style="color:#ae81ff">2</span>), catchup<span style="color:#f92672">=</span><span style="color:#66d9ef">False</span>)
</span></span><span style="display:flex;"><span><span style="color:#75715e"># The function name will be the ID of the DAG.</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># In this case it&#39;s called `EXAMPLE_simple`.</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">EXAMPLE_simple</span>():
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">@task</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_1</span>():
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># Generate a random number</span>
</span></span><span style="display:flex;"><span>        <span style="color:#66d9ef">return</span> random()
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#a6e22e">@task</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">def</span> <span style="color:#a6e22e">task_2</span>(value):
</span></span><span style="display:flex;"><span>        <span style="color:#75715e"># Print the random number to the logs</span>
</span></span><span style="display:flex;"><span>        print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#39;The randomly generated number is </span><span style="color:#e6db74">{</span>value<span style="color:#e6db74">}</span><span style="color:#e6db74">.&#39;</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># This will determine the direction of the tasks.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># As you can see, task_2 runs after task_1 is done.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Task_2 then uses the result from task_1.</span>
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">return</span> task_2(task_1())
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>dag <span style="color:#f92672">=</span> EXAMPLE_simple()
</span></span></code></pre></div><p>That&rsquo;s it. Let&rsquo;s run this.</p>
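<p>One detail of the listing worth unpacking is <code>start_date=days_ago(2)</code>. Roughly speaking, it resolves to midnight (UTC) two days before today. The sketch below approximates that with the standard library only; <code>days_ago_sketch</code> is a hypothetical stand-in written for this post, not Airflow&rsquo;s own code:</p>

```python
from datetime import datetime, timedelta, timezone


def days_ago_sketch(n, now=None):
    """Approximate `days_ago(n)`: midnight UTC, n days before today."""
    now = now or datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


# A fixed "now" keeps the output reproducible.
fixed_now = datetime(2021, 8, 6, 14, 57, 35, tzinfo=timezone.utc)
start = days_ago_sketch(2, now=fixed_now)
print(start.isoformat())  # 2021-08-04T00:00:00+00:00
```

<p>Because the value moves every day, a fixed <code>datetime</code> is usually the safer choice for <code>start_date</code> outside of quick experiments like this one.</p>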
<h2 id="running-the-dag">Running the DAG</h2>
<p>Once the DAG definition file is created inside the <code>airflow/dags</code> folder, the DAG should appear in the list. Now we need to unpause the DAG and trigger it if we want to run it right away. There are two options to unpause and trigger the DAG: we can use the Airflow webserver&rsquo;s UI or the terminal. Let&rsquo;s handle both.</p>
<h3 id="run-via-ui">Run via UI</h3>
<p>First, you should see the DAG on the list:</p>
<p><img
    src="https://pedromadruga.com/posts/simple_dag_3.png"
    alt="simple dag"
    loading="lazy"
    decoding="async"
  /></p>
<p>In this example, I&rsquo;ve run the DAG before (hence some columns already have values), but you should have a clean slate.</p>
<p>Now we enable the DAG (1) and trigger it (2), so it can run right away:</p>
<p><img
    src="https://pedromadruga.com/posts/simple_dag_4.png"
    alt="simple dag"
    loading="lazy"
    decoding="async"
  /></p>
<p>Click the DAG ID (in this case, called <code>EXAMPLE_simple</code>), and you&rsquo;ll see the Tree View. Having triggered a new run, you&rsquo;ll see that the DAG is running:</p>
<p><img
    src="https://pedromadruga.com/posts/simple_dag_5.png"
    alt="simple dag"
    loading="lazy"
    decoding="async"
  /></p>
<p>Heading over to the <strong>Graph View</strong>, we can see that both tasks ran successfully 🎉:</p>
<p><img
    src="https://pedromadruga.com/posts/simple_dag_6.png"
    alt="Graph View"
    loading="lazy"
    decoding="async"
  /></p>
<p>But what about the printed output of <code>task_2</code>, which shows a randomly generated number? We can check that in the logs.</p>
<h4 id="checking-the-logs-via-ui">Checking the logs via UI</h4>
<p>Inside <strong>Graph View</strong>, click on <code>task_2</code>, and click <code>Log</code>.</p>
<p><img
    src="https://pedromadruga.com/posts/simple_dag_7.png"
    alt="task_2 details"
    loading="lazy"
    decoding="async"
  /></p>
<p>It&rsquo;s possible to see the output of the task:</p>
<p><img
    src="https://pedromadruga.com/posts/simple_dag_8.png"
    alt="task_2 output log"
    loading="lazy"
    decoding="async"
  /></p>
<p>Success again 🎉!</p>
<h3 id="run-via-terminal">Run via terminal</h3>
<p>The alternative to the UI for unpausing and triggering a DAG is straightforward. Knowing the ID of the DAG, all we need is:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>airflow dags unpause EXAMPLE_simple <span style="color:#f92672">&amp;&amp;</span> airflow dags trigger EXAMPLE_simple
</span></span></code></pre></div><h4 id="checking-the-logs-via-terminal">Checking the logs via terminal</h4>
<p>Assuming your Airflow installation is in the <code>$HOME</code> directory, it&rsquo;s possible to check the logs by doing:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cd ~/airflow/logs/EXAMPLE_simple/task_2
</span></span></code></pre></div><p>And select the correct timestamp (in my case it was):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>tail 2021-08-06T14:57:35.762094+00:00/1.log
</span></span></code></pre></div><p>Where the output should include:</p>
<p><img
    src="/posts/simple_dag_9.png"
    alt="DAG log"
    loading="lazy"
    decoding="async"
  /></p>
<p>Followed by the actual number we&rsquo;ve generated in this run.</p>
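<p>The log location used above follows a simple pattern: DAG ID, task ID, timestamp of the run, and the attempt number. Here&rsquo;s a small sketch that builds that path; <code>task_log_path</code> is a hypothetical helper, the base folder is an assumption, and the layout is the one seen in this post (other Airflow versions or logging configurations may organize logs differently):</p>

```python
from pathlib import Path


def task_log_path(dag_id, task_id, execution_date, try_number=1, base=None):
    """Build the log path following the layout used in this post."""
    # Default base folder is an assumption: ~/airflow/logs
    base = Path(base) if base is not None else Path.home() / "airflow" / "logs"
    return base / dag_id / task_id / execution_date / f"{try_number}.log"


path = task_log_path(
    "EXAMPLE_simple",
    "task_2",
    "2021-08-06T14:57:35.762094+00:00",
    base="/home/pi/airflow/logs",  # assumed location, adjust to your setup
)
print(path)
```

<p>The resulting path can then be passed to <code>tail</code>, just like in the commands above.</p>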
<h2 id="conclusion">Conclusion</h2>
<p>This is a beginner-friendly DAG, using the new TaskFlow API in Airflow 2.0. It&rsquo;s possible to create a simple DAG without too much code. In the next post of the series, we&rsquo;ll create parallel tasks using the <code>@task_group</code> decorator.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Install Airflow 2 on a Raspberry Pi (using Python 3.x)</title>
      <link>http://pedromadruga.com/blog/airflow-install/</link>
      <pubDate>Wed, 21 Jul 2021 11:30:03 +0000</pubDate>
      <guid>http://pedromadruga.com/blog/airflow-install/</guid>
      <description>&lt;p&gt;Airflow is a tool commonly used for Data Engineering. It&amp;rsquo;s great for orchestrating workflows. Airflow 2 only supports Python 3, so we need to make sure that we use Python 3 to install it. The same steps should also work on other Debian-based Linux distributions.&lt;/p&gt;
&lt;p&gt;This is the first post of a series, where we&amp;rsquo;ll build an &lt;strong&gt;entire Data Engineering pipeline&lt;/strong&gt;. To follow this series, just &lt;strong&gt;subscribe to the &lt;a href=&#34;https://pedromadruga.com/newsletter&#34;&gt;newsletter&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>Airflow is a tool commonly used for Data Engineering. It&rsquo;s great for orchestrating workflows. Airflow 2 only supports Python 3, so we need to make sure that we use Python 3 to install it. The same steps should also work on other Debian-based Linux distributions.</p>
<p>This is the first post of a series, where we&rsquo;ll build an <strong>entire Data Engineering pipeline</strong>. To follow this series, just <strong>subscribe to the <a href="https://pedromadruga.com/newsletter">newsletter</a></strong>.</p>
<h2 id="install-dependencies">Install dependencies</h2>
<p>Let&rsquo;s make sure our OS is up-to-date.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo apt-get update -y
</span></span><span style="display:flex;"><span>sudo apt-get upgrade -y
</span></span></code></pre></div><p>Now, we&rsquo;ll install Python 3.x and Pip on the Raspberry Pi.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo apt-get install python3 python3-pip
</span></span></code></pre></div><p>Airflow relies on <strong>numpy</strong>, which has its own dependencies. We&rsquo;ll address that by installing the necessary dependencies:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo apt-get install python3-dev libatlas-base-dev
</span></span></code></pre></div><p>We also need to ensure Airflow installs using Python3 and Pip3, so we&rsquo;ll set an alias for both. To do this, edit the <code>~/.bashrc</code> by adding:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>alias python<span style="color:#f92672">=</span><span style="color:#66d9ef">$(</span>which python3<span style="color:#66d9ef">)</span>
</span></span><span style="display:flex;"><span>alias pip<span style="color:#f92672">=</span>pip3
</span></span></code></pre></div><p>Alternatively, you can install using <code>pip3</code> directly. For this tutorial, we&rsquo;ll assume aliases are in use.</p>
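<p>Since a wrong interpreter only shows up later as a confusing install error, it can be worth checking the version first. A minimal sketch; <code>is_python3</code> is a hypothetical helper name:</p>
<pre tabindex="0"><code># Hypothetical helper: succeeds only when a version string such as
# "Python 3.9.2" reports major version 3.
is_python3() {
    [ "$(echo "$1" | cut -d " " -f 2 | cut -d "." -f 1)" = "3" ]
}

# python3 is called directly here because aliases are not expanded in scripts.
if is_python3 "$(python3 --version)"; then
    echo "Python 3 is available"
else
    echo "Python 3 was not found"
fi
</code></pre><p>In an interactive shell, running <code>python --version</code> after reloading <code>~/.bashrc</code> shows which interpreter the alias resolves to.</p>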
<h2 id="install-airflow">Install Airflow</h2>
<h3 id="create-folders">Create folders</h3>
<p>We need a folder to hold the Airflow installation.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cd ~/
</span></span><span style="display:flex;"><span>mkdir airflow
</span></span></code></pre></div><h3 id="install-airflow-package">Install Airflow package</h3>
<p>Finally, we can install Airflow. We start by defining the Airflow and Python versions to build the correct constraint URL. The constraint file pins the dependency versions that are known to work with that Airflow release on that Python version.</p>
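<p>To make the URL format concrete, here&rsquo;s a minimal sketch with both versions spelled out. The <code>constraint_url</code> helper is a hypothetical name, and Python 3.7 is only an assumed example value:</p>
<pre tabindex="0"><code># Hypothetical helper: build the constraint URL for an Airflow version ($1)
# and a Python major.minor version ($2).
constraint_url() {
    echo "https://raw.githubusercontent.com/apache/airflow/constraints-$1/constraints-$2.txt"
}

# Example values; use the versions that match your own setup.
constraint_url 2.1.2 3.7
</code></pre><p>The commands that follow derive the Python version automatically instead of hard-coding it.</p>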
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># set airflow version</span>
</span></span><span style="display:flex;"><span>AIRFLOW_VERSION<span style="color:#f92672">=</span>2.1.2
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># determine the correct python version</span>
</span></span><span style="display:flex;"><span>PYTHON_VERSION<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;</span><span style="color:#66d9ef">$(</span>python --version | cut -d <span style="color:#e6db74">&#34; &#34;</span> -f <span style="color:#ae81ff">2</span> | cut -d <span style="color:#e6db74">&#34;.&#34;</span> -f 1-2<span style="color:#66d9ef">)</span><span style="color:#e6db74">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># build the constraint URL</span>
</span></span><span style="display:flex;"><span>CONSTRAINT_URL<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;https://raw.githubusercontent.com/apache/airflow/constraints-</span><span style="color:#e6db74">${</span>AIRFLOW_VERSION<span style="color:#e6db74">}</span><span style="color:#e6db74">/constraints-</span><span style="color:#e6db74">${</span>PYTHON_VERSION<span style="color:#e6db74">}</span><span style="color:#e6db74">.txt&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># install airflow</span>
</span></span><span style="display:flex;"><span>pip install <span style="color:#e6db74">&#34;apache-airflow==</span><span style="color:#e6db74">${</span>AIRFLOW_VERSION<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span> --constraint <span style="color:#e6db74">&#34;</span><span style="color:#e6db74">${</span>CONSTRAINT_URL<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>
</span></span></code></pre></div><h3 id="initialize-database">Initialize database</h3>
<p>Before running Airflow, we need to initialize the database. There are two main options for this setup: 1) running Airflow against a separate database server, or 2) using a simple SQLite database. This tutorial uses SQLite, so there&rsquo;s not much to do other than initializing the database.</p>
<p>So let&rsquo;s initialize it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>airflow db init
</span></span></code></pre></div><h2 id="run-airflow">Run Airflow</h2>
<p>It&rsquo;s now possible to run both the server and the scheduler:</p>
<pre tabindex="0"><code>airflow webserver -p 8080 &amp; airflow scheduler
</code></pre><p>Now open <code>http://localhost:8080</code> in a browser. If you&rsquo;re asked to log in, you&rsquo;ll need to create a user first. Here&rsquo;s an example:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>airflow users create <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --username admin <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --firstname Peter <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --lastname Parker <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --role Admin <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span>    --email spiderman@superhero.org
</span></span></code></pre></div><p>Once authenticated, it&rsquo;s now possible to see the main screen:</p>
<p><img
    src="/posts/airflow1.png"
    alt="Airflow main"
    loading="lazy"
    decoding="async"
  /></p>
<p><strong>And that&rsquo;s it - you&rsquo;ve now installed Airflow!</strong> Optionally, you can take extra steps.</p>
<h2 id="optional">Optional</h2>
<h3 id="start-airflow-automatically">Start airflow automatically</h3>
<p>In order to start both the webserver and the scheduler automatically on system boot, we&rsquo;ll need three files: <code>airflow-webserver.service</code>, <code>airflow-scheduler.service</code>, and an <code>environment</code> file. Let&rsquo;s break this into parts:</p>
<ol>
<li>
<p>Go to <a href="https://github.com/apache/airflow/tree/master/scripts/systemd">Airflow&rsquo;s GitHub repo</a> and download the <code>airflow-webserver.service</code> and <code>airflow-scheduler.service</code> files.</p>
</li>
<li>
<p>Copy them into the <code>/etc/systemd/system</code> folder.</p>
</li>
<li>
<p>Edit both files. Firstly, <code>airflow-webserver.service</code> should look like:</p>
</li>
</ol>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#f92672">[</span>Unit<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>Description<span style="color:#f92672">=</span>Airflow webserver daemon
</span></span><span style="display:flex;"><span>After<span style="color:#f92672">=</span>network.target postgresql.service mysql.service redis.service rabbitmq-server.service
</span></span><span style="display:flex;"><span>Wants<span style="color:#f92672">=</span>postgresql.service mysql.service redis.service rabbitmq-server.service
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>Service<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>EnvironmentFile<span style="color:#f92672">=</span>/home/pi/airflow/env
</span></span><span style="display:flex;"><span>User<span style="color:#f92672">=</span>pi
</span></span><span style="display:flex;"><span>Group<span style="color:#f92672">=</span>pi
</span></span><span style="display:flex;"><span>Type<span style="color:#f92672">=</span>simple
</span></span><span style="display:flex;"><span>ExecStart<span style="color:#f92672">=</span>/bin/bash -c <span style="color:#e6db74">&#39;airflow webserver --pid /home/pi/airflow/webserver.pid&#39;</span>
</span></span><span style="display:flex;"><span>Restart<span style="color:#f92672">=</span>on-failure
</span></span><span style="display:flex;"><span>RestartSec<span style="color:#f92672">=</span>5s
</span></span><span style="display:flex;"><span>PrivateTmp<span style="color:#f92672">=</span>true
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>Install<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>WantedBy<span style="color:#f92672">=</span>multi-user.target
</span></span></code></pre></div><p>Next, edit the <code>airflow-scheduler.service</code> file, which should look like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#f92672">[</span>Unit<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>Description<span style="color:#f92672">=</span>Airflow scheduler daemon
</span></span><span style="display:flex;"><span>After<span style="color:#f92672">=</span>network.target postgresql.service mysql.service redis.service rabbitmq-server.service
</span></span><span style="display:flex;"><span>Wants<span style="color:#f92672">=</span>postgresql.service mysql.service redis.service rabbitmq-server.service
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>Service<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>EnvironmentFile<span style="color:#f92672">=</span>/home/pi/airflow/env
</span></span><span style="display:flex;"><span>User<span style="color:#f92672">=</span>pi
</span></span><span style="display:flex;"><span>Group<span style="color:#f92672">=</span>pi
</span></span><span style="display:flex;"><span>Type<span style="color:#f92672">=</span>simple
</span></span><span style="display:flex;"><span>ExecStart<span style="color:#f92672">=</span>/bin/bash -c <span style="color:#e6db74">&#39;airflow scheduler&#39;</span>
</span></span><span style="display:flex;"><span>Restart<span style="color:#f92672">=</span>always
</span></span><span style="display:flex;"><span>RestartSec<span style="color:#f92672">=</span>5s
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f92672">[</span>Install<span style="color:#f92672">]</span>
</span></span><span style="display:flex;"><span>WantedBy<span style="color:#f92672">=</span>multi-user.target
</span></span></code></pre></div><p>Notice that <code>User</code> and <code>Group</code> have changed, as well as <code>ExecStart</code>. You&rsquo;ll also notice that there&rsquo;s an <code>EnvironmentFile</code> that hasn&rsquo;t been created yet. That&rsquo;s what we&rsquo;ll do now.</p>
<ol start="4">
<li>Create an environment file. You can give it any name. I called it <code>env</code> and placed it in the <code>/home/pi/airflow</code> folder. In other words:</li>
</ol>
<pre tabindex="0"><code>cd ~/airflow
touch env
</code></pre><p>Edit the <code>env</code> file and add the following contents:</p>
<pre tabindex="0"><code>AIRFLOW_CONFIG=/home/pi/airflow/airflow.cfg
AIRFLOW_HOME=/home/pi/airflow/
</code></pre><ol start="5">
<li>Lastly, let&rsquo;s reload the systemd daemon, then enable and start both services:</li>
</ol>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo systemctl daemon-reload
</span></span><span style="display:flex;"><span>sudo systemctl enable airflow-webserver.service
</span></span><span style="display:flex;"><span>sudo systemctl enable airflow-scheduler.service
</span></span><span style="display:flex;"><span>sudo systemctl start airflow-webserver.service
</span></span><span style="display:flex;"><span>sudo systemctl start airflow-scheduler.service
</span></span></code></pre></div><h2 id="thats-it-whats-next">That&rsquo;s it! What&rsquo;s next?</h2>
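<p>Before moving on, it can help to confirm that both services actually came up. A minimal sketch; <code>service_state</code> is a hypothetical helper name:</p>
<pre tabindex="0"><code># Hypothetical helper: report whether a systemd unit is currently active.
service_state() {
    if systemctl is-active --quiet "$1"; then
        echo "$1 is running"
    else
        echo "$1 is NOT running"
    fi
}

service_state airflow-webserver.service
service_state airflow-scheduler.service
</code></pre><p>If one of them is not running, <code>journalctl -u airflow-scheduler.service</code> (or the same command with the webserver unit) shows the service log.</p>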
<p>In the next blog post of this Data Engineering series, we&rsquo;ll create our first Directed Acyclic Graph (DAG) using Airflow. Subscribe to the <a href="https://pedromadruga.com">newsletter</a>, and don&rsquo;t miss out!</p>
<h2 id="sources">Sources</h2>
<ol>
<li><a href="https://airflow.apache.org/docs/apache-airflow/stable/installation.html">https://airflow.apache.org/docs/apache-airflow/stable/installation.html</a></li>
<li><a href="https://medium.com/the-kickstarter/apache-airflow-running-on-a-raspberry-pi-2e061f6c3655">https://medium.com/the-kickstarter/apache-airflow-running-on-a-raspberry-pi-2e061f6c3655</a></li>
<li><a href="http://www.thecrustyengineer.com/home/post/setting_up_airflow_on_a_raspberry_pi_4_part_1">http://www.thecrustyengineer.com/home/post/setting_up_airflow_on_a_raspberry_pi_4_part_1</a></li>
</ol>
]]></content:encoded>
    </item>
    <item>
      <title>Machine Learning Techniques applied to Heart Rate Variability - DTU Course final report</title>
      <link>http://pedromadruga.com/blog/machine-learning-heart-rate/</link>
      <pubDate>Tue, 31 Dec 2019 00:00:00 +0000</pubDate>
      <guid>http://pedromadruga.com/blog/machine-learning-heart-rate/</guid>
      <description>A 3-month long report where Supervised and Unsupervised Learning techniques were applied to a dataset that has a set of features, including Heart Rate Variability. Data stems from two years of health data from an Apple Watch ⌚️. Includes the final report for the Machine Learning course of Technical University of Denmark (DTU).</description>
      <content:encoded><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Heart Rate Variability (HRV) is a way to measure the variation in time between each heartbeat. This variation is a measure of how the heart reacts to physical exercise, mental stress, and heart diseases, directly linked to an increased risk of mortality.</p>
<p>It originates in neurons of the parasympathetic and sympathetic nervous systems and the vagus nerve. Evidence suggests that HRV is affected by stress: higher levels of stress result in a lower HRV.</p>
<p>While stress (and its causes and effects) is a known research topic, it&rsquo;s also more accessible due to the widespread usage of wearables that allow the collection of HRV data. The combination of the possibility of stress analysis from HRV and easy access to data makes this the main focus of the present report, determining whether machine learning techniques can help minimize generalization errors.</p>
<p>This report is structured into three main parts: data analysis, supervised, and unsupervised learning. All three parts revolve around predicting and/or clustering HRV values.</p>
<h2 id="report">Report</h2>
<p><a href="https://raw.githubusercontent.com/pmadruga/ml_project/master/dist/report.pdf?token=AA3TGZGN2CDGQNBFJEUW4M3BDZHHW">Download full report (41 pages)</a></p>
<h2 id="code">Code</h2>
<ol>
<li><a href="https://github.com/pmadruga/ml_project/blob/master/books/data_preparation.ipynb">Data Preparation + Principal Component Analysis</a></li>
<li><a href="https://github.com/pmadruga/ml_project/blob/master/books/Classification.ipynb">Supervised learning - classification (<strong>baseline, logistic regression, neural network</strong>)</a></li>
<li><a href="https://github.com/pmadruga/ml_project/blob/master/books/regression%20-%20part%20A.ipynb">Supervised learning - regression (<strong>linear regression, neural network</strong>)</a></li>
<li><a href="https://github.com/pmadruga/ml_project/blob/master/books/New%20Unsupervised.ipynb">Unsupervised learning - <strong>Agglomerative Hierarchical Clustering, Gaussian Mixture Model, Anomaly Detection, Apriori Association</strong></a></li>
</ol>
]]></content:encoded>
    </item>
  </channel>
</rss>
