<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: DanLee</title>
    <description>The latest articles on Forem by DanLee (@dwl285).</description>
    <link>https://forem.com/dwl285</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F186510%2F8072385b-d0c6-4ea9-b224-f66df623c613.jpeg</url>
      <title>Forem: DanLee</title>
      <link>https://forem.com/dwl285</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dwl285"/>
    <language>en</language>
    <item>
      <title>Three tables every analyst needs</title>
      <dc:creator>DanLee</dc:creator>
      <pubDate>Wed, 26 Jun 2019 12:49:04 +0000</pubDate>
      <link>https://forem.com/dwl285/three-tables-every-analyst-needs-1hnk</link>
      <guid>https://forem.com/dwl285/three-tables-every-analyst-needs-1hnk</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;If you're one of the first analysts joining a company, you will probably arrive to find an overwhelming amount of data. You'll almost certainly have some relational data (probably powering your application's backend), and likely have event data too (possibly through a third party like &lt;a href="https://segment.com"&gt;Segment&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;This can make it difficult to know where to start with each analytical request. You might be tempted to create a new dataset for each analysis, but this will add to the amount of data and quickly becomes difficult to manage.&lt;/p&gt;

&lt;p&gt;Fortunately, &lt;strong&gt;you can gather all of your data into three tables&lt;/strong&gt; that can then be used to answer the majority of user related questions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The SQL snippets in this post apply to Google BigQuery but can easily be adapted for other data warehouses.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Table 1: User Stats
&lt;/h3&gt;

&lt;p&gt;To start, &lt;strong&gt;you need a table that tells you all about your users&lt;/strong&gt;. This table will contain dimensions (e.g. user account creation timestamp) and metrics (e.g. user lifetime transaction count). It might be simple to begin with, but as your company grows and gets more complex, you’ll keep adding to this table.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Key point: &lt;code&gt;user_stats&lt;/code&gt; has one row per user&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_stats&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; user_id &lt;/th&gt;
      &lt;th&gt; created_at &lt;/th&gt;
      &lt;th&gt; last_seen_at &lt;/th&gt;
      &lt;th&gt; transactions_last_7d &lt;/th&gt;
      &lt;th&gt; transactions_lifetime &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;32356543 &lt;/td&gt;
      &lt;td&gt;2019-01-01 14:36:51 &lt;/td&gt;
      &lt;td&gt;2019-03-14 19:55:10 &lt;/td&gt;
      &lt;td&gt;2 &lt;/td&gt;
      &lt;td&gt;13 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;64763654 &lt;/td&gt;
      &lt;td&gt;2019-02-01 10:30:18 &lt;/td&gt;
      &lt;td&gt;2019-02-08 10:02:32 &lt;/td&gt;
      &lt;td&gt;0 &lt;/td&gt;
      &lt;td&gt;4 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;35795445 &lt;/td&gt;
      &lt;td&gt;2019-03-03 23:00:04 &lt;/td&gt;
      &lt;td&gt;2019-03-03 23:03:07 &lt;/td&gt;
      &lt;td&gt;0 &lt;/td&gt;
      &lt;td&gt;0 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;98765446 &lt;/td&gt;
      &lt;td&gt;2019-03-06 14:36:31 &lt;/td&gt;
      &lt;td&gt;2019-03-15 08:20:22 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
      &lt;td&gt;2 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  What kind of questions can this table be used to answer?
&lt;/h4&gt;

&lt;p&gt;How many users signed up this week?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;user_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;user_stats&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMP_SUB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;current_timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;How many users have made at least 5 transactions?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;user_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;user_stats&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;transactions_lifetime&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;How many users have logged in within the past 24 hours?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;user_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;user_stats&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;last_seen_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMP_SUB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;current_timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="n"&gt;HOUR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Table 2: Daily user stats
&lt;/h3&gt;

&lt;p&gt;The need for this table might not be so obvious, but it turns out that having &lt;strong&gt;a table describing each user’s activity every single day since they signed up&lt;/strong&gt; (even if they were not active at all on that day) is really useful. There are four dimensions you’ll definitely want in this table:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;user_id&lt;/code&gt;: a unique identifier for your users, that you can link back to &lt;code&gt;user_stats&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;date&lt;/code&gt;: what date does this row correspond to&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;day_n&lt;/code&gt;: what day (in the lifetime of the user) does this date represent. On the day a user signs up &lt;code&gt;day_n = 1&lt;/code&gt;, the day after sign up &lt;code&gt;day_n = 2&lt;/code&gt; etc&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;was_active&lt;/code&gt;: was the user active on this day? (Exactly how you choose to define active should depend on the specifics of your business. If you are creating a messaging app, you may define “active” as having sent at least one message, for example)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As with &lt;code&gt;user_stats&lt;/code&gt;, this table will probably end up with many more than these 4 fields as your business becomes more complex.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Key point: &lt;code&gt;daily_user_stats&lt;/code&gt; has one row per user per day (for all days since they signed up)&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;daily_user_stats&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;day_n&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; user_id &lt;/th&gt;
      &lt;th&gt; date &lt;/th&gt;
      &lt;th&gt; day_n &lt;/th&gt;
      &lt;th&gt; was_active &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;32356543 &lt;/td&gt;
      &lt;td&gt;2019-03-01 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;32356543 &lt;/td&gt;
      &lt;td&gt;2019-03-02 &lt;/td&gt;
      &lt;td&gt;2 &lt;/td&gt;
      &lt;td&gt;0 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;32356543 &lt;/td&gt;
      &lt;td&gt;2019-03-03 &lt;/td&gt;
      &lt;td&gt;3 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;35795445 &lt;/td&gt;
      &lt;td&gt;2019-03-02 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;35795445 &lt;/td&gt;
      &lt;td&gt;2019-03-03 &lt;/td&gt;
      &lt;td&gt;2 &lt;/td&gt;
      &lt;td&gt;0 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;98765446 &lt;/td&gt;
      &lt;td&gt;2019-03-03 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  What kind of questions can this table be used to answer?
&lt;/h4&gt;

&lt;p&gt;What percentage of users are active one week after they signed up?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;was_active&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;active_user_ratio&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;daily_users_stats&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;day_n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;How has the percentage of users active the day after they sign up trended over time?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;AVERAGE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;was_active&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;active_user_ratio&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;daily_user_stats&lt;/span&gt;
  &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;user_stats&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;daily_user_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;day_n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;What does our &lt;a href="https://www.sequoiacap.com/article/retention"&gt;user retention curve&lt;/a&gt; look like?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;day_n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;AVERAGE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;was_active&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;active_user_ratio&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;daily_user_stats&lt;/span&gt;
  &lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;user_stats&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;daily_user_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;user_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMP_SUB&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;current_timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="k"&gt;DAY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;day_n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Table 3: Sessions
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;sessions&lt;/code&gt; table is a place to store your event data. Events are user interactions on your app or site (e.g. button clicks, form submissions, message sends etc). You probably track many different events, and finding a consistent approach to analysing them can be tricky. Sessions is a table to organise the chaos!&lt;/p&gt;

&lt;p&gt;A session is a collection of events that represent one end to end interaction with your product: user opens app &amp;gt; user opens product abc page &amp;gt; user clicks add to cart &amp;gt; user closes app. How you combine a user’s events into a session depends on your product, but a standard approach is to end a session after a period of 30 minutes of inactivity. Any subsequent events from that user will be added to a new session.&lt;/p&gt;

&lt;p&gt;Using some &lt;a href="https://gist.github.com/lewish/a37ecb423a45ccf861373d1c942a0ea5"&gt;relatively complex SQL&lt;/a&gt;, you can split your raw events into sessions. Your sessions table should at least have these dimensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;user_id&lt;/code&gt;: unique user identifier&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;session_id&lt;/code&gt;: Starting at 1 (the user's first ever session), incrementing for each further session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;start_time&lt;/code&gt;: timestamp of first event in session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;end_time&lt;/code&gt;: timestamp of last event in session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;events&lt;/code&gt;: ordered list/array of all events in the session &lt;em&gt;(this is well supported in BigQuery, other query languages may need to approach this slightly differently, for example using JSON)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Key point: &lt;code&gt;sessions&lt;/code&gt; has one row for each session a user has had&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; user_id &lt;/th&gt;
      &lt;th&gt; session_id &lt;/th&gt;
      &lt;th&gt; start_time &lt;/th&gt;
      &lt;th&gt; end_time &lt;/th&gt;
      &lt;th&gt; events &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;32356543 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
      &lt;td&gt;2019-03-14 19:55:10 &lt;/td&gt;
      &lt;td&gt;2019-03-14 19:58:15 &lt;/td&gt;
      &lt;td&gt;[opened_app, clicked_product, closed_app]&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;32356543 &lt;/td&gt;
      &lt;td&gt;2 &lt;/td&gt;
      &lt;td&gt;2019-03-16 13:45:10 &lt;/td&gt;
      &lt;td&gt;2019-03-16 13:51:15 &lt;/td&gt;
      &lt;td&gt;[opened_app, scrolled_list, closed_app]&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;23501294 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
      &lt;td&gt;2019-03-18 16:13:10 &lt;/td&gt;
      &lt;td&gt;2019-03-18 16:14:15 &lt;/td&gt;
      &lt;td&gt;[opened_app, ..., closed_app]&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;23501294 &lt;/td&gt;
      &lt;td&gt;1 &lt;/td&gt;
      &lt;td&gt;2019-03-19 19:55:10 &lt;/td&gt;
      &lt;td&gt;2019-03-19 19:58:15 &lt;/td&gt;
      &lt;td&gt;[opened_app, ..., closed_app]&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
      &lt;td&gt;... &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  What kind of questions can this table be used to answer?
&lt;/h4&gt;

&lt;p&gt;How has average session length changed over time?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;session_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TIMESTAMP_DIFF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;session_length&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;sessions&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; 
  &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;What share of sessions contain a transaction?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;session_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARRAY_TO_STRING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'%transaction_complete%'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
         &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;transaction_ratio&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;sessions&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; 
  &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;What chain of events lead to a user leaving the app and never returning?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ARRAY_REVERSE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="n"&gt;SAFE_OFFSET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_event_offset_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ARRAY_REVERSE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="n"&gt;SAFE_OFFSET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_event_offset_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;ARRAY_REVERSE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="n"&gt;SAFE_OFFSET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;last_event_offset_0&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;sessions&lt;/span&gt;
  &lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;user_stats&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;user_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions_all_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;These three well structured tables can help answer most questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;user_stats&lt;/code&gt;&lt;/strong&gt;: who is in my user base&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;daily_user_stats&lt;/code&gt;&lt;/strong&gt;: how many active users do I have, and how well do they retain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sessions&lt;/code&gt;&lt;/strong&gt;: what journey are users taking through my product&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you've written the queries to combine your other datasets into these three simple tables you should set up a schedule to keep them up to date with any new changes. You'll soon start relying on these tables for a large share of your analytics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dataform.co/developers?utm_medium=organic&amp;amp;utm_source=dev_to&amp;amp;utm_campaign=3_tables"&gt;Dataform&lt;/a&gt; provides a fully managed, collaborative workspace to set up schedules and manage your data warehouse. For more information on Dataform check out our &lt;a href="https://docs.dataform.co/?utm_medium=organic&amp;amp;utm_source=dev_to&amp;amp;utm_campaign=3_tables"&gt;documentation&lt;/a&gt;, or &lt;a href="https://dataform.co/signup/?utm_medium=organic&amp;amp;utm_source=dev_to&amp;amp;utm_campaign=3_tables"&gt;create an account for free&lt;/a&gt; and start using Dataform's fully managed Web platform.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>data</category>
      <category>analytics</category>
    </item>
  </channel>
</rss>
