Slides from meetup: Snowplow at Seven West Media and Google Cloud Platform

Wednesday 5 December, 2018. | By: Simon Rumble

Key points

  • Seven West Media triggers server-side tracking based on Snowplow data received from clients
  • Tracking is made simpler through centralisation and fanning out
  • Google Cloud Platform port of Snowplow is now available and ready for production
  • Real-time data into BigQuery

[Read More]

Classifying browsers: Which visitors are real people?

Wednesday 28 November, 2018. | By: Simon Rumble

Key points

  • Browser classifications in are deprecated (there’s now a better way)
  • Use the UA Parser enrichment to classify browsers
  • Use the IAB Bots & Spiders enrichment to find bad bots
  • You’ll still need to look at your own data carefully to remove bots that aren’t detected
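Here is a rough Python sketch of how the two enrichments might be combined. It assumes each event's contexts have already been unpacked into dicts that use the field names from the IAB `spiders_and_robots` context (`spiderOrRobot`) and the `ua_parser_context` (`deviceFamily`). The `is_probable_bot` function and its events-per-minute threshold are hypothetical. They stand in for the extra checks you'd build after looking at your own data.

```python
def is_probable_bot(iab_ctx, ua_ctx, events_per_minute, max_events_per_minute=60):
    """Flag a visitor as a likely bot (hypothetical rules, not a Snowplow default).

    iab_ctx / ua_ctx: dicts unpacked from the IAB and UA parser enrichment
    contexts, or None if the context is missing.
    """
    # The IAB enrichment sets spiderOrRobot when the UA/IP matches its lists
    if iab_ctx and iab_ctx.get("spiderOrRobot"):
        return True
    # ua-parser reports known crawlers with a deviceFamily of "Spider"
    if ua_ctx and ua_ctx.get("deviceFamily") == "Spider":
        return True
    # Hypothetical catch-all for undetected bots: implausibly high activity
    return events_per_minute > max_events_per_minute
```

The rate rule is deliberately crude. In practice you'd tune it, or replace it, based on the bots you actually find in your data.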

[Read More]

MeasureCamp Sydney: Event roundup and slides

Wednesday 24 October, 2018. | By: Simon Rumble

The amazing analytics unconference MeasureCamp Sydney was held on Saturday 20 October and it was the best yet! We’ve been involved from the beginning as sponsors and organisers because we love the format and community, so it was great to be part of the third edition.

[Read More]

Paper over your mistakes with data models

Tuesday 9 October, 2018. | By: Simon Rumble

Mistakes happen. In the data world, your ugly mistakes live on forever. It’s not just the embarrassment that’s a problem though. Gaps and obvious errors in historical data distract your stakeholders from more important matters. Explaining the anomalies and getting your data users to focus on things you don’t know about is tiring for everyone.

[Read More]

Quantifying content velocity in Snowplow

Monday 8 October, 2018. | By: Simon Rumble

Adam Greco is something of a legend in the Adobe Analytics space. I’ve been reading his blog posts and learning from him since I first started using Omniture back in 2007 or so. He literally wrote the book on Omniture and then Adobe SiteCatalyst. The reason his blog was so useful is that very few people were writing about advanced digital analytics at the time. Between Adam and Ben Gaines, I learnt much of what I know about the then-emerging discipline.

[Read More]

Modelling your Snowplow event data: Part 5 Automation

Tuesday 18 September, 2018. | By: Simon Rumble

In the first four parts of this series, we modelled out:

  • Pageviews: accurate time spent incorporating page pings and maximum scroll depth
  • Sessions: traffic sources and initial landing details
  • Users: filtering internal traffic and looking up things we know about the users
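Automation mostly means running these steps on a schedule, in dependency order. The sketch below shows one way to express that ordering in Python 3.9+ using the standard library's `graphlib`. The step names and the `execute` callback are hypothetical. A real setup would run SQL against the warehouse, or hand the dependency graph to a scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical model steps mapped to the steps they depend on
MODEL_STEPS = {
    "pageviews": {"events_loaded"},
    "sessions": {"pageviews"},
    "users": {"sessions"},
}


def run_order(steps):
    """Return every node (including upstream markers) in dependency order."""
    return list(TopologicalSorter(steps).static_order())


def run_models(steps, execute):
    """Call execute(step) for each model step, dependencies first."""
    for step in run_order(steps):
        # Skip upstream markers such as "events_loaded" that aren't model steps
        if step in steps:
            execute(step)
```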

[Read More]

Modelling your Snowplow event data: Part 4 Users

Wednesday 29 August, 2018. | By: Simon Rumble

In the first three parts of this series, we looked at modelling out pageview events to include accurate time spent and scroll depth, then classifying sessions based on what we know about where the user came from. Now we’re going to look at what we know about users.
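A common part of a users model is flagging internal traffic. Here is a minimal sketch that assumes you match Snowplow's `user_ipaddress` field against known office or VPN ranges. The ranges below are hypothetical documentation addresses, and `is_internal` is an illustrative helper.

```python
import ipaddress

# Hypothetical office/VPN ranges (RFC 5737 documentation addresses)
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_internal(user_ipaddress):
    """Return True if the event's IP address falls inside a known internal range."""
    try:
        ip = ipaddress.ip_address(user_ipaddress)
    except ValueError:
        # Missing or anonymised addresses (e.g. "203.0.x.x") can't be matched
        return False
    # Membership tests across IPv4/IPv6 simply return False
    return any(ip in net for net in INTERNAL_NETWORKS)
```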

[Read More]

Modelling your Snowplow event data: Part 3 Sessions

Wednesday 22 August, 2018. | By: Simon Rumble

In the first two parts of this series, we looked at modelling out pageview events to include accurate time spent and scroll depths. Now we’ll roll up sessions.
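To classify where a session came from, you usually look at the campaign fields and referrer on the session's first event. The rules in this sketch are hypothetical. They assume the `mkt_medium` field (populated from `utm_medium`) and the referer parser's `refr_medium` values such as `search` and `social`. Adapt them to your own tagging conventions.

```python
def classify_channel(mkt_medium, refr_medium):
    """Assign a session to a marketing channel from its first event (hypothetical rules)."""
    medium = (mkt_medium or "").lower()
    # Explicit campaign tagging wins over the referrer
    if medium in ("cpc", "ppc", "paidsearch"):
        return "Paid search"
    if medium == "email":
        return "Email"
    if medium:
        return "Other campaign"
    # Fall back to the referer parser's classification
    if refr_medium == "search":
        return "Organic search"
    if refr_medium == "social":
        return "Social"
    if refr_medium == "email":
        return "Email"
    if refr_medium in (None, "", "internal"):
        # No referrer, or an internal one (often a timed-out session), treated as direct
        return "Direct"
    return "Referral"
```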

[Read More]

Modelling your Snowplow event data: Part 2 Pageviews

Wednesday 15 August, 2018. | By: Simon Rumble

In the first part of this series on data modelling we went through the background for building a data model. In this edition we’ll go through the steps to create a basic pageview model that incorporates page pings so we can see accurate time spent and scroll depth for each pageview.
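For scroll depth, one reasonable definition is the lowest point of the page the reader saw. That is the maximum vertical scroll offset plus the viewport height, taken as a share of the document height. This sketch assumes page ping rows carry Snowplow's `pp_yoffset_max`, `br_viewheight` and `doc_height` fields. `max_scroll_depth_pct` is a hypothetical helper, not part of any Snowplow library.

```python
def max_scroll_depth_pct(pings):
    """Maximum vertical scroll depth (0-100) for one pageview's page pings.

    Each ping is a dict with pp_yoffset_max, br_viewheight and doc_height.
    """
    best = 0.0
    for p in pings:
        doc_height = p.get("doc_height") or 0
        if doc_height <= 0:
            continue  # can't compute a percentage without a document height
        # Lowest pixel seen = scroll offset + viewport height (negatives clamped)
        seen = max(p.get("pp_yoffset_max") or 0, 0) + max(p.get("br_viewheight") or 0, 0)
        best = max(best, min(100.0, 100.0 * seen / doc_height))
    return round(best)
```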

[Read More]

Modelling your Snowplow event data: Part 1

Wednesday 8 August, 2018. | By: Simon Rumble

One of the important features of Snowplow is that you can build your own custom data models to suit your unique analytical requirements in a space-efficient and performant way. It’s important that your data model can evolve and grow in complexity as your business grows and your needs get more advanced.

[Read More]

Snowplow Inspector extension updated to colour code and allow filtering

Monday 16 July, 2018. | By: Simon Rumble

Snowplow usage continues to go up and to the right, with new and interesting use cases proliferating. Alongside this we’ve seen a steady increase in usage of our debug tool, Snowplow Inspector.

[Read More]

Setting up an AWS Athena datasource in Jetbrains DataGrip

Friday 29 June, 2018. | By: Simon Rumble

Download the JDBC driver from AWS and place it in the DataGrip JDBC driver directory. On Linux this was ~/.DataGrip2018.1/config/jdbc-drivers/.

[Read More]

Event roundup and slides: MeasureCamp Auckland

Monday 18 June, 2018. | By: Simon Rumble

Last Saturday Mike and I made it across to the first MeasureCamp Auckland. We’ve sponsored all the previous events in Sydney and Melbourne, so we were thrilled to help out the Auckland team, who did a fantastic job with a great venue, excellent food and a diverse crowd keen to get into it.

[Read More]

GDPR emails: some early, some late

Monday 28 May, 2018. | By: Simon Rumble

In case you’ve been living in a cave, the EU’s General Data Protection Regulation took effect on Friday last week. This is Europe’s latest crack at regulating consumer privacy in the digital era, and it’s definitely a very pro-consumer regulation. It’s going to have a big impact on the business models of many companies.

[Read More]

Snowplow Inspector debug tool now validates schemas

Thursday 15 March, 2018. | By: Simon Rumble

Last June we released our debug tool Snowplow Inspector, and we’ve watched as hundreds of Snowplow users around the world have used it to make sense of their Snowplow tracking.
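Validating a schema starts with working out which schema a self-describing JSON claims to follow. This minimal sketch is not Snowplow Inspector's actual code. It parses an Iglu schema URI of the form `iglu:vendor/name/format/MODEL-REVISION-ADDITION` so the matching JSON Schema can be fetched from a registry. The character classes in the regex are a simplifying assumption.

```python
import re

# iglu:<vendor>/<name>/<format>/<MODEL>-<REVISION>-<ADDITION>
IGLU_URI = re.compile(
    r"iglu:(?P<vendor>[a-zA-Z0-9_.-]+)/(?P<name>[a-zA-Z0-9_-]+)/"
    r"(?P<format>[a-zA-Z0-9_-]+)/"
    r"(?P<model>[1-9][0-9]*)-(?P<revision>[0-9]+)-(?P<addition>[0-9]+)"
)


def parse_iglu_uri(uri):
    """Split an Iglu schema URI into its parts, or return None if it's malformed."""
    m = IGLU_URI.fullmatch(uri)
    if m is None:
        return None
    parts = m.groupdict()
    # SchemaVer MODEL-REVISION-ADDITION as a tuple of ints, e.g. (1, 0, 1)
    version = tuple(int(parts.pop(k)) for k in ("model", "revision", "addition"))
    return {**parts, "version": version}
```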

[Read More]

Setting up a BigQuery datasource in Jetbrains DataGrip

Monday 19 February, 2018. | By: Mike Robins

DataGrip is one of the most valuable tools our engineers use for exploring and querying a wide range of database technologies. DataGrip doesn’t yet come bundled with a BigQuery driver, so in this post we’ll explore how to set up a custom data source so that you can connect to BigQuery using DataGrip.

[Read More]

Why would you push Google Analytics data into Snowplow?

Wednesday 14 February, 2018. | By: Simon Rumble

In late January the Snowplow team released R99 of Snowplow which includes a really interesting feature: with a small JavaScript change, you can mirror all the data being sent into Google Analytics into a Snowplow collector.

[Read More]

Presentation: Real World Big Data Architectures

Monday 4 December, 2017. | By: Simon Rumble

A few weeks back Nicholas Tan gave a presentation at the Future of Financial Services conference about architectural designs in the real world to get value from data. Nick most recently was responsible for News Corp’s large-scale Snowplow Analytics rollout and has just started at Macquarie Group. Check out his presentation.

[Read More]

Snowplow R95 with ZSTD support released

Tuesday 14 November, 2017. | By: Simon Rumble

Back in July, we did a bunch of work to quantify the benefits of ZSTD compression in the Redshift database, resulting in this blog post from Mike. The results were clear and the gains massive: storage use fell by at least 50%, with improvements in nearly all use cases. We started migrating our customers to ZSTD wherever possible so they could benefit from this huge improvement.

[Read More]

Slides from MeasureCamp Sydney

Monday 6 November, 2017. | By: Simon Rumble

Snowflake Analytics sponsored the always awesome MeasureCamp Sydney unconference last weekend. As usual it was an incredibly high-value event, with great sessions and even better informal chats between them. It was a great event, and we’re looking forward to Melbourne early next year.

[Read More]

The Digital Analytics Hierarchy of Needs

Thursday 21 September, 2017. | By: Simon Rumble

A few weeks ago I discovered Monica Rogati’s fantastic Data Science Hierarchy of Needs. It’s a data science-centric riff on Maslow’s Hierarchy of Needs, a classic concept in psychology. Ever since, I’ve found myself using Rogati’s diagram and the concept in conversations with colleagues, partners, customers and friends as a way to explain the challenges we face in the digital analytics space.

[Read More]

Snowflake Analytics welcomes Conrad Yiu as an advisory board member

Wednesday 30 August, 2017. | By: Narbeh Yousefian

We are delighted to announce that Conrad Yiu has joined Snowflake Analytics as an Advisory Board member. Conrad brings more than 20 years of experience as an entrepreneur, venture investor and business builder.

[Read More]

A quick script to ZSTD all your shredded tables

Friday 25 August, 2017. | By: Simon Rumble

Mike’s recent post about compressing Snowplow tables works great, with clients seeing tables shrink to around 30% of their original size. But what about all your shredded tables?
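Redshift in 2017 couldn't change a column's encoding in place, so the usual approach is a deep copy. You create a new table with the new encodings, copy the rows across, then swap the names. This Python sketch generates that SQL for one table. `zstd_deep_copy_sql` is a hypothetical helper, and the column list is assumed to come from something like `pg_table_def`. Leaving the sort key column `RAW` follows common AWS guidance rather than anything Snowplow-specific.

```python
def zstd_deep_copy_sql(schema, table, columns, distkey=None, sortkey=None):
    """Generate Redshift SQL that rebuilds a table with ZSTD column encodings.

    columns: list of (name, type) pairs, e.g. [("root_id", "CHAR(36)"), ...].
    The sort key column is left RAW; every other column gets ZSTD.
    """
    col_defs = ",\n  ".join(
        f"{name} {ctype} ENCODE {'RAW' if name == sortkey else 'ZSTD'}"
        for name, ctype in columns
    )
    options = []
    if distkey:
        options.append(f"DISTSTYLE KEY DISTKEY ({distkey})")
    if sortkey:
        options.append(f"SORTKEY ({sortkey})")
    suffix = (" " + " ".join(options)) if options else ""
    new_table = f"{table}_zstd"
    return [
        "BEGIN;",
        f"CREATE TABLE {schema}.{new_table} (\n  {col_defs}\n){suffix};",
        f"INSERT INTO {schema}.{new_table} SELECT * FROM {schema}.{table};",
        # RENAME TO takes a bare table name; the table stays in its schema
        f"ALTER TABLE {schema}.{table} RENAME TO {table}_old;",
        f"ALTER TABLE {schema}.{new_table} RENAME TO {table};",
        "COMMIT;",
    ]
```

You would run this once per shredded table, for example every table in the `atomic` schema except `events`. A deep copy doesn't carry over grants, ownership or foreign key constraints, so check those before dropping the `_old` table.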

[Read More]

How does Snowplow Analytics compare to other vendors?

Wednesday 9 August, 2017. | By: Simon Rumble

Tonight Snowflake Analytics team members Mike and Narbeh are debating the merits of Snowplow Analytics with representatives of Google Analytics and Adobe Analytics at Web Analytics Wednesday. The head-to-head aspect of it is meant to be lighthearted, but it’s forced us to think about some of the ways Snowplow Analytics is a better match for many types of digital analytics problems.

[Read More]

Make big data small again with Redshift ZSTD compression

Wednesday 12 July, 2017. | By: Mike Robins

A new compression option in Redshift allows you to make big storage savings, up to two-thirds in our tests, over the standard Snowplow setup. This guide shows how it works and how to get it happening.

[Read More]

Snowplow Inspector: a debug tool for Snowplow beacons

Tuesday 20 June, 2017. | By: Simon Rumble

Snowplow Insights is an amazingly flexible way to collect data, but with great flexibility comes some complexity. If you work on Snowplow implementations a lot, you’re likely familiar with Base 64 Decode and JSON Formatter when you’re digging into custom contexts and testing your implementation.
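Those two tools are exactly what you need to read a custom context by hand. Here is a minimal Python sketch of the same steps. It assumes the tracker sent contexts base64-encoded in the `cx` parameter (or an unstructured event in `ue_px`), using the URL-safe alphabet with padding stripped. `decode_snowplow_b64` is a hypothetical helper name.

```python
import base64
import json


def decode_snowplow_b64(value):
    """Decode a base64url-encoded tracker parameter (e.g. cx or ue_px) into JSON.

    Padding may have been stripped, so it is restored before decoding.
    urlsafe_b64decode also accepts the standard '+' and '/' characters.
    """
    padded = value + "=" * (-len(value) % 4)
    return json.loads(base64.urlsafe_b64decode(padded).decode("utf-8"))
```

If base64 encoding is turned off in the tracker, contexts arrive as plain JSON in the `co` parameter instead and need no decoding.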

[Read More]

Accurate time spent: A killer feature of Snowplow Analytics

Tuesday 7 March, 2017. | By: Simon Rumble

Web analytics tools commonly have a Time Spent metric. Understanding how long people have spent reading a page is a really valuable thing for some businesses. For publishers, the quality of engagement with content is vital, given they’re effectively selling the attention of their readers.
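Snowplow's page pings are what make accurate time spent possible. This minimal sketch shows one way to turn them into engaged time. It assumes a 10-second heartbeat and a list of `(page_view_id, timestamp)` pairs for `page_ping` events. Pings are bucketed into heartbeat-length windows, and each distinct window counts once. This is similar in spirit to the approach in Snowplow's own web data model, but `time_engaged` is a hypothetical helper, not that model's code.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

HEARTBEAT_SECONDS = 10  # assumed page ping interval; match your tracker config


def time_engaged(
    pings: Iterable[Tuple[str, datetime]], heartbeat: int = HEARTBEAT_SECONDS
) -> Dict[str, int]:
    """Engaged seconds per pageview, counting each heartbeat window once."""
    windows = defaultdict(set)
    for page_view_id, ts in pings:
        # Duplicate or bursty pings in the same window land in the same bucket
        windows[page_view_id].add(int(ts.timestamp() // heartbeat))
    return {pv: len(buckets) * heartbeat for pv, buckets in windows.items()}
```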

[Read More]

Decoding Snowplow real-time bad rows (Thrift)

Wednesday 14 December, 2016. | By: Mike Robins

In this tutorial we’ll look at decoding the bad rows that come out of the Snowplow real-time pipeline. In the real-time pipeline, bad rows inserted into Elasticsearch (and S3) are stored as base64-encoded, binary-serialized Thrift records. We’ll walk step by step through how to first decode and then deserialize these records in Python.
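The decoding step is plain base64. Here is a minimal Python sketch that assumes the legacy bad row JSON shape, with `line` and `errors` fields. `decode_bad_row` is a hypothetical helper. Deserializing needs the `thrift` package plus classes generated from Snowplow's collector payload Thrift schema. Those aren't included here, so that step appears only as a comment.

```python
import base64
import json


def decode_bad_row(raw):
    """Split a legacy real-time bad row into its error messages and raw Thrift bytes."""
    row = json.loads(raw)
    payload = base64.b64decode(row["line"])
    # Errors may be plain strings or {"level": ..., "message": ...} objects
    messages = [
        e.get("message", "") if isinstance(e, dict) else str(e)
        for e in row.get("errors", [])
    ]
    return messages, payload


# With the thrift package and a generated CollectorPayload class (assumed),
# the bytes could then be deserialized along these lines:
#   from thrift.TSerialization import deserialize
#   collector_payload = deserialize(CollectorPayload(), payload)
```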

[Read More]

Monitoring Snowplow bad rows using Lambda and Cloudwatch

Wednesday 19 October, 2016. | By: Mike Robins

In this tutorial we’ll use AWS Lambda and Amazon CloudWatch to monitor the number of bad rows inserted into Elasticsearch over a period of time. This lets us set an alert threshold for bad rows and send an email or notification when that threshold is exceeded. Snowplow users on the real-time pipeline will find this most useful, but users running batch loads can also adapt this monitoring.
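Here is a minimal sketch of such a Lambda function in Python. It assumes bad rows sit in an Elasticsearch index that is reachable over plain HTTP and that each row carries a `failure_tstamp` field. An Amazon Elasticsearch Service domain may require signed requests instead. The `ES_URL`, `BAD_ROWS_INDEX` and `WINDOW_MINUTES` settings and the `Snowplow`/`BadRows` metric names are hypothetical.

```python
import json
import os
import urllib.request

# Hypothetical settings, read from the Lambda function's environment
ES_URL = os.environ.get("ES_URL", "http://localhost:9200")
BAD_ROWS_INDEX = os.environ.get("BAD_ROWS_INDEX", "bad_rows")
WINDOW_MINUTES = int(os.environ.get("WINDOW_MINUTES", "5"))


def build_count_query(window_minutes):
    """Elasticsearch query matching bad rows that failed in the last N minutes."""
    return {"query": {"range": {"failure_tstamp": {"gte": f"now-{window_minutes}m"}}}}


def count_bad_rows(es_url, index, window_minutes):
    """POST the query to the index's _count endpoint and return the count."""
    body = json.dumps(build_count_query(window_minutes)).encode("utf-8")
    req = urllib.request.Request(
        f"{es_url}/{index}/_count",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["count"]


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    count = count_bad_rows(ES_URL, BAD_ROWS_INDEX, WINDOW_MINUTES)
    # Publish the count as a custom metric; the threshold lives on a CloudWatch alarm
    boto3.client("cloudwatch").put_metric_data(
        Namespace="Snowplow",
        MetricData=[{"MetricName": "BadRows", "Value": count, "Unit": "Count"}],
    )
    return {"bad_rows": count}
```

To use it, schedule the function with a CloudWatch Events rule that matches the window (every five minutes here). Then create a CloudWatch alarm on the `BadRows` metric with your threshold and an SNS topic for the email notification.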

[Read More]


We exist to make organisations better understand their businesses by enabling all decision makers in a company to work with the same version of the truth.
