In our previous blog post on the data creation tax, we described how product engineering teams handicap themselves by under-instrumenting their features for analytics. For example, at Snap, we often needed analytics events to debug product usability or performance issues after launching a feature. Unfortunately, these events often weren't captured ahead of time, and we would have to wait for the next release to instrument them. Given the long release cycles for mobile apps, it would take us at least one to two weeks to debug and fix such issues. We would've iterated much faster if we had more complete data.
On the other hand, it can sometimes be challenging to anticipate what kinds of events will be needed post-launch to drive analytics or other use cases. Spending upfront effort to instrument events of questionable utility is a difficult trade-off. So product teams risk failing to capture critical events and insights, whether because of the engineering effort involved or plain lack of foresight. Auto-tracking (aka auto-capture) has been offered as a solution to this problem. In this post, we will describe why auto-tracking has largely failed to deliver on its "no-code analytics" promise and when it can still be useful. In a subsequent post, we will present a superior alternative to auto-tracking that we have developed at Syft.
Auto-tracking 101
With auto-tracking, all user interactions on a website or mobile app are automatically captured for analytics. For web apps, this involves tracking and storing all DOM events, such as clicks, form submissions, and even mouse movements, that fire as users interact with the app. This clickstream data is then manually converted into analytics events, typically by a product manager or a data team member using a web-based tool. "Instrumented tracking", on the other hand, involves changing the code to log events directly.
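To make this concrete, here is a minimal sketch of what a web auto-tracker does under the hood. It is illustrative only: the /collect endpoint and the record fields are invented for this example, not any vendor's actual implementation.

```ts
// Minimal sketch of web auto-capture. Illustrative only: the /collect
// endpoint and record fields are invented for this example.
document.addEventListener(
  "click",
  (e) => {
    const el = e.target as HTMLElement;
    // The auto-tracker only sees the raw DOM context of the interaction.
    const record = {
      type: "click",
      ts: Date.now(),
      tag: el.tagName,
      id: el.id || null,
      classes: el.className || null,
      // Note: grabbing on-screen text is how sensitive data can leak in.
      text: (el.textContent || "").slice(0, 64),
      page: location.pathname,
    };
    // Real tools batch these records and ship them to their servers.
    navigator.sendBeacon("/collect", JSON.stringify(record));
  },
  { capture: true } // capture phase, so stopped propagation can't hide clicks
);
```

Semantic events such as "Checkout Started" are then defined after the fact by matching patterns (CSS selectors, page URLs) against these raw records.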
Auto-tracking is the default event collection method in tools such as Heap, Fullstory, Pendo, and Freshpaint, which also offer instrumented tracking as an option. Mixpanel, Segment, and Rudderstack have discontinued their support for auto-tracking over the years, while Amplitude and Snowplow have never supported it. The auto-tracking camp champions it as the future of analytics, or at least a practical option, while the instrumented tracking camp is fairly opinionated against its use. So, who's right?
Auto-tracking Pros/Cons
The arguments in favor of auto-tracking broadly center on two pain points:
- It is difficult to anticipate all the questions you will need to answer with data, so it is better to capture every user interaction. Auto-tracking acts as insurance against not having the foresight to predict your data needs in advance.
- Even when you know precisely what you want to capture, you might not have dedicated resources on the product engineering team to implement instrumented tracking.
On the other hand, auto-tracking has plenty of downsides:
Poor quality data. Auto-tracking is brittle: it breaks when the UX implementation (HTML/CSS) changes as engineers update the product. When this happens, events defined on top of this clickstream data stop flowing, and you have to repair them. Besides data loss, auto-tracking is also prone to capturing sensitive user data.
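As a hypothetical illustration of this brittleness (the selector and event name here are made up):

```ts
// Hypothetical "virtual event" defined in an auto-tracking tool's UI:
// fire "Checkout Started" whenever a click matches this CSS selector.
const virtualEvent = {
  event: "Checkout Started",
  selector: "button.checkout-btn",
};

// Months later, an engineer restyles checkout and renames the class:
//   <button class="checkout-btn">  -->  <button class="buy-now">
// Nothing crashes and no test fails, but clicks no longer match the
// selector, so "Checkout Started" silently stops flowing until someone
// notices the gap in a dashboard and repairs the definition.
```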
Incomplete data. Auto-tracking is limited to capturing user interaction data on front-end clients; it cannot collect server-side events. Auto-tracked events can capture extra metadata (fields) only if they are rendered on the screen, e.g. for an OrderPlaced event to include a product SKU, the SKU would have to be displayed on the page. To get more comprehensive events, you have to switch to instrumented tracking or change your UX code - a departure from the promised "codeless" world.
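For contrast, here is a hedged sketch of server-side instrumented tracking. The track() helper and the order types below are hypothetical stand-ins for your own analytics SDK and business logic, not a specific vendor's API.

```ts
// Sketch of server-side instrumented tracking. The track() helper and
// the order types are hypothetical stand-ins, not a specific vendor SDK.
type OrderItem = { sku: string; priceCents: number };
type Order = { id: string; items: OrderItem[]; totalCents: number };

function track(e: {
  userId: string;
  event: string;
  properties: Record<string, unknown>;
}): void {
  // Forward to your analytics backend of choice.
}

async function placeOrder(userId: string, items: OrderItem[]): Promise<Order> {
  const order: Order = {
    id: Math.random().toString(36).slice(2), // placeholder ID generation
    items,
    totalCents: items.reduce((sum, i) => sum + i.priceCents, 0),
  };
  // Emitted where the business logic runs, so the event can carry fields
  // (SKUs, totals, internal IDs) that are never rendered on any screen.
  track({
    userId,
    event: "OrderPlaced",
    properties: {
      orderId: order.id,
      skus: order.items.map((i) => i.sku),
      totalCents: order.totalCents,
    },
  });
  return order;
}
```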
More effort. With auto-tracking, you still have to manually define events from the clickstream. It just happens post-capture and is typically done by a product manager or a member of the data team. Converting UX events into semantic events requires knowing the details of the UX implementation, which can change without their knowledge. And when that happens, they have to engage the engineering team. Again, not exactly a "developer-free" experience.
Expensive. Auto-tracking collects too much data, and all that extra storage and processing cost is reflected in the vendors' pricing. To mitigate this, customers end up using it for only parts of the app, defeating the purpose of the "capture everything" approach. It is also expensive for end users: auto-capturing data, especially on mobile devices, consumes device resources and network bandwidth.
So, auto-tracking results in less trustworthy, less complete data and isn't exactly a codeless solution. More importantly, it creates more work for the product managers, developers, and data engineers involved in the data creation process. Developers are unaware of the dependencies that data consumers take on their code, so they can change it without knowing that tracking is going to break. As teams iterate faster on their product, keeping auto-tracking functional requires more effort, most of which is spent firefighting data quality issues. This creates a negative feedback loop, which is why most companies outgrow auto-tracking pretty quickly as they scale.
So, should you use auto-tracking?
In our experience, the time savings from supposedly code-less tracking do not justify the poor-quality, limited data that auto-tracking collects. The trade-off makes even less sense in a world where modeling and coding tasks can be automated with AI (which is what Syft does). We also think that in the vast majority of cases, product teams know the broad outlines of the questions they want to answer; they have enough information to implement instrumented tracking.
That said, there are a few situations where auto-tracking is useful:
- If you want to replay user sessions for debugging unexpected UX or performance issues, then you should use dedicated session replay tools such as Fullstory that auto-capture all user interaction events with high fidelity.
- If you are prototyping and iterating on a rapidly changing feature, you can start with auto-tracking to collect initial data, then migrate your events to instrumented tracking as the product solidifies.
- If you are paranoid and want an insurance policy against the risk of not collecting critical events due to an oversight, you can enable auto-tracking for a limited time after releasing a feature.
The flowchart above summarizes the decision tree for which kind of tracking to use when. Note that auto-tracking vs. instrumented tracking is not always an either-or choice. If you use auto-tracking tools like Heap, you can migrate your events to instrumented tracking as you iterate, as sketched below. Going in the other direction, i.e. moving an event from instrumented tracking to auto-tracking, is rare.
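What that migration can look like in practice, sketched with the hypothetical names from the earlier examples: re-emit the same semantic event name from code so existing charts keep working, then retire the selector-based definition.

```ts
// Migrating "Checkout Started" from a selector-based virtual event to
// instrumented tracking (reusing the hypothetical names from above).
declare const currentUserId: string; // assumed to exist in app state
declare function track(e: {
  userId: string;
  event: string;
  properties: Record<string, unknown>;
}): void;

const checkoutButton = document.querySelector<HTMLButtonElement>("#checkout");
checkoutButton?.addEventListener("click", () => {
  // Same event name the dashboards already use, now emitted explicitly
  // from app code so it survives any restyling of the button.
  track({
    userId: currentUserId,
    event: "Checkout Started",
    properties: { page: location.pathname },
  });
});
// Once the instrumented event is verified end to end, delete the CSS
// selector definition in the auto-tracking tool to avoid double counting.
```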
The future: Automatic Instrumented Tracking
Instrumented tracking is more reliable and robust from a data quality perspective. However, it currently requires a lot of effort from product, engineering, and data teams to implement. Many teams that buy tools like Amplitude and Mixpanel have difficulty instrumenting their products because these tools provide no built-in guidance on which events to collect or how to robustly instrument them. Wouldn't it be great if we could eliminate the modeling and implementation effort, as auto-tracking promises (but fails to deliver)? In the next post, we will describe how Syft solves this problem. Please sign up below to get notified when it lands!