Data Assistant

Data Assistant provides suggestions and automations that improve data quality and make data governance easier.

react

design systems

nestjs

postgres

LLMs

Consider the events you track using Amplitude today. Some of that data is likely more important than the rest. For example, a purchase event on an ecommerce site is probably more critical to your business than a view of a product detail page. We built Data Assistant to look at a combination of factors, including the number of queries on each data point and the event volume, to determine an importance score.
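To make the idea concrete, here is a minimal sketch of how query count and event volume could be blended into a single score. The formula, the weights, and the sample numbers are all assumptions for illustration; the actual scoring used by Data Assistant is not described here.

```python
import math

def importance_score(query_count: int, event_volume: int,
                     query_weight: float = 0.7, volume_weight: float = 0.3) -> float:
    """Blend query usage and event volume into one score (hypothetical formula)."""
    # log1p dampens very large counts so raw volume alone does not dominate.
    return (query_weight * math.log1p(query_count)
            + volume_weight * math.log1p(event_volume))

# Made-up numbers: a heavily queried purchase event vs. a high-volume page view.
events = {
    "Purchase Completed": {"query_count": 420, "event_volume": 15_000},
    "Product Detail Viewed": {"query_count": 35, "event_volume": 900_000},
}

ranked = sorted(events, key=lambda name: importance_score(**events[name]), reverse=True)
print(ranked)
```

With these assumed weights, the purchase event ranks above the page view even though it fires far less often, because it is queried much more.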

Data Assistant also evaluates each event against best practices we've identified while working with thousands of companies. For example, grouping similar events into categories helps other Amplitude Analytics users find the event they're looking for more easily.

By combining event prioritization and categorization, Data Assistant can compile a list of suggestions that companies can act on to improve their data quality.
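As a rough illustration of what such a list could look like, the sketch below walks events from most to least important and flags missing metadata. The `Event` fields and the `build_suggestions` helper are hypothetical names chosen for this example, not part of any Amplitude API.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    importance: float      # assumed to be computed elsewhere
    description: str = ""
    category: str = ""
    owner: str = ""

def build_suggestions(events):
    """Return (event name, suggestion) pairs, most important events first."""
    suggestions = []
    for event in sorted(events, key=lambda e: e.importance, reverse=True):
        for field_name in ("description", "category", "owner"):
            # An empty string means the field has not been filled in yet.
            if not getattr(event, field_name):
                suggestions.append((event.name, f"add {field_name}"))
    return suggestions

sample = [
    Event("Product Detail Viewed", 2.0, description="Viewed a product page"),
    Event("Purchase Completed", 7.0, category="Checkout"),
]
for name, suggestion in build_suggestions(sample):
    print(f"{name}: {suggestion}")
```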

Having a recommended list of these suggestions is great, but imagine trying to execute this across a large taxonomy. Going event-by-event, adding a description, categorizing, setting an owner, etc. is not an efficient way to work.

To make this process easier, we've built what we learned from how our customers actually work into Data Assistant.

For example, instead of just identifying that several unrelated events are missing categories, Data Assistant finds similar events and suggests a category grouping appropriate to these events.

We use two methods for determining event similarity. The first method learns from the sequences of events in your own data to identify common patterns, while the second method uses semantic embeddings of the event names to identify common concepts. Both of these are combined to create the final recommendations.
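The sketch below shows one way two such signals could be combined. It is an illustration under stated assumptions, not the production method: the sequence signal is approximated by counting which events appear directly next to each other, the semantic embedding is replaced by a simple bag of name tokens as a stand-in, and the equal weighting is arbitrary.

```python
import math
from collections import Counter, defaultdict

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def sequence_vectors(sessions):
    """For each event, count the events seen directly before or after it."""
    vectors = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            vectors[prev][nxt] += 1
            vectors[nxt][prev] += 1
    return vectors

def name_vector(event_name: str) -> Counter:
    # Stand-in for a semantic embedding: a bag of lowercase name tokens.
    return Counter(event_name.lower().split())

def combined_similarity(a, b, seq_vectors, seq_weight=0.5):
    """Weighted average of sequence-based and name-based similarity."""
    seq_sim = cosine(seq_vectors.get(a, Counter()), seq_vectors.get(b, Counter()))
    name_sim = cosine(name_vector(a), name_vector(b))
    return seq_weight * seq_sim + (1 - seq_weight) * name_sim

sessions = [
    ["View Cart", "Add Coupon", "Checkout"],
    ["View Cart", "Remove Coupon", "Checkout"],
    ["Search", "View Product"],
]
vectors = sequence_vectors(sessions)
print(combined_similarity("Add Coupon", "Remove Coupon", vectors))
print(combined_similarity("Add Coupon", "Search", vectors))
```

In this toy data, "Add Coupon" and "Remove Coupon" share both neighbors and a name token, so they score higher than an unrelated pair and would be candidates for the same category.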

Data Assistant also integrates large language models (LLMs), the latest in AI technology, to provide suggested text for your descriptions. Behind the scenes, we use OpenAI with event metadata, like the event name and category, to generate a description for you. We've built all of this with privacy and security in mind, following our AI principles.

And, as part of our commitment to transparency: description suggestions use OpenAI's APIs with controls in place so that no data is used to train or improve OpenAI's models, and with a strict 30-day deletion policy. When using Data Assistant, you can be confident that none of your end user data will be sent to OpenAI.
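To illustrate the metadata-only approach, here is a small sketch of a prompt builder that accepts nothing but taxonomy fields. The function name and the prompt wording are hypothetical, and the call to the model itself is deliberately left out.

```python
def build_description_prompt(event_name: str, category: str = "") -> str:
    """Build prompt text from taxonomy metadata only (hypothetical wording)."""
    lines = [
        "Write a one-sentence description for an analytics event.",
        f"Event name: {event_name}",
    ]
    # The category is optional, so include it only when one is set.
    if category:
        lines.append(f"Category: {category}")
    return "\n".join(lines)

prompt = build_description_prompt("Purchase Completed", "Checkout")
print(prompt)
```

Because the function takes only an event name and a category, there is no path for end user data to enter the prompt.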