The “Bullshit AI” test

There is a running joke on Twitter that if you take a “.ai” domain name and put LLMs, Generative AI, or GPUs on a pitch deck, someone will knock on your door with a term sheet.

With all of those buzzwords, it’s hard to tell whether some AI is bullshit or not, so I’ll help you out with a 3-step bullshit detector.

But first, who am I to talk about that?

My story as a founder building with AI

I’m the founder of Cycle – a product feedback platform. We help you centralize all of your customer feedback from many different sources like Slack, Intercom & Gong. We bring it all into one neat collaborative space in which you can extract customer insights and close the feedback loop at each release.

We’ve been working on Cycle for the past 5 years. For the first 3.5 years we didn’t have a single AI feature. Then, ChatGPT launched 1.5 years ago and it changed the game in a non-bullshit way for us. Cycle already worked without AI. But product folks were spending hours every week manually processing feedback... So we went all-in on AI and revisited our entire UX to automate all of that feedback processing grunt work.

That's our feedback autopilot. Feedback management without the management. I’ve recorded a sweet product demo for y’all if you’re curious👇

We were in a tough spot and AI saved us in a way. The new AI paradigm is both a threat and an opportunity for every business; we decided to see it as an opportunity.

As the line often attributed to Darwin goes, it is not the strongest of the species that survives, nor the most intelligent, but the one most adaptable to change.

Fortunately for us, we were able to adapt very fast. And today, I want to tell you more about how we adapted and give you a few tips that will hopefully help you adapt too.

Bullshit vs non-bullshit AI

So, how do you tell the difference between bullshit and non-bullshit AI? By using a 3-step framework that I call the “Bullshit AI” test. You basically need to answer three distinct questions:

  1. Forget about AI. How does it work without AI?
  2. How does AI make the core of your product 10x better or faster?
  3. How hard is it to get your AI job done with a bunch of prompt templates and a ChatGPT tab open?

Let’s dive in! 👇

Question 1: Forget about AI. How does it work without AI?

"Talk to me about the rest of your product”

The first thing to do is to shift the conversation. Having a cool AI idea is nice but you still need to start with a real customer experience and have real software behind it.

Don’t tell me about technology, tell me about the problems you solve for your customers. What non-obvious insight do you have about your customers? What workflows does your product encourage?

Steve Jobs said it well: “You’ve got to start with the customer experience and work backwards to the technology.”

Then, you still need to go and build an incredible piece of software. It’s a huge amount of work.

Before you tell me about your AI features, I want to know:

  1. What's your product's dictionary? What’s the minimum set of new words people need to understand before using your product? If that list is long... 🚩
  2. What's your object model? How simple is your API? If there's no compelling answer... 🚩
  3. How fast is your product? If you’re building a project management tool, is it as fast as Linear? If not... 🚩

The fundamentals of building a remarkable product never change. There's no shortcut.

Question 2: How does AI make the core of your product 10x better or faster?

If your "AI" doesn't make the core of your product 10x better or faster, it's probably bullshit AI. Not a side use case. The core of your product. Not 2x. 10x.

Think about what users spend the most time doing in your product. For each task, multiply time spent per task by the frequency at which the task is performed. This will tell you where your non-bullshit AI opportunities are.
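To make that concrete, here’s a rough sketch of that prioritization in TypeScript – the task names and numbers are made up for illustration, not real Cycle data:

```typescript
// Rank AI opportunities by total time consumed: minutes per task × times per week.
// Task names and numbers are illustrative only.
type Task = { name: string; minutesPerTask: number; timesPerWeek: number };

const tasks: Task[] = [
  { name: "Process incoming feedback", minutesPerTask: 5, timesPerWeek: 60 },
  { name: "Write release notes", minutesPerTask: 30, timesPerWeek: 1 },
  { name: "Tag a customer quote", minutesPerTask: 1, timesPerWeek: 100 },
];

const ranked = tasks
  .map((t) => ({ ...t, totalMinutes: t.minutesPerTask * t.timesPerWeek }))
  .sort((a, b) => b.totalMinutes - a.totalMinutes);

console.log(ranked); // The top of this list is where your non-bullshit AI opportunities live.
```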

Think about GitHub Copilot. The core of an engineer’s job is writing code, and with GitHub Copilot, engineers can hit tab to finish their sentences. Sure, it doesn’t take a lot of time to finish a sentence, but they might do it 50x a day, so in the end it makes them 10x faster.

On the contrary, user-prompt-driven content generation as a feature in a product that is not about writing content… is about as lame a use of LLMs as you can get.

So… guess what's the first AI feature we shipped at Cycle?… 🥁🥁🥁

Yup… Basic prompts for content generation 🙈

That’s ok, everyone needs to go through a bit of bullshit once in a while on their way to greatness.

Question 3: How hard is it to get your AI job done with a bunch of prompt templates and a ChatGPT tab open?

Finally, how hard is it to get your AI job done with ChatGPT? If it’s easy, you’re just a thin layer on top of OpenAI and you’ll probably get destroyed at OpenAI’s next release.

You don’t want to be that startup being killed by OpenAI… do you?

A good test for whether your AI is bullshit or not: are you excited or scared while watching OpenAI's announcements? (hint: you should be excited)

So, how do you avoid getting killed by OpenAI? Make sure your AI queries lead to actions performed within your product, as opposed to basic text-based answers. The more actions performed by AI, the better.

Let’s take the example of Google Sheets and assume you want to make negative numbers red in a certain column:

In the past you’d have to click your way through 6 different actions. In the future, you’ll just type: “please make negative numbers red in column B.”

It’s particularly powerful when users know what they want to do but they don’t know how to achieve it in some complex software. It directly enhances the core product experience and it can’t be done with ChatGPT.
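Here’s a minimal sketch of what “answers as actions” could look like, assuming a hypothetical action shape and applyConditionalFormat helper – this is not Google Sheets’ real API, just an illustration of the model returning a structured action instead of text:

```typescript
// The model is asked to emit a structured action that the product executes,
// rather than a paragraph of text. Everything below is hypothetical.
type FormatAction = {
  type: "conditional_format";
  column: string;        // e.g. "B"
  condition: "negative"; // value < 0
  color: "red";
};

// What the model would return for: "please make negative numbers red in column B"
const action: FormatAction = {
  type: "conditional_format",
  column: "B",
  condition: "negative",
  color: "red",
};

function applyConditionalFormat(a: FormatAction): void {
  // In a real product this would update the spreadsheet's formatting rules.
  console.log(`Coloring ${a.condition} values in column ${a.column} ${a.color}`);
}

applyConditionalFormat(action); // The output is a change in the product, not a text answer.
```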

Next level AI is doing the same but on autopilot. You need to ask yourself: of all the tasks that have to happen in your product, which ones can you remove entirely? Try to kill them with end-to-end automation.

A great example is Fin from Intercom. Fin is an AI chatbot that automatically solves customer issues with conversational answers based on your support content:

Think about it: Fin just automated an entire workflow, which is what autopilots do.

The future of AI: autopilots, not copilots

Non-bullshit AI tends to replace large chunks of workflow. That’s the exact goal of autopilots. Yes, autopilots: AI performing tasks and generating insights – all by itself.

But autopilots are hard to build. They’re hard, not from a technical standpoint but from a UX standpoint. So let’s dig into some hard-won UX learnings from our journey building the ultimate feedback autopilot at Cycle.

Cycle’s journey building a non-bullshit autopilot that works

Spotting the right AI opportunity

First, we identified feedback processing as the highest-leverage use case for AI. It’s what users spend the most time doing in the product, and it’s very hard to do with ChatGPT because the output is a set of actions performed within the product. In short, we made sure we passed the “Bullshit AI” test:

Defining feedback autopilot’s job

Feedback autopilot’s job, end to end (sketched in code right after the list):

  1. Feedback comes in.
  2. Read the feedback.
  3. Summarize the feedback.
  4. Find relevant customer quotes.
  5. Categorize the customer quotes.
  6. Link them to the right features or problems.
  7. If no existing feature/problem was found, create one.
  8. Mark the feedback as processed.
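As a rough sketch, the pipeline above could look something like this – every type and helper below is a hypothetical stand-in, not Cycle’s actual code; the point is that the output is a set of actions performed inside the product:

```typescript
// Hypothetical pipeline sketch of the steps listed above.
type Feedback = { id: string; content: string };
type Quote = { id: string; text: string };
type Feature = { id: string; title: string };

// Helper signatures only – real implementations would call LLMs and the product's backend.
declare function summarize(f: Feedback): Promise<string>;
declare function extractCustomerQuotes(f: Feedback): Promise<Quote[]>;
declare function categorize(q: Quote): Promise<string>;
declare function findMatchingFeature(q: Quote, category: string): Promise<Feature | null>;
declare function createFeatureFromQuote(q: Quote, category: string): Promise<Feature>;
declare function linkQuoteToFeature(q: Quote, f: Feature): Promise<void>;
declare function markAsProcessed(f: Feedback, summary: string): Promise<void>;

async function runFeedbackAutopilot(feedback: Feedback): Promise<void> {
  const summary = await summarize(feedback);                  // 2-3. read & summarize
  const quotes = await extractCustomerQuotes(feedback);       // 4. find relevant quotes
  for (const quote of quotes) {
    const category = await categorize(quote);                 // 5. categorize
    const match = await findMatchingFeature(quote, category); // 6. link to a feature/problem…
    const feature = match ?? (await createFeatureFromQuote(quote, category)); // 7. …or create one
    await linkQuoteToFeature(quote, feature);
  }
  await markAsProcessed(feedback, summary);                   // 8. mark as processed
}
```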

Like Fin, we killed large chunks of workflows with end-to-end automation 👀

Creating a UX edge, release by release

We shipped our first insight extraction feature 1.5 years ago:

Then we added automatic insight categorization:

After that, we radically improved AI results with custom context per workspace:

We then added support for custom insight types to adapt to any team's typology:

Then, we faced issues with AI creating generic and irrelevant features, so we found a solution: prompts that keep improving by themselves with better context.

We made sure that AI-generated features matched teams’ naming styles according to the five latest features they created in their workspace – feature category by feature category.

The insight is that folks don't write their problem titles the same way they write their feature, improvement, or bug titles. Cycle's AI takes this into account to generate titles that match each team's style for each category:
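Here’s a sketch of how that could work – the prompt wording and the buildTitlePrompt helper are assumptions, not Cycle’s actual prompts – where the model gets the team’s five most recent titles in the same category as few-shot examples:

```typescript
// Build a title-generation prompt seeded with the workspace's five latest titles
// for the relevant category. Prompt wording is illustrative only.
type Category = "feature" | "improvement" | "bug" | "problem";

function buildTitlePrompt(quote: string, category: Category, recentTitles: string[]): string {
  const examples = recentTitles
    .slice(0, 5) // five latest titles from this workspace, same category
    .map((t) => `- ${t}`)
    .join("\n");

  return [
    `You are naming a new ${category} from a customer quote.`,
    `Match the naming style of these recent ${category} titles from the same workspace:`,
    examples,
    `Customer quote: "${quote}"`,
    `Reply with a single title in the same style.`,
  ].join("\n\n");
}
```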

After that, we made our AI a bit more humble. We used to link each customer quote to its closest feature – based on a matching score that semantically determines how close two objects are.

The thing is, our AI tended to confidently match features a bit too often, especially in large workspaces. It felt a little bit like a 10-year-old guessing.

Also, in cases where three distinct features were an equally good match for a given quote, Cycle would arbitrarily pick the first one, even if its matching score was only marginally higher than the other two. This overconfidence wasn't creating trust.

So we changed the matching logic: if there's no obvious match, we no longer arbitrarily pick the first feature. Instead, we let the user choose between a set of recommendations or create a new feature from scratch:
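A minimal sketch of that “more humble” matching logic, with made-up threshold numbers: only auto-link when the best match is both good enough and clearly ahead of the runner-up, otherwise surface the top candidates and let the user decide.

```typescript
// Hypothetical decision logic; the 0.8 / 0.1 thresholds are placeholders.
type ScoredFeature = { featureId: string; score: number }; // score = semantic similarity, 0..1

type MatchDecision =
  | { kind: "auto_link"; featureId: string }
  | { kind: "ask_user"; candidates: ScoredFeature[] };

function decideMatch(scored: ScoredFeature[], minScore = 0.8, minMargin = 0.1): MatchDecision {
  const ranked = [...scored].sort((a, b) => b.score - a.score);
  const best = ranked[0];
  const second = ranked[1];

  if (
    best !== undefined &&
    best.score >= minScore &&
    (second === undefined || best.score - second.score >= minMargin)
  ) {
    return { kind: "auto_link", featureId: best.featureId };
  }
  // No obvious winner: recommend the top candidates, or let the user create a new feature.
  return { kind: "ask_user", candidates: ranked.slice(0, 3) };
}
```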

We then added multilingual support for AI categorization. If customer quotes were in French or Chinese but features were in English, it used to create issues, so we upgraded our feature-matching logic.

Each time a quote is created, Cycle's AI now goes through three successive semantic searches. That way, we're sure to find the right match (if there's one):
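The post doesn’t detail what the three passes are, so the cascade below is purely an assumption for illustration – searching on the original text, then on a translation into the workspace’s language, then over feature descriptions – with hypothetical helpers:

```typescript
// Hypothetical three-pass semantic search cascade; the stages are assumed, not Cycle's actual logic.
type Feature = { id: string; title: string };

declare function semanticSearch(query: string, corpus: "titles" | "descriptions"): Promise<Feature | null>;
declare function translateToWorkspaceLanguage(text: string): Promise<string>;

async function findFeatureForQuote(quoteText: string): Promise<Feature | null> {
  return (
    (await semanticSearch(quoteText, "titles")) ??                                     // pass 1
    (await semanticSearch(await translateToWorkspaceLanguage(quoteText), "titles")) ?? // pass 2
    (await semanticSearch(quoteText, "descriptions"))                                  // pass 3
  );
}
```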

We finally felt ready to automate everything with an autopilot:

But it didn't work at first: the AI was creating too much junk and users were switching it off, so we had to find a solution. The solution is called “AI-generated, user-verified”:

It boils down to three principles to create trust:

  1. What's generated on autopilot should have an "AI-generated" tag
  2. It should be possible for users to (bulk) verify or discard AI-generated things with one click
  3. It should be easy to filter any view by the AI tag – do you want to see all data? only user-generated data? only AI-generated, user-verified data?
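Those three principles boil down to tracking provenance on every AI-generated object. Here’s a minimal sketch, assuming a hypothetical Insight type – this is not Cycle’s actual schema:

```typescript
// Principle 1: every object carries a provenance tag.
type Provenance = "user" | "ai_generated" | "ai_generated_user_verified";
type Insight = { id: string; title: string; provenance: Provenance };

// Principle 2: bulk-verify AI-generated items in one click.
function bulkVerify(insights: Insight[]): Insight[] {
  return insights.map((i): Insight =>
    i.provenance === "ai_generated" ? { ...i, provenance: "ai_generated_user_verified" } : i
  );
}

// Principle 3: filter any view by provenance (all data, user-generated only, AI-generated & verified only…).
function filterByProvenance(insights: Insight[], keep: Provenance[]): Insight[] {
  return insights.filter((i) => keep.includes(i.provenance));
}
```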

As if that wasn’t enough, we then had to deal with the real-time nature of Cycle. We needed a way to let folks know when AI is performing tasks on an object they’re looking at. So we designed a nice little toaster that tells you when AI is performing actions – in real time:
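Under the hood, something like this could drive that toaster – a generic pub/sub shape with hypothetical event names and helpers, not Cycle’s actual real-time stack:

```typescript
// Show a toast when AI performs an action on the document the user is currently viewing.
// subscribe() and showToast() are hypothetical helpers.
type AiActivityEvent = {
  docId: string;
  action: "summarizing" | "extracting quotes" | "linking features";
};

declare function subscribe(channel: string, handler: (e: AiActivityEvent) => void): () => void;
declare function showToast(message: string): void;

function watchAiActivity(openDocId: string): () => void {
  // Returns an unsubscribe function so the watcher can be torn down on navigation.
  return subscribe("ai-activity", (event) => {
    if (event.docId === openDocId) {
      showToast(`AI is ${event.action} on this doc…`);
    }
  });
}
```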

In short, we’ve been relentlessly iterating on our AI to create a UX edge. We do have a technical edge too, but that’s the rest of our product. When it comes to AI, every team is using the same models anyway so the way to create differentiation and defensibility is to design the best user experience.

These many iterations were needed for us to build the best feedback autopilot on the market. And it started paying off for us:

That being said, it’s just the beginning. I’m bullish about what’s to come as we’ll continue to define and design new ways to build (non-bullshit) autopilots 🤗

Thank you to Des Traynor from Intercom for inspiring us all founders to push ourselves and build better products with AI. Some of the ideas of this blog post were inspired by a talk Des gave at La Cristallerie in Paris on April 9th, 2024.