A Soft Deletion Story

Part 1: It can be harder than it sounds

To start, what is soft deletion? It’s just adding some data to a database record (like a flag or a timestamp) to mark something as “deleted”, so it remains in the database but is no longer shown. There are a few reasons to do this, but the big one is making the record easy to restore (just remove that mark-as-deleted data).

This is a well-known and -established pattern, and there are several packages to easily (or even automatically) handle this — or at least that’s what I’m used to from the Rails and ActiveRecord world.

Wait, let me clear something up right now. The manner of handling soft deletion can be contentious, mostly related to the question of just how automatically it should be dealt with. For that matter, the manner of handling deletion at all can be contentious. Actually, probably the manner of handling anything database-related can be contentious. But that’s not what this blog post is about. For now, we’re talking about soft deletion. It’s happening.

So, ActiveRecord. It has scopes, and a lot of things go through associations. That makes this pretty straightforward. It also has default scopes that let you do things automatically, which can be a blessing and a curse. There are really two main packages for this: paranoia (which works on the “automatic” principle), and discard (which wants you to be more explicit).

I’m currently on a project that’s in Elixir. I’ve been learning plenty about Elixir vs. Ruby, and Phoenix vs. Rails, and Ecto vs. ActiveRecord. On top of that, since this is my first significant Elixir project, I’m not entirely sure if anything I come across is project-specific, or if it’s a general framework thing. For instance, Ecto queries in this project are all specifying joins rather than using associations.

So when I looked for soft-deletion packages for Ecto and didn’t find anything compelling, I took it mostly in stride but determined I should do a little more searching. Then I found this blog post and was intrigued by the idea of simply using views. It seemed like it would do a lot of good things without much drawback.

It was so stupidly simple, and what could go wrong?

Create a view that’s defined as select * from the_table where is_deleted = false;
Change the module to use the view instead of the table
Profit

What could go wrong, indeed. At this point, let’s talk about something more specific and introduce some concrete concepts. Let’s say we have Places and Activities, and of course a Place can have multiple Activities and an Activity can be at multiple Places. This is what this would normally look like.

defmodule SoftDelete.Repo.Migrations.CreateTables do
  use Ecto.Migration
  
  def change do
    create table(:places) do
      add :name, :string, null: false
      add :is_deleted, :boolean, null: false, default: false
    end

    create table(:activities) do
      add :name, :string, null: false
      add :type, :string
      add :is_deleted, :boolean, null: false, default: false
    end

    create unique_index(:activities, [:name])

    create table(:place_activities) do
      add :place_id, references(:places), null: false
      add :activity_id, references(:activities), null: false
    end

    create unique_index(:place_activities, [:place_id, :activity_id])
  end
end

defmodule SoftDelete.Place do
  use Ecto.Schema

  alias SoftDelete.PlaceActivity
  alias SoftDelete.Activity

  schema "places" do
    field :name, :string
    field :is_deleted, :boolean

    many_to_many :activities, Activity, join_through: PlaceActivity
  end
end

defmodule SoftDelete.Activity do
  use Ecto.Schema

  alias SoftDelete.PlaceActivity
  alias SoftDelete.Place

  schema "activities" do
    field :name, :string
    field :type, :string
    field :is_deleted, :boolean

    many_to_many :places, Place, join_through: PlaceActivity
  end
end

defmodule SoftDelete.PlaceActivity do
  use Ecto.Schema

  import Ecto.Query

  alias SoftDelete.Place
  alias SoftDelete.Activity

  schema "place_activities" do
    belongs_to :place, Place
    belongs_to :activity, Activity
  end
end

So now it’s time to go a step further and add some views.

defmodule SoftDelete.Repo.Migrations.AddSingleModelViews do
  use Ecto.Migration

  def up
    execute """
      CREATE VIEW active_places AS
        SELECT * FROM places WHERE is_deleted = false;
    """

    execute """
      CREATE VIEW active_activities AS
        SELECT * FROM activities WHERE is_deleted = false;
    """

    create_if_not_exists index(:places, [:is_deleted])
    create_if_not_exists index(:activities, [:is_deleted])
  end

  def down do
    execute "DROP VIEW active_places;"
    execute "DROP VIEW active_activities;"
  end
end

And what needs to be done to the models? Literally just change the schema lines to use active_places and active_activities, respectively. That’s all.

This works pretty easily, in many ways. Plain queries are just working!

Place |> Repo.all()
Place |> preload(:activities) |> Repo.all()

That seems like perfection, but issues come up kind of quickly. What if you don’t want to load all the activities, but just the number of activities? That’s also nice because it can be done with a single query instead of two.

Place 
|> join(:left, [p], pa in PlaceActivity, on: pa.place_id == p.id)
|> group_by([p], p.id)
|> select([p, pa], %{id: p.id, activity_count: count(pa.id)})
|> Repo.all()

However, marking an activity as deleted doesn’t change the count. That can be fixed with another join and checking the flag, but that’s another thing to keep in mind whenever writing a query. What about another view?

defmodule SoftDelete.Repo.Migrations.AddJoinModelView do
  use Ecto.Migration

  def up do
    execute """
      CREATE VIEW active_place_activities AS
        SELECT pa.*
        FROM place_activities AS pa
        JOIN active_places AS ap ON pa.place_id = ap.id
        JOIN active_activities AS aa ON pa.activity_id = aa.id;
    """
  end

  def down do
    execute "DROP VIEW active_place_activities;"
  end
end

And you know what happened with the model here, right? Yup, it’s schema “active_place_activities”.

This view is a little more complicated, but now we have some nice automatic stuff. It’s working nicely.

But I’m not enjoying getting a simple Map as a result. I’d rather get the real thing and maybe be able to preload the activities, so that’s easy: just add this count as a virtual attribute and use the struct-update syntax.

Place
|> join(:left, [p], pa in PlaceActivity, on: pa.place_id == p.id)
|> group_by([p], p.id)
|> select([p, pa], %{p | activity_count: count(pa.id)})
|> Repo.all()

Wait, what’s this?

** (Postgrex.Error) ERROR 42803 (grouping_error) column "a0.name" must appear in the GROUP BY clause or be used in an aggregate function
    query: SELECT a0."id", a0."name", a0."is_deleted", count(a1."id") FROM "active_places" AS a0 LEFT OUTER JOIN "active_place_activities" AS a1 ON a1."place_id" = a0."id" GROUP BY a0."id"

Okay, this sucks. I guess I’ll add the grouping for now. And it works, but something seems fishy here. To check it out, I go directly to the database to see what’s happening.

soft_delete_dev=# SELECT a0."id", a0."name", a0."is_deleted", count(a1."id") FROM "active_places" AS a0 LEFT OUTER JOIN "active_place_activities" AS a1 ON a1."place_id" = a0."id" GROUP BY a0."id";

ERROR:  column "a0.name" must appear in the GROUP BY clause or be used in an aggregate function

LINE 1: SELECT a0."id", a0."name", a0."is_deleted", count(a1."id") F...
                        ^
soft_delete_dev=# SELECT a0."id", a0."name", a0."is_deleted", count(a1."id") FROM "places" AS a0 LEFT OUTER JOIN "active_place_activities" AS a1 ON a1."place_id" = a0."id" GROUP BY a0."id";

 id |    name     | is_deleted | count 
----+-------------+------------+-------
  1 | test one    | f          |     1
  3 | test two    | f          |     1
  5 | other place | f          |     0
  4 | some place  | f          |     1
(4 rows)

This is just the beginning of how views are troublesome. I’m not (currently) worried about the performance of the queries — how they’re automatically including where clauses and joins. What got to me is how quickly this became confusing and unhelpful.

When operating on a table, Postgres realizes that the primary key is special, and grouping by that column means only a single row will be returned from that table, and the rest of the columns can be used as-is. Postgres doesn’t know this about a view.

Interestingly, Postgres understands this sort of “simple” (single-table) view and allows writes to it — inserts, updates, deletes — automatically operating on the underlying table. But just like it doesn’t know about the primary key, it doesn’t know about unique indexes. So there are situations where an INSERT … ON CONFLICT UPDATE won’t work on the view, but needs to be using the table itself. That can be dealt with by creating new copies of these models and setting the schema to the table instead of the view. And having a way to operate on the tables themselves is useful for things like a superuser or admin section of the app, where you may want to show marked-as-deleted records and restore (or actually delete) them.

Also, even though this is a view defined as SELECT * FROM tablename, the structure is set on creation. If you add a column to the underlying table, you have to recreate the view (which you can do in place). If you want to remove a column from the table, you have to first drop the view and then create it again. And don’t get me started on the join-table view. You don’t want to see the triggers I had to set up to allow writes there.

So that wraps up the problem statement. Read on for part two of this story, where I share the solution I ended up with.

If you’re looking for a team to help you discover the right thing to build and help you build it, get in touch.

Development

#Elixir

#SoftDeletion

#EctoRescope

#SoftwareDevelopment

#DatabaseManagement

#DeveloperInsights

Published on February 6, 2024

Yossef Mendelssohn