Converting Text Field to Foreign Key in Django

Marco
4 min readAug 21, 2020
Zugspitze, Germany — Dec 2019

Altering your data models or schema is sometimes inexorable due to changing requirements or fixing legacy design mistakes. It can be a risky refactoring task since the data models are the core of the application, which is also stateful and one would need to take care of compatibility. We also need to be mindful to avoid any customer data loss and minimize the downtime as much as possible.

With the help of ORM frameworks such as Django, modifying database schema is getting easier since the framework can automatically generate SQL migration scripts like ALTER TABLE, ALTER COLUMN or CREATE INDEX and execute these SQLs during deployments.

Even though not all schema changes are easy, and I encountered one interesting amendment in a project I was working on:

Altering a plain text column to a foreign key

It was a mistake that we didn’t set up the foreign key relationship when the models were created. Having existing data on the production database adds extra challenges to the refactoring task because we are adding a new constraint to the existing text column. It is possible that some existing values are not the primary key of the related table due to data corruption.

End up we were still able to finish the refactoring, carefully following the step by step procedures outlined below.

The Data Model

class Album(models.Model):
name = models.TextField()
user_id = models.TextField()
# we want to alter this to
# user = models.ForeignKey(User, related_name='albums')
create_date = models.DateTimeField(auto_now_add=True)
class User(models.Model):
# `id` column is auto added by Django
username = models.TextField()

The Migration Procedures

  1. Add a user_link foreign key field to the Album model which is nullable
class Album(models.Model):
user_id = models.TextField()
user_link = models.ForeignKey(
User,
related_name='albums',
null=True,
)

And run python manage.py makemigrations to create a migration file.

We need to use a different field name other than just user because Django appends _id to the field name when creating the actual database column and user_id already exists.

2. Modify the codebase to assign user_link whenever Album is created or updated, so all new albums will have user_link filled.

album = Album.objects.create(
user_id=request.user.id,
user_link=request.user,
# other fields...
)
  • We are running both old and new fields in parallel and prepare for switching to the new fields when it is ready.
  • Note that the new user_link field has higher data integrity since its values must exist on the User table too.

3. Create a data migration script to back-fill user_link from user_id for existing data.

To create an empty migration file:

python manage.py makemigrations your_app_name --empty

And then edit it with the custom migration script:

from django.db import migrations
def back_fill_user(apps, schema_editor):
Album = apps.get_model('myapp', 'Album')
User = apps.get_model('myapp', 'User')
for album in Album.objects.filter(user_link=None):
user = User.objects.filter(id=album.user_id).first()
# handle non-existing users due to data corruption
if user is None:
print(f'Cannot find user with id {album.user_id}')
continue
album.user_link = user
album.save()
class Migration(migrations.Migration):

dependencies = [
('myapp', '0001_initial'),
]

operations = [
migrations.RunPython(
back_fill_user,
# this allows rewinding the migrations to a previous
# state by skipping this custom migration script
reverse_code=migrations.RunPython.noop,
),
]

It is possible that some existing user_id cannot be found in the primary key column of the User table. We need to manually log and check those corrupted data and decide how to handle these data. (e.g. remove the album record, create a new user record with that missing ID, assign all these albums to a dummy user and archive them, etc.) Just make sure the user_link column is filled up and does not contain NULL.

At this stage, we are comfortable that the new user_link column is ready for switching to. Continue the following steps to switch the columns.

4. Remove the old user_id field and run makemigrations

5. Set user_link to non-nullable (remove null=True) and run makemigrations

6. Rename user_link to user and run makemigrations

Step 4. to 6. can be done in one single migration file, as long as the operations are executed in sequential order, but I would suggest running makemigrations after each step and concatenate those operations manually so that Django will not be confused with multiple model changes at the same time and create incorrect migration scripts.

7. Lastly, update codebase to reference the new user field instead of user_id

--

--

Marco

Software Engineer | Hongkonger 🇭🇰 Passion for Craftsmanship