Altering your data models or schema is sometimes inexorable due to changing requirements or fixing legacy design mistakes. It can be a risky refactoring task since the data models are the core of the application, which is also stateful and one would need to take care of compatibility. We also need to be mindful to avoid any customer data loss and minimize the downtime as much as possible.
With the help of ORM frameworks such as Django, modifying database schema is getting easier since the framework can automatically generate SQL migration scripts like ALTER TABLE
, ALTER COLUMN
or CREATE INDEX
and execute these SQLs during deployments.
Even though not all schema changes are easy, and I encountered one interesting amendment in a project I was working on:
Altering a plain text column to a foreign key
It was a mistake that we didn’t set up the foreign key relationship when the models were created. Having existing data on the production database adds extra challenges to the refactoring task because we are adding a new constraint to the existing text column. It is possible that some existing values are not the primary key of the related table due to data corruption.
End up we were still able to finish the refactoring, carefully following the step by step procedures outlined below.
The Data Model
class Album(models.Model):
name = models.TextField()
user_id = models.TextField()
# we want to alter this to
# user = models.ForeignKey(User, related_name='albums')
create_date = models.DateTimeField(auto_now_add=True)class User(models.Model):
# `id` column is auto added by Django
username = models.TextField()
The Migration Procedures
- Add a
user_link
foreign key field to theAlbum
model which isnullable
class Album(models.Model):
user_id = models.TextField()
user_link = models.ForeignKey(
User,
related_name='albums',
null=True,
)
And run python manage.py makemigrations
to create a migration file.
We need to use a different field name other than just user
because Django appends _id
to the field name when creating the actual database column and user_id
already exists.
2. Modify the codebase to assign user_link
whenever Album
is created or updated, so all new albums will have user_link
filled.
album = Album.objects.create(
user_id=request.user.id,
user_link=request.user,
# other fields...
)
- We are running both old and new fields in parallel and prepare for switching to the new fields when it is ready.
- Note that the new
user_link
field has higher data integrity since its values must exist on theUser
table too.
3. Create a data migration script to back-fill user_link
from user_id
for existing data.
To create an empty migration file:
python manage.py makemigrations your_app_name --empty
And then edit it with the custom migration script:
from django.db import migrations
def back_fill_user(apps, schema_editor):
Album = apps.get_model('myapp', 'Album')
User = apps.get_model('myapp', 'User') for album in Album.objects.filter(user_link=None):
user = User.objects.filter(id=album.user_id).first() # handle non-existing users due to data corruption
if user is None:
print(f'Cannot find user with id {album.user_id}')
continue album.user_link = user
album.save()
class Migration(migrations.Migration):
dependencies = [
('myapp', '0001_initial'),
]
operations = [
migrations.RunPython(
back_fill_user,
# this allows rewinding the migrations to a previous
# state by skipping this custom migration script
reverse_code=migrations.RunPython.noop,
),
]
It is possible that some existing user_id
cannot be found in the primary key column of the User
table. We need to manually log and check those corrupted data and decide how to handle these data. (e.g. remove the album
record, create a new user record with that missing ID, assign all these albums to a dummy user and archive them, etc.) Just make sure the user_link
column is filled up and does not contain NULL
.
At this stage, we are comfortable that the new user_link
column is ready for switching to. Continue the following steps to switch the columns.
4. Remove the old user_id
field and run makemigrations
5. Set user_link
to non-nullable (remove null=True
) and run makemigrations
6. Rename user_link
to user
and run makemigrations
Step 4. to 6. can be done in one single migration file, as long as the operations are executed in sequential order, but I would suggest running makemigrations
after each step and concatenate those operations manually so that Django will not be confused with multiple model changes at the same time and create incorrect migration scripts.
7. Lastly, update codebase to reference the new user
field instead of user_id
And we are all done 🎉