Database Migration from SQL Server to Postgres

This whitepaper explores the bottlenecks of SQL Server to Postgres migration and workarounds for them. Regardless of the chosen approach or tool, the specialist or team in charge of the migration should be able to identify the most critical issues and validation areas. In essence, the database migration from SQL Server to Postgres can be divided into five fundamental phases:

  1. Assess all SQL Server specific objects in the source database and review suitable methods of migrating them to Postgres.
  2. Migrate SQL Server schemas and constraint definitions. This step involves an audit of all syntax differences between table and constraint declarations in SQL Server and Postgres.
  3. Select the data migration approach that best fits the downtime tolerance of the SQL Server system, then migrate the data and verify the result to make sure that all the required conversions have been made.
  4. Translate the SQL statements of views and the source code of stored procedures, functions, and triggers into the Postgres dialects of SQL and PL/pgSQL.
  5. Validate the target database through performance and functional tests and make the necessary performance optimizations.

The next sections of the article cover these steps and the related points of attention in detail.

Migrate Table Definitions

SQL Server and Postgres have similar sets of data types: for example, BIGINT, INT, DATE, DECIMAL, REAL, and TEXT are supported in both the source and target DBMS. Other types differ, however; binary data, for instance, is stored as IMAGE (or VARBINARY) in SQL Server and as BYTEA in Postgres.
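
As an illustration, here is a minimal sketch of how a simple table definition might be translated; the table and column names are hypothetical:

    -- SQL Server (source): hypothetical table with an IDENTITY key and binary data
    CREATE TABLE documents (
        id      INT IDENTITY(1,1) PRIMARY KEY,
        title   NVARCHAR(255) NOT NULL,
        added   DATETIME DEFAULT GETDATE(),
        content IMAGE
    );

    -- Postgres (target): equivalent definition
    CREATE TABLE documents (
        id      INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        title   VARCHAR(255) NOT NULL,
        added   TIMESTAMP DEFAULT NOW(),
        content BYTEA
    );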

Spatial types GEOGRAPHY and GEOMETRY also require special attention, since Postgres supports them through the PostGIS extension, which must be installed on the system.
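
A minimal sketch of enabling PostGIS and declaring a spatial column, assuming the PostGIS packages are already installed on the server (the table is hypothetical):

    -- enable spatial types in the target database
    CREATE EXTENSION postgis;

    -- hypothetical table with a point geometry in SRID 4326 (WGS 84)
    CREATE TABLE landmarks (
        id   INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        name TEXT NOT NULL,
        geom GEOMETRY(Point, 4326)
    );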

Challenges of Data Migration

Earlier we recommended using BYTEA as the data type for storing binary data for SQL Server to Postgres migration. However, for large binary data, typically with an average field size of 10MB or more, this kind of conversion is not reasonable. The rationale behind this is that BYTEA data is read in a specific manner – it can only be retrieved as a single fragment and does not support incremental reading. Consequently, this limitation can lead to substantial RAM consumption.

To tackle this challenge, Postgres provides an alternative solution known as the LARGE OBJECT, which offers stream-style access to the data. Large objects are stored in a system table called pg_largeobject, present in every database. A single database can hold up to 4 billion large objects, and each LARGE OBJECT can be up to 4 TB in size. Most importantly, the LARGE OBJECT type supports incremental reading, effectively addressing the constraints of BYTEA.
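
A minimal sketch of working with large objects through the server-side functions; the file path and the OID are hypothetical, and lo_import reads a file on the database server, which requires appropriate privileges:

    -- import a file into pg_largeobject; returns the OID of the new large object
    SELECT lo_import('/tmp/document.pdf');

    -- read the object incrementally inside a transaction
    BEGIN;
    SELECT lo_open(16443, 262144);  -- open hypothetical OID 16443 with the INV_READ flag (0x40000)
    SELECT loread(0, 65536);        -- read the first 64 KB through descriptor 0
    COMMIT;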

SQL Server also offers the “external table” feature, which allows you to link data stored outside the database and treat it as a standard table within the DBMS. Postgres provides a comparable capability through its Foreign Data Wrapper (FDW) infrastructure. For instance, the file_fdw extension lets you access and manipulate the contents of external CSV files just as if they were conventional Postgres tables.
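
A minimal sketch of exposing a CSV file through file_fdw; the server name, file path, and columns are hypothetical:

    CREATE EXTENSION file_fdw;
    CREATE SERVER csv_source FOREIGN DATA WRAPPER file_fdw;

    -- map a CSV file to a read-only foreign table
    CREATE FOREIGN TABLE prices (
        item_id INT,
        price   NUMERIC(10,2)
    ) SERVER csv_source
    OPTIONS (filename '/tmp/prices.csv', format 'csv', header 'true');

    SELECT * FROM prices WHERE price > 100;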

Methods of Data Migration

Before beginning SQL Server to Postgres data migration, the database specialist must carefully assess the strategy and its implementation to avoid unacceptable system downtime and overhead, especially for large databases. Basically, there are three popular methods of migrating the data:

  • Snapshot: This approach migrates all data in a single pass. It requires downtime of the source system throughout the entire data reading process to safeguard against data loss or corruption.
  • Piecewise Snapshot: This method splits the data into segments that are migrated in parallel by simultaneous threads or processes (see the sketch after this list). Some downtime remains necessary, but it is notably shorter than with the snapshot method.
  • Changed Data Replication (CDR): This method relies on continuous loading of the data by monitoring incremental changes. It nearly eliminates system downtime, since only the modifications made after the initial migration need to be synchronized.
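
For illustration, here is a hedged sketch of the piecewise snapshot idea on the Postgres loading side; the table, the file names, and the segment boundaries are assumptions, and each COPY statement would run in its own parallel session:

    -- each statement loads one pre-exported segment of the source table;
    -- running the statements in separate sessions loads the segments in parallel
    COPY orders FROM '/tmp/orders_seg1.csv' WITH (FORMAT csv);  -- e.g. ids 1..10000000
    COPY orders FROM '/tmp/orders_seg2.csv' WITH (FORMAT csv);  -- e.g. ids 10000001..20000000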

It is extremely important to analyze the requirements of the particular migration project and select the most suitable method or utility that guarantees acceptable downtime, overhead, and efficiency. Automation tools such as the SQL Server to Postgres database converter can dramatically simplify the migration: such a tool handles type mapping, allows customizing many migration parameters, and migrates the data through the piecewise snapshot method described above.