I was recently invited to a meeting in advance of a maintenance window for one of the biggest applications at a customer. Until that moment, I had not been aware that I would be involved in the software upgrade and all the changes they would perform one and a half weeks later.

In this meeting, they told me that they would migrate the file storage (for PDFs) in the background to a new storage solution and therefore had to perform some update statements. They told me that the first execution of the script on the integration environment took around 15 hours. It goes without saying that an update script which takes 15 hours is very critical for the whole upgrade: network interruptions and other problems could strike at any point.

The script

I got the script that day and started analyzing it. The script first has to be generated on a different environment. It contains around 11 million independent update statements, each with the record ID hard-coded in the WHERE clause, which leads to a hard parse for every single one of the roughly 11 million statements. In combination with the execution over the network, this explains the duration of more than 10 hours.
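To illustrate the pattern (table and column names here are made up, the real script differed), the statements looked something like this:

-- Every statement carries different literal values, so the database
-- has to hard parse each of the ~11 million statements separately.
UPDATE documents SET file_path = '/new_storage/a1b2c3.pdf' WHERE id = 1000001;
UPDATE documents SET file_path = '/new_storage/d4e5f6.pdf' WHERE id = 1000002;
UPDATE documents SET file_path = '/new_storage/f7g8h9.pdf' WHERE id = 1000003;
-- ... around 11 million more statements like these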

Solution finding

Due to the short timeframe, I was not able to implement a real solution, like getting the data directly from the source system via a database link, and had to work with what I had.

Even if I had been able to eliminate most of the network round trips by executing the script directly on the database server, there would have been further problems to overcome. The risk of losing my SSH session could have been addressed with tools like “screen”. Another possible way to reduce the network round trips would have been to convert everything into PL/SQL by encapsulating the statements in a BEGIN/END block.
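As a minimal sketch of that idea (same hypothetical names as above): wrapping the statements in one anonymous block would send them to the server in a single call, but every statement inside the block would still be parsed individually.

BEGIN
  -- One round trip for the whole block instead of one per statement,
  -- but the hard parse per statement remains.
  UPDATE documents SET file_path = '/new_storage/a1b2c3.pdf' WHERE id = 1000001;
  UPDATE documents SET file_path = '/new_storage/d4e5f6.pdf' WHERE id = 1000002;
  -- ... millions more
END;
/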

The workaround

All the mentioned workarounds would not have eliminated the hard parse problem and were not a way I was happy with. Without getting rid of the individual update statements, it would have been almost impossible to reduce the execution time by a high percentage. After talking with a colleague, I had the idea of not executing the mentioned update statements at all and not using the provided SQL script.

I asked the provider of the SQL script if it would be possible to provide a CSV file instead. The effort for him was almost the same: instead of concatenating the elements into an update statement, he only had to write them to a CSV file.
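Instead of update statements, the file then simply contained the record ID and the new value per line, something like this (again with made-up values):

1000001,/new_storage/a1b2c3.pdf
1000002,/new_storage/d4e5f6.pdf
1000003,/new_storage/f7g8h9.pdf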

Once I received this file, I was able to create an external table on top of it. With this external table, I could write one simple MERGE statement to perform all 11 million updates.
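A sketch of what this looks like, again with hypothetical names (the directory, file name, columns, and matching condition are assumptions; the real ones differed):

-- Directory object pointing to the location of the CSV file on the server
CREATE DIRECTORY migration_dir AS '/u01/app/oracle/migration';

-- External table reading the CSV file; no data is loaded into the database
CREATE TABLE documents_ext (
  id        NUMBER,
  file_path VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY migration_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (id, file_path)
  )
  LOCATION ('new_paths.csv')
)
REJECT LIMIT UNLIMITED;

-- One single statement replaces ~11 million individual updates:
-- one parse, one execution, set-based processing inside the database
MERGE INTO documents d
USING documents_ext e
ON (d.id = e.id)
WHEN MATCHED THEN
  UPDATE SET d.file_path = e.file_path;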

On the test environment, the MERGE statement took around 45 minutes. This was way better than 15 hours, but I was pretty sure it would be even faster on production: the production server has more memory and more CPU cores, and during the maintenance window nobody would be working on the system. During the maintenance window, the MERGE statement took less than 3 minutes.

Conclusion

This solution could be optimized even further, but a decrease in execution time from 15 hours to less than 3 minutes was enough to fight the fire in this situation.

As the title states, it was firefighting, and we only had roughly a week to develop, test, and verify the solution.

There are many ways to get the same result, but most of the time the easiest one is letting the database do the job. We gave the database the opportunity to do what it does best. With this, we were able to shorten the downtime and avoid the problems mentioned above.