Database Recovery

In this tutorial, you’ll learn about the database recovery process in Vertica.

Recovery is the process of restoring the database to a fully functional state after one or more nodes in the system have experienced a software- or hardware-related failure. For example, a hardware failure can cause a node to lose database objects or to miss changes made to the database while it was offline. Vertica recovers nodes by querying replicas of the data stored on other nodes.

Recovery Scenarios

Recovery comes into play when you restart a node or the database. Depending upon how the node or database was shut down, the following recovery scenarios are possible:

  • Recovery of failed nodes: One or more nodes failed, but the database continues to run since the remaining nodes in the database are able to fill in for the failed nodes.
  • Recovery after an unclean shutdown (manual recovery): The database was not shut down cleanly, which means that the database became unsafe due to a failure. There are several reasons for unclean shutdowns, such as:
    • A critical node failed, leaving part of the database’s data unavailable.
    • A site-wide event, such as a power failure that causes all nodes to reboot.
    • Vertica processes on the nodes exited due to a software or hardware failure.

Recovering Failed Nodes

When one node in a running database cluster fails, or if any files from the catalog or data directories are lost from any one of the nodes, you can recover the failed node using either the Administration Tools or the Management Console.

To recover a failed node using Administration Tools:

  1. Run Admin Tools.
  2. From the Main Menu, select Restart Vertica on Host and click OK.
  3. Select the database host you want to recover and click OK.
  4. Verify the recovery by selecting View Database Cluster State from the Main Menu. The state will be UP if the node recovered.
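
The menu steps above can also be scripted. As a minimal sketch, assuming the admintools command-line tool restart_node and its -d (database) and -s (hosts) options are available on your installation, the following Python helper only builds the command line and does not run it. The database name and host address are placeholders; confirm the options with your version's admintools help output before running anything.

```python
import shlex


def restart_node_command(database, hosts):
    """Build an admintools command line that restarts Vertica on the given hosts.

    Assumption: the restart_node tool accepts -d <database> and
    -s <comma-separated hosts>; check your version's help output.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    return ["admintools", "-t", "restart_node", "-d", database, "-s", ",".join(hosts)]


# Placeholder values for illustration only.
command = restart_node_command("mydb", ["192.0.2.11"])
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can be passed to subprocess.run without shell quoting concerns once you have verified it.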

To recover a failed node using Management Console:

  1. Connect to a cluster node (or the host on which MC is installed).
  2. Open a browser and connect to MC as an MC administrator.
  3. On the MC Home page, double-click the running database under the Recent Databases section.
  4. On the Overview page, look at the node status under the Database sub-section. The status will indicate how many nodes are up, critical, down, recovering, or other.
  5. Click the failed node to select it and in the Node List, click the Start node button.

Recovering the Database After a Shutdown

If you lose the Vertica process on more than one node (for example, due to power loss), or if the servers are shut down without properly shutting down the database first, then the next time you start the database, the cluster reports that it did not shut down gracefully.

When restarting the database, Vertica searches for the most recent epoch that can be recovered (the last good epoch).

To recover after a shutdown:

From the Main Menu in Admin Tools:

  1. Start the database by selecting Start Database from the Main Menu.
  2. Select the database you want to restart and click OK.
  3. If you are starting the database after an unclean shutdown, messages appear indicating that the startup failed. Press RETURN to continue with the recovery process.
  4. Verify that you want to start the database from the good epoch date. Select Yes to continue with the recovery. Vertica continues to initialize and recover all data prior to the last good epoch.
  5. If recovery takes more than a minute, you are prompted to answer Yes or No to “Do you want to continue waiting?”
  6. When the status of every node has changed to RECOVERING or UP, either:
    • Select No to exit this screen and monitor progress via Main Menu.
    • Select Yes to continue displaying the database recovery window.
  7. Reload any data that was added after the last good epoch date to which you have recovered.
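
Step 7 depends on knowing which loads landed after the last good epoch. As a minimal sketch, assuming you keep your own load log with a commit timestamp for each batch (the LoadBatch record and its fields are hypothetical and not part of Vertica), the following Python code picks the batches to reload once Admin Tools has reported the last good epoch date:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LoadBatch:
    """Hypothetical entry in a load log that you maintain outside Vertica."""
    name: str
    committed_at: datetime


def batches_to_reload(batches, last_good_epoch_time):
    """Return the batches committed after the last good epoch, oldest first."""
    later = [b for b in batches if b.committed_at > last_good_epoch_time]
    return sorted(later, key=lambda b: b.committed_at)


# Made-up example data.
log = [
    LoadBatch("orders_0900", datetime(2024, 1, 5, 9, 0)),
    LoadBatch("orders_1100", datetime(2024, 1, 5, 11, 0)),
    LoadBatch("orders_1000", datetime(2024, 1, 5, 10, 0)),
]
last_good = datetime(2024, 1, 5, 9, 30)
for batch in batches_to_reload(log, last_good):
    print(batch.name)
```

If a batch was committed very close to the last good epoch date, it may be safer to check whether its rows are present before deciding to skip it.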

Best Practices for Disaster Recovery

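To protect your database from site failures caused by catastrophic disasters, maintain an off-site replica of your database. That way, if you do encounter a disaster, you can switch database users over to the standby database. How you choose to recover from a disaster depends on two factors:

  • Recovery point objective (RPO): how much data loss your organization can tolerate.
  • Recovery time objective (RTO): how quickly you need the database to be available again after a disaster.

Depending on your RPO and RTO, Vertica recommends choosing from the following solutions: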

  • Dual-load
    During each load process for the database, simultaneously load a second database. You can achieve this easily with off-the-shelf ETL software.
  • Periodic incremental backups
    Use the procedure described in Copying the Database to Another Cluster to periodically copy the data to the target database. Remember that the script copies only files that have changed.
  • Replication solutions provided by storage vendors
    If you are using a SAN, evaluate your storage vendor’s replication solutions.
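
The dual-load option above can be sketched in a few lines. This is an illustration under stated assumptions, not a Vertica API: load_primary and load_standby are hypothetical callables that you supply (for example, wrappers around your ETL tool), and the sketch loads the two databases one after the other rather than truly in parallel.

```python
def dual_load(batch, load_primary, load_standby):
    """Send one batch to both databases and report which loads succeeded.

    load_primary and load_standby are caller-supplied callables (hypothetical
    here) that load the batch into one database and raise on failure.
    """
    results = {}
    for target, loader in (("primary", load_primary), ("standby", load_standby)):
        try:
            loader(batch)
            results[target] = True
        except Exception:
            # Record the failure so the batch can be retried on that target.
            results[target] = False
    return results


# Stand-in loaders that just collect batches in memory.
primary, standby = [], []
print(dual_load({"rows": 3}, primary.append, standby.append))
```

Tracking the result per target matters because a batch that reaches only one database leaves the standby behind the primary, which widens your effective RPO until the batch is retried.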

The table below summarizes the RPO, RTO, and the pros and cons of each approach.

[Table "recoveryTable": RPO, RTO, pros, and cons of each approach; the table image is not reproduced here]

Monitoring Recovery

There are several ways you can monitor database recovery:

  • View the vertica.log files on each host. Each message is identified with a [Recover] string.
  • Use the admintools view_cluster tool from the command line.
  • Select View Database Cluster State from the admintools Main Menu.
  • Query the PROJECTION_RECOVERIES and RECOVERY_STATUS system tables.
  • View node status on the Management Console Overview page under the Database section.
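
Two of the checks above are easy to automate. As a minimal sketch, the following Python helpers filter vertica.log lines for the [Recover] string and summarize node states such as UP and RECOVERING. The sample log lines are made up (only the [Recover] tag comes from the text above), and the summary labels are this sketch's own, not Vertica terminology.

```python
def recovery_messages(log_lines):
    """Keep only the vertica.log lines tagged with the [Recover] string."""
    return [line.rstrip("\n") for line in log_lines if "[Recover]" in line]


def cluster_summary(node_states):
    """Summarize a mapping of node name to state (for example UP, RECOVERING, DOWN)."""
    states = set(node_states.values())
    if states == {"UP"}:
        return "recovered"
    if "RECOVERING" in states:
        return "recovering"
    return "needs attention"


# Made-up log lines; the real message format may differ.
sample_log = [
    "... [Recover] example recovery message\n",
    "... [Txn] example unrelated message\n",
]
print(recovery_messages(sample_log))
print(cluster_summary({"node01": "UP", "node02": "RECOVERING"}))
```

To use recovery_messages on a real log, pass it an open file object, since iterating over a file yields its lines.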

Learn More

To learn more about recovery in Vertica, see the Recovering the Database section in our core documentation.