Thursday, December 15, 2011

What happened to CAL-ACCESS? Reporters ask, SoS explains

Over the last few days there have been numerous stories about the failure of the Cal-Access online campaign finance disclosure system, operated by the California Secretary of State. The LA Times quotes Derek Cressman of Common Cause calling for hearings, and the Sacramento Bee interviewed Secretary of State Debra Bowen, who stated, "We want to get it up as soon as possible, but we also want to complete the fix that will be the most stable over time."

Today, Chris Reynolds, head of the Secretary of State's Political Reform Division, sent around a document explaining why their technical staff believes Internet access to the system went down and what they are doing to restore it as quickly as possible. His email also says that his office is available to assist people in the meantime by phone, email, fax, or in-person visit. The main number is (916) 653-6224, email contact is at http://www.sos.ca.gov/webcontact/general/question.aspx, fax is (916) 653-5045, and the street address is 1500 11th Street, Sacramento, CA 95814.

Here is the memo:
---------

What happened to CAL-ACCESS?

CAL-ACCESS (the California Automated Lobbying and Campaign Contribution and Expenditure Search System) is a suite of applications developed in 13 different programming languages.  CAL-ACCESS runs on a server cluster and associated components that are more than 12 years old, and runs on an uncommon version of the Unix operating system called Tru64.

On November 30, 2011, the disk array controller experienced a physical memory failure that led to the loss of its disk array configuration and the loss of three physical disk drives.  The disk array contains a total of 90 disk drives with 15 disk drives installed in each of six drive enclosures.  (The array configuration defines which combination of physical disk drives form the logical disk drive that is presented to the operating system.)  When CAL-ACCESS was originally architected in 1999, it was common to locate the operating system on the disk array rather than on locally attached disks.  This configuration created a single point of failure in the array controller.
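
To illustrate the parenthetical above, here is a minimal Python sketch of what an array configuration does. It is not taken from the memo: the names (`PhysicalDrive`, `example_configuration`, `logical0`, and so on) and the one-logical-disk-per-enclosure layout are assumptions made purely for illustration. Only the drive counts (six enclosures of 15 drives, 90 in total) come from the text.

```python
from dataclasses import dataclass

# Figures from the memo: six enclosures with 15 drives each.
ENCLOSURES = 6
DRIVES_PER_ENCLOSURE = 15


@dataclass(frozen=True)
class PhysicalDrive:
    enclosure: int
    slot: int


def build_drives():
    """Enumerate every physical drive in the array."""
    return [
        PhysicalDrive(enclosure, slot)
        for enclosure in range(ENCLOSURES)
        for slot in range(DRIVES_PER_ENCLOSURE)
    ]


def example_configuration(drives):
    """Hypothetical layout: group each enclosure into one logical disk.

    The real CAL-ACCESS layout is not described in the memo.
    """
    config = {}
    for drive in drives:
        config.setdefault(f"logical{drive.enclosure}", []).append(drive)
    return config


def visible_logical_disks(config):
    """The operating system sees only logical disk names, not physical drives."""
    return sorted(config)


drives = build_drives()
config = example_configuration(drives)
print(len(drives), "physical drives")
print(visible_logical_disks(config))

# If the controller loses its configuration, the physical drives still exist,
# but no logical disks are presented to the operating system.
print(visible_logical_disks({}))
```

In this simplified model, losing the mapping is enough to make every logical disk disappear even when most drives are intact, which is why an operating system stored on the array depends entirely on the controller.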

After replacing the failed memory equipment, staff were able to reconfigure a very small portion of the disk array that permitted the server cluster to start.  The portion of the disk array that houses the area where the databases reside was not immediately recovered since a more extensive amount of time was needed to remap the entire disk array.  To make the system available by Internet again as soon as possible, staff ported the server cluster to use an alternate network-attached storage device and used a backup to restore the data by December 7.  The configuration functioned for about 30 hours before it failed again on December 9.  Staff tried multiple approaches to recover this configuration throughout Friday, Saturday and Sunday.

On Monday, December 12, staff initiated three concurrent recovery methods to restore services: 

1. Porting CAL-ACCESS off of the Tru64 cluster to a modern hardware architecture, which involves modifying the database and reprogramming websites and applications in as many as 13 different coding languages.
2. Virtualizing the Tru64 Unix environment to move off of the aged equipment, which includes building new servers and installing and configuring software that can emulate the DEC Alpha architecture to run on an Intel architecture.  Once this environment has been established the Tru64 operating system can be installed and configured to match the old production environment, and the databases and applications can be restored from backup.
3. Rebuilding the original disk array, which is expected to take 10 to 14 days.

Work on the first method started on December 12.  The second and third methods require contracted specialists, and state contract approval is expected by December 15.

Then will CAL-ACCESS be permanently fixed?

Created in 1999, CAL-ACCESS is now very old and fragile, and few people in the United States are familiar with the antiquated technology used to build and operate the system.  The recovery efforts will make CAL-ACCESS stable and get it running, but it can never be more robust or feature-laden.  Ideally, we need a fresh start with an all-new CAL-ACCESS.  
