Implementation Phase

I. OVERVIEW: SPECIAL CONSIDERATIONS FOR YEAR 2000 IMPLEMENTATION PHASE

The Implementation Phase includes those activities required to return a modified application to the production environment. This phase deals with software migration and control, data conversion, and the final confirmation of both operational and procedural links to data exchange (for example, interfaces and access). These comments address special considerations above and beyond the typical activities an agency would perform to protect the production environment from inadequately tested or unauthorized software. These considerations include:

• The importance of formal implementation schedules for planning resource availability and mitigating risk;
• The opportunity to continue testing after software has been placed into production;
• The need to be especially diligent in managing external data exchanges, to mitigate the risks posed by business partners and agents that are beyond our control but part of our business;
• The imperative of updating the Business Resumption/Disaster Recovery plans and archives;
• The imperative of contingency planning.

II. THE IMPLEMENTATION SCHEDULE: UNDERSTANDING THE COLLECTIVE WORKLOAD

Maintain a detailed understanding of the implementation resource demands: Across the agency there will be many changes. Computer platforms, operating software, third party software, utilities, and application software are all scheduled to migrate to production during the same limited window. This will place increased demands on resources and will make it more complex to isolate the source of any problems encountered.

Allocate human resources to support installation and migration activities: The agency must understand how many knowledge workers it will need to support software installations and migrations. An analysis of the integrated implementation plan can pinpoint these demands. Adequate numbers of skilled personnel must be available to support each migration and to deal with any problems that arise.

The amount of change occurring in condensed time frames will leave little slack in the overall schedule. This makes it all the more important to stay on schedule and to minimize disruptions and recovery requirements as much as possible.

Anticipate and provide adequate physical capacity (MIPS and bytes): The agency must also understand when physical resources and capacity must be available. This, too, can be derived from a detailed analysis of the implementation plan. For example:

• The necessary hardware and compliant environments must be available and stabilized.
• The agency should allocate an adequate amount of time for migration activities.
• File conversions and database restructuring may require significant amounts of wall time, CPU cycles, magnetic storage media, etc.
• Historical files and archived data may also require conversion.

The demand for resources will be unusually high, because many systems will have unusually high resource requirements at the same time. The "normal" ebb and flow of resource demand, which allows resources to be traded off to cover an unusual demand from any one application, cannot be counted on to satisfy much of this demand.

The agency may place a moratorium on non-essential changes, especially changes that affect the production environment while a large number of systems are scheduled for migration. Any such freeze should begin before the scheduled migration to ensure that a stable production environment is available. Extending the freeze beyond the migration is equally advisable. Freezing the environment around the actual migration period makes it easier to isolate the cause of any problems that arise.

An agency should not overlook the fact that the hot sites many data processing centers use for emergencies and business recovery may be appropriate for performing large, time-consuming file conversions.
However, a decision to use a hot site requires comprehensive and painstaking planning and execution by a thoroughly knowledgeable staff, including:

• Budgeting for the task well in advance;
• Determining the availability of the hot site in order to schedule the transfer and conversion;
• Identifying the application expert;
• Determining which files need to be converted;
• Identifying the environmental and application configuration required at the site;
• Resolving any software licensing issues that might exist at the site;
• Determining how software and files will be transferred to the site;
• Determining how the integrity of the transfer will be verified;
• Determining how to merge the converted files with the current data;
• Determining what tests must be conducted to verify that the conversion was successful.

III. POST IMPLEMENTATION PROCESSING: THE FINAL FRONTIER FOR TESTING

Reduce risk by building an adequate post implementation interval into the implementation plan: Most IT industry experts agree that an application should be implemented no less than six months before it encounters year 2000 processing. For financial systems, implementation should occur a full fiscal year before the system encounters year 2000 dates. This enables the agency to conduct a parallel fiscal year-end closing "dry run" cycle.

Conduct comprehensive testing during the post implementation interval: Before implementation, the agency will have thoroughly tested and certified the application, including tests that advance the dates into the next century. However, the changes within an application can be pervasive. Furthermore, for some legacy applications few staff members may have enough institutional knowledge to develop robust test scenarios. As a result, there is a higher than normal risk that application testing did not exercise the particular conditions that would expose a flaw.

After implementation, therefore, the agency should continue to be aggressive on two fronts: monitoring the functional integrity of the application, and performing more extensive advanced date testing across application and organizational boundaries.

Monitor the functional integrity - Confirm that changes did not adversely affect the current system: Following implementation, the agency should perform a parallel test, running the same data through both the original version and the renovated system and comparing the results. The agency can examine the results of these business transactions carefully, treating the daily transactions as a thorough, comprehensive set of regression tests. If only Year 2000 changes were made to the software, there should be no differences. If functional changes were introduced at the same time as the Year 2000 renovations, the parallel test results will not be identical; the resulting differences can be "explained" by the functional changes.

A purely functional test, similar to that performed for a major release, may be more thorough than a parallel test, but the parallel test is more efficient. A parallel test may not require extensive functional expertise if there is high confidence that the test scripts are comprehensive and exercise a broad scope of transactions and conditions.

Truly parallel processing (testing) requires extensive "spare" resources (for example, CPU cycles, DASD, response time and wall time) to support double the business volumes, nearly duplicate applications, etc. It also raises issues of its own: for example, "if the new process is running smoothly, will we resolve an abend in the old process?" or "will two versions of the software be maintained?"
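The comparison step of a parallel test can be automated so that every difference is listed for explanation. The sketch below is hypothetical: it assumes each run writes its output as (transaction ID, result) pairs, and the function name `compare_parallel_runs` and the record layout are illustrative rather than part of any existing tool.

```python
from typing import Dict, Iterable, List, Tuple


def compare_parallel_runs(
    original: Iterable[Tuple[str, str]],
    renovated: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return a readable list of every difference between two runs.

    Assumes each run is a sequence of (transaction_id, result) pairs;
    the layout is hypothetical and should be adapted to the real output files.
    """
    old: Dict[str, str] = dict(original)
    new: Dict[str, str] = dict(renovated)
    differences: List[str] = []
    # Transactions processed by only one of the two versions.
    for txn in sorted(old.keys() - new.keys()):
        differences.append(f"{txn}: missing from renovated run")
    for txn in sorted(new.keys() - old.keys()):
        differences.append(f"{txn}: missing from original run")
    # Transactions processed by both versions but with different results.
    for txn in sorted(old.keys() & new.keys()):
        if old[txn] != new[txn]:
            differences.append(f"{txn}: {old[txn]!r} -> {new[txn]!r}")
    return differences


if __name__ == "__main__":
    before = [("T001", "BAL=100.00"), ("T002", "DUE=1999-12-31")]
    after = [("T001", "BAL=100.00"), ("T002", "DUE=1999-12-31")]
    # With Year 2000 changes only, no differences are expected.
    print(compare_parallel_runs(before, after))  # []
```

An empty list is the expected outcome when only Year 2000 changes were made; any entry must either be traced to a deliberate functional change or investigated as a possible defect.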

If parallel processing to verify functional integrity proves too much of a burden, it may be more feasible, although less comprehensive, to approach functional verification as an occasional "spot check". Someone with knowledge of the business transactions can help determine when to perform the parallel runs. For example, fiscal year closing transactions are performed only during an isolated time frame, so a spot check must be scheduled to cover that period. For very critical applications with low functional confidence, specialized tools (e.g., test data generators and test coverage monitors) are available in the industry to help develop a comprehensive test bed.

Perform integrated forward date testing - Confirm the automated business process continues to perform correctly with future dates: It is not enough to confirm that the renovated application behaves properly for transactions in the current year. Future date testing is crucial, and adequate testing must simulate dates in the next century. By finishing well before the actual turn of the century, the agency can use the remaining time to perform more extensive forward date testing. The agency can extend testing across interfaces and trace a single business transaction from start to finish. This testing should use the full complement of production software: the agency will want to set the clock to a future date and process business transactions against that software, mimicking the natural flow, timing and volumes typical of the business.

Experts encourage a parallel test approach as the most efficient means of simulated future date testing, although structuring such a test is difficult. A parallel test simulating a future date requires that the test bed and transactions be "aged" and the "system" date advanced by the same amount.
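Aging a test bed and checking an application's date logic against boundary dates can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: dates are held as `datetime.date` values, and the 28-year offset is used because, for dates from 1901 through 2099, it preserves both the day of the week and leap-year status. The names `age_date`, `check_next_day` and `flawed_next_day` are hypothetical; `flawed_next_day` stands in for an application routine with the classic century leap-year bug. The boundary dates are the ones this section goes on to list.

```python
from datetime import date, timedelta
from typing import Callable, List

# Assumption: a 28-year offset keeps weekdays and leap years aligned,
# which holds for any result date from 1901 through 2099.
AGE_YEARS = 28

# Boundary conditions from the forward date testing checklist.
BOUNDARY_DATES: List[date] = [
    date(1999, 12, 31),  # rollover into January 1, 2000
    date(2000, 2, 28),   # rollover into the leap day
    date(2000, 2, 29),   # the leap day itself
    date(2000, 3, 1),
    date(2000, 10, 1),   # first day of a fiscal year
    date(2001, 9, 30),   # last day of the same fiscal year
]


def age_date(d: date, years: int = AGE_YEARS) -> date:
    """Shift a test-bed date forward by whole years, keeping month and day.

    Feb 29 maps to Feb 29 because 28 is a multiple of 4; replace() raises
    ValueError if the result does not exist (e.g. 2100-02-29).
    """
    return d.replace(year=d.year + years)


def check_next_day(next_day: Callable[[date], date]) -> List[str]:
    """Compare an application's next-day routine with the standard library."""
    failures = []
    for d in BOUNDARY_DATES:
        expected = d + timedelta(days=1)
        actual = next_day(d)
        if actual != expected:
            failures.append(f"{d}: got {actual}, expected {expected}")
    return failures


def flawed_next_day(d: date) -> date:
    """Hypothetical routine with a classic bug: no century year is a leap year."""
    leap = d.year % 4 == 0 and d.year % 100 != 0
    if d.month == 2 and d.day == 28 and not leap:
        return date(d.year, 3, 1)
    return d + timedelta(days=1)


if __name__ == "__main__":
    aged = age_date(date(1972, 2, 29))
    print(aged, aged.weekday() == date(1972, 2, 29).weekday())  # 2000-02-29 True
    print(check_next_day(lambda d: d + timedelta(days=1)))      # []
    print(check_next_day(flawed_next_day))
    # ['2000-02-28: got 2000-03-01, expected 2000-02-29']
```

The flawed routine passes every boundary except February 28, 2000, which is exactly the kind of condition the boundary list is meant to expose.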

If the application uses conditional logic based on the day of the week, selecting a twenty-eight year interval is an advantage, since a given date falls on the same day of the week every twenty-eight years (this holds for dates from 1901 through 2099).

Check specific future dates or intervals to ensure that the software continues to behave properly. These boundary conditions include:
• Starting on December 31, 1999 and extending into January 1, 2000;
• Starting on February 28, 2000 and extending into the next day;
• Testing on February 29, 2000;
• Testing on March 1, 2000;
• Testing the first and last days of the same fiscal year (e.g., 10/1/2000 and 9/30/2001).

Additionally, some values in date fields have traditionally held special meaning. We need to confirm that the system behaves properly as the calendar reaches each of these dates, e.g., 12/31/1999 or 9/9/99.

IV. DATA EXCHANGES: HOW TO MITIGATE THE RISKS

Prepare for the possibility that schedules will change, or that you will not receive the data you expect: The area in which an agency is perhaps most vulnerable concerns those components that are outside the agency’s scope of control. Early in this process, the agency will have come to terms with the external organizations with which it exchanges information (trading partners), including date formats, file revisions and time horizons. There are several actions an agency would be wise to take to reduce the impact if things do not go as planned.

Reduce the potential disruption of schedule changes outside the agency’s control: As the implementation date draws closer, the actual date on which the agency and its business partners will be ready to send or receive files in the new format will become critical. The trading partner’s dates may slip; the agency’s dates may slip. If the agency and partner had already planned to be ready at different times, a filter or bridge program may already be in place.
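A bridge program of the kind mentioned here can be quite small. The sketch below is hypothetical: it assumes a fixed-width record whose date field holds a six-digit YYMMDD date in a partner's old format and an eight-digit YYYYMMDD date in the agency's new format. Expanding a two-digit year requires a rule for choosing the century; the fixed window used here (00-49 means 20xx, 50-99 means 19xx), a technique commonly called date windowing, is an assumption that must match whatever the trading partners actually agree on.

```python
# Assumed pivot: two-digit years below PIVOT are treated as 20xx, others as 19xx.
PIVOT = 50


def expand_yymmdd(yymmdd: str, pivot: int = PIVOT) -> str:
    """Convert a six-digit YYMMDD date to YYYYMMDD using a fixed window."""
    if len(yymmdd) != 6 or not yymmdd.isdigit():
        raise ValueError(f"not a YYMMDD date: {yymmdd!r}")
    century = "20" if int(yymmdd[:2]) < pivot else "19"
    return century + yymmdd


def contract_yyyymmdd(yyyymmdd: str) -> str:
    """Convert YYYYMMDD back to YYMMDD for a partner still on the old format."""
    if len(yyyymmdd) != 8 or not yyyymmdd.isdigit():
        raise ValueError(f"not a YYYYMMDD date: {yyyymmdd!r}")
    return yyyymmdd[2:]


def bridge_incoming(record: str, start: int) -> str:
    """Rewrite an old-format record, expanding the 6-character date at `start`."""
    old_date = record[start:start + 6]
    return record[:start] + expand_yymmdd(old_date) + record[start + 6:]


def bridge_outgoing(record: str, start: int) -> str:
    """Rewrite a new-format record, contracting the 8-character date at `start`."""
    new_date = record[start:start + 8]
    return record[:start] + contract_yyyymmdd(new_date) + record[start + 8:]


if __name__ == "__main__":
    print(bridge_incoming("ACCT1991231X", 5))    # ACCT119991231X
    print(bridge_outgoing("ACCT120000101X", 5))  # ACCT1000101X
```

Running a record through `bridge_incoming` and then `bridge_outgoing` should return it unchanged, which gives a simple self-check; the contracted form is only unambiguous for dates inside the agreed window.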
If no such bridge is in place, last minute schedule changes may unexpectedly create the need for bridging software. To reduce the disruption to either organization, it is important that:

• Communications with key trading partners be maintained;
• Viable contingency plans be in place;
• A triage approach be used to prioritize where constrained resources should be directed if competing demands arise.

The agency’s plan should describe the specific steps the organization will take if corrupt data is received, if identical data is received, or if no data is received.

Reduce the risk of processing corrupt data received from external trading partners: Catastrophe awaits the agency that attempts to process corrupt data. Many systems routinely edit incoming data to verify that it meets certain criteria, and in most cases these edits are enough to provide assurance that the data is reasonable and valid. Agencies should review these edits and institute new ones to help ensure the integrity of incoming records.

In some cases it may be impossible to establish Year 2000 reasonability edits at the individual field level. If the consequences of data corruption are extreme, comparative or statistical techniques may have merit. For example, data fields within a record could be examined for consistency (it is unlikely that a 12 year old would have a graduate degree), or statistical ranges for data values could be developed from current data to serve as edits (for example, age falls between 25 and 55).

The agency and its trading partner must understand how to process corrections for corrupt data. For example, an entire "batch" could be rejected if a single record is suspect, the console operator could make a phone call, or suspicious transactions could be written to an error file. In many cases these solutions require additional software support.
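The three cases the plan must cover (corrupt, identical, or no data received) can be screened automatically before a file is processed. This sketch is hypothetical throughout: it assumes each incoming file arrives as bytes, that the SHA-256 digest of the previously accepted file is kept, and that records have already been parsed into dictionaries with `age` and `degree` fields. The 25 to 55 age range is the example given above; the rule that a graduate degree implies an age of at least 18 is an illustrative assumption. Real edits would be derived from the agency's current data.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]

# Example limits; real ranges should be derived from current production data.
MIN_AGE, MAX_AGE = 25, 55
MIN_GRADUATE_AGE = 18  # illustrative consistency rule, not from any standard


def classify_file(contents: Optional[bytes], previous_digest: Optional[str]) -> str:
    """Classify an incoming file as missing, a repeat of the last one, or new."""
    if contents is None:
        return "none received"
    if hashlib.sha256(contents).hexdigest() == previous_digest:
        return "identical to previous"
    return "new"


def record_problems(rec: Record) -> List[str]:
    """Apply a range edit and a cross-field consistency edit to one record."""
    age = rec.get("age")
    if not isinstance(age, int):
        return ["age missing or not numeric"]
    problems = []
    if not MIN_AGE <= age <= MAX_AGE:
        problems.append(f"age {age} outside {MIN_AGE}-{MAX_AGE}")
    if rec.get("degree") == "graduate" and age < MIN_GRADUATE_AGE:
        problems.append("graduate degree inconsistent with age")
    return problems


def screen_batch(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split a batch into accepted and suspect records.

    The caller chooses the policy: reject the whole batch if any record is
    suspect, or process the accepted records and write the rest to an error file.
    """
    accepted: List[Record] = []
    suspect: List[Record] = []
    for rec in records:
        (suspect if record_problems(rec) else accepted).append(rec)
    return accepted, suspect


if __name__ == "__main__":
    last = hashlib.sha256(b"batch-0001").hexdigest()
    print(classify_file(None, last))           # none received
    print(classify_file(b"batch-0001", last))  # identical to previous
    print(record_problems({"age": 12, "degree": "graduate"}))
```

Keeping the classification separate from the record edits lets the agency attach a different documented response to each case, as its plan requires.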

Reduce the risk of disruption if no data is received from an external trading partner: An equally important question for an agency to ask is: how long would it take to discover that no data has been received from a specific trading partner? Can the application run with only the input files it has received, or must every file be available? Again, solutions may require additional software support.

V. BUSINESS RESUMPTION ARCHIVES: DON’T OVERLOOK THE OBVIOUS

Revise the disaster recovery plans and system backup files: The agency should maintain all critical files and source code in a safe environment. The Business Resumption Plan should describe the specific steps to follow to retrieve these components and implement them in the production environment. With all the changes taking place within an agency, it is easy to overlook updating these plans or archiving the new files. During this phase, the agency should update the resumption plan and off-site records.

Another important concern is retaining the ability to rerun older processes. For example, it may not be feasible or appropriate to convert all of the data files, but it may be necessary to "retrieve" or "restore" records in the old format. It may be appropriate to retain any bridge or file conversion software to support this activity, even though this software may not have been considered part of the application.

VI. CONTINGENCY PLANNING: HOW TO HEDGE YOUR BETS WITH WORK-AROUNDS

Contingency planning involves preparing, and partially implementing, alternative work processes in the event of a business failure of varying proportions. It is perhaps the most difficult aspect of solving the Year 2000 problem, because it requires a disciplined investigation into all aspects of an organization's business to locate the points of significant risk that could immobilize the business.

One of the primary values of contingency planning is that the planning takes place before the crisis. Valuable recovery time is not lost planning after the fact, and cooler heads can assess the alternatives.

Invite users, experts and support personnel: A contingency plan must be broad based and have wide business support. Because it can involve suspending or eliminating some existing processes, the areas of the business that are potentially affected must be involved in the decision process. The user organization knows the potential impact of doing without certain key business processes, and knows what it can and cannot live without in an emergency.

The duration of the emergency will affect the contingency chosen. An outage of an hour or two will be handled differently from an outage of two weeks or more; consequently, contingency alternatives must be developed for different outage durations.

Representatives from infrastructure areas, such as facilities, security and food service, should also provide input. It doesn't matter how good a contingency plan is if no one can get into the building to execute it.

Each core business function should be represented in the identification of critical business processes. Some areas will be more critical than others, but initially all areas should be represented and a ranking of criticality made. The ranking is necessary because, as areas requiring contingency plans are identified, constraints relating to cost, resources and time will surface, and the plans most necessary to the continued functioning of the business must be done first. Executive management must agree to and support the ranking, and allocate funds and staff for the development of the contingency plan.

If interruption of a critical business process cannot be tolerated, a contingency plan must be developed. One of the factors in evaluating proposed alternatives is potential risk.
An alternative with a very high risk and a very high cost is probably not an acceptable choice, while one with a low risk and a high cost may be. Develop a spreadsheet that evaluates the following items for each potential alternative for each potential business failure:

• Cost
• Resources to implement
• Time to implement
• Risk
• Pros
• Cons
• Impacts

Define controllable scope: Contingency plans should be developed for all mission critical systems, for systems that are to be replaced, and for systems that will encounter Year 2000 problems before the actual turn of the century. In assessing what needs contingency plans, it is important to define the scope of what is and is not controllable. To the extent possible, firewalls (procedures to isolate the agency) should be developed for those things that are beyond the agency’s control. Where firewalls can't be developed, dialogue must be initiated with trading partners to gain insight into their preparedness. And finally, some failures may have to be endured, because the alternatives are simply too costly.

Develop contingency plans: What is important is that comprehensive and detailed work-around plans be developed, documented, tested and placed on several library shelves in anticipation of the day they will be needed. For these work-around plans, the agency will want to make sure it understands which processes will need to be performed manually or without computerized support. It will also be important to acquire specialized supplies, forms or equipment in advance. It would be wise to conduct a dry run to ensure that all the processes will work in an emergency; this should also highlight any training needs.

Note: It is not necessary that the work-around be as robust and complete as the "real thing".
Agencies will need to make sure they capture all the essential aspects of their process, have a way to continue doing business, and have a way to "recover" from the temporary disruption when the applications "come back on line".

The work-around plan should also address:
• To whom the plan needs to be communicated;
• Under what conditions (triggers) the plan will be invoked;
• How long the loss can be endured before the contingency is implemented;
• Who has the authority to invoke the plan;
• Who has the responsibility for invoking the plan;
• Who should have copies of the plan;
• Who has the original, and what is the date of the most current plan;
• How frequently the plan should be updated;
• The level of service to be provided during the "outage".

See the US Fish and Wildlife Service (http://www.fws.gov/pullenl/security/contpln.html) for more detailed guidelines for contingency plan development.