

in a vacuum due to the interdependencies on the organization processes,
environmental factors, the network, and other systems.
The first step is to get a level set of the state of affairs. A team must be
established with representatives from development, project management,
and infrastructure support to look at system management processes, operations,
and the application/infrastructure architecture. The team must also
review the system outages and perform a root-cause analysis.
After performing the analysis, the data is used to identify weaknesses in
the processes. Processes are examined to reduce the frequency of occurrence
of a problem, the time needed to fix the problem, and the impact on the complete
IT environment. This is done by implementing the following steps:
Proactive problem prevention that ensures that, when a problem occurs,
the root cause is identified and procedures are put in place to ensure
it does not recur.
Effective change management to ensure that only authorized and
tested changes are implemented.
System and application design is made reliable, scalable, complete,
and accurate, with features that integrate with the system
management infrastructure. All exceptions are detected and are
designed to be self-healing or to provide automatic recovery. Proper
logging is done with well-written messages that clearly identify the
problem, the module, the time, and the data that caused the exception.
WY009-21 WY009-BenNatan-v1.cls May 13, 2004 22:25

Designing High Availability into Your Portal Server 407

Monitoring is proactive so that solutions can be automatically
implemented to prevent a problem from occurring.
Recovery procedures including disaster recovery procedures are
automated as much as possible and are documented and regularly
Situation management procedures are put in place and regularly
followed through.
WebSphere Portal is tested using a formal methodology that covers
exception testing, user testing, boundary testing, and load/stress
testing. Parallel testing must be done that exactly simulates a
worst-case scenario for a production environment. This environment
needs to include firewalls configured as they are for production, a
similar network (including routers), load, boundary, and/or stress
tests that simulate production, and simulated exceptions to test
operational procedures and recovery processes.
The data design is reviewed by the data architects to isolate any
corruption or performance issues. Security must also be reviewed to
ensure that only authorized applications/users are accessing the
WebSphere Portal system is isolated from other application servers
and has sufficiently allocated bandwidth.
Everything is documented, and a process is in place to keep the
documentation up to date. A librarian should be appointed and the
documentation should be made public and reviewed regularly.
Standards need to be in place for each component including
operating system (including fix level), WebSphere Portal, hardware,
firewalls, application build, routers, and router software. The version
and release level should be consistent in each test and production
environment, especially for a fail-over component. Each
component version and release needs to be tested and mature in
the marketplace before being accepted as a standard. Components should
also be obtained from well-established vendors, thus minimizing the
impact of having a nonsupported component.
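
The logging step in the list above asks for messages that identify the problem, the module, the time, and the data that caused the exception. The following is a minimal Python sketch of that idea, not code from WebSphere Portal. The logger name, the `load_user_profile` function, and the profile fields are hypothetical and exist only for illustration.

```python
import logging

# The format carries the time and the module; the message carries problem and data.
logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(module)s] %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("portal.example")  # hypothetical logger name


def format_exception_message(problem: str, data: dict) -> str:
    """Build a message that names the problem and the data that triggered it."""
    details = ", ".join(f"{key}={value!r}" for key, value in sorted(data.items()))
    return f"{problem} ({details})"


def load_user_profile(user_id: str, profiles: dict) -> dict:
    """Hypothetical lookup: log a clear message, then recover with a default."""
    try:
        return profiles[user_id]
    except KeyError:
        log.error(format_exception_message("User profile not found", {"user_id": user_id}))
        # Automatic recovery: fall back to a default profile instead of failing.
        return {"user_id": user_id, "theme": "default"}
```

Calling `load_user_profile("u2", {})` logs one error line with the timestamp, module, problem, and offending `user_id`, and then returns the default profile so the caller can continue.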

Processes by themselves do not improve availability. They need to be
integrated into the organization to ensure that they are being correctly
implemented and that issues are receiving appropriate visibility. Service level
agreements need to be established with each department and vendor, and
metrics gathered to ensure that the departments/vendors are meeting their
commitments. Regular reporting to upper management of system outages
and service level deviations is required. The result of weekly meetings is

408 Chapter 21

that every deviation is assigned an action and a time frame for resolving
the issue.
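
As an illustration of the metrics that such reporting relies on, here is a small Python sketch that turns a list of outages into an availability percentage and a deviation from a service level target. The `Outage` class, the function names, and the numbers in the usage note are assumptions made for this example. They do not come from any IBM tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outage:
    minutes: float  # duration of one recorded outage


def availability_percent(period_minutes: float, outages: List[Outage]) -> float:
    """Percentage of the reporting period during which the service was up."""
    downtime = sum(outage.minutes for outage in outages)
    return 100.0 * (period_minutes - downtime) / period_minutes


def sla_deviation(target_percent: float, period_minutes: float,
                  outages: List[Outage]) -> float:
    """Percentage points by which the target was missed (0.0 if it was met)."""
    achieved = availability_percent(period_minutes, outages)
    return max(0.0, target_percent - achieved)
```

For a one-week period (10,080 minutes) with outages of 30 and 20.4 minutes, the sketch reports roughly 99.5 percent availability. Against a hypothetical 99.9 percent target that is a deviation of about 0.4 percentage points, which would need an action and a time frame at the weekly meeting.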
After you address your processes and your organization, look at the
technological elements associated with WebSphere Portal. Specifically:

Remove any single point of failure. Look at each component
including routers, communication vendors, operating systems,
middleware, application servers, applications, disk drives, network
cards, servers, portlets, Web servers, databases, and LDAP servers.
Single points of failure can be eliminated using hardware
redundancy, multiple network cards attached to isolated networks
supported by different communication vendors, load balancing, hot
fail-over, and clustering. For LDAP servers, look at implementing a
master-slave relationship. For databases, use either hot fail-over or parallel
databases, with data replicated across multiple storage devices using mirrored
disks. Disk redundancy can be implemented with two NAS (network
attached storage) devices, each attached to a separate isolated network, with
storage software that replicates data stored on the master NAS to a
slave NAS. Upon failure of the master NAS, the router will redirect
to the slave NAS. Keep redundant servers in different locations so
they are not impacted by local environmental failures.
Use only mature software and reliable and serviceable hardware.
Most vendors are pressured by the market and their shareholders to
release products quickly. Thorough testing is not always performed
due to time and cost factors. Sometimes, because of the number of products
that can interact with a release, thorough testing is not even possible. In mature products,
both design and function flaws have been found by the market and
usually fixed (or a workaround provided) by the vendor.
Implement security software to prevent unauthorized access and
data corruption.
Automate the system management as much as possible. Employ
operating system and server agents and agents for WebSphere Application
Server. These agents should monitor and perform recovery or
performance tuning when a threshold is exceeded. Also automate the
change management process. Integrate the system management
process with your help desk.
Set up a test environment that duplicates your production
environment. Ensure that development, test, and production are
isolated from each other.
Reuse well-established configurations. Your staff does not need to
learn a new system and is already familiar with the issues. The
configuration is tested and spare parts are usually available.
Over-configure your hardware. Calculate what memory, DASD, and
CPU speed you need and multiply by at least 4. It is cheaper to have
inefficient software running on faster hardware than to tune the
software for the hardware configuration. It is very easy to
underestimate capacity needs and costlier to add on when the system
is in production. No user ever complained that his or her system was
too fast.
And now, to drive you nuts, the last and most important point
completely contradicts the first point. Consolidate your servers and keep
your solution as simple as possible. Complex solutions introduce more
failure points and make it more difficult to find the problem. While
the time to failure may lengthen with a complex configuration, so
also will the time to repair, because the configuration is harder to manage.
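
The tension between the first and last points can be made concrete with the standard steady-state availability formula, MTBF / (MTBF + MTTR), together with the usual rules for components in series and in parallel. The Python sketch below is an illustration only. It assumes that component failures are independent, which real configurations rarely satisfy exactly, and the function names are made up for this example.

```python
from math import prod
from typing import Iterable


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(availabilities: Iterable[float]) -> float:
    """Every component is required: the chain is up only if all of them are up."""
    return prod(availabilities)


def parallel(availabilities: Iterable[float]) -> float:
    """Redundant components: the group is down only if all of them are down."""
    return 1.0 - prod(1.0 - a for a in availabilities)
```

With an MTBF of 99 hours and an MTTR of 1 hour, one component is 99 percent available. Two such components in parallel reach about 99.99 percent, while two in series drop to about 98.01 percent. Redundancy raises availability, every extra component that must work lowers it, and a longer repair time lowers the figure for each component involved.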

Implementing a Highly Available WebSphere
Portal Solution
So now after reading all this high-level advice on high availability, you must
be saying, “Great stuff, but how do I implement a highly available
WebSphere Portal?” This section will tell you how. In the paragraphs that follow
you will examine two highly available models for WebSphere Portal based
on vertical and horizontal scaling. We will discuss their advantages and
disadvantages and show you how to implement WebSphere Portal clustering.
We are not going to drill down further into the process and organizational
components because that would require a separate book.
Support for automatic fail-over and load balancing for WebSphere Portal
is provided by WebSphere Application Server Network Deployment. It
supports the concept of clusters, which enables a collection of
application server processes to operate logically as a single application server
process. If one server within the cluster fails, then the workload is picked
up by the other servers.
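
To make the fail-over idea concrete, the following Python sketch routes requests round-robin across cluster members and skips any member that has been marked down. It is a simplified illustration of the concept and is not how WebSphere Application Server Network Deployment implements workload management. The `Cluster` class and the member names are hypothetical.

```python
from typing import Iterable, List, Set


class Cluster:
    """Toy cluster: round-robin routing that skips members marked as down."""

    def __init__(self, members: Iterable[str]) -> None:
        self._members: List[str] = list(members)
        self._healthy: Set[str] = set(self._members)
        self._next = 0  # index of the member to try first on the next request

    def mark_down(self, member: str) -> None:
        self._healthy.discard(member)

    def mark_up(self, member: str) -> None:
        if member in self._members:
            self._healthy.add(member)

    def route(self) -> str:
        """Return the next healthy member, trying each member at most once."""
        for _ in range(len(self._members)):
            member = self._members[self._next]
            self._next = (self._next + 1) % len(self._members)
            if member in self._healthy:
                return member
        raise RuntimeError("no cluster member available")
```

With two members, requests alternate between them. After `mark_down` is called for one member, every request goes to the survivor, and `route` raises an error only when no member is left.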
WebSphere Application Server Network Deployment terminology for a
hardware server is a node. A group of nodes under a single administrative
node make up a cell. Single point administration for the cell is done by the
Deployment Manager, which uses the cell master configuration repository
to store the configuration for all nodes in the cell. Each node has an agent
that communicates with the Deployment Manager and provides file transfer
services, configuration synchronization, and performance monitoring.
You can use three different models for designing your highly available
WebSphere Portal: WebSphere Portal using vertical scaling, WebSphere Portal
using horizontal scaling, and lastly a hybrid using both horizontal and
vertical scaling.
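
The difference between the three models comes down to how many nodes run WebSphere Portal and how many instances each node runs. The Python sketch below classifies a topology on that basis. The `Node` class, the `scaling_model` function, and the node names in the usage note are assumptions made for illustration and are not part of any WebSphere API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    portal_instances: int  # number of WebSphere Portal instances (JVMs) on this node


def scaling_model(nodes: List[Node]) -> str:
    """Name the scaling model implied by the instances-per-node layout."""
    active = [node for node in nodes if node.portal_instances > 0]
    if not active:
        raise ValueError("no WebSphere Portal instance is defined")
    several_nodes = len(active) > 1
    several_instances = any(node.portal_instances > 1 for node in active)
    if several_nodes and several_instances:
        return "hybrid"
    if several_nodes:
        return "horizontal"
    if several_instances:
        return "vertical"
    return "single instance"
```

For example, one node with two instances is classified as vertical, two nodes with one instance each as horizontal, and two nodes with two instances each as hybrid.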

Vertical Scaling with a WebSphere Portal Cluster
Each WebSphere Portal instance has a dedicated JVM. Vertical scaling is
basically the implementation of multiple instances of WebSphere Portal on
a single node. Figure 21-3 shows a simple example of a WebSphere Portal
cluster configuration using vertical scaling. Requests first go to a reverse
proxy server that sprays each request to an available Web server. The Web
server, via the plug-in, forwards the request to the WebSphere Portal Web
containers WebSphere Portal 1 and WebSphere Portal 2. Since they are part
of a cluster, the node agent will monitor their performance, forward the
information to the Deployment Manager, and manage the workload if one of
the instances goes offline or is busy. Authentication is via the LDAP server,
which uses a master-slave relationship to maintain availability. Availability
for the database is maintained by using a parallel database. All data is
stored on two network attached storage (NAS) systems that mirror each other using
storage synchronization software. Each server has two network cards that
access two separate networks that are isolated from each other.
Vertical scaling provides many benefits. It enables you to make better
use of the CPU, since multiple JVMs can more fully utilize the processing
power than a single JVM because of a JVM's concurrency limitations. It is also
easier to maintain and cheaper, since you are administering fewer machines,
and session management is easier, since you can use memory-to-memory
session replication.

Figure 21-3 WebSphere Portal cluster using vertical scaling.

Of course, the vertical scaling model has certain limitations. It has more
single points of failure (LDAP Client, Node Manager, Database Client,
hardware, and so on), requires greater memory management, and its value
decreases significantly as more instances are added.

Horizontal Scaling with a WebSphere Portal Cluster
Horizontal scaling is putting one WebSphere Portal instance per node and
then adding the nodes to the cluster. Figure 21-4 shows the previous example
using horizontal scaling. As in the previous example, the node agent
monitors the performance of the WebSphere Portal instances; however, this
time the Deployment Manager receives information from multiple agents.
The benefits of horizontal scaling are numerous. You can isolate the system
from various hardware or software failures. Each system can be in
a different location, thus isolating you from environmental factors such as
a disaster. By creating multiple cells, you can also perform maintenance or
test a new version in a production system without interrupting service.
The disadvantages are that it is more costly (more machines, software
licenses), more complicated and thus harder to administer, and session
management is slower and less reliable since it requires database persistence.
The last model combines both vertical and horizontal scaling. Each machine
runs multiple instances and all the instances are part of a cluster.
This, of course, combines the advantages of both vertical and horizontal

Figure 21-4 WebSphere Portal cluster using horizontal clustering.

clustering so long as you only use a few instances per server. However, it is
very costly and complex to administer. This is not a preferred strategy for
those who like to keep it simple.

Configuring WebSphere Portal in a
Clustered Environment
Now that you understand the advantages and disadvantages of different
availability models, the next step is to learn how to implement WebSphere
Portal in a clustered environment. Specifically, you will find out how to
implement WebSphere Portal using the example infrastructure detailed in
Figure 21-5. The infrastructure, which is based on horizontal scaling, has
a single IBM HTTP Web Server (sandbox1), which passes requests to two
WebSphere Portal Server nodes (sandbox3, sandbox4) that are part of a
WebSphere cell. Authentication is performed using IBM Directory Server

