Troubleshooting Active Directory Replication
In the first three parts of this series, I explained the importance and techniques of breaking large organizations into sites for the purpose of Active Directory replication. As you've no doubt learned, a considerable amount of planning should go into dividing your network, because doing so can be complicated. As with any complicated process, things can and sometimes do go wrong. In this article, I'll discuss some techniques that you can use to troubleshoot Active Directory replication.
Choosing a Preferred Bridgehead Server
If replication seems slow, you can compensate by creating a preferred bridgehead server within each site. A bridgehead server is the replication point in each site. For example, if a site has four domain controllers, you don't want all four to try to replicate their individual copies of the Active Directory information to a foreign site. Doing so would result in excessive network traffic, because every domain controller in the organization would try to replicate its information to and from every other domain controller.
Instead of using such a chaotic technique, one domain controller in each site acts as a spokesman for all other domain controllers within the site. This domain controller is known as the bridgehead server. If a domain controller other than the bridgehead server needs to distribute Active Directory updates, it does so only within the local site. After the domain controllers within the site have been updated, the bridgehead server then replicates the changed information to the other sites. Only the bridgehead server can make contact with the other sites in the organization.
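To put rough numbers on the difference, here is a minimal Python sketch that counts cross-site replication partnerships under both approaches. The site sizes are hypothetical, and the bridgehead figure assumes every site is linked directly to every other site, so treat it as an upper bound rather than a description of what Windows 2000 actually builds.

```python
from math import comb

def cross_site_pairs_full_mesh(dcs_per_site):
    """Pairs of domain controllers in different sites if every
    domain controller replicated with every one in the other sites."""
    total = sum(dcs_per_site)
    all_pairs = comb(total, 2)
    same_site_pairs = sum(comb(n, 2) for n in dcs_per_site)
    return all_pairs - same_site_pairs

def cross_site_pairs_bridgehead(dcs_per_site):
    """Pairs of sites when a single bridgehead server per site
    handles all contact with the other sites."""
    return comb(len(dcs_per_site), 2)

# Hypothetical organization: three sites with four domain controllers each.
sites = [4, 4, 4]
print(cross_site_pairs_full_mesh(sites))   # 48
print(cross_site_pairs_bridgehead(sites))  # 3
```

Even in this small example, the bridgehead arrangement cuts the number of cross-site partnerships from 48 to 3.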
By the nature of the site structure, each site is automatically set up with a bridgehead server. However, the bridgehead server that Windows 2000 selects may not always be the best choice. Because the bridgehead server must handle all replication-related traffic and functions in addition to its normal workload, you should carefully consider which server fills that role in each site.
Ideally, the bridgehead server should have a relatively light workload and lots of bandwidth to spare. As long as the bridgehead server has plenty of processing power, memory, hard disk space, and bandwidth, the replication requests will be handled quickly and efficiently.
You can specify more than one preferred bridgehead server. Only one server in each site can actually function as the bridgehead server, but if you've designated several servers as preferred, Windows 2000 will choose among them, and if the one in use fails, another preferred server takes over. If you haven't designated any preferred bridgehead servers, Windows 2000 will automatically designate one domain controller in each site as the bridgehead server. Be aware, though, that once you designate preferred bridgehead servers, Windows 2000 chooses only from among them. If every preferred server in a site fails, Windows 2000 won't fall back to another domain controller, and replication with that site stops until a preferred server returns or you change the designation.
To designate your preferred bridgehead server, follow these steps:
- Click Start and choose Programs|Administrative Tools|Active Directory Sites and Services.
- In the AD Sites and Services console, navigate through the tree and select the domain controller you want to make into the bridgehead server. Right-click on the server and select Properties from the resulting context menu.
- In the server's properties sheet, you'll see a list of transports available for directory replication. Select the transport protocol of choice and click Add. The transport protocol will move from its original location to the area that designates the server as a preferred bridgehead server for the listed protocols.
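If you'd rather see what this setting looks like in the directory itself, the following Python sketch builds the text of an LDIF change that should be equivalent to the steps above. As I understand it, the properties sheet stores your choice in the server object's bridgeheadTransportList attribute. The server name DC1, the site name Main-Office, and the forest root DC=example,DC=com are hypothetical placeholders. The code only prints text; it doesn't contact or modify any directory.

```python
def preferred_bridgehead_ldif(server, site, forest_dn, transport="IP"):
    """Build LDIF text that marks a server as a preferred bridgehead
    for one transport. All names passed in are placeholders."""
    config = f"CN=Sites,CN=Configuration,{forest_dn}"
    server_dn = f"CN={server},CN=Servers,CN={site},{config}"
    transport_dn = f"CN={transport},CN=Inter-Site Transports,{config}"
    return "\n".join([
        f"dn: {server_dn}",
        "changetype: modify",
        "add: bridgeheadTransportList",
        f"bridgeheadTransportList: {transport_dn}",
        "-",
        "",
    ])

# Hypothetical server, site, and forest names.
print(preferred_bridgehead_ldif("DC1", "Main-Office", "DC=example,DC=com"))
```

Review the generated text carefully and test it in a lab before importing anything like it into a production directory.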
Troubleshooting Poor Performance
There are other causes of slow replication besides a poor choice for a bridgehead server. Many times, the effects are felt in the form of poor Active Directory performance. For example, client requests may be extremely slow. In this section I'll discuss some problems that are ultimately related to Active Directory replication performance, along with the solutions to such problems.
A poorly designed site link structure can lead to slow replication. If all your sites are connected to each other by site links, replication will usually work. However, this may not be the best arrangement. Depending on the layout of your physical network and your site structure, it may be much more effective to create separate site link bridges between some of the sites you're replicating. Doing so will provide the replication traffic with a more direct path to follow. For more information on site link bridges, check out Part 3 of this series (Building Site Link Bridges).
Other problems occur when replication-related traffic consumes far too much network bandwidth. The symptoms vary widely and can include failed client requests. One solution is to isolate the replication traffic by placing a second network card into each bridgehead server and using an isolated network segment to connect the bridgehead servers. Remember that Windows 2000 allows you to set a cost for each network connection; therefore, you could set a very low cost for the isolated segment and a higher cost for the existing segment. Windows 2000 will then use the isolated segment for all replication traffic. However, if the isolated segment fails, Windows will reroute the traffic onto the available segment with the next-lowest cost. In this case, that would be the existing segment.
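The cost-based behavior can be sketched in a few lines of Python. This is a simplified model of the idea, not the algorithm Windows 2000 actually runs, and the segment names and cost values are hypothetical.

```python
def pick_link(links):
    """Return the name of the cheapest link that is currently up,
    or None if every link is down."""
    available = [link for link in links if link["up"]]
    if not available:
        return None
    return min(available, key=lambda link: link["cost"])["name"]

# Hypothetical costs: the isolated segment is made far cheaper.
links = [
    {"name": "isolated-segment", "cost": 10, "up": True},
    {"name": "existing-segment", "cost": 100, "up": True},
]

print(pick_link(links))   # isolated-segment

links[0]["up"] = False    # simulate a failure of the isolated segment
print(pick_link(links))   # existing-segment
```

The lowest cost wins as long as that path is available, which is why the existing segment carries replication traffic only when the isolated one is down.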
Many times, the only possible connection between bridgehead servers is a slow WAN link. In these cases, adding an isolated connection is impossible. Instead, you can reorganize your site structure or your replication schedule. Remember that the whole reason for dividing your network into sites in the first place was to reduce replication traffic. If it's been a while since you established your site configuration, you might go back and look at how it was set up. Perhaps a more effective layout would reduce replication traffic. Even if your sites are optimally arranged, you can always change your replication schedule. For example, if you're replicating between sites every half hour, maybe you could replicate every hour, instead.
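To see what a schedule change buys you, this small Python sketch counts how many replication cycles run per day at a given interval. The hours_open parameter is my own addition for illustration; it assumes you might allow replication only during part of the day.

```python
def cycles_per_day(interval_minutes, hours_open=24):
    """Number of replication cycles that fit into the hours
    during which replication is allowed."""
    return (hours_open * 60) // interval_minutes

print(cycles_per_day(30))  # 48 cycles when replicating every half hour
print(cycles_per_day(60))  # 24 cycles when replicating every hour
```

Doubling the interval halves the number of times the slow link is used for replication each day. The trade-off is that a change may wait up to an hour, rather than half an hour, before it begins to cross the link.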
Another issue you may encounter is that some clients experience very slow responses when making Active Directory requests. If this is the case, they may be linked to an inappropriate site or domain controller. For example, suppose you have a group of 20 clients at a warehouse down the street. Now, suppose the warehouse is connected to the main office by a T1 line. Although such a connection may have initially been enough to support a limited number of clients, it's much more effective to create a new site at the warehouse so that the clients have a local server with its own copy of the Active Directory. Now, when a client needs to make an Active Directory request, it can do so at the local level rather than having to send the request and the response both across a slow WAN link. After the creation of the new site, the only Active Directory requests sent across the slow link are replication updates. This is a very effective arrangement because the server used to create the new site only needs to be powerful enough to handle basic Active Directory tasks. So, your investment in new hardware could be minimal, should you have budget constraints. For that matter, you could even recycle an old PC as the server.
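When you're checking whether clients are linked to the right site, it helps to remember that Active Directory associates a client with a site through the subnet that contains the client's IP address. The Python sketch below models that lookup. The subnets and site names are hypothetical, and the rule that the most specific matching subnet wins is an assumption of this sketch.

```python
import ipaddress

# Hypothetical subnet-to-site assignments.
SUBNET_TO_SITE = {
    "10.1.0.0/16": "Main-Office",
    "10.2.5.0/24": "Warehouse",
}

def site_for_client(ip, subnet_map=SUBNET_TO_SITE):
    """Return the site whose subnet contains the client address,
    preferring the most specific subnet, or None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    matches = [
        (ipaddress.ip_network(cidr), site)
        for cidr, site in subnet_map.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]

print(site_for_client("10.2.5.20"))    # Warehouse
print(site_for_client("10.1.3.4"))     # Main-Office
print(site_for_client("192.168.1.1"))  # None
```

A client that comes back with None, or with a site on the far side of a slow link, is a good candidate for the kind of slow responses described above.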
When Replication Fails Completely
Earlier, I discussed the concept of a preferred bridgehead server.
Replication may also fail if the sites you're trying to replicate aren't linked properly. This may be the result of choosing an incorrect site link for a site, or it could be caused by failure to create a site link bridge. If you're having such a problem, check out the previous articles in this series for more information on these topics.
Brien M. Posey is an MCSE who works as a freelance writer. His past experience includes working as the director of information systems for a national chain of health care facilities and as a network engineer for the Department of Defense. Because of the extremely high volume of e-mail that Brien receives, it's impossible for him to respond to every message, although he does read them all.