I have transactional replication set up between two SQL Server 2005 databases on two different boxes. Both the Log Reader and Distribution Agents run in continuous mode, and the Distributor resides on a third SQL Server box. We are having performance issues with replication when large batch deletes/inserts happen on the publisher. A batch job runs for about 8-10 hours every day on the publisher and deletes/inserts thousands of records inside transactions. Replicating all of this data to the subscriber takes around 13-15 hours, which is not acceptable to our user community. While monitoring I found that the distribution database (the MSrepl_commands table) at times holds millions of rows, which would explain why the latency is so high. Add to that the fact that large transactions are occurring on the publisher.
I was wondering if anyone has faced a similar problem before. Are there any configuration changes I can make to the replication infrastructure to reduce latency?
Would appreciate your help.
Thanks.
I'm not sure whether the large MSrepl_commands table is the root cause of your performance problem, but there are some parameters you can change to potentially reduce the size of this table: the @max_distretention and @history_retention parameters of sp_adddistributiondb (you can change them with sp_changedistributiondb or through the Distributor Properties page in SQL Server Management Studio). The default values are 72 and 48 hours, respectively. Be careful when changing these parameters: subscriptions may be deactivated if they haven't synchronized with the publisher within the retention period. See the SQL Server 2005 Books Online topic for sp_adddistributiondb for details.
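For example, here is a minimal sketch of shortening both retention periods with sp_changedistributiondb, run at the Distributor. It assumes the distribution database has the default name 'distribution', and the 48/24-hour values are only illustrative; pick values that still cover your longest expected gap between synchronizations.

    -- Maximum distribution retention in hours (default is 72)
    EXEC sp_changedistributiondb
        @database = N'distribution',
        @property = N'max_distretention',
        @value    = N'48';

    -- Agent history retention in hours (default is 48)
    EXEC sp_changedistributiondb
        @database = N'distribution',
        @property = N'history_retention',
        @value    = N'24';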
If disk I/O is your problem, then you could probably move the distribution database to a different physical drive than the publication database; that may give you a performance boost as well (a sketch of one way to do this follows).
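This is only a sketch of one possible approach, assuming the default logical file names 'distribution' and 'distribution_log' (check sys.database_files in the distribution database for your actual names, and stop the replication agents before doing this; the drive paths are placeholders):

    -- 1. Point the catalog at the new file locations
    ALTER DATABASE distribution
        MODIFY FILE (NAME = N'distribution',     FILENAME = N'E:\SQLData\distribution.MDF');
    ALTER DATABASE distribution
        MODIFY FILE (NAME = N'distribution_log', FILENAME = N'F:\SQLLogs\distribution.LDF');

    -- 2. Take the database offline, copy the physical files to the new drives
    --    at the OS level, then bring it back online
    ALTER DATABASE distribution SET OFFLINE;
    -- (copy the .MDF/.LDF files here)
    ALTER DATABASE distribution SET ONLINE;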
There are some Log Reader Agent and Distribution Agent job parameters that may help with replication performance; you can experiment with them to see if they help (see http://msdn2.microsoft.com/en-us/library/ms151223.aspx for details).
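As an illustration only, these parameters can be appended to the agent job step command (or set through an agent profile in Replication Monitor); the server and database names in brackets and the numeric values are placeholders, and the right values depend on your workload:

    Log Reader Agent job step (logread.exe), for example:
        -Publisher [PUBSERVER] -PublisherDB [PubDB] -Distributor [DISTSERVER]
        -DistributorSecurityMode 1 -Continuous
        -ReadBatchSize 1000 -PollingInterval 2

    Distribution Agent job step (distrib.exe), for example:
        -Publisher [PUBSERVER] -PublisherDB [PubDB] -Distributor [DISTSERVER]
        -Subscriber [SUBSERVER] -SubscriberDB [SubDB]
        -DistributorSecurityMode 1 -Continuous
        -CommitBatchSize 500 -CommitBatchThreshold 2000 -SubscriptionStreams 4

If your build supports it, the Log Reader Agent's -MaxCmdsInTran parameter is also worth a look for your scenario, since it breaks very large publisher transactions into smaller chunks as they are written to the distribution database.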
Thanks,
Zhiqiang Feng
|||
Do you know whether the latency is higher for the Log Reader Agent or the Distribution Agent? You can experiment with tracer tokens to figure this out (see the sketch after this list). But for now, let's assume the Distribution Agent is the bottleneck. If that is the case, the cause could be any one of these:
- If you have a large transaction size, the Distribution Agent will not start replicating the transaction until it has completed. For example, if your transaction has an update statement that affects 1 million rows, then until the transaction is completed, and until the Log Reader Agent has finished replicating the last update, the Distribution Agent will not pick up those commands.
- With such high activity, could disk be a bottleneck at the distributor?
- With such a large transaction size, the Distribution Agent will apply the same large transaction to the subscriber. Could there be blocking at the subscriber? Could the subscriber's tempdb be growing?
- Could you benefit from Distribution Agent's -SubscriptionStreams parameter?
- As Zhiqiang mentioned above, do you have too much history lying around?
- In the UI, the tab that shows latency for the Log Reader and Distribution Agents runs an expensive query to determine that latency, so please use it sparingly.
- Your batch processing - is it, or can it be, done inside a stored procedure? If so, you could definitely benefit from replicating the execution of the stored procedure; have you considered this? It saves space in the distribution database and should improve performance (see the sketch after this list).
- Check the distribution database to see if file growth is occurring.
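Here is a minimal tracer token sketch for measuring where the latency is; the publication name is a placeholder, and these run at the Publisher, in the publication database:

    -- Post a token into the log; it flows through the Log Reader and Distribution Agents
    EXEC sp_posttracertoken @publication = N'MyPublication';

    -- Later, list the tokens and look at the latency breakdown for one of them
    EXEC sp_helptracertokens @publication = N'MyPublication';
    EXEC sp_helptracertokenhistory
        @publication = N'MyPublication',
        @tracer_id   = -2147483641;  -- substitute a tracer_id returned above

sp_helptracertokenhistory reports publisher-to-distributor latency (Log Reader Agent) separately from distributor-to-subscriber latency (Distribution Agent), which tells you which agent to tune.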
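And here is a sketch of publishing a stored procedure's execution instead of the row-by-row changes it makes; the procedure and publication names are hypothetical, and this runs at the Publisher, in the publication database:

    -- Replicate the call to the batch procedure rather than the thousands of
    -- individual delete/insert commands it generates
    EXEC sp_addarticle
        @publication   = N'MyPublication',
        @article       = N'usp_DailyBatchPurge',
        @source_owner  = N'dbo',
        @source_object = N'usp_DailyBatchPurge',
        @type          = N'proc exec';  -- or N'serializable proc exec' if the proc
                                        -- runs inside a serializable transaction

Keep in mind that replicating procedure execution only produces identical results if the procedure behaves the same against the data on both sides, so test it carefully before relying on it.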