Page 2 of 6

Re: webDiplomacy.net new server

Posted: Sun Jun 20, 2021 7:54 pm
by teccles
jmo1121109 wrote:
Sun Jun 20, 2021 7:27 pm
teccles wrote:
Sun Jun 20, 2021 7:06 pm
I'm afraid I just had another slow load time (several minutes).

I am using dark mode.
Was it the home page or something else on the site?
It's been both the home page and a game in the past. This time it was the home page.

Re: webDiplomacy.net new server

Posted: Sun Jun 20, 2021 10:50 pm
by Johnny Big Horse
Me too, slow loads today. Both home and game pages.

Re: webDiplomacy.net new server

Posted: Thu Jun 24, 2021 4:31 pm
by Puddle
So I'm seeing Timed Out Connections, and or Connection Rejected errors since the switch to the new server much more frequently than before. Typically about 1-2 times an hour I'm unable to access the site for 10-15 minutes. I'm also seeing a lot more unexpected Draws in Bot Games, typically during a phase where only the bots have orders to input, seems to happen in close proximity to connectivity issues as well.

Re: webDiplomacy.net new server

Posted: Thu Jun 24, 2021 6:22 pm
by michaelf77
It became impossible to play live games...
https://webdiplomacy.net/board.php?gameID=364826
was broken up by connection issues today (site would not respond and then we both got dinged)

Re: webDiplomacy.net new server

Posted: Thu Jun 24, 2021 8:40 pm
by Puddle
Since about 3:00pm Eastern (it's now 4:39pm) the site has been accessible for about 30 of the last 100 minutes, split into several even shorter spans. And I just had another Bot game draw for no orders entered. The site is significantly less functional than prior to the server migration.

Re: webDiplomacy.net new server

Posted: Thu Jun 24, 2021 11:31 pm
by kestasjk
Dang it.. Thanks for reporting these issues guys, we're collecting the error logs and will sort out the deadlocking issues

Re: webDiplomacy.net new server

Posted: Sat Jun 26, 2021 2:19 am
by FxFocus
Just so you know, I'm still seeing frequent outages. Enough to raise my blood pressure when playing bot games. LOL

Re: webDiplomacy.net new server

Posted: Tue Jun 29, 2021 8:56 am
by kestasjk
Thanks FxFocus, we’ll get it sorted asap

Re: webDiplomacy.net new server

Posted: Wed Jun 30, 2021 8:00 pm
by Tjappator
Is it perhaps just simply a routing issue?
I tried doing routing tests, but the results are quite weird. It seems like the packets are bouncing back and forth for quite some time in roughly the same area.

Re: webDiplomacy.net new server

Posted: Wed Jun 30, 2021 8:09 pm
by Tjappator
Addendum:
I used https://www.uptrends.com/tools/traceroute

For example, especially from Europe (Berlin, Amsterdam) the traces either contain time-outs or take almost 2000 ms in total.

When I try to do a trace from my own pc (for those that want to do this: in cmd type [tracert webdiplomacy.net] without []) I had a timeout as well.

Hope this helps.

Re: webDiplomacy.net new server

Posted: Fri Jul 02, 2021 8:02 am
by kestasjk
Thanks for coming up with some ideas Tjappator, but we're pretty confident this is due to locking issues, and we're fairly confident that is because the query optimizer has changed a lot since the system was written and there may be some queries that weren't designed with locking in mind, and we're getting deadlock errors as a result.

I've got four days off next week to look into this, and I'm expecting it to be resolved this time next week. (Fingers crossed)

Re: webDiplomacy.net new server

Posted: Fri Jul 02, 2021 1:18 pm
by MerryBBuck
we're pretty confident this is due to locking issues
No, I think it is Fairbot becoming sentient. Be careful next week, @kestasjk.

Re: webDiplomacy.net new server

Posted: Sun Jul 04, 2021 3:18 pm
by Tjappator
kestasjk wrote:
Fri Jul 02, 2021 8:02 am
Thanks for coming up with some ideas Tjappator, but we're pretty confident this is due to locking issues, and we're fairly confident that is because the query optimizer has changed a lot since the system was written and there may be some queries that weren't designed with locking in mind, and we're getting deadlock errors as a result.

I've got four days off next week to look into this, and I'm expecting it to be resolved this time next week. (Fingers crossed)
Yeah, I'm sorry. It appears my analysis was wrong.
I'm a embedded software engineer myself (C/C++ mainly), so I know how annoying debugging can be at times and I wish you good luck with it :)

Re: webDiplomacy.net new server

Posted: Mon Jul 05, 2021 5:38 pm
by kestasjk
Interesting and promising results from analyzing the error logs to extract all deadlocking queries:

Code: Select all

A)
98x : sql => UPDATE wD_Members SET timeLoggedIn = 1624155999 WHERE id = 1680167 	file => /var/www/webdiplomacy.net/public_html/board.php

B)
105x : sql => SELECT g.id, g.variantID, LOWER(HEX(g.password)) as password, g.turn, g.phase, g.processTime, g.name, g.gameOver, g.attempts, g.pot, g.potType, g.phaseMinutes, g.nextPhaseMinutes, g.phaseSwitchPeriod, g.processStatus, g.pauseTimeRemaining, g.minimumBet, g.anon, g.pressType, g.missingPlayerPolicy, g.drawType, g.minimumReliabilityRating, g.excusedMissedTurns, g.playerTypes, g.startTime FROM wD_Games g WHERE g.id=103892 FOR UPDATE 	file => /var/www/webdiplomacy.net/public_html/map.php

C)
3x: sql => SELECT g.id, g.variantID, LOWER(HEX(g.password)) as password, g.turn, g.phase, g.processTime, g.name, g.gameOver, g.attempts, g.pot, g.potType, g.phaseMinutes, g.nextPhaseMinutes, g.phaseSwitchPeriod, g.processStatus, g.pauseTimeRemaining, g.minimumBet, g.anon, g.pressType, g.missingPlayerPolicy, g.drawType, g.minimumReliabilityRating, g.excusedMissedTurns, g.playerTypes, g.startTime FROM wD_Games g WHERE g.id=364883 FOR UPDATE User => resource => Trace: 	file => /var/www/webdiplomacy.net/public_html/ajax.php
3x: sql => SELECT g.id, g.variantID, LOWER(HEX(g.password)) as password, g.turn, g.phase, g.processTime, g.name, g.gameOver, g.attempts, g.pot, g.potType, g.phaseMinutes, g.nextPhaseMinutes, g.phaseSwitchPeriod, g.processStatus, g.pauseTimeRemaining, g.minimumBet, g.anon, g.pressType, g.missingPlayerPolicy, g.drawType, g.minimumReliabilityRating, g.excusedMissedTurns, g.playerTypes, g.startTime FROM wD_Games g WHERE g.id=365381 FOR UPDATE 	file => /var/www/webdiplomacy.net/public_html/board.php

D)
4x : sql => SELECT m.id AS id, m.userID AS userID, m.gameID AS gameID, m.countryID AS countryID, m.status AS status, m.orderStatus AS orderStatus, m.bet AS bet, m.missedPhases as missedPhases, m.timeLoggedIn as timeLoggedIn, m.newMessagesFrom AS newMessagesFrom, m.votes AS votes, m.supplyCenterNo as supplyCenterNo, m.unitNo as unitNo, m.excusedMissedTurns as excusedMissedTurns, u.username AS username, u.points AS points, m.pointsWon as pointsWon, IF(s.userID IS NULL,0,1) as online, u.type as userType FROM wD_Members m INNER JOIN wD_Users u ON ( m.userID = u.id ) LEFT JOIN wD_Sessions s ON ( u.id = s.userID ) WHERE m.gameID = 359461 ORDER BY m.status ASC, m.supplyCenterNo DESC, m.countryID ASC FOR UPDATE 	file => /var/www/webdiplomacy.net/public_html/gamemaster.php
3x : sql => SELECT m.id AS id, m.userID AS userID, m.gameID AS gameID, m.countryID AS countryID, m.status AS status, m.orderStatus AS orderStatus, m.bet AS bet, m.missedPhases as missedPhases, m.timeLoggedIn as timeLoggedIn, m.newMessagesFrom AS newMessagesFrom, m.votes AS votes, m.supplyCenterNo as supplyCenterNo, m.unitNo as unitNo, m.excusedMissedTurns as excusedMissedTurns, u.username AS username, u.points AS points, m.pointsWon as pointsWon, IF(s.userID IS NULL,0,1) as online, u.type as userType FROM wD_Members m INNER JOIN wD_Users u ON ( m.userID = u.id ) LEFT JOIN wD_Sessions s ON ( u.id = s.userID ) WHERE m.gameID = 364551 ORDER BY m.status ASC, m.supplyCenterNo DESC, u.points DESC FOR UPDATE 	file => /var/www/webdiplomacy.net/public_html/board.php
1x : sql => SELECT m.id AS id, m.userID AS userID, m.gameID AS gameID, m.countryID AS countryID, m.status AS status, m.orderStatus AS orderStatus, m.bet AS bet, m.missedPhases as missedPhases, m.timeLoggedIn as timeLoggedIn, m.newMessagesFrom AS newMessagesFrom, m.votes AS votes, m.supplyCenterNo as supplyCenterNo, m.unitNo as unitNo, m.excusedMissedTurns as excusedMissedTurns, u.username AS username, u.points AS points, m.pointsWon as pointsWon, IF(s.userID IS NULL,0,1) as online, u.type as userType FROM wD_Members m INNER JOIN wD_Users u ON ( m.userID = u.id ) LEFT JOIN wD_Sessions s ON ( u.id = s.userID ) WHERE m.gameID = 365391 ORDER BY m.status ASC, m.supplyCenterNo DESC, u.points DESC FOR UPDATE 	file => /var/www/webdiplomacy.net/public_html/gamemaster.php
1x : sql => SELECT m.id AS id, m.userID AS userID, m.gameID AS gameID, m.countryID AS countryID, m.status AS status, m.orderStatus AS orderStatus, m.bet AS bet, m.missedPhases as missedPhases, m.timeLoggedIn as timeLoggedIn, m.newMessagesFrom AS newMessagesFrom, m.votes AS votes, m.supplyCenterNo as supplyCenterNo, m.unitNo as unitNo, m.excusedMissedTurns as excusedMissedTurns, u.username AS username, u.points AS points, m.pointsWon as pointsWon, IF(s.userID IS NULL,0,1) as online, u.type as userType FROM wD_Members m INNER JOIN wD_Users u ON ( m.userID = u.id ) LEFT JOIN wD_Sessions s ON ( u.id = s.userID ) WHERE m.gameID = 362565 ORDER BY m.status ASC, m.supplyCenterNo DESC, m.countryID ASC FOR UPDATE 	file => /var/www/webdiplomacy.net/public_html/map.php

E)
8x: sql => UPDATE wD_Games SET attempts=attempts+1 WHERE id=364284 	file => /var/www/webdiplomacy.net/public_html/board.php
5x: sql => UPDATE wD_Games SET attempts=attempts+1 WHERE id=354566 	file => /var/www/webdiplomacy.net/public_html/gamemaster.php
5x: sql => UPDATE wD_Games SET attempts=attempts+1 WHERE id=364793 User => Trace: 	file => /var/www/webdiplomacy.net/public_html/ajax.php

F)
1x: FROM wD_Members WHERE gameID = 365549 AND countryID=2 FOR UPDATE User => Trace: 	file => /var/www/webdiplomacy.net/public_html/ajax.php
1x: FROM wD_Members WHERE gameID = 365656 AND countryID=6 FOR UPDATE 	file => /var/www/webdiplomacy.net/public_html/board.php

G)
3x : sql => UPDATE wD_Misc SET value = 311943569 WHERE name = 'Hits' 	file => /var/www/webdiplomacy.net/public_html/gamemaster.php
1x : sql => UPDATE wD_Misc SET value = 144 WHERE name = 'ErrorLogs' 	file => /var/www/webdiplomacy.net/public_html/gamemaster.php
1x : sql => UPDATE wD_Misc SET value = 1624270235 WHERE name = 'LastProcessTime' 	file => /var/www/webdiplomacy.net/public_html/gamemaster.php

H)
4x : sql => UPDATE wD_Users SET timeLastSessionEnded = 1624211262, lastMessageIDViewed = (SELECT MAX(f.id) FROM wD_ForumMessages f) WHERE id IN (133701, 131126, 114734, 44860, 129540, 43473, 98756, 130817, 17272, 114189, 132775, 95704, 44661, 133942, 133876, 118741, 25560, 132519, 133933, 114232, 49505, 104098, 128803, 125810, 125879, 125974, 133957, 117790, 33239, 23113, 107946, 126452, 133894, 132907) 	file => /var/www/webdiplomacy.net/public_html/gamemaster.php
These are the queries that are locking up when these deadlocks occur, so they don't necessarily point to where the issue is coming from but they do point to the tables involved and likely problems, a few good leads. Definitely it shows there are too many places game processing is occurring, map.php and board.php are requiring locks they probably shouldn't need, the timeouts are way too long when waiting for timeout.

So far I've changed it so game processing can only happen from one place, and I've changed the query that looks up forum times so that it doesn't link to the forum if using a third party DB.

I'm also trialing a new service: https://status.webdiplomacy.net , which will keep us better informed of any issues via alerts, and can be checked to see how well we're doing.

Re: webDiplomacy.net new server

Posted: Mon Jul 05, 2021 10:48 pm
by DougJoe
I don't know if it's expected or not, but I have a bot game going (https://www.webdiplomacy.net/board.php?gameID=366255) that doesn't seem to want to process even after several minutes of waiting after readying my moves.

Re: webDiplomacy.net new server

Posted: Mon Jul 05, 2021 11:01 pm
by kestasjk
Hmm.. not expected but that could well be due to changes made to reduce simultaneous game processing. How many minutes was it delayed for? It looks like it has moved on now, so did it eventually move on because of the votes or did it move on due to timing out?

The current processor is running on a 30-second cycle; so if you had a 30 second delay that would be understandable, but if it was definitely over a minute that is odd.

Re: webDiplomacy.net new server

Posted: Tue Jul 06, 2021 2:28 am
by DougJoe
kestasjk wrote:
Mon Jul 05, 2021 11:01 pm
Hmm.. not expected but that could well be due to changes made to reduce simultaneous game processing. How many minutes was it delayed for? It looks like it has moved on now, so did it eventually move on because of the votes or did it move on due to timing out?

The current processor is running on a 30-second cycle; so if you had a 30 second delay that would be understandable, but if it was definitely over a minute that is odd.
Still right where I left it, Spring 1904, and it has been a few hours now. No worries anout the game itself, just want to provide info if there's a potential issue.

Re: webDiplomacy.net new server

Posted: Tue Jul 06, 2021 2:34 am
by DougJoe
Also, out of curiosity, I started another bot game (https://www.webdiplomacy.net/board.php?gameID=366352) and gave it a couple of minutes... it says 7 players have joined and the game will start on the next processing cycle.

Re: webDiplomacy.net new server

Posted: Tue Jul 06, 2021 4:32 am
by Iron Boots
I too have a bot game going and despite readying up, has not processed in the past 4 hours.

https://webdiplomacy.net/board.php?game ... #gamePanel

Re: webDiplomacy.net new server

Posted: Tue Jul 06, 2021 5:14 am
by kestasjk
Thanks guys good feedback. I thought the main gamemaster could recognize vote and order complete conditions but it looks like it doesn't. I've made a change so that the board can trigger a process by setting the process time to now, so that it'll then get picked up by the gamemaster within a few moments.
Let me know how it goes.

I'm expecting once we've really nailed all the deadlocking issues we'll be able to carefully turn some of this back on so that games process instantly again. (and there have been some other discoveries like the bots have been using more resources than all the other users on the site combined due to the rapid-fire requests)

edit: Definitely making progress, finally caught a deadlock in the act:

Code: Select all

------------------------                                                                                                                                                               
LATEST DETECTED DEADLOCK                                                                                                                                                               
------------------------                                                                                                                                                               
2021-07-06 04:25:02 0x7f3b681c6700                                                                                                                                                     
*** (1) TRANSACTION:                                                                                                                                                                   
TRANSACTION 4827537, ACTIVE 0 sec starting index read                                                                                                                                  
mysql tables in use 2, locked 2                                                                                                                                                        
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s)                                                                                                                              
MySQL thread id 93956, OS thread handle 139892861830912, query id 3225261 localhost webdiplomacy Creating sort index                                                                   
SELECT m.id AS id,                                                                                                                                                                     
\t\t\t\tm.userID AS userID,                                                                                                                                                            
\t\t\t\tm.gameID AS gameID,                                                                                                                                                            
\t\t\t\tm.countryID AS countryID,                                                                                                                                                      
\t\t\t\tm.status AS status,                                                                                                                                                            
\t\t\t\tm.orderStatus AS orderStatus,                                                                                                                                                  
\t\t\t\tm.bet AS bet,                                                                                                                                                                  
\t\t\t\tm.missedPhases as missedPhases,                                                                                                                                                
\t\t\t\tm.timeLoggedIn as timeLoggedIn,                                                                                                                                                
\t\t\t\tm.newMessagesFrom AS newMessagesFrom,                                                                                                                                          
\t\t\t\tm.votes AS votes,                                                                                                                                                              
\t\t\t\tm.supplyCenterNo as supplyCenterNo,                                                                                                                                            
\t\t\t\tm.unitNo as unitNo,                                                                                                                                                            
\t\t\t\tm.excusedMissedTurns as excusedMissedTurns,                                                                                                                                    
\t\t\t\tu.username AS username,                                                                                                                                                        
\t\t\t\tu.points AS points,                                                                                                                                                            
\t\t\t\tm.pointsWon as pointsWon,                                                                                                                                                      
\t\t\t\tIF(s.userID IS NULL,0,1) as online,                                                                                                                                            
\t\t\t\tu.type as userType                                                                                                                                                             
\t\t\tFROM wD_Members m                                                                                                                                                                
\t\t\tINNER JOIN wD_Users u ON ( m.userID = u.id )                                                                                                                                     
\t\t\tLEFT JOIN wD_Sessions s ON ( u.id = s.userID )                                                                                                                                   
\t\t\tWHERE m.gameID = 366356                                                                                                                                                          
\t\t\tORDER BY m.status ASC, m.supplyCenterNo DESC, m.countryID ASC FOR UPDATE                                                                                                         
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:                                                                                                                                           
RECORD LOCKS space id 1039 page no 8416 n bits 824 index gid of table `webdiplomacy`.`wD_Members` trx id 4827537 lock_mode X locks rec but not gap waiting                             
Record lock, heap no 741 PHYSICAL RECORD: n_fields 2; compact format; info bits 0                                                                                                      
 0: len 3; hex 059714; asc    ;;                                                                                                                                                       
 1: len 3; hex 19d69d; asc    ;;                                                                                                                                                       
                                                                                                                                                                                       
*** (2) TRANSACTION:                                                                                                                                                                   
TRANSACTION 4827535, ACTIVE 0 sec starting index read                                                                                                                                  
mysql tables in use 1, locked 1                                                                                                                                                        
8 lock struct(s), heap size 1128, 16 row lock(s), undo log entries 9                                                                                                                   
MySQL thread id 93955, OS thread handle 139893126489856, query id 3225262 localhost webdiplomacy Updating                                                                              
UPDATE wD_Games SET attempts=0 WHERE id=366356                                                                                                                                         
*** (2) HOLDS THE LOCK(S):                                                                                                                                                             
RECORD LOCKS space id 1039 page no 8416 n bits 824 index gid of table `webdiplomacy`.`wD_Members` trx id 4827535 lock_mode X locks rec but not gap                       
Record lock, heap no 739 PHYSICAL RECORD: n_fields 2; compact format; info bits 0                                                                                                      
 0: len 3; hex 059714; asc    ;;                                                                                                                                                       
 1: len 3; hex 19d6a0; asc    ;;                                                                                                                                                       
                                                                                                                                                                                       
Record lock, heap no 740 PHYSICAL RECORD: n_fields 2; compact format; info bits 0                                                                                                      
 0: len 3; hex 059714; asc    ;;                                                                                                                                                       
 1: len 3; hex 19d69f; asc    ;;                                                                                                                                                       
                                                                                                                                                                                       
Record lock, heap no 741 PHYSICAL RECORD: n_fields 2; compact format; info bits 0                                                                                                      
 0: len 3; hex 059714; asc    ;;                                                                                                                                                       
 1: len 3; hex 19d69d; asc    ;;                                                                                                                                                       
                                                                                                                                                                                       
Record lock, heap no 744 PHYSICAL RECORD: n_fields 2; compact format; info bits 0                                                                                                      
 0: len 3; hex 059714; asc    ;;                                                                                                                                                       
 1: len 3; hex 19d6a1; asc    ;;                                                                                                                                                       
                                                                                                                                                                                       
Record lock, heap no 745 PHYSICAL RECORD: n_fields 2; compact format; info bits 0                                                                                                      
 0: len 3; hex 059714; asc    ;;                                                                                                                                                       
 1: len 3; hex 19d69e; asc    ;;                                                                                                                                                       
                                                                                                                                                                                       
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:                                                                                                                                           
RECORD LOCKS space id 1038 page no 4167 n bits 96 index PRIMARY of table `webdiplomacy`.`wD_Games` trx id 4827535 lock_mode X locks rec but not gap waiting                            
Record lock, heap no 28 PHYSICAL RECORD: n_fields 31; compact format; info bits 0                                                                                                      
 0: len 3; hex 059714; asc    ;;                                                                                                                                                       
 1: len 6; hex 000000000000; asc       ;;                                                                                                                                              
 2: len 7; hex 80000000000000; asc        ;;                                                                                                                                           
 3: len 1; hex 09; asc  ;;                                                                                                                                                             
 4: len 2; hex 0001; asc   ;;                                                                                                                                                          
 5: len 1; hex 03; asc  ;;                                                                                                                                                             
 6: len 4; hex 60e3dfce; asc `   ;;                                                                                                                                                    
 7: len 3; hex 800019; asc    ;;                                                                                                                                                                                                                                                   
 12: len 1; hex 01; asc  ;;                                                                                                                                                            
 13: SQL NULL;                                                                                                                                                                         
 14: SQL NULL;                                                                                                                                                                         
 15: len 2; hex 0014; asc   ;;                                                                                                                                                         
 16: len 4; hex 000005a0; asc     ;;                                                                                                                                                   
 17: len 4; hex 7fffffff; asc     ;;                                                                                                                                                   
 18: len 1; hex 01; asc  ;;                                                                                                                                                            
 19: len 1; hex 01; asc  ;;                                                                                                                                                            
 20: len 2; hex 0002; asc   ;;                                                                                                                                                         
 21: len 1; hex 01; asc  ;;                                                                                                                                                            
 22: len 3; hex 000000; asc    ;;                                                                                                                                                      
 23: len 1; hex 01; asc  ;;                                                                                                                                                            
 24: len 1; hex 00; asc  ;;                                                                                                                                                            
 25: len 1; hex 01; asc  ;;                                                                              
 26: len 4; hex 00000004; asc     ;;                                                                                                                                                   
 27: SQL NULL;                                                                                                                                                                         
 28: len 1; hex 01; asc  ;;                                                                                                                                                            
 29: len 4; hex 60e3cd0e; asc `   ;;                                                                                                                                                   
 30: len 4; hex 80000000; asc     ;;                                                                                                                                                   
                                                                                                                                                                                       
*** WE ROLL BACK TRANSACTION (1)                                             
and it was because a bot triggered a game to get processed at the same time as the gamemaster via a path I hadn't stopped properly