Cannot receive onRoomListUpdate after being connected to the lobby for a long time

I am developing a call application using Photon Realtime. The app and a web browser join the same room to interact with each other. We use a Node.js server that loads Photon-Javascript_SDK.min.js and joins the lobby to monitor the connection state of every room. Our application connects to the JP region. Our approach is to use LoadBalancingClient and connectToRegionMaster() so that our server can listen for onRoomListUpdate.

Our server calculates the connection time when a roomRemoved event is delivered through onRoomListUpdate. But I found that onRoomListUpdate stopped responding around 5/26 10:41 and 5/15 12:27.
I checked that the operational status of the JP region was OK and that the WebSocket connection reported no errors. After my customer complained about our service, I used my test account to join a room. The connection succeeded, and onRoomListUpdate started delivering roomRemoved events to our server again.

My question is: why does onRoomListUpdate go idle for a certain period? The roomRemoved events for the earlier connections were not sent until another new user joined a new room. How can I make sure the client connection correctly receives room update events in the lobby? For now I will try to reconnect whenever a new user asks which room they can connect to, but I don't think this is a good approach.
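
One way to notice the idle period described above is a small watchdog that remembers when the last room list update arrived. This is only a sketch: `LobbyWatchdog`, `maxSilenceMs`, and the injected `now` clock are hypothetical names, not part of the Photon SDK, and it assumes `touch()` is called from your own onRoomListUpdate handler. Silence alone does not prove a fault, because a quiet lobby legitimately produces no updates.

```javascript
// Hypothetical helper, not part of the Photon SDK.
class LobbyWatchdog {
  constructor(maxSilenceMs, now = () => Date.now()) {
    this.maxSilenceMs = maxSilenceMs;
    this.now = now;               // injectable clock, handy for testing
    this.lastUpdateAt = now();
  }

  // Call this from the onRoomListUpdate handler.
  touch() {
    this.lastUpdateAt = this.now();
  }

  // Milliseconds since the last recorded update.
  silenceMs() {
    return this.now() - this.lastUpdateAt;
  }

  // True when no update has been recorded for longer than maxSilenceMs.
  isStale() {
    return this.silenceMs() > this.maxSilenceMs;
  }
}

// Usage with a fake clock, so the example runs without a Photon connection.
let fakeTime = 0;
const watchdog = new LobbyWatchdog(60000, () => fakeTime);
fakeTime = 30000;
console.log(watchdog.isStale()); // false
fakeTime = 90000;
console.log(watchdog.isStale()); // true
watchdog.touch();
console.log(watchdog.isStale()); // false
```

When `isStale()` turns true, the server could log the event or trigger the reconnect only then, instead of reconnecting on every user request.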

Comments

  • JohnTube
    Hi @ray,

    Thank you for choosing Photon!

    If I understood correctly, you are saying that the Master Server in the JP region did not send the GameListUpdate event (229), which contains the list of removed rooms, to a client that had joined the respective lobby.
    Our server calculates the connection time when a roomRemoved event is delivered through onRoomListUpdate. But I found that onRoomListUpdate stopped responding around 5/26 10:41 and 5/15 12:27.
    So you are basing your assumption on this? Could you give us more details about how you log those events on your backend? I think it's not easy to log an event that is missed; you can only log the ones that are received.
    5/26 10:41 and 5/15 12:27
    What time zone is this? Are those two timestamps or one? What does each timestamp correspond to? Or is it an interval?
    For now I will try to reconnect whenever a new user asks which room they can connect to, but I don't think this is a good approach.
    So you are saying this is a workaround you found?
    after being connected to the lobby for a long time
    This is from the discussion title, but I can't find any information about it in the post. What do you mean by this? Do you suspect this is the cause of the issue? How long is "long time": minutes, hours, days?
    After my customer complained
    What did the customer complain about? What did the customer see or notice? How is this visible to customers?

    - How often did this happen? How many times did this occur?
    - Can we reproduce 100%? Do you have minimal repro steps?
  • ray
    edited May 2019
    Hi JohnTube:
    Thanks for your reply. I have attached two images that describe our server connection flow and our socket connection flow.

    server connection flow
    image

    socket connection flow
    image

    Below are my responses to your questions:
    1. Could you give us more details about how you log those events on your backend? I think it's not easy to log an event that is missed; you can only log the ones that are received.

    ans: It does not happen regularly. 5/26 10:41 and 5/15 12:27 are in GMT+8. I know these times because my clients reported when they could not connect to our service. Since our server records every onRoomsUpdate event, I can use those records to determine the socket connection state.

    Based on the socket connection flow above, we use the room info from onRoomListUpdate to update our connected user list. When a duplicate user wants to join a Photon room, we block them based on the connected user list. That is why our clients complained that they could not join Photon rooms: they had closed their previous connection, but our server did not receive the onRoomListUpdate event, so we could not get the latest connected user list.

    In our log, we found that the users tried to establish a connection at 5/26 10:41 and 5/15 12:27, but we had not received an onRoomListUpdate telling us that those users were no longer in the connected user list. Our Node.js server received a lot of empty events (we print a debug message whenever onRoomListUpdate is triggered, and we still don't know why the Photon LoadBalancingClient connection generates empty log entries; our other Node.js services that do not use the Photon socket have no such empty entries), but no onRoomListUpdate entries appeared in our log file. Only when a new user who was not in the connected user list successfully joined a room did onRoomListUpdate recover and give us the latest connected user list. That is why we assume that the Photon Cloud server does not trigger onRoomListUpdate correctly after the socket connection has been open for some time.

    2. What time zone is this? Are those two timestamps or one? What does each timestamp correspond to? Or is it an interval?

    ans: GMT+8. They are separate timestamps reported by different users, not an interval. Each timestamp was reported by one of our users, and we then checked our system's Photon socket log. onRoomListUpdate recovered after a new user who was not in the connected user list successfully connected to the Photon server.

    3. For now I will try to reconnect whenever a new user asks which room they can connect to, but I don't think this is a good approach.
    So you are saying this is a workaround you found?


    ans: I tried to implement that logic but found that it is not a good solution. Reconnecting the socket takes time, and during that time other new connections are blocked until the socket has reconnected and received the latest connected user list. Now I'm trying to create a cloud function that makes a fake user join and rejoin every 5 seconds to avoid this bug, but I'm not sure it will work.

    4. This is from the discussion title, but I can't find any information about it in the post. What do you mean by this? Do you suspect this is the cause of the issue? How long is "long time": minutes, hours, days?


    ans: Our server never closes the socket session. We found that after some time (we have not determined the exact period yet), the Photon server stops sending onRoomListUpdate until a new user joins a room, even though users still exist in some rooms. We still don't know how to reproduce this bug 100% of the time.

    5. What did the customer complain about? What did the customer see or notice? How is this visible to customers?

    ans: Based on the socket connection flow above, we use the room info from onRoomListUpdate to update our connected user list. When a duplicate user wants to join a Photon room, we block them based on the connected user list. That is why our clients complained that they could not join Photon rooms: they had closed their previous connection, but our server did not receive the onRoomListUpdate event, so we could not get the latest connected user list.

    In our log, we found that the users tried to establish a connection at 5/26 10:41 and 5/15 12:27, but we had not received an update telling us that those users were no longer in the connected user list. Our Node.js server received a lot of empty events (we print a debug message whenever onRoomListUpdate is triggered, and we still don't know why the Photon LoadBalancingClient connection generates empty log entries), but no onRoomListUpdate entries appeared in our log file.

    6.- How often did this happen? How many times did this occur?
    - Can we reproduce 100%? Do you have minimal repro steps?


    ans: Sometimes once per day, sometimes once per week. It has already happened 3 times this month.
    We still don't know how to reproduce it 100% of the time. Maybe running a server like ours, which keeps a socket session to Photon open and never closes it, can reproduce this bug.

    Is there a best practice for a server to monitor the Photon connection state? We are not sure whether this kind of monitoring flow is suitable for our use case. Please kindly give us your advice. Thanks.

    Ray
  • JohnTube
    edited May 2019
    Hi @ray,

    1.
    When a duplicate user wants to join a Photon room
    We have a built-in feature that prevents two users (connections) with the same UserId from being joined to the same room. It's called CheckUserOnJoin; it may not be 100% available in the JS SDK, but it's available in PUN out of the box. But maybe you want to disallow multiple/duplicate connections all the time, even outside rooms.
    2. Why do you update the connected users list via lobby events? Lobby room list updates do not include user info, so how exactly do you use this? Do you need the lobbies' room lists, the list of users per lobby, or the list of connected users across all lobbies (and regions, etc.)?
    3. Do you request and allow connections via custom authentication, or do you use your own way?
    4.
    since our server never closes the socket session
    Maybe the allegedly missed events happen when the client is disconnected unexpectedly and is recovering.
    5.
    the Photon server stops sending onRoomListUpdate until a new user joins a room, even though users still exist in some rooms
    Could you rephrase this or put it in simple steps so I can understand the problem? What is the expected behaviour vs. the actual behaviour?

    I suggest you implement webhooks on your backend to get all the info you need about room events: Create, Join, Leave.

    Is there a best practice for a server to monitor the Photon connection state?

    So if you want to know from your backend, at any given time, the list of connected users and the list of available rooms, you can make use of:

    Web server:
    - custom authentication
    - realtime webhooks
    - webRPCs or custom telemetry HTTP based events (e.g. send one before disconnecting, on app close, when app loses focus, app is moved to the background, etc.)

    Special client (JS in your case, always running on the backend, you may need one client per AppId/AppVersion/Region):
    - lobby rooms list updates events (you may need one client per lobby)
    - lobby stats events
    - app stats events
    - FindFriends: periodic polling with batches (512/1024) of list of userIDs.
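
A minimal sketch of how a backend could keep the list of connected users from room events such as Create, Join, and Leave, whichever channel delivers them. The event shape `{ kind, roomId, userId }` and the `PresenceTracker` class are placeholders made up for this example; they are not the actual webhook payload, which has to be mapped to this shape first.

```javascript
// Hypothetical tracker; the event shape is an assumption, not Photon's payload.
class PresenceTracker {
  constructor() {
    this.roomOfUser = new Map(); // userId -> roomId
  }

  // Apply one room event to the in-memory state.
  apply({ kind, roomId, userId }) {
    if (kind === "create" || kind === "join") {
      this.roomOfUser.set(userId, roomId);
    } else if (kind === "leave") {
      // Ignore a leave for a room the user is no longer tracked in.
      if (this.roomOfUser.get(userId) === roomId) {
        this.roomOfUser.delete(userId);
      }
    }
  }

  isConnected(userId) {
    return this.roomOfUser.has(userId);
  }

  usersIn(roomId) {
    return [...this.roomOfUser]
      .filter(([, room]) => room === roomId)
      .map(([user]) => user);
  }
}

const tracker = new PresenceTracker();
tracker.apply({ kind: "create", roomId: "room-1", userId: "alice" });
tracker.apply({ kind: "join", roomId: "room-1", userId: "bob" });
tracker.apply({ kind: "leave", roomId: "room-1", userId: "alice" });
console.log(tracker.isConnected("alice")); // false
console.log(tracker.usersIn("room-1"));    // [ 'bob' ]
```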
  • ray
    Hi JohnTube:
    Thanks for your reply again. My answers and questions are shown below:
    1. We have a built-in feature that prevents two users (connections) with the same UserId from being joined to the same room. It's called CheckUserOnJoin; it may not be 100% available in the JS SDK, but it's available in PUN out of the box. But maybe you want to disallow multiple/duplicate connections all the time, even outside rooms.

    ans: Since we use the Android/iOS/JS SDKs, we cannot use CheckUserOnJoin, right? We want to enforce the limit that one user can join only one room.

    2. Why do you update the connected users list via lobby events? Lobby room list updates do not include user info, so how exactly do you use this? Do you need the lobbies' room lists, the list of users per lobby, or the list of connected users across all lobbies (and regions, etc.)?

    ans: We need to know the connection time in order to charge our users, so our server has to stay in the lobby to observe the status of all rooms in one region (Photon does not support cross-region connections). Since we save the user information in _customProperties, we can use it to calculate each user's billable time.
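
To make the billing calculation concrete, here is a rough sketch that turns a stream of join and leave records into billable seconds per user. The record shape `{ userId, type, at }` (with `at` in milliseconds) and the function name `billableSeconds` are assumptions for illustration; how the records are derived from room updates is not shown.

```javascript
// Hypothetical billing helper; the record shape is an assumption.
function billableSeconds(events) {
  const openSince = new Map(); // userId -> join timestamp in ms
  const totals = new Map();    // userId -> accumulated seconds

  for (const { userId, type, at } of events) {
    if (type === "join") {
      // Keep the earliest join if a duplicate join record arrives.
      if (!openSince.has(userId)) openSince.set(userId, at);
    } else if (type === "leave" && openSince.has(userId)) {
      const seconds = Math.round((at - openSince.get(userId)) / 1000);
      totals.set(userId, (totals.get(userId) || 0) + seconds);
      openSince.delete(userId);
    }
  }
  return totals; // sessions without a leave record are not billed yet
}

const totals = billableSeconds([
  { userId: "alice", type: "join", at: 0 },
  { userId: "alice", type: "leave", at: 90000 },
  { userId: "bob", type: "join", at: 10000 },
]);
console.log(totals.get("alice")); // 90
console.log(totals.has("bob"));   // false, bob's session is still open
```

A missed removal event shows up here as a session that never closes, which is one reason to pair this with some staleness check.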

    3. Do you request and allow connections via custom authentication, or do you use your own way?

    ans: I use my own way. I have an API server process for that. Only our valid users can join Photon rooms.


    4.
    since our server never closes the socket session
    Maybe the allegedly missed events happen when the client is disconnected unexpectedly and is recovering.


    Ans: I added log messages to monitor all connection events (error, disconnect, etc.). No error was logged at the times our users complained about the connection problem. I use 3 VMs connected to the Photon lobby, and all of them stopped receiving the onRoomsUpdate event at the same time. I think this is not a client error; it should be a server error.

    5. Could you rephrase this or put it in simple steps so I can understand the problem? What is the expected behaviour vs. the actual behaviour?

    ans: My expected user flow is:

    1. The API server joins the Photon lobby.
    2. The first client asks the API server to let it join a room.
    3. The API server checks the connected user list and allows the first client to join the Photon room.
    4. The API server receives the onRoomsUpdate event and adds the first user to the connected user list.
    5. If the client leaves the room, the API server receives an onRoomsUpdate event and removes the user from the connected user list.
    6. If a new client asks the API server using a duplicate userId, the server finds the duplicate in the connected user list and rejects the client's request.

    But the actual behaviour is:
    Steps 1 to 4 are the same as in the expected flow.
    Under some unknown condition, although the WebSocket connection is still connected, step 5 is not triggered, so step 6 always happens. As a result, the client is always blocked by our API server.
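
The flow above can be sketched so that a missed removal in step 5 does not block a user forever: each entry in the connected user list carries a lease that expires unless it is renewed. This is an illustration under assumptions, not the current implementation. `LeasedUserList`, `leaseMs`, and the idea that clients renew through the API server are all hypothetical.

```javascript
// Hypothetical connected user list with expiring entries.
class LeasedUserList {
  constructor(leaseMs, now = () => Date.now()) {
    this.leaseMs = leaseMs;
    this.now = now;             // injectable clock for testing
    this.expiresAt = new Map(); // userId -> expiry timestamp in ms
  }

  // Steps 3 and 6: admit the user unless a live entry already exists.
  tryAdmit(userId) {
    const expiry = this.expiresAt.get(userId);
    if (expiry !== undefined && expiry > this.now()) return false;
    this.expiresAt.set(userId, this.now() + this.leaseMs);
    return true;
  }

  // Extend the lease while the user is known to be connected.
  renew(userId) {
    if (this.expiresAt.has(userId)) {
      this.expiresAt.set(userId, this.now() + this.leaseMs);
    }
  }

  // Step 5: remove the user when the removal event does arrive.
  release(userId) {
    this.expiresAt.delete(userId);
  }
}

let clock = 0;
const users = new LeasedUserList(30000, () => clock);
console.log(users.tryAdmit("alice")); // true
console.log(users.tryAdmit("alice")); // false, duplicate while the lease is live
clock = 30001;
console.log(users.tryAdmit("alice")); // true, the stale entry has expired
```

The trade-off is that a lease which is too short could admit a real duplicate, so the lease length and renewal interval would need tuning.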

    6. I suggest you implement webhooks on your backend to get all the info you need about room events: Create, Join, Leave.

    ans: I did this before, but webhooks do not respond as quickly as the WebSocket. We need to calculate the connection time and charge our users, and the WebSocket connection gives a more real-time response than webhooks.

    Special client (JS in your case, always running on the backend, you may need one client per AppId/AppVersion/Region):
    - lobby rooms list updates events (you may need one client per lobby)


    ans: My current approach looks like the suggestion above. It works well but occasionally runs into the server-not-responding problem. Currently I am trying to use a cron job to leave and rejoin the lobby to work around this problem, but I still want to know the correct way to achieve my goal.
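
If the lobby is rejoined periodically, the fresh room list can be used to repair the connected user list. The sketch below assumes the caller already has a snapshot of the room names that currently exist; how that snapshot is obtained is left out. `reconcile` and the `roomOfUser` map are hypothetical names.

```javascript
// Hypothetical reconciliation step; roomOfUser maps userId -> room name.
function reconcile(roomOfUser, liveRoomNames) {
  const live = new Set(liveRoomNames);
  const removed = [];
  for (const [userId, roomName] of roomOfUser) {
    if (!live.has(roomName)) {
      // The room no longer exists, so the user cannot still be in it.
      roomOfUser.delete(userId);
      removed.push(userId);
    }
  }
  return removed; // users dropped because their room is gone
}

const roomOfUser = new Map([
  ["alice", "room-1"],
  ["bob", "room-2"],
]);
console.log(reconcile(roomOfUser, ["room-2"])); // [ 'alice' ]
console.log(roomOfUser.size);                   // 1
```

This only catches users whose whole room has disappeared; a user who left a room that still exists would need another signal.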
  • JohnTube
    Hi @ray,

    If you can reproduce this and the issue persists, send us an email at developer@photonengine.com