Salsa and the PUN Voice Demo

edited February 2020 in Photon Voice
I got SALSA lip sync partially working with Photon. The lip sync works for remote clients, but not on the local machine. This makes sense because the local client does not receive its own audio, presumably to avoid delay and echo effects.

On the remote clients, the AudioSource has "AudioStreamPlayer" as its audio clip, but locally it has "None (Audio Clip)".

My thinking was to add a microphone component to the local client (SALSA provides one) to drive lip sync. This works, but not together with the PhotonVoiceSpeaker script.

I thought that if I could deactivate the PhotonVoiceSpeaker script on the local client, everything would be good. Locally, the lip sync would be driven by the SALSA mic input, and on remote clients it would be driven by the AudioStreamPlayer in the AudioSource.

My idea was to add "if(!IsPlaying)" code to the PhotonVoiceSpeaker script:
        this.player = new AudioStreamPlayer(GetComponent<AudioSource>(), "PUNVoice: PhotonVoiceSpeaker:", 

It seemed to work in the editor, but it did not work when I published the game to test multiplayer.

Any insight is much appreciated.


  • JohnTube
    Hi @Hipshot,

    Thank you for choosing Photon!

    What exactly do you need to integrate Photon Voice with SALSA Lip Sync?

    A possible (and the easiest) trick is to set DebugEchoMode to true and mute the AudioSource of the local player.
    This means that you will receive your own voice from the server and play it in the AudioSource using PhotonVoiceSpeaker/AudioStreamPlayer, but muted.

    A more advanced and proper solution would be to get the recorded audio frames locally (before or after transmission) and make the AudioClip (AudioStreamPlayer) out of them. For this to work, you need to inject a pre/post processor in PhotonVoiceRecorder and follow what PhotonVoiceSpeaker does with received audio frames. See here (the part "Second, outgoing voice streams:").

    I see that you are using Photon Voice Classic, could you migrate to Photon Voice 2?
    This is unrelated but recommended.
    Photon Voice Classic will be phased out.
  • Hipshot
    edited February 2020
    Thanks for this answer! I was able to solve this in a bit of a different way, although your direction is better.

    my solution:

    As for Photon Voice Classic: our project is built on this tech and it is working well for our purposes. Thanks very much for continuing to support it. Next time for sure!
  • Hello @JohnTube,

    SALSA can take an externally computed value as its analysis value (bypassing its own internal processing) and drive lipsync from that value. I've briefly looked at Photon.Voice.Unity.Recorder and see a Recorder.LevelMeter.CurrentPeakAmp value, but this appears to be updated about every half second for the local Recorder, which is not frequent enough. Is there another value calculated by the Photon API that is updated more often for the local client? If so, the solution to the above could be to determine whether the client is the local client and then rewire SALSA to accept external calculations.

    I believe the local client can be determined by checking Photon.Pun.PhotonView.isMine. I have implemented this successfully and the process does work; it just isn't responsive enough.

    SALSA LipSync Suite
    Crazy Minnow Studio
  • JohnTube
    edited April 2020
    Hi @crazyD,

    Thank you for choosing Photon (or for trying to make it easier to integrate with SALSA LipSync Suite)!

    I'm not sure how SALSA LipSync Suite works.
    I mean, do you also synchronize mouth animation/movement over the network, or do you animate values locally only?
    This matters in our case: we need to know whether you use the local values, apply them to the local character, and have them synced (automatically by SALSA via PUN or something else, or manually by the developer), or whether you take received values from remote clients and apply them directly to their local 'representations' (no need to sync the mouth in that case).
    "Recorder.LevelMeter.CurrentPeakAmp value, but this appears to be updated about every half second for the local Recorder"
    From the code, I see that the level meter is calculated every frame; in fact, it's a built-in internal (post)processor. This applies if you want to get the values from the outgoing stream.

    You could also add your own processor in the same spirit, but you may not be satisfied with the results.
    In any case, my colleague @vadim should know better, as he is the man for the PhotonVoiceApi; I handle the Unity layer and try to get better at the low-level stuff.
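
    To illustrate the idea of a per-frame level meter, here is a minimal sketch in plain C#. The PeakMeter class and its Process method are hypothetical names used only for illustration; they are not part of the Photon Voice API, and wiring such a class into the Recorder's processing chain is not shown here.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Photon Voice API.
// Tracks the peak absolute amplitude of the most recently processed frame.
public sealed class PeakMeter
{
    // Peak absolute sample value of the last frame (0 for silence or an empty frame).
    public float CurrentPeakAmp { get; private set; }

    // Scans one frame of float samples and returns it unchanged,
    // so the meter could sit in a chain of processors.
    public float[] Process(float[] frame)
    {
        float peak = 0f;
        foreach (float sample in frame)
        {
            float abs = Math.Abs(sample);
            if (abs > peak) peak = abs;
        }
        CurrentPeakAmp = peak;
        return frame;
    }
}
```

    Calling Process once for every recorded frame would refresh CurrentPeakAmp once per frame, which is the update rate discussed above.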

    About checking if player is local or remote:

    If you're going to use values from the Recorder, then you're on the local client all the time.
    Recorder is used to record and transmit audio from the local client to the server.

    Photon.Pun.PhotonView.isMine should be Photon.Pun.PhotonView.IsMine in PUN 2.
    isMine is from PUN Classic.
    PUN Classic (1) and Photon Voice Classic (1) will fade away.

    PhotonVoiceView and PhotonVoiceNetwork are part of the optional PUN 2 integration of Photon Voice 2. Yes, Photon Voice 2 can be used without PUN 2 and with other Photon products.

    If you want to get the values from the incoming stream:
    If you use Photon Voice 2 with the PUN 2 integration, as we suggest, you could make use of PhotonView from PUN or of PhotonVoiceView (PhotonVoiceView.IsSpeaker && PhotonVoiceView.SpeakerInUse.Actor.IsLocal),
    or, without the PUN integration: Speaker.IsLinked && Speaker.Actor.IsLocal.

    So, to sum up, we want to make Photon Voice 2 work better with SALSA LipSync Suite. If you want to take this further, don't hesitate to send an email to [email protected].
  • JohnTube
    edited April 2020
    Hey @crazyD,

    I just found out that Recorder.LevelMeter is tightly coupled with Recorder.VoiceDetector, which is not ideal.
    We might add an external, optional level meter as a post-processor component.
    We will dig deeper.
  • Hi @JohnTube,

    The SALSA configuration with Photon currently works like this:
    1) Each prefab is configured with a SALSA instance that waits for an AudioSource on its GameObject. Once Photon instantiates the AudioSource, SALSA wires itself up.
    2) SALSA processes the audio stream played on the AudioSource for each character prefab, so all lipsyncing is performed on the client machine without need to synchronize anything across the network.

    This works well in practice and each remote client's avatar lipsyncs to audio streamed to the local client; however, as you know, the local player avatar does not create an audio stream by default and therefore, there is nothing for SALSA to process and no lipsyncing. So we can try to work with this in a couple of ways:
    1) Somehow pipe the local audio to the AudioSource so it can be analyzed by SALSA, similarly to the remote client avatar prefabs on the local client. And subsequently we would simply mute the AudioSource (as long as the AudioClip is available to process).
    2) Use Photon's peak values directly, if Photon is already calculating them frequently enough: at least every 100 ms, preferably every 80 ms. That would probably be the quickest we would need. SALSA works on a configurable pulse frequency, depending on the desired look and feel. 80 ms is about the fastest we recommend, to allow the animations some time to activate, but technically it could pulse faster. If the values are calculated every frame and we can somehow tap into that, I think that would be perfect. This is what I was using:
    if (recorder.LevelMeter != null)
        salsa.analysisValue = recorder.LevelMeter.CurrentPeakAmp;
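
    If the peaks do arrive every frame, one way to bridge them to a slower pulse could look like the sketch below: keep the highest peak seen since the last read, so that a pulse every 80 to 100 ms does not miss a short spike between pulses. PeakHold, Push, and Take are hypothetical names for illustration only (not SALSA or Photon types), and whether this kind of hold actually improves the lipsync result is an assumption.

```csharp
using System;

// Hypothetical helper for illustration; not a SALSA or Photon type.
// Not thread-safe: a real integration would need to consider which
// thread delivers the audio frames and which thread reads the value.
public sealed class PeakHold
{
    private float held;

    // Call once per audio frame with that frame's peak amplitude.
    public void Push(float framePeak)
    {
        held = Math.Max(held, Math.Abs(framePeak));
    }

    // Call once per pulse: returns the highest peak since the last call, then resets to 0.
    public float Take()
    {
        float value = held;
        held = 0f;
        return value;
    }
}
```

    The value returned by Take at each pulse would then be assigned in place of the direct CurrentPeakAmp read.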

    Thanks, and yes, I was using Photon.Pun.PhotonView.IsMine. We need to determine this to ensure we only rewire the SALSA configuration for the local avatar on the local client. We definitely defer to your expertise on the best method to determine the local player on the local client. We are also happy to take this offline with you, @vadim, or anyone else for further implementation details. We would also love to make this process a little easier and more full-featured for SALSA and PhotonVoice customers.

  • If anyone finds this forum thread looking for information on local lipsync for PhotonVoice2 with SALSA, @JohnTube and I have come up with a solution to this problem and it seems to work very well. It will require an update to SALSA LipSync Suite v2 and that update will likely be released soon. Current SALSA release is v2.4.1. I don't have a solid date at this point. Please keep an eye on the SALSA forum thread for more information.
  • Hello @JohnTube @crazyD
    For local lip-sync with SALSA, I tried to use the Microphone class in UnityEngine and attached a script to my local character, but doing so stopped my audio transmission via Photon for some reason. What could be the reason for this?

    Thanks in advance!
  • Hello @cr7
    PhotonVoice is already using the microphone (or trying to) and it generally doesn't work well when multiple systems try to access the microphone. We have a solution for local-lipsync which we will be publishing soon (requires an update to SALSA).
  • JohnTube
    Hi @crazyD,
    "it generally doesn't work well when multiple systems try to access the microphone"
    Is this about Photon Voice or about accessing the same device using Unity's Microphone API in general?
  • Both. PhotonVoice implements its own microphone access and audio buffer for audio serialization and transmission to remote clients. In general, implementing additional Microphone access to provide local lipsync is probably a losing battle. Crazy Minnow worked directly with Exit Games to come up with a solution that leverages the data PhotonVoice already collects locally from the microphone and runs the SALSA analysis algorithm on that data chunk. It is a much more efficient solution than creating an entirely new audio buffer from the microphone, even if that were possible.
  • SALSA LipSync Suite 2.5.0 was released today on the Unity AssetStore. This update implements functionality which allows a new add-on for SALSA to provide local-avatar lip-sync. Please see the Unity forum for SALSA LipSync Suite for more information on the update. Additionally, read our online documentation for more details on PhotonVoice implementation and local-avatar-lipsync using SALSA.