Getting different audio on live stream from orcasound.net vs. the aws bucket #219

Open
bnestor opened this issue Dec 1, 2024 · 8 comments

Comments

@bnestor
Contributor

bnestor commented Dec 1, 2024

I am encountering a strange phenomenon: in the audio streams from orcasound.net, the orcas are very easy to identify. However, neither the spectrograms generated by the live inference orchestrator nor the corresponding wav files that I am saving resemble the stream audio, even though I recorded them and played them back within minutes of getting them.

This was noticed across events at both the Bush Point and Port Townsend nodes today. It was my understanding that the HLS utils and orcasite use the same buckets. Is it possible they got switched somehow? Or is there a time delay on the website that is substantially different from the buckets? The Acartia positioning was quite consistent with the website stream.

The bucket that I am pulling from is https://s3-us-west-2.amazonaws.com/audio-orcasound-net/rpi_bush_point for Bush Point, and https://s3-us-west-2.amazonaws.com/audio-orcasound-net/rpi_port_townsend for Port Townsend.

Here is a spectrogram from the middle of bouts at Port Townsend (rpi_port_townsend_2024_12_01_12_16_21_PST):
rpi_port_townsend_2024_12_01_12_16_21_PST

And for comparison, here is what I would expect based on orcasound-lab events earlier in the year (rpi-orcasound-lab_2024_09_21_15_00_00_PDT):
rpi-orcasound-lab_2024_09_21_15_00_00_PDT

@bnestor
Contributor Author

bnestor commented Dec 4, 2024

@paulcretu

@scottveirs
Member

@bnestor Are you using/adapting the orca-hls-utils utilities directly in your own code, or using the LiveInferenceOrchestrator script?

@bnestor
Contributor Author

bnestor commented Dec 4, 2024

I am using the live inference orchestrator script. The only differences are that I had the config adjusted to skip deleting the local wav files (https://github.com/orcasound/aifororcas-livesystem/blob/main/InferenceSystem/src/LiveInferenceOrchestrator.py#L221), and that I was calling a different model in line 179. The exact script is here

Compared to when it was working in September, the following uncontrollable factors have changed:

  • I am using the Bush Point and Port Townsend hydrophones instead of orcasound-lab
  • I may be in a different time zone on my local machine than I was previously
  • The SNR was lower than in previous bouts

I will post links to the recordings coming from hlsstream and their corresponding reports later today.

@bnestor
Contributor Author

bnestor commented Dec 5, 2024

| Candidate | Spectrogram from LiveInferenceOrchestrator.py | Filename |
| --- | --- | --- |
| https://live.orcasound.net/reports/cand_02yPEHxChzrUu21U9pKFyL | rpi_port_townsend_2024_12_01_12_18_45_PST | rpi_port_townsend_2024_12_01_12_18_45_PST |
| https://live.orcasound.net/reports/cand_02yPEBkwNJ2uZtEPucxATE | rpi_port_townsend_2024_12_01_12_14_53_PST | rpi_port_townsend_2024_12_01_12_14_53_PST |
| https://live.orcasound.net/reports/cand_02yPE2ItFeIFW2U4XQLyfe | rpi_port_townsend_2024_12_01_12_07_59_PST | rpi_port_townsend_2024_12_01_12_07_59_PST.wav |
| https://live.orcasound.net/reports/cand_02yPDSoaQARX7Ucq3b62My | rpi_port_townsend_2024_12_01_11_44_59_PST | rpi_port_townsend_2024_12_01_11_44_59_PST |
| https://live.orcasound.net/reports/cand_02yPDMxAlunhnH4ztUZo1p | rpi_port_townsend_2024_12_01_11_40_59_PST | rpi_port_townsend_2024_12_01_11_40_59_PST |

Here is a proton drive link to the wav files in case you would like to listen to what was recorded by hlsstream vs the orcasound.net reports: https://drive.proton.me/urls/HB2JB76SC4#XrhUvTGW7vyz

@paulcretu
Member

I have yet to really dig into this, but some details I can fill in:

  1. On September 10th, we switched buckets from streaming-orcasound-net to audio-orcasound-net
  2. A few days later, this codebase was switched over to the new bucket
  3. I haven't made any changes to orca-hls-utils yet, but there doesn't seem to be any bucket-specific configuration, as the stream base URL is passed into HLSStream or DateRangeHLSStream

So I don't have an immediate explanation for what you're experiencing @bnestor, but the first thing that comes to mind is to check if the machine detections at https://aifororcas.azurewebsites.net/ are lining up correctly with what's in the audio-orcasound-net bucket. My hunch is that for some reason orca-hls-utils is constructing the m3u8 or segment paths incorrectly, but this certainly needs more investigation to see if it consistently repros.
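One way to sanity-check the path-construction hunch is to reproduce the folder selection by hand and compare the result with what the orchestrator actually requests. The sketch below is illustrative only: it assumes the bucket layout is `<stream_base>/hls/<unix_start>/live.m3u8`, which is not confirmed in this thread, and `pick_folder`, `playlist_url`, and the folder timestamps are hypothetical names and values, not part of orca-hls-utils.

```python
from bisect import bisect_right


def pick_folder(folder_starts, target_unix):
    """Return the latest folder start time that is <= target_unix, or None."""
    starts = sorted(folder_starts)
    i = bisect_right(starts, target_unix)
    return starts[i - 1] if i else None


def playlist_url(stream_base, folder_start):
    # ASSUMED layout: <stream_base>/hls/<unix_start>/live.m3u8
    return f"{stream_base.rstrip('/')}/hls/{folder_start}/live.m3u8"


BASE = "https://s3-us-west-2.amazonaws.com/audio-orcasound-net/rpi_port_townsend"
folders = [1733000000, 1733021600, 1733043200]  # made-up folder start times

chosen = pick_folder(folders, 1733030000)
print(playlist_url(BASE, chosen))
```

If the URL built this way for a known event time differs from the one the orchestrator logs, that would point at the path logic rather than at the bucket contents.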

@bnestor
Contributor Author

bnestor commented Dec 5, 2024

My initial thoughts are: 1) my local time zone is somehow influencing what gets collected by the HLS utils; perhaps I am grabbing data shifted by my offset from PST. 2) The data coming from the Port Townsend node is getting mapped to the wrong AWS bucket (but I think this would also affect the live stream). 3) The wrong m3u8 files are getting downloaded at a given time.
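To make hypothesis 1 concrete, here is a minimal sketch of how large the error would be if the wall-clock digits in a clip name were interpreted in the wrong zone. It parses the `<node>_YYYY_MM_DD_HH_MM_SS_<TZ>` names that appear in this thread; the helper names and the UTC-5 "machine" offset are hypothetical, and fixed offsets are used for PST/PDT for simplicity.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for the two suffixes seen in the clip filenames.
TZ_OFFSETS = {
    "PST": timezone(timedelta(hours=-8)),
    "PDT": timezone(timedelta(hours=-7)),
}


def _fields(clip_name):
    """Split a clip name into its six date/time integers and its zone suffix."""
    parts = clip_name.split("_")
    return [int(p) for p in parts[-7:-1]], parts[-1]


def clip_start_utc(clip_name):
    """Clip start as an aware UTC datetime, honoring the PST/PDT suffix."""
    fields, suffix = _fields(clip_name)
    return datetime(*fields, tzinfo=TZ_OFFSETS[suffix]).astimezone(timezone.utc)


def misread_offset_seconds(clip_name, machine_tz):
    """Error in seconds if the same digits are read in machine_tz instead."""
    fields, _ = _fields(clip_name)
    wrong = datetime(*fields, tzinfo=machine_tz)
    return (clip_start_utc(clip_name) - wrong).total_seconds()


name = "rpi_port_townsend_2024_12_01_12_16_21_PST"
hypothetical_machine_tz = timezone(timedelta(hours=-5))  # assumption, e.g. US Eastern
print(clip_start_utc(name).isoformat())
print(misread_offset_seconds(name, hypothetical_machine_tz))
```

An error of exactly a whole number of hours between the stream and the downloaded clips would be consistent with this hypothesis; any other offset would argue against it.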

To test 3, I will take a look at the m3u8 files that are being downloaded from AWS. I am not sure whether I can fetch them with their original creation timestamps, but I will give it a shot this weekend. The audio segment filenames look like live_4567.ts. These were consecutively numbered within a single query, but I did not check whether they were consecutive between queries. I will also check that consecutively downloaded files sound perceptually continuous (they were definitely hydrophone recordings, not just generic noise). For 1 and 2, we could correlate AIS data with a ship noise classifier. I do not have such a classifier, so it could take some time, but it would be a good sanity check for the future.
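The consecutive-numbering check could be scripted roughly as follows. This is a sketch that works on playlist text already saved to disk or memory; the sample playlist is made up, and the regular expression accepts an optional underscore as a precaution in case the real segment names differ slightly from `live_4567.ts`.

```python
import re

# Matches names like live_4567.ts; the underscore is optional as a precaution.
SEGMENT_RE = re.compile(r"live_?(\d+)\.ts")


def segment_indices(playlist_text):
    """Extract segment numbers in the order they appear in an m3u8 playlist."""
    return [int(m.group(1)) for m in SEGMENT_RE.finditer(playlist_text)]


def find_gaps(indices):
    """Return (previous, next) pairs where the numbering is not consecutive."""
    return [(a, b) for a, b in zip(indices, indices[1:]) if b != a + 1]


# Made-up playlist with one missing segment (4569).
sample = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
live_4567.ts
#EXTINF:10.0,
live_4568.ts
#EXTINF:10.0,
live_4570.ts
"""

print(find_gaps(segment_indices(sample)))  # [(4568, 4570)]
```

To check continuity between queries, the index lists from successive downloads can be merged with `sorted(set(...))` before calling `find_gaps`, so that overlapping playlists are not reported as gaps.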

One last thing: on live.orcasound.net/reports, reports are logged according to my machine's local time, but the console does not specify this. It may be helpful to display event timestamps in PST or UTC, or to state the time zone, so that there is no confusion about when events occurred.
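As an illustration of the suggestion, a timestamp could be printed with both zones spelled out. This is only a sketch: `label_event` is a hypothetical helper, not something in the reports console, and the fixed UTC-8 offset ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")  # fixed offset; does not handle PDT


def label_event(unix_seconds):
    """Format an event time with explicit UTC and PST labels."""
    t = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return (
        f"{t:%Y-%m-%d %H:%M:%S} UTC "
        f"({t.astimezone(PST):%Y-%m-%d %H:%M:%S} PST)"
    )


event = datetime(2024, 12, 1, 20, 16, 21, tzinfo=timezone.utc).timestamp()
print(label_event(event))  # 2024-12-01 20:16:21 UTC (2024-12-01 12:16:21 PST)
```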

@bnestor
Contributor Author

bnestor commented Dec 20, 2024

Just want to note that this is happening again on the Sunset Bay hydrophone at the moment. I am hearing clear S01 calls on the stream but getting no predictions on the machine, and the spectrograms look consistent with that:
rpi_sunset_bay_2024_12_19_18_00_38_PST

Screenshot from 2024-12-19 20-57-25

Earlier today I was getting positive classifications from this node, as shown:
rpi_sunset_bay_2024_12_19_13_54_05_PST

My model has been running from 2024_12_19_13_32_33 until now (2024_12_19_18_05_38). I have been logging every minute since then. The model made positive classifications from 2024_12_19_13_35_33 to 2024_12_19_13_55_48, and none since then.

Currently there is considerable low-frequency buzzing, but I do not believe it is a factor, since I can hear the SRKWs in real time but not in the clips downloaded for the same time using the HLS utils.

@bnestor
Contributor Author

bnestor commented Dec 20, 2024

For reference, in the stream I heard clear train horns at 19:01:53 and 19:02:10. The train rumbling and the horns cannot be found in the spectrograms/hls reconstructions.
Here is the spectrogram (rpi_sunset_bay_2024_12_19_19_01_29_PST)
rpi_sunset_bay_2024_12_19_19_01_29_PST
and at rpi_sunset_bay_2024_12_19_19_02_29_PST
rpi_sunset_bay_2024_12_19_19_02_29_PST
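For anyone trying to reproduce this, the expected position of each horn inside the clips can be worked out directly. The sketch below assumes the horn times were noted in the same time zone as the clip names (not stated above) and that each clip starts at the time in its name; `offset_in_clip` is a hypothetical helper.

```python
from datetime import datetime


def offset_in_clip(clip_start_hms, event_hms):
    """Seconds from the clip start to the event, both given as HH:MM:SS."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(event_hms, fmt) - datetime.strptime(clip_start_hms, fmt)
    return delta.total_seconds()


for horn in ("19:01:53", "19:02:10"):
    print(horn, offset_in_clip("19:01:29", horn))
```

Under those assumptions the horns would fall 24 s and 41 s after the start of the 19_01_29 clip. Since the two clip names are 60 s apart, both horns should appear in the first clip if the clocks agree.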
