
Runtime error! Help needed #39

Open
LittleMaWen opened this issue Aug 12, 2019 · 33 comments

Comments
@LittleMaWen

Hello, I am trying to reproduce your experiment (Ubuntu 16.04, Python 3.7). I first ran train.py and got the error below. How should I fix it? Looking forward to your reply.
model_build_time 5.370615005493164
get batch time 1.98e-05s
forward process time 7.57s
beginning to select..........
select best batch time 0.188s
select_batch_time: 7.82932448387146
Traceback (most recent call last):
File "train.py", line 181, in
main()
File "train.py", line 125, in main
loss = model.train_on_batch(x, y)
File "/home/dcase/miniconda3/lib/python3.7/site-packages/keras/engine/training.py", line 1808, in train_on_batch
check_batch_axis=True)
File "/home/dcase/miniconda3/lib/python3.7/site-packages/keras/engine/training.py", line 1411, in _standardize_user_data
exception_prefix='target')
File "/home/dcase/miniconda3/lib/python3.7/site-packages/keras/engine/training.py", line 153, in _standardize_input_data
str(array.shape))
ValueError: Error when checking target: expected ln to have shape (None, 512) but got array with shape (96, 1)

@Walleclipse
Owner

As per issue 7, line 121 of train.py needs to be changed to the following:
y = np.random.uniform(size=(x.shape[0], 512))
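
For context, a small self-contained sketch of why this fix works (the batch size 96 is taken from the error message above; the claim about the loss is an assumption based on this fix, not verified against the repo code):

import numpy as np

# The model's output layer ("ln" in the traceback) produces a (batch_size, 512)
# embedding, so Keras expects the target passed to train_on_batch to have that
# shape as well. The deep-speaker style loss is presumably computed from the
# embeddings themselves, so the target is just a placeholder of the right shape.
batch_size = 96                                  # batch size from the error message
y = np.random.uniform(size=(batch_size, 512))    # dummy target with the expected shape
print(y.shape)                                   # -> (96, 512)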

@LittleMaWen
Author

Thank you so much! That solved it, and it is running now! Also, I asked in the issue I just closed: after running python train.py, what are the next steps? Is train.py training the model? I could not find detailed run instructions in the report either. Please reply, thank you!

@Walleclipse
Owner

train.py trains the model. Afterwards you can test it with test_model.py.
The model this project trains is a speaker embedding model: given a segment of speech, it outputs a 512-dimensional embedding, which can then be used for tasks such as speaker verification or speaker identification.
Beyond that, I am not sure what you are trying to achieve.
For the training workflow and how to plot the loss curve, see issue 1.
For how to run inference after training, see issue 30.
For speaker identification, see issue 21.
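
For illustration only (not part of this repo's code), a minimal sketch of how such a 512-dimensional embedding could be used for verification; the threshold is a made-up placeholder that would normally be tuned on a development set (e.g. at the EER operating point):

import numpy as np

def cosine_similarity(emb_a, emb_b):
    # Cosine similarity between two speaker embeddings.
    return float(np.dot(emb_a, emb_b) /
                 (np.linalg.norm(emb_a) * np.linalg.norm(emb_b) + 1e-12))

def same_speaker(emb_a, emb_b, threshold=0.6):
    # Verification: accept the pair if their embeddings are similar enough.
    return cosine_similarity(emb_a, emb_b) >= threshold

# Placeholder embeddings; in practice these come from the trained model.
emb_a = np.random.randn(512)
emb_b = np.random.randn(512)
print(same_speaker(emb_a, emb_b))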

@LittleMaWen
Author

Aww, thank you! I fixed the error yesterday morning and started the run, but this morning I found it had failed during execution with the following error:
OSError: [Errno 30] Read-only file system

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 181, in
File "train.py", line 134, in main
OSError: [Errno 30] Read-only file system
(tensorflow) dcase@dcase-PowerEdge-R730:~/mawen/SW/Deep_Speaker-speaker_recognition_system-master$
Broadcast message from systemd-journald@dcase-PowerEdge-R730 (Wed 2019-08-14 07:37:30 CST):

systemd[1]: Caught , dumped core as pid 18055.

Broadcast message from systemd-journald@dcase-PowerEdge-R730 (Wed 2019-08-14 07:37:30 CST):

systemd[1]: Freezing execution.
Do you know what might be causing this?

@Walleclipse
Owner

I am not sure. The code block that raised the error is the one that writes the log, so perhaps your operating system does not grant write permission there?
You can check pypa/virtualenv#209, or https://askubuntu.com/questions/1082876/oserror-errno-30-read-only-file-system, or https://forums.fast.ai/t/how-can-i-load-a-pretrained-model-on-kaggle-using-fastai/13941/4
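
A quick diagnostic sketch for this (an assumption: that the log is written under c.CHECKPOINT_FOLDER, the same folder referenced elsewhere in this thread):

import os
import constants as c

# False here would mean the process cannot write into the folder train.py logs to,
# which would be consistent with the Errno 30 (read-only file system) error above.
print(os.access(c.CHECKPOINT_FOLDER, os.W_OK))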

@LittleMaWen
Author

OK, thank you! Thank you so much for patiently answering all my questions!

@LittleMaWen
Author

OK, this problem has been solved; it was an issue with my own system.
I have now been running python train.py for more than two days and the program is still going:
2019-08-19 09:07:35,157 [INFO] train.py/main | == Presenting step #132313
2019-08-19 09:07:35,351 [INFO] train.py/main | == Processed in 0.19s by the network, training loss = 0.004015110433101654.
get batch time 4.53e-06s
forward process time 0.317s
beginning to select..........
select best batch time 0.71s
select_batch_time: 1.0472040176391602
This is the most recent line of output. I saw you say in another issue that train.py is an infinite loop and has to be killed manually. Given this output, can I kill it now? I am not quite sure.

@LittleMaWen
Author

I killed the program when training had reached the following point:
2019-08-20 12:24:52,587 [INFO] train.py/main | == Presenting step #211143
2019-08-20 12:24:52,776 [INFO] train.py/main | == Processed in 0.19s by the network, training loss = 0.0009784128051251173.
Then I ran test_model.py as you suggested and got this result:
Found checkpoint [checkpoints/model_211000_0.00100.h5]. Resume from here...
Found 0000368 files with 00003 different speakers.
f-measure = 0.9999999999995, true positive rate = 1.0, accuracy = 0.99999999999998, equal error rate = 0.0
Is this how it should look?

@Walleclipse
Owner

That looks right. But did you run this on the full LibriSpeech data or on the LibriSpeechSamples included in this repo?
You need to use the full dataset.

@LittleMaWen
Author

Yes, thank you for replying. I am only now planning to rerun with the full dataset. I downloaded train-clean-100 and train-clean-360. From the report, if I understand correctly, should I first run pre_process.py to produce train-clean-100-npy and then run train.py?
Also, about the test-clean dataset mentioned in the report: is it used as the test set? In which part is it used? When running on LibriSpeechSamples I only saw part of train-clean-100 being used, and I could not find where test-clean comes in.

@Walleclipse
Owner

Walleclipse commented Aug 26, 2019

  1. Yes, first preprocess the data with pre_process.py, then train (see the sketch below).
  2. The test_clean set is also downloaded from LibriSpeech; test_clean is the data used specifically for testing.
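
For reference, a hedged sketch of that order of operations, reusing the preprocess_and_save helper shown later in this thread (the paths are illustrative and should match the directories configured in constants.py):

from pre_process import preprocess_and_save

# Step 1: extract features from the training wavs into the *-npy folder.
preprocess_and_save(wav_dir="audio/LibriSpeechSamples/train-clean-100",
                    out_dir="audio/LibriSpeechSamples/train-clean-100-npy/")
# Step 2: run `python train.py`, which trains on the preprocessed features,
# and later `python test_model.py` to evaluate a saved checkpoint.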

@LittleMaWen
Author

Sorry to bother you again; the earlier problems are solved. I ran python train.py on the full train-clean-100 dataset and I may have trained for too long. The resulting losses.txt contains:
17200,1.8035286664962769
17201,0.8399255871772766
17202,0.7849252820014954
17203,0.20637908577919006
......
200847,0.0009674201137386262
200848,0.0009672052692621946
200849,0.0009669908322393894
......
382718,0.2163151204586029
382719,0.2672955095767975
382720,0.5557892322540283
The resulting train_acc_eer.txt:
17200,0.0,0.9999999999995,0.99999999999998
17210,0.0,0.9999999999995,0.99999999999998
......
211130,0.0,0.9999999999995,0.99999999999998
211140,0.0,0.9999999999995,0.99999999999998
211000,0.32981049562682213,0.1791044776114538,0.9607142857142851
211000,0.282798833819242,0.1739130434777845,0.9728571428571422
......
382440,0.03024781341107871,0.9152542372876369,0.9964285714285708
382450,0.008746355685131196,0.9285714285709286,0.9971428571428564
The resulting acc_eer.txt:
17200,0.0,0.9999999999995,0.99999999999998
17200,0.0,0.9999999999995,0.99999999999998
......
210800,0.0,0.9999999999995,0.99999999999998
211000,0.0,0.9999999999995,0.99999999999998
211000,0.3589743589743589,0.12269938650270346,0.9266666666666662
211000,0.358451072736787,0.10714285714243434,0.9743589743589739
......
382200,0.0567765567765568,0.7297297297292311,0.9897435897435893
382400,0.13291470434327576,0.6666666666661832,0.9887179487179483
At that point I killed train.py and ran python test_model.py, which gave:
f-measure = 0.5833333333328472, true positive rate = 0.5, accuracy = 0.9857142857142851, equal error rate = 0.08017492711370261
Did I train for too long? Is this result as expected? Also, how were the two plots EER.png and loss.png in the demo generated? Running python utils.py did not seem to produce anything.

@MAGUADIDI

Hi, I would also like to reproduce this experiment. Could you give me some guidance?

@izhangy

izhangy commented Dec 3, 2019

Hi, when running pre_training.py it printed "Found 0000368 files with 00003 different speakers." and then stopped making progress. What might be going on?

@MAGUADIDI

MAGUADIDI commented Dec 3, 2019 via email

@izhangy

izhangy commented Dec 3, 2019

Thank you very much for your answer! I will try again with more data.
One more question: after running train.py, no "checkpoints" folder is generated. Is that also related to the amount of data?

@MAGUADIDI

MAGUADIDI commented Dec 3, 2019 via email

@izhangy

izhangy commented Dec 6, 2019

"The program has been running for two days and still has not finished. What is going on?"
Hello, like the original poster I have run into this same problem. How can it be resolved?

@MAGUADIDI

MAGUADIDI commented Dec 6, 2019 via email

@izhangy

izhangy commented Dec 6, 2019 via email

@yy835055664

Hi, I saw that you reproduced Deep-speaker and got results. I would like to ask a few questions:
1. How are the .h5 files inside checkpoints generated?
2. Should the downloaded test set be placed directly in the Libri folder, together with the training audio? Does the test script need any changes, or can I test right after training?
Hoping you can help, thank you!

@yy835055664

(quotes @LittleMaWen's earlier comment with the losses.txt / train_acc_eer.txt / acc_eer.txt excerpts and the EER.png / loss.png question, and repeats the two questions above about the checkpoints .h5 files and the test-set placement)

@MAGUADIDI

MAGUADIDI commented Dec 10, 2019 via email

@yy835055664

yy835055664 commented Dec 10, 2019 via email

@MAGUADIDI

MAGUADIDI commented Dec 10, 2019 via email

@yy835055664

yy835055664 commented Dec 13, 2019 via email

@LittleMaWen
Author

Why is it that when I train on speakers 1-3 and then test on speakers 4-5, I still get f-measure = 0.9999999999995, true positive rate = 1.0, accuracy = 0.99999999999998, equal error rate = 0.0? Doesn't that imply the test speakers are the same as the training speakers, even though they are actually different? Does this happen to you as well?

@Walleclipse
Owner

> Did I train for too long? Is this result as expected? Also, how were the two plots EER.png and loss.png in the demo generated? Running python utils.py did not seem to produce anything.

Whether you trained too long depends on your EER.png and Loss.png: if the curves start getting worse towards the end, you trained too long. I think your result is probably fine. The plots in the Readme are drawn by calling plot_loss from utils.py, e.g.

import constants as c
from utils import plot_loss, plot_acc
loss_file = c.CHECKPOINT_FOLDER + '/losses.txt'  # loss log file path
plot_loss(loss_file)
acc_file = c.CHECKPOINT_FOLDER + '/acc_eer.txt'  # acc log file path
plot_acc(acc_file)

@Walleclipse
Owner

> Hi, I would like to ask a few questions: 1. How are the .h5 files inside checkpoints generated? 2. Should the downloaded test set be placed directly in the Libri folder, together with the training audio? Does the test script need any changes, or can I test right after training? Hoping you can help, thank you!

  1. When you run train.py, a .h5 file is saved into checkpoints every SAVE_PER_EPOCHS steps. By default SAVE_PER_EPOCHS = 200; you can change SAVE_PER_EPOCHS in constants.py.
  2. The test set can be placed anywhere, but you will need to adjust some paths. Suppose you put the test-set wav files in "audio/LibriSpeechSamples/test-clean-100" and want the extracted test-set features in "audio/LibriSpeechSamples/test-clean-100-npy/". Then:
    1) In constants.py, set TEST_DIR = 'audio/LibriSpeechSamples/test-clean-100-npy/' (the third line).
    2) Call pre_process.py to process the test set:
import constants as c
from pre_process import preprocess_and_save
test_wav_dir = "audio/LibriSpeechSamples/test-clean-100"
preprocess_and_save(wav_dir=test_wav_dir, out_dir=c.TEST_DIR)
train.py will then run the evaluation on the test set during training.

@Walleclipse
Owner

> Why is it that when I train on speakers 1-3 and then test on speakers 4-5, I still get f-measure = 0.9999999999995, true positive rate = 1.0, accuracy = 0.99999999999998, equal error rate = 0.0? Doesn't that imply the test speakers are the same as the training speakers, even though they are actually different? Does this happen to you as well?

With that little data the result is not meaningful. I still recommend downloading the full LibriSpeech data before running the program.

@yaoyao1206

Can pre_process.py only handle wav-format data? The LibriSpeech dataset I downloaded is entirely in flac format. Do I need to convert it to wav first before running pre_process.py?
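
For reference only (not part of this repo), one possible way to convert the flac files to wav before preprocessing, assuming the soundfile package is installed; the path below is a hypothetical example:

import soundfile as sf
from pathlib import Path

flac_root = Path("audio/LibriSpeech/train-clean-100")  # hypothetical location of the flac files

for flac_path in flac_root.rglob("*.flac"):
    audio, sample_rate = sf.read(str(flac_path))                       # decode the flac file
    sf.write(str(flac_path.with_suffix(".wav")), audio, sample_rate)   # write a wav next to it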

@yy835055664

yy835055664 commented Nov 25, 2020 via email

@izhangy

izhangy commented Feb 5, 2021 via email
