Adding punctuation to Chinese text with PaddleHub: the txt file has tens of thousands of characters, but PaddleHub seems to limit the input length #2193
CDreamlong started this conversation in General
Replies: 0 comments
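Hi everyone, I am using PaddleHub to add punctuation to Chinese text. The txt file has tens of thousands of characters, but PaddleHub seems to limit the input length. My code is as follows: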
`import paddlehub as hub
model = hub.Module(name='auto_punc', version='1.0.0')
def addpunc(txtpath, savetxt):
    # Read the input file and split it into lines
    with open(txtpath, encoding="utf-8") as f:
        txtstr = f.read().split("\n")
    punc_texts = model.add_puncs(txtstr)
    str1 = "\n".join(punc_texts)
    print('Conversion succeeded:', str1)
    # Append the punctuated text to the output file
    with open(savetxt, "a", encoding='utf-8') as fc:
        fc.write(str1)
        fc.write("\n\n")
if __name__ == '__main__':
    # File containing the text that needs punctuation
    txtpath = r'E:\指导 (Transcribed on 08-Jan-2023 16-02-32).txt'
    # File in which to save the punctuated text
    savetxt = r'D:\english\punc.txt'
    # Run the conversion
    addpunc(txtpath, savetxt)`
This code only adds punctuation to the first part of the text; it does not punctuate the whole txt file.
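One possible workaround, sketched here under the unconfirmed assumption that the truncation comes from a maximum input length per string, is to cut each line into shorter pieces before calling `model.add_puncs` and then join the results. The helper names `split_into_chunks` and `punctuate_long_text` and the limit of 200 characters are hypothetical and chosen only for illustration. The only thing taken from the code above is that `add_puncs` accepts a list of strings and returns a list of strings.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def punctuate_long_text(text: str,
                        add_puncs: Callable[[List[str]], List[str]],
                        max_chars: int = 200) -> str:
    """Punctuate every line chunk by chunk and join the results.

    add_puncs is any callable that maps a list of strings to a list of
    punctuated strings, for example model.add_puncs from the post above.
    max_chars is an assumed limit, not a documented PaddleHub value.
    """
    out_lines = []
    for line in text.split("\n"):
        chunks = split_into_chunks(line, max_chars)
        if not chunks:
            # Keep empty lines unchanged instead of sending them to the model
            out_lines.append("")
            continue
        out_lines.append("".join(add_puncs(chunks)))
    return "\n".join(out_lines)
```

With the script above, this might be called as `punctuate_long_text(f.read(), model.add_puncs)`. Fixed-size chunks can cut a sentence in the middle, so punctuation near chunk boundaries may be less accurate. Lowering `max_chars` until the output is no longer cut off is one way to find a workable value.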