加入星計(jì)劃,您可以享受以下權(quán)益:

  • 創(chuàng)作內(nèi)容快速變現(xiàn)
  • 行業(yè)影響力擴(kuò)散
  • 作品版權(quán)保護(hù)
  • 300W+ 專(zhuān)業(yè)用戶(hù)
  • 1.5W+ 優(yōu)質(zhì)創(chuàng)作者
  • 5000+ 長(zhǎng)期合作伙伴
立即加入
  • 正文
  • 推薦器件
  • 相關(guān)推薦
  • 電子產(chǎn)業(yè)圖譜
申請(qǐng)入駐 產(chǎn)業(yè)圖譜

阿里云天池大賽賽題(深度學(xué)習(xí))——人工智能輔助構(gòu)建知識(shí)圖譜(完整代碼)

05/02 08:56
2022
閱讀需 10 分鐘
加入交流群
掃碼加入
獲取工程師必備禮包
參與熱點(diǎn)資訊討論
# 導(dǎo)入所需文件
import numpy as np
from sklearn.model_selection import ShuffleSplit
from data_utils import ENTITIES, Documents, Dataset, SentenceExtractor, make_predictions
from data_utils import Evaluator
from gensim.models import Word2Vec
# 數(shù)據(jù)文件讀取
data_dir = "./data/train"
ent2idx = dict(zip(ENTITIES, range(1, len(ENTITIES) + 1)))
idx2ent = dict([(v, k) for k, v in ent2idx.items()])
# 訓(xùn)練集,測(cè)試集切分與打亂
docs = Documents(data_dir=data_dir)
rs = ShuffleSplit(n_splits=1, test_size=20, random_state=2018)
train_doc_ids, test_doc_ids = next(rs.split(docs))
train_docs, test_docs = docs[train_doc_ids], docs[test_doc_ids]
# 模型參數(shù)賦值
num_cates = max(ent2idx.values()) + 1
sent_len = 64
vocab_size = 3000
emb_size = 100
sent_pad = 10
sent_extrator = SentenceExtractor(window_size=sent_len, pad_size=sent_pad)
train_sents = sent_extrator(train_docs)
test_sents = sent_extrator(test_docs)

train_data = Dataset(train_sents, cate2idx=ent2idx)
train_data.build_vocab_dict(vocab_size=vocab_size)

test_data = Dataset(test_sents, word2idx=train_data.word2idx, cate2idx=ent2idx)
vocab_size = len(train_data.word2idx)
# 構(gòu)建詞嵌入模型
w2v_train_sents = []
for doc in docs:
    w2v_train_sents.append(list(doc.text))
w2v_model = Word2Vec(w2v_train_sents, size=emb_size)

w2v_embeddings = np.zeros((vocab_size, emb_size))
for char, char_idx in train_data.word2idx.items():
    if char in w2v_model.wv:
        w2v_embeddings[char_idx] = w2v_model.wv[char]
# 構(gòu)建雙向長(zhǎng)短時(shí)記憶模型模型加crf模型
import keras
from keras.layers import Input, LSTM, Embedding, Bidirectional
from keras_contrib.layers import CRF
from keras.models import Model


def build_lstm_crf_model(num_cates, seq_len, vocab_size, model_opts=dict()):
    opts = {
        'emb_size': 256,
        'emb_trainable': True,
        'emb_matrix': None,
        'lstm_units': 256,
        'optimizer': keras.optimizers.Adam()
    }
    opts.update(model_opts)

    input_seq = Input(shape=(seq_len,), dtype='int32')
    if opts.get('emb_matrix') is not None:
        embedding = Embedding(vocab_size, opts['emb_size'], 
                              weights=[opts['emb_matrix']],
                              trainable=opts['emb_trainable'])
    else:
        embedding = Embedding(vocab_size, opts['emb_size'])
    x = embedding(input_seq)
    lstm = LSTM(opts['lstm_units'], return_sequences=True)
    x = Bidirectional(lstm)(x)
    crf = CRF(num_cates, sparse_target=True)
    output = crf(x)

    model = Model(input_seq, output)
    model.compile(opts['optimizer'], loss=crf.loss_function, metrics=[crf.accuracy])
    return model
# 雙向長(zhǎng)短時(shí)記憶模型+CRF條件隨機(jī)場(chǎng)實(shí)例化
seq_len = sent_len + 2 * sent_pad
model = build_lstm_crf_model(num_cates, seq_len=seq_len, vocab_size=vocab_size, 
                             model_opts={'emb_matrix': w2v_embeddings, 'emb_size': 100, 'emb_trainable': False})
model.summary()
# 訓(xùn)練集,測(cè)試集形狀
train_X, train_y = train_data[:]
print('train_X.shape', train_X.shape)
print('train_y.shape', train_y.shape)
# 雙向長(zhǎng)短時(shí)記憶模型與條件隨機(jī)場(chǎng)模型訓(xùn)練
model.fit(train_X, train_y, batch_size=64, epochs=10)
# 模型預(yù)測(cè)
test_X, _ = test_data[:]
preds = model.predict(test_X, batch_size=64, verbose=True)
pred_docs = make_predictions(preds, test_data, sent_pad, docs, idx2ent)
# 輸出評(píng)價(jià)指標(biāo)
f_score, precision, recall = Evaluator.f1_score(test_docs, pred_docs)
print('f_score: ', f_score)
print('precision: ', precision)
print('recall: ', recall)
# 測(cè)試樣本展示
sample_doc_id = list(pred_docs.keys())[3]
test_docs[sample_doc_id]
# 測(cè)試結(jié)果展示
pred_docs[sample_doc_id]

以上代碼全部來(lái)自于《阿里云天池大賽賽題解析(深度學(xué)習(xí)篇)》這本好書(shū),十分推薦大家去閱讀原書(shū)!

推薦器件

更多器件
器件型號(hào) 數(shù)量 器件廠商 器件描述 數(shù)據(jù)手冊(cè) ECAD模型 風(fēng)險(xiǎn)等級(jí) 參考價(jià)格 更多信息
LAN8710AI-EZK-TR 1 Microchip Technology Inc DATACOM, ETHERNET TRANSCEIVER, QCC32, 5 X 5 MM, 0.90 MM HEIGHT, ROHS COMPLIANT, QFN-32

ECAD模型

下載ECAD模型
$1.56 查看
TJA1040T/CM,118 1 NXP Semiconductors TJA1040 - High-speed CAN transceiver with standby mode SOIC 8-Pin

ECAD模型

下載ECAD模型
$2.24 查看
VOD217T 1 Vishay Intertechnologies VOD205T, VOD206T, VOD207T, VOD211T, VOD213T, VOD217T Optocoupler, Phototransistor Output, Dual Channel, SOIC-8 Package

ECAD模型

下載ECAD模型
$0.76 查看

相關(guān)推薦

電子產(chǎn)業(yè)圖譜