Exporting to pickle

This module contains functions that convert the embeddings into a time series format, group them together, and export the result as a pickle file.

NOTE: The module has 2 main functions:

* create_dic creates individual .pkl files (one for each chapter of the book) based on the chapter breakpoints given by the user. Use it when the dataset is very large, since visualizing the entire heatmap at once does not give much information.
* create_dict_whole_book creates a single .pkl file for the entire book. Use it when the dataset is relatively small, i.e. 2000 - 2500 sentences.


label

 label (method:str)

Returns the full name of the model based on the abbreviation

| | Type | Details |
|---|---|---|
| method | str | name of the method |
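A lookup of this kind can be sketched with a plain dictionary. The abbreviation keys below are hypothetical, as are the name `label_sketch` and the fallback of returning the input unchanged; only the full model names are taken from the example output at the end of this page.

```python
# Hypothetical abbreviations mapped to full model names.
# The keys are illustrative; the abbreviations that `label` really accepts may differ.
FULL_NAMES = {
    "use": "USE",
    "xlm": "XLM",
    "mpnet": "MPNet",
    "minilm": "MiniLM",
    "roberta": "RoBERTa",
    "distilbert": "DistilBERT",
}


def label_sketch(method: str) -> str:
    """Return the full model name for an abbreviation, or the input unchanged."""
    return FULL_NAMES.get(method.lower(), method)


print(label_sketch("mpnet"))
```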

cos_sim

 cos_sim (a:numpy.ndarray, b:numpy.ndarray)

Returns the cosine similarity between 2 vectors.

| | Type | Details |
|---|---|---|
| a | np.ndarray | vector 1 |
| b | np.ndarray | vector 2 |
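Cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. A minimal NumPy sketch follows; the name `cos_sim_sketch` is illustrative and is not the module's own implementation.

```python
import numpy as np


def cos_sim_sketch(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(cos_sim_sketch(x, x))  # parallel vectors
print(cos_sim_sketch(x, y))  # orthogonal vectors
```

The value is undefined when either vector has zero norm, so callers may want to guard against all-zero embeddings.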

successive_similarities

 successive_similarities (embeddings, k)
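The page gives no description for this function. One plausible reading, stated here as an assumption, is that it returns the cosine similarity between each embedding and the embedding k positions after it. A vectorized sketch under that assumption (the name `successive_similarities_sketch` is illustrative):

```python
import numpy as np


def successive_similarities_sketch(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Cosine similarity between row i and row i + k for every valid i (assumed behaviour)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    first, second = embeddings[:-k], embeddings[k:]
    dots = np.sum(first * second, axis=1)
    norms = np.linalg.norm(first, axis=1) * np.linalg.norm(second, axis=1)
    return dots / norms


emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
sims = successive_similarities_sketch(emb, k=1)
print(sims)
```

With n embeddings the result has n - k entries, one per pair of sentences that are k apart.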

create_dict_whole_book

 create_dict_whole_book (embedding_path:str='.', k:int=1)

Creates a .pkl file of time series from the embeddings.

| | Type | Default | Details |
|---|---|---|---|
| embedding_path | str | . | path to the embeddings |
| k | int | 1 | consecutive index |
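The overall flow can be sketched as follows. Several details are assumptions rather than facts about this module: that each method's embeddings are stored as one .npy file, that the pickle holds a dictionary keyed by file stem, and the output file name `whole_book.pkl`. Only the `pkl` subdirectory matches the example output on this page.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def whole_book_sketch(embedding_path: str = ".", k: int = 1) -> Path:
    """Build {file stem: similarity series} from every .npy file and pickle it."""
    root = Path(embedding_path)
    series = {}
    for npy in sorted(root.glob("*.npy")):  # assumed storage format
        emb = np.load(npy)
        first, second = emb[:-k], emb[k:]
        # Cosine similarity between sentence i and sentence i + k
        series[npy.stem] = np.sum(first * second, axis=1) / (
            np.linalg.norm(first, axis=1) * np.linalg.norm(second, axis=1)
        )
    out_dir = root / "pkl"
    out_dir.mkdir(exist_ok=True)
    out_file = out_dir / "whole_book.pkl"  # hypothetical file name
    with open(out_file, "wb") as f:
        pickle.dump(series, f)
    return out_file


# Usage on throwaway random data
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    np.save(Path(tmp) / "demo_method.npy", rng.normal(size=(5, 4)))
    saved = whole_book_sketch(tmp, k=1)
    with open(saved, "rb") as f:
        loaded = pickle.load(f)
    print({name: values.shape for name, values in loaded.items()})
```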

create_label_whole_book

 create_label_whole_book (method, parent_dir)

create_label

 create_label (index, method, parent_dir)

get_embed_method_and_name

 get_embed_method_and_name (fname)

Returns the name of the file and the embedding method by splitting the file name on the word ‘cleaned’.

| | Type | Details |
|---|---|---|
| fname | | name of the file |
| Returns | (str, str) | name of file, embedding method |
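The split can be sketched with `str.partition`. The file name used below is hypothetical, as are the function name `split_on_cleaned_sketch` and the stripping of underscores; the real naming scheme of the embedding files is not shown on this page.

```python
from pathlib import Path
from typing import Tuple


def split_on_cleaned_sketch(fname: str) -> Tuple[str, str]:
    """Split a file name into (book name, method) around the word 'cleaned'."""
    stem = Path(fname).stem  # drop the directory and the extension
    name, _, method = stem.partition("cleaned")
    # Remove the separators left on either side of 'cleaned'
    return name.strip("_ "), method.strip("_ ")


# Hypothetical file name, for illustration only
print(split_on_cleaned_sketch("A_Modest_Proposal_cleaned_mpnet.npy"))
```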
/home/deven
create_dict_whole_book('embeddings/A_Modest_Proposal', 1)
Book Name: A Modest Proposal
Found 10 methods
---------------------------------------------
Found DeCLUTR Small
Found RoBERTa
Found InferSent GloVe
Found InferSent FastText
Found DistilBERT
Found XLM
Found MPNet
Found USE
Found DeCLUTR Base
Found MiniLM
---------------------------------------------
Saved pkl at /home/deven/embeddings/A_Modest_Proposal/pkl