NOTE: The module has 2 main functions:

* create_dict creates individual .pkl files (one for each chapter of the book) based on the chapter breakpoints given by the user. Use it when the dataset is very large, since visualizing the entire heatmap at once does not give much information.
* create_dict_whole_book creates a single .pkl file for the entire book. Use it when the dataset is relatively small, i.e. 2000 - 2500 sentences.
label (method:str)
Returns the full name of the model based on the abbreviation
| | Type | Details |
|---|---|---|
| method | str | name of the method |
cos_sim (a:numpy.ndarray, b:numpy.ndarray)
Returns the cosine similarity between 2 vectors.
| | Type | Details |
|---|---|---|
| a | np.ndarray | vector 1 |
| b | np.ndarray | vector 2 |
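
As a rough illustration of what cos_sim computes, the sketch below uses the standard cosine similarity formula, dot(a, b) / (‖a‖ · ‖b‖). It is an assumption about the function's body, not a copy of the module's code, and the example vectors are made up.

```python
import numpy as np


def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two 1-D vectors (textbook formula, assumed here)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Illustrative vectors, not taken from the module.
x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
z = np.array([0.0, 3.0])

print(cos_sim(x, x))  # identical direction
print(cos_sim(x, y))  # 45 degrees apart
print(cos_sim(x, z))  # orthogonal
```

Note that this form divides by the vector norms, so it is undefined for an all-zero vector.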
successive_similarities (embeddings, k)
create_dict_whole_book (embedding_path:str='.', k:int=1)
Create pkl for time series from embeddings
| | Type | Default | Details |
|---|---|---|---|
| embedding_path | str | . | path to the embeddings |
| k | int | 1 | consecutive index |
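
The following is a minimal sketch of how a similarity time series could be built from sentence embeddings and then pickled. It assumes that k means "compare sentence i with sentence i + k" and that the result is stored as a dictionary keyed by method name; neither detail is confirmed by the documentation above. The helper name successive_similarities_sketch and the key "Hypothetical Method" are made up for illustration.

```python
import pickle

import numpy as np


def successive_similarities_sketch(embeddings: np.ndarray, k: int = 1) -> np.ndarray:
    """Cosine similarity between row i and row i + k for every valid i.

    Assumes k >= 1 and a 2-D array of shape (n_sentences, dim).
    """
    a, b = embeddings[:-k], embeddings[k:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den


# Toy embeddings: four "sentences" in two dimensions.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [3.0, 0.0]])
series = successive_similarities_sketch(emb, k=1)

# Serialize in memory; the dictionary layout is an assumption.
payload = pickle.dumps({"Hypothetical Method": series})
restored = pickle.loads(payload)
print(restored["Hypothetical Method"])
```

The series has n_sentences - k entries, so a larger k yields a shorter series.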
create_label_whole_book (method, parent_dir)
create_label (index, method, parent_dir)
get_embed_method_and_name (fname)
Returns the name of the file and the method by splitting on the word ‘cleaned’
| | Type | Details |
|---|---|---|
| fname | | name of the file |
| Returns | (str, str) | name of file, embedding method |
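
The splitting behaviour described above might look roughly like the sketch below. The file name "A_Modest_Proposal_cleaned_roberta" is hypothetical, as are the helper name and the trimming of underscores; the real naming scheme and any extension handling are not shown in this documentation.

```python
from typing import Tuple


def split_on_cleaned(fname: str) -> Tuple[str, str]:
    """Split a file name on the word 'cleaned' into (book name, method).

    Hypothetical helper: underscores next to the split point are trimmed.
    """
    name, method = fname.split("cleaned", 1)
    return name.strip("_"), method.strip("_")


# Hypothetical file name following an assumed <book>_cleaned_<method> pattern.
book, method = split_on_cleaned("A_Modest_Proposal_cleaned_roberta")
print(book, method)
```

A file name that does not contain 'cleaned' raises a ValueError in this sketch, because the split yields a single piece.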
Book Name: A Modest Proposal
Found 10 methods
---------------------------------------------
Found DeCLUTR Small
Found RoBERTa
Found InferSent GloVe
Found InferSent FastText
Found DistilBERT
Found XLM
Found MPNet
Found USE
Found DeCLUTR Base
Found MiniLM
---------------------------------------------
Saved pkl at /home/deven/embeddings/A_Modest_Proposal/pkl