spec2vec.vector_operations module
- spec2vec.vector_operations.calc_vector(model: BaseTopicModel, document: Document, intensity_weighting_power: Union[float, int] = 0, allowed_missing_percentage: Union[float, int] = 10) -> ndarray
Compute document vector as a (weighted) sum of individual word vectors.
- Parameters
model – Pretrained word2vec model to convert words into vectors.
document – Document containing document.words and document.weights.
intensity_weighting_power – Power to which the weights are raised. The default is 0, which means that no weighting is applied.
allowed_missing_percentage – Maximum allowed percentage of the document that may be missing from the input model. This is measured as the percentage of the weighted missing words compared to all word vectors of the document. The default is 10, which means that up to 10% missing words are allowed. If more words are missing from the model, an empty embedding is returned (leading to similarities of 0) and a warning is raised.
- Returns
Vector representing the input document in latent space. Will return None if the missing percentage of the document in the model is > allowed_missing_percentage.
- Return type
vector
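The weighted sum and the missing-word check described above can be sketched with plain NumPy. This is a minimal illustration, not spec2vec's implementation: the dictionary `toy_vectors`, the helper `weighted_document_vector`, the word labels, and the exact formula used for the missing percentage are all assumptions made for this example, and a zero vector stands in for the "empty embedding".

```python
import numpy as np

# Hypothetical 3-dimensional word vectors standing in for a trained model.
toy_vectors = {
    "peak@100.0": np.array([1.0, 0.0, 0.0]),
    "peak@150.0": np.array([0.0, 1.0, 0.0]),
}


def weighted_document_vector(words, weights, vectors, power=0.0,
                             allowed_missing_percentage=10.0):
    """Sum word vectors, each scaled by weight ** power (illustrative only)."""
    raised = np.power(np.asarray(weights, dtype=float), power)
    known = np.array([word in vectors for word in words])

    # Assumed definition: weighted share of the words that have no vector.
    missing_percentage = 100.0 * raised[~known].sum() / raised.sum()
    dim = len(next(iter(vectors.values())))
    if missing_percentage > allowed_missing_percentage:
        return np.zeros(dim)  # stand-in for the "empty embedding"

    document_vector = np.zeros(dim)
    for word, weight in zip(words, raised):
        if word in vectors:
            document_vector += weight * vectors[word]
    return document_vector


words = ["peak@100.0", "peak@150.0"]
weights = [1.0, 0.25]

# power=0 turns every weight into 1, so this is a plain sum of word vectors.
unweighted = weighted_document_vector(words, weights, toy_vectors, power=0)
# power=0.5 scales each word vector by the square root of its weight.
weighted = weighted_document_vector(words, weights, toy_vectors, power=0.5)
```

With a power of 0 both words contribute equally; with a power of 0.5 the second word contributes half as much as the first, since its raised weight is sqrt(0.25) = 0.5.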
- spec2vec.vector_operations.cosine_similarity(vector1: ndarray, vector2: ndarray) -> float64
Calculate cosine similarity between two input vectors.
For example:
import numpy as np
from spec2vec.vector_operations import cosine_similarity

vector1 = np.array([1, 1, 0, 0])
vector2 = np.array([1, 1, 1, 1])
print("Cosine similarity: {:.3f}".format(cosine_similarity(vector1, vector2)))
Should output
Cosine similarity: 0.707
- Parameters
vector1 – Input vector. Can be array of integers or floats.
vector2 – Input vector. Can be array of integers or floats.
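For reference, the quantity being computed is the dot product of the two vectors divided by the product of their Euclidean norms. The sketch below reproduces the value from the example with plain NumPy. The function name `cosine_similarity_sketch` and the choice to return 0 for an all-zero input are assumptions of this illustration and say nothing about how the library handles that case.

```python
import numpy as np


def cosine_similarity_sketch(vector1, vector2):
    """Cosine of the angle between two 1-D vectors (illustrative only)."""
    v1 = np.asarray(vector1, dtype=float)
    v2 = np.asarray(vector2, dtype=float)
    norm_product = np.linalg.norm(v1) * np.linalg.norm(v2)
    if norm_product == 0:
        return 0.0  # assumed convention for all-zero input
    return float(np.dot(v1, v2) / norm_product)


score = cosine_similarity_sketch([1, 1, 0, 0], [1, 1, 1, 1])
print(f"Cosine similarity: {score:.3f}")  # 2 / (sqrt(2) * 2), about 0.707
```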
- spec2vec.vector_operations.cosine_similarity_matrix(vectors_1: ndarray, vectors_2: ndarray) -> ndarray
Fast implementation of cosine similarity between two arrays of vectors.
For example:
import numpy as np
from spec2vec.vector_operations import cosine_similarity_matrix

vectors_1 = np.array([[1, 1, 0, 0], [1, 0, 1, 1]])
vectors_2 = np.array([[0, 1, 1, 0], [0, 0, 1, 1]])
similarity_matrix = cosine_similarity_matrix(vectors_1, vectors_2)
- Parameters
vectors_1 – NumPy array of vectors. vectors_1.shape[0] is the number of vectors, vectors_1.shape[1] is the vector dimension.
vectors_2 – NumPy array of vectors. vectors_2.shape[0] is the number of vectors, vectors_2.shape[1] is the vector dimension.
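The shape of the result follows from the parameter description: entry [i, j] compares row i of vectors_1 with row j of vectors_2, so the output has shape (vectors_1.shape[0], vectors_2.shape[0]). A minimal NumPy sketch of the same computation is given below. The name `cosine_similarity_matrix_sketch` and the handling of all-zero rows are assumptions of this illustration, not the library's implementation.

```python
import numpy as np


def cosine_similarity_matrix_sketch(vectors_1, vectors_2):
    """Pairwise cosine similarities between the rows of two 2-D arrays."""
    a = np.asarray(vectors_1, dtype=float)
    b = np.asarray(vectors_2, dtype=float)
    a_norms = np.linalg.norm(a, axis=1, keepdims=True)
    b_norms = np.linalg.norm(b, axis=1, keepdims=True)
    # Assumed convention: divide zero rows by 1 so they yield 0, not NaN.
    a_unit = a / np.where(a_norms == 0, 1.0, a_norms)
    b_unit = b / np.where(b_norms == 0, 1.0, b_norms)
    # Dot products of unit-length rows are the cosine similarities.
    return a_unit @ b_unit.T


vectors_1 = np.array([[1, 1, 0, 0], [1, 0, 1, 1]])
vectors_2 = np.array([[0, 1, 1, 0], [0, 0, 1, 1]])
similarity_matrix = cosine_similarity_matrix_sketch(vectors_1, vectors_2)
print(similarity_matrix.shape)  # (2, 2)
```

For these inputs the first row of the result is [0.5, 0.0] and the second is [1/sqrt(6), 2/sqrt(6)], roughly [0.408, 0.816].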