spec2vec.vector_operations module

spec2vec.vector_operations.calc_vector(model: BaseTopicModel, document: Document, intensity_weighting_power: Union[float, int] = 0, allowed_missing_percentage: Union[float, int] = 10) → ndarray

Compute document vector as a (weighted) sum of individual word vectors.

Parameters
  • model – Pretrained word2vec model to convert words into vectors.

  • document – Document containing document.words and document.weights.

  • intensity_weighting_power – Power to which the weights are raised. The default is 0, which means that no weighting is applied.

  • allowed_missing_percentage – Maximum allowed percentage of the document that may be missing from the input model. This is measured as the percentage of the weighted, missing words compared to all word vectors of the document. The default is 10, which means that up to 10% missing words are allowed. If more words are missing from the model, an empty embedding is returned (leading to similarities of 0) and a warning is raised.

Returns

Vector representing the input document in latent space. Returns None if the percentage of the document missing from the model exceeds allowed_missing_percentage.

Return type

vector
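
The weighted sum and the missing percentage described above can be spelled out with a small, library-independent sketch. It is not the spec2vec implementation: the function names, the dictionary standing in for the model, and the toy words are all hypothetical, and the missing percentage follows one reading of the parameter description (weights raised to the given power, missing share relative to all words of the document).

```python
from typing import Dict, List, Sequence


def weighted_document_vector(word_vectors: Dict[str, Sequence[float]],
                             words: Sequence[str],
                             weights: Sequence[float],
                             power: float = 0) -> List[float]:
    """Sum the vectors of all known words, each scaled by weight ** power."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word, weight in zip(words, weights):
        if word not in word_vectors:
            continue  # words unknown to the (toy) model contribute nothing
        scale = weight ** power  # power 0 gives scale 1, i.e. no weighting
        for i, component in enumerate(word_vectors[word]):
            total[i] += scale * component
    return total


def missing_percentage(word_vectors: Dict[str, Sequence[float]],
                       words: Sequence[str],
                       weights: Sequence[float],
                       power: float = 0) -> float:
    """Weighted share (in percent) of document words absent from the model."""
    raised = [weight ** power for weight in weights]
    missing = sum(r for word, r in zip(words, raised) if word not in word_vectors)
    return 100.0 * missing / sum(raised)


# Hypothetical toy data: a dict stands in for the pretrained model.
toy_vectors = {"word_a": [1.0, 0.0], "word_b": [0.0, 2.0]}
words = ["word_a", "word_b", "word_c"]  # "word_c" is unknown to the toy model
weights = [1.0, 0.5, 0.25]

unweighted = weighted_document_vector(toy_vectors, words, weights)         # [1.0, 2.0]
weighted = weighted_document_vector(toy_vectors, words, weights, power=1)  # [1.0, 1.0]
missing_unweighted = missing_percentage(toy_vectors, words, weights)       # about 33.3
missing_weighted = missing_percentage(toy_vectors, words, weights, power=1)  # about 14.3
```

In this toy case both missing percentages exceed the default allowed_missing_percentage of 10. With a power of 1, the low-weight missing word counts for less than it does without weighting.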

spec2vec.vector_operations.cosine_similarity(vector1: ndarray, vector2: ndarray) → float64

Calculate cosine similarity between two input vectors.

For example:

import numpy as np
from spec2vec.vector_operations import cosine_similarity

vector1 = np.array([1, 1, 0, 0])
vector2 = np.array([1, 1, 1, 1])
print("Cosine similarity: {:.3f}".format(cosine_similarity(vector1, vector2)))

Should output

Cosine similarity: 0.707

Parameters
  • vector1 – Input vector. Can be array of integers or floats.

  • vector2 – Input vector. Can be array of integers or floats.
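
For reference, cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. The sketch below spells out that arithmetic in plain Python. The function name is hypothetical and this is not the library's implementation. It does not guard against zero-length vectors, which would cause a division by zero here.

```python
import math
from typing import Sequence


def cosine_similarity_sketch(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Same vectors as in the example above: dot = 2, norms = sqrt(2) and 2.
score = cosine_similarity_sketch([1, 1, 0, 0], [1, 1, 1, 1])
print("Cosine similarity: {:.3f}".format(score))  # Cosine similarity: 0.707
```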

spec2vec.vector_operations.cosine_similarity_matrix(vectors_1: ndarray, vectors_2: ndarray) → ndarray

Fast implementation of cosine similarity between two arrays of vectors.

For example:

import numpy as np
from spec2vec.vector_operations import cosine_similarity_matrix

vectors_1 = np.array([[1, 1, 0, 0],
                      [1, 0, 1, 1]])
vectors_2 = np.array([[0, 1, 1, 0],
                      [0, 0, 1, 1]])
similarity_matrix = cosine_similarity_matrix(vectors_1, vectors_2)

Parameters
  • vectors_1 – Numpy array of vectors. vectors_1.shape[0] is number of vectors, vectors_1.shape[1] is vector dimension.

  • vectors_2 – Numpy array of vectors. vectors_2.shape[0] is number of vectors, vectors_2.shape[1] is vector dimension.
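
To make the pairwise computation concrete, the following plain-Python sketch computes every pairwise cosine similarity for the example vectors above. It is an illustration only, with hypothetical function names and none of the optimizations of the fast implementation. It assumes that rows of the result correspond to vectors_1 and columns to vectors_2.

```python
import math
from typing import List, Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equally long vectors (no zero-vector guard)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def cosine_similarity_matrix_sketch(rows_1: Sequence[Sequence[float]],
                                    rows_2: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] holds the cosine similarity of rows_1[i] and rows_2[j]."""
    return [[_cosine(u, v) for v in rows_2] for u in rows_1]


vectors_1 = [[1, 1, 0, 0],
             [1, 0, 1, 1]]
vectors_2 = [[0, 1, 1, 0],
             [0, 0, 1, 1]]
matrix = cosine_similarity_matrix_sketch(vectors_1, vectors_2)
rounded = [[round(value, 3) for value in row] for row in matrix]
print(rounded)  # [[0.5, 0.0], [0.408, 0.816]]
```

With two vectors in each input, the result has shape 2 × 2 under the row and column assumption stated above.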