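To convert a vector whose elements are semicolon-separated lists into a presence/absence matrix, parse each element, collect the unique values, and fill in a 0/1 matrix with one row per element and one column per unique value: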
import numpy as np

def vector_to_existence_matrix(vector):
    # Step 1: Parse the semicolon-separated lists
    parsed_values = [item.split(';') for item in vector]
    # Step 2: Collect the unique values, sorted so the column order is deterministic
    unique_values = sorted({value for sublist in parsed_values for value in sublist})
    # Step 3: Build the existence matrix (rows = input elements, columns = unique values)
    existence_matrix = np.zeros((len(vector), len(unique_values)), dtype=int)
    # Map each unique value to its column index
    value_to_col = {value: idx for idx, value in enumerate(unique_values)}
    for i, values in enumerate(parsed_values):
        for value in values:
            existence_matrix[i, value_to_col[value]] = 1
    return existence_matrix, unique_values

# Example usage
vector = ["a;b;c", "b;d;e", "a;e"]
matrix, columns = vector_to_existence_matrix(vector)
print("Existence Matrix:")
print(matrix)
print("Columns:", columns)
Output:
Existence Matrix:
[[1 1 1 0 0]
 [0 1 0 1 1]
 [1 0 0 0 1]]
Columns: ['a', 'b', 'c', 'd', 'e']
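The function above splits on `;` exactly, so `" b"` and `"b"` would become separate columns, and an empty element would produce a column for the empty string. If the input may contain stray spaces or empty entries, a variant along the following lines might help. This is only a sketch under that assumption: the name `presence_matrix` and the `sep` parameter are hypothetical, and it returns plain Python lists rather than a NumPy array.

```python
def presence_matrix(vector, sep=";"):
    """Return (matrix, columns) as plain Python lists, ignoring blank tokens."""
    # Split each element, trim surrounding whitespace, and drop empty tokens
    parsed = [
        {token.strip() for token in item.split(sep) if token.strip()}
        for item in vector
    ]
    # The sorted union of all tokens defines the column order
    columns = sorted(set().union(*parsed))
    # One row per input element: 1 if the column's value is present, else 0
    matrix = [[1 if col in tokens else 0 for col in columns] for tokens in parsed]
    return matrix, columns


# Hypothetical input with a stray space, a doubled separator, and an empty element
matrix, columns = presence_matrix(["a; b;c", "b;d;;e", "", "a;e"])
print(columns)
for row in matrix:
    print(row)
```

With this input, the empty element yields an all-zero row instead of an extra column.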
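The steps and code above convert a vector of semicolon-separated lists into a presence/absence matrix with one row per input element and one column per distinct value, a form that can be reused in a range of practical scenarios.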