I want to run some machine learning algorithms, such as PCA and KNN, on a relatively large dataset of images (more than 2000 RGB images) in order to classify them.
My source code is the following:
import cv2
import numpy as np
import os
from glob import glob
from sklearn.decomposition import PCA
from sklearn import neighbors
from sklearn import preprocessing

data = []

# Read images from file
for filename in glob('Tables/*.jpg'):
    img = cv2.imread(filename)
    height, width = img.shape[:2]
    img = np.array(img)

    # Check that all my images are of the same resolution
    if height == 529 and width == 940:
        # Reshape each image so that it is stored in one line
        img = np.concatenate(img, axis=0)
        img = np.concatenate(img, axis=0)
        data.append(img)

# Normalise data
data = np.array(data)
Norm = preprocessing.Normalizer()
Norm.fit(data)
data = Norm.transform(data)

# PCA model
pca = PCA(0.95)
pca.fit(data)
data = pca.transform(data)

# K-Nearest neighbours
knn = neighbors.NearestNeighbors(n_neighbors=4, algorithm='ball_tree',
                                 metric='minkowski').fit(data)
distances, indices = knn.kneighbors(data)
print(indices)
However, my laptop is not powerful enough for this task, so I need to use the computational resources of an online platform such as GCP.
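To give an idea of the scale, here is my own rough estimate of the size of the data matrix (I am assuming scikit-learn converts the uint8 pixel values to float64 during preprocessing, which I have not verified):

# Back-of-the-envelope estimate of the data matrix size (my numbers, not measured)
n_images = 2000                     # ">2000 rgb images"
n_features = 529 * 940 * 3          # flattened pixels per image = 1,491,780
bytes_per_value = 8                 # assuming float64 after preprocessing
print(n_images * n_features * bytes_per_value / 1e9)   # roughly 24 GB

If that estimate is right, the matrix alone exceeds the RAM of a typical laptop, even before PCA makes its own working copies.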
Google Cloud Platform, however, feels like a labyrinth at first sight. How can I simply use some of its resources (faster CPUs, GPUs, etc.) to run my source code above?