CRISPR-Cas systems have transformed genomics by enabling highly precise alterations to genomic sequences. Despite these advances, one notable problem persists: predicting and mitigating off-target effects. These unintended effects occur when the CRISPR machinery interacts with homologous but non-identical sequences, potentially inducing unanticipated mutations. This study compares two machine-learning models built on Convolutional Neural Networks (CNNs) with different architectures. The standard CNN model uses a layered architecture that extracts salient genetic features with a convolutional layer followed by a batch normalization layer and a max pooling layer. It applies a dropout layer to reduce overfitting, then uses two dense layers to classify the input into two outputs. In contrast, the AttnToMismatch_CNN model combines the attention mechanism with convolution: it encodes sgRNA and DNA sequences into vector representations using embedding and transformer layers, passes the result through a convolutional layer, and then through a dense layer to produce two outputs. Evaluating the models by the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) indicated that the standard CNN model had the superior predictive performance. This research highlights the potential of Convolutional Neural Networks for improving the prediction of off-target effects in CRISPR-Cas systems, thereby supporting safer and more effective applications of this gene editing tool.
By: Parsh Verma
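
The AUC-ROC score used to compare the two models can be illustrated with a minimal sketch in plain Python. It uses the rank-sum (Mann-Whitney U) formulation, in which the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration only; they are not the study's code or results, and the study may well have used a library routine instead.

```python
def roc_auc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 ground-truth classes (1 = positive class).
    scores: iterable of predicted scores, higher meaning "more likely positive".
    Tied scores share the average of the ranks they occupy.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run [i, j) of examples that share the same score.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j; every member of the run gets their average.
        avg_rank = (i + 1 + j) / 2.0
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    # U statistic for the positive class, normalised by the number of pos/neg pairs.
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Toy data, invented purely for illustration (not results from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes, so comparing models by this metric amounts to asking which one ranks positive examples above negative ones more consistently.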