We evaluated the transcriptomic response of the central nervous system (CNS) and eyes of male two-toned pygmy squid (Idiosepius pygmaeus) exposed to elevated (~1,000 µatm) CO2 for seven days compared with current-day (~450 µatm) controls. As a reference for gene expression quantification, we assembled a high-quality, annotated de novo transcriptome of I. pygmaeus CNS and eye tissues using long-read PacBio Iso-sequencing data. Differential expression analysis was used to identify genes whose expression differed between current-day and elevated CO2 conditions in the CNS and in the eyes. Gene set enrichment analysis was used to determine whether sets of genes sharing a gene ontology (GO) term/functional group showed significant, concordant differences between current-day and elevated CO2 conditions in the CNS and in the eyes.
de novo transcriptome assembly: ISO-seq data was processed using the PacBio isoseq3 pipeline: ccs (v4.2.0) with the minimum number of full passes set at three and the minimum predicted read accuracy at 0.9, lima (v1.11.0) with ‘--peek-guess’, isoseq3 refine (v3.3.0), isoseq3 cluster (v3.3.0) with ‘--use-qvs’. Redundancy removal: CD-HIT-EST (v4.6) with at least 99% identity. Open reading frame (ORF) identification: TransDecoder (v5.5.0); the single best ORF per contig was chosen first on BLAST homology to known proteins in the NCBI nr database subset for mollusca (nr_mollusca, downloaded 01/2021), using BLASTp from BLAST+ (v2.10.0+) with max_target_seqs 1 and an e-value cut-off of 1e-5, and then on ORF length (minimum 100 amino acids). The entire transcript was retained for each identified ORF. Annotation: the transcriptome was searched against the entire NCBI nr database (downloaded 01/2021) using BLASTx from BLAST+ (v2.10.0+) with an e-value cut-off of 1e-5, outfmt 14, and ‘-num_alignments’ and ‘-max_hsps’ both set at 20. Functional annotation in OmicsBox (v1.4.12) using BLAST2GO mapping (Goa version 2020.10, all default settings), BLAST2GO annotation (all default settings) and InterProScan (v5.50-84.0, all default settings). The gene sets (blast2go_gene_sets.txt) were exported from the annotated transcriptome in OmicsBox with File > Export > Export annotations > Export Sequences per GO (Gene Sets); this file was used when creating the network to visualise the gene set enrichment analysis.
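The ORF selection rule described above (homology first, then length) can be illustrated with a minimal Python sketch. This is not TransDecoder's implementation; the `Orf` record, the function names and the reading of the e-value cut-off as 1e-5 are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

MIN_ORF_AA = 100       # minimum ORF length (amino acids), from the text
EVALUE_CUTOFF = 1e-5   # assumed reading of the BLASTp e-value cut-off


@dataclass(frozen=True)
class Orf:
    """Hypothetical record for one candidate ORF on a contig."""
    contig: str
    orf_id: str
    length_aa: int
    best_evalue: Optional[float]  # None when BLASTp returned no hit


def rank(orf: Orf) -> Tuple[bool, int]:
    """Sort key: ORFs with a qualifying BLASTp hit come first, then longer ORFs."""
    has_hit = orf.best_evalue is not None and orf.best_evalue <= EVALUE_CUTOFF
    return (has_hit, orf.length_aa)


def pick_best_orfs(orfs: Iterable[Orf]) -> Dict[str, Orf]:
    """Keep a single best ORF per contig, ignoring ORFs below the length minimum."""
    best: Dict[str, Orf] = {}
    for orf in orfs:
        if orf.length_aa < MIN_ORF_AA:
            continue
        current = best.get(orf.contig)
        if current is None or rank(orf) > rank(current):
            best[orf.contig] = orf
    return best
```

In this sketch a shorter ORF with a qualifying hit outranks a longer ORF without one, which mirrors the ordering of criteria stated in the text.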
RNA-seq read pre-processing and mapping: Trimming: Fastp (v0.21.1) with a sliding window of 4 bp and a mean Phred score threshold of 30; reads < 30 bp were removed. Decontamination: Kraken2 (v2.0.9) with a confidence of 0.3 using the NCBI bacterial and archaeal reference libraries (downloaded 08/2020). Trimmed and decontaminated RNA-seq reads were mapped against the transcriptome: Salmon (v1.3.0) with correction for sequence-specific biases and fragment-level GC biases, quantification step skipped, and flags ‘--validateMappings’ and ‘--hardFilter’. Gene-level counts: Corset (v1.09) run on the Salmon equivalence class files with four groups defined (eyes current-day CO2, eyes elevated CO2, CNS current-day CO2 and CNS elevated CO2), the log likelihood ratio test switched off, and links between contigs filtered out if supported by < 10 reads.
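As a rough illustration of the link filter applied when clustering contigs into genes, the sketch below counts reads shared between contig pairs from equivalence classes and drops links supported by fewer than 10 reads. It is a simplification written for this record, not Corset's code; the input structure and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple

MIN_SHARED_READS = 10  # links supported by fewer reads are filtered out (from the text)


def shared_read_links(
    eq_classes: Iterable[Tuple[Sequence[str], int]],
) -> Dict[Tuple[str, str], int]:
    """Return contig pairs and their shared read counts, keeping supported links only.

    Each equivalence class is (contig_ids, read_count): the set of contigs that a
    group of reads maps to, and how many reads fall in that class.
    """
    shared: Counter = Counter()
    for contigs, n_reads in eq_classes:
        # Every pair of contigs in the same class shares all reads of that class.
        for a, b in combinations(sorted(set(contigs)), 2):
            shared[(a, b)] += n_reads
    return {pair: n for pair, n in shared.items() if n >= MIN_SHARED_READS}
```

Contigs that remain linked after this filter are the candidates for being grouped into the same gene-level cluster; the actual clustering step is not shown here.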
Differential expression analysis: In R (v4.0.4), using RStudio (v1.4.1106), gene expression was compared between current-day and elevated CO2 conditions with the Wald test in DESeq2 (v1.30.1), for the CNS and eyes separately.
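For readers unfamiliar with the Wald test, the final step can be sketched in a few lines of Python: the estimated log2 fold change is divided by its standard error and the resulting statistic is compared with a standard normal distribution. The sketch assumes the estimate and standard error are already available (in DESeq2 they come from a negative binomial model fit, which is not reproduced here); the function name is hypothetical.

```python
import math


def wald_p_value(log2_fold_change: float, standard_error: float) -> float:
    """Two-sided p-value for a Wald statistic under a standard normal reference."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = log2_fold_change / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A gene with an estimated log2 fold change about twice its standard error therefore has an unadjusted p-value close to 0.05.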
Gene set enrichment analysis (GSEA): In R (v4.0.4), using RStudio (v 1.4.1106), unweighted GSEA was run in clusterProfiler (v3.18.1) using the DESeq2 log2 fold-change values of all genes and the annotated GO terms as the ‘gene sets’, for the CNS and eyes separately. Minimum and maximum gene set size of 15 and 500, respectively. P-values adjusted for multiple comparisons using the Benjamini-Hochberg method and a significance threshold of padj < 0.05 was used. The GSEA results were imported into Cytoscape (v3.8.2) where EnrichmentMap (v3.3.1) was used to create a network to visualise the GSEA results.
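Two of the settings above, the gene set size limits and the Benjamini-Hochberg adjustment, are simple enough to sketch in Python. This is an illustration of the general technique, not clusterProfiler's code: the function names are hypothetical, and set size is counted here as the number of distinct genes in each set.

```python
from typing import Dict, Iterable, List, Sequence


def filter_gene_sets(
    gene_sets: Dict[str, Iterable[str]], min_size: int = 15, max_size: int = 500
) -> Dict[str, List[str]]:
    """Keep GO terms whose number of distinct genes lies within [min_size, max_size]."""
    kept = {}
    for term, genes in gene_sets.items():
        unique = sorted(set(genes))
        if min_size <= len(unique) <= max_size:
            kept[term] = unique
    return kept


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(
    terms: Sequence[str], pvalues: Sequence[float], alpha: float = 0.05
) -> List[str]:
    """Terms whose adjusted p-value is below alpha (padj < 0.05 in the text)."""
    padj = benjamini_hochberg(pvalues)
    return [t for t, p in zip(terms, padj) if p < alpha]
```

Applying the size filter before testing reduces the number of comparisons that the adjustment has to account for.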
Water Sampling: To evaluate the magnitude of natural diel CO2 fluctuations and the ecological relevance of our experimental CO2 treatment levels, water samples were taken from the same location where two-toned pygmy squid (Idiosepius pygmaeus) were collected. Samples were collected in 250 mL borosilicate glass bottles at an approximate depth of 25 cm. At the sampling location, water temperature (Comark C26, Norfolk, UK) and pHNBS (Seven2Go™ pro Conductivity Meter with an InLab Expert Go-ISM pH electrode, Mettler Toledo) were measured. In the laboratory, total alkalinity was measured by Gran titration (888 Titrando, Metrohm AG, Switzerland) and salinity with a conductivity sensor (HQ40d, Hach, Loveland, CO, USA). CO2 values were calculated in CO2SYS v.2.1 (https://cdiac.ess-dive.lbl.gov/ftp/co2sys/CO2SYS_calc_XLS_v2.1/) using the constants K1, K2 from Mehrbach et al. (1973) as refit by Dickson and Millero (1987) and KHSO4 from Dickson et al. (2007).
This data record contains:
1) All the scripts used for bioinformatic analyses.
2) The annotated transcriptome assembly of I. pygmaeus CNS and eye tissues (.box (OmicsBox file) and .csv file)
3) All R code used for the statistical analyses. (R_code_transcriptomic_response_squid_nervous_system_elevated_CO2.html)
4) Data files to accompany the statistical analyses: Corset gene count and cluster data (corset-clusters.txt, corset-counts.txt), gene sets (blast2go_gene_sets.txt), species distribution of top BLAST hits (top_hit_species_distribution_annotated_transcriptome_orfs.txt) and metadata (metadata.csv)
5) Raw water sampling data (Water_sampling_raw_data.xlsx)
The raw RNA-seq and ISO-seq data used for these analyses, as well as the fasta file of the transcriptome assembly, can be found at NCBI BioProject PRJNA798187.
Software/equipment used to create/collect the data: de novo Transcriptome Assembly Software: ccs (v4.2.0), lima (v1.11.0), isoseq3 refine (v3.3.0), isoseq3 cluster (v3.3.0), CD-HIT-EST (v4.6), TransDecoder (v5.5.0), NCBI nr database subset for mollusca (nr_mollusca, downloaded 01/2021), BLASTp from BLAST+ (v2.10.0+).
Transcriptome Annotation Software: NCBI nr database (downloaded 01/2021), BLASTx from BLAST+ (v2.10.0+), OmicsBox (v1.4.12), BLAST2GO (Goa version 2020.10), InterProScan (v5.50-84.0).
RNA-seq Read Pre-processing Software: FastQC (v0.11.9), MultiQC (v1.9), Fastp (v0.21.1), Kraken2 (v2.0.9), NCBI bacterial and archaeal reference libraries (downloaded 08/2020).
Read Mapping Software: Salmon (v1.3.0), Corset (v1.09)
Statistical Analysis Software: R (v4.0.4), RStudio (v 1.4.1106), DESeq2 (v1.30.1), clusterProfiler (v3.18.1), Cytoscape (v3.8.2), EnrichmentMap (v3.3.1)
Water Sampling Equipment: 250 mL borosilicate glass bottles; temperature: Comark C26, Norfolk, UK; pHNBS: Seven2Go™ pro Conductivity Meter with an InLab Expert Go-ISM pH electrode, Mettler Toledo; total alkalinity by Gran titration: 888 Titrando, Metrohm AG, Switzerland; salinity with a conductivity sensor: HQ40d, Hach, Loveland, CO, USA
Water Sampling Software: pCO2 values - CO2SYS v.2.1 (https://cdiac.ess-dive.lbl.gov/ftp/co2sys/CO2SYS_calc_XLS_v2.1/) using the constants K1, K2 from Mehrbach et al. (1973) and refit by Dickson and Millero (1987) and KHSO4 from Dickson et al. (2007).