Two-toned pygmy squid (Idiosepius pygmaeus) transcriptome assembly, and transcriptomic response of the nervous system to elevated CO2

We evaluated the transcriptomic response of the central nervous system (CNS) and eyes of male two-toned pygmy squid (Idiosepius pygmaeus) exposed to elevated (~1,000 µatm) CO2 for seven days compared with current-day (~450 µatm) controls. As a reference for gene expression quantification, we assembled a high-quality, annotated de novo transcriptome of I. pygmaeus CNS and eye tissues using long-read PacBio Iso-sequencing data. Differential expression analysis was carried out to determine which genes were differentially expressed between current-day and elevated CO2 conditions in the CNS and eyes. Gene set enrichment analysis was carried out to determine whether sets of genes from the same gene ontology (GO) term/functional group showed significant, concordant differences between current-day and elevated CO2 conditions in the CNS and eyes.

de novo transcriptome assembly: ISO-seq data were processed using the PacBio isoseq3 pipeline: ccs (v4.2.0) with the minimum number of full passes set to three and the minimum predicted read accuracy set to 0.9, lima (v1.11.0) with ‘--peek-guess’, isoseq3 refine (v3.3.0), and isoseq3 cluster (v3.3.0) with ‘--use-qvs’. Redundancy removal: CD-HIT-EST (v4.6) with at least 99% identity. Open reading frames (ORFs) were identified with TransDecoder (v5.5.0): the single best ORF per contig was chosen first on BLAST homology to known proteins in the NCBI nr database subset for Mollusca (nr_mollusca, downloaded 01/2021), using BLASTp from BLAST+ (v2.10.0+) with max_target_seqs 1 and an e-value cut-off of 1e-5, and then on ORF length (minimum 100 amino acids). The entire transcript was retained for each identified ORF. Annotation: the transcriptome was searched against the entire NCBI nr database (downloaded 01/2021) using BLASTx from BLAST+ (v2.10.0+) with an e-value cut-off of 1e-5, outfmt 14, and ‘-num_alignments’ and ‘-max_hsps’ both set to 20. Functional annotation was carried out in OmicsBox (v1.4.12) using BLAST2GO mapping (Goa version 2020.10, all default settings), BLAST2GO annotation (all default settings) and InterProScan (v5.50-84.0, all default settings). The gene sets (blast2go_gene_sets.txt) were exported from the annotated transcriptome in OmicsBox with File > Export > Export annotations > Export Sequences per GO (Gene Sets). This file was used when creating the network to visualise the gene set enrichment analysis.
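The ORF selection rule described above (homology first, then length) can be illustrated with a minimal sketch. This is not TransDecoder's implementation; the `Orf` record, its field names and the example IDs are hypothetical, and only the two thresholds (e-value 1e-5, minimum 100 amino acids) come from the description.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

E_VALUE_CUTOFF = 1e-5   # BLASTp e-value cut-off from the description
MIN_ORF_LENGTH = 100    # minimum ORF length in amino acids


@dataclass
class Orf:
    """Hypothetical record for one candidate ORF."""
    contig_id: str
    orf_id: str
    length_aa: int
    best_evalue: Optional[float]  # None when the ORF has no BLASTp hit


def has_homology(orf: Orf) -> bool:
    """True when the best BLASTp hit passes the e-value cut-off."""
    return orf.best_evalue is not None and orf.best_evalue <= E_VALUE_CUTOFF


def select_best_orfs(orfs: Iterable[Orf]) -> Dict[str, Orf]:
    """Keep one ORF per contig: homology-supported ORFs win, ties go to the longer ORF."""
    best: Dict[str, Orf] = {}
    for orf in orfs:
        if orf.length_aa < MIN_ORF_LENGTH:
            continue  # too short to be considered at all
        current = best.get(orf.contig_id)
        key = (has_homology(orf), orf.length_aa)
        if current is None or key > (has_homology(current), current.length_aa):
            best[orf.contig_id] = orf
    return best


candidates = [
    Orf("contig_1", "contig_1.p1", 300, None),    # long, no homology
    Orf("contig_1", "contig_1.p2", 150, 1e-20),   # shorter, strong hit
    Orf("contig_2", "contig_2.p1", 90, 1e-30),    # below the length minimum
    Orf("contig_2", "contig_2.p2", 120, None),
    Orf("contig_2", "contig_2.p3", 200, 1e-3),    # hit fails the e-value cut-off
]
best = select_best_orfs(candidates)
```

In this toy input the shorter ORF on contig_1 is kept because it has a qualifying hit, while contig_2 falls back to the longest ORF because none of its candidates has one.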

RNA-seq read pre-processing and mapping: Trimming: Fastp (v0.21.1) with a sliding window of 4 bp and a mean Phred score threshold of 30; reads < 30 bp were removed. Decontamination: Kraken2 (v2.0.9) with a confidence of 0.3 using the NCBI bacterial and archaeal reference libraries (downloaded 08/2020). Trimmed and decontaminated RNA-seq reads were mapped against the transcriptome with Salmon (v1.3.0), with correction for sequence-specific biases and fragment-level GC biases, the quantification step skipped, and the flags ‘--validateMappings’ and ‘--hardFilter’. Gene-level counts: Corset (v1.09) was run on the Salmon equivalence class files with the four groups defined (eyes current-day CO2, eyes elevated CO2, CNS current-day CO2 and CNS elevated CO2), the log likelihood ratio test switched off, and links between contigs filtered out when supported by < 10 reads.
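To make the trimming parameters concrete, the following sketch applies a 4 bp sliding window with a mean Phred threshold of 30 and a 30 bp minimum length to a single read. The description does not say which of Fastp's window-trimming modes was used, so the cut-from-the-first-failing-window behaviour here is an assumption, as is the Phred+33 quality encoding. It is not Fastp's code.

```python
from typing import List, Optional

WINDOW_SIZE = 4     # sliding window width in bp
MEAN_QUALITY = 30   # minimum mean Phred score per window
MIN_LENGTH = 30     # reads shorter than this are dropped


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def sliding_window_trim(sequence: str, quality_string: str) -> Optional[str]:
    """Cut the read at the first window whose mean quality is below the threshold.

    Returns the trimmed sequence, or None when it ends up shorter than MIN_LENGTH.
    """
    scores = phred_scores(quality_string)
    cut = len(sequence)
    for start in range(0, len(scores) - WINDOW_SIZE + 1):
        window = scores[start:start + WINDOW_SIZE]
        if sum(window) / WINDOW_SIZE < MEAN_QUALITY:
            cut = start  # drop this window and everything to its right
            break
    trimmed = sequence[:cut]
    return trimmed if len(trimmed) >= MIN_LENGTH else None


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
good_read = sliding_window_trim("A" * 40, "I" * 40)
tail_trimmed = sliding_window_trim("A" * 50, "I" * 35 + "#" * 15)
too_short = sliding_window_trim("A" * 40, "I" * 20 + "#" * 20)
```

A window containing a single low-quality base can still pass (three bases at 40 and one at 2 average 30.5), which is why the cut in the second example lands a few bases before the start of the low-quality tail rather than exactly at it.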

Differential expression analysis: In R (v4.0.4), using RStudio (v 1.4.1106), DESeq2 (v1.30.1) with the Wald test was used to compare gene expression between current-day and elevated CO2 conditions, for the CNS and eyes separately.
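As a reminder of what the Wald test computes for each gene, the sketch below divides a log2 fold-change estimate by its standard error and compares the result with a standard normal distribution. It only illustrates the final test step; the model fitting that produces the estimate and its standard error is done by DESeq2 and is not reproduced here. The function name and the input numbers are hypothetical.

```python
import math
from typing import Tuple


def wald_test(log2_fold_change: float, standard_error: float) -> Tuple[float, float]:
    """Return the Wald statistic and its two-sided p-value under a standard normal."""
    z = log2_fold_change / standard_error
    # Two-sided tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical gene: log2 fold change of 1.0 with a standard error of 0.5
z_stat, p_val = wald_test(1.0, 0.5)
```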

Gene set enrichment analysis (GSEA): In R (v4.0.4), using RStudio (v 1.4.1106), unweighted GSEA was run in clusterProfiler (v3.18.1) using the DESeq2 log2 fold-change values of all genes and the annotated GO terms as the ‘gene sets’, for the CNS and eyes separately. The minimum and maximum gene set sizes were 15 and 500, respectively. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg method, with a significance threshold of padj < 0.05. The GSEA results were imported into Cytoscape (v3.8.2), where EnrichmentMap (v3.3.1) was used to create a network to visualise them.
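The three ingredients named above (an unweighted enrichment score, the 15 to 500 gene set size filter, and Benjamini-Hochberg adjustment) can be sketched as follows. This is a simplified illustration rather than clusterProfiler's code: it omits the permutation step that produces the p-values, it filters on raw set size rather than on the genes actually present in the ranked list, and all function names and example IDs are hypothetical.

```python
from typing import Dict, List, Set


def unweighted_enrichment_score(ranked_genes: List[str], gene_set: Set[str]) -> float:
    """Running-sum enrichment score with every hit weighted equally.

    ranked_genes must be sorted by decreasing log2 fold change.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


def filter_gene_sets(gene_sets: Dict[str, Set[str]],
                     min_size: int = 15, max_size: int = 500) -> Dict[str, Set[str]]:
    """Keep gene sets whose size lies within [min_size, max_size]."""
    return {term: genes for term, genes in gene_sets.items()
            if min_size <= len(genes) <= max_size}


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


ranked = ["g1", "g2", "g3", "g4"]               # hypothetical ranked gene IDs
es_top = unweighted_enrichment_score(ranked, {"g1", "g2"})
es_bottom = unweighted_enrichment_score(ranked, {"g3", "g4"})
padj = benjamini_hochberg([0.01, 0.04, 0.03])
significant = [p < 0.05 for p in padj]
```

A set concentrated at the top of the ranking gets a positive score and one concentrated at the bottom gets a negative score, which is the "concordant difference" the analysis looks for.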

Water Sampling: To evaluate the magnitude of natural diel CO2 fluctuations and the ecological relevance of our experimental CO2 treatment levels, water samples were taken from the same location where two-toned pygmy squid (Idiosepius pygmaeus) were collected. Samples were collected with 250 mL borosilicate glass bottles at an approximate depth of 25 cm. Measured at the sampling location: water temperature (Comark C26, Norfolk, UK) and pHNBS (Seven2Go™ pro Conductivity Meter with an InLab Expert Go-ISM pH electrode, Mettler Toledo). Measured in the lab: total alkalinity by Gran titration (888 Titrando, Metrohm AG, Switzerland) and salinity with a conductivity sensor (HQ40d, Hach, Loveland, CO, USA). CO2 values were calculated in CO2SYS v.2.1 (https://cdiac.ess-dive.lbl.gov/ftp/co2sys/CO2SYS_calc_XLS_v2.1/) using the constants K1 and K2 from Mehrbach et al. (1973), as refit by Dickson and Millero (1987), and KHSO4 from Dickson et al. (2007).

This data record contains:

1) All the scripts used for bioinformatic analyses.

2) The annotated transcriptome assembly of I. pygmaeus CNS and eye tissues (.box (OmicsBox file) and .csv file)

3) All R code used for the statistical analyses. (R_code_transcriptomic_response_squid_nervous_system_elevated_CO2.html)

4) Data files to accompany the statistical analyses: Corset gene count and cluster data (corset-clusters.txt, corset-counts.txt), gene sets (blast2go_gene_sets.txt), species distribution of top BLAST hits (top_hit_species_distribution_annotated_transcriptome_orfs.txt) and metadata (metadata.csv)

5) Raw water sampling data (Water_sampling_raw_data.xlsx)
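For readers who want to work with the cluster file listed above, the sketch below groups transcripts by their Corset gene. It assumes corset-clusters.txt is a headerless, tab-delimited file with the transcript ID in the first column and the 'Cluster-*' gene ID in the second; check the file itself before relying on this. The transcript IDs in the example are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def read_cluster_map(handle: TextIO) -> Dict[str, List[str]]:
    """Group transcript IDs by Corset gene ID from a two-column, tab-delimited file."""
    genes: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        transcript_id, cluster_id = row[0], row[1]
        genes[cluster_id].append(transcript_id)
    return dict(genes)


# In-memory stand-in for the file, with hypothetical IDs; replace with
# open("corset-clusters.txt") to read the actual attachment.
example = io.StringIO(
    "transcript_1\tCluster-0.0\n"
    "transcript_2\tCluster-0.0\n"
    "transcript_3\tCluster-1.0\n"
)
genes = read_cluster_map(example)
```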

The raw RNA-seq and ISO-seq data used for these analyses, as well as the fasta file of the transcriptome assembly, can be found at NCBI BioProject PRJNA798187.

Software/equipment used to create/collect the data: de novo Transcriptome Assembly Software: ccs (v4.2.0), lima (v1.11.0), isoseq3 refine (v3.3.0), isoseq3 cluster (v3.3.0), CD-HIT-EST (v4.6), TransDecoder (v5.5.0), NCBI nr database subset for Mollusca (nr_mollusca, downloaded 01/2021), BLASTp from BLAST+ (v2.10.0+).

Transcriptome Annotation Software: NCBI nr database (downloaded 01/2021), BLASTx from BLAST+ (v2.10.0+), OmicsBox (v1.4.12), BLAST2GO (Goa version 2020.10), InterProScan (v5.50-84.0).

RNA-seq Read Pre-processing Software: FastQC (v0.11.9), MultiQC (v1.9), Fastp (v0.21.1), Kraken2 (v2.0.9), NCBI bacterial and archaeal reference libraries (downloaded 08/2020).

Read Mapping Software: Salmon (v1.3.0), Corset (v1.09)

Statistical Analysis Software: R (v4.0.4), RStudio (v 1.4.1106), DESeq2 (v1.30.1), clusterProfiler (v3.18.1), Cytoscape (v3.8.2), EnrichmentMap (v3.3.1)

Water Sampling Equipment: 250 mL borosilicate glass bottles; Temperature - Comark C26, Norfolk, UK; pHNBS - Seven2Go™ pro Conductivity Meter with an InLab Expert Go-ISM pH electrode, Mettler Toledo; Total alkalinity by Gran titration - 888 Titrando, Metrohm AG, Switzerland; Salinity with a conductivity sensor - HQ40d, Hach, Loveland, CO, USA

Water Sampling Software: pCO2 values - CO2SYS v.2.1 (https://cdiac.ess-dive.lbl.gov/ftp/co2sys/CO2SYS_calc_XLS_v2.1/) using the constants K1 and K2 from Mehrbach et al. (1973), as refit by Dickson and Millero (1987), and KHSO4 from Dickson et al. (2007).

    Data Record Details
    Data record related to this publication Two-toned pygmy squid (Idiosepius pygmaeus) transcriptome assembly, and transcriptomic response of the nervous system to elevated CO2
    Data Publication title Two-toned pygmy squid (Idiosepius pygmaeus) transcriptome assembly, and transcriptomic response of the nervous system to elevated CO2
  • Other Descriptors
    • Descriptor
      Annotated transcriptome assembly of the two-toned pygmy squid (Idiosepius pygmaeus) central nervous system and eye tissues (OmicsBox and csv files). Data for differential expression and gene set enrichment analyses: raw gene count data, all scripts used for bioinformatic analyses, all R code used for the statistical analyses, and data files to accompany the statistical analyses. Raw water sampling data.
    • Descriptor type Brief
  • Data type dataset
  • Keywords
    • ocean acidification
    • carbon dioxide
    • transcriptomics
    • de novo transcriptome assembly
    • gene expression
    • central nervous system
    • eyes
    • squid
    • RNA-sequencing
    • ISO-sequencing
    • ARC Centre of Excellence for Coral Reef Studies
  • Funding source
    • ARC Centre of Excellence for Coral Reef Studies
    • Australian Government Research Training Program Scholarship
    • Okinawa Institute of Science and Technology & Graduate University
  • Research grant(s)/Scheme name(s)
    • -
  • Research themes
    Tropical Ecosystems, Conservation and Climate Change
    FoR Codes (*)
    SEO Codes
    Specify spatial or temporal setting of the data
    Temporal (time) coverage
  • Start Date 2019/08/20
  • End Date 2019/12/19
  • Time Period
    Spatial (location) coverage
  • Locations
    • 19°15'11"S 146°49'24"E
    Data Locations

    Type Location Notes
    Attachment metadata.csv Data file to accompany the statistical analysis. Each squid ID with corresponding CO2 treatment.
    Attachment Water_sampling_raw_data.xlsx Water sampling raw data, used to evaluate the magnitude of natural diel CO2 fluctuations and the ecological relevance of our experimental CO2 treatment levels. Water samples were taken from the same location where two-toned pygmy squid (Idiosepius pygmaeus) were collected.
    Attachment blast2go_gene_sets.txt Data file to accompany the statistical analysis. Each GO term with the corresponding transcripts. Exported from the annotated transcriptome assembly in OmicsBox with File > Export > Export annotations > Export Sequences per GO (Gene Sets).
    Attachment corset-clusters.txt Data file to accompany the statistical analysis. Each transcript and the corresponding gene ('Cluster-*') produced by Corset.
    Attachment corset-counts.txt Data file to accompany the statistical analysis. Gene count data produced by Corset.
    Physical Location NCBI BioProject PRJNA798187 All raw RNA-sequencing and ISO-sequencing data (SRA). The fasta file of the transcriptome assembly (TSA). All biosample information.
    Attachment Scripts_for_bioinformatic_analyses.txt Text file showing the directory structure and each of the file names within 'Scripts_for_bioinformatic_analyses.zip'.
    Attachment Scripts_for_bioinformatic_analyses.zip Zipped directory containing all the scripts used for the bioinformatic analyses.
    Attachment top_hit_species_distribution_annotated_transcriptome_orfs.txt Data file to accompany the statistical analysis. Each species with the corresponding number of top BLAST hits in the annotated transcriptome assembly. Exported from OmicsBox by first running BLAST statistics, then in the tab ‘Chart: Top-hit species distribution’ clicking ‘Data as text’ in the toolbar.
    Attachment R_code_transcriptomic_response_squid_nervous_system_elevated_CO2.html HTML with all R code used for statistical analyses.
    URL http://data.qld.edu.au/public/Q5842/2024-JodiThomas-345b77f0831e11ecbad66f177921119e-TwoTonedPygmy/ OmicsBox file of the annotated transcriptome assembly of Idiosepius pygmaeus CNS and eye tissues.
    URL http://data.qld.edu.au/public/Q5842/2024-JodiThomas-345b77f0831e11ecbad66f177921119e-TwoTonedPygmy/ .csv file of the annotated transcriptome assembly of Idiosepius pygmaeus CNS and eye tissues.
    The Data Manager is: Philip Munday
    College or Centre ARC Centre of Excellence for Coral Reef Studies
    Access conditions Open: free access under license
  • Alternative access conditions
  • Data record size 5 .txt files, 2 .csv files, 1 .box file, 1 .xlsx file, 1 .html file, 1 .zip file, 1 NCBI BioProject number
  • Related publications
      Name Thomas, J. T., Spady, B. L., Munday, P. L. and Watson, S.-A. (2021). The role of ligand-gated chloride channels in behavioural alterations at elevated CO2 in a cephalopod. Journal of Experimental Biology 224, jeb242335.
    • URL https://doi.org/10.1242/jeb.242335
    • Notes
  • Related data
      Name NCBI BioProject PRJNA798187
    • URL
    • Notes The raw RNA-seq and ISO-seq data used for these analyses, as well as the fasta file of the transcriptome assembly.
    Citation Thomas, Jodi; Huerlimann, Roger; Schunter, Celia; Watson, Sue-Ann; Munday, Philip; Ravasi, Timothy (2024): Two-toned pygmy squid (Idiosepius pygmaeus) transcriptome assembly, and transcriptomic response of the nervous system to elevated CO2. James Cook University. https://doi.org/10.25903/ha66-mm11