Research Data JCU - Two-toned pygmy squid (Idiosepius pygmaeus) transcriptome assembly, and transcriptomic response of the nervous system to elevated CO2

We evaluated the transcriptomic response of the central nervous system (CNS) and eyes of male two-toned pygmy squid (Idiosepius pygmaeus) exposed to elevated (~1,000 µatm) CO2 for seven days, compared with current-day (~450 µatm) controls. As a reference for gene expression quantification, we assembled a high-quality, annotated de novo transcriptome of I. pygmaeus CNS and eye tissues using long-read PacBio Iso-Seq data. Differential expression analysis was used to identify genes that were differentially expressed between current-day and elevated CO2 conditions in the CNS and eyes. Gene set enrichment analysis was used to determine whether sets of genes sharing a gene ontology (GO) term/functional group showed significant, concordant differences between current-day and elevated CO2 conditions in the CNS and eyes.

De novo transcriptome assembly: Iso-Seq data were processed using the PacBio IsoSeq3 pipeline: ccs (v4.2.0) with the minimum number of full passes set to three and the minimum predicted read accuracy set to 0.9; lima (v1.11.0) with ‘--peek-guess’; isoseq3 refine (v3.3.0); and isoseq3 cluster (v3.3.0) with ‘--use-qvs’. Redundancy removal: CD-HIT-EST (v4.6) with a sequence identity threshold of 99%. ORF prediction: TransDecoder (v5.5.0) was used to identify open reading frames (ORFs). The single best ORF per contig was chosen first on the basis of BLAST homology to known proteins in the NCBI nr database subset for Mollusca (nr_mollusca, downloaded 01/2021), using BLASTp from BLAST+ (v2.10.0+) with ‘-max_target_seqs 1’ and an e-value cut-off of 1e-5, and then on the basis of ORF length (minimum 100 amino acids). The entire transcript was retained for each identified ORF. Annotation: the transcriptome was searched against the entire NCBI nr database (downloaded 01/2021) using BLASTx from BLAST+ (v2.10.0+) with an e-value cut-off of 1e-5, ‘-outfmt 14’, and ‘-num_alignments’ and ‘-max_hsps’ both set to 20. Functional annotation was performed in OmicsBox (v1.4.12) using BLAST2GO mapping (GOA version 2020.10, default settings), BLAST2GO annotation (default settings) and InterProScan (v5.50-84.0, default settings). The gene sets (blast2go_gene_sets.txt) were exported from the annotated transcriptome in OmicsBox via File > Export > Export annotations > Export Sequences per GO (Gene Sets); this file was used to build the network visualising the gene set enrichment analysis.
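The ORF selection rule described above (homology support first, then length) can be sketched in Python. This is a minimal illustration of the rule as stated, not TransDecoder's own code: the `Orf` record, the function name, and the tie-breaking order among several homology-supported ORFs (lowest e-value, then longest) are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Orf:
    """Hypothetical record for one candidate ORF on a contig."""
    contig: str
    orf_id: str
    length_aa: int                 # ORF length in amino acids
    blast_evalue: Optional[float]  # best BLASTp e-value vs nr_mollusca; None if no hit


def select_single_best_orfs(
    orfs: Iterable[Orf],
    evalue_cutoff: float = 1e-5,
    min_length_aa: int = 100,
) -> Dict[str, Orf]:
    """Choose at most one ORF per contig: homology-supported first, then longest."""
    by_contig: Dict[str, List[Orf]] = {}
    for orf in orfs:
        by_contig.setdefault(orf.contig, []).append(orf)

    best: Dict[str, Orf] = {}
    for contig, candidates in by_contig.items():
        supported = [
            o for o in candidates
            if o.blast_evalue is not None and o.blast_evalue <= evalue_cutoff
        ]
        if supported:
            # Assumed tie-break: lowest e-value, then longest ORF.
            best[contig] = min(supported, key=lambda o: (o.blast_evalue, -o.length_aa))
            continue
        # No homology support: fall back to the longest ORF of at least min_length_aa.
        long_enough = [o for o in candidates if o.length_aa >= min_length_aa]
        if long_enough:
            best[contig] = max(long_enough, key=lambda o: o.length_aa)
        # Otherwise the contig yields no ORF and is dropped.
    return best
```

In this sketch a short ORF with a significant BLASTp hit beats a longer ORF without one, which mirrors the "homology first, then length" ordering in the text.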

RNA-seq read pre-processing and mapping: Trimming: fastp (v0.21.1) with a 4 bp sliding window and a mean Phred quality threshold of 30; reads shorter than 30 bp after trimming were removed. Decontamination: Kraken2 (v2.0.9) with a confidence threshold of 0.3, using the NCBI bacterial and archaeal reference libraries (downloaded 08/2020). Mapping: trimmed and decontaminated RNA-seq reads were mapped against the transcriptome with Salmon (v1.3.0), with correction for sequence-specific and fragment-level GC biases, the quantification step skipped, and the flags ‘--validateMappings’ and ‘--hardFilter’. Gene-level counts: Corset (v1.09) was run on the Salmon equivalence class files with four groups defined (eyes current-day CO2, eyes elevated CO2, CNS current-day CO2 and CNS elevated CO2), the log-likelihood ratio test switched off, and links between contigs filtered out if supported by fewer than 10 reads.
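A sliding-window quality trim with the parameters above (4 bp window, mean Phred 30, 30 bp minimum length) can be sketched as follows. The sketch assumes fastp's `--cut_right`-style behaviour (scan 5' to 3' and cut at the first failing window) and Phred+33 quality encoding; which fastp cutting mode was actually used is an assumption here, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def sliding_window_trim(
    seq: str,
    qual: str,
    window: int = 4,
    min_mean_q: float = 30.0,
    min_len: int = 30,
) -> Optional[Tuple[str, str]]:
    """Cut the read at the first window whose mean quality falls below min_mean_q.

    Returns the trimmed (seq, qual), or None if fewer than min_len bases remain.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    cut = len(seq)
    for start in range(len(seq) - window + 1):
        if mean_phred(qual[start:start + window]) < min_mean_q:
            cut = start  # drop the failing window and everything 3' of it
            break
    seq, qual = seq[:cut], qual[:cut]
    if len(seq) < min_len:
        return None  # too short after trimming: the read is discarded
    return seq, qual
```

For example, a 40 bp read whose last five bases have quality `#` (Phred 2) is cut to 33 bp and kept, while a read whose quality collapses after 20 bp falls below 30 bp and is discarded.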

Differential expression analysis: In R (v4.0.4), using RStudio (v1.4.1106), DESeq2 (v1.30.1) with the Wald test was used to compare gene expression between current-day and elevated CO2 conditions, separately for the CNS and eyes.
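For intuition, DESeq2's Wald test divides each gene's estimated log2 fold change (from its negative binomial GLM fit) by the standard error of that estimate and compares the result with a standard normal distribution. A minimal Python sketch of that final step, assuming the log2 fold change and standard error are already available (the function name is hypothetical; the model fitting itself is not shown):

```python
import math
from typing import Tuple


def wald_test(log2_fold_change: float, standard_error: float) -> Tuple[float, float]:
    """Two-sided Wald test of H0: log2 fold change == 0.

    Returns the Wald statistic z and its two-sided p-value under N(0, 1).
    """
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    z = log2_fold_change / standard_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

For example, a log2 fold change of 0.98 with a standard error of 0.5 gives z = 1.96 and a two-sided p-value of about 0.05, before any multiple-testing adjustment.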

Gene set enrichment analysis (GSEA): In R (v4.0.4), using RStudio (v1.4.1106), unweighted GSEA was run in clusterProfiler (v3.18.1) separately for the CNS and eyes, with all genes ranked by their DESeq2 log2 fold-change values and the annotated GO terms used as the gene sets. Minimum and maximum gene set sizes were 15 and 500, respectively. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg method, with a significance threshold of padj < 0.05. The GSEA results were imported into Cytoscape (v3.8.2), where EnrichmentMap (v3.3.1) was used to create a network visualising them.
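The core quantities of unweighted GSEA can be sketched in Python. With exponent 0, walking down the ranked gene list, each gene in the set adds 1/N_hit to a running sum and each other gene subtracts 1/N_miss; the enrichment score is the largest signed deviation from zero. The sketch also shows the 15 to 500 gene set size filter and Benjamini-Hochberg adjustment. It is illustrative only: clusterProfiler additionally normalises scores and estimates p-values by permutation, which is not shown, and all function names here are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def unweighted_enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Running-sum enrichment score with exponent 0 (genes ranked by log2FC, descending)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list but not cover all of it")
    running, es = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero, with its sign
    return es


def filter_by_size(
    gene_sets: Dict[str, Set[str]], universe: Set[str], min_size: int = 15, max_size: int = 500
) -> Dict[str, Set[str]]:
    """Keep GO terms whose overlap with the ranked genes has min_size..max_size members."""
    return {
        term: genes & universe
        for term, genes in gene_sets.items()
        if min_size <= len(genes & universe) <= max_size
    }


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A positive score indicates a gene set concentrated among genes with higher expression at elevated CO2 (given that log2 fold changes are oriented elevated versus current-day, which is an assumption here), and a negative score the reverse.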

Water Sampling: To evaluate the magnitude of natural diel CO2 fluctuations and the ecological relevance of our experimental CO2 treatment levels, water samples were taken from the same location where two-toned pygmy squid (Idiosepius pygmaeus) were collected. Samples were collected in 250 mL borosilicate glass bottles at a depth of approximately 25 cm. At the sampling location, water temperature (Comark C26, Norfolk, UK) and pHNBS (Seven2Go™ pro Conductivity Meter with an InLab Expert Go-ISM pH electrode, Mettler Toledo) were measured. In the laboratory, total alkalinity was measured by Gran titration (888 Titrando, Metrohm AG, Switzerland) and salinity with a conductivity sensor (HQ40d, Hach, Loveland, CO, USA). pCO2 values were calculated in CO2SYS v.2.1 (https://cdiac.ess-dive.lbl.gov/ftp/co2sys/CO2SYS_calc_XLS_v2.1/) using the K1 and K2 constants of Mehrbach et al. (1973) as refit by Dickson and Millero (1987), and KHSO4 from Dickson et al. (2007).
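Once pCO2 has been calculated for each water sample, the magnitude of diel fluctuation can be summarised per day as a minimum, maximum and range, for comparison with the ~450 and ~1,000 µatm treatment levels. A minimal sketch, assuming (hypothetically) that the samples are available as (timestamp, pCO2 in µatm) pairs and that each calendar date corresponds to one diel cycle; the column layout of Water_sampling_raw_data.xlsx is not described here, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def diel_pco2_summary(
    samples: Iterable[Tuple[datetime, float]],
) -> Dict[str, Tuple[float, float, float]]:
    """Return {date: (min, max, range)} of pCO2 (µatm), one entry per calendar day."""
    by_day: Dict[str, List[float]] = defaultdict(list)
    for when, pco2 in samples:
        by_day[when.date().isoformat()].append(pco2)
    return {
        day: (min(values), max(values), max(values) - min(values))
        for day, values in sorted(by_day.items())
    }


def fraction_at_or_above(values: Iterable[float], threshold: float) -> float:
    """Share of samples whose pCO2 reaches a given level, e.g. ~1,000 µatm."""
    values = list(values)
    return sum(v >= threshold for v in values) / len(values)
```

Grouping by calendar date is a simplification; if sampling spanned midnight, grouping by sampling trip would be more appropriate.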

This data record contains:

1) All the scripts used for bioinformatic analyses.

2) The annotated transcriptome assembly of I. pygmaeus CNS and eye tissues (OmicsBox .box file and .csv file)

3) All R code used for the statistical analyses (R_code_transcriptomic_response_squid_nervous_system_elevated_CO2.html)

4) Data files to accompany the statistical analyses: Corset gene count and cluster data (corset-clusters.txt, corset-counts.txt), gene sets (blast2go_gene_sets.txt), species distribution of top BLAST hits (top_hit_species_distribution_annotated_transcriptome_orfs.txt) and metadata (metadata.csv)

5) Raw water sampling data (Water_sampling_raw_data.xlsx)

The raw RNA-seq and ISO-seq data used for these analyses, as well as the FASTA file of the transcriptome assembly, can be found at NCBI BioProject PRJNA798187.

Software/equipment used to create/collect the data: de novo Transcriptome Assembly Software: ccs (v4.2.0), lima (v1.11.0), isoseq3 refine (v3.3.0), isoseq3 cluster (v3.3.0), CD-HIT-EST (v4.6), TransDecoder (v5.5.0), NCBI nr database subset for Mollusca (nr_mollusca, downloaded 01/2021), BLASTp from BLAST+ (v2.10.0+).

Transcriptome Annotation Software: NCBI nr database (downloaded 01/2021), BLASTx from BLAST+ (v2.10.0+), OmicsBox (v1.4.12), BLAST2GO (Goa version 2020.10), InterProScan (v5.50-84.0).

RNA-seq Read Pre-processing Software: FastQC (v0.11.9), MultiQC (v1.9), Fastp (v0.21.1), Kraken2 (v2.0.9), NCBI bacterial and archaeal reference libraries (downloaded 08/2020).

Read Mapping Software: Salmon (v1.3.0), Corset (v1.09)

Statistical Analysis Software: R (v4.0.4), RStudio (v1.4.1106), DESeq2 (v1.30.1), clusterProfiler (v3.18.1), Cytoscape (v3.8.2), EnrichmentMap (v3.3.1)

Water Sampling Equipment: 250 mL borosilicate glass bottles; temperature: Comark C26 (Norfolk, UK); pHNBS: Seven2Go™ pro Conductivity Meter with an InLab Expert Go-ISM pH electrode (Mettler Toledo); total alkalinity by Gran titration: 888 Titrando (Metrohm AG, Switzerland); salinity with a conductivity sensor: HQ40d (Hach, Loveland, CO, USA).

Water Sampling Software: pCO2 values: CO2SYS v.2.1 (https://cdiac.ess-dive.lbl.gov/ftp/co2sys/CO2SYS_calc_XLS_v2.1/) using the K1 and K2 constants of Mehrbach et al. (1973) as refit by Dickson and Millero (1987), and KHSO4 from Dickson et al. (2007).

- Start

Data Record Details

Data record related to this publication Two-toned pygmy squid (Idiosepius pygmaeus) transcriptome assembly, and transcriptomic response of the nervous system to elevated CO2

Data Publication title Two-toned pygmy squid (Idiosepius pygmaeus) transcriptome assembly, and transcriptomic response of the nervous system to elevated CO2

Other Descriptors

Descriptor

Annotated transcriptome assembly of the two-toned pygmy squid (Idiosepius pygmaeus) central nervous system and eye tissues (OmicsBox and csv files). Data for differential expression and gene set enrichment analyses: raw gene count data, all scripts used for bioinformatic analyses, all R code used for the statistical analyses, and data files to accompany the statistical analyses. Raw water sampling data.

Descriptor type Brief

Data type dataset

Keywords

ocean acidification
carbon dioxide
transcriptomics
de novo transcriptome assembly
gene expression
central nervous system
eyes
squid
RNA-sequencing
ISO-sequencing
ARC Centre of Excellence for Coral Reef Studies

Funding source

ARC Centre of Excellence for Coral Reef Studies

Australian Government Research Training Program Scholarship

Okinawa Institute of Science and Technology Graduate University

Research themes

Tropical Ecosystems, Conservation and Climate Change

- Specify coverage

Temporal (time) coverage

Start Date 2019/08/20

End Date 2019/12/19

Spatial (location) coverage

Locations

19°15'11"S 146°49'24"E
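For use in GIS or mapping tools, the degrees-minutes-seconds coordinates above can be converted to signed decimal degrees (south and west negative). A small sketch; the helper names are hypothetical:

```python
import re
from typing import Tuple

# Matches e.g. 19°15'11"S : degrees, minutes, (decimal) seconds, hemisphere.
_DMS = re.compile(r"""(\d+)°(\d+)'(\d+(?:\.\d+)?)"([NSEW])""")


def dms_to_decimal(dms: str) -> float:
    """Convert a degrees-minutes-seconds string to signed decimal degrees (S and W negative)."""
    match = _DMS.fullmatch(dms.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in "SW" else value


def parse_location(text: str) -> Tuple[float, float]:
    """Parse 'LAT LON' written in degrees-minutes-seconds into (lat, lon)."""
    lat_text, lon_text = text.split()
    return dms_to_decimal(lat_text), dms_to_decimal(lon_text)
```

Applied to the location above, this gives approximately -19.2531, 146.8233.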

- Data

Data Locations

Type	Location	Notes
Attachment	metadata.csv	Data file to accompany the statistical analysis. Each squid ID with corresponding CO2 treatment.
Attachment	Water_sampling_raw_data.xlsx	Water sampling raw data. Water samples were taken from the same location where two-toned pygmy squid (Idiosepius pygmaeus) were collected, to evaluate the magnitude of natural diel CO2 fluctuations and the ecological relevance of our experimental CO2 treatment levels.
Attachment	blast2go_gene_sets.txt	Data file to accompany the statistical analysis. Each GO term with the corresponding transcripts. Exported from the annotated transcriptome assembly in OmicsBox with File > Export > Export annotations > Export Sequences per GO (Gene Sets).
Attachment	corset-clusters.txt	Data file to accompany the statistical analysis. Each transcript and the corresponding gene ('Cluster-*') produced by Corset.
Attachment	corset-counts.txt	Data file to accompany the statistical analysis. Gene count data produced by Corset.
Physical Location	NCBI BioProject PRJNA798187	All raw RNA-sequencing and ISO-sequencing data (SRA). The fasta file of the transcriptome assembly (TSA). All biosample information.
Attachment	Scripts_for_bioinformatic_analyses.txt	Text file showing the directory structure and each of the file names within 'Scripts_for_bioinformatic_analyses.zip'.
Attachment	Scripts_for_bioinformatic_analyses.zip	Zipped directory containing all the scripts used for the bioinformatic analyses.
Attachment	top_hit_species_distribution_annotated_transcriptome_orfs.txt	Data file to accompany the statistical analysis. Each species with the corresponding number of top BLAST hits in the annotated transcriptome assembly. Exported from OmicsBox by first running BLAST statistics, then in the tab ‘Chart: Top-hit species distribution’ clicking ‘Data as text’ in the toolbar.
Attachment	R_code_transcriptomic_response_squid_nervous_system_elevated_CO2.html	HTML with all R code used for statistical analyses.
URL	http://data.qld.edu.au/public/Q5842/2024-JodiThomas-345b77f0831e11ecbad66f177921119e-TwoTonedPygmy/	OmicsBox file of the annotated transcriptome assembly of Idiosepius pygmaeus CNS and eye tissues.
URL	http://data.qld.edu.au/public/Q5842/2024-JodiThomas-345b77f0831e11ecbad66f177921119e-TwoTonedPygmy/	.csv file of the annotated transcriptome assembly of Idiosepius pygmaeus CNS and eye tissues.
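corset-clusters.txt can be loaded to map each transcript to its Corset gene cluster, for example to count how many transcripts collapse into each gene. A minimal sketch, assuming the file holds two tab-separated columns (transcript ID, cluster ID) with no header row; the function names and example IDs are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def read_corset_clusters(lines: Iterable[str]) -> Dict[str, str]:
    """Map transcript ID -> Corset cluster ID (two tab-separated columns, no header)."""
    mapping: Dict[str, str] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected two columns, got {row!r}")
        transcript_id, cluster_id = row[0], row[1]
        mapping[transcript_id] = cluster_id
    return mapping


def transcripts_per_cluster(mapping: Dict[str, str]) -> Counter:
    """Count how many transcripts were grouped into each Corset cluster."""
    return Counter(mapping.values())


# Usage with the archived file (path assumed to be the working directory):
# with open("corset-clusters.txt", newline="") as handle:
#     clusters = read_corset_clusters(handle)
#     sizes = transcripts_per_cluster(clusters)
```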

The Data Manager is: Philip Munday

College or Centre ARC Centre of Excellence for Coral Reef Studies

Access conditions Open: free access under license

Data record size 5 .txt files, 2 .csv files, 1 .box file, 1 .xlsx file, 1 .html file, 1 .zip file, 1 NCBI BioProject number

- Related resources

Related publications

Name Thomas, J. T., Spady, B. L., Munday, P. L. and Watson, S.-A. (2021). The role of ligand-gated chloride channels in behavioural alterations at elevated CO2 in a cephalopod. Journal of Experimental Biology 224, jeb242335.

URL https://doi.org/10.1242/jeb.242335

Related data

Name NCBI BioProject PRJNA798187

Notes The raw RNA-seq and ISO-seq data used for these analyses, as well as the fasta file of the transcriptome assembly.

- License

The data will be licensed under CC BY 4.0: Attribution 4.0 International

Data owners

Jodi Thomas

- Citation

Citation Thomas, Jodi; Huerlimann, Roger; Schunter, Celia; Watson, Sue-Ann; Munday, Philip; Ravasi, Timothy (2024): Two-toned pygmy squid (Idiosepius pygmaeus) transcriptome assembly, and transcriptomic response of the nervous system to elevated CO2. James Cook University. https://doi.org/10.25903/ha66-mm11