Fish fraud has become increasingly common within the seafood industry, where certain species of fish are intentionally swapped for less valuable species for monetary gain. The need for an effective method of species identification has become apparent as the seafood market continues to expand to other regions of the world and the diversity of fish species on the market continues to grow. In the past, scientists used different concepts, such as the morphological species concept and the biological species concept, in efforts to define a species. However, these concepts came with limitations, such as the inability to account for interbreeding between species or for asexual organisms. DNA barcoding provides an effective method of species identification by comparing standardized genetic markers of different species, such as the CO1 gene in animals. The objective of the experiment was to determine whether store-bought Salmo salar was correctly labeled. To differentiate between plant and animal material, the sample was tested for the presence of the cytochrome c oxidase subunit 1 (CO1) gene, which is present in all animals. It was hypothesized that the locally bought S. salar would not be identified as the labeled species due to the high prevalence of fraud within the fish market. To test this, DNA from locally bought S. salar was extracted and amplified through PCR. The DNA samples were then run through gel electrophoresis to check for the presence of the CO1 gene. The remaining DNA samples were sequenced and run through BLAST for identification. Oncorhynchus nerka displayed the best sequence alignment, indicating that the specimen was not S. salar and that fraud is still prevalent within the seafood industry.

Keywords: Salmo salar, DNA barcoding, CO1 gene, fish fraud, species identification

Introduction

Fish fraud has become increasingly common in fish markets and sushi restaurants, where certain species are deliberately swapped for other species for monetary gain (Shokralla, 2015).
With increasing globalization and demand for seafood, the biodiversity of fish species on the market increases as well (Shokralla, 2015). S. salar was chosen because it is a higher-priced species and is therefore more likely to be replaced with lower-valued species, such as Chum Salmon or Pink Salmon (Warner, 2010). According to a 2013 national survey, approximately 38% of salmon were substituted with different species in restaurants and 7% in stores (Warner, 2015). However, one study reports that 69% of salmon fraud occurs when farm-bred salmon is mislabeled as wild-caught salmon, which differs from the nationally reported average of 7% (Kimberly, 2015). Mislabeling is especially likely during the colder months, when the supply of salmon is smaller but the demand for certain species persists (Miller, 2010). Additionally, consumers were more than three times as likely to be misled in restaurants as in grocery stores (67% vs 20%; Kimberly, 2015). High rates of substitution occur because methods of identification and regulation are insufficient (Hanner, 2011). As a result, cases of fraud have become more prevalent and in certain cases, such as pufferfish, pose health risks (Hanner, 2011). A method for verifying species identity and product composition that does not rely on morphological characterization has therefore become critical. Unfortunately, consumers have no means of distinguishing species other than by morphological differences. To prevent species fraud, DNA barcoding provides an effective method for precise species identification through comparative sequence analysis of standardized genome fragments.

In the past, scientists had trouble identifying new species that arose over time, or even establishing a proper definition of what a species is. Conserving endangered species was difficult because correct identification of the species was critical for preservation (Hartvig, 2015).
Many species concepts arose as a result. The morphological species concept relied upon the physical appearance of the species and on observed structural features or phenotypic similarities (Aldhebiani, 2017). The advantage of this concept was that morphology is easily observed and widely understood (Wheeler, 1999). Its drawbacks were that males and females of the same species can have different structural features and that individuals vary over time (Aldhebiani, 2017). The biological species concept, on the other hand, relied upon reproductive isolation and characterized species by their ability to interbreed and produce viable offspring (Aldhebiani, 2017). The advantage of this species concept was that it did not rely on morphological characteristics (Aldhebiani, 2017). Its drawbacks were that it could not be applied to extinct or asexual organisms and that mating between different species was still possible (Aldhebiani, 2017). Today, scientists utilize the phylogenetic species concept, which categorizes species as monophyletic groups based on derived characters shared through ancestry (Wheeler, 1999). The advantages of the phylogenetic species concept are that it does not focus on the present characteristics of the organism and that it applies to all kinds of organisms, including extinct, sexual, and asexual organisms (Wheeler, 1999). The disadvantage is that it is hard to construct a tree with full certainty about the evolutionary past (Aldhebiani, 2017). The phylogenetic species concept is the basis for a newer method of species identification called DNA barcoding. DNA barcoding compares DNA sequences among existing species by scanning standard genetic markers for polymorphisms that differentiate between species (Hartvig, 2015). It is effective in differentiating between phenotypically similar species and is applicable to all organisms (Dudu, 2016).
For DNA barcoding, DNA is isolated from a sample and standard genetic markers are amplified by polymerase chain reaction (PCR). Polymorphisms are differences in DNA sequence that accumulate over time (Albert, 2011). Most mutations arise during DNA replication and can therefore be inherited. When the frequency of a mutation increases, it can become fixed in a lineage (Albert, 2011). Comparing polymorphisms in standardized sequences can indicate common ancestry among individuals (Stoneking, 2001). Specifically, one region in the mitochondrial genome, the cytochrome c oxidase subunit 1 (CO1) gene, has proven highly effective for species identification in animals. In plants, two highly conserved regions in the chloroplast genome are used instead: the RuBisCo large subunit (rbcL) gene and the maturase K (matK) gene (Pecnikar, 2014). The CO1 gene is not an effective barcode in plants because the plant mitochondrial genome evolves at different rates across species, which rules out universal application of the gene (Kress, 2005). The CO1 gene is often used in DNA barcoding of animals because it is highly conserved within species, has a relatively low mutation rate, and is present in all animal species (Derycke, 2010). Additionally, it codes for a protein that has an essential role in cellular respiration, which is a key process for animals (Derycke, 2010). At the same time, its mutation rate is high enough to distinguish between species, which is important for interspecies identification (Derycke, 2010). DNA barcoding uses genetic markers to identify species much as a supermarket barcode identifies a product, hence the name "DNA barcode" (Chao, 2014). Each species has a distinct DNA barcode, so comparing barcodes between species shows how similar their sequences are (Chao, 2014).
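The barcode comparison described here can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to the same length, and the short fragments shown are hypothetical toy strings, not real CO1 sequences. The function names are illustrative only and are not part of any barcoding software.

```python
def polymorphic_sites(seq_a, seq_b):
    """Return the 0-based positions where two pre-aligned sequences differ."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions that carry the same base."""
    differences = len(polymorphic_sites(seq_a, seq_b))
    return 100.0 * (len(seq_a) - differences) / len(seq_a)


# Hypothetical toy fragments (assumption: NOT real CO1 barcodes).
query = "ACGTTGCAAC"
reference_same_species = "ACGTTGCAAC"
reference_other_species = "ACGATGCTAC"

print(polymorphic_sites(query, reference_other_species))  # [3, 7]
print(percent_identity(query, reference_same_species))    # 100.0
print(percent_identity(query, reference_other_species))   # 80.0
```

In practice the reference with the highest identity over the barcode region is the candidate species, which is the comparison that a database search automates across many references.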
DNA barcoding relies on a search program called the Basic Local Alignment Search Tool (BLAST), which compares a query DNA sequence from a standardized gene against a database and reports alignments that can be used to explore phylogenetic relationships (McGinnis, 2004). It was hypothesized that the locally bought S. salar would not be identified as the labeled species due to the high prevalence of fraud within the fish market. To test this, DNA from locally bought S. salar was extracted and amplified through PCR. The DNA samples were then run through gel electrophoresis to check for the presence of the CO1 gene. The remaining DNA samples were sequenced and run through BLAST for identification.

Results

PCR was performed to amplify the CO1 gene from S. salar and all controls in the experiment. In Figure 1, the DNA extract and unpurified PCR product acted as positive controls for the purified PCR product to confirm whether the CO1 gene was present after each procedure (lanes 2, 3, and 4). The pBR322/BstNI ladder in lane 1 displayed bands at 121, 383, 929, 1058, and 1857 base pairs (bp). Lane 2 contained the unpurified PCR product. Lane 3 contained the purified PCR product. Lane 4 contained the DNA extract. Lanes 2, 3, and 4 all contained S. salar samples and displayed a large band around 100 bp, which indicated the presence of primer dimers. S. salar displayed a band at 800 bp in both the unpurified and purified PCR products (lanes 2 and 3). S. salar failed to show a band at 800 bp in the DNA extract (lane 4). In Figure 2, a total of 793 bases were sequenced with an average spacing of 17 bp. There were no distinct peaks before 70 bp or after 700 bp (Figure 2). In Figure 3, Oncorhynchus nerka formed a monophyletic group and was more closely related to the specimen than S. salar was. Oncorhynchus gorbuscha and Oncorhynchus keta formed polyphyletic groups (Figure 3). In Figure 4, the query sequence produced a total of 100 BLAST hits. The query coverage was 78% and the E value was 0.0 for Oncorhynchus nerka.
The query sequence was 99% identical to the matched sequence, and the alignment had no gaps (Figure 4). The optimal sequence alignment was for Oncorhynchus nerka (Figure 4).

Discussion

The objective of the experiment was to determine whether store-bought Salmo salar was correctly labeled. To differentiate between plant and animal material, the sample was tested for the presence of the cytochrome c oxidase subunit 1 (CO1) gene, which is present in all animals. It was hypothesized that the locally bought S. salar would not be identified as the labeled species due to the high prevalence of fraud within the fish market. The DNA extract and unpurified PCR product were used as positive controls to determine whether the methods were carried out correctly. If they were, the CO1 gene should have been present in all products (Figure 1). No band appeared at 800 bp in the DNA extract, but a band did appear in both the unpurified and purified PCR products. Because amplification and purification were successful, the missing band was most likely not caused by a procedural error (Figure 1). The unpurified PCR product was more concentrated, so its bands would be visualized more clearly (Figure 1). In the DNA extract, the bands might not have been as visible due to lower resolution (Figure 1). The purified PCR product was less concentrated, so its bands might be more difficult to visualize. Primer dimers were present in the DNA extract and in the unpurified and purified PCR products. Primer dimers may have arisen from DNA degradation or from an insufficient amount of DNA for CO1 gene amplification to occur (Li, 2008). In Figure 2, there were no distinct peaks before 70 bp or after 700 bp due to the chemistry of Sanger sequencing. Sanger sequencing reliably reads only about 700 bases and has decreased resolution outside this range (Lee, 2013). The sequence from the sample labeled S. salar was submitted to BLAST for species identification (Figure 4). The optimal sequence alignment was for Oncorhynchus nerka.
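To make the BLAST summary values reported for Figure 4 more concrete, the sketch below shows how a query can be split into overlapping words for seeding, how query coverage follows from the aligned range, and how an E value scales with search size. It is a simplified illustration under stated assumptions and not BLAST's actual implementation: the aligned coordinates (71 to 689) are hypothetical values chosen only to reproduce a coverage near 78% for a 793-base query, and the E value formula is the textbook approximation E = m × n × 2^(−S') without the length corrections that real BLAST applies.

```python
def query_words(query, word_size):
    """Split a query into every overlapping word of length word_size."""
    return [query[i:i + word_size] for i in range(len(query) - word_size + 1)]


def query_coverage(aligned_start, aligned_end, query_length):
    """Percent of the query spanned by an alignment (1-based, inclusive)."""
    if not 1 <= aligned_start <= aligned_end <= query_length:
        raise ValueError("aligned range must lie within the query")
    return 100.0 * (aligned_end - aligned_start + 1) / query_length


def expected_hits(bit_score, query_length, database_length):
    """Simplified E value: chance alignments expected at this bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


print(query_words("ACGTAC", 3))               # ['ACG', 'CGT', 'GTA', 'TAC']
# Hypothetical aligned range for a 793-base query (assumption, not from Figure 4).
print(round(query_coverage(71, 689, 793)))    # 78
# A higher bit score gives a smaller E value for the same search space.
print(expected_hits(10, 4, 256))              # 1.0
```

A coverage below 100% in this sketch simply means that the reported alignment spans only part of the query, for example when the low-quality ends of a Sanger read do not align.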
BLAST is a heuristic that searches a database for similar sequences by breaking the query sequence into short fixed-length words (three letters for protein queries, longer words for nucleotide queries) and deriving variations from those words (Camacho, 2009). The query sequence produced 100 BLAST results (Figure 4). The query coverage was only 78% because BLAST does not align the entire sequence (Camacho, 2009). It does not give the optimal alignment for the entire sequence, only the alignment for a specific region of the query sequence (Camacho, 2009). The E value corresponds to the number of random sequences expected to produce an equivalent sequence alignment by chance (Camacho, 2009). A low E value means that a match of this quality is unlikely to have arisen by chance (Camacho, 2009). Therefore, the alignment with the highest query coverage and lowest E value is selected as the optimal alignment (Camacho, 2009).

In Figure 3, Oncorhynchus nerka, which contained the CO1 gene, formed a monophyletic group and was more closely related to the sequence from the specimen than S. salar was. This means that our specimen was not S. salar and that this is an occurrence of fish fraud. S. salar, also known as Atlantic Salmon, is not available all year, so the migration patterns of Atlantic Salmon may have limited the amount of S. salar available and contributed to fish fraud (Kimberly, 2015). During winter seasons, the most common type of mislabeling occurs when farmed Atlantic Salmon is sold as wild salmon (Kimberly, 2015). The BLAST results did not match S. salar because the sample was not the species stated on the label; therefore, the best sequence alignment in BLAST was to a different species. Multiple human errors occurred in this experiment that may have affected the validity of the results. For instance, the original specimen to be used in the experiment was G. macrocephalus; however, following DNA extraction, the test tubes were misplaced. As a result, DNA samples of S. salar were used for purification.
Human errors could also explain the absence of the CO1 band in the gel. Errors such as insufficient grinding time, cross contamination, incorrect volume measurement, tube misidentification, and dehydrated or degraded DNA could have affected CO1 gene amplification (Figure 1). Because the original G. macrocephalus sample was misplaced and S. salar was purified and submitted for sequencing instead, the results of the experiment cannot be conclusive for G. macrocephalus. One recommendation for future experiments is to label each tube distinctly to reduce the chance of ambiguity. It might also be beneficial to conduct DNA barcoding and PCR experiments on O. nerka to see whether fish fraud occurs within that species of salmon. In Figure 3, O. nerka also contains the CO1 gene, so DNA barcoding is applicable to this organism for species identification. Another approach is to use DNA barcoding on Oncorhynchus gorbuscha (Pink Salmon) or Oncorhynchus keta (Chum Salmon), considering that both species contain the CO1 gene and are closely related to O. nerka (Figure 3). Interestingly, O. gorbuscha and O. keta are cheaper than S. salar, so fish fraud involving these salmon species may be common (Kimberly, 2015). Moreover, rather than using one standardized sequence, such as the CO1 gene, future experiments could use larger-scale genome sequencing or explore mitochondrial genes other than CO1. One study examined three mitochondrial genes, CO1, cytochrome b, and 16S rRNA, for species identification in 50 European fish species by combining DNA barcoding and microarray techniques (Kochzius, 2010). The three mitochondrial markers performed differently in microarray and DNA barcoding analyses, but all three were effective for fish identification (Kochzius, 2010).
Because BLAST is a general-purpose search algorithm, some studies have developed methods designed specifically for species identification. In one study, BRONX used short variable segments in the sequences, without producing multiple sequence alignments, and accounted for variability within a taxon or at the genus level (Little, 2011). This reduced misidentification caused by shared alleles and performed better than other methods when there was imperfect overlap between the reference and query sequences (Little, 2011). Sanger sequencing could also be applied to the analysis of gene editing performed with zinc finger nucleases, TALENs, and CRISPR (Dehairs, 2016). An algorithm called CRISP-ID creates a genotype for up to three alleles from one Sanger sequence trace, providing a readily accessible and robust method to speed up clone characterization (Dehairs, 2016). For larger-scale genomes, one study applied Sanger genome sequencing to the influenza A/H3N2 virus (Lee, 2013). This type of sequencing was a cost-effective way to obtain the full influenza genome (Lee, 2013). Other studies have applied DNA barcoding to plant genomes, using the rbcL, matK, and trnH-psbA regions to support digital plant species identification (Bruni, 2012). Interestingly, incorporating the trnH-psbA region improves species identification among closely related taxa (Bruni, 2012). With the increasing biodiversity of the seafood market, it is important to establish an accurate and effective method of species identification.
