Trovomics CSV File Guidelines
When uploading metadata to Trovomics, it is essential that your CSV files follow strict formatting rules. This enables accurate linking of your sample data, smooth downstream analyses, and minimal rework due to file validation errors. The Trovomics CSV validation function checks for:
Presence of Required Columns
Illegal Characters in Column Names
Illegal Characters in Data Cells
Any violation causes validation to fail and is reported with error details. Review these guidelines carefully to avoid upload errors and ensure efficient data processing.
1. Required Columns
Your CSV file must include the following mandatory columns, which are used to link metadata to your actual omics data files:
SampleName
A unique identifier for each sample in your experiment.
If you have multiple replicates, each replicate should have a distinct SampleName. Rows share a SampleName only when they list multiple files for the same sample (for example, the R1 and R2 files of a paired-end sample).
Filename
The name of the raw data file (FASTQ) corresponding to each sample.
This must match the file’s exact name. Do not include the path.
Each Filename must be unique across the entire analysis.
These two columns are case-sensitive and must appear exactly as shown (SampleName and Filename). Files missing either required column will be rejected.
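As a quick local check before uploading, the required-column rule can be sketched in a few lines of Python. This is only an illustration using the standard csv module, not the Trovomics validator itself; the function name missing_required_columns is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("SampleName", "Filename")

def missing_required_columns(csv_text):
    """Return the required column names that are absent from the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    # The comparison is exact and case-sensitive, as the guidelines require.
    return [name for name in REQUIRED_COLUMNS if name not in header]

good = "SampleName,Filename,Condition\nSample1,Sample1_S1_L001_R1_001.fastq.gz,Control\n"
bad = "samplename,FileName,Condition\nSample1,Sample1_S1_L001_R1_001.fastq.gz,Control\n"

print(missing_required_columns(good))  # []
print(missing_required_columns(bad))   # ['SampleName', 'Filename']
```

The second header fails only because of letter case, which is the most common way a file that looks correct ends up rejected.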
2. General Guidelines
All metadata values should be categorical. If your dataset contains a column with numerical data, convert it into appropriate categories before uploading the CSV file.
Example:
Numerical values
Variable: Age
Values: 21, 25, 30, 37, 28, 32, 22, 33, 29, 34.
Convert these values into categories appropriate for the goals of your analysis (e.g. younger_than_30, age_30_or_older). Category labels must start with a letter (see Section 4.2).
Categorical values
Variable: Age
Values: younger_than_30, younger_than_30, age_30_or_older, age_30_or_older, younger_than_30, age_30_or_older, younger_than_30, age_30_or_older, younger_than_30, age_30_or_older
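The conversion above can be scripted rather than done by hand. The following Python sketch bins the example ages at a cut point of 30. The function name age_category is hypothetical, and the upper label is written age_30_or_older so that it starts with a letter, as Section 4.2 requires for data cell values.

```python
AGE_CUTOFF = 30  # cut point taken from the example above

def age_category(age):
    """Map a numeric age to a categorical label that starts with a letter."""
    return "younger_than_30" if age < AGE_CUTOFF else "age_30_or_older"

ages = [21, 25, 30, 37, 28, 32, 22, 33, 29, 34]
categories = [age_category(age) for age in ages]

for age, category in zip(ages, categories):
    print(age, category)
```

Choose the cut points to suit your analysis; two groups are used here only because the example uses two.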
3. Column Name Guidelines
In addition to the required columns, you may include any number of optional columns (e.g., Condition, Genotype, Tissue, Treatment). These additional columns capture the experimental metadata that Trovomics uses for downstream analyses. All columns, including required and optional, must adhere to the following naming rules:
3.1 Valid Characters
Column names may contain only letters, numbers and underscores.
Column names must start with a letter.
3.2 Example Disallowed Characters
Column names must not contain any of the following:
Punctuation/Special Characters:
, ; : . / \ | ? < > [ ] { } ( ) + -
Whitespace:
Spaces are disallowed in column names (e.g., “Group Name” is invalid). If needed, use camelCase or underscores (e.g., “GroupName” or “Group_Name”).
Leading Characters:
Column names must start with a letter, not a number or an underscore (_).
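The naming rules in Sections 3.1 and 3.2 reduce to a single pattern: a letter followed by any number of letters, digits, or underscores. A minimal Python sketch follows. It assumes that "letters" means the ASCII letters A to Z and a to z, which these guidelines do not state explicitly, and the function name is_valid_column_name is hypothetical.

```python
import re

# First character: a letter. Remaining characters: letters, digits, underscores.
# Assumption: only ASCII letters count as letters.
COLUMN_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_valid_column_name(name):
    """Return True if the whole name matches the allowed pattern."""
    return COLUMN_NAME_PATTERN.fullmatch(name) is not None

for name in ["Group_Name", "GroupName", "Treatment_2",
             "Group Name", "_Condition", "2ndBatch", "Drug+"]:
    print(name, is_valid_column_name(name))
```

Using fullmatch rather than match matters here: match would accept "Drug+" because the leading "Drug" satisfies the pattern.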
4. Data Cell Guidelines
Each row in the CSV file represents a single sample (or replicate). You can include as many optional metadata columns as you need; for example, you might have columns for Condition, Dose, Genotype, Treatment, Batch, Tissue, Cell_Type, etc.
4.1 Required Columns
SampleName: Must be present and non-empty for every row.
Filename: Must be present and non-empty for every row, exactly matching the data file name, including the extension (e.g., .fastq.gz) but not the path.
Missing values in either SampleName or Filename will invalidate the CSV.
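Empty required cells are easy to miss in a spreadsheet view. The Python sketch below lists the rows where SampleName or Filename is blank. It is an illustration only: the function name is hypothetical, and treating whitespace-only cells as empty is an assumption rather than documented Trovomics behavior.

```python
import csv
import io

def rows_with_missing_required(csv_text):
    """Return (row_number, column) pairs where SampleName or Filename is empty."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows are numbered from 2 because the header is row 1.
    for row_number, row in enumerate(reader, start=2):
        for column in ("SampleName", "Filename"):
            # Short rows give None for missing fields; treat that as empty too.
            if not (row.get(column) or "").strip():
                problems.append((row_number, column))
    return problems

metadata = (
    "SampleName,Filename,Condition\n"
    "Sample1,Sample1_S1_L001_R1_001.fastq.gz,Control\n"
    ",Sample2_S2_L001_R1_001.fastq.gz,Control\n"
    "Sample3,,Control\n"
)
print(rows_with_missing_required(metadata))  # [(3, 'SampleName'), (4, 'Filename')]
```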
4.2 Valid Characters
Data Cell values may contain only letters, numbers and underscores.
Data Cell values must start with a letter.
4.3 Data Columns
Formatting & Type
Avoid using any of the disallowed characters listed in Section 3.2.
Missing or Null Data
If you wish to explicitly label missing data, consider using the value no_data instead of NA, NaN, or NULL, which are reserved and may interfere with the pipeline.
4.4 Examples of Valid vs. Invalid Column Names and Data Cell Values
Valid: SampleName, Filename, Condition, Replicate01, Treatment_2, A673, ETV6, H3K4me3
Invalid: Sample Name, Drug+, Group?, _Condition, Group-Name, 5e10, pi, Inf, NA
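The examples above can be reproduced with a small Python check for metadata cell values. This is a sketch under stated assumptions, not the Trovomics validator: the reserved set contains only the values this document names as reserved or lists as invalid (NA, NaN, NULL, Inf, pi), the comparison is assumed to be case-sensitive, and the function name cell_value_problem is hypothetical. It is not meant for the Filename column, whose values contain periods.

```python
import re

VALUE_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Assumption: the real reserved list may be longer; these come from this document.
RESERVED_VALUES = {"NA", "NaN", "NULL", "Inf", "pi"}

def cell_value_problem(value):
    """Return a short reason if the value is invalid, or None if it is acceptable."""
    if value in RESERVED_VALUES:
        return "reserved value"
    # An empty string also fails here, because the pattern needs a leading letter.
    if VALUE_PATTERN.fullmatch(value) is None:
        return "illegal or leading character"
    return None

for value in ["A673", "H3K4me3", "no_data", "Group-Name", "5e10", "pi", "NA"]:
    print(value, cell_value_problem(value))
```

Values such as pi and Inf pass the character rule, so a character check alone does not reject them; a separate reserved-value check is needed.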
5. Practical Tips for Omics Datasets
Consistent Naming: Align SampleName with your laboratory records. For example, if a sample is known as Patient1_TissueA_Rep1, use that exact identifier in the CSV.
Exact Filename Matching: Ensure the Filename column matches the actual data files in your system (e.g., Patient1_TissueA_Rep1_S1_L001_R1_001.fastq.gz). Any mismatch leads to processing errors.
Descriptive Metadata Columns: Add columns like Condition, Genotype, Tissue, Treatment for robust downstream analysis. Avoid disallowed characters or spaces in these column names.
Valid Characters Only: Column names and data cell values may contain only letters, numbers and underscores, and must start with a letter.
Local Validation: It is good practice to inspect your CSV before uploading. This can save time and prevent re-uploads.
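One local check worth automating is the exact filename match. The Python sketch below compares the Filename column with the contents of a directory. It assumes your data files sit in a single local folder before upload; the function name filenames_not_on_disk is hypothetical, and the demonstration creates a temporary folder with one empty placeholder file.

```python
import csv
import io
import os
import tempfile

def filenames_not_on_disk(csv_text, data_dir):
    """Return Filename entries with no exactly matching file name in data_dir."""
    present = set(os.listdir(data_dir))
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Filename"] for row in reader if row["Filename"] not in present]

metadata = (
    "SampleName,Filename\n"
    "Patient1_TissueA_Rep1,Patient1_TissueA_Rep1_S1_L001_R1_001.fastq.gz\n"
    "Patient1_TissueA_Rep2,Patient1_TissueA_Rep2_S2_L001_R1_001.fastq.gz\n"
)

with tempfile.TemporaryDirectory() as data_dir:
    # Create an empty placeholder for the first file only.
    placeholder = os.path.join(data_dir, "Patient1_TissueA_Rep1_S1_L001_R1_001.fastq.gz")
    open(placeholder, "w").close()
    missing = filenames_not_on_disk(metadata, data_dir)

print(missing)  # ['Patient1_TissueA_Rep2_S2_L001_R1_001.fastq.gz']
```

To use it on real data, read your CSV file into a string and pass the path of the folder that holds your FASTQ files.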
6. Example CSV Layout
SampleName: Unique for each row
Filename: Matches each FASTQ file exactly
Condition, Tissue, Treatment, etc.: Additional metadata columns
No spaces or disallowed characters in headers or required cell values
Single-end data:
SampleName,Filename,Condition,Tissue
Sample1,Sample1_S1_L001_R1_001.fastq.gz,Control,Liver
Sample2,Sample2_S2_L001_R1_001.fastq.gz,Control,Liver
Sample3,Sample3_S3_L001_R1_001.fastq.gz,Control,Liver
Sample4,Sample4_S4_L001_R1_001.fastq.gz,Treatment,Liver
Sample5,Sample5_S5_L001_R1_001.fastq.gz,Treatment,Liver
Sample6,Sample6_S6_L001_R1_001.fastq.gz,Treatment,Liver
Sample7,Sample7_S7_L001_R1_001.fastq.gz,Control,Heart
Sample8,Sample8_S8_L001_R1_001.fastq.gz,Control,Heart
Sample9,Sample9_S9_L001_R1_001.fastq.gz,Control,Heart
Sample10,Sample10_S10_L001_R1_001.fastq.gz,Treatment,Heart
Sample11,Sample11_S11_L001_R1_001.fastq.gz,Treatment,Heart
Sample12,Sample12_S12_L001_R1_001.fastq.gz,Treatment,Heart
Paired-end data:
SampleName,Filename,Condition,Tissue
Sample1,Sample1_S1_L001_R1_001.fastq.gz,Control,Liver
Sample1,Sample1_S1_L001_R2_001.fastq.gz,Control,Liver
Sample2,Sample2_S2_L001_R1_001.fastq.gz,Control,Liver
Sample2,Sample2_S2_L001_R2_001.fastq.gz,Control,Liver
Sample3,Sample3_S3_L001_R1_001.fastq.gz,Control,Liver
Sample3,Sample3_S3_L001_R2_001.fastq.gz,Control,Liver
Sample4,Sample4_S4_L001_R1_001.fastq.gz,Treatment,Liver
Sample4,Sample4_S4_L001_R2_001.fastq.gz,Treatment,Liver
Sample5,Sample5_S5_L001_R1_001.fastq.gz,Treatment,Liver
Sample5,Sample5_S5_L001_R2_001.fastq.gz,Treatment,Liver
Sample6,Sample6_S6_L001_R1_001.fastq.gz,Treatment,Liver
Sample6,Sample6_S6_L001_R2_001.fastq.gz,Treatment,Liver
Sample7,Sample7_S7_L001_R1_001.fastq.gz,Control,Heart
Sample7,Sample7_S7_L001_R2_001.fastq.gz,Control,Heart
Sample8,Sample8_S8_L001_R1_001.fastq.gz,Control,Heart
Sample8,Sample8_S8_L001_R2_001.fastq.gz,Control,Heart
Sample9,Sample9_S9_L001_R1_001.fastq.gz,Control,Heart
Sample9,Sample9_S9_L001_R2_001.fastq.gz,Control,Heart
Sample10,Sample10_S10_L001_R1_001.fastq.gz,Treatment,Heart
Sample10,Sample10_S10_L001_R2_001.fastq.gz,Treatment,Heart
Sample11,Sample11_S11_L001_R1_001.fastq.gz,Treatment,Heart
Sample11,Sample11_S11_L001_R2_001.fastq.gz,Treatment,Heart
Sample12,Sample12_S12_L001_R1_001.fastq.gz,Treatment,Heart
Sample12,Sample12_S12_L001_R2_001.fastq.gz,Treatment,Heart
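To confirm that a layout like the ones above is internally consistent, you can check that no Filename is repeated and count the files listed for each SampleName (one per sample for single-end data, two for paired-end). The Python sketch below is an illustration only; the function name summarize_layout is hypothetical, and the sample text is a shortened copy of the paired-end example.

```python
import csv
import io
from collections import Counter, defaultdict

def summarize_layout(csv_text):
    """Return (duplicated filenames, mapping of SampleName to its filenames)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    filename_counts = Counter(row["Filename"] for row in rows)
    duplicates = sorted(name for name, count in filename_counts.items() if count > 1)
    files_per_sample = defaultdict(list)
    for row in rows:
        files_per_sample[row["SampleName"]].append(row["Filename"])
    return duplicates, dict(files_per_sample)

paired = """SampleName,Filename,Condition,Tissue
Sample1,Sample1_S1_L001_R1_001.fastq.gz,Control,Liver
Sample1,Sample1_S1_L001_R2_001.fastq.gz,Control,Liver
Sample2,Sample2_S2_L001_R1_001.fastq.gz,Control,Liver
Sample2,Sample2_S2_L001_R2_001.fastq.gz,Control,Liver
"""

duplicates, files_per_sample = summarize_layout(paired)
print(duplicates)  # []
print({sample: len(files) for sample, files in files_per_sample.items()})
# {'Sample1': 2, 'Sample2': 2}
```

A sample with an unexpected file count usually points to a missing R2 row or a SampleName typed two different ways.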
7. Final Notes
Adhering to these guidelines ensures that your omics metadata can be processed reliably by Trovomics and the underlying R scripts. Properly formatted CSV files avoid errors, streamline your analysis workflow, and reduce troubleshooting.
If you have any questions or encounter persistent validation failures, please contact user support at support@trovomics.com.
Thank you for helping us maintain a robust and efficient environment for your omics research. We look forward to supporting your scientific discoveries!

