I am trying to validate my RNAseq data by qPCR, looking at the fold change of a few genes across various timepoints of the treatment conditions. I am getting a huge amount of variation (thousands of fold) between my biological replicates. I thought it might be due to genomic DNA contamination, so I repeated the whole experiment and did the DNase treatment twice, but I still see the same variation. I am using two reference genes, EF and GAPDH, and their Ct values also vary across the timepoints. I would really appreciate any suggestions about possible problems and solutions.

Thank you, Ambika

ambika
  • How do your technical replicates look? – Rover Eye Jun 20 '18 at 16:21
  • To clarify, the variation you're seeing in the qPCR isn't being seen in the RNAseq? Do the values correlate? How was the qPCR data normalized? – CKM Jun 20 '18 at 16:46
  • How are you aliquoting DNA samples into your reaction mix? You can get a lot of error just from pipetting. I find an ECHO liquid dispenser makes all your troubles go away (provided you can find one somewhere). GAPDH is sensitive to metabolic state (i.e. oxygen and glucose levels). Are you using column purification of RNA with DNase treatment? – JayCkat Jun 20 '18 at 17:45
  • @RoverEye my technical replicates look good, they don't have such variation. – ambika Jun 20 '18 at 18:12
  • @CKM I made a PCA plot to compare the replicates in my RNAseq data. I had 4 replicates for RNAseq, in which 1 and 2 clustered together, and 3 and 4 clustered together. That might be due to environmental variation, since I did them at different times. In qPCR I see expression of those genes, but the fold changes between biological replicates do not match. They have the same trend, though. I am seeing a 1000-fold difference for some genes. – ambika Jun 20 '18 at 18:19
  • @JayCkat Yes, I am using column purification of RNA with DNase treatment. Also, both EF and GAPDH show a 5 to 6 Ct difference between samples. – ambika Jun 20 '18 at 18:21
  • @ambika Here are some things I would do: Check the data analysis and make sure everything is in FC or logFC (i.e. make sure one isn't logFC and the other linear FC). Check how the data were normalized, and whether the reference genes were differentially expressed (look for reference genes with stable expression, i.e. the lowest CV). Also check whether the results correlate: is your R2 good if you plot the two on an x-y scatter plot? Since the two methods measure differently, it may be more important that the trend is the same than that the values match. – CKM Jun 20 '18 at 18:47
  • @CKM I normalised the data using the ddCt method. I am also wondering whether my reference genes are good enough. Thanks for your suggestions. – ambika Jun 20 '18 at 19:10
  • @ambika Maybe you can use your RNAseq data to find a few genes with stable expression across your samples, to use as references in further experiments. – CKM Jun 20 '18 at 20:20
  • @CKM Can you please suggest how I can categorize genes? Any programs or methods? – ambika Jun 21 '18 at 15:35
  • @ambika How do you mean? Like how to categorize them as reference genes? Or something different? – CKM Jun 21 '18 at 23:40
  • @CKM Now I am trying to identify stably expressed genes using the RNAseq data. So far I have calculated the coefficient of variation of all genes based on the count data. One thing I am confused about is that the lists of lowest-CV genes look different from replicate to replicate. Do I have to calculate the CV by integrating the data from all replicates? – ambika Jun 25 '18 at 21:56
  • @ambika Calculate the list of stably expressed genes from all the data together. I like to visualize it as if I were working in Excel: columns are samples, rows are genes. Calculate the CV% in a separate column by row (i.e. by gene), sort low-to-high, and pick the ones with the lowest CV%. Just remember, if you use them to normalize, to drop them from your analysis set. – CKM Jun 26 '18 at 00:16
  • @CKM Thank you so much. I really appreciate your suggestions. – ambika Jun 26 '18 at 18:30
  • If you have 1000-fold differences, either between your biological samples or between RNAseq & qPCR, there is very likely a problem with your data normalization (if not, you need better samples). Ideally you should always use log2 normalised fold change values (qPCR ddCt gives these directly; they should be roughly normally distributed around 0 and represent doubling effects, which are often biologically relevant). – Nicolai Dec 09 '19 at 09:50
  • Also, as others have mentioned, GAPDH expression levels can be affected by the metabolic state of the cells, so it's not always a good control/reference. Generally a control for qPCR should have expression levels similar to the genes you want to measure (this might be an issue for you, since GAPDH & EF can be very highly expressed) and of course be stable between the timepoints or conditions you want to compare. – Nicolai Dec 09 '19 at 09:54
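
To make the ddCt and log2 fold change points in the comments concrete, here is a minimal sketch in Python. It assumes roughly 100% amplification efficiency (one cycle = one doubling) and averages the Ct values of the two reference genes; the function names and all Ct values are hypothetical and are not taken from the thread.

```python
from statistics import mean

def delta_ct(target_ct, reference_cts):
    # dCt = target Ct minus the mean Ct of the reference genes.
    # Averaging Ct values (a log scale) corresponds to the geometric
    # mean of the references on the linear expression scale.
    return target_ct - mean(reference_cts)

def log2_fold_change(treated_target, treated_refs, control_target, control_refs):
    # log2 fold change = -ddCt, assuming ~100% amplification efficiency.
    ddct = delta_ct(treated_target, treated_refs) - delta_ct(control_target, control_refs)
    return -ddct

def ct_shift_to_fold(cycles):
    # Linear fold difference implied by a Ct shift of `cycles`
    # (same efficiency assumption as above).
    return 2 ** cycles

# Hypothetical Ct values for one target gene and two reference genes.
control = {"target": 25.0, "refs": [18.0, 20.0]}
treated = {"target": 23.0, "refs": [18.0, 20.0]}

lfc = log2_fold_change(treated["target"], treated["refs"],
                       control["target"], control["refs"])
print(lfc)        # 2.0 on the log2 scale
print(2 ** lfc)   # 4.0 as a linear fold change

# A reference gene that drifts by 5 cycles between samples shifts the
# normalized result by 32-fold on its own; 10 cycles gives 1024-fold.
print(ct_shift_to_fold(5), ct_shift_to_fold(10))
```

Under these assumptions, the 5 to 6 Ct spread reported for EF and GAPDH would by itself account for a 32- to 64-fold shift in the normalized values, which is why the stability of the reference genes is worth checking before anything else.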

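The CV ranking that CKM describes (rows are genes, columns are all samples together, sort by CV% from low to high) could be sketched as follows. This is an illustration only: the gene names and values are made up, and it assumes the expression values have already been normalized for library size, which the thread does not specify.

```python
from statistics import mean, stdev

def cv_percent(values):
    # Coefficient of variation in percent: 100 * sd / mean.
    m = mean(values)
    if m == 0:
        return float("inf")  # never rank unexpressed genes as stable
    return 100.0 * stdev(values) / m

def rank_stable_genes(expr, min_mean=0.0):
    # expr maps gene -> expression values across ALL samples
    # (every replicate, timepoint and condition in one row).
    # min_mean is an optional, hypothetical filter for low-expressed genes.
    scored = [(gene, cv_percent(vals))
              for gene, vals in expr.items()
              if mean(vals) >= min_mean]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical normalized expression values, six samples per gene.
expr = {
    "geneA": [100, 102, 98, 101, 99, 100],
    "geneB": [50, 200, 10, 400, 80, 30],
    "geneC": [1000, 1100, 900, 1050, 950, 1000],
}

ranked = rank_stable_genes(expr)
for gene, cv in ranked:
    print(f"{gene}\t{cv:.2f}%")
```

Computing one CV per gene over all samples, rather than one list per replicate, gives a single ranking. Candidates from the top of that list would still need to be confirmed by qPCR before being used as references.
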
0 Answers