Purpose
ReQoL-10 and ReQoL-20 have been developed for use as outcome measures with individuals aged 16 and over who are experiencing mental health difficulties. This paper reports modelling results from the item response theory (IRT) analyses that were used for item reduction.

Methods
A pool of items was developed through several stages of preparatory work, including focus groups and a previous psychometric survey. After confirming that the ReQoL item pool was sufficiently unidimensional for scoring, IRT model parameters were estimated using Samejima's Graded Response Model (GRM). All 39 mental health items were evaluated for item fit and for differential item functioning (DIF) by age, gender, ethnicity, and diagnosis. The scales were evaluated for overall measurement precision and for known-groups validity (by type of care setting and by self-rated overall mental health).

Results
The study recruited 4266 participants with a wide range of mental health diagnoses from multiple settings. The IRT parameters demonstrated excellent coverage of the latent construct, with the centres of the item information functions ranging from -0.98 to 0.21 and the discrimination (slope) parameters ranging from 1.4 to 3.6. We identified only two poorly fitting items and no evidence of DIF of concern. The scales showed excellent measurement precision and known-groups validity.

Conclusion
The results of the IRT analyses confirm the robust structural properties and internal construct validity of the ReQoL instruments. This strong psychometric evidence guided item selection for the final versions of the ReQoL measures.
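The abstract refers to Samejima's GRM, discrimination (slope) parameters and item information functions without stating the formulas. The following is a minimal Python sketch of the standard GRM formulation, showing how one item's slope `a` and ordered thresholds `b_1 < ... < b_{m-1}` determine its category probabilities and its Fisher information across the latent trait θ. The function names, the example parameter values and the choice to define an item's "centre" as the θ at which its information peaks are all illustrative assumptions; they are not the ReQoL estimates or the authors' software, and the abstract does not say how the centres were computed.

```python
import math


def _boundaries(theta, a, thresholds):
    # Boundary curves P*_k(theta) = P(X >= k | theta), one logistic per threshold.
    # By definition P*_0 = 1 and P*_m = 0 for an item with m ordered categories.
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Category probabilities P_k = P*_k - P*_{k+1} under the GRM."""
    stars = _boundaries(theta, a, thresholds)
    return [stars[k] - stars[k + 1] for k in range(len(stars) - 1)]


def grm_item_information(theta, a, thresholds):
    """Fisher information I(theta) = sum_k P_k'(theta)^2 / P_k(theta)."""
    stars = _boundaries(theta, a, thresholds)
    # Derivative of each boundary curve is a * P* * (1 - P*); it is 0 at the fixed ends.
    dstars = [a * p * (1.0 - p) for p in stars]
    info = 0.0
    for k in range(len(stars) - 1):
        p_k = stars[k] - stars[k + 1]
        dp_k = dstars[k] - dstars[k + 1]
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info


def information_peak(a, thresholds, lo=-4.0, hi=4.0, steps=801):
    """Hypothetical 'centre': the grid value of theta with maximal item information."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: grm_item_information(t, a, thresholds))


if __name__ == "__main__":
    # Illustrative item with five ordered categories (made-up parameters).
    a, b = 2.0, [-1.5, -0.5, 0.5, 1.5]
    print(grm_category_probs(0.0, a, b))
    print(grm_item_information(0.0, a, b), information_peak(a, b))
```

With a single threshold the GRM reduces to the two-parameter logistic model, whose information is `a**2 * P * (1 - P)`, so a steeper slope (for example 3.6 rather than 1.4) concentrates more information near the item's location. This is one way to read why the reported slope range matters for measurement precision.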