Here are some quotes I found that bear on this question:
“General caveats. Regardless of the functions applied, scores are known to scale poorly with molecular mass and the number of rotatable bonds in compounds[76]. Large molecules can form many hypothetical interactions in binding sites and therefore have the tendency to generate better scores than smaller compounds.” PMID: 15520816; DOI: 10.1038/nrd1549
“Compound selection in virtual database screening when targeting a biological macromolecule is typically based on the interaction energy between the chemical compound and the target macromolecule. In the present study it is shown that this approach is biased toward the selection of high molecular weight compounds due to the contribution of the compound size to the energy score. To account for molecular weight during energy based screening, we propose normalization strategies based on the total number of heavy atoms in the chemical compounds being screened.” J. Chem. Inf. Comput. Sci. 2003, 43(1), 267-272. DOI: 10.1021/ci020055f
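To make the normalization idea concrete, here is a minimal sketch that divides each compound's energy score by a power of its heavy-atom count N. This is not the exact scheme from that paper: the function names, the `exponent` parameter, and the example energies are all hypothetical, and the sketch assumes the usual docking convention that a more negative score is better.

```python
def normalize_by_heavy_atoms(score: float, heavy_atoms: int, exponent: float = 0.5) -> float:
    """Divide an interaction-energy score by heavy_atoms ** exponent.

    exponent is a hypothetical tuning knob (an assumption, not from the paper):
    0 leaves the raw score unchanged, 1 gives a per-heavy-atom score, and values
    in between only partially remove the size contribution.
    """
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return score / heavy_atoms ** exponent


# Hypothetical data: (energy score in kcal/mol, heavy-atom count).
# More negative energy = better predicted binding (assumed convention).
compounds = {
    "small": (-6.0, 12),
    "large": (-9.0, 36),
}


def rank(compounds: dict, exponent: float) -> list:
    """Return compound names ordered best (most negative normalized score) first."""
    normalized = {
        name: normalize_by_heavy_atoms(energy, n_heavy, exponent)
        for name, (energy, n_heavy) in compounds.items()
    }
    return sorted(normalized, key=normalized.get)


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, rank(compounds, p))
```

With the raw scores (exponent 0) the larger compound ranks first; once the score is divided by the square root or the full heavy-atom count, the smaller compound moves ahead, which is the size bias the quote describes being corrected.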
“It seems as if the most important molecular property for ligand bias is molecular size. All 10 scoring functions in this study are more or less correlated to molecular size. The r2 values, when fitting the scores from each target and scoring function to one simple descriptor, HAC, are shown in Table 3. All scoring functions except for FlexX show intermediate to high correlation.” J. Chem. Inf. Model. 2006, 46, 1334-1343.
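The r² values in that last quote come from fitting each set of scores against a single descriptor, heavy atom count (HAC). For a one-descriptor least-squares line with an intercept, r² equals the squared Pearson correlation, so a quick size-bias check on your own screening results could look like the sketch below. The function name and the example numbers are made up for illustration; the quote does not give the paper's exact fitting procedure.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation of x and y.

    For a simple linear fit y ~ a + b*x this equals the fit's r^2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = fsum(x) / n, fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in x or y")
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical heavy-atom counts and docking scores (more negative = better).
    hac = [10, 15, 20, 25, 30]
    scores = [-5.1, -6.0, -7.4, -7.9, -9.2]
    print(f"r^2 (score vs HAC) = {r_squared(hac, scores):.3f}")
```

A value near 1 would mean the scores largely track molecule size; in that case a normalization like the one proposed in the second quote is worth trying before picking hits.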