There are two problems I have detected.
1. The files claim to be xml and UTF-8 with the header
<?xml version="1.0" encoding="UTF-8"?>
However, I have found many that are not valid UTF-8 and contain characters that are not encoded as UTF-8, e.g. any movie featuring Renée Zellweger. Note the accented e (which is not stored as UTF-8 in these files).
2. The unencoded & characters. These must be encoded as &amp;
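
In case it helps anyone hitting the same thing, here is a rough Python sketch of a workaround for both problems. It is only an illustration, not how the files are actually produced: the function names are made up, and it assumes the files that fail to decode as UTF-8 are really Windows-1252 (cp1252), which I have not confirmed.

```python
import re

# Matches an "&" that does not already start an entity reference
# such as &amp; or &#233; or &#xE9;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)")


def decode_bytes(raw: bytes) -> str:
    """Decode as UTF-8 if possible, otherwise fall back to cp1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the non-UTF-8 files use Windows-1252.
        # Bytes undefined in cp1252 are replaced with U+FFFD.
        return raw.decode("cp1252", errors="replace")


def escape_bare_ampersands(text: str) -> str:
    """Replace unescaped '&' with '&amp;', leaving existing entities alone."""
    return BARE_AMP.sub("&amp;", text)


def repair(raw: bytes) -> bytes:
    """Return the file content as real UTF-8 with bare '&' escaped."""
    text = escape_bare_ampersands(decode_bytes(raw))
    return text.encode("utf-8")
```

Running repair() on the raw bytes of a file before handing it to an XML parser should get past both errors. Note that something like "AT&T;" would be mistaken for an entity and left alone, so this is not bulletproof.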