[GCC 8.5.0 20210514 (Red Hat 8.5.0-24)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyreadstat as prs
>>> d,m=prs.read_dta("test.dta")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyreadstat/pyreadstat.pyx", line 296, in pyreadstat.pyreadstat.read_dta
File "pyreadstat/_readstat_parser.pyx", line 1282, in pyreadstat._readstat_parser.run_conversion
File "pyreadstat/_readstat_parser.pyx", line 955, in pyreadstat._readstat_parser.run_readstat_parser
File "pyreadstat/_readstat_parser.pyx", line 877, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to allocate memory
>>>
From my investigation, the issue is caused by L451 in readstat_dta_read.c, inside the dta_read_strls() function, which allocates memory for each string separately in a while loop. On a later iteration, the allocation at L445 fails because it needs a large contiguous chunk of memory and the heap is heavily fragmented by then.
With the reproducible example https://www.dropbox.com/scl/fi/sx9cz7vjekvud3ail9ph3/test.dta?rlkey=7e5qmwl9tbuoa0967kq3uq65f&st=g3wxulnc&dl=0,
L451 (malloc for each string) was executed approximately 1.6 million times. After that, L445 failed to allocate 26 MB of contiguous heap memory.
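
One possible way to avoid a separate allocation per string is to pack all strings into a single buffer and keep a table of offsets into it. The following is only a rough Python sketch of that idea, not ReadStat's C code and not a tested fix. The names `pack_strings` and `get_string` are hypothetical, and it assumes the string lengths are known before the buffer is allocated.

```python
from array import array


def pack_strings(strings):
    """Pack a list of byte strings into one buffer plus an offsets table.

    Hypothetical illustration: one allocation sized up front replaces
    one allocation per string.
    """
    total = sum(len(s) for s in strings)
    buf = bytearray(total)      # single contiguous allocation
    offsets = array("Q", [0])   # offsets[i] is where string i starts
    pos = 0
    for s in strings:
        buf[pos:pos + len(s)] = s
        pos += len(s)
        offsets.append(pos)     # end of string i, start of string i + 1
    return buf, offsets


def get_string(buf, offsets, i):
    """Return string i as bytes by slicing the shared buffer."""
    return bytes(buf[offsets[i]:offsets[i + 1]])
```

With this layout, 1.6 million strings would cost two large allocations (the buffer and the offsets table) instead of 1.6 million small ones. Whether that fits the way dta_read_strls() discovers string lengths while reading the file is an open question.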