SQL Server 2016 R Services: sp_execute_external_script returns 0x80004005 error -
i run r code after querying 100m records , following error after process runs on 6 hours:
msg 39004, level 16, state 19, line 300 'r' script error occurred during execution of 'sp_execute_external_script' hresult 0x80004005.
hresult 0x80004005 appears associated in windows connectivity, permissions or "unspecified" error.
i know logging in r code process never reaches r script @ all. know entire procedure completes after 4 minutes on smaller number of records, example, 1m. leads me believe scaling problem or issue data, rather bug in r code. have not included r code or full query proprietary reasons.
however, expect disk or memory error display 0x80004004 out of memory error if case.
one clue noticed in sql errorlog following:
sql server received abort message , abort execution major error : 18 , minor error : 42
however time of log line not coincide interruption of process, although occur after started. unfortunately, there precious little on web "major error 18".
a sql trace when running ssms shows client logging in , logging out every 6 minutes or so, can assume normal keepalive behaviour.
the sanitized sp_execute_external_script call:
exec sp_execute_external_script @language = n'r' , @script = n'#we never here #returns name of output data file' , @input_data_1 = n'select top 100000000 data' , @input_data_1_name = n'x' , @output_data_1_name = n'output_file_df' result sets ((output_file varchar(100) not null))
server specs: 8 cores 256 gb ram sql server 2016 ctp 3
any ideas, suggestions or debugging hints appreciated!
update: set trace_level=3 in rlauncher.config turn on higher level of logging , re-ran process. log reveals cleanup process ran, removing session files, @ time entire process failed after 6.5 hours.
[2016-05-30 01:35:34.419][00002070][00001ec4][info] sqlsatellite_launchsatellite(1, a187bc64-c349-410b-861e-bfdc714c8017, 1, 49232, nullptr) completed: 00000000
[2016-05-30 01:35:34.420][00002070][00001ec4][info] < sqlsatellite_launchsatellite, dllmain.cpp, 223
[2016-05-30 08:04:02.443][00002070][00001ec4][info] > sqlsatellite_launchercleanup, dllmain.cpp, 309
[2016-05-30 08:04:07.443][00002070][00001ec4][warning] session a187bc64-c349-410b-861e-bfdc714c8017 cleanup wait failed 258 , error 0
[2016-05-30 08:04:07.444][00002070][00001ec4][info] session(a187bc64-c349-410b-861e-bfdc714c8017) logged 2 output files
[2016-05-30 08:04:07.444][00002070][00001ec4][warning] trydeletesinglefile(c:\progra~1\micros~1\mssql1~1.mss\mssql\extens~1\mssqlserver06\a187bc64-c349-410b-861e-bfdc714c8017\rscript1878455a2528) failed 32
[2016-05-30 08:04:07.445][00002070][00001ec4][warning] trydeletesingledirectory(c:\progra~1\micros~1\mssql1~1.mss\mssql\extens~1\mssqlserver06\a187bc64-c349-410b-861e-bfdc714c8017) failed 32
[2016-05-30 08:04:08.446][00002070][00001ec4][info] session a187bc64-c349-410b-861e-bfdc714c8017 removed mssqlserver06 user
[2016-05-30 08:04:08.447][00002070][00001ec4][info] sqlsatellite_launchercleanup(a187bc64-c349-410b-861e-bfdc714c8017) completed: 00000000
it appears way allow long-running process continue to: a) extend job cleanup wait time allow job finish b) disable job cleanup process
i have far been unable find value sets job cleanup wait time in mssqllaunchpad service. while job_cleanup_on_exit flag exists in rlauncher.config, setting 0 has no effect. service seems reset 1 when restarted.
again, suggestions or assistance appreciated!
by default, sql server reads data r memory data frame before starting execution of r script. based on fact script works 1m rows , fails start 100m rows, potentially out of memory error. resolve memory issues, (other increasing memory on machine/reducing data size) can try 1 of these solutions
- increase memory allocation r process execution using sys.resource_governor_external_resource_pools max_memory_percent setting. default, sql server limits r process execution 20% of memory.
- streaming execution r script instead of loading data memory. note parameter can used in cases output of r script doesn’t depend on reading or looking @ entire set of rows.
the warnings in rlauncher.log data cleanup happened after r script execution can safely ignored , not root cause failures seeing.
Comments
Post a Comment