python - Optimizing memory management for huge variables in Jython -


I'm taking as input a CSV file of roughly 7 to 10 MB, around 65,000 lines, in which data about many hosts is stored: for each of them, its asset ID, FQDN and six property fields (see below for an example).

  LCEEMV.V.Pray Cropper | SCCC_0002001 Prefusion | SVC_0001086 | Preframement SVCB00008660 | | Infrastructure Management LSER_0053150 | Wastecity-ADMGGPRACORP SVCC_0002001 | Prefudence | SVC_0001086 | Preproduction SVCB_0000160 | Infrastructure Management  

I iterate over each line with a csv.reader object and save its contents in RAM:

    for prop, column in PROP_COLUMNS:
        prop_value = row[column]
        data[(fqdn, prop)].add(prop_value)
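For reference, the in-memory approach described above could look like the following minimal sketch. The column positions, the property names in `PROP_COLUMNS`, the `|` delimiter and the sample rows are all assumptions made for illustration; the real file layout may differ.

```python
import csv
from collections import defaultdict

# Hypothetical layout: column 0 is the asset ID, column 1 the FQDN,
# and the property columns follow. Adjust to the real file.
FQDN_COLUMN = 1
PROP_COLUMNS = [("service", 2), ("environment", 3)]


def load_properties(lines, delimiter="|"):
    """Merge property values per (fqdn, property) key, entirely in memory."""
    data = defaultdict(set)
    for row in csv.reader(lines, delimiter=delimiter):
        fqdn = row[FQDN_COLUMN].strip()
        for prop, column in PROP_COLUMNS:
            prop_value = row[column].strip()
            if prop_value:  # skip empty fields
                data[(fqdn, prop)].add(prop_value)
    return data


# Made-up sample rows; host1 appears twice and its values get merged.
sample = [
    "A1|host1.example.com|SVC_1|Production",
    "A2|host2.example.com|SVC_2|Preproduction",
    "A3|host1.example.com|SVC_3|Production",
]
data = load_properties(sample)
```

Because `data` keeps every key and every value set alive until the whole file has been read, its size grows with the number of distinct hosts, which is consistent with the memory pressure described in the question.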

Then, when the input file has been read completely, it's time to dump the data into an output file using a different syntax:

  FQDN, property_name, property_values
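A dump step producing that layout might be sketched as follows. The question does not say how several values for one property are joined, so the `;` separator is an assumption, as are the function name and the sample data.

```python
import io


def dump_properties(data, out, value_separator=";"):
    """Write one 'fqdn,property_name,values' line per key, sorted by key."""
    for (fqdn, prop) in sorted(data):
        values = value_separator.join(sorted(data[(fqdn, prop)]))
        out.write(u"%s,%s,%s\n" % (fqdn, prop, values))


# Made-up data in the same shape as the merged dictionary.
data = {
    ("host1.example.com", "service"): set(["SVC_1", "SVC_3"]),
    ("host1.example.com", "environment"): set(["Production"]),
}
out = io.StringIO()
dump_properties(data, out)
result = out.getvalue()
```

Values that themselves contain commas would need quoting, which this sketch leaves out.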

This works flawlessly for smaller CSV files. With the current, huge CSV, however, it sometimes uses up the memory allocated to the JVM it runs in, and the JVM stops. I believe the data[...].add bit is responsible for this.

Note that the input file is not necessarily sorted by FQDN, so I cannot simply read a line and write it straight to the output: when a second entry with the same FQDN is found later in the input, the two have to be merged.

I thought about mapping this variable to a temporary file and using it as before, but I am not sure whether that is possible and/or easy to implement. For now I am stuck.

Using a database is not an option.

I am still not familiar with everything Python offers, so the ideal solution may be right in front of me without my being able to see it. I hope someone more skilled can help a fellow developer.

You don't need to keep the full dataset in memory. If you can, use the disk (a temporary directory) as storage, saving the data for each FQDN in its own file.

Something like this (it's half pseudocode!):

    import os, shutil, tempfile

    temp_dir = tempfile.mkdtemp()
    for row in csv_reader:
        # get the FQDN and the properties from the CSV row
        fqdn_filepath = os.path.join(temp_dir, fqdn)
        if not os.path.exists(fqdn_filepath):
            pass  # create the file for this FQDN from scratch
        else:
            pass  # add the properties to the existing FQDN file

    # The CSV has now been read completely and you have a bunch of
    # FQDN files. Now write the output, processing them one by one.
    with open_output_file() as output:
        for fqdn_file in sorted(os.listdir(temp_dir)):
            output.write_fqdn_data(fqdn_file)

    shutil.rmtree(temp_dir)  # delete temp_dir
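One way to flesh out the per-FQDN file idea is sketched below. This is an illustration under assumptions, not the answerer's exact code: the function names, the column positions in `PROP_COLUMNS`, the `|` delimiter, the tab-separated scratch format and the `;` value separator are all hypothetical. It also assumes every FQDN is usable as a file name on the target filesystem.

```python
import csv
import os
import shutil
import tempfile

# Hypothetical layout: column 1 holds the FQDN, properties follow.
FQDN_COLUMN = 1
PROP_COLUMNS = [("service", 2), ("environment", 3)]


def spill_to_disk(lines, temp_dir, delimiter="|"):
    """Append each row's properties to a file named after its FQDN."""
    for row in csv.reader(lines, delimiter=delimiter):
        fqdn = row[FQDN_COLUMN].strip()
        # Mode "a" creates the file on first use and appends afterwards,
        # so no explicit os.path.exists() check is needed.
        with open(os.path.join(temp_dir, fqdn), "a") as handle:
            for prop, column in PROP_COLUMNS:
                value = row[column].strip()
                if value:
                    handle.write("%s\t%s\n" % (prop, value))


def write_output(temp_dir, out, value_separator=";"):
    """Merge one FQDN file at a time, so only one host is held in memory."""
    for fqdn in sorted(os.listdir(temp_dir)):
        merged = {}
        with open(os.path.join(temp_dir, fqdn)) as handle:
            for line in handle:
                prop, value = line.rstrip("\n").split("\t", 1)
                merged.setdefault(prop, set()).add(value)
        for prop in sorted(merged):
            values = value_separator.join(sorted(merged[prop]))
            out.write("%s,%s,%s\n" % (fqdn, prop, values))


def convert(lines, out):
    """Run both passes and always remove the scratch directory."""
    temp_dir = tempfile.mkdtemp()
    try:
        spill_to_disk(lines, temp_dir)
        write_output(temp_dir, out)
    finally:
        shutil.rmtree(temp_dir)
```

`convert` accepts any iterable of CSV lines (an open input file, for instance) and any object with a `write` method. The trade-off is one file open per input row and one small file per host, in exchange for memory use that no longer depends on the total number of hosts.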
