| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#1
| |||
| |||
| I have a huge amount of data (approximately 1.7 million rows) with 7 fields in each row that I would like to transfer to an APL variable. The data is in an Access file. I have APL+WIN 2.0. Is there a way of importing this easily. Doing a copy and paste of 56,000 rows at a time is going to take too much time. I've done a quick search on this forum and could not find anything useful. Thanks in advance Lance |
|
#2
| |||
| |||
| lance_dyrland@hotmail.com (Lance) writes: > I have a huge amount of data (approximately 1.7 million rows) with 7 > fields in each row that I would like to transfer to an APL variable. Lance, how much memory are you going to need for that? Have you done some calculations and/or estimates about that? Bye, Dragan -- Dragan Cvetkovic, To be or not to be is true. G. Boole No it isn't. L. E. J. Brouwer !!! Sender/From address is bogus. Use reply-to one !!! |
|
#3
| |||
| |||
| Dragan Cvetkovic wrote: > lance_dyrland@hotmail.com (Lance) writes: > > >>I have a huge amount of data (approximately 1.7 million rows) with 7 >>fields in each row that I would like to transfer to an APL variable. > > > Lance, how much memory are you going to need for that? Have you done some > calculations and/or estimates about that? > > Bye, Dragan > One hopes he has enough memory. For instancs on my Thinkpad 600 with 192MB running APL2 for OS/2 I just did {quad}WA 33469776 x{is}1700000 7{rho}123.45678 {quad}WA 5378608 {quad}EX 'x' 1 {quad}WA 100578640 Observe that 100578640-5378608=95200032=32+8{times}7{times}1700000 No problem having an array that large, or even several on a system with more RAM. The OP should export the data as a .csv file and write a small function to read it in. The file will be on the order of 100-200MB depending on the size of the values. The file can be read using either AP210 or Processor 12. |
|
#4
| |||
| |||
| Dragan Cvetkovic <me@privacy.net> wrote in message news:<lmllk4f60g.fsf@privacy.net>... > lance_dyrland@hotmail.com (Lance) writes: > > > I have a huge amount of data (approximately 1.7 million rows) with 7 > > fields in each row that I would like to transfer to an APL variable. > > Lance, how much memory are you going to need for that? Have you done some > calculations and/or estimates about that? > > Bye, Dragan I have 1 Gig of memory. I've done it before through copy and pasting and memory hasn't been a problem. I need to repeat the process of importing data and would rather learn how to do it properly/efficiently. Lance |
|
#5
| |||
| |||
| I'm not familiar with Access but I assume that you can write all your data to a text file. If this is the case it should be easy to write a function to read the file and concatenate the data to a variable. I have manipulated large text files using J. My computer has Windows 98 SE and 512 MB RAM. I find that it is necessary to loop to get the job done. In my applications I was able to bring into memory 5 to 10 megabytes at a time. Any more than this and the swap file was heavily accessed causing the program to slow down drastically. I hope this helps. Let me know if you want more detail. Lance wrote: > > I have a huge amount of data (approximately 1.7 million rows) with 7 > fields in each row that I would like to transfer to an APL variable. > The data is in an Access file. I have APL+WIN 2.0. Is there a way of > importing this easily. Doing a copy and paste of 56,000 rows at a > time is going to take too much time. I've done a quick search on this > forum and could not find anything useful. > > Thanks in advance > Lance |
|
#6
| |||
| |||
| I am not familiar with APL+WIN 2.0 (I have 3.6) but if it supports ActiveX then you can access the data from APL by creating an instance of the Access ActiveX control. I can provide you with details of how to do this if it will work for you. If not then simply export the data from Access to a text file and then use the native file functions ([]ntie,[]nread,[]nuntie) to read the whole file or manageable chunks into APL. > Lance wrote: > > > > I have a huge amount of data (approximately 1.7 million rows) with 7 > > fields in each row that I would like to transfer to an APL variable. > > The data is in an Access file. I have APL+WIN 2.0. Is there a way of > > importing this easily. Doing a copy and paste of 56,000 rows at a > > time is going to take too much time. I've done a quick search on this > > forum and could not find anything useful. > > > > Thanks in advance > > Lance |
|
#7
| |||
| |||
| Depending on what your purpose is for importing your data, you may be interested in the way we analyse 'large' files in dyalog apl. On a $ 400,- budget-line computer with 512 mb ram we analyse files up to 6 million rows with an unlimited number of fields. The performance is staggering and will in general not be more than a few seconds. The 'import' function will import the data in small portions and create a component in an apl component file for each field. With 1 gb ram it should be possible to do this with much larger files. A demo workspace with a demofile of 135000 rows is available at : http://home.wanadoo.nl/houb/IDA.htm Let me know if you wish to try a demo with 2 million rows (no problem):-). Bernard lance_dyrland@hotmail.com (Lance) wrote in message news:<32330f3b.0405071011.2b6d86a1@posting.google. com>... > I have a huge amount of data (approximately 1.7 million rows) with 7 > fields in each row that I would like to transfer to an APL variable. > The data is in an Access file. I have APL+WIN 2.0. Is there a way of > importing this easily. Doing a copy and paste of 56,000 rows at a > time is going to take too much time. I've done a quick search on this > forum and could not find anything useful. > > Thanks in advance > Lance |
|
#8
| |||
| |||
| Lance wrote: > I have a huge amount of data (approximately 1.7 million rows) with 7 > fields in each row that I would like to transfer to an APL variable. > The data is in an Access file. I have APL+WIN 2.0. Is there a way of > importing this easily. Doing a copy and paste of 56,000 rows at a > time is going to take too much time. I've done a quick search on this > forum and could not find anything useful. With Access you should be able to export to file with fixed field widths, which is a little easier to import than comma-delimited. (It saves you from having to deal with the cases where there is a comma in the text of a cell so that quoting characters are inserted into the exported file, and where there is quote character in the text of a field, so that the quotes are doubled, and where there are linefeeds or carriage returns in test, ... all of which make the extraction of what was in the original data more finicky.) I am going to assume that you have enough memory to read in an image of the whole file, to hold the final result matrix, and to do assorted manipulations along the way. This is pretty likely as you have 1GB of memory, so you should be able to get, oh, 800MB for APL workspace. So: Start by writing a simple utility that reads in the entire file and returns a character vector: cv {<-} fi2cv fi;t;nn [1] {cmt} get a tie number, open the file [2] t {<-} 1 {rho} (- {iota} 2+ {rho} nn)~0,nn {<-} {quad}nnums [3] fi {quad}ntie t [4] {cmt} read the whole file, as chars, close the file [5] cv {<-} {quad}nread t,82,2 {take} {quad}nsize t [6] {quad}nuntie t So now suppose you have a vector, w, with the field widths, and a character vector, fi, with name of the file to which you have exported the data. Access most likely adds two characters (DOS/Win line separator) after each record (but maybe you can specify when you set up the export). Calculate the record length, and read the file into a character vector: len {<-} +/w,2 data {<-} fi2cv fi At this point I expect that {rho} cv is a multiple of len, and so the number of records can be calculated, and the vector can be reshaped into a matrix: nrecs {<-} ({rho} data) {div} len data {<-} (nrecs,len) {rho} data You can get rid of the record separators data {<-} 0 {neg}2 {drop} data and make a boolean vector that will cut the matrix into the seven fields bv {<-} {quad}enlist w {take}{each} 1 {enclose} w f1 f2 f3 f4 f5 f6 f7 {<-} bv {enclose} w This is likely to be useful if you need to do some field-by-field processing (like converting some of the fields to arrays of numbers). Alternatively if you want a matrix of cells matching the database arrangement, you can make a vector of cells and then reshape to matrix: cells {<-} (,(nrecs,len) {rho} bv) {enclose},data cells {<-} (nrecs, {rho} w) {rho} cells If you will need to do this frequently, you might want to wrap up some of the stuff above as utility functions ... |
|
#9
| |||
| |||
| lance_dyrland@hotmail.com (Lance) wrote in message news:<32330f3b.0405071011.2b6d86a1@posting.google. com>... > I have a huge amount of data (approximately 1.7 million rows) with 7 > fields in each row that I would like to transfer to an APL variable. > The data is in an Access file. I have APL+WIN 2.0. Is there a way of > importing this easily. Doing a copy and paste of 56,000 rows at a > time is going to take too much time. I've done a quick search on this > forum and could not find anything useful. In J - www.jsoftware.com - you do not need to read the data into a workspace. You can map the file to a variable. It then behaves like a variable without a need to make a copy of it. Lets say you have a long string and it is all fixed length 10 byte rows and no CR LF. You can map the file to a variable and while you map you tell J it is say 1.345.456.678 rows and 10 columns. No movement of data. You then index into this variable to read say line 456.536.296 and all that happens is you move 10 bytes into your workspace. You could as easily replace this row and no need to touch the rest of the file. Read more about this in J studio/labs "------------------------------------------------------------- Lab: Mapped Files To advance the lab, select menu Studio|Advance or the corresponding shortcut. -- (1 of 66) overview --------------------------------------- A name can be mapped to a file. Using the name uses the file. Changing the value of the name changes the file. A mapped name is connected, as a noun, to a mapped file. A mapped files data is addressable as memory. A mapped name is the file. ....... skipped -- (65 of 66) overview (ctd) -------------------------------- A demo of mapped names given at conferences and presentations uses a cdrom distributed by the New York Stock Exchange. Each month the NYSE distributes cdroms to subscribers with detailed trade and quote information. The demo maps 2 files from the cdrom to J nouns. One file has 13 million records of trade information for a total size of 250MB. The other file is indexing information to the trade file and it contains 200,000 records for a size of 4.5MB. The demo maps names to these two files. It is very easy to build an application with this approach. ) -- (66 of 66) overview (ctd) -------------------------------- Food for thought: Files on zipdrives, cdroms, and a network can be mapped. Why read and write files? Just map them and use the power of J directly. Arrays shared between J tasks. Data shared between J and non-J tasks." |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.