Importing Data

This is a discussion on Importing Data within the APL forums in Programming Languages category; I have a huge amount of data (approximately 1.7 million rows) with 7 fields in each row that I would like to transfer to an APL variable. The data is in an Access file. I have APL+WIN 2.0. Is there a way of importing this easily. Doing a copy and paste of 56,000 rows at a time is going to take too much time. I've done a quick search on this forum and could not find anything useful. Thanks in advance Lance...

Go Back   Application Development Forum > Programming Languages > APL

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 05-07-2004, 02:11 PM
Lance
Guest
 
Default Importing Data

I have a huge amount of data (approximately 1.7 million rows) with 7
fields in each row that I would like to transfer to an APL variable.
The data is in an Access file. I have APL+WIN 2.0. Is there a way of
importing this easily. Doing a copy and paste of 56,000 rows at a
time is going to take too much time. I've done a quick search on this
forum and could not find anything useful.

Thanks in advance
Lance
Reply With Quote
  #2  
Old 05-07-2004, 02:26 PM
Dragan Cvetkovic
Guest
 
Default Re: Importing Data

lance_dyrland@hotmail.com (Lance) writes:

> I have a huge amount of data (approximately 1.7 million rows) with 7
> fields in each row that I would like to transfer to an APL variable.


Lance, how much memory are you going to need for that? Have you done some
calculations and/or estimates about that?

Bye, Dragan

--
Dragan Cvetkovic,

To be or not to be is true. G. Boole No it isn't. L. E. J. Brouwer

!!! Sender/From address is bogus. Use reply-to one !!!
Reply With Quote
  #3  
Old 05-07-2004, 09:00 PM
James J. Weinkam
Guest
 
Default Re: Importing Data

Dragan Cvetkovic wrote:
> lance_dyrland@hotmail.com (Lance) writes:
>
>
>>I have a huge amount of data (approximately 1.7 million rows) with 7
>>fields in each row that I would like to transfer to an APL variable.

>
>
> Lance, how much memory are you going to need for that? Have you done some
> calculations and/or estimates about that?
>
> Bye, Dragan
>

One hopes he has enough memory. For instancs on my Thinkpad 600 with 192MB
running APL2 for OS/2 I just did

{quad}WA
33469776
x{is}1700000 7{rho}123.45678
{quad}WA
5378608
{quad}EX 'x'
1
{quad}WA
100578640

Observe that 100578640-5378608=95200032=32+8{times}7{times}1700000

No problem having an array that large, or even several on a system with more RAM.

The OP should export the data as a .csv file and write a small function to
read it in. The file will be on the order of 100-200MB depending on the size
of the values. The file can be read using either AP210 or Processor 12.
Reply With Quote
  #4  
Old 05-07-2004, 11:35 PM
Lance
Guest
 
Default Re: Importing Data

Dragan Cvetkovic <me@privacy.net> wrote in message news:<lmllk4f60g.fsf@privacy.net>...
> lance_dyrland@hotmail.com (Lance) writes:
>
> > I have a huge amount of data (approximately 1.7 million rows) with 7
> > fields in each row that I would like to transfer to an APL variable.

>
> Lance, how much memory are you going to need for that? Have you done some
> calculations and/or estimates about that?
>
> Bye, Dragan


I have 1 Gig of memory. I've done it before through copy and pasting
and memory hasn't been a problem.

I need to repeat the process of importing data and would rather learn
how to do it properly/efficiently.

Lance
Reply With Quote
  #5  
Old 05-08-2004, 09:31 AM
Brian Bambrough
Guest
 
Default Re: Importing Data

I'm not familiar with Access but I assume that you can write all your
data to a text file. If this is the case it should be easy to write a
function to read the file and concatenate the data to a variable.

I have manipulated large text files using J. My computer has Windows
98 SE and 512 MB RAM. I find that it is necessary to loop to get the
job done. In my applications I was able to bring into memory 5 to 10
megabytes at a time. Any more than this and the swap file was heavily
accessed causing the program to slow down drastically.

I hope this helps. Let me know if you want more detail.


Lance wrote:
>
> I have a huge amount of data (approximately 1.7 million rows) with 7
> fields in each row that I would like to transfer to an APL variable.
> The data is in an Access file. I have APL+WIN 2.0. Is there a way of
> importing this easily. Doing a copy and paste of 56,000 rows at a
> time is going to take too much time. I've done a quick search on this
> forum and could not find anything useful.
>
> Thanks in advance
> Lance

Reply With Quote
  #6  
Old 05-08-2004, 01:06 PM
Graham Steer
Guest
 
Default Re: Importing Data

I am not familiar with APL+WIN 2.0 (I have 3.6) but if it supports ActiveX
then you can access the data from APL by creating an instance of the Access
ActiveX control. I can provide you with details of how to do this if it will
work for you. If not then simply export the data from Access to a text file
and then use the native file functions ([]ntie,[]nread,[]nuntie) to read the
whole file or manageable chunks into APL.

> Lance wrote:
> >
> > I have a huge amount of data (approximately 1.7 million rows) with 7
> > fields in each row that I would like to transfer to an APL variable.
> > The data is in an Access file. I have APL+WIN 2.0. Is there a way of
> > importing this easily. Doing a copy and paste of 56,000 rows at a
> > time is going to take too much time. I've done a quick search on this
> > forum and could not find anything useful.
> >
> > Thanks in advance
> > Lance



Reply With Quote
  #7  
Old 05-08-2004, 01:57 PM
bernard
Guest
 
Default Re: Importing Data

Depending on what your purpose is for importing your data, you may be
interested in the way we analyse 'large' files in dyalog apl.
On a $ 400,- budget-line computer with 512 mb ram we analyse files up
to 6 million rows with an unlimited number of fields. The performance
is staggering and will in general not be more than a few seconds.
The 'import' function will import the data in small portions and
create a component in an apl component file for each field. With 1 gb
ram it should be possible to do this with much larger files.
A demo workspace with a demofile of 135000 rows is available at :

http://home.wanadoo.nl/houb/IDA.htm

Let me know if you wish to try a demo with 2 million rows (no
problem):-).

Bernard



lance_dyrland@hotmail.com (Lance) wrote in message news:<32330f3b.0405071011.2b6d86a1@posting.google. com>...
> I have a huge amount of data (approximately 1.7 million rows) with 7
> fields in each row that I would like to transfer to an APL variable.
> The data is in an Access file. I have APL+WIN 2.0. Is there a way of
> importing this easily. Doing a copy and paste of 56,000 rows at a
> time is going to take too much time. I've done a quick search on this
> forum and could not find anything useful.
>
> Thanks in advance
> Lance

Reply With Quote
  #8  
Old 05-08-2004, 03:03 PM
Mike Kent
Guest
 
Default Re: Importing Data

Lance wrote:

> I have a huge amount of data (approximately 1.7 million rows) with 7
> fields in each row that I would like to transfer to an APL variable.
> The data is in an Access file. I have APL+WIN 2.0. Is there a way of
> importing this easily. Doing a copy and paste of 56,000 rows at a
> time is going to take too much time. I've done a quick search on this
> forum and could not find anything useful.


With Access you should be able to export to file with fixed field
widths, which is a little easier to import than comma-delimited.
(It saves you from having to deal with the cases where there is a
comma in the text of a cell so that quoting characters are inserted
into the exported file, and where there is quote character in the
text of a field, so that the quotes are doubled, and where there
are linefeeds or carriage returns in test, ... all of which make
the extraction of what was in the original data more finicky.)

I am going to assume that you have enough memory to read in an
image of the whole file, to hold the final result matrix, and to
do assorted manipulations along the way. This is pretty likely as
you have 1GB of memory, so you should be able to get, oh, 800MB
for APL workspace.


So:

Start by writing a simple utility that reads in the entire file
and returns a character vector:

cv {<-} fi2cv fi;t;nn
[1] {cmt} get a tie number, open the file
[2] t {<-} 1 {rho} (- {iota} 2+ {rho} nn)~0,nn {<-} {quad}nnums
[3] fi {quad}ntie t
[4] {cmt} read the whole file, as chars, close the file
[5] cv {<-} {quad}nread t,82,2 {take} {quad}nsize t
[6] {quad}nuntie t

So now suppose you have a vector, w, with the field widths, and
a character vector, fi, with name of the file to which you have
exported the data.

Access most likely adds two characters (DOS/Win line separator)
after each record (but maybe you can specify when you set up the
export).

Calculate the record length, and read the file into a character
vector:

len {<-} +/w,2
data {<-} fi2cv fi

At this point I expect that {rho} cv is a multiple of len, and
so the number of records can be calculated, and the vector can
be reshaped into a matrix:

nrecs {<-} ({rho} data) {div} len
data {<-} (nrecs,len) {rho} data

You can get rid of the record separators

data {<-} 0 {neg}2 {drop} data

and make a boolean vector that will cut the matrix into the
seven fields

bv {<-} {quad}enlist w {take}{each} 1 {enclose} w
f1 f2 f3 f4 f5 f6 f7 {<-} bv {enclose} w

This is likely to be useful if you need to do some field-by-field
processing (like converting some of the fields to arrays of numbers).

Alternatively if you want a matrix of cells matching the database
arrangement, you can make a vector of cells and then reshape to matrix:

cells {<-} (,(nrecs,len) {rho} bv) {enclose},data
cells {<-} (nrecs, {rho} w) {rho} cells

If you will need to do this frequently, you might want to wrap up some
of the stuff above as utility functions ...
Reply With Quote
  #9  
Old 05-09-2004, 03:50 AM
Bj?rn Helgason
Guest
 
Default Re: Importing Data

lance_dyrland@hotmail.com (Lance) wrote in message news:<32330f3b.0405071011.2b6d86a1@posting.google. com>...
> I have a huge amount of data (approximately 1.7 million rows) with 7
> fields in each row that I would like to transfer to an APL variable.
> The data is in an Access file. I have APL+WIN 2.0. Is there a way of
> importing this easily. Doing a copy and paste of 56,000 rows at a
> time is going to take too much time. I've done a quick search on this
> forum and could not find anything useful.



In J - www.jsoftware.com - you do not need to read the data into a
workspace.

You can map the file to a variable.

It then behaves like a variable without a need to make a copy of it.

Lets say you have a long string and it is all fixed length 10 byte
rows and no CR LF. You can map the file to a variable and while you
map you tell J it is say 1.345.456.678 rows and 10 columns. No
movement of data. You then index into this variable to read say line
456.536.296 and all that happens is you move 10 bytes into your
workspace. You could as easily replace this row and no need to touch
the rest of the file.

Read more about this in J studio/labs

"-------------------------------------------------------------
Lab: Mapped Files
To advance the lab, select menu Studio|Advance or the
corresponding shortcut.


-- (1 of 66) overview ---------------------------------------
A name can be mapped to a file. Using the name uses the file.
Changing the value of the name changes the file.

A mapped name is connected, as a noun, to a mapped file.

A mapped files data is addressable as memory.

A mapped name is the file.

....... skipped

-- (65 of 66) overview (ctd) --------------------------------
A demo of mapped names given at conferences and presentations
uses a cdrom distributed by the New York Stock Exchange. Each
month the NYSE distributes cdroms to subscribers with
detailed trade and quote information. The demo maps 2 files
from the cdrom to J nouns. One file has 13 million records of
trade information for a total size of 250MB. The other file
is indexing information to the trade file and it contains
200,000 records for a size of 4.5MB. The demo maps names to
these two files. It is very easy to build an application with
this approach.
)


-- (66 of 66) overview (ctd) --------------------------------
Food for thought:

Files on zipdrives, cdroms, and a network can be mapped.

Why read and write files? Just map them and use the power of
J directly.

Arrays shared between J tasks.

Data shared between J and non-J tasks."
Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 11:05 PM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.