Windows freeware unique sort technique for large text files (hosts)

#1
08-02-2008, 02:00 PM
Donita Luddington
Guest

Is there a way, using windows freeware, to sort unique a huge hosts file?

I've concatenated all the freeware windows hosts files I can find into a
single huge fifty-thousand line C:\Windows\System32\drivers\etc\hosts file
but the resulting hosts file is so huge, replete with duplicates, that it's
slowing down windows browsing.

I would like to pare the hosts file to remove duplicates. How?

I tried sorting with windows vim 7.1 freeware but I can't get the unique
sort option to work inside of vim. What am I doing wrong?

Here is a vim 7.1 command that works inside the huge hosts file:
:%!sort (this sorts the huge windows hosts file just fine)

This vim 7.1 sort unique command should work but it does not:
:%!sort -u (this is supposed to sort uniquely)

The syntax is:
<esc>: (begin a windows vim 7.1 command)
!sort -u (run the following command "sort -u" inside of vim freeware)

When I run "<esc>:!sort -u" inside of vim, it pares the hosts file down to
a single (empty) line.

Is there another free way to sort uniquely a large windows text file?

#2
08-02-2008, 02:29 PM
B. R. 'BeAr' Ederson
Guest

On Sat, 2 Aug 2008 11:00:26 -0700, Donita Luddington wrote:

> This vim 7.1 sort unique command should work but it does not:
> :%!sort -u (this is supposed to sort uniquely)


This syntax means: filter the whole buffer through the external sort
command, passing it the -u (unique) option. On Windows that will most
likely be Windows' own sort command, which has no -u option. Therefore,
it fails.

Get a sort.exe which supports -u as part of the UnxUtils package:

http://sourceforge.net/projects/unxutils

An updated version is part of UnxUpdates.zip, which at the moment
is not available via SourceForge. So you may get it here:

http://www.weihenstephan.de/~syring/...UtilsDist.html

While you're at it, you can execute it directly, giving your file
as the input source. (No need to bother Vim with this task.) If you
start using UnxUtils on a more regular basis (they are worthwhile),
you may consider copying all *.exe files into a directory and adding
it to your search path. I explained the how-to only yesterday,
so you may have a look at:

Message-ID: <msm167avz8w.dlg@br.ederson.news.arcor.de>

or look the posting up through Google Groups:

http://groups.google.de/group/alt.co...621001ce95ea22

HTH.
BeAr
--
===========================================================================
= What do you mean with: "Perfection is always an illusion"? =
===============================================================--(Oops!)===

#3
08-02-2008, 02:43 PM
Guy
Guest

[FollowUp-To: alt.comp.freeware]

Donita Luddington wrote:

> Newsgroups: alt.comp.freeware,comp.unix.questions,comp.editors
>



When you crosspost, you should set Followup-To.


> From: Donita Luddington <doniludd@sbcglobal.net>
> NNTP-Posting-Host: 69.110.47.74
> Message-ID: <Oi1lk.6166$np7.5749@flpi149.ffdc.sbc.com>
>
> Is there a way, using windows freeware, to sort unique a huge
> hosts file?
>
> I've concatenated all the freeware windows hosts files I can find
> into a single huge fifty-thousand line
> C:\Windows\System32\drivers\etc\hosts file but the resulting hosts
> file is so huge, replete with duplicates, that it's slowing down
> windows browsing.
>
> I would like to pare the hosts file to remove duplicates. How?
>



Start by pushing the concatenated file through rpsort:

rpsort /q /d /n < infile > outfile

Then view outfile for "semi-duplicate" entries to
determine whether further processing is necessary,

e.g. differences in spacing, comments, or something else.

--
OpenPGP: id=18795161E22D3905; preference=signencrypt;
url=http://guysalias.fateback.com/pgpkeys.txt

#4
08-02-2008, 03:01 PM
John Stubbings
Guest

"Donita Luddington" <doniludd@sbcglobal.net> wrote in message
news:Oi1lk.6166$np7.5749@flpi149.ffdc.sbc.com...
> Is there a way, using windows freeware, to sort unique a huge hosts file?
>
> I've concatenated all the freeware windows hosts files I can find into a
> single huge fifty-thousand line C:\Windows\System32\drivers\etc\hosts file
> but the resulting hosts file is so huge, replete with duplicates, that
> it's slowing down windows browsing.
>
> I would like to pare the hosts file to remove duplicates. How?
>
> I tried sorting with windows vim 7.1 freeware but I can't get the unique
> sort option to work inside of vim. What am I doing wrong?
>
> Here is a vim 7.1 command that works inside the huge hosts file:
> :%!sort (this sorts the huge windows hosts file just fine)
>
> This vim 7.1 sort unique command should work but it does not:
> :%!sort -u (this is supposed to sort uniquely)
>
> The syntax is:
> <esc>: (begin a windows vim 7.1 command)
> !sort -u (run the following command "sort -u" inside of vim freeware)
>
> When I run "<esc>:!sort -u" inside of vim, it pares the hosts file down to
> a single (empty) line.
>
> Is there another free way to sort uniquely a large windows text file?



I'd use a Linux VM.

Then it's a simple

tr '[A-Z]' '[a-z]' < file.txt | sort | uniq > file2.txt




#5
08-02-2008, 03:35 PM
Donita Luddington
Guest

On Sat, 2 Aug 2008 20:01:20 +0100, John Stubbings wrote:
> tr '[A-Z]' '[a-z]' < file.txt | sort | uniq > file2.txt


Hey John,
I guess your command does this

tr (translate character by character)
'[A-Z]' (from any character in the set of capital A to capital Z)
'[a-z]' (to the corresponding character in the set of small a to small z)
< file.txt (from the input file "file.txt")
| (pipe that output to the next command)
sort (sort alphabetically those results)
| (pipe that output to the next command)
uniq (drop adjacent duplicate lines, keeping one copy of each)
> file2.txt (and save the result as "file2.txt")
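
For readers without tr, sort, and uniq on hand, the pipeline broken down above can be approximated in Python. This is only a sketch under a few assumptions: the function name and the sample entries are made up for illustration, str.lower() also folds non-ASCII letters (the tr ranges only cover A to Z), and sorted() orders by code point rather than by locale, so the order may differ slightly from what sort prints.

```python
def lower_sort_uniq(lines):
    """Approximate `tr '[A-Z]' '[a-z]' | sort | uniq` for an iterable of lines."""
    # tr step: strip the line ending and fold everything to lower case.
    lowered = (line.rstrip("\r\n").lower() for line in lines)
    # sort + uniq steps: the set drops duplicates, sorted() orders what is left.
    return sorted(set(lowered))


# Hypothetical sample entries, not taken from any real hosts file.
sample = [
    "127.0.0.1 Ads.Example.com\n",
    "127.0.0.1 ads.example.com\n",
    "127.0.0.1 a.example.net\n",
]
for entry in lower_sort_uniq(sample):
    print(entry)
```

To process a real file, the same function can be fed an open file object instead of the sample list, with the result written to a second file.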


If I had Linux installed, I'd run that in a flash but I don't have any
Linux and, I guess, it would take many hours to install so I'm looking for
a native windows freeware solution instead.

There must be a way to uniquify a file from within vi freeware on windows.

#6
08-02-2008, 03:40 PM
Donita Luddington
Guest

On Sat, 2 Aug 2008 20:29:35 +0200, B. R. 'BeAr' Ederson wrote:

>> This vim 7.1 sort unique command should work but it does not:
>> :%!sort -u (this is supposed to sort uniquely)

>
> This syntax means: execute external sort command with the unique
> option. It will most likely trigger Windows own sort command,
> which has no -u option. Therefore, it fails.


Hey BeAr,
I did not realize this was calling a windows "sort.exe" command.
I thought it was calling a "vim" sort command.

I just looked to see if the windows sort command has a "unique" option.

c:\> sort /? > c:\temp\sort.txt
c:\> type c:\temp\sort.txt
SORT [/R] [/+n] [/M kilobytes] [/L locale] [/REC recordbytes]
  [[drive1:][path1]filename1] [/T [drive2:][path2]]
  [/O [drive3:][path3]filename3]
  /+n                         Specifies the character number, n, to
                              begin each comparison. /+3 indicates that
                              each comparison should begin at the 3rd
                              character in each line. Lines with fewer
                              than n characters collate before other lines.
                              By default comparisons start at the first
                              character in each line.
  /L[OCALE] locale            Overrides the system default locale with
                              the specified one. The "C" locale yields
                              the fastest collating sequence and is
                              currently the only alternative. The sort
                              is always case insensitive.
  /M[EMORY] kilobytes         Specifies amount of main memory to use for
                              the sort, in kilobytes. The memory size is
                              always constrained to be a minimum of 160
                              kilobytes. If the memory size is specified
                              the exact amount will be used for the sort,
                              regardless of how much main memory is
                              available.

                              The best performance is usually achieved by
                              not specifying a memory size. By default the
                              sort will be done with one pass (no temporary
                              file) if it fits in the default maximum
                              memory size, otherwise the sort will be done
                              in two passes (with the partially sorted data
                              being stored in a temporary file) such that
                              the amounts of memory used for both the sort
                              and merge passes are equal. The default
                              maximum memory size is 90% of available main
                              memory if both the input and output are
                              files, and 45% of main memory otherwise.
  /REC[ORD_MAXIMUM] characters Specifies the maximum number of characters
                              in a record (default 4096, maximum 65535).
  /R[EVERSE]                  Reverses the sort order; that is,
                              sorts Z to A, then 9 to 0.
  [drive1:][path1]filename1   Specifies the file to be sorted. If not
                              specified, the standard input is sorted.
                              Specifying the input file is faster than
                              redirecting the same file as standard input.
  /T[EMPORARY]
    [drive2:][path2]          Specifies the path of the directory to hold
                              the sort's working storage, in case the data
                              does not fit in main memory. The default is
                              to use the system temporary directory.
  /O[UTPUT]
    [drive3:][path3]filename3 Specifies the file where the sorted input is
                              to be stored. If not specified, the data is
                              written to the standard output. Specifying
                              the output file is faster than redirecting
                              standard output to the same file.


You're right. There is no "-unique" option to the windows sort command.
There must be a unique option inside of a windows text editor somewhere!

#7
08-02-2008, 04:09 PM
Yrrah
Guest

Donita Luddington:

> Is there another free way to sort uniquely a large windows text file?


Notepad++ with the TextFX plug-in. I am not sure about the maximum
file size it can handle, though.
http://notepad-plus.sourceforge.net/uk/site.htm

Yrrah

#8
08-02-2008, 04:19 PM
Guy
Guest

Donita Luddington wrote:

> The syntax is:
> <esc>: (begin a windows vim 7.1 command)
>!sort -u (run the following command "sort -u" inside of vim freeware)
>



If you want to stick with vi try...

:%!sort |uniq

(yes, there is a space between the external "sort" and the pipe "|")

But you may still have "semi-duplicate" lines.

e.g.

0.0.0.0 example.com
0.0.0.0    example.com
0.0.0.0 example.com #bad domain
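
One way to handle such "semi-duplicates" is to normalize every entry before comparing. The Python sketch below is only an illustration under stated assumptions: the function names are invented here, comments and comment-only lines are discarded (which may not be what you want), host names are lower-cased, and whitespace is collapsed to single spaces. It keeps the first occurrence of each entry in input order rather than sorting.

```python
def normalize_hosts_line(line):
    """Return 'ip host [host...]' in canonical form, or None for non-entries."""
    # Drop a trailing comment, then split on any run of spaces or tabs.
    fields = line.split("#", 1)[0].split()
    if len(fields) < 2:
        return None  # blank line, comment-only line, or malformed entry
    ip, hosts = fields[0], fields[1:]
    # Host names are case-insensitive, so fold them to lower case.
    return " ".join([ip] + [host.lower() for host in hosts])


def dedupe_hosts(lines):
    """Keep the first occurrence of each normalized entry, in input order."""
    seen = set()
    result = []
    for line in lines:
        entry = normalize_hosts_line(line)
        if entry is not None and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


# Three variants of the same entry: plain, odd spacing/case, trailing comment.
sample = [
    "0.0.0.0 example.com",
    "0.0.0.0\t  Example.com",
    "0.0.0.0 example.com #bad domain",
]
print(dedupe_hosts(sample))
```

With the three sample variants, only a single normalized entry survives.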

--
OpenPGP: id=18795161E22D3905; preference=signencrypt;
url=http://guysalias.fateback.com/pgpkeys.txt

#9
08-02-2008, 04:25 PM
Donita Luddington
Guest

On Sat, 2 Aug 2008 12:35:29 -0700, Donita Luddington wrote:

> There must be a way to uniquify a file from within vi freeware on windows.


I found these pointers for removing duplicate lines in vi
http://rayninfo.co.uk/vimtips.html
:%s/^\(.*\)\n\1$/\1/ : delete duplicate lines

http://www.vim.org/tips/tip.php?tip_id=305
:%s/^\(.*\)\n\1/\1$/ delete duplicate lines

But, executed in vim 7.1 on Windows, this syntax returns an error.
I'll try some more after I figure out what it's trying to do.

I think it's trying to do this:
:%s/ ==> search the whole file for
^ ==> the beginning of a line
\(.*\) ==> any character and any number of characters after that
\n ==> a newline character
\1 ==> what you found the first time (ie, the whole line)
/ ==> replace it with
\1 ==> what you found that first time (again, the original line)
$ ==> I'm not sure why this "end of line" character is needed
/ ==> I'm not sure why this "end of command" character is needed

Before I debug why removal of duplicate lines in Lemmy and Vim freeware
isn't working, can you tell me if I have the command to do so right?
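
As a cross-check on what the pattern is meant to do, here is the same idea sketched with Python's re module rather than Vim: capture a line, match the copies that immediately follow it, and keep only the captured line. This is not the Vim command itself; the names are made up for illustration, and like uniq it only removes neighbouring duplicates, so the text has to be sorted first.

```python
import re

# Capture one line, then match one or more immediately following copies of it.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
ADJACENT_DUPLICATES = re.compile(r"^(.*)(?:\n\1)+$", re.MULTILINE)


def collapse_adjacent_duplicates(text):
    """Collapse runs of identical neighbouring lines into a single line."""
    return ADJACENT_DUPLICATES.sub(r"\1", text)


sample = "a.example\na.example\na.example\nb.example\nc.example\nc.example"
print(collapse_adjacent_duplicates(sample))
```

The trailing $ in the pattern is what stops a short line from matching the beginning of a longer one ("a" followed by "ab"). Note also that Vim 7 documents a built-in ":sort u" command (see ":help :sort"), which may avoid both the external sort and the substitution.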

#10
08-02-2008, 04:27 PM
B. R. 'BeAr' Ederson
Guest

On Sat, 2 Aug 2008 12:35:29 -0700, Donita Luddington wrote:

> If I had Linux installed, I'd run that in a flash but I don't have any
> Linux and, I guess, it would take many hours to install so I'm looking for
> a native windows freeware solution instead.
>
> There must be a way to uniquify a file from within vi freeware on windows.


As I wrote: Get an appropriate sort command. You can even replace
the Windows one, as long as you don't need exactly this implementation
for some purpose. (Better to set up a dedicated UnxUtils directory with
an entry in the search path, though.)

Btw.: All the tools John suggested are part of the UnxUtils package.
So it shouldn't take "hours" to follow his good advice... ;-)

Notepad++ - as Yrrah suggested - should do fine in general. But it
gets awfully slow when sorting huge files. That's the case with
most internal editor solutions.

The UnxUtils variant (either sort -u alone or, better, with the input
piped through tr beforehand, as John suggested) should take barely
a second.

When the UnxUtils sort is found in the search path *before* the
Windows one (or *instead* of it, if you just replace it), your Vim
command should do fine. (If it has to be an "editor solution".)
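
To see which sort a program would actually pick up, the search-path lookup can be sketched in Python. This is a rough illustration, not part of UnxUtils or Vim: the helper names and the C:\UnxUtils directory are hypothetical, and shutil.which only reports what the current PATH resolves to.

```python
import os
import shutil


def resolve_command(name, search_path=None):
    """Return the full path of the first `name` found on the search path."""
    # shutil.which checks the directories in order, as the shell does;
    # search_path=None means "use the PATH environment variable".
    return shutil.which(name, path=search_path)


def prepend_to_path(directory, current=None):
    """Build a PATH string with `directory` ahead of the existing entries."""
    if current is None:
        current = os.environ.get("PATH", "")
    return directory + os.pathsep + current if current else directory


# Which "sort" wins with the PATH as it is now (None if there is none)?
print(resolve_command("sort"))
# The same lookup with a hypothetical UnxUtils directory searched first.
print(resolve_command("sort", prepend_to_path(r"C:\UnxUtils")))
```

If the first line printed points into the Windows system directory, the Vim filter is still reaching the Windows sort.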

BeAr
--
===========================================================================
= What do you mean with: "Perfection is always an illusion"? =
===============================================================--(Oops!)===

All times are GMT -5.