
All you need to do is use the file object as an iterator.

<pre><code>for line in open("log.txt"):
    do_something_with(line)
</code></pre>

Even better, use a context manager (the <code>with</code> statement), which is available in recent Python versions.

<pre><code>with open("log.txt") as fileobject:
    for line in fileobject:
        do_something_with(line)
</code></pre>

This will automatically close the file as well.
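As a minimal sketch of why neither `xreadlines` nor a generator expression is needed: a file object is already a lazy iterator, so only the current line (plus an internal read buffer) is held in memory. The helper name `count_matching_lines` and the `encoding` argument are assumptions for illustration, not part of the original answer.

```python
def count_matching_lines(path, needle):
    """Count lines containing `needle` without loading the whole file."""
    count = 0
    # encoding="utf-8" is an assumption; use whatever your file is written in
    with open(path, encoding="utf-8") as fileobject:
        for line in fileobject:  # lines are read lazily, one at a time
            if needle in line:
                count += 1
    return count  # the file is already closed here
```

For example, `count_matching_lines("log.txt", "ERROR")` walks the file once and keeps only a counter.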


An old school approach:

<pre><code>fh = open(file_name, 'rt')
line = fh.readline()
while line:
    # do stuff with line
    line = fh.readline()
fh.close()
</code></pre>


You are better off using an iterator instead. Relevant: <a href="http://docs.python.org/library/fileinput.html" rel="noreferrer">http://docs.python.org/library/fileinput.html</a>

From the docs:

<pre><code>import fileinput
for line in fileinput.input("filename"):
    process(line)
</code></pre>

This will avoid copying the whole file into memory at once.
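`fileinput` can also chain several files into one lazy stream. The following is only a sketch under that assumption; the helper name `numbered_lines` is made up for illustration, while `filename()` and `filelineno()` are methods of the object returned by `fileinput.input`.

```python
import fileinput


def numbered_lines(paths):
    """Yield (filename, line number within that file, line text) lazily."""
    # paths is assumed to be a list of file names given as strings
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield stream.filename(), stream.filelineno(), line.rstrip("\n")
```

Because this is a generator, lines are still read one at a time, no matter how many files are chained.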


Here's what to do if the file has no newlines: read it in fixed-size chunks instead.

<pre><code>with open('large_text.txt') as f:
    while True:
        c = f.read(1024)
        if not c:
            break
        print(c)
</code></pre>
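The same loop can be written as a generator with the two-argument form of `iter`, which keeps calling a function until it returns a sentinel value. This is a sketch, not the original author's code; the name `read_in_blocks` and the `encoding` argument are assumptions.

```python
from functools import partial


def read_in_blocks(path, block_size=1024):
    """Yield successive blocks of at most `block_size` characters."""
    # encoding="utf-8" is an assumption about the file
    with open(path, encoding="utf-8") as f:
        # iter(callable, sentinel) calls f.read(block_size) until it returns ""
        for block in iter(partial(f.read, block_size), ""):
            yield block
```

Only one block is in memory at a time, so this also works for files that consist of a single enormous line.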


Please try this:

<pre><code>with open('filename','r',buffering=100000) as f:
    for line in f:
        print(line)
</code></pre>


I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the <code>cp</code> command using line by line reading and writing. It's CRAZY FAST.

<pre><code>#!/usr/bin/env python3.6

import sys

with open(sys.argv[2], 'w') as outfile:
    with open(sys.argv[1]) as infile:
        for line in infile:
            outfile.write(line)
</code></pre>
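One caveat with the text-mode copy above: text mode decodes the data and may translate line endings, so the copy is not guaranteed to be byte-identical. If an exact copy matters, a binary copy in fixed-size blocks is an alternative. This sketch swaps in `shutil.copyfileobj` from the standard library; the function name `copy_file` and the 1 MiB buffer size are assumptions.

```python
import shutil


def copy_file(src, dst, buffer_size=1024 * 1024):
    """Copy src to dst byte for byte, holding at most one buffer in memory."""
    with open(src, "rb") as infile, open(dst, "wb") as outfile:
        shutil.copyfileobj(infile, outfile, buffer_size)
```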


The <a href="http://blaze.pydata.org/" rel="nofollow noreferrer">blaze</a> project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.

<a href="http://dask.readthedocs.io/en/latest/dataframe.html" rel="nofollow noreferrer">dask.dataframe</a> takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.

<pre><code>import dask.dataframe as dd

df = dd.read_csv('filename.csv')
df.head(10) # return first 10 rows
df.tail(10) # return last 10 rows

# iterate rows
for idx, row in df.iterrows():
    ...

# group by my_field and return mean
df.groupby(df.my_field).value.mean().compute()

# slice by column
df[df.my_field=='XYZ'].compute()
</code></pre>


How about this?
Divide the file into chunks and process it chunk by chunk. When you read a file, the operating system caches the data that follows, so reading line by line does not make efficient use of the cached information.

Instead, divide the file into chunks, load a whole chunk into memory, and then do your processing.

<pre><code>import os


def chunks(fh, size=1024):
    """Yield (start, length) pairs for chunks that end on a line boundary."""
    while True:
        startat = fh.tell()  # file object's current position from the start
        fh.seek(size, 1)  # move `size` bytes forward from the current position (whence=1)
        data = fh.readline()  # advance to the end of the current line
        yield startat, fh.tell() - startat  # generator: doesn't store the whole list in memory
        if not data:
            break


fname = "log.txt"  # assumed path; the original left fname undefined
if os.path.isfile(fname):
    try:
        with open(fname, 'rb') as fh:
            for ele in chunks(fh):
                fh.seek(ele[0])  # startat
                data = fh.read(ele[1])  # endat
                print(data)
    except IOError as e:  # e.g. file: permission denied
        print("I/O error({0}): {1}".format(e.errno, e.strerror))
    except Exception as e1:  # handle other exceptions such as attribute errors
        print("Unexpected error: {0}".format(e1))
</code></pre>


Thank you! I recently converted to Python 3 and was frustrated by using readlines(0) to read large files. This solved the problem, but to get each line I had to do a couple of extra steps. Each line was preceded by a "b'", which I guess means it was in binary format. Using "decode('utf-8')" changed it to ASCII text.

Then I had to remove a "=\n" in the middle of each line.

Then I split the chunk into lines at each newline.

<pre><code># Requires "import binascii"; this runs inside the "for ele in chunks(fh):" loop
b_data = fh.read(ele[1])  # endat: one chunk of ASCII data in binary format
a_data = binascii.b2a_qp(b_data).decode('utf-8')  # data chunk in 'split' ASCII format
data_chunk = a_data.replace('=\n', '').strip()  # splitting characters removed
data_list = data_chunk.split('\n')  # list containing the lines in the chunk
# print(data_list, '\n')
# time.sleep(1)
for j in range(len(data_list)):  # iterate through data_list to get each item
    i += 1  # running line counter; i must be initialised before the loop
    line_of_data = data_list[j]
    print(line_of_data)
</code></pre>

This code starts just above "print data" in Arohi's code.


This is the best solution I found, and I tried it on a 330 MB file.

<pre><code>lineno = 500
line_length = 8
with open('catfour.txt', 'r') as file:
    file.seek(lineno * (line_length + 2))
    print(file.readline(), end='')
</code></pre>

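Here line_length is the number of characters in a single line. For example, "abcd" has line length 4. This only works if every line in the file has the same length.

I added 2 to the line length to skip the line terminator and land on the start of the next line. An offset of 2 matches files whose lines end in the two-byte '\r\n'; with a bare '\n' the offset would be 1.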
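The offset arithmetic can be made explicit. Below is a hedged sketch, assuming an ASCII file in which every line has exactly the same length; the function name `read_fixed_width_line` and the `newline_length` parameter are made up for illustration. It opens the file in binary mode so that seek offsets are plain byte counts.

```python
def read_fixed_width_line(path, lineno, line_length, newline_length=1):
    """Return line `lineno` (0-based) from a file of equal-length lines.

    Each record occupies line_length bytes of text plus newline_length
    bytes of terminator (1 for "\n", 2 for "\r\n").
    """
    record_size = line_length + newline_length
    with open(path, "rb") as f:
        f.seek(lineno * record_size)  # jump straight to the record
        return f.read(line_length).decode("ascii")
```

Nothing before the requested line is read, which is why this is fast even on very large files.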


I realise this was answered quite some time ago, but here is a way of doing it in parallel without blowing up memory use (which is what would happen if you tried to fire every line into the pool). Obviously, swap the readJSON_line2 function out for something sensible; it's just there to illustrate the point!

Speedup will depend on file size and on what you are doing with each line. In the worst case, a small file that is only read with the JSON reader, I'm seeing similar performance to the single-threaded (ST) version with the settings below.

Hopefully useful to someone out there:
<pre><code>import json
import multiprocessing as mp


def readJSON_line2(linesIn):
    '''
    Function for reading a chunk of json lines.

    Note, this function is nonsensical. A user would never use the approach suggested
    for reading in a JSON file,
    its role is to evaluate the MT approach for full line by line processing to both
    increase speed and reduce memory overhead
    '''
    linesRtn = []
    for lineIn in linesIn:
        # The original compared the stripped string to 0, which is always true
        if lineIn.strip():
            lineRtn = json.loads(lineIn)
        else:
            lineRtn = ""

        linesRtn.append(lineRtn)

    return linesRtn


# -------------------------------------------------------------------
if __name__ == "__main__":
    path1 = "C:\\user\\Documents\\"
    file1 = "someBigJson.json"

    nCPUs = mp.cpu_count()  # assumption: the original never defined nCPUs
    pool = mp.Pool(nCPUs)  # assumption: the original never created the pool

    nBuffer = 20 * nCPUs  # How many chunks are queued up (so cpus aren't waiting on processes spawning)
    nChunk = 1000  # How many lines are in each chunk
    # Both of the above will require balancing speed against memory overhead

    iJob = 0  # Tracker for SMP jobs submitted into pool
    iiJob = 0  # Tracker for SMP jobs extracted back out of pool

    jobs = []  # SMP job holder
    MTres3 = []  # Final result holder
    chunk = []
    iBuffer = 0  # Number of chunks currently in the buffer
    with open(path1 + file1) as f:
        for line in f:

            # Send to the chunk
            if len(chunk) &lt; nChunk:
                chunk.append(line)
            else:
                # Chunk full
                # Don't forget to add the current line to chunk
                chunk.append(line)

                # Then add the chunk to the buffer (submit to SMP pool)
                jobs.append(pool.apply_async(readJSON_line2, args=(chunk,)))
                iJob += 1
                iBuffer += 1
                # Clear the chunk for the next batch of entries
                chunk = []

            # Buffer is full, any more chunks submitted would cause undue memory overhead
            # (Partially) empty the buffer
            if iBuffer &gt;= nBuffer:
                temp1 = jobs[iiJob].get()
                for rtnLine1 in temp1:
                    MTres3.append(rtnLine1)
                iBuffer -= 1
                iiJob += 1

        # Submit the last chunk if it exists (as it would not have been submitted to SMP buffer)
        if chunk:
            jobs.append(pool.apply_async(readJSON_line2, args=(chunk,)))
            iJob += 1
            iBuffer += 1

        # And gather up the last of the buffer, including the final chunk
        while iiJob &lt; iJob:
            temp1 = jobs[iiJob].get()
            for rtnLine1 in temp1:
                MTres3.append(rtnLine1)
            iiJob += 1

    # Cleanup
    del chunk, jobs, temp1
    pool.close()
</code></pre>
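The chunk-building step on its own can be sketched more compactly with `itertools.islice`. This covers only the batching, not the pool handling, and the helper name `iter_chunks` is an assumption for illustration.

```python
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield lists of up to `chunk_size` items from any iterable of lines."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))  # take the next batch lazily
        if not chunk:  # the iterable is exhausted
            return
        yield chunk
```

Passing an open file object as `lines` gives one list of lines per batch, and each batch could then be submitted with `pool.apply_async(readJSON_line2, args=(chunk,))` as in the answer above.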


This might be useful when you want to work in parallel and read the file in chunks, while making sure each chunk ends on a line boundary.

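<pre><code>def readInChunks(fileObj, chunkSize=1024):
    while True:
        data = fileObj.read(chunkSize)
        if not data:
            break
        # Extend the chunk to the end of the current line. readline() returns ''
        # at end of file, so a missing final newline cannot cause an endless loop
        # (the original read one character at a time until it saw '\n').
        if data[-1:] != '\n':
            data += fileObj.readline()
        yield data
</code></pre>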

I need to read a large file, line by line. Let's say the file is larger than 5 GB and I need to read each line, but obviously I do not want to use <code>readlines()</code> because it will create a very large list in memory.

How will the code below work for this case? Is <code>xreadlines</code> itself reading one by one into memory? Is the generator expression needed?

<pre><code>f = (line for line in open("log.txt").xreadlines()) # how much is loaded in memory?

f.next() 
</code></pre>

Plus, what can I do to read this in reverse order, just as the Linux <code>tail</code> command?

I found:

<a href="http://code.google.com/p/pytailer/" rel="noreferrer">http://code.google.com/p/pytailer/</a>

and

"<a href="https://stackoverflow.com/questions/5896079/python-head-tail-and-backward-read-by-lines-of-a-text-file/5896210#5896210">python head, tail and backward read by lines of a text file</a>"

Both worked very well!
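For the `tail` part, a simple alternative is a bounded `collections.deque`. This is only a sketch with an assumed helper name `tail`; note that it still reads the file once from the start rather than seeking backwards, but it never keeps more than `n` lines in memory.

```python
from collections import deque


def tail(path, n=10):
    """Return the last `n` lines of a file as a list."""
    # encoding="utf-8" is an assumption about the file
    with open(path, encoding="utf-8") as f:
        # a deque with maxlen=n discards older lines as new ones arrive
        return list(deque(f, maxlen=n))
```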

How can I read large text files in Python, line by line, without loading it into memory?
