I also work with very large datasets (3GB) in Jupyter Lab and have run into the same issue.
It's unclear whether you need to keep access to the pre-transformed data. If not, use <code>del</code> on large dataframe variables you no longer need; I've started doing this myself. <code>del</code> removes the variable, so its memory can be freed once nothing else references the data. Edit: there are multiple possible causes for the issue I'm encountering. I hit it more often when I'm using a remote Jupyter instance, and in Spyder as well, when I'm performing large transformations.

e.g.

<pre><code>import pandas as pd

df = pd.read_csv('some_giant_dataframe.csv')  # or whatever your import is
new_df = my_transform(df)  # my_transform stands for your own transformation
del df  # if unneeded
</code></pre>
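
A minimal sketch of that pattern, assuming pandas and NumPy are installed. The random frame and the body of `my_transform` are hypothetical stand-ins, and `gc.collect()` is optional: it only asks Python to reclaim unreachable objects right away.

```python
import gc

import numpy as np
import pandas as pd


def my_transform(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transformation: keep only the per-row mean."""
    return frame.mean(axis=1).to_frame("row_mean")


# Stand-in for a large import; a real workflow would load from disk.
df = pd.DataFrame(np.random.rand(100_000, 10))
print(f"df uses {df.memory_usage(deep=True).sum() / 1e6:.1f} MB")

new_df = my_transform(df)

del df        # drop the name; memory is released once nothing references the data
gc.collect()  # optional: collect unreachable objects right away

print(f"new_df uses {new_df.memory_usage(deep=True).sum() / 1e6:.1f} MB")
```

Note that `del` only helps if no other variable (including notebook output history such as `_` or `Out`) still refers to the same dataframe.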

Jakes, you may also find this <a href="https://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas?rq=1">thread on large data workflows</a> helpful. I've been looking into <a href="https://docs.dask.org/en/latest/" rel="nofollow noreferrer">Dask</a> to help with memory constraints.

I've noticed in Spyder and Jupyter that the freeze usually happens when I work in another console while a memory-heavy console is running. As to why it freezes instead of crashing, I think this has something to do with the kernel. There are a couple of memory <a href="https://github.com/ipython/ipython/issues?utf8=%E2%9C%93&amp;q=is%3Aissue%20is%3Aopen%20memory" rel="nofollow noreferrer">issues open in the IPython GitHub</a>; #<a href="https://github.com/ipython/ipython/issues/10082" rel="nofollow noreferrer">10082</a> and #<a href="https://github.com/ipython/ipython/issues/10117" rel="nofollow noreferrer">10117</a> seem most relevant. One user <a href="https://github.com/davidhalter/jedi/issues/931" rel="nofollow noreferrer">here</a> suggests disabling tab completion in <code>jedi</code> or updating jedi.

In #10117 they propose checking the value of <code>get_ipython().history_manager.db_log_output</code>. I have the same issue and my setting is correct, but it's worth checking.
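
A small sketch of that check, written so it also runs outside IPython. The helper name `history_output_logging` is hypothetical; which value counts as "correct" is discussed in the linked issue and is not assumed here.

```python
def history_output_logging():
    """Return IPython's db_log_output setting, or None outside IPython."""
    try:
        shell = get_ipython()  # defined only inside IPython/Jupyter sessions
    except NameError:
        return None
    if shell is None:
        return None
    return shell.history_manager.db_log_output


setting = history_output_logging()
print("db_log_output:", setting)
```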

There is no reason to view the entire output of a large dataframe. Viewing or manipulating large dataframes needlessly uses large amounts of your computer's resources.

Whatever you are doing can be done in miniature. Coding and manipulating data is far easier when the data frame is small. The best way to work with big data is to create a new data frame that takes only a small portion or a small sample of the large data frame. Then you can explore the data and write your code against the smaller data frame. Once you have explored the data and have your code working, run that code on the larger data frame.

The easiest way is to take the first n rows of the data frame with the head() function, which returns only those n rows. You can create a mini data frame by calling head() on the large data frame. Below I select the first 50 rows and assign them to small_df. This assumes BigData is a dataset that comes from a library you loaded for this project.

<pre><code>library(namedPackage) 

df &lt;- data.frame(BigData) # Assign big data to df
small_df &lt;- head(df, 50) # Assign the first 50 rows to small_df
</code></pre>

This will work most of the time, but sometimes the big data frame comes with presorted variables or with variables already grouped. If the big data is like this, then you would need to take a random sample of the rows from the big data. Then use the code that follows:

<pre><code>df &lt;- data.frame(BigData)

set.seed(1016) # set your own seed

df_small &lt;- df[sample(nrow(df), size = floor(0.03 * nrow(df)), replace = FALSE), ] # samples 3% of the rows
df_small # much smaller df
</code></pre>
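
For readers working in Python, as in the question, the same two ideas can be sketched with pandas. This assumes pandas and NumPy are installed; `big_df` and its columns are hypothetical stand-ins for the large data frame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the large data frame.
big_df = pd.DataFrame({"id": np.arange(10_000), "value": np.random.rand(10_000)})

# First 50 rows, like head(df, 50) in R.
small_df = big_df.head(50)

# Random 3% of the rows without replacement; random_state fixes the seed.
sample_df = big_df.sample(frac=0.03, replace=False, random_state=1016)

print(small_df.shape, sample_df.shape)
```

Both results are ordinary dataframes, so code developed against them should run unchanged on the full data.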

I think you should read the data in chunks, like this:

<pre><code>import pandas as pd

df_chunk = pd.read_csv(r'../input/data.csv', chunksize=1000000)
chunk_list = []  # append each chunk df here

# Each chunk is in df format
for chunk in df_chunk:
    # perform data filtering (chunk_preprocessing is your own function)
    chunk_filter = chunk_preprocessing(chunk)

    # Once the data filtering is done, append the chunk to list
    chunk_list.append(chunk_filter)

# concat the list into dataframe
df_concat = pd.concat(chunk_list)
</code></pre>

For more information, see: <a href="https://towardsdatascience.com/why-and-how-to-use-pandas-with-large-data-9594dda2ea4c" rel="nofollow noreferrer">https://towardsdatascience.com/why-and-how-to-use-pandas-with-large-data-9594dda2ea4c</a>

I suggest not collecting the chunks in a list again, because the RAM will probably fill up again. Instead, finish your work on each chunk inside the for loop.
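
A minimal sketch of finishing the work inside the loop, assuming the job can be expressed as a running aggregate. The in-memory CSV, its column names, and the chunk size are hypothetical; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file too large to load at once.
csv_data = io.StringIO("category,amount\n" + "\n".join(
    f"{'a' if i % 2 else 'b'},{i}" for i in range(1, 101)
))

totals = {}  # running per-category sums; stays small regardless of file size

for chunk in pd.read_csv(csv_data, chunksize=25):
    # Reduce each chunk immediately, then let it go out of scope.
    partial = chunk.groupby("category")["amount"].sum()
    for category, amount in partial.items():
        totals[category] = totals.get(category, 0) + int(amount)

print(totals)
```

Only one chunk and the small `totals` dictionary are alive at any time, so peak memory is bounded by the chunk size rather than the file size.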

The most robust solution to this problem is to use Docker containers. You can specify how much memory to allocate to Jupyter, and if the container runs out of memory it's not a big deal, because only the container is affected rather than the whole machine (just remember to save frequently).

<a href="https://towardsdatascience.com/how-to-setup-your-jupyterlab-project-environment-74909dade29b" rel="nofollow noreferrer">This blog</a> will get you most of the way there. There are also some decent instructions for setting up Jupyter Lab from one of the freely available, officially maintained Jupyter images here:

<a href="https://medium.com/fundbox-engineering/overview-d3759e83969c" rel="nofollow noreferrer">https://medium.com/fundbox-engineering/overview-d3759e83969c</a>

You can then modify the <code>docker run</code> command described in the tutorial, for example to limit memory to 3GB:

<pre class="lang-sh prettyprint-override"><code>docker run --memory 3g &lt;other docker run args from tutorial here&gt;
</code></pre>

For syntax on the docker memory options, see this question:

<a href="https://stackoverflow.com/questions/46672578/what-unit-does-the-docker-run-memory-option-expect">What unit does the docker run &quot;--memory&quot; option expect?</a>
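
To confirm from inside the notebook that a limit is actually in effect, one option is to read the cgroup memory limit. This is a sketch under assumptions: it checks the usual cgroup v2 and v1 file locations on Linux, and the helper name `container_memory_limit` is hypothetical. On systems without these files it simply reports that no limit was detected.

```python
from pathlib import Path

# Usual cgroup locations for the memory limit (v2 first, then v1).
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def container_memory_limit():
    """Return the cgroup memory limit in bytes, or None if none is found."""
    for path in CGROUP_LIMIT_FILES:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue
        if raw.isdigit():  # cgroup v2 reports "max" when unlimited
            return int(raw)
    return None


limit = container_memory_limit()
print("no limit detected" if limit is None else f"limit: {limit / 1024**3:.2f} GiB")
```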

If you are using a Linux-based OS, read up on the OOM killer; you can find information <a href="https://www.memset.com/docs/additional-information/oom-killer/" rel="nofollow noreferrer">here</a>. I don't know the details for Windows.
You can use <a href="https://github.com/rfjakob/earlyoom" rel="nofollow noreferrer">earlyoom</a>. It can be configured as you wish, e.g. <code>earlyoom -s 90 -m 15</code> starts <code>earlyoom</code> so that when free swap is below 90% and available memory is below 15%, it kills the process that causes the OOM condition and prevents the whole system from freezing. You can also configure the priority of the processes.

I am going to summarize the answers from the following <a href="https://stackoverflow.com/questions/41105733/limit-ram-usage-to-python-program">question</a>.
You can limit the memory usage of your program. In the example below, the memory-hungry work is the function <code>ram_intense_foo()</code>. Before calling it, you need to call <code>memory_limit()</code>, e.g. <code>memory_limit(95)</code> to cap the address space at 95% of the currently available memory.

<pre><code>import resource
import sys
import numpy as np

def memory_limit(percent_of_free):
    # Cap the address space at a percentage of the currently available memory.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    limit = int(get_memory() * 1024 * percent_of_free / 100)  # setrlimit needs an int
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

def get_memory():
    # Return MemAvailable from /proc/meminfo, in kB (Linux only).
    with open('/proc/meminfo', 'r') as mem:
        free_memory = 0
        for i in mem:
            sline = i.split()
            if str(sline[0]) == 'MemAvailable:':
                free_memory = int(sline[1])
                break
    return free_memory

def ram_intense_foo(a, b):
    A = np.random.rand(a, b)
    return A.T@A

if __name__ == '__main__':
    memory_limit(95)
    try:
        temp = ram_intense_foo(4000, 10000)
        print(temp.shape)
    except MemoryError:
        sys.stderr.write('\n\nERROR: Memory Exception\n')
        sys.exit(1)
</code></pre>
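
A variation on the same idea, sketched under assumptions: instead of limiting the current process (which in a notebook would be the kernel itself), the heavy job runs in a child process with its own cap. This is Linux-oriented, the helper name `run_capped` and the 1 GiB and 4 GiB figures are illustrative choices, and it is not taken from the linked question.

```python
import resource
import subprocess
import sys


def run_capped(code: str, max_bytes: int) -> subprocess.CompletedProcess:
    """Run Python source in a child process whose address space is capped."""

    def apply_limit():
        # Runs in the child just before exec; the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )


# Try to allocate 4 GiB under a 1 GiB cap.
result = run_capped("data = bytearray(4 * 1024**3)", 1024**3)
print("exit code:", result.returncode)
print(result.stderr.strip().splitlines()[-1:])
```

If the allocation exceeds the cap, the child fails on its own and the parent session keeps running, which is the behaviour the question was hoping for.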

You can also use notebooks in the cloud, such as Google Colab <a href="https://colab.research.google.com/" rel="nofollow noreferrer">here</a>. It provides hosted runtimes with their own RAM allocation, and Jupyter notebooks are supported by default.

I have recently started using Jupyter Lab, and my problem is that I work with quite large datasets (usually the dataset itself is approx. 1/4 of my computer's RAM). After a few transformations, saved as new Python objects, I tend to run out of memory. The issue is that when I approach the available RAM limit and perform any operation that needs more RAM, my computer freezes and the only way to fix it is to restart it. Is this the default behaviour in Jupyter Lab/Notebook, or is there some setting I should change? Normally, I would expect the program to crash (as in RStudio, for example), not the whole computer.

Jupyter Lab freezes the computer when out of RAM - how to prevent it?
