Q: Extract a column of numbers from a website that also contains other numbers

Stack Overflow user

Asked 2013-01-13 05:17:05

Answers 2, Views 153, Followers 0, Votes 3

There is a website used by biochemists and bioinformaticians (http://dgpred.cbr.su.se/index.php?p=TMpred). After you enter a protein sequence, you get output like this:

http://dgpred.cbr.su.se/analyze.php?with_length=on&seq=RGFTPLQWECVMASDFGHH

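There is some data at the top and bottom, and in the middle there are 4 columns, of which the 4th holds the data we want. I want to get the numbers in the 4th column (for many protein sequences) into Excel.

My current workflow (on a Mac) is to copy everything into a rich-text document in TextEdit, alt+drag around the numbers (so that only the numbers in the fourth column are selected), and then run this AppleScript: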

do shell script "pbpaste | sed 's/[^0-9.-]//g' | pbcopy"
do shell script "pbpaste | sed '/^$/d' | pbcopy"

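I am only a regex beginner, but this successfully leaves me with a newline-separated list of numbers, ready to paste into Excel.

What would be really sweet is to drop the TextEdit step and have the regex get the numbers directly from the website. However, that is beyond my abilities. Can anyone help me with this, i.e. select only the numbers in the 4th column?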

regex

Stack Overflow user

Accepted answer

Answered 2013-01-13 05:43:23

When I copy that data, I get something like this:

R   1   -9.00           
       +0.03
G   2   -8.00           
       +0.36
F   3   -7.00       
-0.26

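Every odd-numbered line has 3 columns and starts with [A-Z]; the data you want is on the line that follows it.

The numbers you want come in two forms: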

^\t {3}([-+][0-9]+\.[0-9]{2})$  //for the red numbers

and:

^([-+][0-9]+\.[0-9]{2}) {3}\t$   //the green numbers

You can match both forms like this:

^(\t {3})?([-+][0-9]+\.[0-9]{2})( {3}\t)?$

The second capture group, ([-+][0-9]+\.[0-9]{2}), is the part you want to keep:

s/^(\t {3})?([-+][0-9]+\.[0-9]{2})( {3}\t)?$/$2/g
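For anyone who would rather script this step, the combined pattern can be tried out in Python. This is only a sketch under assumptions: the sample text is a reconstruction of the copied data (tab plus three spaces before some values, three spaces plus tab after others), and `fourth_column` is a hypothetical helper name, not part of any tool mentioned here.

```python
import re

# Reconstructed sample of the copied text (an assumption): some values are
# preceded by "\t   ", others are followed by "   \t".
SAMPLE = (
    "R   1   -9.00\n"
    "\t   +0.03\n"
    "G   2   -8.00\n"
    "\t   +0.36\n"
    "F   3   -7.00\n"
    "-0.26   \t\n"
)

# The combined pattern from above; group 2 holds the number.
# re.MULTILINE makes ^ and $ match at every line, not just the whole string.
PATTERN = re.compile(r"^(\t {3})?([-+][0-9]+\.[0-9]{2})( {3}\t)?$", re.MULTILINE)


def fourth_column(text):
    """Return the signed values from lines that contain only a number."""
    return [match.group(2) for match in PATTERN.finditer(text)]


# One value per line, ready to paste into a spreadsheet column.
print("\n".join(fourth_column(SAMPLE)))
```

The lines starting with a letter never match, because the pattern requires the sign to come right after the optional leading whitespace.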

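Consider BBEdit or TextWrangler instead of AppleScript; you may find them easier to work with.

Enter the following in the search field:

\r[A-Z].*\r(\t {3})?([-+][0-9]+\.[0-9]{2})( {3}\t)?$

and this in the replace field:

\r\2

Then choose "Replace All".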
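The same search and replace can be approximated outside the editor. The sketch below uses Python's `re.sub` and assumes `\n` line breaks in place of the editor's `\r`; the `copied` text and the `keep_fourth_column` name are made up for illustration.

```python
import re

# "\n" stands in for the editor's "\r" line break (an assumption).
SEARCH = r"\n[A-Z].*\n(\t {3})?([-+][0-9]+\.[0-9]{2})( {3}\t)?$"
REPLACE = r"\n\2"  # keep a line break plus the second capture group


def keep_fourth_column(text):
    """Collapse each letter line and its following number line into the number."""
    return re.sub(SEARCH, REPLACE, text, flags=re.MULTILINE)


# Made-up input: two data pairs surrounded by unrelated lines.
copied = "header\nR   1   -9.00\n\t   +0.03\nF   3   -7.00\n-0.26   \t\nfooter\n"
print(keep_fourth_column(copied))
```

Lines that are not part of a letter/number pair (here `header` and `footer`) are left alone, so the data above and below the table still has to be trimmed separately.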

How it works

 \r        // carriage return
 [A-Z]     // any character from A to Z (the lines you DON'T want all start with a letter)
 .         // any character
 *         // any number of times
 \r        // carriage return
           // that deals with the lines you DON'T want to keep
 (         // grouping
 \t        // tab character
  {3}      // space character repeated 3 times
 )         // close grouping
 ?         // zero or one occurrences of the previous grouping
 (         // grouping (this is the bit you are after)
 [-+]      // character class - one of any of the [enclosed characters]
 [0-9]     // one of any of 0-9
 +         // repeated one or more times
 \.        // full stop (escaped as it has special meaning in regex)
 [0-9]{2}  // exactly two occurrences of any of 0-9
 )         // close capture parens (end of the group you are after)
 ( {3}\t)? // 3 spaces followed by a tab, occurring 0 or 1 time
 $         // end of line (in BBEdit/TextWrangler you often use \r)
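The breakdown above can also be written as a commented pattern. This is a sketch using Python's `re.VERBOSE` flag, again assuming `\n` line breaks rather than `\r`; in verbose mode literal spaces are ignored, so the three spaces are written as `[ ]{3}`.

```python
import re

# Verbose form of the search pattern; "\n" replaces the editor's "\r"
# (an assumption for text handled in Python).
SEARCH = re.compile(
    r"""
    \n [A-Z] .* \n               # the whole letter line, which is discarded
    (\t[ ]{3})?                  # optional tab + 3 spaces before the number
    ([-+] [0-9]+ \. [0-9]{2})    # group 2: sign, digits, full stop, two digits
    ([ ]{3}\t)?                  # optional 3 spaces + tab after the number
    $                            # end of line
    """,
    re.VERBOSE | re.MULTILINE,
)

# Made-up input: one letter line followed by its number line.
match = SEARCH.search("x\nR   1   -9.00\n\t   +0.03\n")
print(match.group(2))
```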

An important detail in BBEdit/TextWrangler: captured groups are referred to as \1, \2, \3, not $1, $2, $3…

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/14298031
