文章/答案/技术大牛

发布

社区首页 >问答首页 >与CSV恐龙数据库的接口

问与CSV恐龙数据库的接口
EN

Code Review用户

提问于 2015-03-15 02:58:35

回答 1查看 957关注 0票数 6

我刚刚完成了第一个Ruby在Level上的练习，并想了解如何重构代码。

您可以找到关于github的原始练习 --包括我必须实现的需求以及CSV数据文件(有两个)。

要求去检查CSV，然后回来。完成了吗？酷，我有几个我需要的特征：

我将我最喜欢的恐龙加载到CSV文件中，您需要解析。我对非洲恐龙不太了解，所以我从海盗湾下载了一只。它的格式不如我的格式，所以您需要处理这两种格式。
我有朋友问我很多关于恐龙的问题(我是个大人物)。请确保dinodex能帮我回答以下几件事：
- 抓住所有恐龙的双脚。
- 抓住所有的食肉动物恐龙(鱼类和昆虫计数)。
- 抓住恐龙的特定时期(不需要区分早白垩世和晚白垩世)。
- 只抓大(> 2吨)或小恐龙。
- 可以肯定的是，我希望能够随意地结合标准，如果我可以将过滤器调用链接在一起的话，情况会更好。
对于一只恐龙，我希望能打印出关于恐龙的所有已知事实。如果缺少事实，请不要打印空值，只需跳过标题。一定要提前/迟到等时间打印时间。
另外，我可能希望在给定的集合中打印所有恐龙(过滤后等等)。

我的代码如下：

require 'csv'

# handle filtering on weight
def weight_filter(data, property, value)
  if value.downcase == "large"
    data.delete_if do |row|
      weight = row["weight_in_lbs"]
      weight.nil? ? true : weight <= 2000
    end
  else
    data.delete_if do |row|
      weight = row["weight_in_lbs"]
      weight.nil? ? true : weight > 2000
    end
  end
end

# properly format the arguments
def format_args(property, value)

  # make Insectivores and Piscivores into Carnivores
  if value == "Carnivore"
    value = Array.new
    value << "Carnivore" << "Insectivore" << "Piscivore"
  end

  return property, value
end

def filter_on(filters = {})
  # check arguments
  if filters.empty?
    message = <<-EOS

      Usage: filter_on( {property => value} )
      Where property can be:'WALKING' | 'DIET' | 'PERIOD' | 'SIZE'

      Example Usage: filter_on({ "WALKING" => "Biped", "DIET" => "Carnivore"})

    EOS
    message
  else
    # read data
    data = read_data

    # filter
    filters.each do |property, value|
      property = property.downcase

      if property == "size"
        # special handler for weight
        weight_filter(data, property, value)
      else
        property, value = format_args(property, value)
        data.delete_if { |row| !(value.include?(row[property])) }
      end 
    end

    # skip headers when printing
    no_headers = [] 
    data.each { |row| no_headers << row }

    no_headers
  end
end

def dinoinfo(dinosaur)
  # rationalize argument
  dinosaur.capitalize!

  single_dino_data = '' 

  # load data into memory
  data_set = read_data

  # extract single dinosaur row from data_set
  read_data.each do |row|
    single_dino_data = row if row["name"] == dinosaur
  end

  formatted_output = "\n"

  # did we find a match? 
  if single_dino_data.empty?
    formatted_output << "\tWe did not find a match for \"#{dinosaur}\" in our Dinodex!\n\n"
  else
    # format extraction
    single_dino_data.each do |property, value|
      if !value.nil?
        # add colon
        property = "#{property}:"

        formatted_output << "#{property.upcase.rjust(15)}   #{value}\n\n"
      end
    end
  end

  formatted_output
end

def read_data
  # load dinodex.csv into memory
  dinodex = CSV.read('dinodex.csv', headers: true, converters: :numeric, 
           header_converters: :downcase)

  # append information from african dinos

  CSV.foreach('african_dinosaur_export.csv', headers: true, converters: :numeric, 
              header_converters: :downcase) do |row|

    formatted_input = []

    formatted_input << row["genus"] << row["period"] << nil 

    # handle carnivore
    row["carnivore"] == "Yes" ? formatted_input << "Carnivore" : formatted_input << nil 

    # continue adding to formatted input
    formatted_input << row["weight"] << row["walking"] 

    # add to dinodex
    dinodex << formatted_input
  end

  dinodex
end

我希望听到一些反馈意见，什么是可以做得更好的。

ruby

programming-challenge

csv

回答 1

Code Review用户

发布于 2015-03-15 13:31:50

我一点也不知道我会怎么用这个。它可以说是零碎地满足了需求，但并不是一种有用的方式。例如，格式化输出只适用于单个命名的恐龙，而过滤则返回一些我无法轻松格式化的内容。

总的来说，我会提出一个完全不同的结构，但我只会回顾一下您当前的方法，并对每种方法进行评论。一般来说，有很多副作用正在发生，您的命名可能需要一些工作。您的方法也不一致，因为它们可以返回非常不同的东西，而不是可预测的。通常是因为他们想做不止一件事，尽管这是个坏主意。

#weight_filter有很多重复。而且它有一个从未使用过的参数(property)！您还使用了delete_if，我强烈反对它，因为它改变了调用它的数组；也就是说，它有副作用。使用#select或#reject从初始数组生成一个新的子集数组会更简洁。

就命名而言，data并不是描述性太强--我把它叫做dinosaurs，因为它就是这样的。

代码本身的直接重写(即保持API完整、疣和全部)可能是：

def weight_filter(data, property, value)
  target = value.downcase == "large" ? :large : :small
  data.delete_if do |row|
    size = row["weight_in_lbs"].to_i > 2000 ? :large : :small
    size != target
  end
end

#format_args是一个非常愚蠢的名字。它的格式到底是什么？方法参数？命令行参数？好吧，不，没这回事。它所做的就是“规范”饮食属性/属性。而且，还有一个property论点似乎是多余的。该方法仅在“doesn”属性/属性的上下文中才有意义，但是该方法本身并没有检查它。property参数刚刚通过。它还返回一个数组或它的输入值，这意味着您真的不知道您将得到什么。

在代码方面，如果您要直接重写，我会这样做：

def format_args(property, value)
  if value == "Carnivore"
    property, %w{Carnivore Insectivore Piscivore}
  else
    property, value
  end
end

#filter_on真的很奇怪。它将返回一个CSV::Table或一个描述用法的文本块。坦率地说，这是毫无意义的。当然，如果没有参数，命令行程序通常会打印它的用法，否则它会打印输出。但这不是命令行程序，它只是程序中的一种方法。返回表不是打印输出。就像和#format_args一样，你不知道你会得到什么。如果有的话，如果没有过滤器，您可能会抛出一个ArgumentError，但是在没有过滤器的情况下返回完整的数据集就更有意义了。而且因为#filter_on也读取所有数据，所以它总是从头开始，不允许您链接过滤器，尽管这是一个软要求。

我不会费心重写，因为坦白地说，这不会有多大帮助。

如果您可以为它提供一行数据，并让它对其进行格式化，那么#dinoinfo就有意义了。但它不是这么做的。相反，它是一个过滤器方法，只查看名称，并格式化它找到的内容。所以你不能把它和#filter_on结合起来，这似乎是最好的用例。

当您使用capitalize!参数时，第一行也有副作用；这是一个no-no。你不知道字符串是从哪里来的，但你却在原地改变它。换句话说，您正在更改不属于您的数据。

由于某种原因，您还会读两次数据。首先，您将所有内容都读入data_set，但您从未使用过它。然后你再读一遍，然后使用#each找到恐龙，而你应该使用#detect。

再说一遍，不用再重写了。

#read_data假定dinodex.csv文件是原始的、正确的，并表示规范格式，然后尝试将african_dinosaur_export.csv文件更改为规范格式。这有点道理，但你忽略了一件事:在dinodex.csv，阳川龙的时期是“牛津”--但那不是地质时期。“牛津”是晚侏罗世(c.f )的一个阶段。维基百科)因此，实际上，dinodex.csv文件也不是完美的。

因此，实际上，这两个文件都需要一些处理，以匹配一种通用格式。

所以我说我会选择一种完全不同的方法。也就是说，我会给恐龙做模型。

当我说模型时，我的意思是使用所有必要的方法和属性创建一个名为Dinosaur的类。不要有一个大的CSV对象。

这样做的目的是使数据集保持一致，并提供访问器，使您能够以不同的方式查询数据。

例如，您可能有一个具有如下接口的类：

class Dinosaur
  attr_reader :name, :weight, :diet, :period, :continent, :locomotion, :description

  # create a new dinosaur from a hash of attributes
  def initialize(attributes); end

  # Is this dinosaur bipedal?
  def biped?; end

  # Is this dinosaur a carnivore (incl. piscivores and insectivores)?
  def carnivore?; end

  # Does this dino weight more than 2000lbs
  def large?; end

  # Is this dinosaur from the given period?
  def from_period?(period); end

  # Formatted string
  def to_s; end

  # A hash that can be serialized as JSON
  def as_json; end
end

给定一个Dinosaur对象数组，您可以使用基本的Ruby轻松地过滤它：

# find a specific dinosaur
dino = dinosaurs.detect { |dino| dino.name == "Yangchuanosaurus" }

# find large, carnivorous, bipedal dinosaurs
badasses = dinosaurs.select(&:biped?).select(&:large?).select(&:canivore?)

# print the badasses
badasses.each { |dino| puts dino.to_s }

您可以将数组包装在Dinodex类中，以使上述筛选更简单。

票数 2

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/84115

复制

相似问题

问与CSV恐龙数据库的接口
EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问与CSV恐龙数据库的接口EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问与CSV恐龙数据库的接口
EN